SearchBlox Offers Enterprise Search In Spirit of Endeca

February 17, 2014

The sponsored article titled “Faceted Enterprise Search from Searchblox” on Web Designer Depot promotes SearchBlox as a viable alternative to Google Mini or Search Appliance for enterprise search. The article uses screenshots to show in detail how simple setup is. The article explains,

“SearchBlox has crawlers that work for filesystems, websites, RSS feeds, and databases that work straight out-of-the-box. They can index both public and protected content, and can be set to crawl on a specified schedule so your users’ searches are always up to date.

The faceted search plugin that comes with SearchBlox is jQuery based, so it’s easy to integrate it into your website or application. Running WordPress? There’s a custom WP plugin for searching and indexing your WordPress site”

It sounds like the spirit of Endeca is still alive. Before SearchBlox can index and search the various file types, all the user must do is set folder paths or root URLs. SearchBlox promises quick, faceted search built on Apache Lucene. Users can manage everything through a web-based administrative console. SearchBlox also offers crawling of third-party websites, an indexing API, synonym searches, and customizable stopwords. All of these capabilities make SearchBlox an interesting choice for enterprise search.

Chelsea Kerwin, February 17, 2014

Sponsored by ArnoldIT.com, developer of Augmentext

Efficient eDiscovery with SharePoint

February 10, 2014

Discovery and preservation in SharePoint have long been a time-consuming and intense process. However, several good add-on solutions have created a simpler and faster method of eDiscovery, including Index Engines’ 5.1 Release. Read more in the PR Web story, “Efficient SharePoint ESI Collection and Preservation Highlights Index Engines’ 5.1 eDiscovery Release.”

The article says:

“Time and access to data for eDiscovery increased with Index Engines’ 5.1 release, which provides litigation support professionals direct indexing of SharePoint for selective culling and collection and also provides support for Exchange 2013 data. Previously, SharePoint extraction was an arduous process that can require the need to copy the data to disk before indexing.”

Stephen E. Arnold is a longtime leader in search, and therefore a longtime follower of SharePoint and enterprise search. His information service, ArnoldIT.com, devotes a lot of attention to SharePoint and the latest tips and trends. Arnold often finds that while SharePoint is a large, powerful platform, it is not easily customizable, and users often turn to smart add-ons to enhance their satisfaction.

Emily Rae Aldridge, February 10, 2014

Ads Gone Wrong

February 2, 2014

I came across an outfit called Tribune Online. The company offers a news aggregation service. You can examine the different content slices at http://www.frenchtribune.com. There are several interesting characteristics of the site.

First, the company offers country specific “Tribunes online.” One example is http://austriantribune.com/klasse/legen/united-states. Try to navigate to this country specific listing of articles from a range of sources. The information is in English and provides a country-specific “brand” for content available from the main Web page. The company lists some of the country-specific “brands”, but I could not locate a comprehensive list.

Second, the advertisements are flagged. What is intriguing about the paid articles is that they are scattered across different business sectors. I am not sure how many people will relate to technology for indexing spun out of SAIC, a new diet for chubbies, and a listicle of metal detectors. I am finding less and less relevance in the online ads displayed to me. I thought technology like Bing’s, Google’s, and other ad services was supposed to deliver relevant advertising.

Third, the name of the company is listed as Tribune Online. The principal office appears to be in France. I could not determine if the US news and information company using the word “Tribune” in its name was involved. Running a query on a public Web index company is interesting. Confusion between US Tribune publications’ online presence and “Tribune Online” was interesting.

If you are a Euro news maven and want to get non-European Commission sponsored aggregation of articles, check out some of the Tribune Online’s services. I liked this one: http://frenchtribune.com/categorie/emplacements/spain.

I did not spot a search function. Have we entered the post-search era?

Stephen E Arnold, February 2, 2014

Hardware Management Congruence: Google and Barnes and Noble

January 30, 2014

I have been scanning the Google Motorola news. The write ups fall into two camps.

On one hand, there is the “Google is really smart” camp. See, for example, “Google to Keep Motorola’s Advanced Technology Group” and “Google’s Tasty Lemonade.”

On the other hand, there is the “Google goofed” viewpoint; for example, “Analysis: Larry Page’s Smashed Handset Strategy – Google Ends Bid To Be Apple.”

My view is somewhat different.

First, I see some parallels between the Barnes & Noble Nook adventure and Google-Motorola.

Second, the management shifts at Motorola did not have a material impact on the revenue-generating power of Android phones. Keep in mind that these phones were treated just like other vendors’ phones in terms of access to software.

Third, the confusion between indexing the Web and building a business using Overture-type methods and sustaining a diversified business persists. Even the somewhat uneven ZDNet spotted this trend of revenue erosion in AdSense. Check out “Google’s Earnings: What Future for Plunging AdSense Business.”

Now the tide is turning in other important ways. I noted that Google is going to pay a cost of business tax going forward. The most recent indication of this is “Patent Troll Strikes at the Very Heart of Google’s Empire.”

In my own little world of information retrieval, the challenges Google faces are easily viewed by running queries against the Google search system. In order to help my team and some of my clients navigate the interesting world of filtered results, sponsored results, personalized results, and irrelevant results, I have set up DeeperQI. It is a free service.

Google is fascinating and emitting some interesting signals. Not all of them are beamed at Motorola devices and marketers.

Stephen E Arnold, January 30, 2014

Sail Labs Sets Up a One Stop Download Shop

January 20, 2014

Rather than making visitors read and click through an entire Web site, Sail Labs Technology took a page out of simplicity’s book and placed all of its information in the Download Center. Sail Labs does not dump all of its information in one part of the Web site and wish visitors good luck. The company follows the usual Web 2.0 format and a standard organization regimen. The Download Center acts as more of an index, with the entire Web site’s information downloadable in PDFs.

Sail Labs is a world-leading developer in speech technology and multimedia analysis.

“We address the markets of rich media indexing and communication mining, offering cutting-edge technologies in areas such as automatic speech recognition, speaker identification, entity-and topic detection across multiple languages, geographies, and sources. Visualization components (clustering, relationships, trends, GIS) and ontologies complete our product portfolio.”

The company is based in Vienna, Austria. Sail Labs has grown from a small company and continues to garner potential investors and create high-quality software. Sail Labs still remains loyal to its roots by being 100% Austrian owned. Its headlining products are the Media Mining Indexer, which allows users to process speech from multiple sources and produce real-time annotated text output, and the OSINIT line, which creates actionable intelligence based on multiple sources.

Sail Labs may not have all of the glamour and glitz of Nuance, but they do have a compelling resume based on all of the information in the Download Center.

Whitney Grace, January 20, 2014

Sponsored by ArnoldIT.com, developer of Augmentext

Fake Papers: Blame the Education System

December 30, 2013

I sure don’t want to blame a person. I am delighted that SMRITIweb.com provided some information about where the blame for misinformation, reformation, and disinformation should be placed. I will be able to enter 2014 with that problem shifted to the “education system.”

You can work through the explanation in “How I Published a Fake Paper, and Why It Is the Fault of Our Education System.” The write up makes clear that Navin Kabra was not trying to become an expert in fabrication. The purpose was nobler.

There are a number of memorable statements in the article. May I highlight and comment on three? The Navin Kabra writing is in bold and my personal comment is in italic. Here we go:

  1. We submitted to two fake papers to this conference – one was complete gibberish auto-generated by using the online fake paper generator at SCIGen, while the other was auto-generated gibberish interspersed with completely ridiculous statements, movie dialogues, and other random things. Both these papers where accepted by this conference. 

    Are the conference organizers off the hook? What about the people who created the papers? What about the indexing systems and their “smart” software that filters certain content? I think the education system is a good target, but are there others?

  2. One section of the paper consists entirely of dialogues from the movie “My Cousin Vinny.”

    What about the super sophisticated duplication detection algorithms that some large Web indexing outfits perform? What about the plagiarism services? I find it interesting that in this era of smart software, a simple copying is just fine. Fascinating.

  3. The conference organizers allegedly said: “All the papers were reviewed by panelists from a panel of international experts using a double-blind review methodology. Only high quality papers were accepted. All accepted papers were sent reviews from at least 3 reviewers each and the authors were then asked to update the papers based on the review comments.” (No such thing happened with the 2 papers we submitted to the conference.)

    Imagine that. Conference organizers not doing what they said. I find that quite hard to believe. The conferences I have attended have been first class operations. No PR pitches. No fouled sessions. No glitches in putting Tab A in Tab B.

I find the education system performing at a level that I find acceptable. I don’t have any first-hand experience with schools any longer. I assume that the standards of excellence remain lofty. Articles about teaching assistants who give A’s because it is the path of least resistance are obviously minority views. The rumor that McKinsey is unhappy with the quality of new MBAs is similarly fallacious. The failure of the kid at Walgreen’s to know how to make change is also a once in a million fluke.

Yep, 2014 is shaping up to be a banner year for pinning problems on systems, not individuals. Notice that the author is not putting the responsibility on the individuals who generate false information:

The root of all evil is this stupid rule that mandates that all M.E./M.Tech. students must have two publications. Until that is changed, this sort of a thing will continue to thrive. (Note: I don’t really know for sure whether there is indeed such a rule, and whether it is applicable to all colleges in India – I’m just repeating what I heard from the students and the organizers of that conference.)

Yep, the root of all evil.

And what about search?

Well, if the content is filtered and not findable, researchers won’t find correct or incorrect information. Is there an app for that? If you want tips for finding useful information online, check out the librarian-centric DeeperQI.com.

Stephen E Arnold, December 30, 2013

Enterprise Search Market Diversifies and Competition Increases

December 30, 2013

The article “Enterprise Search Pie” on HadoopSphere makes an interesting analogy between a pie heating up and enterprise search. The article claims to bear witness to the altering landscape of the search market. Some of the trends noted include more in-your-face pricing by conservative software, a rising interest in Solr and Lucene-based offerings, cloud-based setups and a “key spike in the offerings basket.” Analytics for search and content also play a part in enterprise setups, especially for eDiscovery, e-commerce, and decision and content management systems.

The article also explains how Cloudera Search is a part of this change:

“Cloudera Search has Apache Solr integrated with CDH, including Apache Lucene, Apache SolrCloud, Apache Flume, Apache Hadoop MapReduce & HDFS, and Apache Tika. Cloudera Search also includes integrations that make searching more scalable, easy to use, and optimized for both near-real-time and batch-oriented indexing. Cloudera has adapted the SolrCloud project  and leveraged Apache Zookeeper to coordinate distributed processing… From a customer perspective, this is an exciting time as Hadoop distributions venture out in broader territory offering them easier data mining capabilities.”

The article also highlights two other offerings. One is IBM InfoSphere Data Explorer, once known as Vivisimo, which works with the BigInsights Hadoop distribution. The other is LucidWorks Search with MapR, which provides data mining capabilities by ingesting data into MapR through LucidWorks Search to make the data searchable. The article anticipates more “feature-rich” offerings in the future as competition and interest grow.

Chelsea Kerwin, December 30, 2013

Sponsored by ArnoldIT.com, developer of Augmentext

Incapsula Study on Web Activity Gives Insight into Bot Behaviors

December 23, 2013

The article on BBC News Technology titled “Bots Now ‘Account for 61% of Web Traffic’” expands on the data from a recent Incapsula study that found humans might only account for a shrinking minority of internet traffic. Last year’s figure was more like fifty/fifty, but this is not as scary as it might sound since most of the ‘bots’ causing this traffic are tools for search engines indexing website content. There are also other ‘good bots’ like those used by analytics companies rating website performance and other such tasks. The article describes some reservations about the numbers, according to Dr. Ian Brown of the Oxford University Cyber Security Centre:

“There will also be some unavoidable fuzziness in their data, given that they are trying to measure malicious website visits where by definition the visitors are trying to disguise their origin.” “Despite the overall growth in bot activity, the firm said that many of the traditional malicious uses of the tools had become less common. It said there had been a 75% drop in the frequency spam links were being automatically posted.”

Part of the explanation for this drop is credited to Google’s vigilance over the last year in stamping out this practice. In more good news, Incapsula also reported a 10% drop in hacking activities such as stealing credit cards and hijacking sites (grouped together under the term tool bot activities).

Chelsea Kerwin, December 23, 2013

Sponsored by ArnoldIT.com, developer of Augmentext

Fulcrum Technologies Report Available: A New Xenky Profile

December 19, 2013

The Xenky Web site has published a new enterprise search vendor profile about Fulcrum Technologies, a company founded in Ottawa, Canada. For 10 minutes, you can flash back to 1983 when Fulcrum Technologies offered a comprehensive solution to organization-wide information retrieval. Then you can fast forward to the present because Fulcrum’s software continues to influence findability solutions in the market today. That’s a mind boggling span of 30 years. Stated another way, Fulcrum’s technology is aging. But how well?

Many of the concepts marketed as innovation by vendors in 2013 are quite similar and in some cases almost identical to what Ful/Text and Search Server embodied. Want federated search? Fulcrum offered it. Need automated indexing? Fulcrum delivered. Require a knowledge centric system? Fulcrum said it had a solution for “intellectual assets.”

The journey of Fulcrum from start up to a unit of OpenText is instructive as well. The company had a number of owners before being acquired by Datamat, then PCDocs, next Hummingbird, and finally OpenText.

Was the company generating significant cash? Did it have a secret technology sauce protected by patents, successful deployments, and a cadre of loyal partners? Today’s enterprise search companies are following a technical and financial trail walked by Fulcrum.

This profile snapshots the company’s trajectory from its founding to its becoming a property of OpenText. You can access the free profile on the Xenky vendor profile page. Other free search vendor profiles are available for:

  • Convera
  • Delphes
  • Dieselpoint
  • Entopia
  • Fulcrum Technologies
  • SchemaLogic
  • Siderean
  • Verity.

These case studies provide insight into the challenges search vendors have faced in the past. Scanning several profiles reveals the similarity among systems. Please read the disclaimer for these free, “historical” reports. Within limits, the information in the 15 to 25 reports may help answer the questions:

  • “Are search systems able to deliver a payback to their customers?”
  • “Have marketers created expectations software cannot meet?”
  • “Has information retrieval innovation for the enterprise stalled?”

The information is provided by Arnold Information Technology without charge. You may use the report’s content for your personal learning. Any other use requires prior written permission from ArnoldIT.

If you want to update, correct, or comment on the profile, please use the comments section of Beyond Search. The Xenky site is not configured for visitor input.

Stephen E Arnold, December 19, 2013

Quote to Note: NLP and Recipes for Success and Failure

December 11, 2013

I read “Natural Language Processing in the Kitchen.” The post was particularly relevant because I had worked through “The Main Trick in Machine Learning.” The essay does an excellent job of explaining coefficients (what I call, for ease of recall, “thresholds”). The idea is that machine learning requires a human to make certain judgments. Autonomy IDOL uses Bayesian methods and the company has for many years urged licensees to “train” the IDOL system. Not only that, successful Bayesian systems, like a young child, have to be prodded or retrained. How much and how often depends on the child. For Bayesian-like systems, the “how often” and “how much” vary by the licensees’ content contexts.
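To make the “threshold” idea concrete, here is a minimal Python sketch. It is not taken from either article or from Autonomy IDOL. The scores, the relevance labels, and the function names `classify` and `precision_recall` are invented for illustration. It only shows how a human-chosen cutoff changes what a scoring system returns.

```python
def classify(scores, threshold):
    """Mark a document as relevant when its score meets the human-chosen threshold."""
    return [score >= threshold for score in scores]


def precision_recall(predicted, actual):
    """Compare predicted relevance flags against the true relevance flags."""
    true_pos = sum(p and a for p, a in zip(predicted, actual))
    false_pos = sum(p and not a for p, a in zip(predicted, actual))
    false_neg = sum(a and not p for p, a in zip(predicted, actual))
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall


# Hypothetical system scores for five documents, and whether each is truly relevant.
scores = [0.95, 0.80, 0.65, 0.40, 0.20]
actual = [True, True, False, True, False]

# A person, not the software, decides which cutoff to try.
for threshold in (0.7, 0.3):
    precision, recall = precision_recall(classify(scores, threshold), actual)
    print(f"threshold={threshold}: precision={precision:.2f}, recall={recall:.2f}")
```

With these invented numbers, the 0.7 cutoff returns only relevant documents but misses one, while the 0.3 cutoff finds every relevant document and lets in one that is not relevant. Someone has to pick that trade-off, and pick it again when the content changes.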

Now back to the Los Angeles Times’ excellent article about indexing and classifying a small set of recipes. Here’s the quote to note:

Computers can really only do so much.

When one jots down the programming and tuning work required to index recipes, keep in mind “The Main Trick in Machine Learning.” There are three important lessons I draw from the boundary between these two write ups:

  1. Smart software requires programming and fiddling. At the present time (December 2013), this reality is as it has been for the last 50 years, maybe more.
  2. The humans fiddling with or setting up the content processing system have to be pretty darned clever. The notion of “user friendliness” is strongly disabused by these two articles. Flashy graphics and marketers’ cooing are not going to cut the mustard or the sirloin steak.
  3. The properly set up system with filtered information processed without some human intervention hits 98 percent accuracy. The main point is that relevance is a result of humans, software, and consistent, on point content.

How many enterprise search and content processing vendors explain that a failure to put appropriate resources toward the search or content processing implementation guarantees some interesting issues? Among them, systems will routinely deliver results that are not germane to the user’s query.

The root of dissatisfaction with incumbent search and retrieval systems is not the systems themselves. In my opinion, most are quite similar, differing only in relatively minor details. (For examples of the similarity, review the reports at Xenky’s Vendor Profiles page.)

How many vendors have been excoriated because their customers failed to provide the cash, time, and support necessary to deliver a high-performance system? My hunch is that the vendors are held responsible for failures that are predestined by licensees’ desire to get the best deal possible and believe that magic just happens without the difficult, human-centric work that is absolutely essential for success.

Stephen E Arnold, December 11, 2013
