Semantic Web: Remember That for Enterprise Search?

October 26, 2014

You can find an interesting discussion of the Semantic Web on Hacker News. Semantic Web search engines have had a difficult time capturing the imagination of the public. The write up and the comments advance the notion that the Semantic Web is alive and well, just invisible.

I found the statement from super Googler Peter Norvig a window into how Google views the Semantic Web. Here’s the snippet:

Peter Norvig put it best: “The semantic web is the future of the web, and always will be.” (For what it’s worth, the startup school video that quote comes from is worth watching:

There are references to “semantic search” companies that have failed; for example, Ontoprise. There are links to clever cartoons.

The statement I highlighted was:

The underlying data just doesn’t necessarily map very well into the sem-web representations, so duplicates occur and possible values explode in their number of valid permutations even though they all mean the same handful of things. And it’s the read-only semantic-web, so you can’t just clean it, you have to map it. Which is why I’m always amazed that it works at all. And hopefully one day will be a thing. I remember being excited about “liberating” messy data into clean linked data… but it turns out that you really don’t want to curate your information “in the graph”; it seems obvious, but traditional relational datasets are infinitely more manageable than arbitrarily connected nodes in a graph. So, most CMS platforms are doing somewhat useful things in marking up their content in machine-readable ways (RDFa, [as evil as that debacle was], HTTP content-type negotiation and so on) either out-of-the-box or with trivially installed plugins.

Ah, content management systems. Now that’s the model for successful information access as long as one does not want engineering drawings, videos, audio, binaries, and a host of proprietary data types like i2 Analyst Notebook files.

Worth checking out the thread in my view.

Stephen E Arnold, October 26, 2014

ArnoldIT Search Requirements Video

October 26, 2014

The goslings continue to experiment with short videos. The most recent one is about enterprise search requirements. The four-minute YouTube program hits some highlights of the perilous process of licensing an enterprise search system. The video is located at

Donald C Anderson, October 26, 2014

Stone Temple Consulting Creates Knowledge Panel to Test Google, Siri, and Cortana

October 24, 2014

The article titled “The Great Knowledge Box Showdown: Google Now vs. Siri vs. Cortana” on Stone Temple Consulting compares the capabilities of the three platforms using 3,086 queries.

“This was a…knowledge box comparison, not a personal assistant comparison. For purposes of this study, a “knowledge box” or “knowledge panel” is defined as content in the search results that attempts to directly answer a question asked in a search query…”

The article provides a long list of the sorts of questions posed to the three platforms and the ways the different systems answered correctly and incorrectly. Knowledge boxes might appear in the form of a carousel (a set of images or info above the search results) or as step-by-step instructions (provided, for example, in response to a question about how to make a certain recipe). Ultimately, the article found,

“Google Now returns twice as many results as Siri and nearly three times as many results as Cortana. This is clear evidence that Google is much further down the path with this type of work than either Apple or Cortana.”

It is also important to mention that, according to the article, even the best system errs 15 percent of the time, which matters in mission-critical situations. Hmm.

Chelsea Kerwin, October 24, 2014

Sponsored by, developer of Augmentext

Cautious Words on Microsoft Delve

October 22, 2014

Much buzz has collected around Microsoft’s Delve (formerly known as Oslo), the new search-and-discovery component of Office 365. ComputerWorldUK, however, raises some questions in “Delve, Office Graph Must Transcend Office 365 to be Revolutionary.” The application is designed to tap into the company’s Office Graph machine-learning engine, but apparently has a way to go before fulfilling its creators’ goals. Reporter Juan Carlos Perez writes:

“If Microsoft realizes its Office Graph vision — and it may take years to materialize — then the way information workers interact with business software today and the way they find digital information will seem ancient and grossly inefficient. And Microsoft might fly past competitors in the enterprise with a technology that creates a sort of cockpit that automates and simplifies for employees the use of their Microsoft and non-Microsoft software.”

Delve began gradually rolling out to Office users in September, with the process to be completed sometime next year. The tool can be used as a conventional search engine, but it is designed to do much more. The article supplies this example:

“Delve knows that ‘Joe’ has a meeting in an hour, what its topic is and who will be in attendance. So, Delve proactively fetches relevant documents, files and information about the topic and the participants, and displays them on its dashboard, so Joe can be prepared for the meeting. Joe didn’t have to spend 30 minutes compiling all this data manually, assuming that he even would have had the time to do it, and if he did, that he would have been able to find the information, a big challenge for employees of all stripes everywhere.”

Sounds great! However, Perez notes that some open questions stand between here and the realization of Delve’s potential. Perhaps most obviously, being able to comb only Office applications for data is limiting; most of us don’t limit ourselves to Microsoft products (as much as the company might like us to). There are considerable technical challenges there. Then there’s the privacy issue: will users find its “stealthy technology” creepy, and possibly worry about nosy supervisors? Apparently, more end-user controls are planned, but they may not address that concern. See the article for a more thorough discussion of these issues. Will Delve overcome these obstacles?

Cynthia Murrell, October 22, 2014


LucidWorks and Its Clueless Graphic

October 21, 2014

I noted a link to a LucidWorks presentation in a tweet. I navigated to the presentation on Slideshare. The approach in the presentation was trendy. My approach to presentations is untrendy, so I am no judge.

I found one slide particularly suggestive of the company’s approach to marketing. On slide 20 I saw this:


I am not exactly certain what vowel the asterisk represents. The slide strikes me as possibly offensive. But I live in rural Kentucky. What do I know? I assume the message is clear.

Perhaps this type of marketing messaging is one of the reasons ElasticSearch appears to have more momentum in the commercialized open source search sector?

Here’s a representative ElasticSearch slide from “A Gentle Introduction to ElasticSearch.”


Which company’s presentation resonates with you? Cluelessness or clues?

Stephen E Arnold, October 22, 2014

ElasticSearch How To: A Useful Case Example

October 21, 2014

If you want to avoid the hassle of some proprietary search engines, you may want to take a look at this case study about ElasticSearch. Navigate to “Building Scalable Search from Scratch with ElasticSearch.” The author works through his process for putting ElasticSearch to work on a variety of content; for example, products, text collections, and user information.

What makes this write up useful is the logical layout of the article and the inclusion of a requirements summary, block diagrams, and code snippets.

This type of solid user support is one reason ElasticSearch is outpacing some open source search competitors like LucidWorks and Nutch.

Highly recommended. (As far as I can tell, no mid-tier consulting firm has surfed on this content. Dave Schubmehl, this may be an opportunity.)

Stephen E Arnold, October 21, 2014

Autonomy: 33 APIs

October 21, 2014

Curious about Hewlett Packard’s Autonomy APIs? You can see the list of 33 at If you are curious about Autonomy’s Big Data capabilities, you may be puzzled about the lack of explicit analytics application programming interfaces. Don’t be. The savvy developer selects operations, takes outputs, and pumps the data into a search based application, third party number crunching system, a data management system, or plain old Excel. What’s interesting is that the naming of the APIs makes clear the search-centric nature of Autonomy. The marketing of IDOL as a service or a cloud solution shifts attention away from search in my view.

Stephen E Arnold, October 21, 2014

Coveo Pivots to Federated Search

October 21, 2014

In a post on its blog, Coveo Insights, enterprise-search firm Coveo urges, “Power Your Customer Service with Unified Search Driven Knowledge.” The write-up gives a few reasons why such “omni-channel” (federated) search functionality is a wise choice for customer service. Writer and Coveo marketing director Tucker Hall explains:

“Customers … engage with companies across a growing number of channels — from self-service portals and contact centers, to social media and field service engagements. Today’s savvy customer expects (and deserves) a seamless and consistent service experience across all of these channels. Omni-channel customer service has now become essential for companies hoping to maximize customer engagement, satisfaction, and retention.

“Successful omni-channel customer service can prove difficult regardless of the specific technologies and systems an organization has in place. That’s because success demands that customers and support personnel alike have swift, intuitive access to the case-resolving knowledge and expertise they need, when and how they need it.”

Hall asserts that many companies are missing out because they “fail to appreciate” the reasons to choose federated search: data and expertise are located in many systems, crowd-sourcing is a thing, and analytics must be actionable. But you, dear reader, already knew those, didn’t you? More on these points can be found in Coveo’s solution brief on the subject (registration required).

It is interesting to note that, while Coveo and others focus on federated search, Microsoft is more into the search-without-searching method called Delve. Let many flowers bloom!

Coveo serves organizations large, medium, and small with solutions that aim to be agile and easy to use yet scalable, fast, and efficient. The company was founded in 2005 by members of the team that developed Copernic Desktop Search. Coveo maintains offices in the U.S., the Netherlands, and Quebec.

Cynthia Murrell, October 21, 2014


Google and Objective Search Results

October 20, 2014

I recall that at a conference presentation about Google I attended in Boston, the Googler (Dave Girouard, now a Xoogler) emphasized the objectivity of Google search results. I have heard the objectivity claim from many quarters over the years.

I noted the PC Magazine story “Google ‘Fixes’ Stephen Colbert’s Height Listing.” Here’s the passage I noted:

While Google hasn’t exactly dropped a packet full of stock options off on Colbert’s doorstep, it has managed to address Colbert’s concerns about his height listing. First up, Colbert now appears as 5 foot 10.5 inches tall on Google’s search results when you query for “Stephen Colbert height.” If you prefer metric, his height is now listed as 1.79 meters… “-ish.”

From my hollow in Harrod’s Creek, this strikes me as an example of Google’s ability to modify search results quickly. I am not sure that the “objective” reference used by Mr. Girouard years ago applies today. If it does not, Google can intervene in the vaunted PageRank process and change results quickly and at will.

Are those claims of outfits like Foundem founded? Maybe, just maybe?

Stephen E Arnold, October 20, 2014

Google Scholar and Google Silos of Content

October 18, 2014

I read “Making the World’s Problem Solvers 10% More Efficient.” The article explains that the Google engineer who was “the key inventor” of Google Scholar is leaving the GOOG.

The write up discloses a couple of interesting factoids; for example:

  • Google Scholar has been around for 10 years
  • The founder of Google Scholar took charge of Google’s indexing in 2000
  • The inventor of Google Scholar had to figure out how to keep Google’s index fresh; that is, how to make sure new and changed content is reflected in search results.

The most interesting point in the write up is this statement (I have added the boldface):

Also, the nature of academic papers presented some opportunities for more powerful ranking, particularly making use of the citations typically included in academic papers. Those same scholarly citations had been the original inspiration for PageRank, the technique that had originally made Google search more powerful than its competitors. Scholar was able to use them to effectively rank articles on a given query, as well as to identify relationships between papers.

What happened to Eugene Garfield? I know, “Who?” So does this passage mean that today’s Google Web search discards functionality originally included in 2000?
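For readers who want to see what “using citations to rank articles” can mean in practice, here is a minimal sketch of a basic PageRank power iteration run over a citation graph. It is an illustration under stated assumptions, not a description of how Google Scholar or Google Web search actually ranks anything: the function name `citation_rank`, the toy paper names, the damping factor of 0.85, and the fixed 50 iterations are all hypothetical choices.

```python
def citation_rank(cites, damping=0.85, iterations=50):
    """Hypothetical helper: rank papers with a basic PageRank power iteration.

    `cites` maps each paper to the list of papers it cites (outgoing edges).
    A citation from paper A to paper B passes a share of A's score to B.
    """
    # Collect every paper that appears as a citer or as a cited work.
    papers = set(cites)
    for targets in cites.values():
        papers.update(targets)
    papers = sorted(papers)
    n = len(papers)
    rank = {p: 1.0 / n for p in papers}

    for _ in range(iterations):
        # Each round, every paper starts with the "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in papers}
        for p in papers:
            targets = cites.get(p, [])
            if targets:
                # Split this paper's damped score among the papers it cites.
                share = damping * rank[p] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A paper that cites nothing spreads its score evenly,
                # so the scores keep summing to 1.0.
                for q in papers:
                    new_rank[q] += damping * rank[p] / n
        rank = new_rank
    return rank


# Hypothetical toy corpus: each key cites the papers in its list.
toy_cites = {
    "paper_a": ["paper_c"],
    "paper_b": ["paper_c"],
    "paper_c": ["paper_d"],
    "paper_d": [],
}
scores = citation_rank(toy_cites)
for paper, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{paper}: {score:.3f}")
```

In this toy graph, paper_d ends up ranked above paper_c even though paper_c has more citations, in part because paper_d’s single citation comes from the most-cited paper. That transitive weighting is what separates a PageRank-style score from a simple citation count.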

But the big point for me is that Google is supposed to deliver “universal search.” To make use of Google Scholar, one must navigate to and run separate queries. Is this universal? It seems to be old-school siloing.

I like Google Scholar, but I think Google Web search may lack some of the refinements included in Google Scholar. Well, ads are important. Correction: Revenue is important. Perhaps Google will charge for access to Google Scholar and compete directly with commercial database vendors? In my view, Google Scholar had a negative impact on commercial database vendors who charge libraries, corporations, and individuals for access to curated and indexed professional and scholarly information. Google seems content to allow the Google Scholar service to drift along. Would more purpose be of value? Would queries for patent 2012/0251502 A1’s “the isolated nucleic acid molecule includes the nucleotide sequence of SEQ ID NOs: 1 or 10, or a complement thereof. In another, the nucleic acid molecule includes a nucleotide sequence having at least 4, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500, 4600, 4700, 4800, or 4900 contiguous nucleotides of the nucleotide sequence of SEQ ID NO: 1” permit Google to match Ebola ads to Google Scholar content?

Stephen E Arnold, October 18, 2014
