Hakia: Pulled by Medical Information Magnetism

June 13, 2008

A colleague and I visited with the Hakia team last summer after the Bear Stearns Internet conference. I’ve tracked the company with my crawlers, but I have not made an effort to contrast Hakia’s approach with that of Powerset, Radar Networks, and the other “semantic” engines now in the market.

I received a Hakia news release today (June 12, 2008), and I noticed that Hakia is following the well-worn path of many commercial databases in the 1980s. The point that jumped out at me is that Hakia is adding content to its index; specifically, the PubMed metadata and abstracts. This is a US government database, and it has a boundary. The information is about health, medicine, and closely related topics. Another advantage is that PubMed, like most editorially controlled scientific, technical, and medical databases, has reasonably consistent indexing. Compared to the wild and uncontrolled content available on Web sites and from many “traditional” publishers, this content [a] makes text processing less computationally intensive because algorithms don’t have to figure out how to reconcile schema, find concepts, and generate consistent metadata, and [b] carries some credibility.

An example shows why credibility matters. We created a test Web site five years ago. We processed some general newspaper articles, posted them, and used the content for a test of a system called ExploreCommerce. Then we forgot about the site. Recently someone called objecting to a story. The story was a throwaway and not intended to be “real”. But “if it’s on the Internet, it must be true” echoed in this caller’s mind. PubMed has editorial credibility, which makes a number of text processing functions somewhat more efficient.

Kudos to Hakia for adding PubMed. You can read the full news release here. You can try the Hakia health and medical search here.

Several observations will highlight my thoughts about this Hakia announcement:

  1. The PR fireworks about semantic search have made the concept familiar to many people. The problem is that, for me, semantic search is a misnomer. Semantic technology, I think, can enhance certain content processing operations. I am still looking for a home-run semantic search system. Siderean’s system is pretty nifty, and its developers are careful to explain its functionality without the Powerset-Hakia type of positioning. I know vendors will want to give me demonstrations and WebEx presentations to show me that I am wrong, but I don’t want any more dog and pony shows.
  2. My hunch is that using bounded content sets–Wikipedia, specific domains, or vertical content–allows the semantic processes to operate without burdening the companies with Google-scaling challenges. Smaller content domains are more economical to index and update. Semantic technology works. Some implementations are just too computationally costly to be applicable to unbounded content collections and the data management problems these collections create.
  3. Health is a hot sector. Travel, automobiles, and finance also offer certain benefits for the semantic technology company. The idea is to find a way to pay the bills and generate enough surplus to keep the venture cats from consuming the management team. I anticipate more verticalization or narrow content bounding. It is cheaper to index less content more thoroughly and target a content domain where there is a shot at making money.
  4. It’s back to the past. I find the Hakia release a gentle reminder of our play at the Courier Journal & Louisville Times Co. with Pharmaceutical News Index. We chose a narrow set of content with high value to an easily identified group of companies. The database was successful because it was narrow and had focus. Hakia is rediscovering the business tactics of the 1980s and may not even know about PNI and why it was a money maker.

I’m quite enthusiastic about the Hakia technology. I think there is enormous lift from semantics in enterprise and Web search. The challenge is to find a way to make semantics generate significant revenue. Tackling content niches may be one component of financial success.

Stephen Arnold, June 13, 2008


