Hakia Enterprise Search

March 27, 2010

A happy quack to the Hakia executive who confirmed the firm’s new enterprise search solution. Hakia added a page to its Web site identified as “SemanticBooster”. When I examined it, the page was about Hakia’s brand new enterprise search appliance. (Yikes, Beyond Search has a news scoop!)

The Hakia SemanticBooster appliance. Source: http://company.hakia.com/new/semanticbooster.html

According to the write up, Hakia offers “the lowest cost and the best performance.” The write up continues:

The main utility is to provide internal search function within organization’s document repository. Options include other vital functions like powering a consumer facing search, providing targeted Web search to the workers inside the corporation, external news monitoring and alerting, harvesting quality content from the Web to enrich organizations’ information repository, and categorization of documents for better management.

The options for the system include news monitoring (which seems to hook into Hakia’s invitation only service SenseNews), “automated content acquisition”, and semantic categorization.

Hakia schematic. Source: Hakia.com

The system is available at a fixed price, “delivered in a box”. The document limit for a single box is pegged at 30 million. When you need more capacity, just add another appliance. Hakia provides a 20 page description of its enterprise search solution here.
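The capacity arithmetic is simple enough to sketch. The snippet below (Python, with a made-up corpus size; Hakia publishes no sizing tool that I know of) just divides the document count by the 30 million per-box ceiling and rounds up.

```python
import math

DOCS_PER_APPLIANCE = 30_000_000  # Hakia's stated per-appliance document limit

def appliances_needed(total_docs: int) -> int:
    # Round up: any remainder still requires a whole additional box.
    return math.ceil(total_docs / DOCS_PER_APPLIANCE)

# Hypothetical corpus of 75 million documents -> 3 appliances
print(appliances_needed(75_000_000))
```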

I chased down a Hakia wizard, Dr. Riza Berkan, the company’s CEO and founder, who told me:

Semantic technology in enterprise search is now becoming such a competitive advantage that the corporations using it are making it part of their trade secret and remaining silent about it. We help corporations in this transition with our complete semantic solution with unprecedented performance.

Prices begin in the $20,000 range, but you will want to deal directly with the company. You can reach the firm by emailing bdev@hakia.com.

Stephen E Arnold, March 27, 2010

No one paid me to write this. I would report non payment to the GSA, but Hakia’s appliance is not listed on the GSA schedule. Maybe I was not running the correct search because the GSA search system is pretty darned good.

More XML Expertise to Google

March 16, 2010

According to ZDNet, Tim Bray, a co-founder of Open Text and a collaborator with Ramanathan Guha on things XML, is now a Googler. The story “Ex-Sun Director Bray Joins Google’s Android Team” notes that Mr. Bray will work on Android. The addled goose wants to point out that there are some big semantic Web guns in the Google arsenal now. Is Google becoming the big gun in the semantic Web, or just the semantic Web?

Stephen E Arnold, March 16, 2010

Nope, a free one. No one paid me to reference semantic weapons. I will report this free write up to the FCC.

The Taxonomy Torpedo

March 9, 2010

Quite an interesting phone call today (March 8, 2010). Apparently the article “A Guide to Developing Taxonomies for Effective Data Management” caught this person’s attention. The write up boils down the taxonomy job to a couple of pages of tips and observations. Baloney in my opinion.

The caller wanted to have me and a gosling provide the green light to a taxonomy project. The method was to use a couple of subject matter experts from marketing and an information technology intern. The idea was to take a word list and use it to index content with the organization’s enterprise search system.

The caller told me, “We let the staff add their own key words. There has been a lot of inconsistency. We will develop our controlled term list, and that way we have date, time, and creator; the terms the users assign; and the words in our taxonomy. What do you think?”

What I think is that no one will be able to find some of the relevant data. I am surprised that so many vendors point out that their systems “discover” metadata and provide users with suggestions, lists of related content, and the ability to search by entities.

Doesn’t work.

Here’s why:

  1. Fancy interfaces (user experience in today’s lingo) require consistent, appropriate, and known tags. Most organizations, fresh from doing taxonomy push ups for a day, have wildly inconsistent term lists. A user may know how to locate a document in an idiosyncratic way. If that method does not involve a controlled term, the user may not get the results she was expecting. (A small sketch after this list illustrates the mismatch.)
  2. Automatic processes work well when the information objects have enough substantive content to make key word indexing work. I have examined a number of organizations’ content and found inconsistencies in the way in which the organization referred to itself. The controlled terms were rarely used. When a query included a controlled term, the user was puzzled why the result set was not complete.
  3. Most organizations lack the expertise and resources to create a well-formed controlled term list. Ad hoc lists are useful sometimes to those who cooked them up. A comprehensive controlled term list is a great deal of work.
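Here is a minimal sketch of the mismatch in point one. Everything in it is made up: a tiny controlled term list, a few documents with user-assigned tags, and a literal tag lookup standing in for the enterprise search system.

```python
# Hypothetical controlled vocabulary versus idiosyncratic user tags.
controlled_terms = {"accounts payable", "purchase order", "invoice"}

documents = {
    "doc1.pdf": {"billing", "ap"},        # idiosyncratic tags
    "doc2.pdf": {"invoice", "vendor"},    # happens to use a controlled term
    "doc3.pdf": {"po", "procurement"},    # idiosyncratic again
}

def search_by_tag(query_term):
    # Literal tag match, the way a simple metadata filter behaves.
    return [name for name, tags in documents.items() if query_term in tags]

# A query on the controlled term "invoice" finds only doc2.pdf; doc1.pdf and
# doc3.pdf are relevant but invisible because their tags never used the word.
print(search_by_tag("invoice"))   # -> ['doc2.pdf']
```

The point is not the code; it is that the controlled list and the user tags have to meet somewhere, and in most organizations they do not.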

What’s this mean? The stampede to taxonomies will yield the same dissatisfaction as other, partially implemented search features. Talk is easy. Taxonomies and controlled term lists are tough to develop and even harder to keep current.

Stephen E Arnold, March 9, 2010

No one paid me to write this. I mention indexing, so I will report non payment to the Librarian of Congress, or maybe the librarian for the House library, or maybe the librarian for the Senate library. I wonder why there are three libraries for Congress.

The Google PSE circa 2007 Becomes News

March 4, 2010

Yep, another big surprise for the Google mavens, pundits, and azure chip crowd. You can get a good snapshot of the “discovery” at “Google Index to Go Real Time.” The big idea is that a Web publisher can “automatically submit new content to Google.” The news is a bit stale in my opinion. If you take a peek at the five patent documents submitted by Google in February 2007, you can get the full scoop, see code examples, and learn that this “method” has some interesting plumbing; namely, the context server. The inventor of this “new” method is a bright fellow in the Google engineering den.

For the detail about this news, which dates from late 2005 or early 2006, check out US20070038616. The four related patent documents (filed on the same day, by the way) and the team’s post-PSE filings provide more color. The real question is, “What’s next?” I discuss this question in my monograph Google Version 2.0, published in mid 2007 by Infonortics Ltd. in Tetbury, Glos.

In my opinion, reading about a fait accompli is probably not the best way to stay abreast of Google’s technology trajectory. The patent documents make clear how the method works. Let’s see. This is 2010, a bit more than three years since the patent documents appeared. This interval is a typical Google “deployment” interval. Check out the context server and ask, “What’s with this semantic Web stuff?”
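For readers who want the concrete, non-patent version of “automatically submit new content to Google”, the sketch below shows the plain-vanilla mechanism publishers already had in this period: ping Google with an updated sitemap. This is not the PSE or context server method described in the patent documents, and the sitemap URL is hypothetical.

```python
# Minimal sketch: notify Google that a sitemap has been updated by hitting
# the public sitemap "ping" endpoint. The sitemap URL is hypothetical, and
# running this makes a live HTTP request.
from urllib.parse import quote
from urllib.request import urlopen

def ping_google(sitemap_url):
    # Returns the HTTP status code from Google's ping endpoint.
    ping = "http://www.google.com/ping?sitemap=" + quote(sitemap_url, safe="")
    with urlopen(ping) as response:
        return response.status

print(ping_google("http://www.example.com/sitemap.xml"))
```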

Stephen E Arnold, March 4, 2010

No one paid me to write this post. When I get royalties, my publisher sometimes pays me. So I suppose this is a self funded post.

Search Engine Convera Drifts Off

February 16, 2010

The journey was a long one. Convera, whose roots reach back to scanning marketing brochures in the 1990s, has filed a certificate of dissolution. I think this means that Convera has moved from the search engine death watch to the list which contains Delphes, Entopia, and other firms.

Convera splash page on February 15, 2010

You can read the official statement for a few more days on the PRNewswire site. The title of the announcement is / was, “Convera Corporation Files Certificate of Dissolution, Trading of Common Stock to Cease after February 8, 2010 Payment Date Set.” I am no attorney so maybe my lay understanding of “dissolution” is flawed, and Convera under another name will come roaring back. For the purposes of this round up of my thoughts, I am going to assume that Convera is comatose. I hope it bounces back with one of those miracles of search science. I am crossing my wings, even though each has a dusting of snow this morning. Harrod’s Creek has become a mid-south version of Nord Kap.

For me, the key passage in the write up was:

Convera Corporation announced today that it filed its Certificate of Dissolution with the Delaware Secretary of State on February 8, 2010, in accordance with its previously announced plan of complete dissolution and liquidation.  As a result of such filing, the company has closed its stock transfer books and will discontinue recording transfers of its common stock, except by will, intestate succession or operation of law.  Accordingly, and as previously announced, trading of the company’s stock on the NASDAQ Stock Market will cease after the close of business on February 8, 2010.

My Overflight search archive suggested that Excalibur Technologies was around in the 1980s. The founder was Jim Dowe, who was interested in neural networks. The notion of pattern matching was a good one. The technology has been successfully exploited by a number of vendors ranging from Autonomy to Verity. Brainware’s approach to search owes a tip of its Prince Heinrich hat to the early content snow plowing at Excalibur. Excalibur used most of the buzzwords and catchphrases that bedevil me today, including “semantic technology.”

Sample of a category search on the RetrievalWare system. The idea is that you would click a category.

One of my former Booz, Allen & Hamilton colleagues made some dough by selling his ConQuest Software search-related technology to Excalibur Technologies. The reason was that the original Excalibur search system did not work too well. Excalibur, according to my Overflight archive, described itself as a “leading provider of knowledge and media asset management solutions.”

Hakia Changes Results Display

February 15, 2010

Short honk: I learned that Hakia has revamped its results display in a write up called “Hakia Serves Up Comprehensive Universal Search in a New Design.” My recollection is that Google also uses the phrase Universal Search, but I may be muddling which search vendor uses which buzzword.

Interfaces are getting quite a bit of attention. I think part of the push is a response to Microsoft Bing’s user experience push. The other motivating factor is that search results do not look much different to most searchers. With Google getting about 70 percent of the search traffic, the other Web search folks have to find an angle. Hakia, the semantic search company, displays text search results, images, and videos. The company includes categories to allow filtering with a click. I find the new interface interesting. I ran a number of test queries and found the results useful. Now the task is marketing and building traffic. Give Hakia a spin.

Stephen E Arnold, February 15, 2010

No one paid me to write this. A couple of years ago I got a bottle of water when I visited Hakia in Manhattan. I will report this to

Semantic Search Explained

February 11, 2010

A happy quack to the reader who sent me “Breakthrough Analysis: Two + Nine Types of Semantic Search”. Martin White (Intranet Focus) and I tried to explain semantic search in the text and the glossary for our Successful Enterprise Search Management. Our approach was to point out that the word “semantic” is often used in many different ways. Our purpose was to put procurement teams on alert when buzzwords were used to explain the features of an enterprise search system. We focused on matching a specific requirement to a particular function. An example would be displaying results in categories. The search vendor had to have a system that performed this type of value-added processing. The particular adjectives and marketing nomenclature were secondary to the function. The practicality of our approach was underscored for me when I read the Intelligent Enterprise article about the nine types of semantic search.

Source: http://writewellcommunications.com/wp-content/uploads/2009/06/homonyms1.jpg

I don’t feel comfortable reproducing the Intelligent Enterprise list, but I urge you to take a close look at the write up. Ask yourself these questions:

  1. Do you understand the difference between related searches/queries, concept search, and faceted search?
  2. When you look for information, are you mindful of “semantic/syntactic annotations” operating under the covers or in plain view?
  3. Do you type queries of about three words, or do you type queries with a dozen words or more organized in a question?

Your answer underscores one of the most fragile aspects of search and content processing. A focus on the numerical recipes that different vendors use to deliver specific functions often makes little or no sense even to engineers with deep experience in search and content processing.

A quick example.

If you run a query on Exstream (the enterprise publishing system acquired by Hewlett Packard), you can get a list of content elements. The system is designed for a person in charge of placing a message in a medical invoice, an auto payment invoice, or other types of content assembly operations. The system is not particularly clever, but it works reasonably well. The notion of search in this enterprise environment is, in my opinion, quite 1980s, despite some nice features like saved projects along the lines of Quark’s palette of frequently used objects.

Now run a query on a Mark Logic based system at a major manufacturing company. The result looks a bit like a combination of a results list and a report, but if you move to another department, the output may have a different look and feel. This is a result of the underlying plumbing of the Mark Logic system. I think that describing Mark Logic as a search system and attributing more “meaningful” functions to it is possible, but the difference is the architecture.

A person describing either the Exstream or the Mark Logic system could apply one or more of the “two + nine” terms to the system. I don’t think those terms are particularly helpful either to the users or to the engineers at Exstream or Mark Logic. Here’s why:

  • Systems have to solve a problem for a customer. Descriptions of what the outputs look like may not reflect what is going on under the hood. Are the saved projects the equivalent of a stored XQuery for MarkLogic?
  • Users need to have interfaces that allow them to get their work done. Arguably both Exstream and Mark Logic deliver for their customers. The underlying numerical recipes are essentially irrelevant if these two systems deliver for their customers.
  • The terminology in use at each company comes from different domains, and it is entirely possible that professionals at Exstream and Mark Logic use exactly the same term with very different connotations.

The discourse about search, content processing, and information retrieval is fraught with words that are rarely defined across different slices of the information industry. In retrospect, Martin and I followed a useful, if pragmatic, path. We focused on requirements. Leave the crazy lingo to the marketers, the pundits, and the poobahs. I just want systems to work for their users. Words don’t do this, obviously, which makes lingo easier than implementing systems so users can locate needed information.

Stephen E Arnold, February 11, 2010

No one paid me to put in this shameless plug for Martin White’s and my monograph, Successful Enterprise Search Management. This is a marketing write up, and I have dutifully reported this fact to you.

Extractiv: Content Provisioning

February 7, 2010

A happy quack to the reader who alerted me to Extractiv. The company is in the “content provisioning business”, and I did not know what this phrase meant. I know about “telecommunications provisioning”, but the “content” part threw me. I followed the links my reader sent me and located an interview (“Quick Q&A on Extractiv”) on the AndyHickl.com blog. It took me about a half hour to figure out that the interviewer and the interview subject seemed to be the same person.

The key points that pierced the addled goose’s skull were:

  • The service “helps consumers ‘make sense’ of large amounts of unstructured text.” The method is natural language processing.
  • Unstructured text is transformed into structured text for sentiment tracking and semantic search.
  • The technology rests on a “unique distributed computing platform [that] makes it possible for us to crawl — and extract content from — zillions of pages at the same time. (Our performance is pretty unbeatable, too: we’re currently able to download and extract content from 1 million pages in just under an hour.)” A quick arithmetic check of that rate appears after this list.
  • “Extractiv’s a joint venture between two companies: 80Legs and Language Computer. It’s really a great match. 80Legs offers the world’s first truly scalable web crawling platform, while Language Computer provides some of the world’s best — and most scalable — natural language processing tools.”
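The throughput claim in the third bullet is easy to sanity check. The sketch below just does the arithmetic with the figures as quoted; nothing here is measured.

```python
# Back-of-the-envelope check on the claimed crawl-and-extract rate.
pages = 1_000_000
seconds = 60 * 60            # "just under an hour", rounded up to a full hour
pages_per_second = pages / seconds
print(f"about {pages_per_second:.0f} pages per second")   # roughly 278
```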

The company says:

Extractiv is a new kind of content provisioning service which is making the Web truly actionable. Rather than simply passively “monitoring” the Web, our industry-leading data harvesting and content extraction goes out and delivers the information that really matters to you and your business. With Extractiv, it’s easy to build semantically-aware applications – regardless if you’re a newcomer to the Semantic Web or a deep believer in the power of semantic metadata.

For more information you will want to read “Learning More about Swingly and Extractiv”. The company’s Web site is at www.extractiv.com. The company is in its alpha stage. More information when it becomes available.

Stephen E Arnold, February 8, 2010

No one paid me to write this. I think this company is in Dallas, and I don’t go to Dallas. Texas makes me nervous. I will report this to the US House of Representatives with the “this” meaning money and my nervousness about the Lone Star State.

Microsoft and Mikojo Trigger Semantic Winds across Search Landscape

January 28, 2010

Semantic technology is blowing across the search landscape again. The word “semantic” and its use in phrases like “semantic technology” has a certain trendiness. When I see the word, I think of smart software that understands information in the way a human does. I also think of computationally sluggish processes and the complexity of language, particularly a language like English. Google has considerable investment in semantic technology, but the company wisely tucks it away within larger systems and avoids the technical battles that rage among different semantic technology factions. You can see Google’s semantic operations tucked within the Ramanathan Guha inventions disclosed in February 2007. Pay attention to the discussion of the system and method for “context”.

Gale force winds from semantic technology advocates. Image source: http://www.smh.com.au/ffximage/2008/11/08/paloma_wideweb__470x289,0.jpg

Microsoft’s Semantic Puff

Other companies are pushing the semantic shock troops forward. I read yesterday Network World’s “Microsoft Talks Up Semantic Search Ambitions.” The article reminded me that Fast Search & Transfer offered some semantic functionality, which I summarized in the 2006 version of the original Enterprise Search Report (the one with real beef, not tofu inside). Microsoft also purchased Powerset, a company that used some of Xerox PARC’s technology and its own wizardry to “understand” queries and create a rich index. The Network World story reported:

With semantic technologies, which also are being referred to as Web 3.0, computers have a greater understanding of relationships between different information, rather than just forwarding links based on keyword searches. The end game for semantic search is “better, faster, cheaper, essentially,” said Prevost, who came over to Microsoft in the company’s 2008 acquisition of search engine vendor Powerset. Prevost is still general manager of Powerset. Semantic capabilities get users more relevant information and help them accomplish tasks and make decisions, said Prevost.

The payoff is that software understands humans. Sounds good, but it does little to alter the startling dominance of Google in general Web search and the rocket like rise of social search systems like Facebook. In a social context humans tell “friends” about meaning or better yet offer an answer or a relevant link. No search required.

I reported about the complexities of configuring the enterprise search system that Microsoft offers for SharePoint in an earlier Web log post. The challenge is complexity and the time and money required to make a “smart” software system perform to an acceptable level in terms of throughput in content processing and for the user. Users often prefer to ask someone or just use what appears in the top of a search results list.

Bitext and NaturalFinder Breathe Life into Legacy Search Systems

January 21, 2010

In December 2009, the managing director of Bitext, a Madrid-based software development company, bought me a hamburger. As we poked at London’s version of a Whopper, I learned that Bitext had developed a way to take a not-so-useful legacy search system and give it new life. I thought the idea was a good one because many organizations do not have the money, time, or information technology expertise to do a “rip and replace” solution to their search woes.

What does Bitext offer?

The product is called NaturalFinder. Bitext’s new language technologies make it possible to enrich a user’s queries with linguistic knowledge. The system has emerged in the international market after being implemented at the Spanish national railroad and a number of commercial enterprises.

Users of enterprise search systems have to hit on the exact combination of key words to get the information needed to answer a business question. This takes time and, according to the company’s CEO, Antonio Valderrabanos, “inefficient searching costs companies money. We developed NaturalFinder to take the rough edges off existing enterprise search solutions. We have a search turbo charger which really delivers.”

I told Bitext that research reports from Net Strategy (Paris), the Association of Image and Information Management (USA), and my team here at ArnoldIT.com (USA) say one thing: users want systems that deliver needed information without trial and error.

“With companies asking employees to do more, guessing the secret word combination to unlock the treasure chest of corporate information is no longer acceptable. Bitext allows the user to key a word, and our technology understands the meanings and displays suggested results. A single click takes the user to results that match what the user intended,” Mr. Valderrabanos told me.

The Bitext system uses a range of linguistic technologies, including word stemming, synonyms, and homonyms, among others. These technologies are invisible to the user. Once a query is entered into a search system equipped with the Bitext software, that search system generates rich metadata such as names, concepts, and events.

The system does the work for the user. The result is, “quicker searches with no training and no system slowdown”, according to Mr. Valderrabanos. “One of the technical innovations is that the Bitext NaturalFinder technology requires no changes to an organization’s existing search system. It is load and go.”
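Bitext does not publish its internals, so the following is only a toy illustration of the general idea Mr. Valderrabanos describes: take the user’s word, add stems and synonyms, and hand the enriched term set to the existing engine. The synonym table and the crude stemmer are made up for the example.

```python
# Toy query enrichment in front of an existing search engine. A real system
# such as NaturalFinder would rely on full dictionaries, grammars, and
# ontologies rather than this two-entry table.
SYNONYMS = {
    "invoice": {"bill", "statement"},
    "train": {"railway", "rail service"},
}

def crude_stem(word):
    # Very rough stemmer: strip a trailing "s" (illustration only).
    return word[:-1] if word.endswith("s") else word

def enrich_query(user_word):
    # Expand one user-entered word into a set of query terms.
    stem = crude_stem(user_word.lower())
    return {stem} | SYNONYMS.get(stem, set())

# The enriched set would be OR-ed into the query sent to the underlying
# engine (SharePoint, the Google Search Appliance, Lucene, and so on).
print(enrich_query("invoices"))   # -> {'invoice', 'bill', 'statement'}
```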

After I finished my burger, I got a full demo of the new system. I told Mr. Valderrabanos:

In my opinion, your Bitext system is one of the first search enhancements that work out of the box. The Google Search Appliance gains greater utility without the time and complexity of coding a customized software widget. Bitext has broken new ground with NaturalFinder.

Bitext has engineered its system to work with Microsoft SharePoint Server, the Google Search Appliance, Oracle Text Search, Autonomy, Endeca, and legacy Fast ESP installations. The system enhances the utility of Lucene and other open source search solutions.

NaturalFinder supports English, French, and Spanish, as well as other languages. Bitext’s engineers can develop customized applications for customers who have special requirements such as those of intelligence and law enforcement agencies.

Bitext was founded in 2007 by experts in linguistics and is based in Europe. The company is focused on making natural language text meaningful for computers. In addition, Bitext develops linguistic technology (dictionaries, grammars and ontologies) for OEM integration with any third-party application: from search to sentiment analysis, contextual advertising, spam filters, business intelligence, etc.

You can get more information about Bitext at www.bitext.com.

Stephen E. Arnold, January 20, 2010

As I pointed out, I received a hamburger and a demonstration. I wish to disclose this to the FDA to make clear that the addled goose will sit through a demo as long as he is fed a quasi-Whopper. Alas! I received no cash. Oh, I got fries.
