Hakia: No Brand Search Taste Test

January 20, 2010

If you want to run a query and see how Google’s results and Hakia’s results stack up, navigate to NoBrandSearch.com. Google uses its PageRank method plus the accretions added to the core voting system since 1998. Hakia describes itself as “a general purpose ‘semantic’ search engine, dedicated to quality search experience. To achieve this goal, our team aspires to establish a new standard of focus, clarity and credibility in Web search.” If you are not familiar with Hakia, you can get basic information about the company’s software and services on the firm’s Web site. For a slightly deeper look at the company’s approach, you can read the interview with Riza C. Berkan in the Search Wizards Speak series on ArnoldIT.com. A running score shows which search system’s results are perceived as “better.” It is interesting to run some queries.

Stephen E Arnold, January 19, 2010

A freebie. I did visit the Hakia offices, and I think I got a bottle of water. Otherwise, this is a post done to point out this service. I will report the lack of payment to the Rural Business-Cooperative Service.

ChartSearch: Natural Language Querying for Structured Data

January 19, 2010

On Friday, January 15, 2010, the goslings and I were discussing natural language processing for structured information. Quite a few business intelligence outfits are announcing support for interfaces that eliminate the need for the user to formulate queries. SQL jockeys pay for their hybrid autos because most of the business professionals with whom they work don’t know a SELECT statement from picking a pair of socks out of the drawer. We have looked closely at a number of systems, and each offers some nifty features. We heard a rumor about some hot, new Exalead functionality. Our information is fuzzy, so we will not speculate.

One of the goslings recalled that a former Web analytics whiz named Chris Modzelewski had developed an NLP interface for structured data. You can check out his approach in the patent documents he has filed, which are available from the crackerjack search system provided by the USPTO. His company, ChartSearch, provides software and services to clients who want to give a plain vanilla business professional access to data locked in structured data tables and guarded by a business intelligence guru flanked by two Oracle DBAs.

ChartSearch uses a variant of XML and a rules-based approach to locate and extract the needed data. Once the system has been set up, anyone who can run a Google query can fire one off at the system. The output is not a laundry list of results or a table of numbers. The method generates a report. His patent applications describe the chart generator, the search query parser, the indexing methods, the user interface, the data search markup language, and a couple of broader disclosures. If you are not a whiz with patent searching, you can start with US20090144318 and then chase the fence posts down the IP trail.
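Patent prose is heavy sledding, so here is a back-of-the-envelope Python sketch of the general pattern the filings describe: a rule table maps plain-English cues to fields in a structured data source, and the matched rows feed a report rather than a results list. The rule tables, field names, and sample data below are my inventions for illustration, not ChartSearch’s actual markup language or parser.

```python
# Toy structured data: quarterly sales by region (hypothetical values).
SALES = {
    ("east", "q1"): 120, ("east", "q2"): 135,
    ("west", "q1"): 98,  ("west", "q2"): 140,
}

# Rule tables mapping plain-English cues to structured fields. A real
# system would ship far richer, domain-tuned rule sets.
REGION_RULES = {"eastern": "east", "east": "east", "western": "west", "west": "west"}
PERIOD_RULES = {"first quarter": "q1", "q1": "q1", "second quarter": "q2", "q2": "q2"}

def parse_query(text: str):
    """Apply the rules to a plain-English query; no SQL required of the user."""
    text = text.lower()
    region = next((field for cue, field in REGION_RULES.items() if cue in text), None)
    period = next((field for cue, field in PERIOD_RULES.items() if cue in text), None)
    return region, period

def run_query(text: str) -> str:
    """Parse, look up, and emit a one-line report (a chart in a real system)."""
    region, period = parse_query(text)
    if region is None or period is None:
        return "Could not interpret the query."
    return f"Sales, {region} region, {period.upper()}: {SALES[(region, period)]}"

print(run_query("show me eastern sales for the first quarter"))
# -> Sales, east region, Q1: 120
```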

What makes this interesting is that the method has been verticalized; that is, versions of ChartSearch are tuned for consumer and survey data, for special enterprise requirements, and for companies that “sell” data but lack a user-friendly reporting and analytic tool.

The founder is a whiz kid who skipped college and then dived into data analytics. If you are looking for a natural language interface to structured data, ChartSearch might be worth a look.

Stephen E Arnold, January 19, 2010

Nope, a freebie. I don’t even visit New York very often, so I can’t call on ChartSearch and demand a bottle of water. Sigh. I will report this to the New York City Department of Environmental Protection. Water is important.

Concept Searching and Its Busy January 2010

January 14, 2010

Concept Searching (“Retrieval Just Got Smarter”) has had a busy January 2010.

The company made several announcements about its information retrieval software.

First, the company inked a deal with Union Square Software to use Concept Searching technology in Union Square’s Workspace product, an email, document, and knowledge management offering for the construction industry.

Second, the company announced support for the File Classification Infrastructure in Microsoft Windows Server 2008 R2. Like other Microsoft-centric solutions, Concept Searching provides a snap-in that extends the features of the Microsoft product.

Third, the company landed a deal with the Consumer Product Safety Commission to deliver search and classification for the CPSC’s public Web site and internal intranet.

The company was founded in 2002 with the goal of developing statistical search and classification products that “delivered critical functionality… unavailable in the marketplace.” The company’s software processes text, identifies concepts, and allows unstructured information to be classified via semantic metadata. The company supports SharePoint and other platforms. The company says:

Concept Searching are the only company to offer a full range of statistical information retrieval products based on Compound Term Processing. Our unique technology automatically identifies the word patterns in unstructured text that convey the most meaning and our products use these higher order terms to improve Precision with no loss of Recall. The algorithms adapt to each customer’s content and they work in any language regardless of vocabulary or linguistic style.
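Compound Term Processing is the company’s own secret sauce, so treat the following Python fragment as a rough illustration of the general statistical idea only: score adjacent word pairs by how much more often they co-occur than chance would predict. The pointwise mutual information scoring and the sample text are my assumptions, not Concept Searching’s algorithm.

```python
import math
from collections import Counter

def compound_terms(tokens, min_count=2):
    """Score adjacent word pairs with pointwise mutual information (PMI).
    High-PMI pairs co-occur far more often than chance, a rough proxy for
    the 'word patterns that convey the most meaning'."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n = len(tokens)
    scored = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # ignore rare, statistically unreliable pairs
        pmi = math.log((count / n) / ((unigrams[w1] / n) * (unigrams[w2] / n)))
        scored[(w1, w2)] = pmi
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)

text = ("triple heart bypass patients recover faster when triple heart "
        "bypass procedures use newer monitoring").split()
for pair, score in compound_terms(text):
    print(pair, round(score, 2))
```

In the toy text, pairs like “triple heart” and “heart bypass” float to the top, which is the flavor of “higher order term” the marketing copy describes.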

The company’s headquarters is in the UK, and the firm’s marketing operations are in McLean, Virginia. If you want more information, you can download a 13 megabyte video from K2 Underground.

Stephen E. Arnold, January 14, 2010

Oyez, oyez. A freebie. I shall report this public service to Securities House next time I am in London.

Autonomy Targets Marketers

January 7, 2010

A number of pundits, poobahs, and mavens are beavering away at intellectual confections that explain enterprise search in 2010. The buzz from those needing billable work is that enterprise search is a gone goose (pun intended) and that niche solutions are the BIG NEWS for 2010.

I thought I wrote an article for Search Magazine two or three years ago that made this point. But the Don and Donna Quixotes of the consulting world are chasing old chimeras. I nailed the real thing for Barbara Quint, one of my most beloved editors. With Gartner buying Burton Group, the azure chip crowd is making clear that the down-market push of Booz, Allen (now a for-fee portal vendor) and the up-market push of the Gerson Lehrmans of the world are making their sales Panini toasty and squishy.

Against this background, I noted this Reuters news item: “Autonomy Interwoven Enables Marketers to Deliver the Most Relevant End-to-End Search Experience.” I have difficulty figuring out which articles branded as Reuters-created come from the “real” Reuters, which come from outfits in the bulk content business (sorry, I can’t mention names even though you demand this of me), and which come from public relations firms with caviar budgets. You will have to crack this conundrum yourself.

The write up points out that Autonomy makes it possible for those engaged in marketing to provide their users with a “relevant end-to-end search experience.” I am not clever enough to unwrap this semantic package. For me, the most interesting comment in the write up was:

A recent report published by Gartner entitled Leading Websites Will Use Search, Advanced Analytics to Target Content states: “Search technology provides a mechanism for users to indicate their desires through implicit values, such as their roles and other attributes, and explicit values such as query keywords.  Website managers, information architects, search managers and Web presence managers can adopt search technologies to improve site value and user impact.”  The research note goes on to say, “Choose Web content management (WCM) vendors that have robust search technologies or that have gained them through partnerships, acquisitions or the customization of open-source technology.”

The explanation of “most relevant end-to-end search experience” hooks in part to an azure chip consultant report (maybe a Gartner Group product?) that is equally puzzling to me. Here’s where I ran into what my fifth grade teacher, Miss Chessman, would have called a “lack of comprehension.”

  1. What the heck is relevant?
  2. What is end-to-end?
  3. What is search?
  4. What is experience?
  5. What is a Web presence manager?
  6. What is a robust search technology?

I try to be upfront about my being an old, addled goose. I understand that Autonomy has acquired a number of interesting technologies. I understand that azure chip consulting firms have to produce compelling intellectual knowledge value to pay their bills.

What I don’t understand is what the message is from Reuters (this “news” story looks like a PR release), from Autonomy (I thought the company sold the Intelligent Data Operating Layer, not experience), and from Gartner (what’s with the job titles and references to open source?).

I will be 66 in a few months, and I don’t think anyone in the assisted living facility will be able to help me figure out the info payload of this Reuters-stamped write up. What happened to the journalism school’s pyramid structure? What happened to who, what, why, when, where, and how? Obviously I am too far down the brain trail to keep pace with modern communication.

Stephen E. Arnold, January 8, 2010

Oyez, oyez, I have to report to the Library of Congress, check out a dictionary, and admit to the guard on duty that I was not paid to explain I haven’t a clue about the meaning of this write up. I do understand the notion of rolling up other companies in order to get new revenue and customers, but this relevance and experience stuff baffles me. I am the goose who has been pointing out that “search sucks” for free too.

Christmas Cheer for Advertisers in 2010

December 28, 2009

Google has some Christmas cheer for advertisers… maybe early in 2010. One of my favorite Googlers, Ramanathan Guha, father of the programmable search engine, has teamed with a gaggle of Googlers to invent “Query Identification and Association.” The invention is a clever one, and it may address some of the shortcomings in US0046314. Advertisers want to get their ads in front of folks who have an interest in their products and services. The Guha gang has figured out how to improve ad matching using some algorithmic magic, semantic seasoning, and Google’s brute computational cruncher. You can read about the method for improving ad matching to a user’s query before the user creates the query in US2009/0319517. Filed in June 2009, this puppy jumped from the USPTO’s processing machine on December 24, 2009. I thought DC was shut down for Christmas, but not the USPTO! Fish & Richardson, one of Google’s go-to patent outfits, explains the invention this way:

Apparatus, systems and methods for predictive query identification for advertisements are disclosed. Candidate queries are identified from queries stored in a query log. Relevancy scores for a plurality of web documents are generated, each relevancy score associated with a corresponding web document and being a measure of the relevance of the candidate query to the web document. A web document having an associated relevancy score that exceeds a relevancy threshold is selected. The selected web document is associated with the candidate query.

So clear. For me, the net net is simple: better ad matching means happier advertisers. Happier advertisers spend more money on Google ads.
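For readers who prefer code to claims language, here is a toy Python rendering of the abstract’s outline: take a candidate query from a query log, score each Web document against it, keep the documents that clear a relevancy threshold, and associate the query with those documents. The term-overlap scoring below is my simplified stand-in for whatever relevancy math the patent actually covers.

```python
def relevancy(query: str, document: str) -> float:
    """Toy relevancy: the fraction of query terms present in the
    document. A simplified stand-in for the patent's scoring."""
    q_terms = set(query.lower().split())
    d_terms = set(document.lower().split())
    return len(q_terms & d_terms) / len(q_terms)

def associate(candidate_query: str, documents: list, threshold: float = 0.5):
    """Attach the candidate query to every document whose relevancy
    score clears the threshold, per the abstract's outline."""
    return [doc for doc in documents
            if relevancy(candidate_query, doc) >= threshold]

# Hypothetical query log and Web documents.
query_log = ["cheap hybrid cars", "hybrid car tax credit"]
documents = ["review of hybrid cars and their mileage",
             "recipe for corn bread"]
for candidate in query_log:
    print(candidate, "->", associate(candidate, documents))
# "cheap hybrid cars" matches the car review; the recipe matches nothing.
```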

Stephen E. Arnold, December 28, 2009

Oyez, oyez, I feel compelled to tell the USPTO that no one paid me to praise their Christmas eve work.

Google Nails Patent for Query Synonyms in Query Context

December 24, 2009

If you want to know how smart Steven Baker is, you won’t find the information in the Google index. His patent 7,409,383 is also a slippery fish. Where did he go? I have an email address for him, a blog post about search quality, and some odd references to programming. Not much online as of December 22, 2009, at 9 pm Eastern. In fact, I have noticed in my addled goose way that some Google wizards are easy to find; for example, Jeff Dean. Others, like Steven Baker, are tough to find. Steven Baker and John Lamping (also a wizard) invented the system and method disclosed in “Determining Query Term Synonyms with Query Context.” This type of process is significant, and at Google’s scale, the invention is quite interesting. The crystal clear prose of Google’s full-time and rental legal eagles says:

A method is applied to search terms for determining synonyms or other replacement terms used in an information retrieval system. User queries are first sorted by user identity and session. For each user query, a plurality of pseudo-queries is determined, each pseudo-query derived from a user query by replacing a phrase of the user query with a token. For each phrase, at least one candidate synonym is determined. The candidate synonym is a term that was used within a user query in place of the phrase, and in the context of a pseudo-query. The strength or quality of candidate synonyms is evaluated. Validated synonyms may be either suggested to the user or automatically added to user search strings.
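Stripped of the legal armor, the disclosed pipeline looks something like this Python sketch: form pseudo-queries by swapping one phrase for a token, then treat phrases that fill the same slot of the same pseudo-query as candidate synonyms. The strength and quality evaluation the patent describes is omitted here; this toy is my simplification, not Google’s production method.

```python
from collections import defaultdict

def candidate_synonyms(session_queries):
    """Replace each phrase in a query with a token to form pseudo-queries;
    phrases that fill the same slot of the same pseudo-query become
    candidate synonyms. Scoring and validation are omitted."""
    slots = defaultdict(set)  # pseudo-query -> phrases seen in its slot
    for query in session_queries:
        words = query.lower().split()
        for i, phrase in enumerate(words):
            pseudo = tuple(words[:i] + ["<TOKEN>"] + words[i + 1:])
            slots[pseudo].add(phrase)
    return {p: terms for p, terms in slots.items() if len(terms) > 1}

# Queries from one user session (hypothetical).
session = ["cheap flights boston", "cheap airfare boston"]
for pseudo, terms in candidate_synonyms(session).items():
    print(" ".join(pseudo), "->", sorted(terms))
# cheap <TOKEN> boston -> ['airfare', 'flights']
```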

You can breeze over to the USPTO and download this open source document. I recommend checking out the cross references to other Google patents, the method of organizing user queries over time, the numerical recipes disclosed, and the 19 claims. Another piece of the semantic puzzle nailed, in my opinion. This invention at Google scale is darned nifty.

Stephen E. Arnold, December 23, 2009

Oyez, oyez, I wish to disclose that I was not paid to highlight this patent document nor to point out that Google engineer Steve Baker has become a tough lad to whom to link. I wonder why. Do you? He has had some interesting computing pals. Think Jon Kleinberg of Clever fame. Maybe I should write the Bureau of Missing Googlers?

Are Google Users Ready to Step Up to Fusion Tables? Nah.

December 16, 2009

WolframAlpha and Google have a tiny challenge. Both firms’ rocket scientists and algorithm wranglers understand the importance of herding data. Take this simple test. Navigate first to WolframAlpha and enter a word pair. Try UK population. Now navigate to Google’s public-facing Fusion Tables demo here. What did you get? How did it work? Do you know why the systems responded as they did? How do you improve your query?

My hunch is that few readers of this Web log can answer these questions. Agree? Disagree? Well, I am not running an academic class, so if you flunked, that’s okay with me. I think most people will flunk, including some of the lesser lights at the Google and at WolframAlpha.

Against this background, the Google rolled out an API for Fusion Tables. You can get the Googley story in the write up “Google Fusion Tables API.” My view is that Google’s moves in structured data are quite important, generally unknown, and essentially incomprehensible to those who suffered through high school algebra.
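For the hands-on reader, a call to the new API looked something like the Python sketch below. The SQL-over-HTTP endpoint and the table id are assumptions on my part based on the launch-era documentation; check Google’s current docs before relying on either.

```python
import urllib.parse
import urllib.request

TABLE_ID = "123456"  # hypothetical table id; substitute your own

# SQL-style query string, per the API's query model.
sql = f"SELECT Country, Population FROM {TABLE_ID} WHERE Population > 100000000"

# Launch-era endpoint (assumption); rows come back as CSV.
url = ("https://www.google.com/fusiontables/api/query?sql="
       + urllib.parse.quote(sql))
with urllib.request.urlopen(url) as response:
    print(response.read().decode("utf-8"))
```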

My opinion is that this API will result in some applications that make Google’s significant commitment and investment in structured data more understandable. If you are ahead of the curve, the Google is on the march. If you have no clue what this post means, maybe you should think about changing careers. Wal-Mart greeter is somewhat less challenging than the intricacies of Google’s context server technology.

Stephen E. Arnold, December 16, 2009

Okay, I rode by Google’s DC headquarters. No one waved. No one paid me. I suppose I should report this fact to the manager of the Union Station taxi dispatchers. Nah, those folks don’t care that this is a freebie either.

Kngine: Web 3.0 Search

December 15, 2009

A happy quack to the reader who alerted me to Kngine, not to be confused with Kenjin, an early Autonomy offering. I think both are pronounced in a similar way. Kngine (based in Cairo) is an:

evolutionary Semantic Search Engine and Question Answer Engine designed to provide meaningful search result, such as: Semantic Information about the keyword/concept, Answer the user’s questions, Discover the relations between the keywords/concepts, and link the different kind of data together, such as: Movies, Subtitles, Photos, Price at sale store, User reviews, and Influenced story. We working on new indexing technology to unlock meaning; rather than indexing the document in Inverted Index fashion, Kngine tries to understand the documents and the search queries in order to provide meaningful search result.

There is some information about Kngine’s plumbing in the High Scalability Web log. The system uses “semantic technology”. One interesting feature of the system is snippet search. The idea is:

Snippet Search results will consist of collection of rich ranked paragraphs rather than collection of documents links. Snippet Search paragraphs is semantically related to what you looking for (i.e. content what you looking) so we will be able to get what he looking for directly without open other pages.
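In plainer terms: rank paragraphs, not documents. Here is a crude Python sketch of the idea, with simple term overlap standing in for whatever semantic scoring Kngine actually applies:

```python
import re

def tokens(text: str) -> set:
    """Lowercase word tokens, punctuation stripped."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def snippet_search(query: str, documents: dict, top_n: int = 3):
    """Split each document into paragraphs, score each paragraph
    against the query, and return the best paragraphs directly
    instead of a list of document links."""
    q = tokens(query)
    scored = []
    for doc_id, text in documents.items():
        for para in text.split("\n\n"):
            overlap = len(q & tokens(para))  # crude stand-in for semantics
            if overlap:
                scored.append((overlap, doc_id, para))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(doc_id, para) for _, doc_id, para in scored[:top_n]]

documents = {
    "amtrak.html": "Amtrak operates intercity rail.\n\nTickets sell online.",
    "trains.html": "Freight trains differ from Amtrak passenger rail service.",
}
for doc_id, para in snippet_search("amtrak rail service", documents):
    print(doc_id, "->", para)
```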

Haytham El-Fadeel in his blog provided additional color about the search system. He wrote on September 4, 2009:

Kngine long-term goal is to make all human beings systematic knowledge and experience accessible to everyone. We aim to collect and organize all objective data, and make it possible and easy to access. Our goal is to build on the advances of Web search engine, semantic web, data representation technologies a new form of Web search engine that will unleash a revolution of new possibilities.

I ran a number of queries on the system. I found the results useful. My query for Amtrak provided relevant hits, some suggested queries, and a thumbnail.

[Screenshot: Kngine splash page]

You can contact the company at Info@Kngine.com.

Stephen E. Arnold, December 15, 2009

Okay, okay, someone fed me date nut bread this morning in the hopes I would write about their product. That did not work. I ate the date nut bread and wrote about this outfit in Cairo. I guess this shows that you can pay this goose, but the goose does what it wants. Honk.

Nstein Releases Semantic Site Search

November 23, 2009

Nstein, a company that offers a wide range of software systems, has released its 3S product. Details appear in “Nstein Technologies Releases Semantic Site Search Engine.” The story reported:

The 3S product is built utilizing the registered text-extracting expertise of Nstein for web site search and precise results.

I was initially confused by the 3S moniker. But the write up shed more light on the approach:

3S’s design allows it to source the search content from various stored indexes and published web sources. It works by sensing the attributes of the materials to index them in an orderly fashion. After arranging the materials, Nstein’s patented semantic fortification method is utilized for processing.

According to the article, Nstein’s first client is Gesca Digital, which is part of the Torstar-owned Olive network. Gesca focuses on French language content. I anticipate that the system will become available on www.cyberpresse.ca and www.testesaclaques.tv. I’ve added these sites to my “to review” list.

Stephen Arnold, November 23, 2009

I want to disclose to the Maritime Administration that I was not paid to write about a company near the Great Lakes. No cross border shenanigans for the goose.

A Startling Look at IBM and the Future of Search

November 16, 2009

I find it amusing when “the future of search” is invoked. When that phrase crops up in a discussion of IBM, I enjoy the remark even more because IBM and search are not synonymous in my experience. Sure, IBM was *the leader* in search with its original STAIRS product. But since that golden era, IBM has floundered in search: buying companies, cutting deals with Endeca and Fast Search & Transfer, among others, and then embracing the Lucene open source search solution. I wrote about IBM’s commerce search recently and then ran a search on IBM India’s eCommerce Web site. I reported that IBM’s own search products could not be located. So, that’s the future of search? I hope not.

A youthful looking person, Kas Thomas, who is an “analyst,” begs to differ with my view of IBM’s information retrieval capabilities. Navigate to “IBM, Lucene, and the Future of Search.” Mr. Thomas wrote:

A lot is at stake for IBM, too: The key pieces of IBM’s information-access strategy — including InfoSphere Content Assessment (ICA), InfoSphere Content Collector (ICC), and InfoSphere Classification Module (ICM) — all employ the OmniFind Enterprise Edition search infrastructure in various ways. With Lucene and UIMA occupying center stage, IBM is betting a lot on this technology.

I am not sure IBM is a betting organization. Lucene and other open source products are [a] lower cost and [b] a hedge against Microsoft and Google. IBM is in an information retrieval bind, and I don’t think Lucene is going to do much to release the pressure.


IBM is hunting for its search “ball”. Without a ball, IBM is not in the search game. Source: http://www.desbrophy.com/images/gallery/LostBall.JPG.JPG

Here’s why, in my opinion:

  1. IBM does not understand search. The lead it enjoyed in the STAIRS era has eroded because IBM focused on other types of systems. Since STAIRS version III (which devolved into SearchManager/370) was dumped on IBM Germany for revision, the commitment to search, information retrieval, and sophisticated content processing technologies has been pushed into a secondary position. IBM could have been the leader. Instead it is a partner to any company that supports UIMA. On the path to UIMA, IBM purchased search technology lock, stock, and barrel. Anyone remember iPhrase?
  2. IBM now finds itself struggling with Microsoft’s resurgence in search, even if Microsoft’s best bet is the aging Fast ESP technology. IBM also sees that its “partner” Google is pushing into areas that IBM once considered beyond IBM’s core competence. (Think data management and collaboration.) Now IBM is without its own search technology, and it has embraced open source as the path forward. My research indicates that this is a “cost-based decision.” Open source is a wonderful idea for the IBM MBAs, but when applied to IBM’s own products, the Lucene search implementation is not up to par with offerings from companies such as Coveo or Exalead.
  3. IBM has wizards working at its labs on very sophisticated content processing and information retrieval systems. In fact, Google’s current system owes a tip of the hat to the Clever system, which IBM did little to commercialize in a meaningful way. In addition, Google’s semantic context technology comes from none other than former IBM Almaden researcher Ramanathan Guha. IBM is, in my opinion, on a par with Xerox PARC in its ability to generate continuing revenue from content processing innovations.

To sum up, “IBM” and “the future of search” do not flow trippingly on the tongue together. IBM is increasingly a consulting company that is still hanging on to its software business. IBM, like SAP, is a company that is search challenged. The notion of “prospective standards,” another phrase used by Mr. Thomas, analyst, baffles me.

IBM, just like Google, Microsoft, and Oracle, is at its core a vendor of proprietary products and services. Search is a placeholder at IBM. If it were more, why didn’t IBM do something “clever” with Dr. Guha’s inventions? What happened to WebFountain? Where’s the SearchManager/370 technology capability in Lucene? Answer: Lucene is a toy compared to SearchManager/370. IBM has dropped its ball in search. Now it is hunting for that ball in a dark, large, hot IBM conference room in White Plains. The future of search at IBM is on hold until that ball is found again.

Stephen Arnold, November 16, 2009

I want to disclose to the Treasury Department that I was not paid to point out that IBM uses software to generate vendor lock-in to its proprietary software and systems. Why would a market-driven company pay me to point out that Lucene is a means to an IBM end, not an example of the success of open source software?
