Kumo: Now a Wrestling Metaphor

May 24, 2009

Richard Adhikari’s “Can a Semantic Kumo Wrestle Google to the Mat?” explores the Redmond giant’s most recent effort to make headway in the Web search market. Mr. Adhikari reported that Kumo incorporates the natural language processing technology Microsoft obtained last year with its Powerset acquisition. NLP allegedly gives a search system a better shot at understanding a user’s query. For me, the most interesting comment in the write up was this passage, which has as its source an expert on search, Rob Enderle of the Enderle Group:

“Kumo was designed from the ground up to be a Google killer,” Enderle told TechNewsWorld. “Microsoft put a lot of effort into it.”

Mr. Adhikari then included this statement, the attribution of which struck me as ambiguous:

The project may be a costly one for Redmond. The amount of time and money Microsoft has spent on Kumo has caused deep divisions within the vendor’s management, Enderle said. “I understand a lot of people on the Microsoft board want them to stop this project,” he added. “They want Microsoft to focus on things they do well and not waste any more money.”

In my opinion, it’s tough to know if this set of assertions is 100 percent accurate. What is clear to me are these points:

  • The Google “killer” metaphor is now almost obligatory. In the eyes of eCommerceTimes.com, the issue for Microsoft is not doing better in search; the object is to kill Google. Can this be true? More than Powerset and marketing will be needed to impede Googzilla. Microsoft has logged a decade with zero progress. Try and try again is a great philosophy, but a decade?
  • The Board dissension, if true, may accelerate the discussion of splitting Microsoft into three or four units and deriving more shareholder value from the aging software company. With its share price in the value range, a break up would add some spice to a plain vanilla stock.
  • The semantic theme is clearly a PR magnet. Semantics play a role in many search systems, but most of the plumbing is kept out of sight. Users want answers, not polysyllabic promises. Google, despite its flaws, seems to deliver a search suite that appeals to about 70 percent of the search market.

For more on this Microsoft Google tussle, see my Bing Kumo article here.

Yahoo: Chasing Google with Semantic Intent

May 20, 2009

Information Week’s story “Yahoo Aims to Redefine What It Means to Search”, which you must read here, brought a tear to the eye of the addled goose. Yahoo aimed its former IBM and Verity “big gun” at Googzilla and fired a shot into the buttocks of its Mountain View neighbor. Mr. Claburn, the author of the Information Week story, offered:

As described by Raghavan, Yahoo is directing its search efforts toward assessing user intent. When a user types “Star Trek,” Raghavan said, he doesn’t want 10 million documents, he wants actors and show times.

Information Week approaches the yawning gap between Google and Yahoo in a kinder, gentler way. Thomas Claburn wrote:

it’s perhaps understandable why Yahoo might want to re-frame the debate. Given its lack of success challenging Google directly — Google’s April search share in the U.S. reached 64.2%, a 0.5 point gain, while Yahoo’s search share fell to 20.4%, a 0.1 point decline, according to ComScore — Yahoo wants to change the game.

How will Yahoo deliver its better mousetrap?

Yahoo is relying on its partners to feed it with structured data.

Google’s approach includes algorithmic methods, the programmable search engine methods (Ramanathan Guha), and user intent (Alon Halevy). Yahoo, on the other hand, wants Web site operators and other humans to do the heavy lifting.

Yahoo’s focus on user intent could lead to happier users, if Yahoo Search can guess user intent accurately. It could also help Yahoo make more money from advertising. “If we can divine the user’s intent, that’s obviously of great interest to advertisers,” said Raghavan.
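Mechanically, the idea Mr. Raghavan describes reduces to classifying a query into an intent class, then answering from structured data instead of returning a ranked document list. Here is a toy sketch of that routing; the intent classes, entity lists, and canned “answers” are my invention for illustration, not Yahoo’s plumbing:

```python
# Toy sketch of intent-driven answering. The intents, entities, and
# structured "answers" below are invented; they are not Yahoo's system.

INTENT_RULES = {
    "entertainment": {"star trek", "star wars"},
    "health": {"diabetes", "influenza"},
}

# Stub structured-data store standing in for partner-supplied feeds.
STRUCTURED_DATA = {
    ("entertainment", "star trek"): {
        "actors": ["Chris Pine", "Zachary Quinto"],
        "showtimes": ["19:00", "21:30"],
    },
}

def classify(query):
    """Map a raw query string to a coarse intent class."""
    q = query.lower().strip()
    for intent, entities in INTENT_RULES.items():
        if q in entities:
            return intent, q
    return "web", q  # fall back to ordinary document retrieval

def answer(query):
    """Return structured answers when intent is known, documents otherwise."""
    intent, entity = classify(query)
    if intent == "web":
        return {"documents": ["ten blue links go here"]}
    return STRUCTURED_DATA.get((intent, entity),
                               {"documents": ["ten blue links go here"]})

print(answer("Star Trek"))
# {'actors': ['Chris Pine', 'Zachary Quinto'], 'showtimes': ['19:00', '21:30']}
```

The hard part, of course, is not the routing; it is building and maintaining the structured data, which is exactly the heavy lifting Yahoo wants its partners to do.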

Advertisers want the eyeballs of buyers. Google delivers eyeballs in droves. One percent of two billion is a useful segment. Yahoo has struggled to: [a] deliver segments that make advertisers abandon Google’s big data method for the flawed Panama system, [b] monetize its hot, high traffic services like Flickr in an effective manner, and [c] turn the flamethrower on the GOOG’s hindquarters instead of feeling the heat itself, as it has since mid 2003.

Yahoo will need divine intervention to close the gap with Google. More importantly, neither Google nor Yahoo has an answer to the surging popularity of Twitter, Facebook, and other real time search systems. I am watching the sky for an omen that Woden is arriving to help the Yahooligans. So far, no portents, just PR.

Stephen Arnold, May 20, 2009

Creating Search Confusion

May 13, 2009

In the old days of fear, uncertainty, and doubt, one could count on giant software companies to baffle customers. The idea was that if a customer is not sure what to do, that customer will do nothing or go with the name brand.

Now media giants are creating FUD in the Web search space and, in some cases, muddying the water for enterprise search systems as well. Not a great example but a pretty good one is the article “New Search Engines Aspire to Supplement Google” here. The story ran on the CNN.com Web site.

The write up runs through a laundry list of alternative search engines; for example, Hakia (semantic system), Kosmix (Google centric mash up system), Wolfram Alpha (yet to be released system). The main point of the article, in my opinion, was this statement:

Instead of trying to be Google killers, these sites have more humble aspirations: to be alternatives to the industry giants.

The idea is one that I have been stating since the publication of my 2005 The Google Legacy here; namely, Google’s advantage is scale, cost control, and incremental improvements that are tough for most users and competitors to spot.

There’s another message in this article, and I think it is important. Newcomers in search are not going to knock off, slow, or kill Google quickly. Most of the systems are utilities.

The problem that is not addressed is that the average user has zero information about which search utility to use, under what circumstances, and how the returns enhance or duplicate what Google outputs.

In my work, when users are confronted with a new search system, some will test the system. The vast majority follow the well worn ruts that have worked in the past. For an astounding number of people worldwide, search means Google.

The challenge for the user is to figure out which new system delivers a payoff; then the new system vendor has to work quite hard to get those users habituated. With new search engines creating a global PR knee jerk, the result is that users will do the turtle; that is, pull in their heads and go with what they know. The choice right now seems to be Google.

Stephen Arnold, May 13, 2009

CMS Experts and Vendors May Be Floundering

May 6, 2009

I had a very unsettling conversation with a young man who recently set up his own content management consulting firm. I met him when I arrived to register for the Boye 09 conference in Philadelphia. I won’t reveal his name or his consulting firm. I do want to highlight three comments he made when we spoke yesterday afternoon and offer a comment about the implications for CMS. When I read “There Was Much Noise about the Closure of Tripod, Sites.Google, Geocities”, I realized the changing of the guard was as much about the failure of CMS as about new ways to tame the bull of electronic information, particularly in an organization.

The three comments and the questions they triggered:

First, the individual said that he had worked for an integration company that had been hit by the financial downturn. The integrator had little choice: reduce staff or shut its doors forever. The company provided a range of technical and management services to publishing companies wanting to better manage their content. With the rumors of cutbacks at some of the US based information consulting companies along with the reduction in force at Capgemini in India I wrote about here, this news was not surprising. It did indicate that technology advisors are not indispensable. Everyone, it seems, is dispensable, sort of like Kleenex. I am not a people person, but even I could sense that the individual with whom I was speaking was shocked at the change in his employer’s fortunes.

The question that raced through my mind this morning was, “Why should people who work for small service firms be surprised when the top brass has to reduce costs quickly?” I find it difficult to escape the economic news. Perhaps those in service companies in technology fields perceive themselves as insulated from the difficulties the auto industry faces, for instance?

Second, the individual told me that he decided to become a consultant and explore new opportunities. I think this is an excellent strategy. My concern this morning arose from my realization that this young person did not have the benefit of doing hard time at one of the blue chip consulting companies. Second and third tier consulting companies employ bright people, but if those people don’t learn the basics of building a client base, marketing expertise, and pricing to win jobs while making a profit, the risk of failure goes up. Even those used to the Bain or Boston Consulting safety net can and do fail when setting off on their own without a logo that people recognize on their business card.

My mind asked this question, “What type of training or educational experiences are needed to get a bright young person into a consulting business with adequate knowledge to deal with the rigors of this profession?” I attended a Booz, Allen “charm school”, but I was fortunate. I think this young man needs that type of experience.

Third, a young person entering consulting has to have skills that cause people to part with their money. As I thought about this person’s description of his background, I thought it sounded good. After all, most organizations have big content problems.

This morning, however, I realized that the young man was using Harvard MBA speak to explain his expertise. Notions like “best practices”, “project management”, and “strategy” are quite difficult to deliver in a successful, profitable way to skeptical clients.

Now what’s this have to do with content management?

I think that as information gains more prominence as a strategic asset, CMS systems and consultants are getting into increasingly hot water. A software package that organizes organizational writing is useful, but it is not a system that creates information that is a strategic asset.

Judging from the comments in the sessions, many CMS experts and attendees are trying to keep their heads above water. CMS costs are rising. Information is increasingly difficult to manage. The top guns in an organization want information to pay dividends. CMS vendors are on the firing line with no bulletproof vest and not much in the way of ammunition to defend themselves against irate users and cost watching financial officers. Open source solutions like Drupal may be one path to explore, but I think the boundaries between information value and CMS may swamp this sector and some of the leading players.

In short, CMS like enterprise search seems to be a troubled software sector.

Stephen Arnold, May 6, 2009

LexisNexis, Its Data and Fraud

May 3, 2009

Robert McMillan’s “LexisNexis Says Its Data Was Used by Fraudsters” here caught my attention. The story reported that “LexisNexis acknowledged Friday [May 1, 2009] that criminals used its information retrieval service for more than three years to gather data that was used to commit credit card fraud.” Mr. McMillan added that “LexisNexis has tightened up the way it verifies customers.” The article noted that LexisNexis “was involved in other data breaches in 2005 and 2006.” Interesting. So 2005, 2006, 2009. Perhaps the third time will be the charm?

Stephen Arnold, May 3, 2009

NetBase and Content Intelligence

April 30, 2009

Vertical search is alive and well. Technology Review described NetBase’s Content Intelligence here. The story, written by Erica Naone, was “A Smarter Search for What Ails You”. Ms. Naone wrote:

… organizes searchable content by analyzing sentence structure in a novel way. The company created a demonstration of the platform that searches through health-related information. When a user enters the name of a disease, he or she is most interested in common causes, symptoms, and treatments, and in finding doctors who specialize in treating it, says Netbase CEO and cofounder Jonathan Spier. So the company’s new software doesn’t simply return a list of documents that reference the disease, as most search engines would. Instead, it presents the user with answers to common questions. For example, it shows a list of treatments and excerpts from documents that discuss those treatments. The Content Intelligence platform is not intended as a stand-alone search engine, Spier explains. Instead, Netbase hopes to sell it to companies that want to enhance the quality of their results.

NetBase (formerly Accelovation) has developed a natural language processing system. Ms. Naone reported:

NetBase’s software focuses on recognizing phrases that describe the connections between important words. For example, when the system looks for treatments, it might search for phrases such as “reduce the risk of” instead of the name of a particular drug. Tellefson notes that this isn’t a matter of simply listing instances of this phrase, rather catching phrases with an equivalent meaning. Netbase’s system uses these phrases to understand the relationship between parts of the sentence.
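One can approximate the idea in this excerpt crudely with a handful of hand-written surface patterns that mark a treatment relation between the text on either side of the connecting phrase. A minimal sketch follows; the patterns and test sentence are mine, and NetBase claims to normalize myriad equivalent phrasings, which this toy does not:

```python
import re

# Crude approximation of phrase-pattern relation extraction. Each
# pattern links the text before the connecting phrase (a candidate
# treatment) to the text after it (a candidate condition).

TREATMENT_PATTERNS = [
    r"(?P<treatment>[\w\s-]+?)\s+reduces? the risk of\s+(?P<condition>[\w\s-]+)",
    r"(?P<treatment>[\w\s-]+?)\s+is used to treat\s+(?P<condition>[\w\s-]+)",
]

def extract_treatments(sentence):
    """Return (treatment, condition) pairs found in one sentence."""
    pairs = []
    for pattern in TREATMENT_PATTERNS:
        for m in re.finditer(pattern, sentence, re.IGNORECASE):
            pairs.append((m.group("treatment").strip(),
                          m.group("condition").strip()))
    return pairs

print(extract_treatments("Daily aspirin reduces the risk of heart attack."))
# [('Daily aspirin', 'heart attack')]
```

The gap between this toy and a production system is exactly the part NetBase is selling: recognizing that “cuts the chance of”, “lowers the odds of”, and dozens of other phrasings mean the same thing, at Web scale.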

At this point in the write up, I heard echoes of other vendors with NLP, semantics, bound phrase identification, etc. Elsevier has embraced the system for its illumin8 service. You can obtain more information about this Elsevier service here. Illumin8 asked me, “What if you could become an expert in any topic in a few minutes?” Wow!

The NetBase explanation of content intelligence is:

… understanding the actual “meaning” of sentences independent of custom lexicons. It is designed to handle myriads of syntactical sentence structures – even ungrammatical ones – and convert them to logical form. Content Intelligence creates structured semantic indexes from massive volumes of content (billions of web-pages and documents) used to power question-and-answer type of search experiences.

NetBase asserts:

Because NetBase doesn’t rely on custom taxonomies, manual annotations or coding, the solutions are fully automated, massively scalable and able to be rolled-out in weeks with a minimal amount of effort. NetBase’s semantic index is easy to keep up-to-date since no human editing or updates to controlled vocabulary are needed to capture and index new information – even when it includes new technical terms.

Let me offer several observations:

  • The application of NLP to content is not new and it imposes some computational burdens on the search system. To minimize those loads, NLP is often constrained to content that contains a restricted terminology; for example, medicine, engineering, etc. Even with a narrow focus, NLP remains interesting.
  • “Loose” NLP can squirm around some of the brute force challenges, but it is not yet clear if NLP methods are ready for center stage. Sophisticated content processing often works best out of sight, delivering to the user delightful, useful ways to obtain needed information.
  • A number of NLP systems are available today; for example, Hakia. Microsoft snapped up Powerset. One can argue that some of the Inxight technology, acquired first by Business Objects and then by the software giant SAP, is NLP. To my knowledge, none of these has scored a hat trick in revenue, customer uptake, and high volume content processing.

You can get more information about NetBase here. You can find demonstrations and screenshots. A good place to start is here. According to TechCrunch:

NetBase has been around for a while. Originally called Accelovation, it has raised $9 million in two rounds of venture funding over the past four years, has 30 employees…

In my files, I had noted that the funding sources included Altos Ventures and ThomVest, but these data may be stale or just plain wrong. I don’t have enough information about Netbase to offer substantive comments. NLP requires significant computing horsepower. I need to know more about the plumbing. Technology Review provided the sizzle. Now we need to know about the cow from which the prime rib comes.

Stephen Arnold, April 30, 2009

Rating a Search Engine

April 26, 2009

I am in a bit of a quandary. Martin White and I spent about 10 months writing down what we have learned in our combined 60 years of work in information, search, content processing, and information management. The result was a monograph that summarized in about 120 pages the method for reducing the likelihood of failure when implementing a search system. You can learn more about Successful Enterprise Search Management here.

I received a link to the article “How Not to Rate a Search Engine” here. I enjoy reading these types of how to’s. You may find some of the tips useful. The phrase that caught my attention was, “As one of my colleagues at Powerset always likes [sic] to remind me: this is rocket science.”

I agree.

Stephen Arnold, April 26, 2009

Google Base Tip

April 23, 2009

Google Base is not widely known among the suits who prowl up and down Madison Avenue. For those who are familiar with Google Base, the system is a portent of Googzilla’s data management capabilities. You can explore the system here. Ryan Frank’s “Optimizing Your Google Base Feeds” here provides some useful information for those who have discovered that Google Base is a tool for Google employment ads, real estate, and other types of structured information. Mr. Frank wrote:

It is also important to note that Google Base uses the information from Base listings for more than just Google OneBox results. This data may also be displayed in Google Product Search (previously Froogle), organic search results, Google Maps, Google Image Search and more. That adds up to a variety of exposure your site could potentially receive from a single Google Base listing.

Interesting, right? Read the rest of his post for some useful information about this Google service.
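For those who want to kick the tires, a Base feed item is just a structured record pushed to Google. Here is a rough sketch of emitting one item as XML; the g: namespace URI and attribute names come from my notes on Base bulk uploads and may not match Google’s current feed specification, so treat them as illustrative:

```python
import xml.etree.ElementTree as ET

# Rough sketch of one item in a Google Base-style bulk-upload feed.
# Namespace URI and g: attribute names are from my notes and may be
# stale; values are invented for illustration.

G_NS = "http://base.google.com/ns/1.0"
ET.register_namespace("g", G_NS)

item = ET.Element("item")
ET.SubElement(item, "title").text = "3 bedroom house, Harrod's Creek, KY"
ET.SubElement(item, "link").text = "http://example.com/listings/42"
ET.SubElement(item, "description").text = "Riverfront lot, pond with geese."
ET.SubElement(item, f"{{{G_NS}}}item_type").text = "Housing"
ET.SubElement(item, f"{{{G_NS}}}price").text = "250000 usd"

print(ET.tostring(item, encoding="unicode"))
```

One listing, many surfaces: as Mr. Frank notes, a single record like this can show up in OneBox results, Product Search, Maps, and more.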

Stephen Arnold, April 23, 2009

Personalized Network Searching: Google after People Search

April 22, 2009

The hounds of the Internet are chasing Google’s “Search for Me on Google”. I can’t add to that outpouring of insight about technology that is exciting today but dated by Google time standards. I can, however, direct your attention to US 7,523,096, “Methods and Systems for Personalized Network Searching.” You can download this patent from the USPTO. The document was published on April 21, 2009, and was filed on December 3, 2003. You may want to read the background of the invention and scan the claims. The diagrams are standard Google fare, leaving much to the reader who must bring an understanding of other Google subsystems to the analysis. To put the Search on Me discussion into context, here’s the abstract for the granted patent, now almost six years old:

Systems and methods for personalized network searching are described. A search engine implements a method comprising receiving a search query, determining a personalized result by searching a personalized search object using the search query, determining a general result by searching a general search object using the search query, and providing a search result for the search query based at least in part on the personalized result and the general result. The search engine may utilize ratings or annotations associated with the previously identified uniform resource locator to locate and sort results.

This is an important invention attributed to Stephen Lawrence and Greg Badros. Both have made substantive contributions to Google in the past. You may want to examine the current people search and then check out the dossier invention that I have written about elsewhere. Some interesting enhancements to the core dossier technology appear to be on the way. My assertion is that Google moves slowly. When these “innovations” roll out, some are surprised. The GOOG leaves big footprints in my experience. Where’s Pathfinder when one needs him?
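Stripped of the patent prose, the claimed method is easy to sketch: run the query against a per-user object built from previously identified, rated or annotated URLs, run it against the general index, and blend the two. A toy rendering, with the scoring weights and data structures invented for illustration:

```python
# Toy rendering of the method in US 7,523,096: blend a personalized
# result (from the user's rated URLs) with a general result. The
# weights and structures below are my invention, not Google's.

GENERAL_INDEX = {
    "golf": [("golf.example.com", 0.9), ("clubs.example.com", 0.7)],
}

# Per-user "personalized search object": previously identified URLs
# with ratings, per the abstract.
USER_RATINGS = {
    "stephen": {"clubs.example.com": 5, "geese.example.com": 4},
}

def personalized_scores(user, query):
    """Boost general hits the user has rated; unrated URLs get no boost."""
    ratings = USER_RATINGS.get(user, {})
    general = GENERAL_INDEX.get(query, [])
    return [(url, score + 0.2 * ratings.get(url, 0)) for url, score in general]

def search(user, query):
    """Rank by the blended personalized-plus-general score."""
    blended = personalized_scores(user, query)
    return [url for url, _ in sorted(blended, key=lambda p: p[1], reverse=True)]

print(search("stephen", "golf"))
# ['clubs.example.com', 'golf.example.com']
```

The patent’s claims cover far more machinery than this, but the blend of a personal object with a general result is the nub of the invention.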

Stephen Arnold, April 22, 2009

Semantic Roll Up: The Effect of Financial Compression

April 21, 2009

A flurry of emails arrived today about the tie up among several companies with good reputations but profiles that are lower than those enjoyed by Autonomy and Endeca. You can read the official news announcement here about the deal among Attensity, Empolis GmbH, and Living-e AG. The conflation is called The Attensity Group. Here’s a snapshot of each company based on the information I ratted out of my files in the midst of new carpet, painting, and hanging new boxer dog pictures:

  • Attensity. Deep text processing. Started in the intel community. Probed marketing. Acted as ring master for the tie up.
  • Empolis GmbH. (Link was dead when I checked it on April 20, 2009.) A distribution and archiving system and file based content transformation. Orphaned after parent Bertelsmann faced up to the realities facing the dead tree crowd. Now positions itself in knowledge management.
  • Living-e AG. Provides software products that enable efficient information exchange. Web content management, behavior analysis. Founded in 2003 as WebEdition Software GmbH.

The news release refers to the deal as a “market powerhouse”. This is the type of phrase that gets me to push the goslings to the computer terminals to do some company monitoring.

It’s too early for me to make a call about the product lineup the company will offer. Should be interesting. Some pundits will make an attempt to presage the future. Not this silly goose. The customers will decide, not the mavens.

Stephen Arnold, April 21, 2009
