Search Rumor Round Up, Summer 2008

June 14, 2008

I am fortunate to receive a flow of information, often completely wacky and erroneous, in my redoubt in rural Kentucky. The last six months have been a particularly rich period. Compared to 2007, 2008 has been quite exciting.

I’m not going to assure you that these rumors have any significant foundation. What I propose to do is highlight several of the more interesting ones and offer a broader observation about each. My goal is to provide some context for the ripples that are shaking the fabric of search, content processing, and information retrieval.

The analogy to keep in mind is that we are standing on top of a jello dessert like this one.

[Image: jello dessert]

The substance itself has a certain firmness. Try to pick it up or chop off a hunk, and you have a slippery job on your hands. Now, the rumors:

Rumor 1: More Consolidation in Search

I think this is easy to say, but it is tough to pull off in the present economic environment. Some search and content processing companies have investors who have pumped millions into them. These kind souls want their money back. If the search vendor is publicly traded, the set up of the company or its valuation may be a sticky wicket. There have been some stunning buy outs so far in 2008. The most remarkable was Microsoft’s purchase of Fast Search & Transfer. SAS snapped up the little-known Teragram. But the wave of buy outs across the more than 300 companies in the search and content processing sector has not materialized.

Rumor 2: Oracle Will Make a Play in Enterprise Search

I receive a phone call or two a month asking me about Oracle SES10g. (When you access the Oracle Web site, be patient. The system was sluggish for me on June 14, 2008.) The drift of these calls boils down to one key point, “What’s Oracle’s share of the enterprise search market?” The answer is that its share can be whatever Oracle’s accountants want it to be. You see, Oracle SES10g is linked to the Oracle relational database and other bits and pieces of the Oracle framework. Oracle’s acquisitions in search and retrieval, from Artificial Linguistics more than a decade ago to Triple Hop in more recent times, have given Oracle capability. As a superplatform, Oracle is a player in search. So far this year, Oracle has been moving forward slowly. An experiment with Bitext here and a deployment with Siderean Software there. Financial mavens want Oracle to start acquiring search and content processing companies. There are rumors, but so far no action, and I don’t expect significant changes in the short term.

Microsoft BIOIT: Opportunities for Text Mining Vendors

June 14, 2008

I came across Microsoft BIOIT in a news release from Linguamatics, a UK-based text processing company. If you are not familiar with Linguamatics, you can learn more about the company here. The company’s catchphrase is “Intelligent answers from text.”

In April 2006, Microsoft announced its BIOIT alliance. The idea was to create “a cross-industry group working to further integrate science and technology as a first step toward making personalized medicine a reality.” The official announcement continued:

The alliance unites the pharmaceutical, biotechnology, hardware and software industries to explore new ways to share complex biomedical data and collaborate among multidisciplinary teams to ultimately speed the pace of drug discovery and development. Founding members of the alliance include Accelrys Software Inc., Affymetrix Inc., Amylin Pharmaceuticals Inc., Applied Biosystems and The Scripps Research Institute, among more than a dozen industry leaders.

The core of the program is Microsoft’s agenda for making SharePoint and its other server products the plumbing of health-related systems among its partners. The official release makes this point as well, “The BioIT Alliance will also provide independent software vendors (ISVs) with industry knowledge that helps them commercialize informatics solutions more quickly with less risk.”

Rudy Potenzone, a highly regarded expert in the pharmaceutical industry, joined Microsoft in 2007 to bolster Redmond’s BIOIT team. Dr. Potenzone, who has experience in online with Chemical Abstracts, has added horsepower to the Microsoft team.

This week, on June 12, 2008, Linguamatics hopped on the BIOIT bandwagon. In its news announcement, Linguamatics co-founder Roger Hale said:

As the amount of textual information impacting drug discovery and development programs grows exponentially each year, the ability to extract and share decision-relevant knowledge is crucial to streamline the process and raise productivity… As a leader in knowledge discovery from text, we look forward to working with other alliance members to explore new ways in which the immense value of text mining can be exploited across complex, multidisciplinary organizations like pharmaceutical companies.

Observations

Health and medicine is an important player in the scientific, medical, and technical information sector. More importantly, health presages money. In the US, the baby boomer bulge is moving toward retirement, bringing a cornucopia of revenue opportunity for many companies.

Google has designs on this sector as well. You can read about its pilot project here. Microsoft introduced a similar project in 2006. You can read about it here.

Several observations are warranted:

  1. There is little doubt that bringing order, control, metadata and online access to certain STM information is a plus. Tossing in the patient health record allows smart software to crunch through data looking for interesting trends. Evidence based medicine also can benefit. There’s a social upside beyond the opportunity for revenue.
  2. The issue of privacy looms large as personal medical records move into these utility-like systems. The experts working on these systems to collect, disseminate, and mine data have good intentions. Nevertheless, this is uncharted territory, and when one explores, one must be prepared for the unexpected. The profile of these projects is low, seemingly controlled quite tightly. It is difficult to know if security and privacy issues have been adequately addressed. I’m not sure government authorities are on top of this issue.
  3. The commercial imperative fuels some potent corporate interests. These interests could run counter to social needs. The medical informatics sector, the STM players, and the health care stakeholders are moving forward, and it is not clear what the impacts will be when their text mining reveals hitherto unknown facets of information.

One thing is clear. Linguamatics, Hakia, and other content processing companies see an opportunity to leverage these broader industry interests to find new markets for their text mining technology. I anticipate that other content processing companies will find the opportunities sufficiently promising to give BIOIT a whirl.

Stephen Arnold, June 14, 2008

Goo Hoo: The Fox Is in the Hen House

June 13, 2008

I am breaking a self-imposed rule about ignoring Web search and focusing on the enterprise or behind-the-firewall search. I am also commenting about online advertising, another aspect of today’s world that is of little interest to me. But I can’t pass up the swarming of Web log authors who want to comment on the Google and Yahoo decision to tell the world that they are now going steady.

Goo Hoo is my moniker for this relationship. I have dipped and sampled about two dozen postings about this deal. You will want to read the comments of Stephen Shankland and his “Yahoo Inks Search-Ad Pact with Google”. It’s an excellent summary of the deal with some informed commentary about the value of the deal on Yahoo’s cash flow. I think this is important but it is not the main point of the tie up, a point to which I will return in the observations section of this news analysis. You will also want to read Henry Blodget’s summary of the key points of the deal. His perspective, as always, is helpful if you want a glimpse of how Wall Street reads the tea leaves on the back seat shenanigans of two high-profile Internet companies. Mr. Blodget’s June 12, 2008, commentary is here. You can grind through the mountain of links on Techmeme.com, Megite.com, and Newspond.com, among others.

The pick of the litter and the one that made me grin was Google’s own announcement here. Please, read this, preferably after taking a gander at my tongue-in-cheek essay called “Goo Jit Su: Google’s Art of Soft Force in Competitive Fights” here. I wrote that piece in early 2008 for my KMWorld column, but I decided it was too frisky for a “real” publication, though not for this wacky Web log with which I am now saddled.

Google says in its typical fun-free prose:

We have been in contact with regulators about this arrangement, and we expect to work closely with them to answer their questions about the transaction. Ultimately we believe that the efficiencies of this agreement will help preserve competition.

I quite like the phrasing about working closely with regulators. I also like the “preserve competition” phrase. I’m not sure if the logic is as crisp as that set forth in US20080140647, Google’s patent about universal search here, but I think the idea of preserving competition is memorable.

Observations

Let me offer my observations, which, as you know, are based on my view of the online world:

  1. The lawyers are going to be able to buy new cars on this tie up. No, I am not interested in working as a consultant to a law firm nor as an expert witness. But some folks will make a ton of money litigating about this action. The notion that a regulatory body has a firm grasp of how online works is going to get quite a test in the coming months. There is always the chance that the knife that cuts Google’s Achilles’ tendon is going to be wielded by a female attorney from Duke Law.
  2. The GOOG without much effort has managed to rain on the Microsoft parade. Just as the Redmond crowd gets free of the Don Quixote charge at Yahoo, Google announces that the Mountain View couple has become an item. I anticipate T shirts that say, “Goo Hoo”. Maybe I will have Zazzle make a few?
  3. Advertisers won’t know what’s happening for months, if ever. The GOOG makes it clear that no human is involved in the ad system. If Google wizards show a Madison Avenue type an algorithm, I’m not sure that will clear the fog about who does what, how, under what circumstances, and where the money actually goes.
  4. The consultants are going to have a banner quarter. I anticipate expensive analyses from consultancies world wide explaining the upside and the downside of this deal. Let me save you some money: Google wins. Yahoo is now wearing a shock collar and Google controls how much pain to administer and when. Microsoft is puzzled. Attorneys are looking for new condos in Costa Rica and Belize. Regulators have a chance to make the six o’clock news for the next three to six months. And that competition point. Hmmm.

Maybe I should change “Goo Hoo” to “Boo Hoo”? The Googlers have done it again. Goo Jit Su. With little effort, the fox-like Google is in the hen house now.

Stephen Arnold, June 13, 2008

Hakia: Pulled by Medical Information Magnetism

June 13, 2008

A colleague and I visited with the Hakia team last summer after the Bear Stearns Internet conference. I’ve tracked the company with my crawlers, but I have not made an effort to contrast Hakia’s approach with that of Powerset, Radar Networks, and the other “semantic” engines now in the market.

I received a Hakia news release today (June 12, 2008), and I noticed that Hakia is following the well-worn path of many commercial databases in the 1980s. The point that jumped out at me is that Hakia is adding content to its index; specifically, the PubMed metadata and abstracts. This is a US government database, and it has a boundary. The information is about health, medicine, and closely related topics. Another advantage is that PubMed, like most editorially-controlled scientific, technical, and medical databases, has reasonably consistent indexing. Compared to the wild and uncontrolled content available on Web sites and from many “traditional” publishers, this content has two advantages: [a] it makes text processing less computationally intensive because algorithms don’t have to figure out how to reconcile schema, find concepts, and generate consistent metadata, and [b] data sets like PubMed have some credibility. For example, we created a test Web site five years ago. We processed some general newspaper articles, posted them, and used the content for a test of a system called ExploreCommerce. Then we forgot about the site. Recently someone called objecting to a story. The story was a throw away, and not intended to be “real”. But “if it’s on the Internet, it must be true” echoed in this caller’s mind. PubMed has editorial credibility, which makes a number of text processing functions somewhat more efficient.

Kudos to Hakia for adding PubMed. You can read the full news release here. You can try the Hakia health and medical search here.

Several observations will highlight my thoughts about this Hakia announcement:

  1. The PR fireworks about semantic search have made the concept familiar to many people. The problem is that semantic search for me is a misnomer. Semantic technology, I think, can enhance certain content processing operations. I am still looking for a home run semantic search system. Siderean’s system is pretty nifty, and its developers are careful to explain its functionality without the Powerset-Hakia type of positioning. I know vendors will want to give me demonstrations and WebEx presentations to show me that I am wrong, but I don’t want any more dog and pony shows.
  2. My hunch is that using bounded content sets–Wikipedia, specific domains, or vertical content–allows the semantic processes to operate without burdening the companies with Google-scaling challenges. Smaller content domains are more economical to index and update. Semantic technology works. Some implementations are just too computationally costly to be applicable to unbounded content collections and the data management problems these collections create.
  3. Health is a hot sector. Travel, automobiles, and finance offer certain benefits for the semantic technology company. The idea is to find a way to pay the bills and generate enough surplus to keep the venture cats from consuming the management team. I anticipate more verticalization or narrow content bounding. It is cheaper to index less content more thoroughly and target a content domain where there is a shot at making money.
  4. It’s back to the past. I find the Hakia release a gentle reminder of our play at the Courier Journal & Louisville Times Co. with Pharmaceutical News Index. We chose a narrow set of content with high value to an easily identified group of companies. The database was successful because it was narrow and had focus. Hakia is rediscovering the business tactics of the 1980s and may not even know about PNI and why it was a money maker.

I’m quite enthusiastic about the Hakia technology. I think there is enormous lift in semantics in the enterprise and Web search. The challenge is to find a way to make semantics generate significant revenue. Tackling content niches may be one component of financial success.

Stephen Arnold, June 13, 2008

Silobreaker: Breaking through Information Access Silos

June 12, 2008

Silobreaker is an information access system that pushes the limits of search, content processing, and text analysis. The company makes its system available here. You can launch queries and manipulate a range of features and functions to squeeze meaning and insights from information.

Mats Bjore–former Swedish intelligence officer and McKinsey & Co. knowledge management consultant–asserts that certain types of “real world” questions may be difficult for search systems to answer. Echoing Google’s Dr. Peter Norvig, Mr. Bjore believes that human intelligence is needed when dealing with facts and data. He told Beyond Search:

We always emphasize the importance of using our technology for decision-support, not to expect the system to perform the decision-making for you. The problem today is that analysts and decision-makers spend most of their time searching and far too little time learning from and analyzing the information at hand. Our technology moves the user closer to the many possible “answers” by doing much of the searching and groundwork for them and freeing up time for analysis and qualified decision-making.

The low-key Mr. Bjore demonstrated the newest Silobreaker features to Beyond Search. Among the features that caught our attention was the point-and-click access to link analysis, mapping, and a useful “trends search” function.

Mr. Bjore said:

The whole philosophy behind Silobreaker is to move away from the traditional keyword based search query which generates just page after page of headline results and forces the user into a loop of continually adjusting the query to find relevance and context. We see the keyword-based query as a possible entry point, but the graphical search results enable the user to discover, navigate and drill down further without having to type in new keywords. No-one can imagine managing numerical data without the use of descriptive graphical representations, so why do we believe that we can handle vast quantities of textual data in any other way? Well we don’t think we can, and traditional search is proving the point emphatically. Today’s Silobreaker is just giving you a first glimpse of how we (and I’m sure others) will use graphics to bring meaning to search results.

Explaining sophisticated information access systems is difficult. Mr. Bjore drew an analogy that provides a glimpse of how technology extends the human decision mechanism. He said:

Silobreaker works like one of our dogs. Their eyes see what is in front of you, the ears hear the tone of voice, the nose smells what has happened, what is now and what’s around the corner.

I agree. Silobreaker is more than search; it’s an extension of the information envelope.

[Image: media trends]

This graphic shows the key trends in the content processed by the system in a period specified by the user. When the system processes an organization’s proprietary information, a user can see at a glance what the key issues are. Silobreaker can combine internal and external data so that trend lines reflect trends from multiple sources.

The system is available to commercial organizations via software as a service or an on-premises installation. Mr. Bjore characterized the pricing of the service as “very competitive.” You can contact the company by telephoning either the firm’s London, England, office at +44 (0) 870 366 6737 or the firm’s Stockholm, Sweden, office at +46 (0) 8 662 3230. If you prefer email, write sales at silobreaker dot com. More information about the company is here. Like Cluuz.com, Silobreaker ushers in the next generation of information access and analysis.

Silobreaker: Sophisticated Intelligence

June 12, 2008

An Interview with Mats Bjore, Silobreaker

I met Mats Bjore at a conference seven, maybe eight years ago. The majority of the attendees were involved in information analysis. Most had worked for government entities or commercial organizations with a need for high-quality information and analysis.

Mr. Bjore and two of his dogs near his home in Sweden.

I caught up with Mats Bjore, one of the wizards behind the Silobreaker service which I profiled in my Web log, in the old town’s Café Gråmunken. Since I had visited Stockholm on previous occasions, I asked the waiter for a cheese plate and tea. No herring for me.

Over the years, I learned that Mr. Bjore shared two of my passions: high-value intelligence and canines. One of my contacts told me that his prized possession is a shirt emblazoned with “Dog Father”, crafted by his military client.

Before the waitress brought our order, I asked Mr. Bjore about his interest in dogs and Sweden’s pre-occupation with herring in a mind-boggling number of guises. He laughed, and I turned the subject to Silobreaker.

Silobreaker is one of a relatively small number of firms offering a combination intelligence-centric solution to clients and organizations worldwide. One facet of the firm’s capabilities stems from its content processing system. The word “search” does not adequately describe the system. Silobreaker generates reports. The other facet of the company is its deep expertise in information itself.

The full text of my conversation with Mats Bjore appears below:

Where did the idea for Silobreaker originate?

Silobreaker actually has a long history in the sense of the word Silobreaker. When I was working in the intelligence agency and later at McKinsey & Co, I was amazed at the knowledge silos that existed, totally isolated from each other. I saw the promise of technology to assist in unlocking those silos; however, the big names at that time, Autonomy, Verity, Convera, etc., failed to deliver, big time… Disappointed and waiting for the technology of the future, I registered the name Silobreaker.com, more like a wish for the perfect system. A couple of years later, in 2003-2004, I was approached by a team of amazing people–Per Lindh, Björn Löndahl, Jimmy Mardell, Joakim Mårlöv and Kristofer Mansson. These professionals wanted to further develop their software into an intelligence platform. In 2005 my company Infosphere and the software company Elucidon joined forces and we created Silobreaker Ltd as a joint venture. One year later we consolidated software, service and consulting to one brand–Silobreaker.

Today, Silobreaker enables the breaking down of silos built from informational, knowledge, or mental bricks and mortar.

What’s your background?

I am a former lieutenant colonel in the Swedish Army. I was detailed to the Swedish Military intelligence Agency where I founded the Open Source Intelligence function in 1993.

After leaving the government, I became the Scandinavian Knowledge Manager for McKinsey & Company. After several years at McKinsey, I started my own company, Infosphere, and the service Able2Act.com.

I am also a former musician in a group called Camping with Penguins. You know that I am a lover of dogs. Too bad you like boxers. You need to get a couple of my friends so you have a real dog. I’m just joking.

I know. I know. What are the needs that traditional search engines like Autonomy, Endeca, and Fast Search (now Microsoft) are not meeting?

Meaning and context. I would also say that traditional engines require that you always know what to search for. You need to be an expert in your field to take full advantage of the information in databases and unstructured text. With the Silobreaker technology the novice becomes an expert and the expert becomes a discoverer. It might sound like a sales pitch, but it’s true. Every day in my work I need to jump into new areas, new industries and topics. There is no way that I can formulate keyword searches, nor do I have the time to digest 100 or 1,000 articles in the mode of click and read, click and read. With Silobreaker and its technology I start very broadly and the system directly helps me to understand the context of a large set of articles in different formats, from different repositories, on different topics. We call this a View 360 with an In Focus summary. Note: here’s an InFocus example provided to me after the interview.

[Image: In Focus summary]

When I search in traditional systems based on the search/read philosophy, I spend too much time searching and reading and too little time on sense making and analysis. With Silobreaker, I directly start with that process and I create new value, for me and for my clients.

In a conversation with one of the Big Brands in enterprise search, the senior VP told me that services producing answers are “just interfaces”. Do you agree?

“Just interfaces” might be a bit harsh on the companies that actually try to provide direct answers to searches – they actually have some impressive algorithms, but to a certain extent we agree. We simply don’t think that an “answer engine” solves any real information overload problem.

If you want to know “What’s the population of Nigeria” – fine, but Wikipedia solves that problem as well. But how do you “answer” the question “What’s up with iPhone”? There are many opinions, facts, news items, and blogs “out there”. Trying to provide an “answer” to any “question” is very hard to do, maybe futile.

We always emphasize the importance of using our technology for decision-support, not to expect the system to perform the decision-making for you. The problem today is that analysts and decision-makers spend most of their time searching and far too little time learning from and analyzing the information at hand. Our technology moves the user closer to the many possible “answers” by doing much of the searching and groundwork for them and freeing up time for analysis and qualified decision-making. Note: This is a 360 degree view of news from Silobreaker provided after the interview.

[Image: 360 degree view of an article]

There’s significant dissatisfaction among users of traditional key word search systems. What’s at the root of this annoyance?

The more information that is generated, duplicated, recycled, edited and abstracted, in combination with the rapid proliferation of the “I never use a spell checker and I write in your language with my own set of grammar” style of writing, the more the need for smarter systems to actually find what you are looking for will increase. In a couple of years from now, we will also see the demise of the mouse and keyboard and the emergence of other means of input. The keyword approach is just not it.

Keyword based search works reasonably well for some purposes (like finding your nearest Swedish herring restaurant), but as soon as you take a slightly more analytical approach it becomes very blunt as a tool.

There is no real discovery or monitoring aspect to keyword based search. The paradox is that you’ll need to know what you’re looking for in order to discover.

Matching keywords to documents doesn’t bring any meaning to the content nor does it put the content in context for the user.

Keyword based search is a bottom-up approach to relevance. The burden is put entirely on the user to dissect large results in order to find relevant articles, who the key players are, how they relate to each other, and other factors.

This burden creates the annoyance and “research fatigue”, and as a result users rarely go beyond the first page of results – hence the desperate hunt amongst providers for PageRank, which may have little or no bearing on the user’s real needs.
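To make the contrast concrete, here is a minimal Python sketch of the kind of work a keyword system leaves to the user: counting who the key players are and how often they appear together. It is an illustration under stated assumptions, not a description of how Silobreaker works. The sample articles, the entity names in them, and the function names `key_players` and `relationships` are all hypothetical, and the sketch assumes the entities have already been extracted from each article.

```python
from collections import Counter
from itertools import combinations

# Hypothetical input: each article is represented by the set of entity
# names already extracted from it (extraction itself is out of scope).
ARTICLES = [
    {"Google", "Yahoo", "Microsoft"},
    {"Google", "Yahoo"},
    {"Microsoft", "Fast Search & Transfer"},
    {"Google", "Microsoft"},
]


def key_players(articles):
    """Count how many articles mention each entity."""
    counts = Counter()
    for entities in articles:
        counts.update(entities)
    return counts


def relationships(articles):
    """Count how often each pair of entities appears in the same article."""
    pairs = Counter()
    for entities in articles:
        # sorted() gives every pair a stable (a, b) order so counts merge.
        for a, b in combinations(sorted(entities), 2):
            pairs[(a, b)] += 1
    return pairs


if __name__ == "__main__":
    print(key_players(ARTICLES).most_common())
    print(relationships(ARTICLES).most_common())
```

Even this toy version shows the shift in burden: the reader starts from a ranked list of players and pairings instead of a page of headlines.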

The intelligence agencies in many ways are the true professionals in content analysis. Why have the systems funded by IN-Q-TEL, Interpol, and MI5/MI6 not caught on in the enterprise world?

These systems are often complex and their “end solutions” are often a mix of different software that is not well integrated. We already see a change with our technology. Some government customers look at our free service at Silobreaker.com and have a chance to explore how Silobreaker works without sales people hovering over them.

We want our clients to see one technology with its pieces smoothly integrated. We want the clients to experience information access that, we believe, is far beyond our competitors’ capabilities.

Intelligence agencies have often acquired systems that are too complex and too expensive for commercial enterprises. Some of these systems have been digital Potemkins. These systems provide the user with no proof about why a certain result was generated.

Now, this “black box” approach might be okay when you have a controlled content set, like on the classified side within the intelligence community. But the “real world” needs to make sense of unstructured information here and now.

You have more than 100,000 major companies in the world, and you have 200 or so countries. Basically the need for technology solutions is the same. For me it’s totally absurd that the government complicates their systems instead of looking at what is working here and now.

Furthermore, I think one of the reasons that government can do complex and sometimes fruitless projects is that some agencies don’t have to make money to survive. The taxpayers will solve that.

In the commercial sector, profit and time are essential. Another factor that corporations take into account when investing in a system is ease of use.

With the usually high turnover in any industry, a system must be easy to use in order to reduce training time and training costs. In some government sectors, turnover is much lower. People can spend a great deal of time learning how to use the systems. Does this match your experience?

Yes, and I agree with your analysis.

I had an email exchange with the chief technical officer of a major enterprise search vendor. He asserted that social search was the next big thing. When I pointed out that social search worked when the system ingested a large amount of information, much available covertly, he argued that general social information was “good enough”. Do you agree?

No, I don’t. Now we are talking about quality of the information. If you would index and cross reference XING, Facebook, and LinkedIn, you could generate fantastic displays of the connections… However, how many of these links between people are actually true (in the sense that they actually have met or even have some common ground)?

There is a very large set of people that try to get as many connections as possible, thus diluting the value of true connections. I agree that you need a significant amount of information in order to get a baseline. You also need to validate this kind of data with reality checks in other kinds of information sources – offline and online.

My main company, Infosphere, did some research into the financial networks in the Middle East. The fact-based search (ownership, shareholdings, etc.) provided one picture; then you have to add the family and social connections, the view from media, then look at resident clusters and other factors. We had more than 8,000 dots (people) that we connected. But we were just scratching the surface.
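The “reality check” idea can be sketched in a few lines of Python: keep a link between two people only when at least two independent kinds of source report it. This is a hedged illustration, not Infosphere’s or Silobreaker’s method. The observation data, the source labels, the `min_sources` threshold, and the function name `corroborated_links` are all assumptions made up for the example.

```python
from collections import defaultdict

# Hypothetical observations: (person_a, person_b, source) triples.
OBSERVATIONS = [
    ("A", "B", "social_network"),
    ("A", "B", "shareholder_register"),
    ("A", "C", "social_network"),
    ("B", "C", "media"),
    ("C", "B", "family_records"),
]


def corroborated_links(observations, min_sources=2):
    """Return links reported by at least `min_sources` distinct sources."""
    sources = defaultdict(set)
    for a, b, source in observations:
        # Treat links as undirected: (C, B) and (B, C) are the same link.
        link = tuple(sorted((a, b)))
        sources[link].add(source)
    return {
        link: found_in
        for link, found_in in sources.items()
        if len(found_in) >= min_sources
    }


if __name__ == "__main__":
    for link, found_in in corroborated_links(OBSERVATIONS).items():
        print(link, sorted(found_in))
```

In this toy data the A–C link appears only in a social network, so it is dropped, which is the dilution problem described above in miniature.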

The graphic displays in Silobreaker are quite useful. In a general description, what are you doing to create on the fly different types of information displays?

The whole philosophy behind Silobreaker is to move away from the traditional keyword based search query which generates just page after page of headline results and forces the user into a loop of continually adjusting the query to find relevance and context.

We see the keyword-based query as a possible entry point, but the graphical search results enable the user to discover, navigate and drill down further without having to type in new keywords. No-one can imagine managing numerical data without the use of descriptive graphical representations, so why do we believe that we can handle vast quantities of textual data in any other way? Well we don’t think we can, and traditional search is proving the point emphatically. Today’s Silobreaker is just giving you a first glimpse of how we (and I’m sure others) will use graphics to bring meaning to search results.

Is Silobreaker available for on premises and for SaaS (software as a service)? What do you see as the future access and use case for Silobreaker?

That’s a good question. Let me say that Silobreaker’s business model is divided into three parts.

First, we have a free news search service that eventually will be ad-supported but whose equally important role is to show-case the Silobreaker technology and function as a lead generator for the enterprise offerings.

Second, our Enterprise Service, which is due to be released in September or October 2008, is an online, real-time “clipping service” aimed at companies, banks, consultants as well as government agencies. It will offer a one-stop shop for news and media monitoring, from defining what you are monitoring to in-depth content aggregation, analysis and report generation. This service will come with a SaaS facility that enables the enterprise to upload its own content and use the Silobreaker technology to view and analyze it.

Third, we offer a Technology Licensing option. This could range from a license to embed Silobreaker widgets in your own site to a fully operational local Silobreaker installation behind your firewall and customized for your purposes and for your content.

Furthermore, parts of the Silobreaker technology are available as SaaS on request.

Let’s talk about content. Most search systems assume the licensee has content. Is this your approach?

Yes and no. We can facilitate access to some content and also integrate crawling with third-party suppliers or, if it’s very specific, assist with specialty crawling.

On top of that we can, of course, integrate the fact sheets, profiles, and other content from my other venture, Able2Act.com, which gives any system and any content set some contextual stability.

What are the content options that your team offers? Is it possible to merge proprietary content and the public content from the sources you have mentioned?

Yes, the ideal blend is internal and external content. And that really sets our team apart. Most of the Silobreaker group works with information as the key focus on a daily basis, sometimes 24×7 on certain projects. In other words, we are end users who keep our ears to the ground for information. Most companies out there are either tech people or content aggregators that just sell. We are both.

When you look forward, what is the importance of mobile search? Does Silobreaker have a mobile interface?

Mobile “search” is an extremely important field where traditional keyword-based search just doesn’t cut it. The small screen size of mobile devices and their limited (and sometimes cumbersome) input capabilities are just not suitable for sifting through pages of search results only to find that you need another Boolean operator and have to start all over again. We believe that users must be given a much broader 360-degree view of what they’re searching for in order to get to the “nugget” information faster. Silobreaker does not currently offer a mobile interface, but needless to say we’re working on it.

What are the major trends that you see emerging in the next nine to 12 months in content processing?

That’s a difficult question. I can identify several areas that seem important to my clients: Contextual processing, cross media integration, side-by-side translations, and smart visualization. Note: I have inserted a Silobreaker link view screen shot Mr. Bjore provided me after our conversation.

Silobreaker link map

Observations

Silobreaker caught my attention when I saw a demonstration of the system before it was publicly available. The system has become more useful to intelligence professionals with each enhancement. Compared to laundry lists of results, the Silobreaker approach allows a person working in a time-compressed environment to size up, identify, and obtain the information needed. The system’s “smart software” shows that Silobreaker’s learning and unlearning function is part of the next generation of information tools. After accessing information with Silobreaker, I am reminded that keyword search is a medieval approach to 21st century problems. Silobreaker’s ability to assist a decision maker makes it clear that technology, properly applied, becomes a force multiplier without pushing human judgment to the sidelines. In one of our conversations, Mr. Bjore drew a parallel between Silobreaker and the canines for which he and I share respect and affection. He said, “Silobreaker works like one of our dogs. Their eyes see what is in front of you, the ears hear the tone of voice, the nose smells what has happened, what is now and what’s around the corner.” I agree. Silobreaker is more than search; it’s an extension of the information envelope. Take a close look at this extraordinarily good system here.

Stephen E. Arnold, June 12, 2008

Prepping for Google’s Udi Manber Keynote: Another Datawocky Scoop

June 11, 2008

At the Gilbane Conference in San Francisco next week, the keynote speaker is Udi Manber, top wizard of search at Google. I think this post by Anand Rajaraman in Datawocky is not serendipitous. Once again, Googlers find that Datawocky is a trusted conduit of first-class information. My view is that if you’re smart enough to know about Datawocky, then you have earned the right to get some juicy Googlebits before the rest of the world. Mr. Rajaraman delivers, I believe, a sneak preview of a Google revelation. Mr. Rajaraman, who is best buds with a number of Googlers, spoke with Peter Norvig, who is on leave in order to update his book about artificial intelligence. Dr. Norvig struck me as awfully chatty and well informed for someone on leave and writing a book. But we are in the midst of 40 days and 40 nights of information raining from Google’s senior managers. These folks instant message, MOMA post, and email 24×7, so it’s not surprising that high-powered Googlers are reading from the same page.

Mr. Rajaraman’s “How Google Measures Search Quality” is a very important essay, and I think it is a prime beef summary of what Mr. Manber will touch upon in his keynote at the Gilbane Conference on June 18th. Click here and read Mr. Rajaraman’s post. This link will also provide you with a link to the first part of Mr. Rajaraman’s write up about search quality and artificial intelligence at Google.

The key point in the Datawocky essay for me is this statement:

Google does not use such real usage data to tune their search ranking algorithm. What they really use is a blast from the past. They employ armies of “raters” who rate search results for randomly selected “panels” of queries using different ranking algorithms. These manual ratings form the gold-standard against which ranking algorithms are measured — and eventually released into service.

Google uses humans!

Let’s think about this. Google has legions of wizards. Google has a heck of a super computer. Google has more fancy math than a dozen universities. Yet Google relies on humans. I don’t know much about Google, but I know what human indexers can do; for example:

  1. Make judgments that algorithms at this time cannot match. Most companies have chopped humans out of the indexing and search analysis loop. Google is putting them in. Big news for me.
  2. Google’s use of artificial intelligence is useful but it may be paying off in areas other than search and advertising. Where does the AI deliver a hefty payload? That’s a question that warrants investigation.
  3. Ask, Microsoft, Yahoo and the other Web search engines are falling behind the GOOG in market share. If humans are Google’s secret sauce, the razzle dazzle technology from these three companies will not be able to close the gap. These and other competitors will have to have technology and the money to hire expensive, inefficient humans to make the search results better for actual users. Can these three companies invest in humans? I don’t know. But it’s clear that none of these three is able to slow down Googzilla on its march to search dominance.
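
For readers who want to see the idea in the Datawocky quote in concrete form, here is a minimal sketch of comparing ranking algorithms against human ratings. This is not Google's method; the ranker names, the queries, the 1-to-5 rating scale, and the simple averaging are all assumptions made up for illustration.

```python
from statistics import mean

# Hypothetical data: ratings[algorithm][query] = scores from human raters (1-5).
# None of these names or numbers come from Google or Datawocky.
ratings = {
    "ranker_a": {"kentucky derby": [4, 5, 4], "enterprise search": [3, 3, 4]},
    "ranker_b": {"kentucky derby": [3, 4, 3], "enterprise search": [4, 5, 5]},
}

def panel_score(per_query):
    """Average the raters' scores for each query, then average across the panel."""
    return mean(mean(scores) for scores in per_query.values())

# Score each candidate algorithm against the human "gold standard".
scores = {name: panel_score(per_query) for name, per_query in ratings.items()}

# The algorithm the raters liked best on this panel of queries.
best = max(scores, key=scores.get)
print(scores, best)
```

The point of the sketch is only that people, not click logs, supply the numbers the algorithms are judged by.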

Kudos to Mr. Rajaraman for getting another Google scoop. Now I won’t have to attend Mr. Manber’s lecture. I think I know what he’ll be saying. Google has a tendency to create talks and then have its top dogs “run the game plan.”

A happy quack to Techmeme.com for the link to Datawocky. I owe you one.

Stephen Arnold, June 12, 2008

Web Search Data: Maybe Right, Maybe Wrong but the Trend–Spot On

June 10, 2008

I just received in my trusty RSS feed the news that Hitwise has released its Web search market share data for May 2008. You should take a look at the data table here. I don’t think for a Kentucky minute that these numbers are dead accurate. I do think the data, generated with various mathematical voodoo and data from some cooperative folks, show trends.

Here’s what I mean. Google’s market share of Web searches in the sampled area, which means the United States, has risen over the last three years. Cross-referencing against my own data, I find that Google’s share of Web search has risen for a decade, and that’s not news. In May 2008, Google, say Hitwise number crunchers, accounted for 68 percent of Web search traffic.

Nope. The big insights from my point of view are these:

  1. Microsoft’s market share has declined from 7.6 percent in May 2007 to 5.9 percent in May 2008. Limping aardvark Ask.com tallied a May 2008 share of 4.2 percent.
  2. Yahoo, despite the search announcement fusillades, lost share, dropping from just under 21 percent of Web search traffic in May 2007 to 19 percent in May 2008.
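
The size of these declines is easier to judge as both percentage points and relative change. Here is a quick sketch using only the figures quoted above; the function name is my own, and Yahoo's May 2007 share is taken as 21.0 because the post gives only an approximate figure.

```python
def share_change(before, after):
    """Return (percentage-point change, relative change in percent)."""
    points = after - before
    relative = points / before * 100
    return round(points, 1), round(relative, 1)

# Figures quoted in the post. Yahoo's May 2007 value is approximate,
# so 21.0 is an assumption used as an upper bound.
microsoft = share_change(7.6, 5.9)
yahoo = share_change(21.0, 19.0)

print("Microsoft:", microsoft)
print("Yahoo:", yahoo)
```

On these numbers Microsoft lost more than a fifth of its share in a year, a steeper relative fall than Yahoo's.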

What are the trends? I’m not sure you will agree with my analysis, but this is a Web log, and it’s free.

First, Google keeps on increasing its share. The line goes up. I’m not even interested in whether Hitwise’s data are accurate. Over the time line I track, Google has yet to meet a competitor who can hobble Googzilla. I’m not sure Google is that great a search engine. What I do know is that the competitors’ systems are not able to convince users that their systems are better.

Second, Microsoft has been unable to crack the code for Web search. Maybe financial incentives and advertising will work. I think there’s more to search than $650 million data centers running Windows Servers with peer-to-peer technologies moving data from one behemoth to another. The trend line is nosing down–closing in on Ask.com territory.

Third, Yahoo’s innovation engine isn’t firing on all cylinders. Social this, open that–the search is still leaving users cold. Now its drift down appears to be accelerating. I recall Yahoo’s chief technical wizard telling me in 2007, “We have some tricks that Google doesn’t have.” Maybe in their dreams. The reality, if the Hitwise data are accurate, is that Yahoo is slipping. In my files, I have a reference to Yahoo’s share of the search market in 2001 as 50 percent. Looks like a dip to me.

Agree? Disagree? Use the comments section to let me know if you have data that refute the stark fact that the GOOG is running free, without meaningful competition, and frolicking in growth as Microsoft and Yahoo struggle to reverse their losses.

Stephen Arnold, June 10, 2008

A Google Amazon Balancing Act

June 10, 2008

CNet featured an essay by Charles Cooper. You can read it here. Click the link quickly. I continue to receive emails from people telling me my links are dead. Some sites move stories around; others just delete them. The title–“Google’s Right but Cloud Computing Timeline Isn’t So Clear”–is the type that catches my attention, but the core information really hooked me.

Mr. Cooper references a talk by Googler Rishi Chandra at the Enterprise 2.0 Conference in Boston. You can read the CNet write up of Mr. Chandra’s presentation here. The main point of the Googler’s remarks is that cloud computing is the future. That’s an old message, but in the spring and summer Google transparency offensive of 2008, it’s becoming clear that Google believes in network computing. Okay. Maybe this is old news. The implication is that Google is serious about the enterprise market. Okay. Also old news. Mr. Cooper describes Mr. Chandra’s revelations and insights in an objective manner. I don’t think I could have done that were I reporting on the Googler’s talk.

Mr. Cooper does an excellent job of summarizing the Google “game plan”, and I won’t attempt to summarize his clear, tight writing.

But the payload of this must-read article is that Mr. Chandra made a gentle reference to the reliability of certain cloud-based services. When I saw this, my radar lit up. After years of ignoring Amazon’s push into services and features that are easy for Google to deliver, Google seems to be jerking its Googzilla-sized self into action. Amazon is making an effort to out Google Google at a fraction of the amount that Google spends on technology. Amazon, based on my research, is doing cloud computing on a very abstemious sum that is about one fifth of Google’s. My accountant father would be proud of Mr. Bezos’ penny pinching. I’m from a different generation, and I learned in the nuclear power work I did in the early 1970s that it pays to engineer certain functions without cutting corners. A flawed infrastructure is bad news in a BWR (boiling water reactor) and not-so-good news in a cloud-computing system.

For me, this single passing reference translates to increased pressure from the Google enterprise team. In the column I submitted to KMWorld, which will run in either July or August 2008, I describe how “Google physics” works. Imagine my delight when Mr. Cooper provided additional information to buttress my analysis. I’m not going to explain “Google physics” in this post. You will have to wait until the KMWorld publication becomes available, but you can deduce that when the pressure goes up, the competitive arena behaves differently. The GOOG may be cranking up the heat now.

Kudos, Mr. Cooper. I appreciate a thorough reporting job.

Stephen Arnold, June 10, 2008

Mobile Search

June 10, 2008

I try to steer clear of mobile search. The notion is broad and, like most terms used to describe information retrieval, the phrase mobile search is frequently undefined. The idea, I assume, is that everyone knows what mobile search is.

I asked my neighbor what mobile search was, and he said, “I just use my phone for calls.” Functions like sending a query to Yahoo’s mobile service aren’t used very often by me, not at all by him, and probably not by you, gentle reader, either.

But if you get text or graphic information on a mobile device, it’s mobile search. Most pundits feel that this definition is close enough for horse shoes. The problem is that it is the equivalent of cutting a cherry pie with a Husqvarna 455 Rancher chain saw, a popular model here in the hills of Kentucky.

This is a photograph of a Beyond Search programmer expressing dissatisfaction with the mobile search function on an Apple iPhone and a Treo 650. “Both are terrible,” says ArnoldIT.com’s chief technical officer.

The USA Today business section ran this front page story on June 10, 2008: “Are Google, Yahoo the Next Dinosaurs?” I couldn’t find the story on USAToday’s Web site. If it does appear online, I think this is the link that will display it for you. If you can’t locate this story online, you may have to hunt for a tree-unfriendly printed version.

The story, written by Leslie Cauley, is that “many [vendors are] on the hunt for a way to cash in on wireless search.” The idea is that no one, not even Google, Microsoft, or Yahoo, has cracked the code for mobile search. The “dinosaur” part is a bit of color. The notion is that because neither Google nor Yahoo has cracked the code for mobile search, these two firms could be left in the dust by younger, more hip innovators. Ergo: Google and Yahoo become the brontosauri of online with regard to mobile search. Ms. Cauley mentions an up-and-coming company called Medio, careful to explain that this is just one interesting company among many. You can read more about Medio here. Could Medio be the next Google?

Because mobile devices are more plentiful than other types of computers, whoever cracks the code can make boat loads of cash selling ads to mobile phone device users running search. I’m not going to cite USAToday’s statistics. I have heard that Gannett takes a dim view of old researchers tapping into their high-value statistical data captured in bar charts without data tables.

I urge you to buy “America’s newspaper”; make Gannett’s accountants happy.

The challenges of mobile search are formidable. There are established business models ossified in the American telecommunications industry. There are device issues; namely, screens smaller than the 48 inches of flat panel I have in front of me at this moment, lousy keyboards, and users who aren’t too keen on taking time to paw through a laundry list of results.
