Search Rumor Round Up, Summer 2008

June 14, 2008

I am fortunate to receive a flow of information, often completely wacky and erroneous, in my redoubt in rural Kentucky. The last six months have been a particularly rich period. Compared to 2007, 2008 has been quite exciting.

I’m not going to assure you that these rumors have any significant foundation. What I propose to do is highlight several of the more interesting ones and offer a broader observation about each. My goal is to provide some context for the ripples that are shaking the fabric of search, content processing, and information retrieval.

The analogy to keep in mind is that we are standing on top of a jello dessert like this one.

[Image: a jello dessert]

The substance itself has a certain firmness. Try to pick it up or chop off a hunk, and you have a slippery job on your hands. Now, the rumors:

Rumor 1: More Consolidation in Search

I think this is easy to say, but it is tough to pull off in the present economic environment. Some companies have investors who have pumped millions into a search and content processing company. These kind souls want their money back. If the search vendor is publicly traded, the set up of the company or its valuation may be a sticky wicket. There have been some stunning buyouts so far in 2008. The most remarkable was Microsoft’s purchase of Fast Search & Transfer. SAS snapped up the little-known Teragram. But the wave of buyouts across the more than 300 companies in the search and content processing sector has not materialized.

Rumor 2: Oracle Will Make a Play in Enterprise Search

I receive a phone call or two a month asking me about Oracle SES10g. (When you access the Oracle Web site, be patient. The system was sluggish for me on June 14, 2008.) The drift of these calls boils down to one key point, “What’s Oracle’s share of the enterprise search market?” The answer is that its share can be whatever Oracle’s accountants want it to be. You see, Oracle SES10g is linked to the Oracle relational database and other bits and pieces of the Oracle framework. Oracle’s acquisitions in search and retrieval, from Artificial Linguistics more than a decade ago to Triple Hop in more recent times, have given Oracle capability. As a superplatform, Oracle is a player in search. So far this year, Oracle has been moving forward slowly. An experiment with Bitext here and a deployment with Siderean Software there. Financial mavens want Oracle to start acquiring search and content processing companies. There are rumors, but so far no action, and I don’t expect significant changes in the short term.


Microsoft BIOIT: Opportunities for Text Mining Vendors

June 14, 2008

I came across Microsoft BIOIT in a news release from Linguamatics, a UK-based text processing company. If you are not familiar with Linguamatics, you can learn more about the company here. The company’s catchphrase is “Intelligent answers from text.”

In April 2006, Microsoft announced its BIOIT alliance. The idea was to create “a cross-industry group working to further integrate science and technology as a first step toward making personalized medicine a reality.” The official announcement continued:

The alliance unites the pharmaceutical, biotechnology, hardware and software industries to explore new ways to share complex biomedical data and collaborate among multidisciplinary teams to ultimately speed the pace of drug discovery and development. Founding members of the alliance include Accelrys Software Inc., Affymetrix Inc., Amylin Pharmaceuticals Inc., Applied Biosystems and The Scripps Research Institute, among more than a dozen industry leaders.

The core of the program is Microsoft’s agenda for making SharePoint and its other server products the plumbing of health-related systems among its partners. The official release makes this point as well, “The BioIT Alliance will also provide independent software vendors (ISVs) with industry knowledge that helps them commercialize informatics solutions more quickly with less risk.”

Rudy Potenzone, a highly regarded expert in the pharmaceutical industry, joined Microsoft in 2007 to bolster Redmond’s BIOIT team. Dr. Potenzone, who has experience in the online world with Chemical Abstracts, has added horsepower to the Microsoft team.

This week, on June 12, 2008, Linguamatics hopped on the BIOIT bandwagon. In its news announcement, Linguamatics co-founder Roger Hale said:

As the amount of textual information impacting drug discovery and development programs grows exponentially each year, the ability to extract and share decision-relevant knowledge is crucial to streamline the process and raise productivity… As a leader in knowledge discovery from text, we look forward to working with other alliance members to explore new ways in which the immense value of text mining can be exploited across complex, multidisciplinary organizations like pharmaceutical companies.

Observations

Health and medicine are an important part of the scientific, medical, and technical information sector. More importantly, health presages money. In the US, the baby boomer bulge is moving toward retirement, bringing a cornucopia of revenue opportunity for many companies.

Google has designs on this sector as well. You can read about its pilot project here. Microsoft introduced a similar project in 2006. You can read about it here.

Several observations are warranted:

  1. There is little doubt that bringing order, control, metadata, and online access to certain STM information is a plus. Tossing in the patient health record allows smart software to crunch through data looking for interesting trends. Evidence-based medicine also can benefit. There’s a social upside beyond the opportunity for revenue.
  2. The issue of privacy looms large as personal medical records move into these utility-like systems. The experts working on these systems to collect, disseminate, and mine data have good intentions. Nevertheless, this is uncharted territory, and when one explores, one must be prepared for the unexpected. The profile of these projects is low, seemingly controlled quite tightly. It is difficult to know if security and privacy issues have been adequately addressed. I’m not sure government authorities are on top of this issue.
  3. The commercial imperative fuels some potent corporate interests. These interests could run counter to social needs. The medical informatics sector, the STM players, and the health care stakeholders are moving forward, and it is not clear what the impacts will be when their text mining reveals hitherto unknown facets of information.

One thing is clear. Linguamatics, Hakia, and other content processing companies see an opportunity to leverage these broader industry interests to find new markets for their text mining technology. I anticipate that other content processing companies will find the opportunities sufficiently promising to give BIOIT a whirl.

Stephen Arnold, June 14, 2008

Goo Hoo: The Fox Is in the Hen House

June 13, 2008

I am breaking a self-imposed rule about ignoring Web search and focusing on the enterprise or behind-the-firewall search. I am also commenting about online advertising, another aspect of today’s world that is of little interest to me. But I can’t pass up the swarming of Web log authors who want to comment on the Google and Yahoo decision to tell the world that they are now going steady.

Goo Hoo is my moniker for this relationship. I have dipped and sampled about two dozen postings about this deal. You will want to read the comments of Stephen Shankland and his “Yahoo Inks Search-Ad Pact with Google”. It’s an excellent summary of the deal with some informed commentary about the effect of the deal on Yahoo’s cash flow. I think this is important, but it is not the main point of the tie up, a point to which I will return in the observations section of this news analysis. You will also want to read Henry Blodget’s summary of the key points of the deal. His perspective, as always, is helpful if you want a glimpse of how Wall Street reads the tea leaves on the back seat shenanigans of two high-profile Internet companies. Mr. Blodget’s June 12, 2008, commentary is here. You can grind through the mountain of links on Techmeme.com, Megite.com, and Newspond.com, among others.

The pick of the litter and the one that made me grin was Google’s own announcement here. Please, read this, preferably after taking a gander at my tongue-in-cheek essay called “Goo Jit Su: Google’s Art of Soft Force in Competitive Fights” here. I wrote that piece in early 2008 for my KMWorld column, but I decided it was too frisky for a “real” publication, so it landed in this wacky Web log with which I am now saddled.

Google says in its typical fun-free prose:

We have been in contact with regulators about this arrangement, and we expect to work closely with them to answer their questions about the transaction. Ultimately we believe that the efficiencies of this agreement will help preserve competition.

I quite like the phrasing about working closely with regulators. I also like the “preserve competition” phrase. I’m not sure if the logic is as crisp as that set forth in US20080140647, Google’s patent about universal search here, but I think the idea of preserving competition is memorable.

Observations

Let me offer my observations, which, as you know, are based on my view of the online world:

  1. The lawyers are going to be able to buy new cars on this tie up. No, I am not interested in working as a consultant to a law firm nor as an expert witness. But some folks will make a ton of money litigating about this action. The notion that a regulatory body has a firm grasp of how online works is going to get quite a test in the coming months. There is always the chance that the knife that cuts Google’s Achilles’ tendon is going to be wielded by a female attorney from Duke Law.
  2. The GOOG without much effort has managed to rain on the Microsoft parade. Just as the Redmond crowd gets free of the Don Quixote charge at Yahoo, Google announces that the Mountain View couple has become an item. I anticipate T shirts that say, “Goo Hoo”. Maybe I will have Zazzle make a few?
  3. Advertisers won’t know what’s happening for months, if ever. The GOOG makes it clear that no human is involved in the ad system. If Google wizards show a Madison Avenue type an algorithm, I’m not sure that will clear the fog about who does what, how, under what circumstances, and where the money actually goes.
  4. The consultants are going to have a banner quarter. I anticipate expensive analyses from consultancies worldwide explaining the upside and the downside of this deal. Let me save you some money: Google wins. Yahoo is now wearing a shock collar, and Google controls how much pain to administer and when. Microsoft is puzzled. Attorneys are looking for new condos in Costa Rica and Belize. Regulators have a chance to make the six o’clock news for the next three to six months. And that competition point. Hmmm.

Maybe I should change “Goo Hoo” to “Boo Hoo”? The Googlers have done it again. Goo Jit Su. With little effort, the fox-like Google is in the hen house now.

Stephen Arnold, June 13, 2008

Goo Jit Su: Google’s Art of Soft Force in Competitive Fights

June 13, 2008

Note to PR mavens. This is an essay based on my personal opinions. Please, don’t call me to set me straight. The author wears bunny rabbit ears. Thank you for your attention.

I have a friend who is a Georgia Tech computer wizard. I don’t think he went to class; he just took tests and aced them. Like me, he’s logged a number of years on his disc drives. I recall fondly his many references to various martial arts. He was fascinated by aikido, the art of soft force. He even introduced me to his sensei before the two of these unathletic-looking lads went off to the Times Square subway station in the hopes of having a street gang try to mug them. Quite a duo: a math wizard and an umpteenth degree black belt from somewhere west of Marina del Rey.

The idea of “soft” fighting is that you use your opponent’s force to defeat the opponent. I remember one day when my son was in high school. My friend asked me, “Will your son wrestle me?”

Now, my son was quite a good high school footballer and quite fit. He had muscles where I didn’t know one could have muscles. When he arrived home from school, I said, “Howie wants to wrestle you. Please, don’t hurt him. We have to get the system running tonight, and I don’t have time to take him to the hospital.” “Sure,” he said.

My son smiled and then without warning grabbed my friend’s arm and twisted it–or tried to twist it. This Georgia Tech engineer who looked like a Georgia Tech engineer, not a street fighter, turned toward my son and gently put him to the ground. My son went for a tackle and ended up in the marigolds. “That’s it,” my wife said. “You guys get out of my flowers.” My son asked my friend, “How do you do that?”

My friend said, “Ah, grasshopper, you need to study aikido with my sensei. The secret is to use your energy to achieve my ends. It is strength from soft force. It is power without effort.”

I thought it was baloney. But that “power without effort” idea stuck in my mind. I also quite liked the phrase soft force. I thought the silliness of dojos, pajamas, and strength with minimal effort was poppycock. But there it was: My fit son gently deflected and controlled by my Georgia Tech pal and his grasshopper parody from the old TV show Kung Fu.

Then I made the connection between my friend, a math and computer whiz, and Google. I realized that the GOOG was practicing its own black art of Goo Jit Su.

[Image: Googzilla flipping an opponent]

This is an illustration of Googzilla, dressed in traditional garb designed to make US wrestlers chuckle, using “soft force” to throw an opponent into a tizzy. Notice that Googzilla expends little effort. The opponent is headed for a shock with his energy redirected against him. Googzilla seems to be lowering the opponent to the ground almost gently. Appearances can be deceiving.

Let me explain.

Google has demonstrated for the second time in less than a year its mastery of a new form of “soft” force. I call this form of fighting “Goo Jit Su”. Instead of defining it and using those cute line drawings that show how to kill an opponent with the crane or other animal inspired technique, let me give you two examples of Goo Jit Su.

Verizon

Google is a peanut compared to Verizon. It’s not just revenue. Verizon is big. It has the AT&T pre-Judge-Green DNA in its digital marrow. Verizon understands lobbying. Verizon knows how to win government contracts. Verizon knows how to squeeze money from its customers. I heard that in Washington, DC, even the drug dealers pay their Verizon wireless bills on time. No reason to annoy Mother Verizon.

Verizon’s approach to business combat is similar to extreme martial arts–anything goes. There’s one objective: triumph.

Google pulled its Goo Jit Su on Verizon. Without any effort beyond some letter writing and hiring familiar lobbyist-type drones, Google got Verizon to agree to open its wireless spectrum. I don’t have a clue what “open” means, but after my time as a Bell Labs contractor, my work at Bellcore, and my US West Web work, I know “open” is not what phone companies do. AT&T defined “open” in one way–AT&T’s way. Verizon’s agreeing to open spectrum is tantamount to one of the Mt. Rushmore faces turning up in the Poconos.

How did Google achieve this feat with little cost, modest effort, and generally disorganized PR? The answer, “Goo Jit Su.” Google used the force of Verizon the way my friend turned a collision with my son into a romp.


Hakia: Pulled by Medical Information Magnetism

June 13, 2008

A colleague and I visited with the Hakia team last summer after the Bear Stearns Internet conference. I’ve tracked the company with my crawlers, but I have not made an effort to contrast Hakia’s approach with that of Powerset, Radar Networks, and the other “semantic” engines now in the market.

I received a Hakia news release today (June 12, 2008), and I noticed that Hakia is following the well-worn path of many commercial databases in the 1980s. The point that jumped out at me is that Hakia is adding content to its index; specifically, the PubMed metadata and abstracts. This is a US government database, and it has a boundary. The information is about health, medicine, and closely related topics. Another advantage is that PubMed, like most editorially controlled scientific, technical, and medical databases, has reasonably consistent indexing. Compared to the wild and uncontrolled content available on Web sites and from many “traditional” publishers, this content offers two advantages. First, it makes text processing less computationally intensive because algorithms don’t have to figure out how to reconcile schema, find concepts, and generate consistent metadata. Second, data sets like PubMed have some credibility. For example, we created a test Web site five years ago. We processed some general newspaper articles, posted them, and used the content to test a system called ExploreCommerce. Then we forgot about the site. Recently someone called objecting to a story. The story was a throwaway, not intended to be “real”. But “if it’s on the Internet, it must be true” echoed in this caller’s mind. PubMed has editorial credibility, which makes a number of text processing functions somewhat more efficient.
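To make the consistent-indexing point concrete, here is a minimal sketch of how an indexer might pull fields from PubMed’s MEDLINE-format text export, where each record labels its title, abstract, and MeSH headings with fixed tags. This is my own illustration of why a bounded, editorially controlled collection is cheaper to process, not a description of Hakia’s pipeline, and the sample record is invented.

```python
# Minimal sketch: consistent metadata arrives pre-labeled, so the indexer
# does no schema reconciliation or concept extraction. Illustration only.

def parse_medline(text):
    """Yield one dict per record from MEDLINE-formatted text."""
    record, last_tag = {}, None
    for line in text.splitlines():
        if not line.strip():                           # blank line ends a record
            if record:
                yield record
            record, last_tag = {}, None
        elif line.startswith("      ") and last_tag:   # wrapped field value
            record[last_tag][-1] += " " + line.strip()
        else:
            tag, _, value = line.partition("- ")
            last_tag = tag.strip()
            record.setdefault(last_tag, []).append(value.strip())
    if record:
        yield record

sample = """PMID- 12345678
TI  - A hypothetical article on evidence-based medicine
AB  - Abstract text, possibly wrapped
      across continuation lines.
MH  - Humans
MH  - Evidence-Based Medicine
"""

for rec in parse_medline(sample):
    # Title, abstract, and MeSH headings are already labeled and consistent.
    print(rec["TI"][0], rec["MH"])
```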

Kudos to Hakia for adding PubMed. You can read the full news release here. You can try the Hakia health and medical search here.

Several observations will highlight my thoughts about this Hakia announcement:

  1. The PR fireworks about semantic search have made the concept familiar to many people. The problem is that semantic search for me is a misnomer. Semantic technology, I think, can enhance certain content processing operations. I am still looking for a home run semantic search system. Siderean’s system is pretty nifty, and its developers are careful to explain its functionality without the Powerset-Hakia type of positioning. I know vendors will want to give me demonstrations and WebEx presentations to show me that I am wrong, but I don’t want any more dog and pony shows.
  2. My hunch is that using bounded content sets–Wikipedia, specific domains, or vertical content–allows the semantic processes to operate without burdening the companies with Google-scaling challenges. Smaller content domains are more economical to index and update. Semantic technology works. Some implementations are just too computationally costly to be applicable to unbounded content collections and the data management problems these collections create.
  3. Health is a hot sector. Travel, automobiles, and finance offer certain benefits for the semantic technology company. The idea is to find a way to pay the bills and generate enough surplus to keep the venture cats from consuming the management team. I anticipate more verticalization or narrow content bounding. It is cheaper to index less content more thoroughly and target a content domain where there is a shot at making money.
  4. It’s back to the past. I find the Hakia release a gentle reminder of our play at the Courier Journal & Louisville Times Co. with Pharmaceutical News Index. We chose a narrow set of content with high value to an easily identified group of companies. The database was successful because it was narrow and had focus. Hakia is rediscovering the business tactics of the 1980s and may not even know about PNI and why it was a money maker.

I’m quite enthusiastic about the Hakia technology. I think there is enormous lift in semantics in the enterprise and Web search. The challenge is to find a way to make semantics generate significant revenue. Tackling content niches may be one component of financial success.

Stephen Arnold, June 13, 2008

Silobreaker: Breaking Through Information Access Silos

June 12, 2008

Silobreaker is an information access system that pushes the limits of search, content processing, and text analysis. The company makes its system available here. You can launch queries and manipulate a range of features and functions to squeeze meaning and insights from information.

Mats Bjore–former Swedish intelligence officer and McKinsey & Co. knowledge management consultant–asserts that certain types of “real world” questions may be difficult for search systems to answer. Echoing Google’s Dr. Peter Norvig, Mr. Bjore believes that human intelligence is needed when dealing with facts and data. He told Beyond Search:

We always emphasize the importance of using our technology for decision-support, not to expect the system to perform the decision-making for you. The problem today is that analysts and decision-makers spend most of their time searching and far too little time learning from and analyzing the information at hand. Our technology moves the user closer to the many possible “answers” by doing much of the searching and groundwork for them and freeing up time for analysis and qualified decision-making.

The low-key Mr. Bjore demonstrated the newest Silobreaker features to Beyond Search. Among the features that caught our attention was the point-and-click access to link analysis, mapping, and a useful “trends search” function.

Mr. Bjore said:

The whole philosophy behind Silobreaker is to move away from the traditional keyword based search query which generates just page after page of headline results and forces the user into a loop of continually adjusting the query to find relevance and context. We see the keyword-based query as a possible entry point, but the graphical search results enable the user to discover, navigate and drill down further without having to type in new keywords. No-one can imagine managing numerical data without the use of descriptive graphical representations, so why do we believe that we can handle vast quantities of textual data in any other way? Well we don’t think we can, and traditional search is proving the point emphatically. Today’s Silobreaker is just giving you a first glimpse of how we (and I’m sure others) will use graphics to bring meaning to search results.

Explaining sophisticated information access systems is difficult. Mr. Bjore drew an analogy that provides a glimpse of how technology extends the human decision mechanism. He said:

Silobreaker works like one of our dogs. Their eyes see what is in front of you, the ears hears the tone of voice, the nose smells what has happened, what is now and what’s around the corner.

I agree. Silobreaker is more than search; it’s an extension of the information envelope.

[Image: Silobreaker media trends graphic]

This graphic shows the key trends in the content processed by the system in a period specified by the user. When the system processes an organization’s proprietary information, a user can see at a glance what the key issues are. Silobreaker can combine internal and external data so that trend lines reflect trends from multiple sources.
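For readers who want a sense of what sits behind a trends view like this, here is a toy sketch of the general idea: count topic mentions per week across a mix of internal and external documents, then plot the counts as trend lines. The documents, topics, and matching logic are invented for illustration; this is not Silobreaker’s implementation.

```python
# Toy sketch of a "trends" view over mixed internal and external content.
# Illustration of the general idea only, not Silobreaker's algorithm.
from collections import Counter, defaultdict
from datetime import date

# Hypothetical documents: (source, publication date, text)
documents = [
    ("external", date(2008, 6, 2), "Oracle rumored to acquire a search vendor"),
    ("internal", date(2008, 6, 3), "Customer asked about an Oracle SES deployment"),
    ("external", date(2008, 6, 10), "Microsoft closes Fast Search purchase"),
    ("internal", date(2008, 6, 11), "Team evaluated Microsoft SharePoint search"),
]

topics = ["oracle", "microsoft", "fast search"]

def weekly_trends(docs, topics):
    """Return {topic: {iso_week: mention_count}} across all sources."""
    trends = defaultdict(Counter)
    for _source, pub_date, text in docs:
        week = pub_date.isocalendar()[1]        # ISO week number
        lowered = text.lower()
        for topic in topics:
            if topic in lowered:
                trends[topic][week] += 1
    return trends

for topic, by_week in weekly_trends(documents, topics).items():
    # A count that rises week over week plots as an upward trend line.
    print(topic, dict(by_week))
```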

The system is available to commercial organizations as software as a service or as an on-premises installation. Mr. Bjore characterized the pricing of the service as “very competitive.” You can contact the company by telephoning either the firm’s London, England, office at +44 (0) 870 366 6737 or the firm’s Stockholm, Sweden, office at +46 (0) 8 662 3230. If you prefer email, write sales at silobreaker dot com. More information about the company is here. Like Cluuz.com, Silobreaker ushers in the next generation of information access and analysis.

Silobreaker: Sophisticated Intelligence

June 12, 2008

An Interview with Mats Bjore, Silobreaker

I met Mats Bjore at a conference seven, maybe eight years ago. The majority of the attendees were involved in information analysis. Most had worked for government entities or commercial organizations with a need for high-quality information and analysis.


Mr. Bjore and two of his dogs near his home in Sweden.

I caught up with Mats Bjore, one of the wizards behind the Silobreaker service which I profiled in my Web log, in the old town’s Café Gråmunken. Since I had visited Stockholm on previous occasions, I asked the waiter for a cheese plate and tea. No herring for me.

Over the years, I learned that Mr. Bjore shared two of my passions: high-value intelligence and canines. One of my contacts told me that his prized possession is a shirt emblazoned with “Dog Father”, crafted by his military client.

Before the waitress brought our order, I asked Mr. Bjore about his interest in dogs and Sweden’s pre-occupation with herring in a mind-boggling number of guises. He laughed, and I turned the subject to Silobreaker.

Silobreaker is one of a relatively small number of firms offering an intelligence-centric solution to clients and organizations worldwide. One facet of the firm’s capabilities stems from its content processing system. The word “search” does not adequately describe the system. Silobreaker generates reports. The other facet of the company is its deep expertise in information itself.

The full text of my conversation with Mats Bjore appears below:

Where did the idea for Silobreaker originate?

Silobreaker actually has a long history in the sense of the word Silobreaker. When I was working in the intelligence agency and later at McKinsey & Co, I was amazed at the knowledge silos that existed totally isolated from each other. I saw the promise of technology to assist in unlocking those silos; however, the big names at that time, Autonomy, Verity, Convera, and the rest, failed to deliver, big time. Disappointed and waiting for the technology of the future, I registered the name Silobreaker.com, more like a wish for the perfect system. A couple of years later, in 2003-2004, I was approached by a team of amazing people–Per Lindh, Björn Löndahl, Jimmy Mardell, Joakim Mårlöv and Kristofer Mansson. These professionals wanted to further develop their software into an intelligence platform. In 2005 my company Infosphere and the software company Elucidon joined forces, and we created Silobreaker Ltd as a joint venture. One year later we consolidated software, service and consulting into one brand–Silobreaker.

Today, Silobreaker enables the breaking down of silos built from informational, knowledge, or mental bricks and mortar.

What’s your background?

I am a former lieutenant colonel in the Swedish Army. I was detailed to the Swedish Military Intelligence Agency, where I founded the Open Source Intelligence function in 1993.

After leaving the government, I became the Scandinavian Knowledge Manager for McKinsey & Company. After several years at McKinsey, I started my own company, Infosphere, and the service Able2Act.com.

I am also a former musician in a group called Camping with Penguins. You know that I am a lover of dogs. Too bad you like boxers. You need to get a couple of my friends so you have a real dog. I’m just joking.

I know. I know. What are the needs that traditional search engines like Autonomy, Endeca, and Fast Search (now Microsoft) are not meeting?

Meaning and context, and I would also say that traditional engines require that you always know what to search for. You need to be an expert in your field to take full advantage of the information in databases and unstructured text. With the Silobreaker technology the novice becomes an expert and the expert becomes a discoverer. It might sound like a sales pitch, but it’s true. Every day in my work I need to jump into new areas, new industries, and new topics. There is no way I can formulate a keyword search, nor do I have the time to digest 100 or 1,000 articles in the click-and-read, click-and-read mode. With Silobreaker and its technology I start very broadly, and the system directly helps me understand the context of a large set of articles in different formats, from different repositories, and on different topics. We call this a View 360 with an In Focus summary. Note: here’s an In Focus example provided to me after the interview.

[Image: Silobreaker In Focus example]

When I search in traditional systems based on the search/read philosophy, I spend too much time searching and reading and too little time on sense making and analysis. With Silobreaker, I start directly with that process, and I create new value for me and for my clients.

In a conversation with one of the Big Brands in enterprise search, the senior VP told me that services producing answers are “just interfaces”. Do you agree?

“Just interfaces” might be a bit harsh on the companies that actually try to provide direct answers to searches – they actually have some impressive algorithms, but to a certain extent we agree. We simply don’t think that an “answer engine” solves any real information overload problem.

If you want to know “What’s the population of Nigeria” – fine, but Wikipedia solves that problem as well. But how do you “answer” the question “What’s up with iPhone”? There are many opinions, facts, news items, and blogs “out there”. Trying to provide an “answer” to any “question” is very hard to do, maybe futile.

We always emphasize the importance of using our technology for decision-support, not to expect the system to perform the decision-making for you. The problem today is that analysts and decision-makers spend most of their time searching and far too little time learning from and analyzing the information at hand. Our technology moves the user closer to the many possible “answers” by doing much of the searching and groundwork for them and freeing up time for analysis and qualified decision-making. Note: This is a 360 degree view of news from Silobreaker provided after the interview.

[Image: Silobreaker 360-degree view of an article]

There’s significant dissatisfaction among users of traditional key word search systems. What’s at the root of this annoyance?

The more information that is generated, duplicated, recycled, edited, and abstracted, combined with the rapid proliferation of “I never use a spell checker and I write in your language with my own set of grammar”, the greater the need becomes for smarter systems to actually find what you are looking for. A couple of years from now, we also see the demise of the mouse and keyboard and the emergence of other means of input; the keyword approach just isn’t it.

Keyword based search works reasonably well for some purposes (like finding your nearest Swedish herring restaurant), but as soon as you take a slightly more analytical approach it becomes very blunt as a tool.

There is no real discovery or monitoring aspect to keyword based search. The paradox is that you’ll need to know what you’re looking for in order to discover.

Matching keywords to documents doesn’t bring any meaning to the content nor does it put the content in context for the user.

Keyword based search is a bottom-up approach to relevance. The burden is put entirely on the user to dissect large result sets in order to find the relevant articles, work out who the key players are, how they relate to each other, and other factors.

This burden creates the annoyance and “research fatigue”, and as a result users rarely go beyond the first page of results – hence the desperate hunt amongst providers for PageRank, which may have little or no bearing on the user’s real needs.

The intelligence agencies in many ways are the true professionals in content analysis. Why have the systems funded by IN-Q-TEL, Interpol, and MI5/MI6 not caught on in the enterprise world?

These systems are often complex, and their “end solutions” are often a mix of different software that is not well integrated. We already see a change with our technology. Some government customers look at our free service at Silobreaker.com and have a chance to explore how Silobreaker works without sales people hovering over them.

We want our clients to see one technology with its pieces smoothly integrated. We want the clients to experience information access that, we believe, is far beyond our competitors’ capabilities.

Intelligence agencies have often acquired systems that are too complex and too expensive for commercial enterprises. Some of these systems have been digital Potemkins. These systems provide the user with no proof about why a certain result was generated.

Now, this “black box” approach might be okay when you have a controlled content set, like on the classified side within the intelligence community. But the “real world” needs to make sense of unstructured information here and now.

You have more than 100,000 major companies in the world, and you have 200 or so countries. Basically the need for technology solutions is the same. For me it’s totally absurd that governments complicate their systems instead of looking at what is working here and now.

Furthermore, I think one of the reasons that government can run complex and sometimes fruitless projects is that some agencies don’t have to make money to survive. The taxpayers will solve that.

In the commercial sector, profit and time are essential. Another factor that corporations take into account when investing in a system is ease of use.

With the usually high turnover in any industry, a system must be easy to use in order to reduce training time and training costs. In some government sectors, turnover is much lower. People can spend a great deal of time learning how to use the systems. Does this match your experience?

Yes, and I agree with your analysis.

I had an email exchange with the chief technical officer of a major enterprise search vendor. He asserted that social search was the next big thing. When I pointed out that social search worked when the system ingested a large amount of information, much of it available covertly, he argued that general social information was “good enough”. Do you agree?

No, I don’t. Now we are talking about the quality of the information. If you were to index and cross-reference XING, Facebook, and LinkedIn, you could build fantastic displays of the connections. However, how many of these links between people are actually true (in the sense that the people have actually met or even have some common ground)?

There is a very large set of people who try to get as many connections as possible, thus diluting the value of true connections. I agree that you need a significant amount of information in order to get a baseline. You also need to validate this kind of data with reality checks in other kinds of information sources – offline and online.

My main company, Infosphere, did some research into the financial networks in the Middle East. The fact-based search (ownership, shareholdings, etc.) provided one picture; then you have to add the family and social connections, the view from the media, and then look at residence clusters and other factors. We had more than 8,000 dots (people) that we connected. But we were just scratching the surface.

The graphic displays in Silobreaker are quite useful. In a general description, what are you doing to create on the fly different types of information displays?

The whole philosophy behind Silobreaker is to move away from the traditional keyword based search query which generates just page after page of headline results and forces the user into a loop of continually adjusting the query to find relevance and context.

We see the keyword-based query as a possible entry point, but the graphical search results enable the user to discover, navigate and drill down further without having to type in new keywords. No-one can imagine managing numerical data without the use of descriptive graphical representations, so why do we believe that we can handle vast quantities of textual data in any other way? Well, we don’t think we can, and traditional search is proving the point emphatically. Today’s Silobreaker is just giving you a first glimpse of how we (and I’m sure others) will use graphics to bring meaning to search results.

Is Silobreaker available for on-premises installation and as SaaS (software as a service)? What do you see as the future access and use case for Silobreaker?

That’s a good question. Let me say that Silobreaker’s business model is divided into three parts.

First, we have a free news search service that eventually will be ad-supported but whose equally important role is to showcase the Silobreaker technology and function as a lead generator for the enterprise offerings.

Second, our Enterprise Service, which is due to be released in September or October 2008, is an online, real-time “clipping service” aimed at companies, banks, and consultants as well as government agencies. It will offer a one-stop shop for news and media monitoring, from defining what you are monitoring to in-depth content aggregation, analysis, and report generation. This service will come with a SaaS facility that enables the enterprise to upload its own content and use the Silobreaker technology to view and analyze it.

Third, we offer a Technology Licensing option. This could range from a license to embed Silobreaker widgets in your own site to a fully operational local Silobreaker installation behind your firewall and customized for your purposes and for your content.

Furthermore, parts of the Silobreaker technology are available as SaaS on request.

Let’s talk about content. Most search systems assume the licensee has content. Is this your approach?

Yes and no. We can facilitate access to some content and also integrate crawling with third-party suppliers or, if the content is very specific, assist with specialty crawling.

On top of that we can, of course, integrate the fact sheets, profiles, and other content from my other venture, Able2Act.com which gives any system and any content set some contextual stability.

What are the content options that your team offers? Is it possible to merge proprietary content and the public content from the sources you have mentioned?

Yes, the ideal blend is internal and external content. And that really sets our team apart. Most of the Silobreaker group works with information as the key focus on a daily basis, sometimes 24×7 on certain projects. In other words, we are end users who keep our ears to the ground for information. Most companies out there are either tech people or content aggregators that just sell. We are both.

When you look forward, what is the importance of mobile search? Does Silobreaker have a mobile interface?

Mobile “search” is an extremely important field where traditional keyword-based search just doesn’t cut it. The small screen size of mobile devices and their limited (and sometimes cumbersome) input capabilities are just not suited to sifting through pages of search results only to find that you need another Boolean operator and have to start all over again. We believe that users must be given a much broader 360 view of what they’re searching for in order to get to the “nugget” information faster. Silobreaker does not currently offer a mobile interface, but needless to say we’re working on it.

What are the major trends that you see emerging in the next nine to 12 months in content processing?

That’s a difficult question. I can identify several areas that seem important to my clients: Contextual processing, cross media integration, side-by-side translations, and smart visualization. Note: I have inserted a Silobreaker link view screen shot Mr. Bjore provided me after our conversation.

[Image: Silobreaker link map]

Observations

Silobreaker caught my attention when I saw a demonstration of the system before it was publicly available. The system has become more useful to intelligence professionals with each enhancement to the system. Compared to laundry-lists of results, the Silobreaker approach allows a person working in a time-compressed environment to size up, identify, and obtain the information needed. The system’s “smart software” shows that Silobreaker’s learning and unlearning function is part of the next generation of information tools. After accessing information with Silobreaker, I am reminded that key word search is a medieval approach to 21st century problems. Silobreaker’s ability to assist a decision maker makes it clear that technology, properly applied, becomes a force multiplier without pushing human judgment to the sidelines. In one of our conversations, Mr. Bjore drew a parallel between Silobreaker and the canines for which he and I share respect and affection. He said, “Silobreaker works like one of our dogs. Their eyes see what is in front of you, the ears hears the tone of voice, the nose smells what has happened, what is now and what’s around the corner.” I agree. Silobreaker is more than search; it’s an extension of the information envelope. Take a close look at this extraordinarily good system here.

Stephen E. Arnold, June 12, 2008

Prepping for Google’s Udi Manber Keynote: Another Datawocky Scoop

June 11, 2008

At the Gilbane Conference in San Francisco next week, the keynote speaker is Udi Manber, top wizard of search at Google. I think this post by Anand Rajaraman in Datawocky is not serendipitous. Once again, Googlers find that Datawocky is a trusted conduit of first-class information. My view is that if you’re smart enough to know about Datawocky, then you have earned the right to get some juicy Googlebits before the rest of the world. Mr. Rajaraman delivers, I believe, a sneak preview of a Google revelation. Mr. Rajaraman, who is best buds with a number of Googlers, spoke with Peter Norvig, who is on leave in order to update his book about artificial intelligence. Dr. Norvig struck me as awfully chatty and well informed for someone on leave and writing a book. But we are in the midst of 40 days and 40 nights of information raining from Google’s senior managers. These folks instant message, MOMA post, and email 24×7, so it’s not surprising that high-powered Googlers are reading from the same page.

Mr. Rajaraman’s “How Google Measures Search Quality” is a very important essay, and I think it is a prime beef summary of what Mr. Manber will touch upon in his keynote at the Gilbane Conference on June 18th. Click here and read Mr. Rajaraman’s post. This link will also provide you with a link to the first part of Mr. Rajaraman’s write up about search quality and artificial intelligence at Google.

The key point in the Datawocky essay for me is this statement:

Google does not use such real usage data to tune their search ranking algorithm. What they really use is a blast from the past. They employ armies of “raters” who rate search results for randomly selected “panels” of queries using different ranking algorithms. These manual ratings form the gold-standard against which ranking algorithms are measured — and eventually released into service.

Google uses humans!
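The mechanism Mr. Rajaraman describes is standard offline evaluation: human judgments become a gold standard, and competing ranking algorithms are scored against it. The sketch below shows one common way to do the scoring, discounted cumulative gain; the metric choice and the toy data are mine, not Google’s.

```python
# Illustration only: scoring two ranking variants against human relevance
# ratings, the "gold standard" approach the Datawocky post describes.
import math

# Hypothetical rater judgments: query -> {document: graded relevance 0-3}
ratings = {
    "jaguar speed": {"doc_a": 3, "doc_b": 1, "doc_c": 0, "doc_d": 2},
}

# Two competing algorithms return ranked document lists for each query.
algo_old = {"jaguar speed": ["doc_b", "doc_a", "doc_c", "doc_d"]}
algo_new = {"jaguar speed": ["doc_a", "doc_d", "doc_b", "doc_c"]}

def dcg(ranked_docs, judgments, k=10):
    """Discounted cumulative gain: relevant results near the top count more."""
    return sum(
        judgments.get(doc, 0) / math.log2(rank + 2)   # rank 0 -> log2(2) = 1
        for rank, doc in enumerate(ranked_docs[:k])
    )

def mean_dcg(algo, ratings):
    """Average DCG over the panel of rated queries for one algorithm."""
    return sum(dcg(algo[q], judged) for q, judged in ratings.items()) / len(ratings)

print("old algorithm:", round(mean_dcg(algo_old, ratings), 3))
print("new algorithm:", round(mean_dcg(algo_new, ratings), 3))
# The variant that scores higher against the human gold standard "wins".
```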

Let’s think about this. Google has legions of wizards. Google has a heck of a super computer. Google has more fancy math than a dozen universities. Yet Google relies on humans. I don’t know much about Google, but I know what human indexers can do; for example:

  1. Make judgments that algorithms at this time cannot match. Most companies have chopped humans out of the indexing and search analysis loop. Google is putting them in. Big news for me.
  2. Google’s use of artificial intelligence is useful but it may be paying off in areas other than search and advertising. Where does the AI deliver a hefty payload? That’s a question that warrants investigation.
  3. Ask, Microsoft, Yahoo, and the other Web search engines are falling behind the GOOG in market share. If humans are Google’s secret sauce, the razzle dazzle technology from these three companies will not be able to close the gap. These and other competitors will have to have the technology and the money to hire expensive, inefficient humans to make the search results better for actual users. Can these three companies invest in humans? I don’t know. But it’s clear that none of these three is able to slow down Googzilla on its march to search dominance.

Kudos to Mr. Rajaraman for getting another Google scoop. Now I won’t have to attend Mr. Manber’s lecture. I think I know what he’ll be saying. Google has a tendency to create talks and then have its top dogs “run the game plan.”

A happy quack to Techmeme.com for the link to Datawocky. I owe you one.

Stephen Arnold, June 12, 2008

Google Info Floweth Like Water

June 11, 2008

Super writer Ken Auletta bagged a Googzilla for an interview. Eric Schmidt succumbed to the allure of New Yorker Magazine at a conference organized by a number of old media leviathans. The write up I liked best was Dan Farber’s piece for CNet. You can read “Eric Schmidt in Conversation with Ken Auletta” here. Mr. Farber does a very good job of summarizing the conversation.

Two points fairly jumped from the page into my Kentucky brain cells. Let me highlight these and then step back to offer some observations about the information that is now rushing from the GOOG’s senior managers like water down the streets of Columbus, Indiana.

First, Mr. Farber captures a key thought when he reports that Mr. Schmidt says, “The most impressive products are those that use artificial intelligence…” I don’t have the context, but to me this is a hugely significant point. It gains more oomph coming from Mr. Schmidt.

Second, Mr. Farber picks up another point that others covering the event missed; to wit, Mr. Schmidt’s thought:

What is really important about technology is you have the opportunity to redefine the game over and over…and the winner redefines the game.

Please, read Mr. Farber’s summary of the interview and check out other write ups from The Technology Chronicles and Forbes, among others. By the time you see this, there will be dozens of views of what Mr. Schmidt said.

Let me wrap up with these observations:

  1. Artificial intelligence, computational intelligence, and smart software are, based on my research, among the core competencies at Google. The fact that Mr. Schmidt mentions artificial intelligence lit up my radar. I think there will be some interesting new services and features coming very soon from the GOOG.
  2. The notion of changing the game is the Google strategy. The idea that Google is concerned with search and advertising is an older model for Google. Today’s Google wants to change the rules in enterprise applications, back office services, cloud services, and several business sectors. Telecommunications is just one sector that’s been caught in Google’s rule changing snare.

If you want more detail about these two points, you can find more in depth information in my two Google studies. Both are available from Infonortics.com here.

Stephen Arnold, June 11, 2008

Autonomy: Adjusting to a Post-Google World

June 11, 2008

Autonomy may be facing some financial challenges. This may be old news to you, but it was news to me. The May 13, 2008, report from Cazenove offered me a different view of the search business in general and search giant Autonomy in particular. Autonomy’s next financial results will be due on July 30, 2008, and those data and the Cazenove report may influence perceptions of one of the leading software companies in Europe and a leader in search, text processing, and information access systems.

For several years, I worked as a resource for one of the largest investment banks in the United States. I woke up one morning, and the bank was out of business, absorbed into an even larger bank. I have to admit that some of the fancy dancing in the global financial ballroom baffles me.

An acquaintance whom I met at an international conference sent me a report about the search giant Autonomy. The firm issuing the report–Cazenove–is one with which I was not too familiar. Furthermore, these “reports” are written to inform Harvard and INSEAD MBA masters of the universe about the inner workings of publicly-traded companies. To top it off, the reports are written in weird financial-haiku speak and stuffed with financial arcana. Even though I have provided data to banks working on such flights of fancy with folks afflicted with spreadsheet fever, I usually don’t read these documents. (Between you and me, I don’t think anyone else does either. The reports can be hard to get, but if you have enough money to use Investext or a warm and fuzzy relationship with a big-name brokerage house’s institutional investors, you can get these reports.)

Imagine my surprise when I received an email with the question, “What do you think?” and a copy of the Cazenove, May 13, 2008, report about Autonomy, arguably the number one or the number two vendor of search and retrieval systems. (Fortunately I saved the PDF of the report because my computer crashed, and I lost my current email. Grrrr.)

Key Points for Me

Two points in the Cazenove write up struck me as relevant to the information access sector, which I make a lame attempt to monitor. Keep in mind that I am summarizing the Cazenove report, not offering my analysis of Autonomy.

First, the consolidation that seems to be a characteristic in the search and content processing space has left Autonomy sitting on the sidelines. The Cazenove report identifies Oracle as a likely buyer of Autonomy, but the deal would cost Oracle too much even in today’s wacky financial market. A company like Oracle would have to find a way to pump up Autonomy revenues to make the deal pay off. As I read the Cazenove report, that’s unlikely. Autonomy is having to dog paddle furiously to keep its head above water, but remember: this is my untutored view of the Cazenove MBA haiku writing.

Second, the Cazenove report sniffs into Autonomy’s use of acquisitions to create new revenue opportunities. On the surface, Autonomy has been ahead of many other search companies in identifying hot sectors and moving into them. I’ve stated a number of times that Autonomy has the best instinct for the future of search and how to leverage that instinct in its marketing at this time. Google’s marketing is a go cart to Autonomy’s F-1 race car. However, the analyst report suggests that Autonomy may have a tough time making its acquisitions pay off in a big, big way.

The net net (yes, that’s MBA speak that means conclusion) for me is that organic revenue is not too exciting, and, revenue from Autonomy’s acquisitions isn’t providing the hoped for lift (that’s MBA speak for payoff).

Cazenove rated the stock as “under perform neutral”, which I think is the equivalent of taking a step back, maybe looking for another investment opportunity altogether until Autonomy submits its next financial report.

Observations

But I don’t really care too much about any individual search and text processing company. My interest is broader, more along the lines of “What’s this mean?” to the industry. What the report triggered in my mind was questions to which I don’t have an answer:

  1. If organic growth is slowing for a well-known company like Autonomy, is it worth reassessing the impact of open source search solutions, the price pressure applied by hungry competitors, and the search toaster approach from Google? These perturbations in traditional search may be having a real and lasting impact. That’s a hypothesis worth investigating. Maybe the commoditization of search will further destabilize an already volatile sector?
  2. Should organic growth in search license revenue slow, will the large superplatforms like IBM, Microsoft, and Oracle accelerate their bundling of search and content processing into higher value enterprise applications? In effect, will these companies stop selling search and offer search and content processing as a tool, utility, or standard function? This begs another question: “Who buys a separate, search platform when search is baked into the broader application framework?”
  3. If acquisitions don’t generate top line growth quickly, then acquisition strategies will have to change. Maybe search vendors should buy a company in order to get out of search? In effect, the acquired company allows the search vendor to get out of the business of selling search and retrieval. This means buying other search vendors is not a good idea for a search vendor.

The answers to these questions reinforce my assertion that the traditional enterprise search sector is under considerable stress. If Autonomy can’t gain traction in “pure” search with its own technology and Verity’s, who can?

Google has morphed from its Google Search Appliance to a broader enterprise applications’ positioning. Sure, search is there; it’s just not the only weapon in the GOOG’s revenue arsenal. Fast Search is off the table, and for me it’s not a blue-chip player at this time. Fast Search could return to the field of play wearing a Microsoft jersey. As part of the Microsoft team, Fast Search may become a SharePoint gizmo. Endeca accepted cash infusions from Intel and SAP’s investment arm. I’m not sure what that signals.

Agree? Disagree? Use the comments section to push back or add additional information.

Stephen Arnold, June 11, 2008
