The Future of Search Layer Cake
August 14, 2008
Yesterday I contributed a short essay about the future of search. I thought I was being realistic for the readers of AltSearchEngines.com, a darn good Web log in my opinion. I wanted to be more frisky than the contributions from SearchEngineLand.com and Hakia.com too. I’m not an academic, and I’m not in the search engine business. I do competitive technical analysis for a living. Search is a side interest, and prior to my writing the Enterprise Search Report, no one had taken a comprehensive look at a couple dozen of the major vendors. I now have profiles on 52 companies, and I’m adding a new one in the next few days. I don’t pay much attention to the university information retrieval community because I’m not smart enough to figure out the equations any more.
From the number of positive and negative responses that have flowed to me, I know I wasn’t clear about my focus on behind-the-firewall search and Google’s enterprise activities. This short post is designed to put my “layer cake” image into context. If you want to read the original essay on AltSearchEngines.com, click here. To refresh your memory, here’s the diagram, which in one form or another I have been using in my lectures for more than a decade. I’m a lousy teacher, and I make mistakes. But I have a wealth of hands-on experience, and I have the research under my belt from creating and maintaining the 52 profiles of companies that are engaged in commercial search, content processing, and text analytics.
I’ve been through many search revolutions, and this diagram explains how I perceive those innovations. Furthermore, the diagram makes clear a point that many people do not fully understand until the bills come in the mail. Over time search gets more expensive. A lot more expensive. The reason is that each “layer” is not necessarily a system from a single vendor. The layers show that an organization rarely rips and replaces existing search technology. So, no matter how lousy a system, there will be two or three or maybe a thousand people who love the old system. But there may be one person or 10,000 who want different functionality. The easy path for most organizations is to buy another search solution or buy an “add in” or “add on” that in theory brings the old system closer to the needs of new users or different business needs.
Now here’s the problem.
The baseline system in one form or another retains the same basic string matching capability through time in an organization. Even if you start with a system based on trigrams, like Brainware’s, that system still does some string matching. This basic blocking and tackling is going to be with us for the foreseeable future. Typically the baseline is key word search; I could have labeled the pink bar running along the entire length of the x axis as Structured Query Language or some other explicit method for extracting data from a file, file system, or some other construct.
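To make the distinction concrete, here is a minimal sketch in Python. The trigram routine is my own illustration of the general idea, not Brainware’s actual method, and the sample text is made up.

```python
def string_match(query, text):
    """Basic key word / string matching: is the query literally in the text?"""
    return query.lower() in text.lower()

def trigrams(s):
    """Break a string into overlapping three-character chunks."""
    s = s.lower()
    return {s[i:i + 3] for i in range(len(s) - 2)}

def trigram_similarity(query, text):
    """Rough score: fraction of the query's trigrams that appear in the text.
    An illustrative stand-in, not any vendor's actual algorithm."""
    q, t = trigrams(query), trigrams(text)
    return len(q & t) / len(q) if q else 0.0

doc = "The enterprise search appliance indexes marketing content."
print(string_match("serch appliance", doc))        # False: the typo defeats exact matching
print(trigram_similarity("serch appliance", doc))  # about 0.85: trigram overlap tolerates the typo
```

The point of the toy example is that even a fuzzier matching technique sits on top of the same plumbing: strings in, strings compared, hits out.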
Here’s an example of how the layers come into being, probably in your own organization or on your own laptop.
The marketing department’s information team, with their bright smiles and entitlement training, wants to process news stories about the company and its competitors. This is a legitimate need, but the organization-wide search system doesn’t process Internet content, and it doesn’t cluster information exactly the way the chipper marketing folks want it clustered.
Taking matters into its own hands, the marketing team licenses a new, separate system. Maybe marketing signs up for InfoDesk or buys a Google Search Appliance. Whose system it is doesn’t matter for the purposes of this illustration. So, the baseline system remains in operation, and the company now has another system. This adds costs and complexity. Periodically someone will try to integrate the two systems, which adds more costs. Let’s say the IT department hooks the classification system to the key word search system.
Then someone in the legal department ends up in a legal matter with terabytes of material from the discovery process. This group wants to keep its information out of the general system and doesn’t want anyone in marketing to know that the big lawsuit could change the company’s financial fortunes significantly. What does the firm’s legal eagle do? She licenses another system.
The layer cake, then, in its simplest form says, “Licensees of one search engine are promiscuous.” It doesn’t take much time with an MBA to figure out that “search” is getting expensive, duplicative, complex, and a security challenge.
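To see why the MBA reaches that conclusion, here is a small, hypothetical sketch. The layer names are placeholders drawn from the story above, and the point is simply that possible pairwise integrations multiply faster than the systems themselves.

```python
from itertools import combinations

# Hypothetical layers accumulated over time; the names come from the story above.
layers = [
    "baseline key word engine",       # the original organization-wide system
    "marketing news clustering",      # licensed by the marketing team
    "legal e-discovery repository",   # licensed by the legal department
]

def integration_points(systems):
    """Every pair of systems that someone may eventually try to hook together."""
    return list(combinations(systems, 2))

print(len(layers), "systems to license, administer, and secure")
print(len(integration_points(layers)), "possible point-to-point integrations")

# Add a fourth layer and the possible integrations jump from 3 to 6; the
# complexity curve bends upward faster than the count of systems does.
layers.append("desktop search on every laptop")
print(len(integration_points(layers)), "possible integrations with one more layer")
```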
But wait. Some vendors are savvy to this penchant of customers for buying more search systems and layering them on one another or collecting them the way my mother used to round up figurines of girls with baskets of flowers. These aggressive vendors license or invent components and stuff them into their key word or other content processing system. No vendor wants to “leave money on the table.”
Over time, the search systems become “platforms”, which I refer to as snowballs of functions. Like a snowball, these agglomerations are okay under certain conditions. When the sun comes out, some of these systems melt down.
Consider a company good at clustering that decided to morph into a platform. The engineers bolted key word, collaborative, and routing functionality onto the original system. When a customer wants a new feature, the vendor can pitch a single component of the larger platform, hoping to migrate more of its services into that organization. Alternatively, when a prospect wants a “platform,” the vendor can jump in and compete with IBM and others in the “platform” business.
You know as well as I do that both of these forces are operating in many organizations. I usually have three or four people working in my lab at any one time. We have a dozen or more search systems. Sure, that’s unusual in Harrod’s Creek, but what about your own organization? You probably have whatever system IT has deployed. Maybe it’s SharePoint? Maybe it’s a “free” system included with your WebSphere system. You have a search system on your PC. There’s a search system in the database system in your office. When you do an inventory, you will find that you have many search systems. Most of these are just okay.
Now imagine the time and technical demands for each of these systems over a year. Push that analysis into a government agency, and you will find that a typical search cost can expand by a factor of 15X over a seven-year period. I know because I had to gather those data for a project. The same problem exists in commercial organizations.
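As a back-of-the-envelope check, a 15X expansion over seven years works out to roughly 47 percent compound growth per year. The snippet below just does that arithmetic from the figures quoted above.

```python
# Back-of-the-envelope: what annual growth rate turns 1X into 15X in seven years?
expansion_factor = 15   # from the agency data cited above
years = 7

annual_growth = expansion_factor ** (1 / years) - 1
print(f"Implied compound annual growth: {annual_growth:.0%}")  # roughly 47%

# Sanity check: walk the cost forward year by year.
cost = 1.0
for year in range(1, years + 1):
    cost *= 1 + annual_growth
    print(f"Year {year}: {cost:.1f}x the starting cost")
```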
So, look at my layer cake. It’s tempting for both organizations and vendors to move into the pastry shop. Now think about the costs that most people don’t bother to track or can’t track because the accounting system is not set up to relate the direct and indirect costs of search over time. Cost obesity works just like the physical kind in a person who eats donuts three meals a day. Let me repeat: cost obesity. Ugly and unhealthy.
I am not optimistic about the future of search because customers and vendors have excavated a big hole in their credibility. I report what is; I no longer “invent” search systems. I got out of that business with our miraculous sale of The Point (Top 5% of the Internet) to Lycos a decade ago. Now, how can the next “big thing” from three 25-year-olds at a big-name university reverse 40 years of delivering systems that leave 50 to 75 percent of their users less than thrilled? Let me know if you have some facts for me. I only know the data I have gathered from my work with search and retrieval. I could be really wrong here, as could the others who have come forth this year with similar data about user dissatisfaction with the information retrieval systems in use in organizations.
If you want more information about the problems in enterprise search (and I suppose you can extend my argument to Web search to some degree), you may want to browse these studies I have written:
- Enterprise Search Report, 3rd edition, 2006. Check out the three cost scenarios and the discussion of pitfalls. I’m not sure the 3rd edition is still available, but you can check with the publisher here.
- Beyond Search, first edition, April 2008 here.
- The Google Legacy and Google Version 2.0 here. My analysis explains why Google has licensed more than 22,000 Google Search Appliances since the product became available four or five years ago.
- The archive on my Web site here.
- The personal observations in this Web log.
Agree? Disagree? Let me know.
Stephen Arnold, August 14, 2008
Comments
My gut reaction reminds me of the Freedom of Information Act, which seemed fine when it started but started to stink when people realized how it got to be used to hide information. But was the original concept OK? I think so. Taking this to search, I think what has happened parallel to the technical side is that corporations, governments, and much of everyone else have learned how to game the system – a trickle-down democratizing function. What I want to know is whether the disgruntled 50-75% are upset because they can’t un-game their game when they need to or because they can’t game it the way they wanted to in the first place. I suggest seeing this process through crooked SEO eyes and shifting focus somehow to cases like those that brought the original Web search, as an example of search that reinvented the public domain. Did this work until the CIO realized they could find more with it than in house? My feeling is that the gaming has stretched way beyond the enterprise and back around open Web search, and that the challenge today is to see if there is anything left to work with. Just a few thoughts…
Sperky,
Thanks for offering your viewpoint. I must admit I don’t think too much about Web search. That “public content” is easy to manipulate. Spiders gobble what’s there, and intelligent routines to find disinformation are not yet good enough to deliver a clean intelligence product. Post processing is required. Most of my remarks are, therefore, narrowed to the behind-the-firewall space. I know I don’t make this explicit in my public addled goose observations. I will try to include that positioning in the future.
Stephen Arnold, August 15, 2008