Search 2009: The Arnold Boye08 Lecture
November 11, 2008
What began as a routine speech became a more definitive statement of my views about enterprise search in 2009. I delivered a lecture on this topic to a standing room only crowd in Aarhus, Denmark, at the JBoye 08 conference. The conference organizer asked me to provide a version of my talk for the conference attendees who were unable to attend my lecture. I have now posted the full text of my remarks on the ArnoldIT.com Web site. You can read the PDF of this lecture here.
Let me highlight several of the features of this talk, which concatenated remarks I have made about the future of search over the last 90 days:
- I identify the major trends that I am watching in the enterprise search “space”. I don’t dig into social search and some of the more trendy topics. I identify what will keep people using a system and those responsible for search and content processing in their jobs.
- I highlight a small number of companies that I think are going to be important in 2009. I mention five companies, but I have a much longer list of promising players. These five are examples of what is going to drive search success going forward.
- I spell out some meta challenges that vendors and licensees face. To give one example of what’s in this short list, think SharePoint. With 100 million licensees, SharePoint is likely to have as significant an impact on enterprise information access as Google. But there is a dark side to SharePoint, and I mention it in this report.
I have one request. Feel free to use the information for your personal learning. If you are engaged in teaching, you may reproduce the document and invite your students to critique my ideas. If you are a consultant shopping for a phrase or idea to borrow, that’s okay. Just point back to my original document. I see many “beyonds” now. Beyond Google, Beyond Business Intelligence, and so on. I expect that “just there” search will experience similar diffusion. Of course, if you just pirate my phrasing, I think the addled goose will point out this activity. Geese can lay golden eggs; geese can spoil an automobile’s finish as well.
As always, I have had to cut material from this write up. You may point out my errors, omissions, and shortcomings in the comments section to this Web log. Keep in mind that this Web log is free, and it is an easy way for me to keep track of my ideas and lectures.
Stephen Arnold, November 12, 2008
Interview with Martin White, Intranet Focus Ltd.
November 11, 2008
Martin White, co-author of Successful Enterprise Search Management (Galatea, November 28, 2008), spoke with Stephen Arnold (his co-author) about the new Galatea study of search management. The interview touches upon the challenges that organizations face with information access, including search. The new study, which becomes available on November 28, 2008, tackles subjects that have not been discussed in terms of management, return on investment, and problem solving. Mr. White said:
Very rarely is poor search a result solely of poor technology. It is all about effective management of the entire search procurement, installation and implementation processes.
On the subject of business intelligence–what some pundits are calling “smart search” or “active intelligence”–Mr. White said:
Look at all those financial institutions with their BI applications. Did it stop them making a fool of themselves and us over sub-prime loans. BI is only as good as the way in which the correlations are set up and usually that is poorly. To me the new search is when search is all but invisible – embedded in a workflow process.
You can read the full text of this interview, conducted on November 10, 2008 here.
About the Study
The scope of the new study Successful Enterprise Search Management is unusual. Most studies of search are little more than profiles of vendors. After more than one year of work, Mr. White and his co-author tackle the management aspect of search, information access, and content processing by putting a conceptual foundation in place, reviewing the technology of search, discussing the vendor selection process, exploring the implementation stage, including pre launch testing of the system, and offering a series of suggestions called “action this day.”
The book includes case studies, references to specific vendors’ systems, and practical guidance from Mr. White and his co-author. Specific topics addressed include text mining and advanced content processing, information governance, and the challenges language itself presents. The book consists of five major sections and 17 chapters. The book is illustrated with screenshots from referenced systems and diagrams that highlight the management tasks addressed.
You can learn more about the book and place a pre-publication order on the Galatea Web site here.
Google: Compensation Deflation
November 11, 2008
Valley Wag published an interesting article called “One Third of Googlers Have Underwater Options.” You can read the full text here. “Underwater” is bank jargon for owning something that’s worth less than one planned. You buy a house for $1.0 million. The market price for the house is $500,000. You are underwater. Googlers by the thousands have stock options worth less than the Googlers hoped. In my opinion, the most interesting comment in the article was this one:
A surprising majority of GOOG employees I know don’t really like their jobs.
If true, Google’s compensation deflation could lead to the loss of some key talent. With Yahoo struggling, who would benefit? In my limited view of the world, I think the defection of Googlers will pump up some promising start ups. I also think that some Googlers may decide to ply their trade for state owned research programs. A single company may not be able to replicate Google, but a country might.
Stephen Arnold, November 11, 2008
Sun Microsoft: Tit for Tat
November 11, 2008
When Google pulled Open Office from its Google Pack, I wondered how long it would take for more information about a growing rift between Sun Microsystems and Google to surface. On November 10, 2008, ZDNet UK published “Microsoft, Sun Agree Web Search Deal.” You can read the story here. Google at one time had quite a few employees who had work experience at Sun. Eric Schmidt, Google’s top Googler, was the chief technical officer of Sun before he left to head up Novell. The interesting twist in the ZDNet story from Reuters is that when a user downloads Java, Microsoft’s toolbar comes along. Not long ago, Microsoft and Sun were exchanging legal hand grenades about Microsoft’s implementation of Java. In my opinion, the disruptive power of Google is becoming more evident with this deal. With Java on about 800 million PCs, Microsoft should get an immediate foot print boost. Its share of the Web search market should rise, and I think the effect will be visible as soon as the January 2009 data become available. What will Google do? In my opinion, Google sees Sun as a company that is past its prime. With rumors of Sun Microsystems being an acquisition target, Google may be correct. Microsoft benefits from this deal through the users it gains and the public relations punch it delivers to Googzilla’s nose.
Stephen Arnold, November 11, 2008
Yahoo and Mobile Search
November 11, 2008
I have fooled around with speech to text for years. When I get a mobile device that wants me to talk to it, I disable the function. The reason is that I make calls from noisy places. The ambient noise combined with my goose honks baffles the speech to text software. A reader sent me a link to Stephen Shankland’s summary of his experience with Yahoo’s mobile voice search. You can read the full text of his story here. Mr. Shankland does a good job of summarizing what’s good and bad with the Yahoo system. In my opinion, the most interesting comments in the write up were not the assessment of Yahoo’s mobile search. I noted:
- Yahoo licensed technology from Vlingo. Google’s Sergey Brin has a voice search patent (US7027987) in this field which underscores the difference between the two competitors’ approaches. Google, according to my recollection, also licenses some technology, but one of the big guys has his hand in this field at Google.
- Mr. Shankland’s view is that mobile search is immature. I have been using mobile phones and searching for information from the day I got my first Motorola that could connect. Mobile search has worked; the problem is the form factor and the ambient noise problem that plagues me.
- Yahoo calls the service “OneSearch with Voice”. I have a tough time keeping these names straight. For me this is Yahoo Mobile Search.
Mr. Shankland also includes a chunk of useful market data in his article. Will Yahoo surge to the top of the mobile search market? We will have to wait and see. Yahoo has been struggling of late.
Stephen Arnold, November 11, 2008
SemantiFind: Semantic Plumbing Exposed
November 11, 2008
Two readers sent me links to SemantiFind, a company that offers a Web service for semantic ontology and search of the Internet. I am on record as suggesting that semantic technology has an important role to play, but behind the scenes. Most users can reap rich information access rewards when semantic and other advanced technology work as plumbing. You use the system by registering and installing a browser plug in. You then navigate to Google.com, Live.com, or Yahoo.com and run your query. SemantiFind converts your default query into a list of suggestions. You select the word that best matches your intended query. A useful page can be flagged and its content used in formulating future queries. SemantiFind provides a community and a system that may be useful to users who have difficulty thinking of words to perform query expansion or query narrowing. Our test queries returned acceptable results. Check it out. Use it if you find it helpful. More information about the company is available here.
Stephen Arnold, November 11, 2008
Micro Mart’s Surprising Web Search Findings: Google Is an Also Ran
November 11, 2008
The trusty newsreader served up a link to a three-part article by Peter Hayes. He wrote a feature “The Secret Life of Search Engines” for Micromart.com. I have conflicting date information for this article. It may have been written yesterday or a year ago.
You can find the first part here. The second part here. And the third part here. The Micromart.com site search engine leaves a bit to be desired because its index does not contain a pointer to the first part of this article. Sigh. My own tools ferreted out the three parts, and I think you will find Mr. Hayes’ analysis surprising. The key point for me is that when a journalist runs benchmark queries across search systems, the gulf between those who understand what readers find interesting and those who build search engines becomes evident. In fact, if Mr. Hayes’ analysis were used as the definitive guide for finding information on the public Web, there would be considerable consternation at a number of high profile firms and cause for joy among a group of search engines that are going nowhere in terms of usage. I want to consider this point at the end of my Beyond Search post. Let’s look at the key points in each of the three parts of this analysis, shall we?
Part One: Outline Politics
Straight off let me say I don’t know what ‘outline politics’ means. I don’t think it matters much beyond privacy and the ambivalent nature of an index’s utility. I did not get the impression that the phrase is particularly significant in the flow of his argument. The series begins with the notion that you can make money offering a product people use every day. The idea is flawless when it comes to a fungible product, but I am not sure it applies to the somewhat more slippery world of information. Nevertheless, the point is that traffic is good. Furthermore, the Internet is changing. Content is tricky. Mr. Hayes introduces the notion of official content and unofficial content. That’s a useful distinction, but it did not resonate with me. Mr. Hayes then asserts that search engines have, and I quote:
two major functions. One is to teach, the other is to search. While both have a large positive side we shouldn’t pretend that there isn’t a downside to any tool. Any tool used for good can also be used for bad.
He is now in full stride and hitting a hot button almost guaranteed to whip up interest among European Web users–privacy. He then heads for the end of Part One with this comment:
My final thought is that search engines are only passengers on the Internet train and not the train itself. The growth of the Internet gives them the prospect of a healthy and prosperous future – but at the same time it is reliant on the safekeeping and update of the Internet to keep up with demand and to protect it from vandals. As our newspaper headlines tell us, the world is not totally a safe and law abiding place.
I must admit that I am not quite sure of the logic of this first section, but let’s move on to Part Two.
Part Two: Tools
Mr. Hayes dives in with location searching and touches upon Boolean logic, promising to tackle this topic elsewhere in his series. His first injunction is to keep a search simple. Web indexes are divided into systems dependent on software and systems dependent on humans. Mr. Hayes does not provide a context for the disparity in usage between these two types of systems, a distinction that will return to haunt him in Part Three of his series. He points out that search systems are not “born equal”. The promised analysis of Boolean arrives and I learn:
Boolean (which consists of the three words AND, OR, NOT, remember) is best explained by example. Some engines don’t allow it and some only use the NOT part. This follows the general rule that nothing to do with the Internet is ever totally straightforward! Typing NOT will take out examples that don’t fit the bill (‘Arsenal NOT soccer’, for example), but this is hard word to use and control. In Yahoo, double meanings are automatically divided out. Also the engine can easily come up with word connections that you would never think of in a million years – including simple names.
I think I understand even though Mr. Hayes’ own examples use symbols for AND, and he does not provide an example of a successful NOT search statement. NOT for Mr. Hayes is a “hard word to control”. I imagine that for him NOT may be troublesome. He points out that:
AND is the least useful of all because most of time, it is taken as read on all known engines that work via keywords. Type ‘Peter Hayes Writing Genius’ it will give the same result as ‘Peter+Hayes+Writing+Genius’ or ‘Peter AND Hayes AND Writing AND Genius’.
The statement confirms my suspicions that Mr. Hayes has taken a very different view of Boolean logic, its complexities, and the way in which logical operators work in his world. I quite like AND, NOT, OR, and even NAND in some systems. You may find AND and NOT useful as well.
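For readers who want to see what the three operators actually do, here is a minimal sketch, assuming a toy index of three invented documents. The document texts and the `postings` helper are hypothetical and do not come from Mr. Hayes’ article or from any real search engine. Each operator is modeled as a set operation on the lists of documents that contain a term.

```python
# Toy inverted index: each term maps to the set of document IDs containing it.
# The documents below are invented for illustration only.
docs = {
    1: "arsenal soccer match report",
    2: "arsenal weapons storage depot",
    3: "soccer league table",
}

index = {}
for doc_id, text in docs.items():
    for term in text.split():
        index.setdefault(term, set()).add(doc_id)


def postings(term):
    """Return the set of documents containing the term (empty if unseen)."""
    return index.get(term, set())


and_hits = postings("arsenal") & postings("soccer")  # AND: both terms present
or_hits = postings("arsenal") | postings("soccer")   # OR: either term present
not_hits = postings("arsenal") - postings("soccer")  # NOT: first term, minus the second

print(sorted(and_hits))  # [1]
print(sorted(or_hits))   # [1, 2, 3]
print(sorted(not_hits))  # [2]
```

In this sketch ‘arsenal NOT soccer’ keeps only the document about weapons storage, which is the kind of successful NOT statement the article never shows.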
I am not certain what the sub section “Getting It Right” means. The resonance of AND and NOT inutility echoes in my mind. Part Two ends with an observation about how much of the Internet is indexed. That’s a good question, and I now turn to Part Three, where the intellectual rigor of Mr. Hayes meets the Information Superhighway, if I may indulge in a bit of metaphorical whimsy.
Part Three: The Best UK Web Search Engines
I knew I was in for a delightful few minutes after the first two parts of Mr. Hayes’ feature. In Part Three he lays out 10 test queries. I can’t reproduce the full list, but I can highlight two of his queries:
- Bring me the site of the best selling newspaper in the UK (The Sun)
- Find a local newspaper covering the Shetlands
I noted that each query is expressed as a string of text. Some vendors would rush to point out that Mr. Hayes is using natural language queries. Not many systems support natural language queries in particularly sophisticated ways. Some, for instance, create a Boolean query from whatever the user enters in the search box. Other systems consult a look up table of what’s been a satisfactory result for the query recently and deliver that result from cache. Others dump stop words and go with the meaningful words with an implicit AND or OR Boolean operator. Others look at what’s available from an advertiser and dump those results directly to the user. Others predict what a user will prefer based on that user’s profile or the user’s usage history. This list is not exhaustive by any means.
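One of the approaches listed above, dumping stop words and joining what remains with an implicit operator, can be sketched in a few lines. This is a minimal illustration under stated assumptions: the stop word list and the `to_boolean` function are hypothetical, and no particular engine is known to work exactly this way.

```python
# Hypothetical stop word list; production systems typically use longer ones.
STOP_WORDS = {"a", "an", "the", "of", "in", "me", "for", "to"}


def to_boolean(query, operator="AND"):
    """Drop stop words and join the remaining terms with one Boolean operator."""
    terms = [word for word in query.lower().split() if word not in STOP_WORDS]
    return f" {operator} ".join(terms)


print(to_boolean("Find a local newspaper covering the Shetlands"))
# find AND local AND newspaper AND covering AND shetlands
```

Swapping the operator to OR broadens the result set. The choice between the two is one reason the same string of text can produce very different results on different systems.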
What did Mr. Hayes learn from his analysis of the 10 queries sent to the UK sites for Lycos, AltaVista, Dogpile, Excite, HotBot, Metacrawler, MSN, Yahoo, Ask, and Google? I have converted Mr. Hayes’ findings into the summary table below. Keep in mind that these are his data in a slightly different form. These are not my or my team’s findings:
Rank | Engine | Hayes’ Take |
1 | Lycos | Answered questions well |
2 | AltaVista | Useful but obscure results |
3 | Dogpile | Surprised it didn’t do better |
4 | Excite | Respectable performer |
5 | HotBot | Good all round performer; Mr. Hayes’ favorite |
6 | Metacrawler | Biggest surprise of the lot |
7 | MSN | Slick and impressive performer |
8 | Yahoo | Handpicked and categorized results a plus |
9 | Ask | Plain English queries |
10 | Google | Did not outperform the opposition |
Mr. Hayes includes “scores” for each engine. The top rated engine Lycos received a Hayes number of 83%; the lowest rated engine Google received a Hayes number of 78%.
Observations
I came away from my reading of this three part series in a semi stunned state. I had a number of major and minor quibbles gallivanting around my cranial cavity. Let me highlight three points and move on:
- This article made it clear to me that people don’t know what they don’t know about Web search, its technology, and its nuances. Google is probably correct in sticking with its very simple interface and its behind the scenes functions to answer most of the users’ questions with “good enough” information. If Mr. Hayes is an informed user of Web search systems and he finds the HotBot results more useful to him than other systems’ results, that’s well and good. The idea of using one system to conduct research of any type is anathema to me. Overlap, freshness, scope of index–these are essential factors for each Web indexing system. Insensitivity to these issues makes me downright nervous. I thought, “If Mr. Hayes can’t figure out the important parts, what about a less informed online user?”
- The queries Mr. Hayes formulated reveal why natural language systems are not understood. Forget semantic methods. I am not sure how to remediate Mr. Hayes’ test queries. The approach is foreign to me as is Mr. Hayes’ failure to differentiate each of the test systems with more precision. There is a big difference between a system that is federating results, one that indexes only frequently accessed pages, and one that operates with orphaned code on a shoestring.
- The failure to point out that Google serves about 70 percent of the queries in North America and more in Denmark, Germany, and the UK is an oversight. The giant gets the lowest score, which doesn’t make sense to me. Mr. Hayes uses subjective criteria to generate his Hayes numbers and provides zero detail about the method used to calculate a score. I think scoring the engines on freshness, features, relevance as measured by the number of on target hits in the first 10,000 results in a result set, and similar criteria would show that Lycos, AltaVista, and HotBot aren’t competitive in today’s market. Microsoft’s Live.com and Yahoo search are in some ways easier to benchmark against the Google. The other vendors are non starters in my mind because none has the technical or financial resources to index at the Google, Microsoft Live.com, and Yahoo levels.
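Overlap, one of the factors named in the first point above, is straightforward to measure once two result lists are in hand. The following is a minimal sketch, assuming invented result lists for two hypothetical engines. It uses the Jaccard measure (shared results divided by all distinct results), which is one common choice and not a method taken from Mr. Hayes’ article.

```python
def overlap(results_a, results_b):
    """Jaccard overlap: shared results divided by all distinct results in both lists."""
    a, b = set(results_a), set(results_b)
    if not a and not b:
        return 0.0  # two empty lists share nothing worth measuring
    return len(a & b) / len(a | b)


# Invented result lists; the site names are placeholders, not real test data.
engine_one = ["site-a", "site-b", "site-c", "site-d"]
engine_two = ["site-c", "site-d", "site-e", "site-f"]

print(overlap(engine_one, engine_two))
```

A low overlap score means a second engine turns up material the first one missed, which is the practical case against relying on a single system.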
Mr. Hayes omitted a Web search engine that I think is better than eight or nine of those on this list; namely, Exalead. I am well pleased with the results I obtain from Exalead.com here. In general, the French make me nervous with their math skills and sense of style, but Exalead is the functional equivalent of Google, operated by Europeans, and a country mile better on my relevance tests than the orphans AltaVista, Excite, and HotBot.
Keep in mind I am stating my opinion. I am an addled goose. I am sure the experts who organize search conferences will be delighted to feature Mr. Hayes as a keynote speaker. The conference organizers and Mr. Hayes’ understanding of search may be well matched.
Stephen Arnold, November 11, 2008
Overflight Award for Excellence
November 10, 2008
ArnoldIT.com and J Boye created an award to recognize the best presentations at the Boye 08 Conference held in Aarhus, Denmark. The conference attracted more than 260 attendees and featured more than 40 speakers from around the world.
The winner of the Overflight Award for Excellence was Caroline Coetzee from Cambridge University Hospitals in the UK. Caroline did a very interesting and relevant talk on The business case game (or is a website really more important than a maternity unit?) which explained how to get senior management support in the first place. An honorable mention went to Niklas Sinander from EUMETSAT in Germany, who did a popular talk on Wiki from theory to practice. The winner and runner up received a Lucite trophy with the award logo. The winner received 500 euros.
Left to right, Stephen Arnold, ArnoldIT.com, Niklas Sinander, EUMETSAT, runner up, Caroline Coetzee, Cambridge University Hospitals, and Janus Boye, JBoye.com.
Janus Boye and Stephen Arnold created the award to permit the community attending the conference to identify presentations that met the following criteria:
- Information that would be useful to delegates upon returning to work
- Research supporting the presentation
- Quality of the delivery and examples
- Importance of the speakers’ topics at the time of the conference.
A panel of distinguished attendees and information practitioners had the task of assessing the presentations and determining the winners. The judges were:
- Andrew Fix, Shell
- Volker Grünauer, Wienerberger
- Magnus Børnes Hellevik, The Norwegian Labour and Welfare Administration
- Ove Kristiansen, Region Syddanmark
- Pernilla Webber, Alfa Laval.
The Overflight Award will become a permanent feature of the conferences organized by Janus Boye. A happy quack to the winners and to the judges who made the selection for this award.
Stephen Arnold, November 10, 2008
Who Is the Microsoft Cloud Boss?
November 10, 2008
Eric Lai, writing in ITWorld.com, tries to answer the question, “Who Owns the Cloud Business inside Microsoft?” With the cloud still forming inside Microsoft, I am not certain there is a definitive answer to this question. For his take, you will want to read his article here. ITWorld uses pop ups that I found incredibly annoying. You may enjoy these, but I found them junky. Mr. Lai identifies these executives as cloud owners:
- If a group “owns” a product or service for on premises installations, then that group will “own” the cloud service as well.
- Bob Muglia’s server and tools division “owns” SQL Server and Windows Server.
- The Office Web version is under the control of Microsoft’s business division.
- Azure is owned by Ray Ozzie.
The most interesting segment of the article was this passage in my opinion:
“I think it makes sense for the original product group to own the product so it can create a vertically integrated strategy,” said Rob Helm, an analyst with the independent research firm Directions on Microsoft. He cited as an example Microsoft’s tactic of offering vouchers to Exchange and SharePoint customers that let them try Exchange and SharePoint Online for free. Helm’s main criticism is that Microsoft has too many cloud services today, with an overlap as a result. He cited the example of SQL Server Data Services, Live Mesh, and its Sync Framework. “How is Microsoft going to resolve this?” Helm asked.
I don’t think Microsoft itself knows.
Stephen Arnold, November 10, 2008
Multi Core Chips and Search Performance
November 10, 2008
Most enterprise search systems hit a wall sooner or later. The vendors will suggest that the problem is the hardware, not their software. Server vendors will recommend faster gizmos, everything from routers to CPUs. The usual fix for most sluggish search and content processing systems is to throw more hardware at the problem. One vendor has wrangled an investment from a chip maker. The idea is that with the chip maker’s hardware engineers working with the search vendor’s engineers, performance will be improved. Other vendors surf on Intel’s hot new CPUs. The problem is that the CPU is not often the problem with search system performance. If you are waiting on Intel’s multi core CPUs in order to turbo charge your enterprise search system, you will want to read IEEE Spectrum’s article “Multicore Is Bad News for Supercomputers” here. The key point in the article for me was this passage that asserts scientists
expect that supercomputer programmers will either turn off the extra cores or use them for something ancillary to the main problem. At the heart of the trouble is the so-called memory wall—the growing disparity between how fast a CPU can operate on data and how fast it can get the data it needs. Although the number of cores per processor is increasing, the number of connections from the chip to the rest of the computer is not. So keeping all the cores fed with data is a problem.
Search systems can be made to deliver speedy content processing, faster index updates, and subsecond query processing and results delivery. But to get the system to run like a goose before Thanksgiving dinner, system administrators have to do more than plug in more servers, add faster storage devices, and load up motherboards with random access memory. What’s this mean for search vendors with computationally intensive systems that chew through hours and days indexing content and updating the master index? Not much. Performance will continue to creep up until engineers crack the problem of getting other parts of a server to keep up with the hot new CPUs. Slow search systems will probably remain, well, slow for the foreseeable future.
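The memory wall in the IEEE Spectrum passage can be illustrated with simple arithmetic. The sketch below assumes a simplified model in which sustained throughput is the lower of two ceilings: what the cores can compute and what the memory connection can feed them. The function name and every number are invented round figures, not measurements of any real chip or search system.

```python
def attainable_rate(cores, ops_per_core, bandwidth_bytes, bytes_per_op):
    """Operations per second the chip can sustain: the lower of the compute
    ceiling and the rate at which memory can deliver operands."""
    compute_ceiling = cores * ops_per_core
    memory_ceiling = bandwidth_bytes / bytes_per_op
    return min(compute_ceiling, memory_ceiling)


# Hypothetical chip: each core handles 1e9 operations per second, memory
# delivers 32e9 bytes per second, and each operation needs 8 bytes of data.
for cores in (2, 4, 8, 16):
    rate = attainable_rate(cores, ops_per_core=1e9,
                           bandwidth_bytes=32e9, bytes_per_op=8)
    print(cores, rate)
```

With these made-up figures the rate stops growing at four cores, because the memory connection can feed only four billion operations per second no matter how many cores are added.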
Stephen Arnold, November 10, 2008