Collective Intelligence Anthology Available

May 14, 2008

The Arnoldit.com mascot admires the new collection of essays edited by Mark Tovey. Collective Intelligence: Creating a Prosperous World at Peace, published by the Earth Intelligence Network in Oakton, Virginia (ISBN-13: 978-0-97-15661-6-3), contains more than 50 essays by analysts, consultants, and intelligence practitioners. You can obtain a copy from the publisher, Amazon, or your bookseller.

The ArnoldIT mascot completed reading the 600-page book with remarkable alacrity for a duck.

The collection of essays is likely to find many readers among those interested in the social phenomena of networks. Many of the essays, including the one I contributed, talk about information retrieval in our increasingly interconnected world.

This essay will provide a synopsis of my contribution, “Search–Panacea or Play. Can Collective Intelligence Improve Findability”, which I wrote shortly before completing Beyond Search: What to Do When Your Search System Doesn’t Work. My essay begins on page 375.

Social Search

The dominance of Google forces other vendors to look for a way over, under, around, or through its grip on Web search. The vendor landscape now offers search and content processing systems that arguably do a better job of manipulating XML (Extensible Markup Language) content, figuring out who knows whom (the social graph initiative), and extracting the “real” meaning of content (semantic search). More than 100 vendors have technology that offers, if one believes the marketing collateral and conference presentations, a way to squeeze more information from information.

Social search is the name given to an information retrieval system that incorporates one or more of these functions:

Users can suggest useful sites. Examples: Delicious.com and StumbleUpon.com
The system discovers relationships between and among processed documents and links. Examples: Powerset.com and Kartoo Visu
The system analyzes information, extracts entities, and identifies individuals and their relationships. Examples: i2 Ltd (now part of ChoicePoint) and Cluuz.com
The system monitors user behavior and uses the data to guide relevance, spidering, and other system functions. Examples: public Web indexing companies
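
To make the last function in the list concrete, here is a minimal sketch of using monitored behavior to guide relevance. It is an illustration only, not any vendor's method: the function name rerank, the weight parameter, and the sample scores and click counts are all assumptions made up for this example.

```python
def rerank(results, clicks, weight=0.5):
    """Boost each result's base score by its share of observed clicks.

    results: list of (doc_id, base_score) pairs from a conventional ranker.
    clicks:  dict mapping doc_id -> number of clicks seen for this query.
    weight:  hypothetical knob for how much behavior counts versus the base score.
    """
    total_clicks = sum(clicks.values()) or 1  # avoid dividing by zero
    rescored = [
        (doc_id, score + weight * clicks.get(doc_id, 0) / total_clicks)
        for doc_id, score in results
    ]
    # Highest combined score first.
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


# Invented sample data: the ranker prefers "a", but users mostly click "c".
results = [("a", 0.60), ("b", 0.55), ("c", 0.50)]
clicks = {"c": 8, "a": 1, "b": 1}
ranked_ids = [doc_id for doc_id, _ in rerank(results, clicks)]
print(ranked_ids)
```

The point of the sketch is that the user never sees the click log. The behavior data works behind the scenes and only the ordering of the result list changes.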

There are other types of social functions, but these provide sufficient salt and pepper for this information side dish. The reason I say side dish is that social functions are not going to displace the traditional functions on which they are based. Social search has been in the mainstream from the moment i2 Ltd. introduced its workbench product to the intelligence community more than a decade ago. “Social” functions, then, are a recent add-on to the main diet in information retrieval.

Old Statistics and Cheap, Powerful Computers

What’s overlooked in the rush to find a Google “killer” is that the new companies are using some well-known technologies. For example, the inner workings of Autonomy’s “black box” are somewhat dependent on the work of a slightly unusual Englishman, Thomas Bayes. Mr. Bayes left the world a couple of centuries ago, but his math has been a staple in college statistics courses for many years. To deploy Bayesian techniques on a large scale is, therefore, not exactly a secret to the thousands of mathematicians who followed his proofs in pursuit of their baccalaureate.
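
For readers who have not seen the college-course version, the sketch below shows a textbook naive Bayes text classifier with add-one smoothing. It is emphatically not Autonomy's implementation, which is proprietary. The function names train and classify, the labels, and the four training snippets are assumptions invented for illustration.

```python
import math
from collections import Counter


def train(labeled_docs):
    """Count words per label and documents per label.

    labeled_docs: list of (label, text) pairs.
    """
    word_counts, doc_counts = {}, Counter()
    for label, text in labeled_docs:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    return word_counts, doc_counts


def classify(text, word_counts, doc_counts):
    """Return the label with the highest posterior under Bayes' rule."""
    vocab = {word for counts in word_counts.values() for word in counts}
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        # log P(label) + sum of log P(word | label), with add-one smoothing
        score = math.log(doc_counts[label] / total_docs)
        label_total = sum(counts.values())
        for word in text.lower().split():
            score += math.log((counts[word] + 1) / (label_total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented toy training set.
training = [
    ("search", "query index relevance ranking"),
    ("search", "index crawler query"),
    ("cooking", "salt pepper side dish"),
    ("cooking", "pepper recipe dish"),
]
word_counts, doc_counts = train(training)
print(classify("query relevance", word_counts, doc_counts))
print(classify("pepper dish", word_counts, doc_counts))
```

Nothing here is secret or new. What changes at scale is the size of the counts, not the arithmetic.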

What’s unleashed Bayesian mathematics in search has been the availability of cheap, powerful computers. Instead of choking the college mainframe, a dual-core PC can whip through recursive procedures without creating a bottleneck. The same line of reasoning applies to linguistic techniques, both brute force and slightly more clever iterative methods. Even the Google innovation engine is using established mathematical procedures. The Google “secret sauce” is that its computing infrastructure can handle the calculations these “old line” algorithms require. If a Google engineer doesn’t get the result from one algorithm, another one can be dropped into the “slot”. This ability to test quickly gives Google an advantage that its competitors are just now beginning to appreciate.
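
The “slot” idea can be pictured as a scoring function passed in as a parameter, so that one algorithm can be swapped for another without touching the rest of the pipeline. This is a toy sketch under my own assumptions, not a description of Google's infrastructure. The names overlap_score, density_score, and search, and the two sample documents, are hypothetical.

```python
def overlap_score(query, doc):
    """Number of distinct query terms that appear in the document."""
    doc_terms = set(doc.lower().split())
    return sum(1 for term in set(query.lower().split()) if term in doc_terms)


def density_score(query, doc):
    """Fraction of the document's tokens that are query terms."""
    tokens = doc.lower().split()
    query_terms = set(query.lower().split())
    if not tokens:
        return 0.0
    return sum(1 for token in tokens if token in query_terms) / len(tokens)


def search(query, docs, scorer):
    """Rank docs with whichever scorer is dropped into the slot."""
    return sorted(docs, key=lambda doc: scorer(query, doc), reverse=True)


docs = [
    "social search uses social signals from many many users today",
    "social search",
]
query = "social search signals"

# Same pipeline, two different algorithms in the slot.
top_by_overlap = search(query, docs, overlap_score)[0]
top_by_density = search(query, docs, density_score)[0]
print(top_by_overlap)
print(top_by_density)
```

The two scorers disagree about which document comes first. Being able to run that comparison quickly and cheaply is the advantage the paragraph above describes.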

Little wonder then that the new search systems incorporate what appear to be state-of-the-art techniques. Cheap, powerful computers only recently made it feasible to run algorithms that only a few years ago would have choked an organization’s servers. (I included in my contribution to this collection of essays a list of more than 40 vendors who are in the social search space.)

The Real Payoff

On the surface, you may not find the result sets generated by the new systems much more useful than the result sets available from Yahoo.com or even the hanging-in-limbo AllTheWeb.com service that Fast Search operates for Yahoo. In fact, when I showed two or three of the hotter social search systems to my 86-year-old father, he wanted to know why he couldn’t stick with Yahoo. Social search has been slow to take off in the consumer sector, and even among employees at large organizations, because search is perceived as dissatisfying yet the mental cost of switching to and understanding a different system is too high. If the social system outputs the answer the user wants or puts a point-and-click list of See Also references on the screen, the mental friction can be overcome.

The point is that the benefit of social techniques accrues to behind-the-scenes processes. The interface makes the outputs more useful. Exposing the rafters and wiring of social search systems is not too useful to a great many online system users. (In a forthcoming interview with PolySpot, a French search and content processing company, you will hear the managing director make much the same point.) Utility, not technology, is what’s needed to minimize the dissatisfaction with search-and-retrieval systems.

Where’s the payoff?

The payoff is in analyzing user behavior, system processes, and the information itself. In my view, the social revolution is a monitoring and analyzing shift. No longer will information hang in a void without context. Social techniques, using well-known mathematical principles, make it possible to figure out what’s been done, what’s going on, and what is likely to happen.

This type of social function may make some people nervous. It’s too late. The mating dance of fancy math and cheap computers is over. These two love birds are cranking out digital progeny at a very rapid pace. The problem is that no one is keeping track of the progeny in a formal way. Anecdotal evidence does not make the enormity of the shift easy to perceive or communicate. Well, it’s happened.

Observations

Let me conclude with the suggestion that you take a glance at Collective Intelligence. I’m not sure if your local library has the money to buy specialist publications like this one, but you can get a copy from the publisher. My mother already has my copy. What a great Mother’s Day gift. I’m sure she’ll devour it just as she has my other professional writings.

Several concluding remarks may put my assertions in more practical terms:

Social systems can be “gamed” if they are overt. Therefore, the most effective social search functionalities will operate in covert ways. This is the core notion of monitoring and analyzing patterns for exceptions or interesting data.
Overt social systems, like the MySpace and Facebook services, will follow an up-and-down pattern. Users come to the service, use it extensively for a period of time, and then wander off. Some users return. This behavior means that effective social monitoring requires a way to give each user a unique identifier and then aggregate behaviors across systems. Umbrella social aggregation will be a potentially “hot” investment and innovation niche.
“Average” users are unaware that the well-known mathematical techniques can operate in two ways simultaneously. For known or stateful users, the maths can predict with high confidence certain types of behaviors. These stateful data can be used to refine what’s needed by stateless users (anonymous users). Thus, masses of stateless users provide useful data about the behaviors of clustered “users” or data derived from cluster analysis. These data when “averaged” with the stateful data make it possible to add considerable richness to the analysis of machine processes, information, and users.
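
One simple way to picture the “averaging” of stateful and stateless data is a weighted blend: a known user's own history is pulled toward the rate observed across anonymous sessions, and the pull weakens as the user's history grows. This is a standard shrinkage estimate that I am swapping in for illustration, not a method taken from any specific vendor. The names cluster_click_rate, blended_rate, and prior_strength, and all the numbers, are assumptions.

```python
def cluster_click_rate(anonymous_sessions):
    """Click rate pooled over stateless (anonymous) sessions.

    anonymous_sessions: list of (clicks, views) pairs.
    """
    clicks = sum(c for c, _ in anonymous_sessions)
    views = sum(v for _, v in anonymous_sessions)
    return clicks / views if views else 0.0


def blended_rate(user_clicks, user_views, cluster_rate, prior_strength=20):
    """Blend a stateful user's history with the cluster-level rate.

    prior_strength is a hypothetical knob: roughly how many views of
    personal history it takes before the user's own data dominates.
    """
    return (user_clicks + prior_strength * cluster_rate) / (user_views + prior_strength)


# Invented data: three anonymous sessions and two known users.
pooled = cluster_click_rate([(1, 10), (2, 20), (1, 10)])   # 4 clicks / 40 views
new_user = blended_rate(0, 0, pooled)        # no history: falls back to the pool
heavy_user = blended_rate(80, 100, pooled)   # long history: mostly the user's own rate
print(pooled, new_user, heavy_user)
```

With no history the estimate equals the pooled anonymous rate, and with a long history it sits close to the user's own behavior. Masses of stateless sessions therefore improve predictions for known users, and the reverse holds as well.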

In order to make optimal use of “social” search techniques, a big honking computer is needed. (Another clue about my duck logo.) Companies in this social search sector have to have the technical talent, the money, and the mathematical expertise to handle massive amounts of data. It doesn’t take much analysis to see that only a handful of companies can push into this market space. Once three or four companies offer these types of systems, the winner will be the one with the most efficient engineering, the fastest algorithms, and the smartest people. In my Google Version 2.0 (September 2007 for Infonortics Ltd. in Tetbury, Glou.), I elaborate on this point, suggesting that companies like Amazon, AT&T, IBM, Microsoft, Oracle, Verizon, and other big gun technology firms have their work cut out for them. I’m not sure these firms understand the challenge they face in the next 18 to 36 months from Google, a very social company in terms of its technologies.

Stephen Arnold, May 14, 2008

Written by Stephen E. Arnold · Filed Under Database, Feature, Online (general), Search, Semantic, Social

Stephen E. Arnold monitors search, content processing, text mining and related topics from his high-tech nerve center in rural Kentucky. He tries to winnow the goose feathers from the giblets. He works with colleagues worldwide to make this Web log useful to those who want to go "beyond search". Contact him at sa [at] arnoldit.com. His Web site with additional information about search is arnoldit.com.