Collective Intelligence Anthology Available
May 14, 2008
The Arnoldit.com mascot admires the new collection of essays edited by Mark Tovey. Collective Intelligence: Creating a Prosperous World at Peace, published by the Earth Intelligence Network in Oakton, Virginia (ISBN-13: 978-0-9715661-6-3), contains more than 50 essays by analysts, consultants, and intelligence practitioners. You can obtain a copy from the publisher, Amazon, or your bookseller.
The ArnoldIT mascot completed reading the 600-page book with remarkable alacrity for a duck.
The collection of essays is likely to find many readers among those interested in the social phenomena of networks. Many of the essays, including the one I contributed, talk about information retrieval in our increasingly interconnected world.
This essay provides a synopsis of my contribution, “Search–Panacea or Play. Can Collective Intelligence Improve Findability”, which I wrote shortly before completing Beyond Search: What to Do When Your Search System Doesn’t Work. My essay begins on page 375.
Social Search
The dominance of Google forces other vendors to look for a way over, under, around, or through its grip on Web search. The vendor landscape now offers search and content processing systems that arguably do a better job of manipulating XML (Extensible Markup Language) content, figuring out who knows whom (the social graph initiative), and determining the “real” meaning of content (semantic search). More than 100 vendors have technology that offers, if one believes the marketing collateral and conference presentations, a way to squeeze more information from information.
Social search is the name given to an information retrieval system that incorporates one or more of these functions:
- Users can suggest useful sites. Examples: Delicious.com and StumbleUpon.com
- The system discovers relationships between and among processed documents and links: Powerset.com and Kartoo Visu
- The system analyzes information, extracts entities, and identifies individuals and their relationships: i2 Ltd (now part of ChoicePoint) and Cluuz.com
- The system monitors user behavior and uses the data to guide relevance, spidering, and other system functions: public Web indexing companies
There are other types of social functions, but these provide sufficient salt and pepper for this information side dish. The reason I say side dish is that social functions are not going to displace the traditional functions on which they are based. Social search has been in the mainstream from the moment i2 Ltd. introduced its workbench product to the intelligence community more than a decade ago. “Social” functions, then, are a recent add-on to the main diet in information retrieval.
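To make the last function in the list concrete, here is a minimal sketch of how logged user behavior might nudge relevance. It is an illustration only, not a description of any vendor's system: the function name `rerank`, the `weight` parameter, and the click log are all hypothetical.

```python
from collections import Counter

def rerank(results, clicks, weight=0.5):
    """Blend a base relevance score with each document's share of clicks.

    results: list of (doc_id, base_score) pairs from the text engine.
    clicks:  mapping of doc_id -> number of user clicks (hypothetical log).
    weight:  how much the behavioral signal counts, between 0 and 1.
    """
    total = sum(clicks.values()) or 1  # avoid dividing by zero on an empty log
    blended = [
        (doc_id, (1 - weight) * score + weight * clicks.get(doc_id, 0) / total)
        for doc_id, score in results
    ]
    # Highest blended score first.
    return sorted(blended, key=lambda pair: pair[1], reverse=True)

# Document "a" scores best on text alone, but users keep clicking "b".
results = [("a", 0.9), ("b", 0.8), ("c", 0.4)]
clicks = Counter({"b": 30, "c": 10})
ranked = rerank(results, clicks)
```

The point of the blend is the side dish idea: the traditional score still carries the ranking, and the social signal only adjusts it.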
Old Statistics and Cheap, Powerful Computers
What’s overlooked in the rush to find a Google “killer” is that the new companies are using some well-known technologies. For example, the inner workings of Autonomy’s “black box” are somewhat dependent on the work of a slightly unusual Englishman, Thomas Bayes. Mr. Bayes left the world a couple of centuries ago, but his math has been a staple in college statistics courses for many years. To deploy Bayesian techniques on a large scale is, therefore, not exactly a secret to the thousands of mathematicians who followed his proofs in pursuit of their baccalaureate.
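For readers who skipped that statistics course, the following is a small sketch of one textbook Bayesian technique, a naive Bayes text classifier with add-one smoothing. It is not Autonomy's method, which is proprietary; the function names, labels, and training sentences are invented for illustration.

```python
import math
from collections import Counter, defaultdict

def train(labeled_docs):
    """Count words per label. labeled_docs is a list of (label, text) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for label, text in labeled_docs:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, label_counts

def classify(text, word_counts, label_counts):
    """Return the label with the highest posterior log-probability."""
    vocab = {word for counts in word_counts.values() for word in counts}
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        # Prior: how common the label is in the training set.
        score = math.log(label_counts[label] / total_docs)
        # Likelihood: add-one smoothing so unseen words do not zero the product.
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.lower().split():
            score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Tiny made-up training set with two hypothetical labels.
docs = [
    ("search", "index query spider relevance"),
    ("search", "query ranking index"),
    ("finance", "revenue profit quarter"),
    ("finance", "profit margin revenue"),
]
model = train(docs)
label = classify("query index", *model)
```

The math fits in a few lines. Running it over millions of documents is where the cheap, powerful computers come in.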
Former Clandestine Operative Says Automated Systems Not Good Enough
May 13, 2008
Editor’s Note: Robert Steele, former Marine Corps officer and intelligence operative, was one of the first, if not the first, intelligence professional since World War II to question the relative value of secret sources and technologies in relation to open sources and technologies. Mr. Steele agreed to meet me near his office in suburban Washington, D.C. The full text of the interview appears below. After we spoke, Mr. Steele provided me with illustrations he referenced in our conversation. I have included these in the transcript at the point where Mr. Steele references them. You can read more about Mr. Steele at his Web site, OSS.Net.
How did you get interested in using information that’s readily available to anyone in a library, in newspapers, and online as a source of useful intelligence?
I went into the international spy program at CIA with a Master’s in International Relations, and knew quite a bit about citation analysis and primary research. What I was not expecting over the course of my clandestine career was the obsession with stealing secrets to the exclusion of all that could be known from open sources.
Robert D. Steele
The clandestine officers also refused to interact with the analysts—before leaving for my first overseas assignment, the Chief of Station took me to the analysis side of the house, and on my way there he said something along the lines of “these folks know nothing useful, and we tell them nothing.”
When the Marine Corps asked me to leave CIA to create the Marine Corps Intelligence Center in 1988, I promptly did what I thought the government wanted; that is, I spent $20 million on a codeword analysis center, including a Special Intelligence Communications (SPINTCOM) work station. I thought it would do everything except kill the terrorist.
Was I in for a shock. I had put a PC with Internet access in an isolated room, not connected to any government network. The PC had a modem. I was curious about online and bulletin board systems. In a short time, analysts were leaving their supercharged workstations to stand in line to use the PC. These professionals were looking for information that was not in the government system and not known to our officers in the field (including diplomats and commercial or defense attaches).
What a wake-up call.
That is when I learned that expensive systems are only as good as their sources. Narrow casting into the secret world made much of our multi-billion dollar technology virtually worthless. Analysts using the PC showed me that 80 to 90 percent of the information we needed could be obtained using the PC and public information, including direct calls to overt human experts. I also learned that useful information was available in 183 other languages no one in the US Government can speak or understand. Even today, a large number of Washington officials don’t understand the intelligence value of open sources of information, including commercial imagery, foreign-language broadcasts that must be accessed locally, and gray literature, such as university yearbooks for a photo of a terrorist. Washington is completely out of touch with human experts who are not US citizens eligible for a secret clearance. The spies don’t want them unless they agree to commit treason, and paranoid, ignorant security officials do not allow the analysts to talk to them.
Almost every vendor asserts that their systems can “do” business or competitive intelligence. In your experience is this accurate?
Look. BI and CI are not really intelligence.
BI or business intelligence is commonly used as a descriptor for what is nothing more than internal knowledge management, spiced up with a point-and-click graphics dashboard. Not only are most of these systems non-interoperable with everything else, they are only as smart or as stupid as the digital data they can access.
The reality of information in most organizations is that most of what is really valuable is not digital. And, most CEOs have zero idea what intelligence (decision support) actually means.
CI or competitive intelligence focuses on competitors. What I practice, Commercial Intelligence, focuses on
- External information
- Collaborative work
- Knowledge management
- Organizational intelligence.
Commercial intelligence leverages what can be drawn from the human social networks interacting with an organization and the other sources of information. External information is not information about competitors. It includes such factors as “true cost” of goods and next-generation “cradle to cradle” opportunities. You have to factor in the art and science of retaining Organizational Intelligence. I will send you a diagram that shows my view of this commercial intelligence space.
In my experience, today’s systems are edging toward failure. The systems aren’t very good, useful, or usable. As the Gartner Group recently said about Windows, it is untenable. I like Microsoft for its cash flow—they need to dump the legacy and launch an open source network with shared call centers and Blue Cube power processing.
Not Your Microsoft Social: It’s Enterprise “The Social”
April 27, 2008
Internet News reported that “the social”–an umbrella noun that includes blogs, wikis, podcasting, mashups, RSS, social networking and widgets–will generate either $707 million or $2.7 billion by 2011.
To be fair to Kenneth Corbin, the Internet News journalist, his story relies on data from two sharp-pencil outfits: Forrester and the Gartner Group. Please, read the story yourself in order to imbibe the magnitude of “the social” in enterprise software.
The key point for me appears in the final paragraph of the story, dated April 22, 2008: “Admitting that Web 2.0 features are still in their infancy, the Forrester researchers noted that the technologies are moving steadily toward the mainstream, as older users come to understand and embrace them, and major media firms ink deals with Web 2.0 vendors to soup up their online properties with more interactive features.”
Like most emerging trends, the excitement for Facebook-like and wiki-type functions will have to work within the regulatory net tossed over certain commercial enterprises such as the ever-innovative US financial sector, the slippery pharmaceutical companies with their interesting approach to clinical trials and compartmentalized data, and the reliable health care organizations.
Organizations need to move “beyond search” with regard to information. But what happens if that shift takes the enterprise into unexplored territory?
The role of the Internet as a method of communication is a tired subject. Uncertain and litigation-averse senior managers in commercial firms have to trade off Web 2.0 payoffs against the very real possibility that a misstep can sink their careers and possibly their company.
Stephen Arnold, April 25, 2008
Key Word Search Vendors: Panting Laggards
March 31, 2008
In September 2003, I gave an invited lecture at LANL, an acronym for Los Alamos National Laboratory for those of you who don’t keep up with some of the US government’s most interesting research nomenclature. I poked around my digital warehouse today when I saw an announcement that a major search-and-retrieval vendor was now officially in the “information access business”. I used to work for Ziff Communications Co., and we owned an outfit called Information Access Co. That was a great company name, but the whole shooting match was sold to the giant Thomson Corporation, and the name Information Access fell into disuse, or so I thought.
I marvel at the “back from the dead” quality certain terminology demonstrates. IAC, as Information Access was known for more than 15 years, allowed a person to search for electronic information. The idea was a good one, and IAC had revenues of more than $100 million at the time of the sale. The idea was simple. We used bibliographic records or what today would be called “structured metadata”, full text of articles or what today would be called content, and proprietary scripts to generate reports or what today would be called business intelligence. The user of our General Business File product in 1990 would pick from a menu of options; for example, look for a job. Then the user would pick from one of the major cities whose employment opportunities we indexed (now tagged) and the system would display job openings. A mouse click sent the report to the printer, and we had happy users. We sold more than 1,000 of these systems in less than nine months in 1990. Considering each system was in the $20,000 plus range, the General Business File would be a success in our Googley world.
The LANL group wanted to know about the future of search and “The Information Implications of Social Software”. Now in 2003, there wasn’t the popular awareness of social software because MySpace.com, Facebook.com, the Web 2.0 “revolution”, and AJAX were dreams or oddities known to a handful of code bangers.
One of the key points in my presentation was that “information access” was an umbrella term for a bundle of activities and functions. These separate entities were now able to interact to form new, often quite surprising products and services. Social software–which I defined as the use of network technology for communication, collaboration, and combination–was a terrible term, but we were stuck with it. (To learn more about my annoyance with information terminology, Searcher Magazine is running a feature story that updates my 1999 article and my year 2000 article about technology convergence. Sorry. I don’t have a publication date yet, but the editor, Barbara Quint, is working on my lousy prose now.)
Take a look at one diagram from my lecture. Keep in mind that I prepared this five years ago, but for our purpose it is, I hope, useful to you.
Someone complained that I was copyrighting my work on this Web log. Okay, I won’t put the copyright symbol on this graphic. If you want to recycle my work, please, send me an email and get permission. I get annoyed when certain individuals borrow with neither attribution nor permission. Right, Mr. Hermans?
Let’s take a quick tour of this diagram, and then I will close with some observations about the “panting laggard” that is behind-the-firewall search.
Yellow Spheres
Notice the “yellow spheres”. You may have to click on the small image in order to read the notations on this diagram. The heading is “Enabling”. The idea is that each of the “yellow spheres” represents a category of technology that makes online information more useful. For example, “Converting Creating Content” refers to content authoring and content transformation. Behind-the-firewall systems have to take different file types and homogenize them so the system can manipulate them. If a search or content processing system can’t “read” a file, the system won’t process it. The idea, then, is to get the content regardless of its form and format into the search and content processing system. The bottom “yellow ball” is labeled “Spidering, Indexing, and Searching”. You recognize these ideas because 90 percent of a search vendor’s sales pitch talks about this “yellow ball”. In terms of this diagram, it’s easy to see that these three operations–spidering, indexing, and search–are just a cog in a much larger system. Vendors who pitch you about these three features are “panting laggards”. These vendors are almost out of the race and almost certainly won’t win in the long run in my opinion.
Purple Spheres
The “purple spheres” are identified as “Analysis”. Each of these four spaces is now mainstream. Vendors offer these services because each is easier for a manager to assess in terms of a payoff. Few people in an organization want to see laundry lists of information. Filtering eliminates information that rules, methods, or user-defined specifications exclude. A specification might say, “I don’t want information about enterprise search. I want information about predictive analytics.” Clustering is a catch-all term. In it reside classification, grouping, categorization, and anything to do with today’s idées du jour–taxonomies and ontologies. The idea is that the system groups similar documents in a meaningful way. If you don’t know what you really want to review, you scan the category labels and browse the results. The third “purple sphere” is data mining. Companies like SPSS and SAS Institute are familiar to you if you took advanced statistics in college. These companies are now in the business of text processing and offer a burgeoning array of features and functions designed to whip unstructured content into shape. SAS Institute bought Teragram, and their PR team told me that SAS will become an “enterprise search company”. I detest this term, but the move is a good one. SAS wants to chop up text, pull out the juicy bits, count them, crunch them, and generate reports for users. The final “purple sphere” is labeled “static / video imaging”. Most organizations are awash in digital information, but most of that is text. Not for long will it be text. “Going forward”, I said in 2003, “behind-the-firewall search systems will have to come to grips with the information-charged binary files–chemical structures, engineering drawings, audio recordings, and video.” Now five years later, only Autonomy has a reasonable solution to video. The other data types remain “outside” the behind-the-firewall system vendors’ capabilities.
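Filtering and clustering are easier to picture with a toy example. The sketch below is a deliberately crude illustration, not any vendor's method: the `TAXONOMY` labels, the term lists, and the sample documents are all assumptions, and real systems use far richer statistics than word overlap.

```python
from collections import defaultdict

# Hypothetical taxonomy: category label -> terms that signal the category.
TAXONOMY = {
    "predictive analytics": {"forecast", "regression", "predictive"},
    "enterprise search": {"index", "query", "spider"},
}

def tokens(text):
    """Lowercase a document and split it into a set of words."""
    return set(text.lower().split())

def filter_docs(docs, unwanted_terms):
    """Filtering: drop any document that mentions an unwanted term."""
    return [doc for doc in docs if not (tokens(doc) & unwanted_terms)]

def cluster(docs, taxonomy=TAXONOMY):
    """Clustering: file each document under the label it overlaps most."""
    groups = defaultdict(list)
    for doc in docs:
        words = tokens(doc)
        label = max(taxonomy, key=lambda name: len(words & taxonomy[name]))
        if not words & taxonomy[label]:
            label = "uncategorized"  # no taxonomy term matched at all
        groups[label].append(doc)
    return dict(groups)

docs = [
    "predictive regression model for sales forecast",
    "spider and index the intranet",
    "query log analysis",
    "holiday party schedule",
]
kept = filter_docs(docs, {"party"})
groups = cluster(kept)
```

A user who does not know what to ask for can scan the keys of `groups` as category labels and browse from there, which is the behavior described above.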
Gray Bar
The “gray bar” was intended to be a spectrum. My lousy Photoshop skills produced this blah “gray bar”. The idea is that “Enabling” and “Analysis” are two distinct types of pressure on search and content processing opportunities. As the “yellow spheres” get bigger, they will exert pressure on the folks in the “gray bar”. Similarly, as the “purple spheres” exert their influence on users, a catalytic reaction occurs in the “gray bar”. In 2003, I identified three significant changes in the way employees will interact with digital information.
First, instead of a search box, people looking for information want some sort of information finder “landing page”. For want of a better term, I used the word portal for the notion of gaining access to information in a search and content processing system.
Second, I identified the shift from getting laundry lists of “hits” to a type of collaborative work. Vendors often forget that documents are created by people, unless you are lucky enough to live inside some hyper-advanced culture like Google’s. But the GOOG is an anomaly, so think about your company. You want to accomplish a work task. Many work tasks require working with one or more colleagues. So, the world of search and retrieval becomes an enabler of collaborative interaction.
Third, the search system is a means of keeping track of what’s been done and how information has changed. In my new study, Beyond Search, published by the Gilbane Group, I talk about one of Google’s most interesting data management acquisitions in 2006. (A discussion of this company and its technology appears in Beyond Search.) This company was working in this type of hyper-search space, and if Google does more than launch betas, the technology could revolutionize its enterprise applications division. The point is that search is simply one facet of a much more significant set of processes coming about as the “yellow spheres” and the “purple spheres” expand and change the “pressure” for next-generation applications.
Going Nuclear at LANL
To wrap up, I was making explicit that key word search was a dead end. The action was in the “yellow spheres” and the “purple spheres”. As these various functional and technical areas grew more robust and fell in price, the notion of key words became irrelevant to the real opportunities in the “gray bar”.
In my discussion of the prescient Sagemaker technology here, I make it clear that the flabby key word search had shortcomings that were well known a decade ago. Now many leaders in search and retrieval are repositioning themselves–actually distancing themselves–from key word search. Not only is it a commodity, but the financial difficulties of some of the highest profile vendors make it clear that generating revenue is not easy to do. You can snag Lucene (discussed here) or Flax (discussed here) and save yourself some money.
The LANL folks were not thrilled with my talk. I thought some in the audience would explode. Webmasters and government marketers had just completed a redesign of the LANL Web site. Key word search was offered, but it was slow as molasses. I think it’s been improved now. None of the functions I identified as important in the “gray bar” were available on LANL’s public-facing or employee-only Web sites.
These wizards invited a guy from rural Kentucky, and I did the intellectual equivalent of tracking mud on their white carpet. Competition for clicks among the national labs is fierce. LANL, long the number one research facility, had suffered some security disappointments and the wily wizards at Oak Ridge National Lab had rolled out a niftier Web site. Believe it or not, a high-traffic Web site makes a difference at budget time on Capitol Hill. Here I was making a mess of the new white carpet. I turned in my fancy badge and high-tailed it back to Kentucky.
Most vendors of search and content processing systems have been slow to provide the functionality shown on my amateurish diagram. These vendors are now charging forward with new positioning, new buzzwords, and new ways to explain the benefits of their systems. Like the out-of-shape athlete, some of these folks are coming into our offices looking much the worse for wear. Most are “panting laggards”–not fit for serious information access duty and several years too late.
Stephen Arnold, April 1, 2008
Exalead’s François Bourdoncle Interviewed
February 25, 2008
François Bourdoncle, the engaging founder of Exalead, reveals an important new service now available from Exalead. BAAGZ is a social and semantic system. Mr. Bourdoncle said, “BAAGZ is, for the record, the first social network to allow people to connect because they have shared interests.” BAAGZ is Exalead’s new semantic system.
Mr. Bourdoncle added, “We are listening to our users, and we believe that the time for simplistic and “naked” search engines is over. Now is the time for full-fledged “search products”, not simplistic “search engines”. Think of the difference between a car’s engine and the car itself.” BAAGZ, which some of the alpha testers have seen, was described as “a new form of search-inspired social networking”. Another alpha tester called it “a new form of social networking-inspired search”.
BAAGZ will be released in a public beta this week (February 25, 2008). You can try this new service at www.baagz.com.
Mr. Bourdoncle continued, “At Exalead, we focused on multi-threaded, 64-bit architectures from Day One…. Today, Exalead has, I know, the most mature, robust and scalable search software. We make full use of today’s multi-core processors. Our products are also able to adapt automatically to various memory / processor / disk configurations.”
Exalead, based in Paris, is one of the four vendors whose system has been identified in Beyond Search: What to Do When Your Search System Doesn’t Work as a “company to watch.” Exalead has a growing presence in the United States and a technical capability that parallels Google’s.
If you thought Paris was only for lovers, you need to expand your horizons. Paris is a place for new approaches to Web and behind-the-firewall search technology. My website contains the entire interview with Mr. Bourdoncle.
Stephen Arnold, February 25, 2008
Entopia: A Look Back in Time
February 16, 2008
Periodically I browse through my notes about behind-the-firewall systems, content processing solutions, and information retrieval start ups. I think Entopia, a well-funded content processing company founded in 1999, shut down, maybe permanently, some time in 2006.
In my “Dormant Search Vendors” folder, I keep information about companies that had interesting technology but dropped off my watch list. A small number of search vendors are intriguing. I revisit what information I have in order to see if there are any salient facts I have overlooked or forgotten.
KangarooNet and Smart Pouches
Do you remember Entopia? The company offered a system that would key word index, identify entities and concepts, and allow a licensee to access information from the bottom up. The firm opened its doors as KangarooNet. I noticed the name because it reminded me of the whimsical Purple Yogi (now Stratify). Some names lure me because they are off-beat, if not too helpful to a prospective customer. I do recall that the reference to a kangaroo was intended to evoke something called a “smart pouch”. The founders, I believe, were from Israel, not Australia. I assumed some Australian tech wizards had crafted the “smart pouch” moniker, but I was wrong.
Do you know what a “smart pouch” is? The idea is that the kangaroo has a place to keep important items such as baby kangaroos. The Entopia “smart pouch” was a way to gather important information and keep it available. Users could share “smart pouches” and collaborate on information. Delicious.com’s bookmarks provide a crude analog of a single “smart pouch” function.
I recall contacting the company in 2000, but I had a difficult time understanding how the company’s system would operate at scale in an affordable way. Infrastructure and engineering support costs seemed likely to be unacceptably high. No matter what the proposed benefits of a system, if the costs are too high, customers are unwilling to ink a deal.
Shifting Gears: New Name, New Positioning
Entopia is a company name derived from the Greek word entopizo. For those of you whose Greek is rusty, the verb means to locate or bring to light. Entopia’s senior technologists stressed that their K-Bus and Quantum systems allowed a licensee to locate and make use of information that would otherwise be invisible to some decision makers.
When I spoke with representatives of the company at one of the Information Today conferences in New York, New York, in 2005, I learned that Entopia was, according to the engineer giving me the demo, “a third-generation technology”. The idea was that Entopia’s system would supplement indexing with data about the document’s author, display Use For and See Also references, and foster collaboration.
I noted that I also spoke with Entopia’s vice president of product management, David Hickman, a quite personable man as I recall. My notes included this impression:
Entopia wants to capture social aspects of information in an organization. Relationships and social nuances are analyzed by Entopia’s system. Instead of a person looking at a list of possibly relevant documents, the user sees the information in the context of the document author, the author’s role in the organization, and the relationships among these elements.
In my files, I found this screen shot of Entopia’s default search results display. It’s very attractive, and includes a number of features that systems now in the channel do not provide. For example, if you had access to Entopia’s system in 2006 prior to its apparent withdrawal from the market, you could:
- See concepts, people, and sources related to your query. These appear in the left hand panel on the screen shot below
- Get a results list with the creator, source, date, and relevance score for each item clearly presented. In contrast to the default displays used by some of the companies in my Beyond Search study, Entopia’s interface is significantly more advanced
- The standard search box, a hot link to advanced search functions, and one-click access to saved searches keep important but little used functions front and center.
When the firm was repositioned in 2003, the core product was named, according to my handwritten notes, the “K-Bus Knowledge Extractor”. I think the “k” in K-Bus is a remnant of the original “kangaroo” notion. I wrote in my notes that Entopia was a spin out from an outfit called Omind and Global Catalyst Partners.
Other features of the Entopia system were:
- Support for knowledge bases, taxonomies, and controlled term lists
- An API and a software development kit
- Support for natural language processing
- Classification of content
- Enhanced metatagging
The K-Bus technology was enhanced with another software component called Quantum. The software system created a collaborative workspace. The idea was to allow system users to assemble, discuss, and manipulate the information processed by the K-Bus. This is the original SmartPouch technology that allows a user to gather information and keep it in a virtual workspace.
System Overview
In my Entopia folder, I found white papers and other materials given to me by the company. Among the illustrations was this high-level view of the Entopia system.
Several observations are warranted even though the labels in the figure are not readable. First, licensees had to embrace a comprehensive information platform. In the 2005 - 2006 period, a number of content processing vendors had added the word “platform” to their marketing collateral. Entopia to its credit does a good job of depicting how significant an investment is required to make good on the firm’s assertions for discovering information.
Second, it is clear that the complex interactions required to make the system work as advertised cannot tolerate bottlenecks. A slowdown in one component affects the others. For instance, the horizontal gray rectangle in the center of the diagram is the “Session Facade Beans” subsystem. If these processes slow down, the Web framework in the horizontal blue box above the horizontal gray box slows down user access. Another hot spot is the Data Access Module, the gray rectangle below the horizontal gray rectangle just referenced. A problem in this component prevents the metadata from being accessed. In short, a heck of an infrastructure of systems, storage, and bandwidth is needed to keep the system performing at acceptable levels.
Finally, the complexity of the system appears to require on-site support and in some cases, technical support from Entopia. A licensee’s existing information technology staff could require additional headcount to manage this K-Bus architecture.
As I scanned these notes, now more than two years old, I was struck by the fact that Entopia was on the right track. The buzz about social search makes sense, particularly in an organization where one-to-one relationships occur outside a hierarchical organizational structure. Software can provide some context for knowledge workers who are often monads, responsible to other monads, not the organization as a whole.
Entopia wanted to blend expertise identification, content visualization, social network analysis, and content discovery into one behind-the-firewall system. I noted that the company’s system started at $250,000, and I assume the up-and-running price tag would be in the millions.
When I asked, “Who are Entopia’s customers?”, I learned that Saab, the US government, Intel, and Boeing were licensees. Those were blue-chip names, and I thought that these firms’ use of the K-Bus indicated Entopia would thrive. Entopia was among the first search vendors to integrate with Salesforce.com. The system also allowed a licensee to invoke the Entopia functions within a Word document.
What Can We Learn?
Entopia seems to have gone dark quietly in the last half of 2006. My hunch is that the intellectual property of the company has been recycle. Entopia could be in operation under a different corporate name or incorporated as a proprietary system in other content processing systems. When I clicked on the Entopia.com Web address in my folder, a page of links appeared. Running queries on Live.com, Google, and Yahoo returned links to stale information. If Entopia remains in business, it is doing a great job of keeping a low profile.
If you read my essay “Power Leveling”, you know that two common challenges in search and content processing are getting caught in a programming maze. The need to solve a particular problem fails to meet a licensee’s needs. The second problem is that when the system developer assembles the local solutions, the overall result is not efficient. Instead of driving straight from Point A to Point B, the system iterates and explores every highway and by way. Performance becomes a problem. To get the system to go fast, capital investment is necessary. When licensees can’t or won’t spend more on hardware, the system remains sluggish.
Entopia, on the surface, appears to be an excellent candidate for further analysis. My cursory looks at the system in 2001, again in 2005, and finally in 2006 revealed considerable prescience about the overall direction of the content processing market. Some of the subsystems were very clever and well in advance of what other vendors had on the market. The use of the social metadata in search results was quite useful. My recollection is now hazy, but I had noted that response time was sluggish when these clever subsystems were hooked together. Maybe it was. Maybe it wasn’t. The point is that a complex system like that illustrated above would require on-going work to keep operating at peak performance.
Unfortunately, I don’t have an Entopia system to benchmark against the systems of the 24 companies profiled in Beyond Search. I wanted to include this Entopia information, but I couldn’t justify a historical look back when there was so much to communicate about systems now in the channel.
In Beyond Search, I don’t discuss the platforms available from Autonomy, Endeca, Fast Search & Transfer, IBM, and Oracle. I do mention these companies to frame the new players and little-known up-and-comers that figure in Beyond Search. I would like to conclude this essay with several broad observations about the perils of selling organizations platforms.
First, any company selling a platform is essentially trying to obtain a controlling or central position in the licensee’s organization. A platform play is one that has a potentially huge financial pay off. A platform is a sophisticated “lock in”. Once the platform is in position, competitors have a difficult time making headway against the incumbent platform.
Second, the platform is the core product of IBM (NYSE:IBM), Microsoft (NASDAQ:MSFT), and Oracle (NASDAQ:ORCL). One might include SAP (NYSE:SAP) in this list, but I will omit the company because it’s in transition. These Big Three have the financial and market clout to compete with one another. Smaller outfits pushing platforms have to out market, out fox, and out deliver any of the Big Three. After all, why would an Oracle DBA want another information processing platform in an all-Oracle environment? IBM and Microsoft operate with almost the same mind set. Smaller platform vendors (perhaps we could include Autonomy (LON:AU) and Endeca in this category) are likely to face increasing pressure to mesh seamlessly with whatever a licensee has. If this is correct, Fast Search’s ESP has a better chance going forward than Autonomy. It’s too early to determine if Endeca’s deal with SAP will pay similar dividends. You can decide for yourself if Autonomy can go toe-to-toe with the Big Three. From my observation post in rural Kentucky, Autonomy will have to shift into a higher gear in 2008.
Third, super-advanced systems are vulnerable in business environments where credit is tight, sales are in slow or low growth cycles, and a licensee’s technical staff may be understaffed and overworked.
In conclusion, I think Entopia was a forward-thinking company. Its technology anticipated market needs now more clearly discernable. Its system was slick, anticipating some of the functionality of the Web 2.0 boom. The company demonstrated a willingness to abandon overly cute marketing for more professional product and company nomenclature. The company did apparently have one weakness — too little revenue. Entopia, if you are still out there, please, let me know.
Stephen Arnold, February 16, 2008
Social Search: No Panacea
February 11, 2008
I wrote a chapter for the forthcoming book of essays, Collective Intelligence. Information about the volume is at the Oss.net Web site. If you don’t see a direct link to the study, check back. The book is just in its final run up to publication.
I’m thinking about my chapter “Search Panacea or Ploy: Can Collective Intelligence Improve Findability?” As we worked on the index for my contribution, we talked about the notion of social search. Wikipedia, as you might have suspected, has a substantial entry about social search. A search for the phrase “social search” on any of the Web search engines returns thousands of entries. As of February 11, 2008, here are Yahoo’s.
Few will doubt that the notion of social search — with humans providing metatags about information — is a hot trend in search.
I can’t recycle the arguments presented in my contribution to Collective Intelligence. I can, however, ask several questions about social search to which I think more research effort should be applied:
Gaming the System
In a social setting, most people will play by the rules. A small percentage of those people will find ways to “game” or manipulate the system to suit their purposes. Online social systems are subject to manipulation. Digg.com and Reddit.com have become targets of people and their scripts. The question is, “How can a user trust the information on a social system?” This is a key issue for me. Several years ago I gave a talk at a Kroll (Marsh McLennan) officers’ meeting where the audience was keenly interested in ways to determine the reputation of people and the validity of their actions in a social online system.
Most Lurk, Two Percent Contribute
My work in social search last year revealed a surprising — to me at least — piece of data. Take a social search site with 100 users. Only two people contribute on a regular basis. I think more research is needed to understand how active individuals can shape the information available. The question is, “What is the likelihood that active participants will present information that is distorted or skewed inadvertently?” The problem is that in an online space where there is no or a lax editorial policy, distortion may be “baked into” the system. Naive users can visit a site in search of objective results, and the information, by definition, is not objective.
Locked in a Search Box
Some of the social search systems offer tag clouds or a graphic display of topics. The Mahalo.com site makes it easy for a user to get a sense of the topics covered. Click on the image below, and you will readily see that Mahalo is a consumer centric system, almost an updated version of Yahoo’s original directory:
The question is, “What else is available in this system?” Most of the social search sites pose challenges to users. There’s no index to the content, and no easy way to know when the information was updated. I’ve had this issue with About.com for years. The notions of scope and currency nag at me, and the search box requires that I guess the secret combination of words before I can dig deeply into the information available.
In my contribution to Collective Intelligence, I cover a number of more complex issues. For example, Google is, at its core, a social search system. Links and clicks are artifacts of human action and attention. By considering these, Google has its finger on the pulse of its users’ behavior. I think this aspect of Google’s system has long been understood, but Google’s potential in the social search space has been overlooked in some of the social buzz.
Stephen Arnold, February 11, 2008



