
Kartoo Tweaks Its Interface

October 13, 2009

I have found the Kartoo.com service useful and innovative. I learned today that the company has rolled out a new interface and links that make it easier to locate the company’s other content processing technology. The new interface provides thumbnails of the top hits. You can explore other results by clicking on the links on the page. The default interface for the query “text mining” appears below:

kartoo new interface

Other new features include:

  • E-reputation tools
  • Metasearch functions
  • Support for anonymous search
  • Support for the French, English and Dutch languages.

If you have not explored the Kartoo service, give it a whirl.

Stephen Arnold, October 13, 2009, published because I like the French

Kartoo Adds New Interface Functions

July 9, 2009

Kartoo’s interface has added some features. If you have not visited the site for a while, you will want to navigate to the Kartoo main page. Set your preferences for this Flash-based metasearch system. The interface has visual impact, but an addled goose like me wanted pop-up explanations of the icons. The options page looks like this:

kartoo options

Now enter your query in the search box at the top of the page. Unlike the Kartoo interface of the past, you have a larger, cleaner presentation of the relevant hits. When you hover over an icon, Kartoo displays a relationship line. For the query “US financial crisis” the system displayed these results:

banking crisis

When you click on one of the thumbnail images, Kartoo sends you to the source site. If you hover, Kartoo displays a pop up with a text snippet.

On the left column of the interface are two buttons. You can select what supplementary content you want to see. I selected topics, allowing me quick access to only those hits about one of the identified categories. I also instructed the system to show me images. You can see the images, which are presented in low resolution, in the scrollable side bar below the topics.

Kartoo Technologies is based in Paris. The company has been one of the firms pushing the envelope in search interface designs and controls. Information about the company’s products and technologies may be found on the Kartoo corporate Web site. The company now has more than 200 customers who use the firm’s technologies for visualization and intelligence monitoring. The Kartoo teams are located in Clermont-Ferrand, France.

Stephen Arnold, July 9, 2009

BBC: Search Is a Backwater

September 27, 2008

I just read a quite remarkable essay by a gentleman named Richard Titus, Controller, User Experience & Design for BBC Future Media & Technology. (I like the word controller.) I am still confused by the time zone zipping I have experienced in the past seven days. At this moment in time, I don’t recall if I have met Mr. Titus or if I have read other writings by him. What struck me is that he was a keynote speaker at a BBC Future Media & Technology Conference. My first reaction is that, to learn the future, a prestigious organization like the BBC might have turned toward the non-BBC world. The Beeb disagreed and looked to its staff to illuminate the gloomy passages of Christmas Yet to Come. You can read this essay “Search and Content Discovery” here. In fact, you must read it.

With enthusiasm I read the essay. Several points flew from the page directly into the dead letter office of my addled goose brain. There these hot little nuggets sat until I could approach them in safety. Here are the points that cooked my thinking:

  1. Key word search is brute force search.
  2. Yahoo BOSS is a way to embrace and extend search
  3. The Xoogler Cuil.com system looked promising but possibly disappoints
  4. Viewdle facial recognition software is prescient. (This is an outfit hooked up with Thomson Reuters, known for innovation by chasing markets before the revenue base crumbles away. I don’t associate professional publishers with innovation, however.)
  5. Naver from Korea is a super electronic game portal.
  6. Mahalo is a human-mediated system and also interesting, and the BBC has a topics page which also looks okay
  7. SearchMe, also built by Xooglers, uses a flash-based interface.

searchmeresults

Xooglers are inspired by Apple’s cover flow. Now how many hits did my query “beyond search” get? Can your father figure out how to view the next hit or make this one large enough to read? A brute force way to get information, of course.

These points were followed by this statement:

When you marry solid data and indexing (everyone forgets that Google’s code base is almost ten years old), useful new data points (facial recognition, behavioral targeting, historical precedent, trust, etc) with a compelling and useful user experience, we may see some changes in the market leadership of search.

I would like to comment on each of these points:

Read more

SearchCloud: Term Weighting Arrives

August 1, 2008

Yahoo’s BOSS (Build Your Own Search Service) has caught the attention of a number of companies in the information retrieval sector. A happy quack to the reader who alerted me to SearchCloud.net, a BOSS user.

SearchCloud.net, according to KillerStartUps, allows the user to weight certain terms:

The hook used by SearchCloud is providing users with the ability to weight the importance of keywords by changing the size of the fonts. Theoretically, this should allow for more accurate search results and the ability to search within given Web sites by simply placing the site name in a big font and the topic in a smaller one. While it is a great idea with a lot of potential, testing of the engine brought back very mixed results and the interface is not very well-designed. Searching for “Killerstartups” in a large font and “Cuil” in a smaller one did bring back a number of Killerstartups related pages but none with “Cuil” referenced.

You can read the KillerStartUps review here. When I talked with the developers of SearchCloud.net, the team pointed out that the KillerStartUps search would have returned better results had KillerStartUps reversed its weightings. The most specific search terms should be weighted higher by using larger letters. Here’s an example:

my weighted query

You can see the weights I assigned to each of my query terms. A larger font means the term has more weight in the query.
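To make the weighting idea concrete, here is a minimal Python sketch of term-weighted scoring. It is not SearchCloud.net's algorithm, which is not described here. The function name `weighted_score`, the numeric weights standing in for font sizes, and the two sample documents are all assumptions for illustration.

```python
from collections import Counter


def weighted_score(query_weights, document_text):
    """Sum each query term's weight times its count in the document."""
    counts = Counter(document_text.lower().split())
    return sum(weight * counts[term.lower()]
               for term, weight in query_weights.items())


# Hypothetical query: a larger number stands in for a larger font.
query = {"cuil": 3.0, "killerstartups": 1.0}

# Hypothetical documents, invented for this example.
docs = {
    "a": "killerstartups reviews startups killerstartups news",
    "b": "killerstartups reviews cuil the new search engine",
}

# Rank document names from highest to lowest weighted score.
ranked = sorted(docs, key=lambda name: weighted_score(query, docs[name]),
                reverse=True)
```

With the more specific term weighted higher, document "b" (the one that mentions both terms) ranks first. Swap the two weights and document "a" moves to the top, which is roughly the effect the SearchCloud.net team described.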

You can see that I put the terms I wanted to emphasize in larger letters using the selector button above the cloud. You may also be interested in a contrarian review of SearchCloud.net on TechCrunch here. I am tipping toward the positive with regard to this new service.

I found SearchCloud.net intuitive, and the system allows me to control the importance of certain terms in my query. For example, let’s take a query I ran this morning for a client about Google’s mobile search results.

I saw a report from South Africa that suggested Google was delivering a “mash up” of results from different Google indexes. I needed to locate information about this alleged Google function. You can read about what I learned here. I found SearchCloud.net–despite some start up rough edges–quite useful.

search cloud grid

The tag cloud appears to the left of the results list. I have selected the grid display of results. I can scroll through a large number of relevance ranked hits very quickly. This is a useful interface option.

SearchCloud.net, like Kartoo.com, exploits Adobe technology to good effect.

There are some functions that I would like to see the SearchCloud.net team add; for example, in the results view, I want to be able to fiddle with the term weights and see the results rerank themselves. My hunch is that this function will be implemented, but like most start ups, SearchCloud.net must husband its resources.

When I spoke with the young-at-heart owners of SearchCloud.net, I was impressed with their candor and willingness to listen to my questions and suggestions. Right now, the company is self-funded and based in Milwaukee, Wisconsin. Ads are one of the revenue sources the team is discussing at this time.

Steven Eisenhauer, president, told me:

We would like to see the major players in the industry realize that the user is smart enough to control the parameters of their searches. It would be nice to see Google or Yahoo integrate our technology as an option for their users.

Milwaukee is known for beer, not investment banks. If you want to own a piece of a search company, maybe you could contact SearchCloud.net at info at searchcloud dot net?

SearchCloud.net shows considerable promise, and I have long been skeptical of Adobe’s Web technology. I may have to soften my stance based on what the SearchCloud.net wizards have been able to accomplish with Flex. I have added this company to my watch list.

Stephen Arnold, August 1, 2008

Collective Intelligence Anthology Available

May 14, 2008

The Arnoldit.com mascot admires the new collection of essays edited by Mark Tovey. Collective Intelligence: Creating a Prosperous World at Peace, published by the Earth Intelligence Network in Oakton, Virginia (ISBN-13: 978-0-97-15661-6-3), contains more than 50 essays by analysts, consultants, and intelligence practitioners. You can obtain a copy from the publisher, Amazon, or your bookseller.

ci_art_02 copy

The ArnoldIT mascot completed reading the 600-page book with remarkable alacrity for a duck.

The collection of essays is likely to find many readers among those interested in the social phenomena of networks. Many of the essays, including the one I contributed, talk about information retrieval in our increasingly interconnected world.

This essay will provide a synopsis of my contribution, “Search–Panacea or Play. Can Collective Intelligence Improve Findability”, which I wrote shortly before completing “Beyond Search: What to Do When Your Search System Doesn’t Work”. My essay begins on page 375.

Social Search

The dominance of Google forces other vendors to look for a way over, under, around, or through its grip on the Web search. The vendor landscape now offers search and content processing systems that arguably do a better job of manipulating XML (Extensible Markup Language) content, figuring out who knows whom (the social graph initiative), and the “real” meaning of content (semantic search). There are more than 100 vendors who have technology that offers, if one believes the marketing collateral and conference presentations, a way to squeeze more information from information.

Social search is the name given to an information retrieval system that incorporates one or more of these functions:

  1. Users can suggest useful sites. Examples: Delicious.com and StumbleUpon.com
  2. The system discovers relationships between and among processed documents and links: Powerset.com and Kartoo Visu
  3. The system analyzes information, extracts entities, and identifies individuals and their relationships: i2 Ltd (now part of ChoicePoint) and Cluuz.com
  4. Monitoring of user behavior and using data to guide relevance, spidering and other system functions: public Web indexing companies

There are other types of social functions, but these provide sufficient salt and pepper for this information side dish. The reason I say side dish is that social functions are not going to displace the traditional functions on which they are based. Social search has been in the mainstream from the moment i2 Ltd. introduced its workbench product to the intelligence community more than a decade ago. “Social” functions, then, are a recent add-on to the main diet in information retrieval.

Old Statistics and Cheap, Powerful Computers

What’s overlooked in the rush to find a Google “killer” is that the new companies are using some well-known technologies. For example, the inner workings of Autonomy’s “black box” are somewhat dependent on the work of a slightly unusual Englishman, Thomas Bayes. Mr. Bayes left the world a couple of centuries ago, but his math has been a staple in college statistics courses for many years. To deploy Bayesian techniques on a large scale is, therefore, not exactly a secret to the thousands of mathematicians who followed his proofs in pursuit of their baccalaureate.
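For readers who skipped that statistics course, here is Bayes' rule as a few lines of Python. This is only the textbook formula applied to invented numbers, not a description of how Autonomy or any other vendor uses it. The function name `posterior` and the probabilities are assumptions for illustration.

```python
def posterior(prior, likelihood, likelihood_if_not):
    """Bayes' rule: P(H | E) from P(H), P(E | H), and P(E | not H)."""
    evidence = likelihood * prior + likelihood_if_not * (1.0 - prior)
    return likelihood * prior / evidence


# Hypothetical numbers: 10% of documents are relevant; a given term appears
# in 80% of relevant documents and in 20% of the rest.
p_relevant_given_term = posterior(prior=0.10,
                                  likelihood=0.80,
                                  likelihood_if_not=0.20)
```

Under these made-up figures, seeing the term raises the estimated probability of relevance from 10 percent to roughly 31 percent. Applying the same update over many terms and many documents is where the cheap, powerful computers come in.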

Read more

Kartoo’s Visu: Semantic Search Plus Themescape Visualization

May 11, 2008

In England in December 2007, I saw a brief demonstration of Kartoo.com’s “thematic map”, which was announced in 2005.

The genesis of the company was its relationships with large publishing groups in 1997. Mr. Baleydier was working to make CD-ROMs easily searchable. The company was founded in 2001 by Laurent and Nicholas Baleydier to provide a more advanced search interface. You can find out more about the company at Kartoo.net. Kartoo S.A. offers a no-charge metasearch Web system at Kartoo.com.

The original Kartoo service was one of the first to use dynamic graphics for Web search. Over the last few years, the interface became more refined. But the system presented links in the form of dynamic maps. Important Web sites were spherical, and the spheres were connected by lines. Here’s an example of the basic Kartoo interface as it looked on May 11, 2008, for the query “semantic search” run against the default of English Web sites. (The company also offers Ujiko.com, which is worth a quick look. The interface is a bit too abstract for me. You can try it here.)

defaultresultsonmay2008

The dark blue “ink blots” connect related Web sites. The terms provide an indication of the type of relationship between or among Web sites. You can click on this interface and explore the result set and perform other functions. Exploration of the interface is the best way to explore its features. Describing the mouse actions is not as effective as playing with the system.

Another company–Datops SA–was among the first to use interesting graphic representations of results. I recall someone telling me that the spheres that once characterized Groxis.com’s results had been influenced by a French wizard. Whether justified or not, when I saw spheres and ink blots, I said to myself, “Ah, another vendor influenced by French interface design”. In talking with people who use visualizations to help their users understand a “results space”, I’ve had mixed feedback. Some people love impressionistic representations of results; others don’t. Decades ago I played a small role in the design of the F-15 interface or heads-up display. The one lesson I learned from that work was that under pressure, interfaces that offer too many options can paralyze reaction time. In combat, that means the pilot could be killed trying to figure out what a graphic means. In other situations, where a computational chemist is trying to make sense of 100,000 possible structures, a fine-grained visualization of the results may be appropriate.

Read more

Who Quaeros?

March 11, 2008

Europe is concerned about Google.

When I was in Denmark in November 2006, I learned that about 85 percent of the country’s search traffic was a result of Google searches. I think Google has increased its share of traffic in Denmark to Germany’s level. For those of you not paying attention, Google drives about 90 percent of the traffic in Deutschland.

There are two initiatives under way to “kill Google.” The first is Quaero, a French initiative. You can read about it here. The second is a German-flavored project called THESEUS, which received funding a year ago. My understanding is that Fast Search & Transfer is in the saddle for the THESEUS project, but my information may be stale. The French, not to be outdone, have routed money to Quaero. Check out this story: “France Cleared to Fund Search Project”. Here’s a snippet:

France won EU approval Tuesday to give $152 million to several companies hoping to build a European rival to U.S. search giant Google Inc. … The commission said the grant would not give Thomson market power because rivals will likely keep up their investment in research and development. It cleared the German government to give $165 million to the German arm of the project, called THESEUS. That money will fund “icebreaker” companies — Siemens AG, SAP AG, Deutsche Thomson oHG and EMPOLIS GmbH, owned by Bertelsmann AG — to kick start research. The aid will later spread to smaller firms.

Am I misreading this? My work has publicized some of France’s most promising search and content processing companies. So, what do you do if you are German or French? Replicate the Silicon Valley VC environment? Alter the tax laws? Reduce bureaucratic red tape? Encourage university incubators? Nah, too complex. Just give the money to industrial giants and tell them “build a better Google”.

In my experience, governments dumping money on industrial giants leads to predictable outcomes. Those that come to my mind include Halliburton’s contributions in Iraq, IBM’s work to implement the Documentum content management system for the US Senate, and the numerous reengineerings of the Internal Revenue Service’s computer systems.

Look at these to refamiliarize yourself with French engineering and computer science:

What puzzles me is how will France figure out which of these companies will get a wedge of euros to “kill Google”. What will the Thomson oHG operation do with French wizards who are hacking away in un dortoir? Probably nothing.

With French venture capital forcing some French entrepreneurs to leave France for such places as — gasp! — England, I hope some of the euros feather the nests of young entrepreneurs.

The battle lines are drawn. The German “icebreakers” Siemens AG, SAP AG, Deutsche Thomson oHG and EMPOLIS GmbH, owned by Bertelsmann AG will try to crush Google and, of course, France. The French companies will try to “kill Google” and turn off the power for THESEUS, of course.

If we factor these battle lines, you will notice that I think Google will chunk forward, allowing the icebreakers to smash and crunch forward.

For those of you who don’t know what a European icebreaker looks like, take a gander. Do you think this can smash over Google? Will these efforts run aground? I will watch the progress closely and plot the activity on Google Maps, until Google is crushed, that is.

Stephen Arnold, March 12, 2008

Search: The Problem with Words and Their Misuse

January 30, 2008

I rely on several different types of alerts, including Yahoo’s service, to keep pace with developments in what I call “behind the firewall search”.

Today was particularly frustrating because the number of matches for the word “search” has been increasing, particularly since the Microsoft – Fast Search & Transfer acquisition and the Endeca cash injection from Intel and SAP. My alerts contain a large number of hits, and I realized that most of these are not about “behind the firewall” search, nor chock full of substantive information. Alerts are a necessary evil, but over the years, the primitive key word indexing offered by free services doesn’t help me.

The problem is the word search and its use or misuse. If you know of better examples to illustrate these types of search, please, post them. I’m interested in learning about sites and their search technology.

I have a so-so understanding of language drift, ambiguity, and POM (plain old marketing) work. For someone looking for information about search, the job is not getting easier. In fact, search has become such a devalued term that locating information about a particular type of search requires some effort. I’ve just finished compiling the Glossary for “Beyond Search”, due out in April 2008 from the Gilbane Group, a high-caliber outfit in the Boston, Massachusetts area. So, terminology is at the top of my mind this morning.

Let’s look at a few terms. These are not in alphabetical order. The order is by their annoyance factor. The head of the list contains the most annoying terms to me. The foot of the list are terms that are less offensive to me. You may not agree. That’s okay.

Vertical search. Number one for 2008. Last year it was in second place. This term means that a particular topic or corpus has been indexed. The user of a vertical search engine like Sidestep.com sees only hits in the travel area. As Web search engines have done a better and better job of indexing horizontal content — that is, on almost every topic — vertical search engines narrow their focus. Think deep and narrow, not wide and shallow. As I have said elsewhere, vertical search is today’s 20-somethings rediscovering how commercial databases handled information in the late 1970s with success then but considerably less success today.

Search engine marketing. This is last year’s number one. Google and other Web engines are taking steps to make it harder to get junk sites to the top of a laundry list of results. This phrase search engine marketing is the buzzword for the entire industry of getting a site on the first page of Google results. The need to “rank high” has made some people “search gurus”. I must admit I don’t think too much of SEM, as it is called. I do a reasonable job of explaining SEM in terms of Google’s Webmaster guidelines. I believe that solid content is enough. If you match that with clean code, Web indexing bots will index the information. Today’s Web search systems do a good job of indexing, and there are value-added services such as Clusty.com that add metadata, whether the metadata exists on the indexed sites or not. When I see the term search used to mean SEM, I’m annoyed. Figuring out how to fool Google, Microsoft Live.com, or Yahoo’s indexing systems is not something that is of much interest to me. Much of the SEM experts’ guidance amounts to repeating Google’s Webmaster guidelines and fiddling with page elements until a site moves up in the rankings. Most sites lack substantive content and deserve to be at the bottom of the results list. Why do I want to have in my first page of results a bunch of links to sites without heft? I want links to pages significant enough to get to the top of the results list because of solid information, not SEM voodoo. For basics, check out “How Stuff Works.”

Guided, faceted, assisted, and discovery search. The idea that is difficult to express in words and phrases is a system that provides point-and-click access to related information. I’ve heard a variation on these concepts expressed as drill-down search or exploratory search. These are 21st-century buzzwords for “Use For” and “See Also” references. But by the time a vendor gets done explaining taxonomies, ontologies, and controlled term lists, the notion of search is mired in confusion. Don’t get me wrong. Rich metadata and exposed links to meaningful “See Also” and “Use For” information are important. I’m just burned out with companies using these terms when their technology can’t deliver.

Enterprise search. I do not know what “enterprise search” is. I do know that there are organizations of all types. Some are government agencies. Some are non-profit organizations. Some are publicly-traded companies. Some are privately held companies. Some are professional services corporations. Some are limited liability corporations. Each has a need to locate electronic information. There is no one-size-fits-all content processing and retrieval system. I prefer the phrase “behind the firewall search.” It may not be perfect, but it makes clear that the system must function in a specific type of setting. Enterprise search has been overused, and it is now too fuzzy to be useful from my point of view. A related annoyance is the word “all”. Some vendors say they can index “all the organization’s information.” Baloney. Effective “behind the firewall” systems deliver information needed to answer questions, not run afoul of federal regulations regarding health care information, incite dissatisfaction by exposing employee salaries, or let out vital company secrets that should be kept under wraps.

Natural language search. This term means that the user can type a question into a system. A favorite query is, “What are the car dealerships in Palo Alto?” You can run this query on Google or Ask.com. The system takes this “natural language question”, converts it to Boolean, and displays the results. Some systems don’t do anything more than display a cached answer to a frequently asked question. The fact is that most users–exceptions include lawyers and expert intelligence operatives–don’t do “natural language queries”. Most users type some words like weather 40202 and hit the Enter key. NLP sounds great and is often used in the same sentence with latent semantic indexing, semantic search, and linguistic technology. These are useful technologies, but most users type their 2.3 words and take the first hit on the results list.
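The “converts it to Boolean” step can be sketched in a few lines of Python. This is a deliberately crude illustration, not how Google, Ask.com, or any other engine does it. The stop-word list and the function name `to_boolean` are assumptions made up for this example.

```python
import re

# A tiny, hypothetical stop-word list; real systems use far larger ones.
STOP_WORDS = {"what", "are", "the", "in", "is", "a", "an", "of", "to", "for"}


def to_boolean(question):
    """Drop stop words and join the remaining terms with AND."""
    terms = re.findall(r"[a-z0-9]+", question.lower())
    keywords = [term for term in terms if term not in STOP_WORDS]
    return " AND ".join(keywords)


boolean_query = to_boolean("What are the car dealerships in Palo Alto?")
# boolean_query == "car AND dealerships AND palo AND alto"
```

Note that a query like weather 40202 passes through this kind of filter almost unchanged, which is one way to see why most users skip the full sentence.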

Semantic search. See natural language search. Semantic technologies are important and finally practical in everyday business operations. Used inside search systems, today’s fast processors and cheap storage make it possible to figure out some nuances in content and convert those nuances to metatags. It’s easier for vendors to bandy about the terms semantic and Semantic Web than to explain what they deliver in terms of precision and recall. There are serious semantic-centric vendors, and there are a great many who use the phrase because it helps make sales. An important vendor of semantic technology is Siderean Software. I profile others in “Beyond Search”.

Value-added search. This is a coinage that means roughly, “When our search system processes content, we find and index more stuff.” “Stuff”, obviously, is a technical word that can mean the file type or concepts and entities. A value-added search system tries to tag concepts and entities automatically. Humans used to do indexing but there is too much data and not enough skilled indexers. So, value-added search means “indexing like a human used to do.” Once a result set has been generated, value-added search systems will display related information; that is, “See Also” references. An example is Internet the Best. Judge for yourself if the technique is useful.

Side search. I like this phrase. It sounds nifty and means nothing to most people in a vendor’s marketing presentation. What I think the vendors who use this term mean is additional processes that run to generate “Use For” and “See Also” references. The implication is that the user gets a search bonus or extra sugar in their coffee. Some vendors have described a “more like this” function as a side search. The idea is that a user sees a relevant hit. By clicking the “more like this” hot link, the system uses the relevant hit as the basis of a new, presumably more precise, query. A side search to me means any automatic query launched without the user having to type in a search box. The user may have to click the mouse button, but the heavy lifting is machine-assisted. Delicious offers a side search labeled as related terms. Just choose a tag from the list on the right side of the Web page, and you see more hits like these. The idea is that you get related information without reentering a query.
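A “more like this” function can be approximated by pulling the most frequent terms from the relevant hit and using them as the new query. The Python sketch below assumes that simple frequency approach; commercial systems presumably do something more refined. The function name `more_like_this`, the stop-word list, and the sample text are hypothetical.

```python
from collections import Counter

# A tiny, hypothetical stop-word list for the example.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "for", "on"}


def more_like_this(hit_text, max_terms=3):
    """Build a new query from the most frequent non-stop-word terms in a hit."""
    words = [w for w in hit_text.lower().split() if w not in STOP_WORDS]
    return [term for term, _ in Counter(words).most_common(max_terms)]


# Invented text standing in for the hit the user clicked.
hit = "semantic search and semantic web tools for semantic search"
new_query = more_like_this(hit, max_terms=2)
# new_query == ["semantic", "search"]
```

The user clicks once; the system builds and runs the follow-up query, which is the machine-assisted heavy lifting described above.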

Sentiment search. I have just looked at a new search system called Circos. This system lets me search in “color”. The idea is that emotions or feelings can be located. People want systems that provide a way to work emotion, judgment, and nuance into their results. Lexalytics, for example, offers a useful, commercial system that can provide brand managers with data about whether customers are positive or negative toward the brand. Google, based on their engineering papers, appears to be nosing around in this sentiment search as well. Worth monitoring because using algorithms to figure out if users like or dislike a person, place, or thing can be quite significant to analysts.

Visual search. I don’t know what this means. I have seen the term used to describe systems that allow the user to click on pictures in order to see other pictures that share some colors or shapes of the source picture. If you haven’t seen Kartoo, it’s worth a look. Inxight Software offers a “search wall”. This is a graphic representation of the information in a results list or a collection as a three-dimensional brick wall. Each brick is a content object. I liked the idea when I first saw it five or six years ago, but I find visual search functionality clunky. Flying hyperbolic maps and other graphic renderings have sizzle, but instead of steak I get boiled tofu.

Parametric search. Structured search or SQL queries with training wheels are loose synonyms for parametric search and close enough for horse shoes. The term parametric search has value, but it is losing ground to structured search. Today, structured data are fuzzed with unstructured data by vendors who say, “Our system supports unstructured information and structured data.” Structured and unstructured data are treated as twins, thus making it hard for a prospect to understand what processes are needed to achieve this delightful state. These data can then be queried by assisted, guided, or faceted search. Some of the newer search systems are, at their core, parametric systems. These systems are not positioned in this way. Marketers find that customers don’t want to be troubled by “what’s under the hood.” So, “fields” become metatags, and other smoothing takes place. It is no surprise to me that content processing procurement teams struggle to figure out what a vendor’s system actually does. Check out Thunderstone’s offering and look for my Web log post about parametric (structured search) in a day or two. In Beyond Search, I profile two vendors’ systems each with different but interesting parametric search functionality. Either of these two vendors’ solutions can help you deal with the structured – unstructured dichotomy. You will have to wait until April 2008 when my new study comes out. I’m not letting these two rabbits out of my hat yet.
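Stripped of the marketing, a parametric query is a filter on named fields. The Python sketch below shows that idea on a handful of invented records; it is not modeled on Thunderstone or any other vendor's system. The function name `parametric_search` and the field names are assumptions for illustration.

```python
# Hypothetical records with named fields ("metatags", if you prefer).
records = [
    {"title": "Q1 sales report", "author": "lee", "year": 2007},
    {"title": "Q1 sales report", "author": "lee", "year": 2008},
    {"title": "Patent summary", "author": "kim", "year": 2008},
]


def parametric_search(items, **criteria):
    """Return the records whose fields equal every supplied field=value pair."""
    return [item for item in items
            if all(item.get(field) == value
                   for field, value in criteria.items())]


hits = parametric_search(records, author="lee", year=2008)
# hits contains only the 2008 report by "lee"
```

This is the “SQL with training wheels” idea in miniature: the equivalent of a WHERE clause, with the field names exposed to the user as point-and-click options.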

Unstructured search. This usually implies running a query against text that has been indexed for its key words because the source lacks “tags” or “field names”. Email, PDFs, and some Word documents are unstructured. A number of content processing systems can also index bound phrases like “stock market” and “white house”. Others include some obvious access points such as file types. Today, unstructured search blends into other categories. But unstructured search has less perceived value than flashier types of search or a back office ERP (enterprise resource planning) application. Navigate to ArnoldIT.com and run a query in my site’s search box. That’s an unstructured search, provided by Blossom Software, which is quite interesting to me.
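Key word indexing with bound phrases can be sketched as a small inverted index in Python. This is a minimal illustration under stated assumptions, not a description of Blossom Software's system or any other product. The function name `build_index`, the phrase list, and the sample documents are hypothetical.

```python
from collections import defaultdict


def build_index(docs, bound_phrases=()):
    """Map each key word, and each listed bound phrase, to matching doc ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        words = text.lower().split()
        for word in words:
            index[word].add(doc_id)
        for phrase in bound_phrases:
            parts = phrase.lower().split()
            n = len(parts)
            # The phrase matches only when its words appear side by side.
            if any(words[i:i + n] == parts
                   for i in range(len(words) - n + 1)):
                index[phrase.lower()].add(doc_id)
    return index


# Invented documents with no tags or field names.
docs = {
    1: "the stock market fell",
    2: "market stalls stock fruit",
    3: "the white house issued a statement",
}
index = build_index(docs, bound_phrases=["stock market", "white house"])
# index["market"] == {1, 2}; index["stock market"] == {1}
```

Document 2 contains both “stock” and “market” but not the bound phrase, so a phrase query skips it. That small difference is much of what separates a phrase-aware index from plain key word matching.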

Hyperbolic search. There are many variations of this approach which is called “buzzword fog”. Hyperbolic geometry and modular forms play an important role in some vendors’ systems. But these functions are locked away out of sight and away from fiddling by licensees. When you hear terms other than plain English, you are in the presence of “fog rolling in on little cat’s feet.” The difference is that this fog doesn’t move on. You are stuck in an almost-impenetrable mist. When you see the collision coming, it is almost always too late to avoid it. I think the phrase means, “Our engineers use stuff I don’t understand, but it sure sounds good.”

Intuitive search. This is a term used to suggest that the interface is easy enough for the marketer’s mother to use without someone telling her what to do. The interface is one visible piece of the search system itself. Humans like to look at interfaces and debate which color or icon is better for their users. Don’t guess on interfaces. Test different ones and use what gets the most clicks. Interfaces that generate more usage are generally better than interfaces designed by the senior vice president’s daughter who just graduated with an MFA from the University of Iowa. Design opinion is not search; it’s technology decoration. For an example, look at this interface from Yahoo. Is it intuitive to you?

Real-time search. This term means that the content is updated frequently enough to be perceived as real time. It’s not. There is latency in search systems. The word “search,” therefore, doesn’t mean real-time by definition. Feed means “near real time”. There are a lot of tricks to create the impression of real time. These include multiple indexes, caching, content boosting, and time stamp fiddling. Check out ZapTXT. Next compare Yahoo News, AllTheWeb.com news, and Google News. Okay, which is “real time”? Answer: none.
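
One way to see the latency for yourself is to compare when an item was published with when it first became findable. The sketch below does that arithmetic. The timestamps and the 60 second threshold for "feels like real time" are assumptions picked for this example, not figures from any vendor.

```python
from datetime import datetime, timedelta

# Assumed threshold below which a user might perceive results as "real time".
PERCEIVED_REAL_TIME = timedelta(seconds=60)


def index_latency(published_at, indexed_at):
    """Time between publication and the moment the item became searchable."""
    return indexed_at - published_at


def feels_real_time(published_at, indexed_at, threshold=PERCEIVED_REAL_TIME):
    """True if the indexing delay is at or under the assumed threshold."""
    return index_latency(published_at, indexed_at) <= threshold


# Hypothetical timestamps for one news item.
published = datetime(2008, 1, 30, 9, 0, 0)
indexed = datetime(2008, 1, 30, 9, 4, 30)
latency = index_latency(published, indexed)
```

Here the item took four and a half minutes to show up in the index, so it fails the assumed 60 second test. The delay is never zero, which is the point of the paragraph above.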

Audio, video, image search. The idea is that a vendor indexes a particular type of non-text content. The techniques range from indexing only metadata and not the information in the binary file to converting speech to ASCII, then indexing the ASCII. In Japan, I saw a demonstration of a system that allowed a user to identify a particular image, for example, a cow. The system then showed pictures the system thought contained cows. These types of search systems address a real need today. The majority of digital content is in the form of digitized audio, video, and image files. Text is small potatoes. We don’t do a great job on text. We don’t do very well at all on content objects such as audio, video, and images. I think Blinkx does a reasonably good job, not great, reasonable.

Local search. This is a variation on vertical search. Information about a city or particular geographic area is indexed and made available. This is Yellow Pages territory. It is the domain of local newspaper advertising. A number of vendors want to dominate this sector; for example, Google, Microsoft, and Yahoo. Incumbents like telcos and commercial directory firms aren’t sure what actions to take as online sites nibble away at what was a $32 billion paper directory business. Look at Ask City. Will this make sense to your children?

Intelligent search. This is the old “FOAI” or familiar old artificial intelligence. Most vendors use artificial intelligence but call it machine learning or computational intelligence. Every major search engine uses computational intelligence. Try Microsoft’s Live.com. Now try Google’s “ig” or Individualized Google service. Which is relying more on machine learning?

Key word search. This is the ubiquitous, “naked” search box. You can use Boolean operators, or you can enter free text and perform a free text search. Free text search means no explicit Boolean operators are required of a user. Enlightened search system vendors add an AND to narrow the result set. Other system vendors, rather unhelpfully, add an OR, which increases the number of results. Take a look at the key word search from Ixquick, an engine developed by a New York City investment banker and now owned by a European company. What’s it doing to your free text query?
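
The difference between an implied AND and an implied OR is easy to show with set arithmetic. The sketch below is a toy, not any vendor's query engine. The `search` function, the tiny index, and the document numbers are assumptions invented for this example.

```python
def search(index, query, operator="AND"):
    """Run a free text query, joining the terms with an implied AND or OR.

    `index` maps each term to the set of document ids that contain it.
    """
    terms = query.lower().split()
    postings = [index.get(term, set()) for term in terms]
    if not postings:
        return set()
    if operator == "AND":
        # Every term must appear: intersect the posting sets.
        return set.intersection(*postings)
    # Any term may appear: union the posting sets.
    return set.union(*postings)


# Hypothetical index: term -> ids of documents containing that term.
index = {
    "us": {3, 5},
    "financial": {1, 2, 3},
    "crisis": {2, 3, 4},
}

and_hits = search(index, "US financial crisis", operator="AND")
or_hits = search(index, "US financial crisis", operator="OR")
```

With this toy index, the implied AND returns one document and the implied OR returns five. Same query, very different result sets.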

Search without search. Believe me, this is where the action is. The idea is that a vendor, for example Google, will use information about information, user behavior, system processes, and other bits and pieces of data to run queries for a user automatically and in the background. Then when the user glances at his / her mobile device, the system is already displaying the information most likely to be wanted at that point in time by that user. An easy way to think of this is to imagine yourself rushing to the airport. The Google approach would look at your geospatial coordinates, check your search history, and display flight departure delays or parking lot status. I want this service because anyone who has ridden with me knows that I can’t drive, think about parking, and locate my airline reliably. I can’t read the keyboard on my mobile phone, so I want Google to convert the search result to text, call me, and speak the information as I try to make my flight. Google has a patent application with the phrase “I’m feeling doubly lucky.” Stay tuned to Google and its competitors for more information on this type of search.

This short list of different types of search helps explain why there is confusion about which systems do what. Search is no longer something performed by a person trained in computer science, information science, or a similar discipline. Search is something everyone knows, right? Wrong. Search is a service that’s readily available and used by millions of people each day. Don’t confuse using an automatic teller machine with understanding finance. The same applies to search. Just because a person can locate information about a subject does not mean that person understands search.

Search is among the most complex problems in computer science, cognitive psychology, information retrieval, and many other disciplines. Search is many things, but it definitely is not easy, well understood, or widely recognized as the next application platform.

Stephen Arnold, January 30, 2008