
Concept Searching Update

July 3, 2009

Founded in 2002, Concept Searching provides licensees with search, auto-classification, taxonomy management and metadata tagging solutions. You can download a fact sheet about the privately held firm here. The software can be used on an individual user’s computer or mounted on servers to deliver enterprise solutions. The company’s secret sauce is its statistical metadata generation and classification method. The technology uses concept extraction and compound term processing to facilitate access to unstructured information. The company operates from Stevenage in Hertfordshire. A list of the Concept Searching offices is here.

The company emphasizes the value of lateral thinking, and its approach to content analysis implements numerical recipes to find these insights and linkages within unstructured text.

When I updated my profile for this company earlier this year, I noted that the firm had signed Portal Solutions, a company that focuses on things Microsoft. The idea is to make it possible for a user to search for “insider dealing” and retrieve documents where that bound phrase does not appear but a related phrase such as “insider trading” does appear. This type of system appeals to intelligence officers and financial analysts. Concept Searching’s methods generated lists of related topics. You can see an example of the system in action by navigating to this page. I ran several test queries and the interface provided useful information and suggestions about other related content in the processed corpus. A screen shot of the output appears below:

concept hmso

Concept Searching is a Microsoft and Fast Search partner. The idea is that Concept Searching’s technology complements and in some cases extends the search and content processing services in Microsoft products. In May 2009, the company sponsored a best practices site for Microsoft SharePoint. The deal involves a number of companies, including SchemaLogic, KnowledgeLake, and K2 Technologies among others. The site is supposed to go live in the next couple of weeks, but I don’t have a URL or a date at this time.

The company had a busy May, signing deals with Allianz Global Investors, Directory, and AT&T Government Solutions.

For me, the most interesting system that Concept Searching offers is its ability to generate and classify terms found in SharePoint documents into a taxonomy. The company has prepared a brief video that demonstrates this functionality. You can find the video here. The company’s approach does not require a separate index. Microsoft Enterprise Search can use the outputs of the Concept Searching system. I noted two “uniques” in the narrative to the video, and I remain skeptical about categorical affirmatives. I think the bound phrase extraction and the close integration with SharePoint are benefits. I just bristle when I hear “unique”, which means the one and only anywhere in the world. Broad assertion in my experience.

concept searching block diagram

Concept Searching’s president, Martin Garland, said here:

Our intellectual property is still unique as we are the only statistical search technology able to identify multi-word patterns within text and insert these patterns directly into the index at ingestion or creation time. We call this “Compound Term Processing”.
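To make the idea in the quote concrete, here is a minimal Python sketch of how multi-word patterns might be detected statistically and written into an inverted index at ingestion time. It is an illustration only, not Concept Searching's algorithm: the pointwise mutual information score, the thresholds, the sample documents, and the names `compound_terms` and `build_index` are all assumptions of mine.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def compound_terms(docs, min_count=2, min_pmi=1.0):
    """Return adjacent word pairs that co-occur more often than chance.

    Pointwise mutual information (PMI) is an assumed scoring choice;
    the vendor's actual statistics are not described in the source.
    """
    unigrams, bigrams = Counter(), Counter()
    for text in docs:
        tokens = tokenize(text)
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    total_uni = sum(unigrams.values())
    total_bi = sum(bigrams.values())
    found = set()
    for (a, b), count in bigrams.items():
        if count < min_count:
            continue
        p_pair = count / total_bi
        p_a, p_b = unigrams[a] / total_uni, unigrams[b] / total_uni
        if math.log2(p_pair / (p_a * p_b)) >= min_pmi:
            found.add((a, b))
    return found


def build_index(docs):
    """Build an inverted index holding single words and detected compounds."""
    compounds = compound_terms(docs)
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        tokens = tokenize(text)
        for token in tokens:
            index[token].add(doc_id)
        # Compounds are written into the same index at ingestion time.
        for pair in zip(tokens, tokens[1:]):
            if pair in compounds:
                index[" ".join(pair)].add(doc_id)
    return index


docs = [
    "The regulator opened an insider trading inquiry.",
    "Two brokers were charged with insider trading last year.",
    "The trading floor was quiet and the insider left early.",
]
index = build_index(docs)
print(sorted(index["insider trading"]))  # [0, 1]
```

The third document contains both words but not the bound phrase, so it is indexed under "insider" and "trading" separately and does not appear under the compound term.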

Last week I sat in a briefing given by a member of Microsoft’s enterprise search team. I thought I heard descriptions of functions that struck me as quite similar to those performed by Concept Searching and such companies as Interse in Copenhagen, Denmark.

I think it will be fruitful to watch what features and functions are baked into the upcoming Microsoft Fast ESP version of the old Fast Search & Transfer system. Remember: the roots of Fast Search stretch deep to 1997, a year before Google poked its nose from the Stanford baby crib.

Partners like Concept Searching have invested significant resources in Microsoft technologies. Will Microsoft respect these investments, or will Microsoft, in an effort to recoup its $1.23 billion investment, take a hard line toward such companies as Concept Searching?

I am on the fence regarding this issue.

Stephen Arnold, July 3, 2009

Search Sucks: A Mini Case

June 30, 2009

I listened occasionally to the Gillmor Gang when it was available on iTunes. I noticed that the program disappeared, and I lost track of it. My RSS reader snagged a story about a verbal shoot out between the one man TV network Leo Laporte and one of the participants in the Gillmor Gang. To make a long and somewhat confused story short, the show disappeared. I figured this would be a good topic to use to test Bing.com and Google.com. My premise was that neither service would be indexing the type of information about flaps in the wobbly world of real time content on the rich media Web.

I ran the query Gilmore Gang on Google and finally found a link to a story published on June 13, 2009, called “Hanging on for Dear Life.” The problem with the Google results was that the top rated links were just plain wrong in terms of answering my query. Granted, I used a two word query, and I was purposely testing the Google system to see if it was sufficiently “smart” to figure out that I wanted current and accurate information. Well, in my opinion, it was like a promising student who stayed up late and did not do his homework. Here is the result list Google generated for me on June 28, 2009:

google hits gilmor

The result I wanted I found using other tools.


Google and Image Recognition

June 29, 2009

Not content with sophisticated image compression, Google continues to press forward in image recognition. Face recognition surfaced about a year ago. You can get some background about that home-grown technology in “Identifying Images Using Face Recognition”, US2008/0130960, filed in December 2006. The company has a long history of interest in non text objects. If you are not familiar with it, Larry Page’s invention “Method for Searching Media”, US2004/0122811, was filed in 2003.

app of face recogniton

Source: Neven Technologies, 2006

The catalyst for the missing link between auto identified and processed images and assigning meaningful tags to images such as “animal” or “automobile” arrived via Google’s purchase of Neven Vision. (Originally, I think, the company used the “Eyematic” name. The switch seems to have taken place in 2003 or 2004.)

At that time, All Business described the company in this way:

Neven Vision purchased Eyematic’s assets in July 2003. Dr. Hartmut Neven, one of the world’s leading machine vision experts, led the technical team that created the original Eyematic system. Dr. Neven is also developing groundbreaking “next generation” face and object recognition technologies at USC’s Information Sciences Institute (ISI).

With the acquisition, Google snagged the Eyematic patent documents. These make interesting reading, and I direct your attention to “Face Recognition from Video Images”, US6301370, which seems to be part of the Neven technology suite. The US patent document is – ah, somewhat disjointed.

Mixing Picasa, home grown technology, and the image recognition technology from Neven, Google had the ingredients for tackling a tough problem in content processing; namely, answering the question, “What’s that a picture of?”

Google provided some information in June 2009. A summary of Google’s image initiative appeared in Silicon.com, which published “Google Gets a New Vision When It Comes to Pictures”. (Silicon.com points to CNet.com which originally ran the story.) Tom Krazit reported:

Google thinks it has made a breakthrough in “computer vision”. Imagine stumbling upon a picture of a beautiful landscape filled with ancient ruins, one you didn’t recognize at first glance while searching for holiday destinations online. Google has developed a way to let a person provide Google with the URL for that image and search a database of more than 40 million geotagged photos to match that image to verified landmarks, giving you a destination for that next trip. The project is still very much in the research stage, said Jay Yagnik, Google’s head of computer vision research.

For me the key point in the Silicon.com story was that Google used its “big data” approach to making headway in image recognition. When matched to technology evolving from the FERET program, Google can disrupt a potentially lucrative sector for some big government integration firms. The idea is that with lots of data, Google’s “smart software” can figure out what an image is about. Tapping Google’s clustering technology, engineers have processed Google’s Picasa image collection to assign meaningful semantic tags to digital objects that don’t contain text.


Arnold at NFAIS: Google Books, Scholar, and Good Enough

June 26, 2009

Speaker’s introduction: The text that appears below is a summary of my remarks at the NFAIS Conference on June 26, 2009, in Philadelphia. I talk from notes, not a written manuscript, but it is my practice to create a narrative that summarizes my main points. I have reproduced this working text for readers of this Web log. I find that it is easier to put some of my work in a Web log than it is to create a PDF and post that version of a presentation on my main Web site, www.arnoldit.com. I have skipped the “who I am” part of the talk and jump into the core of the presentation.

Stephen Arnold, June 26, 2009

In the past, epics were a popular form of entertainment. Most of you have read the Iliad, possibly Beowulf, and some Gilgamesh. One convention is that these complex literary constructs begin in the middle or what my grade school teacher called “in medias res.”

That’s how I want to begin my comments about Google’s scanning project, an epic usually referred to as Google Books. Then I want to go back to the beginning of the story and then jump ahead to what is happening now. I will close with several observations about the future. I don’t work for Google, and my efforts to get Google to comment on topics are ignored. I am not an attorney, so my remarks have zero legal foundation. And I am not a publisher. I write studies about information retrieval. To make matters even more suspect, I do my work from rural Kentucky. From that remote location, I note that Amazon is concerned about Google Books, probably because Google seeks to enter the eBook sector. This story is good enough; that is, in a project so large, so sweeping, perfection is not possible. Pages are skewed. Insects scanned. Coverage is hit and miss. But what other outfit is prepared to spend to scan books?

Let’s begin in the heat of the battle. Google is fighting a number of things. Google finds itself under scrutiny from publishers and authors. These are the entities with whom Google signed a “truce” of sorts regarding the scanning of books. Increasingly libraries have begun to express concern that Google may not be doing the type of preservation job to keep the source materials in a suitable form for scholars. Regulators have taken an interest in the matter because of the publicity swirling around a number of complicated business and legal issues.

These issues threaten Google with several new challenges.

Since its founding in 1998, Google has enjoyed what I would call positive relationships with users, stakeholders, and most of its constituents. The Google Books’ matter is now creating what I would describe as “rising tension”. If the tension escalates, a series of battles can erupt in the legal arena. As you know, battle is risky when two heroes face off in a sword fight. Fighting in a legal arena is in some ways more risky and more dangerous.

Second, the friction of these battles can distract Google from other business activities. Google, as some commentators, including myself in Google: The Digital Gutenberg, have noted, may be vulnerable to new types of information challenges. One example is Google’s absence from the real time indexing sector where Facebook, Twitter, Scoopler.com, and even Microsoft seem to be outpacing Google. Distractions like the Google Books matter could exclude Google from an important new opportunity.

Finally, Google’s approach to its projects is notable because the scope of the project makes it hard for most people to comprehend. Scanning books takes exabytes of storage. Converting images to ASCII, transforming the text (that is, adding structure tags), and then indexing the content takes a staggering amount of computing resources.

image

Inputs to outputs, an idea that was shaped between 1999 and 2001. © Stephen E. Arnold, 2009

Google has been measured and slow in its approach. The company works with large libraries, provides copies of the scanned material to its partners, and has tried to keep moving forward. Microsoft and Yahoo, database publishers, the Library of Congress, and most libraries have ceded the scanning of books work to Google.

Now Google finds itself having to juggle a large number of balls.

Now let’s go back in time.

I have noticed that most analysts peg the Google Books project as starting right before the initial public offering in 2004. That’s not what my research has revealed. Google’s interest in scanning the contents of books reaches back to 2000.

In fact, an analysis of Google’s patent documents and technical papers for the period from 1998 to 2003 reveals that the company had explored knowledge bases, content transformation, and mashing up information from a variety of sources. In addition, the company had examined various security methods, including methods to prevent certain material from being easily copied or repurposed.

The idea, which I described in The Google Legacy (which I wrote in 2003 and 2004 with publication in early 2005), was to gather a range of information and process that information using mathematical methods in order to produce useful outputs like search results for users and to generate information about the information. The word given to describe value added indexing is metadata. I prefer the less common but more accurate term meta indexing.


Lucid Meet Up: Open Source Search Draws Crowd

June 23, 2009

I was in San Francisco the day of the open source Lucene meet up sponsored by Lucid Imagination. The New Idea Engineering Web log wrote a useful summary of what transpired. You can find “Impressions of First Lucene / Solr Meet Up” on the Enterprise Search Blog. Keep in mind that the founders of the Enterprise Search Blog liked the study “Successful Enterprise Search Management” Martin White and I wrote. People who like what I do may have unusual tolerance for addled geese. You have been warned.

I noted the upside and downside of a technical meet up, but I wanted to know more. I chased down David Fishman, one of the spark plugs for Lucid Imagination. You can read an interview with one of the founders of Lucid Imagination, Marc Krellenstein, in the ArnoldIT.com “Search Wizards Speak” series.

I came away from my discussion with Mr. Fishman more than a little impressed. Some of the items that remained pinned to my brain’s search bulletin board warrant sharing.

First, open source is hot. Few information technology professionals want to go to a meeting about search without first hand information about Apache Lucene (http://lucene.apache.org/) and Solr.

Second, Lucid Imagination (www.lucidimagination.com) is gaining traction with its industrial strength approach to the open source search technology that promises relief from the seven figure licensing fees imposed by some of the high profile search and retrieval vendors.

The meet up brought together almost 50 engineers and programmers on June 3. Featured speakers included Grant Ingersoll and Erik Hatcher, both members of the Apache Lucene project development team and co-founders of Lucid Imagination; Mr. Hatcher is also the author of Lucene in Action. Jason Rutherglen and Jake Mannix of LinkedIn talked about how they’ve implemented search at the core of their cutting edge social network. Other speakers covered a wide range of deep search questions, including numeric search, aka Trie range queries. Avi Rappoport, a search consultant, talked about the approach to “stop words”, encouraging search application developers not to ignore words like “the”, “in”, and the like given the power of today’s compute resources to deal with such nuances.
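The stop word advice is easy to demonstrate. The Python sketch below is my own illustration, not anything shown at the meet up: the stop list and the naive whitespace tokenizer are assumptions, and real engines such as Lucene use configurable analyzers instead. It shows a phrase query that disappears entirely once stop words are stripped.

```python
# A small, assumed stop list; real analyzers ship their own lists.
STOP_WORDS = {"a", "an", "and", "be", "in", "is", "not", "of", "or",
              "that", "the", "to"}


def index_terms(text, drop_stop_words):
    """Tokenize on whitespace, optionally discarding stop words."""
    tokens = text.lower().split()
    if drop_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    return tokens


def phrase_match(query, document, drop_stop_words):
    """Return True if the query tokens occur as a contiguous run in the document."""
    q = index_terms(query, drop_stop_words)
    d = index_terms(document, drop_stop_words)
    if not q:
        return False  # every query word was thrown away
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))


document = "To be or not to be that is the question"
query = "to be or not to be"

print(phrase_match(query, document, drop_stop_words=False))  # True
print(phrase_match(query, document, drop_stop_words=True))   # False
```

With the stop list applied, the query is reduced to nothing, so the system has no way to find the most famous line in the document.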

Back to Lucid: Grant Ingersoll’s talk focused on innovations in Solr 1.4, the forthcoming release of the search platform built around the Lucene Search engine. While there are a good number of important new features, including Trie-range queries for better searching of numeric data, and advanced replication and better logging for improved scalability and deployment, that’s just the latest in a string of enterprise grade innovations that the open source community has rolled together, closing the gap with many, if not most, of the meaningful technology features of commercial enterprise search software. Erik Hatcher spoke about a new search engine for search developers (http://search.lucidimagination.com) that Lucid sponsors for the community, using Lucene and Solr technology to plow through the abundant discussions and technical info created over the years — providing faster troubleshooting and education than programmers could get before.
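For readers who have not met trie range queries, the general idea is to index each number at several precisions so that a range can be answered with a handful of coarse terms instead of one term per value. The Python sketch below illustrates that idea under my own assumptions (16-bit unsigned values, a 4-bit step, and the hypothetical names `value_terms`, `range_terms`, `build_index`, and `search`); it is not Lucene or Solr source code.

```python
from collections import defaultdict

STEP = 4       # bits dropped per precision level (assumed)
MAX_BITS = 16  # values are assumed to fit in 16 unsigned bits


def value_terms(value):
    """Return the (shift, prefix) terms under which one value is indexed."""
    return [(shift, value >> shift) for shift in range(0, MAX_BITS, STEP)]


def range_terms(lo, hi):
    """Return (shift, prefix) terms that together cover exactly [lo, hi]."""
    terms = []
    shift = 0
    mask = (1 << STEP) - 1
    while lo <= hi:
        if shift + STEP >= MAX_BITS:
            # Coarsest level: list whatever prefixes remain.
            terms.extend((shift, prefix) for prefix in range(lo, hi + 1))
            break
        # Peel off ragged edges that do not fill a whole coarser block.
        while lo <= hi and (lo & mask) != 0:
            terms.append((shift, lo))
            lo += 1
        while lo <= hi and (hi & mask) != mask:
            terms.append((shift, hi))
            hi -= 1
        if lo > hi:
            break
        # What remains is whole blocks; describe them one level up.
        lo >>= STEP
        hi >>= STEP
        shift += STEP
    return terms


def build_index(values_by_doc):
    """Index every document under all precision levels of its value."""
    index = defaultdict(set)
    for doc_id, value in values_by_doc.items():
        for term in value_terms(value):
            index[term].add(doc_id)
    return index


def search(index, lo, hi):
    """Union the postings of the few terms that cover the range."""
    hits = set()
    for term in range_terms(lo, hi):
        hits |= index.get(term, set())
    return hits


prices = {"a": 3, "b": 17, "c": 250, "d": 4100, "e": 65000, "f": 423}
index = build_index(prices)
print(sorted(search(index, 16, 4200)))  # ['b', 'c', 'd', 'f']
print(len(range_terms(16, 4200)))       # 45
```

The range 16 to 4200 spans 4,185 distinct values, yet the query touches only 45 index terms. The trade-off is a larger index, because each value is stored once per precision level.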

There were three takeaways from the meeting, according to David Fishman, who does marketing for Lucid Imagination. The breadth and depth of the search problem set means that it’s not going to be solved by one company or one set of people; the active, engaged open source community is constantly adding and innovating new features, putting them through their paces, and pushing the frontier faster than any single company could.

The technology upon which open source search rests is as good or maybe better than some of the commercial products’ code base. Many hands and many eyes mean that the gotchas hiding in some of the high profile brands’ products are not going to jump out and bite an administrator.

That demand is real: innovative companies as different as IBM, Zappos, Netflix, LinkedIn, Digg, AOL, MySpace, Apple, Comcast Interactive and more have built mission critical search services at the core of their business using this technology. The people who came to this meet up, and one just like it two weeks earlier in Reston, Virginia (http://www.meetup.com/NOVA-Lucene-Solr-Meetup/), are part of that rapidly accelerating adoption curve, since there’s no need to call a salesperson or schedule a demo to get started. The community lowers the barriers to experimentation and participation.

Not least important is what wasn’t covered, said Fishman. Innovation is half the battle; the other, reliability. As Mark Bennett observed on his blog, this meet up was not the crowd that keeps datacenter and IT managers sleeping soundly through the night. Commercial grade reliability comes from a commercial-grade company with the expertise to help get it working and keep it working. And having talked to the Lucid Imagination team, I can say they not only “get” search. They “get” service level agreements. That may be one reason why they’re in the business of offering commercial grade support for these technologies.

To sum up, what strikes me as new is that Lucid’s pool of engineers is available to help — many of them, the same engineers who help write the code and manage the innovations with the Apache Lucene community. What the IT guys get by working with Lucid is the combination of innovation with peace of mind and better control of customization and maintenance.

My hunch is that a company with a search system is going to invest in professional services for support no matter what search solution it deploys. Even if open source makes it easy to get search, it takes expertise to get search right.

If I know Marc Krellenstein, the Lucid Imagination team will be able to deliver that expertise at competitive rates. Certainly, the range of companies represented suggests that open source search is moving toward center stage.

Can open source search gain traction in the enterprise? In some organizations, the answer is “Yes.”
Open source search is here and Lucene/Solr promises to push beyond simple search and retrieval.

Stephen Arnold, June 23, 2009

A Glimpse of the Google Collaborative Plumbing

June 19, 2009

On June 18, 2009, the ever efficient US Patent & Trademark Office published US2009/0157608, “Online Content Collaboration Model”, a patent document filed by Google in December 2007. With Wave in demo mode, I found this document quite interesting. Your mileage may vary because you may see patent documents as flukes, hot air, or legal eagle acrobatics. I am not in concert with that type of thinking, but if you are, navigate to one of the Twitter search engines. That content will be more comfortable.

The inventors were two Googlers, William Strathearn and Michael McNally, neither identified as part of the Australian team responsible for Wave. I like to build little family trees of Googlers who work on certain projects. Mr. Strathearn seems to have worked on the Knol team, which works on collaboration and knowledge sharing. Mr. McNally, another member of the Knol team, has written a Knol about himself, which is at this time (June 19, 2009) online as a unit of knowledge.

The two Googlers wrote:

A collaborative editing model for online content is described. A set of suggested edits to a version of the online content is received from multiple users. Each suggested edit in the set relates to the same version. The set of suggested edits is provided to an authorized editor, who is visually notified of differences between the version of the content and the suggested edits and conflicts existing between two or more suggested edits. Input is received from the editor resolving conflicts and accepting or rejecting suggested edits in the set. The first version of the content is modified accordingly to generate a second version of the content. Suggested edits from the set that were not accepted nor rejected and are not in conflict with the second version are carried over and can remain pending with respect to the second version.

What’s happening is that the basic editorial system for Knol and other Google products gets visual cues, enhanced work flow, and some versioning moxie.
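The work flow in the abstract can be sketched in a few lines of Python. This is my reading of the abstract, not Google's implementation: I assume content is a list of lines, an edit replaces one whole line, two edits conflict when they change the same line to different text, and a pending edit survives only if its line was untouched by the accepted edits. The names `SuggestedEdit`, `find_conflicts`, and `apply_decisions` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SuggestedEdit:
    author: str
    line: int       # index of the line the edit would replace
    new_text: str


def find_conflicts(edits):
    """Return pairs of edits that change the same line to different text."""
    conflicts = []
    for i, first in enumerate(edits):
        for second in edits[i + 1:]:
            if first.line == second.line and first.new_text != second.new_text:
                conflicts.append((first, second))
    return conflicts


def apply_decisions(version, edits, accepted, rejected):
    """Build the next version and the list of edits that stay pending."""
    new_version = list(version)
    changed_lines = set()
    for edit in accepted:
        new_version[edit.line] = edit.new_text
        changed_lines.add(edit.line)
    # Undecided edits carry over only if the new version left their line alone.
    pending = [
        edit for edit in edits
        if edit not in accepted
        and edit not in rejected
        and edit.line not in changed_lines
    ]
    return new_version, pending


version_1 = [
    "Knol is a unit of knowledge.",
    "Anyone can write one.",
    "Authors sign their work.",
]
edits = [
    SuggestedEdit("ann", 0, "A knol is a unit of knowledge."),
    SuggestedEdit("bob", 0, "Knols are units of knowledge."),
    SuggestedEdit("cho", 2, "Authors sign their work with real names."),
]

conflicts = find_conflicts(edits)  # ann and bob both rewrite line 0
version_2, pending = apply_decisions(
    version_1, edits, accepted=[edits[0]], rejected=[edits[1]]
)
print(version_2[0])
print([edit.author for edit in pending])
```

Here the editor resolves the conflict on the first line in favor of one suggestion, and the third suggestion, which nobody ruled on, remains pending against the second version.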

knol collaboration

Figure 2 from US2009/0157608

Is this a big deal? Well, I think that some of the big content management players will be interested in Google’s methodical enhancement of its primitive CMS tools. I also think that those thinking of Wave as a method for organizing communications related to a project might find these systems and methods suggestive as well.


CNN: The Coming Cost Cataclysm

June 8, 2009

I found myself in Atlanta, stranded because of modern air travel. What to do with a few spare hours? The Atlanta Dot Net Web site had one suggestion. Tour CNN Headquarters. I navigated to this link and read here:

Ever wondered what the inside of a news studio looks like? Take the Inside CNN Studio Tour in Atlanta and view for yourself. Guests can take a 50-minute CNN studio tour featuring the Control Room Theater, Special Effects studio and Interactive News Desk section.

As a senior, I qualified for a $12 admission. My impressions:

  1. The CNN studios in Atlanta occupied a building that once housed an amusement park. The cavernous atrium was a reminder of wasted money. The area sucked energy, heat in the winter and A/C in the summer. I tried to calculate the cost per square foot but I got a headache and the tour guide did not know how to respond to my question, “What is the total cubic feet of this atrium?” He smiled a lot and pointed out that CNN was the first 24 hour video news outlet.
  2. There were a lot of people in the usable space in the gargantuan structure. There were security guards at every stairwell. There were security guards at the metal detector which I set off thus triggering a pat down. I had no contraband, and I did enjoy the frisk, quite up close and personal.
  3. The guide pointed out that 20 percent of the staff were engaged in information technology. He pointed out cameras that were run from a control room, obviating the need for a human to keep the red eye in front of the talent. There were dozens of people performing work flow functions like research, writing, editing, and directing. The talent read stories that floated in front of their eyes so “eye contact was intimate”.

Stepping back after the tour, I reflected on my impressions and the three observations I summarized in the numbered points above. I thought about the Google Wave technology. At some point in the future, I envisioned moving the CNN news process to the Wave system. I also thought about the one person television network that Leo Laporte has built in Petaluma, California. I thought about the number of people on the tour who took pictures and made videos with mobile phones. I thought about the billboard ad for high speed wireless networks that I saw whilst riding Atlanta’s truncated mass transit system. I thought about the young man on the tour who sent SMS messages to his pals who were apparently interested in what he had to say about the inner sanctum of CNN.

Bottom-line: CNN is on track for a cost cataclysm. In my opinion, software can reduce the friction in the CNN process. By pushing news down to those with mobile devices and out to the fringes of civilization, a software based company can offer good enough video news without the punishing cost burden CNN as well as Bloomberg and Thomson Reuters must bear.

image

CNN sits on a San Andreas fault of costs. The earthquake can come at any time.

If this analysis sounds familiar, it is the same theme that has been running through some of the business commentaries about the problems traditional newspapers face. Upstarts using technology have sucked ad revenue and content from the custodial embrace of traditional publishing companies. The result has been a divorce of information methods and revenue. The traditional approach finds itself bleeding from many tiny wounds which sap its ability to leapfrog from where the organizations are today to where they have to be tomorrow.

The young people whom I know (few in number and quite strange to me) love video info. In fact, I note with some horror the dependence Google has upon videos to explain complex processes. I think the trend is locked in because when one writes, there is a formalism imposed. Even an addled goose like me has to plan what’s up. With a video, the rhetoric is that of the demo, a conversation, or a YouTube.com “insider” video. If the message is garbled, just do another video. Easy and without boundaries. The approach is just right for those decades younger than I.

How does this create trouble for a “too big to fail” television news operation?


MarkLogic: The Shift Beyond Search

June 5, 2009

Editor’s note: I gave a talk at a recent user group meeting. My actual remarks were extemporaneous, but I did prepare a narrative from which I derived my speech. I am reproducing my notes so I don’t lose track of the examples. I did not mention specific company names. The Successful Enterprise Search Management (SESM) reference is to the new study Martin White and I wrote for Galatea, a publishing company in the UK. MarkLogic paid me to show up and deliver a talk, and the addled goose wishes other companies would turn to Harrod’s Creek for similar enlightenment. MarkLogic is an interesting company because it goes “beyond search”. The firm addresses the thorny problem of information architecture. Once that issue is confronted, search, reports, repurposing, and other information transformations become much more useful to users. If you have comments or corrections to my opinions, use the comments feature for this Web log. The talk was given in early May 2009, and the Tyra Banks example is now a bit stale. Keep in mind this is my working draft, not my final talk.

Introduction

Thank you for inviting me to be at this conference. My topic is “Multi-Dimensional Content: Enabling Opportunities and Revenue.” A shorter title would be repurposing content to save and make money from information. That’s an important topic today. I want to make a reference to real time information, present two brief cases I researched, offer some observations, and then take questions.

Let me begin with a summary of an event that took place in Manhattan less than a month ago.

Real Time Information

America’s Next Top Model wanted to add some zest to its popular television reality program. The idea was to hold an audition for short models, not the lanky male and female prototypes with whom we are familiar.

The short models gathered in front of a hotel on Central Park South. In a matter of minutes, the crowd began to grow. A police cruiser stopped, and the two officers found themselves watching a full fledged mêlée in progress, complete with swinging shoulder bags, spike heels, and hair spray. Every combatant was five feet six inches tall or shorter.

The officers called for the SWAT team but the police were caught by surprise.

I learned in the course of the nine months of research for the new study written by Martin White (a UK based information governance expert) and myself that a number of police and intelligence groups have embraced one of MarkLogic’s systems to prevent this type of surprise.

Real-time information flows from Twitter, Facebook, and other services are, at their core, publishing methods. The messages may be brief, less than 140 characters or about 12 to 14 words, but they pack a wallop.

image

MarkLogic’s slicing and dicing capabilities open new revenue opportunities.

Here’s a screenshot of the product about which we heard quite positive comments. This is MarkMail, and it makes it possible to take content from real-time systems such as mail and messaging, process it, and use that information to create opportunities.

Intelligence professionals use the slicing and dicing capabilities to generate intelligence that can save lives and reduce to some extent the type of reactive situation in which the NYPD found itself with the short models disturbance.

Financial services and consulting firms can use MarkMail to produce high value knowledge products for their clients. Publishing companies may have similar opportunities to produce high grade materials from high volume, low quality source material.


Search Archaeology

May 30, 2009

I find it amusing to look at articles about search, content processing and text mining. Perhaps I am tired or just confused. The past to me stretches back to cards with holes and wire rods and to the original NASA RECON system. For Computer Active, the past stretches all the way back to Lycos. You may find this revisionist view of history interesting. Click here to read “Bunch of Fives: Forgotten Search Engines.”

Let me comment on the five search engines, adding a bit of addled goose color to the authors’ view of search:

  • Cuil.com. Cuil is a product of a Googler (Anna Patterson), her husband, and some other wizards. The company had a connection to Google. Dr. Patterson’s patents are still stumbling out of the USPTO with Google as an assignee. Xift, Dr. Patterson’s search system, was not mentioned in Computer Active. It was important for its semantic method and it exposed Dr. Patterson to the Alta Vista team. Alta Vista played some role in Google’s rise to success and its current plumbing. Cuil has improved, and I thought I saw a result set including some Google content before the system became publicly available. I use Cuil.com, and I am not sure if “forgotten” is a good word for it or its technology.
  • MSN Live. I have lost count of Microsoft’s search systems. Microsoft search initiatives have moved through many iterations. The important point for me is that Microsoft is persistent. The search technology is an amalgamation of home grown, licensed, purchased, and reworked components. The search journey for Microsoft is not yet over. Bing is a demo. The rebuild of Fast as a SharePoint product is now in demo stage but not yet free of its Web and Linux roots. More to come on this front and, believe me, Microsoft search is not forgotten by Google or others in the search business.
  • Alta Vista. Yep, big deal. The reason is that Alta Vista provided the Googlers with a pool of experienced and motivated talent. The job switch from the hopelessly confused Hewlett Packard to the freewheeling Google was an easy one. Alta Vista persists today, and I still use the service for certain types of queries. What’s interesting is that Alta Vista may have been one of the greatest influences on both Google and Microsoft. Again. Not forgotten.
  • Lycos. We sold our Point system to Lycos, so I have some insight into that company’s system. The key point for me is that Fuzzy and his fellow band of coders from Carnegie Mellon sparked the interest in more timely and comprehensive Web search. Lycos was important as a sparkplug, but the company was among the first to add some important index update features and expanded snippets for each hit. Lycos has had a number of owners, but I won’t forget it. When we sold Point to the outfit, the check cleared the bank. That I will remember along with the fact that architectural issues hobbled the system just as the Excite Architext system was slowed. These are search as portal examples today.
  • Ask Jeeves. I can’t forget. One of the first Ask Jeeves execs used to work at Ziff. I followed the company’s efforts to create query templates that allowed the system to recognize a question and then deliver an answer. The company was among the first to bill this approach “natural language” but it wasn’t. Ask Jeeves was a look up service and it relied on humans to find answers to certain questions. Ask.com is the descendant of Ask Jeeves’ clunky technology, but the system today is supported by ace entrepreneur Barry Diller who, like Steve Ballmer, is persistent. The key point about Ask Jeeves is that it marketed old technology with a flashy and misleading buzzword “natural language”. Marketers of search systems today practice this type of misnaming as a standard practice. Who can forget this when a system is described one way and then operates quite another?

Enjoy revisionism. Much easier in a Twitter- and Facebook-centric world with a swelling bulge of under 40 experts, mavens, and pundits. These systems failed in some ways and succeeded in others. I remember each. I still use each, just not frequently.

Stephen Arnold, May 31, 2009

Enkia: Early Player in Smart Search

May 26, 2009

Last week, I received a call from a defrocked MBA looking for work. (No surprise that!) The young wizard wanted to know about Enkia, a spin out of Georgia Tech’s incubator program in the late 1990s. If you poke around Web traffic reports, you see a surge for Enkia in year 2000 and then a flat line. In November 2008, a person sent this Twitter message that plopped into my tracking system: “Enkia is alive.” I told the job hunter that I would poke through my search archives to see what information I had. I will be in Atlanta in June, and I will try to swing by the company’s office at 85 Fifth Street in Atlanta to see what’s shakin’. (The last time I tried this approach the TeezIR folks kept the door locked. Big addled geese are often not welcome. Gee, maybe it’s because the addled geese don’t believe the chunks of marketing food tossed at them by vendors.)

The Company

According to an August 2000 article here, the company was

building the foundation of the Intelligent Internet(TM) based on the latest discoveries in cognitive science and artificial intelligence. Enkia’s middleware products overcome the limitations of current Internet search technology by sensing what a browser or shopper wants and recommending information quickly and automatically. This software enables portal providers to create personalized experiences that encourage return site visits and increased sales. Founded in 1998, Enkia is a member of the Advanced Technology Development Center (ATDC), the Georgia Institute of Technology’s high-tech business incubator.

What It Does

Enkia, the name of a Sumerian god with special brain power, was an early entrant in the “artificial intelligence for the Web movement”. If you have been following the exploits of Google, Microsoft, and Yahoo, the notion of smart software is with us today. The marketing verbiage is different, but the notion is the same as it was for Enkia.

Here’s a description from a year 2000 business journal story:

The software [Dr. Ashwin Ram and his students developed], called Enkion, has a type of ESP, if you will, sensing browsers’ needs by what they click. Enkion builds on techniques of artificial intelligence to model the human mind. The technology automatically recommends relevant information so that users don’t have to wade through hundreds of search results.

The company put a demo online, and I had a screen shot of the service. I thought I had results screen shots, but my memory deteriorates more quickly than the value of a US government Treasury note.

Screen shot of the Enkia Search Orbit interface, no longer available.

When the service rolled out, Dr. Ram said here:

“EnkiaGuide helps anyone find their ‘needles’ in haystacks of data on and off the Internet,” Dr. Ram adds. “It can help users find their way through technical support libraries or large e-commerce sites, and allow corporations to organize pathways through their large proprietary databases. The EnkiaGuide can make sense out of information chaos.”

The Technology

In my archive, I had a copy of an older white paper which is still available online as of May 25, 2009, here:

The IRIA architecture builds upon and extends the experience-based agent approach by embedding it in a knowledge discovery and presentation engine using techniques from artificial intelligence and machine learning. Crushing demands on resources limit the amount of “smarts” typical web search engines can apply to any particular information resource requests.  IRIA’s design overcomes this problem by leveraging existing search engines for the brute force work of indexing and searching the web and by focusing its “smarts” on modeling and understanding the efforts of an individual or workgroup. The core of IRIA that makes this understanding possible is its reminding engine.  The reminding engine directly applies the experience-based agent approach to the problem of information search, consisting of a context-sensitive search mediator which uses a unified semantic knowledge base called a knowledge map to represent indexed pages, queries, and even browsing sessions in a single format.  This uniform representation enables the development of an experience-based map of available information resources, along with judgments about their relevance, allowing precise searches based on the history of research for an individual, group or online community.  The knowledge map is furthermore a browsable information resource in its own right, accessible by standard internetworking protocols; with appropriate security precautions, this enables workgroups at remote sites to view and exploit information collected by another workgroup.
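
The white paper gives no implementation detail, so what follows is only a rough sketch of the general idea it describes: let an existing engine do the brute force search, keep queries and judged pages in one uniform format, and use that remembered experience to re-rank new results. Every name here (`KnowledgeMap`, `mediated_search`, `stub_engine`), the bag-of-terms representation, and the scoring formula are assumptions made for illustration, not Enkia's actual design.

```python
from collections import Counter
import math


def to_terms(text):
    """Uniform representation (an assumption): a bag of lowercase terms."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term bags."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


class KnowledgeMap:
    """Hypothetical store of past queries, pages, and relevance judgments."""

    def __init__(self):
        self.entries = []  # (query term bag, url, judgment between -1 and 1)

    def remember(self, query, url, judgment):
        self.entries.append((to_terms(query), url, judgment))

    def prior(self, query, url):
        """Past judgments on this URL, weighted by similarity to the new query."""
        q = to_terms(query)
        return sum(j * cosine(q, past)
                   for past, u, j in self.entries if u == url)


def mediated_search(query, engine, kmap):
    """Get hits from an existing engine, then re-rank with remembered experience."""
    hits = engine(query)  # list of (url, base_score)
    return sorted(hits, key=lambda h: h[1] + kmap.prior(query, h[0]),
                  reverse=True)


def stub_engine(query):
    # Stand-in for a real Web search engine; fixed results for illustration.
    return [("a.example/jaguar-cars", 0.9), ("b.example/jaguar-cats", 0.8)]


kmap = KnowledgeMap()
kmap.remember("jaguar habitat", "b.example/jaguar-cats", 1.0)
ranked = mediated_search("jaguar diet", stub_engine, kmap)
```

In this toy run the workgroup's earlier positive judgment on the animal page lifts it above the car page for a related query, even though the stub engine scored the car page higher. A shared `KnowledgeMap` is one way to picture how a second workgroup could exploit what the first one collected.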
