Vivisimo’s Remix

January 29, 2008

I’ve been interested in Vivisimo since I learned about the company in 2000. Disclaimer: my son worked for Vivisimo for several years, and I was involved in evaluating the technology for the U.S. Federal government. A new function, called “Remix”, caught my attention and triggered this essay.

Background

Carnegie Mellon University ranks among the top five or six universities in computer science. Lycos, which began as a Carnegie Mellon research project, was a product of the legendary Michael “Fuzzy” Mauldin and his team. Disclaimer: my partner (Chris Kitze) and I sold search technology to Lycos in the mid-1990s. Dr. David Evans has practiced his brand of innovation at several successful search-centric start-ups, and a chunk of that technology is now used in JustSystems’ XML engine. (Disclaimer: I have done some work for JustSystems in Tokyo, Japan.) Vivisimo, founded by Raul Valdes-Perez and Jerome Pesenti, was among the first of the value-added processing search systems. I have been paying attention to Vivisimo since its early days.

I’ve been impressed with Vivisimo’s innovations, and I have appropriated Mr. Valdes-Perez’s coinage, “information overlook,” for my verbal arsenal. As I understand the term, an “overlook” gives a person looking for information a broader view of what is in the results list. I think of it in terms of standing on a bluff and being able to see the lay of the land. As obvious as an overlook may be, it is a surprisingly difficult problem in information retrieval. You’ve heard the expression “We can’t see the forest for the trees.” Information overlook attempts to get the viewer into a helicopter. From that vantage point, it’s easier to see the bigger picture.

A Demonstration Query

Vivisimo’s technology has kept that problem squarely in focus. With each iteration and incremental adjustment to the Vivisimo technology, overlook has been baked into the Vivisimo approach to search-and-retrieval. Here’s an example.

Navigate to Clusty.com, Vivisimo’s public-facing search system. Note that Clusty is a metasearch system. Your query is passed to other search systems such as Live.com and Yahoo. The results are retrieved and processed before you see them. Now enter the query ArnoldIT. You will see a main results page and a list of folders in the left-hand column of your screen. You can browse the main results. Note that Vivisimo removes the duplicates for you, so you are looking at unique items. Now scan the folder names.

Those names represent the main categories or topics in that query’s result list. For ArnoldIT, you can see that my Web site has information about patents, international search, and so on. Let me highlight several points about the foundation of Vivisimo:

First, I’ve been impressed with Vivisimo’s on-the-fly clustering. It’s fast, unobtrusive, and a very useful way to get a view of what topics occur in a query’s result set. I use Vivisimo when I begin a research project to help me understand what topics can be researched via the Web and which will require the use of analysts making telephone calls.

Second, in the early days of online, deduplication was impractical. Dialog and Orbit, two of the earliest online systems, manipulated fielded flat files. Variations in field names made it computationally expensive to recurse through records to identify and remove duplicate entries. When I was paying for results from commercial online systems, these duplicates cost me money. When I learned about Vivisimo’s duplicate detection function, I looked at it closely. No one at Vivisimo would give me the details of the approach, but it worked and still works well. Other systems have introduced deduplication, but Vivisimo made this critical function a must-have.

Third, Vivisimo’s implementation of metasearch remains speedy. There are a number of interesting approaches to metasearch, including the little-known ez2Find.com system developed by a brother and sister team working in the south of France. I also admire the Devilfinder search engine, now one of the faster metasearch systems available. But in terms of features, Vivisimo ranks at the top of the list, easily outperforming Ixquick, Dogpile, and other very useful tools.

Fourth, like Exalead, Vivisimo has been engineered using the Linux tricks of low-cost scaling and clustering for high performance. These engineering approaches are becoming widely known, but many of these innovations originated at Stanford, University of Waterloo, MIT, and Carnegie Mellon University.
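Vivisimo would not share the details of its duplicate detection, so what follows is not its method. It is a minimal Python sketch of the general metasearch pattern described above: collect ranked lists from several engines, interleave them, and drop repeats. The function names `normalize_url` and `merge_results`, the round-robin merge, and the rule for deciding that two URLs point to the same page are my assumptions for illustration, and the example URLs are made up.

```python
from urllib.parse import urlsplit


def normalize_url(url):
    """Build a comparison key for a URL (hypothetical rule): ignore the scheme,
    letter case in the host name, a leading "www.", the fragment, and a trailing slash."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    key = host + parts.path.rstrip("/")
    if parts.query:
        key += "?" + parts.query
    return key


def merge_results(ranked_lists):
    """Interleave several engines' ranked (url, title) lists round-robin,
    keeping only the first copy of each page as judged by normalize_url."""
    seen = set()
    merged = []
    depth = max((len(results) for results in ranked_lists), default=0)
    for rank in range(depth):
        for results in ranked_lists:
            if rank < len(results):
                url, title = results[rank]
                key = normalize_url(url)
                if key not in seen:
                    seen.add(key)
                    merged.append((url, title))
    return merged


# Usage with made-up results from two hypothetical engines.
engine_a = [("http://www.example.com/", "Example home"),
            ("http://example.com/patents", "Patents")]
engine_b = [("https://example.com", "Example"),
            ("http://example.org/search", "Search")]
for url, title in merge_results([engine_a, engine_b]):
    print(title, url)
```

Matching on URLs catches only the simplest duplicates; a production system would also need to spot the same document published at different addresses.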

The Shift to the Enterprise

Three years ago, Vivisimo made the decision to expand its presence in organizations. In effect, the company wanted to move from a specialist provider of clustering technology to delivering behind-the-firewall search. When Vivisimo’s management told me about this new direction, I explained that the market for behind-the-firewall search was a contentious, confused sector. Success would require more marketing, more sales professionals, and a tougher hide. Mr. Valdes-Perez looked at me and said, “No problem. We’re going to do it.”

The company’s first high-profile win was the contract for indexing the U.S. Federal government’s unclassified content. This contract was originally held by Inktomi from 2000 to 2001. Then Fast Search & Transfer with its partner AT&T held the contract from 2001 to 2005. When Vivisimo displaced Fast Search’s technology, the company was in a position to pursue other high-profile search deals.

Today, Vivisimo is one of the up-and-coming vendors of behind-the-firewall search solutions. I have learned that the company has just won another major search deal. I’m not able to reveal the name of the new client, but the organization touches the scientific and technical community worldwide. Based on my understanding of the information to be processed, Vivisimo will make the research work of most U.S. scientists and engineers more productive.

Remix

This essay is a direct result of my learning about a new Vivisimo function, Remix. You can use the Remix function when you have a result set visible in your Clusty.com results display. In our earlier sample query, ArnoldIT, you see the top 10 topics or clusters of results for that query. Here is how Vivisimo describes what happens when you select Remix: “With a single click, remix clustering answers the question: What other, subtler topics are there? It works by clustering again the same search results, but with an added input: ignore the topics that the user just saw. Typically, the user will then see new major topics that didn’t quite make the final cut at the last round, but may still be interesting.”
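Vivisimo has not published its clustering algorithm, so this is only a minimal Python sketch of the idea in the company’s description, not Vivisimo’s implementation. It assumes a topic is a single word ranked by how many results mention it; the function names `cluster` and `remix`, the stopword list, and the `min_size` threshold are hypothetical. Remix is modeled as running the same clustering over the same results with the labels the user already saw passed in as an exclusion set.

```python
import re
from collections import Counter

# A tiny, hypothetical stopword list; real systems use far larger ones.
STOPWORDS = {"the", "and", "for", "with", "from", "about"}


def terms(snippet):
    """Return the set of lowercase words of three or more letters, minus stopwords."""
    return {w for w in re.findall(r"[a-z]{3,}", snippet.lower()) if w not in STOPWORDS}


def cluster(snippets, k=3, exclude=frozenset(), min_size=2):
    """Return up to k (label, [result indices]) topics.

    Each candidate label is a single word, ranked by how many results contain it
    (ties broken alphabetically). Words in `exclude` are never offered as labels,
    and a word must appear in at least `min_size` results to count as a topic.
    """
    term_sets = [terms(s) for s in snippets]
    doc_freq = Counter(t for ts in term_sets for t in ts)
    candidates = sorted(
        (t for t, n in doc_freq.items() if n >= min_size and t not in exclude),
        key=lambda t: (-doc_freq[t], t),
    )
    return [(t, [i for i, ts in enumerate(term_sets) if t in ts]) for t in candidates[:k]]


def remix(snippets, shown_labels, k=3):
    """Cluster the same results again, ignoring the topics the user just saw."""
    return cluster(snippets, k=k, exclude=frozenset(shown_labels))


# Made-up result snippets for illustration.
results = [
    "Enterprise search patents and analysis",
    "Search patents filed by Google",
    "International search market report",
    "Enterprise search vendors compared",
    "International patents overview",
    "Google search appliance review",
]
first = cluster(results)
print(first)
# [('search', [0, 1, 2, 3, 5]), ('patents', [0, 1, 4]), ('enterprise', [0, 3])]
print(remix(results, [label for label, _ in first]))
# [('google', [1, 5]), ('international', [2, 4])]
```

The exclusion set does all the work here: the results are unchanged, and the second pass simply promotes the topics that just missed the cut. Commercial clustering engines use multiword phrases and more careful ranking, so treat this only as an illustration of the idea.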

The function is important for three reasons:

First, Vivisimo has made drill-down easy. Some systems perform a similar function, but the user is not always aware of what’s happened or where the result list originated. Vivisimo does a good job of keeping the user in control and aware of his or her location in the results review sequence.

Second, Remix allows one-click access to categories that otherwise would not be seen by the Clusty user. The benefit of Remix is that the new clusters do not repeat any topics the user saw before clicking the Remix button. Just as Vivisimo’s original deduplication function worked invisibly, so does Remix. The function just happens.

Third, the function is speedy. Vivisimo has a number of innovations in its system to make on-the-fly processing of search results take place without latency (the annoying delays some systems impose upon me). Vivisimo’s value-added processing occurs almost immediately. Like Google, Vivisimo has focused on delivering fast response time and rocket science for the busy professional.

Some Challenges

Companies like Vivisimo will have to deal with the marketing challenges of today’s search-and-retrieval marketplace. The noise created by Microsoft’s acquisition of Fast Search and Endeca’s injection of cash from Intel and SAP means that interesting companies like Vivisimo have to make themselves known. I don’t envy the companies trying to get traction in the search sector.

If you are looking for a behind-the-firewall system, you will want to take a look at Vivisimo’s system. In fact, you will want to spend additional time reviewing the search solutions available from the up-and-comers I profile in my new study “Beyond Search”, due out in April 2008. You will find that you can deliver a robust solution without the teeth-rattling licensing fees required by some of the higher-profile vendors.

I can’t say that any one search system will be better for you than another. In fact, when you compare ISYS Search Software, Siderean Software, and Exalead with Vivisimo, you may find that each is an exceptionally robust solution. Which system you find is best for you comes down to your requirements. The key point is that the up-and-coming systems must not be excluded from your short list because the companies are not making headlines on a daily basis.

If you have the impression that Vivisimo is not up to an enterprise-scale content processing job, you have flawed information. Give Vivisimo’s technology a test drive. Judge for yourself. I wrote about Vivisimo in the first, second, and third editions of The Enterprise Search Report. I won’t be repeating that information in Beyond Search. You can explore Vivisimo and learn more about the system from the company’s useful white papers and case studies.

Stephen E. Arnold, January 29, 2008

Comments

3 Responses to “Vivisimo’s Remix”

  1. Chris Tackett on January 29th, 2008 1:37 pm

    I found your site on technorati and read a few of your other posts. Keep up the good work. I just added your RSS feed to my Google News Reader. Looking forward to reading more from you.

    Chris Tackett

  3. John on January 29th, 2008 4:46 pm

    Caught your post from a Google alert on Vivisimo. Nice to hear your comments. Thinking of adding remix to our presentation.
