Teragram: SAS’s Search Launchpad

March 20, 2008

This week SAS announced that it purchased Teragram, a content processing company with deep roots in computer science and mathematics, and with blue-chip clients. If you poke around Teragram’s Web site, you learn that the company supports double-byte languages. If I read the Teragram information correctly, this little-known outfit not far from Harvard Yard has proprietary technology strongly suggestive of the super-sophisticated techniques in use at IBM, Google, Microsoft, and Yahoo.

The Teragram system can match other systems’ advanced functions. NLP (natural language processing)? Automatic summarization? No problem. Hosted services option? Check. Autonomy- and Recommind-type pattern matching? Done. Attensity- and Bitext-style linguistic analysis? Covered. Teragram has a warehouse chock full of search and content processing goodies.

Now SAS owns this “search tech” tool box.

Teragram, founded in 1997, was a privately held content processing company in Cambridge, Massachusetts. Two wizards, both from Luxembourg, have applied their computer science and mathematical expertise to unstructured information for more than a decade. That’s a long time in the fast-moving search and text processing sector.

I learned about Teragram when someone told me that the company was a technology provider to Fast Search & Transfer SA. Fast Search’s Dr. John Lervik is a canny technologist, and he has a good nose for solid technology.

A couple of years ago, I did some poking around, eventually talking with one of the company’s founders (Dr. Yves Schabes). I tried to connect with Dr. Emmanuel Roche (co-founder of the company). Happily Dr. Schabes was forthcoming, and I learned that Teragram had snared some high-profile clients. These included America Online and the New York Times, among others. I also learned that EasyAsk (a parametric search system now owned by Progress Software) was a Teragram customer. Teragram is one of those high-tech companies like Thunderstone in Cleveland, Ohio, that licenses its technology to other firms and stays out of the spotlight.

Technology

The “guts” of Teragram are rules. I know this is a great oversimplification, but you can do more reading about the company on its Web site. You will find that Teragram builds knowledge bases and crafts mathematical routines and linguistic procedures within the Teragram components. But what struck me as interesting is the company’s use of what I call “nested rules”. Some rules are very specific; others are fuzzy and soft. The idea is that linguistic and semantic analyses of text take place. Different rules come into play to figure out what the text “means”, what items can be tagged, etc.
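To make the “nested rules” idea concrete, here is a minimal Python sketch. It is not Teragram’s code, and I have no view into how the company actually implements its rules. Every name in it (Rule, keyword, fuzzy, apply_rules, the “healthcare” tag, the word list, and the 0.5 threshold) is my own assumption for illustration. The sketch pairs a soft, fuzzy parent rule with a very specific child rule that is checked only when the parent fires.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Rule:
    """Hypothetical rule: a tag, a test, and optional nested child rules."""
    tag: str
    matches: Callable[[str], bool]
    children: List["Rule"] = field(default_factory=list)


def keyword(*words: str) -> Callable[[str], bool]:
    """Specific rule: fires only if every word appears as a whole token."""
    patterns = [re.compile(r"\b" + re.escape(w) + r"\b", re.IGNORECASE)
                for w in words]
    return lambda text: all(p.search(text) for p in patterns)


def fuzzy(words: List[str], threshold: float) -> Callable[[str], bool]:
    """Soft rule: fires if at least `threshold` of the words appear."""
    def check(text: str) -> bool:
        tokens = set(re.findall(r"[a-z]+", text.lower()))
        hits = sum(1 for w in words if w.lower() in tokens)
        return hits / len(words) >= threshold
    return check


def apply_rules(rules: List[Rule], text: str,
                path: Tuple[str, ...] = ()) -> List[str]:
    """Return tag paths for rules that fire. Children run only if the parent fired."""
    tags: List[str] = []
    for rule in rules:
        if rule.matches(text):
            current = path + (rule.tag,)
            tags.append("/".join(current))
            tags.extend(apply_rules(rule.children, text, current))
    return tags


# Assumed example rule set: a fuzzy topic rule with one specific nested rule.
RULES = [
    Rule("healthcare",
         fuzzy(["patient", "clinic", "insurance", "privacy"], 0.5),
         [Rule("HIPAA", keyword("HIPAA"))]),
]

if __name__ == "__main__":
    doc = "The clinic updated its patient privacy notice to comply with HIPAA."
    print(apply_rules(RULES, doc))
```

The design choice worth noticing is the nesting: the specific HIPAA rule is never evaluated unless the broader, fuzzier rule has already decided the text is about health care. That keeps narrow rules from firing on text where the term appears out of context.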

I may be off base, but I think Teragram’s technique for “unfolding” its rules is one of the most distinctive aspects of Teragram’s approach. When you license a Teragram system, you get software components that perform such functions as:

  • On-the-fly document classification
  • A question answering component called “Direct Answers”
  • Entity extraction
  • A “box” of components you can use to tune relevance
  • A taxonomy management function.

[Screen shot: Teragram taxonomy interface]

The screen shot above, from the ArnoldIT.com archives, shows Teragram’s taxonomy interface. The term HIPAA in the left-hand panel appears with suggested terms in the right-hand panel. I don’t have a date for this screen shot, and it is possible that the 2008 interface has changed. The point is that an editor can double-check automatic indexing and intervene if necessary. © Teragram 2006.

Teragram’s technology allows a licensee to construct, customize, and operate a full-blown behind-the-firewall search system, or to build a specific widget to repair a broken component in another vendor’s system. Teragram enables key word indexing and most of the advanced text processing features associated with higher-profile search vendors.

You can enhance a Teragram content processing system so that users receive alerts “pushed” to them when information of interest to each user becomes available.

A few days ago, I wrote about Arikus, a Canadian vendor of search technology. As you may know, Arikus (like Oracle, I might add) provides taxonomies and word lists with its system. Teragram provides these knowledge bases as well.

What Can SAS Now Do?

Based on my sources, I have learned that SAS plans to operate Teragram as a stand-alone unit for now. This has three benefits:

First, it keeps the existing Teragram customers calm and “on the ranch”. Acquisition ignites a poaching gene in some competitors. For now and the foreseeable future, Teragram will conduct its business as it has in the past. Obviously, the financial, marketing, and technical resources of the profitable business analytics giant in Cary, North Carolina, will be warmly received by the Teragram wizards.

Second, SAS has time to understand more fully what can and cannot be done with the Teragram technology. My hunch is that SAS will look at the search and content processing functions first. The reason is that some SAS licensees are now forced to get behind-the-firewall search from other vendors. Teragram allows SAS to offer a comprehensive search-and-retrieval system to its existing customers.

Third, at some point in the future, SAS will determine whether to continue licensing the Inxight Software content processing tools or move to Teragram’s tools.

My sources tell me that SAS has two or three years remaining on its Inxight license. But Business Objects (a competitor of SAS’s) bought Inxight, then SAP (the German software superplatform) bought Business Objects. Common sense suggests that SAS will want to take steps to control its own text processing destiny. Ergo, at some point, Inxight functions will begin to fade into the background and Teragram technology will move to center stage. This type of shift in large organizations takes years.

The answer to the question, “What can SAS do now?” is clear. SAS will begin to compete against Autonomy, Endeca, Microsoft / Fast Search, Oracle, and the more than 150 other vendors in the behind-the-firewall search sector.

Observations

A number of points warrant brief comment. You can dig through the recycled SAS news release on other Web logs or in the traditional media. I want to flip over a couple of smaller stones to see if there’s some useful information that others have ignored in the rush to “cover this story”.

First, Teragram’s approach is modular. The rules require some care and feeding. Both of these characteristics are almost more important than the specifics of the Teragram technology. SAS’s business model generates revenue from modular components assembled to meet a customer’s data mining, business intelligence, and analytic requirements. Teragram, therefore, enhances the existing SAS approach to its business. A fully automated system, like Google’s to name one, would be somewhat at odds with the SAS “way”.

Second, SAS is moving from data mining to text processing for a reason. That reason is that customers want to extract actionable information from text. ClearForest (whose technology has a strong “rules” orientation) has made a name for itself crunching through automobile warranty and repair data. SAS wants this type of business and believes that Teragram’s technology will help it generate new revenues from customers who want to process any information for high-value insights.

Third, SAS sniffs money in the behind-the-firewall search sector. A trusted vendor offering a search and content processing “add on” to other SAS modules makes sense to many organizations. After all, SAS customers have been trained to program using SAS tools. These customers believe in the SAS technology. Licensing a behind-the-firewall search system is not a “new” buy. The deal is an add-on.

Wrap Up

To wrap up this essay, I think that SAS’s move from data mining to behind-the-firewall search is healthy for the search sector. Niche players are going to find themselves competing with established enterprise system vendors, not a five-person outfit funded by a university’s tech transfer office. I am looking forward to the changes this deal, and others sure to come, will bring to behind-the-firewall search. Let me know if you agree or disagree.

Stephen Arnold, March 21, 2008
