Thunderstone Software LLC
An Interview with John Turnbull
Thunderstone has been in the search-and-retrieval business since 1981. In human and Internet years, that's a long time. Over the years, I kept bumping into search professionals who spoke highly of this company. Matt Koll, founder of Personal Library Software, hinted that he licensed the Thunderstone "stemmer" for his groundbreaking desktop search product. Then I learned that Thunderstone was one of the technology vendors helping the startup eBay deliver fast, useful search.
Thunderstone "blue boxes" turned up several years before the Google Search Appliance. One colleague told me that Thunderstone "invented the search appliance category".
The president and tech guru at Thunderstone is a poised, soft-spoken Englishman. Now transplanted from the "sceptred isle" to Cleveland, Ohio, whose motto is "the forest city", John Turnbull agreed to talk with me in a deli located in Cuyahoga County. Cleveland, like Mr. Turnbull, was surprisingly urbane in March. The snow was light and the temperature hovered around zero degrees centigrade. Several times during our conversation, the sun made a brief appearance.
So, how do you like the weather in Cleveland? Does it remind you of "merrie olde England"?
Some. Cleveland's a dynamic city. In some ways, the weather here is not that different from the UK and Switzerland where I grew up. The principal difference is that Cleveland does get a bit more snow. It's quite nice today actually.
What's Thunderstone's core business?
People are quite surprised to learn that Thunderstone Software LLC pioneered concept-based searching, real-time searching, and simultaneous searching of both structured and unstructured data in the 1980s.
We have continued to develop our search technology, and we think that despite being in Cleveland, our system is one of the most powerful, scalable and flexible enterprise search solutions available today.
Where did the name "Thunderstone" come from? Most search vendors try for tech sounding names or use Greek or Latin to give their systems some cachet. Is your company named after the rock band?
No, Thunderstone is the name given to rocks that have been weathered into a donut shape. The Pacific Islanders believed they were created by a god blasting the hole with lightning. We've always been more focused on solving the customer's problem rather than the academic or theoretical possibilities. Thunderstone connotes our pragmatic approach to search.
I know Case Western Reserve is here. Kent State is not too far away. But why Cleveland for a search and content processing company? Why aren't you in Silicon Valley where the weather is almost as nice as Cleveland's?
The original founders--Kathy and Michael Pincus--are from this area. And you are right about Case as it is now known. It's an excellent institution. The city's reputation is for manufacturing, but engineering is a major part of the city's heritage. We're here. Booz, Allen & Hamilton has a major technology presence here. Hyland Software, the makers of OnBase, are not too far from our offices. The benefit for us is that young engineers who are interested in text retrieval and content processing make themselves known to us. If we were located in Palo Alto or Seattle, there would be much greater competition for technical specialists with the types of skills we need.
Where's Thunderstone's emphasis in 2008? Licensing tools? Appliances? On-premises search systems?
We have a strong presence in each of these search businesses. Some companies license our technology. We are quite respectful of our clients' requests for confidentiality. I'm not comfortable identifying our licensees and OEM [original equipment manufacturer] customers. I can say that our search systems are in the Department of Defense, nuclear power generation plants, and law firms where Thunderstone technology is being used for litigation support, among other organizations.
We are quite active in the search appliance sector. The on-premises licensing of our Texis product is quite strong. We see a great deal of interest in both our appliance solution and our on-premises search and content processing solution.
What's the product lineup today?
Texis is our high-powered search platform that licensees can employ to build complex search applications. Texis offers NLP [natural language processing], parametric search, and the ability to index non-text objects like video. Texis has application programming interfaces. Our customers embed, integrate, and customize Texis into many different enterprise applications. These range from BI [business intelligence] to litigation support.
We offer Webinator, which is a more out-of-the-box productization of Texis (our flagship engine) for Web search only. You can see a demonstration of Webinator on the Thunderstone Web site. We also have our search appliance, also built off Texis.
I recall seeing the Thunderstone logo on eBay. Did you provide search technology to that company?
Yes, they licensed our software before they went public. We were live on their site within a week of them first contacting us, and we had our logo there. Eventually they removed the logo. We were the only commercial search product able to handle their load. They eventually hired some search engineers and created their own engine from scratch. The idea, I believe, was to "be more in control of their destiny". We wish them well as eBay works through some challenges in their business.
Let's talk about the Thunderstone search appliance. What does your box deliver to the customer?
Our appliance processes content and supports full-text keyword searches. It's designed to be flexible both in terms of how data are acquired and how the search is performed. The appliance can be tuned to a customer's specific needs.
We also have a parametric search appliance. It offers a parametric search system.
Parametric means structured search with field tags and values?
Yes, that's correct. A licensee can configure the appliance to allow users to filter on up to 50 data fields. The system processes unstructured text quite effectively. And the system can, with no programming, make structured data available to a person looking for information.
When the results come back to the user, these results can be sorted and grouped by any defined attributes in the data. If there's a field or metatag for "Revenue", then the results can be sliced and diced by that tag. No programming required.
Any of the available tags or fields can be used to provide navigation links. Some vendors call this "assisted navigation", but I prefer to describe this function as point-and-click navigation.
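To make the sorting, grouping, and point-and-click navigation described above concrete, here is a minimal Python sketch. It is an illustration only, not Thunderstone code: the function names, the sample records, and the "Region" and "Revenue" fields are all hypothetical, and a real appliance would do this through configuration rather than programming.

```python
from collections import Counter, defaultdict


def group_and_sort(results, group_field, sort_field, descending=True):
    """Group result records by one field, then sort each group by another.

    Hypothetical helper: field names are whatever tags the data defines.
    """
    groups = defaultdict(list)
    for record in results:
        groups[record.get(group_field, "(none)")].append(record)
    return {
        key: sorted(records, key=lambda r: r[sort_field], reverse=descending)
        for key, records in sorted(groups.items(), key=lambda item: item[0])
    }


def facet_counts(results, field):
    """Count how many results carry each value of a field.

    Counts like these are what a navigation link ("East (2)") would show.
    """
    return dict(Counter(r[field] for r in results if field in r))


if __name__ == "__main__":
    # Invented sample results with two metadata fields.
    hits = [
        {"title": "Q1 report", "Region": "East", "Revenue": 120},
        {"title": "Q2 report", "Region": "East", "Revenue": 340},
        {"title": "Q1 summary", "Region": "West", "Revenue": 90},
    ]
    for region, records in group_and_sort(hits, "Region", "Revenue").items():
        print(region, [r["title"] for r in records])
    print(facet_counts(hits, "Region"))
```

Each facet count could back one navigation link, and clicking a link would simply re-run the search with that field value added as a filter.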
You were one of the first search vendors to offer an appliance solution, correct?
No, not the first. We've been in the search appliance business since January 2003. One of our OEM [original equipment manufacturer] customers did create an appliance early on with our software. But that OEM customer was not successful in marketing the product. Then, the Google Search Appliance came out.
We learned that its performance was in the 60 queries per minute range. And we decided that we could do better. Our appliance launched about five years ago with performance in the 1,000 queries/minute range. We've kept a performance edge as well.
Google has been putting some marketing muscle into their GSA as the company calls its search appliance. How has Google affected Thunderstone's appliance?
That's a good question. In the last 18 to 24 months, we've seen an increase in inquiries about our Thunderstone appliance. Google's sub-$2,000 price for a basic appliance helps people understand what a search appliance can do. Once the Google customer tries to customize the GSA, some find that the customization -- particularly the tuning of relevance -- is not possible. When that realization takes hold, we think these GSA customers start looking for an appliance solution that offers the benefits of the appliance plus our configuration options. We also know that the interest in searching structured data is on the rise too. Our appliance and the Texis SQL engine make that possible without the time-consuming programming that some search solutions require.
We see our DataLoad API attracting attention as well. This API allows data to be populated from any source, and a wide variety of connectors have been developed for many common enterprise data sources.
In addition, our appliance can directly access existing databases. And built-in, advanced extraction tools can extract data from Web pages or files that already exist. These features are "baked into" our appliance.
Keep in mind that our API uses SQL-like syntax. There's not much of a learning curve for developers who know some SQL.
In short, the GSA has been good for our appliance business because GSA users have to create these functions themselves; Google does not provide them. I'm not sure about how other appliance vendors see Google. We want them to keep building the market for search solutions that are easy to deploy, maintain, customize, and configure.
Can you give me an example of your appliance in action?
Sure, I can't mention any clients' names, of course. But let's say an employee needs to search for a particular product-item description across the entire enterprise. Keyword, full-text search will certainly provide a good list of results from a variety of documents and content sources. But what if you were only interested in the item's occurrence within a specific document type (say, a purchase order), and one that was issued after a certain date and by a specific purchasing agent?
Keyword, full-text search simply cannot provide this context for the information you're pursuing.
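The purchase-order scenario can be sketched in a few lines of Python. This is an assumption-laden illustration, not Texis syntax: it uses SQLite from the Python standard library as a stand-in engine, a plain LIKE match as a stand-in for full-text search, and an invented table, columns, and rows.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory table of invented documents (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE docs (id INTEGER PRIMARY KEY, doc_type TEXT, "
        "issued TEXT, agent TEXT, body TEXT)"
    )
    rows = [
        (1, "purchase_order", "2008-02-10", "jsmith", "20 units of blue widget"),
        (2, "purchase_order", "2007-11-05", "jsmith", "5 units of blue widget"),
        (3, "memo", "2008-03-01", "jsmith", "blue widget pricing review"),
        (4, "purchase_order", "2008-02-20", "mlee", "blue widget, rush order"),
        (5, "purchase_order", "2008-03-02", "jsmith", "red gasket, 40 units"),
    ]
    conn.executemany("INSERT INTO docs VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def keyword_search(conn, term):
    """Text match only: every document mentioning the term."""
    cur = conn.execute(
        "SELECT id FROM docs WHERE body LIKE ? ORDER BY id", (f"%{term}%",)
    )
    return [row[0] for row in cur]


def parametric_search(conn, term, doc_type, issued_after, agent):
    """Text match narrowed by document type, issue date, and purchasing agent."""
    cur = conn.execute(
        "SELECT id FROM docs WHERE body LIKE ? AND doc_type = ? "
        "AND issued > ? AND agent = ? ORDER BY id",
        (f"%{term}%", doc_type, issued_after, agent),
    )
    return [row[0] for row in cur]


if __name__ == "__main__":
    db = build_demo_db()
    print(keyword_search(db, "blue widget"))
    print(parametric_search(db, "blue widget", "purchase_order", "2008-01-01", "jsmith"))
```

With this sample data the keyword query matches four documents, while the parametric query narrows the list to the single purchase order issued by that agent after the cutoff date. Dates are stored as ISO strings so that a plain string comparison orders them correctly.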
I know you've heard me say that I have a list of more than 150 vendors licensing behind-the-firewall search technology. What sets Thunderstone apart?
We've been in the search and retrieval business for more than 25 years. I would speculate that most of the companies on your list have been in search for less time. We've accumulated technology and expertise while retaining our focus on delivering systems that work.
A major part of our success is our long-standing commitment to creating flexible tools that work efficiently with both structured and unstructured information. We don't want to be a consulting firm. We want our systems to work without the build-from-scratch effort that many vendors impose on their customers.
Can you give me an example?
Yes. Most information technology professionals are familiar with SQL [structured query language]. If an engineer doesn't learn SQL in the first year at university, he or she will learn it in their first job. SQL is extremely useful for many tasks. We are SQL-based. A developer who knows SQL can use that knowledge base to get our search system running exactly the way he or she wants it to perform. We also have built tools to allow a licensee to do rapid prototyping. You can see examples of the code for certain Texis functions on our Web site. We don't hide this information because we think our customers want to know how to extend Texis. [The code samples are here.]
What differentiates the on-premises Thunderstone from the Thunderstone in the appliance?
The core Texis engine is identical between the appliance and what customers can license. The main difference is that the appliance customers do not have access to the application to modify it, although we do have OEM customers creating custom appliances based off of our standard one.
Texis and Webinator customers have access to almost everything the appliance can do, and they can extend it or, in the case of Texis customers, do something radically different.
Keep in mind that our products have an English language vocabulary of 250,000 word and phrase concept associations for natural language queries. Our technology permits proximity searching, fuzzy searching, regex [regular expression] searching, and searches like "three hundred thousand" which the system knows is "300,000".
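The "three hundred thousand" example can be illustrated with a small Python sketch of number-word normalization. This is a generic technique shown under stated assumptions, not a description of how Texis does it: the function names and word tables are hypothetical, and only plain English cardinal numbers are handled.

```python
UNITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
    "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
    "nineteen": 19,
}
TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}
SCALES = {"thousand": 1_000, "million": 1_000_000, "billion": 1_000_000_000}


def words_to_number(phrase):
    """Convert an English cardinal phrase to an integer."""
    total = 0    # value of completed thousand/million/billion groups
    current = 0  # value of the group being read (always below 1,000)
    for word in phrase.lower().replace("-", " ").split():
        if word == "and":
            continue
        if word in UNITS:
            current += UNITS[word]
        elif word in TENS:
            current += TENS[word]
        elif word == "hundred":
            current = max(current, 1) * 100
        elif word in SCALES:
            total += max(current, 1) * SCALES[word]
            current = 0
        else:
            raise ValueError(f"not a number word: {word!r}")
    return total + current


def normalize_numeric_term(term):
    """Map '300,000', '300000', and 'three hundred thousand' to one integer."""
    cleaned = term.replace(",", "").strip()
    if cleaned.isdigit():
        return int(cleaned)
    return words_to_number(term)


if __name__ == "__main__":
    print(normalize_numeric_term("three hundred thousand"))
    print(normalize_numeric_term("300,000"))
```

If both the indexed text and the query are normalized the same way, the spelled-out form and the numeral form end up as the same search key.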
I compared your appliance with the offerings from Planet Technology, Google, and other vendors' solutions. I found that Thunderstone was fast and feature rich. What other characteristics of this product did you and your team refine for your appliance?
We believe that different customers will need different solutions, so we have tried to build as much flexibility into the product as possible, while still keeping it an appliance. From a non-technical view, we have developed pricing and licensing terms that match our customers' expectations, while still providing outstanding technical support.
What are new features and functions added since the last point release of your product?
We have expanded the number of immediately available connectors for the appliance to 10. Our Parametric Search Appliance is new, adding the ability to search more structured data in an appliance, including geographic data, along with sorting and grouping on arbitrary fields. And we recently added the ability to search images to our products.
Does Texis support concept searching?
Yes, I mentioned our knowledge base. We also have Metamorph. This is our concept-based retrieval function at the heart of Texis.
What's the engineering "secret sauce" of the Thunderstone solution?
Having been around as long as we have, we still have a bias towards getting the most out of the hardware, as back in 1981 there wasn't much choice. Another part is the Midwest engineering approach of making sure what we do is both practical and useful, so we keep innovating without being distracted by the "cool" ideas.
With people talking about social search and semantic wikis, mobile search, and other trendy stuff, what do you see as the next big things in search?
I think there's a growing realization that there are many reasons for search, and that the most appropriate results will depend on what the user is doing at that time. This includes explicit search, where the user asks a question, and implicit search, where results are available based on what the user is doing. It also depends on whether the user is asking for specific information, trying to retrieve a specific document, looking for background information, or doing something else.
Would you share your thoughts on the consolidation in the search sector? Will Thunderstone go public? Sell to a larger player?
The consolidation is to be expected as the search market is exploding, and new companies appear, and the larger players keep up with technology. Thunderstone has historically been a privately held company, but I can't comment on the future.
ArnoldIT Comment
With its roots in SQL and the company's focus on practical search solutions, Thunderstone has demonstrated that it has a formula for success. The most recent version of Texis delivers NLP, classification, and assisted navigation without losing sight of performance. The company's APIs allow licensees with a knowledge of SQL to extend and customize the Texis system. Thunderstone's appliance delivers high-velocity content processing and numerous configuration options at a competitive price. Thunderstone -- despite its commitment to Cleveland -- delivers top-notch search technology and no-hassle technical support.
Stephen E. Arnold, March 24, 2008