dtSearch: At a Crossroads
January 9, 2009
For years, vendors with a snap-in solution to Microsoft SharePoint were like a soccer player with an unobstructed path to the goal. Microsoft kept out of the way. As long as the vendor had a way to “fix” the baked-in SharePoint search system, the vendor had a shot at a sale. Often the only risk was that a competitor would block the shot. Life was good.
Then Microsoft changed the rules for SharePoint search. First, the company gave away a baby search application. Next, it included a beefed-up version in SharePoint but asked customers to license a separate server to get the system. Then, when that MOSS (Microsoft Office SharePoint Server) search system bumped against an internal document limit, Microsoft bought Fast Search & Transfer SA for $1.23 billion.
Since the deal went through last year, Microsoft has not made substantive changes in the Fast Search technology. In fact, the changes have been in trade show swag (see illustration below), the cloud of silence dropped over the police action taken against the company in Norway for alleged wrongdoing, and more aggressive sales tactics.
The only difference in booth handouts has been the addition of the phrase “A Microsoft Subsidiary.” Otherwise, same old, same old.
Here’s how the Microsoft sales tactics are alleged to work. I want to avoid talking about foam booth handouts and police investigations. These are matters for wiser geese than I. Microsoft visits with a customer with a clutch of Microsoft products and servers. The Microsoft sales professionals assert that the Microsoft solution is the optimal approach. The customer says, “I want to buy the Google Search Appliance or some other search system.” Microsoft sweetens the deal. The customer says, “I will use Microsoft and its search solution.” The idea is not new. Incentives are standard in search licensing. Microsoft now has a third party solution that is okay if properly resourced. Just make sure you know the implications of “resourced”.
What does this mean to vendors who are dependent on Microsoft’s lousy native search system to create a market?
That’s a good question. I had arranged to get the answers directly from the senior managers at dtSearch last year. Then without warning, dtSearch refused to participate in the Search Wizards Speak series. You can find the list of wizards who have participated here. So far about 30 firms have participated. The holdouts are Google (no big surprise there; the company wants this goose killed and grilled), Microsoft (ongoing investigation, so no comment, I was told), and dtSearch (Microsoft-centric search vendor).
Without input from dtSearch, I decided to take a quick look at what the company’s products do and then ask myself some questions about the firm’s reluctance to talk after agreeing to my terms. Brutal are these terms. I submit written questions, prepare a draft, and allow the company to review my final write-up and suggest changes. So far, the interview subjects have been reasonably happy. One outfit, Paris-based Exalead, reprinted the interview, which was subsequently picked up and reproduced in publications as far away from rural Kentucky as Japan.
I remain curious about the impact of the Microsoft Fast tie-up on companies like dtSearch. These are smaller firms that could easily be crushed beneath the wheels of the giant Microsoft marketing machine.
dtSearch
Based on the open source information available to me, dtSearch is a privately held company operating in Bethesda, Maryland. Not quite a suburb, Bethesda is home to a number of companies, including the super secret mapping unit of the US government.
The system can be used for publishing and searching database-driven Web sites, incorporation into information management applications, searching of technical documentation, incorporation into forensics applications, email filtering, and incorporation into a broad range of vertical-market applications (legal, medical, financial, recruiting and staffing, etc.).
The company advertises in dead tree publications. The firm’s search system is also heavily promoted by Programmers.com and its ecommerce site doing business as Programmer’s Paradise. You can get a full rundown on the dtSearch prices by running the query “dtSearch” and clicking through the choices. The system reports that dtSearch prices range from $189.99 for a Desktop with Spider v7.55 up to $2,375 for a three-server version. Developer versions incur higher license fees, but these require what amounts to a custom price quote.
The features of the three-server versions are:
- The dtSearch Engine provides developer access to indexed, un-indexed, full-text and fielded data search options, including support for hundreds of international languages through Unicode.
- Index size has been expanded to one terabyte per index.
- The 64 bit developer system also provides developer access to dtSearch’s integrated file parser and file format support.
- File format support includes dtSearch’s WYSIWYG hit highlighted search display of Web-ready files. File format support includes proprietary built-in HTML converters for non-web-ready files (like OpenOffice and MS Office documents).
- Developer file format support also works in connection with distributed or federated searching, including integrated relevancy ranking and hit-highlighted display of local and remote content.
- The dtSearch Engine for Windows and Dot Net supports C++, Java and Dot Net. For example, the dtSearch Engine for Windows and Dot Net includes a choice of an ADO.NET API, a Java API, a C++ API and a COM API for indexing and searching SQL-type databases, along with associated BLOB data. The dtSearch Engine also supports search filters and other data classification options.
- A Dot Net Spider API makes the full dtSearch Spider functionality accessible to developers.
- The dtSearch Engine for Linux provides C++ and Java APIs to developers.
As I worked through the information on the Programmer’s Paradise Web site, I was struck by the lack of references to Windows and SharePoint.
The “Old” dtSearch
I dug through my goose nest of files and discovered that I had some information about dtSearch Version 7.2. I scanned that material, and here’s what I pulled from my files. Keep in mind that these are my items of information about dtSearch. These items are not authorized or endorsed by dtSearch.
First, the product made the leap from Windows only to Linux operating systems in 2005. That’s a plus. dtSearch was obviously planning for Microsoft’s wanting to get third party search vendor license money itself. And the products’ prices were essentially the same for Version 7.2. I find this interesting, if true. Many vendors are increasing their license fees because low prices don’t mean higher volume sales. So, with fewer deals, the license fees have to go up or the company must add professional services and make it look as if the license fee is going down while the total cost of search rises. dtSearch does not play this game. I wonder if low cost translates to strong revenue growth at a bargain price point. Since dtSearch won’t talk, I guess I will not know. My hunch is that revenue pressure is increasing for dtSearch and other low-price point vendors. But that’s my speculation.
dtSearch interface, version 7.2 Hit highlighting in a PDF shown. © dtSearch 2006.
Second, the features of the system struck me a year or so ago as reasonably good. I liked the hit highlighting which made it easy to locate my search term. I made a note that the handling of Adobe PDF files was not elegant. I compared ISYS Search Software’s approach and Coveo’s approach. Both were superior to dtSearch’s implementation. Since then, ISYS and Coveo have improved their handling of PDF files even more. My quick look at dtSearch’s current build suggested to me that not much has changed.
Third, for a lower cost search system, dtSearch included a number of features. I reacted positively to the notion of “adjustable fuzziness”. Misspellings in text generated from optical character recognition systems can be easily navigated with this feature. I like phonetic search as well. For example, I could search for Quaddafi or Kadafi using either spelling.
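To make “adjustable fuzziness” concrete, here is a minimal sketch of one common way to implement it: accept any indexed word whose edit distance from the query term is at or below a user-chosen threshold. This is an illustration only. dtSearch does not publish its method, and the function names, the sample word list, and the use of Levenshtein distance are all my assumptions.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed one row at a time."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete a character from a
                current[j - 1] + 1,      # insert a character into a
                previous[j - 1] + cost,  # substitute, or keep a match
            ))
        previous = current
    return previous[-1]


def fuzzy_matches(term, vocabulary, fuzziness):
    """Return the words within `fuzziness` edits of `term` (hypothetical helper)."""
    term = term.lower()
    return [w for w in vocabulary if edit_distance(term, w.lower()) <= fuzziness]


# Hypothetical index vocabulary containing two OCR-style misreadings.
vocabulary = ["takeover", "takeouer", "tokeover", "makeover", "turnover"]
exact_only = fuzzy_matches("takeover", vocabulary, 0)
one_edit = fuzzy_matches("takeover", vocabulary, 1)
```

With the threshold at 0 only the exact spelling comes back. Raising it to 1 picks up the two OCR-style errors, but also “makeover”, which is the usual trade-off with fuzzy search: more recall, less precision.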
Relevance
Based on my look at the system, dtSearch can sort and re-sort search results by relevancy, number of hits, file name, and file date. dtSearch’s approach to relevance ranking uses proprietary algorithms that are applied when the user enters a “plain English” or unstructured indexed search request. The algorithm converts the natural language query into a syntax that the dtSearch engine uses to run the query.
Automatic term weighting is based on the frequency and density of hits in the index to the content. For example, in the search request “Get me Sam’s memo on the 2008 NewCo takeover”, if 2008 appears in 3,000 files, and Sam appears in only two files, then Sam would get a much higher relevancy rating. One can think of this approach to relevance ranking as term weight derived inversely from term frequency in the index: the rarer the term, the higher its weight. (If I have mixed myself up, shoot me a correction via the comments to this Web log.)
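The Sam versus 2008 example can be sketched with the textbook inverse document frequency formula. I am swapping in that formula for illustration, since dtSearch’s actual algorithm is proprietary. The collection size of 10,000 files and the function name are assumptions on my part; only the two file counts come from the example above.

```python
import math


def term_weight(doc_freq: int, total_docs: int) -> float:
    """Inverse document frequency: rarer terms earn higher weights."""
    if doc_freq <= 0:
        return 0.0  # a term that appears nowhere contributes nothing
    return math.log(total_docs / doc_freq)


TOTAL_FILES = 10_000  # assumed collection size, not a dtSearch figure

# File counts taken from the example search request.
doc_freqs = {"sam": 2, "2008": 3000}
weights = {term: term_weight(df, TOTAL_FILES) for term, df in doc_freqs.items()}

# Rank the query terms from most to least discriminating.
ranked_terms = sorted(weights, key=weights.get, reverse=True)
```

Under these assumptions “sam” gets a weight of roughly 8.5 and “2008” roughly 1.2, so a file containing Sam outranks one that merely mentions the year.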
dtSearch also includes variable term weighting options for both indexed and unindexed searches. There are two types of weighting available to the dtSearch licensee. First, there is positive term weighting. Positive weighting places extra emphasis on one or more words; for example, soup:8 or recipe:3. The developer can specify these weights. Second, negative term weighting can assign negative emphasis to one or more words: red or green or yellow:-7. The weighting function is assigned by the licensee from dtSearch’s administrative control panel.
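Here is a rough sketch of how word:weight requests like the ones above could be turned into a score. The `word:weight` notation comes from the examples in this post. Everything else is my assumption: the function names, treating “or” and “and” as connectors to skip, giving unweighted words a default weight of 1, and scoring a document as weight times number of occurrences. It is not dtSearch’s implementation.

```python
import re

CONNECTORS = {"or", "and"}  # assumed: connectors carry no weight of their own


def parse_weighted_query(query: str) -> dict:
    """Map each query word to its weight; unweighted words default to 1."""
    weights = {}
    for token in query.lower().split():
        if token in CONNECTORS:
            continue
        word, sep, weight = token.partition(":")
        weights[word] = int(weight) if sep and weight else 1
    return weights


def score(text: str, weights: dict) -> int:
    """Sum weight x occurrences over the words of a document."""
    words = re.findall(r"\w+", text.lower())
    return sum(weights.get(word, 0) for word in words)


positive = parse_weighted_query("soup:8 recipe:3")
negative = parse_weighted_query("red or green or yellow:-7")
soup_score = score("Tomato soup recipe: a soup for winter", positive)
color_score = score("A yellow and green banner", negative)
```

In this sketch the negative weight drags a document down each time the penalized word appears, so the banner text ends up with a score below zero even though it also matches “green”.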
In my view, this approach is not as sophisticated as some of the methods to which I have been exposed. But dtSearch’s approach may be just what you need to meet your needs.
Upside
dtSearch provides a solid search product at a very competitive price. The product’s maturity is evident in its administrative tools and its very good performance in both indexing and query processing. dtSearch can also provide a speedier, more stable, and more useful search function for Microsoft SharePoint. Other benefits of the dtSearch approach include a good mix of functionality and customization options. With the support for Linux, dtSearch gains flexibility. dtSearch makes updates available at its Web site, a practice that Enterprise Search Report prefers to the “pushed” update approach. Additional strengths:
- Support for natural language processing, fuzzy search, and external word lists.
- Fast performance when indexes are kept within the company’s guideline of under one terabyte per index.
- Support for collections of documents. For large content sets, dtSearch can query two or more collections. This approach allows dtSearch to scale to handle very large indexing and search requirements.
- Close integration with Microsoft servers and security.
Semantic functions can be added via a deal between dtSearch and Bitext in Madrid, Spain. I like the Bitext system, by the way.
Possible Issues
The flip side is that dtSearch has strong Windows roots. dtSearch is a company that has a strong following among developers. dtSearch seems to have, based on my conversation with a handful of programmers, quite good visibility as an off-the-shelf search technology that is easy to integrate into other applications. Many of its advanced features require scripted or programmed customization, as opposed to the browser-based, point-and-click interfaces found in many competing offerings. The downside of its visibility among programmers is that most of the business generalists I asked about dtSearch had modest awareness of the dtSearch brand. Similarly, some of the advanced features require user training (e.g., entering custom values in a search box). In my opinion, dtSearch’s marketing seems to be lagging behind that of some of its rivals. Autonomy, for example, issues news every few days. I don’t expect dtSearch to command the brand impact of Google and its search appliance, but in the research I did for my 2008 study for the Gilbane Group in Boston, dtSearch was all but invisible. Accordingly, I did not include dtSearch in that study of 24 “hot” search vendors. Perhaps this was an error on my part, but my selection of companies reflected what my research unearthed.
Bottom Line
My radar bleeps when search vendors won’t participate in the Search Wizards Speak feature. I ask myself, “What’s up?”
My radar lighting up. Silence triggers a blip.
In my opinion, dtSearch is a solid product at a very competitive price. One should consider it. The company has a number of licensees and developers. Furthermore, dtSearch can be extended by developers with knowledge of Web Services and object-oriented development environments. I think dtSearch assumes that a customer has developers or some technical acumen. For organizations that indeed have access to capable programmers and familiarity with Windows or Linux, dtSearch may be a good solution.
However, product selection teams are not likely to know about dtSearch unless a member of the team has roots in the development community. Its architecture offers some (but clearly not all) of the benefits of higher priced systems now available. Visibility, particularly when big boys like Microsoft are playing hard ball, matters.
So what does the SharePoint squeeze mean to dtSearch? I don’t think it will have a significant impact on the company for now. Whether that’s a result of dtSearch’s low profile or the price point of the product or dtSearch’s target customer, I am not sure. dtSearch will probably weather the current financial storm, which is more than I can assert about some search and content processing vendors.
And my radar is still blinking.
Stephen Arnold, January 9, 2009
Comments
One Response to “dtSearch: At a Crossroads”
I develop and sell a specialist, Microsoft based, web publishing system. We run it on about 300 servers across 150 customers. We use the very old MS Index Server for our limited-quality integrated search facility. It is free, pre-installed, and does just about enough, after lots of massaging by us. However, this is being phased out by MS, with no practical free replacement. So what do I buy?
– The Open Source (Lucene) offerings are really only supported on Linux.
– The pre-FAST SharePoint offerings are clumsy, SQL-based, poor API, poor documentation, and mostly require a SharePoint! MS Search Server is rubbish.
– FAST was a possibility until MS bought them.
I really want more facilities than dtSearch can offer. I want intelligent keyword generation; I want automated ‘related document’, I want automated ‘did you mean’. But is there anything better I can afford? The trouble is, I don’t want to spend more than a few hundred quid per site. I could get a 500 server dtSearch licence for £30K.
The point is, there is a market for API-based search engines, and there is a gaping hole in the Microsoft-based arena, now Microsoft has abandoned developers and just wants to shift SharePoint.