An Interview with Charlie Hull
I made my way to Pipasha, an Indian restaurant not far from Lemur Consulting's offices in Cambridge, England. It was a beach day; that is, about 50 degrees Fahrenheit with a damp mist. Ah, summer in England. Chaucer, familiar with these parts, fancied April, but here I was in June, cold and wet, with no sweet showers, just mist and fog.
My trek from London had one goal: find out how Lemur Consulting managed to rattle the likes of Autonomy and Endeca with its nifty implementation of MyDeco.com, a furniture retailer's Web site.
When I first saw MyDeco's Web site, my thought was that the company was using one of the high-profile systems. I was wrong. A bit of Sherlock Holmesian sleuthing uncovered Lemur Consulting. I wrote a profile of the company in my Beyond Search Web log, and I wanted to meet the people who were working magic with an open source plus services business model. Therefore, to Cambridge I did go.
Lemur Consulting has been making waves in enterprise and Web site search with this model. Charlie Hull, Lemur wizard, met me for our question-and-answer session. The full text of the discussion appears below. I have been unable to reproduce the mango chutney I dribbled on my note pad, however.
What's a Lemur Flax? Not a ferret-like creature that will attack me, I hope.
Lemur Consulting is the company name (a colleague spent some time in Madagascar and took many photos of lemurs, so we had a ready-made company emblem).
Flax is the enterprise search product. We've taken Xapian, a stable, powerful open-source information retrieval library that's used around the world, and used the Python programming language to add many extra layers of functionality: web spiders, file format translators, scheduled indexers, template-based result pages and a web-page administration interface to tie it all together.
We often say Flax = Xapian + Lemur: the core technology plus our own many years of experience in the search space, working for customers as varied as the University of Cambridge and the UK's Newspaper Licensing Agency. We also had a hand in developing some earlier search engine technology such as Muscat and Smartlogik.
We looked at other open-source search solutions, for example Lucene, which is another open-source information retrieval library. Nutch is built on Lucene and is mainly for searching collections of web pages. Solr is also built on Lucene and is more analogous to Flax, being aimed at enterprise search applications.
We concluded that Xapian offered us the basics to which we could add our enhancements, not least because our Development Director, while at a former employer, led the team that built it.
How do you deal with the many options for "free" or low cost search?
There are a lot of options, I agree: this is partly why we developed Flax. Our clients don't necessarily want to know what technology our system uses or what language it's written in. They want to know about the features of the whole system. If you look at the IBM/Yahoo OmniFind offering, for example, it's very hard to understand where it comes from, which company developed it and how it fits into IBM's or Yahoo's suite of software. We try to get past all of that by simply offering a search solution called Flax, with a simple name and a simple premise.
Well, Flax is free in two ways. First, there is "free" as in "free beer": Flax doesn't cost anything to download and there are no recurring license fees. It's also "free" as in "free speech": the licensing allows anyone (not just us) to build on and improve the system, as the source code is provided.
However, in most cases customers will have to invest time and effort in developing their search system, and that's where we can help.
How do you navigate between the Scylla and Charybdis of open source and commercial companies who will rip off what you do?
Actually, the license that Flax uses, the GPL, prevents commercial companies from ripping off what we do. Under the GPL, if you build Flax into software that you distribute, that software must also be GPL-licensed, and you have to provide its source code as well.
Of course, another company could set up to provide Flax services, but we know the system very well, we've built it from the ground up, and we're experts in all aspects of search.
We are not too far from Autonomy's Cambridge offices. How are you able to compete with companies like Autonomy and now Microsoft, which owns Fast Search & Transfer and also has a major research facility not far from here?
The technology that both Flax and Autonomy are based on, probabilistic Bayesian relevancy ranking, has been around since the 1970s, and lots of other companies use it.
Cambridge has a well-established reputation in mathematics, so it is logical that companies would set up operations here. It's a lovely part of England and the access to top-notch minds is very good.
Autonomy gave us some good publicity recently. Have you seen the IET Knowledge Network write up called "Search: The Next Generation"?
No, will you send me the link? I will insert it in the text of the interview.
Of course. My point is that in that essay, '[Mike] Lynch [of Autonomy] sees an opening for open source among smaller companies "provided you like programming"', so Autonomy must see some advantages in our business model! It's rather inaccurate to say that our customers would have to like programming, though; helping them with that is what we're here for, after all. Also, any enterprise search engine tends to need customisation to some degree, even Autonomy's IDOL.
What are the key strengths of the Flax engine?
That's my favorite question. Let me highlight three points.
First, Flax scales. Our marketing jargon is "scalability and power". The core technology was originally built to search a collection of 500 million Web pages, and scales easily to over four billion items. We've implemented indexes of 30 to 100 million items on a single standard server. It's also extremely fast to search a Flax database: we routinely see sub-second retrieval times.
Second, relevance. We're quite proud of the relevance and accuracy of our system's results. We use the latest probabilistic models, which can be tuned further if required using clickthrough data and relevance feedback (users can help refine results in an iterative process). Results can also be filtered by metadata or by numeric range.
Third, openness and extensibility. Our customers have complete control, and the complete source code is provided. Customers can verify that the engine is doing what we say it does, and they can have confidence that, no matter what happens to us, the system can continue to be maintained and developed (this isn't something you can say about commercial systems, which often stagnate once their parent company is acquired or closes down).
I looked at the MyDeco site, and I liked its setup and responsiveness. It struck me as a clone of assisted navigation from Endeca, Fast, or Mercado. What did you do with regard to search?
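For readers wondering what a "probabilistic model" for ranking looks like, a common example is Okapi BM25, the probabilistic weighting scheme Xapian uses by default. The toy below is a pure-Python sketch, not Flax or Xapian code: the function name, the sample documents and the parameter values k1=1.2 and b=0.75 (common textbook defaults, not necessarily Xapian's) are all assumptions for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Rank documents against a query with Okapi BM25 (illustrative sketch).

    query: list of terms; docs: non-empty dict mapping doc id -> list of terms.
    Returns (doc_id, score) pairs for matching documents, best match first.
    """
    n_docs = len(docs)
    avg_len = sum(len(terms) for terms in docs.values()) / n_docs

    # Document frequency: how many documents contain each term.
    df = Counter()
    for terms in docs.values():
        df.update(set(terms))

    scores = {}
    for doc_id, terms in docs.items():
        tf = Counter(terms)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue
            # Probabilistic IDF; the +1 inside the log keeps it positive.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalisation: longer-than-average documents are damped.
            norm = k1 * (1 - b + b * len(terms) / avg_len)
            # Saturating term-frequency component.
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        if score > 0:
            scores[doc_id] = score
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


docs = {
    "sofa": "grey fabric sofa three seater".split(),
    "chair": "oak dining chair".split(),
    "table": "oak dining table extending oak top".split(),
}
print(bm25_scores(["oak", "table"], docs))
```

Here the table listing, which matches both query terms, ranks above the chair, which matches only "oak"; the sofa matches neither and is dropped. Clickthrough data and relevance feedback would adjust weights like these rather than replace the model.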
MyDeco is an example of the type of collaboration we do with a customer. The owners of MyDeco had a good idea of what was needed to make a successful retail site. Our technical team worked with MyDeco's engineers to develop the search engine.
That's our standard approach: work closely with the client's technical team, including other consultants, agree on the specifications of what they require, and implement them on time and on budget. MyDeco required some unique features (price restriction and search by dimension), which we worked with them to develop.
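Price restriction and search by dimension are, in effect, numeric range filters applied alongside a text query. The sketch below is hypothetical and is not MyDeco's or Flax's code: the Product fields (price assumed in pounds, width assumed in centimetres) and the helper names are invented for illustration, and a plain substring match stands in for a real full-text query. In a Xapian-based system this kind of filter would typically be built on per-document numeric values and value-range queries.

```python
from dataclasses import dataclass


@dataclass
class Product:
    # Hypothetical catalogue record; field names are illustrative only.
    name: str
    price: float     # assumed to be in pounds
    width_cm: float  # one assumed "dimension" field


def in_range(value, low=None, high=None):
    """True if value lies in [low, high]; None means unbounded on that side."""
    return (low is None or value >= low) and (high is None or value <= high)


def filter_products(products, text, max_price=None, min_width=None, max_width=None):
    """Keep products whose name contains every query word and whose
    price and width fall inside the requested ranges."""
    words = text.lower().split()
    return [
        p for p in products
        if all(w in p.name.lower() for w in words)
        and in_range(p.price, high=max_price)
        and in_range(p.width_cm, min_width, max_width)
    ]


catalogue = [
    Product("Grey corner sofa", 899.0, 240),
    Product("Compact two-seater sofa", 450.0, 150),
    Product("Oak sideboard", 600.0, 160),
]
print(filter_products(catalogue, "sofa", max_price=500, max_width=200))
```

Keeping the range checks separate from the text match means a shopper can narrow "sofa" by budget and by the space available without changing the query itself.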
Your business model pivots on services, correct? Are customers receptive to this approach?
Yes, it seems so. Our view is that any enterprise search system will necessitate some degree of installation, integration or customisation, so a customer will always pay for services. However, with open source you don't have to pay any license fees on top. In today's economic climate this cost saving is more and more important.
We've seen year-on-year growth of the business, as well as a dawning realisation that our open-source approach puts the control back in the hands of the customer - you don't have to take our word for it that the 'black box' of enterprise search is working, you have complete visibility and control over the search system.
What are the new features in the current release of Flax? What value-adds do you bring to the table?
We've recently added automatic spelling correction, dynamic autocompletion of search queries, thesaurus and synonym support, facets and tags, and database replication support.
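To give a feel for how automatic spelling correction can work, here is one common approach, assumed here and not necessarily the one Flax uses: suggest the known term with the smallest Levenshtein edit distance, breaking ties by how often each term occurs. The vocabulary dictionary (term to frequency, as might be gathered from indexed content) and the function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def suggest(word, vocabulary, max_distance=2):
    """Return the closest vocabulary term within max_distance, preferring
    more frequent terms on ties; None if the word is already known or no
    candidate is close enough."""
    if word in vocabulary:
        return None
    best = None
    for term, freq in vocabulary.items():
        d = edit_distance(word, term)
        # Compare (distance, -frequency): smaller distance wins, then higher frequency.
        if d <= max_distance and (best is None or (d, -freq) < best[0]):
            best = ((d, -freq), term)
    return best[1] if best else None


vocabulary = {"sofa": 120, "sofas": 40, "table": 300, "chair": 250}
print(suggest("tabel", vocabulary))
```

Comparing against every term is fine for a toy vocabulary; a production system would first narrow the candidates (for example by shared character n-grams) before computing edit distances.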
Assume that you have a customer in Taiwan. How do you support that customer?
Yes, that's a good question. We do a lot of business internationally using the internet. We can log in remotely to a customer's server and install and configure Flax without ever visiting the customer physically.
And we provide web-based reporting systems that customers can use 24 hours a day to report issues or problems. We also have a worldwide community of developers to draw on to help with issues such as internationalisation and translation.
As you look ahead, what are some of the new features that you will be adding to your service offering?
We're looking at image search, further improvements to speed and scalability, more integration with content and document management systems, and better support for Far Eastern and Middle Eastern languages.
Looking forward, what do you see as the major trends in search in the next nine to 12 months?
I know you have been quite vocal in alerting people to the dissatisfaction users have with enterprise search systems.
We think that customers will begin to ask some very hard questions of their vendors: Why does it cost me so much more every time I add another 5,000 documents? Why can't my SQL- or Oracle-powered database do full-text search very well? Why don't we get relevant results when you say your technology "understands" our content? Why can't I see inside the "black box"? What's happened to our search engine now that your company has been bought out?
Flax and other open-source solutions can offer a solid alternative to commercial systems. The open-source model is now accepted worldwide, but not yet so much in enterprise search, and we believe the advantages of this model to customers are very clear.
We've even provided a free downloadable package, Flax Basic, for Windows systems. It can build searchable indexes of thousands of files in common formats, and uses a simplified version of the Flax interface. Try it, modify it, use it, and if you need help, call us.
We're confident that open source software coupled with a services model will gain momentum.
A number of companies are pursuing a similar approach to enterprise software. Lemur Consulting has a growing client list and deep technical resources. Organizations, particularly large ones, are conservative. Lemur's team has been able to win a number of head-to-head competitions with better known and larger vendors of search systems. One refreshing aspect of Lemur's approach is that it makes customer and technical support the responsibility of the company's managers. Ring the firm and you will be talking to one of Lemur's engineers. Service, not marketing baloney, separates Lemur from the hundreds of companies competing in the enterprise search marketplace.
Stephen E. Arnold, July 8, 2008