Lucene Revolution Preview: Otis Gospodnetic, Sematext
July 13, 2010
The Lucene Revolution Conference is shaping up. Among the presenters are open source developers representing a wide range of organizations. One of the speakers is Otis Gospodnetic, Sematext’s founder. Mr. Gospodnetic is also a co-author of Lucene in Action, with Erik Hatcher and Michael McCandless. His firm implements open source search, natural language processing, and text analytics technology in the enterprise. His team focuses on the design and development of scalable, high-performance search solutions.
I spoke with Mr. Gospodnetic earlier this week. Here are the highlights of our conversation:
Why are you interested in Lucene/Solr?
I’ve always been interested in information gathering, information extraction, search, and related areas. I think that’s because I feel that information gathering, extraction, and searching are precursors for gaining knowledge, and knowledge has always been a hobby of mine. If I look back at all my professional experience, everything I ever built had a strong search component. This is why I was happy when I stumbled upon Lucene around 2000 and why I immediately joined the project, even before it was an Apache project, and why I’ve been using Lucene ever since.
What is your take on the community aspect of Lucene/Solr?
The community around Lucene and Solr is as real, alive, and active as it can be. It’s very knowledgeable and quick to help. I’ve been a part of it for around 10 years now, and have watched the community grow and the breadth and depth of its knowledge increase.
When it comes to the Lucene/Solr community, the quote I like to give comes from the former Netflix search guy:
I posted, went to get a sandwich, and came back to see two answers. The change works, and I can get the fix into production today. This list is magic.
Both user and development communities are so strong and active that it’s becoming really hard for people to keep up with the volume of output these communities produce. Earlier this year we started publishing monthly Lucene and Solr Digest blog posts. These posts are for people who want to keep up with (or keep an eye on) Lucene and Solr, but don’t have the time to read some 60+ non-trivial-to-read email messages these communities produce every day. See http://blog.sematext.com/ or http://twitter.com/sematext . I hope we are not going through the trouble of getting this published every month just because of some mythical community!
Commercial companies are playing what I call the “open source card.” Won’t that confuse people?
Judging from the demand, I’d say this is not confusing to people. On the contrary, I get the feeling they like the open-source/commercial blend. Plus, there is precedent – commercial support for open-source software has been around for many years now: MySQL and Red Hat, for example, have long been doing this. Not only is this not confusing, it is welcomed. Some people and organizations love and can rely on the community support. Others prefer paid support. At Sematext we do both – some of us participate on Lucene/Solr mailing lists, helping as much as we can via that channel. We also publish the already mentioned monthly Lucene and Solr Digest, which summarizes the new and interesting developments from those two projects, and we offer paid tech support and other types of services for Lucene, Solr, Hadoop, and other related technologies.
What are the primary benefits of using Lucene/Solr?
Let me highlight the points my work has driven home as pivotal.
First, there is the notion of TCO or total cost of ownership. TCO is *much* lower. There are no license fees, no limitations about the index size, query rates, number of servers, etc.
Second, Lucene/Solr offer flexibility. If you don’t like how something works in Lucene/Solr, you can change it today and deploy it tomorrow. If your use case is good, the community will adopt it and you won’t have to maintain your customized, forked Lucene/Solr version.
Third, quality. Lucene and Solr are mature. They’ve been worked on by many smart people 24/7 around the world for more than 10 years. These people work on Lucene/Solr because that is their passion, not because they are paid to do so, except for the lucky few who also get paid to work on what they love. Lucene and Solr can do a lot – they have lots of features, they are reliable, they are still being worked on and are improved on a daily basis.
And, finally, agility: You need search? You can have something working today. You don’t have to go through budget approvals, through long sales and negotiation cycles, you don’t have to go through wine and dine dates that just create delays that ultimately increase your costs.
When someone asks you why you don’t use a commercial search solution, what do you tell them?
I tell them to wake up. It’s 2010. There are alternatives. Cheaper. Faster. Better. I tell them to read the answers to the previous questions. When I see how much some (all?) of the commercial search solutions cost and I compare that to what we at Sematext can do for a customer for that sort of money… I recently happened to see a quote from one well-known commercial search vendor and my jaw dropped. Well, not really, because I know they charge an arm and a leg for their software, but when you think about how many kids you can put through college for that kind of money…
Let me also quote something that came up recently in a thread titled “Arguments in Favor of Lucene over Commercial Competition”.
In my initial foray into Lucene several years ago, by the time I’d sent a support request to the vendor of a commercial product and received an answer telling me that I hadn’t included the correct license info and I’d have to provide it before they could talk to me, I’d found Lucene, downloaded it, indexed some of our data and run searches against it. Not to mention that rather than waiting for days to get a response from the commercial vendor, my questions on the Lucene user’s list were answered within a very few hours. With grace and tolerance for my ignorance.
How do people reach you?
Sematext is at http://sematext.com/ and that is the best way to reach the professional me. Our blog and the Digest posts mentioned earlier are at http://blog.sematext.com/ . We are also at http://twitter.com/sematext if you prefer us in 140 char bites.
Will you elaborate on these points in your Lucene Revolution lecture?
Absolutely. Looking forward to the conference and hearing the great speakers. I understand Cisco is giving a talk too.
Stephen E Arnold, July 13, 2010
Post sponsored by Lucid Imagination and the Lucene Revolution Conference.