Lucene Revolution Preview: Otis Gospodnetic, Sematext

July 13, 2010

The Lucene Revolution Conference is shaping up. Among the presenters are open source developers representing a wide range of organizations. One of the speakers is Otis Gospodnetic, Sematext’s founder. Mr. Gospodnetic is also the author of Lucene in Action with co-authors Erik Hatcher and Michael McCandless. His firm implements open source search, natural language processing, and text analytics technology in the enterprise. His team focuses on the design and development of scalable, high-performance search and solutions.

I spoke with Mr. Gospodnetic earlier this week. Here are the highlights of our conversation:

Why are you interested in Lucene/Solr?

I’ve always been interested in information gathering, information extraction, search, and related areas. I think that’s because I feel that information gathering, extraction, and searching are precursors for gaining knowledge, and knowledge has always been a hobby of mine. If I look back at all my professional experience, everything I ever built had a strong search component. This is why I was happy when I stumbled upon Lucene around 2000 and why I immediately joined the project, even before it was an Apache project, and why I’ve been using Lucene ever since.

What is your take on the community aspect of Lucene/Solr?

The community around Lucene and Solr is as real and as alive and active as it can be. It’s very knowledgeable and quick to help. I’ve been a part of it for around 10 years now, and have witnessed the community grow, as well as its knowledge breadth and depth increase.

When it comes to the Lucene/Solr community, the quote I like to give comes from the former Netflix search guy:

I posted,  went to get a sandwich, and came back to see two answers. The change  works, and I can get the fix into production today. This list is magic.

Both user and  development communities are so strong and active that it’s becoming  really hard for people to keep up with the volume of output these  communities produce.  Earlier this year we started publishing monthly  Lucene and Solr Digest blog posts.  These posts are for people who want  to keep up with (or keep an eye on) Lucene and Solr, but don’t have the  time to read some 60+ non-trivial-to-read email messages these  communities produce every day.  See http://blog.sematext.com/ or  http://twitter.com/sematext .  I hope we are not going through the  trouble of getting this published every month just because of some  mythical community!

Commercial companies are playing what I call the “open source card.”  Won’t that confuse people?

Judging from the demand, I’d say this is not confusing to people. On the contrary, I get the feeling they like the open-source/commercial blend. Plus, there is precedent – commercial support for open-source software has been around for many years now: MySQL and Red Hat, for example, have been doing this for years. Not only is this not confusing, it is welcomed. Some people and organizations love and can rely on the community support. Others prefer paid support. At Sematext we do both – some of us participate on Lucene/Solr mailing lists, helping as much as we can via that channel. We also publish the already mentioned monthly Lucene and Solr Digest that summarizes the new and interesting developments from those two projects, and we offer paid tech support and other types of services for Lucene, Solr, Hadoop, and other related technologies.

What are the primary benefits of using  Lucene/Solr?

Let me highlight the points my work has driven home as pivotal.

First, there is the notion of TCO or total cost of ownership. TCO is *much* lower. There are no license fees, no limitations about the index size, query rates, number of servers, etc.

Second, Lucene/Solr offer flexibility. If you don’t like how something works in Lucene/Solr, you  can change it today and deploy it tomorrow.  If your use case is good,  the community will adopt it and you won’t have to maintain your  customized, forked Lucene/Solr version.

Third, quality. Lucene and Solr are mature.   They’ve been worked on by many smart people 24/7 around the world for  more than 10 years.  These people work on Lucene/Solr because that is  their passion, not because they are paid to do so, except for the lucky  few who also get paid to work on what they love.  Lucene and Solr can do  a lot – they have lots of features, they are reliable, they are still  being worked on and are improved on a daily basis.

And, finally, agility: You need  search?  You can have something working today.  You don’t have to go  through budget approvals, through long sales and negotiation cycles, you  don’t have to go through wine and dine dates that just create delays  that ultimately increase your costs.

When someone asks you why you don’t  use a commercial search solution, what do you tell them?

I tell them to wake up. It’s 2010. There are alternatives. Cheaper. Faster. Better. I tell them to read the answers to the previous questions. When I see how much some (all?) of the commercial search solutions cost and I compare that to what we at Sematext can do for a customer for that sort of money… I recently happened to see a quote from one well-known commercial search vendor and my jaw dropped. Well, not really, because I know they charge an arm and a leg for their software, but when you think about how many kids you can put through college for that kind of money.

Let me also quote  something that came up recently in a thread titled “Arguments in Favor  of Lucene over Commercial Competition”.

In my initial foray into Lucene several years ago, by the time I’d sent a support request to the vendor of a commercial product and received an answer telling me that I hadn’t included the correct license info and I’d have to provide it before they could talk to me, I’d found Lucene, downloaded it, indexed some of our data and run searches against it. Not to mention that rather than waiting for days to get a response from the commercial vendor, my questions on the Lucene user’s list were answered within a very few hours. With grace and tolerance for my ignorance.

How do people reach you?

Sematext is at  http://sematext.com/ and that is the best way to reach the professional  me. Our  blog and the Digest posts mentioned earlier are at http://blog.sematext.com/ . We are also at http://twitter.com/sematext if  you prefer us in 140 char bites.

Will you elaborate on these points in your Lucene Revolution lecture?

Absolutely. Looking forward to the conference and hearing the great speakers. I understand Cisco is giving a talk too.

Stephen E Arnold, July 13, 2010

Post sponsored by Lucid Imagination and the Lucene Revolution Conference.
