Open Source Search Run Down

October 25, 2010

“Open Source Search with Lucene & Solr” provides a useful overview of information similar to that presented at the Lucene Revolution in Boston, October 7 and 8, 2010. Even though I poked my head into most sessions and met a number of speakers, I still learned from the factoids Igvita.com has assembled. Here’s a selection of four.

First, the Salesforce.com implementation of Lucene “consists of roughly 16 machines, which in turn contain many small and sharded Lucene indexes. Currently, [Salesforce.com] handles 4,000 queries per second (qps) and provides an incremental indexing model where the new user data is searchable within ~ three minutes.”

Second, iTunes is a Lucene user “said to be handling up to 800 queries per second.” I thought Apple was drinking Google Kool-Aid, or at least was before the friction between the two companies turned into a marital separation without counseling.

Third, I found this description of Lucene/Solr interesting:

If Lucene is a low-level IR toolkit, then Solr is the fully-featured HTTP search server which wraps the Lucene library and adds a number of additional features: additional query parsers, HTTP caching, search faceting, highlighting, and many others. Best of all, once you bring up the Solr server, you can speak to it directly via REST XML/JSON API’s. No need to write any Java code or use Java clients to access your Lucene indexes. Solr and Lucene began as independent projects, but just this past year both teams have decided to merge their efforts – all around, great news for both communities. If you haven’t already, definitely take Solr for a spin.
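
To make the "no Java required" point concrete, here is a minimal sketch of how a client might assemble a Solr query URL with faceting and highlighting switched on. The host, port, path, field names, and the helper name build_solr_query_url are my assumptions for illustration, not details from the Igvita write-up; the request parameters (q, wt, rows, facet, facet.field, hl, hl.fl) are standard Solr query parameters. The sketch only builds the URL, so it runs without a Solr server.

```python
from urllib.parse import urlencode


def build_solr_query_url(base_url, query, facet_field=None,
                         highlight_field=None, rows=10):
    """Build a Solr select URL (hypothetical helper, for illustration only).

    base_url is assumed to point at a Solr instance, for example
    "http://localhost:8983/solr"; adjust it to match your deployment.
    """
    # Core parameters: the query string, JSON response format, page size.
    params = [("q", query), ("wt", "json"), ("rows", str(rows))]

    # Faceting: ask Solr to count matching documents per value of a field.
    if facet_field:
        params += [("facet", "true"), ("facet.field", facet_field)]

    # Highlighting: ask Solr to return matching snippets from a field.
    if highlight_field:
        params += [("hl", "true"), ("hl.fl", highlight_field)]

    # urlencode percent-escapes values so the query is safe in a URL.
    return base_url.rstrip("/") + "/select?" + urlencode(params)


# Example usage with assumed field names "title" and "category".
url = build_solr_query_url(
    "http://localhost:8983/solr",
    "title:lucene",
    facet_field="category",
    highlight_field="title",
)
print(url)
```

Fetching that URL with any HTTP client (a browser, curl, or urllib.request.urlopen) should return a JSON response from a running Solr instance, which is the whole appeal: the index is reachable from any language that can speak HTTP.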

Finally, this passage opened my eyes to some interesting opportunities.

Instead of running Lucene or Solr in standalone mode, both are also easily integrated within other applications. For example, Lucandra is aiming to implement a distributed Lucene index directly on top of Cassandra. Jake Luciani, the lead developer of the project, has recently joined the Riptano team as a full-time developer, so do not be surprised if Cassandra will soon support a Lucene powered IR toolkit as one of its features! At the same time, Lily is aiming to transparently integrate Solr with HBase to allow for a much more flexible query and indexing model of your HBase datasets. Unlike Lucandra, Lily is not leveraging HBase as an index store (see HBasene for that), but runs standalone, albeit tightly integrated Solr servers for flexible indexing and query support.

Navigate to the Igvita Web site and get the full scoop, not a baby cup of goodness.

Stephen E Arnold, October 25, 2010

Freebie
