Merging of Lucene Solr Reported

December 17, 2010

A reader sent me a link to “Lucene and Solr Development Merged.” We are working to track down the details, but I wanted to capture the news item. In addition to the development merger, the write-up references Riak Search. Here is the passage that caught my attention:

With merged dev, there is now a single set of committers across both projects. Everyone in both communities can now drive releases – so when Solr releases, Lucene will also release – easing concerns about releasing Solr on a development version of Lucene. So now, Solr will always be on the latest trunk version of Lucene and code can be easily shared between projects – Lucene will likely benefit from Analyzers and QueryParsers that were only available to Solr users in the past. Lucene will also benefit from greater test coverage, as now you can make a single change in Lucene and run tests for both projects – getting immediate feedback on the change by testing an application that extensively uses the Lucene libraries. Both projects will also gain from a wider development community, as this change will foster more cross pollination between Lucene and Solr devs (now just Lucene/Solr devs).

Riak Search is described in “Riak 0.13, Featuring Riak Search” and “Riak Search and Riak Full Text Indexing”.

The primary information appears on the Riak Web site in a Web page titled “Riak Search.”

Riak Search uses Lucene and features “a Solr like API on top.” According to the Basho blog’s article “Riak 0.13 Released”:

At a very high level, Search works like this: when a bucket in Riak has been enabled for Search integration (by installing the Search pre-commit hook), any objects stored in that bucket are also indexed seamlessly in Riak Search. You can then find and retrieve your Riak objects using the objects’ values. The Riak Client API can then be used to perform Search queries that return a list of bucket/key pairs matching the query. Alternatively, the query results can be used as the input to a Riak MapReduce operation. Currently the PHP, Python, Ruby, and Erlang APIs support integration with Riak Search.
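To make the flow in that passage concrete, here is a minimal sketch of our own. It is a toy in-memory model, not Riak's API and not Basho's implementation. The names `ToyStore`, `enable_search`, `put`, `search`, and `tokenize` are hypothetical, and the tokenizer and AND-only matching are our assumptions. The sketch shows only the idea described above: a bucket is flagged for search, each write to that bucket runs an indexing step before the object is stored, and a query returns bucket/key pairs.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms (assumed analyzer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


class ToyStore:
    """Hypothetical in-memory key/value store with a per-bucket indexing hook."""

    def __init__(self):
        self.objects = {}                 # (bucket, key) -> stored text value
        self.index = defaultdict(set)     # term -> set of (bucket, key) pairs
        self.search_enabled = set()       # buckets with the indexing hook "installed"

    def enable_search(self, bucket):
        """Stand-in for installing the Search pre-commit hook on a bucket."""
        self.search_enabled.add(bucket)

    def put(self, bucket, key, value):
        """Store a value; run the indexing hook first if the bucket is enabled."""
        if bucket in self.search_enabled:
            self._index_object(bucket, key, value)
        self.objects[(bucket, key)] = value

    def _index_object(self, bucket, key, value):
        # Drop postings for any previous value so overwrites do not leave stale hits.
        old = self.objects.get((bucket, key))
        if old is not None:
            for term in tokenize(old):
                self.index[term].discard((bucket, key))
        for term in tokenize(value):
            self.index[term].add((bucket, key))

    def search(self, query):
        """Return sorted (bucket, key) pairs whose values contain every query term."""
        terms = tokenize(query)
        if not terms:
            return []
        hits = set.intersection(*(self.index.get(t, set()) for t in terms))
        return sorted(hits)


if __name__ == "__main__":
    store = ToyStore()
    store.enable_search("articles")
    store.put("articles", "a1", "Lucene and Solr development merged")
    store.put("articles", "a2", "Riak Search has a Solr like API")
    store.put("notes", "n1", "Solr notes")  # bucket not enabled, so not indexed

    # The bucket/key pairs can then be used to fetch the objects, or handed
    # to a later processing step, much as the passage describes for MapReduce.
    for bucket, key in store.search("solr"):
        print(bucket, key, store.objects[(bucket, key)])
```

A real distributed implementation would have to partition and replicate the index across nodes, which is where the scaling questions come in. The toy keeps everything in one process to show the data flow only.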

The story “Riak 0.13 Released” provides additional information, including explicit links to download Riak 0.13 and Riak Search for a variety of platforms.

At first glance, Riak Search brings search and retrieval to NoSQL data stores such as Riak, Basho’s open source, scalable data store.

A number of questions require some further data collection and consideration:

  1. Will other NoSQL implementations “bundle” or “snap in” a search component?
  2. What are the technical considerations of this approach to search in NoSQL data stores?
  3. Are there any performance or scaling issues to consider?

The blending of the Lucene Solr merging story with the Riak Search information caught us by surprise. Time to flip through the Rolodex to see whom we can call for more information. If a reader has additional insight on these two items, please use the comments section of the blog to make the information available to the other two readers of Beyond Search.

We did a bit of sleuthing and wanted to pass along that Riak may be using some of the Lucene/Solr analyzers. One view is that the indexing and search code may not be Lucene based. The implication is that scaling and performance may be an issue. Faceting and grouping may also be issues. Without digging too deeply into the innards of Riak Search, we suggest you do some testing on a suitable data set or corpus.

We located some information about Solr as NoSQL. You can find that information on the Lucid Imagination Web site at this link.

Stephen E Arnold, December 17, 2010

Freebie

Comments

4 Responses to “Merging of Lucene Solr Reported”

  1. Seth Grimes on December 17th, 2010 9:35 am

    Steve, the Lucene-Solr merger move started early this year. I found a link: http://search-lucene.com/m/jISUj1CXObA . Riak (which I hadn’t known about) is a user, not core to L-S, no?

    P.S. Do you owe me $1 now, since I asked a question?

  2. Otis Gospodnetic on December 17th, 2010 10:46 pm

    Steve,

    Like Seth pointed out (using our handy search-lucene.com – go Seth!), this is “old news” now. The good news is things are working well after the merge. Riak Search exposes an API compatible with Solr’s, but my understanding is it doesn’t actually use Lucene for the actual search – it’s a Lucene-like reimplementation in Erlang.

    People and companies are already working on merging large distributed data stores (whether KV stores, document DBs, column-oriented DBs or …) and search. See Lucandra & Solandra (see http://blog.sematext.com/2010/02/09/lucandra-a-cassandra-based-lucene-backend/ ), HSearch (doesn’t use Lucene for search, just Lucene’s analysis module), and a few more.

  3. Alex Popescu on December 20th, 2010 2:27 pm

    Hi Stephen,

    The original link for “Lucene and Solr Development Merged” is still on myNoSQL 🙂 http://nosql.mypopescu.com/post/2335352160/lucene-and-solr-development-merged

    The one you got is just someone republishing without permission the content from my blog plus a couple more.

    Now regarding your questions: so far in the NoSQL land there have been a couple of different approaches:

    1. people trying to create very simple reversed indexes stored in the same NoSQL engine

    Except Lucandra, I’m not aware of any of these getting too far.

    2. using a 3rd party indexing/search solution. The 3 most referenced are: Lucene, Solr, and ElasticSearch.

  4. Sunil Guttula on December 24th, 2010 1:14 pm

    We have developed HSearch, a search engine on HBase. The index design is our own to take advantage of the scale offered by Hbase. For more info please visit the website (http://bizosyshsearch.sourceforge.net) or drop us a note.
