Location Aware Search via Lucene / Solr
January 19, 2010
I located an interesting and helpful post, "Location Aware Search with Apache Lucene and Solr", on IBM's developerWorks Web site. If you are not familiar with developerWorks, it is IBM's resource for developers and IT professionals. If you search for "location aware Lucene" in the search box at www.ibm.com, you get a direct link to "Location Aware Search with Apache Lucene and Solr". That's a definite plus because the IBM Web site can be tough to navigate.
The write up is quite useful. As with some other content on the developerWorks Web site, the author is not an IBM employee. The Lucene / Solr write up is by a member of the technical staff at Lucid Imagination, a company that offers open source builds of Lucene and Solr as well as professional services. (Lucid is interesting because it resells commercial content connectors developed by the Australian company ISYS Search Software.)
The write up is timely and provides quite a bit of detail in its 6,000 words. You get a discussion of key Lucene concepts, geospatial search concepts, information about representing spatial data, a discussion of combining spatial data with text in search, examples, sample code, a how-to for indexing spatial information in Lucene, a review of how to search by location, and a compilation of links to relevant technical documents, interviews with experts, and code, among other pointers.
Several observations:
- The effort that went into this 6,000 word write up is considerable. The work is quite good, and it strikes me as catnip for some IBM-centric developers. IBM is a Lucene user, and I think that IBM and Lucid want to get as many of these developers as possible to use Lucene / Solr. This is a marketing approach comparable to Google's push to get Android into everything from set top boxes to netbooks.
- The information serves as a teaser for a longer work that will be published under the title Taming Text. That book should find a ready audience. Based on the data I have seen, many organizations, even those with modest technical resources, are looking at Lucene as a way to get a search system in place without the hassles associated with a full scale procurement of a commercial search system.
- The ecumenical approach taken in the write up is a plus as well. However, in the back of my mind is the constant chant, "Sell consulting, sell consulting, sell consulting". That's okay with me because the phrase runs through my addled goose brain every day of the week. But the write up makes clear that some heavy lifting is required to implement a function such as location aware search using open source software.
The complexity is not unexpected. It does, however, contrast sharply with the approach taken by MarkLogic, an information infrastructure vendor that is making location type search part of its basic framework. Google takes a slightly different approach: the company allows a developer to use its APIs to perform a large number of geospatial tricks with little fancy dancing. Microsoft is on the ease of use trail as well.
Some folks who are new to Lucene may find the code a piece of cake. Others might take a look and conclude that Lucene is a recipe that requires Julia Child in the kitchen.
Stephen E Arnold, January 19, 2010
A freebie. An IBM person once gave me an hors d'oeuvre, and a Lucid professional bought me a flavored tea. Other than these high value inducements, I wrote this without filthy lucre's involvement. I will report this to the National Institutes of Health.
Comments
With time, the heavy lifting currently required will become easier with Solr. Documentation will improve, more articles and books[1] will appear, and friendlier APIs that are easier to understand and integrate with will become available.
[1] http://www.jroller.com/otis/entry/contributors_for_solr_in_action — we plan on covering Spatial Search in the upcoming Solr in Action
Interesting article, thanks for pointing it out, Stephen. We've implemented some systems with geolocation in the U.K., and the problem we often encounter is translating addresses to lat/long coordinates. As Grant mentions in the article, there are some free services, but these are bandwidth-limited, have terms and conditions that prevent commercial use, or are just plain inaccurate.
In the UK, the most complete postcode (that’s what we call a zipcode) database is owned by the Ordnance Survey, who (despite being a public body paid for by public money) have historically charged a lot for access. Things are changing slowly though, and there are some encouraging noises coming from Government.
Another common problem is locating the user when they haven't given you their full address. An IP address gives only a partial picture, and so far there isn't a foolproof solution to this.
As Otis mentions, there's some support for this in Solr, which is a little less "chef-intensive" than Lucene. Check out the talk posted on the Lucid Imagination Web site: http://www.lucidimagination.com/How-We-Can-Help/webinar-from-search-to-found