Solr Deep Paging Fix

August 1, 2011

Let’s face it: after being spoiled by modern technology, who has three seconds to spare?  That, ultimately, is the question posed in the “Deep Paging Problem” post on the Solr Enterprise Search blog, which presents an interesting performance tweak for the open source system.

Querying data buried deep in the information banks can be a bit hairy.  Even the search giant Google stands at arm’s length from the problem, returning only 90 or so pages of results.  If Solr is asked to retrieve the 500th document from an index, it must cycle through each of the first 499 documents to grab it.  What can be done to save valuable time and ease the strain on the system, you ask?
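For scale, the naive way to reach a deep page is a single request with a large start offset.  The core name, field, and numbers below are our own illustration, not figures from the post:

    /solr/products/select?q=*:*&sort=price asc&start=5000&rows=10

To answer it, Lucene has to score and keep start + rows (here, 5,010) documents in its priority queue just to hand back the last ten.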

Enter the power of filters, handy for cigarettes, spreadsheets, and nearly everything in between. The author asserts:

“The idea is to limit the number of documents Lucene must put in the queue. How to do it? We will use filters to help us, so Solr we will use the fq parameter. Using a filter will limit the number of search results. The ideal size of the queue would be the one that is passed in the rows parameter of query. … The solution … is making two queries instead of just one – the first one to see how limiting is our filter thus using rows=0 and start=0, and the second is already adequately calculated.”

So use the two saved seconds in searching to write that down: one query, with the filter applied and rows=0, to check how many results the filter leaves, and a second, with a recalculated start offset, to return the desired page.  A rough sketch follows below.  For a useful example of the code in action, check the original post cited above.
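Here is a minimal sketch of that two-query pattern in Python, assuming a local Solr core named “products” sorted by a “price” field.  The filter range, the offsets, and the way the new start value is derived are illustrative assumptions on our part, not code from the original post.

    import json
    from urllib.parse import urlencode
    from urllib.request import urlopen

    SOLR = "http://localhost:8983/solr/products/select"   # hypothetical core

    def solr_select(**params):
        # Run one select request and return the "response" part of the JSON.
        query = urlencode(dict(params, wt="json"))
        with urlopen(SOLR + "?" + query) as resp:
            return json.loads(resp.read())["response"]

    rows = 10
    deep_start = 5000            # the "deep" offset the user actually wants
    total = 123456               # numFound already known from the first page
    fq = "price:[100 TO *]"      # filter meant to cut off the earlier documents

    # Query one: rows=0 and start=0, as the quote suggests, only to see how
    # limiting the filter is -- no large ranking queue is built.
    matched = solr_select(q="*:*", fq=fq, rows=0, start=0)["numFound"]
    skipped = total - matched    # documents the filter removes from the front

    # Query two: the start offset is recalculated against the filtered set
    # (assuming the filter was chosen so the wanted page falls inside it),
    # so Lucene queues roughly `rows` documents instead of deep_start + rows.
    page = solr_select(q="*:*", fq=fq, sort="price asc",
                       start=deep_start - skipped, rows=rows)
    print(page["docs"])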

Sarah Rogers, August 1, 2011

Sponsored by Pandia.com, publishers of The New Landscape of Enterprise Search
