Solr Deep Paging Fix
August 1, 2011
Let's face it: spoiled as we are by modern technology, who has three seconds to spare? This ultimately is the question posed in the "Deep Paging Problem" post on the Solr Enterprise Search blog, which presents an interesting performance tweak for the open source system.
Querying data buried deep in the information banks can be a bit hairy. Even the search giant Google stands at arm's length from the problem, returning only 90 pages or so of results. If Solr is asked to retrieve the 500th document from an index, it must collect all 500 matching documents in an internal queue before it can discard the first 499 and hand back the one you want. What can be done to save valuable time as well as ease the strain on the system, you ask?
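To make that cost concrete, here is a minimal sketch of what a naive deep-paging request looks like. The collection path and field name are hypothetical; the point is that Solr's `start` and `rows` parameters force it to queue `start + rows` documents just to return the final `rows`.

```python
from urllib.parse import urlencode

def deep_page_url(base, query, start, rows):
    """Build a naive deep-paging Solr select URL.

    Solr must keep start + rows documents in its priority
    queue before it can discard the first `start` of them.
    """
    params = urlencode({"q": query, "start": start, "rows": rows})
    return f"{base}/select?{params}"

# Page 50 of 10-per-page results: 500 docs queued, 490 thrown away.
url = deep_page_url("http://localhost:8983/solr/collection1",
                    "title:solr", 490, 10)
```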
Enter the power of filters, handy in everything from cigarettes to spreadsheets. The author asserts:
“The idea is to limit the number of documents Lucene must put in the queue. How to do it? We will use filters to help us, so Solr we will use the fq parameter. Using a filter will limit the number of search results. The ideal size of the queue would be the one that is passed in the rows parameter of query. … The solution … is making two queries instead of just one – the first one to see how limiting is our filter thus using rows=0 and start=0, and the second is already adequately calculated.”
So use the two saved seconds in searching to write that down. The trick is two queries instead of one: a first query with rows=0 to check how many results survive the filter, and a second, properly calculated query to return the desired elements. For a useful example of the code in action, check the original post linked above.
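The two-query pattern described in the post can be sketched as below. This is an assumption-laden illustration, not the original author's code: the `search` callable stands in for an HTTP call to Solr's `/select` handler, and the query and filter strings are hypothetical.

```python
def filtered_page(search, q, fq, start, rows):
    """Two-phase deep paging with a filter query (fq).

    Phase 1: rows=0 learns how many documents survive the
    filter without building a large result queue.
    Phase 2: fetches the requested page only if it exists,
    so Solr never queues more documents than necessary.
    """
    # Phase 1: count-only query; Solr returns numFound but no docs.
    total = search(q=q, fq=fq, start=0, rows=0)["numFound"]
    if start >= total:
        return []  # requested page lies beyond the filtered results
    # Phase 2: the filter keeps the queue small; fetch the page.
    return search(q=q, fq=fq, start=start, rows=rows)["docs"]

# Usage with a stand-in search function (a real one would issue an
# HTTP request to a Solr server and parse the JSON response):
def fake_search(q, fq, start, rows):
    docs = [f"doc{i}" for i in range(25)]  # pretend fq matched 25 docs
    return {"numFound": len(docs), "docs": docs[start:start + rows]}

page = filtered_page(fake_search, "title:solr",
                     "date:[2011-01-01 TO *]", 20, 10)
# page holds the five remaining documents, doc20 through doc24
```

The design choice worth noting: the extra round trip in phase 1 is cheap (rows=0 builds no document queue), and it pays for itself whenever the filter shrinks the result set below the requested offset.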
Sarah Rogers, August 1, 2011