Sphinx: Inscrutable Search
May 9, 2009
The Register’s Ted Dziuba’s “Sphinx – Text Search the Pirate Bay Way” is a good case study for open source search technology. Before you cancel your Microsoft Fast ESP license, keep in mind that Sphinx is for structured data, specifically MySQL tables. You can get more detail here. There are some doubters in the crowd, particularly with regard to open source search technology. Based on the email I receive and the implementations I have examined, open source search technology cannot be dismissed or ignored. For me, one of the more interesting comments in the article was:
Internet-famous MySQL wonk Jeremy Zawodny, who had the foresight to jump from the ship’s bow as Yahoo started to take on water, replaced MySQL full text search at Craigslist with Sphinx. Craigslist used 25 machines to handle roughly 50 million queries per day on MySQL. Under that kind of load, Zawodny found that MySQL wasn’t using much CPU or doing much disk I/O, which means it’s spending all of its time waiting on thread locks. Oops. Maybe we should have paid attention to parallelism after all. The Sphinx implementation took those 25 machines down to 10, with plenty of room to grow. While Sphinx didn’t handle the traffic out of the box at the time, Zawodny was able to patch it to handle Craigslist’s specific need – and fix a few bugs along the way.
The “green angle” is important: going from 25 machines to 10 means less hardware to power and cool. The comments about vowels and stopwords are also interesting. Worth putting this write-up in the open source search archive.
Stephen Arnold, May 9, 2009
Comments
2 Responses to “Sphinx: Inscrutable Search”
I don’t think Sphinx is just for structured data. It can search across fields in database records and deal with web pages, so that makes it a text search engine in my book. Or at least on my site:
Whoops, that should be: http://www.searchtools.com/tools/sphinx.html