A Full Text Engine Blooms in Life
January 9, 2014
Basic search for static Web sites stinks. It is usually generic code that takes a one-size-fits-all approach to search, and as we all know, that never works. Stavros Korokithakis recognized this problem and decided to create an accurate full-text search engine. In his article, “Writing A Full-Text Search Engine Using Bloom Filters,” Korokithakis details how he wrote his own search using an inverted index and Bloom filters. An inverted index works by mapping every word to the IDs of the documents that contain it. As one can imagine, that index grows very big, and the basic search engine for a static Web site simply returns every hit. A search plug-in limits itself to titles, tags, and keywords. How do you get the same results for a static search?
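To make the inverted index concrete, here is a minimal Python sketch. It is not Korokithakis's code; the tokenizer, the function names, and the sample documents are hypothetical and exist only for illustration. Each word points to the set of documents containing it, and a multi-word query intersects those sets.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words (a simplistic tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(documents):
    """Map each word to the set of IDs of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the IDs of documents that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    # .get() avoids inserting empty entries into the defaultdict for unknown words.
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical sample documents for illustration.
docs = {
    "post-1": "Bloom filters are probabilistic data structures",
    "post-2": "An inverted index maps words to documents",
    "post-3": "Static sites need client-side search",
}
index = build_inverted_index(docs)
print(sorted(search(index, "inverted index")))  # ['post-2']
```

Every distinct word across the whole site becomes a key, which is why the index balloons as a site grows.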
A Bloom filter is the answer. A Bloom filter is a data structure that stores elements in a fixed number of bits and, when queried, tells you either that it has probably seen an element before or that it has definitely never seen it. Building search on top of Bloom filters is also apparently easy:
- “Create one filter per document and add all the words in that document in the filter.
- Serialize the (fixed-size) filter in some sort of string and send it to the client.
- When the client needs to search, iterate through all the filters, looking for ones that match all the terms, and return the document names.
- Profit!”
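The four steps above can be sketched in Python as follows. This is a hypothetical illustration rather than the code from Korokithakis's article: the filter size (`NUM_BITS`), the number of hash functions (`NUM_HASHES`), the salted SHA-256 hashing, and the base64 serialization are all assumptions chosen for the example. On a real static site the client half would have to run in the browser; both halves are written in Python here purely for illustration.

```python
import base64
import hashlib
import re

NUM_BITS = 1024  # hypothetical filter size in bits; identical for every document
NUM_HASHES = 3   # hypothetical number of hash functions per word


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words (a simplistic tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


class BloomFilter:
    def __init__(self, num_bits=NUM_BITS, num_hashes=NUM_HASHES, bits=None):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # Bits are packed into a bytearray; num_bits is assumed to be a multiple of 8.
        self.bits = bits if bits is not None else bytearray(num_bits // 8)

    def _positions(self, word):
        # Derive num_hashes bit positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{word}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, word):
        for pos in self._positions(word):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, word):
        # False means definitely absent; True means probably present.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(word))

    def serialize(self):
        return base64.b64encode(bytes(self.bits)).decode("ascii")

    @classmethod
    def deserialize(cls, data, num_bits=NUM_BITS, num_hashes=NUM_HASHES):
        return cls(num_bits, num_hashes, bytearray(base64.b64decode(data)))


def build_filters(documents):
    """Site build time: one filter per document, serialized to a fixed-size string."""
    serialized = {}
    for name, text in documents.items():
        bloom = BloomFilter()
        for word in tokenize(text):
            bloom.add(word)
        serialized[name] = bloom.serialize()
    return serialized


def search(serialized_filters, query):
    """Client side: return names of documents whose filters match all query terms."""
    terms = tokenize(query)
    matches = []
    for name, data in serialized_filters.items():
        bloom = BloomFilter.deserialize(data)
        if terms and all(bloom.might_contain(t) for t in terms):
            matches.append(name)
    return matches


# Hypothetical sample documents for illustration.
docs = {
    "post-1": "Bloom filters are probabilistic data structures",
    "post-2": "An inverted index maps words to documents",
    "post-3": "Static sites need client-side search",
}
filters = build_filters(docs)
# Always includes 'post-2'; any other name could only appear as a false positive.
print(search(filters, "inverted index"))
```

A Bloom filter never produces false negatives, so every document that really contains the query terms is returned, though occasionally a document that does not may slip in. Because every filter has the same fixed size, the data sent to the visitor grows with the number of documents rather than with each document's vocabulary; the flip side is that a long document fills its filter with more set bits and so produces more false positives.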
He even provides a quick implementation guide in Python. It sounds like a wonderful way to improve static Web site search, but couldn't the same problem be solved with a simple plug-in as described above? So many people rely on pre-made Web site platforms such as WordPress and Tumblr, which come with built-in plug-ins. Is this approach meant for the bigger Web sites people deploy?
Whitney Grace, January 09, 2014
Sponsored by ArnoldIT.com, developer of Augmentext