Whoosh: Python Search Libraries

February 13, 2009

A happy quack to the reader who alerted me to Whoosh, “a fast, featureful full-text indexing and searching library implemented in pure Python.” You can read about Matt Chaput’s code here. Among the Whoosh features we noted were:

  • Support for fielded indexing and search.
  • Zippy indexing and query processing.
  • Snap-in scoring algorithm (including BM25F), text analysis, storage, posting format, etc.
  • A pure-Python spell-checker (as far as I know, the only one).
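
For readers who have not met fielded search before, here is a rough sketch of the idea in plain Python. This is not Whoosh's API or Matt Chaput's implementation; the class name TinyFieldedIndex and its methods are made up for illustration, and the tokenizer is just lowercase-and-split. The point is only that each posting is keyed by field as well as term, so a query can be limited to the title or the body.

```python
from collections import defaultdict


class TinyFieldedIndex:
    """Toy in-memory inverted index keyed by (field, term). Hypothetical, not Whoosh."""

    def __init__(self):
        # (field name, term) -> set of ids of documents with that term in that field
        self.postings = defaultdict(set)

    def add_document(self, doc_id, **fields):
        # Naive analysis: lowercase the text and split on whitespace.
        for field, text in fields.items():
            for term in text.lower().split():
                self.postings[(field, term)].add(doc_id)

    def search(self, field, query):
        """Return sorted ids of documents whose `field` contains every query term."""
        terms = query.lower().split()
        if not terms:
            return []
        # Copy the first posting set, then intersect with the rest (AND semantics).
        result = set(self.postings.get((field, terms[0]), set()))
        for term in terms[1:]:
            result &= self.postings.get((field, term), set())
        return sorted(result)


index = TinyFieldedIndex()
index.add_document(1, title="Whoosh search library", body="pure Python indexing")
index.add_document(2, title="Xapian bindings", body="search engine written in C++")

title_hits = index.search("title", "search")  # only looks at titles
body_hits = index.search("body", "search")    # only looks at bodies
```

The same word, "search", matches a different document depending on which field is queried. A real engine adds scoring (BM25F weights matches per field), proper text analysis, and on-disk storage on top of this basic structure.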

You won’t be using this on your Vista machine. The library is useful in situations such as:

  • When a pure-Python solution is desirable to avoid having to build/compile native libraries (or force users to build/compile them).
  • When you need a research platform.
  • When the Pythonic interface is useful.

You can download the snake-quick system here.

Stephen Arnold, February 13, 2009

Comments

2 Responses to “Whoosh: Python Search Libraries”

  1. Charlie Hull on February 13th, 2009 4:37 am

    Not sure why you wouldn’t be using this on your Vista machine, Python works fine on Vista….

    We’ve been checking out Whoosh and have tested it against another open source engine, Xapian:
    http://xapian.wordpress.com/2009/02/12/xapian-performance-comparision-with-whoosh/
    Our conclusion: it’s pretty good for what it is, but it may not cope with large (100k+) indexes.

    Nice to see another open source search engine, though. Python users might also be interested in Xappy, a Python interface to Xapian and part of our Flax platform: http://pypi.python.org/pypi/xappy/0.5

  2. Stephen E. Arnold on February 13th, 2009 9:43 am

    Charlie Hull,

    Thanks for the links and information.

    Stephen Arnold, February 13, 2009
