An Oddly Mystical, Whimsical Listicle Combining Big Data and Search
July 4, 2015
Some listicles are clearly the work of college students after a tough beer pong tournament. Others seem as if they emanate from beyond Pluto’s orbit. I am not sure where on this spectrum between the addled and the extraterrestrial the listicle in “Top 11 Open Source big Data Enterprise Search Software” falls.
Here’s the list for your contemplation. I have added some questions after each system’s name. Consult the original write-up for the explanation of why these systems were included in the list. I found the write-ups without much heft or “wood,” to use a Google term.
- Apache Solr. Yep, uses Lucene libraries, right. Performance? Exciting sometimes.
- Apache Lucene Core. Ah, Lego blocks for the engineer with some aspirations for continuous employment.
- Elasticsearch. The leader in search and retrieval. To do big data, some other components are required. Make sure your programming and engineering expertise are up to the job.
- Sphinx. Okay, workable for structured data. Work required to stuff unstructured content into this system.
- Constellio. Isn’t this a part time project of a consulting firm focused on Canadian government work?
- DataparkSearch Engine. Yikes.
- ApexKB. Okay, a script. For enterprise applications. Big Data? Wow.
- Searchdaimon ES. Useful, speedier than either Lucene or Elasticsearch. Not a big data engine without some extra work. Come to think of it, a lot of work.
- mnoGoSearch. Well, maybe for text.
- Nutch. Long in the tooth. Why not use Lucene?
- Xapian. Very robust. Make certain that you have programming expertise and engineering knowledge. Often ignored, which is too bad. But be prepared for some heavy lifting or for paying a wizard with a mental forklift to do the job.
Now, which of these systems can do “big data”? In one sense, if you are exceptionally gifted with engineering and programming skills, I suppose any of these can do tricks. As Samuel Johnson allegedly observed to his biographer:
“Sir, a woman’s preaching is like a dog’s walking on his hind legs. It is not done well; but you are surprised to find it done at all.”
On the other hand, these programs can be used as a utility within a more robust content processing system that has been purpose-built to deal with large flows of structured and unstructured content. But even that takes work.
Anyone want to give Constellio a shot at processing real time Facebook posts? Anyone want to use any of these systems to solve that type of search problem? Show of hands, please?
Stephen E Arnold, July 4, 2015