An Oddly Mystical, Whimsical Listicle Combining Big Data and Search

July 4, 2015

Some listicles are clearly the work of college students after a tough beer pong tournament. Others seem as if they emanate from beyond Pluto’s orbit. I am not sure where on this spectrum between the addled and extraterrestrial the listicle in “Top 11 Open Source big Data Enterprise Search Software” falls.

Here’s the list for your contemplation. I have added some comments and questions after each system’s name. Consult the original write up for the explanation for the inclusion of these systems in the list. I found the write ups without much heft or “wood,” to use a Google term.

  1. Apache Solr. Yep, uses Lucene libraries, right. Performance? Exciting sometimes.
  2. Apache Lucene Core. Ah, Lego blocks for the engineer with some aspirations for continuous employment.
  3. Elasticsearch. The leader in search and retrieval. To do big data, there are some other components required. Make sure your programming and engineering expertise are up to the job.
  4. Sphinx. Okay, workable for structured data. Work required to stuff unstructured content into this system.
  5. Constellio. Isn’t this a part time project of a consulting firm focused on Canadian government work?
  6. DataparkSearch Engine. Yikes.
  7. ApexKB. Okay, a script. For enterprise applications. Big Data? Wow.
  8. Searchdaimon ES. Useful, speedier than either Lucene or Elasticsearch. Not a big data engine without some extra work. Come to think of it, a lot of work.
  9. mnoGoSearch. Well, maybe for text.
  10. Nutch. Long in the tooth. Why not use Lucene?
  11. Xapian. Very robust. Make certain that you have programming expertise and engineering knowledge. Often ignored, which is too bad. But be prepared for some heavy lifting or paying a wizard with a mental forklift to do the job.

Now which of these systems can do “big data”? In one sense, if you are exceptionally gifted with engineering and programming skills, I suppose any of these can do tricks. As Samuel Johnson allegedly observed to his biographer:

“Sir, a woman’s preaching is like a dog’s walking on his hind legs. It is not done well; but you are surprised to find it done at all.”

On the other hand, these programs can be used as utilities within a more robust content processing system which has been purpose built to deal with large flows of structured and unstructured content. But even that takes work.

Anyone want to give Constellio a shot at processing real time Facebook posts? Anyone want to use any of these systems to solve that type of search problem? Show of hands, please?

Stephen E Arnold, July 4, 2015
