Open Source Projects at GitHub
March 5, 2012
We’ve run across a couple of interesting open source components on the collaboration site GitHub. Founded in 2008, the site hosts over two million code repositories and provides tools with which subscribers can manage their projects. The two we’d like to highlight are the Nutch-Elasticsearch-Indexer and MongoDB.
The Nutch-Elasticsearch-Indexer allows for the indexing of crawl data from the Nutch search system into the ElasticSearch system. The project’s Readme explains:
“This is similar in nature to that of the SolrIndexer that comes with Nutch which let you index directly into Solr. This provides a way directly index data into elasticsearch coming directly from Nutch.
This is just the code necessary to create the solution. You must start by having the Nutch codebase and have it setup in your development environment (Eclipse) see http://wiki.apache.org/nutch/RunNutchInEclipse for how do this. Once you are set up and is working well. You are ready to get started. The following files below are necessary to integrate into the Nutch base and then re-build Nutch.”
Participating developers must have access to the Nutch source and an ElasticSearch environment. See the GitHub project page for further details.
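For readers curious what indexing crawl data into ElasticSearch might look like, here is a minimal Python sketch. It is not the project's code, which plugs into the Nutch codebase. It only assembles a request body in the newline-delimited JSON format that ElasticSearch's bulk API accepts. The function name `build_bulk_body`, the index name `nutch`, the field names, and the sample page are all our own assumptions for illustration.

```python
import json


def build_bulk_body(pages, index_name="nutch"):
    """Serialize crawled pages into a bulk-API request body (NDJSON).

    `pages` is a list of dicts with hypothetical keys: url, title, content.
    """
    lines = []
    for page in pages:
        # Action line: index this document, using its URL as the document ID.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": page["url"]}}))
        # Source line: the fields to store for this page.
        lines.append(json.dumps({
            "url": page["url"],
            "title": page["title"],
            "content": page["content"],
        }))
    # Each JSON line, including the last, must end with a newline.
    return "".join(line + "\n" for line in lines)


# Sample crawl records (made up for this example).
pages = [
    {"url": "http://example.com/", "title": "Example", "content": "Hello"},
    {"url": "http://example.com/about", "title": "About", "content": "About us"},
]
body = build_bulk_body(pages)
```

The resulting string could then be sent in an HTTP POST to a cluster's `_bulk` endpoint. We leave that step out so the sketch runs without a live ElasticSearch server.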
MongoDB is the other project here that sparked our interest. Its GitHub page includes developer details such as related utilities; where to go for info on building a database; how to run Mongo; client drivers; documentation; build notes; and licensing information. We are pleased to see that this page sees a fair amount of activity.
Maybe the search mantra should be, “Go, open source”?
Cynthia Murrell, March 5, 2012
Sponsored by Pandia.com