RapidMiner: Open Source Data Mining

April 11, 2009

A happy quack to the reader who reminded me that Google Apps supports Java. If you are interested in data mining, you may want to catch up with RapidMiner, an open source data mining system. RapidMiner drinks Java, so you may want to think about ways to make use of Google Apps and RapidMiner. The person who wrote me wanted some information about this idea.

My April 2009 column for KMWorld talks about Google Apps, but I don’t have any information about hooking RapidMiner into Google Apps. In fact, I had not thought about it.

RapidMiner is “the world-wide leading open-source data mining solution due to the combination of its leading-edge technologies and its functional range. Applications of RapidMiner cover a wide range of real-world data mining tasks.” There is an enterprise version plus consulting services available.

You can download the RapidMiner community edition here. The documentation is quite good. You can snag a copy of those documents here. The community edition offers a number of features, and it is extensible. Here’s an example of a data output from RapidMiner:

[Image: rapidminer]

You can find a useful discussion of the open source version by Michael Wurst at Nemoz.org here. This write-up provides examples that show one way to hook RapidMiner into a Java application. The most useful part is the code sample for running the text classifier on a chunk of text. RapidMiner’s classification component is called RapidMinerTextClassifier.
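I don't have the Nemoz.org sample in front of me, so here is only a rough sketch of the general shape such a hookup might take in a Java application: the application talks to a small interface, and an adapter behind that interface would call the real classifier. Every name below (TextClassifierSketch, TextClassifier, KeywordClassifier, demoClassifier) is my own invention for illustration and is not RapidMiner's API. The keyword counter is a toy stand-in so the example runs without RapidMiner installed.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: none of these names come from RapidMiner.
public class TextClassifierSketch {

    /** Assumed seam between the application and whatever classifier sits behind it. */
    public interface TextClassifier {
        String classify(String text);
    }

    /**
     * Toy stand-in that counts keyword hits per label.
     * A real adapter would delegate to the RapidMiner component instead.
     */
    public static class KeywordClassifier implements TextClassifier {
        private final Map<String, String[]> keywordsByLabel = new LinkedHashMap<>();

        public KeywordClassifier add(String label, String... keywords) {
            keywordsByLabel.put(label, keywords);
            return this;
        }

        @Override
        public String classify(String text) {
            String lower = text.toLowerCase(Locale.ROOT);
            String best = "unknown"; // returned when no keyword matches
            int bestHits = 0;
            for (Map.Entry<String, String[]> entry : keywordsByLabel.entrySet()) {
                int hits = 0;
                for (String keyword : entry.getValue()) {
                    if (lower.contains(keyword)) {
                        hits++;
                    }
                }
                // Strictly greater, so ties keep the label that was added first.
                if (hits > bestHits) {
                    bestHits = hits;
                    best = entry.getKey();
                }
            }
            return best;
        }
    }

    /** Builds a small demo classifier with two made-up labels. */
    public static TextClassifier demoClassifier() {
        return new KeywordClassifier()
                .add("data mining", "mining", "classifier")
                .add("search", "search", "index");
    }

    public static void main(String[] args) {
        TextClassifier classifier = demoClassifier();
        String chunk = "RapidMiner is an open source data mining system";
        System.out.println(classifier.classify(chunk));
    }
}
```

The point of the interface is that the rest of the application never touches the classifier library directly, so swapping the toy for a RapidMiner-backed adapter, or for something else, would be a change in one place.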

There are some limitations to the Google Apps implementation of Java, but I think the person who wrote me has an interesting idea. The notion of combining sophisticated RapidMiner operations with Google Apps struck me as worth exploring. If you have any examples of this type of hybridization, use the comments section of this Web log to pass along the information.

Stephen Arnold, April 11, 2009

Comments

3 Responses to “RapidMiner: Open Source Data Mining”

  1. RapidMiner: Open Source Data Mining : Beyond Search | Open Hacking on April 11th, 2009 1:37 am

    […] the rest here: RapidMiner: Open Source Data Mining : Beyond Search This entry was posted on Saturday, April 11th, 2009 at 12:02 am and is filed under News, […]

  2. Otis Gospodnetic on April 16th, 2009 12:41 pm

    A better suggestion would be Mahout, actually (uses Hadoop and can make good use of a GAE cluster). A new version was just released: http://www.jroller.com/otis/entry/pylucene_and_mahout_releases

  3. el chief on January 18th, 2011 9:20 pm

    here’s a five part video series on text mining with rapidminer:

    http://vancouverdata.blogspot.com/2010/11/text-analytics-with-rapidminer-loading.html
