Text Mining: No-Cost Resources
April 19, 2008
Engineers without Fears has a post by Matt Moore that contains four useful links. If you are looking for a way to get up to speed on this “beyond search” function, navigate to this post.
None is without some constraints; each is useful. First, you can read a six-page paper comparing four systems: Leximancer, Megaputer, SAS Institute, and SPSS. Keep in mind that each of these approaches text mining from a very different angle of attack. Leximancer is a useful system that can become difficult to navigate in visualization mode. Megaputer, developed by wizards from a university in Russia, is robust but can be complex to operate. SAS has licensed technology from Inxight Software (now owned by SAP’s Business Objects) and is the recent buyer of text processing specialist Teragram. Expect some changes in the SAS approach in the near future. SPSS, a company best known for data mining, acquired LexiQuest and uses that company’s technologies in its systems. Nevertheless, you can pick up some helpful information in “An Evaluation of Unstructured Text Mining Software”. The link appears on Engineers without Fears.
The link to the National Centre for Text Mining is particularly helpful. The information available on the site ranges from traditional society boilerplate to the more useful comments about tools and research. You may find it useful to spider the entire site. Information can appear and disappear, so an archive is helpful if you plan on extending your research over a period of years.
The link to a lecture by Dr. Marti Hearst is a must-read. Most vendors have sucked concepts, phrases, and data from Dr. Hearst’s work, often without giving her credit. This particular paper dates from late 2003, and a quick search of Google and the University of California – Berkeley Web site will point you to more current information. (You may want to narrow your query to computer science and allied disciplines. The site is sprawling, and it can be difficult to locate what you need. UC Berkeley obviously doesn’t pay much attention to Dr. Hearst’s expertise.)
The link to the 2003 New York Times article satisfies a researcher’s need to get the “gray lady’s” take on a technical topic. I don’t pay much attention to the information in newspapers, but you can decide for yourself. Engineering documents, patent applications, and technical articles often provide more useful information without the rhetorical overextension needed to convert an equation into a two-word phrase or a metaphor.
If you have a budget, you will want to look at the profiles of text mining companies in Beyond Search, a 300-page review of text mining and its component parts. The study also includes a discussion of approaches to content processing that “wrap” text mining in more usable applications. More information about this resource is located here.
Stephen Arnold, April 19, 2008
Comments
3 Responses to “Text Mining: No-Cost Resources”
Speaking of no-cost resources, I recommend this one: http://alias-i.com/lingpipe/web/competition.html. The folks at alias-i do a great job of enumerating and summarizing the various text mining offerings out there, including the free options from university labs.
I appreciate your adding this link.
Stephen Arnold, April 19, 2008
Leximancer: I recently asked for a trial copy of their 3.5 version.
After exchanging 10 emails with 2 of their company representatives, they declined to provide me with a free trial.
Instead they said, “…that we can negotiate for a paid desktop trial.”
Although they advertise a free trial at their site, they are actually trying to make people pay for a trial version.
How crappy and scummy is that?
Stay away from them. The software costs $1,500 AUD, and they expect you to pay without testing it.
Unless of course you pay for a trial version first.
Most probably their software is full of bugs.
They con people into buying their crappy software!
Stay away from them!