Leximancer: Divining Meaning from Words
April 17, 2008
In Australia last year, I met several information technology professionals who mentioned the Leximancer text and content processing system to me. Leximancer now has offices in three cities: Brisbane, Australia; London, England; and Boulder, Colorado. I updated my Leximancer files and made a mental note that the company had some nifty visualization technology. Based on comments made to me, analysts in police and intelligence as well as the academic community find the product of significant value. I heard that the company has more than 200 licensees and is growing at a brisk pace.
At the eContent conference in Phoenix, Arizona, one of the attendees was grilling me about text analytics. As the grill-ee, I was reluctant to provide too much information to the grill-er. Most of what the young, confident MBA wanted is in my new study Beyond Search: What to Do When Your Enterprise Search System Won’t Work. Furthermore, she was convinced after her text mining industry research, which included healthy bites of blue-chip consultancies’ pontifications, that no firm combined text analysis, discovery, and useful point-and-click visualizations of the topic and concept space of a collection.
Sigh. Like the Fortune 500 country clubbers, vendors are so darn inadequate. Maybe? Sometimes it’s the Fortune 500 Ivy leaguers who are missing a card or two in their deck, not the vendors. Just a thought.
This short essay is a partial response to her assertion, which was, by the way, 100 percent incorrect. For some reason, her research overlooked high-profile tools from dozens of vendors as well as point specialists. On the flight back last night, I recalled the Leximancer system, and I thought I would provide some color about that firm’s approach for two reasons: [a] I find it useful to look at companies with interesting search-related technologies and [b] I want to underscore that her assertion and her research were woefully inadequate.
What’s a Leximancer?
Leximancer is text mining software that you can use to analyze the content of collections of textual documents. The system then displays the extracted information in a browser. Leximancer’s approach to visualization is to use a “concept map”. The idea is that a user can glance at the map, get an overview, and then explore the relationships that Leximancer discovers within the text.
Technical Approach
The technical approach Leximancer’s engineers follow combines several different disciplines. Some text processing companies seize upon a core technology and then wrap it with additional functions. Other companies combine a statistical approach and a linguistic approach and balance the functions of each system component. Leximancer takes a different track, and it struck me when I first looked at the company’s technology that Leximancer and Google share some common ideas about extracting meaning from text.
Specifically, Leximancer’s system uses a half dozen different disciplines as sources of algorithms, systems, and methods. Instead of betting the content processing farm on a single pony such as Bayesian maths or semantic analysis heavily dependent on knowledge bases, Leximancer takes a more catholic approach. I find that openness refreshing because many vendors use marketing hype to present a system as computationally rich. Leximancer has a computationally charged system and trims back on the marketing baloney. I know that many Fortune 500 firms love that type of salami from their vendors, but at the end of the day, an unsatisfying system hurts the licensee and ultimately the vendor as well.
Here’s a rundown of Leximancer’s approach:
- Computational linguistics. Leximancer uses a blend of techniques, including Bayesian statistics, to note that the appearance of a word is correlated with the appearance of certain other words.
- Content analysis. The Leximancer system quantifies the knowledge within text by coding text segments with a set of concepts. Each concept is defined by a set of relevant words.
- Information science. Leximancer makes use of learnings from traditional information retrieval or IR for processing, indexing, and navigating in a “concept space”. (More information about this technique appears in Beyond Search.)
- Machine learning. The company uses a method for iteratively growing a thesaurus of words around a set of initial seed words.
- Network theory. The visualizations used by Leximancer have benefited from innovation in complex networks. The idea is that emergent behavior can be identified in Leximancer’s emergent themes, which, according to the company, “provide a measure of the meaning of a text”.
- Physics. The company says that “the idea of a measurable short-range order between words was influenced by algorithms used in solid state physics”. As you may know, Google approaches spam with algorithms anchored in theoretical physics.
The Leximancer system, then, is based upon multiple, complementary techniques. Instead of having one horse in the text analysis system, Leximancer is using a team of stallions.
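To make two of the items above concrete, here is a minimal Python sketch of counting word co-occurrence and then growing a thesaurus outward from seed words. This is my own toy illustration, not Leximancer's algorithm, which the company has not spelled out here. The function names, the sample segments, the scoring rule, and the `min_score` threshold are all assumptions.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus: each string stands in for one text segment.
SEGMENTS = [
    "intelligence agency report",
    "intelligence agency warning",
    "airport security screening",
    "intelligence report warning",
]

def cooccurrence(segments):
    """Count segments per word, and segments shared by each pair of words."""
    word_counts = Counter()
    pair_counts = defaultdict(Counter)
    for segment in segments:
        words = set(segment.lower().split())
        word_counts.update(words)
        for w in words:
            for v in words:
                if w != v:
                    pair_counts[w][v] += 1
    return word_counts, pair_counts

def grow_thesaurus(seeds, segments, rounds=2, min_score=0.5):
    """Iteratively add words that tend to appear alongside current thesaurus terms.

    A candidate's score is the share of its own segments that also contain a
    thesaurus term, averaged over the current thesaurus terms (an assumed,
    simplistic rule chosen only for illustration).
    """
    word_counts, pair_counts = cooccurrence(segments)
    thesaurus = set(seeds)
    for _ in range(rounds):
        scores = Counter()
        for term in thesaurus:
            for word, shared in pair_counts[term].items():
                if word not in thesaurus:
                    scores[word] += shared / word_counts[word]
        added = {w for w, s in scores.items() if s / len(thesaurus) >= min_score}
        if not added:
            break  # nothing new cleared the threshold, so stop early
        thesaurus |= added
    return thesaurus

print(sorted(grow_thesaurus({"intelligence"}, SEGMENTS)))
# prints ['agency', 'intelligence', 'report', 'warning']
```

On this toy data, the seed "intelligence" pulls in the words that share its segments, while the airport words stay out because they never appear with a thesaurus term. A real system would at least need stop-word handling and a sturdier statistical test.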
System Outputs
You can configure the system to display processed content in a variety of ways. I have selected a collection of information related to the 9-11 attack on the World Trade Center. I have focused on the concept intelligence, and I am able to view the concepts and their frequency in a term list. A click displays the content within the cluster.
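For readers who want a feel for what sits behind a term list like this one, here is a rough Python sketch that codes text segments with concepts and tallies how often each concept appears. It assumes, as the content analysis item earlier describes, that each concept is defined by a set of relevant words. The concept definitions, sample segments, and function names are hypothetical, and this is not a description of how Leximancer actually computes its lists.

```python
from collections import Counter

# Hypothetical concept definitions: concept name -> set of relevant words.
CONCEPTS = {
    "intelligence": {"intelligence", "agency", "cia"},
    "aviation": {"airport", "flight", "hijackers"},
}

# Hypothetical text segments standing in for a processed collection.
SEGMENTS = [
    "the agency shared intelligence late",
    "hijackers boarded the flight at the airport",
    "intelligence on the hijackers was incomplete",
    "the agency issued a warning",
]

def code_segment(segment, concepts, min_hits=1):
    """Return the concepts whose word sets overlap the segment enough."""
    words = set(segment.lower().split())
    return {name for name, terms in concepts.items()
            if len(words & terms) >= min_hits}

def term_list(segments, concepts):
    """Tally coded segments per concept, most frequent concept first."""
    counts = Counter()
    for segment in segments:
        counts.update(code_segment(segment, concepts))
    return counts.most_common()

print(term_list(SEGMENTS, CONCEPTS))
# prints [('intelligence', 3), ('aviation', 2)]
```

Clicking a concept in a term list would then amount to showing the segments whose coding includes that concept.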
The Java-based visualization component makes it easy for a user to scale the visualization, add or trim concepts, and explore content with a single mouse click. A user can explore a concept or a sub topic by text document summary or a ranked concept list. I find the summary particularly useful once I am oriented in the content space. Here’s a representative summary generated by the Leximancer system:
You can explore different output modes on the company’s Web site.
The company offers its multi-language, automated content processing system in three versions. You can license a professional version designed for use on a single-user computer or workstation. An academic version is available, but you will need to register with the company to obtain this product. Leximancer also offers a server version for enterprise deployments. Prices begin at about US$400 for the academic license. Commercial licenses begin at about US$800. Compared to prices for comparable systems, Leximancer is one of the most attractively priced content processing and discovery systems available. You can register to download a trial version of the system here.
My hope is that the enthusiastic and confident but naive MBA who could not find a suitable product will give Leximancer a test drive. If I had more energy, I would send her a link to this Web log essay, but I need a rest. Trading quips with a 20-something wears me down. Come to think of it, MBAs and many senior executives in general have that effect on me, the squawking goose in Harrod’s Creek, Kentucky. Honk, honk.
Stephen Arnold, April 16, 2008
Comments
2 Responses to “Leximancer: Divining Meaning from Words”
Very interesting. Are you still using the software or have you used it in production?
Leximancer... I recently asked for a trial copy of their 3.5 version.
After exchanging 10 emails with 2 of their company representatives, they declined to provide me with a free trial.
Instead they said, “…that we can negotiate for a paid desktop trial.”
Although they advertise a free trial at their site, they are actually trying to make people pay for a trial version.
How crappy and scummy is that?
Stay away from them. The software costs 1500$ AUD, and they expect you to pay without testing it.
Unless of course you pay for a trial version first.
Most probably their software is full of bugs..
They con people into buying their crappy software!
Stay away from them!