The Financial Times Rediscovers Text Mining

October 11, 2008

On October 8, 2008, the owner of Madame Tussaud’s wax museum until 1998 published Alan Cane’s “New Techniques Find Meanings in Words.” Click fast, because locating the Financial Times’s news stories can be an interesting exercise. You can read this “news” in the Financial Times, a traditional publishing company with the same type of online track record as the Wall Street Journal and the New York Times. The premise of Mr. Cane’s article is that individuals need information about people, places, and things. Apparently Mr. Cane is unfamiliar with the work of i2 in Cambridge, England, Linguamatics, and dozens of other companies in the British Commonwealth alone that are actively engaged in providing systems that parse content to discern and make evident information of this type. Nevertheless, Mr. Cane reviews the ideas of Sinequa, Google, and Autonomy. You can read about these companies and their “new” technology in this Web log. For me, the most interesting comment in this write up was this passage attributed in part to Charles Armstrong, CEO of Trampoline Systems, a company with which I am not familiar:

“The rise of Web 2.0 in the consumer world alerted business to the role that social contacts and networks play. When you are dealing with a project that requires a particular knowledge, you look for the person with the knowledge, not a document.” Mr Armstrong says Trampoline [Systems’] search engine is the first to analyse not just the content of documents but the professional networks of those connected to the documents.

There are three points in this snippet that I noted on my trusty yellow pad:

  1. Who is Charles Armstrong?
  2. What is the connection between the specious buzzword “Web 2.0” and entity extraction? I recall Dr. Ramana Rao talking about entity extraction in the mid-1980s. Before that, various government agencies had systems that would identify “persons of interest”. Vendors included ConQuest Technologies, acquired by Excalibur, and, even earlier, there were saved queries running against content in the Dialog and LexisNexis files. Anyone remember the command UD=9999 from 1979?
  3. What’s with the “Web 2.0” and the “first”? You can see this type of function on public demonstration sites at www.cluuz.com and www.silobreaker.com. You can also ring your local Kroll OnTrack office, and if you have the right credentials, you can see this type of operation in its industrial strength form.

Here’s what I found:

  • CRM Magazine named Trampoline Systems a rising star in 2008.
  • Charles Armstrong, Cambridge grad, is an “ethnographer turned technology entrepreneur.” The company Trampoline Systems was founded in 2003 to “build on his research into how small communities distribute information to relevant recipients.” Ah, the angle is the blend of entity extraction and alerts. Not really new, but more of an angle on what Mr. Armstrong wants to deliver to licensees. Source: here. You can read the Wikipedia profile here. His LinkedIn profile carries this tag: “Ethnographer gone wrong” here. His Web log is here.
  • Craig McMillan is the technology honcho. According to the Trampoline Web site here, he is a veteran of Sun Microsystems where he “led the technical team building the Identrus Global Trust Network Identity assertion platform led technical team for new enterprise integration and meta-directory platform.” Source: here. I found it interesting that the Fast Forward Web log, the official organ of the pre-Microsoft Fast Search & Transfer, wrote about Mr. McMillan’s work in early 2007 here in a story called “Trampoline Systems: Rediscovering the Lost Art of Communications.” The Fast Forward article identifies Raytheon, the US defense outfit, as a “pilot”. Maybe Fast Search should have purchased this company before the financial issues thrust Fast Search into the maw of Microsoft?
  • I located an Enron Explorer here. This seems to be a demo of some of the Trampoline functionality. But the visualizer was not working on October 10, 2008.
  • The core products are packaged as the Sonar Suite. You can view a demo of a Tacit Software-like system here. You can download a demo of the system here. The graphics look quite nice, but the entity precision, relevance, throughput, and query response time are where the rubber meets the road. A nice touch is that the demos are available for Macs and PCs. With a bit of clicking from the Trampoline Systems home page, you can explore the different products the company offers.
  • Web Pro News has a useful write up about the company which appeared in 2006 here.

Charles Armstrong’s relationships as identified by the Canadian company Cluuz.com appear in the diagram below. You can recreate this map by running the query “Charles Armstrong” + Trampoline on Cluuz.com. The URL for the map below is http://www.cluuz.com/ClusterChart.aspx?req=633592276174800000&key=9

[Image: Armstrong map]

This is Cluuz.com’s relationship map of Charles Armstrong, CEO of Trampoline Systems. “New” is not the word I would use to describe either the Cluuz.com or the Trampoline relationship visualization function. Both have interesting approaches, but the guts of this type of map have been around for a couple of decades.
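To show how simple the core of such a relationship map can be, here is a minimal sketch of entity co-occurrence counting in Python. It is not the method used by Cluuz.com or Trampoline Systems, whose internals are not described here. The entity list, the sample sentences, and the function names `find_entities` and `relationship_edges` are all assumptions made up for illustration, and the entity "extraction" is plain string matching against a fixed list of names.

```python
from collections import Counter
from itertools import combinations

# Hypothetical list of known entity names (a "gazetteer"), for illustration only.
ENTITIES = ["Charles Armstrong", "Trampoline Systems", "Craig McMillan", "Raytheon"]

# Invented sample sentences standing in for a document collection.
DOCUMENTS = [
    "Charles Armstrong founded Trampoline Systems in 2003.",
    "Craig McMillan leads technology at Trampoline Systems.",
    "Trampoline Systems ran a pilot with Raytheon, said Charles Armstrong.",
]


def find_entities(text, entities=ENTITIES):
    """Return the sorted list of known entity names that occur in text."""
    lowered = text.lower()
    return sorted(name for name in entities if name.lower() in lowered)


def relationship_edges(documents):
    """Count how often each pair of entities appears in the same document."""
    edges = Counter()
    for text in documents:
        # Every pair of entities found in one document adds weight to one edge.
        for pair in combinations(find_entities(text), 2):
            edges[pair] += 1
    return edges


if __name__ == "__main__":
    # Print the edges of the relationship map, heaviest first.
    for (a, b), weight in relationship_edges(DOCUMENTS).most_common():
        print(f"{a} -- {b}: {weight}")
```

The weighted pairs are the edges a visualizer would draw. A production system would replace the string matching with real entity extraction and would have to cope with name variants, precision, and throughput, which is where the hard work lies.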

Let me be clear: I am intrigued by Trampoline Systems’ approach. There’s something there. The FT article doesn’t pull the cart, however. I am, therefore, not too thrilled with the FT’s write up, but that’s my opinion to which I am entitled.

Make up your own mind. Please, read the Financial Times article. You will get some insight into why traditional media struggles to explain technology. Neither the editors nor the journalist takes the time or has the expertise to figure out what’s “new” and what’s not. My hunch is that Trampoline does offer some interesting features. Ripping through some contacts with well known companies and jumping to the “new” assertion calls into question the understanding of the subjects about which the UK’s top journalists write. Agree? Disagree? Run a query on FT.com for “Trampoline Systems” before you chime in, please.

Stephen Arnold, October 10, 2008

Comments

One Response to “The Financial Times Rediscovers Text Mining”

  1. Ernest Perez on November 14th, 2008 8:40 pm

    Hi, Stephen,

    Ran across your “rediscovering text mining” piece in the last few days. I was not familiar with the “entity extraction” usage, and enjoyed your explanation and dry humor.

    My own background includes a Ph.D. in Library & Information Studies. My specialty was designing & managing news text retrieval systems (Houston Chronicle & Chicago Sun-Times) and packages/portals giving end users complex combinations of resources & services (started & managed the State Library of Oregon’s online information system for State employees, plus developing taxonomy for the cluster of State websites, and implementing the search engine for State of Oregon Websites).

    Since retiring, I’ve consulted informally with Power Text Solutions, a company whose software technology I evaluated in a feature article for a 2003 issue of Online Magazine. They contacted me after retirement, and invited me to examine and comment on their “next generation” product.

    I think you might enjoy taking a look at it. The core technology is “iResearch Reporter.” Here’s the text of a short description I wrote for one of my library publication contacts….

    ——————————————————
    iResearch-Reporter by Power Text Solutions (PTS), of Salem OR, is tested software technology that is a “virtual research assistant.” It uses intelligent post-processing to select and add value to information from large multi-document sets retrieved via popular search engines.

    PTS uses text-mining and summarization processing to analyze multiple document sets. Then it uses linguistic analysis to create a readable, concise, accurate and organized summary report. The report contains inserted URL links throughout, for quick access to all source documents.

    The end user gets instant intelligent analysis and summary text describing the major or important information topics about their query. It’s great for:
    1) “Quick study” or comprehensive background information; as well as,
    2) Clear, concise, selected, organized and summarized information and links that serve as the entry to detailed study of the user’s question area.

    PTS uses any external computer retrieval system to identify groups of highly relevant documents relating to a search topic. iResearch-Reporter.com, the current demo system on the Web, uses Google as the search engine. But in another installation, for a Homeland Security regional agency grouping in Texas, PTS software uses text-mining technology to search and summarize results from Google, Yahoo, and MSN, online databases, as well as local agency intelligence and security document collections.

    The HLS system manager says, “It can write a good summary report in a minute or two, where a human might have to scan maybe 15,000 documents to create an equivalent report.”

    PTS Technology on the Web:

    iResearch Reporter offers a free Web demo research tool. This has been the PTS testbed site, with no active marketing up to this time. This site and preceding versions already serve thousands of users, mainly from Europe, with a growing U.S. user base.

    NewsFeed Researcher: PTS software technology produces detailed background reports on all news stories contained in Google NewsFeeds. It is updated throughout the day, essentially delivering complete, current background information about all major current news stories. [This one’s impressive – total background information package on all current news stories!]

    Both of the sites are available for free use. They’re looking for specialized datafile information providers, for contract customization. In the meantime, PTS invites information professionals and the general public to try the systems out as a free demo.
    ——————————————————

    This product is one of the best I’ve seen. The text extracts and organized summaries are so good that you generally won’t even notice that it was written by a computer system, rather than a human.

    I really enjoyed your incisive analysis and judgement of the Trampoline product. In any case, I hope you may find this system of interest. As noted above, it’s the best I’ve seen in the text mining area.

    Cheers,
    –ernest
    _______________________
    Ernest Perez, Ph.D.
    1119 Satara Ct. NW
    Salem OR 97304
    503-588-3650 Home
    503-884-4233 Cell
    ernest.r.perez@gmail.com
