The Financial Times Rediscovers Text Mining

October 11, 2008

On October 8, 2008, the owner of Madame Tussaud’s wax museum until 1998 published Alan Cane’s “New Techniques Find Meanings in Words.” Click “fast” because locating the Financial Times’s news stories can be an interesting exercise. You can read this “news” in the Financial Times, a traditional publishing company with the same type of online track record as the Wall Street Journal and the New York Times. The premise of Mr. Cane’s article is that individuals need information about people, places, and things. Apparently Mr. Cane is unfamiliar with the work of i2 in Cambridge, England, Linguamatics, and dozens of other companies in the British Commonwealth alone that are actively engaged in providing systems that parse content to discern and make evident information of this type. Nevertheless, Mr. Cane reviews the ideas of Sinequa, Google, and Autonomy. You can read about these companies and their “new” technology in this Web log. For me, the most interesting comment in this write up was this passage attributed in part to Charles Armstrong, CEO of Trampoline Systems, a company with which I am not familiar:

“The rise of Web 2.0 in the consumer world alerted business to the role that social contacts and networks play. When you are dealing with a project that requires a particular knowledge, you look for the person with the knowledge, not a document.” Mr Armstrong says Trampoline [Systems]’ search engine is the first to analyse not just the content of documents but the professional networks of those connected to the documents.

There are three points in this snippet that I noted on my trusty yellow pad:

  1. Who is Charles Armstrong?
  2. What is the connection between the specious buzzword “Web 2.0” and entity extraction? I recall Dr. Ramana Rao talking about entity extraction in the mid-1980s. Before that, various government agencies had systems that would identify “persons of interest”. Vendors included ConQuest Technologies, acquired by Excalibur. Even earlier, there were saved queries running against content in the Dialog and LexisNexis files. Anyone remember the command UD=9999 from 1979?
  3. What’s with the “Web 2.0” and the “first”? You can see this type of function on public demonstration sites at www.cluuz.com and www.silobreaker.com. You can also ring your local Kroll OnTrack office, and if you have the right credentials, you can see this type of operation in its industrial strength form.

Here’s what I found:

  • CRM Magazine named Trampoline Systems a rising star in 2008.
  • Charles Armstrong, Cambridge grad, is an “ethnographer turned technology entrepreneur.” The company Trampoline Systems was founded in 2003 to “build on his research into how small communities distribute information to relevant recipients.” Ah, the angle is the blend of entity extraction and alerts. Not really new, but more of an angle on what Mr. Armstrong wants to deliver to licensees. Source: here. You can read the Wikipedia profile here. His LinkedIn profile carries this tag: “Ethnographer gone wrong” here. His Web log is here.
  • Craig McMillan is the technology honcho. According to the Trampoline Web site here, he is a veteran of Sun Microsystems where he “led the technical team building the Identrus Global Trust Network Identity assertion platform led technical team for new enterprise integration and meta-directory platform.” Source: here. I found it interesting that the Fast Forward Web log, the official organ of the pre-Microsoft Fast Search & Transfer, wrote about Mr. McMillan’s work in early 2007 here in a story called “Trampoline Systems: Rediscovering the Lost Art of Communications.” The Fast Forward article identifies Raytheon, the US defense outfit, as a “pilot”. Maybe Fast Search should have purchased this company before the financial issues thrust Fast Search into the maw of Microsoft?
  • I located an Enron Explorer here. This seems to be a demo of some of the Trampoline functionality. But the visualizer was not working on October 10, 2008.
  • The core products are packaged as the Sonar Suite. You can view a demo of a Tacit Software-like system here. You can download a demo of the system here. The graphics look quite nice, but the entity precision, relevance, throughput and query response time are where the rubber meets the road. A nice touch is that the demos are available for Macs and PCs. With a bit of clicking from the Trampoline Systems home page, you can explore the different products the company offers.
  • Web Pro News has a useful write up about the company which appeared in 2006 here.

Charles Armstrong’s relationships as identified by the Canadian company Cluuz.com appear in the diagram below. You can recreate this map by running the query “Charles Armstrong” + Trampoline on Cluuz.com. The URL for the map below is http://www.cluuz.com/ClusterChart.aspx?req=633592276174800000&key=9

[Image: Armstrong map]

This is Cluuz.com’s relationship map of Charles Armstrong, CEO of Trampoline Systems. “New” is not the word I would use to describe either the Cluuz.com or the Trampoline relationship visualization function. Both have interesting approaches, but the guts of this type of map have been around for a couple of decades.
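To show how old and plain the guts of such a map can be, here is a minimal sketch that counts how often two known names turn up in the same document. This is an assumption about the general technique, not a description of how Cluuz.com or Trampoline Systems builds its maps. The function name `cooccurrence_edges`, the sample sentences, and the entity list are all made up for illustration.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(docs, entities):
    """Count how often each pair of known entities appears in the same document."""
    edges = Counter()
    for text in docs:
        lowered = text.lower()
        # Naive substring match; a real system would use an entity extractor.
        present = sorted(e for e in entities if e.lower() in lowered)
        for a, b in combinations(present, 2):
            edges[(a, b)] += 1
    return edges


# Hypothetical sample text, written only to exercise the function.
sample_docs = [
    "Charles Armstrong founded Trampoline Systems.",
    "Craig McMillan joined Trampoline Systems from Sun Microsystems.",
    "Charles Armstrong and Craig McMillan spoke at a conference.",
]
sample_entities = [
    "Charles Armstrong",
    "Craig McMillan",
    "Trampoline Systems",
    "Sun Microsystems",
]

for (a, b), weight in sorted(cooccurrence_edges(sample_docs, sample_entities).items()):
    print(f"{a} <-> {b}: {weight}")
```

Each counted pair is an edge in the map and the count is its weight. The hard part, as noted above, is the precision of the entity matching that feeds the counts, not the drawing.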

Let me be clear: I am intrigued by the Trampoline Systems approach. There’s something there. The FT article doesn’t pull the cart, however. I am, therefore, not too thrilled with the FT’s write up, but that’s my opinion, to which I am entitled.

Make up your own mind. Please, read the Financial Times article. You will get some insight into why traditional media struggles to explain technology. Neither the editors nor the journalist takes the time or has the expertise to figure out what’s “new” and what’s not. My hunch is that Trampoline does offer some interesting features. Ripping through some contacts with well known companies and jumping to the “new” assertion calls into question the understanding of the subjects about which the UK’s top journalists write. Agree? Disagree? Run a query on FT.com for “Trampoline Systems” before you chime in, please.

Stephen Arnold, October 10, 2008

Tess: New Beyond Search Analyst

October 10, 2008

We’ve had several emails about the white boxer shown on the splash page of this Web log. The Beyond Search team rescued her. She’s on the mend and learning the ins and outs of Microsoft Office SharePoint Search. After her unhappy experiences with two previous owners, she’s curious about plugs, cables, and anything that will fit into her mouth. We’ve discovered that she’s deaf, which may explain why she had a tough life before discovering the soft-hearted goose at Beyond Search. Watch for her analyses of MOSS and other Microsoft technologies. She told us, “These are easy for a rescued boxer to understand.”

Stephen Arnold, October 10, 2008

Microsoft’s Latest Business Intelligence Push

October 10, 2008

You can get a useful run down of Microsoft’s newest business intelligence initiatives. Mark Whitehorn’s “Microsoft Business Intelligence Conference 2008: Kilimanjaro, Madison, and Gemini” for SearchDataManagement here explains the fancy code names. For me, these new initiatives should avoid the frustrations of Microsoft’s current line up of business intelligence products; for example, getting PerformancePoint to work with SharePoint is tough. The easiest way to pump up SharePoint’s business intelligence functions is to write code for Excel. Maybe Microsoft will address this shortcoming. However, companies like Attivio and MarkLogic are pushing into new business intelligence frontiers. Both companies offer “search” but the service is a utility. Microsoft seems to be content to leverage the 100 million SharePoint licenses and urge customers to go with Fast Search and other Microsoft add ons. Check out Mr. Whitehorn’s write up. He’s enthusiastic about Microsoft’s new initiatives. I’m more cautious and skeptical. How will these nifty new services work in Microsoft’s Strata environment? Will Microsoft’s reengineered data centers reduce the need and headaches for on premises installations? I don’t have answers to these questions… yet.

Stephen Arnold, October 10, 2008

Amazon S3 Cuts Prices

October 10, 2008

Om Malik’s “Amazon Cuts Prices on S3” provides a run down of the new AWS fees. My quick look through the prices made me think that Amazon wants to get more small, price-sensitive outfits on the service and pull more high volume users. But Amazon’s accountants with sharp pencils craft prices that take some analysis to figure out. Mr. Malik’s article summarizes the data about the “success” of AWS. You can read the story here. I tucked this article in my Amazon folder. I am still waiting for Amazon to produce hard data about the revenue from AWS. I want to see how the company can do so much with its modest R&D and technology budgets. I also want to see how much of the cost of the ecommerce infrastructure is really offset by the AWS play. Talking to me about “objects” means zilch. Talking to me about money does. Is there a reason Mr. Bezos and his wizards refuse to break out financial data? Because of the object-fuscation, I am not sure what “success” at Amazon means. Readers with insight, feel free to post. In the meantime, read Mr. Malik’s article.

Stephen Arnold, October 10, 2008

Ad Injection: Now a Reality

October 10, 2008

Google is selling ads for games; see the story here from the New York Times by Saul Hansell. Google says that 15 to 30 second television-style ads will be appealing in casual games, shown before the game, between levels, and after the game has ended.

A concise explanation of why Google is moving forward when the ad business seems to be dipping is provided by product manager Christian Ostlien: “Brand advertisers come to us looking for a cross-platform solution that lets them hit audiences that are on the scale of millions but allows them to do very precise targeting.”

As with most things Google, this is not a new concept. The work was hinted at during a conference in July 2007. In addition to the video ads, marketers can have their product ads integrated into the games themselves, with Google serving as dealmaker between the developers and the ad makers.

Google’s own view of AdSense for Games can be read here. With 25% of Internet users playing games each week, the audience is there. One note for advertisers: you had better have big game, according to a quote taken from The Channel Wire blog posting on the subject: “At this time, eligible publishers must have a minimum of 500,000 game plays and have 80 percent of their traffic from the U.S. or the U.K.,” Ryan Hayward, from Ads Product Marketing, wrote in a Google blog.

The launch is solid, with both marketers and game developers already on board. How long will TV ads rule? Mobile content and expanded Internet usage seem to point to a real need for expanded ads on the Web. I personally don’t mind sitting through the ads as I catch up on missed episodes of The Unit. The question is, “What else can Google do with this technology?”

Constance Ard for Beyond Search, October 10, 2008

SAP and Financial Pressure: Complexity Has Its Price

October 10, 2008

Reuters distributed “SAP Imposes Cost Savings as Financial Crisis Looms” on October 9, 2008. SAP provides company-wide infrastructure software to run accounting, manufacturing, and other back office operations. The company has its own NetWeaver search engine called TREX, and the company’s venture arm pumped $4.0 million into Endeca earlier this year. Until that investment, I mostly checked on enhancements to TREX, which have been slow in coming in comparison to other enterprise content processing vendors’ upgrades. The core of the Reuters story, which was at this link at 6:40 pm on October 9, 2008, is that the SAP board of directors wants cost savings. Among the changes are stops on new hires and changes in business travel. I haven’t seen much information about new deals for TREX or for Endeca SAP installations in the last few months. I am going to try to chase down some information about the TREX search system sales. I don’t have enough information to make a comment about SAP R&D spending, upgrades to TREX, or the Endeca deal.

Stephen Arnold, October 10, 2008

Recommind: A Cash Infusion to Go to eDiscovery 2.0

October 10, 2008

Most people don’t know what eDiscovery is. Believe me. You are better off not knowing. The term applies to information obtained during the discovery process of a legal matter. The “e” part means that software does some of the heavy lifting. Human attorneys fall asleep when grinding through hundreds of thousands of pages of documents, email, and transcripts. Software never sleeps. The customers for eDiscovery services include some law firms. Keep in mind that although there are a quarter million attorneys running around the lower 48 states, the top 50 law firms constitute a small market. The bigger market for eDiscovery is corporations. With the cost of litigation rising more quickly than the price of diet soda in Harrod’s Creek, Kentucky, eDiscovery can save in house legal operations big money.

Recommind emerged as a challenger to Stratify, now a unit of Iron Mountain, a $2.8 billion records and information company known for putting paper in limestone caves. Recommind’s system is, to the uninitiated, similar to Autonomy’s. The company started in the legal market, made a run at the enterprise search market, and now with this infusion of $7.5 million seems to be nosing back into the eDiscovery space. You must read Anthony Ha’s story “Recommind Raises $7.5 million for eDiscovery 2.0” here. I’m baffled by the “2.0” part. Recommind has to come up with something beyond 2.0 because Brainware, Clearwell Systems, and a couple of other firms such as Autonomy have caught the fancy of some legal eagles. I think that the “2.0” is somewhat gratuitous, maybe meaningless. As I read the story by Mr. Ha, Recommind is going to use the money to grow.

Kennet Partners was the lead for the financing. Smart money knows where there’s a payoff. Recommind will have the cash to leapfrog some of the competitors, but not necessarily the ones mentioned in the New York Times article. Once the technology moves Recommind ahead of the upstarts who are selling management systems, not just eDiscovery systems, then Recommind will have to adjust its marketing. My offer of visiting and conducting an interview with the firm was rejected. “Too busy” was the reason I was given by the firm’s top sales guru. With smart money watching $7.5 million, expecting its cash to grow, Recommind will have an exciting road to travel in the next six to nine months as it goes “2.0” while the economic thermometer drifts toward zero.

Stephen Arnold, October 10, 2008

Google: We’re Not a Publisher but …

October 10, 2008

Chris Snyder’s “AOL Sends Journals Users to Google’s Blogger” for Wired here reports that AOL will transfer its bloggers’ content to Google. Maybe Google is an aggregator, not a publisher. Whatever the term, original content is original content. Making that content available is a publishing function. Google insists that it is not a publisher. Maybe a better logician than I can explain this apparent contradiction. Mr. Snyder does not seize upon this angle. He must know more about Google’s definition of publishing than I do. He works for a real publication, and I work with geese and dogs.

Stephen Arnold, October 10, 2008

The Dream of Downstream

October 9, 2008

“When Will BI Head Downstream?” asks Shadan Malik. The question is one that challenges business intelligence vendors to expand their market. The story appeared in eCommerceTimes.com on June 27, 2008. You can read the full story here. (This is a story that has a pop up ad, so you could encounter a dead link or be unable to access this story.)

The write up reviews the consolidation of the business intelligence industry. You know the mantra by heart: IBM bought Cognos, SAP bought Business Objects, etc.

For me the key point of the essay is this statement:

The rapid expansion of these vendors has led them to focus on integrating platforms rather than creating new tools, which has resulted in complex and expensive solutions and stagnant innovation.

Business intelligence makes such good sense. The arguments advanced by vendors whet the appetite of executives for:

  • Dashboards. The idea is that a busy manager can see the key data at a glance. The metaphor is the speedometer and oil light in your automobile. Who would not want to have a single heads up display just like a fighter pilot?
  • Consolidated information. Most organizations have data scattered around just like a fraternity has junk piled in every room’s nooks and crannies. Finding anything is the equivalent of rummaging through a pile of sweaty gym clothes and mismatched socks. Who would not want an organized view of information?
  • Real time alerts. Most organizations know when something bad happens by happenstance. A sales person takes a client to lunch and the client says, “You guys are not going to get the contract for our new data management system.” Or you read about a competitor’s winning a contract in the Wall Street Journal. Who would not want to know important events in real time?

The article takes a final jab at business intelligence vendors with this observation:

Organizations face the challenge of increasing loads of data, and traditional spreadsheet reports are not providing the quick and easy information access that is required for them to complete their job effectively. The majority of organizations have a variety of software applications that constitute their data infrastructure, and this data is often locked within the silos of these independent applications, making it difficult to report on combined data and have a holistic view of the business.

Annex Greener Pastures

Towns grow by annexing smaller communities. The idea is part of the American approach to getting big. In Kentucky, Louisville annexed Jefferson County. The result was that Louisville was going to be as large a metropolitan area as Boston or some other big city. The arguments were that consolidation makes the city cheaper to operate because duplication of certain services would be eliminated. Well, Louisville is not Boston, and the annexation of adjacent communities has not paid off. Many people just moved farther away from the city, preferring the inefficiencies of life outside the reach of bureaucracy.

Just annex that land, er, I mean markets.

The notion of annexing markets is similar. Business intelligence vendors and their owners want to move downmarket. I anticipate that smaller business intelligence firms will be gobbled up. Prices for business intelligence services will be cut as vendors offer more business intelligence services from the cloud. Microsoft and other vendors will bundle business intelligence into larger packages.

However the push is initiated, the idea is the same: move downmarket in order to generate more revenue.

Sprawl Brings New Problems

The problem for business intelligence is that business intelligence, like enterprise search, is difficult to implement. The intuitive goodness of doing “business intelligence” glosses over the expensive, time consuming work that is required to make the systems work as the customers perceive them as working. The reality of business intelligence is very different from the jazzy, easy-to-use services that marketers plant in the minds of customers.

A short list of these issues includes:

  • The data are a mess, and it is expensive to clean up simple things like names in many different forms and formats.
  • Information is just plain wrong. Commercial databases cannot easily remove flawed data, and most businesses lack the resources specialists have. Bad data are endemic.
  • Real time is a pipe dream. If you want to analyze streams of real time data, get ready to write big checks for specialized hardware and systems. There’s a good reason why the Exegy real time data engine costs six figures a box.
  • Business intelligence requires that users understand what mathematical procedures do in order to generate results. A value such as 0.04567 does not mean much without some deeper understanding of what was analyzed and how and for what purpose.
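The first item in the list above, names in many forms and formats, is easy to illustrate. Below is a minimal sketch, assuming personal names arrive as free text, that reduces each variant to a crude "first last" matching key and groups the variants. The `TITLES` set and the function names `normalize_name` and `group_variants` are hypothetical and chosen for illustration; real clean up needs far more rules than this.

```python
import re
import unicodedata
from collections import defaultdict

# Hypothetical list of honorifics and suffixes to drop; a real list is much longer.
TITLES = {"mr", "mrs", "ms", "dr", "prof", "jr", "sr"}


def normalize_name(raw: str) -> str:
    """Reduce a personal name to a crude 'first last' matching key."""
    # Strip accents so that "José" and "Jose" compare equal.
    text = unicodedata.normalize("NFKD", raw)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Turn "Last, First" into "First Last".
    if "," in text:
        last, _, first = text.partition(",")
        text = f"{first} {last}"
    # Lowercase, keep alphabetic tokens only, and drop titles.
    tokens = [t for t in re.findall(r"[a-z]+", text.lower()) if t not in TITLES]
    if not tokens:
        return ""
    # Keep only the first and last tokens, which discards middle names and initials.
    return tokens[0] if len(tokens) == 1 else f"{tokens[0]} {tokens[-1]}"


def group_variants(names):
    """Group raw name strings that share the same normalized key."""
    groups = defaultdict(list)
    for name in names:
        groups[normalize_name(name)].append(name)
    return dict(groups)


variants = ["Armstrong, Charles", "Mr. Charles Armstrong", "CHARLES A. ARMSTRONG"]
print(group_variants(variants))
```

Note the trade-off: dropping middle initials merges variants of one person, but it will also merge two different people who share a first and last name. Sorting out cases like that by hand is where the expense comes from.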

Downmarket is Not Downstream

The metaphor of moving downmarket reminds me of sitting in a rubber tube and floating down the river near Moose Jaw, Maine. As long as I kept still, the current did the rest. I was clueless about the rapids around the bend, but progress at the outset seemed automatic. How nice, I thought, just before I entered the Class 5 rapids.

Business intelligence, just like enterprise search, may behave the same way. Business intelligence and search are becoming utilities, but when packaged for the downmarket, they are cut down to an inner tube. The idea of seeing how much of a product sold in real time is okay when the product units are well defined, the reporting systems work well, and the data flow leisurely. When the real world of messy data and fast changes becomes evident, the vendors are in for a wild ride.

Observations

At this time, simplified business intelligence, just like the basic key word search systems, gives the appearance of delivering “good enough” results. The challenge arises in these conditions:

  1. The flows of data exceed the system’s capabilities, allowing the user of the business intelligence system to experience the thrill of losing control and being carried along unaware of what the heck is coming next
  2. A crash because the customer cannot handle the business intelligence system no matter how simple its controls
  3. A greater loss of control than previously because the automatic nature of the system allows managers to relax, trusting the new system.

My hunch is that more business intelligence vendors will offer simplified versions of certain business intelligence functions. The adoption of these tools will over time accelerate. The market does not yet know what it does not know. Like search, only when the system does not perform to expectations will the problems surface. It has taken decades for the dissatisfaction with search to become widely known. Business intelligence may, in my view, follow the same trajectory.

Stephen Arnold, October 9, 2008

Surviving Nuclear Winter: 10 Item Checklist for Selling Content Processing

October 9, 2008

After a month of international travel and dozens of meetings, I went through my old-fashioned paper notebook and looked at the comments I wrote to myself. I am not sure if these are useful, but I thought I would save myself the hassle of creating a file and storing it in my “Book Notes” folder on my desk top computer. If you want to critique, refine, or criticize these thoughts, please, use the comments section of the Web log. I received a flurry of emails from PR mavens last night who discovered that pulling this goose’s tail feathers produces a couple of loud honks and a fierce beak peck.


Set Up

It should come as no surprise that two of the high profile search vendors have been working overtime to generate PR buzz and revenue. Furthermore, I have documented the sad fate of content processing companies who post a Web site, invite people to contact the company, and then don’t respond. You can plow through the postings on this diary / Web log and find these articles about SurfRay, TeezIR, and other firms. Finally, there are quite a few start ups. I met with two in San Jose and one in Utrecht that show significant promise. These outfits are in pre-divestment mode, so each has to hit up mom and dad for cash to keep the lights on.

These points, then, are designed to encapsulate what I thought as I pondered these meetings and the information I gathered about content processing in the last month. If the list is useful, great. If it annoys you, use the comments to tell me. I don’t need to hear from Trent or Sky for a PR Webinar.

The List

Here are the 10 tips:

