Text Analytics Data from Hurwitz and Associates

May 13, 2009

IT Analysis published Dr. Fern Halper’s “2009 Text Analytics Survey” here. The core of the essay was data from a longer Hurwitz and Associates study, which I have not seen. Based on the data in the article, you may want to get the full study. Two items jumped out at me.

First, customer and competitive intelligence were text analytics drivers. The also-ran? Compliance. Second, and more surprising, the implementation model was software as a service.

Interesting data.

Stephen Arnold, May 13, 2009

Some Google in the White House

May 13, 2009

A month ago, I received a call from a journalist asking about the Obama White House’s uses of Google. I did not answer the question because big time journalists ask me questions, and I am not a public library reference desk worker any more.

One insight can be found here. Google said:

App Engine supports White House town hall meeting
In late March, the White House hosted an online town hall meeting, soliciting questions from concerned citizens directly through its website. To manage the large stream of questions and votes, the White House used Google Moderator, which runs on App Engine. At its peak, the application received 700 hits per second, and across the 48-hour voting window, accepted over 104,000 questions and 3,600,000 votes. Despite this traffic, App Engine continued to scale and none of the other 50,000 hosted applications were impacted. For more on this project, including a graph of the traffic and more details on how App Engine was able to cope with the load, see the Google Code blog.
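A quick back-of-the-envelope check puts those numbers in perspective. The sketch below uses only the figures Google quoted; the variable names are mine, and the comparison assumes that a "hit" is roughly comparable to a submitted question or vote, which overstates the gap because hits also include page views.

```python
# Figures quoted by Google for the town hall voting window.
WINDOW_HOURS = 48
QUESTIONS = 104_000          # "over 104,000", so treat as a lower bound
VOTES = 3_600_000
PEAK_HITS_PER_SECOND = 700

window_seconds = WINDOW_HOURS * 3600

# Average submission rates across the whole window.
avg_votes_per_second = VOTES / window_seconds
avg_questions_per_second = QUESTIONS / window_seconds
avg_submissions_per_second = (VOTES + QUESTIONS) / window_seconds

# Rough peak-to-average ratio (assumes hits and submissions are comparable).
peak_to_average = PEAK_HITS_PER_SECOND / avg_submissions_per_second

print(f"average votes per second:     {avg_votes_per_second:.1f}")
print(f"average questions per second: {avg_questions_per_second:.2f}")
print(f"peak vs. average load:        {peak_to_average:.0f}x")
```

On these assumptions the average works out to roughly 21 submissions per second, so the 700 hits per second peak is a spike of about 30 times the average. Bursty traffic of that kind is what an elastic platform is supposed to absorb.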

How Googley is the Obama White House? Pretty Googley I hear.

Stephen Arnold, May 13, 2009

Google Time

May 13, 2009

Searchology strikes me as a forum for Google to remind journalists, the faithful, unbelievers, and competitors that the GOOG is the big dog in search. You can read dozens of reports about Google’s search enhancements. A good round up was “Google Unveils New Search Features” here. Don’t like AFP? Run this query on Google News and pick a more useful summary. For me, the key announcements had to do with time. The date of a document and the time of an event are important but different concepts. Time is a difficult problem, and Google’s announcements underscore the firm’s time expertise. Timelines? No problem. Date sort? No problem. For me, what’s important is that time prowess is a tiny tip of much deeper underlying technical capabilities. The Google has some muscles it is just starting to flex.

Stephen Arnold, May 13, 2009

Search and Predictive Math

May 9, 2009

Short honk: Curious about how predictive math will affect search and retrieval? Check out “How Your Search Queries Can Predict the Future” here. Queries are useful in interesting ways.

Stephen Arnold, May 9, 2009

Google Disclosed Time Function for Desktop Search

May 7, 2009

Time is important to Google. Date tags on documents are useful. As computer users become more comfortable with search, date and time stamps that one can trust are a useful way to manipulate retrieved content.

The ever efficient USPTO published US7,529,739 on May 5, 2009, “Temporal Ranking Scheme for Desktop Searching”. The patent is interesting because of the method disclosed, particularly the predictive twist. The abstract stated:

A system for searching an object environment includes harvesting and indexing applications to create a search database and one or more indexes into the database. A scoring application determines the relevance of the objects, and a querying application locates objects in the database according to a search term. One or more of the indexes may be implemented by a hash table or other suitable data structure, where algorithms provide for adding objects to the indexes and searching for objects in the indexes. A ranking scheme sorts searchable items according to an estimate of the frequency that the items will be used in the future. Multiple indexes enable a combined prefix title and full-text content search of the database, accessible from a single search interface.

You can snag a copy at http://www.freepatentsonline.com or you can brave the syntax at the USPTO here.
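The ideas in the abstract can be illustrated with a toy. What follows is a minimal sketch, not the method claimed in the patent: the class and function names are hypothetical, the indexes are plain Python dictionaries standing in for the hash tables, and the "estimate of future use" is an assumed heuristic (past accesses weighted by exponential decay) that I picked for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Item:
    """A searchable desktop object (hypothetical structure)."""
    title: str
    content: str
    access_times: list = field(default_factory=list)  # timestamps, in days


def predicted_use(item, now, half_life_days=7.0):
    """Assumed heuristic: recent accesses count more than old ones."""
    return sum(0.5 ** ((now - t) / half_life_days) for t in item.access_times)


class DesktopIndex:
    """Two hash-based indexes queried from one search call."""

    def __init__(self):
        self.items = []
        self.prefix_index = defaultdict(set)  # title-word prefix -> item ids
        self.word_index = defaultdict(set)    # content word -> item ids

    def add(self, item):
        item_id = len(self.items)
        self.items.append(item)
        # Index every prefix of every title word for prefix title search.
        for word in item.title.lower().split():
            for end in range(1, len(word) + 1):
                self.prefix_index[word[:end]].add(item_id)
        # Index whole content words for full-text search.
        for word in item.content.lower().split():
            self.word_index[word].add(item_id)
        return item_id

    def search(self, term, now):
        term = term.lower()
        hits = self.prefix_index.get(term, set()) | self.word_index.get(term, set())
        # Sort by the estimated likelihood of future use, highest first.
        ranked = sorted(
            hits,
            key=lambda i: predicted_use(self.items[i], now),
            reverse=True,
        )
        return [self.items[i].title for i in ranked]


index = DesktopIndex()
index.add(Item("Budget 2008", "old spending plan", access_times=[1.0, 2.0]))
index.add(Item("Budget 2009", "current spending plan", access_times=[98.0, 99.0]))
index.add(Item("Notes", "budget meeting notes", access_times=[60.0]))
print(index.search("bud", now=100.0))
```

In this toy, the recently opened "Budget 2009" outranks the stale "Budget 2008" for the prefix query "bud", even though both titles match equally well. The ranking reflects a guess about what the user will want next, not how well the text matches.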

Stephen Arnold, May 7, 2009

Visualization: Interesting and Mostly Misleading

May 7, 2009

I fund the EVvie award for excellence in search system analysis. This year’s number two winner was Martin Baumgartel for his paper “Advanced Visualization of Search Results: More Risks, or More Chances?”. You can read the full text here and a brief interview with him here. You will want to read Mr. Baumgartel’s paper and then the Fast Company article called “Is Information Visualization the Next Frontier for Design?” here by Michael Cannell.

These two write ups point out the difference between a researcher’s approach to the role of visualization and a journalist’s approach to the subject. In a nutshell, visualization works in certain situations. In most information retrieval applications, visualization is not a benefit.

The Fast Company approach hits on the eye candy value of visualization. The title wisely references design. Visualization as illustration can enhance a page of text. Visualization as an aid to information analysis may not deliver the value desired.

Which side of the fence is right for your search system? Selective use of visualization or eye candy? The addled goose favors selectivity. Most visualizations of data distract me, but your response may be different. Functionality, not arts and crafts, appeals to the addled goose.

Stephen Arnold, May 7, 2009

Open Text Vignette: Canadian Omelet

May 6, 2009

A happy quack to the reader who alerted me to this big buck ($300 million plus) deal for Open Text to purchase the financially challenged content management vendor Vignette. You can read the Canadian press take on the announcement here. This story is hosted by Google, so it may disappear after a short time. I recall seeing a story by Matt Asay in August 2008 here that the two companies were talking. Well, the deal appears to be done. Open Text is a vendor with a collection of search systems, including Tim Bray’s SGML engine, Information Dimension’s BASIS, the mainframe centric BRS search, the Fulcrum system, and some built in query systems that came with acquisitions. Vignette, on the other hand, is a complex and expensive content platform. The company has some who love it and some like me who think that it is a pretty messy bowl of clam linguini. The question is, “What will Open Text do with Vignette?” Autonomy snagged Interwoven, snagged some up sell prospects, and fattened its eDiscovery calf. Open Text has systems that can manage content. Can Open Text manage the money losing Vignette? Autonomy in my opinion is pressuring Open Text. Open Text now has to manage the Vignette system and marshal its forces against the aggressive Autonomy. Joining me on the skepticism skateboard is ZDNet’s Stephen Powers. He wrote “Can Open Text Turn the Page on Vignette’s Recent History?” here. He wrote:

The other interesting question raised by this announcement: what to do about the Vignette brand? The press release states that Vignette will be run as a wholly-owned subsidiary. But will Open Text continue to invest in what some argue is a damaged brand? Or will they eventually go through a rebranding, as they did with their other ECM acquisitions, and retire the purple logo? Time will tell.

Mr. Powers is gentle. I think the complexity of the Vignette system, its money losing 2008, and the push back some of its licensees have directed at the company are serious problems. Does Open Text have the management skill and the resources to deal with integration, support, and product stability issues? Will Open Text customers grasp the difference between Open Text’s overlapping product lines?

My hunch is that this deal is viewed by Autonomy as an opportunity, not a barrier.

Stephen Arnold, May 6, 2009

Google vs Alpha: A Phantom Faceoff

May 6, 2009

Technology Review had an opportunity to put Google Public Data (not yet available) and Wolfram Alpha (not yet available) to the test. The expert crafting the queries and reporting the results of the phantom face off was David Talbot. You can read his multipart analysis here.

I found the screenshots interesting. The analysis, however, did not answer the questions that I had about the two services; for example:

  • How will these services identify sources with “credibility” scores? I have heard that Google calculates these types of values, and I assume that Wolfram Alpha will want to winnow the goose feathers from the giblets as I try to do in this human written Web log.
  • What is the scope of the data sets in these two “demo” or trial systems? I know first hand how difficult it is to get some data in their current form for on the fly analyses. There are often quite a few data sets in the wild. The question is which ones are normalized and included and which ones are not? Small data sets can lead to interesting consequences for “the decider”.
  • What is the freshness of the data; that is, how often are the indexes refreshed? Charts can be flashy but if the information is not fresh, the value can be affected.

Technology Review is trying to cover search and that’s good. Summarizing sample queries is interesting. Answering the questions that matter is much more difficult even for Technology Review.

Stephen Arnold, May 6, 2009

IBM and Data Mashups

May 6, 2009

Google Public Data and Wolfram Alpha. Dozens of business intelligence vendors like Business Objects and Clarabridge. Content processing systems like Connotate and Northern Light. And now IBM. These companies and IBM want to grab a piece of the data transformation, analysis, and mashup business. In the pre crash days, MBAs normalized data, figured out what these MBA brainiacs thought were valid relationships, and created snazzy charts and graphs. In the post crash era, smart software is supposed to be able to do this MBA-type work without the human MBAs. IBM, already the owner of Web Fountain and other data crunching tools, bought Exeros, a privately held maker of computer programs that help companies analyze data across corporate databases. You can read one take on the story here.

If you want more information about Exeros, explore these links:

  • The official news release here
  • The architecture for transformation and other methods here
  • Data validation block diagram here.

How does Exeros differ from what’s available from other vendors? Easy. Exeros has enterprise partners and customers plus some nifty technology.

What I find interesting is that IBM pumps big bucks into its labs, allows engineers to invent data transformation systems and methods, and then has to look outside for a ready-to-sell bundle of products and services. Does this suggest that IBM would get better return on its money by focusing on acquisitions, and scaling back its R&D?

Will this acquisition allow IBM to leap frog Google? Maybe, but I don’t think so. Google has had some of IBM Almaden’s wizards laboring in the Googleplex along with other “transformation” experts. Google is edging toward this enterprise opportunity with some exciting technology which I describe in Google: The Digital Gutenberg here. IBM thinks a market opportunity exists, and it is willing to invest to have a chance to increase its share.

Stephen Arnold, May 6, 2009

Microsoft and Search: Interface Makes Search Disappear

May 5, 2009

The Microsoft Enterprise Search Blog here published the second part of an NUI (natural user interface) essay. The article, when I reviewed it on May 4, had three comments. I found one comment as interesting as the main body of the write up. The author of the remark that caught my attention was Carl Lambrecht, Lexalytics, who commented:

The interface, and method of interaction, in searching for something which can be geographically represented could be quite different from searching for newspaper articles on a particular topic or looking up a phone number. As the user of a NUI, where is the starting point for your search? Should that differ depending on and be relevant to the ultimate object of your search? I think you make a very good point about not reverting to browser methods. That would be the easy way out and seem to defeat the point of having a fresh opportunity to consider a new user experience environment.

The Microsoft enterprise search Web log’s NUI series focuses on interface. The focus is Microsoft Surface, which allows a user to interact with information by touching and pointing. A keyboard is optional, I assume. The idea is that a person can walk up to a display and obtain information. A map of a shopping center is the example that came to my mind. I want to “see” where a store is, tap the screen, and get additional information.

This blog post referenced the Fast Forward 2009 conference and its themes. There’s a reference to EMC’s interest in the technology. The article wraps up with a statement that a different phrase may be needed to describe the NUI (natural user interface), which I mistakenly pronounced like the word ennui.

Microsoft Surface. Image Source: http://psyne.net/blog4/wp-content/uploads/2007/09/microsoftsurface.jpg

Several thoughts:

First, I think that interface is important, but the interface depends upon the underlying plumbing. A great interface sitting on top of lousy plumbing may not be able to deliver information quickly or in some cases present the information the user needed. I see this frequently when ad servers cannot deliver information. The user experience (UX) is degraded. I often give up and navigate elsewhere.
