Search and Predictive Math

May 9, 2009

Short honk: Curious about how predictive math will affect search and retrieval? Check out “How Your Search Queries Can Predict the Future” here. Queries are useful in interesting ways.

Stephen Arnold, May 9, 2009

Google Disclosed Time Function for Desktop Search

May 7, 2009

Time is important to Google. Date tags on documents are useful. As computer users become more comfortable with search, date and time stamps that one can trust are a useful way to manipulate retrieved content.

The ever efficient USPTO published US7,529,739 on May 5, 2009, “Temporal Ranking Scheme for Desktop Searching”. The patent is interesting because of the method disclosed, particularly the predictive twist. The abstract stated:

A system for searching an object environment includes harvesting and indexing applications to create a search database and one or more indexes into the database. A scoring application determines the relevance of the objects, and a querying application locates objects in the database according to a search term. One or more of the indexes may be implemented by a hash table or other suitable data structure, where algorithms provide for adding objects to the indexes and searching for objects in the indexes. A ranking scheme sorts searchable items according to an estimate of the frequency that the items will be used in the future. Multiple indexes enable a combined prefix title and full-text content search of the database, accessible from a single search interface.
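The phrase “estimate of the frequency that the items will be used in the future” is the interesting bit. I cannot see inside Google’s code, so the snippet below is only my own sketch of what that kind of temporal scoring might look like: each item’s score blends ordinary text relevance with a recency- and frequency-weighted guess at how likely the item is to be opened again. The function names and the half-life parameter are my inventions, not anything disclosed in the patent.

```python
import math
import time

def temporal_score(access_times, now=None, half_life_days=30.0):
    # Estimate how likely an item is to be used again: each past access
    # contributes a weight that decays exponentially with age, so recent,
    # frequent use produces a higher score.
    now = now if now is not None else time.time()
    decay = math.log(2) / (half_life_days * 86400.0)  # per-second decay rate
    return sum(math.exp(-decay * (now - t)) for t in access_times)

def rank_items(candidates, text_scores, access_log, alpha=0.5):
    # Blend conventional text relevance with the temporal estimate.
    # candidates: list of item ids; text_scores: id -> relevance score;
    # access_log: id -> list of past access timestamps in seconds.
    blended = {
        item: alpha * text_scores.get(item, 0.0)
        + (1.0 - alpha) * temporal_score(access_log.get(item, []))
        for item in candidates
    }
    return sorted(candidates, key=lambda item: blended[item], reverse=True)
```

In this toy scheme, a document opened yesterday and again this morning outranks one with identical text relevance that was last touched six months ago, which is the flavor of behavior the abstract describes.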

You can snag a copy at http://www.freepatentsonline.com or you can brave the syntax at the USPTO here.

Stephen Arnold, May 7, 2009

Visualization: Interesting and Mostly Misleading

May 7, 2009

I fund the Evvie award for excellence in search system analysis. This year’s number two winner was Martin Baumgartel for his paper “Advanced Visualization of Search Results: More Risks, or More Chances?”. You can read the full text here and a brief interview with him here. You will want to read Mr. Baumgartel’s paper and then the Fast Company article by Michael Cannell called “Is Information Visualization the Next Frontier for Design?” here.

These two write ups point out the difference between a researcher’s approach to the role of visualization and a journalist’s approach to the subject. In a nutshell, visualization works in certain situations. In most information retrieval applications, visualization is not a benefit.

The Fast Company approach hits on the eye candy value of visualization. The title wisely references design. Visualization as illustration can enhance a page of text. Visualization as an aid to information analysis may not deliver the value desired.

Which side of the fence is right for your search system? Selective use of visualization or eye candy? The addled goose favors selectivity. Most visualizations of data distract me, but your response may be different. Functionality, not arts and crafts, appeals to the addled goose.

Stephen Arnold, May 7, 2009

Open Text Vignette: Canadian Omelet

May 6, 2009

A happy quack to the reader who alerted me to this big bucks ($300 million plus) deal for Open Text to purchase the financially challenged content management vendor Vignette. You can read the Canadian press take on the announcement here. This story is hosted by Google, so it may disappear after a short time. I recall seeing a story by Matt Asay in August 2008 here that the two companies were talking. Well, the deal appears to be done. Open Text is a vendor with a collection of search systems, including Tim Bray’s SGML engine, Information Dimensions’ BASIS, the mainframe-centric BRS search, the Fulcrum system, and some built-in query systems that came with acquisitions. Vignette, on the other hand, is a complex and expensive content platform. The company has some who love it and some like me who think that it is a pretty messy bowl of clam linguini. The question is, “What will Open Text do with Vignette?” Autonomy snagged Interwoven, picked up some upsell prospects, and fattened its eDiscovery calf. Open Text has systems that can manage content. Can Open Text manage the money-losing Vignette? Autonomy in my opinion is pressuring Open Text. Open Text now has to manage the Vignette system and marshal its forces against the aggressive Autonomy. Joining me on the skepticism skateboard is ZDNet’s Stephen Powers, who wrote “Can Open Text Turn the Page on Vignette’s Recent History?” here. He wrote:

The other interesting question raised by this announcement: what to do about the Vignette brand? The press release states that Vignette will be run as a wholly-owned subsidiary. But will Open Text continue to invest in what some argue is a damaged brand? Or will they eventually go through a rebranding, as they did with their other ECM acquisitions, and retire the purple logo? Time will tell.

Mr. Powers is gentle. I think the complexity of the Vignette system, its money-losing 2008, and the push back some of its licensees have directed at the company all work against a smooth takeover. Does Open Text have the management skill and the resources to deal with integration, support, and product stability issues? Will Open Text customers grasp the difference between Open Text’s overlapping product lines?

My hunch is that this deal is viewed by Autonomy as an opportunity, not a barrier.

Stephen Arnold, May 6, 2009

Google vs Alpha: A Phantom Faceoff

May 6, 2009

Technology Review had an opportunity to put Google Public Data (not yet available) and Wolfram Alpha (not yet available) to the test. The expert crafting the queries and reporting the results of the phantom face off was David Talbot. You can read his multipart analysis here.

I found the screenshots interesting. The analysis, however, did not answer the questions that I had about the two services; for example:

  • How will these services identify sources with “credibility” scores? I have heard that Google calculates these types of values, and I assume that Wolfram Alpha will want to winnow the goose feathers from the giblets as I try to do in this human-written Web log.
  • What is the scope of the data sets in these two “demo” or trial systems? I know firsthand how difficult it is to get some data in usable form for on-the-fly analyses. There are often quite a few data sets in the wild. The question is: which ones are normalized and included, and which ones are not? Small data sets can lead to interesting consequences for “the decider”.
  • What is the freshness of the data; that is, how often are the indexes refreshed? Charts can be flashy, but if the information is not fresh, their value suffers.

Technology Review is trying to cover search, and that’s good. Summarizing sample queries is interesting. Answering the questions that matter is much more difficult, even for Technology Review.

Stephen Arnold, May 6, 2009

IBM and Data Mashups

May 6, 2009

Google Public Data and Wolfram Alpha. Dozens of business intelligence vendors like Business Objects and Clarabridge. Content processing systems like Connotate and Northern Light. And now IBM. These companies, IBM now among them, want to grab a piece of the data transformation, analysis, and mashup business. In the pre-crash days, MBAs normalized data, figured out what these MBA brainiacs thought were valid relationships, and created snazzy charts and graphs. In the post-crash era, smart software is supposed to be able to do this MBA-type work without the human MBAs. IBM, already owner of Web Fountain and other data crunching tools, bought Exeros, a privately held maker of computer programs that help companies analyze data across corporate databases. You can read one take on the story here.
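Exeros keeps its algorithms to itself, so the following is nothing more than a hypothetical sketch of the general category of cross-database relationship discovery, not the product: a tiny routine that flags column pairs from two corporate tables whose values overlap enough to suggest a join key. The table names and threshold are made up for illustration.

```python
def value_overlap(col_a, col_b):
    # Jaccard overlap between the distinct values of two columns.
    a, b = set(col_a), set(col_b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

def candidate_joins(table_a, table_b, threshold=0.5):
    # Suggest column pairs that look like join keys between two tables.
    # Each table is a dict mapping column name -> list of values.
    matches = []
    for name_a, col_a in table_a.items():
        for name_b, col_b in table_b.items():
            score = value_overlap(col_a, col_b)
            if score >= threshold:
                matches.append((name_a, name_b, round(score, 2)))
    return sorted(matches, key=lambda m: m[2], reverse=True)

# Example: customer records living in two separate corporate databases.
crm = {"cust_id": [101, 102, 103, 104], "region": ["east", "west", "east", "south"]}
billing = {"account": [102, 103, 104, 105], "balance": [40, 0, 12, 7]}
print(candidate_joins(crm, billing))  # [('cust_id', 'account', 0.6)]
```

Real tools layer value normalization, data type checks, and profiling on top of this idea, but the core task of finding relationships across databases without an MBA in the loop looks roughly like this.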

If you want more information about Exeros, explore these links:

  • The official news release here
  • The architecture for transformation and other methods here
  • Data validation block diagram here.

How does Exeros differ from what’s available from other vendors? Easy. Exeros has enterprise partners and customers plus some nifty technology.

What I find interesting is that IBM pumps big bucks into its labs, allows engineers to invent data transformation systems and methods, and then has to look outside for a ready-to-sell bundle of products and services. Does this suggest that IBM would get a better return on its money by focusing on acquisitions and scaling back its R&D?

Will this acquisition allow IBM to leapfrog Google? Maybe, but I don’t think so. Google has had some of IBM’s Almaden wizards laboring in the Googleplex along with other “transformation” experts. Google is edging toward this enterprise opportunity with some exciting technology, which I describe in Google: The Digital Gutenberg here. IBM thinks a market opportunity exists, and it is willing to invest to have a chance to increase its share.

Stephen Arnold, May 6, 2009

Microsoft and Search: Interface Makes Search Disappear

May 5, 2009

The Microsoft Enterprise Search Blog here published the second part of an NUI (natural user interface) essay. The article, when I reviewed it on May 4, had three comments. I found one comment as interesting as the main body of the write up. The author of the remark that caught my attention was Carl Lambrecht, Lexalytics, who commented:

The interface, and method of interaction, in searching for something which can be geographically represented could be quite different from searching for newspaper articles on a particular topic or looking up a phone number. As the user of a NUI, where is the starting point for your search? Should that differ depending on and be relevant to the ultimate object of your search? I think you make a very good point about not reverting to browser methods. That would be the easy way out and seem to defeat the point of having a fresh opportunity to consider a new user experience environment.

The Microsoft enterprise search Web log’s NUI series focuses on the interface, specifically Microsoft Surface, which allows a user to interact with information by touching and pointing. A keyboard is optional, I assume. The idea is that a person can walk up to a display and obtain information. A map of a shopping center is the example that came to my mind. I want to “see” where a store is, tap the screen, and get additional information.

This blog post referenced the Fast Forward 2009 conference and its themes. There’s a reference to EMC’s interest in the technology. The article wraps up with a statement that a different phrase may be needed to describe the NUI (natural user interface), which I mistakenly pronounced like the word ennui.

Microsoft Surface. Image source: http://psyne.net/blog4/wp-content/uploads/2007/09/microsoftsurface.jpg

Several thoughts:

First, I think that interface is important, but the interface depends upon the underlying plumbing. A great interface sitting on top of lousy plumbing may not be able to deliver information quickly or, in some cases, present the information the user needs. I see this frequently when ad servers cannot deliver information. The user experience (UX) is degraded. I often give up and navigate elsewhere.


Evvie 2009 Winners: David Evans and Martin Baumgartel

May 4, 2009

Stephen E. Arnold of ArnoldIT.com, http://www.arnoldit.com, announced the Evvie “best paper award” for 2009 at Infonortics’ Boston Search Engine Meeting on April 28.

The 2009 Evvie Award went to Dr. David Evans of Just Systems Evans Research for “E-Discovery: A Signature Challenge for Search.” The paper explains the principal goals and challenges of E-Discovery techniques. The second place award went to Martin Baumgärtel of bioRASI for “Advanced Visualization of Search Results: More Risks or More Chances?”, which addressed the gap between breakthroughs in visualization and actual application of techniques.


Stephen Arnold (left) is pictured with Dr. David Evans, Just Systems Evans Research, on the right.

The Evvie is given in honor of Ev Brenner, one of the leaders in online information systems and functions. The award was established after Brenner’s death in 2006. Brenner had served on the program committee for the Boston Search Engine Meeting since its inception almost 20 years ago. Everett Brenner is generally regarded as one of the “fathers” of commercial online databases. He worked for the American Petroleum Institute and served as a mentor to many of the innovators who built commercial online.


Martin Baumgartel (left) and Dr. David Evans discuss their recognition at the 2009 Boston Search Engine Meeting.

Mr. Brenner had two characteristics that made his participation a signature feature of each year’s program: He was willing to tell a speaker or paper author to “add more content,” and after a presentation, he would ask a presenter one or more penetrating questions that helped make a complex subject more clear.

The Boston Search Engine Meeting attracts search professionals, search vendors, and experts interested in content processing, text analysis, and search and retrieval. The meeting is held each year in Boston, and Ev, as he was known to his friends, demanded excellence in its presentations about information processing.

Sponsored by Stephen E. Arnold (ArnoldIT.com), this award goes to the speaker who best exemplifies Ev’s standards of excellence. The selection committee consists of the program committee, assisted by Harry Collier (conference operator) and Stephen E. Arnold.

This year’s judges were Jill O’Neill, NFAIS, Sue Feldman, IDC Content Technologies Group, and Anne Girard, Infonortics Ltd.

Mr. Arnold said, “This award is one way for us to respect his contributions and support his lifelong commitment to excellence.”

The recipients receive a cash prize and an engraved plaque. Information about the conference is available on the Infonortics, Ltd. Web site at www.infonortics.com and here. More information about the award is here. Information about ArnoldIT.com is here.

The Beeb and Alpha

April 30, 2009

I am delighted that the BBC, the once non-commercial entity, has a new horse to ride. I must admit that when I think of the UK and a horse to ride, my mind echoes with the sound of Ms. Sperling saying, “Into the valley of death rode the 600”. The story here carries a title worthy of the Google-phobic Guardian newspaper: “Web Tool As Important as Google.” The subject is the Wolfram Alpha information system, which is “the brainchild of British-born physicist Stephen Wolfram”.

Wolfram Alpha is a new content processing and information system that uses a “computational knowledge engine”. There are quite a few new search and information processing systems. In fact, I mentioned two of these in recent Web log posts: NetBase here and Veratect here.

Can Wolfram Alpha or another search start up Taser the Google?

My reading of the BBC story includes a hint that Wolfram Alpha may have a bit of “fluff” sticking to its ones and zeros. Nevertheless, I sensed a bit of glee that Google is likely to face a challenge from a math-centric system.

Now let’s step back:

First, I have no doubt that the Wolfram Alpha system will deliver useful results. Not only does Dr. Wolfram have impeccable credentials, he is letting math do the heavy lifting. The problem with most NLP and semantic systems is that humans are usually needed to figure out certain things regarding “meaning” of and in information. Like Google, Dr. Wolfram lets the software machines grind away.

Second, in order to pull off an upset of Google, Wolfram Alpha will need some ramp up momentum. Think of the search system as a big airplane. The commercial version of the big airplane has to be built, made reliable, and then supported. Once that’s done, the beast has to taxi down a big runway, build up speed, and then get aloft. Once aloft, the airplane must operate and then get back to ground for fuel, upgrades, etc. The Wolfram Alpha system is in its early stages.

Third, Google poses a practical problem to Wolfram Alpha and to Microsoft, Yahoo, and the others in the public search space. Google keeps doing new things. In fact, Google doesn’t have to do big things. Incremental changes are fine. Cumulatively these increase Google’s lead or its “magnetism”, if you will. So competitors are going to have to find a way to leapfrog Google. I don’t think any of the present systems have the legs for this jump, including Wolfram Alpha, because it is not yet a commercial-grade offering. When it is, I will reassess my present view. What competitors are doing is repositioning themselves away from Google. Instead of getting sand kicked in their faces on the beach, the competitors are swimming in the pool at the country club. Specialization makes it easier to avoid Googzilla’s hot breath.

To wrap up, I hope Wolfram Alpha goes commercial quickly. I want to have access to its functions and features. Before that happens, I think that the Beeb and other publishing outfits will be rooting for the next big thing in the hopes that one of these wizards can Taser the Google. For now, the Tasers are running on a partial charge. The GOOG does not feel them.

Stephen Arnold, May 1, 2009

NetBase and Content Intelligence

April 30, 2009

Vertical search is alive and well. Technology Review described NetBase’s Content Intelligence here. The story, written by Erica Naone, was “A Smarter Search for What Ails You”. Ms. Naone wrote:

… organizes searchable content by analyzing sentence structure in a novel way. The company created a demonstration of the platform that searches through health-related information. When a user enters the name of a disease, he or she is most interested in common causes, symptoms, and treatments, and in finding doctors who specialize in treating it, says Netbase CEO and cofounder Jonathan Spier. So the company’s new software doesn’t simply return a list of documents that reference the disease, as most search engines would. Instead, it presents the user with answers to common questions. For example, it shows a list of treatments and excerpts from documents that discuss those treatments. The Content Intelligence platform is not intended as a stand-alone search engine, Spier explains. Instead, Netbase hopes to sell it to companies that want to enhance the quality of their results.

NetBase (formerly Accelovation) has developed a natural language processing system. Ms. Naone reported:

NetBase’s software focuses on recognizing phrases that describe the connections between important words. For example, when the system looks for treatments, it might search for phrases such as “reduce the risk of” instead of the name of a particular drug. Tellefson notes that this isn’t a matter of simply listing instances of this phrase, rather catching phrases with an equivalent meaning. Netbase’s system uses these phrases to understand the relationship between parts of the sentence.
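NetBase has not published its parser, so the goose offers only a toy illustration of the idea in that passage, not the company’s method: look for relationship phrases such as “reduces the risk of” and use them to pull out the things they connect. The phrase list and function name below are my own fabrications.

```python
import re

# Hypothetical relationship phrases that signal a treatment relation; a real
# system would also catch paraphrases with an equivalent meaning.
TREATMENT_PATTERNS = [
    r"(?P<treatment>[\w\s-]+?)\s+(?:reduces?|lowers?)\s+the\s+risk\s+of\s+(?P<condition>[\w\s-]+)",
    r"(?P<treatment>[\w\s-]+?)\s+is\s+used\s+to\s+treat\s+(?P<condition>[\w\s-]+)",
]

def extract_treatment_relations(sentence):
    # Return (treatment, condition) pairs matched by the phrase patterns.
    relations = []
    for pattern in TREATMENT_PATTERNS:
        for match in re.finditer(pattern, sentence, flags=re.IGNORECASE):
            relations.append(
                (match.group("treatment").strip(), match.group("condition").strip())
            )
    return relations

print(extract_treatment_relations("Daily aspirin reduces the risk of heart attack."))
# [('Daily aspirin', 'heart attack')]
```

The hard part, which this toy ignores, is recognizing the many paraphrases that carry the same meaning and doing so at the scale of billions of pages.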

At this point in the write up, I heard echoes of other vendors with NLP, semantics, bound phrase identification, etc. Elsevier has embraced the system for its illumin8 service. You can obtain more information about this Elsevier service here. Illumin8 asked me, “What if you could become an expert in any topic in a few minutes?” Wow!

The NetBase explanation of content intelligence is:

… understanding the actual “meaning” of sentences independent of custom lexicons. It is designed to handle myriads of syntactical sentence structures – even ungrammatical ones – and convert them to logical form. Content Intelligence creates structured semantic indexes from massive volumes of content (billions of web-pages and documents) used to power question-and-answer type of search experiences.

NetBase asserts:

Because NetBase doesn’t rely on custom taxonomies, manual annotations or coding, the solutions are fully automated, massively scalable and able to be rolled-out in weeks with a minimal amount of effort. NetBase’s semantic index is easy to keep up-to-date since no human editing or updates to controlled vocabulary are needed to capture and index new information – even when it includes new technical terms.

Let me offer several observations:

  • The application of NLP to content is not new and it imposes some computational burdens on the search system. To minimize those loads, NLP is often constrained to content that contains a restricted terminology; for example, medicine, engineering, etc. Even with a narrow focus, NLP remains interesting.
  • “Loose” NLP can squirm around some of the brute force challenges, but it is not yet clear if NLP methods are ready for center stage. Sophisticated content processing often works best out of sight, delivering to the user delightful, useful ways to obtain needed information.
  • A number of NLP systems are available today; for example, Hakia. Microsoft snapped up Powerset. One can argue that some of the Inxight technology, acquired first by Business Objects and then by the software giant SAP, qualifies as NLP. To my knowledge, none of these has scored a hat trick in revenue, customer uptake, and high volume content processing.

You can get more information about NetBase here. You can find demonstrations and screenshots. A good place to start is here. According to TechCrunch:

NetBase has been around for a while. Originally called Accelovation, it has raised $9 million in two rounds of venture funding over the past four years, has 30 employees…

In my files, I had noted that the funding sources included Altos Ventures and ThomVest, but these data may be stale or just plain wrong. I don’t have enough information about NetBase to offer substantive comments. NLP requires significant computing horsepower. I need to know more about the plumbing. Technology Review provided the sizzle. Now we need to know about the cow from which the prime rib comes.

Stephen Arnold, April 30, 2009
