Nstein and Taxonomy Improvement

December 14, 2008

Nstein Technologies is helping media companies like Scripps, Bonnier, and Time expand search taxonomies to return better search results. By customizing word relationships, Nstein uses semantics to categorize results in context. The goal is to increase user satisfaction: customers who get better search results are more likely to return to the Web site. To support the idea, Nstein redesigned its entire site, incorporating a custom taxonomy to increase reader satisfaction. Their example: “Stuffing” was added to the taxonomy – and an association was made between “Dressing” and “Stuffing,” so no matter which keyword a reader chose, all relevant recipes would appear. Companies also are going further than custom taxonomies – they are adopting and expanding authority files (controlled lists of products, companies, locations, people, etc.).
It all comes down to making search better.
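
The “Dressing” and “Stuffing” association can be pictured with a small sketch. This is only an illustration of the general idea of expanding a query through taxonomy associations, not Nstein's implementation; the class name, the function name, and the sample recipe titles are all made up for the example.

```python
class Taxonomy:
    """Toy taxonomy: associated terms are treated as interchangeable in search.
    Hypothetical illustration only, not Nstein's actual system."""

    def __init__(self):
        self._related = {}  # term -> set of associated terms

    def associate(self, term_a, term_b):
        """Record a two-way association between two terms."""
        a, b = term_a.lower(), term_b.lower()
        self._related.setdefault(a, set()).add(b)
        self._related.setdefault(b, set()).add(a)

    def expand(self, term):
        """Return the term plus every term directly associated with it."""
        t = term.lower()
        return {t} | self._related.get(t, set())


def search(recipes, taxonomy, keyword):
    """Return recipe titles that mention the keyword or an associated term."""
    terms = taxonomy.expand(keyword)
    return [title for title in recipes
            if any(term in title.lower() for term in terms)]


# Made-up sample data for the illustration.
recipes = ["Cornbread Dressing", "Sausage Stuffing", "Pumpkin Pie"]

taxonomy = Taxonomy()
taxonomy.associate("Dressing", "Stuffing")

stuffing_hits = search(recipes, taxonomy, "stuffing")
dressing_hits = search(recipes, taxonomy, "Dressing")
```

With the association in place, both keywords return the same two recipes. A plain substring match like this would also pull in “salad dressing,” which is where categorizing results in context comes in.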

Jessica Bratcher, December 14, 2008

Autonomy: The Next Big Thing

December 14, 2008

I enjoy the hate mail I get when I write about Autonomy’s news announcements. Some of my three or four readers think that I write these items for Autonomy. Wrong. I am reporting information that my trusty newsreader delivers to me. Here’s a gem that will get the anti-Autonomy crowd revved on a Sunday morning. The article appeared on SmartBrief.com as news. The headline was an attention grabber: “Autonomy at the Cutting Edge of New Multi-Trillion dollar Sector According to Head of Gartner Research.” You can read it here. The url is one of those wacky jobs that can fail to resolve. The core of the story for me is that Gartner has identified a “multi trillion dollar sector.” That has to be good news to those who pay Gartner to make forecasts about markets. Search and content processing has been chugging along in the $1.3 to $3.0 billion range if one ignores the aberration that is Google. I find it hard to believe that Gartner’s financial forecasts can be spot on, but who knows? In case you want to know what a trillion is, it is one followed by a dozen zeros. The Gartner fellow with the sharp and optimistic pencil is identified as Peter Sondergaard, Senior Vice President, Gartner Research. The source, according to the news release, is an interview with an outfit called Business Spectator. I wonder if a few extra zeros were added as Mr. Sondergaard’s pronouncement was recorded? So, what’s this forecast have to do with Autonomy? Autonomy said in its input to SmartBrief:

Autonomy Corporation plc , a global leader in infrastructure software for the enterprise, today announced that its vision of searching and analyzing structured and unstructured data has now been validated as the next big thing in business IT. According to an interview with Business Spectator, Peter Sondergaard, Senior Vice President, Gartner Research, predicts that the next quantum leap in productivity will come from the use of IT systems that analyze structured and unstructured data. Sondergaard says that Autonomy is at the cutting edge of the new search technology, a sector in the IT industry that will ultimately earn multi trillion dollar revenues.

The story appeared on PRNewswire and on one of the Thomson Reuters’ services. With economies tanking, I am delighted to know that the sector in which I work is slated to become a multi trillion dollar business. I hope I live long enough. Since laughter is a medicine that extends one’s life, I look forward to more Gartner forecasts and to Autonomy’s riding the crest of this predicted market boom.

Stephen Arnold, December 15, 2008

Expert System’s COGITO Answers

December 12, 2008

Expert System has launched COGITO Answers, which streamlines search and provides customer assistance on web sites, e-mail and mobile interfaces such as cell phones and PDAs while creating a company knowledge base. The platform allows users to search across multiple resources with a handy twist: it uses semantic analysis to absorb and understand a customer’s lingo, analyzing the meaning of the text to process search results rather than just matching keywords. It interprets word usage in context. The program also tracks customer interactions and stores all requests so the company can anticipate client needs and questions, thus cutting down response time and increasing accuracy. You can get more information by e-mailing answers@expertsystem.net.

Jessica Bratcher, December 12, 2008

Nstein Branches Out

December 11, 2008

Nstein Technologies is helping media companies like Scripps, Bonnier, and Time expand search taxonomies to return better search results. By customizing word relationships, Nstein uses semantics to categorize results in context. The goal is to increase user satisfaction: customers who get better search results are more likely to return to the web site. To support the idea, Nstein redesigned its entire site, incorporating a custom taxonomy to increase reader satisfaction. Their example: “Stuffing” was added to the taxonomy – and an association was made between “Dressing” and “Stuffing,” so no matter which keyword a reader chose, all relevant recipes would appear. Companies also are going further than custom taxonomies – they are adopting and expanding authority files (controlled lists of products, companies, locations, people, etc.). It all comes down to making search better.

Jessica Bratcher, December 11, 2008

Enterprise Translation Systems

December 10, 2008

Update: December 14, 2008 I came across Nice Translator at http://www.nicetranslator.com/

Original Post

I received an email from a colleague who wanted to know about translation systems. I fired back an answer, but I thought you might want to have my short list of vendors to peruse. If you run a search on Google for “enterprise translation software”, you get more than 400,000 hits. That’s not too useful. If you want to experiment with free translation services, download this file.

BASIS Technologies licenses its various translation components to a number of search and content processing vendors; for example, Fast Search & Transfer was a customer. BASIS has been a leader in providing machine translation of Arabic and related languages. The Federal government has been a fan of BASIS’s systems. You can get some very specialized translation and language components; for example, a Japanese address analyzer.

Google provides a pretty good translation system. Right now, it is free, which is a plus. Some of the translation systems shoot into six figures pretty quickly if you pack on the language packs and custom tuning. You can use the Google system by navigating here: http://translate.google.com. You can fiddle around and automate translation, but I have heard that Google monitors its translation system, so if you push too much through the system, the Googlers follow up. You can feed it a line of text or a url.

Language Weaver offers automated language translation. The company serves digital industries and enterprise customers directly and through strategic partnerships. You can hook this system into other enterprise software. Employees can access documents in their native language. The company recently added new language pairs:

  • Bulgarian to/from English
  • Hebrew to/from English
  • Serbian to/from English
  • Thai to/from English
  • Turkish to/from English.

Systran has been a player in translation for years. You have to buy Systran’s software. The desktop version works quite well. The enterprise system involves some fiddling, but you can automate the translation and perform some useful operations on the machine-generated files. You can get more information about Systran here. Systran is used for the Babel Fish online translation function in AltaVista.com and Yahoo.

How good are these systems?

None of the systems is perfect. None of the systems translates as well as a human with deep knowledge of the language pairs being translated. However, the speed of these systems and their “good enough” translations can cope with the volume of data flowing into an organization. I use several of these systems. I can get a sense of the document and then turn to a native speaker to clarify the translation.

I have unsubstantiated information that suggests Google has been making considerable progress with their online translation system. Because the system is available without charge, Google is becoming the default system. AltaVista.com still offers an online translation system, but Google has surpassed that system in speed and language pair support. When Google integrates its online translation system with its other enterprise services, I think Google will continue to chew away at the established vendors’ market share. The GOOG, however, seems happy to let customers find their online translation service. The economic downturn may shift the Google into higher gear.

Stephen Arnold, December 10, 2008

Stratify Adds Cloud Storage Services

December 9, 2008

On December 3, 2008, Stratify–a unit of Iron Mountain–announced new services for its thriving eDiscovery business. You can read the Stratify news release here. The core of the service is disaster recovery. Attorneys apparently have a need to make sure that the legions of attorneys who pore through electronic documents obtained as part of the discovery process can’t nuke the data. Stratify said:

To safeguard client eDiscovery data Stratify has invested in and deployed a fully replicated production datacenter with more than 250 terabytes of storage, 200 servers and redundant 100MB Internet access, coupled with highly trained personnel and security procedures.

Stratify (which once did business as Purple Yogi) now wears a blue suit and polished shoes, no sneakers now. IDC’s Sue Feldman weighs in with an observation that the new service “raises the bar” for the companies competing for eDiscovery accounts.

Stratify’s news release added:

Stratify can restore access to client matters within four hours after a potential disaster, recover 100 percent of processed and loaded documents and system metadata, and lose no more than 59 minutes worth of review work product.

In my opinion, the eDiscovery sector is undergoing rapid change. The need for end-to-end solutions and bulletproof systems means that specialist vendors may be forced to add sophisticated new features in order to compete. The problem is that eDiscovery systems are selling to corporations. With the technology and market changing, well funded organizations with a strong client list may have an advantage. Stratify said that it had more than 250 matters underway at this time.

eDiscovery, like business intelligence, is becoming a magnet for search and content processing companies who want to find a way to pump up revenues.

Stephen Arnold, December 9, 2008

Overflight Enhancements

December 9, 2008

ArnoldIT.com’s Google monitoring service made some changes over the last few days. You can access the service by clicking here. Overflight Google allows you to look at the most recent Web log posts on more than 70 Google Web logs. The change is the addition of a link that says, “Show Overflight Update Stream”. When you click it, we display the additions to Google Web logs and put the date on each item. The Update Stream function has been added for each of the Google Web log clusters. If you want to scan headlines, you can browse the most recent items for each of the Google Web logs.

The other enhancement is the addition of entity extraction to the Exalead search system’s index of the corpus of Google Web logs. I am not too happy with the phrase “vertical search”, but I must admit, the Exalead index of more than 70 Web logs is a sharply focused vertical search engine. Here’s a screen shot of the Exalead entity extraction. You can use it to learn the name of the Google customer at Genentech and similar interesting ways to learn about the GOOG.

[Screen shot: Exalead entity extraction]

A happy quack to the Exalead team. More enhancements are coming. If you would like an Overflight service on your Web site, write seaky2000 at Yahoo dot com.

Stephen Arnold, December 9, 2008

Arnold White Study Published

December 8, 2008

Galatea has published Successful Enterprise Search Management by Stephen E. Arnold and Martin White. The authors are widely known for their research and consulting in search and information management. An interview with Martin White is here.

The study approaches the management aspect of search in information-dense environments: Ineffective information access can make the difference between an organization meeting its goals and actually going out of business. Managers spend up to two hours a day searching for information, and more than 50% of the information they obtain has no value to them.

To support its advice, the book outlines case studies and references to specific vendors’ systems while offering practical guidance on how to better manage key elements of enterprise search including planning, preparation, implementation, and adaptation. Specific topics addressed include text mining and advanced content processing, information governance, and the challenges language itself presents.

“This book will be of value to any organization seeking to get the best out of its current search implementation, considering whether to upgrade the implementation or starting the process of specifying and selecting enterprise search software,” co-author Martin White said.

A detailed summary of the contents of the 130 page report is available on the Galatea Web site here. You can order a copy, which costs about US$200, here. A number of the longer essays in the Beyond Search Web log consist of information excised from the final report.

Stephen Arnold, December 8, 2008

Autonomy Firmware Technologies Deal

December 8, 2008

On December 3, 2008, Autonomy said that it had inked a deal with Firmware Technologies, an Australian company. Firmware has an OEM deal for Autonomy’s IDOL (Intelligent Data Operating Layer). Firmware will use IDOL for search and content processing in Firmware’s vistime product. According to the company’s Web site, the enterprise version of vistime delivers “virtual meetings”. In addition,

With the Enterprise Edition you receive a customized, integrated enterprise solution that you can use for a variety of purposes: virtual meetings can replace traveling and accelerate decision-making processes, support becomes more efficient, Internet-based informational events reach large numbers of participants and target groups like journalists, customers, employees, and partners.

Autonomy has more than 350 OEM customers, according to Stouffer Egan, CEO of Autonomy, Inc. A number of search and content processing vendors are pursuing OEM deals. The idea is to make search available to users, or what I call a “just there” implementation.

Stephen Arnold, December 10, 2008

Yahoo Jumping Ahead of Google

December 7, 2008

On December 7, 2008, PCWorld reported that Yahoo will offer abstracts, not laundry lists of search results. The news story I saw appeared in the Yahoo technology news service. You can read “Yahoo Technology Will Offer Abstracts of Search Results” here. If the link goes dead, try the PCWorld site itself here. When I saw the story, the search engine on the PCWorld site couldn’t locate the story. Nothing new there, of course. The key point in the unsigned article was that Yahoo’s Bangalore research facility has figured out how to abstract key information on the page. The idea is that when a user searches for “hotel”, the system would provide an address, map, and other information. I described a similar function in my description of Google’s dossier function. See US20070198481. According to the news story, Yahoo will roll out this service in 2009. My thought is that these types of smart services work really well when described on paper. The value of these “reports” or “answer” type systems is that language can be tricky. Google’s approach relies on “context”, a system and method disclosed in the February 2007 patent documents filed by Google’s Ramanathan Guha. My hunch is that Yahoo went public because of the rumors that Google was starting to use some of its niftier technology in certain public facing services. The Googler with whom I had interaction in London knew zero about the dossier function. Maybe Yahoo is trying to jump ahead of Google. We’ll see. I think Yahoo needs to address the shortcomings of its core search service first.

Stephen Arnold, December 7, 2008
