Nstein Branches Out

December 11, 2008

Nstein Technologies is helping media companies like Scripps, Bonnier, and Time expand search taxonomies to return better search results. By customizing word relationships, Nstein uses semantics to categorize results in context. The goal is to increase user satisfaction: when readers get better results from their searches, they are more likely to return to the Web site. To support the idea, Nstein redesigned its entire site, incorporating a custom taxonomy to increase reader satisfaction. Their example: “Stuffing” was added to the taxonomy, and an association was made between “Dressing” and “Stuffing,” so no matter which keyword a reader chose, all relevant recipes would appear. Companies are also going further than custom taxonomies; they are adopting and expanding authority files (controlled lists of products, companies, locations, people, etc.). It all comes down to making search better.
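The Dressing/Stuffing example boils down to query expansion over a taxonomy association. Here is a minimal Python sketch of the idea, not Nstein’s implementation: the SYNONYM_RINGS table, the recipe index, and the expand_query and search functions are hypothetical, and the sketch assumes a flat set of equivalent terms rather than a full taxonomy with broader and narrower terms.

```python
# Minimal sketch of taxonomy-driven query expansion (hypothetical data and names).

# Each set groups terms the taxonomy treats as equivalent.
SYNONYM_RINGS = [
    {"stuffing", "dressing"},
]

# Hypothetical recipe index: title -> keywords tagged on the recipe.
RECIPES = {
    "Sage Bread Stuffing": {"stuffing", "thanksgiving"},
    "Cornbread Dressing": {"dressing", "southern"},
    "Cranberry Relish": {"cranberry", "thanksgiving"},
}


def expand_query(term: str) -> set[str]:
    """Return the term plus every term in any synonym ring that contains it."""
    term = term.lower()
    expanded = {term}
    for ring in SYNONYM_RINGS:
        if term in ring:
            expanded |= ring
    return expanded


def search(term: str) -> list[str]:
    """Return recipe titles tagged with the term or any of its synonyms."""
    terms = expand_query(term)
    return sorted(title for title, tags in RECIPES.items() if tags & terms)


if __name__ == "__main__":
    print(search("Dressing"))  # both the dressing and the stuffing recipe
    print(search("stuffing"))  # same two titles
```

Whichever keyword the reader types, the expanded term set matches both recipes, which is the behavior the example describes.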

Jessica Bratcher, December 11, 2008

Enterprise Translation Systems

December 10, 2008

Update, December 14, 2008: I came across Nice Translator at http://www.nicetranslator.com/

Original Post

I received an email from a colleague who wanted to know about translation systems. I fired back an answer, but I thought you might want to have my short list of vendors to peruse. If you run a search on Google for “enterprise translation software”, you get more than 400,000 hits. That’s not too useful. If you want to experiment with free translation services, download this file.

BASIS Technologies licenses its various translation components to a number of search and content processing vendors; for example, Fast Search & Transfer was a customer. BASIS has been a leader in providing machine translation of Arabic and related languages. The federal government has been a fan of BASIS’s systems. You can get some very specialized translation and language components, for example, a Japanese address analyzer.

Google provides a pretty good translation system. Right now, it is free, which is a plus. Some of the translation systems shoot into six figures pretty quickly if you pack on the language packs and custom tuning. You can use the Google system by navigating here: http://translate.google.com. You can fiddle around and automate translation, but I have heard that Google monitors its translation system, so if you push too much through the system, the Googlers follow up. You can feed it a line of text or a URL.

Language Weaver offers automated language translation. The company serves digital industries and enterprise customers directly and through strategic partnerships. You can hook this system into other enterprise software so employees can access documents in their native language. The company recently added new language pairs:

  • Bulgarian to/from English
  • Hebrew to/from English
  • Serbian to/from English
  • Thai to/from English
  • Turkish to/from English

Systran has been a player in translation for years. You have to buy Systran’s software. The desktop version works quite well. The enterprise system involves some fiddling, but you can automate the translation and perform some useful operations on the machine-generated files. You can get more information about Systran here. Systran is used for the Babel Fish online translation function in AltaVista.com and Yahoo.

How good are these systems?

None of the systems is perfect. None of the systems translates as well as a human with deep knowledge of the language pairs being translated. However, the speed of these systems and their “good enough” translations can cope with the volume of data flowing into an organization. I use several of these systems. I can get a sense of the document and then turn to a native speaker to clarify the translation.

I have unsubstantiated information that suggests Google has been making considerable progress with its online translation system. Because the system is available without charge, Google is becoming the default system. AltaVista.com still offers an online translation system, but Google has surpassed that system in speed and language pair support. When Google integrates its online translation system with its other enterprise services, I think Google will continue to chew away at the established vendors’ market share. The GOOG, however, seems happy to let customers find its online translation service on their own. The economic downturn may shift Google into higher gear.

Stephen Arnold, December 10, 2008

Stratify Adds Cloud Storage Services

December 9, 2008

On December 3, 2008, Stratify, a unit of Iron Mountain, announced new services for its thriving eDiscovery business. You can read the Stratify news release here. The core of the service is disaster recovery. Attorneys apparently need to make sure that the legions of attorneys who pore over electronic documents obtained as part of the discovery process can’t nuke the data. Stratify said:

To safeguard client eDiscovery data Stratify has invested in and deployed a fully replicated production datacenter with more than 250 terabytes of storage, 200 servers and redundant 100MB Internet access, coupled with highly trained personnel and security procedures.

Stratify (which once did business as Purple Yogi) now wears a blue suit and polished shoes; no more sneakers. IDC’s Sue Feldman weighs in with the observation that the new service “raises the bar” for the companies competing for eDiscovery accounts.

Stratify’s news release added:

Stratify can restore access to client matters within four hours after a potential disaster, recover 100 percent of processed and loaded documents and system metadata, and lose no more than 59 minutes worth of review work product.

In my opinion, the eDiscovery sector is undergoing rapid change. The need for end-to-end solutions and bulletproof systems means that specialist vendors may be forced to add sophisticated new features in order to compete. The problem is that eDiscovery vendors are selling to corporations. With the technology and market changing, well-funded organizations with a strong client list may have an advantage. Stratify said that it had more than 250 matters underway at this time.

eDiscovery, like business intelligence, is becoming a magnet for search and content processing companies who want to find a way to pump up revenues.

Stephen Arnold, December 9, 2008

Overflight Enhancements

December 9, 2008

ArnoldIT.com’s Google monitoring service has made some changes over the last few days. You can access the service by clicking here. Overflight Google lets you look at the most recent Web log posts on more than 70 Google Web logs. The first change is the addition of a link that says, “Show Overflight Update Stream”. When you click it, we display the additions to the Google Web logs and put the date on each item. The Update Stream function has been added for each of the Google Web log clusters. If you want to scan headlines, you can browse the most recent items for each of the Google Web logs.
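An update stream of this kind is, at heart, a merge of many per-blog feeds into one dated, newest-first list. The Python sketch below shows that merge under stated assumptions: the Post record, the update_stream function, and the sample blog names are hypothetical, and nothing here reflects how Overflight actually fetches or stores posts.

```python
from dataclasses import dataclass
from datetime import date
from heapq import merge
from itertools import islice


@dataclass(frozen=True)
class Post:
    """Hypothetical record for one Web log post."""
    blog: str
    title: str
    published: date


def update_stream(feeds: dict[str, list[Post]], limit: int = 10) -> list[str]:
    """Merge per-blog post lists into one newest-first stream of dated lines."""
    # heapq.merge with reverse=True expects each input sorted newest first.
    sorted_feeds = [
        sorted(posts, key=lambda p: p.published, reverse=True)
        for posts in feeds.values()
    ]
    merged = merge(*sorted_feeds, key=lambda p: p.published, reverse=True)
    # Put the date on each item, as the Update Stream does, and cap the length.
    return [
        f"{post.published.isoformat()} [{post.blog}] {post.title}"
        for post in islice(merged, limit)
    ]
```

A k-way merge keeps the work proportional to the number of items actually shown, which matters more as the number of monitored Web logs grows past 70.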

The other enhancement is the addition of entity extraction to the Exalead search system’s index of the corpus of Google Web logs. I am not too happy with the phrase “vertical search”, but I must admit, the Exalead index of more than 70 Web logs is a sharply focused vertical search engine. Here’s a screen shot of the Exalead entity extraction. You can use it to learn the name of the Google customer at Genentech and to uncover other interesting facts about the GOOG.

[Screen shot: Exalead entity extraction]

A happy quack to the Exalead team. More enhancements are coming. If you would like an Overflight service on your Web site, write seaky2000 at Yahoo dot com.

Stephen Arnold, December 9, 2008

Arnold White Study Published

December 8, 2008

Galatea has published Successful Enterprise Search Management by Stephen E. Arnold and Martin White. The authors are widely known for their research and consulting in search and information management. An interview with Martin White is here.

The study addresses the management side of search in information-dense environments, where ineffective information access can make the difference between an organization meeting its goals and actually going out of business. Managers spend up to two hours a day searching for information, and more than 50% of the information they obtain has no value to them.

To support its advice, the book outlines case studies and references to specific vendors’ systems while offering practical guidance on how to better manage key elements of enterprise search including planning, preparation, implementation, and adaptation. Specific topics addressed include text mining and advanced content processing, information governance, and the challenges language itself presents.

“This book will be of value to any organization seeking to get the best out of its current search implementation, considering whether to upgrade the implementation or starting the process of specifying and selecting enterprise search software,” co-author Martin White said.

A detailed summary of the contents of the 130-page report is available on the Galatea Web site here. You can order a copy, which costs about US$200, here. A number of the longer essays in the Beyond Search Web log consist of information excised from the final report.

Stephen Arnold, December 8, 2008

Autonomy Firmware Technologies Deal

December 8, 2008

On December 3, 2008, Autonomy said that it had inked a deal with Firmware Technologies, an Australian company. Firmware has an OEM deal for Autonomy’s IDOL (Intelligent Data Operating Layer) and will use IDOL for search and content processing in its vistime product. According to the company’s Web site, the enterprise version of vistime delivers “virtual meetings”. In addition:

With the Enterprise Edition you receive a customized, integrated enterprise solution that you can use for a variety of purposes: virtual meetings can replace traveling and accelerate decision-making processes, support becomes more efficient, Internet-based informational events reach large numbers of participants and target groups like journalists, customers, employees, and partners.

Autonomy has more than 350 OEM customers, according to Stouffer Egan, CEO of Autonomy, Inc. A number of search and content processing vendors are pursuing OEM deals. The idea is to make search available to users, or what I call a “just there” implementation.

Stephen Arnold, December 10, 2008

Yahoo Jumping Ahead of Google

December 7, 2008

On December 7, 2008, PCWorld reported that Yahoo will offer abstracts, not laundry lists of search results. The news story I saw appeared in the Yahoo technology news service. You can read “Yahoo Technology Will Offer Abstracts of Search Results” here. If the link goes dead, try the PCWorld site itself here. When I saw the story, the search engine on the PCWorld site couldn’t locate the story. Nothing new there, of course.

The key point in the unsigned article was that Yahoo’s Bangalore research facility has figured out how to abstract key information on the page. The idea is that when a user searches for “hotel”, the system would provide an address, map, and other information. I described a similar function in my description of Google’s dossier function. See US20070198481. According to the news story, Yahoo will roll out this service in 2009.

My thought is that these types of smart services work really well when described on paper. The challenge for these “reports” or “answer” type systems is that language can be tricky. Google’s approach relies on “context”, a system and method disclosed in the February 2007 patent documents filed by Google’s Ramanathan Guha. My hunch is that Yahoo went public because of the rumors that Google was starting to use some of its niftier technology in certain public-facing services. The Googler with whom I interacted in London knew zero about the dossier function. Maybe Yahoo is trying to jump ahead of Google. We’ll see. I think Yahoo needs to address the shortcomings of its core search service first.

Stephen Arnold, December 7, 2008

Leximancer Polecat: Polecatting Text Analytics

December 6, 2008

Polecat (www.polecatting.com) offers a reputation analysis solution. The company has inked a deal with Leximancer, an Australian text analytics company. Leximancer’s system allows customer satisfaction, brand management, and competitive intelligence professionals to automatically extract the root causes of customer attitudes from Internet communications such as blogs, Web sites, and social media, as well as e-mails, service notes, call center notes, voice transcripts, and survey feedback. Polecat’s MeaningMine draws strategic marketing insight from external and internal data sources, including intranets, blogs, customer feedback, audio and video, and analyst reports. The combined platform will derive actionable customer insight from unstructured data and provide key insights for customer service, brand management, and customer intelligence professionals. The deal, a revenue-sharing agreement, allows Polecat to integrate with Leximancer’s Web service interface, operating the Leximancer platform seamlessly in the Polecat SaaS-based solution. Polecat will market the enhanced solution under its own brand.

Stephen Arnold, December 6, 2008

ISYS Search Software CEO Interview

December 1, 2008

Scott Coles has joined ISYS Search Software as the firm’s chief executive officer. Ian Davies, founder, remains the chairman of the company. Among Mr. Coles’s tasks will be to lead the firm’s new strategic direction characterized by an expanded presence in Europe and Asia, specialized vertical-market offerings, a broader channel sales strategy, and a deeper set of embedded search solutions for original equipment manufacturers and independent software vendors.

Coles joins ISYS with a significant background in the commercialization of innovation for multinational corporations, holding senior executive roles with companies such as EDS, Lucent Technologies, and Avaya. In the mid-1990s, Coles was the driving force behind the establishment and success of AT&T Bell Labs in Australia.

In his interview with ArnoldIT.com’s Search Wizards Speak, Coles provided information about the company’s focus in 2009.

On this topic, he said:

We are seeing significant increase in other software vendors coming to us to license our engine for incorporation into their products. This marks a general industry trend that I believe will increase significantly in the coming year. A number of applications today that previously had either none or only rudimentary search are finding that their products can be significantly enhanced with a sophisticated search engine. The amount of data that these applications have to deal with is now becoming so large that some form of pre-processing to narrow down to that which is relevant is becoming essential.

Mr. Coles also noted that Microsoft SharePoint continues to capture market share in content management and collaboration. However, the SharePoint user needs access to a range of content and:

ISYS can search all data, both inside and outside of SharePoint. In addition, ISYS provides high quality relevant results through features such as Boolean search operators, multi-dimensional clustering, and many others for which SharePoint users have expressed a desire that are currently not available in the native SharePoint product…we’ve taken great care to ensure our new “intelligent content analysis” methods are reliable, predictable and easily understood by the end user. These include parametric search and navigation, visual timeline refinement bars, intelligence clouds, de-duplication and intelligent query expansion. We’ve even added additional post-query processing to help streamline the e-discovery process. The end result is a core set of new capabilities that help our customers better cull and refine efficiently, without cutting corners on accuracy or relevance.

You can read the full text of the interview with Scott Coles at http://www.arnoldit.com/search-wizards-speak or click here.

Concept Searching

November 30, 2008

Concept Searching Inc. offers a suite of horizontal search and classification products with the goal of delivering critical precision and recall. The company is moving beyond keyword identification and traditional taxonomy approaches. As the company’s tagline “Retrieval Just Got Smarter” suggests, the products use compound term processing to manage unstructured content. The concept extraction improves access to unstructured information so companies can better leverage their data. What’s useful is that the cross-platform products (a search program, a classifier, a taxonomy manager, and SQL) are fully integrated with Microsoft SharePoint. There’s no need for a separate index, and the suite respects existing SharePoint security settings. Features can be integrated or delivered in pieces, and system access is administered using standard SharePoint administrative tools. Some of these functions were among the most popular in the SurfRay Ontolica product, which is now long in the tooth. Perhaps Concept Searching will benefit from what seems to be a growing demand for SharePoint tools.

Jessica Bratcher, November 30, 2008
