Deep Web Technologies: Cracking Multilingual Search
January 30, 2012
The rapid development of Web-based technologies over the last decade has created a unique opportunity to bring together the world’s scientists by making it easy for them to share research information. As scientific information shifts from US-centric, English-language sources toward material published in many other languages, researchers find that proficiency in one or two foreign languages is no longer enough.
The Multilingual Challenge
Multilingual search increases the value of research output by making it available to a wider audience. Seamless federation and automated translation make research from China, Japan, Russia, and other countries that publish prolifically in science available to researchers who may not read those languages. Multilingual search also greatly broadens the scope of patent research. For English speakers, multilingual federated search offers exposure to the diverse perspectives of researchers in other countries.
For example, China’s research output is now growing far faster than that of the rest of the world. In 2006 China’s research and development output surpassed that of Japan, the UK, and Germany. At this pace, China will overtake the USA in a few years. But non-US innovation is not confined to Asia and Europe: Brazil’s share of research output is also growing rapidly.
Sample system output from WorldWideScience.org, powered by Deep Web Technologies’ multilingual federating system.
Deep Web Technologies (DWT) is one of the leaders in federated search. Federation takes a user’s query and uses it to obtain search results from other indexes and search-and-retrieval systems. Deep Web Technologies’ Explorit product, for example, handles this process and returns a blended set of results to the user. Federation spares the user from framing a separate query for each of Google, Medline, USA.gov, and the NASA website: the user frames one query, sends it to Explorit, and sees a single, relevance-ranked results list.
DWT has moved beyond single-language federation and grown to become the leader in federated search of the deep web. The result is the launch of its ground-breaking, patent-pending multilingual federated search capability in June 2011.
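To make the federation step concrete, here is a minimal Python sketch of the general technique, not Explorit’s actual implementation (which the post does not describe). It assumes each source is a function returning (title, raw score) pairs on its own scale, and it uses min-max normalization, one common way to make scores from different engines comparable before merging them into one list. The source names and sample data are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

@dataclass
class Result:
    source: str
    title: str
    score: float  # normalized relevance in [0, 1]

# Hypothetical source interface: takes a query, returns (title, raw_score)
# pairs, where raw_score is on that source's own, incomparable scale.
Source = Callable[[str], List[Tuple[str, float]]]

def normalize(name: str, hits: List[Tuple[str, float]]) -> List[Result]:
    """Min-max normalize one source's raw scores to [0, 1]."""
    if not hits:
        return []
    lo = min(s for _, s in hits)
    hi = max(s for _, s in hits)
    span = hi - lo
    # If every hit has the same raw score, treat them all as top matches.
    return [Result(name, t, (s - lo) / span if span else 1.0) for t, s in hits]

def federated_search(query: str, sources: Dict[str, Source]) -> List[Result]:
    """Send one query to every source and return one relevance-ranked list."""
    merged: List[Result] = []
    for name, search in sources.items():
        merged.extend(normalize(name, search(query)))
    # Best score first; ties broken by title so the order is deterministic.
    return sorted(merged, key=lambda r: (-r.score, r.title))

# Usage with two stub sources that score on different scales.
sources = {
    "A": lambda q: [("Solar cells review", 12.0), ("Wind power", 4.0)],
    "B": lambda q: [("Photovoltaic efficiency", 9.0), ("Battery storage", 6.0),
                    ("Grid policy", 3.0)],
}
for r in federated_search("renewable energy", sources):
    print(f"{r.score:.2f}  [{r.source}] {r.title}")
```

Min-max normalization is simple but crude: each source’s best hit always scores 1.0, even when it is a weak match. Production federators typically weigh several signals, so treat this only as an illustration of the merge step.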
“We now live in a much more interconnected world where information is available in a variety of languages,” noted Abe Lederman, President and CTO of Deep Web Technologies. “Major advances in machine translation have made it possible for DWT to develop a revolutionary new Explorit product that breaks down language barriers and advances scientific collaboration and business productivity.”
According to the Federated Search blog post "Researchers Can Now Search Text, Audio, and Video Images in Multiple Languages," the multilingual federated search rolled out in 2011 is now available to any DWT customer who requires seamless access to foreign-language documents. We learned:
Multilingual federated search, unveiled June 11, 2010 in Helsinki at the International Council for Scientific and Technical Information’s Summer Conference and originally only available as a beta release to users of the WorldWideScience.org gateway to global science, is now available to all Deep Web Technologies customers who require seamless access to foreign language documents. The system’s multilingual search capability translates a user’s search query into the native languages of the collections being searched, aggregates and ranks these results according to relevance, and translates result titles and snippets back to the user’s original language. The multilingual translation functionality, powered by Microsoft, makes it simple to search collections in multiple languages from a single search box in the user’s native language.
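The pipeline described in the quote (translate the query, search each collection in its native language, rank the combined results, translate titles back) can be sketched as follows. This is a hypothetical illustration: `translate` is a toy phrase lookup standing in for a real machine-translation service (the post says DWT’s translation is powered by Microsoft, whose API is not used here), the collections are stubs, and the ranking step assumes scores are already on a common scale. Snippets would be translated back the same way as titles.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Toy phrase table standing in for a real machine-translation service.
# This is NOT Microsoft's translation API; it is a hypothetical placeholder.
PHRASES: Dict[Tuple[str, str], Dict[str, str]] = {
    ("en", "de"): {"solar energy": "Solarenergie"},
    ("de", "en"): {"Solarenergie heute": "Solar energy today"},
    ("en", "ja"): {"solar energy": "太陽エネルギー"},
    ("ja", "en"): {"太陽エネルギー研究": "Solar energy research"},
}

def translate(text: str, src: str, dst: str) -> str:
    """Hypothetical translator: exact-phrase lookup, else return text unchanged."""
    if src == dst:
        return text
    return PHRASES.get((src, dst), {}).get(text, text)

@dataclass
class Collection:
    name: str
    lang: str  # language the collection's documents are written in
    # Searches in the collection's own language; returns (title, score) pairs.
    search: Callable[[str], List[Tuple[str, float]]]

def multilingual_search(query: str, user_lang: str,
                        collections: List[Collection]) -> List[Tuple[float, str, str]]:
    hits = []
    for c in collections:
        # 1. Translate the user's query into the collection's native language.
        native_query = translate(query, user_lang, c.lang)
        for title, score in c.search(native_query):
            hits.append((score, c.name, c.lang, title))
    # 2. Aggregate and rank (assumes scores are already on a common scale).
    hits.sort(key=lambda h: -h[0])
    # 3. Translate result titles back into the user's language.
    return [(score, name, translate(title, lang, user_lang))
            for score, name, lang, title in hits]

# Usage with two hypothetical stub collections.
collections = [
    Collection("German collection", "de",
               lambda q: [("Solarenergie heute", 0.7)] if q == "Solarenergie" else []),
    Collection("Japanese collection", "ja",
               lambda q: [("太陽エネルギー研究", 0.9)] if q == "太陽エネルギー" else []),
]
for score, name, title in multilingual_search("solar energy", "en", collections):
    print(score, name, title)
```

The English query reaches each stub only after translation, and the user sees both results as English titles, ranked together.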
WorldWideScience.org is a global gateway to international science databases and portals. Its content comes from national governments or is vetted by them. The system was developed and is maintained by the US Department of Energy’s Office of Scientific and Technical Information (OSTI). The DWT-powered system provides one-stop search and includes database content from China, Japan, Korea, Germany, and other non-English-speaking countries.
WorldWideScience.org Case Example
WorldWideScience.org searches, in real time, more than 80 collections of scientific and technical information from more than 70 countries around the world. Twenty of these collections are non-English. Introduced this past June, a multimedia federated search capability seamlessly integrates audio, video, and image content sources into the same search. WorldWideScience.org searches seven multimedia sources: CDC Podcasts, CERN Multimedia, MedlinePlus, NASA, NSF, NBII LIFE, and ScienceCinema.
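Searching 80-plus collections in real time means waiting on many remote systems at once. The sketch below shows one common approach, assuming each collection can be wrapped as a Python function: query all sources concurrently and return whatever arrives before a deadline. The post does not say how WorldWideScience.org handles slow or failing sources; the `search_all` helper, the timeout value, and the sample sources are all hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait
from typing import Callable, Dict, List

SearchFn = Callable[[str], List[str]]  # hypothetical: query -> result titles

def search_all(query: str, sources: Dict[str, SearchFn],
               timeout: float) -> Dict[str, List[str]]:
    """Query every source concurrently and keep only answers that arrive in time.

    Sources that are too slow, or that raise an error, are left out instead of
    holding up the whole result list.
    """
    pool = ThreadPoolExecutor(max_workers=max(1, len(sources)))
    futures = {pool.submit(fn, query): name for name, fn in sources.items()}
    done, _ = wait(futures, timeout=timeout)
    # Return without waiting for stragglers; their threads finish on their own.
    pool.shutdown(wait=False)
    return {futures[f]: f.result() for f in done if f.exception() is None}

# Usage: one fast source, one slow source, one broken source (all hypothetical).
def fast(q: str) -> List[str]:
    return [f"{q}: fast result"]

def slow(q: str) -> List[str]:
    time.sleep(1.0)  # simulates a collection that misses the deadline
    return [f"{q}: slow result"]

def broken(q: str) -> List[str]:
    raise ConnectionError("source unavailable")

print(search_all("fusion", {"fast": fast, "slow": slow, "broken": broken}, timeout=0.3))
```

With these stubs, only the fast source’s results should come back: the slow one misses the 0.3-second deadline and the broken one raises an error. The executor is shut down with `wait=False` rather than used in a `with` block, because leaving a `with` block waits for every submitted call, which would let the slow source defeat the timeout.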
Applications for Business
Scientific and technical researchers are at the forefront of multilingual federated search, but businesses are increasingly likely to need systems that can retrieve information from disparate sources with content in a range of languages.
Applications range from customer support to business intelligence. Social content, such as information disseminated via Facebook and Twitter, is no longer limited to English, and text mining systems that ignore language nuances may deliver off-target results. As a result, applications of multilingual federated search include:
- Sales and marketing
- Business intelligence
- Enterprise resource management
- Customer relationship management
- Product and supplier management
By providing a single point of access for searching hundreds of sources, multilingual federated search lets researchers issue one search request through a simple, Google-like interface and get thousands of results sorted by semantically related concepts. Deep Web Technologies’ investment in technologies that let researchers and the public search global science in their native languages is advancing science through innovative search technology.
My view is that while many information retrieval vendors talk about federation, only a small number deliver functional federation. Add in the multilingual requirement, and there is a single vendor to consider: Deep Web Technologies.
Stephen E Arnold, January 30, 2012
Sponsored by Pandia.com