Exclusive Interview with Rosoka NLP Developer, Mike Sorah
November 21, 2012
I continue to learn about companies with high-value content processing technologies. The challenge in real-time translation, if one believes the Google marketing, is now in “game over” mode. The winner, of course, is Google. Other firms can head to the showers and maybe think about competing in another business sector.
But some of that Google confidence may be based on assumptions about Google’s language processing expertise, not more recent systems and methods. I know. This is “burn at the stake” information to a Googler.
However, I saw a demonstration which made clear to me that Google’s “kitchen sink” approach to figuring out how to handle speech input and near real time translation may not be in step with other firms’ approaches. The company with some quite interesting translation technology and a commitment to easy integration is IMT Holdings. The privately held company’s product is Rosoka.
IMT Holdings, Corp. was founded in 2007. The firm’s background is in US government contracting. In the course of the firm’s work, Mr. Sorah saw that the existing natural language processing (NLP) tools were not able to handle the volumes and complexities of the data they needed to process. In December of 2011, IMT began actively marketing its NLP technology.
After some telephone tag and email, I was able to interview Mike Sorah, one of the co-founders of IMT and one of the wizards behind the Rosoka technology.
Mr. Sorah told me:
Many of the existing NLP tools claim to be multilingual, but what they mean is that they have linguistic knowledge bases usually acquired from vendors who provide dictionaries and libraries that make NLP an issue for many licensees. But most of the NLP systems don’t process documents that contain English and Chinese or English and Spanish. In the world of our clients, mixed language documents are important. These have to be processed as part of the normal stream, not put in an exception folder and maybe never processed or processed after a delay of hours or days.
The Rosoka system is different from other NLP and translation systems on the market at this time. Mr. Sorah asserted:
In most multilingual NLP systems, the customer needs to know before they process the document what language the document is in so they can load the appropriate language-specific knowledge base. What we did via our proprietary Rosoka algorithms was to take a multilingual look at the world. Our system automatically understands that a document may be in English or Chinese, or even English and Spanish mixed. The language angle is huge. We randomly sample the Twitter stream and have been tweeting what the top 10 languages of the week are. English varies between 35 and 45% of the tweets. Every language that Rosoka can process is included. Our multilingual support is not sold as separate, add-on functionality.
You can read the full text of the interview with Mike Sorah in the ArnoldIT.com Search Wizards Speak series at this link. More information about IMT and Rosoka is available from the firm’s Web site, http://www.imtholdings.com.
Stephen E. Arnold, November 21, 2012