Talk to Text: Problem. What Problem?

April 15, 2016

I marvel at the baloney I read about smart software. The most effective systems blend humans with sort of smart software. The interaction of the human with the artificial intelligence can speed some work processes. But right now, I am not sure that I want a smart software driven automobile to navigate near the bus on which I am riding. I don’t need smart automobile keys which don’t work when the temperature drops, do you? I am not keen on reading about the wonders of IBM Watson type systems when IBM struggles to generate revenue.

I read “Why Our Crazy-Smart AI Still Sucks at Transcribing Speech.” Frankly, I was surprised by the candor about the difficulty software has in figuring out human speech. I highlighted this passage:

“If you have people transcribe conversational speech over the telephone, the error rate is around 4 percent,” says Xuedong Huang, a senior scientist at Microsoft, whose Project Oxford has provided a public API for budding voice recognition entrepreneurs to play with. “If you put all the systems together—IBM and Google and Microsoft and all the best combined—amazingly the error rate will be around 8 percent.” Huang also estimates commercially available systems are probably closer to 12 percent. “This is not as good as humans,” Huang admits, “but it’s the best the speech community can do. It’s about as twice as bad as humans.”
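
The percentages in the quote are presumably word error rates. Here is a minimal sketch of how such a number is usually computed, assuming the standard definition: the fewest word substitutions, deletions, and insertions needed to turn the system's transcript into the reference transcript, divided by the number of reference words. The function name and sample sentences are illustrative only; nothing here comes from the article or from any vendor's API.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical ten-word reference with one misheard word ("box" for "fox").
reference = "the quick brown fox jumps over the lazy dog today"
hypothesis = "the quick brown box jumps over the lazy dog today"
print(f"{word_error_rate(reference, hypothesis):.0%}")
```

By this measure, the 8 percent figure works out to about eight word-level errors per 100 words spoken, against about four for human transcribers.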

I suggest you read the article. My view is that speech recognition is just one area which requires more time, effort, research, and innovation.

The situation today is that as vendors struggle to prove their relevance and importance to investors, many companies are struggling to generate sustainable revenue. In case anyone has not noticed, Microsoft’s smart system Tay was a source of humor and outrage. IBM Watson spends more on marketing the wonders of its Lucene, acquired technology, and home brew confection than many companies earn in a year.

There are folks who insist that speech to text is not that hard. It may not be hard, but this one tiny niche in the search and content processing sector seems to be lagging. Hyperbole, assurance, and marketing depict one reality. The software often delivers a different one.

Who is the leader? The write up points out:

…most transcription start-ups seem to be mainly licensing Google’s API and going from there.

Yep, the Alphabet Google thing.

Stephen E Arnold, April 15, 2016

Written by Stephen E. Arnold · Filed Under News, Rich media

Comments

One Response to “Talk to Text: Problem. What Problem?”

William Meisel on April 21st, 2016 11:42 am

A bit misleading. Speech-to-text transcription is useful if you do dictation and want an exact representation of what you said. In a quiet consistent environment, it is very impressive for people who use it regularly and have developed some skill in dictating.
However, “conversational speech” is the type of speech we use with applications like Siri, where what we are looking for is usually an answer or action, not word-for-word transcription. We are looking for UNDERSTANDING, not word-for-word transcription, and the word error rate that Huang is quoting is not equivalent to performing the wrong action. Both humans and machines can get the gist of an utterance without fully understanding every word.


Stephen E. Arnold monitors search, content processing, text mining and related topics from his high-tech nerve center in rural Kentucky. He tries to winnow the goose feathers from the giblets. He works with colleagues worldwide to make this Web log useful to those who want to go "beyond search". Contact him at sa [at] arnoldit.com. His Web site with additional information about search is arnoldit.com.