Talk to Text: Problem. What Problem?
April 15, 2016
I marvel at the baloney I read about smart software. The most effective systems blend humans with sort of smart software. The interaction of the human with the artificial intelligence can speed some work processes. But right now, I am not sure that I want a smart software driven automobile to navigate near the bus on which I am riding. I don’t need smart automobile keys which don’t work when the temperature drops, do you? I am not keen on reading about the wonders of IBM Watson type systems when IBM struggles to generate revenue.
I read “Why Our Crazy-Smart AI Still Sucks at Transcribing Speech.” Frankly I was surprised by the candor about the difficulty software has in figuring out human speech. I highlighted this passage:
“If you have people transcribe conversational speech over the telephone, the error rate is around 4 percent,” says Xuedong Huang, a senior scientist at Microsoft, whose Project Oxford has provided a public API for budding voice recognition entrepreneurs to play with. “If you put all the systems together—IBM and Google and Microsoft and all the best combined—amazingly the error rate will be around 8 percent.” Huang also estimates commercially available systems are probably closer to 12 percent. “This is not as good as humans,” Huang admits, “but it’s the best the speech community can do. It’s about as twice as bad as humans.”
I suggest you read the article. My view is that speech recognition is just one area which requires more time, effort, research, and innovation.
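For readers who want to see what an "error rate" like the ones in the passage means in practice, here is a minimal sketch. It assumes the figures are word error rates (the usual transcription metric): the word-level edit distance between a human reference transcript and the machine output, divided by the number of reference words. The function name and the sample sentences are hypothetical, not taken from the article.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one wrong word out of six.
wer = word_error_rate("the cat sat on the mat", "the cat sat on a mat")
print(f"{wer:.1%}")
```

On this measure, a 25-word utterance with one wrong word scores 4 percent, two wrong words 8 percent, and three wrong words 12 percent. The metric counts words, not meaning, so a small error rate can still wreck a sentence and a large one can leave the gist intact.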
The situation today is that vendors struggle to prove their relevance and importance to investors, and many companies are struggling to generate sustainable revenue. In case anyone has not noticed, Microsoft’s smart system Tay was a source of humor and outrage. IBM Watson spends more on marketing the wonders of its Lucene, acquired technology, and home brew confection than many companies earn in a year.
There are folks who insist that speech to text is not that hard. It may not be hard, but this one tiny niche in the search and content processing sector seems to be lagging. Hyperbole, assurance, and marketing depict one reality. The software often delivers a different one.
Who is the leader? The write up points out:
…most transcription start-ups seem to be mainly licensing Google’s API and going from there.
Yep, the Alphabet Google thing.
Stephen E Arnold, April 15, 2016
Comments
One Response to “Talk to Text: Problem. What Problem?”
A bit misleading. Speech-to-text transcription is useful if you do dictation and want an exact representation of what you said. In a quiet consistent environment, it is very impressive for people who use it regularly and have developed some skill in dictating.
However, “conversational speech” is the type of speech we use with applications like Siri, where what we are looking for is usually an answer or action, not word-for-word transcription. We are looking for UNDERSTANDING, and the word error rate that Huang is quoting is not the same as the rate of performing the wrong action. Both humans and machines can get the gist of an utterance without fully understanding every word.