Talk to Text: Problem. What Problem?
April 15, 2016
I marvel at the baloney I read about smart software. The most effective systems blend humans with sort of smart software. The interaction of the human with the artificial intelligence can speed some work processes. But right now, I am not sure that I want a smart software driven automobile to navigate near the bus on which I am riding. I don’t need smart automobile keys which don’t work when the temperature drops, do you? I am not keen on reading about the wonders of IBM Watson type systems when IBM struggles to generate revenue.
I read “Why Our Crazy-Smart AI Still Sucks at Transcribing Speech.” Frankly I was surprised with the candor about the difficulty software has in figuring out human speech. I highlighted this passage:
“If you have people transcribe conversational speech over the telephone, the error rate is around 4 percent,” says Xuedong Huang, a senior scientist at Microsoft, whose Project Oxford has provided a public API for budding voice recognition entrepreneurs to play with. “If you put all the systems together—IBM and Google and Microsoft and all the best combined—amazingly the error rate will be around 8 percent.” Huang also estimates commercially available systems are probably closer to 12 percent. “This is not as good as humans,” Huang admits, “but it’s the best the speech community can do. It’s about as twice as bad as humans.”
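The percentages in the quote are easier to weigh with a concrete measure in hand. The article does not name the metric, so what follows is a minimal sketch that assumes the figures are word error rates: the number of word substitutions, deletions, and insertions needed to turn the machine transcript into the human reference, divided by the number of reference words. The function name and sample sentences are illustrative only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: one wrong word in a seven-word reference.
reference = "please call me back after lunch today"
hypothesis = "please call me back after launch today"
print(f"{word_error_rate(reference, hypothesis):.1%}")
```

Under this assumption, a 4 percent rate is roughly one wrong word in 25, and 12 percent is roughly three.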
I suggest you read the article. My view is that speech recognition is just one area which requires more time, effort, research, and innovation.
The situation today is that as vendors struggle to prove their relevance and importance to investors, many companies are struggling to generate sustainable revenue. In case anyone has not noticed, Microsoft’s smart system Tay was a source of humor and outrage. IBM Watson spends more on marketing the wonders of its Lucene, acquired technology, and home brew confection than many companies earn in a year.
There are folks who insist that speech to text is not that hard. It may not be hard, but this one tiny niche in the search and content processing sector seems to be lagging. Hyperbole, assurance, and marketing depict one reality. The software often delivers a different one.
Who is the leader? The write up points out:
…most transcription start-ups seem to be mainly licensing Google’s API and going from there.
Yep, the Alphabet Google thing.
Stephen E Arnold, April 15, 2016
Video: The Top Info Dog
April 4, 2016
I love the capitalist tool. The founder rode a motorcycle. When I was in Manhattan, I had the pleasure of listening to the Malcolm-cycle burble and grunt when talking with a couple of pals. Wonderful that noise and odor.
I read “The Content Pyramid: And Why Video Must Be at the Top.” I am not sure the founder of the capitalist tool was into video. Well, the capitalist tool, in an article with a parental “must,” makes this point:
Video is the Matryoshka doll of content.
I did not know that. I know that some folks who shoot videos write scripts, sell them and then other people (who know better than the author) rewrite them.
The write up points out that a video has a script. But the video has pictures and audio.
I need to take a couple of deep breaths. My heart is racing with the impact of these comments.
I learned:
As more and more content consumption goes mobile, it’s usually a necessity to create multiple lengths and optimized formats of video content, so you should always have tiered, multi-channel thinking built in to your editorial process.
So how much video does the capitalist tool have on YouTube? 4,900 videos. But that’s not too many. I ran the query “Forbes” on Google Video and learned that there are 16,900 videos available. I checked Vimeo and learned there were 521 videos. I checked Blinkx and found quite a few false drops.
The problem is that I have never seen a reference to a Forbes video. I do receive mail addressed to my deceased father enjoining him to re-subscribe to the print edition of Forbes Magazine. But the video thing with the podcast, the clips, and the use of video in marketing. Not on my radar.
Remember the “must.” How about adding the concept of “effective”?
Video by itself is a bit of an ego play in my opinion. When no one watches the video or knows a video exists, what’s the point? Right, right. I forget. Some ad agencies love to do video shoots in Half Moon Bay. It is fun. How bright the video shines depends on more than the height of the pyramid in my opinion.
Stephen E Arnold, April 4, 2016
Watson Weakly: Another Game. This Time I Spy. Huh?
March 28, 2016
I survived the Go games. In case you have been on an extended vacation, Google’s smart software beat a human at the game of Go. I assume that this smart software did not drive the car which ran into a bus, but that’s another issue.
I then noted “IBM Watson Could Soon Use Artificial Intelligence to Beat You at a Game of I Spy.” I love the use of the word “could.” I prefer supposition to reality. Contrast the satisfaction of “I could go to the gym” with “I am eating potato chips.” Which does IBM prefer? If you answered, “Generate substantial revenue”, you are incorrect.
The write up in question reports that IBM has “updated” Watson. I noted this statement about the updated Watson:
IBM has created a ‘Visual Recognition Demo’ to showcase Watson’s latest trick, which allows users to feed Watson an image before it tells you what it believes it sees. For example, supplying Watson with the image of a tiger throws up the result 77 per cent tiger, 26 per cent wild cat and 63 per cent cat.
In my experience, when one is determining if an animal is a real live and possibly hungry tiger, that error could be darned interesting. On my last trip to Africa, I learned that a hapless trekker discovered that confusing “cat” with “tiger” can have interesting consequences.
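The three quoted scores add up to 166 per cent, so they cannot be slices of a single probability. One reading, and it is only an assumption about how the demo works, is that each label is scored on its own and a cutoff decides what gets reported. A minimal sketch of that reading, with a hypothetical threshold of 0.5:

```python
# Scores as quoted in the write up, expressed as fractions.
scores = {"tiger": 0.77, "wild cat": 0.26, "cat": 0.63}


def labels_above(label_scores, threshold):
    """Return the labels whose score meets the threshold, highest score first."""
    kept = [(label, s) for label, s in label_scores.items() if s >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [label for label, _ in kept]


total = sum(scores.values())
print(f"sum of scores: {total:.2f}")  # more than 1.0, so not one distribution
print(labels_above(scores, 0.5))      # hypothetical cutoff, not IBM's
```

With that cutoff both “tiger” and “cat” survive, which is the ambiguity the hapless trekker would care about.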
Sigh. IBM appears to be making news out of some image processing capabilities which I have seen in action before. How long “before”? Think more years than IBM has been reporting declining revenues. Watson, what can one do about that? Hello, Watson. Are you there?
Stephen E Arnold, March 28, 2016
DeepGram: Audio Search in Lectures and Podcasts
March 23, 2016
I read “DeepGram Lets You Search through Lectures and Podcasts for Your Favorite Quotes.” I don’t think the system is available at this time. The article states:
Search engines make it easy to look through text files for specific words, but finding phrases and keywords in audio and video recordings could be a hassle. Fortunately, California-based startup DeepGram is working on a tool that will make this process simpler.
The hint is the “is working.” Not surprisingly, the system is infused with artificial intelligence. The process is to convert speech to text and then index the result.
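The convert-then-index idea can be sketched in a few lines. This is not DeepGram’s method, which the article does not describe. It is a generic inverted index over transcript segments, and the segments, timestamps, and function names are hypothetical. The speech recognition step is assumed to have already produced the text.

```python
import re
from collections import defaultdict

# Hypothetical recognizer output: (start time in seconds, recognized text).
SEGMENTS = [
    (0.0, "welcome to the lecture on search"),
    (12.5, "speech recognition converts audio to text"),
    (31.0, "then the text is indexed like any document"),
]


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_index(segments):
    """Map each word to the start times of the segments that contain it."""
    index = defaultdict(set)
    for start, text in segments:
        for word in tokenize(text):
            index[word].add(start)
    return index


def search(index, query):
    """Return sorted start times of segments containing every query word."""
    words = tokenize(query)
    if not words:
        return []
    hits = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(hits)


index = build_index(SEGMENTS)
print(search(index, "speech text"))  # jump points into the recording
```

The sketch matches words, not exact phrases, and any recognition error lands in the index as a wrong word. The index itself is cheap. The processing load sits in the recognition step that feeds it.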
Exalead had an interesting system seven or eight years ago. I am not sure what happened to that demonstration. My recollection is that the challenge is to have sufficient processing power to handle the volume of audio and video content available for indexing.
When an outfit like Google is not able to pull off a comprehensive search system for its audio and video content, my hunch is that the task for a robust volume of content might be a challenge.
But if there is sufficient money, engineering talent, and processing power, perhaps I will no longer have to watch serial videos and listen to lousy audio to figure out what some folks are trying to communicate in their presentations.
Stephen E Arnold, March 23, 2016
Multimedia Data Mining
February 3, 2016
I read “Knowledge Discovery using Various Multimedia Data Mining Technique.” The write up is an Encyclopedia Britannica type summary of the components required to make sense of audio and video.
I noted this passage:
In this paper, we addressed data mining for multimedia data such as text, image, video and audio. In particular, we have reviewed and analyzed the multimedia data mining process with different tasks. This paper also described the clustering models using video for multimedia mining.
The methods used by the systems the author considered rely on the same numerical recipes which most search vendors know, love, and rely upon while ignoring the known biases of the methods: regression, time series, etc.
My take away is that talk about making sense of the flood of rich media is a heck of a lot easier than processing the video uploaded to Facebook and YouTube in a single hour.
The write up does not mention companies working in this farm yard. There are some nifty case studies to reference as well; for example, Exalead’s video search and my touchstone, Google YouTube and Google Video Search. Blinkx (spun out of Autonomy, a semi famous search outfit) is a juicy tale as well.
In short, if you want to locate videos, you have to use multiple tools, ask people where a video may be found, or code your own solution.
Stephen E Arnold, February 3, 2016
Google Maps: Blurred Spots
January 28, 2016
Short honk: You might be able to search by lat and long, but you will not see “it.” To get a partial run down on what’s not visible in Google Maps, navigate to “Controversial Places That Google Maps Won’t Let You See.”
The question becomes, “How does one see these blurred locations?” There are some options, but that’s the information covered in my lectures for Telestrategies’ “Now That Google Doesn’t Work, What Does an Investigator Do.” There are some free and for fee services which are quite useful.
A good question to ponder is, “Why?”
Why are some locations visible via Google and the same locations are not visible in Bing?
If it is not there, one cannot search it. If it is there and blurred, one has to find an option. Life online. Such a drag.
Stephen E Arnold, January 28, 2016
Unogs: A Third Party Netflix Search
January 26, 2016
My wife loves Netflix. She finds programs that strike me as a bit fanciful, but that’s okay. How do she, her friends, and millions of other people locate just the right video confection for snowmageddon weekend?
Not with the Netflix search and recommendation as far as I know. I dabbled with this service a couple of times and formed two opinions:
- The folks have a lot of work to do in basic findability
- The interface is not my cup of hot chocolate. (If you love that Netflix search system, have at it. I still read.)
An alternative seems to be available if the information in “This Site Lets You Search the Worldwide Netflix Library” is on the money. I learned one can use Unogs. Here’s some color:
The “unofficial Netflix online Global Search” (uNoGS) takes most of the guesswork out of the process: it lets you search by movie or actor, narrow the results by a few extra fields, and then spits out what movies are available in which countries. From there, users just need to use one of many cheap VPN services, fake the correct country, and let the back episodes of Doctor Who trickle in. The site is also a wealth of data on which countries have the best and worst libraries, and what VPNs give access to which countries. According to an interview with TorrentFreak, the site’s creator ‘Brian’ initially created the site solely for his own personal use, before putting it online last year.
Keep those brain cells in idle mode. Gobble the videos, gentle reader. Some of the large online outfits really covet people who find video consumption more fun than reading the works of James Clerk Maxwell.
Stephen E Arnold, January 27, 2016
Shodan: Web Cam Search Engine
January 26, 2016
When snowmageddon hit the DC area, I thought it would be amusing to check out some of the streets which once enchanted me. Alas. The webcams were not working particularly well.
I poked around and located a couple of functioning devices. Just as I figured. Quite a mess, but it is Washington, DC. A fine, well organized place.
Get ready for the next snowpocalypse. Navigate to “Shodan Search Engine Provides Access to Hundreds of Unsecured Webcams.” The write up describes how the unsecured webcam search engine finds unsecured webcams. The system may prove interesting to those who explore.
I learned:
The new feed consists of webcams that stream video, have an open port, and don’t require any authentication, which is how Shodan is able to snap screenshots in the first place. These webcams all employ the Real Time Streaming Protocol (RTSP) on port 554, which is what makes them so easy to discover.
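For anyone who wants to know whether a camera they own falls into that category, here is a minimal sketch. It assumes the camera speaks RTSP 1.0 on port 554 and answers a DESCRIBE request for the root path. Real devices often use vendor-specific paths, so an “inconclusive” result proves nothing. The function names and the sample address are hypothetical.

```python
import socket


def build_describe_request(host, port=554, path="/"):
    """Build a bare RTSP 1.0 DESCRIBE request that carries no credentials."""
    url = f"rtsp://{host}:{port}{path}"
    return (
        f"DESCRIBE {url} RTSP/1.0\r\n"
        "CSeq: 1\r\n"
        "Accept: application/sdp\r\n"
        "\r\n"
    ).encode("ascii")


def parse_status_code(response):
    """Return the status code from the first response line, or None."""
    first_line = response.split(b"\r\n", 1)[0].decode("ascii", errors="replace")
    parts = first_line.split(" ", 2)
    if len(parts) >= 2 and parts[0].startswith("RTSP/") and parts[1].isdigit():
        return int(parts[1])
    return None


def classify(status):
    """Translate a status code into a plain-language verdict."""
    if status == 200:
        return "stream described without credentials"
    if status == 401:
        return "authentication required"
    return "inconclusive"


def check_camera(host, port=554, timeout=3.0):
    """Probe a camera you own or administer; raises OSError if unreachable."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(build_describe_request(host, port))
        return classify(parse_status_code(conn.recv(4096)))


# Example call against a device on your own network (address is a placeholder):
# print(check_camera("192.168.1.50"))
```

A verdict of “stream described without credentials” suggests the camera matches the description in the quoted passage. Setting a password, or keeping port 554 off the public Internet, is the fix. Tape works too.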
Shodan is at https://www.shodan.io/. I put tape over my computer’s video thingies. Just a thought for you to consider.
Stephen E Arnold, January 26, 2016
IBM Watson Will See Soon
January 6, 2016
I read “Watson to Gain Ability to See with Planned $1B Acquisition of Merge Healthcare.” According to the IBM announcement of this mid 2015 deal:
Watson will gain the ability to “see” by bringing together Watson’s advanced image analytics and cognitive capabilities with data and images obtained from Merge Healthcare Incorporated’s medical imaging management platform.
Interesting. IBM has a number of content management platforms; for example, FileNet. Reconciling the different types of images within Watson’s content intake system will keep some folks busy at Big Blue. The last diagnostic test I had generated a live stream of video images of various body parts chugging along. Movies!
Watson is a capable system, right?
Stephen E Arnold, January 6, 2016
Search without Words: The ViSenze API
January 5, 2016
I read “GuangDa Li, Co-Founder and CTO ViSenze on Enabling Search without Key Words.” The article, I wish to point out, is written in words. To locate the article, one will have to use words to search for information about Dr. Li. Dragging his image to Google Images will not do the trick. The idea for search without words continues to attract attention. Ecommerce and law enforcement are keen to find alternatives to word centric queries. Searching for a text message with a particular emoji is not easy using words and phrases.
According to the write up:
In February 2013, GuangDa Li along with Oliver Tan, an industry veteran started ViSenze, a spin-off company from NExT, a research centre jointly established between National University of Singapore (NUS) and Tsinghua University of China. ViSenze has developed a technology that enables search without keywords. Users simply need to click a photo and ViSenze brings you the relevant search results based on that image.
The write up contains several points which I found interesting.
First, Mr. Li said:
Because of my background in internet media processing, I anticipated the change in the industry about 4 years ago – there was a sharp rise in the amount of multimedia content on the internet. The management, search and discovery of media content has become more and more demanding.
Image search is a challenge. Once promising systems to query video like Exalead’s system have dropped from public view. Video search on most services is frustrating.
Second, the business model for ViSenze is API focused. Mr. Li said:
ViSearch Search API is our flagship product and it also serves as the fundamentals for our other vertical applications. The key advantage of ViSearch API is that it is a perfect combination of latency, scalability and accuracy.
The third passage of interest to me was:
We used to be in stealth mode for a while. Only after our API was launched on the Rakuten Taiwan Ichiba website, did we start to talk with investors. It just happened.
I interpreted this to suggest that Rakuten recognizes that traditional eCommerce search systems like Amazon are vulnerable to a different information access approach.
Should Amazon worry about Rakuten or regulators? Amazon does not worry about much it seems. Its core search and cloud based search systems are, in my view, old school and frustrating for some users. Maybe ViSenze will offer a way to deliver a more effective solution for Rakuten. Competition might motivate Amazon to do a better job with its own search and retrieval systems.
Stephen E Arnold, January 5, 2016