Image Search and, of Course, Google

June 13, 2013

Many years ago I lectured in Japan. On that visit, I saw a demonstration of a photo recognition system. Click on a cow, and the system would return other four-legged animals, most of the time. Some years later I was asked to review facial recognition systems after a notable misfire in a major city. Since then, my team and I have checked out the systems that come to our attention.

Progress is being made. That’s encouraging. However, a number of challenges remain, ranging from false positives to context failures. In a false positive, the person or thing in the picture is not the person or thing one sought. In a context failure, a cow painted on the side of a truck is not the same as a cow standing in a field with other cows clumped around it.
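The false-positive problem can be made concrete with a toy precision calculation. This is an illustration only; the result labels below are hypothetical, not data from any real system.

```python
# Toy illustration of false positives in image search (hypothetical labels).
# Each result is True if it actually matches the query ("cow in a field")
# and False if it is a false positive (e.g., a cow painted on a truck).
results = [True, True, False, True, False, False, True, False]

true_positives = sum(results)
false_positives = len(results) - true_positives
precision = true_positives / len(results)

print(f"true positives:  {true_positives}")   # 4
print(f"false positives: {false_positives}")  # 4
print(f"precision:       {precision:.2f}")    # 0.50
```

Half the returned images are wrong in this toy run; drive the false positives down and precision climbs, which is exactly the metric Google cites below.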

Software is bumping up against computational boundaries. The methods available have to be optimized to run within available resources. When bigger and faster systems arrive, fancier math can be used. Today’s innovations boil down, in my opinion, to clever manipulations of well-known systems and methods. The reason many software systems perform in a similar manner is that they share many procedures. Innovation is often optimization and packaging, not a leapfrog to more sophisticated numerical procedures. Trimming, chopping down, and streamlining via predictive methods are what advance the ball down the field.

I read with interest “Improving Photo Search: A Step across the Semantic Gap.” Google has rolled out enhanced photo search, and by the company’s own account it outperforms the approaches Google had tried before. As Google phrases it:

We built and trained models similar to those from the winning team using software infrastructure for training large-scale neural networks developed at Google in a group started by Jeff Dean and Andrew Ng. When we evaluated these models, we were impressed; on our test set we saw double the average precision when compared to other approaches we had tried. We knew we had found what we needed to make photo searching easier for people using Google. We acquired the rights to the technology and went full speed ahead adapting it to run at large scale on Google’s computers. We took cutting edge research straight out of an academic research lab and launched it, in just a little over six months. You can try it out at

Why the success now? What is new? Some things are unchanged: we still use convolutional neural networks — originally developed in the late 1990s by Professor Yann LeCun in the context of software for reading handwritten letters and digits. What is different is that both computers and algorithms have improved significantly. First, bigger and faster computers have made it feasible to train larger neural networks with much larger data. Ten years ago, running neural networks of this complexity would have been a momentous task even on a single image — now we are able to run them on billions of images. Second, new training techniques have made it possible to train the large deep neural networks necessary for successful image recognition.
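The core operation in a LeCun-style convolutional network is sliding a small learned filter over an image to detect a local pattern. The sketch below shows that one operation in isolation; the toy image and the hand-picked edge filter are illustrative (in a trained network the filter weights are learned), and real systems stack many such layers.

```python
# Minimal sketch of 2D convolution, the building block of convolutional
# neural networks. Illustration only; weights here are hand-picked, not learned.
import numpy as np

def convolve2d(image, kernel):
    """Valid-mode 2D cross-correlation (no padding, stride 1)."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out

# A 5x5 "image" with a vertical edge between columns 1 and 2.
image = np.array([
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
], dtype=float)

# A simple vertical-edge detector.
kernel = np.array([
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
], dtype=float)

response = convolve2d(image, kernel)
print(response)  # strongest response (3.0) where the window straddles the edge
```

The filter responds strongly only where its window straddles the edge, which is how stacked convolutional layers build up from edges to textures to whole objects.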

The use of “semantics” is also noteworthy. As I wrote in my analysis of Google Voice for a large investment bank, “Google has an advantage because it has data others do not have.” When it comes to predictive methods and certain types of semantics, the Google data sets give it an advantage over some rivals.

What applied to Google Voice applies to Google photo search. Google is able to tap its data to make educated guesses about images. The semantics and the infrastructure have a turbo-boosting effect on Google.

The understatement in the Google message should not be taken at face value. Google is increasing its lead over its rivals and preparing to move into completely new areas of revenue generation. Images? A step, but an important one.

Stephen E Arnold, June 13, 2013

Sponsored by Xenky

