Finding an Optical Character Recognition Program

July 19, 2013

I ran the following query for a client project yesterday: “OCR programs.”

I passed the query to Google, Yandex, and Bing in that order. What did I find?

image

There are 11 ads and 10 hits, one set of news items and one set of related search suggestions. Several of the links pointed me to downloads which were too confusing to try. The other links pointed to information ranging from Google Groups to commercial companies’ products.

Here’s what Yandex delivered to me:

image

No ads and mostly general information, including a hit to TextBridge which is no longer current.

And Bing?

image

There were five ads, related searches in two places, and links to mostly “free” programs and general information sites.

The reason this is an important series of examples is that I have been reading some of the articles about Google’s somewhat disappointing earnings results. The numbers are huge, but when most search and content processing companies are struggling for growth, Google is the Sir Lancelot of search vendors. If Google can’t grow quickly, what does that say about Google’s business strategy, about other search and content processing companies, and the US economy? My takeaway is not much different from that expressed in USA Today. Yes, USA Today, what one of my goslings calls “McPaper.”

The story is “Google Earnings Clipped in Mobile Headwinds.” The main point is, in my opinion:

Concerns continue about so-called cost-per-click prices that advertisers pay Google for Internet-search advertising.

And then:

Google’s average cost-per-click, which includes clicks related to ads served on Google sites and the sites of its network members, decreased about 6% in the quarter compared with a year ago. Analysts had predicted prices would drop about 3% in the period.

Two forces are at work here. One is the issue of relevance for a general search. None of the systems against which I ran my query returned useful information to me. What I wanted was a list like this:

Alternatives with OCR supported:
http://www.onlineocr.net/
http://free-online-ocr.com/
http://www.aolor.com/pdf-converter/
http://www.anypdftools.com/pdf-to-word.html
http://www.pdfocr.net/
==
An alternative without ocr
http://www.nemopdf.com/pdf-converter.html

Where did I find this list? I had to use multiple search strategies. None of the general Web search systems makes it easy to use Boolean operators. None of the search systems can deliver on point results without the user trying to figure out how to NOT out the ads, the scams, and the out-of-date programs. Date range operations are important to me but not to those engineers who are presenting ad supported or free search systems.
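To make "NOT out" and date range operations concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `Result` record, the `filter_results` helper, and the sample entries are hypothetical, and none of the search systems mentioned above exposes results in this form. It simply shows the kind of filtering a user has to do by hand today.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Result:
    # Hypothetical result record, invented for this sketch.
    title: str
    snippet: str
    published: date


def filter_results(results, must_have, must_not_have, newest_after):
    """Keep results that contain every must_have term, contain none of the
    must_not_have terms, and were published on or after newest_after."""
    kept = []
    for r in results:
        text = f"{r.title} {r.snippet}".lower()
        # Boolean AND: every required term must appear.
        if not all(term.lower() in text for term in must_have):
            continue
        # Boolean NOT: drop anything mentioning an excluded term.
        if any(term.lower() in text for term in must_not_have):
            continue
        # Date range: drop out-of-date entries.
        if r.published < newest_after:
            continue
        kept.append(r)
    return kept


# Made-up sample data, not real search results.
sample = [
    Result("Online OCR service", "Convert scanned PDF to text", date(2013, 5, 1)),
    Result("OCR toolbar download", "Free download, sponsored offer", date(2013, 6, 1)),
    Result("Legacy OCR package", "OCR for older desktops", date(2004, 3, 1)),
]

hits = filter_results(sample, ["ocr"], ["sponsored"], date(2012, 1, 1))
print([r.title for r in hits])
```

The matching here is plain substring comparison, which is cruder than a real Boolean query parser, but it is enough to show the three operations the paragraph above asks for: require a term, exclude a term, and restrict by date.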

What’s this have to do with Google revenues? Three points:

  1. By aiming at the lowest common denominator, has a feedback loop of increasing difficulty been added to the search process? If a result is not on point in a desktop search, what does a mobile user do to find an acceptable answer?
  2. Advertisers wanting qualified visitors to their Web sites may be producing more and more deceptive landing pages and trying to monetize them. I saw this with some of the freeware and shareware OCR links. The goal was not to provide a trial to a user. I concluded the goal was monetizing anything the marketer could. I know I closed those pages very quickly taking care not to click on anything.
  3. Are users learning not to use search? I know the traffic to the major sites continues to go up. In my own work, I see pages refresh themselves to generate a fake click. Are the clicks, therefore, real? Maybe users are unaware that their clicks may not yield anything for them but are of value only to the advertisers, who presumably are happy. But if advertisers are happy, what about USA Today’s insight about Google’s declining click performance?

Where is relevance? I read “SearchYourCloud Awarded US Patent for Improved Search Engine Results.” The article contained an interesting factoid:

Take a guess how long it has been since the United States Patent and Trademark Office (USPTO) granted a new search engine patent?  Hint: It has been longer than you might think given how absolutely vital SEARCH is to everyone – and not just Google, Microsoft and Yahoo, to name a few. Time is up. You can impress your friends with the answer, which is five years.

I suppose one could argue that other search patents have been granted, but that would suck me into one of those arguments like “predictive analytics is/is not search.” The point is that there are warning signs that basic information retrieval of open source content is not working. Whether the US patent office is part of the problem or whether the vendors are struggling to find a way to keep revenues at an acceptable level, I know one thing.

A person with a relatively straightforward query will have a difficult time finding a relevant answer quickly, easily, and without running the risk of being sucked into a marketing house of mirrors.

Interesting to me is that search is becoming less useful across the access landscape. How will the vendors address this problem? Ask marketing, I suppose.

Stephen E Arnold, July 19, 2013

Sponsored by Xenky
