Google and Disallow

January 7, 2009

You will want to check out “On Google Disallowing Crawling of Their Life Hosting” on Google Blogoscoped, which has a good write-up about this (to some) surprising development. Other search engines cannot index the Time Warner Life magazine images that Google hosts. Google inserted a blocking line in its robots.txt file. I noticed that I was limited in the number of images I could browse when the service first went live. I was surprised that these images were available to me without a fee. For years, the Time crowd has noodled about its picture archive. First, Time wanted to handle the scanning itself. Then Time wanted to subcontract the work, but that was too expensive. Then it was a good idea to talk with experts about what to do. Then the cycle repeated. Along came the GOOG, and the rest, as someone will write after this goose is cooked, is history. Here’s what is going on in my opinion:

Restrictive content access is going to become more visible. If you read the Guha patent applications from February 2007, you will have noted that Google’s system can operate in a discriminatory way. That translates, in my view of the world, to restrictions on what others can and cannot do with Google information. This is an important phrase: “Google information.” Please note it, copyright lovers.
The Life images are a big deal, and I suspect that the restrictions are positioned as part of a method to balance public access with protection for the assets of Time Warner. Everyone has needs, so this restriction is a nifty way of finding a middle way with Googzilla’s hands on the controls.
The cost of getting the Life images was not trivial. I have not heard anything substantive about the financial burden of this project, but based on what I know about the scale of the scanning and the logistics of handling the images, this puppy was expensive. In my view, unlike a pure academic library play, this deal has a price tag, and someone has to pay at some point.
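For readers who have not poked at robots.txt: a Disallow line asks compliant crawlers to stay out of a path. It is a request honored by well-behaved bots, not a lock on the door. Here is a minimal sketch using Python’s standard urllib.robotparser module. The path /hosted/life/, the host images.example.com, and the bot name ExampleBot are my assumptions for illustration, not a copy of Google’s actual file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt text. The /hosted/life/ path is an illustrative
# assumption, not the exact line Google published.
ROBOTS_TXT = """\
User-agent: *
Disallow: /hosted/life/
Allow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# "User-agent: *" applies to any crawler without its own entry, so a compliant
# bot checks each URL before fetching it and skips the disallowed prefix.
for url in (
    "https://images.example.com/hosted/life/photo123.html",
    "https://images.example.com/other/page.html",
):
    print(url, "allowed" if parser.can_fetch("ExampleBot", url) else "disallowed")
```

Note that Python’s parser applies the first matching rule in file order, which is why the Disallow line sits above the broad Allow line in this sketch.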

What’s ahead? Well, in my view, once Google creates metadata and populates one of its knowledge bases, those data will be protected, probably with considerable enthusiasm. Google’s programmable search engine generates data, and if some data items are missing, the system beavers away until the empty cell is filled. Once those dataspaces are populated, the information is not for just anyone’s use.

I mentioned the word dataspaces in a telephone conversation today. I know I am not communicating. The person on the other end of the call asked, “What’s a dataspace?” Well, you are now disallowed from one.

Stephen Arnold, January 7, 2009

Written by Stephen E. Arnold · Filed Under Business strategy, Google, News, Online (general), Publishing



Stephen E. Arnold monitors search, content processing, text mining and related topics from his high-tech nerve center in rural Kentucky. He tries to winnow the goose feathers from the giblets. He works with colleagues worldwide to make this Web log useful to those who want to go "beyond search". Contact him at sa [at] arnoldit.com. His Web site with additional information about search is arnoldit.com.