Google: Baby Steps with Image Recognition

August 31, 2009

With attention focused on Google Books and Google lobbying, modest technical innovations can be overlooked. The Overflight service flagged US7,580,568 “Methods and Systems for Identifying an Image as a Representative Image for an Article.” On the surface, what is the big deal about parsing a document with multiple images and taking one as a representative image? Google does this frequently. Navigate to Google News and look at the images positioned next to a news story.

Now what if an article has an image but that image is not one that represents the information in the article? In the good old days when traditional publishers were kings, a human would flip through a photo archive, locate a suitable image, and mark up the copy to show the compositor where to put the picture. Google has automated this service. (Page 12, Column B, line number 49.) Not a big deal, but it is a step that chops costs out of the process of assembling original mash ups of information.

One of the principal findings from my research into Google’s technology is that the company has been purposeful in squeezing costs out of operations that are often money bottlenecks when traditional methods are shoehorned into online. What I find interesting is that the system and method can be applied to a range of “images”, not just those in a magazine article or a book chapter.

Baby step or not, US7,580,568—filed in 2004—is now a patent. The plumbing and logic for the disclosed system and method have been in operation since late 2002 or early 2003. How the toddler has matured!

Stephen Arnold, August 31, 2009

Back Lot of Google Begins to Take Shape

August 28, 2009

Google filed a number of patent documents over the last four years that referenced video. None of the individual documents connect the dots. A light bulb went on at the goose pond today. One of the goslings read “YouTube Views Can Ad Up for Popular Videos.” The write up explains that Google will share ad money with the individuals who have videos that generate lots of Google traffic. Google is in the motion picture business. We think that Google will make it possible for a company looking for a hot video producer to locate that individual using Google match making services. We think that over time Google will put in place a unified company for producing, distributing, and monetizing video. Are we right or wrong? Lights, camera, action.

Stephen Arnold, August 28, 2009

Data Warehouse Leader to Reinvent Data Warehousing

August 26, 2009

“IBM Announces ‘Smart Analytics System’ Aimed at Reinventing Data Warehousing” reminded me of Einstein’s discomfort with some of the implications of his theory of relativity. Invent one thing, then scramble to find a way to deal with problems that won’t go away. IBM, one might assert, invented data warehousing. It was an IBM researcher who developed our old friend the relational database. The Codd approach has been the big dog in data management for a long time. Options are now becoming more widely available, but when one says, “Data warehousing”, I think IBM. That’s why I am an addled goose I suppose.

Mr. Data Warehouse. Image source: http://en.wikipedia.org/wiki/Edgar_F._Codd

This article-interview makes clear that something is not right in IBM land. For me, the most suggestive comment in the Intelligent Enterprise write up was this passage:

Though IBM is promising better performance, a big part of the appeal seems to be targeted at executives who would favor contract simplicity and a single “throat to choke” over enterprising, but potentially riskier, in-house development, integration and innovation.

The “reinvention” seems to me to be little more than fixing responsibility for a mission critical system on a company big enough to take to court if the data warehouse has a leaking roof. In my experience these traditional data warehouses have more problems than a fast-build Shanghai apartment building.

My thought is to take a hard look at the assumptions about data warehousing, then poke into some options. Dare I suggest Aster Data? What about a Perfect Search enabled system?

Stephen Arnold, August 26, 2009

The Content Crisis Deconstructed

August 13, 2009

Business Week’s Lars Bastholm wrote an interesting article. When I read it, I thought about a wacky professor I had at Duquesne University decades ago who loved beyond all reason the approach to textual analysis pinned to Jacques Derrida. (If you cut that class in modern critical analysis, you can get a brief here.) On the surface, “The Content Crisis” is another one of those “sky is falling” articles that “real” journalists write. When these crisis-revealed articles appear in traditional magazines like Business Week, I take notice. My reasoning is that the top brass at McGraw Hill probably does not think too much about the pressures on the worker bees in the journalistic hive on Sixth Avenue. The worker bees do think about what is happening to the magazine industry in particular and the broader traditional information industry in general. A write up like Lars Bastholm’s is essentially a news story about that now tired phrase “the content crisis”. Passages like this one are recycled like newspapers by the Rumpke Corporation which operates in Harrod’s Creek:

What I propose is that phone companies and Internet providers just slap additional content fees onto their bills. Sure, I don’t like the additional fee. But if a $10 monthly content fee was added to both my existing AT&T and Time Warner bills, and in return I got access to all the content I wanted, it would feel pretty close to free.

The article argues that a magazine can charge for content. The money, however, would be collected by an outfit such as AT&T or Time Warner. Okay. I wonder if Mr. Bastholm knows how money is shared by a utility across multiple providers? The write up sidesteps that question, and it is an important one.

The traditional world has morphed. One cannot go back. Image source: http://www.astrococktail.com/images/Deconstruction700.jpg

The article concludes with what was probably in journalism school a killer peg:

So when you think about it, is $20 a month really a big price to pay for saving movies, TV, music, magazines, and newspapers and getting rid of unwanted advertising in one fell swoop? It feels like a bargain to me.

What triggered the Derridaesque moment for me were these notions waddling through this addled goose’s brain:

  1. The article is less of a news story and more of a plank in a political platform for Rupert Murdoch’s campaign to charge for information with a nod to the microcode method favored by the Associated Press. I can see the senior editor, the publisher, and one McGraw Hill vice president standing in the hall with copies of the print publication, smiling and nodding about a job well done.
  2. The notion that a utility (essentially a monopoly if set up correctly) will share money in a way that returns the lion’s share of the revenue to one supplier is at odds with my experience. Utilities, due to buying power and market control, force suppliers to deliver at very competitive rates. Instead of a payday, the utility wheels and deals. Coal is a commodity to Duke Power. Information is a commodity to AT&T and Time Warner. Forgetting what business utilities are in will lead to a financial surprise when the first payment arrives 45 to 90 days late.
  3. The solution advocated in the article does not address the broader challenge. The children of publishing executives, possibly Mr. Bastholm’s own or his friends’, are not interested in traditional media as much as I was when I was young and callow. In fact, each generation in the demographic pipeline younger than the preceding cohort will be less and less interested in the “traditional” approach to information.

Yesterday I had a conversation with a young journalist. I asked about the person’s recent experience in journalism classes at one of the * major * journalism schools. I jotted down that person’s comment because it underscores the need to deconstruct what Business Week has written about “the content crisis”. The journalist told me:

I think that my professors know that the media and news world is changing. But the classes don’t reflect that change. Now that I am working, I see first hand that the traditional approach to news is not where the opportunities are. Online is the future and it has arrived. (Editor at a magazine publisher located in the United States.)

As M. Derrida observed, “Every discourse, even a poetic or oracular sentence, carries with it a system of rules for producing analogous things and thus an outline of methodology.”

Stephen Arnold, August 13, 2009

New Media Guidelines for Search and Access

August 13, 2009

Imediaconnection.com’s “Metadata Secrets for Expanding your Content’s Reach” struck me as a useful back to basics for traditional media executives. Ben Weinberger has gathered seven tips that provide some useful advice (use analytics) and some that is going to be as clear as Aramaic to media executives (intelligent metadata in a metadata management framework). If you want a shopping list of what to do to stay in business, you will want to add Mr. Weinberger’s write up to your archive. The killer omission is the plumbing required to permit implementation of some of his tips. Mr. Weinberger may want to acquaint himself with MarkLogic. MarkLogic, may I suggest you brief Mr. Weinberger?

Stephen Arnold, August 13, 2009

Kids and Downloading. And the Parents?

August 12, 2009

Short honk: TechDirt’s “New Study States the Obvious: Kids Download a Lot of Music.” The most interesting comment in the story was:

A new study, sponsored by UK Music (the UK organization that’s looking to get ISPs to put in place some sort of blanket licensing plan) has found that over 60% of kids in the UK admit to file sharing, with 83% of those admitting to doing it regularly, and those surveyed claiming to have downloaded an average of 8,100 tracks. Think about that for a second. 8,100 tracks.

As the kids grow up, what changes?

Stephen Arnold, August 12, 2009

Google and the Open Source Card

August 7, 2009

Digital video is a high stakes game and only high rollers can play. Hulu.com has the backing of several motivated outfits with deep pockets. Smaller video sites are interesting but the punishing costs associated with dense bit media are going to be too much for most of these companies over the next couple of years.

Google is committed to video. A big chunk of the under 40 crowd love to fiddle with, wallow in, and learn via video. I don’t, but that does not make any difference whatsoever.

There are two different views of the Google acquisition of On2’s video compression technology. On one side of the fence is a traditional media company, the Guardian newspaper. You can read “Google Buy Up Will Help Cut YouTube Costs.” The idea is that Google is not making money via YouTube.com. Therefore, the all-stock deal worth about $110 million gets Google some compression technology that will reduce bandwidth costs and deliver other efficiencies. The On2 technology also has the potential to give Google an edge in video quality. This is an AP story, so I don’t want to quote from the item. I do want to point out that this on the surface seems like a really great analysis.

On the other side of the fence is the viewpoint expressed in The Register. Its story “Is Google Spending $106.5 Million to Open Source a Codec?” is quite different. Cade Metz, a good thinker in the opinion of the goslings here in Harrod’s Creek, wrote:

But if you also consider the company’s so far fruitless efforts to push through a video tag for HTML 5 – the still gestating update to the web’s hypertext markup language – the On2 acquisition looks an awful lot like an effort to solve this browser-maker impasse.

Mr. Metz sees the On2 buy as a way for Google to offer an alternative video codec which sidesteps some issues with H.264 and other beasties in the video jungle.

In my opinion, The Register is closer to the truth than the Guardian. Google is playing an open source trump card. Making open source moves delivers two benefits. The first is the short term solution to the hassle over video standards. Google offers an attractive alternative to the issues described by Mr. Metz. The second advantage is that Google reaps the benefits of contributing to open source in a substantive way.

Open source is a major threat to Microsoft and some other enterprise software vendors. Google is playing a sophisticated game and playing that game well in our opinion. The Register’s story gets it; the Guardian’s story does not.

Stephen Arnold, August 7, 2009

Flickr Thunderstorms

August 6, 2009

Right after inking a yet-to-be-approved deal with Microsoft, Yahoo rolled out enhancements to Flickr’s image search. If you have not tried the new-and-improved Flickr, click here and give the system a whirl. My test queries were modest. I need pictures of train wrecks, collapsed houses, and skiers who are doing headers into snow drifts. These illustrations amuse me and I find them useful in illustrating the business methods of some dinosaur-like organizations. The search “train wreck” worked. I received image results that were on a par with Google’s. Yahoo’s Flickr did not allow me to NOT out jpgs or narrow the query to line art. The system was fine. My query for “house collapse” was less satisfying, but the results were usable. I had to click and browse before I found a suitable image for a company that is shaken by financial upheavals and management decisions.

Source: http://www.flickr.com/photos/tbruce/193295658/

What surprised me about Flickr was the story “Cloud Storage Nightmare with Flickr.” Hubert Nguyen reported:

A Flickr user learned the hard way when his account got hacked and 3000 of his photos were deleted by the hacker, who also closed his account. The account owner is now campaigning against Flickr’s support. You can imagine how mad that person was, but it gets worse: Flickr cannot retrieve his data and we guess that this is because they were deleted in a seemingly “legitimate” manner (from Flickr’s point of view). We think that Flickr is built to survive some catastrophic hardware failure, but if an account is closed, the data is immediately deleted – permanently.

This strikes me as a policy issue, but it underscores the types of challenges Microsoft may face as it tries to free itself from the thorn bush. If the revenue from the yet-to-be-approved tie up does not produce a truck load of dough, the situation could become even thornier for Microsoft.

Stephen Arnold, August 6, 2009

YouTube Upload Volume

July 28, 2009

Short honk: I saw this factoid in TechCrunch a while ago. I neglected to pull it out. In May 2009, MG Siegler reported that “every minute just about a day’s worth of video is … uploaded to YouTube.” You can read the story on TechCrunch. The metrics are, according to TechCrunch:

Think about that for a minute. In that minute, nearly a days worth of footage will have been uploaded. And the pace is quickening. Back in 2007, shortly after Google bought the service, it was 6 hours of footage being uploaded every minute. As recently as January of this year, that number had grown to 15 hours, according to the YouTube blog. Now it’s 20 — soon it will be 24. That’s insane.
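For readers who like to check the arithmetic, here is a back-of-the-envelope sketch in Python. The three rates are the ones quoted above; the function names and the idea of converting a day's uploads into days of continuous viewing are my own illustration, not anything from TechCrunch or YouTube.

```python
# Upload rates quoted by TechCrunch: hours of footage uploaded per minute.
rates_hours_per_minute = {"2007": 6, "January 2009": 15, "May 2009": 20}

MINUTES_PER_DAY = 24 * 60


def hours_uploaded_per_day(hours_per_minute):
    """Total hours of footage uploaded in one calendar day at a given rate."""
    return hours_per_minute * MINUTES_PER_DAY


def viewing_days_per_upload_day(hours_per_minute):
    """Days of round-the-clock viewing needed to watch one day's uploads."""
    return hours_uploaded_per_day(hours_per_minute) / 24


for label, rate in rates_hours_per_minute.items():
    print(label, hours_uploaded_per_day(rate), viewing_days_per_upload_day(rate))
```

At the May 2009 rate of 20 hours per minute, one day of uploads works out to 28,800 hours of footage, or 1,200 days of nonstop viewing.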

Bottom line: lots of video with more coming. iPhone users are videographers with black belts. Now when will the Google unleash its automated video findability technology?

Stephen Arnold, July 27, 2009

Media Mavens Face Generational Threat

July 24, 2009

You are a publisher. Maybe you are in the media business. Perhaps you fancy yourself a band promoter. Pick your pigeonhole and then read “How Teenagers Consume Media: The Report that Shook the City”. I point to this summary of the report by teen wonder Matthew Robson, Morgan Stanley’s secret weapon in the analysis wars. Why the synopsis? Most people don’t read. Among the points that I wrote down in my dinosaur skin notebook were:

  • On demand TV is of interest
  • Newspapers are dead ducks or geese
  • Stolen music is common
  • Mobile gizmos are in vogue

Books don’t make the loser list. For me, the key point was that these kids may have traditional media contexts. Yet despite what parents and schools say, the teens march to a different synthesized drum beat. Mom and Dad at work try to stop the shift, but I think the generational threat is here and now.

Stephen Arnold, July 24, 2009
