Arnold at NFAIS: Google Books, Scholar, and Good Enough

June 26, 2009

Speaker’s introduction: The text that appears below is a summary of my remarks at the NFAIS Conference on June 26, 2009, in Philadelphia. I talk from notes, not a written manuscript, but it is my practice to create a narrative that summarizes my main points. I have reproduced this working text for readers of this Web log. I find that it is easier to put some of my work in a Web log than it is to create a PDF and post that version of a presentation on my main Web site, www.arnoldit.com. I have skipped the “who I am” part of the talk and jump into the core of the presentation.

Stephen Arnold, June 26, 2009

In the past, epics were a popular form of entertainment. Most of you have read the Iliad, possibly Beowulf, and some Gilgamesh. One convention is that these complex literary constructs begin in the middle or what my grade school teacher called “in medias res.”

That’s how I want to begin my comments about Google’s scanning project, an epic usually referred to as Google Books. Then I want to go back to the beginning of the story and then jump ahead to what is happening now. I will close with several observations about the future. I don’t work for Google, and my efforts to get Google to comment on topics are ignored. I am not an attorney, so my remarks have zero legal foundation. And I am not a publisher. I write studies about information retrieval. To make matters even more suspect, I do my work from rural Kentucky. From that remote location, I note that Amazon is concerned about Google Books, probably because Google seeks to enter the eBook sector. This story is good enough; that is, in a project so large and so sweeping, perfection is not possible. Pages are skewed. Insects are scanned. Coverage is hit and miss. But what other outfit is prepared to spend to scan books?

Let’s begin in the heat of the battle. Google is fighting a number of things. Google finds itself under scrutiny from publishers and authors. These are the entities with whom Google signed a “truce” of sorts regarding the scanning of books. Increasingly libraries have begun to express concern that Google may not be doing the type of preservation job to keep the source materials in a suitable form for scholars. Regulators have taken an interest in the matter because of the publicity swirling around a number of complicated business and legal issues.

These issues threaten Google with several new challenges.

First, since its founding in 1998, Google has enjoyed what I would call positive relationships with users, stakeholders, and most of its constituents. The Google Books matter is now creating what I would describe as “rising tension.” If the tension escalates, a series of battles can erupt in the legal arena. As you know, battle is risky when two heroes face off in a sword fight. Fighting in a legal arena is in some ways more risky and more dangerous.

Second, the friction of these battles can distract Google from other business activities. Google, as some commentators, including myself in Google: The Digital Gutenberg, have suggested, may be vulnerable to new types of information challenges. One example is Google’s absence from the real time indexing sector where Facebook, Twitter, Scoopler.com, and even Microsoft seem to be outpacing Google. Distractions like the Google Books matter could exclude Google from an important new opportunity.

Finally, Google’s approach to its projects is notable because the scope of the project makes it hard for most people to comprehend. Scanning books takes exabytes of storage. Converting images to ASCII, transforming the text (that is, adding structure tags), and then indexing the content takes a staggering amount of computing resources.

Inputs to outputs, an idea that was shaped between 1999 and 2001. © Stephen E. Arnold, 2009

Google has been measured and slow in its approach. The company works with large libraries, provides copies of the scanned material to its partners, and has tried to keep moving forward. Microsoft and Yahoo, database publishers, the Library of Congress, and most libraries have ceded the scanning of books work to Google.

Now Google finds itself having to juggle a large number of balls.

Now let’s go back in time.

I have noticed that most analysts peg the Google Books project as starting right before the initial public offering in 2004. That’s not what my research has revealed. Google’s interest in scanning the contents of books reaches back to 2000.

In fact, an analysis of Google’s patent documents and technical papers for the period from 1998 to 2003 reveals that the company had explored knowledge bases, content transformation, and mashing up information from a variety of sources. In addition, the company had examined various security methods, including methods to prevent certain material from being easily copied or repurposed.

The idea, which I described in The Google Legacy (which I wrote in 2003 and 2004 with publication in early 2005), was to gather a range of information and process that information using mathematical methods in order to produce useful outputs like search results for users and to generate information about the information. The word given to describe value added indexing is metadata. I prefer the less common but more accurate term meta indexing.

Social Networks, Mobile Operators and Telcos Dust Up Coming

June 21, 2009

Short honk: “Head of Mobile Bebo on Why We Don’t Need Mobile Operators” seemed to be just another complaint about telcos. But after you read the article, do you think another seismic wave may be building in the datasphere? Sean Kane (Head of Mobile from social networking service Bebo) asserted, according to the GoMoNews writer Bena Roberts:

He [Sean Kane] said that apart from messaging, Bebo didn’t need mobile operators. He said that the most important asset an operator had was mobile messaging and the SMSC was vital for bebo. But that was about it. Bebo was already a leader in social networking. It was already a leader in mobile messaging boasting the largest youth exchange of messages already that it was an asset.  He said that mobile was great but that the mobile operator were only one obese side of the equation and bebo was like an application that by-passed the operator.

But the killer was Ms. Roberts’s statement:

Now, everything that Sean said made a lot of sense. Making money is vital and any one that thinks there is something more or else than money out there – is just wrong. On top of that anyone with a business model that depends on operators and is not D2C at this stage - is flawed.

My thought was that the economic pressure will spark the type of warfare that made European History so tough for me when I had to take the class. Strange names, continual squabbling, and deep rooted animosity. Is this the future for social networks, mobile operators, and telcos? Finding messages within these services is already tough, and I think search in the mobile space may become even more fractured. Who can ride to the rescue? Maybe Wave?

Stephen Arnold, June 21, 2009

Autonomy Gets Social

June 11, 2009

Autonomy has embraced the social search scene. Two announcements make this clear. The first is that Autonomy has an iPhone application. You can read about the Jobs love here. Second, Autonomy has a Facebook app as well. You can read about that here. Both of these software components are intended to allow adults to do real work. The iPhone app permits document management from the Apple device. Facebook users can tap into such functions as a visual report about what’s hot in the organization’s Intranet. More information about the newly social Autonomy may be found on the Cambridge UK company’s Web site.

Stephen Arnold, June 11, 2009

Real Time: Fad or Foundation

May 11, 2009

Ben Parr wrote “Is Real Time the Future of the Web?” I had not considered this question because moving one mode of communication from a traditional telephone to a mobile device with a keyboard is part of the hybridization and diffusion of technology that characterizes “cut and paste” innovation. Mr. Parr raises some interesting questions in his article here. The one that intrigued me was, “Is it [real time information] sustainable?” On the surface, the answer is, “Yes.” After some reflection, I think that the emergence of text mining, predictive analytics, and comprehensive surveillance may have a significant impact on certain types of real time information flows. The Hawthorne Effect may have a side and backspin that causes certain changes in information behavior. The examples I am thinking about include:

  • Bad guys using non monitored channels in order to remain outside the real time flow; for example, hire a person to deliver a coded message
  • Teens using F2F (face to face) communication for important information such as the kid with parents away for the weekend
  • Executives discussing deals by walking down a noisy sidewalk in a metro area.

Check out Mr. Parr’s approach. I will keep thinking about how certain communication methods may make real time online communications unattractive.

Stephen Arnold, May 10, 2009

Microsoft and Two Rip Tides

May 4, 2009

Jason Hiner’s “The Two Trends That Are Conspiring against Microsoft” here is a so-so title for a pretty good analysis of the rip tides sucking at Microsoft’s revenue. The two points are browser-based applications which blur the distinction between the desktop and the cloud, and mobile devices, which make the traditional desktop computer a boat anchor. The essay is hard hitting, and I think it makes some excellent points.

Stephen Arnold, May 4, 2009

Twitter Bashing

May 1, 2009

Short honk: If you hate Twitter, you will love this criticism of Twitter. It appeared on the MadAtoms.com Web log here. “The Devolution of the Internet” by Farley Elliott is entertaining and insightful. Among the weaknesses of Twitter, Mr. Elliott highlighted:

… perhaps the most disgusting part of Twitter is it’s most basic: it is a chatroom. A quick check of the calendar reveals that it’s not 1995. Yet twitter allows in the same riffraff that early chatrooms attracted, but without any of the moderation, or the ability to spend more than 140 characters wording up trolls and goons.

A keeper for sure.

Stephen Arnold, April 30, 2009

Bandwidth Cost

April 29, 2009

A happy quack to the reader who wrote, asking me to comment on the cost of bandwidth. His point of reference was the New York Times’s article “In Developing Countries, Web Grows Without Profit” here.

“I believe in free, open communications,” Dmitry Shapiro, the company’s chief executive, said. “But these people are so hungry for this content. They sit and they watch and watch and watch. The problem is they are eating up bandwidth, and it’s very difficult to derive revenue from it.”

My views on this issue are well documented in my books and studies. Let me recap three ideas and invite feedback on these.

First, most users and content centric outfits make errors when estimating the costs of online access. Unexpected spikes in telco fees are even today in my experience greeted with surprise and indignation. I hesitate to suggest that bandwidth is assumed to be cheap, readily available, and without much technical interest. As the New York Times’s article points out, bandwidth is an issue, and it can be a deal breaker financially and technically.

Second, in theory bandwidth is unlimited. The “unlimited” comes with two trap doors. One is the money available to apply to the problem. Bandwidth, even today, is not free. Someone has to build the plumbing, pay for infrastructure, hire the technical staff, and work the back office procedures. The second trap door is time. It is possible in Kentucky to make a call and get more bandwidth. But within the last two months, we found that making this call did not result in immediate bandwidth. The vendor said, “We can reprovision you within 72 hours. Take it or leave it.” I learned that the vendor made the statement because of a tightening financial noose around the vendor’s neck. The vendor in turn told me to wait.

Third, user expectations are now being shaped in a direction that makes bandwidth, infrastructure, and technical resources increasingly fragile. Here’s an example. Last night in a restaurant, a young man at a table next to mine watched a YouTube.com video on a mobile device. That young man in Boston and young people throughout the world see the Internet (wireless or wireline) as a broadcast channel. In my experience, this shift to rich media will put financial and technical pressure on infrastructure needed for this use of the Internet.

In short, I think there’s a cost problem looming. Will it arrive before the technical problem? Pick your poison.

Stephen Arnold, April 29, 2009

Twitter Quitter

April 28, 2009

Short honk: I heard quite a bit about how lousy Twitter.com is at the Boston Search Engine Meeting. Now those Twitter bashers have some ammo for their argument. Mashable, the bible of the real time mash up sector, reported “60% of Twitter Users Quit Within the First Month.” Click here to check out the stats.

Stephen Arnold, April 29, 2009

Mobile Versus Netbooks as Google Goes Slow

April 27, 2009

In my Google tutorial today (April 26, 2009), I ran through some of Google’s innovations in mobile search and services. One person in the session pointed out that Android 1.5 was immature. I agreed. Nevertheless, Google is plodding forward slowly. The slow motion approach of Google is not an indication of technical ineptitude. My research suggests that Google uses slow movement as a tactic. Android 1.5 will be improved, just not quickly or as quickly as some want a giant software company to react.

I ran through my newsreader items when I returned to my hotel room and spotted an article from New Zealand with this title: “Are Kids Becoming Phone Addicts?” here. For me, the important comment in the write up was:

“There are certainly teenagers who we are seeing that have an over-reliance on their mobiles and who become anxious at the prospect of going without their phone. They worry that they’ll run out of battery or credit and they’ll be forced to go without this way of communicating with their network of friends. It’s a big fear for them and it illustrates just how important they see the phone as being to their lives.”

I was thinking about my observation that Google was not in a particular hurry with some of its mobile initiatives. This article triggered in my mind the idea of Google’s patience. The company is improving Android and its other mobile services fast enough. The idea is that as young mobile users grow older, Google will improve a ratchet click at a time. When today’s middle school and high school students are ready, Google’s mobile services will be ready as well.

Will the competitors see Google improving? Yes, but the incremental approach makes it difficult to discern what Google is doing on a larger scale. When the pieces click into place, the customers will be ready.

There’s a risk with this strategy of slow but sure improvement, which is different from Microsoft’s “set a ship date and start the death march” approach. The GOOG wanders forward. The approach opens the door for some competitors to move into sectors and capture them as Amazon and Twitter have done in the last six to 12 months. On the other hand, Google has the advantage of deciding what differentiators to release and when.

Will Google’s slow time strategy work? Judging from the “addiction” rate among young mobile users, the Google will have a product that will tempt younger cohorts. If Google fails, it still has mobile services to offer to users through third parties. I am not sure how much of this analysis will play out in reality, but the idea of fast cycle versus slow cycle seems ideal for Google to target specific demographics and then let the aging process carry Google into some markets where mobile will be the primary computing platform, not a netbook or other large form factor device.

Stephen Arnold, April 27, 2009

Twitter: Now a Thought Leader Gets It

April 16, 2009

I was delighted to read Steve Espinosa’s “How Twitter Will Win Local Search” here. The story appeared in Silicon Alley Insider. I have been reluctant to post my specific views of Twitter because I sell these addled ideas to even more addled clients. But when something runs in the pulsing “blogosphere”, I want to call attention to the information. One useful function of Twitter is providing very timely, quite specific information about local activities. At lunch, one of the goslings monitors Tweets flowing in real time from Twitter users in the Louisville area. (We don’t get many Tweets in Harrod’s Creek. Ground hogs and possums have yet to acquire iPhones.) Why is this important? The young goslings at ArnoldIT.com use it to locate lunch specials. One of the perks of putting up with the addled goose is a company provided meal at a sit down restaurant every work day. The Twitter thing works like a champ, and it gave me the confidence in my new Google: The Digital Gutenberg to assert that the Google may find itself on the outside looking in with regard to real time search.

Mr. Espinosa said:

You actually have a profitable revenue source that may not be the end all be all model, but will be a huge chuck of revenue that does not interrupt the user experience but actually makes it better.

I think he may be on the trail leading toward a business model. A happy quack to him for posting this analysis. The trick to understanding real time search is to think in terms of the utility of lots of eyeballs and users who may have an answer to a particular, location-centric query. The next step is to think about monetization options as Mr. Espinosa has. Will Twitter be the winner in this space? Who knows. Will some company emerge as an oxygen hog? Absolutely.

Stephen Arnold, April 16, 2009
