Arnold at NFAIS: Google Books, Scholar, and Good Enough

June 26, 2009

Speaker’s introduction: The text that appears below is a summary of my remarks at the NFAIS Conference on June 26, 2009, in Philadelphia. I talk from notes, not a written manuscript, but it is my practice to create a narrative that summarizes my main points. I have reproduced this working text for readers of this Web log. I find that it is easier to put some of my work in a Web log than it is to create a PDF and post that version of a presentation on my main Web site, www.arnoldit.com. I have skipped the “who I am” part of the talk and jump into the core of the presentation.

Stephen Arnold, June 26, 2009

In the past, epics were a popular form of entertainment. Most of you have read the Iliad, possibly Beowulf, and some Gilgamesh. One convention is that these complex literary constructs begin in the middle, or what my grade school teacher called “in medias res.”

That’s how I want to begin my comments about Google’s scanning project, an epic usually referred to as Google Books. Then I want to go back to the beginning of the story and then jump ahead to what is happening now. I will close with several observations about the future. I don’t work for Google, and my efforts to get Google to comment on topics are ignored. I am not an attorney, so my remarks have zero legal foundation. And I am not a publisher. I write studies about information retrieval. To make matters even more suspect, I do my work from rural Kentucky. From that remote location, I note that Amazon is concerned about Google Books, probably because Google seeks to enter the eBook sector. This story is good enough; that is, in a project so large, so sweeping, perfection is not possible. Pages are skewed. Insects are scanned. Coverage is hit and miss. But what other outfit is prepared to spend to scan books?

Let’s begin in the heat of the battle. Google is fighting a number of things. Google finds itself under scrutiny from publishers and authors. These are the entities with whom Google signed a “truce” of sorts regarding the scanning of books. Increasingly libraries have begun to express concern that Google may not be doing the type of preservation job to keep the source materials in a suitable form for scholars. Regulators have taken an interest in the matter because of the publicity swirling around a number of complicated business and legal issues.

These issues threaten Google with several new challenges.

First, since its founding in 1998, Google has enjoyed what I would call positive relationships with users, stakeholders, and most of its constituents. The Google Books matter is now creating what I would describe as “rising tension”. If the tension escalates, a series of battles can erupt in the legal arena. As you know, battle is risky when two heroes face off in a sword fight. Fighting in a legal arena is in some ways more risky and more dangerous.

Second, the friction of these battles can distract Google from other business activities. Google, as some commentators, including myself in Google: The Digital Gutenberg, have noted, may be vulnerable to new types of information challenges. One example is Google’s absence from the real time indexing sector where Facebook, Twitter, Scoopler.com, and even Microsoft seem to be outpacing Google. Distractions like the Google Books matter could exclude Google from an important new opportunity.

Finally, Google’s approach to its projects is notable because the scope of the project makes it hard for most people to comprehend. Scanning books takes exabytes of storage. Converting images to ASCII, transforming the text (that is, adding structure tags), and then indexing the content takes a staggering amount of computing resources.

Inputs to outputs, an idea that was shaped between 1999 and 2001. © Stephen E. Arnold, 2009

Google has been measured and slow in its approach. The company works with large libraries, provides copies of the scanned material to its partners, and has tried to keep moving forward. Microsoft and Yahoo, database publishers, the Library of Congress, and most libraries have ceded the scanning of books work to Google.

Now Google finds itself having to juggle a large number of balls.

Now let’s go back in time.

I have noticed that most analysts peg the Google Books project as starting right before the initial public offering in 2004. That’s not what my research has revealed. Google’s interest in scanning the contents of books reaches back to 2000.

In fact, an analysis of Google’s patent documents and technical papers for the period from 1998 to 2003 reveals that the company had explored knowledge bases, content transformation, and mashing up information from a variety of sources. In addition, the company had examined various security methods, including methods to prevent certain material from being easily copied or repurposed.

The idea, which I described in The Google Legacy (which I wrote in 2003 and 2004 with publication in early 2005), was to gather a range of information, process that information using mathematical methods to produce useful outputs like search results for users, and generate information about the information. The word usually given to value added indexing is metadata. I prefer the less common but more accurate term meta indexing.

Twitter Link Indexing

June 5, 2009

Today after my talk at the Gilbane content management conference in San Francisco, a person mentioned that Twitter was indexing links in Tweets. I said that I included this information in my Twitter Web log posts. But when I looked at my posts, I found that I had not been explicit. You can get more info at http://www.domaintweeter.com.

Stephen Arnold, June 5, 2009

Bartz Reveals the Truth about Bing to Microsoft

June 2, 2009

In the oh-so-in group that comprises the All Things Digital conference, many interesting side stories unfold. You have to be there to get the real scoop. But the hot fudge, whipped cream, and cherry on top go to those who get to fiddle with the detritus of a conference. I read “Bartz’s (S)mash Note to Ballmer: The Photographic Proof” here and realized that sometimes the leave behinds contain factoids of hard truth. First you need to read Kara Swisher’s article. Then look closely at the pink sticky note and look at the accompanying transcription. Set up: Carol Bartz, cruise director of the SS Yahoo, wrote to Steve Ballmer, captain of the $65 billion Redmond class warship:

Steve, Forget it. Won’t Help. Ha. Carol

Addled geese are not at All Things Digital. Guests must leave dogs and other non-hip creatures outside. I wasn’t there. But I can, from my pond filled with Beargrass Creek pollutants, offer Jacques Derrida-like observations:

  • The pronoun “it” lacks an antecedent. Because Mr. Ballmer spoke and demonstrated the Bing Kumo search system, I must assume that “it” is that search system.
  • If the “it” is Bing Kumo, the statement “Forget it” introduces another ambiguity. Is the second “it” a reference to Bing Kumo? If so, Ms. Bartz is suggesting that Microsoft forget Bing Kumo. More colloquially, the phrase “forget it” said to me, “Dude, Bing Kumo cannot close the gap between Microsoft and Google in the Web search sector.”
  • The “ha” is also ambiguous. One can interpret this “ha” as an inside joke, discounting or disclaiming the implication that Bing Kumo is a loser. On the other hand, perhaps the “ha” means a Jay Leno Jaywalker “ha” where people laugh at others’ weaknesses.

In short, lots of ambiguity, but possibly a grain of truth. Here in Harrod’s Creek, the sticky note, the ambiguity, and the reference to getting one’s make up done underscore how far away the addled goose is from the real action in the world of Web search. Thank goodness there are neither make up artists nor pink sticky notes in these here parts. We don’t even have an in crowd unless you include the bikers who hit the River Creek Inn on Sunday morning before the church goers show up for brunch and a whistle wetting drink.

Stephen Arnold, June 1, 2009

Boye 09 Overflight Awards

May 19, 2009

The Overflight Award for Excellence, created by ArnoldIT.com and JBoye.com, was presented to Volker Grünauer, head of E-marketing at Wienerberger in Austria, at the JBoye Conference: Philadelphia 2009, http://www.jboye.com/conferences/philadelphia09/, May 5-7, held at the Down Town Club in Philadelphia.

The award recognizes the best presentation at the conference on digital media, which featured more than 50 speakers from around the world.

Grünauer offered a relevant talk called “Developing a customer centric web strategy.” This presentation discussed smart web strategy for promoting real brick and mortar products, including how Wienerberger defines the four elements of web success and how customer behavior has become the trigger for every eMarketing decision. Slides of the presentation are available at http://jboye08.dk/downloads/download.php?file=1226063851.pdf. He was awarded an engraved Lucite trophy and 500 Euros.

Volker is responsible for the marketing strategy of all websites at Wienerberger, the world’s largest manufacturer of bricks, clay roof tiles and clay pavers. In this function he also developed a new brand and domain management strategy. Together with the IT department he managed the rollout of the CMS into new Wienerberger markets. See his profile at http://www.jboye.com/conferences/philadelphia09/speakers/volker_grunauer.

An honorable mention went to Donna Spencer, a freelance information architect and interaction designer, a mentor, writer and trainer from Australia, who presented a discussion on the user experience track called “Getting Content Right.” She was awarded an engraved Lucite trophy. Her profile is at http://www.jboye.com/conferences/philadelphia09/speakers/donna_spencer.

Stephen E. Arnold and Janus Boye created the award to permit the community attending the conference to identify presentations that met the following criteria: information that would be useful to delegates upon returning to work; research supporting the presentation; quality of the delivery and examples; and importance of the speakers’ topics at the time of the conference.

A panel of distinguished attendees and information practitioners had the task of assessing the presentations and determining the winners. The judges were Dana Hallman, Office of the Comptroller of the Currency; Karen Rosenzweig, Novartis; Peter Svensson, Lund University; and Troy Winfrey, University of Baltimore.

About ArnoldIT.com

Stephen E. Arnold monitors search, content processing, text mining and related topics from his office in Kentucky. He works with colleagues worldwide on a wide range of online and content-related projects. The company’s Web site is http://arnoldit.com, and the Beyond Search blog is at http://arnoldit.com/wordpress/.

About JBoye.com

J. Boye, a digital media enterprise, is frequently contracted to help with strategy and governance, project planning, requirement specifications, vendor and software selection, project management and ROI optimization. They also produce industry reports and organize educational conferences. Contact the company at info@jboye.co.uk or info@jboye.dk.

Jessica Bratcher, May 19, 2009

Evvie 2009 Winners: David Evans and Martin Baumgartel

May 4, 2009

Stephen E. Arnold of ArnoldIT.com, http://www.arnoldit.com, announced the Evvie “best paper award” for 2009 at Infonortics’ Boston Search Engine Meeting on April 28.

The 2009 Evvie Award went to Dr. David Evans of Just Systems Evans Research for “E-Discovery: A Signature Challenge for Search.” The paper explains the principal goals and challenges of E-Discovery techniques. The second place award went to Martin Baumgärtel of bioRASI for “Advanced Visualization of Search Results: More Risks or More Chances?”, which addressed the gap between breakthroughs in visualization and actual application of techniques.

Stephen Arnold (left) is pictured with Dr. David Evans of Just Systems Evans Research (right).

The Evvie is given in honor of Ev Brenner, one of the leaders in online information systems and functions. The award was established after Brenner’s death in 2006. Brenner served on the program committee for the Boston Search Engine Meeting since its inception almost 20 years ago. Everett Brenner is generally regarded as one of the “fathers” of commercial online databases. He worked for the American Petroleum Institute and served as a mentor to many of the innovators who built commercial online.

Martin Baumgartel (left) and Dr. David Evans discuss their recognition at the 2009 Boston Search Engine Meeting.

Mr. Brenner had two characteristics that made his participation a signature feature of each year’s program: He was willing to tell a speaker or paper author to “add more content,” and after a presentation, he would ask a presenter one or more penetrating questions that helped make a complex subject more clear.

The Boston Search Engine Meeting, held each year in Boston, attracts search professionals, search vendors, and experts interested in content processing, text analysis, and search and retrieval. Ev, as he was known to his friends, demanded excellence in presentations about information processing.

Sponsored by Stephen E. Arnold (ArnoldIT.com), this award goes to the speaker who best exemplifies Ev’s standards of excellence. The selection committee consists of the program committee, assisted by Harry Collier (conference operator) and Stephen E. Arnold.

This year’s judges were Jill O’Neill, NFAIS, Sue Feldman, IDC Content Technologies Group, and Anne Girard, Infonortics Ltd.

Mr. Arnold said, “This award is one way for us to respect his contributions and support his life long commitment to excellence.”

The recipients receive a cash prize and an engraved plaque. Information about the conference is available on the Infonortics, Ltd. Web site at www.infonortics.com and here. More information about the award is here. Information about ArnoldIT.com is here.

The Microsoft Enterprise Search Vision

April 27, 2009

I read Fran Foo’s “Microsoft Chooses R&D over Buyouts” here. What fascinated me was this statement in the AustraliaIT.news.com.au report of a top Microsoft executive’s view of preparing for the future. Kevin Turner, Microsoft’s global COO, allegedly said:

“In the consumer area we aren’t the market leader but we’re investing in search, MSN, Windows Live and Office Live to become a world-class digital advertising company,” he said. “The landscape is fluid and you have to keep innovating and growing faster than your competition or you’re going to become obsolete.”

Acquisitions can pump up revenue. R&D is often less certain. Google has relied on formal and personal innovation tactics, along with fast live-die cycles, and acquisitions. Balance seems important to the GOOG. Furthermore, applied research can be difficult to make work in certain technical contexts. A good case example is Yahoo’s Panama ad system. In fact, R&D dollars can be blown away with an unexpected twitch in the datasphere.

Ms. Foo wrote,

“Globally, Microsoft registered a 32 per cent drop in profit and the first decline in quarterly revenue in its 23-year history as a publicly listed company.”

What I found interesting was that I scanned Ms. Foo’s article during a “game plan” keynote by Bjorn Olstad, a senior executive in the Microsoft enterprise search unit. At the Boston Search Engine Meeting, Mr. Olstad focused on the future and offered few tech specifics about enterprise search. The future described by Microsoft reminded me of a Steve Jobs presentation a couple of years ago, just without fungible products. I was impressed with iPhone like mobile devices and large touch screen surfaces in Mr. Olstad’s PowerPoint. Even more interesting was the vocabulary he used to describe Microsoft’s vision of the future in enterprise search; for example:

  • On-the-fly computing
  • Algorithmic orchestration of the user experience
  • Consumption enhanced modes of discovery.

Now Microsoft has to take Mr. Turner’s R&D money and Mr. Olstad’s description of the future and deliver products and services. I hasten to add that the enterprise search products ideally will be stable, scalable, documented, compatible, feature complete, and affordable by organizations under the same revenue pressure as Microsoft itself. I think that is an interesting task with an uncertain timeline and an unknowable payoff. Oracle sees acquisition, although risky, as a path that may yield more concrete benefits. Shares in the value stock category may need more performance-oriented tactics for stakeholders. R&D or strategic acquisition? Time will tell.

Stephen Arnold, April 27, 2009

Exclusive Interview: Donna Spencer, Enterprise Systems Expert

April 20, 2009

Editor’s Note: Another speaker for what looks like a stellar conference agreed to an interview with Janus Boye. As you know, the Boye 09 Conference in Philadelphia takes place the first week in May 2009, May 5 to May 7, 2009, to be precise. Attendees can choose from a number of special interest tracks. These cover a range of topics, including strategy and governance, Intranet, Web content management, SharePoint, user experience, and eHealth. Click here for more conference information. Janus Boye spoke with Donna Spencer on April 16, 2009.

Ms. Spencer is a freelance information architect, interaction designer and writer. She plans how to present the things you see on your computer screen, so that they’re easy to understand, engaging and compelling: Things like the navigation, forms, categories and words on intranets, websites, web applications and business systems.

The full text of the interview appears below.

Why is it so hard for organizations to get a grip on user experience design?

I don’t know that this is necessarily true. There are lots of organizations creating awesome user experiences. Of course, there are a lot who aren’t creating great experiences, but it isn’t because they can’t get a grip on user experience, it is because they care more about themselves than about their customers. If they really cared about their customers they’d do stuff to make their experiences great - and that’s possible without even knowing anything formal about user experience. But because they don’t care about their customers, they will fail, as they should…

Is content or visual design most important to the user experience?

Content (or functionality) is ultimately what people visit a website, intranet or application for. So it’s really, really important to get that right. If the core of the product is bad, it isn’t going to work.

But the visual design is often the part that helps people to get to the content. If the layout is poor, the colours and contrast awful and the site looks like it was designed in 1995, that’s going to stop people from even trying.

So both are important, though if I ever had to choose, I’d go for great content.

Is your book on card sorting really going to be released in 2009?

Yes, by the time the conference is on, there should be real, printed books. 150-odd pages of card sorting goodness. I hear that it should be out around 28 April. Really. I promise.

Does Facebook actually offer a better user experience after the redesign?

That’s a really interesting question. I can only speak for myself, but the thing that struck me about the redesign is that all of a sudden Facebook feels like a different beast. It used to be a site where friends were, but also where there were events, and groups and silly apps. Now it just feels like twitter that you can reply to. It feels like they have done a complete turn-around on who they actually are.

So for me the experience is worse. I can get a better idea of what my friends are doing, but I do that via twitter. Now it’s much harder for me to experience groups, events and all the other things we used to do there. I’m definitely using it less.

Why are you speaking at a Philadelphia web conference organized by a company based in Denmark?

Because they rock! But really, their core business overlaps a lot with what I do. I’m interested in the content the conference offers and I think my experience offers a lot to the attendees. Plus I’ve never been to Philly, and travelling to new places is a wonderful learning experience.

Lou Rosenfeld on Content Architecture

April 15, 2009

Editor’s Note: The Boye 09 Conference in Philadelphia takes place the first week in May 2009, May 5 to May 7, 2009, to be precise. Attendees can choose from a number of special interest tracks. These include strategy and governance, Intranet, Web content management, SharePoint, user experience, and eHealth. You can get more information about this conference here. One of the featured speakers is Lou Rosenfeld. You can get more information here. Janus Boye spoke with Mr. Rosenfeld on April 14, 2009. The full text of the interview appears below.

Why is it so hard for organizations to get a grip on user experience design?

Because UX is an interdisciplinary pursuit. In most organizations, the people who need to work together to develop good experiences–designers, developers, content authors, customer service personnel, business analysts, product managers, and more–currently work in separate silos. Bad idea. Worse, these people already have a hard time working together because they don’t speak the same language.

Once you get them all in the same place and help them to communicate better, they’ll figure out the rest.

Why is web analytics relevant when talking about user experience?

Web sites exist to achieve goals of some sort. UX people, for various reasons, rely on qualitative research methods to ensure their designs meet those goals. Conversely, Web analytics people rely on quantitative methods. Both are incomplete without the other - one helps you figure out what’s going on, the other why. UX and WA folks are two more groups that need help communicating; I’m hoping my talk in some small way helps them see how they fit together.

Is your book “Information Architecture for the World Wide Web” still relevant 11 years later?

Nah, not the first edition from 1998. It was geared toward developing sites–and information architectures–from scratch. But the second edition, which came out in 2002, was almost a completely new book, much longer and geared toward tuning existing sites that were groaning under the weight of lots of content: good and bad, old and new. The third edition–which was more of a light update–came out in 2006. I don’t imagine information architecture will ever lose relevance as long as there’s content. In any case, O’Reilly has sold about 130,000 copies, so apparently they think our book is relevant.

Does Facebook actually offer a better user experience after the redesign?

I really don’t know. I used to find Facebook an excellent platform for playing Scrabble, but thanks to Hasbro’s legal department, the Facebook version of Scrabble has gone the way of all flesh. Actually, I think it’s back now, but I’ve gotten too busy to fall again to its temptation.

Sorry, that’s something of an underhanded swipe at Facebook. But now, as before, I find it too difficult to figure out. I have a hard time finding (and installing) applications that should be at my fingertips. I’m overwhelmed–and, sometimes, troubled–by all the notifications which seem to be at the core of the new design. I’d far prefer to keep up with people via Twitter (I’m @louisrosenfeld), which actually integrates quite elegantly with the other tools I already use to communicate, like my blog (http://louisrosenfeld.com) and email. But I’m the wrong person to ask. I’m not likely Facebook’s target audience. And frankly, my opinion here is worth what you paid for it. Much better to do even a lightweight user study to answer your question.

Why are you speaking at a Philadelphia web conference organized by a company based in Denmark?

Because they asked so nicely. And because I hope that someday they’ll bring me to their Danish event, so I can take my daughter to the original Legoland.

Janus Boye, April 15, 2009

Bob Boiko, Exclusive Interview

April 9, 2009

The J Boye Conference will be held in Philadelphia, May 5 to May 7, 2009. Attendees can choose from a number of special interest tracks. These include strategy and governance, Intranet, Web content management, SharePoint, user experience, and eHealth. You can get more information about this conference here.

One of the featured speakers is Bob Boiko, author of Laughing at the CIO and a senior lecturer at the University of Washington iSchool. Peter Sejersen spoke with Mr. Boiko about the upcoming conference and information management today.

Why is it better to talk about “Information Management” than “Content Management”?

Content is just one kind of information. Document management, records management, asset management and a host of other “managements” including data management all deal with other worthy forms of information. While the objects differ between managements (CM has content items, DM has files, and so on) the principles are the same. So why not unite as a discipline around information rather than fracture because you call them records and I call them assets?

Who should be responsible for the information management in the organization?

That’s a hard question to answer outside of a particular organizational context. I can’t tell you who should manage information in *your* organization. But it seems to me in general that we already have *Information* Technology groups and Chief *Information* Officers, so they would be a good place to start. The real question is whether the people with the titles are ready to really embrace the full spectrum of activities that their titles imply.

What is your best advice to people working with information management?

Again, advice has to vary with the context. I’ve never found two organizations that needed the same specific advice. However, we can all benefit from this simple idea. If, as we all seem to believe, information has value, then our first requirement must be to find that value and figure out how to quantify it in terms of both user information needs and organizational goals. Only then should we go on to building systems that move information from source to destination because only then will we know what the right sources and destinations are.

Your book “Laughing at the CIO” has a catchy title, but have you ever laughed at your CIO yourself?

I don’t actually. But it is always amazing to me how many nervous (and not so nervous) snickers I hear when I say the title. The sad fact is that a lot of the people I interact with don’t see their leadership as relevant. Many (but definitely not all) IT leaders forget or never knew that there is an I to be led as well as a T. It’s not malicious, it has just never been their focus. I gave the book that title in an attempt to make it less ignorable to IT leaders. Once a leader (or would be leader) picks the book up, I hope it helps them build a base of strength and power based on the strategic use of information as well as technology.

Why are you speaking at a Philadelphia web conference organized by a company based in Denmark?

Janus and his crew are dynamite organizers. They know how to make a conference much more than a series of speeches. They have been connecting professionals and leaders with each other and with global talent for a long time. Those Danes get it and they know how to get you to get it too.

Peter Sejersen, J Boye. April 9, 2009

Googzilla to Newspaper Titans: Keep Customers Happy

April 8, 2009

I absolutely love the intellectual ultimate fighting championship underway. In one corner is Googzilla–oops–I mean Google. In the other corner is the entire newspaper industry. Seems like a fair fight to me. The GOOG is a global behemoth. The company has a killer business model that provides users with oodles of “free” information and services. Sure, a motivated customer can buy services from the Google, but the fusion power of Googzilla is its business model that sells access to its customers. Google’s brand is a hot one. Google love is rampant. Sure, there are some complainers, but when it comes to search systems, the Google is the love bunny.

When I read “Google’s Schmidt To Newspaper Publishers: Don’t ‘P#&% Off’ Consumers” here, I had to honk merrily. I know the top Googlers don’t think the grousing–er, escalating hostility–is amusing. In my opinion, most of the Googlers don’t understand what the newspapers’ problem is. PaidContent.org’s article does a great job of capturing the facts of the top Googler’s speech. What the article underscores is the general cluelessness of both sides of this battle about one another’s business zeitgeist. As I read the story, I thought of Mark Twain’s A Connecticut Yankee in King Arthur’s Court. Same deal. Google is the future. The newspaper industry is the castle artisan. Everything the Connecticut Yankee did was magic. Same problem. Pretty funny when Mark Twain tells the story. Not so humorous for the traditional publishing companies. The traditional newspaper folks are trying to fix a water problem with incantations. The Yankee repairs the leak. Pragmatism wins out over shamanism every time in Mr. Twain’s world.

I found this passage from the excellent PaidContent.org write up most interesting:

But Schmidt came down harder on concerns about intellectual property and fair use: “From our perspective, we look at this pretty thoroughly and there is always a tension around fair use … I would encourage everybody, think in terms of what your reader wants. These are ultimately consumer businesses and if you piss off enough of them, you will not have any more.”

If I were a betting goose, I would wager that some in the newspaper industry might have interpreted Mr. Schmidt’s comments as somewhat arrogant. Not much Mark Twain in Mr. Schmidt’s alleged comment. Good advice in my opinion. Probably ignored though.

Stephen Arnold, April 8, 2009
