What If Google Books Goes Away?

September 21, 2009

I had a talk with one of my partners this morning. The article in TechRadar “Google Books Smacked Down by US Government” was the trigger. This Web log post captures the consequences portion of our discussion. I am not sure Google, authors, or any other pundit embroiled in the dust up over Google Books will agree with these points. That’s okay. I am capturing highlights for myself. If you have forgotten this function of this Beyond Search Web log, quit reading or look at the editorial policy for this marketing / diary publication.

Let’s jump into the discussion in medias res. The battle is joined and at this time, Google is on the defensive. Keep in mind that Google has been plugging away at this Google Book “project” since 2000 or 2001 when it made a key hire from Caere (now folded into Nuance) to add a turbo charge to the Books project.

image

Who is David? Who is Goliath?

With nine years of effort under its belt, Google will get a broken snout if the Google Books project stops. Now, let’s assume that the courts stop Google. What might happen?

First, Google could just keep on scanning. Google lawyers will do lawyer-type things. The wheels of justice will grind forward. With enough money and lawyers, Google can buy time. Let’s face it. Publishers could run out of enthusiasm or cash. If the Google keeps on scanning, discourse will deteriorate, but the acquisition of data for the Google knowledge base and for Google repurposing keeps on keeping on.

Second, Google might agree. Shut up shop and go directly to authors with an offer to buy rights to their work. I have four or five publishers right now. I would toss them overboard for a chance to publish my next monograph on the Google system, let Google monetize it any way it sees fit, and give me a percentage of the revenue. Heck, if I get a couple of hundred a month from the Google I am ahead of the game. Note this: none of my publishers are selling very many expensive studies right now. The for fee columns I write produce a pittance as well. One publisher cut my pay by 30 percent as part of a shift to a four day week and a trimmed publishing schedule. Heck, I love my publishers, but I love an outfit that pays money more. I think quite a few authors would find publishing on the Google Press most interesting. If that happens, the Google Books project has a gap, but going forward, Google has the info and the publishers and non participating authors have a different type of competitive problem.

Third, Google cuts a new deal, adjusts the terms, and keeps on scanning books. Google’s management throws enough bird feed to the flock. Google is secure in its knowledge that the future belongs to a trans-national digital information platform stuffed with digital information of various types. No publisher or group of publishers has a comparable platform. Microsoft and Yahoo were in the book game and bailed out. Perhaps their platforms can at some point in the future match Google’s. But my hunch is that the critics of Google’s book project are not looking at the value of the information to Google’s knowledge base, Google’s repurposing technologies, and Google’s next generation dataspace applications. Because these are dark corners, the bright light of protest is illuminating the dust and mice only.

One theme runs through these three possibilities. Google gets information. In this game, the publishers have lost but have not recognized it. Without a better idea and without an alternative to the irreversible erosion of libraries, Google is not the miserable little worm that so many want the company to be. Just my opinion.

Stephen Arnold, September 21, 2009

Spidering Google Docs

September 20, 2009

I spotted the story “Google Ready to Unleash Spiders and Expose Your Google Docs” and wondered if it was a bit of humor. I did some poking around in my Google info repository, and I could not find any info that proved or disproved this Next Web story. You will have to make up your own mind. The addled goose does not use Google Docs because he is 65 and happy with his own ineffective, outmoded system for managing digital content. Tip: the addled goose pays a pile of pin feathers for on premises infrastructure that works just fine. No cloud needed, thank you. The sky is not falling. The Google will index Google Docs that a user has linked to from a publicly accessible Web site. This is similar to Google’s indexing PDFs on my Web site in my opinion. Read the Next Web article and decide whether you have been hit by a chunk of the Google cloud or are getting access to more public info.

Stephen Arnold, September 20, 2009

US Government Smiles and Frowns at Google

September 20, 2009

Short honk: You don’t read Beyond Search for political shenanigans. Two items struck me as important to Google’s long term content processing activities. First, the FCC seemed to use semaphores which I interpreted as supportive of Google’s view of net neutrality. There is a write up in the Washington Post which included this passage:

the proposal is expected to call for expanded guidelines on how operators like AT&T, Verizon and Comcast control traffic on their networks. One proposed rule would prevent them from discriminating, or act as gatekeepers, of legal Web content and applications.

Google was a cheerleader for net neutrality. That’s a smile for Google.

But the trusty newsreader is chock-a-block with CNN reports, newspaper stories, and technical Web sites’ reporting that the US Department of Justice wants the Google Book deal rethought. A good summary appears in Tom Krazit’s “DOJ: Google’s Book Settlement Needs Rewrite”.

Google saw a frown cross the face of former Google dance squad members.

How are these smiles and frowns germane to Google search? The net neutrality message may mean that Google’s push into TV land is semi-okay. The book rethink may spell major changes for what has become a real problem for the Googley folks. Book search itself may be forced off the rails. If this happens, some of the nifty slice-and-dice and repurposing functionality may be put on hold. Not good for the GOOG’s next generation search effort.

Stephen Arnold, September 20, 2009

Open Source Costs: A Contrarian View

September 20, 2009

I am skeptical about broad generalizations about “costs” and even more doubtful about “cheap”. My radar lights up when I see these terms applied to software and systems. If you don’t know how to get into your child’s Facebook.com account, the cost of hacking into the system can be pretty high, especially if you have to hire a person with a particular technical capability to accomplish what seems to be a trivial objective. There is a non linearity in software costs that most people don’t want to know about. Unfortunately when these “costs” become visible, the ensuing excitement can lead to staff turnover or big problems for the organization that found “a certain blindness” more desirable than clear sightedness.

When I read “Open Source Is the Freedom of Choice, Not Necessarily the Cheaper Option,” my microwave detector beeped. For me, the key point in the write up seemed to be:

Admittedly Open Source can be cheaper if you think of the code itself not costing anything. However nothing is free, time and therefore money will have been spent creating and modifying that code. To have adequate technical support and installation businesses should be prepared to value the product and the support provided. With Open Source you have the freedom of choice. You can choose to look at the online documentation and the wealth of technical books out there to implement what you need, you can also choose to support the Open Source Product. Or you can choose to hire an experienced professional (or even pay for training in house) to implement and support the product for you. Saying a product is cheaper can be interpreted that the product is somehow lesser than the competition. I do not feel that this is always the case, superior products can develop from close contact between developers and their clients. This is the value add that Open Source can bring to the table.

The author is not a coder, so if he / she were involved in either a proprietary or open source project, the “cost” of getting the system to work depends on the time and the billing rate of the people involved, the cost of lost opportunity, and the expense of any infrastructure or gizmos required to make the system work in a way somewhat proximate to the system specification.

Open source eliminates a license fee. The problem is that license fees for some mainstream systems in search are declining. One big software company has included an industrial strength search system with other software products. In effect, the licensing fee for the search and content processing system is zero because it is buried in other elements on the invoice.

My view is that the folks with technical expertise can save some money on both open source and proprietary software. The clueless, regardless of whether the software is open source or proprietary, will pay almost the same to get their system running, customize it, and optimize it for the organization’s specific needs. Just my opinion. The key driver of cost boils down to the capabilities of the individuals involved in a project.
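Here is a rough Python sketch of that cost argument. The cost components (license fee, staff time multiplied by billing rate, lost opportunity, and infrastructure) come from the discussion above; the class name and every number are made up for illustration and do not describe any real project or vendor.

```python
from dataclasses import dataclass


@dataclass
class ProjectCost:
    """Hypothetical cost model; all field values below are invented."""
    license_fee: float       # zero for open source, or when buried in another invoice
    staff_hours: float       # time to get the system running, customized, optimized
    billing_rate: float      # cost per staff hour
    lost_opportunity: float  # value of the work that did not get done
    infrastructure: float    # hardware and gizmos needed to make the system work

    def total(self) -> float:
        labor = self.staff_hours * self.billing_rate
        return self.license_fee + labor + self.lost_opportunity + self.infrastructure


# Invented figures: the license fee differs, the labor is nearly the same.
open_source = ProjectCost(license_fee=0, staff_hours=1200, billing_rate=150,
                          lost_opportunity=20_000, infrastructure=30_000)
proprietary = ProjectCost(license_fee=25_000, staff_hours=1100, billing_rate=150,
                          lost_opportunity=20_000, infrastructure=30_000)

print(f"Open source: ${open_source.total():,.0f}")
print(f"Proprietary: ${proprietary.total():,.0f}")
```

With these invented inputs the two totals land within a few percent of each other, because the staff time dominates the bill. Change the hours or the billing rate and the picture shifts far more than it does when the license fee changes.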

Stephen Arnold, September 20, 2009

Connotate Tag Line

September 20, 2009

A reader sent me a link to a Web site because it contained the phrase “beyond search”. We checked. The Beyond Search goslings were delighted to find the Connotate logo and its tag line, which was new to us. The screenshot below presents the logo in context. The tag line is “Beyond Search”.

connotate logo

Here’s a larger snap of the logo and the tag line:

close up

My recollection is that Connotate’s use of the phrase “beyond search” is nothing new. But at our Saturday morning meeting (yes, I know, Saturday morning, sigh), some lively honking took place about the “ownership” of this phrase. Since my use of the phrase is a marketing ploy, I can’t get too excited. One of the goslings did quack at me about this. Boring.

I know that I did not think up the phrase “beyond search”. My recollection is that someone reviewing the draft of the study I wrote for the Gilbane Group suggested the phrase to me. My hunch is that the idea came from Ulla de Stricker, my long time wonderful colleague and unrelenting critic in Toronto. Anyway, the title “Beyond Search” appeared on my January 2009 monograph. The full title of that study is “Beyond Search: What to do When Your Enterprise Search System Doesn’t Work”. Believe me, quite a few enterprise search systems do not work. Licensees have limited options to get out of the swamp. Buy the book to find a route to safety. You can get information about the analysis of a couple dozen vendors’ next-generation search systems on the Gilbane Group Web site.

Prior to the publication of the book in 2008, I decided to use the phrase “Beyond Search” for this Web log, diary, and digest of my opinions / thoughts about search, content processing, and related subjects. I am delighted with the persona of the addled goose, the feathered friend whose voice dominates the more than 3,000 Web log posts.

In fact, I wrote a profile of Connotate in my Beyond Search study. I found the firm’s system potentially useful, but the company had a low profile and was, in my opinion, navigating in the rough waters of real time business intelligence, a Bermuda triangle for some firms. That particular segment is a tough one. Within the last month, two services I used, TechFuga.com and Doggdot.us, seem to have sunk. The quality of the hits in other systems I monitor has begun to be affected by the increasing noise in the real time streams.

If you run the query “beyond search” on Google as I did a moment ago, you will find that this Web log is the top hit. I scanned the listings on the first two pages of results and did not see a link to Connotate. My hunch is that the Connotate Web site is going to have to beef up its SEO attractiveness. Their site does not appear high in the Google results listing for this particular query.

arnold beyond splash

The goslings checked out the Connotate Web site and noticed a blog and a podcast. The most recent posting was interesting because it touched upon Twitter. The content, however, focused on using Twitter as a tool, not as a content or intelligence source. This puzzled me. Connotate is in the business of processing streams to extract information. My hope was to read a blog post about how Connotate could make the Tweet stream immediately and directly useful in business intelligence.

That’s how one moves beyond search in my opinion. A company’s technology needs to wrestle the streams of content to the ground and put them in a Rear Naked Choke.

image

One cannot win in the information processing wars by writing about uses of streams; one wins by converting the streams to actionable intelligence at a low cost, in near real time, across multiple languages. That’s how one moves “beyond search” in my opinion.

Stephen Arnold, September 20, 2009

Autonomy Gets a Cheer from Smart Money

September 19, 2009

I saw this Financial Times news item and thought PR coup. The story was “Autonomy Buoyed by Broker’s Support.” Two points struck me. First, Goldman Sachs seems to suggest that Autonomy warrants a “push.” A “push” notion from smart money is good. Second, and more important, the news story contains the word “bid”. I read this as the most gentle suggestion that Autonomy may be acquired. The FT pointed out that other brokers were “mistrustful.” Interesting in my opinion.

Stephen Arnold, September 19, 2009

YAGG: Gmail Migration Glitch Bites Brown

September 19, 2009

Short honk: YAGG means “yet another Google glitch”. A happy quack to the reader who alerted me to “Google Apps Bug: You’ve Got (My) Mail.” I don’t know if the story is spot on, but I suggest you read it and tuck the info away for future reference. Cnet reported:

As a result of a bug in a Google Apps e-mail migration tool, some students at Brown University found other students’ e-mail in their in-box over the weekend as Google was moving their e-mail from Exchange to Gmail, Google confirmed on Friday. The problem affected a “handful” of organizations that use Google Apps, a spokesman said. He declined to specify how many were affected or how many individual users were affected.

Big deal? Ask one of the students. I would be annoyed.

Stephen Arnold, September 19, 2009

Comparing Database Costs: Duesenberg Versus Hispano-Suiza

September 19, 2009

One of the many boutique consulting firms in New York is the Edison Group. The firm’s work is good, certainly better than the azure chip firms whose intellectual antics entertain me each week. I have to say that I am on the other side of the Hudson River when I think about the firm’s findings about database costs. You can read “Edison Group, Comparative Management Costs Study: Oracle Database 11g vs. IBM DB2 Enterprise 9.5”, which consumes 78 pages. The study carries a copyright of 2008 and a first publication date of January 2009. Today is September 18, 2009. My hunch is that the document was generated for a client and has been in circulation for months. I just learned about it, and I made an effort to read it with its nine month old data in mind. I don’t have any quibble with the charts, graphs, and numbers in the report. When one writes about traditional databases, the costs are irrelevant. In fact, in my opinion, it is like two old car buffs arguing about the merits of a Duesenberg and a Hispano-Suiza. Owners of these autos know that if you have to ask how much the vehicles “cost”, the person asking the question cannot afford the vehicles and probably will be unhappy shoveling money into the whirlpool that sucks cash to keep the Duesie and the Suiza humming. Traditional databases are equally voracious money pits.

The Edison Group discloses the following items in its 78 page study. I have included the page number on which I located each of these points. I have selected only four items because I don’t want to spoil your fun when you read the original.

First, the Edison Group reported:

Benefiting from increased DBA productivity due to lower complexity and higher efficiency cited above, businesses could save up to $35,155 per year per DBA by using Oracle Database 11g rather than IBM DB2 Enterprise 9.5.

This statement from page 5 suggested to me that the study was funded by Oracle, and test data would demonstrate that Oracle is a money saver compared to IBM’s DB2. The precision of the number $35,155 is one of those numeric oddities I enjoy. The savings do not amount to $35,156, nor do I know what confidence level the numerical method delivered; for example, a confidence level of plus or minus 50 percent gives me one sense of the savings. My hunch is that the difference in costs is probably like the Duesenberg – Hispano-Suiza analysis. For a big company, the cost differences may not be material because of the indirect costs these long-in-the-tooth data management systems impose.
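A back-of-the-envelope sketch in Python shows why the missing confidence level matters. The $35,155 figure is the report’s; the plus or minus 50 percent margin is only my hypothetical example from the paragraph above, not a number the Edison Group published, and the function name is invented for illustration.

```python
def savings_range(point_estimate, margin):
    """Return (low, high) bounds for an estimate with a symmetric relative margin."""
    low = point_estimate * (1 - margin)
    high = point_estimate * (1 + margin)
    return low, high


# 35,155 is the report's per-DBA, per-year figure; 0.50 is a hypothetical margin.
low, high = savings_range(35_155, 0.50)
print(f"Savings per DBA per year: ${low:,.2f} to ${high:,.2f}")
```

Under that assumed margin, the savings fall anywhere between $17,577.50 and $52,732.50, which makes a point estimate quoted to the dollar look more precise than the method can probably support.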

Second, the weighting for the study makes it clear that indirects are not considered nor are capital costs. The table revealing this rather narrow focus on one specific set of administrative costs caused me to chuckle. You can find the table on page 15 of the report. One quick example: day to day admin has a weighting of 34 percent. Now the admin load depends on a number of factors. In these days of petascale data flows, the notion of a 34 percent weighting strikes me as low. Here’s why: the limitations of these dinosaur-like database systems require lots of fiddling around to build datacubes, to update indexes, and to provide the hands-on care dinosaurs need.

On page 31, the Edison Group reports that the two systems have improved. Here’s the passage I marked:

The differences between the two platforms have slightly increased over the years that Edison has been performing these studies. While IBM has done a good job in addressing many of the criticisms Edison has leveled in the past, most significantly in the areas of general system maintenance, Oracle has also significantly improved its offerings in these areas.

Wow. The idea that these systems have evolved and that Oracle is better than DB2 surprised me. Here’s why:

  1. Oracle and DB2 are both old-style data management systems
  2. The two companies have similar approaches to selling, cultivating partners, and solving performance problems. In fact, the easiest way to make either Oracle or DB2 perform better is to throw hardware at the problem.
  3. Speed up methods so both systems can handle near real time index updates are not included in either system. In fact, an IBM invention that was used with the permission of IBM in the original Speed of Mind speed up “snap in” worked equally well on DB2, Informix, Oracle, and SQL Server. The point is that these old style data management systems suffer similar performance problems related to the engineering in the basement of the data management systems.

In short, Oracle and DB2, like the Duesenberg and the Hispano-Suiza, are relics, expensive to own and maintain, and likely to become collector items.

Studies like this one from the Edison Group are useful for organizations that understand that data management has to be a name brand solution from a name brand company. The consultants may not realize that the foundation on which these Codd systems stand is being eroded. The notion that companies like Aster Data or InfoBright are precursors of even more significant disruptions from other firms is foreign to them.

The data management disruption will have a significant impact on today’s dominant data management companies. In my opinion, when petascale data flows impose sufficiently high costs, customers of Oracle and IBM RDBMS and data management systems will look for a lower cost, speedier, more stable, easier to use option. That option will not come from today’s data management leaders. What data management systems does Google use? What happens if these are made available as a key component of Google’s enterprise services? Interesting questions and questions not addressed in the quite interesting analysis of the data management world’s Duesenbergs and Hispano-Suizas of Codd technology. Just my opinion.

Stephen Arnold, September 19, 2009

Google Noop

September 19, 2009

The goslings and I are working on a new project about Google development tools. We felt comfortable that we had the topic nailed. Then one of the ArnoldIT.com team read “Google Urges Developers to Get in Loop with Noop.” Kelly Fiveash wrote:

Mountain View said that Java Virtual Machine-based Noop, which is pronounced ‘noh-awp’, “attempts to blend the best lessons of languages old and new, while syntactically encouraging industry best-practice and discouraging the worst offences.”

She reported:

“The goal is to build dependency injection and testability into the language from the beginning, rather than rely on third-party libraries as all other languages do,” said Google on the Noop code website.

Google continues to develop tools appropriate for its platform. With Java fraught with some exciting challenges, the Googlers grabbed the Java bag and ran it through the Google mill. Important step. The testing component puts a cost trimming safeguard in place that some of its competitors lack. Google is interested in other traffic efficiencies as well; for example, one way streets I think.
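The article does not show any Noop syntax, so what follows is not Noop code. It is a minimal Python sketch of the general idea behind dependency injection and why it helps testing; the class names are hypothetical and invented for this illustration.

```python
class SmtpMailer:
    """Stand-in for a real dependency that would talk to a mail server."""

    def send(self, to, body):
        raise RuntimeError("no mail server is available in this sketch")


class FakeMailer:
    """Test double that records messages instead of sending them."""

    def __init__(self):
        self.sent = []

    def send(self, to, body):
        self.sent.append((to, body))


class Greeter:
    """Receives its mailer from outside instead of constructing one itself."""

    def __init__(self, mailer):
        self.mailer = mailer  # the injected dependency

    def greet(self, user):
        self.mailer.send(user, f"Hello, {user}!")


# In a test, inject the fake so no real mail server is needed.
fake = FakeMailer()
Greeter(fake).greet("goose")
print(fake.sent)
```

Because `Greeter` never builds its own mailer, a test can swap in `FakeMailer` and inspect what would have been sent. In production code, the same class would be handed an `SmtpMailer` or whatever real service applies.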

Stephen Arnold, September 18, 2009

Google Gives News Corp. an F in Financial Reasoning

September 19, 2009

I have been busy making videos. As a result, I have been taking direction from a film maker and hassling the goslings for a punchy script. In a few weeks, the videos will be available without charge, and you will see me, the goslings, and a surprise nerd floundering as we try to make complicated stuff simple.

As a consequence, I have fallen behind in my reading about the dust up between Google and the publishing industry. I did read with considerable interest a write up in Maximum PC. I used to buy the magazine, but my recollection is that the last issue cost $6, maybe $7. When I worked at Ziff, I used to get PC magazines free. Well, those days are gone, so the publishers have to find new ways to earn money to pay for their Yacht Club memberships. The New York City outpost is across from the Royalton Hotel in what has become an upscale street in the last decade.

image

Image source: http://riverdaughter.files.wordpress.com/2009/03/report-card-f-011409l.jpg

“Google CEO Spurns Murdoch’s Paid Content Plan” grabbed my attention. I like the word “spurn”, which means to me to reject with disdain; scorn; to treat with contempt; despise; and to kick or trample with the foot. I used the word with the “foot” sense in Ms. Sperling’s English class and earned her disdain. (She did not know the “trample with the foot” meaning, and I did. Snort.) Maximum PC’s story wrote:

He believes that it is highly unlikely that internet users will be willing to pay for accessing general news items on the internet given the nimiety of free news sources on the internet. “In general these models have not worked for general public consumption because there are enough free sources that the marginal value of paying is not justified based on the incremental value of quantity. So my guess is for niche and specialist markets … it will be possible to do it but I think it is unlikely that you will be able to do it for all news,”

If true, Google seems to perceive that News Corp.’s financial wizards have earned an F in financial analysis from Google’s Singularity University. Here in Harrod’s Creek, my neighbor would just say, “Ain’t got no clue.” I like the “spurn” word, however. Both Google and my neighbor would agree on one point, I believe. An F in financial analysis. Does News Corp. care? Does Google care? Nope, two ships passing in the night. One is the Titanic. One is a nuclear powered destroyer. Which is which?

Stephen Arnold, September 19, 2009
