The London Times Blocks Newsnow.co.uk

January 10, 2010

The big experiment is underway. News Corp believes that its content is worth money. Outfits indexing public Web sites find that certain content is no longer available for spidering and indexing. Newsnow.co.uk, a UK-based news headline service, posted a message I saw earlier today (January 8, 2010).

[Screenshot: NewsNow blocking notice]

In my experience, a shift from one medium to another does not automatically bring the former customers along. In fact, there is attrition. The question is, “How many users of aggregation sites or Web indexing services like Google will change their information grazing habits and pay for the content?” Each medium, in my experience, attracts a user base with a distinct fingerprint.

Online services are tricky beasts. A spreadsheet jockey can plug in assumptions that make a move like a paywall around certain content work on paper. Getting those assumptions right is a much more difficult task, sufficiently difficult that there are more failures than successes.

Can Rupert Murdoch make his online strategy generate enough money to make up for rising costs and declining print advertising? The name of the game online is amassing lots of information. The content then works like a magnet, pulling users to it. Mr. Murdoch is betting that his online strategy, which other content producers seem to be emulating, will work.

My experience suggests that the cost may be very high and lead to severe cost-reduction actions. I don’t think he will fire himself. Others may not be so lucky.

Stephen E Arnold, January 10, 2010

This is a freebie. No one paid me to point out the long odds that News Corp faces. I suppose I must report this to the Jockey Club. Quite a risky wager in my opinion.

Google and Its Toughest Rivals

January 9, 2010

Let’s see. Google is disrupting a number of business sectors. These range from mobile phones to publishing. There are some other sectors in the Google Hummer’s headlights, but you have heard about the Google phone, haven’t you?

ComputerWorld ran an amazing article called “Google’s 10 Toughest Rivals.” My initial reaction was, “Just 10?” The core of the story is that after 11 years, Google faces some competition. I don’t buy this assertion, but that is neither here nor there. My view is that Google has a dominant position, not just in search, but in the computational platform game as well. No, I am not forgetting Apple, but Apple is Sony and Google has become the Microsoft on steroids for the now generation.

Here’s a snippet from the ComputerWorld story that I noted:

Until now, Google’s biggest frenemies were the traditional media: newspapers, magazines and TV stations that create online content Google searches and that buy online advertising from Google. But as its portfolio has grown to encompass more than 150 products — including free, hosted versions of popular software applications — Google has attracted an array of tech industry competitors.

ComputerWorld is playing the get-clicks game because of Google. You will have to click six times to learn who the 10 competitors are. I will give you one hint: Yahoo is on the list.

Omitted from this story, in my opinion, are these points:

  1. What is the competitive advantage each of these competitors has that Google does not possess? Yahoo as a competitor is a notable example of the analytic depth of this write-up.
  2. What is the competitive lineup for Google outside the US? Mentioning Nokia won’t cut it because Nokia has fallen on its sword, and the company has its own management, not Google, to blame in my opinion.
  3. How similar is Google to companies that pose only a single competitive challenge to it? An example is the reference to Facebook, arguably a Googley company in some ways. Is Facebook different from Google, or is it similar? What are the key points of difference and similarity?
  4. Is Google really competing against companies? In my research, I think Google intentionally annoys and makes fun of other companies, but it perceives most of its competitors somewhere in the space between “ants at a picnic” and “irrelevant”. Sure, this seems arrogant, but the Google, in my opinion, is in a different technical space, so company-to-Google comparisons fall short of the mark.

Read the ComputerWorld story. I think that most folks do not understand what Google has built via its methods over the last 11 years. As I have pointed out in my Google studies and in this Web log, Google is more like a New World than a challenger to a single business. Google will remain mostly unchallenged because pundits, mavens, and wizards cannot perceive the Google in a way uncolored by simple and superficial generalizations about markets and competition.

Stephen E. Arnold, January 8, 2010

Okay, this is a freebie. I am sitting in a client facility and getting paid to think about publishing, not Hulu and Google. I have to report this conflicted situation to the National Institutes of Health, an outfit that knows all about mental distress.

Google and Fine-Grained, Point and Click Access

January 9, 2010

Want to know about search without search? This is one of those write-ups that make clear how search has morphed in the last five or six years. Why type a query when one can point and click?

Today we had a client call and ask about Google’s faceted navigation. This is a buzzword for providing links to users. The user scans the suggestions and picks the one that appears to be on target. These types of point-and-click interfaces are essential because most folks don’t like Boolean. The thought is that spotting a suspect is easier than formulating a Boolean query. Point-and-click does not ring my chimes, but those who are much younger, enthused by iPhones, and fond of a simplified life are pretty darned excited.
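
To make “faceted navigation” concrete, here is a minimal, hypothetical sketch of the underlying mechanics: tally facet values across a result set and turn the tallies into clickable refinements. The documents, field names, and counts are invented for illustration; this is not Google’s data or code.

```python
from collections import Counter

# Hypothetical sketch of faceted navigation: count facet values across the
# documents matching a query, then present the counts as clickable links
# that narrow the result set. Documents and fields are invented.
results = [
    {"title": "Receptor biology primer", "source": "PubMed", "year": 2008},
    {"title": "Cancer receptor trial",   "source": "PubMed", "year": 2009},
    {"title": "Receptor drug news",      "source": "News",   "year": 2009},
]

def facet_counts(docs, field):
    """Return (value, count) pairs for one facet field, largest count first."""
    return Counter(doc[field] for doc in docs).most_common()

for field in ("source", "year"):
    print(field, facet_counts(results, field))

# Clicking "PubMed (2)" simply applies the filter below; no Boolean required.
refined = [doc for doc in results if doc["source"] == "PubMed"]
print(len(refined), "results after one click")
```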

The conversation turned to Google. The caller pointed out that Google did not have a point-and-click or what I call a “training wheels” interface. I tried to be gentle like the average goose. But I am an addled goose, so I pointed out that the caller needed to navigate to Google.com and enter the query “cancer receptor”.

There is a plus sign in the left-hand column. Click the plus sign and you see some big, chunky categories. Now click the phrase “Related Searches” under the boldfaced heading Standard View. Here’s what you should see:

[Screenshot: related searches for the query “cancer receptor”]

As you can see, there are lots of links to spot and click. No big thinking required. Now if you want to see what’s coming down the trail for this type of query, take a gander (no pun intended by the addled goose) at Search over Structured Data, the 60-page patent application published on July 19, 2007, and filed in October 2005. The number is US2007/0168331. There are a number of inventors who seem to have some affection for Ramanathan Guha’s PSE team.

In that document, Google discloses a system and method to add yet a finer layer of point-and-clickiness; namely, inserting such components as Pubmed, Newsource, Authors, Citations, and the always exciting More. The point is that Google has both fine-grained and broad-stroke point-and-click capabilities.

Back to the client who said Google lacked this skill. The point is that making a general statement about Google’s cluelessness is risky. One needs to examine what Google does here and now, and then consider what is evident in January 2010 in the context of a five-year-old system and method.

Google is endlessly surprising to some for three reasons:

  1. Folks don’t look at what Google has available. Man, time is short for some folks today. The idea is that if these folks don’t know about a function, that function must not exist. Too bad this approach does not work reliably for Google functionality.
  2. Folks don’t read Google’s clear statements of its technical systems and methods. I know that reading a patent application is not the most exciting thing a busy 20-something can do on a January afternoon, but once in a while looking at the real-deal documents, not a blog post, can be illuminating.
  3. Folks don’t know what they are looking for, so even when Google puts the purloined letter on their keyboard, the letter is invisible. I don’t know how to remediate this, but that’s why I am an addled goose and not a sleek, confident master of the universe.

Before making a generalization about Google, I think it is helpful to know exactly what Google offers as well as what its engineers have disclosed in technical documents freely available to anyone who takes the time to look and read.

Stephen E. Arnold, January 9, 2010

I wish to disclose to the USPTO that I was not paid to point out the value of their honorable work, no matter how dull it may be.

The Paywall Chronicles: The Value of Fuzziness

January 9, 2010

Short honk: Getting money for electronic information is tough. A peek inside a new media financial concept appears in “Steven Brill’s Growing Mound of Twaddle.” The most interesting part of the write-up is a chart that plots alleged customers against announced customers. The alleged tally seems to flatline at zero as the announced customer tally soars.

Stephen E. Arnold, January 9, 2010

A freebie. I know at least one reader thinks I am a PR bunny. I suppose I must report this to the Fish & Wildlife crew.

X1 Wants a UNC

January 9, 2010

Short honk: We are grinding through our annual test of various search systems. Today, with the X1 system, we noticed an issue that might confuse some. To specify a drive to index, such as our Drobo with the test collection, we now follow this procedure:

  1. Click on Tools, Options, Index, Files
  2. Click Browse; it returns the mapped network share
  3. Delete what’s in the box and add the UNC path; for example, \\Nippy\Drobo-p\…..

A sample entry in the Files box would be helpful. Defaulting to a path that does not work is not.
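
For readers hitting the same snag, here is a minimal, hypothetical sketch (not part of X1) of the kind of check that separates a UNC path from a mapped drive letter before handing the location to an indexer. The trailing folder name is invented for illustration.

```python
import re

# Hypothetical helper: accept only \\server\share style (UNC) paths and flag
# mapped drive letters, which the indexer in question would not resolve.
UNC_PATTERN = re.compile(r'^\\\\[^\\/:*?"<>|\s]+\\[^\\/:*?"<>|\s]+')

def looks_like_unc(path: str) -> bool:
    r"""Return True if the string starts with \\server\share."""
    return bool(UNC_PATTERN.match(path))

# The folder name below is hypothetical; the server and share follow the
# \\Nippy\Drobo-p example in the post.
for candidate in [r"\\Nippy\Drobo-p\test-collection", r"Z:\test-collection"]:
    if looks_like_unc(candidate):
        print(f"OK, UNC path: {candidate}")
    else:
        print(f"Warning, mapped drive: {candidate} -- use the \\\\server\\share form")
```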

Donald Anderson, January 9, 2010

The addled goose paid Engineer Anderson to work out this method. No one else kicked in any dough. I will report this to the Railway Retirement Board. Ooops, not that kind of engineer.

Bing and Slow Indexing

January 8, 2010

Short honk: I noticed a couple of years ago that for certain queries, Microsoft was faster than Google at displaying results. A bit of sleuthing revealed that Microsoft was caching aggressively. One of the people with whom I spoke suggested that Microsoft cached everything, and as close to users as possible. We don’t have a data center in Harrod’s Creek, but close enough. Second, Microsoft was indexing only Web sites that were known to generate hits for popular queries. Unpopular Web sites at that time were skipped in order to speed up indexing. At a gig at a certain large software company, a person in the know about Microsoft search told me that the expensive speed-ups were a thing of the past. Microsoft was in Google territory. Sounded reasonable.
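
The approach described above amounts to spending crawl and cache resources where the query traffic is. Here is a minimal, hypothetical sketch of that kind of prioritization, not Microsoft’s pipeline; the site names and hit counts are invented.

```python
import heapq
from dataclasses import dataclass, field

# Hypothetical sketch: schedule recrawls so that sites known to generate hits
# for popular queries are refreshed first. All figures are invented.

@dataclass(order=True)
class CrawlTask:
    priority: int              # negated hit count, so popular sites pop first
    url: str = field(compare=False)

historical_hits = {
    "news.example.com": 120_000,
    "shop.example.net": 9_500,
    "obscure.example.org": 40,
}

queue = [CrawlTask(-hits, url) for url, hits in historical_hits.items()]
heapq.heapify(queue)

while queue:
    task = heapq.heappop(queue)
    print(f"Recrawl {task.url} (historical hits: {-task.priority})")
```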

Flash forward to the here and now. Read “Microsoft Admits that Bing Is Slow at Indexing.” If this article is correct, Microsoft has not been able to resolve certain turtle-like characteristics of its Google killer, Bing.com. For me, the most interesting comment was this quote allegedly made by a person in the know:

It is well known in the industry that MSNbot is fairly slow. I suggest reading our FAQs stickied at the top of the indexing forum to get some ideas of what to do.

Yikes. No wonder Google is pushing the “speed angle” as one of its marketing themes.

Stephen E. Arnold, January 8, 2010

Unpaid was I. I wrote for free. I must report to the Superfund Basic Research Program. What? You expected poetry?

Billing Google the Government Way

January 8, 2010

A happy quack to the reader who alerted me to this Canada.com article “France Considers ‘Google Tax’ to Pay Creative Workers”. Google has some horsepower, but it must tax its customers indirectly. Countries do not have to fool around with indirect taxes. You get a bill and you pay. If you don’t pay, life becomes interesting. What’s interesting about this angle is summed up in this comment:

The levy, which would also apply to other operators such as MSN and Yahoo, would put an end to “enrichment without any limit or compensation,” newspaper Liberation quoted Guillaume Cerutti, one of the authors of the report, as saying. It would apply even if the operator had its offices outside France, as long as the Internet users who click on ad banners or sponsored links are here, the paper said.

If France pulls this off, it is possible to tax Google into submission. I thought lawyers were going to be one of the Google killers. I forgot to include the tax authorities. The article does not dig into definitions and limits. But it is a stylish idea in my opinion. Ah, the French. Endlessly surprising.

Stephen E. Arnold, January 8, 2010

I was not paid to write this article. I will report this to the Jefferson County tax authority, an outfit that wonders why I live in Louisville but I do not work in Louisville. It’s only been 20 years.

Quote to Note: Ballmer on Bing

January 8, 2010

Short honk: My source was Gizmodo. Here’s the keeper: “We Bing and we Bing and we Bing. Bing! Bing! Bing!” In terms of keyword density, this statement is exemplary.
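
For the curious, keyword density is just the share of tokens taken up by the term. A quick bit of arithmetic on the quote above, offered purely for amusement:

```python
# Keyword-density arithmetic on the Ballmer quote above.
quote = "We Bing and we Bing and we Bing. Bing! Bing! Bing!"
tokens = [t.strip(".!").lower() for t in quote.split()]
hits = tokens.count("bing")
print(f"{hits} of {len(tokens)} tokens -> {hits / len(tokens):.0%} keyword density")
```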

Stephen E. Arnold, January 8, 2010

A freebie and almost too modest a writing effort to report. But the agency responsible for modest undertakings is the Department of Defense, and I herewith say, “No dough for this article.” My cash register did not make the sound “bing”.

Google Book Scanning Tech Stripped Bare

January 8, 2010

The Google seems to be sitting back and letting some wizards in Japan explain how Google scanning works. Patent documents don’t make much sense to my dad. If you have access to universities participating in a Google Books scanfest, you have a shot at seeing what happens. Oxford University’s a good place to start in my opinion. If you are not in that part of England or at another university on board with the Google, you can read “Google’s Book Scanning Technology.” For me the most interesting comment was:

Researchers Nakashima, Watanabe, Komuro, and Ishikawa of the University of Tokyo have published an article fully explaining and providing pictures of a system nearly identical to that in Google’s patent. It is not clear whether the Japanese researchers or Google came up with the idea first, but the University of Tokyo article does an excellent job of explaining the book scanning technology.

I quite like the phrase “came up with the idea first.”

My thoughts:

  1. This article may make clear to those in the publishing business exactly what resources the Google Book project has commanded. I am not sure that the Google will back down or even slow its scanning. Too much skin in the game.
  2. The method for dealing with distortion is math-centric. Big surprise. What’s interesting in the patent documents is how much math. I don’t think most publishers have had an appreciation of what Google math can do. A lot, in my opinion. (A rough sense of the flavor of that math appears in the sketch after this list.)
  3. The article does not make too much of speed. My blundering around uncovered some data that suggest the Google method zips along at an order of magnitude or three faster than more traditional systems. If you are not up to speed on the throughput of systems like those in use at the former UMI, the notion of scanning speed won’t make much sense. In a word, fast: the Google scans, corrects, blasts to disc, and links metadata without dragging its high-heeled sneakers.
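
As promised above, here is a deliberately crude, hypothetical sketch of the category of math involved in dewarping a scanned page: model the page surface as a simple curve, then remap pixel columns back to a flat baseline. This is an illustration only, not the method in Google’s patent or the University of Tokyo paper, which work from an estimated 3D surface.

```python
import numpy as np

# Hypothetical illustration of distortion correction by remapping: pretend the
# page's vertical displacement is a quadratic function of horizontal position
# (a crude stand-in for an estimated 3D page surface), then shift each pixel
# column back to a flat baseline.

def dewarp(page: np.ndarray, curvature: float = 0.0004) -> np.ndarray:
    """Shift each pixel column to undo a quadratic page bulge."""
    height, width = page.shape
    flat = np.zeros_like(page)
    center = width / 2.0
    for x in range(width):
        offset = int(round(curvature * (x - center) ** 2))  # bulge, in pixels
        src = np.clip(np.arange(height) + offset, 0, height - 1)
        flat[:, x] = page[src, x]
    return flat

# Usage with a synthetic grayscale "page".
page = np.random.randint(0, 256, size=(1000, 800), dtype=np.uint8)
print(dewarp(page).shape)
```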

Worth a quick read in my opinion. You get pictures and even a video.

Stephen E. Arnold, January 8, 2010

I must reveal that I am not a PR shill, a paid writer, or even much of a waterfowl in winter. I will report this sad state of affairs to those who think I am a public relations flack and, of course, to the Rural Housing Service, a fine group monitoring blogs from Kentucky.

Lazarus, Azure Chip Consultants, and Search

January 8, 2010

A person called me today to tell me that a consulting firm is not accepting my statement “Search is dead”. Then I received a spam email that said, “Search is back.” I thought, “Yo, Lazarus. There be lots of dead search vendors out there. Example: Convera.”

Who reports that search has risen? An azure chip consultant! Here’s what raced through my addled goose brain as I pondered the call and the “search is back” T-shirt slogan:

In 2006, I was sitting on a pile of research about the search market sector. The data I collected included:

  • Interviews with various procurement officers, search system managers, vendors, and financial analysts
  • My own profiles of about 36 vendors of enterprise search systems plus the automated content files I generate using the Overflight system. A small-scale version is available as a demo on ArnoldIT.com
  • Information I had from my work as a systems engineering and technical advisor to several governments and their search system procurement teams
  • My own experience licensing, testing, and evaluating search systems for clients. (I started doing this work after we created The Point (Top 5% of the Internet) in 1993 and sold it to Lycos, a unit of CMGI. I figured I should look into what Lycos was doing so I could speak with authority about its differences from BRS/Search, InQuire, Dialog (RECON), and IBM STAIRS III. I had familiarity with most of these systems through various projects in my pre-Point (Top 5% of the Internet) life.)
  • My Google research funded by the now-defunct Bear Stearns outfit and a couple of other well-heeled organizations.

What was clear in 2006 was the following:

First, most of the search system vendors shared quite a bit of similarity. Despite the marketing baloney, the key differentiators among the flagship systems in 2006 were minor, in areas ranging from basic architecture to the use of stemming to the methods of updating indexes. There were innovators, and I pointed out these companies in my talks and various writings, including the three editions of the Enterprise Search Report I wrote before I fell ill in February 2007 and quit doing that big encyclopedia-type publication. These similarities made it very clear to me that innovation for enterprise search was shifting from the plain old keyword indexing of structured records available since the advent of RECON and STAIRS to a more freeform approach with generally lousy relevance.

[Image] Get information access wrong, and some folks may find a new career. Source: http://www.seeing-stars.com/Images/ScenesFromMovies/AmericanBeautyMrSmiley%28BIG%29.JPG

Second, the more innovative vendors were making an effort in 2006 to take a document and provide some sort of context for it. Without a human indexer to assign a classification code to a document that is about marketing but does not contain the word “marketing”, this was rocket science. But when I examined these systems, there were two basic approaches, and both are still around today. The first was to use statistical methods to group documents and make inferences; the other was a variation on human indexing but without humans doing most of the work. The idea was that a word list would contain synonyms. There were promising demonstrations of software methods that could “read” a document, but these were piggy and of use only where money was no object.
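
To make the second approach concrete, here is a minimal, hypothetical sketch of synonym-list classification: a document that never uses the word “marketing” still lands in the marketing category because it hits terms on the category’s word list. The categories, term lists, and sample document are invented for illustration.

```python
# Hypothetical sketch of synonym-list classification: assign a category when a
# document contains terms from that category's word list, even if the category
# label itself never appears. Lists and the sample document are invented.
CATEGORY_TERMS = {
    "marketing": {"campaign", "branding", "lead generation", "demographics"},
    "legal": {"contract", "liability", "indemnify", "jurisdiction"},
}

def classify(text: str) -> list[str]:
    lowered = text.lower()
    return [category for category, terms in CATEGORY_TERMS.items()
            if any(term in lowered for term in terms)]

doc = "The branding refresh and the new campaign target two key demographics."
print(classify(doc))   # ['marketing'], although the word 'marketing' is absent
```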

Third, the Google approach, which used social methods (that is, a human clicking on a link), was evident but not migrating to the enterprise world. The Google approach was new, but to make the 2006 method hum, lots of clicks were needed. In the enterprise, most documents never get clicked, so the 2006 Google method was truly lousy. Google has made improvements, mostly by implementing the older search methods, not by pushing the envelope as it has been doing with its Web search and dataspace efforts.
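
A rough, hypothetical sketch of why the click signal starves inside the firewall: blend a text-relevance score with a click signal, and with typical enterprise click counts (mostly zero) the blend adds almost nothing. All numbers below are invented.

```python
import math

# Hypothetical sketch: blend a text-relevance score with a click signal. With
# Web-scale click counts the signal separates documents; with typical
# enterprise counts it barely moves the score. All numbers are invented.

def blended_score(text_score: float, clicks: int, weight: float = 0.5) -> float:
    return text_score + weight * math.log1p(clicks)

web_docs = {"doc_a": (1.0, 12_000), "doc_b": (1.0, 300)}
enterprise_docs = {"memo_a": (1.0, 0), "memo_b": (1.0, 1)}

for label, docs in (("web", web_docs), ("enterprise", enterprise_docs)):
    print(label)
    for name, (text_score, clicks) in docs.items():
        print(f"  {name}: {blended_score(text_score, clicks):.2f}")
```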

Fourth, most of the search vendors were trying like the dickens to get out of a “one size fits all” approach to enterprise search. Companies making sales were focusing on a specific niche or problem and selling a package of search and content processing that solved one problem. The failure of the boil-the-ocean approach was evident because user satisfaction data from my research, funded by a government agency and other clients, revealed that about two-thirds of the users of an enterprise search system were dissatisfied or very dissatisfied with that system. My exemplary case was the use of Endeca technology to allow Fidelity UK sales professionals to increase their productivity with content pushed to them by the Endeca system. The idea was that a broker could click on a link and the search results were displayed. No searching required. ClearForest got in the game by analyzing dealer warranty repair comments. Endeca and ClearForest were harbingers of focus. ClearForest is owned by Thomson Reuters and is in the open source software game too.

When I wrote the article in Online Magazine for Barbara Quint, one of my favorite editors, I explained these points in more detail. But it was clear that the financial pressures on Convera, for example, and the difficulty some of the more promising vendors like Entopia were having, made the thin edge of survival glint in my desk lamp’s light. By 2006 Autonomy had shifted from search and organic growth to inorganic growth fueled by acquisitions adjacent to search.

