Microsoft Chasing Google. Here We Go Again

August 31, 2009

I have to give credit to Microsoft. The company is persistent. The New York Times is dancing on froth in my opinion in its “A Hired Gun for Microsoft, in Dogged Pursuit of Google”. As the search engine optimization pundits and the azure chip consultants pile on board, the gutsy Microsoft will get another shot at search publicity. The addled goose has a different view. First, Microsoft is paying for traffic. Second, Microsoft has been making runs at Google for several years. Third, the Google Microsoft is chasing is no longer a search system. Microsoft has to do more than get eyeballs for its service. The challenge is to adapt to the competitive threat that Google now poses in three core sectors of Microsoft’s business model:

  • Stopping *any* revenue attrition from Google’s push into the enterprise
  • Responding to Google’s increasing array of mobile services, including the tie up with Microsoft’s enemy Sony
  • Leapfrogging Google’s push into back office services in sectors ranging from finance to publishing to video production.

The New York Times and most Google analysts simplify the Google challenge and the Microsoft response into a one front war. Wrong. Google is fighting Microsoft with street fighting tactics in multiple sectors. Traditional armies are ill equipped to deal with this threat, attack, retreat method. See the Google trilogy for more on Google’s tactics.

Stephen Arnold, August 31, 2009

News Corp versus the BBC, Internecine War Brewing

August 31, 2009

Read the BBC news story “Murdoch Attack on ‘Dominant’ BBC.” Note the following comment:

Mr Murdoch (the son, not the big dog) said free news on the web provided by the BBC made it “incredibly difficult” for private news organisations to ask people to pay for their news. “It is essential for the future of independent digital journalism that a fair price can be charged for news to people who value it,” he said. News Corporation has said it will start charging online customers for news content across all its websites.

In a high class mud slinging rejoinder,

Former BBC director general Greg Dyke said Mr Murdoch’s argument that the BBC was a “threat” to independent journalism was “fundamentally wrong”. He told BBC Radio 5 live: “Journalism is going through a very difficult time – not only in this country but every country in the world – because newspapers, radio and television in the commercial world are all having a very rough time.”

Quite an exchange. Both outfits find themselves in the headlights of users who are embracing different methods of getting information. The likely escalation? Knitted brows and hard stares at the country club.

Stephen Arnold, August 31, 2009

Very Large Databases – Googzilla Being Coy

August 31, 2009

I read Technofeel’s “VLDB09 Part Two” and noted another Google head fake. Technofeel points out that Google’s paw prints were all over the conference from his point of view. MapReduce and Hadoop (an open source semi MapReduce) presentations caught his attention. In my opinion, the most interesting comment in the write up was:

Finally, I ended my visit at VLDB09 with two presentation of Google Interns about data mining to get structured result sets out of semi unstructured pages with lists and tables.

These two Google papers are important. You can get links to them from Technofeel’s article. Let me make three observations:

  • The use of “interns” is a way for the Google to reward bright folks while keeping the big guns off the podium. The experience of the Google Books product manager makes this use of interns prudent.
  • The content of the papers is not intern grade. When you work through the two documents, you will learn that Google has made significant advances in methods for working out issues in manipulating Google-scale structured data and discerning context.
  • The traditional world of relational databases is on a collision course with Googzilla. Big data are part of the Google core competency.

Those are some interns because their co authors are among Google’s most sophisticated researchers and academic colleagues. Technofeel’s instincts are good. He may want to check the bios of the secondary and tertiary authors of these Google papers. The interns are not the hubs on these wheels.
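The core idea in those intern papers, recovering structured result sets from semi unstructured pages with lists and tables, can be illustrated with a toy. This is a sketch of the general notion, not Google’s actual method; the class name and heuristics are my own:

```python
# Toy sketch: pull structured rows out of semi structured HTML tables.
# Illustrative only; Google's extraction pipeline is far more sophisticated.
from html.parser import HTMLParser

class TableExtractor(HTMLParser):
    """Collect each <table> on a page as a list of row tuples."""
    def __init__(self):
        super().__init__()
        self.tables = []   # one list of row tuples per <table>
        self.row = None    # cells of the row being read, or None
        self.cell = None   # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self.row = []
        elif tag in ("td", "th") and self.row is not None:
            self.cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self.cell is not None:
            self.row.append("".join(self.cell).strip())
            self.cell = None
        elif tag == "tr" and self.row is not None:
            self.tables[-1].append(tuple(self.row))
            self.row = None

    def handle_data(self, data):
        if self.cell is not None:
            self.cell.append(data)

html = ("<table><tr><th>City</th><th>Pop.</th></tr>"
        "<tr><td>Louisville</td><td>557,789</td></tr></table>")
parser = TableExtractor()
parser.feed(html)
print(parser.tables[0])  # [('City', 'Pop.'), ('Louisville', '557,789')]
```

The hard part, which the papers address, is deciding which tables carry data and which carry page layout. The toy above cannot tell the difference.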

Stephen Arnold, August 31, 2009

Google: Last Library

August 31, 2009

Cade Metz’s write up “Google Book Search. Is It the Last Library? Uh, Yes.” will cause some excitement among the anti Google Book contingent. Mr. Metz, like most Google watchers, is a bit like a Kentucky Derby horse when the gate opens. The stallions charge forward. The race is exciting, but it is tough to pick the winner until the first horse crosses the finish line. So, the race is on.

My thought is that the Google book program is old news. I think that books are going to lose traction in the years ahead. I like books, but their costs and environmental impact suggest that books may become less a mass medium and more of a collectors’ sport.

I do not think Google is the “last library”, however. I think that Google’s focus is on the past. The future is in new types of content. I know the arguments for books. I have written eight or nine myself. Google’s vision for books is likely to distract some folks from watching other and, in my opinion, more important initiatives at the company.

Let me cite a couple of examples and then remind you, gentle reader, that this information appears in my Google studies, which are for sale as PDFs. Here you go:

  • Google’s push into education. This is a big deal and books are a contributory stream, not the Mississippi River that is being carved by the Google glacier.
  • Google finances or what I call Google Global Bank. Check out Google’s array of money-related services. Big doings in that sector.
  • Google and the motion picture sector. The teaming with Sony is one leaf on a fast growing evergreen.

Google Books is a hot topic, but like many Google tactics, the excitement makes it difficult to see other disruptions the Google is setting off. If Google quit scanning books tomorrow, how many authors would assign Google copyright so that Google could sell their books to Google search users? I know I would toss my four publishers overboard in a heartbeat. Google can sell. Getting rights directly from me eliminates a problem and snags the higher value current information to boot.

Stephen Arnold, August 31, 2009

What Annoys Europe about Google Books

August 31, 2009

Short honk: The Financial Times’s “Europe’s Digital Library Stuck in the Slow Lane” contains an interesting comment:

…what really annoys the European book industry about Google’s ambitious digital library project is that only US internet users will be allowed to browse in it.

Forces in the US are lining up to derail Google Books. Several observations:

  1. If the service is stopped, who in the US will pick up the ball? Libraries, the Library of Congress, outfits like Elsevier, Thomson Reuters, or Wolters Kluwer?
  2. What if Google stops, then shifts its focus to scanning books in more friendly climes? Looking forward, the knowledge value of the collection may put the US in the Europe pickle barrel. No access.
  3. What if Google builds a book data center and anchors it outside the three mile limit? With a partner in Europe, the legal eagles will have a fine time figuring out which book, what jurisdiction, what copyright, and where the digital instance “is”.
  4. What if students and researchers decide to publish their books using Google’s various “digital Gutenberg” systems? With “new” books flowing directly into the Google, what happens to publishers who need compliant authors to keep the pipeline filled?

I don’t have answers, but I raise the questions and provide examples in Google: The Digital Gutenberg?

Stephen Arnold, August 31, 2009

Guardian Makes an Interesting Comparison

August 31, 2009

WolframAlpha, like business intelligence systems, makes little sense to people like my father. Aged 86, he has zero clue about obtaining the melting point of Inconel from a search box. Wolfram is not doing reverse public relations. The company is going to provide data feeds. Quite a few data sets are becoming available: some are not so good, others are useful, a few are bait, and some are misunderstood. What struck me was this sentence in “Wolfram Alpha to Open Data Feeds”:

Alpha is not a generalised search engine like Google that searches the web. Instead, much like the Guardian Data Store, Wolfram is curating data sets with financial, mathematical, scientific and other data that you can query using simple questions or that you can manipulate using mathematical formulae.

I would never have drawn a parallel between the Wolfram data and the newspaper writing about Alpha’s data. The embedded link is a nice touch as well. In this marketing blog, I shamelessly flog my books. I suppose it is a natural evolution for a newspaper to follow in the web prints of the addled goose. Everyone has to toot his or her own trumpet.
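The curated data set idea is easy to demonstrate. Instead of crawling the web, the engine answers simple questions from hand vetted tables. A minimal sketch, with illustrative values I supplied myself (not Wolfram’s data or API):

```python
# Sketch of a curated fact store: answers come from vetted tables,
# not a web crawl. Values below are illustrative, not Wolfram's data.
CURATED = {
    ("inconel 625", "melting point"): "1290-1350 C",
    ("gold", "melting point"): "1064 C",
}

def answer(question: str) -> str:
    """Match a simple 'property of subject' question against curated data."""
    q = question.lower().rstrip("?")
    for (subject, prop), value in CURATED.items():
        if subject in q and prop in q:
            return value
    return "no curated answer"

print(answer("What is the melting point of gold?"))  # 1064 C
```

The trade off is plain: precise answers inside the curated domain, silence outside it. That is why Alpha is not “a generalised search engine like Google.”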

Stephen Arnold, August 31, 2009

Google: Baby Steps with Image Recognition

August 31, 2009

With attention focused on Google Books and Google lobbying, modest technical innovations can be overlooked. The Overflight service flagged US7,580,568 “Methods and Systems for Identifying an Image as a Representative Image for an Article.” On the surface, what is the big deal about parsing a document with multiple images and taking one as a representative image? Google does this frequently. Navigate to Google News and look at the images positioned next to a news story.

Now what if an article has an image but that image is not one that represents the information in the article? In the good old days when traditional publishers were kings, a human would flip through a photo archive, locate a suitable image, and mark up the copy to show the compositor where to put the picture. Google has automated this service. (Page 12, Column B, line number 49.) Not a big deal, but it is one that chops costs out of the process of assembling original mash ups of information.

One of the principal findings from my research into Google’s technology is that the company has been purposeful in squeezing costs out of operations that are often money bottlenecks when traditional methods are shoehorned into online. What I find interesting is that the system and method can be applied to a range of “images”, not just those in a magazine article or a book chapter.
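One plausible way to pick a representative image, and I stress this is my guess at the general approach, not the method disclosed in US7,580,568, is to score each candidate image’s caption or alt text against the article’s own vocabulary:

```python
# Guess at the general idea: score each image's caption against the
# article's words and keep the best match. Not the patent's method.

def pick_representative(article_text, images):
    """images: list of (url, caption) pairs. Return the url whose
    caption shares the most words with the article text."""
    article_words = set(article_text.lower().split())
    def score(img):
        caption_words = set(img[1].lower().split())
        return len(caption_words & article_words)
    return max(images, key=score)[0]

article = "google adds facets to news results for faster scanning"
images = [
    ("ad_banner.png", "subscribe now special offer"),
    ("serp.png", "google news results page with facets"),
]
print(pick_representative(article, images))  # serp.png
```

Even a crude overlap score like this weeds out ad banners and navigation chrome, which is exactly the human photo editor work being automated.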

Baby step or not, US7,580,568—filed in 2004—is now a patent. The plumbing and logic for the disclosed system and method have been in operation since late 2002 or early 2003. How the toddler has matured!

Stephen Arnold, August 31, 2009

SharePoint Shake with Facets

August 30, 2009

SharePoint. SharePoint. SharePoint. We have had a flurry of questions from organizations about the system, search, metadata, and facets. For a product that has been around in one form or another for years, the last week in August has been a SharePoint festival.

We did some digging for one client with a broken SharePoint search system. The question was, “Is there an easy way to add facets to our plain vanilla SharePoint search system?” The short answer is, No. The reason is that any Microsoft-provided component like the facet component, version 2.0, 2.5, or 3.0 requires dozens and dozens of manual steps. Get one wrong and you can create even more SharePoint shakes. We know when this happens because the certified SharePoint administrator has consumed so much coffee that her hands tremble from the combination of fatigue and caffeine. Ergo, SharePoint shakes.
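For readers new to the term: a facet is just a count of matching documents per metadata value, rendered as a clickable filter. A minimal in-memory sketch of what a facet component computes (real SharePoint facets work against the search index, not a Python list):

```python
# What a facet component computes: for each metadata field, the
# number of documents per value. In-memory sketch for illustration.
from collections import Counter

def facet_counts(docs, field):
    """docs: list of metadata dicts. Return value -> document count."""
    return Counter(d[field] for d in docs if field in d)

docs = [
    {"title": "Q3 budget", "department": "finance", "type": "xlsx"},
    {"title": "Q3 memo", "department": "finance", "type": "docx"},
    {"title": "Launch plan", "department": "marketing", "type": "docx"},
]
print(facet_counts(docs, "department"))
# Counter({'finance': 2, 'marketing': 1})
```

The computation is trivial; the dozens of manual steps come from wiring it into SharePoint’s crawled and managed properties, which is where the shakes set in.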

This particular client did not want to buy or license a third party product. That is okay with us; third party tools from a wide range of vendors work pretty well. But the client is right.

We provided the client with several links. You may want to note these down because Microsoft does not keep its chickens in one coop and its cows in a field. The chickens and cows are mixed up and allowed to wander.

The free facet component is called MOSS Faceted Search. You can download it from Codeplex. Next you will need the documentation. We located what looked quite complete at another Codeplex page. The 3.0 version of the faceted search component is still in beta. If you have a copy of the 2.5 version, you may want to keep that handy in case the beta 3.0 goes south. You can restore your SharePoint and give 2.5 a whirl.

The documentation is detailed.

You will need your arsenal of Microsoft tools in order to get the beta into gear.

We found Bob Mixon’s “SharePoint Search: Improving the Relevancy of Search Results” useful, particularly the discussion of search scopes and metadata. The observation I would offer is that Microsoft makes no real effort to deliver a finished product. Manual work is required. By way of contrast, a number of vendors offer a snap in solution for search that keeps these manual tasks to a minimum. A beta is a beta so be prepared for some excitement.

Stephen Arnold, August 30, 2009


TeraText: SAIC’s Low Profile Search System

August 30, 2009

TeraText is a search and content processing system that has a low profile. The company that licenses TeraText is SAIC, a large consulting firm with revenues in the $11 billion range. If you take a look at information about TeraText in the Overflight service, you see that the company has zero information in the current news stream. A visit to the TeraText Web site and a look at the news reveals that the most recent news was a May 2008 item about the TeraText SAFE Symposium in Washington, DC. Yet last week, I heard that one Federal agency is using TeraText. Obviously the product is still in the channel, just invisible.

So what’s going on with TeraText?

My email to my contact at SAIC bounced. I fired off an inquiry to the “contact us” link but heard nothing as I write this short news item.

Let me round up the bits and pieces of information that I have. As I get more information, I will keep you informed.

First, TeraText is an umbrella product name. SAIC uses the repository-based system for a wide range of client applications. Examples range from email archiving to data and document management in environments where security is job number one. SAIC has wrapped output functions around the data management system. A licensee can use TeraText to output documents and reports for print or electronic dissemination. Keep in mind that the word “tera” is designed to connote the system’s ability to handle terabytes of information.

Second, the company offers search and retrieval as a core service. If you zoom into the firm’s email archiving system, you will find that email can be indexed as it is created. As a result, TeraText offers a real time email search system. The roots of this product variant are deep in the US intelligence community.
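“Indexed as it is created” is the key phrase: each message updates the index on arrival, so a query sees new mail without waiting for a batch rebuild. A toy sketch of an incremental inverted index (TeraText’s internals are not public; this is the general pattern, not its implementation):

```python
# Toy real time email index: each message updates the inverted index
# at arrival, so searches see it immediately. Illustrative only.
from collections import defaultdict

class EmailIndex:
    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of message ids
        self.messages = {}

    def add(self, msg_id, text):
        """Index one message at the moment it arrives."""
        self.messages[msg_id] = text
        for term in set(text.lower().split()):
            self.postings[term].add(msg_id)

    def search(self, *terms):
        """Ids of messages containing every query term."""
        sets = [self.postings[t.lower()] for t in terms]
        return set.intersection(*sets) if sets else set()

idx = EmailIndex()
idx.add("m1", "Budget review moved to Friday")
idx.add("m2", "Friday lunch plans")
print(idx.search("friday", "budget"))  # {'m1'}
```

At intelligence community scale the engineering problems are retention, security labels, and volume, not the lookup logic itself.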

Third, the product evolved from research conducted in Australia. The original technology dates from the early 1990s. Although the technology has been updated on a regular cycle, the core principles of the system are now beginning to show their age. Nevertheless, for large-scale information processing where security is of great importance, the TeraText system will do the job. Expect to pay six to seven figures to get the show on the road.

One of the last announcements in my files was a 2004 item about TeraText’s tie up with Hewlett Packard for Federal sales. Since that deal was announced, HP has purchased an enterprise publishing system company. I am not sure if HP is actively marketing the TeraText systems at this time.

Stephen Arnold, August 30, 2009

Optimizing SharePoint

August 30, 2009

Short honk: Quite a useful list of SharePoint tips and insights appears in “My Checklist for Optimizing SharePoint Sites.” The original post appeared in June 2009 when I was out of the country. We found the information about YSlow quite useful.

Stephen Arnold, August 30, 2009
