Microsoft’s Data Robustness

January 11, 2009

The “we may go out of business” Seattlepi.com Web site ran a story with the cruel title “Microsoft’s Servers Overloaded by Interest in Windows 7.” You can read this weird headline and its accompanying story here. The story makes clear that Microsoft’s investments in its data centers were not up to the load imposed by the faithful downloading the Windows 7 beta.

The misstep was described as a “borkfest” by Lifehacker here. This goose isn’t sure what a borkfest is, but he can make a guess. Gina Trapani’s article nails the problem. She wrote:

If lack of infrastructure to handle an insane traffic spike over a few hours was truly the problem (even though these were conditions Microsoft created), there are lots of alternatives they could’ve used that would have kept their servers up. In fact, users have been happily downloading and distributing the Windows 7 beta build 7000 now for weeks using an efficient file-sharing protocol called BitTorrent.

When the GOOG streamed its live concert test last year, the Googlers tapped Akamai. Did Microsoft use its own content delivery network? Did Microsoft contract out the job? Whoever handled the job may want to consider another line of work, in my opinion. Seattlepi.com quotes a Microsoft Web log post. I noted this sentence: “We are adding some additional infrastructure support to the Microsoft.com properties before we post the public beta.” Good idea.

Stephen Arnold, January 11, 2009

Yahoo: Slipping and Dipping

January 11, 2009

I have deep skepticism about third-party data. Nevertheless, when reports about Web site traffic and online advertising share appear, the data get snapped up the way Tess goes for a dropped chicken wing. Silicon Alley Insider’s “Yahoo’s Share of All Search Advertisers Drops 36% in Q4 (YHOO)” is worth reading. You can find the story and the scary red line here. Let’s assume the data are accurate. Bad news for Yahoo. Let’s assume the data are off a tad, say, down 18 percent in Q4. Slightly less bad news. If the Yahooligans continue to slip, the GOOG benefits. Yahoo started as a directory, became a portal, and then floundered. In the Arctic waters off Nordkaap, even a strong swimmer who goes overboard succumbs. A weak swimmer, well, not much chance. Yahoo is now in the Arctic waters.

Stephen Arnold, January 11, 2009

Business Week: All Over the GOOG

January 10, 2009

Business Week may want to rename one of its editorial sections “Google Week.” The editors at Business Week crank out articles about Google. Most are interesting, but some of the Google coverage is, well, let me be gentle, obvious. Here’s an example: “Small Businesses Love Google, Even When Things Go Wrong.” Now, we know that search is not very good. I know that folks with multiple PhDs and big IQs will beg to differ, but I point to the research I have done, the research Jane McConnell in Paris has done, and the research Martin White in London has done. Our data reveal that about two thirds of the users of a search system are dissatisfied. Now Business Week has embraced a Nielsen-WebVisible survey that says 92 percent of Internet users are satisfied with Web search. But (and this is an important “but”) “39 percent of them frequently can’t find companies they’re looking for.” Search doesn’t work too well. Imagine that. You must read the Business Week article here, which includes a link to the news release from the big time research outfit here. In my opinion, the reason people love Google has to do with the imprint Google has stamped on the two thirds of the people who look for information on Google. Google is search. Search is Google. If a free service works in a manner one can describe as “good enough”, that’s okay. The key is the brand power and magnetism Google possesses. Perception is a big part of a search system’s success. Google’s been working on perception for a decade, and the GOOG has done a bang up job. Now if we can loosen people’s grip on the view of Google as an ad company, I would be a happier goose.

Stephen Arnold, January 10, 2009

Microsoft Innovation According to Network World

January 10, 2009

Mitchell Ashley wrote “Top 8 Microsoft Research Projects to Improve Our Lives” here. Straight away, let me remind you that I am a goose, an addled goose at that. I doubt very much that Mr. Ashley was thinking about geese when he penned his headline. What are the eight research projects that may improve a human’s life? For starters, these eight “socio-digital systems” (whatever that phrase means) include “I know it’s here somewhere.”

The idea is that big hard drives are available, economical, and sucking data into their depths. But wait, humans use other digital devices like SD cards and USB sticks. (Not this goose. I lose them.) So there are two socio-digital innovations that address this problem. You will have to read Mr. Ashley’s article for the other six innovations. I focus on search and content processing.

The two innovations are called “digital shoebox” and “family archive.” Here is what Mr. Ashley wrote:

It’s like the data management version of the cryogenic-freezing program: We all keep creating personal digital content and buying more disk drives in hopes that someday they’ll discover a cure for the information archiving, searching and retrieval of all that stuff before our time on earth is up.

Now I must admit that I had a tough time figuring out what these innovations are. I turned to Microsoft Live Search for elucidation. I noted a reference dated 2002 to this technology, and I saw a December 5, 2007, Business Week article here about this innovation. I jumped back to the 2002 reference to digital shoebox research here and then back to the 2007 reference. Same invention. I asked myself, “At what point does an innovation become JAT, or just another technology?” I think five years is a long time to move from innovation in one part of the R&D department to the public relations office in another part of the R&D department.

From my quick scan of these documents, I think the digital shoebox is a server that indexes information objects and points to where they are stored. I am not sure how the digital shoebox works on non-text objects, what metadata are generated, how the index update operates, or what the indexing overhead is.

The family archive proved to be easier to locate. Microsoft offers a brief description here. The key point for me was:

This project aims to understand the needs of families to interact with, manage, and archive materials which are important in preserving and sharing family memories. We are developing a system which allows the input and safe archiving of both digital and physical media, and which allows natural interaction with those media. This work has been informed by our in-depth studies of “photowork” and of “videowork“.

I think the archive adds smarter software to the digital shoebox.

My hunch is that Microsoft wants to make it easy for a Vista user to dump data anywhere. The Microsoft technology will sniff out the data and index it. When the user wants to find something, the “server” (probably a software component, not a power sucking six figure system) will allow the user to browse, search, and click metadata like a date or some other tag like Wesak and see hits that match.
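To make that guess a bit more concrete, here is a minimal Python sketch of the kind of index such a system might keep: it records where each information object lives and lets the user filter hits by a date or by a tag such as Wesak. Nothing here comes from Microsoft. The names `InfoObject` and `ShoeboxIndex`, the fields, and the sample paths are hypothetical, chosen only to illustrate a metadata index that points to data rather than copying it.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from datetime import date


@dataclass
class InfoObject:
    path: str                               # where the object sits: disk, SD card, USB stick
    created: date                           # date metadata the user can click on
    tags: set = field(default_factory=set)  # user- or machine-assigned tags


class ShoeboxIndex:
    """Hypothetical index that points to objects instead of copying them."""

    def __init__(self):
        self._objects = {}                # path -> InfoObject
        self._by_tag = defaultdict(set)   # lower-cased tag -> paths
        self._by_date = defaultdict(set)  # date -> paths

    def add(self, obj):
        """Record where an object lives and file it under its metadata."""
        self._objects[obj.path] = obj
        self._by_date[obj.created].add(obj.path)
        for tag in obj.tags:
            self._by_tag[tag.lower()].add(obj.path)

    def find(self, tag=None, on=None):
        """Return sorted paths that match every facet supplied."""
        hits = set(self._objects)
        if tag is not None:
            hits &= self._by_tag.get(tag.lower(), set())
        if on is not None:
            hits &= self._by_date.get(on, set())
        return sorted(hits)


if __name__ == "__main__":
    index = ShoeboxIndex()
    index.add(InfoObject("E:/photos/img_0042.jpg", date(2008, 5, 19), {"Wesak", "temple"}))
    index.add(InfoObject("F:/notes/errands.txt", date(2008, 5, 19), {"errands"}))
    print(index.find(tag="wesak"))           # ['E:/photos/img_0042.jpg']
    print(index.find(on=date(2008, 5, 19)))  # both paths
```

The sketch answers none of the open questions above (non-text objects, index updates, overhead); it only shows why clicking a date or tag can return hits quickly once the metadata exist.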

A few thoughts.

First, these are interesting search and content processing ideas. I need to test these systems to see if my life becomes easier with them. My previous brushes with smart information object metatagging systems have been a love-hate affair. Some systems I downright hate. Others I sort of love. So far none of Microsoft’s search technologies has made me swoon. I am thinking about search in Outlook, native search in XP, free SharePoint search, and the Byzantine Microsoft Fast ESP system.

Second, the notion of dumping data locally is out of step with what my research suggests young people want to do. The notion of dumping stuff is viable. The last set of interviews I did revealed that dumping should be automatic and the data dump should be located on a server somewhere. The idea was that when the device dies, a new device can suck data from the dump in the sky.

Finally, with Microsoft’s share of the search market slipping, Microsoft needs to make market share gains. R&D is okay when it yields more than words about innovation. I want innovation.

Stephen Arnold, January 10, 2009

Social Search: Manipulating for Money

January 9, 2009

Mike Elgan wrote “How China’s 50 Cent Army Could Wreck Web 2.0” here. The point of this article is that a person with money can hire Chinese computer users to insert comments into social networks. The infusion of posts would, in effect, distort the much-ballyhooed wisdom of crowds. Mr. Elgan does a good job of explaining how this army works and pointing out the fragility of user-dependent Web 2.0 services. I think he strays from the tethering ring when he asserts that the Chinese “army” can undermine free speech, but otherwise, he’s spot on.

However–and I know you relish my “howevers”–a few of my addled goose observations are now in order.

First, the “social network” revolution is not as zippy as most pundits assert. Mr. Elgan’s write up explains how the person with money can pay to make a specific issue, product, or person percolate upwards. Money can’t buy happiness but it sure can buy visibility in a Web 2.0 service that depends on user inputs.

Second, social networking is more of a marketing story than a technology innovation. Sure, MySpace.com and Facebook.com move well beyond discussion fora and individual Web pages. These sites have knitted together functions and surfed on young-at-heart users who need a way to connect in today’s Jetsons world. As the young-at-heart grow old and infirm, their use of network communication methods will persist, but these methods are extensions of older technologies, not sudden inventions.

Third, the implications of a technology cannot be accurately predicted. As a result, when an issue arises with a technology application or suite of technology applications like social networks, the “fix” will be more technology. My concern with MySpace.com and Facebook.com stems not from what they do but from the new technologies these services will require to handle the problems. For example, what’s the fix for the Chinese “army” issue? Think more stringent controls. The casualty is not free speech. It is freedom.

Stephen Arnold, January 9, 2009

Google Revenue: Why the One-Trick Pony View Persists

January 9, 2009

A one-trick pony is a pony that does one thing, like letting kids sit on its back. The one-trick pony is highly prized among certain carnival concession owners. Kids who are afraid of big animals may also cotton to the one-trick pony. Wall Street likes any type of animal as long as it makes a lot of money. Google’s viewed as a one-trick pony because the company makes money from advertising. Ignore the boundaries that separate Google’s different types of advertising. The one-trick pony at the carnival might do some more interesting things in its stall at night with another pony. For the carnival impresario and the Wall Street crowd, however, Google’s a one-trick pony.

Google Blogoscoped presented a textbook example of how the one-trick pony view of Google is perpetuated. Navigate here and scan the table that shows “how Google makes money”. The useful list of more than 80 products and services includes three ways to make money. For each product and service, Blogoscoped tallied how Google monetizes it. The three ways for Google to make money are, you guessed it, one-trick ponies. There are ads and some fees to get involved with ads; for example, Google charges to become an AdWords advertiser. But this is a variation on the one-trick pony’s ribbons, not a change to the one-trick pony.

I think it would be useful to consider these types of revenue horses. In 2009, a couple of them will be given their heads. Competitors may find these ponies harder to ride than the docile one-trick pony that sits quietly as noisy kids climb on and off; to wit:

  1. Payments from educational institutions for various Google services. Example: the fees paid by New South Wales to license Google services for school kids
  2. License fees from the oft-reviled but highly-disruptive Google Search Appliance
  3. Subscription fees to commercial Google products and services; for example, Google Earth or SketchUp
  4. Payments from partners to become one of Google’s best pals
  5. Fees assessed to organizations when one of their top dogs decides that paying Google for Postini email archiving is better than getting caught unprepared for a discovery process.


Do you want to be standing flat-footed in front of this group of ponies? Source: http://www.summers-photo.co.uk/Feb2007/images/Stampede_jpg.jpg

My list of other revenue ponies is longer than this group of five. Looking at the GOOG too closely or from one narrow angle makes it difficult to perform these tasks:

  1. Assess which pricing models could be implemented with little or no warning for unmonetized products and services; for example, charging me to look at pages when running a Google Book Search
  2. Place Google in a competitive context where advertising will not work; for example, Google charges for certain content constructs that it creates and that are not available from other online services. Think of a directory of specialized vendors in a specific market like video production.
  3. Understand what business models Google will have to implement in order to meet its financial objectives and Wall Street’s expectations; for example, if travel advertising goes down, what monetizing options are available to Google to address that shortfall.

If one wants to understand Google, one may want to keep track of the revenue herd. Granted, ads generate a whopping 95 percent of Google’s “now” revenue. But going forward, I like to watch those ponies. One or two may grow up to be different revenue animals. More about these options appears in my forthcoming Google and Publishing study, available this spring from Infonortics Ltd. here.

Stephen Arnold, January 9, 2009

Search Goes Down, Google Turns on the Juice

January 8, 2009

I saw several Web log posts and articles in major media (dead tree outfits) about the decline in Web travel searches. A representative story is “Internet Travel Searches Drop 42 Percent” here. UK journalistic endeavors amuse me no end. Honk. Honk. Laura Dixon wrote:

Internet searches for flights were down over 40 per cent in the week after Christmas according to Hitwise, a division of Experian. Traffic to travel Web sites for the same period – up to the week ending January 3 – was also down 16 per cent.

Interesting, but not the sort of data that makes me flap my wings. The addled goose thinks that a quick visit to Google.com is in order. I wonder if Ms. Dixon has entered this query in the Google search box: SFO LAX. That’s it. Two three-letter strings. These abbreviations refer to airports: San Francisco International and Los Angeles International. Here’s what the GOOG displayed for me:

[Screenshot: Google air schedule results for the query SFO LAX]

My thought is this: if the number of queries is down, what’s the value of appearing as one of the seven featured air ticket sources underneath the structured query insert? In fact, in the last few months, there’s been some shuffling of the featured carriers. Notice too that the Google system automatically converted the airport pair into a query. Set the dates, pick a vendor, and bingo, you get a list of options. Pretty handy for mobile phone users too. I wonder if Microsoft will offer this feature on its forthcoming Verizon service.
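Google has not said how it spots an airport pair, so here is only a rough Python sketch of one way a search box could recognize a bare pair such as SFO LAX and turn it into a structured flight query. The regular expression, the tiny `KNOWN_AIRPORTS` table, and the `FlightQuery` fields are illustrative assumptions; a real system would check against the full list of IATA airport codes and do far more.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Illustrative subset of IATA airport codes; a real system would load the full list.
KNOWN_AIRPORTS = {"SFO", "LAX", "JFK", "ORD", "SDF", "LHR"}

# Two three-letter tokens separated by spaces, "to", or a dash.
PAIR = re.compile(r"^\s*([A-Za-z]{3})(?:\s+to\s+|\s*-\s*|\s+)([A-Za-z]{3})\s*$")


@dataclass
class FlightQuery:
    origin: str
    destination: str
    depart: Optional[str] = None       # filled in when the user sets dates
    return_date: Optional[str] = None


def parse_airport_pair(query: str) -> Optional[FlightQuery]:
    """Return a FlightQuery if the query is a bare airport pair, else None."""
    match = PAIR.match(query)
    if match is None:
        return None
    origin, destination = (code.upper() for code in match.groups())
    if origin == destination or not {origin, destination} <= KNOWN_AIRPORTS:
        return None  # ordinary keyword query; fall through to regular search
    return FlightQuery(origin, destination)


if __name__ == "__main__":
    print(parse_airport_pair("SFO LAX"))
    print(parse_airport_pair("cat dog"))  # None: not airport codes
```

The lookup against known codes is what keeps an ordinary two-word query like "cat dog" from being mistaken for a flight search.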

To me the downturn in flight searches means that the Google will turn on the juice to get more revenue from advertisers who must get traffic. That’s a more interesting angle for the addled goose to consider. But I live in rural Kentucky and am not affiliated with an oh-so-excellent dead tree publication. There you go.

Stephen Arnold, January 8, 2009

Google Semantics Surfacing

January 8, 2009

ReadWriteWeb.com (January 6, 2009) ran an interesting article that tiptoes around Google’s semantic activities. You will want to read “Did Google Just Expose Semantic Data in Search Results?” Google won’t answer the question, of course. But the addled goose will: “Yep. Where have you been since early 2007?” Let me point out that Marshall Kirkpatrick has done a good job of tracking down “in the wild” examples of Google’s machine-based semantic methods. These examples (and others in Google open source documents) make it clear that the semantic activities are chugging along and maturing nicely. “Semantics” as used in this write up means “figuring out what something is about.” Once one knows the “about” part of an information object, then other methods can hook these “about” metadata together. If you want to get a sense of the scope of the Google semantic system, click here. I have a checking copy of the report I wrote for Bear Stearns before that outfit went up in flames or down the drain. (Pick your metaphor.) My write up here does not include the detail that is in the full discussion in Google Version 2.0 here. But this draft provides some meat for the “in the wild” examples found in Mr. Kirkpatrick’s good article. How significant is the investment in semantics at Google? You can find some color on the sweep of Google’s semantic activities in the dataspace white paper Sue Feldman and I wrote (September 2008). You can get this report from IDC; it is report number 213562.

Let me close with three observations:

  1. Google is deeply involved in semantics, but with a Googley twist. Watching for examples in the wild is a very useful activity, especially for competitors
  2. The notion of semantics is sufficiently broad to embrace metadata generation and new types of metadata so that new types of data constructs can be automatically generated by Google. Think publishing new constructs for money.
  3. The competitors chasing Google face the increasingly likely prospect that Google has jumped over its own present position and will land even farther ahead of the likes of IBM, Microsoft, Oracle, and SAP. Yahoo. Forget them. The smart Yahooligans are either at Google or surfing on Google.

Now I expect some pushback from the anti-Google crowd. Have at it. Just make sure you have internalized Google’s technical papers, Google “talks”, and the patent documentation. This goose is not too interested in uninformed challenges. You can read more about Google semantics in my forthcoming Google and Publishing study, available in the spring from my trusty publisher Infonortics Ltd., located near Oxford, England.

Stephen Arnold, January 8, 2009

Lawyers and Metadata

January 8, 2009

Now the indexing world gets something to gnaw on. Automated indexing systems beat out humans when measured by cost per item indexed, speed, and consistency. Automated indexing systems can be as good as a human for some types of content. Humans, on the other hand, are inconsistently bad at indexing. Software hits a sweet spot and doesn’t get significantly better or worse unless the content throws in a wrench. Now the issue of keeping metadata out of documents arises. We can automate the creation of metadata, but it is early days in the world of automatic metadata scrubbing. I quacked happily when I thought, “I wonder who knows where their metadata are?”

Jim Calloway’s “Metadata–What Is It and What Are My Ethical Duties” here breathes new life into human indexing. What I find interesting is that lawyers charge by the hour. Human indexers are paid by piece work schedules or given a flat annual fee and maybe some benefit crumbs. The economics of human indexing is based on keeping the per record cost as low as possible whilst one maintains the “quality” of the indexing. “Quality” in the commercial database world is often defined as a metric such as “four to six index terms per bibliographic record” or “16 records per hour with required fields completed”. You may have a more academic definition, but my examples come from the soon-to-be-marginalized world of human commercial database production.
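The two commercial “quality” yardsticks quoted in that paragraph are simple enough to express as checks. This is a minimal Python sketch: the thresholds come from the paragraph, but the record layout (a dict with hypothetical `title`, `source`, `date`, and `index_terms` fields) and the function names are assumptions for illustration, not any database producer’s actual rules.

```python
# Thresholds quoted in the post; the field names are hypothetical.
REQUIRED_FIELDS = ("title", "source", "date")
MIN_TERMS, MAX_TERMS = 4, 6   # "four to six index terms per bibliographic record"
TARGET_PER_HOUR = 16          # "16 records per hour with required fields completed"


def record_passes(record: dict) -> bool:
    """True if every required field is filled and the term count is in range."""
    fields_done = all(record.get(name) for name in REQUIRED_FIELDS)
    term_count = len(record.get("index_terms", []))
    return fields_done and MIN_TERMS <= term_count <= MAX_TERMS


def shift_report(records: list, hours_worked: float) -> dict:
    """Summarize one indexer's shift against the per-hour target."""
    if hours_worked <= 0:
        raise ValueError("hours_worked must be positive")
    passing = sum(1 for record in records if record_passes(record))
    per_hour = passing / hours_worked
    return {
        "passing": passing,
        "rejected": len(records) - passing,
        "per_hour": per_hour,
        "meets_target": per_hour >= TARGET_PER_HOUR,
    }
```

Note what the check measures: term counts and filled fields, not whether the terms are any good. That is the point about commercial “quality” being a throughput metric.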

The article defines metadata from a legal eagle’s point of view, of course. But the story gets interesting when Mr. Calloway cites a situation in which metadata became a legal issue. Where there is a legal issue, there is the risk of a fine, jail, or losing pride of place among the brood of legal eagles. Forget the compensation. Ego may be a bigger force in the legal eagle world. Mr. Calloway nicely hooks metadata with risk.

For me, the most important comment in this useful write up was:

In this writer’s view, the key is to avoid sending out documents with metadata that could disclose confidential information. Comparing metadata to a wrongly sent fax or e-mail is questionable and the idea that lawyers will be prohibited from examining metadata while parties, law enforcement officers and private detectives will be free to do so seems artificial at best. The Colorado rule that one must disclose receiving confidential information via metadata before acting on it seems to strike a rational balance. The best rule is for law firms to develop best practices internally to keep metadata from “escaping” in the first place.

I quite like “keep metadata from escaping in the first place”. To close, let me ask several questions:

  • Do you know why metadata are in the documents available for indexing on your Web site?
  • Do you know how value added indexing in a dataspace can expand the access to a document in an often unrelated context?
  • Do you know where metadata are in a document, in a Web page or other container housing the document, or in the dataspace created for the information objects?

If not, you will want to dig up this information yourself. Asking your attorney will result in a very large legal bill. One final question: Do you think Mr. Madoff knows about his metadata?

Stephen Arnold, January 8, 2009

Non-Techies and Metadata

January 8, 2009

The metadata quandary for legal eagles will stick like Kentucky mine runoff. If you want to make sure your Word documents are metadata free, you will want to read “How to Remove the Hidden Metadata from Word Document” here. A slightly more interesting exercise is to aim a search engine’s content acquisition system at shared folders and browse what the spider catches in its digital web. If you think metadata are a liability, check out the goodies you harvest. Download any desktop indexing system that can access your network shares. Now you know why eDiscovery is so important and often quite interesting for those paid to pore over metadata.
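For a quick look before a file leaves the building: a Word 2007 .docx file is a ZIP package, and its core document properties (creator, last modified by, dates, title) sit in the part `docProps/core.xml`. Here is a minimal Python sketch, standard library only, that lists those properties. It is a peek, not a scrubber. It does not read `docProps/app.xml` (company, template), custom properties, comments in `word/comments.xml`, or tracked changes inside `word/document.xml`, and it does not handle the older binary .doc format. The function name `core_properties` is my own.

```python
import sys
import xml.etree.ElementTree as ET
import zipfile

CORE_PART = "docProps/core.xml"


def core_properties(docx_path):
    """Return {property_name: value} for the core properties of a .docx file."""
    with zipfile.ZipFile(docx_path) as package:
        if CORE_PART not in package.namelist():
            return {}
        root = ET.fromstring(package.read(CORE_PART))
    props = {}
    for element in root:
        # Tags look like "{namespace-uri}creator"; keep only the local name.
        local_name = element.tag.split("}", 1)[-1]
        if element.text and element.text.strip():
            props[local_name] = element.text.strip()
    return props


if __name__ == "__main__":
    for path in sys.argv[1:]:
        print(path)
        for name, value in sorted(core_properties(path).items()):
            print(f"  {name}: {value}")
```

Point it at a few files from a shared folder and the “lastModifiedBy” entries alone can be an education.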

Stephen Arnold, January 8, 2009
