Google Forms: A Data Snout for a Bigger Creature
April 12, 2008
Navigate to Google’s Webmaster Central Blog. Scan the posting written by two wizards whom you probably don’t know, Alon Halevy (senior wizard) and Jayant Madhavan (slightly less senior wizard). Here’s what you will be told in well-chosen, Googley prose:
In the past few months we have been exploring some HTML forms to try to discover new web pages and URLs that we otherwise couldn’t find and index for users who search on Google. Specifically, when we encounter a <FORM> element on a high-quality site, we might choose to do a small number of queries using the form. For text boxes, our computers automatically choose words from the site that has the form; for select menus, check boxes, and radio buttons on the form, we choose from among the values of the HTML. Having chosen the values for each input, we generate and then try to crawl URLs that correspond to a possible query a user may have made. If we ascertain that the Web page resulting from our query is valid, interesting, and includes content not in our index, we may include it in our index much as we would include any other web page.
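The quoted procedure picks words from the site for text boxes, takes the declared values for select menus, check boxes, and radio buttons, and then builds a small number of query URLs to crawl. A few lines of Python can sketch it. This is a minimal illustration under stated assumptions, not Google’s crawler. It handles only the first GET form on a page and treats every check box as always submitted. It caps output with a hypothetical max_queries parameter, and it skips the step where the resulting pages are judged valid, interesting, and new. The names FormCollector, candidate_urls, and site_words are hypothetical.

```python
from html.parser import HTMLParser
from itertools import islice, product
from urllib.parse import urlencode, urljoin


class FormCollector(HTMLParser):
    """Records the action and candidate input values of the first GET form on a page."""

    def __init__(self):
        super().__init__()
        self.action = None   # stays None if no GET form is found
        self.inputs = {}     # input name -> list of declared values, or None for a text box
        self._in_form = False
        self._done = False
        self._select = None  # name of the <select> currently open, if any

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "form" and not self._done:
            # Assumption: only GET forms are turned into crawlable URLs.
            if (a.get("method") or "get").lower() == "get":
                self._in_form = True
                self.action = a.get("action") or ""
            return
        if not self._in_form:
            return
        name = a.get("name")
        if tag == "input" and name:
            kind = (a.get("type") or "text").lower()
            if kind in ("text", "search"):
                self.inputs[name] = None  # free text: filled with site words later
            elif kind in ("radio", "checkbox"):
                self.inputs.setdefault(name, []).append(a.get("value") or "on")
        elif tag == "select" and name:
            self._select = name
            self.inputs.setdefault(name, [])
        elif tag == "option" and self._select and a.get("value") is not None:
            self.inputs[self._select].append(a["value"])

    def handle_endtag(self, tag):
        if tag == "form" and self._in_form:
            self._in_form, self._done = False, True
        elif tag == "select":
            self._select = None


def candidate_urls(page_url, html, site_words, max_queries=10):
    """Returns up to max_queries GET URLs, one per combination of input values."""
    parser = FormCollector()
    parser.feed(html)
    parser.close()
    if parser.action is None:
        return []
    names = list(parser.inputs)
    # Text boxes draw from words found on the site; menus and buttons use their own values.
    choices = [site_words if parser.inputs[n] is None else parser.inputs[n] for n in names]
    base = urljoin(page_url, parser.action)
    # islice keeps the number of generated queries small.
    combos = islice(product(*choices), max_queries)
    return [base + "?" + urlencode(list(zip(names, combo))) for combo in combos]


demo = """
<form action="/search" method="get">
  <input type="text" name="q">
  <select name="dept"><option value="books">Books</option><option value="music">Music</option></select>
</form>
"""
for url in candidate_urls("http://example.com/index.html", demo, ["kentucky", "search"], max_queries=3):
    print(url)
```

The demo prints three URLs, starting with http://example.com/search?q=kentucky&dept=books. Taking the Cartesian product of the values and truncating it is the crudest way to keep the query count small. A production crawler would presumably pick values more selectively.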
The idea is that dynamic content does not usually appear in an index. On the public Internet, this type of content is useful to me. For example, when I want to take a Southwest flight, I have to fill in some annoying Southwest forms, fiddle with drop-down boxes, and figure out exactly which fare is likely to let me sit in one of the “choice” seats by boarding first. Wouldn’t it be great to be able to run a query on Google, see the flights aggregated, and from that master list jump to the order form? Dynamic content is now becoming more common.
I heard from one wizard at a conference in London that dynamic content now accounts for more than half of the content appearing on the Web. The shift from static to dynamic is, therefore, a fundamental change in the way Web plumbing works, from Web log content management systems to the sprawling craziness of Amazon.com.
A diagram from Dr. Guha’s patent applications with the Context Server shown in relation to the other parts of the PSE. This is a figure from Google Version 2.0: The Calculating Predator, published by Infonortics, Ltd., Tetbury, Glou. in July 2007. Infonortics holds the copyright to this study and its contents.
The Importance of Being First
April 11, 2008
Alex Moskalyuk’s Web log contained a posting on April 10, 2008, that asserted “68 percent of search engine users click on the first page of results.” The story appeared in his Web log on Ziff-Davis’ ZDNet.com site. These data can be tough to find after a few days. Please, access the story and capture the data, which are from iProspect, a unit of the Aegis Group.
I am skeptical of usage data from Internet consultancies and search engine optimization companies. With that caveat in mind, the iProspect data reveal a significant trend in search system user behavior. Specifically, over time–if the data are accurate–users click on the first page of results only. The chart below illustrates this trend:
The top line is climbing, and it means that almost half of the users on Web search systems click on the first page of results. No real surprise, I suppose. The two other lines underscore the fact that fewer and fewer users are working through laundry lists of results. If these data are accurate, information on any page other than the first is not likely to be reviewed by a user.
What’s this mean for enterprise search (sometimes called Intranet search or behind-the-firewall search)? Users won’t spend much time looking for information if it is not slapped in front of their faces. Keyword search in organizations is generally a push cart filled with items that may or may not be pertinent to the employee’s query. If consumer behavior carries over to enterprise searchers, any system that takes a query such as “Acme proposal” and generates lists of results is going to be annoying.
Enterprise search system users need information to do their jobs, so the laundry list is almost a cinch to be more work than hunting for the needed information in other ways.
The iProspect data have another hook for me. As more young people enter the work force, Web behaviors are going to color their expectations of online search in their employer’s organization. Google and Microsoft personalize results, using probabilities to deliver a best guess about what a particular person needs. By comparison, traditional enterprise search systems that serve up laundry lists are going to attract fewer and fewer enthusiastic users.
As reports about deep-seated dissatisfaction with traditional enterprise search and content processing systems become more widely known, Mr. Moskalyuk’s Web log has provided another chunk of suggestive, interesting data. More details about enterprise search are needed, but in the search business, we have to take what the vendors provide. Like it or not.
Stephen Arnold, April 11, 2008
ArnoldIT.com Headquarters
April 10, 2008
ArnoldIT.com is delighted to announce that it has moved to new headquarters in Harrod’s Creek, Kentucky.
In response to two questions about the location of Harrod’s Creek, ArnoldIT.com has released a photograph of its spacious, state-of-the-art offices.
Harrod’s Creek is one of North America’s high-technology centers. Our staff filters enterprise search news to separate the goose feathers from the giblets. Contact ArnoldIT.com by writing to sa at arnoldit.com.
Stephen Arnold, April 10, 2008
Gartner and the GOOG: Is Google Failing in the Enterprise?
April 10, 2008
The Ziff Davis / eWeek story stopped me in my tracks. Chris Boulton, a fine, fine journalist, wrote a story the ZD editors called “Gartner: Google Doesn’t Understand the Enterprise”. (Read this story before it disappears from the eWeek Web site.) The hook for the piece is a Gartner professional’s assertion that:
Google Apps is like a “fog rolling into the harbor,” permeating businesses quite possibly at the expense of Microsoft and IBM.
Allegedly Gartner pundit Tom Austin asserted that
“Clients are calling us about GAPE [Google Apps Premier Edition],” Austin said. “They will use it as a bat to beat Microsoft or IBM to make them lower the cost of their software.”
The remarks appear to originate in a talk at the Gartner Symposium ITxpo on April 9, 2008. The most telling part of this article, if Mr. Boulton heard correctly, is:
In a line of reasoning echoing Microsoft Chairman Bill Gates’ claims that Google doesn’t understand businesses’ needs, Austin said that Google doesn’t understand the enterprise. It is not that the company can’t, he said, it is that Google doesn’t care to understand the enterprise. For example, while Microsoft and IBM offer customers five-year roadmaps under non-disclosure agreements, Google’s roadmap is one day at a time.
If true, Gartner must know a great deal more about Google’s enterprise success than I do. My sources tell me that Google is struggling to stay on top of the wave of success with its map and geo-spatial services. Google is reacting to customer requests, at least in the US government sector from what I hear from those familiar with canvas cubes in Washington, DC. My research about enterprise search revenues indicates that Google now has more than 9,000 licensees of its Google Search Appliance. This product generated somewhere around $350 to $400 million in calendar 2007 and is growing at double digit rates. The various applications, enhanced email, and messaging functions are pulling inquiries as well. In short, the Google is disrupting the traditional enterprise market on several fronts. Google lets customers pull Google to them. Google doesn’t push for sales like most enterprise software vendors.
My hunch is that Google’s “fog-like” behavior translates to sour grapes because Google is somewhat reluctant to shovel cash into the maw of the high-end IT consultancies for guidance. Google’s reliance on “pull” tactics is a challenge for some traditional consulting firms like Booz, Allen & Hamilton, where I worked. Google has plenty of wizards and gurus on staff. If a pundit is Googley, that consultant will probably work for Google. This is a difficult concept for some for-hire experts to accept. But that’s just my interpretation of the matter.
I think Mr. Boulton got the story right. Could it be that Gartner doesn’t understand Google?
Stephen Arnold, April 10, 2008
Absolutes and Electronic Information
April 9, 2008
I find the research for my work fascinating. Periodically I root through some of the PDFs and PowerPoints used in my public talks.
Information in 2001
Today, while consolidating some information from a soon-to-be-retired NetFinity 5500, I came across a presentation I made to the legal information giant, Lexis Nexis, in year 2001.
The presentation sure didn’t win me any buddies in this $1 billion a year unit of the Euro-giant Reed Information. Reed, like the Thomson Corporation, maintains a low profile. Most people are unaware of what these two professional publishing companies do for a living, and I am not going to tell you that. You will have to figure it out for yourself.
My talk was given at some golf resort, and I don’t golf. I sat on my tail feather and waited to deliver my talk, which I titled “Information Professionals and In-Phase Services”. The main idea behind the talk was that anyone who used information for a living (lawyers, consultants, intelligence officers, and financial analysts) wanted current information in the context of their work.
The idea of stopping one thing to go ferret out a missing piece of information is growing long in the tooth. No, “long in the tooth” is too gentle even seven years after I wrote this presentation. Stupid, ill-advised, crazy, dumb: these are much more appropriate words. In 2000, it was obvious, based on my research, that savvy users of information wanted information from one screen or dashboard. Furthermore, that information should be [a] comprehensive, [b] current or fresh, and [c] in a form that allowed it to be cut-and-pasted or recycled without annoying manual reformatting.
I used this quote from Emily Dickinson to catch the crowd’s attention: “The truth must dazzle gradually / Or every man be blind…” No one knew what the heck I was talking about. To help the audience along, I used this chart from Forbes Magazine, October 2, 2000:
The point of this study is that humans–more than two thirds of them in 2000–want fixed points in their lives. The notions of change, flux, transformation made people uncomfortable. The chart did little to win my audience’s confidence in my talk because I then told the group, “Absolutes are rarely found when we talk about electronic information.”
Search Hoops: Exercising Technology to Meet User Needs
March 29, 2008
A “hoop” is a circular band that binds a barrel’s staves together. “Hoops” has a more informal meaning; the word is a synonym for basketball. In Kentucky, you say, “The Louisville Cardinals shoot serious hoops.” This sentence won’t make much sense in Santiago, Chile, but it does at the local gas station.
Search “hoops” are different. These are technical spaces that make it possible for a person to look for information. The figure below shows a series of search hoops. I want to take a few minutes to talk briefly about each of these with particular emphasis on their relationship to behind-the-firewall search. As you know, I think the term enterprise search is essentially valueless. It’s become an audible pause mouthed by vendors of many shapes and sizes. When I hear it, I’m baffled. Truth be told, most of the vendors who use the term enterprise search don’t know what it means. The job of explaining its meaning is left to the pundits and mavens who earn a living blowing smoke to explain fuzziness. Visibility and comprehension hit the two to four inch range.
This is a diagram from a report I wrote for a company silly enough to pay me for an analysis of the online search-and-retrieval trends in the period 1975 to 2003. I have an updated version, but that’s something I sell to buy my beloved boxer dog Tyson Kibbles and Bits.
© Stephen E. Arnold, 2002-2008
Please, click on the image so you can read the textual annotations to each of the rings. I’m not going to repeat the information in the diagram’s annotations. I will relate these “hoops” to the challenge of behind-the-firewall search.
Northern Light: A New Business Information Search Service
March 27, 2008
Northern Light has made a free business information search service available. You can try it yourself at www.nlsearch.com. Search and browse are free, but you will have to pay to access certain content. A day pass is priced at about $5.00 and enterprise licenses are available.
Northern Light, in the mid-1990s, offered a somewhat similar service. The company received an infusion of capital from Reuters in 1999. By 2002, the company had become part of the now-defunct divine Interventures. Northern Light is once again a self-standing company. David Seuss, the former consultant who founded the firm, is once again running Northern Light.
Northern Light was one of the first search systems to enhance its results list with folders grouping similar results. More information is available from the Northern Light Web site. Information Today’s Paula Hane’s story has additional details about the service here.
Stephen Arnold, March 27, 2008
Search: A Kitchen Sink and the Carcassonne Problem
March 25, 2008
As I worked on my keynote for the upcoming Buying and Selling eContent Conference in April 2008, I flipped through PowerPoint decks in search of examples. I came across a presentation I delivered in the summer of 2006. In that talk, I described behind-the-firewall search as following an interesting trajectory. Humans have a tendency to elaborate, embroider, and complicate.
Let me give you an example. My mother and father recently moved from their home to a condominium-style dwelling. The “space” was a blank canvas. After a year, I noticed that the white space was filled in. Some of the objects were family mementos like the hand-carved ebony elephant that has been in the Arnold family for a century. But other acquisitions were plaques identifying my mother as a “red hat lady”. My father had taped instructions for replacing the cartridge in his printer next to his flat panel monitor. In short, the white space was being filled in.
I noticed a similar “stuffing” when I was in Carcassonne, the walled city in Aude. Every square inch inside the city walls had been put to use.
Civita: The Paradox of Disintermediation
March 19, 2008
In December 2007, Antonio Maccanico, director, Associazione Civita in Rome, Italy, asked me to contribute an essay to a forthcoming publication focused on new information and communications technology. The full essay will not appear in print until later in 2008, but I wanted to highlight several of the points in my essay because each is germane to the tumultuous search and content processing sector. When the Italian language publication becomes available, I will post a link to the full text of my essay “Open Access and Other New Communication Technology Plays: The Temptation and Paradox of Disintermediation Roulette”.
First, the title. One of the issues that arises when a new search or content processing technology becomes available is its usefulness. Most vendors assert that their newest system brings numerous benefits to a licensee, user, or business partner. A positive, optimistic outlook is one of the essentials of mental health. However, I’ve learned to be conservative when it comes to benefits. This essay for Associazione Civita reminds the reader that many new information technologies are powerful disintermediators.
Disintermediation means cutting out the middle man or woman. If it is possible to buy something more cheaply directly from the manufacturer, many people will. The savings can be a few pennies or orders of magnitude. Information technology disintermediates. In my experience, this is a categorical affirmative. The benefit of information technology, particularly search and content processing, is that it creates new opportunities. We are in the midst of an information discontinuity. Publishers, the classic intermediaries between authors and readers, are learning about disintermediation as I keyboard this summary. Libraries continue to struggle with disintermediation as students rely on Google, not reference books, for research. The paradox, then, is that dislocation is inevitable. So far, the information revolution has created more opportunities overall. Users are “winners”. Some entrepreneurs are “winners”. Some traditional operations are trying to adapt lest they become “losers”.
Second, the core of my argument in this essay for Associazione Civita boils down to three issues. Let’s look at each briefly. Please, appreciate that I am extracting a segment from a 20-page essay:
- Web sites, Web services, and Web applications do not guarantee success. In fact, inexperience or bad decisions about what to “Web-ify” can drag an organization down and, in terms of revenue, plunge the operation into the red. Therefore, significant effort is required to create a browser experience that attracts users and continues to build usage. The costs of development, enhancements, and sales are often far greater than expected. In terms of search and content processing, customers learn (often the hard way) that there is neither money nor appetite for making the system perform as advertised. I see no change in this paradoxical situation. The more you want to do with content, the farther behind you fall.
- Information on its own won’t ensure success. Users are now savvy when it comes to access, interface, ease of use, and clarity. I learned yesterday about a new search system that uses the Apple iPhone “flipping page” metaphor to display search results. In the view of the venture firm pumping millions into this start-up, the interface, not a list of relevant results, is as important as clever algorithms. I never thought I would say this, but, “I agree”. A flawed user experience can doom a superior search and content processing system within 30 seconds of a user’s accessing the service.
- Assumptions have to be verified with facts. Echoing in my mind is a catch phrase that President Reagan made famous: “Trust but verify”. One of the twists in the information world is that the snazzier the demonstration, the greater the gullibility factor. A “gullibility factor” is a person’s willingness to accept the demo as reality. Assumptions about what search and content processing can do contribute to most information retrieval project failures. We stop at “trust” and leapfrog over “verify”.
What happens when a system works well? What takes place when an entrepreneur “invents” a better mouse trap? What takes place when senior management uses a system and gets useful results quickly and without an engineer standing by to “help out the suit”?
Disintermediation. When systems work, the likelihood of staff reductions or business process modification goes up. The idea is that software can “reduce headcount”.
This issue is particularly sensitive for libraries, museums, academic institutions, and certain citizen-facing services. The more effective a system is, the easier it is to justify marginalizing certain institutions, people, and manual work processes. As we pursue ever more potent search and content processing, keep in mind that the imperative of disintermediation follows closely behind.
Stephen Arnold, March 19, 2008
Googzilla’s Assault and Old Media’s Nuclear Winter
March 15, 2008
Last summer, I gave a short talk to the Board of Directors of one of the US’s major news gathering organizations. I knew what to expect. I had worked for Barry Bingham Jr., owner of the once-prestigious Courier Journal & Louisville Times Co. After I left the Courier Journal a year or so after Gannett bought the Louisville mini-conglomerate, I worked for Ziff Communications. Bill Ziff was a media sensation. He created niche magazines, built the properties to a peak, and then, almost without fail, sold them at a premium price. Mr. Ziff was not “old media”; he was among the first “new media” innovators. Health problems forced changes at Ziff, and his empire was sold. His departure from the New York publishing sector was, I believe, a respite for the news and publishing companies against which he and his staff had waged commercial warfare.
Old Media Wants to Understand “the Google”
Last year, I was entering the sacred confines of “old media” in the Big Apple. My task was to participate in one of those interview-style presentations so much in vogue. A polished “old media” journalist was going to pepper me with questions. To make the interview more spontaneous, I was not given the questions in advance. Goodness, I had to match wits with an “old media” alchemist. I knew only that the subject was Google, an annoyance that continued to challenge the hegemony of “old media”. Even though I was tainted by my association with Mr. Ziff, I assumed that this tweedy group perceived me as sufficiently tame to “explain” what Google was doing to “old media’s” revenues.
Google, then as now, won’t answer my questions. I’m a lone goose consultant who spends his time reading Google technical papers, watching dull Google videos in which Googlers explain another technical facet of Google, and reading patent applications filed by Google’s equally quotidian attorneys. But to the dewans (chief ministers of India’s princely states) in the audience, I was marginally less annoying than real Googlers.
The “interview” clanked over well-worn cobblestones. The leitmotif was advertising, specifically money that Google was “taking” from old media. Each question fired at me meant, “How can these guys make so much money doing online things?” Pretty sophisticated questions, right?
The Business Model Problem
Newspapers and magazines sell advertising. Subscriptions contribute a modest amount to the bottom line. Historically, each “new medium” allows ad revenue to flow from “old media” (Point A) to “new media” (Point B). Radio siphoned off ad money, so newspapers like the Courier Journal got into the radio business. When I joined the Courier Journal, the newspaper owned AM and FM radio stations. When TV came along, more ad “blood” flowed. The Courier Journal bought a television station. When databases came along, the Courier Journal entered the database business. Most “old media” watched the Courier Journal’s online antics from the sidelines. Google has figured out the shortest distance from Point A to Point B.
Revenue from commercial online in the 1980s did not come from advertising. Customers paid for access, dealing with specialized timesharing companies. Ziff entered the online business in three ways. We offered electronic products tailored to information technology professionals. Our sales force called on companies and sought licensing deals. Ziff also entered the commercial database business, quickly becoming by the late 1980s one of the dominant players in full text and reference databases for libraries. And, we also struck a deal with CompuServe and created a Ziff branded online service called ZDNet. Both the Courier Journal and Ziff worked long and hard to make money from online. It’s a tribute to the professionals in the Courier Journal’s online business and to Ziff’s various electronic publishing units that both organizations generated positive cash flow.
Google’s Secret
Google, on the other hand, had something that my colleagues at the Courier Journal and Ziff Communications lacked. No, it wasn’t smart people. No, it wasn’t better programmers. Google had a business model “borrowed” from Yahoo-Overture and the burgeoning online “environment” generally described as the Internet. By 2004, when Google went public, Google’s business model and the exploding user base of “the Internet” ignited Google’s online advertising business.
In less than three years, Google had poked its business model into numerous nooks and crannies of “old media”. Unlike the tame online services of the 1980s, Google’s approach was operating “off the radar” of “old media”. Old media used traditional ad sales mechanisms. Old media thrived on an inner circle of competitors who behaved in a polite, club-like environment. Old media called the shots for advertisers, who, by and large, had no choice but to deal with “old media” on “old media’s” terms.
Not Google. Google allowed anyone to buy an ad online. Anyone could sidestep “old media” and traditional advertising rules of engagement. More disturbing, other companies were looking at Google’s business model and trying to squeeze money from electronic ads. None of these competitors played by the rules crafted over decades of “old media” fraternizing. Disintermediation, the Internet, and a better business model — these are the problems “old media” has to resolve and quickly.
So, there I was. I answered questions about Google’s ad business. I answered questions about Google’s technical approach to online. I answered questions about Google’s arrogance. I don’t think the interviewer or the audience found my answers to their liking. I can say in retrospect that nothing I said about Google made any sense to these “old media” types. I could have been speaking Thagojian, and the effect would have been the same. “Old media” didn’t understand what the Courier Journal did in 1980, what Ziff did in 1990, or what Google was doing in the early years of the 21st century.
Watching the Money Disappear
Why am I dragging you through history and this sordid tale of my sitting under hot lights answering questions about a company that won’t answer my email? This post caught my attention this morning (6 am, March 15, 2008) in the Charlotte Airport: “Google Sucks Life Out of Old Media: Check Out The 2007 Share Shift” by Henry Blodget.
The gist of Mr. Blodget’s Web log post is: “The year-over-year growth of revenue on Google.com (US)–approximately $2 billion–was more than twice as much the growth of ad revenue in all of the offline media companies in this sample combined. This is such an amazing fact that it bears repeating: A single media property, Google.com (US), grew by $2 billion. All the offline media properties owned by the 13 offline media companies above, meanwhile–all of them–grew by about $1 billion.”
What this means is that “old media” are going the way of the dodo unless the “old media” club gets its act together. One of the more controversial statements I made to the dewans in their posh NY digs was, “Surf on Google.” The idea is simple. Google is the big dog in the advertising kennel. Instead of watching Googzilla eat your lunch, find a way to harness Google. I use the phrase Surf on Google to connote sitting down and figuring out how to create new revenue using Googzilla as an engine, not an enemy.
Problems with Newspapers
I was speaking some unintelligible language to these “old media” dewans. Even an old dinosaur like me listens to an iPod, reads news on a mobile device, and often throws out, unopened, the print newspapers I receive each day. Why don’t I look at these traditional news injection devices? Let me count the ways:
- Courier Journal. It just sucks. The recycled news is two or three days “old” when it hits print. I get current information via RSS, Google News, Yahoo News, and the BBC Web site, among others.
- Financial Times. I get a paper three days out of six. This outfit can’t work out delivery to Harrods Creek despite its meaty price tag.
- New York Times. I look at the Monday business section and maybe flip through the paper. I no longer spend an hour with the Sunday New York Times.
- USA Today. I look at McPaper’s weather map and scan its TV grid to see if the History Channel is running the “Naked Archaeologist,” my current must-see program.
- Wall Street Journal. I scan the headlines and check out the tech information. The banks for which I work expect me to know what the Journal says but I’m not sure my clients read the paper very thoroughly any more. Online is easier and quicker.
People in my son’s cohort spend less time with “old media” than I do. When I sit in airports, I watch what college students do. My sample is small, but I don’t see many 20-somethings immersed in “old media”. If you want to understand what young people do for news, embrace ClickZ stats. It’s free and useful.
I find it encouraging that the Wall Street Journal, the New York Times, and the Financial Times “reinvent” their Web sites again and again. But the engine of “old media” is advertising, and no spiffy Web site is going to make up for lost ad revenue.
Did my statement in June 2007 “Surf on Google” have an impact? Not that I can see. “Old media” are building a fort out of dead trees, traditional technology, and battle tactics used by cities besieged by Alexander the Great. The combatant — Google — is armed with nuclear weapons and is not afraid to use them.
For “old media,” Mr. Blodget’s summary of the financial devastation confirms that “old media” now find themselves in a nuclear winter. There are some fixes, but these are not easy, not comfortable, not traditional, and not cheap. I’m glad I’m in the sunset of my career and no longer sitting in meetings trying to figure out how to out-Google Google. Innovation, not contravallation, is needed. Can “old media” respond? I’m betting on Google and its progeny.
Stephen Arnold, March 15, 2008