Google Forms: A Data Snout for a Bigger Creature
April 12, 2008
Navigate to Google’s Webmaster Central Blog. Scan the posting written by two wizards whom you probably don’t know, Alon Halevy (senior wizard) and Jayant Madhavan (slightly less senior wizard). Here’s what you will be told in well-chosen, Googley prose:
In the past few months we have been exploring some HTML forms to try to discover new web pages and URLs that we otherwise couldn’t find and index for users who search on Google. Specifically, when we encounter a <FORM> element on a high-quality site, we might choose to do a small number of queries using the form. For text boxes, our computers automatically choose words from the site that has the form; for select menus, check boxes, and radio buttons on the form, we choose from among the values of the HTML. Having chosen the values for each input, we generate and then try to crawl URLs that correspond to a possible query a user may have made. If we ascertain that the Web page resulting from our query is valid, interesting, and includes content not in our index, we may include it in our index much as we would include any other web page.
The idea is that dynamic content does not usually appear in an index. On the public Internet, this type of content is useful to me. For example, when I want to take a Southwest flight, I have to fill in some annoying Southwest forms, fiddle with drop down boxes, and figure out exactly which fare is likely to let me sit in one of the “choice” seats by boarding first. Wouldn’t it be great to be able to run a query on Google, see the flights aggregated, and from that master list jump to the order form? Dynamic content is now becoming more common.
I heard from one wizard at a conference in London that dynamic content is now more than half of the content appearing on the Web. The shift from static to dynamic is, therefore, a fundamental change in the way Web plumbing works, from Web log content management systems to the sprawling craziness of Amazon.com.
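The form-probing steps in the Google posting quoted above can be sketched in a few lines of Python. This is my own illustration, not Google's code. The function names, the example URL, and the word-frequency rule for filling text boxes are all assumptions; the posting says only that words are chosen "from the site that has the form." The sketch also stops at generating URLs. It does not crawl them or judge whether a result page is valid, interesting, or new to the index.

```python
from collections import Counter
from itertools import islice, product
from urllib.parse import urlencode
import re


def pick_keywords(site_text, how_many=2):
    """Pick the most frequent longer words from text already seen on the site.

    Frequency ranking is an assumption; the posting does not say how words are chosen.
    """
    words = re.findall(r"[a-z]{4,}", site_text.lower())
    return [word for word, _ in Counter(words).most_common(how_many)]


def candidate_urls(action, text_inputs, choice_inputs, site_text, max_queries=5):
    """Build a small number of GET URLs that mimic a user filling in the form.

    action        -- the form's target URL (hypothetical in the example below)
    text_inputs   -- names of the form's text boxes
    choice_inputs -- dict mapping select/radio/checkbox names to the values in the HTML
    """
    names = []
    value_lists = []
    keywords = pick_keywords(site_text)
    for name in text_inputs:
        # Text boxes get words drawn from the site itself.
        names.append(name)
        value_lists.append(keywords)
    for name, options in choice_inputs.items():
        # Menus and buttons get the values already present in the HTML.
        names.append(name)
        value_lists.append(list(options))
    # Cap the number of combinations: "a small number of queries."
    combos = islice(product(*value_lists), max_queries)
    return [action + "?" + urlencode(list(zip(names, combo))) for combo in combos]
```

Calling candidate_urls("http://example.com/search", ["q"], {"city": ["LOU", "DAL"]}, page_text, max_queries=3) returns at most three URLs, each pairing one site keyword with one menu value. The cap matters: the number of combinations grows multiplicatively with each input on the form.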
A diagram from Dr. Guha’s patent applications with the Context Server shown in relation to the other parts of the PSE. This is a figure from Google Version 2.0: The Calculating Predator, published by Infonortics, Ltd., Tetbury, Glou. in July 2007. Infonortics holds the copyright to this study and its contents.
Enterprise Search: Disappointing and Annoying Users
April 11, 2008
Sinequa Shines a Bright Light in Enterprise Search’s Darkest Corner
Sinequa published the results of a survey of 200 users of enterprise search (sometimes called Intranet search or behind-the-firewall search). Users are dissatisfied and work in “information grave yards.”
Enterprise search is different from Web search. A Web search system such as those available from Google.com or Yahoo.com indexes content on the public Internet. Enterprise search indexes information that an organization has on its own servers. The differences boil down to search technology itself. What works on billions of Web pages is not well suited for the content on an organization’s servers. Sure, there are some gross similarities, but security, the diverse nature of the content, and the existence of many versions of certain documents require special functions generally not implemented in a public search system such as Microsoft’s search.live.com. Microsoft, in an effort to get technology suitable for enterprise search, is paying $1.2 billion for Fast Search & Transfer, a company that has asserted leadership in enterprise search. Fast Search’s executives have 1.2 billion reasons to make that claim.
The enterprise sector is an intensely competitive business sector. The definition of the word search itself is fluid and subject to many different shades of meaning. In the last five years, key word retrieval–typing one or two words into a search box–has expanded to embrace point-and-click interfaces. Here’s a point-and-click interface available from Yahoo. Notice the search box. But the most important parts of the Web page are those that contain suggestions, hot links, and information directly germane to a user. In days gone by, this would have been called a portal. Not today. Now these assisted navigation interfaces are called search.
These new interfaces have a large number of moving parts. You can see some of the plumbing in the illustrations accompanying these Entopia and Sagemaker business case analyses published on this Web log. Enterprise search means hugely complex systems that find themselves a cross between a digital Swiss Army knife and a computerized information factory. This combination of very specific tools and a huge, sprawling technical infrastructure makes many of these systems expensive and complicated.
Until the Sinequa study, which corroborated the findings in my new study Beyond Search: What to Do When Your Enterprise Search System Doesn’t Work, few people were aware of the growing dissatisfaction with enterprise search systems. A quick review of the analyst reports from well-known pundits and the marketing collateral of competitors shows that they rarely talk about dissatisfied users. Vendors like Autonomy and Google may snipe at one another, but none of the vendors hints at their systems or even their competitors’ systems irritating users, creating dissatisfaction, and forcing employees to find their own search solutions by ordering a Google Search Appliance and saying, “My department will index its own information, thank you.”
Google App Engine: Googzilla’s Slow, Small Baby Steps
April 10, 2008
Google is reasonably transparent–if you know the angle from which to observe the company. Viewed head on, Google sells only ads, generates billions. On one hand, the company races forward with betas, new engineering initiatives, and acquisitions. On the other hand, Google goes ever so slowly in converting its technological advantages into cold, hard cash.
Wall Street looks at the billions in ad revenue. Notes the $400 million from enterprise sales. End of story. Most Google pundits follow the same track. Its Google Search Appliance is not given much respect by the 150 companies competing in the search and content processing market. The scores of products available are quirks or footnotes to the larger story of Google’s ad revenue.
Absolutes and Electronic Information
April 9, 2008
I find the research for my work fascinating. Periodically I root through some of the PDFs and PowerPoints used in my public talks.
Information in 2001
Today, while consolidating some information from a soon-to-be-retired NetFinity 5500, I came across a presentation I made to the legal information giant, Lexis Nexis, in 2001.
The presentation sure didn’t win me any buddies in this $1 billion a year unit of the Euro-giant Reed Information. Reed, like the Thomson Corporation, maintains a low profile. Most people are unaware of what these two professional publishing companies do for a living, and I am not going to tell you that. You will have to figure it out for yourself.
My talk was given at some golf resort, and I don’t golf. I sat on my tail feather and waited to deliver my talk, which I titled “Information Professionals and In-Phase Services”. The main idea behind the talk was that anyone who used information for a living (lawyers, consultants, intelligence officers, and financial analysts) wanted current information in the context of their work.
The idea of stopping one thing to go ferret out a missing piece of information is growing long in the tooth. No, “long in the tooth” is too gentle even seven years after I wrote this presentation. Stupid, ill-advised, crazy, dumb–these are much more appropriate words. In 2000, it was obvious–based on my research–that savvy users of information wanted information from one screen or dashboard. Furthermore, that information should be [a] comprehensive, [b] current or fresh, and [c] in a form that allowed it to be cut-and-pasted or recycled without annoying manual reformatting.
I used this quote from Emily Dickinson to catch the crowd’s attention: “The truth must dazzle gradually / Or every man be blind…” No one knew what the heck I was talking about. To help the audience along, I used this chart from Forbes Magazine, October 2, 2000:
The point of this study is that humans–more than two thirds of them in 2000–want fixed points in their lives. The notions of change, flux, transformation made people uncomfortable. The chart did little to win my audience’s confidence in my talk because I then told the group, “Absolutes are rarely found when we talk about electronic information.”
Time: Your Search Infrastructure’s Deadliest Enemy
April 8, 2008
Look at these two diagrams.
The first diagram–the one with a single red arrow–represents the volume of content your search and content processing system must process. Unless you work in a very unusual organization, you and your colleagues produce digital information. Let’s confine our discussion to text, but keep in mind that your organization will also produce digital images, audio, and video. At some point your search or content processing system will have to deal with these types of content as well.
The story in the chart with the single arrow is simple: each day the amount of content increases. The content is very distinct, and it falls into one of three broad categories. Some content is original; that is, there is no previous version or draft. You or one of your colleagues generates an original document and stores it on her computer. If your colleague is a teleworker, she might upload the document to your company’s server.
Beyond Search Published by the Gilbane Group
April 7, 2008
Stephen E. Arnold’s newest search and content processing study is now available for purchase. Beyond Search: What to Do When Your Enterprise Search System Doesn’t Work is an electronic book in Adobe PDF format, available for immediate download.
This study contains a discussion of how-to’s, so you can fix your broken enterprise search system. You will also find a detailed discussion of today’s market for search and content processing systems, profiles of 24 vendors (many from outside the U.S.), and a plain-talk glossary that cuts through the verbal fog characterizing much of the analysis of search and content processing.
Enterprise Search–A Problem Reaching Its Boiling Point
Mr. Arnold, author of the first three editions of the Enterprise Search Report and the 2007 monograph Google Version 2.0, reveals actionable information about remediating “broken” enterprise search systems. He said, “Most vendors and IT professionals won’t talk openly about their users’ satisfaction with the incumbent search-and-retrieval system. The reason is that as many as two-thirds of a system’s users are dissatisfied with that system.”
Niche-ization: A New Source of Revenue Oxygen
April 6, 2008
A One-Minute, One-Act Comedy
Mise en scène. An industrial wasteland in Silicon Valley or a new Euro building in Stockholm.
Set up: a conference room. Nothing flashy. No windows. Plugs and Ethernet cables in a jumble under the table. So-so lighting. A screen. A projector. A plug seeking a video connection. The whisper of air conditioning.
The money folks: 30 to 50 year old guys, casual lux, recently groomed hair and nails, movie star smiles. Testosterone. Mucho testosterone.
The techno-serfs: 20 to 30 year old engineers. Mostly thin. Laptops. Dark clothes. A few hippie-dippie hair doos. Earnest. Serious and very earnest. Less testosterone.
The agenda: Generating revenue.
The testosterone-energized VC speaks,
Money person 1: “Hey, guys, let’s get started. The agenda today is pretty simple. Let me give you some background and then let’s dive right in?”
Money person 2: “Right. Right?”
Techno-serfs in unison: “Okay. Sounds good.”
Money person 3: “Let me do the background, guys. We put in $1.2 million over the last three months. You guys pulled out $400K a month. That sound right?”
Serf 1: “I think we are running under that figure. Maybe we’re at $185K this month and will use the rest for payroll next week.”
Money person 3: “Like I said, you have burned $1.2 million.”
Money person 1: “May I ask a question?”
Everyone in unison: “Sure”.
Money person 1: “What are you guys going to do to generate some revenue?”
Money person 2: “Let me rephrase that, ‘How are we doing relative to the plan?’”
Money person 3: “Screw the plan. What are you guys going to do to make some sales? Generate revenue.”
Serf 2: “We’re in beta.”
Serf 3: “We have some great leads.”
Serf 1: “The competition sucks, man.”
Money person 2: “Forget leads. Produce revenue or we’re cutting off your oxygen.”
Money person 3: “Let’s take a bio break, shall we?”
Curtain falls. House lights up.
Discussion of this Aeschylean Tragedy
Has this happened to you? If not, you’re lucky. If it has, you know what this means. The start up–what I have called the “serfs”–has not made any sales. The friendly venture capitalists–what I have called the “money person”–have shed their fraternity rush warmth. Beneath that cheerful Norman Vincent Peale veneer is the real VC illustrated below:
The Disappearing Middle: Liposuction for High-Profile Search Vendors
April 5, 2008
Last week I received a telephone call from a perky MBA working at a European investment bank. The caller exuded confidence about one of the major publicly traded vendors of enterprise search. She wanted to know if I as a contrarian would speak with her.
No problem. I enjoy being contrary–especially to perky young MBAs with exotic accent-tinged English.
I spoke with her. She was confident about her belief that a share price surge for one company was imminent or “coming round the mountain” as we say in Kentucky.
I gave her my opinion that her stallion was a donkey. Her favored company (which shall remain unidentified) had a poor track record in the money and technology races. Furthermore, no company–including her dark horse–was going to change its performance record quickly, if at all. She thanked me and disconnected without so much as a “merci” with the cute ascending note native speakers add.
Now it’s two days after this call. I am trapped on an Air Canada flight. The seat back video flickers to life, and I see a documentary about weight loss. The hero (not a sandwich) weighed about 450 pounds. After months of effort, the heroette was a mere slice of the person’s former self, weighing a lean, mean 200 pounds. Amazing what chemicals can do, I thought. I could see that the blubber around the middle was gone. I killed the annoying LCD and looked out at the frozen wasteland that makes northern Canada the inviting clime it is.
Inside the Tokamak, Part 3: The Green Spheres of Community
April 4, 2008
In the second part of this essay, I explored the notion of context. The shortcomings of key word search and retrieval are easy to identify once we think in terms of what the user needs to do his job or accomplish a task. But context is larger than a single user. Context spills into other areas as well, and it gains significance when interacting with messages and community.
We’re ready to tackle one of today’s hottest ideas–community. I loathe the term social software, but English is what it is, and I can’t figure common usage. I will stick with the word community, and you can substitute social software, so this essay seems more in step with the times. You can see where the community function sits in this schematic:
When the Internet was unknown to the auto mechanic, community, not technology, allowed Internet Protocol to work. The early Internet and its precursor, the Advanced Research Projects Agency’s ARPANET, were for a nerdy in-crowd. I was lucky. The University of Illinois in Chambana was a player in this game. But for all practical purposes, Internet access when I started college was for an elite group. Flash forward four decades, and the Internet is dependent on people communicating. The surge of interest in point-and-click services like MySpace.com and Facebook.com defines millions of people’s Internet experience.
Inside the Tokamak, Part 2: The Red Spheres of Context
April 3, 2008
In the first part of this essay, I drew a parallel between a tokamak device and plasmas. The idea is that in an organization, new technologies and increasing pressure to work smarter change what users expect a search and retrieval system to deliver. In this second installment, we look at four additional digital ions and electrons that are “going critical” with regard to information access.
Let’s begin by revisiting the diagram, paying particular attention to the 12 spheres inside the diagram’s central “gray boundary”.
The outer two stacks of “yellow spheres” and “purple spheres” exert pressure on users, vendors, and organizations. As the individual yellow and purple spheres expand, the activity inside the “gray boundary” increases. When dealing with non-linear phenomena, it is difficult to predict what will give way and what will surge to dominance. There is considerable uncertainty within the “gray boundary”.
Perhaps you have experienced this yourself. In my work in the last five years, I have documented the increasing dissatisfaction users express about their search and retrieval systems. Some comments are delivered with hope: for example, “I wish the system would let me retrieve what I need regardless of which department has the data”. Other comments are more earthy, “Management has no idea how frustrated I am with this stupid system.” In my work in New York, I have seen 20-somethings staring at a search results display with frustration and anger clouding their otherwise pampered features.
You may want to click on the diagram to see the labels of the “red spheres” more clearly. As you recall, I prepared this diagram more than five years ago, so it is long in the tooth. But it serves as a useful starting point for our exploration of the forces transforming search from a nice-to-have function to a must-have service.
The Red Spheres
There are four “red spheres” in this stack of digital ions and electrons. As is my wont, I’ll comment on each briefly. To sum up this second installment, I want to offer some additional comments about the “search” sphere. The label for the “red spheres” is contextual.