Indexing Dynamic Databased Content

April 20, 2008

In the last week, there’s been considerable discussion of what is now called “deep Web” content. The idea is that some content requires the user to enter a query. The system processes the query and generates a search result from a database. This function is easier to illustrate than explain in words.

Look at the screen shot below. I have navigated to the Southwest Airlines Web page and entered a query for flights from Louisville, Kentucky, to Baltimore, Maryland.

southwest form

Here’s what the system shows me:

southwest result

If you do a search on Google, Live.com, or Yahoo, you won’t see the specific listing of flights shown below:

southwest flight listing

Read more

Traditional Publishers: Patricians under Siege

April 19, 2008

This is an abbreviated version of Stephen Arnold’s keynote at the Buying and Selling eContent Conference on April 15, 2008. The full text of the remarks is here.

Roman generals like Caesar relied on towers spaced about 3000 feet apart. Torch signals allowed messages to be passed. Routine communications used a Roman version of the “pony express”, based on innovations in Persia centuries before Rome took to the battlefield.

Today, you rely on email and your mobile phones. Those in their teens and tweens Twitter and use “instant” social messaging systems like those in Facebook and Google Mail. Try to imagine how difficult it would be for Caesar to understand the technology behind Twitter. But how many of you think Caesar would have hit upon a tactical use of this “faster than flares” technology?

Read more

Data Bunny Unmasked

April 16, 2008

Earlier today, a well-paid, somewhat insightful senior executive ripped the fur off a 27-year charade. The keen investigative mind of the anonymous investigator revealed that the data bunny has been Stephen E. Arnold.

The shocking discovery dismayed the two known fans of Mr. Arnold. One chagrined client said:

We had no idea that Mr. Arnold was the data bunny. When he lectured at our company, we did not notice the ears. The information he conveyed was more important than his appearance. I’m not sure what he was wearing during the briefing. But now that the truth is revealed, we will not listen to his analyses if he wears those ears. I hope we don’t confuse substance and appearance again. Proper dress is more important than real information.

When Mr. Arnold learned that his secret was out of the hutch, he blinked his pink eyes and said, according to Donald Anderson, an engineer who has worked with Mr. Arnold for more than 15 years: “Those bunny ears are not funny. Mr. Arnold doesn’t wear them all the time or I just don’t notice them anymore.”

According to Mr. Anderson, Mr. Arnold’s reaction was to stamp his paw and twitch his nose in frustration. Added Mr. Anderson, “I guess he thought the secret was safe. It’s sad. Almost like Lois Lane learning the identity of Superman. It’s sad, but the truth must come out.”

According to another member of the Beyond Search team, Mr. Arnold removed his bunny ears in disgust and slipped on his new Beyond Search rubber goose mask. A photograph of Mr. Arnold in his goose disguise is the basis of this Web log’s logo here.

Beyond Search will publish more details about this startling investigative discovery as they become available. Mr. Arnold’s attorney told Beyond Search, “Although the revelation is shocking, I have advised Mr. Arnold to not reveal the name of the genius who disclosed this 27-year-old mystery.”

According to his attorney, Mr. Arnold’s final comment was, “Honk. Honk.”

Stephen Arnold, April 16, 2008

Google Forms: A Data Snout for a Bigger Creature

April 12, 2008

Navigate to Google’s Webmaster Central Blog. Scan the posting written by two wizards whom you probably don’t know, Alon Halevy (senior wizard) and Jayant Madhavan (slightly less senior wizard). Here’s what you will be told in well-chosen, Googley prose:

In the past few months we have been exploring some HTML forms to try to discover new web pages and URLs that we otherwise couldn’t find and index for users who search on Google. Specifically, when we encounter a <FORM> element on a high-quality site, we might choose to do a small number of queries using the form. For text boxes, our computers automatically choose words from the site that has the form; for select menus, check boxes, and radio buttons on the form, we choose from among the values of the HTML. Having chosen the values for each input, we generate and then try to crawl URLs that correspond to a possible query a user may have made. If we ascertain that the Web page resulting from our query is valid, interesting, and includes content not in our index, we may include it in our index much as we would include any other web page.

The idea is that dynamic content does not usually appear in an index. On the public Internet, this type of content is useful to me. For example, when I want to take a Southwest flight, I have to fill in some annoying Southwest forms, fiddle with drop down boxes, and figure out exactly which fare is likely to let me sit in one of the “choice” seats by boarding first. Wouldn’t it be great to be able to run a query on Google, see the flights aggregated, and from that master list jump to the order form? Dynamic content is now becoming more common.
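The form-probing idea in the Google quote can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Google's method: the function name `candidate_urls`, its parameters, the sample field names, and the cap of a handful of queries are all hypothetical. The sketch assumes a form that submits with GET, takes words from the site for text boxes, takes the listed option values for select menus, and builds a small number of URLs a crawler could then try.

```python
from itertools import islice, product
from urllib.parse import urlencode


def candidate_urls(action, text_inputs, select_inputs, site_words, max_queries=5):
    """Build a small number of GET URLs that mimic queries a user might submit.

    All names here are hypothetical; this is a sketch, not Google's crawler.
    action        -- URL the form submits to
    text_inputs   -- names of the form's text boxes
    select_inputs -- dict mapping select/radio/checkbox names to their values
    site_words    -- words taken from the site, used to fill text boxes
    max_queries   -- cap so the crawler makes only a small number of requests
    """
    # Collect the candidate values for every field on the form.
    fields = {}
    for name in text_inputs:
        fields[name] = list(site_words)
    for name, options in select_inputs.items():
        fields[name] = list(options)

    # A form with no fields gives nothing to probe.
    if not fields:
        return []

    names = list(fields)
    # Walk the combinations lazily and stop at the cap.
    combos = islice(product(*(fields[n] for n in names)), max_queries)
    return [action + "?" + urlencode(dict(zip(names, combo))) for combo in combos]


if __name__ == "__main__":
    urls = candidate_urls(
        "http://example.com/search",
        text_inputs=["q"],
        select_inputs={"origin": ["SDF", "BWI"]},
        site_words=["flights", "fares"],
        max_queries=3,
    )
    for url in urls:
        print(url)
```

A real crawler would still have to fetch each URL and decide whether the resulting page is valid, interesting, and not already in the index. That judgment is the hard part, and it is not shown here.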

I heard from one wizard at a conference in London that dynamic content is now more than half of the content appearing on the Web. The shift from static to dynamic is, therefore, a fundamental change in the way Web plumbing works, from Web log content management systems to the sprawling craziness of Amazon.com.

pse

A diagram from Dr. Guha’s patent applications with the Context Server shown in relation to the other parts of the PSE. This is a figure from Google Version 2.0: The Calculating Predator, published by Infonortics, Ltd., Tetbury, Glou. in July 2007. Infonortics holds the copyright to this study and its contents.

Read more

The Importance of Being First

April 11, 2008

Alex Moskalyuk’s Web log contained a posting on April 10, 2008, that asserted “68 percent of search engine users click on the first page of results.” The story appeared in his Web log on Ziff-Davis’ ZDNet.com site. These data can be tough to find after a few days. Please, access the story and capture the data, which are from iProspect, a unit of the Aegis Group.

I am skeptical of usage data from Internet consultancies and search engine optimization companies. With that caveat in mind, the iProspect data reveal a significant trend in search system user behavior. Specifically, over time–if the data are accurate–users click on the first page of results only. The chart below illustrates this trend:

PageClickData

The top line is climbing, and it means that almost half of the users on Web search systems click on the first page of results. No real surprise, I suppose. The two other lines underscore the fact that fewer and fewer users are working through laundry lists of results. If these data are accurate, information on any page other than the first is not likely to get reviewed by a user.

What’s this mean for enterprise search (sometimes called Intranet search or behind-the-firewall search)? Users won’t spend much time looking for information if it is not slapped in front of their face. Key word search in organizations is generally a push cart filled with items that may or may not be pertinent to the employee’s query. If consumer behavior carries over to enterprise searchers, any system that takes a query such as “Acme proposal” and generates lists of results is going to be annoying.

Enterprise search system users need information to do their jobs, so the laundry list is almost a cinch to be more work than hunting for the needed information in other ways.

The iProspect data have another hook for me. As more young people enter the work force, Web behaviors are going to color their expectations of online search in their employer’s organization. Faced with laundry lists when Google and Microsoft personalize results, using probabilities to deliver a best guess about what’s needed by a particular person, traditional search systems in an enterprise are going to attract fewer and fewer enthusiastic users.

With reports of deep-seated dissatisfaction with traditional enterprise search and content processing systems becoming more widely known, Mr. Moskalyuk’s Web log has provided another chunk of suggestive, interesting data. More details about enterprise search are needed, but in the search business, we have to take what the vendors provide. Like it or not.

Stephen Arnold, April 11, 2008

ArnoldIT.com Headquarters

April 10, 2008

ArnoldIT.com is delighted to announce that it has moved to new headquarters in Harrod’s Creek, Kentucky.

In response to two questions about the location of Harrod’s Creek, ArnoldIT.com has released a photograph of its spacious, state-of-the-art offices.

Harrod’s Creek is one of North America’s high-technology centers. Our staff filters enterprise search news to separate the goose feathers from the giblets. Contact ArnoldIT.com by writing sa at arnoldit.com.

arnoldithdq

Stephen Arnold, April 10, 2008

Gartner and the GOOG: Is Google Failing in the Enterprise?

April 10, 2008

The Ziff Davis / eWeek story stopped me in my tracks. Chris Boulton, a fine, fine journalist, wrote a story the ZD editors called “Gartner: Google Doesn’t Understand the Enterprise”. (Read this story before it disappears from the eWeek Web site.) The hook for the piece is a Gartner professional’s assertion that:

Google Apps is like a “fog rolling into the harbor,” permeating businesses quite possibly at the expense of Microsoft and IBM.

Allegedly Gartner pundit Tom Austin asserted that

“Clients are calling us about GAPE [Google Apps Premier Edition],” Austin said. “They will use it as a bat to beat Microsoft or IBM to make them lower the cost of their software.”

The remarks appear to originate in a talk at the Gartner Symposium ITxpo on April 9, 2008. The most telling part of this article, if Mr. Boulton heard correctly, is:

In a line of reasoning echoing Microsoft Chairman Bill Gates’ claims that Google doesn’t understand businesses’ needs, Austin said that Google doesn’t understand the enterprise. It is not that the company can’t, he said, it is that Google doesn’t care to understand the enterprise. For example, while Microsoft and IBM offer customers five-year roadmaps under non-disclosure agreements, Google’s roadmap is one day at a time.

If true, Gartner must know a great deal more about Google’s enterprise success than I do. My sources tell me that Google is struggling to stay on top of the wave of success with its map and geo-spatial services. Google is reacting to customer requests, at least in the US government sector from what I hear from those familiar with canvas cubes in Washington, DC. My research about enterprise search revenues indicates that Google now has more than 9,000 licensees of its Google Search Appliance. This product generated somewhere around $350 to $400 million in calendar 2007 and is growing at double digit rates. The various applications, enhanced email, and messaging functions are pulling inquiries as well. In short, the Google is disrupting the traditional enterprise market on several fronts. Google lets customers pull Google to them. Google doesn’t push for sales like most enterprise software vendors.

My hunch is that Google’s “fog-like” behavior translates to sour grapes because Google is somewhat reluctant to shovel cash into the maw of the high-end IT consultancies for guidance. Google’s reliance on “pull” tactics is a challenge for some traditional consulting firms like Booz, Allen & Hamilton, where I worked. Google has plenty of wizards and gurus on staff. If a pundit is Googley, that consultant will probably work for Google. This is a difficult concept for some for-hire experts to accept. But that’s just my interpretation of the matter.

I think Mr. Boulton got the story right. Could it be that Gartner doesn’t understand Google?

Stephen Arnold, April 10, 2008

Absolutes and Electronic Information

April 9, 2008

I find the research for my work fascinating. Periodically I root through some of the PDFs and PowerPoints used in my public talks.

Information in 2001

Today, while consolidating some information from a soon-to-be-retired NetFinity 5500, I came across a presentation I made to the legal information giant, Lexis Nexis, in year 2001.

The presentation sure didn’t win me any buddies in this $1 billion a year unit of the Euro-giant Reed Information. Reed, like the Thomson Corporation, maintains a low profile. Most people are unaware of what these two professional publishing companies do for a living, and I am not going to tell you that. You will have to figure it out for yourself.

My talk was given at some golf resort, and I don’t golf. I sat on my tail feather and waited to deliver my talk, which I titled “Information Professionals and In-Phase Services”. The main idea behind the talk was that anyone who used information for a living (lawyers, consultants, intelligence officers, and financial analysts) wanted current information in the context of their work.

The idea of stopping one thing to go ferret out a missing piece of information is growing long in the tooth. No, “long in the tooth” is too gentle even seven years after I wrote this presentation. Stupid, ill-advised, crazy, dumb — these are much more appropriate words. In year 2000, it was obvious — based on my research — that savvy users of information wanted information from one screen or dashboard. Furthermore that information should be [a] comprehensive, [b] current or fresh, and [c] in a form that allowed it to be cut-and-pasted or recycled without annoying manual reformatting.

I used this quote from Emily Dickinson to catch the crowd’s attention: “The truth must dazzle gradually / Or every man be blind…” No one knew what the heck I was talking about. To help the audience along, I used this chart from Forbes Magazine, October 2, 2000:

absolutes

The point of this study is that humans–more than two thirds of them in 2000–want fixed points in their lives. The notions of change, flux, transformation made people uncomfortable. The chart did little to win my audience’s confidence in my talk because I then told the group, “Absolutes are rarely found when we talk about electronic information.”

Read more

Search Hoops: Exercising Technology to Meet User Needs

March 29, 2008

A “hoop” is a circular band that binds a barrel’s staves together. “Hoops” has a more informal meaning; the word is a synonym for basketball. In Kentucky, you say, “The Louisville Cardinals shoot serious hoops”. This sentence won’t make much sense in Santiago, Chile, but it does at the local gas station.

Search “hoops” are different. These are technical spaces that make it possible for a person to look for information. The figure below shows a series of search hoops. I want to take a few minutes to talk briefly about each of these with particular emphasis on their relationship to behind-the-firewall search. As you know, I think the term enterprise search is essentially valueless. It’s become an audible pause mouthed by vendors of many shapes and sizes. When I hear it, I’m baffled. Truth be told, most of the vendors who use the term enterprise search don’t know what it means. The job of explaining its meaning is left to the pundits and mavens who earn a living blowing smoke to explain fuzziness. Visibility and comprehension hit the two to four inch range.

This is a diagram from a report I wrote for a company silly enough to pay me for an analysis of the online search-and-retrieval trends in the period 1975 to 2003. I have an updated version, but that’s something I sell to buy my beloved boxer dog Tyson Kibbles and Bits.

searchhoopstrimmed

© Stephen E. Arnold, 2002-2008

Please, click on the image so you can read the textual annotations to each of the rings. I’m not going to repeat the information in the diagram’s annotations. I will relate these “hoops” to the challenge of behind-the-firewall search.

Read more

Northern Light: A New Business Information Search Service

March 27, 2008

Northern Light has made available a free business information search service. You can try it yourself at www.nlsearch.com. Search and browse are free, but you will have to pay to access certain content. A day pass is priced at about $5.00, and enterprise licenses are available.

Northern Light, in the mid-1990s, offered a somewhat similar service. The company received an infusion of capital from Reuters in 1999. By 2002, the company had become part of the now-defunct divine Interventures. Northern Light is once again a self-standing company. David Seuss, the former consultant who founded the firm, is once again running Northern Light.

Northern Light was one of the first search systems to enhance its results list with folders grouping similar results. More information is available from the Northern Light Web site. Information Today’s Paula Hane’s story has additional details about the service here.

Stephen Arnold, March 27, 2008
