Transformation: An Emerging “Hot” Niche

January 25, 2008

Transformation is a five-dollar word that means changing a file from one format to another. The trade magazines and professional writers often use data integration or normalization to refer to what amounts to taking a Word 97 document with a Dot DOC extension and turning it into a structured document in XML. These big words and phrases refer to a significant gotcha in behind-the-firewall search, content processing, and plain old moving information from one system to another.

Here’s a simple example of the headaches associated with what should be a seamless, invisible process after a half century of computing. The story:

You buy a new computer. Maybe a Windows laptop or a new Mac. You load a copy of Office 2007, write a proposal, save the file, and attach it to an email that says, “I’ve drafted the proposal we have to submit tomorrow before 10 am.” You send the email and go out with some friends.

In the midst of a friendly discussion about the merits of US democratic presidential contenders, your mobile rings. You hear your boss saying over the background noise, “You sent me a file I can’t open. I need the file. Where are you? In a bar? Do you have your computer so you can resend the file? No? Just get it done now!” Click here to read what ITWorld has to say on this subject. Also, there’s some user vitriol over the Word-to-Word compatibility hassle itself here. A workaround from Tech Addict is here.

Another scenario: you have a powerful new content processing system that churns through, according to the vendor’s technical specification, “more than 200 common file types.” You set up the content processing gizmo, aim it at the marketing department’s server, and click “Index.” You go home. When you arrive the next morning at 8 am, you find that the 60,000 documents in the folders you wanted indexed have become an index of 30,000 documents. Where are the other 30,000 documents? After a bit of fiddling, you discover the exception log and find that half of the documents you wanted indexed were not processed. You look up the error code and learn that it means, “File type not supported.”
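
If you want to see the damage quickly, a short script can tally the exception log. This is a minimal sketch; the tab-separated log format of (document path, error code) pairs is a hypothetical stand-in, since every vendor’s log looks different:

```python
# Tally rejected documents by error code from a content processing
# exception log. The log format here is hypothetical: one line per
# rejected document, tab-separated as "path<TAB>error code".
from collections import Counter

def tally_exceptions(log_path: str) -> Counter:
    counts: Counter = Counter()
    with open(log_path, encoding="utf-8") as log:
        for line in log:
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 2:
                counts[fields[1]] += 1  # second field holds the error code
    return counts

for code, total in tally_exceptions("exceptions.log").most_common():
    print(f"{code}: {total:,} documents")
```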

The culprit is the inability of one system to recognize and process a file. The reasons for the exceptions are many and often subtle. Let’s troubleshoot the first problem, the boss’s inability to open a Word 2007 file sent as an attachment to an email.

The problem is that the recipient is using an older version of Word. The sender saved the file in the newest version of Word’s XML format. You can recognize these files by their extension, Dot DOCX. What the sender should have done is save the proposal as [a] a Dot DOC file in an older “flavor” of Word’s DOC format; [b] a file in the now long-in-the-tooth RTF (Rich Text Format) type; or [c] a file in Dot TXT (ASCII) format. The fix is for the sender to resend the file in a format the recipient can view. But that one file can cost a person credibility points or the company a contract.
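
Resaving by hand works, but the chore can be scripted. Here is a minimal sketch that drives LibreOffice in headless mode to produce the three fallback formats; it assumes the soffice binary is installed and on the PATH, and the file name proposal.docx is hypothetical:

```python
# Rewrite a Dot DOCX file as Dot DOC, RTF, and plain text by calling
# LibreOffice in headless mode. Assumes "soffice" is on the PATH.
import subprocess

def convert(source: str, target_format: str, out_dir: str = ".") -> None:
    subprocess.run(
        ["soffice", "--headless", "--convert-to", target_format,
         "--outdir", out_dir, source],
        check=True,  # raise if the conversion fails
    )

for fmt in ("doc", "rtf", "txt"):
    convert("proposal.docx", fmt)
```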

The second scenario is more complicated. The marketing department’s server held a combination of Word files, Adobe Portable Document Format files with Dot PDF extensions, some Adobe InDesign files, some QuarkXPress files, some FrameMaker files, and some database files produced on a system no one knows much about except that the files came from a system no longer used by marketing. A bit of manual exploration revealed that the Adobe PDF files were password protected, so the content processing system rejected them. The content processing system lacked import filters to open the proprietary page layout and publishing program files, so it rejected them too. The mysterious files from the disused system were data dumps from an IBM CICS system. The content processing system opened them, found them unreadable, and threw those out as exceptions as well.
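
A pre-indexing inventory catches most of these surprises before anyone clicks “Index.” Below is a rough sketch: it walks a share, tallies extensions, and flags PDFs whose raw bytes contain an /Encrypt entry. The byte scan is a crude heuristic, not a real PDF parse, and the server path is hypothetical:

```python
# Walk a file share, count file types, and flag PDFs that look
# password protected. The b"/Encrypt" scan is a crude heuristic;
# a proper PDF library is the reliable way to check.
import os
from collections import Counter

def audit(root: str):
    extensions: Counter = Counter()
    locked_pdfs = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            ext = os.path.splitext(name)[1].lower() or "(none)"
            extensions[ext] += 1
            if ext == ".pdf":
                path = os.path.join(dirpath, name)
                with open(path, "rb") as fh:
                    if b"/Encrypt" in fh.read():
                        locked_pdfs.append(path)
    return extensions, locked_pdfs

extensions, locked = audit("/mnt/marketing")  # hypothetical share
print(extensions.most_common())
print(f"{len(locked)} PDFs appear password protected")
```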

Now the nettles, painful nettles:

First, fixing the problem with any one file is disruptive but usually doable. The reputation damage done may or may not be repaired. At the very least, the sender’s evening was ruined, but the high-powered vice president was with a gaggle of upper-crust types arguing about an election’s impact on trust funds. To “fix” the problem, she had to leave her friends and redo her work: time consuming and annoying. The recipient — a senior VP — had to juggle his plans in order to meet the 10 am deadline. Instead of chilling with The Simpsons TV show, he had to dive into the proposal and shape the numbers under the pressure of the looming deadline.

We can now appreciate a 30,000 file problem. It is a very big problem. There’s probably no way to get the passwords to open some of the PDFs. So, the PDFs’ content may remain unreadable. The weird publishing formats have to be opened in the application that created them and then exported in a file format the content processing system understands. This is a tricky problem, maybe worth another Web log posting. An alternative is to print out hard copies of the files, scan them, use optical character recognition software to create ASCII versions, and then feed the ASCII versions of the files to the content processing system. (Note: some vendors make paper-to-ASCII systems to handle this type of problem.) Those IBM CICS files can be recovered, but an outside vendor may be needed if the system producing the files is no longer available in house. When the costs are added up, these 30,000 files can represent hundreds of hours of tedious work. Figure $60 per hour and a week’s work if everything goes smoothly, and you can estimate the minimum budget “hit”. No one knows the final cost because transformation is dicey. Cost naiveté is the reason my blood pressure spikes when a vendor asserts, “Our system will index all the information in your organization.” That’s baloney. You don’t know what will or won’t be indexed unless you perform a thorough inventory of files and their types and then run tests on a sample of each document type. That just doesn’t happen very often in my experience.
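
For the paper-to-ASCII drill mentioned above, the open source route looks roughly like this sketch. It assumes the Tesseract OCR engine plus the pytesseract and Pillow packages are installed; the scans/page_*.png file names are hypothetical:

```python
# OCR a folder of scanned page images and save the result as ASCII.
# Assumes Tesseract, pytesseract, and Pillow are installed.
import glob

import pytesseract
from PIL import Image

pages = []
for image_path in sorted(glob.glob("scans/page_*.png")):
    pages.append(pytesseract.image_to_string(Image.open(image_path)))

# errors="replace" keeps the write from dying on non-ASCII characters
with open("recovered.txt", "w", encoding="ascii", errors="replace") as out:
    out.write("\n".join(pages))
```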

Now you know what transformation is. It is a formal process of converting lead into content gold.

One Google wizard — whose name I will withhold so Google’s legions of super-attorneys don’t flock to rural Kentucky to get the sheriff to lock me up — estimated that up to 30 percent of information technology budgets are consumed by transformation. So for a certain chicken company’s $17 million IT budget, the transformation bill could be in the $5 to $6 million range. That translates to selling a heck of a lot of fried chicken. Let’s assume the wizard is wrong by a factor of two. This still means that $2 to $3 million is gnawed away by transformation.
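
The arithmetic is simple enough to check in a few lines; the 30 percent share and the $17 million budget are the figures quoted above:

```python
# Back-of-the-envelope transformation cost, using the post's figures.
it_budget = 17_000_000   # the chicken company's IT budget
wizard_share = 0.30      # the Google wizard's upper estimate

high = it_budget * wizard_share  # $5.1 million
low = high / 2                   # $2.55 million if the wizard is off by 2x
print(f"transformation bill: ${low:,.0f} to ${high:,.0f}")
```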

As organizations generate and absorb more digital information, what happens to transformation costs? The costs will go up. Whether the Google wizard is right or wrong, transformation is an issue that needs experienced hands minding the store.

The trigger for these two examples is a news item that the former president of Fast Search & Transfer, Ali Riaz, has started a new search company. Its USP (unique selling proposition) is data integration plus search and content processing. You can read Information Week’s take on this new company here.

In Beyond Search, I discuss a number of companies and their ability to transform and integrate data. If you haven’t experienced the thrill of a transformation job, a data integration project, or a structured data normalization task — you will. Transformation is going to be a hot niche for the next few years.

Understanding of what can be done with existing digital information is, in general, wide and shallow. Transformation demands narrow and deep understanding of a number of esoteric and almost insanely diabolical issues. Let me identify three from my own personal experience, learned at the street academy called Our Lady of Saint Transformation.

First, each publishing system has its own peculiarities about files produced by different versions of itself. InDesign 1.0 and 2.0 cannot open the most recent version’s files. There’s a workaround, but unless you are “into” InDesign, you have to climb this learning curve, and fast. I’m not picking on Adobe. The same intra-program incompatibilities plague QuarkXPress, PageMaker, the moribund Ventura, FrameMaker, and some high-end professional publishing systems.

Second, data files spit out by mainframe systems can be fun for a 20-something. There are some interesting data formats still in daily use. EBCDIC, or Extended Binary-Coded Decimal Interchange Code, is something some readers can learn to love. It is either that or figuring out how to fire up an IBM mainframe, reinstalling the application (good luck on that one, 20-somethings), restoring the data from DASD or flat-file backup tapes (another fun task for a recent computer science grad), and then outputting something the zippy new search or content processing system can convert in a meaningful way. (Note: “meaningful way” is important because when a filter gets confused, it produces some interesting metadata. Some glitches can require you to reindex the content if your index restore won’t work.)
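
Python, for what it is worth, ships a codec for the common US EBCDIC code page, so the simplest conversions are one-liners. A tiny illustration; packed-decimal fields and the many EBCDIC variants are the hard part, and they are not shown:

```python
# Decode an EBCDIC byte string into readable text using the cp037
# codec (US/Canada EBCDIC). Real mainframe dumps add fixed-width
# records and packed-decimal (COMP-3) fields, which need more work.
record = bytes([0xC8, 0xC5, 0xD3, 0xD3, 0xD6])  # "HELLO" in EBCDIC

print(record.decode("cp037"))  # -> HELLO
```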

Third, the Adobe PDFs with their two layers of security can be especially interesting. If you have one level of password, you can open the file and maybe print it and copy some content from it. Or not. If not, you print the PDFs (if printing has not been disabled) and go through the OCR-to-ASCII drill. In my opinion, PDFs are like a digital albatross. These birds hang around one’s neck. Your colleagues want to “search” for the PDFs’ content in their behind-the-firewall system. When the marketing department is asked to produce the needed passwords, I often hear something discomforting. So it is no surprise to learn that some system users are not too happy.
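
To see which albatross you have, a quick probe helps. Here is a sketch using the pypdf package (one library among several; the file name is hypothetical). An empty user password often opens a file that carries only an owner password, while a real user password keeps the file shut:

```python
# Probe a PDF's password layers. Assumes the pypdf package; behavior
# with exotic encryption schemes varies, so treat this as a triage aid.
from pypdf import PdfReader

reader = PdfReader("mystery.pdf")  # hypothetical file
if not reader.is_encrypted:
    print("No password layer: index away.")
elif reader.decrypt(""):  # truthy when the empty user password works
    print("Owner password only: readable, though printing/copying may be limited.")
else:
    print("User password required: time for the OCR-to-ASCII drill.")
```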

You may find this post disheartening.

No!

This post is chock full of really good news. It makes clear that companies in the business of transformation are going to have more customers in 2008 and 2009. It’s good news for off-shore conversion shops. Companies that have potent transformation tools are going to have a growing list of prospects. Young college grads get more chances to learn the mainframe’s idiosyncrasies.

The only negative in this rosy scenario is for the individual who:

  • Fails to audit the file types and the amount of content in those file types
  • Skips determining which content must be transformed before the new system is activated
  • Ignores the budget implications of transformation
  • Assumes that 200 or 300 filters will do the job
  • Does not understand the implications behind a vendor’s statement along these lines: “Our engineers can create a custom filter for you if you don’t have time to do that scripting yourself.”

One final point about those 200 or more file types: vendors talk about them with gusto. Check to see whether the vendor is licensing filters from a third party. In certain situations, the included file type filters don’t support some of the more recent applications’ file formats. Other vendors “roll their own” filters. But filters can vary in efficacy because different people write them at different times with different capabilities. Try as they might, vendors can’t squash some of the filter nits and bugs. When you do some investigating, you may be able to substantiate my data, which suggest filters work on about two-thirds of the files you feed into the search or content processing system. Your investigation may prove my data incorrect. No problem. When you are processing 250,000 documents, the exception file becomes chunky even at the system’s “normal” two to three percent rejection rate. A thirty percent rate can be a show stopper.
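
To make the rates concrete, here is the arithmetic on the 250,000-document example, using the percentages quoted above:

```python
# Exception counts at different rejection rates for a 250,000-document run.
corpus = 250_000

for label, rate in [
    ("routine two to three percent", 0.025),
    ("one-third (filters handle about two-thirds)", 1 / 3),
]:
    print(f"{label}: about {int(corpus * rate):,} documents rejected")
```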

Stephen E. Arnold, January 25, 2008

Two Visions of the Future from the U.K.

January 17, 2008

Two different news items offered insights about the future of online. My focus is the limitations of key word search. I downloaded both articles, I must admit, eager to see whether my research would be disproved or augmented.

Whitebread

The first report appeared on January 14, 2008, in the (London) Times online in a news story “White Bread for Young Minds, Says University Professor.” In the intervening 72 hours, numerous comments appeared. The catch phrase is the coinage of Tara Brabazon, professor of Media Studies at the University of Brighton. She allegedly prohibits her students from using Google for research. The metaphor captures, in a memorable way, a statement attributed to her in the Times’s article: “Google is filling, but it does not necessarily offer nutritional content.”

The argument strikes a chord with me because [a] I am a dinosaur, preferring warm thoughts about “the way it was” as the snow of time accretes on my shoulders; [b] schools are perceived to be in decline because it seems that some young people don’t read, ignore newspapers except for the sporty pictures that enliven gray pages of newsprint, and can’t do mathematics reliably at take-away shops; and [c] I respond to the charm of a “sky is falling” argument.

Ms. Brabazon’s argument is solid. Libraries seem to be morphing into Starbucks with more free media on offer. Google–the icon of “I’m feeling lucky” research–allows almost anyone to locate information on a topic regardless of its obscurity or commonness. I find myself flipping my dinosaurian tail out of the way to get the telephone number of the local tire shop, check the weather instead of looking out my window, and convert worthless dollars into high-value pounds. Why remember? Google or Live.com or Yahoo are there to do the heavy lifting for me.

Educators are in the business of transmitting certain skills to students. When digital technology seeps into the process, the hegemony begins to erode, so the argument goes. Ms. Brabazon joins Neil Postman (Amusing Ourselves to Death: Public Discourse in the Age of Show Business, 1985) and more recently Andrew Keen (The Cult of the Amateur, 2007) among others in documenting the emergence of what I call the “inattention economy.”

I don’t like the loss of what weird expertise I possessed that allowed me to get good grades the old-fashioned way, but it’s reality. The notion that Google is more than an online service is interesting. I have argued in my two Google studies that Google is indeed much more than a Web search system growing fat on advertisers’ money. My research reveals little about Google as a corrosive effect on a teacher’s ability to get students to do their work using a range of research tools. Who wouldn’t use an online service to locate a journal article or book? I remember my comfortable little study nook in the rat hole in which I lived as a student: slogging through the Illinois winter, dealing with the Easter egg hunt in the library stuffed with physical books that were never shelved in sequence, and manually taking notes or feeding 10-cent coins into a foul-smelling photocopy machine that rarely produced a readable copy. Give me my laptop and a high-speed Internet connection. I’m a dinosaur, and I don’t want to go back to my research roots. I am confident that the professor who shaped my research style–Professor William Gillis, may he rest in peace–neither knew nor cared how I gathered my information, performed my analyses, and assembled the blather that whizzed me through university and graduate school.

If a dinosaur can figure out a better way, Tefloned along by Google, a savvy teen will too. Draw your own conclusions about the “whitebread” argument, but it does reinforce my research that suggests a powerful “pull” exists for search systems that work better, faster, and more intelligently than those today. Where there’s a market pull, there’s change. So, the notion of going back to the days of taking class notes on wax in wooden frames and wandering with a professor under the lemon trees is charming but irrelevant.

The Researcher of the Future

The British Library is a highly regarded, venerable institution. Some of its managers have great confidence that their perception of online in general and Google in particular is informed, substantiated by facts, and well considered. The Library’s Web site offers a summary of a new study called (and I’m not sure of the bibliographic niceties for this title): A Ciber [sic] Briefing Paper. Information Behaviour of the Researcher of the Future, 11 January 2008. My system’s spelling checker is flashing madly regarding the spelling of cyber as ciber, but I’m certainly not intellectually as sharp as the erudite folks at the British Library, living in rural Kentucky and working by the light of burning coal. You can download this 1.67-megabyte, 35-page document here: Researcher of the Future.

The British Library’s Web site article identifies the key point of the study as “research-behaviour traits that are commonly associated with younger users — impatience in search and navigation, and zero tolerance for any delay in satisfying their information needs — are now becoming the norm for all age-groups, from younger pupils and undergraduates through to professors.” The British Library has learned that online is changing research habits. (As I noted in the first section of this essay, an old dinosaur like me figured out that doing research online is faster, easier, and cheaper than playing “Find the Info” in my university’s library.)

My reading of this weirdly formatted document, which looks as if it was a PowerPoint presentation converted to a handout, identified several other important points. Let me share my reading of this unusual study’s findings with you:

  1. The study was a “virtual longitudinal study”. My take on this is that the researchers did the type of work identified as questionable in the “whitebread” argument summarized in the first section of this essay. If the British Library does “Googley research”, I posit that Ms. Brabazon and the other defenders of the “right way” to do research have lost their battle. Score: 1 for Google-Live.com-Yahoo. Nil for Ms. Brabazon and the British Library.
  2. Libraries will be affected by the shift to online, virtualization, pervasive computing, and other impedimentia of the modern world for affluent people. Score 1 for Google-Live.com-Yahoo. Nil for Ms. Brabazon, nil for the British Library, nil for traditional libraries. I bet librarians reading this study will be really surprised to hear that traditional libraries have been affected by the online revolution.
  3. The Google generation is supposedly composed of “expert searchers”. The reader learns, however, that most people are lousy searchers. Companies developing new search systems are working overtime to create smarter search systems because most online users–forget about age, gentle reader–are really terrible searchers and researchers. The “fix” is computational intelligence in the search systems, not in the users. Score 1 more for Google-Live.com-Yahoo and any other search vendor. Nil for the British Library, nil for traditional education. Give Ms. Brabazon a bonus point because she reached her conclusion without spending money for the CIBER researchers to “validate” the change in learning behavior.
  4. The future is “a unified Web culture,” more digital content, eBooks, and the Semantic Web. The word unified stopped my ageing synapses. My research yielded data that suggest the emergence of monopolies in certain functions and increasing fragmentation of information and markets. Unified is not a word I can apply to the online landscape. In my Bear Stearns report published in 2007 as Google’s Semantic Web: The Radical Change Coming to Search and the Profound Implications to Yahoo & Microsoft, I revealed that Google wants to become the Semantic Web.

Wrap Up

I look forward to heated debate about Google’s role in “whitebreading” youth. (Sounds similar to waterboarding, doesn’t it?) I also hunger for more reports from CIBER, the British Library, and folks a heck of a lot smarter than I am. Nevertheless, my Beyond Search study will assert the following:

  1. Search has to get smarter. Most users aren’t progressing as rapidly as young information retrieval experts.
  2. The traditional ways of doing research, meeting people, even conversing are being altered as information flows course through thought and action.
  3. The future is going to be different from what big thinkers posit.

Traditional libraries will be buffeted by bits and bytes and Boards of Directors who favor quill pens and scratching on shards. Publishers want their old monopolies back. Universities want that darned trivium too. These are notions I support but recognize that the odds are indeed long.

Stephen E. Arnold, January 17, 2008

Google 2008 Publishing Output

January 1, 2008

If you had any doubt about Google’s publishing activities, check out “Google Blogging in 2008”. The article by Susan Straccia here provides a rundown of the GOOG’s self-publishing output. Google has more than 120 Web logs. The article crows about the number of unique visitors and tosses in some Googley references to Google fun. Pushing the baloney aside, the message is clear: Google has an effective, global publishing operation focused exclusively on promoting Google. Toss in the Google Channel on YouTube.com, and the GOOG has a communication, promotion, and distribution mechanism that few of its rivals can match. In my opinion, not even a major TV network in the US can reach as many eyeballs as quickly and cheaply as Googzilla. Competitors have to find a way to match this promotional 30mm automatic Boeing M230 chain gun.

Stephen Arnold, January 1, 2008
