Lazarus, Azure Chip Consultants, and Search

January 8, 2010

A person called me today to tell me that a consulting firm is not accepting my statement “Search is dead.” Then I received a spam email that said, “Search is back.” I thought, “Yo, Lazarus. There be lots of dead search vendors out there. Example: Convera.”

Who reports that search has risen? An azure chip consultant! Here’s what raced through my addled goose brain as I pondered the call and the “search is back” T-shirt slogan:

In 2006, I was sitting on a pile of research about the search market sector. The data I collected included:

  • Interviews with various procurement officers, search system managers, vendors, and financial analysts
  • My own profiles of about 36 vendors of enterprise search systems plus the automated content files I generate using the Overflight system. A small scale version is available as a demo on ArnoldIT.com
  • Information I had from my work as a systems engineering and technical advisor to several governments and their search system procurement teams
  • My own experience licensing, testing, and evaluating search systems for clients. (I started doing this work after we created The Point (Top 5% of the Internet) in 1993 and sold it to Lycos, a unit of CMGI. I figured I should look into what Lycos was doing so I could speak with authority about its differences from BRS/Search, InQuire, Dialog (RECON), and IBM STAIRS III. I had familiarity with most of these systems through various projects in my pre-Point (Top 5% of the Internet) life.)
  • My Google research funded by the now-defunct Bear Stearns outfit and a couple of other well-heeled organizations.

What was clear in 2006 was the following:

First, most of the search system vendors shared quite a bit of similarity. Despite the marketing baloney, the key differentiators among the flagship systems in 2006 were minor. Examples range from their basic architecture to their use of stemming to the methods of updating indexes. There were innovators, and I pointed out these companies in my talks and various writings, including the three editions of the Enterprise Search Report I wrote before I fell ill in February 2007 and quit doing that big encyclopedia type publication. These similarities made it very clear to me that innovation for enterprise search was shifting from the plain old key word indexing of structured records available since the advent of RECON and STAIRS to a more freeform approach with generally lousy relevance.


Get information access wrong, and some folks may find a new career. Source: http://www.seeing-stars.com/Images/ScenesFromMovies/AmericanBeautyMrSmiley%28BIG%29.JPG

Second, the more innovative vendors were making an effort in 2006 to take a document and provide some sort of context for it. Without a human indexer to assign a classification code to a document that is about marketing but does not contain the word “marketing”, this was rocket science. When I examined these systems, I found two basic approaches, both still around today. The first used statistical methods to group documents and make inferences. The second was a variation on human indexing but without humans doing most of the work: a word list would contain synonyms, and software would use that list to assign index terms. There were promising demonstrations of software methods that could “read” a document, but these were piggy and of use only where money was no object.
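The word-list approach can be sketched in a few lines of Python. This is a minimal illustration, not any 2006 vendor’s system: the concept names, the synonym sets, and the rule of tagging a document when it contains at least two distinct synonyms are all assumptions chosen for the example. The point is that a document can be tagged “marketing” without ever containing the word.

```python
# Hypothetical concept thesaurus: each concept maps to words that signal it.
# The concepts and word lists are illustrative assumptions, not a vendor's vocabulary.
CONCEPT_THESAURUS = {
    "marketing": {"campaign", "brand", "advertising", "promotion", "customer"},
    "finance": {"budget", "revenue", "invoice", "forecast"},
}


def assign_concepts(text: str, thesaurus: dict[str, set[str]], min_hits: int = 2) -> list[str]:
    """Tag a document with each concept whose synonym list it matches
    with at least min_hits distinct words."""
    # Normalize: split on whitespace, strip punctuation, lowercase.
    words = {w.strip(".,;:!?\"'()").lower() for w in text.split()}
    tags = []
    for concept, synonyms in thesaurus.items():
        hits = len(words & synonyms)  # count of distinct synonyms present
        if hits >= min_hits:
            tags.append(concept)
    return sorted(tags)


doc = "The spring campaign will refresh the brand and expand advertising in Ohio."
print(assign_concepts(doc, CONCEPT_THESAURUS))
```

The word list does the work a human indexer once did; the weakness is that someone still has to build and maintain the list.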

Third, the Google approach, which used social methods (that is, a human clicking on a link), was evident but was not migrating to the enterprise world. Google was new, and to make its 2006 method hum, lots of clicks were needed. In the enterprise, most documents never get clicked, so the 2006 Google method was truly lousy. Google has made improvements, mostly by implementing the older search methods, not by pushing the envelope as it has been doing with its Web search and dataspace efforts.
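A minimal sketch shows why click signals help little when documents are rarely clicked. The scoring formula here (a text relevance score plus a log-scaled click count), the weight, and the document names are assumptions for illustration only; this is not Google’s 2006 method.

```python
import math


def rank(docs: dict[str, float], clicks: dict[str, int], click_weight: float = 1.0) -> list[str]:
    """Order doc ids by text score plus a log-scaled click boost
    (a hypothetical scoring formula for illustration)."""
    def score(doc_id: str) -> float:
        # log1p(0) == 0, so a never-clicked document gets no boost at all.
        return docs[doc_id] + click_weight * math.log1p(clicks.get(doc_id, 0))
    return sorted(docs, key=score, reverse=True)


text_scores = {"policy.doc": 0.62, "memo.doc": 0.60, "spec.doc": 0.55}

# Web-like case: heavy click traffic reorders the results.
web_clicks = {"spec.doc": 500, "memo.doc": 40}
print(rank(text_scores, web_clicks))

# Enterprise case: no clicks, so ranking collapses to the text score alone.
print(rank(text_scores, {}))
```

With no clicks, the “social” component contributes nothing, and the system is left with whatever plain text matching it has.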

Fourth, most of the search vendors were trying like the dickens to get out of a “one size fits all” approach to enterprise search. Companies making sales were focusing on a specific niche or problem and selling a package of search and content searching that solved one problem. The failure of the boil-the-ocean approach was evident: user satisfaction data from my research, funded by a government agency and other clients, revealed that about two thirds of the users of an enterprise search system were dissatisfied or very dissatisfied with that search system. The solution, then, was to focus. My exemplary case was the use of the Endeca technology to allow Fidelity UK sales professionals to increase their productivity with content pushed to them using the Endeca system. The idea was that a broker could click on a link and the search results were displayed. No searching required. ClearForest got in the game by analyzing dealer warranty repair comments. Endeca and ClearForest were harbingers of focus. ClearForest is now owned by Thomson Reuters and is in the open source software game too.

When I wrote the article in Online Magazine for Barbara Quint, one of my favorite editors, I explained these points in more detail. But it was clear that the financial pressures on Convera, for example, and the difficulty some of the more promising vendors like Entopia were having made the thin edge of survival glint in my desk lamp’s light. Autonomy by 2006 had shifted from search and organic growth to inorganic growth fueled by acquisitions that were adjacent to search.


Microsoft Worries about Intellectual Arteriosclerosis

January 1, 2010

At my advanced age, I think a 50 year old is a spring chicken. Someone in their 40s is a chick. Anyone younger is a beak poking through a shell. Not at Microsoft. Based on the article “At Microsoft, Is Age More Than Just a Number?”, someone at Microsoft allegedly believes the company is losing touch with youth. The idea is that a young engineer comes to Microsoft and then hightails it somewhere else quickly. It costs a big company a lot of money to recruit a young employee, train them, and make them productive. There is a high cost to churn, just as there is with mobile phone subscriptions. To get payback from young workers, the company has to keep these lads and lasses around and productive for several years. As maddening as youth are, these folks are important to the future of an organization. The most interesting sentence in the write up, in my opinion, was:

“My research has shown that 401ks, salaries and other forms of monetary compensation are less important to Generation Y retention than fruitful collaboration with peers, recognition of work, opportunities for growth and the idea of “being a part of something”. These young employees are less averse to change and will tirelessly seek environments that promote these activities, leaving those that don’t.”

The quote is from a wizard at Microsoft. Am I alone in thinking that Google has a magnetic lure for young wizards? Those who can’t get hired at the Google can look at Apple, maybe Amazon, or even the wacky land of Facebook.com. If Microsoft has a bunch of spring chickens, these folks may not be ready for the retirement home, but they may be just the ticket for Googzilla to run over on the information highway. The spring chickens may cross the road to fix a glitch in SharePoint and then find themselves under the wheels of Messrs. Brin’s and Page’s Hummer.

Stephen E. Arnold, January 1, 2010

Another freebie. I must report this sad state of affairs to the National Park Service. A dead animal in a national park is its responsibility.

Circumvallation: Reed Elsevier and Thomson as Vercingetorix

November 27, 2009

Google Scholar Gets Smart in Legal Information

One turkey received a presidential pardon. Other turkeys may not be so lucky on November 26, 2009, when the US celebrates Thanksgiving. I am befuddled about this holiday. There are not too many farmers in Harrod’s Creek. The fields contain the abandoned foundations of McMansions that the present economic meltdown has left like Shelley’s statue of Ozymandias. The “half buried in the sand” image becomes half-built homes on the horse farms.

As Kentuckians in my hollow give thanks for a day off from job hunting, I am sitting by the goose pond trying to remember what I read in my copy of Caesar’s De Bello Gallico. I know Caesar did not write this memoir, but his PR bunnies did a pretty good job. I awoke this morning thinking about the connection between the battle of Alesia and what is now happening to the publishing giants Reed Elsevier and Thomson Reuters. The trigger for this mental exercise was Google’s announcement that it had added legal content to Google Scholar.


What’s Vercingetorix got to do with Google, Lexis, and Westlaw? Think military strategy. Starvation, death, surrender, and ritual killing. Just what today’s business giants relish.

Google has added the full text of US federal cases and state cases. The coverage of the federal cases, district and appellate, is from 1924 to the present. US state cases cover 1950 to the present. Additional content will be added; for example, I have one source that suggested that the Commonwealth of Virginia Supreme Court will provide Google with CD ROMs of cases back to 1924. Google, according to this source, is talking with other sources of US legal information and may provide access to additional legal information as well. What are these sources? Possibly Public.Resource.Org and possibly Justia.org, among others.

The present service includes:

  • The full text of the legal document
  • Footnotes in the legal document
  • Page numbers in the legal document
  • Page breaks in the legal document
  • Hyperlinks in the legal document to cases
  • A tab to show how the case was cited in other documents
  • Links to non legal documents that cite a case.

You can read various pundits’, mavens’, and azure chip consultants’ comments on this Google action at this link.

You may want to listen to a podcast called TWIL, specifically the November 23, 2009, show, on which Google Scholar was discussed for about a half hour. You can find that discussion on iTunes. Just search for TWIL and download the program “Social Lubricants and Frictions.”

On the surface, the Google push into legal information is a modest amount of data in terms of Google’s daily petabyte flows. The service is easy to use, but the engineering required to provide access to the content strikes me as non-trivial. Content transformation is an expensive proposition, and the cost of fiddling with legal information is one of the primary reasons commercial online services have had to charge hefty fees to look at what amounts to taxpayer supported, public information.

The good news is that the information is free and easily accessible, even from an iPhone or other mobile device. The Google service does the standard Google animal tricks of linking, displaying content with minimal latency, and updating new content within a minute or so of that content becoming available to the Google software Dyson vacuum cleaner.

So what?

This service is similar to others I have written about in my three Google monographs. Be aware. My studies are not Sergey-and-Larry-eat-pizza books. I look at the Google open source technical and business information. I ignore most of what Google’s wizards “say” in public. These folks are “running the game plan” and add little useful information for my line of work. Your mileage may differ. If so, stop reading this blog post and hunt down a cheerful non-fiction Google book by a real live journalist. That’s not my game. I am an addled goose.

Now let me answer the “so what”.

First, the Google legal content is an incremental effort for the Google. This means that Google’s existing infrastructure, staff, and software can handle the content transformation, parsing, indexing, and serving. No additional big-buck investment is needed. In fact, I have heard that the legal content project, like Google News, was accomplished in the free time for play that Google makes available to its full time professionals. A bit of thought should make clear to you that commercial outfits who have to invest to handle legal content in a Google manner have a cost problem right out of the starting blocks.

Second, Google is doing content processing that should be the responsibility of the US government. I know. I know. The US government wants to create information and not compete with commercial outfits. But the outfits manipulating legal information have priced it so that most everyday Trents and Whitneys cannot afford to use these commercial services. Even some law firms cannot afford these services. Pro bono attorneys don’t have enough money to buy yellow pads to help their clients. Even kind hearted attorneys have to eat before they pay a couple hundred bucks to run a query on the commercial online services from publicly traded companies out to give their shareholders a great big financial payday. Google is operating like a government when it processes legal information and makes it available without direct charge to the user. The monetization takes place but on a different business model foundation. That also spells T-R-O-U-B-L-E for the commercial online services like Lexis and Westlaw.


Coveo Expresso Breaks New Ground in Information Access

November 9, 2009

Coveo, a leading provider of enterprise search technology and information access solutions, recently unveiled a free, entry-level enterprise search solution, Coveo Expresso™ Beta. Coveo’s new solution places the power of enterprise information access in the hands of employees everywhere, at no cost, for up to 50 users. The free version of the Expresso content processing system can index one million desktop files and email items as well as 100,000 Intranet documents. Licenses can be expanded at minimal cost to as many as 250 users, five million desktop files and email items, and one million SharePoint and file share documents, just by typing a new access code. Administrators simply add new email accounts and SharePoint or file share documents within the intuitive administrative interface. Coveo Expresso is available for immediate download at www.coveo.com/expresso.

Laurent Simoneau, President and CEO, told Beyond Search:

Although enterprise search solutions have been available for nearly a decade, most are built on legacy systems that are difficult to implement and have not lived up to the promise of intuitive, secure and comprehensive information access across information silos. We want to re-educate businesses about the ease and simplicity with which enterprise search should work, as our customers can attest. Coveo Expresso does that—and takes enterprise search one step further with ubiquitous access interfaces such as the Coveo Outlook Sidebar or the desktop floating search bar, which provide guided, faceted search where employees ‘live’—in their email interface or on their PC/laptop. We’ve been testing this feature for a number of months with our current customers and have found it to be one of the biggest boosts to productivity for all employees, regardless of their roles.

Features

The free download features a number of Coveo innovations, including:

  • Cross-enterprise Email Search, for 50 email accounts, including PST files and attachments, on desktops and in servers for up to 1 million total items.
  • The Coveo Outlook Sidebar, the industry’s first true enterprise search Outlook plug-in, which provides sophisticated features such as conversation folding, related conversations, related people, related attachments, and the ability to search any indexed content without leaving Outlook, as well as the ability to launch advanced search with guided navigation through search facets.
  • The Coveo Desktop Floating Search bar, enabling guided searches without leaving the program in which the user is working.
  • Enterprise Desktop Search, including always-on indexing for 50 PCs/laptops.
  • Mobile access via BlackBerries for 50 users.

The Expresso Interface

Search results appear in a clean, well-organized panel display.



Differentiation: The New Enterprise Search Barrier

October 30, 2009

I don’t know one tree from another. When someone points out a maple and remarks that it is a sugar maple, I have no clue about a maple and even less information about a sugar maple. A lack of factual foundation means that I know nothing about trees. Sure, I know that most trees are green and that I can cut one down and burn it. But I don’t own a chain saw, so that general information means zero in the real world.

Now consider the clueless minions who have to purchase an enterprise search system. The difference between my tree knowledge and their search knowledge is easy to point out. Both of us are likely to become confused. To me, trees look alike. To the search procurement team, search systems look alike.

I received an announcement about a search system (nameless, of course) which asserted:

[The vendor’s product] is the first mobile enterprise search server to enable secure ‘anywhere’ access to data that resides across all information sources, including individual desktops, email stores, file shares, external sites and enterprise applications. Leveraging the [vendor’s product] Enterprise Server as its backbone, [the vendor’s product] Anywhere is capable of delivering secure, immediate access to any browser-enabled device, from an iPhone to a Blackberry and beyond.

I find that this write up is very similar to the Coveo email search solution, which has mobile access as one of its features plus a number of other bells and whistles.

I can document many other similarities in the way in which search vendors describe their products. In fact, I identified a phrase first used by Endeca in 2003 or 2004 as a key element in Microsoft’s marketing of its SharePoint search systems. My recollection is the phrase in question is “user experience.” Endeca may have snagged it somewhere just as Mozart plucked notes from his contemporaries.

Confusion among search vendors is easy. Many recycle words, phrases, and buzzwords, hoping that their spin will win customers. One thing is certain. Vendors have the azure chip consultants in a tizzy. One prominent azure chip outfit in New York has pegged Google a laggard and a product that has yet to make its appearance as a leader.

Procurement teams? Baffled for sure. Differentiation is needed, but it doesn’t come by recycling another vendor’s marketing collateral or relying on the azure chip crowd to cook up a new phrase to baffle the paying customers, or some of the paying customers.

Vendors, differentiate. Don’t imitate.

Stephen Arnold, October 30, 2009

A former Ziffer bought me dinner this week. Does that count as compensation? I deserve more.

Coveo Discloses Client Wins in Q2 2009

August 14, 2009

Coveo is a technology company with some interesting products. I learned about the firm when I poked into the origins of the desktop search system called Copernic. The firm flashed on my radar with a snap in solution for SharePoint. I saw a demonstration of email search that provided features I had heard other vendors describe. Coveo implemented them; for example, maintaining a complete email archive for the user’s desktop computer so if he or she lost a mobile device, the mail was recoverable.

Getting information out of Coveo has not been easy for me. I received a link to a Marketwire article that provided me with some useful information, and I wanted to snag it before the data gets buried in the digital avalanche that cascades into the goose pond each day.

Coveo disclosed several interesting customer wins:

  • Goodrich Corporation, a Fortune 500 company
  • Odyssey America, an insurance firm
  • The Doctor’s Company, an insurer of physicians and surgeons.

Coveo also formed alliances with New Idea Engineering and a number of other integrators around the world.

A happy quack to Coveo and a wing flap to the person at Coveo who provided this information.

Stephen Arnold, August 14, 2009

MarkLogic: The Shift Beyond Search

June 5, 2009

Editor’s note: I gave a talk at a recent user group meeting. My actual remarks were extemporaneous, but I did prepare a narrative from which I derived my speech. I am reproducing my notes so I don’t lose track of the examples. I did not mention specific company names. The Successful Enterprise Search Management (SESM) reference is to the new study Martin White and I wrote for Galatea, a publishing company in the UK. MarkLogic paid me to show up and deliver a talk, and the addled goose wishes other companies would turn to Harrod’s Creek for similar enlightenment. MarkLogic is an interesting company because it goes “beyond search”. The firm addresses the thorny problem of information architecture. Once that issue is confronted, search, reports, repurposing, and other information transformations become much more useful to users. If you have comments or corrections to my opinions, use the comments feature for this Web log. The talk was given in early May 2009, and the Tyra Banks example is now a bit stale. Keep in mind this is my working draft, not my final talk.

Introduction

Thank you for inviting me to be at this conference. My topic is “Multi-Dimensional Content: Enabling Opportunities and Revenue.” A shorter title would be repurposing content to save and make money from information. That’s an important topic today. I want to make a reference to real time information, present two brief cases I researched, offer some observations, and then take questions.

Let me begin with a summary of an event that took place in Manhattan less than a month ago.

Real Time Information

America’s Next Top Model wanted to add some zest to its popular television reality program. The idea was to hold an audition for short models, not the lanky male and female prototypes with whom we are familiar.

The short models gathered in front of a hotel on Central Park South. In a matter of minutes, the crowd began to grow. A police cruiser stopped, and the two officers found themselves watching a full-fledged mêlée in progress, complete with swinging shoulder bags, spike heels, and hair spray. Every combatant was 5 feet 6 inches tall or shorter.

The officers called for the SWAT team, but the police had been caught by surprise.

I learned in the course of the nine months of research for the new study written by Martin White (a UK based information governance expert) and myself that a number of police and intelligence groups have embraced one of MarkLogic’s systems to prevent this type of surprise.

Real-time information flows from Twitter, Facebook, and other services are, at their core, publishing methods. The messages may be brief (140 characters or fewer, or about 12 to 14 words), but they pack a wallop.


MarkLogic’s slicing and dicing capabilities open new revenue opportunities.

Here’s a screenshot of the product about which we heard quite positive comments. This is MarkMail, and it makes it possible to take content from real-time systems such as mail and messaging, process them, and use that information to create opportunities.

Intelligence professionals use the slicing and dicing capabilities to generate intelligence that can save lives and reduce to some extent the type of reactive situation in which the NYPD found itself with the short models disturbance.

Financial services and consulting firms can use MarkMail to produce high value knowledge products for their clients. Publishing companies may have similar opportunities to produce high grade materials from high volume, low quality source material.


Search 2010: Five Game Changers

May 7, 2009

Editor’s Note: This is the outline of Stephen Arnold’s comments at the “debate” session of the Boye 09 Conference in Philadelphia, Pennsylvania, on May 6, 2009. The actual talk will be informal, and these notes are part of the preparation for that talk.

Introduction

Thank you for inviting me to share my ideas with you. I remember that W.C. Fields had a love-hate relationship with Philadelphia. Approaching the Curtis Building, where we are meeting, I realized that much of the old way of doing business has changed. I don’t have time to dig too deeply into the many content challenges organizations face. If the publisher of the Saturday Evening Post were with us this afternoon, I think Mr. Curtis would have a difficult time explaining why his successful business was marginalized; that is, pushed aside, made into an artifact like the Liberty Bell down the street.

I have been asked to do a “Search 2010” talk twice this year. Predicting the future in today’s troubled economic environment is difficult. Nevertheless, I want to identify five trends in the next 20 minutes. I will try to take a position on each trend to challenge the panelists’ thinking and stimulate questions from you in the audience.

Let’s dive right in. Here are the five trends:

  1. Darwinism and search
  2. Real time search
  3. Google’s enterprise push
  4. Microsoft’s enterprise search
  5. Open source

I want to comment on each, offer a couple of examples, and try to come at these subjects in a way that highlights what my research for Google: The Digital Gutenberg revealed as substantive actions in search.

Search and Darwin

The search sector is in a terrible position. The term “search” has been devalued. Few people know what the word means, yet most people say, “I am pretty good at search.” That confidence is an illusion. The search sector is a tough nut to crack. Well-known companies such as Mondosoft and Ontolica found themselves purchased by an entrepreneur. That company restructured, and now the “old” Mondosoft has been reincarnated, but it is not clear that the new owners will make a success of the business. Delphes, a specialist vendor in Québec, failed. Attensity orchestrated a roll up with two German firms to become more of a force in marketing. A promising system in the Netherlands called Teezir was closed when I visited the office in November 2009. I hear rumors about search vendors who are chasing funding frequently, but I don’t want to mention the names of some of these well-known firms in this forum. Not long ago, the high profile Endeca sought support in the form of investments from Intel and SAP’s venture arm. At Oracle, the Secure Enterprise Search 10g product has largely disappeared. The strong survive, which means big players like Google and Microsoft are going to be fighting for the available revenue.

Real Time Search

What is it? The first thing to say is that real time search is a terrible phrase. Riches await the person who crafts a more appropriate buzzword. The notion is that messages from a service like Twitter fly around in their 140 character glory. The Twitter search system at http://search.twitter.com or the developers who use the Twitter API make it easy to find or see information. A good example is the service at http://www.twitturly.com or http://www.tweetmeme.com. You look at Tweets (the name for Twitter messages) and you scan the listings on these services. Real time search blends geospatial and mobile operations. Push, not key word search, complements scanning a list of suggested hits. The mode of user interaction is not keyword search. This is an important distinction.


“Search” means look at or scan. “Search” does not mean type key words and hunt through results list. It is possible to send a Tweet to everyone on Twitter or to those who follow you and ask a question. You may get an answer, but the point is that the word “search” does not explain the value of this type of system for business intelligence or marketing, for example. If you run a search with the keyword of a company like Google or Yahoo, you can get information which may or may not be accurate or useful. You will see what’s happening “now”, which is the meaning of “real time”.
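The push model can be sketched as standing filters matched against messages as they arrive, rather than a user typing key words against an index and hunting through a results list. The `PushRouter` class, its term-overlap matching rule, and the 140-character check are hypothetical, written for illustration; they do not describe Twitter’s API or any of the services named above.

```python
from dataclasses import dataclass, field


@dataclass
class PushRouter:
    """Hypothetical push router: standing subscriptions are matched as messages arrive."""
    subscriptions: dict[str, set[str]] = field(default_factory=dict)  # user -> watched terms
    inboxes: dict[str, list[str]] = field(default_factory=dict)       # user -> pushed messages

    def subscribe(self, user: str, terms: set[str]) -> None:
        # Store watched terms lowercased so matching is case-insensitive.
        self.subscriptions[user] = {t.lower() for t in terms}
        self.inboxes.setdefault(user, [])

    def publish(self, message: str) -> None:
        # Assumed cap of 140 characters, as on 2009-era Twitter.
        if len(message) > 140:
            raise ValueError("message longer than 140 characters")
        words = {w.strip(".,!?#@").lower() for w in message.split()}
        # Push: every subscriber whose terms overlap the message gets it immediately.
        for user, terms in self.subscriptions.items():
            if words & terms:
                self.inboxes[user].append(message)


router = PushRouter()
router.subscribe("analyst", {"Google", "Yahoo"})
router.publish("Google adds legal cases to Scholar")
router.publish("Lunch at noon")
print(router.inboxes["analyst"])  # only the Google message was pushed
```

The user never runs a query; the value lies in what shows up “now” and gets scanned.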


Content Management: Modern Mastodon in a Tar Pit, Part One

April 17, 2009

Editor’s Note: This is a discussion of the reasons why CMS continues to thrive despite the lousy financial climate. The spark for this essay was the report of strong CMS vendor revenues written by an azure chip consulting firm; that is, a high profile outfit a step or two below the Bains, McKinseys, and BCGs of this world.

Part 1: The Tar Pit and Mastodon Metaphor or You Are Stuck

PCWorld reported “Web Content Management Staying Strong in Recession” here. The author, Chris Kanaracus, wrote:

While IT managers are looking to cut costs during the recession, most aren’t looking for savings in Web content management, according to a recent Forrester Research study. Seventy-two percent of the survey’s 261 respondents said they planned to increase WCM deployments or usage this year, even as many also expressed dissatisfaction with how their projects have turned out. Nineteen percent said their implementations would remain the same, and just 3 percent planned to cut back.

When consulting firms generate data, I try to think about the data in the context of my experience. In general, pondering the boundaries of “statistically valid data from a consulting firm” with the wounds and bruises this addled goose gets in client work is an enjoyable exercise.

These data sort of make sense, but I think there are other factors that make CMS one of the alleged bright spots in the otherwise murky financial heavens.

La Brea, Tar, and Stuck Trapped Creatures

I remember the first time I visited the La Brea tar pits in Los Angeles. I was surprised. I had seen well heads chugging away on the drive to a client meeting in Long Beach in the early 1970s, but I did not know there was a tar pit amidst the choked streets of the crown jewel in America’s golden west. It’s there, and I have an image of a big elephant (Mammut americanum for the detail oriented reader) stuck in the tar. Good news for those who study the bones of extinct animals. Bad news for the elephant.


Is this a CMS vendor snagged in litigation or the hapless CMS licensee after the installation of a CMS system?

I had two separate conversations about CMS, the breezy acronym for content management systems. I can’t recall the first time I discovered that species of mastodon software, but I was familiar with the tar pits of content in organizations. Let’s set the stage, er, prep the tar pit.

Organizational Writing: An Oxymoron

Organizations produce quite a bit of information. The vast majority of this “stuff” (content objects for the detail oriented reader) is in a constant state of churn. Think of the memos, letters, voice mails, etc. like molecules in a fast-flowing river in New Jersey. The environment is fraught with pollutants, regulators, professional garbage collection managers, and the other elements of modern civilization.

The authors of these information payloads are writing with a purpose; that is, instrumental writing. I have not encountered too many sonnets, poems, or novels in the organizational information I have had the pleasure of indexing since 1971. In the studies I worked on first at Halliburton Nuclear Utility Services and then at Booz, Allen & Hamilton, I learned that most organizational writing is not read by very many people. A big fat report on nuclear power plants had many contributors and reviewers, but most of these people focused on a particular technical aspect of a nuclear power generation system, not the big fat book. I edited the proceedings of a nuclear conference in 1972 and discovered that papers often had six or more authors. When I followed up with the “lead author” about a missing figure or an error in a wild and crazy equation, I learned that the “lead author” had zero clue about the information in the particular paragraph to which I referred.

Flash forward. Same situation today just lots more digital content. Instrumental writing, not much accountability, and general cluelessness about the contents of a particular paragraph, figure, chart, whatever in a document.

Organizational writing is a hotch potch of individuals with different capabilities and methods of expressing themselves. Consider an engineer or mathematician. Writing is not usually a core competency, but there are exceptions. In technical fields, there will be a large number of people who are terse to the point of being incomprehensible and a couple of folks who crank out reams of information. In an organization, volume may not correlate with “right” or “important”. A variation of this situation crops up in sales. A sales report often is structured, particularly if the company has licensed a product to force each salesperson to provide a name, address, phone number, and comments about a “contact”. The idea is that getting basic information is pretty helpful if the salesperson quits or simply refuses to fill in the blanks. Often the salesperson who won’t play ball is the guy or gal who nails a multi million dollar deal. The salesperson figures, “Someone will chase up the details.” The guy or gal is right. Distinct content challenges arise in the legal department. Customer support has its writing preferences, sometimes compressed to methods that make the customer quit calling.

Why CMS for Text?

The Web’s popularization as cheap marketing created a demand for software that would provide writing training wheels to those in an organization who had to contribute information to a Web site. The Web site has gained importance with each passing year since 1993 when hyperlinking poked its nose from the deep recesses of Standard Generalized Markup Language.

Customer relationship management systems really did not support authoring, editorial review, version control, and the other bits and pieces of content production. Enterprise resource planning systems manage back office and nitty gritty warehouse activities. Web content is not a core competency of these labyrinthine systems. Content systems mandated for regulatory compliance are designed to pinpoint which supplier delivered an Inconel pipe that cracked, what inspector looked at the installation, what quality assurance engineer checked the work, and what tech did the weld when the pipe was installed. Useful for compliance, but not what the Web marketing department ordered. Until recently, enterprise publishing systems were generally confined to the graphics department or the group that churned out proposals and specifications. The Web content was an aberrant content type.

Enter content management.

The first system that I looked at closely was called NCompass. When I got a demo in late 1999, I recall vividly that it crashed in the brightly lit, very cheerful exhibition stand in San Jose. Reboot. Demo another function. Crash. Repeat. Microsoft acquired this puppy and integrated it into SharePoint. SharePoint has grown over time like a snowball. Here’s a diagram of the SharePoint system from www.JoiningDots.net:

SharePoint. Simplicity itself. Source: http://www.joiningdots.net/downloads/SharePoint_History.jpg

A Digital Oklahoma Land Rush

By 2001, CMS was a booming industry. In some ways, it reminded me of the case study I wrote for a client about the early days of the automobile industry. There were many small companies which over time would give way to a handful of major players. Today CMS has reached an interesting point. The auto style aggregation has not worked out exactly like the auto industry case I researched. Before the collapse of the US auto industry in 2008, automobile manufacturing had fractured and globalized. There were holding companies making more vehicles than the US population would buy from American firms. There were vast interconnected networks of supplier subsystems and, below these, huge pipelines into more fundamental industrial sectors like chemicals, steel, and rubber.

Search Certification

April 1, 2009

A happy quack to the reader who told me about the new AIIM search certification program. Now that will be an interesting development. AIIM is a group anchored in the original micrographics business. The organization has morphed over the years, and it now straddles a number of different disciplines. The transition has been slow and in some cases directed by various interest groups from the content management sector and consulting world. CMS experts have produced some major problems for indexing subsystems, and the CMS vendors themselves seem to generate more problems for licensees than their systems resolve.

This is not an April Fools’ joke.

The notion of search certification is interesting for five reasons:

First, there is no widely accepted definition of search in general or enterprise search in particular. I have documented the shift in terminology used by vendors of information retrieval and content processing systems. You can see here the lengths to which some organizations go to avoid using the word “search”, which has been devalued and overburdened in the last three or four years. The issue of definitions becomes quite important, but I suppose in the quest for revenue, providing certification in a discipline without boundaries fulfills some folks’ ambitions for revenue and influence.

Second, the basic idea of search–that is, find information–has shifted from the old command line Boolean to a more trophy-generation approach. Today’s systems are smart, presumably because the users are either too busy to formulate a Boolean query or view the task as irrelevant in a Twitter-choked real time search world. The notion of “showing” information to users means that a fundamental change has taken place which moves search to the margins of this business intelligence or faceted approach to information.

Third, the Google “I’m feeling doubly lucky” invention US2006/0230350 I described last week at a conference in Houston, Texas, removes the need to point and click for information. The Google engineers responsible for “I’m feeling doubly lucky” spare the user from doing much more than using a mobile device. The system monitors and predicts. The information is just there. A certification program for this approach to search will be most interesting because at this time the knowledge to pull off “I’m feeling doubly lucky” resides at Google. If anyone certifies, I suppose it would be Google.

Fourth, search is getting ready to celebrate its 40th birthday if one uses Dr. Salton’s seminal papers as the “official” starting point for search. SQL queries, Codd style, preceded Dr. Salton’s work with text, however. But after 40 years, certification seems to be coming a bit late in the game. I can understand certification for a specific vendor’s search system–for example, SharePoint–but the notion of tackling a broader swath of this fluid, boundaryless space makes me logically uncomfortable. Others may feel more comfortable with this approach whose time apparently has come.

Finally, search is becoming a commodity, finding itself embedded and reshaped into other enterprise applications. Just as the “I’m feeling doubly lucky” approach shifts the burden of search from the user to the Google infrastructure, these embedded functions create a different problem in navigating and manipulating dataspace.

I applaud the association and its content management advisors for tackling search certification. My thought is that this may be an overly simplistic solution to a problem that has shifted away from the practical into the realm of the improbable.

There is a crisis in search. Certification won’t help too much in my opinion. Other skills are needed and these cannot be imparted in a boot camp or a single seminar. Martin White and I spent almost a year distilling our decades of information retrieval experience into our Successful Enterprise Search Management.

The longest journey begins with a single step. Looks like one step is about to be taken–four decades late. Just my opinion, of course. The question now becomes, “Why has no search certification process been successful in this time interval?” and “Why isn’t there a search professional association?” Any thoughts?

Stephen Arnold, March 31, 2009
