Content Management: Modern Mastodon in a Tar Pit, Part One

April 17, 2009

Editor’s Note: This is a discussion of the reasons why CMS continues to thrive despite the lousy financial climate. The spark for this essay was the report of strong CMS vendor revenues written by an azure chip consulting firm; that is, a high profile outfit a step or two below the Bains, McKinseys, and BCGs of this world.

Part 1: The Tar Pit and Mastodon Metaphor or You Are Stuck

PCWorld reported “Web Content Management Staying Strong in Recession” here. The author, Chris Kanaracus, wrote:

While IT managers are looking to cut costs during the recession, most aren’t looking for savings in Web content management, according to a recent Forrester Research study. Seventy-two percent of the survey’s 261 respondents said they planned to increase WCM deployments or usage this year, even as many also expressed dissatisfaction with how their projects have turned out. Nineteen percent said their implementations would remain the same, and just 3 percent planned to cut back.

When consulting firms generate data, I try to think about the data in the context of my experience. In general, weighing “statistically valid data from a consulting firm” against the wounds and bruises this addled goose collects in client work is an enjoyable exercise.

These data sort of make sense, but I think there are other factors that make CMS one of the alleged bright spots in the otherwise murky financial heavens.

La Brea, Tar, and Stuck Trapped Creatures

I remember the first time I visited the La Brea tar pits in Los Angeles. I was surprised. I had seen well heads chugging away on the drive to a client meeting in Long Beach in the early 1970s, but I did not know there was a tar pit amidst the choked streets of the crown jewel in America’s golden west. It’s there, and I have an image of a big mastodon (Mammut americanum for the detail oriented reader) stuck in the tar. Good news for those who study the bones of extinct animals. Bad news for the mastodon.


Is this a CMS vendor snagged in litigation or the hapless CMS licensee after the installation of a CMS system?

I had two separate conversations about CMS, the breezy acronym for content management systems. I can’t recall the first time I discovered that species of mastodon software, but I was familiar with the tar pits of content in organizations. Let’s set the stage, er, prep the tar pit.

Organizational Writing: An Oxymoron

Organizations produce quite a bit of information. The vast majority of this “stuff” (content objects for the detail oriented reader) is in a constant state of churn. Think of the memos, letters, voice mails, etc. as molecules in a fast-flowing river in New Jersey. The environment is fraught with pollutants, regulators, professional garbage collection managers, and the other elements of modern civilization.

The authors of these information payloads are writing with a purpose; that is, instrumental writing. I have not encountered too many sonnets, poems, or novels in the organizational information I have had the pleasure of indexing since 1971. In the studies I worked on first at Halliburton Nuclear Utility Services and then at Booz, Allen & Hamilton, I learned that most organizational writing is not read by very many people. A big fat report on nuclear power plants had many contributors and reviewers, but most of these people focused on a particular technical aspect of a nuclear power generation system, not the big fat book. I edited the proceedings of a nuclear conference in 1972, and discovered that papers often had six or more authors. When I followed up with the “lead author” about a missing figure or an error in a wild and crazy equation, I learned that the “lead author” had zero clue about the information in the particular paragraph to which I referred.

Flash forward. Same situation today, just lots more digital content. Instrumental writing, not much accountability, and general cluelessness about the contents of a particular paragraph, figure, chart, whatever in a document.

Organizational writing is a hodgepodge of individuals with different capabilities and methods of expressing themselves. Consider an engineer or mathematician. Writing is not usually a core competency, but there are exceptions. In technical fields, there will be a large number of people who are terse to the point of being incomprehensible and a couple of folks who crank out reams of information. In an organization, volume may not correlate with “right” or “important”. A variation of this situation crops up in sales. A sales report often is structured, particularly if the company has licensed a product to force each salesperson to provide a name, address, phone number, and comments about a “contact”. The idea is that getting basic information is pretty helpful if the salesperson quits or simply refuses to fill in the blanks. Often the salesperson who won’t play ball is the guy or gal who nails a multi-million-dollar deal. The salesperson figures, “Someone will chase up the details.” The guy or gal is right. Distinct content challenges arise in the legal department. Customer support has its writing preferences, sometimes compressed to methods that make the customer quit calling.

Why CMS for Text?

The Web’s popularization as cheap marketing created a demand for software that would provide writing training wheels to those in an organization who had to contribute information to a Web site. The Web site has gained importance with each passing year since 1993 when hyperlinking poked its nose from the deep recesses of Standard Generalized Markup Language.

Customer relationship management systems really did not support authoring, editorial review, version control, and the other bits and pieces of content production. Enterprise resource planning systems manage back office and nitty gritty warehouse activities. Web content is not a core competency of these labyrinthine systems. Content systems mandated for regulatory compliance are designed to pinpoint which supplier delivered an Inconel pipe that cracked, what inspector looked at the installation, what quality assurance engineer checked the work, and what tech did the weld when the pipe was installed. Useful for compliance, but not what the Web marketing department ordered. Until recently, enterprise publishing systems were generally confined to the graphics department or the group that churned out proposals and specifications. The Web content was an aberrant content type.

Enter content management.

I recall the first system that I looked at closely was called NCompass. When I got a demo in late 1999, I recall vividly that it crashed in the brightly lit, very cheerful exhibition stand in San Jose. Reboot. Demo another function. Crash. Repeat. Microsoft acquired this puppy and integrated it into SharePoint. SharePoint has grown over time like a snowball. Here’s a diagram of the SharePoint system from www.JoiningDots.net:


SharePoint. Simplicity itself. Source: http://www.joiningdots.net/downloads/SharePoint_History.jpg

A Digital Oklahoma Land Rush

By 2001, CMS was a booming industry. In some ways, it reminded me of the case study I wrote for a client about the early days of the automobile industry. There were many small companies which over time would give way to a handful of major players. Today CMS has reached an interesting point. The auto-style aggregation has not worked out exactly like the auto industry case I researched. Before the collapse of the US auto industry in 2008, automobile manufacturing had fractured and globalized. There were holding companies making more vehicles than the US population would buy from American firms. There were vast interconnected networks of supplier subsystems and, below these, huge pipelines into more fundamental industrial sectors like chemicals, steel, and rubber.

Read more

Search Certification

April 1, 2009

A happy quack to the reader who told me about the new AIIM search certification program. Now that will be an interesting development. AIIM is a group anchored in the original micrographics business. The organization has morphed over the years, and it now straddles a number of different disciplines. The transition has been slow and in some cases directed by various interest groups from the content management sector and consulting world. CMS experts have produced some major problems for indexing subsystems, and the CMS vendors themselves seem to generate more problems for licensees than their systems resolve. (Click here for one example.)

This is not an April’s Fool joke.

The notion of search certification is interesting for five reasons:

First, there is no widely accepted definition of search in general or enterprise search in particular. I have documented the shift in terminology used by vendors of information retrieval and content processing systems. You can see the lengths here to which some organizations go to avoid using the word “search”, which has been devalued and overburdened in the last three or four years. The issue of definitions becomes quite important, but I suppose in the quest for revenue, providing certification in a discipline without boundaries fulfills some folks’ ambitions for revenue and influence.

Second, the basic idea of search–that is, find information–has shifted from the old command line Boolean to a more trophy-generation approach. Today’s systems are smart, presumably because the users are either too busy to formulate a Boolean query or view the task as irrelevant in a Twitter-choked real time search world. The notion of “showing” information to users means that a fundamental change has taken place which moves search to the margins of this business intelligence or faceted approach to information.

Third, the Google “I’m feeling doubly lucky” invention US2006/0230350 I described last week at a conference in Houston, Texas, removes the need to point and click for information. The Google engineers responsible for “I’m feeling doubly lucky” remove the user from doing much more than using a mobile device. The system monitors and predicts. The information is just there. A certification program for this approach to search will be most interesting because at this time the knowledge to pull off “I’m feeling doubly lucky” resides at Google. If anyone certifies, I suppose it would be Google.

Fourth, search is getting ready to celebrate its 40th birthday if one uses Dr. Salton’s seminal papers as the “official” starting point for search. SQL queries, Codd style, preceded Dr. Salton’s work with text, however. But after 40 years, certification seems to be coming a bit late in the game. I can understand certification for a specific vendor’s search system–for example, SharePoint–but the notion of tackling a broader swath of this fluid, boundaryless space sits uncomfortably with me. Others may feel more comfortable with this approach whose time apparently has come.

Finally, search is becoming a commodity, finding itself embedded and reshaped into other enterprise applications. Just as the “I’m feeling doubly lucky” approach shifts the burden of search from the user to the Google infrastructure, these embedded functions create a different problem in navigating and manipulating dataspace.
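For the detail oriented reader, the shift away from “old command line Boolean” mentioned in the second point can be sketched in a few lines of Python. The inverted index, documents, and terms below are invented for illustration; this is the mechanical model the trophy generation no longer wants to operate by hand:

```python
# A minimal sketch of the command line Boolean model: an inverted index
# mapping each term to the set of documents containing it, queried with
# explicit AND (set intersection) and OR (set union).

def build_index(docs):
    index = {}
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index.setdefault(term, set()).add(doc_id)
    return index

docs = {
    1: "enterprise search platform",
    2: "web content management",
    3: "enterprise content management",
}
index = build_index(docs)

# Boolean AND: intersect posting sets; Boolean OR: union them.
and_hits = index["enterprise"] & index["content"]  # docs with both terms
or_hits = index["search"] | index["management"]    # docs with either term

print(sorted(and_hits))  # [3]
print(sorted(or_hits))   # [1, 2, 3]
```

A “smart” system layers query interpretation, ranking, and suggestion on top of exactly this kind of machinery so the user never writes the AND or OR at all.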

I applaud the association and its content management advisors for tackling search certification. My thought is that this may be an overly simplistic solution to a problem that has shifted away from the practical into the realm of the improbable.

There is a crisis in search. Certification won’t help too much in my opinion. Other skills are needed and these cannot be imparted in a boot camp or a single seminar. Martin White and I spent almost a year distilling our decades of information retrieval experience into our Successful Enterprise Search Management.

The longest journey begins with a single step. Looks like one step is about to be taken–four decades late. Just my opinion, of course. The question now becomes, “Why has no search certification process been successful in this time interval?” and “Why isn’t there a search professional association?” Any thoughts?

Stephen Arnold, March 31, 2009

Forbes Calls Microsoft’s Ballmer Insane

February 15, 2009

Wow, not even the addled goose risks headlines like this one in MetaData: “Steve Ballmer Is Insane” here. There’s no allegedly, slightly, or possibly. Just insane. The writer is Wendy Tanaka, and I am shaking my feathers nervously pondering what she would call this addled goose. Fricasseed? Silly? Cooked? Addled? No, that won’t work. I call myself addled.

What’s insane mean? According to Dictionary.com, a property of Ask.com, a source I really trust, insane carries three meanings:

  1. not sane; not of sound mind; mentally deranged.
  2. of, pertaining to, or characteristic of a person who is mentally deranged: insane actions; an insane asylum.
  3. utterly senseless: an insane plan.

Ms. Tanaka, who I opine is younger than this addled goose’s 65 years, may be younger in mind and spirit than I. She focuses on the lousy economy and Microsoft’s decision to open retail stores. To spearhead the retail effort, Microsoft has snatched a Wal*Mart superstar. In Harrod’s Creek, Wal*Mart is not a store. Wal*Mart is the equivalent of a vacation.

My hunch is that Ms. Tanaka and her sources are skeptical of Microsoft’s push into retailing. She cites an MBA trophy generation type wizard from Technology Business Research, an outfit with a core competency in retailing, I presume. Mr. Krans allegedly said:

Apple’s retail store rollout coincided with the introduction of the iPod in 2001, which gave a very compelling reason for consumers to visit its locations. …Microsoft brings no such compelling product to bear in its retail entrance, which makes getting consumers in the door a large obstacle to overcome.

This addled goose thinks there are significant benefits to Microsoft retail stores located in Harrod’s Creek. Read on.

A Baker’s Dozen of Benefits from MSFT Retail Shops

Here are some reasons that this addled goose thinks that the Microsoft retail push is such an interesting idea:

  1. Retail stores will permit Microsoft to showcase the Zune and related products. I saw a Zune case with happy faces in the local Coconut record shop last September.
  2. Individuals interested in the XBox 360 can buy these at the Microsoft store, eliminating a need to go to BestBuy, GameStop, or the other established retail outlets for this product.
  3. Procurement teams could take a field trip–much like the Harrod’s Creek residents’ vacation at Wal*Mart–to buy the SharePoint Fast ESP product offerings. I think there will be two, maybe three versions, of SharePoint with Fast technology on offer soon.
  4. The local customer support outfit Administrative Services could drop in to the Microsoft retail shop near Fern Valley Road and grab one or more versions of Dynamics along with Windows Server, SQL Server, and any other server needed to make Dynamics sing a happy song.
  5. Display the wide range of mobile devices running Windows Mobile. I don’t think I have seen every Windows Mobile device in one location. What a convenience for disenchanted Nokia, iPhone, and BlackBerry users.
  6. Offer the complete line up of Microsoft mice and keyboards. Shame about the nifty Microsoft networking products in the compelling pale orange and green boxes.
  7. Introduce a service bar with Windows geniuses to address questions from customers. I would drop in to get help when my MSDN generated authentication keys don’t work or when the Word 2007 formatting on a Windows system does not stick, yet the formatting works just fine on a Mac with Word 2007 installed.
  8. Provide a line up of Microsoft T shirts, caps, and other memorabilia, including the new “old” range of gear with MS DOS era logos.
  9. Purchase CALs for various Microsoft products, eliminating the hassle of dealing with the Platinum, Gold, Silver, and other semi precious metal badged partners.
  10. Purchase Microsoft Consulting support so I can get different Microsoft server products to talk to one another and expose their data and metadata to SharePoint.
  11. Sign up for Microsoft Live.com cloud services and get help with the horizontal and sometimes confusing to me “blank” slate interfaces. See item 7 above.
  12. Meet Microsoft partners, eliminating the need to go to a trade show to learn about “snap in” products that extend, enrich, and sometimes replace Microsoft components that don’t work as advertised for some customer applications.
  13. Visit with Microsoft executives. I think of this as an extension of the company’s “open door policy.” Nothing will boost share price more than giving retail customers an opportunity to talk with senior Microsoft executives about Vista, usability testing, prices, variants of Windows 7, the difference between MSN.com and Live.com, and job opportunities.

Insane? Wrong. From Harrod’s Creek, the retail plan makes perfect sense. I wonder if the Microsoft retail shop will be in downtown Harrod’s Creek or out by the mine run off pond on Wolf Creek Road? Maybe we’ll get more than one store just like Taco Bell.

Stephen Arnold, February 15, 2009

Microsoft’s Certified Partners Are Jittery

February 9, 2009

The Fast Forward conference (search, not the science fiction show) is now a reality. I want to report that I have had more Microsoft announcements in the last few hours than I normally receive in a month. The announcements–email spam, really–underscore how Microsoft’s executive revolving door has escalated the uncertainty among certified partners. I don’t want to mention the names of these nervous Nellies. You can spot the Microsoft mania in other Web log posts. The spammers know who they are, and I don’t want to deal with outputs from legal eagles. I can offer a handful of observations germane to the shindig at which Microsoft’s most recent enterprise search strategy will be unveiled this week. If you are not familiar with this user group meeting gussied up for the faithful, you can read about the FUG (Fast User Group) here. The conference is in Las Vegas, a place where risk and odds make the city vibrate.

First, expect good news, lots and lots of good news about the power, flexibility, and value of the Fast Search Enterprise Search Platform. In my limited experience with these types of events, not much of the downside of the system will find its way to the lectures, cheerleading sessions, and events. To be fair, the open user groups went the way of the dodo. The reason is that users form grouplets and then some of these grouplets raise a stink. The controlled user group, in this case the FUG, helps to ensure that the agenda is followed. Closely followed. Is there a downside to Fast ESP or any other search system? In a word, “Yep.” Fast ESP in particular? Good question indeed. Example here.

Second, expect anti-Google moves. Now the speakers will be gracious to Googzilla. The GOOG will be praised for doing a good job in Web search, but that GOOG technology doesn’t do the job an enterprise needs. Furthermore, the GOOG has primed the market with a low priced, inferior product that sets the stage for a low priced superior offering from Microsoft. Around this theme, there will be “experts” who point out that Google does okay but Microsoft does much better. The attendees will, in my experience, cheer when Googzilla takes a liver punch.

Third, expect consultants, pundits, and advisors to quiver with excitement. Users of SharePoint will be ready to pay big bucks for guidance. Among the questions that these wizards will “answer” are: “What’s this mean for the 50 million document limit in SharePoint?” “When do I abandon a free SharePoint search and move to the more capable search platform?” “Will the new and improved search work with Dynamics, SQL Server, and other servers that Microsoft puts into client locations?” “Will there be a managed service available from Fast Search’s data centers?” “Whom do I call when I can’t get the indexing subsystem to update?” “How much are the connectors to hook the new search system into a legacy Ironsides application running on an old AS/400?” I must admit that I don’t have answers to some of these questions, and I would wager a goose feather that the boffins don’t have the answers either. That’s what makes consultants quiver: getting paid to find out the answer to a question that can’t be answered for quite a while.

To close, I want to offer some observations about the impact of Fast Forward’s “news” on Googzilla. Keep in mind that I have zero relationship with either of these publicly traded companies, so you are getting my opinions.

  1. Google doesn’t really care too much about Fast Forward search announcements. The GOOG is busy responding to unsolicited inquiries about its various enterprise products and services. I wouldn’t be surprised if Googlers did not know about the event. Fast Forward is not technical, and PR and boosterism, as opposed to technology, do not resonate with some of the Googley tribe.
  2. Customers are defining search as Google. Microsoft will have to find a way to counter the grassroots interest in Google solutions. Large consulting firms are forming Google practices to respond to demand. Microsoft consulting practices are in place, but these are different in their tone and services from the Google practices. One consulting firm is making phone calls trying to find Googley people to ram information into the members of the Google practice. There is a hunger for Google information based on my narrow angle of view. Google has grassroots growing in Microsoft’s playing field.
  3. Integrators are getting more interested in things Google. It is not just the Google Search Appliance, the Google Apps, or GMail. Google appears to be what a snowmobile driver wants: fresh, firm, untracked snow. Integrators want to be among the first to blast through this pristine environment, reaping the joy and excitement of the new. Microsoft, despite its best efforts, is not new.

As more “news” from Fast Forward flows into the hollow here in Harrod’s Creek, I will filter and comment as is my wont. In the meantime, I am going to check out what’s new on Google via Overflight.

Stephen Arnold, February 9, 2009

Daniel Tunkelang: Co-Founder of Endeca Interviewed

February 9, 2009

As other search conferences gasp for the fresh air of energizing speakers, Harry Collier’s Boston Search Engine Conference (more information is here) has landed another thought-leader speaker. Daniel Tunkelang is one of the founders of Endeca. After the implosion of Convera and the buyouts of Fast Search and Verity, Endeca is one of the two flagship vendors of search, content processing, and information management systems recognized by most information technology professionals. Dr. Tunkelang writes an informative Web log, The Noisy Channel, here.


Dr. Daniel Tunkelang. Source: http://www.cs.cmu.edu/~quixote/dt.jpg

You can get a sense of Dr. Tunkelang’s views in this exclusive interview conducted by Stephen Arnold with the assistance of Harry Collier, Managing Director, Infonortics Ltd. If you want to hear and meet Dr. Tunkelang, consider attending the Boston Search Engine Meeting, which is focused on search and information retrieval. All beef, no filler.


The speakers, like Dr. Tunkelang, will challenge you to think about the nature of information and the ways to deal with substantive issues, not antimacassars slapped on a problem. We interviewed Dr. Tunkelang on February 5, 2009. The full text of this interview appears below.

Tell us a bit about yourself and about Endeca.

I’m the Chief Scientist and a co-founder of Endeca, a leading enterprise search vendor. We are the largest organically grown company in our space (no preservatives or acquisitions!), and we have been recognized by industry analysts as a market and technology leader. Our hundreds of clients include household names in retail (Wal*Mart, Home Depot); manufacturing and distribution (Boeing, IBM); media and publishing (LexisNexis, World Book), financial services (ABN AMRO, Bank of America), and government (Defense Intelligence Agency, National Cancer Institute).

My own background: I was an undergraduate at MIT, double majoring in math and computer science, and I completed a PhD at CMU, where I worked on information visualization. Before joining Endeca’s founding team, I worked at the IBM T. J. Watson Research Center and AT&T Bell Labs.

What differentiates Endeca from the field of search and content processing vendors?

In web search, we type a query in a search box and expect to find the information we need in the top handful of results. In enterprise search, this approach too often breaks down. There are a variety of reasons for this breakdown, but the main one is that enterprise information needs are less amenable to the “wisdom of crowds” approach at the heart of PageRank and related approaches used for web search. As a consequence, we must get away from treating the search engine as a mind reader, and instead promote bi-directional communication so that users can effectively articulate their information needs and the system can satisfy them. The approach is known in the academic literature as human computer information retrieval (HCIR).

Endeca implements an HCIR approach by combining a set-oriented retrieval with user interaction to create an interactive dialogue, offering next steps or refinements to help guide users to the results most relevant for their unique needs. An Endeca-powered application responds to a query with not just relevant results, but with an overview of the user’s current context and an organized set of options for incremental exploration.
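A hypothetical, stripped-down illustration of that set-oriented, faceted interaction: a query returns not just matching records but counts for each available refinement, so the user sees an overview of the result set and an organized set of next steps. The tiny catalog below is invented and stands in for any faceted engine, not for Endeca’s actual technology:

```python
# Sketch of faceted search: match a keyword, then summarize the hit set
# by counting the values of each facet field. The counts become the
# "organized set of options for incremental exploration."

from collections import Counter

catalog = [
    {"title": "Cordless Drill", "brand": "Acme", "category": "tools"},
    {"title": "Hammer",         "brand": "Acme", "category": "tools"},
    {"title": "Drill Press",    "brand": "Zeta", "category": "machinery"},
]

def search_with_facets(records, keyword, facet_fields):
    hits = [r for r in records if keyword in r["title"].lower()]
    facets = {f: Counter(r[f] for r in hits) for f in facet_fields}
    return hits, facets

hits, facets = search_with_facets(catalog, "drill", ["brand", "category"])
print(len(hits))        # 2 matching records
print(dict(facets["brand"]))  # each brand seen among the hits, with a count
```

Clicking a facet value simply reruns the search with that value as an added filter, which is the “interactive dialogue” in miniature.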

What do you see as the three major challenges facing search and content processing in 2009 and beyond?

There are so many challenges! But let me pick my top three:

Social Search. While the word “social” is overused as a buzzword, it is true that content is becoming increasingly social in nature, both on the consumer web and in the enterprise. In particular, there is much appeal in the idea that people will tag content within the enterprise and benefit from each other’s tagging. The reality of social search, however, has not lived up to the vision. In order for social search to succeed, enterprise workers need to supply their proprietary knowledge in a process that is not only as painless as possible, but demonstrates the return on investment. We believe that our work at Endeca, on bootstrapping knowledge bases, can help bring about effective social search in the enterprise.

Federation.  As much as an enterprise may value its internal content, much of the content that its workers need resides outside the enterprise. An effective enterprise search tool needs to facilitate users’ access to all of these content sources while preserving value and context of each. But federation raises its own challenges, since every repository offers different levels of access to its contents. For federation to succeed, information repositories will need to offer more meaningful access than returning the top few results for a search query.

Search is not a zero-sum game. Web search engines in general–and Google in particular–have promoted a view of search that is heavily adversarial, thus encouraging a multi-billion dollar industry of companies and consultants trying to manipulate result ranking. This arms race between search engines and SEO consultants is an incredible waste of energy for both sides, and distracts us from building better technology to help people find information.
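Dr. Tunkelang’s federation point can be sketched in miniature. The two repositories, their interfaces, and their scores below are all invented; the point is only that each source returns results in its own shape and scale, and the federator must normalize before it can merge:

```python
# Sketch of federated search: two pretend repositories with different
# result formats and score scales, normalized into one ranked list.

def search_wiki(q):
    # Repository A returns (title, relevance) pairs on a 0-1 scale.
    return [("Inconel welding guide", 0.9), ("Pipe specs", 0.4)]

def search_crm(q):
    # Repository B returns dicts with its own 0-100 score scale.
    return [{"subject": "Pipe supplier contract", "score": 70}]

def federated_search(q):
    merged = [{"source": "wiki", "title": t, "score": s}
              for t, s in search_wiki(q)]
    merged += [{"source": "crm", "title": r["subject"],
                "score": r["score"] / 100}
               for r in search_crm(q)]
    # Naive merge: put everything on a 0-1 scale and sort. A real
    # federator must also reconcile security, freshness, and duplicates.
    return sorted(merged, key=lambda r: r["score"], reverse=True)

for hit in federated_search("pipe"):
    print(hit["source"], hit["title"], hit["score"])
```

The weak link is exactly the one named above: if a repository only hands back its top few results for a query string, the federator has little to normalize or reason over.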

With the rapid change in the business climate, how will the increasing financial pressure on information technology affect search and content processing?

There’s no question that information technology purchase decisions will face stricter scrutiny. But, to quote Rahm Emanuel, “Never let a serious crisis go to waste…it’s an opportunity to do things you couldn’t do before.” Stricter scrutiny is a good thing; it means that search technology will be held accountable for the value it delivers to the enterprise. There will, no doubt, be an increasing pressure to cut costs, from price pressure on vendors to substituting automated techniques for human labor. But that is how it should be: vendors have to justify their value proposition. The difference in today’s climate is that the spotlight shines more intensely on this process.

Search / content processing systems have been integrated into such diverse functions as business intelligence and customer support. Do you see search / content processing becoming increasingly integrated into enterprise applications? If yes, how will this shift affect the companies providing stand alone search / content processing solutions? If no, what do you see the role of standalone search / content processing applications becoming?

Better search is a requirement for many enterprise applications–not just BI and Call Centers, but also e-commerce, product lifecycle management, CRM, and content management.  The level of search in these applications is only going to increase, and at some point it just isn’t possible for workers to productively use information without access to effective search tools.

For stand-alone vendors like Endeca, interoperability is key. At Endeca, we are continually expanding our connectivity to enterprise systems: more connectors, leveraging data services, etc. We are also innovating in the area of building configurable applications, which let businesses quickly deploy the right set of features for their users. Our diverse customer base has driven us to support the diversity of their information needs, e.g., customer support representatives have very different requirements from those of online shoppers. Most importantly, everyone benefits from tools that offer an opportunity to meaningfully interact with information, rather than being subjected to a big list of results that they can only page through.

Microsoft acquired Fast Search & Transfer. SAS acquired Teragram. Autonomy acquired Interwoven and Zantaz. In your opinion, will this consolidation create opportunities or shut doors. What options are available to vendors / researchers in this merger-filled environment?

Yes! Each acquisition changes the dynamics in the market, both creating opportunities and shutting doors at the same time. For SharePoint customers who want to keep the number of vendors they work with to a minimum, the acquisition of FAST gives them a better starting point than Microsoft Search Server. For FAST customers who aren’t using SharePoint, I can only speculate as to what is in store for them.

For other vendors in the marketplace, the options are:

  • Get aligned with (or acquired by) one of the big vendors and get more tightly tied into a platform stack like FAST;
  • Carve out a position in a specific segment, like we’re seeing with Autonomy and e-Discovery, or
  • Be agnostic, and serve a number of different platforms and users like Endeca or Google do.  In this group, you’ll see some cases where functionality is king, and some cases where pricing is more important, but there will be plenty of opportunities here to thrive.

Multi core processors provide significant performance boosts. But search / content processing often faces bottlenecks and latency in indexing and query processing. What’s your view on the performance of your system or systems with which you are familiar? Is performance a non issue?

Performance is absolutely a consideration, even for systems that make efficient use of hardware resources. And it’s not just about using CPU for run-time query processing: the increasing size of data collections has pushed on memory requirements; data enrichment increases the expectations and resource requirements for indexing; and richer capabilities for query refinement and data visualization present their own performance demands.

Multicore computing is the new shape of Moore’s Law: this is a fundamental consequence of the need to manage power consumption on today’s processors, which contain billions of transistors. Hence, older search systems that were not designed to exploit data parallelism during query evaluation will not scale up as hardware advances.

While tasks like content extraction, enrichment, and indexing lend themselves well to today’s distributed computing approaches, the query side of the problem is more difficult–especially in modern interfaces that incorporate faceted search, group-bys, joins, numeric aggregations, et cetera. Much of the research literature on query parallelism from the database community addresses structured, relational data, and most parallel database work has targeted distributed memory models, so existing techniques must be adapted to handle the problems of search.
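To make the query-side parallelism problem concrete, here is a minimal sketch (in Python, not Endeca's implementation; the field names and shard layout are invented for illustration) of how facet counting parallelizes across index shards: each shard computes partial counts, and because the merge step is associative, the partial results combine cleanly.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def count_facets(shard, field):
    """Compute partial facet counts for one index shard."""
    counts = Counter()
    for doc in shard:
        counts[doc[field]] += 1
    return counts

def parallel_facet_counts(shards, field):
    """Fan per-shard counting out to workers, then merge the partials.
    Counter merging is associative, which is what makes faceting,
    group-bys, and numeric aggregations amenable to data parallelism."""
    merged = Counter()
    with ThreadPoolExecutor() as pool:
        for partial in pool.map(lambda s: count_facets(s, field), shards):
            merged.update(partial)
    return merged
```

A real engine would distribute this across processes or machines rather than threads; the point is the partial-aggregate-then-merge shape. Joins resist this kind of clean partitioning, which is exactly the difficulty noted above.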

Google has disrupted certain enterprise search markets with its appliance solution. The Google brand creates the idea in the minds of some procurement teams and purchasing agents that Google is the only or preferred search solution. What can a vendor do to adapt to this Google effect? Is Google a significant player in enterprise search, or is Google a minor player?

I think it is a mistake for the higher-end search vendors to dismiss Google as a minor player in the enterprise. Google’s appliance solution may be functionally deficient, but Google’s brand is formidable, as is its positioning of the appliance as a simple, low-cost solution. Moreover, if buyers do not understand the differences among vendor offerings, they may well be inclined to decide based on the price tag–particularly in a cost-conscious economy. It is thus more incumbent than ever on vendors to be open about what their technology can do, as well as to build a credible case for buyers to compare total cost of ownership.

Mobile search is emerging as an important branch of search / content processing. Mobile search, however, imposes some limitations on presentation and query submission. What are your views of mobile search’s impact on more traditional enterprise search / content processing?

A number of folks have noted that the design constraints of the iPhone (and of mobile devices in general) lead to an improved user experience, since site designers do a better job of focusing on the information that users will find relevant. I’m delighted to see designers striving to improve the signal-to-noise ratio in information seeking applications.

Still, I think we can take the idea much further. More efficient or ergonomic use of real estate boils down to stripping extraneous content–a good idea, but hardly novel, and making sites vertically oriented (i.e., no horizontal scrolling) is still a cosmetic change. The more interesting question is how to determine what information is best to present in the limited space–-that is the key to optimizing interaction. Indeed, many of the questions raised by small screens also apply to other interfaces, such as voice. Ultimately, we need to reconsider the extreme inefficiency of ranked lists, compared to summarization-oriented approaches. Certainly the mobile space opens great opportunities for someone to get this right on the web.
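One way to see the contrast between ranked lists and summarization-oriented results: instead of returning whole documents to page through, return the passage most responsive to the query. The sketch below (a deliberately crude illustration in Python, with whitespace tokenization and no ranking model) scores sentences by term overlap with the query.

```python
def best_snippet(query, sentences):
    """Pick the sentence sharing the most terms with the query.
    A summarization-oriented interface would show this snippet
    directly rather than a ranked list of document titles."""
    q_terms = set(query.lower().split())
    return max(sentences,
               key=lambda s: len(q_terms & set(s.lower().split())))
```

Real systems use far better sentence scoring, but even this toy shows why limited screen space rewards presenting the answer-bearing fragment rather than ten blue links.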

Semantic technology can make point and click interfaces more useful. What other uses of semantic technology do you see gaining significance in 2009? What semantic considerations do you bring to your product and research activities?

Semantic search means different things to different people, but broadly falls into two categories: using linguistic and statistical approaches to derive meaning from unstructured text, and using semantic web approaches to represent meaning in content and query structure. Endeca embraces both of these aspects of semantic search.

From early on, we have developed an extensible framework for enriching content through linguistic and statistical information extraction. We have developed some groundbreaking tools ourselves, but have achieved even better results by combining other vendors’ document analysis tools with our unique ability to improve their results through corpus analysis.

The growing prevalence of structured data (e.g., RDF) with well-formed ontologies (e.g., OWL) is very valuable to Endeca, since our flexible data model is ideal for incorporating heterogeneous, semi-structured content. We have done this in major applications for the financial industry, media/publishing, and the federal government.

It is also important to remember that semantic search is not just about the data. In the popular conception of semantic search, the computer is wholly responsible for deriving meaning from the unstructured input. Endeca’s philosophy, as per the HCIR vision, is that humans determine meaning, and that our job is to give them clues using all of the structure we can provide.
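A tiny Python sketch can make that philosophy concrete (the gazetteer and field names here are invented for illustration; a production pipeline would use trained entity extractors plus corpus analysis, not a hard-coded list). Enrichment attaches structure to raw text so the interface can expose that structure as clues, leaving the determination of meaning to the user.

```python
import re

# Toy gazetteer; purely illustrative, not a real extraction model.
COMPANIES = {"Endeca", "Google", "Microsoft"}

def enrich(text):
    """Attach structured fields to raw text so the interface can
    offer them as facets (clues), rather than deciding 'meaning'
    on the user's behalf."""
    found = sorted(c for c in COMPANIES
                   if re.search(r"\b" + re.escape(c) + r"\b", text))
    return {"text": text, "company": found}
```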

Where can I find more information about your products, services, and research?

Endeca’s web site is http://endeca.com/. I also encourage you to read my blog, The Noisy Channel (http://thenoisychannel.com/), where I share my ideas (as do a number of other people!) on improving the way that people interact with information.

Stephen Arnold, February 9, 2009

Microsoft-Nortel Parallel

January 23, 2009

Matthew Nickasch’s “Could Microsoft Become Another Nortel?” here draws a comparison that would not have occurred to us in Harrod’s Creek, Kentucky. We don’t think too much about non-search vendors, and Nortel is not a player in the space we monitor. Microsoft is a search vendor. The company has Web search, various test search systems which you can follow here, Powerset (based on long-standing Xerox technology), and Fast Search & Transfer (a Web search company that morphed into enterprise search, then publishing systems, and now conference management).

Mr. Nickasch picks up the theme of the layoffs at Microsoft that were triggered by the firm’s financial results reported in January 2009. For me, the most interesting comment in the article was:

Many large companies have much to learn from the recent events of Nortel, who filed for bankruptcy protection last week. Organizations with disjunct structures and complexly-integrated business functions need to critically evaluate their overall business structure.

I am not a fan of MBA speak, but I absolutely agreed with the use of the word “disjunct”. That is a very nice way of saying disorganized, confused, and addled (just like the goose writing this Web log). Nortel, once a giant, is now a mouse. A mouse in debt at that.

Three notions were triggered by Mr. Nickasch’s apt juxtaposition.

First, could this be the start of a more serious effort to break up Microsoft? Unlike Nortel (Canadian debt, government involvement, global competition), Microsoft could be segmented easily. Shareholders would get a boost from a break up in my view.

Second, what happens to orphans, the big-dollar acquisitions that have a modest profile in today’s competitive enterprise market? I hear about SharePoint. I hear about Silverlight. I even hear about Windows Mobile. I don’t hear about ESP. In case you have forgotten, that’s not paranormal insight; that’s enterprise search platform.

Third, what’s the positioning of on-premises software versus cloud software? Microsoft has quite a few brands and is at risk in terms of making clear what tool, system, service, and feature is associated with what product line.

In my opinion, Mr. Nickasch has forged a brilliant pairing. A happy quack to him.

Stephen Arnold, January 23, 2009

Search 2009: The Arnold Boye08 Lecture

November 11, 2008

What began as a routine speech became a more definitive statement of my views about enterprise search in 2009. I delivered a lecture on this topic to a standing room only crowd in Aarhus, Denmark, at the JBoye 08 conference. The conference organizer asked me to provide a version of my talk for the conference attendees who were unable to attend my lecture. I have now posted the full text of my remarks on the ArnoldIT.com Web site. You can read the PDF of this lecture here.

Let me highlight several of the features of this talk, which concatenated remarks I have made about the future of search over the last 90 days:

  1. I identify the major trends that I am watching in the enterprise search “space”. I don’t dig into social search and some of the more trendy topics. I identify what will keep people using a system, and what will keep those responsible for search and content processing in their jobs.
  2. I highlight a small number of companies that I think are going to be important in 2009. I mention five companies, but I have a much longer list of promising players. These five are examples of what is going to drive search success going forward.
  3. I spell out some meta challenges that vendors and licensees face. To give one example of what’s in this short list, think SharePoint. With 100 million licensees, SharePoint is likely to have as significant an impact on enterprise information access as Google. But there is a dark side to SharePoint, and I mention it in this report.

I have one request. Feel free to use the information for your personal learning. If you are engaged in teaching, you may reproduce the document and invite your students to critique my ideas. If you are a consultant shopping for a phrase or idea to borrow, that’s okay. Just point back to my original document. I see many “beyonds” now. Beyond Google, Beyond Business Intelligence, and so on. I expect that “just there” search will experience similar diffusion. Of course, if you just pirate my phrasing, I think the addled goose will point out this activity. Geese can lay golden eggs; geese can spoil an automobile’s finish as well.

As always, I have had to cut material from this write up. You may point out my errors, omissions, and shortcomings in the comments section to this Web log. Keep in mind that this Web log is free, and it is an easy way for me to keep track of my ideas and lectures.

Stephen Arnold, November 12, 2008

Microsoft Sees Google as Goliath

October 2, 2008

Imagine my surprise when the $65 billion Microsoft allegedly characterized Google as “Goliath.” By the time I flapped from my nest of reeds and mud, my newsreader refreshed with another 15 stories on this topic. I, quite naturally for an addled goose, dived in. Here’s a quick rundown of the “Goliath” metaphor. I will wrap up with several observations about this wordsmithing. I think the larger issue behind the trope has been overlooked, which says more about how pundits perceive both Google and Microsoft.

The Zero Ambiguity of Goliath

Rory Cellan-Jones, technology correspondent for BBC News, wrote “Google Goliath, Microsoft Says.” You can find the article here. In an interview, Mr. Ballmer allegedly characterized Microsoft as “David” in search. Google, Mr. Cellan-Jones reports, is “Goliath.” Mr. Ballmer, the BBC story reports, said: “We may be the David up against Goliath but we’re working on it…. We probably missed the power of the advertising model, not so much the technology.” My quotes don’t do justice to this excellent article.

The Guardian, a paper that is quite a bit bigger than our local weekly Harrod’s Creek shopper, picks up the theme in “Ballmer Says Microsoft is David to Google’s Goliath.” The Guardian piece added a useful item of information for me: “Ballmer says that search is his ‘favourite business’ because when you have nothing the only way is up: ‘Everything is possible, we have nothing to lose.’ (Of course, you can also just continue along flatlining. But his salesman’s instinct probably won’t let him consider that.)”

Silicon Alley Insider, a Web log I quite like, picks up the theme of Microsoft’s response to Google in its “Ballmer Talks Up Windows Cloud. Don’t Believe It.” You can read Eric Krangel’s article here. Mr. Krangel focuses on the wisp-like Cloud OS, but it’s clear to me that Mr. Ballmer is setting the stage for a major announcement at the upcoming Windows conference on October 27. For me, the most interesting point in the piece was this statement attributed to Mr. Ballmer: “The last thing we want is for somebody else to obsolete us, if we’re gonna get obseleted [sic] we better do it to ourselves.” The somebody else, in my reading, is our pal Goliath.


In my opinion, Google equals Goliath.

What’s with Goliath?

In Kentucky, despite the high rate of illiteracy and the miserable education system, there’s no shortage of opinions about the David and Goliath clash. These range from “Goliath won” to “there were two Goliaths and David only nailed one of them.”

My hunch is that the purpose of the metaphor is to make clear that Microsoft, with its control of 90 percent or more of traditional personal computer operating systems and common applications like word processing, its 100 million or so SharePoint licenses, its thousands of resellers, its hundreds of thousands of VisualStudio.Net developers, and its activities in games, mobile software, and consumer audio players, is an underdog. David is the underdog, a wimp, a Mr. Peepers. Some of the sources I had to grind through in a required ancient history class said he was a musician. He wasn’t a rapper sporting tats, FBI sunglasses, and prison clothes. David played a harp. He was, as I recall, untrained for war. In short, a wimp.

Goliath, on the other hand, is your classic André the Giant professional wrestler. Slow moving and slow of speech, Goliath was the equivalent of a roid-crazed street fighter. Goliath would have made a good power forward for a pick-up game in the Bronx. The key point is that this fellow Golyat (standard Hebrew) was a Philistine. Forget Goliath’s size. His real transgression may have been that he was perceived as an invader or intruder with access to hot technology; specifically, iron smithing. Goliath had armor; David wore a cotton tunic. Although cool, cotton does not withstand a sword thrust too well.

The metaphor, then, operates for me on two levels. The little guy (David) has to fight off the big guy (Goliath or Golyat). And, Goliath was an outsider, at least to David and his pals.

The rest of the story is well known even in Kentucky. David uses a sling and throws a stone at Goliath. The stone knocks Goliath down. Then, depending on your preference among murky sources, David chops off Goliath’s head or walks up to the prone Goliath and checks out his prostrate enemy. The sling, the stone, the unexpected victory–that’s the metaphor.

The Reality

Google’s revenue for 2008 will be in the $20 billion range or close enough for horse shoes. Microsoft’s revenue for 2008 will be north of $65 billion. Google has 19,000 full time equivalents, give or take 2,000. Microsoft has 55,000 full time equivalents, give or take 5,000 happy workers. Microsoft has a de facto monopoly in desktop operating systems, standard office software for word processing and spreadsheets, and the 100 million SharePoint licenses. Other Microsoft businesses are big, but none is in the monopoly category.

Google, on the other hand, has about 70 percent of the Web search market. Google touches more than two-thirds of the Web search related advertising. Google has a modest footprint in several other businesses, but it is a one-trick Goliath in terms of revenue.

The big difference between the two companies is that Microsoft represents the status quo in personal computing. Google represents the next generation in personal computing. In 2005, I created this diagram for my study The Google Legacy.

Image: Google’s three eras

© Stephen E. Arnold and Infonortics Ltd., 2005

The conclusion of that analysis was that most of the companies in the software business were blissfully ignorant of Google’s single minded build out of an application infrastructure. Furthermore, most pundits looked at Google as a one trick revenue pony and did not abstract that revenue model into a broader business model; that is, someone pays to get access to Google’s systems and users. As a result, Google was running free with no significant oversight, competition, or technical challenges since 1995. Yes, 1995. The Google kids were fiddling with BackRub in the mid 1990s and learning from the AltaVista.com service. Google’s biggest technical guns have roots in one of three companies: AltaVista (Digital Equipment), Bell Labs (AT&T), and Sun Microsystems. What these clever folks did was take the best from research computing and integrate those insights into a distributed, massively parallel architecture. The Internet was the equivalent of the connections in a desktop PC. The Google infrastructure was the computer just as Scott McNealy (Sun Microsystems) allegedly said.
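The “infrastructure as computer” idea boils down to moving computation to partitioned data and merging the results. A word count in the classic map / shuffle / reduce shape (a toy Python rendering for illustration, emphatically not Google’s code) shows the pattern that the distributed, massively parallel architecture generalized:

```python
from collections import defaultdict
from itertools import chain

def map_phase(doc):
    """Emit (word, 1) pairs, as in the classic word-count example."""
    return [(word, 1) for word in doc.split()]

def shuffle(pairs):
    """Group intermediate values by key, as the framework would
    between the map and reduce stages."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Collapse each key's values into a single result."""
    return {key: sum(values) for key, values in groups.items()}

def mapreduce_wordcount(docs):
    pairs = chain.from_iterable(map_phase(d) for d in docs)
    return reduce_phase(shuffle(pairs))
```

In the real system, each phase runs on thousands of commodity machines and the shuffle moves data across the network; the programming model, however, is this simple.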

What’s happening is that Microsoft’s business model, not its technology, is colliding with the Google business model. Furthermore, the collision has nothing to do with David and Goliath. The issue is Darwinian. Dragging metaphors into what is a strategic confrontation after a decade of inattention is misleading and indicative of why Microsoft can’t bridge the gap. Microsoft cannot catch up by following its present 10,000 sailboats going in the same general direction approach. Google is doing what it has done for a decade, and the company is now finding itself pulled into new, potentially lucrative opportunities. David needs to get a Ph.D. in math, publish a couple of important papers, and apply for work at Google in my opinion.

Stephen Arnold, October 2, 2008

Microsoft: What Now for Search?

July 24, 2008

Googzilla twitches its tail and Microsoft goes into convulsions. When I was in the management consulting game, my boss, Dr. William Sommers, talked about “hyper-actions”. The idea was that a single, often minor, event would trigger excessive reactions.


Brain scan of a person undergoing excessive “excitement” and “over reaction”.

When I read the flows-like-water prose of Kara Swisher’s “Microsoft’s Latest Web Stumble: Kevin Johnson Out” and then her brief introduction to Mr. Steve Ballmer’s “Full Memo to the Troops about New Reorg”, I thought about Dr. Sommers’s “hyper-action” neologism. In my opinion, we are watching the twitch in Mountain View triggering via management string theory the convulsions in Redmond.

First, let me identify for you the points that jumped from screen to neurons in Ms. Swisher’s write ups.

  1. Ms. Swisher reports that Mr. Kevin Johnson was the architect behind the Yahoo buy out. I thought that the idea was cooked in Mr. Chris Liddell’s lamb-roasting pit. Obviously my sources were off base. Mr. Johnson moves to Juniper and Mr. Liddell continues to get a Microsoft paycheck. Mr. Liddell’s remarks at the March 2008 Morgan Stanley Technology Conference left me with the impression that he was being “systematic” in his analysis. Here’s one take on his remarks.
  2. Ms. Swisher’s run down of Microsoft’s actions so far in 2008 is excellent, and she reminded me that Microsoft bought aQuantive, a fact which had slipped off my radar. What has happened to aQuantive, for which Microsoft paid $6 billion, more than Microsoft paid for Fast Search & Transfer and Powerset combined? Her mentioning aQuantive reminded me of those wealthy car collectors on the Speed Channel’s exotic automobile auctions. What do you do with a $1.2 million Corvette? You put it in a garage. You don’t run down to the Speedway in Harrods Creek, Kentucky, to buy a pack of chewing tobacco.
  3. Ms. Swisher turns a great phrase; specifically, “Microsoft has succeeded in burnishing its image as a Web also-ran and still has an uncertain path to change that.” I quite like the notion that a large company takes one action and succeeds in producing an opposite reaction. I think the Google folks would peg that as one of the Laws of Google Dynamics applied to Microsoft. For every action, there is a greater, opposite reaction that persists through time. (Ms. Swisher’s statement that Yahoo looks stable brought a smile to my face as well.)

Next, let me comment on the Mr. Steve Ballmer reorg memo, which will be a classic in business schools for years to come. The opening line will probably read, “Mr. Steve Ballmer, firmly in control of Microsoft, sat at his desk and looked across the Microsoft campus. He knew a bold strategic action was needed to deal with the increasing threat of Google, etc. etc.”

After the razzle dazzle about goals, the memo gets down to business:

We will out-innovate Google in key areas—we’re already seeing this in our maps and news search. Third, we are going to reinvent the search category through user experience and business model innovation. We’ll introduce new approaches that move beyond a white page with 10 blue links to provide customers with a customized view of their world. This is a long-term battle for our company—and it’s one we’ll continue to fight with persistence and tenacity.

Read more

Digital Convergence: A Blast from the Past

July 15, 2008

In 1999, I wrote two articles for a professional journal called Searcher. The editor, Barbara Quint, former guru of information at RAND Corporation, asked me to update these two articles. I no longer had copies of them, but Ms. Quint emailed me fair copies, and I read my nine-year-old prose.

The 2008 version is “Digital Convergence: Building Blocks or Mud Bricks”. You can obtain a hard copy from the publisher, Information Today here. In a month or two, an electronic version of the article will appear in one of the online commercial databases.

My son, Erik, who contributed his column to Searcher this month as well, asked me, “What’s with the mud bricks?” I chose the title to suggest that the technologies I identified as potential winners in 1999 may lack staying power. One example is human-assigned tags. This is indexing, and it has been around in one form or another since humans learned to write. Imagine trying to find a single scroll in a stack of scrolls. Indexing was a must. What’s not going to have staying power is my assigning tags. The concept of indexing is a keeper; the function is moving to smart software, which can arguably do a better job than a subject matter expert, as long as we define “better” as meaning “faster and cheaper”. A “mud brick” is a technology that decomposes into a more basic element. Innovations are based on interesting assemblages of constituent components. Get the mix right and you have something with substance, the equivalent of the Lion’s Gate keystone.
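A hedged sketch of what machine-assigned tagging looks like in its simplest form (Python, whitespace tokenization, no stemming or phrase detection; real systems do far more): score each document's terms by TF-IDF and keep the top few as automatically generated tags, trading a subject matter expert's judgment for speed and cost.

```python
import math
from collections import Counter

def tfidf_tags(docs, k=3):
    """Assign each document its top-k TF-IDF terms as machine tags.
    Terms common across the collection (low IDF) score near zero,
    so distinctive terms rise to the top."""
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter()                      # document frequency per term
    for toks in tokenized:
        df.update(set(toks))
    n = len(docs)
    tags = []
    for toks in tokenized:
        tf = Counter(toks)
        scored = {t: tf[t] * math.log((n + 1) / (df[t] + 1)) for t in tf}
        tags.append([t for t, _ in
                     sorted(scored.items(), key=lambda x: -x[1])[:k]])
    return tags
```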

Image: the Lion Gate at Mycenae

Today’s information environment is composed of systems and methods that are durable. XML, for example, is not new. It traces its roots back 50 years. Today’s tools took decades of refinement. Good or bad, the notion of structuring content for meaning and separating the layout information from content is with us for the foreseeable future.

Three thoughts emerged from the review of the original essays whose titles I no longer recall.

First, most of today’s hottest technologies were around nine years ago. Computers were too expensive and storage was too costly to make widespread deployment of services based on the antecedents of today’s hottest applications, such as social search and mobile search, practical.

Second, even though I identified a dozen or so “hot” technologies in 1999, I had to wait for competition and market actions to identify the winners. Content processing, to pick one, is just now emerging as a method that most organizations can afford to deploy. In short, it’s easy to identify a group of interesting technologies; it’s hard for me to pick the technology that will generate the most money or have the greatest impact.

Read more
