Content Management Vendors: We Do Social Stuff Too

June 30, 2008

After a wonderful flight with exceptional service from caring airline employees, I had to read this headline twice to make sure I wasn’t in some state of delirium. The headline is “Content Management Software Vendors Eye Social Networking”. The essay is authored by Larry Dignan, and he does a great job of catching my attention. This headline and essay are keepers.

The key point to me is:

In other words, social networking will become a generic enterprise feature at some point. These CMS players can develop their own community suites (and hire staff that understands the social types), acquire white label networks or just hang back.

The trigger for this story is a consultant report. I can’t recall which firm stuffed full of pundits came up with this observation, but I think there is some truth to content management vendors’ chasing the rainbow of social search, social content, social chit chat, and social anything.

The reason is not far to seek. CMS is a faux application that often doesn’t work very well and always costs a lot more than the customer anticipated. I used to write about CMS applications, but after I had to do some clean up in two Federal agencies when systems went off the rails, I just stopped paying attention to the vendors in this software sector.

Content management is software that tries to convert companies that don’t know much about publishing into publishers. As part of the deal, employees who are not skilled writers will get some help to become more information literate. The CMS then tries to keep track of versions, enforce security, output Web pages, and perform levitation.

Why not include social functions? Social software is as much a part of CMS as any other software function. If you can’t make a system better, just make it bigger, more complex, and more trendy. Enterprise publishing systems are gaining traction because of the opportunities CMS vendors have created with their over-hyped assertions.

Enterprise search is disappointing. CMS is disappointing as well. Instead of delivering a solution that works, just add social features. Sounds like the enterprise software industry is up

Agree? Disagree?

Stephen Arnold, June 30, 2008

The Jab-Google Bandwagon Rolls On

June 30, 2008

Phil Wainewright, whose writing I enjoy, wrote “Google’s Culture Not Fit for Enterprise Apps.” The essay appeared on June 30, 2008. You can find the full text here. Xooglers have been picking on the search dominator, and I posted a link to a story that I thought might be a spoof here. Apparently it wasn’t, based on some email feedback I received this weekend, but I remain skeptical. What I am sure about is that criticism of Google seems to be on the uptick, and I am not sure why. The company has been consistent in its behavior for years. The biggest change is the company’s increased “transparency”. Googlers are everywhere: at conferences, in the news, on Web casts. Everywhere I look, there’s the GOOG.

Now, to Mr. Wainewright. He is actually picking up on the theme that Xooglers–that is, employees who cash out, quit, or get fired–are revealing that Mother Google has some idiosyncrasies. The key paragraph for me in Mr. Wainewright’s well-written essay was:

It’s a damning indictment, and one that casts a long shadow over Google’s attempts to replace Microsoft’s pre-eminence in the office collaboration software market with its Google Apps suite. As a disruptive competitor, it doesn’t have to match Microsoft Office feature-for-feature. But if it really is unreliable and buggy as Solyanik claims — and the current outage of Feedburner’s Web analytics service lends further weight to this view — then Google doesn’t even make the grade as a business-class SaaS provider.

Let me offer three observations.

First, demographics will help Google. As Google’s push in the educational institutions increases, future graduates will be comfortable with Google and its characteristics. When these graduates enter the work force, I think some of them will continue using Google or take steps to get Google products and services into the organization. I am not sure quality will have much to do with this sell through. I think habit, loyalty, and the notion that Google is pretty good will have some impact. Ergo, the short term and today’s expectations don’t matter so much.

Second, existing enterprise applications are clunky, disappointing, and costly to deploy, customize and maintain. I think that in a deteriorating economy, Google’s approach or that of its surrogates like Salesforce.com will be good enough. If the price is right, Google has a great opportunity to be pulled into an organization. Sure, traditional outfits and information technology departments may balk. But when money is tight, Google can cut a great deal, and maybe some of those “old fashioned IT professionals” could be rationalized. The systems associated with that crowd may strike some youthful chief financial officers as problems, not solutions.

Third, the competitors in the enterprise space are struggling. Oracle is boosting prices. Microsoft is betting the farm on a polymorphic software solution that is really complicated. (If you have not seen the SharePoint placemat, take a gander. You can find it here.) IBM is a consulting firm with loyal customers who so far have been content to write huge checks for solutions, but in a lousy quarter, IBM could face some pressure from upstarts like Google and its partners.

In short, Google can be baffling. I think that as people learn more about Google, more warts and blotches will become visible. Nevertheless, the GOOG is following its own path. By definition, those who are not Googley cannot be expected to understand how the company works, what it is doing, and when it will take certain actions. The “why” is clear: To make money. The “how” is a baffler, but I think the approach the firm is taking is interesting and more disruptive than many think.

Agree? Disagree? Let me know.

Stephen Arnold, June 30, 2008

Update: July 1, 2008, 9:50 pm Eastern: A round up of Google’s woes is here.

US Copyright Renewal Records

June 30, 2008

A happy quack to a reader who forwarded the information about where to download US copyright renewal records. The original story appeared in BeSpacific, a useful Web log. The download location for the file is here. http://dl.google.com/rights/books/renewals/google-renewals-20080516.zip. BeSpacific reported that Googler Jarkko Hietaniemi helped make these useful data available.

Stephen Arnold, June 30, 2008

Devastating Critique of Google

June 30, 2008

Microsoft wizard goes to Google. Quits. Returns to Microsoft. Writes about Googzilla’s bad habits.

In my opinion, this is a must read for anyone wanting to learn that Google is not the Miss America of technology. The author, if the posting is not a spoof, is Sergey Solyanik, and you can read his essay “Back to Microsoft” at http://1-800-magic.blogspot.com/2008/06/back-to-microsoft.html. Better hurry. Some of these Xoogler posts can become tough to find after a short period of time.

For me the most interesting statement in the write up was:

It seems like every week 10% of all the features are broken in one or the other browser. And it’s a different 10% every week – the old bugs are getting fixed, the new ones introduced. This across Blogger, Gmail, Google Docs, Maps, and more.

That suggests that the code demonstrates what I call morphing suckiness. The situation arises when intelligent software is not all that smart.

I will let you know if I learn more about this quality problem.

Stephen Arnold, June 30, 2008

Powerset Nails Search: A Very Bold Assertion

June 29, 2008

Chris Gaylord, writing for the Christian Science Monitor, updates a May 2008 essay, and emphasizes this point:

Google has been a bit dismissive of semantic search, preferring (for now at least) its quick keyword approach. But this Microsoft news puts a lot of weight – and $100 million – behind the notion that web users want to ask questions to a search engine, not just feed it keyword clues. We have yet to see if Microsoft will keep the Powerset name or, more likely, integrate the technology into its Live Search. That site certainly needs some help. The company has fought a losing battle against Google and Yahoo for years now. Despite its best efforts and even cash incentives, Microsoft has not been able to distinguish itself. Offering a strong semantic search option is a good way to reboot the challenge.

You can read the full document here.

You may recall that the original Ask Jeeves answered questions. Humans figured out answers, put them in a file, and the Ask Jeeves’ system converted the user’s query to a form that could be matched against the canned answers. The buzz about this surged in the late 1990s, but the cost of the Ask Jeeves’ approach was high, and in my view, the system did not work very well.
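The canned-answer approach described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the post does not describe how Ask Jeeves actually normalized or matched queries, so the stop-word list, the Jaccard overlap score, the threshold, and the sample question-and-answer pairs are all hypothetical.

```python
import re

# Hypothetical canned answers written by human editors. The real
# Ask Jeeves data and matching logic are not described in the post.
CANNED_ANSWERS = {
    "when did columbus reach america": "1492",
    "what is the capital of france": "Paris",
}

# Assumed stop-word list; purely illustrative.
STOP_WORDS = {"a", "an", "the", "is", "of", "did", "do", "does", "what", "when"}


def normalize(text):
    """Convert a question to a set of lowercase content words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return frozenset(t for t in tokens if t not in STOP_WORDS)


def jaccard(a, b):
    """Overlap between two token sets, from 0.0 (disjoint) to 1.0 (identical)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def answer(query, threshold=0.5):
    """Return the best canned answer, or None when nothing matches well enough."""
    query_tokens = normalize(query)
    best_score, best_answer = 0.0, None
    for question, canned in CANNED_ANSWERS.items():
        score = jaccard(query_tokens, normalize(question))
        if score > best_score:
            best_score, best_answer = score, canned
    return best_answer if best_score >= threshold else None
```

The sketch also hints at the cost problem: every question the system can answer requires a human to write an entry in the file, and any query outside that file returns nothing.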

The desire of information retrieval mavens to take a question, any question, and have software answer it makes some folks darned excited. The technology to answer questions continues to advance, and it is possible to get answers from a number of different systems. I have participated in meetings where smart people much more enthusiastic than I argued about the importance of having a system answer a question.

I have written about NLP or natural language processing in the first three editions of the Enterprise Search Report, and I added some information in my April 2008 Beyond Search study for Gilbane Group. Let me offer some observations:

  1. I don’t type queries into search engines. I prefer Boolean statements and point-and-click interfaces that let me “see” what’s in an indexed corpus. My experience is that typing questions is not too popular, nor is the notion of chopping text from an article and letting a search system find “more like this”. I have an installation of the Brainware trigram system, and it is useful–far more useful to me than asking “When did Columbus discover America?” if indeed he did. No NLP system can make much sense of a short query in the context of archaeological research about pre-Kit Columbus visitors to the North American landmass. Nope, that type of question answering will take a bit more lab work.
  2. NLP imposes considerable computational load on both the document indexing subsystem and the query processing subsystem. I saw an impressive set of PowerPoint slides at the 2007 Bear Stearns’ Internet conference, and I fiddled with the Wikipedia demonstration in 2008. What I have not seen is proof that Powerset’s amalgam of Xerox technology and its proprietary code scales. Without scaling, NLP is likely to remain interesting but of little use to me.
  3. Microsoft, like Yahoo, is now in the business of collecting search technologies. There are two “flavors” of SharePoint search. There is the Fast Search & Transfer technology. There is the whizzy new Live.com search. There is search in XP, in Vista, in SQL Server, and probably other search technologies I don’t know about. Toss in Powerset. What the collection resembles is a yard sale, not an exhibit of Etruscan tomb art at the British Museum. Search has to be more than a yard sale in its design, architecture, and technical framework. The cost of integrating this stuff is more than my check book can support.
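The first observation mentions a trigram system and “more like this” retrieval. As a rough illustration of the general idea only, here is a sketch that ranks documents by the cosine similarity of their character-trigram counts. It is not Brainware’s method, which the post does not describe; the function names and the sample corpus in the usage note are hypothetical.

```python
import math
from collections import Counter


def trigrams(text):
    """Count overlapping three-character sequences in whitespace-normalized, lowercased text."""
    cleaned = " ".join(text.lower().split())
    return Counter(cleaned[i:i + 3] for i in range(len(cleaned) - 2))


def cosine(a, b):
    """Cosine similarity between two trigram count profiles."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def more_like_this(sample, corpus):
    """Return (score, document) pairs, most similar to the sample text first."""
    sample_profile = trigrams(sample)
    scored = [(cosine(sample_profile, trigrams(doc)), doc) for doc in corpus]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

For example, `more_like_this("enterprise search indexing", ["enterprise search systems index documents", "recipes for summer salads"])` ranks the search-related document first. No grammar or parsing is involved, which is one reason trigram matching is cheap compared with the NLP load described in the second observation.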

I appreciate the enthusiasm for Microsoft becoming more competitive. Let us not forget that Google has been doing pretty much the same thing–its one-trick pony show–for a decade. With Google holding two thirds of the market for Web search, Microsoft has some work to do to become a number two in search. Google continues to seep into the enterprise via osmosis. Let’s face facts. Customers have to buy from Google. Google is not very good at sales, customer support, or communicating what its gizmos can do. Microsoft is a good sales organization, but it is watching Google challenging its enterprise revenue the way spilled ink spreads on a white table cloth. And, Google has serious semantic technology which is a widget in a larger data management solution at Google.

Keep cheerleading for Microsoft. Just keep the challenges of NLP in mind. Agree? Disagree? Let me know so I can learn what I don’t now know.

Stephen Arnold, June 30, 2008

Microsoft Research Search Research: Not a Typo

June 29, 2008

I heard two earnest 20-somethings in the Starbucks on Lincoln and Greenview in Chicago arguing about Microsoft search. The two whiz kids wanted to locate information about Microsoft’s Web Data Management Group. Part of Microsoft’s multi-billion dollar research and development program, WDMG (sometimes abbreviated WSM) works to crack tough problems in Web search.

The problem with Web search is that content balloons with each tick of the hyper fast Internet clock. The problem boils down to several hundred megabytes every time slice. To make the problem more interesting, Web data changes. One example ignored by researchers is the facility with which a Web log author can change a posting. Some changes are omissions such as forgetting to assign a tag. Others are more Stalinesque. An author deletes, rewrites, or supplements an original chunk of a Web log. Today, I find more and more Web sites render pages in response to an action that I take. The example which may resonate with you is the operation of a meta search or federating system like Kayak.com. Until I set parameters for a trip, the system offers precious little content. Once I fill in the particulars of my trip, the rendered pages provide some useful information.

If you plan on indexing the Web, you have to figure out these dynamic pages, versions, updates, and new content. The problem has three characteristics. First, timeliness. When I do a query, I want current information. Speed, then, requires an efficient content identification and indexing system. If I lack the computing horsepower for brute force indexing, I have to use user cues such as indexing only the most frequently requested content. In effect, I am indexing less information in order to keep that index current.

Second, I have to be able to get dynamic content into my index. If I miss the information that becomes evident only in response to a query, I am omitting a good chunk of the content. My tests show that more than half the sites in my test set are dynamic. The static HTML of the good old days makes up a smaller portion of the content that must be processed. Google’s work with Google Forms is that company’s first step into this type of data. Microsoft has its own approaches and some of this work is handled by the wizards at WSM or Web Search and Mining Group here.

Third, I also have to figure out how to deal with queries. When I talk about search, there are two sides to the coin. On one side is indexing. On the other side is converting the query to something that can be passed against the index. If a system purports to understand natural language as Hakia and Powerset assert, then the system has to figure out what the user means. Intent is not such a simple problem. In fact, deciphering a user’s query can be more difficult than indexing dynamic content. Human language is ambiguous. You would not understand my mother if you heard her say to me, “Quilling.” She means something quite specific, and the likelihood any system could figure out that this single word means, “Bring me my work basket” is close to zero unless the system in some ways has considerable information about her specific use of language.
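The timeliness trade-off in the first characteristic, indexing less so that what is indexed stays current, can be sketched in a few lines. This is an illustrative assumption about how a “user cue” might be applied, not a description of any vendor’s crawler; the function name, the request log, and the budget parameter are hypothetical.

```python
from collections import Counter


def pick_pages_to_refresh(request_log, budget):
    """Choose which pages to re-index when only `budget` pages fit in one cycle.

    request_log is an iterable of URLs, one entry per user request.
    The most frequently requested pages win the limited indexing capacity.
    """
    counts = Counter(request_log)
    return [url for url, _ in counts.most_common(budget)]
```

With a log in which one page was requested three times, a second page twice, and a third once, a budget of two refreshes the first two pages and leaves the third stale. That is the price of the approach: rarely requested content drops out of the current index.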

As you probably have surmised, natural language processing is complicated. NLP is resource intensive. I need a capable indexing system and I need a powerful, clever way to clear up ambiguities. Humans don’t type long queries, nor do professionals evidence much enthusiasm for crafting query strings that retrieve exactly what that professional needs. Users type 2.3 words and take what the system displays. Others prefer to browse an interface with training wheels; that is, Use For and See Also references and explore. The two approaches share one common element: a honking big computer with smart algorithms is needed to make search work.

Web Search and Mining

This Microsoft group works on a number of interesting projects related to content processing, text mining, and search. The group’s Web page identifies data management, dynamic data indexing, and search quality as current topics of interest.

More detail about the group’s activities appears in the list of publicly available research papers. You can browse and download these. I want to comment about three aspects of the research identified on this Web site and then close with several observations about Microsoft research into search.

First, the sample papers date from 2004. I don’t know if the group has filtered its postings of papers, or if the group has been redirected.

Second, a number of papers discuss clustering. A representative paper is Hierarchical Clustering of WWW Image Search Results Using Visual, Textual and Link Analysis. The full paper is here. The paper explains a system that accepts a query and then outputs a result. Each row is a cluster. Microsoft’s researchers are parsing a query and retrieving images. The images are displayed in a clustered visual display. You will notice that the lead Microsoft researcher worked with a Yahoo researcher and a University of Chicago researcher. You can browse the other clustering papers.

Third, another group of papers touches upon the notion of “information manifolds”. In the 1990s, the phrase “information manifold” enjoyed some buzz. The notion is that a “space” contains indexes which can be queried. One Microsoft paper–“Learning an Image Manifold for Retrieval”–applies the notion to images. Other papers touch upon the topic as well. I found this interest suggestive. Google has some activity in this subject as well.

I want to pick up the thread of WSM and research into “manifolds”. I turned first to Search.Live.com, Microsoft’s own search system, and to Google.com’s Microsoft-centric search sub system. You can find Microsoft’s search here and Google’s search sub system here. You may want to stray into specialist Microsoft systems such as Libra here, a showcase for some new Microsoft technology. I tried several queries on the Microsoft Live.com search site and was able to locate the paper referenced above. One of the two hits I was able to track down returned a null set.


Google CFO Search

June 29, 2008

The Washington Post has an interesting essay by Joseph Weisenthal (PaidContent.org) here. The story asks and answers the question, “Google’s CFO Search: Why’d It Take So Long?” This is an important bit of thinking and I urge you to read the full write up.

The short answer is risk. Mr. Weisenthal writes:

In a free-wheeling culture like Google’s, it would be up to the CFO to be the stern taskmaster – basically, the parent or teacher that nobody likes cause they actually enforce all the rules. Ribstein adds: “The problem under SOX is that a CFO has to worry about what he doesn’t know – that’s what Butler and I have called SOX’s “litigation time bomb.”

In my KMWorld column, which will appear in September 2008, I talk about Google’s transparency push. Google’s executives have been chatty Kathies in Israel, California, Washington, DC, and anywhere a journalist or three and an audience will listen.

Mr. Weisenthal’s discussion of snail-like CFO vetting clanks against the apparent transparency of Google. The last line of the essay nails the issue squarely:

You can see why it might not appeal to someone from a typical CFO’s background, given the current regulatory environment.

The non-traditional approach of Google is working well. Will it work in the CFO’s office?

Stephen Arnold, June 29, 2008

Google Vulnerabilities

June 29, 2008

Seeking Alpha has an interesting discussion of chinks in Googzilla’s armor. The essay “Does Google Have a Weakness Microsoft Can Exploit?” is here. The analysis touches upon my listing of Google weaknesses which first appeared in The Google Legacy, which I updated in Google Version 2.0, 2005 and 2007 respectively.

The part of the analysis that I found interesting touches upon Microsoft’s cash back. The idea of buying market share is not new, and I think Microsoft may expand its efforts in this area. The question for me is, Can Microsoft see the buying of market share through to its logical end? To win may require sucking resources from other Microsoft initiatives. Such a shift could create a weaker Microsoft and one that is vulnerable not to Google but to other firms salivating at the idea of a weaker, distracted Microsoft.

Stephen Arnold, June 29, 2008

50 Niche Search Engines

June 28, 2008

Alisa Miller has compiled a list of 50 niche search engines. You can find the listing on Accredited Degrees here. Ms. Miller groups the search engines, which adds to the usefulness of her list. As I worked my way through the links, two of her finds struck me as useful:

  • Bookmatch provides search results from 3,300 sources with spam and silliness removed from Web log postings and news aggregators.
  • Congoo delivers results from news and other sources. The company claims a higher level of information. My test queries returned useful results.

A happy quack to Ms. Miller for her list.

Stephen Arnold, June 29, 2008

IDC’s Database Market Share Analysis

June 28, 2008

IDC’s Chris Kanaracus has summarized the relational database market size in “Oracle Maintains Lead in Database Market”. You can read the full round up here. The total market tallies an estimated $19 billion. For me, the most important data in the news story is:

Oracle once again took the top spot, capturing 44.3 percent of the market with revenue growth of 13.3 percent. IBM came in second with a 21 percent share, also logging a 13.3 percent revenue growth rate. It was followed by Microsoft, with 18.5 percent of the market and a 14 percent jump in revenue. Sybase and Teradata rounded out the top five, garnering market shares of 3.5 percent and 3.3 percent, respectively.
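To put the percentages in dollar terms, the quoted shares can be multiplied against the estimated $19 billion market total. This is back-of-the-envelope arithmetic, not figures from the IDC report: it assumes the shares apply to exactly $19 billion, and the names in the sketch are hypothetical.

```python
# Estimated total market from the story, in billions of US dollars.
TOTAL_MARKET_BILLIONS = 19.0

# Market shares quoted in the story, in percent.
SHARES_PERCENT = {
    "Oracle": 44.3,
    "IBM": 21.0,
    "Microsoft": 18.5,
    "Sybase": 3.5,
    "Teradata": 3.3,
}


def implied_revenue(share_percent, total=TOTAL_MARKET_BILLIONS):
    """Revenue in billions implied by a percentage share of the total market."""
    return total * share_percent / 100.0


if __name__ == "__main__":
    for vendor, share in SHARES_PERCENT.items():
        print(f"{vendor}: about ${implied_revenue(share):.2f} billion")
    top_five = sum(SHARES_PERCENT.values())
    print(f"Top five combined: {top_five:.1f} percent of the market")
```

On these assumptions Oracle’s share works out to roughly $8.4 billion, and the top five together hold about 90.6 percent of the market, leaving under 10 percent for everyone else.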

My question is, How long will the traditional database vendors remain in ascendancy? The volume of data choking enterprises is increasing. The traditional row-and-column data tables remain administrative headaches. Basic queries often require hours, days, or weeks to execute after data cubes are built, queries written and debugged, and end users given a chance to review the reports.

The traditional database vendors are not solving the data management problems their licensees face. The companies in the IDC Top Five are creating an appetite for a different approach. Who will emerge with a solution? The work I am doing points to some newcomers. I have written about Aster Data and other firms with different angles of attack on the growing database problem. One thing I have learned is that the incumbents think their market positions are unassailable. These companies, despite their grip on the market, are dead wrong. This is not an innovator’s dilemma. This is the ostrich response: put the head in the sand and the outside world can’t be seen.

Stephen Arnold, June 28, 2008
