Scotsman Makes Lemonade from Google Revenues
April 18, 2009
The UK newspapers have a chip on their shoulder when it comes to the GOOG. I shook my beak at this headline in Scotland’s leading newspaper: “Google ‘Slow Growth’ and ‘Falling Sales’.” You may want to read the story here. I thought the GOOG did well, certainly better than most of the dead tree outfits. The Scotsman wrote:
It is also the first time that year-on-year growth has dropped to single digits since Google’s stock market launch in August 2004.
What made the sentence notable was that the story was broken up by a pulsing eBay ad for local classifieds. I found this juxtaposition, probably accidental, amusing.
Stephen Arnold, April 18, 2009
End Game for Microsoft Yahoo
April 18, 2009
What a week for Microsoft search. I heard from three different sources that the Fast ESP technology will run on Windows, not Linux or the other forbidden operating systems. Then I read a Reuters “analysis” of the Microsoft Yahoo Web search chit chat. Written by Thomson Reuters’ Alexei Oreskovic, the piece carried the headline “Yahoo and Microsoft Approach Endgame on Search.” With Google’s search share north of 60 percent, I wondered whose game it was. Mr. Oreskovic wrote:
For that reason, running ads with Google is generally considered a “no-brainer.” But a combined Microsoft-Yahoo with nearly 30 percent search market share could provide a large enough audience to also be worthwhile.
My thoughts were:
- What if Google’s share were higher? Closing the gap becomes more expensive and may be less attractive to advertisers
- What if the costs of mashing up multiple search services skyrocket, so that the anticipated financial upside becomes a ski jump into unexpected cost overruns?
- What if the technology does not deliver what users want?
I love analyses that evoke more questions than the mavens’ explication answers.
Stephen Arnold, April 18, 2009
Custom Publishing The Time Warner Way
April 18, 2009
Custom publishing is a tricky business. First, there’s the database that contains the customer particulars. Then there is the input file that contains the customer preferences. And there are algorithms that take customer preferences and match them with content that is “ready” for the publication. Then there are pesky variables such as an advertiser who pulls out, creating a copy hole that may be filled with a public service ad or a bit of scintillating prose chopped to fit around the paying customers’ messages. You have arts and crafts people poking around. You have some legal eagles getting worry lines over rights. The fact checkers scurry about, fretting that the inevitable errors will slip through their 20-something fingers. And so on.
Lots of moving parts.
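For readers who like to see the moving parts laid out, here is a minimal Python sketch of the preference-to-content matching step described above. Everything in it (the field names, the PSA filler, the four-page issue) is my own invention for illustration, not Time Inc.’s system.

```python
# A toy sketch of the matching step in a custom publishing pipeline.
# All names and data are hypothetical; this is not Time Inc.'s system.

from dataclasses import dataclass

@dataclass
class ContentItem:
    title: str
    topics: set  # editorial tags marking the item as "ready" for some audience

@dataclass
class Subscriber:
    subscriber_id: str  # from the database of customer particulars
    preferences: set    # from the customer preferences input file

# Copy-hole filler for when an advertiser pulls out or matches run short.
PSA = ContentItem("Public service announcement", set())

def build_issue(subscriber: Subscriber, ready_pool: list, pages: int = 4) -> list:
    """Match "ready" content against one subscriber's preferences and pad
    any leftover copy holes with public service ads."""
    matches = [item for item in ready_pool
               if item.topics & subscriber.preferences]
    issue = matches[:pages]
    issue += [PSA] * (pages - len(issue))
    return issue

pool = [ContentItem("Spring training notes", {"sports"}),
        ContentItem("Red carpet roundup", {"style"})]
fan = Subscriber("0001", {"sports"})
print([item.title for item in build_issue(fan, pool)])
```

Mix up the subscriber_id keys between the particulars database and the preferences file, and you get precisely the sort of glitch Fast Company reports below.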
According to Fast Company here, Time Warner had a vision of cranking out customized magazines. Now there are companies that have the workflow and the systems to deliver this type of service. Most of my readers will be uninterested in companies like InfoPrint, Exstream Software, and StreamServe, among others. These are the outfits that put a car payment notice, a coupon reminder, info about your model’s most recent recall, and other items into a statement intended to make you believe that the financial institution holding your loan cares about you and your vehicle. Dead tree outfits don’t use these types of systems. A whole ecosystem of publishing software companies creates custom publishing systems that deliver personalized content to whizzy digital presses.
In “Time’s Printed RSS Feed Magazine Needs Debugging, Ad Blocking,” Ariel Schwartz wrote:
A number of the magazine’s 31,000 subscribers received content intended for other subscribers (i.e. In Style fans ended up with Sports Illustrated content). Time Inc. spokespeople say that the glitch was the result of a computer error. To make matters worse, many of the stories picked by the project’s editors were up to two years old–something that Time Inc. claims was done on purpose since it “was never the intent for this to be a breaking news vehicle,” and that future issues will have more recent content.
So what went wrong? Many slips twixt cup and lip. I would wager a crust of bread on the margin of my mine runoff pond that the Time Warner managers have convinced themselves that the problem was an anomaly and won’t happen again. Life was easier when content was cast in lead and legions of specialists created the weekly. Those bits and bytes are tricky beasts.
Stephen Arnold, April 18, 2009
Copyright: I Told You So Twice from Techdirt
April 17, 2009
Short honk: If you have been following the copyright guerrilla skirmishes, you will want to read Techdirt’s “A Look Back At Some Prescient Predictions On Copyright” here via Michael Scott from Thomas O’Toole (a provenance chain that makes clear online is not the same as a high school term paper with footnotes). The article points to two documents that presaged the murky nature of copyright in a pervasive network and the difficulty of getting money for digital content when copying is a basic system function. Worth reading. Download the referenced papers if you don’t have them in your repository now. The dead tree crowd may have a liquid lunch after revisiting these documents, one of which is almost 20 years young.
Stephen Arnold, April 17, 2009
Google and Guha: The Semantic Steamroller
April 17, 2009
I hear quite a lot about semantic search. I try to provide some color on selected players. By now, you know that I recycle in this Web log, and this article is no exception. The difference is that few people pay much attention to patent documents. In general, these are less popular than a printed dead tree daily paper, but in my opinion quite a bit more exciting. But that’s what makes me an addled goose, and you a reader of free Web log posts.
You will want to snag a copy of US20090100036 from our ever efficient USPTO. Please, read the instructions for running a query on the USPTO system. I don’t provide free support for public facing, easy to use, elegant interfaces such as the one available from the Federal government.
The “eyes” of Googzilla. From US20090100036, Figure 21. Cyrus, in case you want to see what your employer is doing these days.
The title of the document is “Methods and Systems for Classifying Search Results to Determine Page Elements,” filed by a gaggle of Googlers, one of whom is Ramanathan Guha. If you read my Google Version 2.0 or the semantic white paper I wrote for Bear Stearns when it was respected and in business, you know that Dr. Guha is a bit of a superstar in my corner of the world. The founder of Epinions.com and a blue chip wizard with credentials (Semantic Web RDF, Babelfish, Open Directory, etc.) that will deflate the puffery of newly minted search consultants, Dr. Guha invented, wrote up, and filed five major inventions. These five set forth the Programmable Search Engine. You will have to chase down one of my for-fee writings to get more detail about how the PSE meshes with Google’s data management inventions. If you are IBM or Microsoft, you will remind me that patents are products and that Google is not doing anything particularly new. I love those old eight track tapes, don’t you?
The new invention is the work of Tania Bedrax-Weiss, Patrick Riley, Corin Anderson, and Ramanathan Guha. Guha’s name is spelled “Ramanthan” in the patent snippet I have. Fish & Richardson, Google’s go-to search patent attorney, may have submitted the filing correctly in October 2007, but it emerged from the USPTO on April 16, 2009, with the spelling error.
The application is a 33-page document, which is beefy by Google’s standards. Google dearly loves brevity, so the invention is pushing into Gone with the Wind length for the GOOG. The Fish & Richardson synopsis said:
This invention relates to determining page elements to display in response to a search. A method embodiment of this invention determines a page element based on a search result. The method includes: (1) determining a set of result classifications based on the search result, wherein each result classification includes a result category and a result score; and (2) determining the page element based on the set of result classifications. In this way, a classification is determined based on a search result and page elements are generated based on the classification. By using the search result, as opposed to just the query, page elements are generated that corresponds to a predominant interpretation of the user’s query within the search results. As result, the page elements may, in most cases, accurately reflect the user’s intent.
Got that? If you did not, you are not alone. The invention makes sense in the context of a number of other Google technical initiatives, ranging from the non-hierarchical clustering methods to the data management innovations you can spot if you poke around Google Base. I noted classification refinement, snippets, and “signal” weighting. If you are in the health biz, you might want to check out the labels in the figures in the patent application. If you were at my lecture for Houston Wellness, you heard me describe some of Google’s health-related activities.
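For the code-curious, here is how I read the two-step flow in the synopsis, rendered as a toy Python sketch. The categories, the rank-decay signal weighting, and the page elements are my guesses for illustration, not Google’s actual method.

```python
# A rough sketch of the two-step flow in US20090100036 as I read the
# synopsis: (1) classify the search results into scored categories, then
# (2) pick page elements from the predominant classification. Categories,
# weighting, and page elements are hypothetical.

from collections import defaultdict

def classify_results(search_results: list) -> dict:
    """Step 1: derive (category, score) pairs from the results themselves,
    not just from the query string."""
    scores = defaultdict(float)
    for rank, result in enumerate(search_results, start=1):
        for category in result["categories"]:
            scores[category] += 1.0 / rank  # a made-up rank-decay "signal"
    return scores

def pick_page_elements(classifications: dict, elements_by_category: dict) -> list:
    """Step 2: choose page elements matching the dominant interpretation."""
    top_category = max(classifications, key=classifications.get)
    return elements_by_category.get(top_category, [])

# The query "jaguar" is ambiguous; the results, not the query, disambiguate.
results = [
    {"url": "example.com/jaguar-models", "categories": ["autos"]},
    {"url": "example.com/jaguar-habitat", "categories": ["wildlife"]},
    {"url": "example.com/jaguar-dealers", "categories": ["autos"]},
]
elements = {"autos": ["dealer locator", "model comparison box"],
            "wildlife": ["habitat map"]}
print(pick_page_elements(classify_results(results), elements))
# -> ['dealer locator', 'model comparison box']
```

The point of the sketch is the one the synopsis makes: the page elements follow the predominant interpretation of the results, which is why the same query can produce different page furniture as the result set shifts.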
On the surface, you may think, “Page parsing. No big deal.” You are not exactly right. Page parsing at Google scale, the method, and the scores complement Google’s “dossier” function, about which Sue Feldman and I wrote in our September 2008 IDC client-only report. This is IDC paper 213562.
What does a medical information publisher need with those human editors anyway?
Stephen Arnold, April 17, 2009
YAGG: Twitter Aflame with Gmail Glitch
April 17, 2009
Short honk: Google does well in a lousy economy. Google sends a signal it would work with Twitter (even with its Amazon hook). Gmail goes down… for some. The big story for me is not the money or the Twitter air kiss. The news is YAGG, yet another Google glitch. You can read Steve Shankland’s “Gmail Outage Afflicts Some Users” here. No YAGG for the CNet take on the story. Beyond Search is not quite so hesitant to honk, “YAGG, YAGG.”
Stephen Arnold, April 17, 2009
OpenText and Endeca Tie Up: Digital Asset Management Play
April 17, 2009
OpenText has a six pack of search systems. There’s the original Tim Bray SGML search system (either the first or one of the first), the Information Dimensions BASIS (structure plus analytics, which we used for a Bellcore project eons ago), BRS Search (a rewrite of STAIRS III, in which I’m sure the newly minted search consultant who distributed a search methodology built on a taxonomy will have in-depth expertise), the Fulcrum engine (sort of Windows centric with some interesting performance metrics), and a couple of others which may or may not be related to the ones I’ve named. Endeca is a privately held vendor of search and content processing technology. I like the Endeca system for ecommerce sites, where the “guided navigation” can display related products. Endeca has been working overtime to develop a business intelligence revenue stream and probe new markets such as traditional library search. The company received an infusion of cash last year, and I heard that it had made strides in addressing both scaling and performance challenges. One reseller allegedly told a government procurement officer that Endeca had no significant limit on the volume of content that it could index and make findable.
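As a quick illustration of what “guided navigation” means in practice, here is a minimal faceted-counting sketch in Python. The product data and facet names are invented; this is not Endeca’s implementation.

```python
# A minimal sketch of the idea behind "guided navigation": tally facet
# values across a result set and surface them as refinement links.
# Data and field names are invented; this is not Endeca's code.

from collections import Counter

def facet_counts(results: list, facet: str) -> Counter:
    """Count how many hits carry each value of a given facet."""
    return Counter(hit[facet] for hit in results if facet in hit)

hits = [
    {"title": "Trail shoe", "brand": "Acme", "price_band": "$50-$100"},
    {"title": "Road shoe", "brand": "Acme", "price_band": "$100-$150"},
    {"title": "Track spike", "brand": "Zephyr", "price_band": "$50-$100"},
]
for value, count in facet_counts(hits, "brand").most_common():
    print(f"{value} ({count})")  # what the shopper sees: Acme (2), Zephyr (1)
```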
So what are these two powerhouses doing?
According to Newsfactor here, the two companies are teaming up for digital asset reuse. Most organizations have an increasing amount of podcasts, videos, images, and other rich media. If you read my link tasty essay about content management (the mastodon) and the complexities of dealing with content objects in containers (tar pit), you know that there is an opportunity to go beyond search.
The Newsfactor story is called “Open Text, Endeca to Deliver Digital Asset Reuse”. My understanding of the Newsfactor version of the deal is that OpenText will integrate Endeca’s asset management system into OpenText content management systems. There are a number of product names in the write up, and I must confess I confuse them with one another. I am an old and addled goose.
What’s the implication of the tie up? I think that Autonomy’s push into asset management with its IDOL server and the Virage software has demonstrated that there’s money in those rich media objects, which are proliferating like gerbils. The world of ediscovery has an asset twist as well. Videos and podcasts have to be located and analyzed either by software or by a semi-alert paralegal, maybe a junior lawyer. OpenText has a solid ediscovery practice, so there’s some opportunity there. In short, I think this tie up helps two established companies deal with a competitor who is aggressive and quicker to seize enterprise opportunities. Autonomy is a serious competitor.
What will Autonomy and other vendors do? I think that in this economic climate there will be several reactions to monitor. First, Autonomy, and probably Adobe, will be quick to respond aggressively. Second, other vendors of search and content processing systems will shift their marketing messages. A number of search systems have this capability, and some, like Exalead, can make videos searchable with markers so that particular passages can be viewed in the video object. This is quite useful. You can see a demo here. Third, I think that eDiscovery companies already adept at handling complex matters and content objects will become more price competitive. Stratify comes to mind as one outfit that may use price as a counter to the OpenText and Endeca tie up. I can point to start ups, aging me-too outfits like IBM, and a fair number of little known specialists in rich media who may step up their marketing.
This will be interesting to watch. OpenText is a bit like the old Ling-Temco-Vought type of roll up. Endeca is a solid vendor of search and content processing technology that was unable to pull off an initial public offering and became a recipient of cash infusions from the venture arms of Intel and SAP. The expectation is that one plus one will equal three. In today’s market, there’s a risk that a different outcome may result.
Stephen Arnold, April 17, 2009
Content Management: Modern Mastodon in a Tar Pit, Part One
April 17, 2009
Editor’s Note: This is a discussion of the reasons why CMS continues to thrive despite the lousy financial climate. The spark for this essay was the report of strong CMS vendor revenues written by an azure chip consulting firm; that is, a high profile outfit a step or two below the Bains, McKinseys, and BCGs of this world.
Part 1: The Tar Pit and Mastodon Metaphor or You Are Stuck
PCWorld reported “Web Content Management Staying Strong in Recession” here. The author, Chris Kanaracus, wrote:
While IT managers are looking to cut costs during the recession, most aren’t looking for savings in Web content management, according to a recent Forrester Research study. Seventy-two percent of the survey’s 261 respondents said they planned to increase WCM deployments or usage this year, even as many also expressed dissatisfaction with how their projects have turned out. Nineteen percent said their implementations would remain the same, and just 3 percent planned to cut back.
When consulting firms generate data, I try to think about the data in the context of my experience. In general, reconciling “statistically valid data from a consulting firm” with the wounds and bruises this addled goose gets in client work is an enjoyable exercise.
These data sort of make sense, but I think there are other factors that make CMS one of the alleged bright spots in the otherwise murky financial heavens.
La Brea, Tar, and Trapped Creatures
I remember the first time I visited the La Brea tar pits in Los Angeles. I was surprised. I had seen wellheads chugging away on the drive to a client meeting in Long Beach in the early 1970s, but I did not know there was a tar pit amidst the choked streets of the crown jewel in America’s golden west. It’s there, and I have an image of a big elephant (Mammut americanum for the detail oriented reader) stuck in the tar. Good news for those who study the bones of extinct animals. Bad news for the elephant.
Is this a CMS vendor snagged in litigation or the hapless CMS licensee after the installation of a CMS system?
I had two separate conversations about CMS, the breezy acronym for content management systems. I can’t recall the first time I discovered that species of mastodon software, but I was familiar with the tar pits of content in organizations. Let’s set the stage, er, prep the tar pit.
Organizational Writing: An Oxymoron
Organizations produce quite a bit of information. The vast majority of this “stuff” (content objects for the detail oriented reader) is in a constant state of churn. Think of the memos, letters, voice mails, etc. like molecules in a fast-flowing river in New Jersey. The environment is fraught with pollutants, regulators, professional garbage collection managers, and the other elements of modern civilization.
The authors of these information payloads are writing with a purpose; that is, instrumental writing. I have not encountered too many sonnets, poems, or novels in the organizational information I have had the pleasure of indexing since 1971. In the studies I worked on, first at Halliburton Nuclear Utility Services and then at Booz, Allen & Hamilton, I learned that most organizational writing is not read by very many people. A big fat report on nuclear power plants had many contributors and reviewers, but most of these people focused on a particular technical aspect of a nuclear power generation system, not the big fat book. I edited the proceedings of a nuclear conference in 1972 and discovered that papers often had six or more authors. When I followed up with the “lead author” about a missing figure or an error in a wild and crazy equation, I learned that the “lead author” had zero clue about the information in the particular paragraph to which I referred.
Flash forward. Same situation today, just lots more digital content. Instrumental writing, not much accountability, and general cluelessness about the contents of a particular paragraph, figure, chart, whatever in a document.
Organizational writing is a hotchpotch of individuals with different capabilities and methods of expressing themselves. Consider an engineer or mathematician. Writing is not usually a core competency, but there are exceptions. In technical fields, there will be a large number of people who are terse to the point of being incomprehensible and a couple of folks who crank out reams of information. In an organization, volume may not correlate with “right” or “important.” A variation of this situation crops up in sales. A sales report often is structured, particularly if the company has licensed a product to force each salesperson to provide a name, address, phone number, and comments about a “contact.” The idea is that getting basic information is pretty helpful if the salesperson quits or simply refuses to fill in the blanks. Often the salesperson who won’t play ball is the guy or gal who nails a multi-million-dollar deal. The salesperson figures, “Someone will chase up the details.” The guy or gal is right. Distinct content challenges arise in the legal department. Customer support has its writing preferences, sometimes compressed to methods that make the customer quit calling.
Why CMS for Text?
The Web’s popularization as cheap marketing created a demand for software that would provide writing training wheels to those in an organization who had to contribute information to a Web site. The Web site has gained importance with each passing year since 1993 when hyperlinking poked its nose from the deep recesses of Standard Generalized Markup Language.
Customer relationship management systems really did not support authoring, editorial review, version control, and the other bits and pieces of content production. Enterprise resource planning systems manage back office and nitty gritty warehouse activities. Web content is not a core competency of these labyrinthine systems. Content systems mandated for regulatory compliance are designed to pinpoint which supplier delivered an Inconel pipe that cracked, what inspector looked at the installation, what quality assurance engineer checked the work, and what tech did the weld when the pipe was installed. Useful for compliance, but not what the Web marketing department ordered. Until recently, enterprise publishing systems were generally confined to the graphics department or the group that churned out proposals and specifications. The Web content was an aberrant content type.
Enter content management.
I recall the first system that I looked at closely was called NCompass. When I got a demo in late 1999, I recall vividly that it crashed in the brightly lit, very cheerful exhibition stand in San Jose. Reboot. Demo another function. Crash. Repeat. Microsoft acquired this puppy and integrated it into SharePoint. SharePoint has grown over time like a snowball. Here’s a diagram of the SharePoint system from www.JoiningDots.net:
SharePoint. Simplicity itself. Source: http://www.joiningdots.net/downloads/SharePoint_History.jpg
A Digital Oklahoma Land Rush
By 2001, CMS was a booming industry. In some ways, it reminded me of the case study I wrote for a client about the early days of the automobile industry. There were many small companies which over time would give way to a handful of major players. Today CMS has reached an interesting point. The auto-style aggregation has not worked out exactly like the auto industry case I researched. Before the collapse of the US auto industry in 2008, automobile manufacturing had fractured and globalized. There were holding companies making more vehicles than the US population would buy from American firms. There were vast interconnected networks of supplier subsystems, and below these, huge pipelines into more fundamental industrial sectors like chemicals, steel, and rubber.
Rumor, Disinformation, or Reality – Google Twitter Tie Up
April 17, 2009
A happy quack to the reader who sent me a link to this remarkable story without sources. “Google to Announce Twitter Acquisition Tomorrow” here is amazing in its audacity. The addled goose does his best to get some chit chat going, but this is beyond our wingspan. The author is The Raw Feed, a young man with a radio telescope for entertainment. Who knows? My recollection is that the GOOG has no interest in Twitter. Ah, truth. Ah, beauty. That is all one needs to know.
Stephen Arnold, April 17, 2009
Reading News: Amazing Math Equals Falling Revenues
April 17, 2009
If you are an MBA looking for work, you may want to check out the fancy math here. The article sports a remarkable title: “Print Is Still King: Only 3 Percent of Newspaper Reading Happens Online.” The author is Martin Langeveld. If an MBA finds the assumptions acceptable, that person may want to apply for a job at the Nieman Journalism Lab. I did not feel comfortable with the assumptions, so I resisted the main thrust of the write up. In my opinion, the numbers indicate that print newspapers have more reach and other goodness that online news does not. I think that online news has some major flaws, but traditional newspapers are not setting the world on fire. The other thought that crossed my mind is that those younger than I are into digital info and don’t see paper documents quite the way old geese like me do.
One final comment: if those eyeballs had value, the local newspaper would not have had to double the forced time off for certain staff. Gannett and other newspaper companies would be cashing checks, not riffing staff. With MBAs somewhat discredited, their math skills might mesh with analyses for the newspaper industry. Who knows? Fancy math might work and the media giants will once again rule the information universe.
Stephen Arnold, April 17, 2009