Evvie 2009 Winners: David Evans and Martin Baumgartel

May 4, 2009

Stephen E. Arnold of ArnoldIT.com, http://www.arnoldit.com, announced the Evvie “best paper award” for 2009 at Infonortics’ Boston Search Engine Meeting on April 28.

The 2009 Evvie Award went to Dr. David Evans of Just Systems Evans Research for “E-Discovery: A Signature Challenge for Search.” The paper explains the principal goals and challenges of E-Discovery techniques. The second place award went to Martin Baumgärtel of bioRASI for “Advanced Visualization of Search Results: More Risks or More Chances?”, which addressed the gap between breakthroughs in visualization and actual application of techniques.

Stephen Arnold (left) is pictured with Dr. David Evans, Just System Evans Research on the right.

The Evvie is given in honor of Ev Brenner, one of the leaders in online information systems and functions. The award was established after Brenner’s death in 2006. Brenner served on the program committee for the Boston Search Engine Meeting since its inception almost 20 years ago. Everett Brenner is generally regarded as one of the “fathers” of commercial online databases. He worked for the American Petroleum Institute and served as a mentor to many of the innovators who built commercial online.

Martin Baumgartel (left) and Dr. David Evans discuss their recognition at the 2009 Boston Search Engine Meeting.

Mr. Brenner had two characteristics that made his participation a signature feature of each year’s program: He was willing to tell a speaker or paper author to “add more content,” and after a presentation, he would ask a presenter one or more penetrating questions that helped make a complex subject more clear.

The Boston Search Engine meeting attracts search professionals, search vendors, and experts interested in content processing, text analysis, and search and retrieval. Held each year in Boston, Ev, as he was known to his friends, demanded excellence in presentations about information processing.

Sponsored by Stephen E. Arnold (ArnoldIT.com), this award goes to the speaker who best exemplifies Ev’s standards of excellence. The selection committee consists of the program committee, assisted by Harry Collier (conference operator) and Stephen E. Arnold.

This year’s judges were Jill O’Neill, NFAIS, Sue Feldman, IDC Content Technologies Group, and Anne Girard, Infonortics Ltd.

Mr. Arnold said, “This award is one way for us to respect his contributions and support his life long commitment to excellence.”

The recipients receive a cash prize and an engraved plaque. Information about the conference is available on the Infonortics, Ltd. Web site at www.infonortics.com and here. More information about the award is here. Information about ArnoldIT.com is here.

Written by Stephen E. Arnold · Filed Under Conferences, Feature, News, Search, Technology, Text analytics, Text processing | 2 Comments

AP Google Spat

May 2, 2009

Forbes.com’s Dirk Smillie wrote “AP’s Curley Has Fighting Words for Google” at a time when dead tree outfits (my way of describing traditional publishing companies) experience a surge in blood pressure. The GOOG and the AP had a deal for content. Money changed hands. According to Mr. Smillie, presumably in the know with regard to discussions between the information giant of the past (AP) and the information giant of the future (Google), the two are having difficulty communication. Mr. Smillie wrote:

The AP and Google ( GOOG – news – people ) have been debating content and compensation issues for months. In an interview with Forbes on Wednesday, Curley warned that if Google doesn’t strike the right deal with the AP soon, “They will not get our copy going forward.” The threat follows Rupert Murdoch’s accusation earlier this month that Google is committing copyright thievery when it borrows material from news stories to assemble search rankings. A few days later, the AP weighed in with a similar charge–though it did not mention Google–announcing a content protection initiative and threatening legal and legislative action against news aggregators.

I am thrilled to be an addled goose paddling on a pond filled with mine draining run off in rural Kentucky. This battle could be Dickensian. The ghost of information past rails at the ghost of information yet to come. We know the outcome, don’t we?

Google wins.

Now let’s think about why this is, in my opinion, the trajectory of this dispute.

First, the financial ground on which the AP (Associated Press) stands is crumbling. The erosion is not caused by Google. The erosion is a consequence of the flow of bits that are tunneling worm holes in the once solid foundations of the newspaper business. Whatever actions AP takes will be similar to the hapless home owners who use sand to shore up shaky foundations. Wrong material. Wrong action.

Second, the Google is an information platform. If the folks with news want to get their information in front of people, Google is a major distribution channel. But, as described in my new monograph Google: The Digital Gutenberg, the GOOG can make it easy for those with content to monetize that information. One of Google’s disclosed inventions allows a partner to use the Google platform to perform many information functions, including monetization. Should Google wish, in a blink (I wanted to use the word nonce but a reader said it carried negative connotations), Google becomes a Swiss Army knife of news. If Google doesn’t take this step, another online upstart will. AP can’t be that upstart. AP can’t stop the trajectory of online information.

Third, it is too late. The AP inked a deal, thought it knew the ropes, realized it didn’t know the ropes were located in a Costco, and now is trying to advantage itself. In my opinion, Google is not too fond of second chances. As a result, the AP is negotiating from a position of weakness. Maybe the copyright lawyers will have an answer, but I think the children of the AP executives and the copyright attorneys will be the generation that decides in favor of the GOOG or a similar service. Let me repeat: Too late. No more flights to Cleveland today.

Do I agree with Mr. Smillie? He’s an objective reporter. I am writing a Web log, and I think that it’s game over for the AP.

Written by Stephen E. Arnold · Filed Under Business strategy, Feature, Financial, Google, Publishing | Comments Off on AP Google Spat

Autonomy Thrives in Lousy Economic Climate

April 29, 2009

I am at Day Two of the Boston Search Engine Meeting. At the break, I talked with a small group and the subject was the impact of the financial climate on the enterprise search vendors. I heard the names of two vendors who in the opinion of a couple of people with whom I spoke are gasping for nutrients in the form of dollars and euros. I don’t feel comfortable mentioning the name of one semantic-centric vendor and one non-US vendor who were the subject of speculation. In my opinion, there are probably a half dozen or more of the companies that I track in a resource pickle.

One notable exception is the UK based vendor Autonomy. I did not see a representative of Autonomy at this conference, but I have been too busy to conduct an inventory of the attendees. Autonomy reported a week or two ago that it was likely to have a solid financial performance. I did a quick check, and it is evident, if I understand Autonomy’s data, that the lousy climate is not inhibiting Autonomy’s growth.

You can Kathy Sandler’s take here. She reported on April 23, 2009, that Autonomy plans to upgrade its 2009 earnings projections. I am not a financial whiz, but the information in Ms. Sandler’s write up looks good across the board – revenue, earnings, and cost management.

My high school history teacher was fond of repeating the alleged anecdote about the drunk General US Grant and President Lincoln’s alleged comment: “Find out what he’s drinking and send a case to my other generals?”

Is it time for other enterprise search companies to take a hard look at what fuels Autonomy’s crops. Say what you will about the company’s acquisition strategy, the firm seems to be harvesting.

Are Autonomy’s competitors to arrogant to look at Mr. Lynch and determine what he does to harvest cash as others shrivel?

Stephen Arnold, April 29, 2009

Written by Stephen E. Arnold · Filed Under Business strategy, Enterprise, Feature, Technology, Text processing, Vertical search | 3 Comments

Microsoft Fast Arrow Electronics Parametric Search

April 24, 2009

In April 2008, BNet reported here that Arrow Electronics signed on with Fast Search & Transfer for the deployment of the Fast ESP (enterprise search platform). Today, an observant reader sent me a link to a story dated April 23, 2009, that appeared in 4G Wireless Evolution here. The title was “Arrow Electronics Launches New Features to Online Search Engine.” The newly enhanced system offers:

new online features providing greater access to product data and simplifying the search and ordering process within its expansive electronic components database. Building on the strength of FAST (Microsoft), Arrow’s new parts search engine, the enhancements represent the next step to provide greater tools and information via Arrow’s online resources.

The 4GWE story added:

In fall of 2008, Arrow launched FAST, offering enhanced site functionality, a greater range of user options and improved search speed and accuracy. Since the launch, search effectiveness on components.arrow.com has increased by 75 percent – giving customers faster, easier access to Arrow’s expansive parts database with readily available product and inventory data, enhanced filtering and cross-referencing capabilities.

Arrow appears to have indexed content about electrical products from about 800 suppliers and 120,000 original equipment manufacturers. There is scant information about the size of the content indexed. I navigated to the Arrow site here and ran some test queries. My initial reaction was that the system seemed snappy. As I clicked through the result pages, I saw output like that shown in the screenshot below for the query capacitors:

I clicked on the PDF logo for the first result, viewed that document, and tried to enter the following phrase “Monolithic Ceramic Capacitors”. I entered that phrase in the search box. What I discovered in that the search box only accommodated a portion of the phrase, 25 characters to be exact. This type of query constraint has been common to parametric search systems for decades, but I was surprised to encounter that hard stop.

Written by Stephen E. Arnold · Filed Under Database, Enterprise, Feature, Microsoft, Search, Technology, Text processing | 1 Comment

Google and Media: iBreakfast Synopsis

April 23, 2009

Editor’s Note: I gave a short talk at the iBreakfast meeting on April 23, 2009. The organizer—Alan Brody—asked me to prepare a short write up for the audience. I did not have much time, so I pulled together some text from my new book, Google: The Digital Gutenberg plus some information I had in my files. Here is the rough draft of the write up I provided Mr. Brody. Keep in mind that I will be making changes to this text and may be changing some of the examples and wording. Constructive criticism is invited.

“Google is best known as a Web search vendor and an online advertising system. Google as a publisher is a new concept. How many of you know about the financial problems facing newspapers?

It may surprise you to know that Google offers a number of revenue generating opportunities to publishers. These can be as simple as the AdSense program. A publisher displays Google-provided advertisements on a publisher’s Web site. When a visitor clicks on an ad, the publisher receives a share of the revenue. A rough rule of thumb is that every 250,000 unique visitor clicks per months translates into about $200,000 in revenue. Over the course of a year, the Web site yields as much or more than $2.0 million in revenue to the Web site owner. Your mileage may vary, of course.

Another opportunity is for a partner to organize video content, take responsibility for selling the ads, and using the Google system to make the content findable. Google also handles the delivery of the content and the monetizing. The partner who uses Google as a back office can negotiate revenue splits with Google. This is a relatively new initiative at Google and disclosed in a Google patent document. (US2008/0275763 “Monetization of Digital Content Contributions”.)
But there’s more to Google than AdSense and ways for innovative content providers to make money. Much more.
I want to run through some public facing content services and provide a somewhat different observation platform for you to look at Google and the opportunities it offers those who see a potential pot of gold in Mountain View.

First, Web logs. There are more than 100 million of these “diary” or “blog” publications. Some are commercial grade; for example, TechMeme. Others are ephemera and rarely updated. Google publishes more than 70 Web logs about itself. Google owns Blogger.com. Google operates a blog search service. Google has made it possible to hook blogs into Google’s Web page service Google Sites, which is a commercial grade online publishing system.

Second, Knols. A Knol is a unit of knowledge. More practically, Knol is an encyclopedia. Articles are contributed by people with knowledge about a subject. The Knol publishing system borrows from the JotSpot engine purchased by Google from Joe Kraus, the founder of the old Excite.com service. Knols can hook into other Google services such as YouTube.com and Google’s applications.

Third, Google Books. Books is the focus of considerable controversy. What I want to point out is that if you navigate to the Books site and click on a magazine cover, Google has created a very useful reference service. You can browse the table of contents for a magazine and see the locations on a map when a story identifies a place.

Finally, directories. Google operates a robust directory service. It has a content intake system which makes it easy for a person to create a company listing, add rich media, and generate a coupon. If you are in the Yellow Pages business, the Google Local service seems to be encroaching. In today’s wireless world, Google Local could become the next Yellow Pages 21st century style. Here’s a representative input form. Clean, simple, easy. Are you listed?

The White House has gone Googley as well. Recovery.gov makes use of Google’s search and other technology to some degree. The White House uses Google Apps to accept questions and comments for the president. Google’s communications tools appear to be playing an important role in the Obama White House.

What’s been happening since the Google initial public offering in 2004 has been a systematic build out of functions. The core of Google is search and advertising. But the company has been adding industrial-strength functions at a rapid clip. The pace has put increasing pressure on the likes of Microsoft and Yahoo, not just in search but in mindshare.
The challenge Google represents to newspapers in particular and to traditional media in general is an old story. When Gutenberg “invented” printing (at least in the eyes of my Euro-centric history teachers), scribes were put out of work. New jobs were created but the dislocation for those skilled with hand copying was severe. Then the Industrial Revolution changed cottage industries because economies of scale relegate handwork to specialists who served the luxury market. Another dislocation. Google is a type of large scale disruptor. Google, however, is not the cause of the disruption. Google is the poster child of larger changes made possible by technology, infrastructure, and user demands.

Here’s a representation of how one created a newspaper from the early 17th century to roughly 1993, when the Web gained traction. Notice that there are nine steps. Time, cost, and inefficiency are evident. Now here’s a depiction of the Google Local or the Google Blogger.com service. Two steps. Disruption is inevitable, and it will be painful for those unable to adapt. For some, yesterday’s jobs and income levels are no longer possible. This is a serious problem, but Google did not cause it. Google, as I said in my 2005 monograph The Google Legacy, is a company skilled at applying technology in clever ways. Google doesn’t invent in the Eureka! myth. Google is more like Thomas Edison, an inspired tinkerer, a person who combines ideas until one clicks. That’s the reason for Google’s beta tests and stream of test products and services.
Google applies its technology to work around the inefficiency of humans. When I worked at Booz, Allen & Hamilton, then at 245 Park Avenue in the old American Brands Building, I spent my days, nights, and weekends preparing reports. Here’s a figure from Google patent document US: 2007/0198481.

Google continues to push products and services into different business sectors. These waves can be disruptive and often the cause of surprising reactions. A good example is the Associated Press’s view that Google is the cause of problems in daily newspapers. The AP overlooks Craigslist.org, questionable management practices, the rising cost of traditional printing and distribution. Google is successful; therefore, Google is the cause. Its technology is the root of the present financial evil at the New York Times, the San Francisco Chronicle, and the Detroit News.

What Google represents is a platform. For those who choose to ignore Google, the risk is similar to that of the people under this rock. If the rock moves, the people will have little time to move to safety.

Stephen Arnold, April 23, 2009

Written by Stephen E. Arnold · Filed Under Business strategy, Feature, Google, News, Online (general), Publishing | Comments Off on Google and Media: iBreakfast Synopsis

Lawson: Enterprise Search, Apps, and CRM

April 22, 2009

The consensus this morning is that software and systems companies want to own digital versions of Henry Ford’s white elephant, the River Rouge facility. The idea was to ingest coal and iron ore at one end and eject Ford motor cars at the other. Like a medieval tailor, the River Rouge notion was to put everything under one roof. Today the MBAs compress this idea of total integration into the breezy “one stop shop”.

Lawson, according to the company’s Web site here,

Lawson provides enterprise software and service solutions in the manufacturing, distribution, maintenance, and service industries. Over 4,500 customers use our software throughout the world.
Our mission is simple: to make you stronger. We start by comparing your performance to industry benchmarks. We help you identify your weaknesses, bottlenecks and pain points. Then we help you implement our integrated enterprise software to alleviate – or even eliminate – those weaknesses. We measure your progress and identify the next set of improvements. Many of our customers say we help them continuously improve their operations. We make them stronger. Lawson Software used its 2009 Lawson Conference and User Exchange the CUE to make enterprise search one of the focal points of this program.

I saw a number of news items about Lawson’s enterprise search solution. The Gilbane Group reported here:

Lawson Enterprise Search is a new product to search both structured and unstructured data across the Lawson S3 enterprise system, Lawson Business Intelligence, the user’s desktop, and even their personal history such as comments entered in Microsoft Office applications.

My recollection of Lawson is that the company offers enterprise resource planning solutions. The company’s software can handle finance, manufacturing, distribution, maintenance, and supply chain functions for an organization. The on premises software has picked up additional functions over the years. Lawson can be deployed for personnel, customer support, and business intelligence applications, among others.

After reading the Gilbane Group’s news story, I navigated to the Lawson site and ran a query “enterprise search” to see how the search system performed. The Gilbane story ran down a checklist of functions that triggered in my mind a dashboard type of system. A user could run a query and then perform various tasks on the result or results. The Gilbane Group’s summary leaned heavily on search functions associated with structured data retrieval or the new “data spaces” technology I report on in Google: The Digital Gutenberg. I was also intrigued by the notion of searching “indexed data”, not the “live transaction database”. Latency becomes a key question for me in this era of real time search. After all, looking for a part that is no longer in inventory to meet the needs of a big customer means that the search must return fresh results. Getting the index our of sync with what’s in the warehouse can be a very big deal in some situations.

Web Site Search

The results of my query on the Lawson’s Web site search function were:

The first hit was a link to the Business Wire news release which I was able to determine was the source of most of the news stories about the roll out of Lawson Enterprise Search. The lingo “search keys” reminded me of my mainframe days. A “freeform” search suggested to me that I could enter a free text query. I was baffled by this statement, however: “Perform directed searches via an interest center”. I am not sufficiently familiar with Lawson to know if an “interest center” is a function in a Lawson installation or if it is a buzzword.

I clicked on the “show marked button” and the system displayed each of the terms in my query “enterprise search” highlighted as shown below:

The system did not limit the query to the bound phrase “enterprise software”. The system also defaulted to a Boolean AND, which I prefer to indiscriminate Boolean ORs favored by some search systems. I manually scanned the first 1,720 results in the list and found that two were relevant to my query “enterprise search”. The other 1,728 did not contain the terms. You can see this for yourself. Run the query “enterprise search” without quotes and click to result number 1,710 here. Neither term appears. I assume that the Lawson engine includes a term injection method that inserts the terms “enterprise” and “search” regardless of the content of the document. I would have looked at more results, but after 1,700 items, I cut off my scan. Based on this, I have questions about the relevance method used in the Lawson Web search system. The misindexed item invited me to write to Lawson at opinionizer@lawson.com. I was not sure what an “opinionizer” record accomplished.

Written by Stephen E. Arnold · Filed Under Business strategy, Database, Enterprise, Feature, Search, Technology | 3 Comments

Content Management: Modern Mastodon in a Tar Pit, Part Two

April 20, 2009

Part 2: Challenges and the Upside… Sort of an Upside

The CMS Tar Pits

Today, search and content management systems are a really big problem. There’s no easy solution for several reasons:

CMS in many cases have become larger and more complex over time. At the same time, the notion of an information object has raced along about a half mile ahead of the vendors’ software. In an organization, folks want to do podcasts, create fancy reports with embedded videos, and sometimes animated charts and graphs. “Search” is an overburdened word and it is unlikely that the wide range of content objects can be indexed without considerable resources.
CMS has morphed into more than Web content. As a result, the often primitive workflow and repository functions choke when asked to support a special purpose retrieval; for example, eDiscovery. The solution to this problem is not to upgrade the CMS search system but to license another solution, pump the content into the specialized system, and run the eDiscovery with spoliation features from that system.
CMS has not solved the problem of Web content. The reason goes back to the relationship between a human writing something and a system that sort of keeps that “something” organized and mostly eliminates the writer’s interaction with a programmer. CMS shifts the focus from setting up a method for creating useful, substantive content to the mechanics of keeping track of content objects and components. As a result, after the hoo haa of the CMS, Web sites have a content problem. The problem is that the information is often out of phase with the needs of the Web site user and the people who want the Web site to generate sales.
CMS increases inefficiencies associated with writing. Organizations are committee writing machines. One or more individuals may write something. Then that “something” gets routed around, changes are made, a version is created, that version is shuffled around, and then an output occurs. Most document decisions are made at the 11th hour under an artificial “crisis”. This method absolves the “author” and the reviewers of real responsibility. The result is a lot of versions of the “something” and a document that is mostly something that is impenetrable. The “author” is like the guy or gal who sent me the engineering paper with a bunch of names on it. That person does not know what’s in the document and does not understand some parts of it. To see this type of writing in action read the instructions for a 1099 or a patent application.
CMS costs only go up. Because CMS systems have to handle the content generated by their licensees, the costs for these puppies go one way—through the roof. Here’s why: CMS infrastructure has to be expanded to handle more documents and ever larger content objects. An email may be 4 Kb of XML. Stuff in a video and you get a bit of an extra load. Stuff in 20,000 documents with rich content and you get to buy lots of hardware, storage, bandwidth, and engineers to keep the Rube Goldberg machine running. The CMS has to be rebuilt on the fly which is plugging a leak in a speedboat towing a skier on Lake Huron. The fix is at best temporary.

In this environment, customers want facets, real time indexing, context sensitive queries, personalization, and access to structured data. No problem, but it won’t be cheap, easy, or doable with most of the existing budgets with which I am familiar.

Do marketers say these features can be delivered? You bet your life. Once the sale is made, the marketer goes to the next account. The vendor’s technical team is left to explain the reality and limitations of what search and content processing can do within the CMS environment.

Who’s the Lucky Mastodon?

So what’s the mastodon? The CMS that companies struggle to make work. What’ s the tar pit? The chair in front of the CFO’s desk. The owner of the CMS has to sit down and explain the cost overruns. The CFO may not care that the system is generating massive indirect costs, but she will certainly want to know about the hardware, software, license fees, consulting services, and programming expenditures.

Where do CMS consultants fit in?

There are good consultants (blue chippers) and not so good consultants (azure chippers). The “blue” connotes proven professionals from established services firms; for example, some units of IBM and some McKinsey and Boston Consulting Group folks. The azure chippers come from the companies with a modest track record and probably some wonder marketing lingo. The Regis McKenna school of marketing is a model for the azure chippers.

Consultants are usually a mirror of their clients. So clients get what they purchase and what emerges as “needs”. The result is that clients with a death of expertise in writing, content production, and enterprise publishing don’t get the problem fixed.

What exists now is a feedback loop that leads from the edge of the tar pit to the bottom of the tar pit. After a few million years, a preserved system is dug up, dissected, and compared to whatever tools are available. Because of the turnover among some enterprise technology professionals, the corporate memory is often shallow and the folks responsible for the mastodon have moved on.

The Upside of the CMS Tar Pit

What’s the positive view of this situation?

I see three positives.

First, the disasters of today’s CMS means that a number of individuals have attended the School of Hard Knocks and learned about some of the demands of content creation, production, and distribution.

Second, the newer systems have advanced beyond training wheels. You get air bags. You get seat belts. You get safety glass. You might be injured, but you probably won’t be killed. The US Senate’s CMS after several years of effort with two high profile vendors was shelved and a different approach pursued.

Third, some of today’s systems work and can be used by normal humans with so so writing skills. I know that it is great fun to whack on the Google, but I know that Adhere Solutions (a Google partner) has implemented some nifty systems that use the GOOG as plumbing. I referenced the newer cloud based services from a Web log vendor elsewhere in this essay. I also pointed out that the Xquery outfit MarkLogic may warrant a look.

What should you do if you want to have a CMS with lousy search? My first thought was to ask you to call me. My second thought was to tell you to buy a copy of Successful Enterprise Search Management. You can get information about this 2009 study by Martin White (European guru) and me (American addled goose) here. My third thought was to suggest a Google search. My fourth thought is to start over.

You will have to choose an appropriate path. My suggestion is to avoid the azure chip consulting firm crowd, newly minted experts, and anyone who sounds like a TV game show announcer.

Stephen Arnold, April 20, 2009

Written by Stephen E. Arnold · Filed Under Enterprise, Feature, Online (general), Search, Technology, Text processing | 1 Comment

Content Management: Modern Mastodon in a Tar Pit, Part One

April 17, 2009

Editor’s Note: This is a discussion of the reasons why CMS continues to thrive despite the lousy financial climate. The spark for this essay was the report of strong CMS vendor revenues written by an azure chip consulting firm; that is, a high profile outfit a step or two below the Bains, McKinseys, and BCGs of this world.

Part 1: The Tar Pit and Mastodon Metaphor or You Are Stuck

PCWorld reported “Web Content Management Staying Strong in Recession” here. The author, Chris Kanaracus, wrote:

While IT managers are looking to cut costs during the recession, most aren’t looking for savings in Web content management, according to a recent Forrester Research study. Seventy-two percent of the survey’s 261 respondents said they planned to increase WCM deployments or usage this year, even as many also expressed dissatisfaction with how their projects have turned out. Nineteen percent said their implementations would remain the same, and just 3 percent planned to cut back.

When consulting firms generate data, I try to think about the data in the context of my experience. In general, pondering the boundaries of “statistically valid data from a consulting firm” with the wounds and bruises this addled goose gets in client work is an enjoyable exercise.

These data sort of make sense, but I think there are other factors that make CMS one of the alleged bright spots in the otherwise murky financial heavens.

La Brea, Tar, and Stuck Trapped Creatures

I remember the first time I visited the La Brea tar pits in Los Angeles. I was surprised. I had seen well heads chugging away on the drive to a client meeting in Longbeach in the early 1970s, but I did not know there was a tar pit amidst the choked streets of the crown jewel in America’s golden west. It’s there, and I have an image of a big elephant (Mammut americanum for the detail oriented reader) stuck in the tar. Good news for those who study the bones of extinct animals. Bad news for the elephant.

Is this a CMS vendor snagged in litigation or the hapless CMS licensee after the installation of a CMS system?

I had two separate conversations about CMS, the breezy acronym for content management systems. I can’t recall the first time I discovered that species of mastodon software, but I was familiar with the tar pits of content in organizations. Let’s set the state, er, prep the tar pit.

Organizational Writing: An Oxymoron

Organizations produce quite a bit of information. The vast majority of this “stuff” (content objects for the detail oriented reader) is in a constant state of churn. Think of the memos, letters, voice mails, etc. like molecules in a fast-flowing river in New Jersey. The environment is fraught with pollutants, regulators, professional garbage collection managers, and the other elements of modern civilization.

The authors of these information payloads are writing with a purpose; that is, instrumental writing. I have not encountered too many sonnets, poems, or novels in the organizational information I have had the pleasure of indexing since 1971. In the studies I worked on first at Halliburton Nuclear Utility Services and then at Booz, Allen & Hamilton, I learned that most organizational writing is not read by very many people. A big fat report on nuclear power plants had many contributors and reviewers, but most of these people focused on a particular technical aspect of a nuclear power generation system, not the big fat book. I edited the proceedings of a nuclear conference in 1972, and discovered that papers often had six or more authors. When I followed up with the “lead author” about a missing figure or an error in a wild and crazy equation, I learnedthat the “lead author” had zero clue about the information in the particular paragraph to which I referred.

Flash forward. Same situation today just lots more digital content. Instrumental writing, not much accountability, and general cluelessness about the contents of a particular paragraph, figure, chart, whatever in a document.

Organizational writing is a hotch potch of individuals with different capabilities and methods of expressing themselves. Consider an engineer or mathematician. Writing is not usually a core competency, but there are exceptions. In technical fields, there will be a large number of people who are terse to the point of being incomprehensible and a couple of folks who crank out reams of information. In an organization, volume may not correlate with “right” or “important”. A variation of this situation crops up in sales. A sales report often is structured, particularly if the company has licensed a product to force each salesperson to provide a name, address, phone, number, and comments about a “contact”. The idea is that getting basic information is pretty helpful if the salesperson quits or simply refuses to fill in the blanks. Often the salesperson who won’t play ball is the guy or gal who nails a multi million dollar deal. The salesperson figures, “Someone will chase up the details.” The guy or gal is right. Distinct content challenges arise in the legal department. Customer support has its writing preferences, sometimes compressed to methods that make the customer quit calling.

Why CMS for Text?

The Web’s popularization as cheap marketing created a demand for software that would provide writing training wheels to those in an organization who had to contribute information to a Web site. The Web site has gained importance with each passing year since 1993 when hyperlinking poked its nose from the deep recesses of Standard Generalized Markup Language.

Customer relationship management systems really did not support authoring, editorial review, version control, and the other bits and pieces of content production. Enterprise resource planning systems manage back office and nitty gritty warehouse activities. Web content is not a core competency of these labyrinthine systems. Content systems mandated for regulatory compliance are designed to pinpoint which supplier delivered an Inconel pipe that cracked, what inspector looked at the installation, what quality assurance engineer checked the work, and what tech did the weld when the pipe was installed. Useful for compliance, but not what the Web marketing department ordered. Until recently, enterprise publishing systems were generally confined to the graphics department or the group that churned out proposals and specifications. The Web content was an aberrant content type.

Enter content management.

I recall the first system that I looked at closely was called NCompass. When I got a demo in late 1999, I recall vividly that it crashed in the brightly lit, very cheerful exhibition stand in San Jose. Reboot. Demo another function. Crash. Repeat. Microsoft acquired this puppy and integrated it into SharePoint. SharePoint has grown over time like a snowball. Here’s a diagram of the SharePoint system from www.JoiningDots.net:

SharePoint. Simplicity itself. Source: http://www.joiningdots.net/downloads/SharePoint_History.jpg

A Digital Oklahoma Land Rush

By 2001, CMS was a booming industry. In some ways, it reminded me of the case study I wrote for a client about the early days of the automobile industry. There were many small companies which over time would give way to a handful of major players. Today CMS has reached an interesting point. The auto style aggregation has not worked out exactly like the auto industry case I researched. Before the collapse of the US auto industry in 2008, automobile manufacturing had fractured and globalized. There were holding companies making more vehicles than the US population would buy from American firms. There were vast interconnected of supplier subsystems and below these huge pipelines into more fundamental industrial sectors like chemicals, steel, and rubber.

Written by Stephen E. Arnold · Filed Under Enterprise, Feature, Online (general), Search, Technology, Text processing | Comments Off on Content Management: Modern Mastodon in a Tar Pit, Part One

Lou Rosenfeld on Content Architecture

April 15, 2009

Editor’s Note: The Boye 09 Conference in Philadelphia takes place the first week in May 2009, May 5 to May 7, 2009, to be precise. Attendees can choose from a number of special interest tracks. These include strategy and governance, Intranet, Web content management, SharePoint, user experience, and eHealth. You can get more information about this conference here. One of the featured speakers, is Lou Rosenfeld. You can get more information here. Janus Boye spoke with Mr. Rosenfeld on April 14, 2009. The full text of the interview appears below.

Why is it so hard for organizations to get a grip on user experience design?

Because UX is an interdisciplinary pursuit. In most organizations, the people who need to work together to develop good experiences–designers, developers, content authors, customer service personnel, business analysts, product managers, and more–currently work in separate silos. Bad idea. Worse, these people already have a hard time working together because they don’t speak the same language.

Once you get them all in the same place and help them to communicate better, they’ll figure out the rest.

Why is web analytics relevant when talking about user experience?

Web sites exist to achieve goals of some sort. UX people, for various reasons, rely on qualitative research methods to ensure their designs meet those goals. Conversely, Web analytics people rely on quantitative methods. Both are incomplete without the other – one helps you figure out what’s going on, the other why. UX and WA folks two more groups that need help communicating; I’m hoping my talk in some small way helps them see how they fit together.

Is your book “Information Architecture for the World Wide Web” still relevant 11 years later?

Nah, not the first edition from 1998. It was geared toward developing sites–and information architectures–from scratch. But the second edition, which came out in 2002, was almost a completely new book, much longer and geared toward tuning existing sites that were groaning under the weight of lots of content: good and bad, old and new. The third edition–which was more of a light update–came out in 2006. I don’t imagine information architecture will ever lose relevance as long as there’s content. In any case, O’Reilly has sold about 130,000 copies, so apparently they think our book is relevant.

Does Facebook actually offer a better user experience after the redesign?

I really don’t know. I used to find Facebook an excellent platform for playing Scrabble, but thanks to Hasbro’s legal department, the Facebook version of Scrabble has gone the way of all flesh. Actually, I think it’s back now, but I’ve gotten too busy to fall again to its temptation.

Sorry, that’s something of an underhanded swipe at Facebook. But now, as before, I find it too difficult to figure out. I have a hard time finding (and installing) applications that should be at my fingertips. I’m overwhelmed – and, sometimes, troubled–by all the notifications which seem to be at the core of the new design. I’d far prefer to keep up with people via Twitter (I’m @louisrosenfeld), which actually integrates quite elegantly with the other tools I already use to communicate, like my blog (http://louisrosenfeld.com) and email. But I’m the wrong person to ask. I’m not likely Facebook’s target audience. And frankly, my opinion here is worth what you paid for it. Much better to do even a lightweight user study to answer your question.

Why are you speaking at a Philadelphia web conference organized by a company based in Denmark?

Because they asked so nicely. And because I hope that someday they’ll bring me to their Danish event, so I can take my daughter to the original Legoland.

Janus Boye, April 15, 2009

Written by Stephen E. Arnold · Filed Under Conferences, Feature, News, Online (general) | 1 Comment

Composite Software

April 12, 2009

I was asked about data virtualization last week. As I worked on a short report for the client, I reminded myself about Composite Software, a company with “data virtualization” as a tagline on on its Web site. You can read about the company here. Quick take: the firm’s technology performs federation. Instead of duplicating data in a repository, Composite Software “uses data where it lives.” If you are a Cognos or BMS customer, you may have some Composite technology chugging away within those business intelligence systems. The company opened for business in 2002 and has found a customer base in financial services, military systems, and pharmaceuticals.

The angle that Composite Software takes is “four times faster and one quarter the cost.” The “faster” refers to getting data where it resides and as those data are refreshed. Repository approaches introduce latency. Keep in mind that no system is latency free, but Composite’s approach minimizes latency associated with more traditional approaches. The “cost” refers to the money saved by eliminating the administrative and storage costs of a replication approach.

The technology makes use of a server that handles querying and federating. The user interacts with the Composite server and sees a single-view of the available data. The system can operate as an enabling process for other enterprise applications, or it can be used as a business intelligence system. In my files, I located this diagram that shows a high level view of Composite’s technology acting as a data services layer:

A more detailed system schematic appears in the companies datasheet “Composite Information Server 4.6” The here. A 2009 explanation of the Composite virtualization process is also available from the same page as the information server document.

The system includes a visual programming tool. The interface makes it easy to point and click through SQL query build up. I found the graphic touch for joins useful but a bit small for my aging eyeballs.

If you are a fan of mashups, Composite makes it possible to juxtapose analyzed data from diverse sources. The company makes available a white paper, written by Bloor Research, that provides a useful round up of some of the key players in the data discovery and data federation sector. You have to register before you can download the document. Start the registration process here.

Keep in mind that this sector does not include search and content processing companies. Nevertheless, Composite offers a proven method for pulling scattered, structured data together into one view.

Stephen Arnold, April 12, 2009

Written by Stephen E. Arnold · Filed Under EDiscovery, Enterprise, Feature, Federated search, Technology | Comments Off on Composite Software

« Previous Page — Next Page »

Search the site
Subscribe to Beyond Search
Feature archive
News archive

Stephen E. Arnold monitors search, content processing, text mining and related topics from his high-tech nerve center in rural Kentucky. He tries to winnow the goose feathers from the giblets. He works with colleagues worldwide to make this Web log useful to those who want to go "beyond search". Contact him at sa [at] arnoldit.com. His Web site with additional information about search is arnoldit.com.

Categories
- 3D-Printing
- Acquisition
- Advertising
- Aggregation
- AI
- Alexa
- algorithms
- Amazon
- Amazonia
- Analytics
- Appliance
- Applications
- Audio
- Augmented Reality
- Big data
- Bing
- Bitcoin
- Bitext
- Book review
- Business intelligence
- Business process
- Business strategy
- Censorship
- Cloud computing
- Company Profile
- Conferences
- Connectors
- Consulting
- Consumer
- Content processing
- Copyright
- Corporate Concerns
- Cost
- Crawl
- Crowdfunding
- cryptocurrency
- Customer support
- Cyber OSINT
- cybercrime
- cybersecurity
- Dark Web
- DarkCyber
- Data
- Data mining
- Database
- Deepfakes
- Digital Assistant
- Digital Library
- E2EE
- ECommerce
- EDiscovery
- Editorial opinion
- Education
- Emoticons
- Enterprise
- Enterprise search
- Entity extraction
- Ethics
- Facebook
- Faceted search
- Factualities
- Feature
- Federated search
- Financial
- Fogint
- Google
- Governance
- Government
- Hackers
- healthcare
- IBM Watson
- Image search
- Indexing
- Infrastructure
- Innovation
- Integration
- intelware
- Interface
- Internet
- Interview
- Investment
- law enforcement
- Legal matters
- Library automation
- Management
- Marketing
- Mathematics
- Metadata
- Microsoft
- Mobile
- Natural language processing
- News
- NGIA
- Online (general)
- Open Access
- Open source
- OSINT
- Osint Radar
- Overflight
- Palantir
- Patents
- Personnel
- Podcast
- Policeware
- Portals
- Predictive coding
- Privacy
- Profile
- Publishing
- Quotation
- Real time search
- Reference tool
- Rich media
- Robot Writer
- Search
- Search enabled applications
- search engine
- Search quality
- Security
- Semantic
- Sentiment analysis
- SEO
- SharePoint
- Short Honks
- Smart Technology
- Social
- Social Media
- software
- Statistics
- Taxonomy
- Technology
- Text analytics
- Text processing
- Tools
- Tor
- Training
- Translation
- Twitter
- Uncategorized
- Unstructured Data
- User experience
- User Interface
- Vertical search
- Video
- visualization
- Voice search
- Voice technology
- Web 3
- Web Services
- Webinar
- Windows
- Work flow
- XML
- Yahoo

Beyond Search

Evvie 2009 Winners: David Evans and Martin Baumgartel

AP Google Spat

Autonomy Thrives in Lousy Economic Climate

Microsoft Fast Arrow Electronics Parametric Search

Google and Media: iBreakfast Synopsis

Lawson: Enterprise Search, Apps, and CRM

Content Management: Modern Mastodon in a Tar Pit, Part Two

Content Management: Modern Mastodon in a Tar Pit, Part One

Lou Rosenfeld on Content Architecture

Composite Software

Search the site

Categories

Archives

Recent Posts

Meta

Share this:

Share this:

Share this:

Share this:

Share this:

Share this:

Share this:

Share this:

Share this:

Share this:

Search the site

Categories

Archives

Recent Posts

Meta