Exalead Reports Banner Year

February 5, 2010

Exalead contacted me after reading the posts about Autonomy and Coveo. The essence of the message from the Paris-based search-enabled applications company was:

Despite the economic downturn, Exalead continued its growth in 2009 with a worldwide revenue of $22.7M. Software sales grew by almost 20% in France and by 30% in the US, which is a great source of satisfaction to us. We added 50 new references to our global customer base, achieving a 25% growth. (Exalead source)

I followed up and learned that the company experienced strong demand for its professional services. The company is, according to my Exalead source, a leader in “SBA”, an acronym for search-based applications.

Exalead indexes the Google Web logs on the ArnoldIT.com Web site as part of my team’s effort to provide demonstrations of participating vendors’ technology. You can see the Exalead search system with entity extraction and other features at http://overflight.labs.exalead.com/.

Stephen E Arnold, February 4, 2010

No one paid me to write this short item. I will demand that the Exalead CTO provide me with a Diet Pepsi next time I am in Paris. No wine, please. The goose is an alcohol-free zone, avoids paté, and usually steers clear of CDG. I will report this lack of payment to the Department of State. Bring back American fries!

Coveo Reports Record Fourth Quarter 2009 Results

February 4, 2010

I learned via the Coveo Web site that the Coveo search and content processing company turned in a strong fourth quarter. The company closed 37 deals and closed its Series B round of financing. In addition, the company’s engineers have continued to expand the capabilities of the Coveo platform. Coveo is privately held, and I was not able to get my hands on specific numbers. According to the company,

Coveo expanded relationships with existing customers, including CA and GEICO, and added new enterprise clients including one of the world’s largest restaurant companies; Quantum Corp., the leading global specialist in backup, recovery and archive; as well as Trading Technologies, Platt Electrical Supply, Laureate Education, Inc., Grand Circle Travel, and several other leading organizations.

The Coveo team pointed me to a client, John Ragsdale, VP of Technology at the Technology Services Industry Association, who said:

Tied to no single knowledge base or content management tool, Coveo’s platform does very creative indexing of all enterprise and customer support content (biggest library of packaged connectors I’ve seen) and enables additional attribution or meta data to be associated to the content–sort of sophisticated tagging. With the slickest mashup capabilities I’ve seen–including real time data pulls–they have created dashboards, pulling in and analyzing data from any number of content sources, and showing the results.  It is much more than a search engine, or a dashboarding tool, or a reporting platform, though it can do all of these things well. Additionally, by enriching existing content with additional metadata, Coveo can help companies leverage old legacy systems that still serve their purpose but don’t allow much in the way of integration or reporting.

The question becomes, “What’s next?” For more information, navigate to www.coveo.com.

Stephen E Arnold, February 4, 2010

A freebie. No one paid me to write this article. I am still waiting for the taco promised to me in October 2009 by one of Coveo’s executives. I will report this misstep to the director of the GSA’s cafeteria in Washington, DC.

The Wages of SEO

February 4, 2010

A not so happy quack to the reader who sent me a long, long diatribe by SEO guru Daniel Sullivan, father of the mega search engine optimization conferences. These are held seemingly every few days in every city around the world. Fearful marketing managers and snake oil sales professionals meet and greet in an unbridled mating game. The idea is that a fearful marketing manager with lousy Web traffic will speed date SEO experts, and both will go off to click-through bliss. Well, that’s the theory.

Let me give you the cast of characters:

  • Daniel Sullivan, search expert, SEO guru, and father of giant, zealot-stuffed conferences
  • Mark Cuban, entrepreneur, Google critic and basketball team owner with an investment in Mahalo, IceRocket and other properties
  • Jason Calacanis, entrepreneur, business seer, and New Yorker who nurtures Mahalo.com, a conference, and a snazzy electric sports car
  • Google. Yes, the Google that is the bane of Rupert Murdoch and other publishing executives obsessed with “real journalism” and pay walls.

My goodness. This lineup is like a modern version of a Greek drama. Each character is larger than life itself.

You will want to read “He Calls Google A Vampire, But Mark Cuban’s Mahalo Is Doing The Sucking.” I quite liked the screen shots, the red arrows, and the description of the SEO tricks identified by the master himself. If you have some trouble figuring out who is the bad guy in this analysis, you are with me. The basic idea behind the write up is that a basketball team owner is not happy with Google. The basketball team owner sees Google as a company profiting on the labor of others. The SEO guru is annoyed that the basketball team owner has invested in the New Yorker’s search company that uses the SEO methods taught at the SEO guru’s conferences to generate money.

In the write up, the savvy New Yorker (Brooklyn, in fact) is an alleged villain. The write up explains in great detail the SEO tricks used by the New Yorker to generate money via Google’s monetization programs. Keep in mind that these tactics are part of the warp and woof of the SEO guru’s conferences.

The “vampire” Google wants traffic and, therefore, wants to get as many people clicking within the Google world as possible. Web site owners want to ride the money train too, so Web site owners need SEO. The SEO guru delivers the goods; that is, methods for spoofing Google.

What we have in the write up is a description of the feedback loop that has made Web search less effective over the last three or four years in my experience. I can’t figure out who is the good guy and who is the bad guy. Maybe the cast of characters, like Greek mythological figures, are a mix of good and evil, deeply conflicted, and sufficiently confused to make really bad mistakes. Remember Orpheus, Sisyphus, et al?

I know a fix.

Why not log on to a social networking system and post a question? You may have a better chance of getting a useful result just asking people. Search is broken. SEO has played a role. Move on.

Stephen E Arnold, February 4, 2010

Inside Search: Raymond Bentinck of Exalead, Part 2

February 4, 2010

This is the second part of the interview with Raymond Bentinck of Exalead.

Isn’t this bad marketing?

No. This makes business sense. Traditional search vendors who may claim to have thousands of customers tend to use only a handful of well-managed references. This is a direct result of customers choosing technology based on these overblown marketing claims, and these claims then driving requirements that the vendor’s consultants struggle to deliver. The customer, who is then far from happy with the results, doesn’t do reference calls and ultimately becomes disillusioned with search in general or with the vendor specifically. Either way, they end up moving to an alternative.

I see this all the time with our clients that have replaced their legacy search solution with Exalead. When we started, we were met with much skepticism from clients that we could answer their information retrieval problems. It was only after doing proofs of concept and delivering the solutions that they became convinced. Now that our reputation has grown, organizations realize that we do not make unsubstantiated claims and that we stick by our promises.

What about the shift to hybrid solutions? An appliance or an on-premises server, then a cloud component, and maybe some fairy dust thrown in to handle the security issues?

A major change is happening within information technology at the moment, driven primarily by the demands the business places on IT. Businesses want to vastly reduce the operational cost of IT provision while pushing IT to be far more agile in supporting the business. Against this backdrop, information volumes continue to grow exponentially.

The push toward virtual servers and cloud computing is one aspect of reducing the operational cost of information technology provision. It is fundamental that software solutions can operate in these environments. It is surprising, however, to find that many traditional search vendors’ solutions do not even work in a virtual server environment.

Isn’t this approach going to add costs to an Exalead installation?

No, because another aspect of this is that software solutions need to be designed to make the best use of available hardware resources. When Exalead provided a solution to the leading classified ads site Fish4.co.uk, unlike the legacy search solution we replaced, not only were we able to deploy a solution that met and exceeded their requirements but we reduced the cost of search to the business by 250 percent. A large part of this was around the massively reduced hardware costs associated with the solution.

What about making changes and responding quickly? Many search vendors simply impose a six month or nine month cycle on a deployment. The client wants to move quickly, but the vendor cannot work quickly.

Agility is another key factor. In the past, an organization might implement a data warehouse. This would take around 12 to 18 months to deploy and would cost a huge amount in hardware, software and consultancy fees. As part of the deployment, the consultants needed to second-guess the questions the business would want to ask of the data warehouse and design these into the system. After the 12 to 18 months, the business would start using the data warehouse and then find out they needed to ask different types of questions than were designed into the system. The data warehouse would then go through a phase of redevelopment which would last many more months. The business would evolve, making more changes, and the cycle would go on and on.

With Exalead, we are able to deploy the same solution in a couple of months, but, significantly, there is no need to second-guess the questions that the business would want to ask and design them into the system.

This is the sort of agile solution that businesses have been pushing their IT departments to deliver for years. Businesses that do not provide agile IT solutions will fall behind their competitors and be unable to react quickly enough when the market changes.

One of the large UK search vendors has dozens of niche versions of its product. How can that company keep each of these specialty products up to date and working? Integration is often the big problem, is it not?

The founders of Exalead took two years before starting the company to research what worked in search and why existing search vendors’ products were so complex. This research led them to understand that the search products on the marketplace at the time all started as quite simple products designed to work on relatively low volumes of information and with very limited functional capabilities. Over the years, new functionality has been added to the solutions to keep abreast of what competitors have offered, but because of how the products were originally engineered, the additions have not been clean integrations. They did not start out with this intention, but search has evolved in ways never imagined when these solutions were originally engineered.

Wasn’t one of the key architects part of the famous AltaVista.com team?

Yes. In fact, both of the founders of Exalead were from this team.

What kind of issues occur with these overly complex products?

As you know, this has caused many issues for both vendors and clients. Changes in one part of the solution can cause unwanted side effects in another part. Trying to track down issues and bugs can take a huge amount of time and expense. This is a major factor as to why we see the legacy search products on the market today that are complex, expensive and take many months if not years to deploy even for simple requirements.

Exalead learned from these lessons when engineering our solution. We have an architecture that is fully object-oriented at the core and follows a service-oriented architecture (SOA) model. It means that we can swap new modules in and out without messy integrations. We can also take core modules such as connectors to repositories and, instead of having to rewrite them to meet specific requirements, override various capabilities in the classes. This means that the majority of the code that has gone through our quality-management systems remains the same. If an issue is identified in the code, it is a simple task to locate the problem, and the issue is isolated in one area of the code base. In the past, vendors have had to rewrite core components like connectors to meet customers’ requirements, and this has caused huge quality and support issues for both the customer and the vendor.
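To make the override idea concrete, here is a minimal Python sketch, assuming a connector is a class with a shared crawl loop and a few overridable hooks. The class names, methods, and record fields (`RepositoryConnector`, `AclConnector`, `is_visible`, `acl`, `dept`) are hypothetical illustrations, not Exalead’s actual API. The point is that a customer-specific variant overrides one or two hooks and reuses the tested core.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    uri: str
    text: str
    metadata: dict = field(default_factory=dict)


class RepositoryConnector:
    """Hypothetical base connector holding the shared, already-tested logic."""

    def __init__(self, records):
        # A list of dicts stands in for a real repository.
        self.records = records

    def fetch(self):
        # Pull raw records from the repository.
        return list(self.records)

    def to_document(self, record):
        # Default mapping from a raw record to an indexable document.
        return Document(uri=record["id"], text=record["body"])

    def is_visible(self, record, user):
        # Default security rule: every record is visible to every user.
        return True

    def crawl(self, user):
        # Core loop, identical for every deployment: fetch, filter, map.
        return [self.to_document(r) for r in self.fetch() if self.is_visible(r, user)]


class AclConnector(RepositoryConnector):
    """Customer-specific variant: overrides two hooks, reuses the core loop."""

    def is_visible(self, record, user):
        # Only users on the record's access list may see it.
        return user in record.get("acl", [])

    def to_document(self, record):
        # Reuse the default mapping, then add one extra metadata field.
        doc = super().to_document(record)
        doc.metadata["department"] = record.get("dept", "unknown")
        return doc


records = [
    {"id": "doc1", "body": "Q4 results", "acl": ["alice"], "dept": "finance"},
    {"id": "doc2", "body": "Roadmap", "acl": ["bob"]},
]
print([d.uri for d in AclConnector(records).crawl("alice")])  # ['doc1']
```

In this sketch the subclass touches only the security check and one metadata field; the crawl loop is inherited unchanged, which is the property Mr. Bentinck describes.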

What about integration? That’s a killer for many vendors in my experience.

The added advantage of this core engineering work means that for Exalead integration is a simple task. For example, building new secure connectors to new repositories can be performed in weeks rather than months. Our engineers can take this time saved to spend on adding new and innovative capabilities into the solution rather than spending time worrying about how to integrate a new function without affecting the 1001 other overlaying functions.

Without this model, legacy vendors have to continually provide point solutions to problems that tend to be customer-specific, leading to a very expensive support headache, as core engineering changes take too long and are too hard to deploy.

I heard about a large firm in the US that has invested significant sums in retooling Lucene. The solution has been described on the firm’s Web site, but I don’t see how that engineering cost is offset by the time to market that the fix required. Do you see open source as a problem or a solution?

I do not wake up in the middle of the night worrying about Lucene, if that is what you are thinking! I see Lucene used in places that typically have large engineering teams to protect, or by consultants more interested in earning lots of fees through its complex integration. Neither adds value to the company by, for example, reducing costs or increasing revenue.

Organizations that are interested in providing cost-effective, richly functional solutions are, in increasing numbers, choosing solutions like Exalead. For example, the University of Sunderland wanted to replace its Google Search Appliance with a richer, more functional search tool. They looked at the marketplace and chose Exalead for searching their external site and their internal document repositories, plus providing business intelligence solutions over their database applications such as student attendance records. The search on their website was developed in a single day, including the integration with their existing user interface and the faceted navigation capabilities. This represented not only an exceptionally quick implementation, far faster than any other solution on the marketplace today, but it also delivered for them the lowest total cost of ownership compared to other vendors and, of course, open source.
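Faceted navigation, mentioned in the Sunderland example, boils down to counting attribute values across the documents that match a query so users can drill down. The Python sketch below assumes documents are plain dictionaries with a `text` field and hypothetical facet fields such as `type` and `faculty`. It uses naive substring matching rather than an index and is not a description of how Exalead implements facets.

```python
from collections import Counter


def search_with_facets(docs, query, facet_fields):
    """Return matching docs plus value counts for each facet field."""
    terms = query.lower().split()
    # Naive matching: every query term must appear somewhere in the text.
    hits = [d for d in docs if all(t in d["text"].lower() for t in terms)]
    # For each facet field, count how many hits carry each value.
    facets = {f: Counter(d[f] for d in hits if f in d) for f in facet_fields}
    return hits, facets


def refine(hits, field, value):
    """Drill down: keep only hits whose facet field equals the chosen value."""
    return [d for d in hits if d.get(field) == value]


docs = [
    {"text": "Exam timetable for spring", "type": "page", "faculty": "Science"},
    {"text": "Spring exam results", "type": "pdf", "faculty": "Arts"},
    {"text": "Library opening hours", "type": "page", "faculty": "Arts"},
]
hits, facets = search_with_facets(docs, "spring exam", ["type", "faculty"])
print(len(hits), dict(facets["type"]))  # 2 {'page': 1, 'pdf': 1}
```

The facet counts are computed over the result set, not the whole collection, so they always describe what a click on a facet value would return.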

In my opinion, Lucene and other open-source offerings can offer a solution for some organizations but many jump on this bandwagon without fully appreciating the differences between the open source solution and the commercially available solutions either in terms of capability or total cost. It is assumed, wrongly in many instances, that the total cost of ownership for open source must be lower than the commercially available solutions. I would suggest that all too often, open source search is adopted by those who believe the consultants who say that search is a simple commodity problem.

What about the commercial enterprise that has had several search systems and none of them capable of delivering satisfactory solutions? What’s the cause of this? The vendors? The client’s approach?

I think the problem lies more with the vendors of the legacy search solutions than with the clients. Vendors have believed their own marketing messages and, when customers are unsatisfied with the results, have tended to blame the customers for not understanding how to deploy the product correctly or, in some cases, the third party or system integrator responsible for the deployment.

One client of ours told me recently that with our solution they were able to deliver in a couple months what they failed to do with another leading search solution for seven years. This is pretty much the experience of every customer where we have replaced an existing search solution. In fact, every organization that I have worked with that has performed an in-depth analysis and comparison of our technology against any search solution has chosen Exalead.

In many ways, I see our solution as not only delivering on our promises but also delivering on the marketing messages that our competitors have been promoting for years but failing to deliver in reality.

So where does Exalead fit? The last demo I received showed me search working within a very large, global business process. The information just appeared? Is this where search is heading?

In the year 2000, and every year since, a CEO of one of the leading legacy search vendors has claimed that every major organization would be using their brand of meaning based search technology within two years.

I will not be as bold as him, but it is my belief that in less than five years’ time the majority of organizations will be using search based applications in mission critical applications.

For too long, software vendors have been trying to convince organizations, for example, that it was not possible to deploy mission critical solutions such as a 360-degree customer view, Master Data Management, Data Warehousing or business intelligence in a couple of months, with no user training, with up-to-the-minute information, with user-friendly interfaces, and with a low cost per query covering millions or billions of records of information.

With Exalead this is possible and we have proven it in some of the world’s largest companies.

How does this change the present understanding of search, which in my opinion is often quite shallow?

Two things are required to change the status quo.

Firstly, a disruptive technology is required that can deliver on these requirements and secondly businesses need to demand new methods of meeting ever greater business requirements on information.

Today I see both these things in place. Exalead has proven that our solutions can meet the most demanding of mission critical requirements in an agile way and now IT departments are realizing that they cannot support their businesses moving forward by using traditional technologies.

What do you see as the trends in enterprise search for 2010?

Last year was a turning point for Search Based Applications. With the worldwide economy in recession, many companies put projects on hold until things looked better. With economies still looking rather weak but projects unable to be left on ice forever, they are starting to question the value of using expensive, time-consuming and rigid technologies to deliver these projects.

Search is a game changing technology that can deliver more innovative, agile and cheaper solutions than using traditional technologies. Exalead is there to deliver on this promise.

Search, a commodity solution? No.

Editor’s note: You can learn more about Exalead’s search-enabled applications technology and method at the Exalead Web site.

Stephen E Arnold, February 4, 2010

I wrote this post without any compensation. However, Mr. Bentinck, who lives in a far off land, offered to buy me haggis, and I refused this tasty bribe. Ah, lungs! I will report the lack of payment to the National Institutes of Health, an outfit concerned about alveoli.

Google Content Engine Adds Functions

February 3, 2010

The technical gizmos within Google are less exciting than Android tablets, China, and dust-ups in Europe. I want to document that the Google received a patent for its invention of a system and method for “Automatic completion of Fragments of Text.” You can read US7657423 at the USPTO’s fine Web site. The inventors are a couple of real wizards. If you have read my Google Version 2.0, you will recognize the names of Simon Tong and Georges Harik. Here’s the abstract for the patent filed in January 2007 and awarded on February 2, 2010:

A system offers potential completions for fragments of text. The system may obtain a text fragment and identify documents that include the text fragment. The system may locate sentences within the documents that include at least a portion of the text fragment, identify sentence endings associated with the located sentences, and present the sentence endings as potential completions for the text fragment.

This is one of Google’s fill-in-the-blanks methods. These are quite important when assembling meaningful chunks of content or locating missing pieces of information when a source has a gap. The method can be applied to other operations as well. Considered in conjunction with Google’s disambiguation and dataspace methods, the invention is an important one in my opinion.
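The abstract reads like a recipe, so here is a minimal Python sketch of its four steps, assuming an in-memory list of documents, a simple regex sentence splitter, and ranking of endings by frequency. The “at least a portion” step is approximated by retrying with shorter trailing portions of the fragment. The function names and the ranking rule are my assumptions; the abstract does not say how Google ranks candidate endings.

```python
import re
from collections import Counter

# Split after ., ! or ? followed by whitespace (a deliberately simple splitter).
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(doc):
    return [s.strip() for s in SENTENCE_END.split(doc) if s.strip()]


def complete(fragment, docs, k=3):
    """Suggest up to k completions for fragment, most frequent first."""
    words = fragment.split()
    # Try the whole fragment, then shorter trailing portions of it.
    for start in range(len(words)):
        portion = " ".join(words[start:]).lower()
        endings = Counter()
        # Step 1: documents that include the (portion of the) fragment.
        for doc in (d for d in docs if portion in d.lower()):
            # Step 2: sentences in those documents that include it.
            for sentence in split_sentences(doc):
                pos = sentence.lower().find(portion)
                if pos == -1:
                    continue
                # Step 3: the sentence ending is the text after the match.
                ending = sentence[pos + len(portion):].strip()
                if ending:
                    endings[ending] += 1
        if endings:
            # Step 4: present the endings as candidate completions.
            return [e for e, _ in endings.most_common(k)]
    return []


docs = [
    "The quick brown fox jumps over the lazy dog. Foxes are clever.",
    "A quick brown fox jumps over the fence!",
    "The quick brown fox jumps over the lazy dog.",
]
print(complete("quick brown fox jumps over", docs))
# ['the lazy dog.', 'the fence!']
```

A fragment such as “zebra quick brown fox” finds no exact match, so the sketch falls back to “quick brown fox” and offers the endings that follow it. Substring matching can hit inside longer words; a production system would match on tokens and use an index.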

Stephen E Arnold, February 3, 2010

A freebie. No one paid me to point out that this open source document contains useful information about Google’s plumbing. I will report this lack of payment to the subcontractor who handles janitorial duties at the New Executive Office Building. “Janitors” are really an important Google innovation as well. Just think about Dilbert’s janitor.

Oracle Sun Will Try to Roadblock the Commodity Hardware Bandwagon

February 2, 2010

What’s the difference in cost between commodity hardware and branded hardware? The answer depends on how you count. I unearthed some old Google data years ago suggesting that the Google could deliver orders of magnitude more performance with its home brew approach than it could achieve with name brand gear. The data were sufficiently obscure that my client at Bear Stearns was reluctant to include a 17X performance increase in one of our reports. I had enough Googley charts and tech articles to get the risk-loving Bear Stearns crowd to go with a 4X benefit, but it was a tough discussion. My hunch is that Google won’t update or comment on how fast its home brew system goes. If these Google data were accurate, the implication is that a data center stocked with name brand gizmos would be more expensive than a Google-equipped data center running commodity hardware with the Googley fairy dust. Stated simply, if Google spends $1,000 for a chunk of performance, a competitor would have to spend $4,000 or more to match Google’s performance. Big difference. Now you see why the 17X type of number made the Bear Stearns wizards nervous: $1,000 of Google gear translates to a whopping $17,000 of branded gear for comparable performance. With data centers hitting $650 million or more for outfits like Microsoft, the price tags to match Google become quite large.
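The multiplier arithmetic is simple enough to state as code. This Python sketch assumes, as the paragraph does, that branded cost scales linearly with the performance gap; the 4X and 17X figures come from the paragraph, and the function name is hypothetical.

```python
def branded_cost_to_match(google_spend, performance_multiple):
    """Dollars of branded gear needed to match what Google gets for
    google_spend, assuming cost scales linearly with the performance gap."""
    return google_spend * performance_multiple


for multiple in (4, 17):
    cost = branded_cost_to_match(1_000, multiple)
    print(f"{multiple}X: ${cost:,}")
# 4X: $4,000
# 17X: $17,000
```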


When you say, RDBMS and performance, I see this technical diagram for addressing petascale data management challenges.

Now shift to the Oracle Sun deal. The Cnet write up “Oracle-Sun Versus Commodity Hardware” got my tiny goose brain turning in circles. Performance has been an issue with Oracle installations in my experience for a long time. The standard solution has been to throw hardware at the problem. Other RDBMS systems require the same remediation. Adding machines boosts the Oracle license revenue. For many years, the Fortune 1000 cheerfully pumped money into hardware and Oracle in order to keep response time within acceptable limits. Some companies such as the financial institution with which I worked a decade ago wanted me to find an alternative to throwing hardware at an Oracle system to speed up performance. I hooked the outfit up with a company called CrossZ, which has morphed into QueryObject. But most Oracle customers are happy to follow the recommendations of their Oracle DBA and the Oracle sales professionals.

The Cnet story included this interesting passage:

…with its newly acquired Sun hardware business, announced last week that it would go in the opposite direction and start selling direct in order to gain back the profit margin lost to VARs. As CNET’s Stephen Shankland wrote, Oracle is now a hardware company and needs to offset the fact that it owns a number of commodity products, including not just Sun servers but also MySQL and other pieces of software. By eliminating the middleman channel, Oracle can bump up margins. But it’s not clear that the market will be willing to pay a premium for Oracle-Sun products.

I disagree with Cnet’s belief that Oracle can make this variant of throwing hardware at a performance problem work. Here’s why:

  • The economic meltdown has reminded CFOs that the free-spending days of yore are not appropriate for the present business climate. This means pushback from some Oracle clients who used to roll over like a dog under the ministrations of Cesar Millan, the dog whisperer.
  • Hardware fixes to aging Oracle technology won’t do the job in the present world of big data. The Oracle database is not the right tool for big data. If it were, perhaps Google’s engineers, many of whom had some Sun experience on their bios, would have embraced Oracle. Google went a different direction, and I think it was a wise one as do the people who have suitcases of money from the Google home run.
  • Innovators like Mark Logic are forcing Oracle to write quasi-technical papers to point out that Mark Logic’s performance metrics are wrong. Nope. Oracle is wrong, and the problem is not Mark Logic. The problem is an aging architecture, a variant of the issue facing Microsoft and other vendors with aging architectures. Just as the Model T cannot win a drag race against a hopped-up Honda, Oracle cannot outperform a Mark Logic system. Old is to be respected. Old is not a solution for certain data management problems.

In short, Oracle purchased Sun for some good reasons. I am not sure hardware-software bundles and Sun servers offered to clients with tortoise-like Oracle systems are among the best reasons. Just my opinion.

Stephen E Arnold, February 2, 2010

No one paid me to write this. A year ago, Mark Logic bought me a bagel. Since then, zip. I will report the food payment to the FDA in the morning.

Google Doubts Competence of Outsell Survey Team

February 2, 2010

I don’t know much about newspaper news click through rates or azure chip consultants, but I know when a Googler awards a failing grade for research. First, read the story “Google Exec: We’re Here to Help Newspapers.” Then read this passage:

And last week, digital marketing firm Outsell released a report claiming that 44% of Google News users don’t click through to the original sites. Mr. Varian dismissed Outsell’s report, claiming the survey design is “not very impressive,” and a Google spokesperson said Google sends more than 4 billion clicks to publishers worldwide each month. “It’s a symbiotic relationship,” Mr. Varian said. “As a search engine, we want rich content out there for our users to find.”

When I read these words, I understood Google’s economics whiz to be underwhelmed by the work of Bay Area consulting firm “Outsell”. (I must confess I am not sure what this word means. I know “up sell” but not “outsell”.) Now don’t get me wrong: I think some of the Googler economist’s ideas were likely to be less than helpful to the troubled newspaper industry. Traditional publishing and Google come from different domains, well, maybe different planets. Technology means one thing to publishers and in my opinion quite another to Googlers. My hunch is that “Outsell” won’t displace Gartner’s super consultant as the Google’s go-to azure chip consultant. Maybe that is “moonbeam azure” consultant?

Stephen E Arnold, February 2, 2010

No one paid me to write about the color of consultants. I am not sure which agency in Washington is responsible for blogs that write without pay about color. Maybe HUD. I think there is a Porter Paint color called “moonbeam azure” and I know there is a “fail red.”

SSN Launched

February 1, 2010

Strategic Social Networking is now live. You can visit the site by pointing your browser to http://ssnblog.com. The Beyond Search team developed SSN to cover the management implications and strategic applications of social networking to business and professionals. The blog will provide commentary, brief original videos, links, lists, news, and information.

SSN will include original articles and opinions from social experts like Craig James (CatStrat) and experienced executives like Jerry Constantino, author of the popular ItsNutsOutThere blog. Jerry is a deeply experienced publishing executive, author, and entrepreneur. We will also feature original research which will bring you new insights into the strategic implications of social media.

The SSN blog wants to give you a way to see the strategic angles social media use introduces to business. We want to write for a business professional who needs to generate sales leads, build a brand, and jump start a consulting opportunity. And we want to provide examples, tips, and useful sources for the individual working in an organization embracing the social networking revolution. We offer an RSS feed, and we will have a Facebook and Twitter presence. The comments section of the blog is available to you. Editor Jessica Bratcher and her team want to hear from you.

Stephen E Arnold
February 1, 2010

This is a sponsored post paid for by Stephen E. Arnold

Digital River Flows into Search

January 31, 2010

The Seeking Alpha transcript of Digital River’s “analyst call” signaled a shift in enterprise search offerings. Here’s what executives of Digital River (a network services firm that is now an ecommerce, marketing, and online services outfit) said, according to “Digital River, Inc. Q4 2009 Earnings Call Transcript”:

In 2010, we plan to further drive the adoption and monetization of these new product investments and shift our focus from expanding the breadth of enhancing the products to the depth of our product portfolio. This means continuing to focus on areas where our clients have indicated they have significant interest. Our 2010 plans including going even deeper into remote control by offering an easy to deploy shopping cart and more options for enterprises to speed their time to market. We also intend to expand our merchandising and product management capabilities for our B-to-B offering, enhance our enterprise search and global business intelligence capabilities, make end customer and administrative performance improvements, and introduce more localized payments and currencies to support our expansion into rapidly growing emerging markets.

I find this interesting because companies like Digital River have been commodity providers in my opinion. The shift to complex, value-added solutions such as search and business intelligence is an interesting development. The assumption is that Digital River will have sufficient bandwidth to index an organization’s content, update the index, and deliver results with the same aplomb that Google has conditioned the 20-somethings to accept as the status quo. The business intelligence angle is interesting as well because it adds another layer of complexity: end users need reports. Canned reports make great demos, but they often fail to answer the specific question at hand. The more years a query has to cover, the more crunching and disk access are needed.

My hunch is that the move will be an interesting one to watch, but when it comes to commodity services in the cloud like search and business intelligence, it will be tough to compete with subsidized business models or bundling. In short, the words sound great, but the delivery might be a bit trickier than the MBA wizards on the call understood. And not a peep about the guts of the search technology.

Stephen E Arnold, January 31, 2010

This “hope springs eternal” write-up was a freebie. I shall report this sad fact to the SEC, the US government’s hope specialists.

Info Fragmentation

January 31, 2010

I don’t want to tackle a big philosophical issue in this blog. I do want to point out that while Google has been explaining that it is not a country, Amazon and Macmillan have agreed to disagree. You can read “Amazon and Macmillan Go to War: Readers and Writers are the Civilian Casualties” for a good rundown. The point is that for decades online services have been chopping out content when problems arise. The fact is that most online users are clueless about what constitutes an online information system’s content holdings. Researchers jump online, run a query, and grab the results. The perception is that the citation list is complete. A student will run a Google query and assume that Google has everything he or she needs to write a killer essay in 15 minutes for an overworked high school teacher. Attorneys are also falling into the trap of assuming that a body of content is complete and accurate. Wrong, dudes and dudettes, wrong, wrong, wrong. I can hear the azure chip consultants and the self-appointed search experts gasp in horror. This hypothetical reaction from folks who like to watch videos is not surprising, because most people do not do detailed bibliographic and collection analysis. When these cuties encounter someone who does this type of work, a miasma of confusion settles over their brows. Here’s the scoop:

  1. An online vendor gets rights to specific information from a publisher. The publisher changes staff, and the vendor gets an email saying, “The deal must be reworked.” The vendor doesn’t offer more money, customer names, or whatever else is required. The publisher tells the online vendor to remove the content. The vendor does, and very few people know that the info has disappeared. The only way to track this type of publisher-vendor change is to hope that it becomes a big news item like the Amazon-Macmillan squabble.
  2. An online system has a glitch at loading time. The data *never* make it into the online system. Because most users do not check the online version against a hard copy, few notice. Heck, at the old Dialog, when “gentle Ben” screwed up a file load, we had to tell Dialog that its system had spit up a hair ball. After denials and excuses, the Dialog tape would be reloaded and all was well. Not every database producer performed this quality check. I can hear the owners of ABI/INFORM snorting now. “Quality. We know quality.” Righto.
  3. A user looks in the wrong place for information. Google yaps about universal search, but when you need to find info on Google, you have to know the ins and outs of the news archive, the caches, and the specialty indexes. Skip the manual exercise of running the same query across different indexes, and you will miss info. This happens on most public-facing, free systems. Do you run exhaustive queries? I didn’t think so.
  4. Latency. Do you know what this means? Well, in a Web index it means that the spider pings a server and the server doesn’t respond. The spider, impatient lass that she is, moves on. Maybe the spider will come back. Maybe not. This means that if an updated content object resides on a system with high latency (that is, a really slow system), the content may not be indexed. Ah, ha. Now how do you as a content provider fix this problem? If you don’t know about it, you may not have a quick fix.
  5. Malformed information. A whiz kid does a post and inserts all types of fancy stuff. If you use templates developed by third parties for your online service, your cute little widget may “kill” the page. The indexing system can’t “see” the page, so the content does not get indexed.
  6. Corrections. I bet you think that when content is online it is the last, best, and final version. Wrong. Most online services *do not* update a static file indexed at a prior time when a correction to that original article appears in print or on a data feed. Don’t believe me? Find a newspaper hard copy that carries a correction to a previous story. Now run some queries on any online service and look for that correction in the original article. My team built the first database to put corrections into online business news. This was expensive and difficult. No one noticed. I think that the new owners of Business Dateline may have forgotten the correction step in the original editorial cycle.
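To make the latency and malformed-page items concrete, here is a toy simulation in Python. It is a sketch under stated assumptions, not how any particular vendor’s spider works: the `Page` records, the response times, the five-second timeout, and the no-retry policy are all hypothetical. The second function shows the kind of collection check most users skip: compare a trusted reference list (say, the print table of contents) with what actually made it into the index.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str                  # hypothetical identifier for a content object
    response_seconds: float   # hypothetical server response time
    well_formed: bool = True  # False models a page a third-party widget "kills"


def crawl(pages, timeout_seconds=5.0):
    """Toy single-pass crawl with no retries (an assumption, not a real spider).

    Returns (indexed, missed) lists of URLs.
    """
    indexed, missed = [], []
    for page in pages:
        if page.response_seconds > timeout_seconds:
            # Server too slow: the spider gives up and moves on.
            missed.append(page.url)
        elif not page.well_formed:
            # Page came back, but the parser cannot "see" the content.
            missed.append(page.url)
        else:
            indexed.append(page.url)
    return indexed, missed


def coverage_gaps(reference_urls, indexed_urls):
    """Items in a trusted reference list that never made it into the index."""
    return sorted(set(reference_urls) - set(indexed_urls))


if __name__ == "__main__":
    pages = [
        Page("example/a", 0.4),
        Page("example/b", 12.0),                    # slow server
        Page("example/c", 0.8, well_formed=False),  # broken widget
    ]
    indexed, missed = crawl(pages)
    print("indexed:", indexed)  # indexed: ['example/a']
    print("missed:", missed)    # missed: ['example/b', 'example/c']
    # "example/d" stands in for an item that was never loaded at all.
    reference = ["example/a", "example/b", "example/c", "example/d"]
    print("gaps:", coverage_gaps(reference, indexed))
```

The searcher sees only `example/a` and has no way to know the other three exist unless someone runs the reference check. That is the collection analysis a good librarian does by hand.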

There are other reasons why content disappears and then magically comes back when another change takes place. As people do less rigorous research, the cluelessness about comprehensive, accurate collections increases. Get to know a librarian. In my experience, most can help in this department.

Stephen E Arnold, January 31, 2010

This is a no-fee write-up. When I give my SLA spotlight talk in June, I will demand a free Diet Pepsi. That’s compensation, and I will report it to the Library of Congress, an outfit moving into open source software. I thought collection management was important too.
