Digital River Flows into Search

January 31, 2010

The Seeking Alpha transcript of Digital River’s “analyst call” signaled a shift in enterprise search offerings. Here’s what executives of Digital River (a network services firm that is now an ecommerce, marketing, and online services outfit) said, according to “Digital River, Inc. Q4 2009 Earnings Call Transcript”:

In 2010, we plan to further drive the adoption and monetization of these new product investments and shift our focus from expanding the breadth of enhancing the products to the depth of our product portfolio. This means continuing to focus on areas where our clients have indicated they have significant interest. Our 2010 plans including going even deeper into remote control by offering an easy to deploy shopping cart and more options for enterprises to speed their time to market. We also intend to expand our merchandising and product management capabilities for our B-to-B offering, enhance our enterprise search and global business intelligence capabilities, make end customer and administrative performance improvements, and introduce more localized payments and currencies to support our expansion into rapidly growing emerging markets.

I find this interesting because companies like Digital River have been commodity providers in my opinion. The shift to complex, value-added solutions such as search and business intelligence is an interesting development. The assumption is that Digital River will have sufficient bandwidth to index an organization’s content, update the index, and deliver results with the same aplomb that Google has conditioned the 20 somethings to accept as the status quo. The business intelligence angle is interesting as well: it adds another layer of complexity because end users need reports. Canned reports make great demos but they often fail to answer the specific question at hand. The more years a query has to cover, the more crunching and disk access are needed.

My hunch is that the move will be an interesting one to watch, but when it comes to commodity services in the cloud like search and business intelligence, it will be tough to compete with subsidized business models or bundling. In short, the words sound great, but the delivery might be a bit trickier than the MBA wizards on the call understood. And not a peep about the guts of the search technology.

Stephen E Arnold, January 31, 2010

This “hope springs eternal” write up was a freebie. I shall report this sad fact to the SEC, the US government’s hope specialists.

Info Fragmentation

January 31, 2010

I don’t want to tackle a big philosophical issue in this blog. I do want to point out that while Google has been explaining that it is not a country, Amazon and Macmillan have agreed to disagree. You can read “Amazon and Macmillan Go to War: Readers and Writers are the Civilian Casualties” for a good run down. The point is that online services have been for decades chopping out content when problems arise. The fact is that most online users are clueless about what constitutes an online information system’s content holdings. Researchers jump online, run a query and grab the results. The perception is that the citation list is complete. A student will run a Google query and assume that Google has everything he or she needs to write a killer essay in 15 minutes for an overworked high school teacher. Attorneys are also falling into the trap of assuming that a body of content is complete and accurate. Wrong, dudes and dudettes, wrong, wrong, wrong. I can hear the azure chip consultants and the self appointed search experts gasp in horror. This hypothetical reaction from folks who like to watch videos is not surprising because most people do not do detailed bibliographic and collection analysis. When these cuties encounter someone who does this type of work, there is essentially a miasma of confusion that settles over their brows. Here’s what the scoop is:

  1. A company gets rights to specific information. The publisher changes staff; the database publisher gets an email saying, “The deal must be reworked.” The database publisher doesn’t offer more money or customer names or some other requirement. The publisher tells the online vendor to remove the content. This the database producer does and very few people know that info has disappeared. The only way to track this type of publisher-vendor change is to hope that it becomes a big news item like the Amazon-Macmillan squabble.
  2. An online system has a glitch at loading time. The data * never * make it into the online system. Because most users do not check the online version against a hard copy, few notice. Heck, at the old Dialog when “gentle Ben” screwed up a file load, we had to tell Dialog that its system spit a hair ball. After denials and excuses, the Dialog tape would be reloaded and all was well. Not every database producer performed this quality check. I can hear the owners of ABI/INFORM snorting now. “Quality. We know quality.” Righto.
  3. A user looks in the wrong place for information. Google yaps about universal search but when you need to find info on Google, you have to know the ins and outs of the news archive, the caches, and the specialty indexes. Skip the manual exercise of running the same query across different indexes, and you will miss info. This happens on most public facing, free systems. Do you run exhaustive queries? I didn’t think so.
  4. Latency. Do you know what this means? Well in a Web index it means that the spider pings a server and the server doesn’t respond. The spider, impatient lass that she is, moves on. Maybe the spider will come back. Maybe not. This means that if an updated content object resides on a system with latency (that is, a really slow system), the content may not be indexed. Ah, ha. Now how do you as a content provider fix this problem? If you don’t know about it, you may not have a quick fix.
  5. Malformed information. A whiz kid does a post and inserts all types of fancy stuff. If you use a template developed by third parties for your online service, your cute little widget may “kill” the page. The indexing system can’t “see” the page, so the content does not get indexed.
  6. Corrections. I bet you think that when content is online it is the last, best, and final version. Wrong. Most online services * do not * update a static file indexed at a prior time when a correction to that original article appears in print or on a data feed. Don’t believe me. Run some queries on any online service with a newspaper hard copy that has a correction to a previous story. Now look for that correction online in the original article. My team did the first database to put corrections into online business news. This was expensive and difficult. No one noticed. I think that the new owners of Business Dateline may have forgotten the original correction part of the editorial cycle.
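
The latency point in item 4 can be made concrete with a toy simulation. This is a minimal sketch, not how any real spider works: the `Page` class, the `crawl` function, the five second timeout, and the example URLs are all made up for illustration, and the response times are simulated numbers rather than real network calls.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    response_seconds: float  # simulated time the server needs to answer


def crawl(pages, timeout_seconds=5.0):
    """Run one crawl pass and return (indexed, skipped) URL lists.

    A page is indexed only if its server answers within the timeout.
    Anything slower is skipped, and nothing here schedules a retry.
    """
    indexed, skipped = [], []
    for page in pages:
        if page.response_seconds <= timeout_seconds:
            indexed.append(page.url)
        else:
            skipped.append(page.url)
    return indexed, skipped


# Hypothetical content on three servers of differing speed.
pages = [
    Page("http://example.com/fast-story", 0.4),
    Page("http://example.com/slow-update", 12.0),
    Page("http://example.com/average-story", 2.1),
]

indexed, skipped = crawl(pages)
print("indexed:", indexed)
print("skipped:", skipped)
```

The updated story on the slow server never reaches the index, and nothing in the search results tells the user it is missing.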

There are other reasons why content disappears and then magically comes back when another change takes place. As people do less rigorous research, the cluelessness about comprehensive, accurate collections increases. Know a librarian. Most can help in this department in my experience.
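
For those who want a taste of what collection analysis involves, here is a rough sketch of the simplest possible check: list the issues that should be online and subtract what the online file actually holds. Everything in it is an assumption for illustration: a daily publication, a one week window, and a hand-typed list of online holdings. Real audits work from publisher schedules and vendor load reports.

```python
from datetime import date, timedelta


def expected_issue_dates(start, end):
    """Every calendar day from start to end inclusive (assumes a daily publication)."""
    days = (end - start).days
    return {start + timedelta(days=n) for n in range(days + 1)}


def find_gaps(expected, online_holdings):
    """Return, in date order, the issues that should be online but are not."""
    return sorted(expected - set(online_holdings))


# Hypothetical holdings: what the online file contains for the last week of January.
expected = expected_issue_dates(date(2010, 1, 25), date(2010, 1, 31))
online = [
    date(2010, 1, 25),
    date(2010, 1, 26),
    date(2010, 1, 28),
    date(2010, 1, 31),
]

gaps = find_gaps(expected, online)
for missing in gaps:
    print("missing issue:", missing.isoformat())
```

A query against this file would return a tidy citation list with no hint that three days of content are absent.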

Stephen E Arnold, January 31, 2010

This is a no fee write up. When I give my SLA spotlight talk in June I will demand a free Diet Pepsi. That’s compensation, and I will report this to the Library of Congress, an outfit moving into open source software. I thought collection management was important too.

Autonomy Pops Up an Email Archiving Toaster

January 31, 2010

Autonomy is in the appliance business. You can get what The Orange Rag called “the Autonomy eDiscovery Appliance.” The idea is that the features of a Clearwell-type of solution are combined with Autonomy’s smart software and connectors. The solution, according to The Orange Rag: “delivers a broad set of unique capabilities” and “meaning based computing”. Among the features embedded in the appliance are search, connectors to various content types, visualization, scalability, and reports. The appliance that has captured some loyal fans is the Clearwell Systems’ “rocket docket” service in its appliance. Clearwell now has a formidable competitor, and I wonder if the value-added software that allows a report to be generated that can be slapped in the hand of opposing counsel and a nifty audit trail feature will be enough to deal with the steroid infused marketing of Autonomy. Should be interesting because Recommind has tried to broaden beyond the legal market in a bid to become an enterprise search vendor. Stratify has morphed several times in its eDiscovery journey. EMC bought Kazeon and may be getting ready to attack the legal eagles from the storage angle. I suppose this is what the azure chip crowd calls “search specialization”. I thought it was savvy product packaging, but what do I know. I am not young and inclined to perceive myself as infallible. I am an addled goose who forgets where he puts his pin feathers.

Stephen E Arnold, January 31, 2010

A freebie. I will report this unpleasant fact to the director of the US Postal Museum where old information methods are on display.

Embedding Lucene

January 31, 2010

The goslings and I participated in a search conference call last week. One of the topics du jour is Lucene. The open source search system continues to fascinate certain government procurement teams and those looking for a low-cost way to provide users with a search-and-retrieval system. The enthusiasm for Lucene and Solr goes up as the age of the information technology professionals decreases. Whatever universities are putting in the Red Bull sold in computer science departments seems to trigger a Lucene / Solr craving.

In the course of the conversation, I mentioned embedding Lucene in commercial software. The advantages ranged from low cost to sidestepping the blow-back from customers. The blow-back occurs when the users of software want a feature not in the OEM “stub” embedded in a system or gizmo. The fix is to buy the full version of the software. The “stub” is a good enough chunk of functionality, but it won’t do the fancy back flips some users want when looking for information.
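
To show what a “good enough” stub looks like, here is a toy sketch. It is emphatically not Lucene and does not use Lucene’s API; the `StubSearch` class and its methods are invented for this example. It does AND-only keyword matching over an inverted index and nothing else: no ranking, no phrase queries, no stemming. Those are the back flips.

```python
from collections import defaultdict


class StubSearch:
    """Toy embedded search: AND-only keyword matching, no ranking, no phrases."""

    def __init__(self):
        # term -> set of document ids containing that term
        self._postings = defaultdict(set)

    def add(self, doc_id, text):
        """Index a document by lowercased, whitespace-separated terms."""
        for term in text.lower().split():
            self._postings[term].add(doc_id)

    def search(self, query):
        """Return sorted ids of documents containing every query term."""
        terms = query.lower().split()
        if not terms:
            return []
        result = set(self._postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self._postings.get(term, set())
        return sorted(result)


index = StubSearch()
index.add(1, "quarterly sales report")
index.add(2, "sales forecast for the quarter")
index.add(3, "report on search latency")

print(index.search("sales report"))
print(index.search("report sales"))  # word order is ignored: no phrase support
print(index.search("report"))
```

The first two queries return the same document because the stub cannot tell a phrase from a bag of words. A user who needs that distinction is the one who generates the blow-back.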

[Image: Scribovox diagram. © Scribovox 2009]

Lucene can be extended as long as the outfit doing the embedding has some Lucene experts on staff or access to a consultant able to keep appointments, complete work on time and in budget, and write code that works. The example I gave was the Lucene within Scribovox.com.

Scribovox is software that performs such tricks as converting a podcast to text. You can get more information about the product at http://www.scribvox.com. The information I referenced came from a June 17, 2009 Scribovox design document called “Integration with Social Networks.” I found the information in this write up quite useful, and you can download a copy of the paper from this link.

The author of the paper is Patrick Nicholas. He discusses some interesting ideas; for example:

  • Flow diagrams for processing real time content
  • A useful architecture diagram
  • A discussion of indexing and summarization
  • Some information about Amazon EC2, MapReduce and Hadoop.

If you are serious about open source, I would tuck this document in your bag of tricks. The time estimation puts search and semantics into perspective. Useful for the azure chip crowd since most don’t have too much, if any, oil under their fingers from removing the fuel injection unit from a search system.

Stephen E Arnold, January 31, 2010

A freebie. No one paid me to write this. I will report this charitable act to the boss at the National Cathedral on Wisconsin Avenue, in Washington, DC.

Black and White Photo Search

January 30, 2010

Short honk: I wanted to let you know that “Top 5 Black & White Image Search Engines” provides a description of five photo search systems. What makes this list useful is that the angle is black and white pix.

Stephen E Arnold, January 29, 2010

A freebie. I will report this to the photo manager at the Department of Energy where there is considerable expertise in managing images.

Fighting over Services: License Fees Faltering, Consulting Fees the Future?

January 30, 2010

I read the UK newspaper Independent’s “Oracle Claims Firm Stole Its Intellectual Property.” The byline says “Reuters”, which is okay with me. The point of the story is that Oracle is taking an outfit in the consulting and services business to the legal wrecking yard. The key passage in the write up in my opinion was:

Oracle Corp has filed a suit against a little known rival that provides low-cost software maintenance services, in a case similar to one that Oracle is fighting against rival SAP AG. The lawsuit, filed in US district court in Nevada on Monday, alleges that privately held Rimini Street stole copyrighted material using the online access codes of Oracle customers.

Oracle is a firm whose pricing model, when I was a wee lad, hinged on pegging software license fees to hardware. The more hardware one threw at an Oracle implementation, the more the licensee had to pay. There were options, which have increased the service choices, but the main game was license fees.

The shift that is evident in the big enterprise software world is what I watched IBM do when Microsoft figured out how to catapult a clunky PC opportunity into a $70 billion empire. IBM has been forced to become a consulting firm. Now I know that IBM sells mainframes and mounts hearty public relations campaigns to convince me that mainframes are exactly what I need to run my business. But the main event is services and consulting or what I call “soft work”.

Oracle is facing the same problem even though the cause of Oracle’s woes is deeper than a bad deal with a teenager which changed IBM decades ago. This dust up strikes me as interesting for three reasons:

First, I think this Oracle battle over companies providing “soft work” related to Oracle products and software is motivated by Oracle’s desire to get high margin business. Upstarts and interlopers are in a space of interest to Oracle.

Second, the intellectual property angle is quite important in my opinion. What constitutes know how about a complex enterprise software system? Are the scripts that people post on a forum hosted by a vendor something that a third party could use?

Third, as Oracle chops staff from Sun Microsystems, I wonder if this adds more brainpower to the third party consulting firms. For example, I think I have heard talks by Sun engineers who suggested that open source software and commodity machines were cheap and fast enough. The expertise in Sun hardware’s limitations might be ideal for some service and consulting firms to exploit. How will these bits and pieces of expertise be managed?

In a broader sense, the Oracle actions are harbingers of what other enterprise software vendors will be forced to do. With disruption of traditional enterprise software business models continuing, companies will have to ramp up their services business. Growth via acquisition only works under certain conditions. Services is a vital component of revenue growth if a firm is to survive or avoid takeover.

In the search business, I expect to see more content processing vendors chasing services as well.

Stephen E Arnold, January 30, 2010

A freebie. I shall report this sad state of affairs to the Prospect, Kentucky mayor, a person who has not been able to build a bridge for two years. A consulting firm is assisting I believe. This is why services are a big business in the post crash America.

When Dinosaurs Fight: Oracle vs IBM

January 30, 2010

I enjoyed the Bistahieversor sealeyi fight between Oracle and IBM. The eWeek story “IBM Defends DB2 Against Ellison’s ‘Ignorant’ Remarks” is a delightful he-said, she-said. What made it even more delicious for me is that both of the companies have aging products and are facing some tough competition from a certain outfit in Silicon Valley. When I read the article, I thought about two dinosaurs making big bird calls and scratching the earth. I want to highlight one exchange I thought worthy of the Scott McNealy school of competitive sniping:

Ellison [Oracle]: “I can’t understand why IBM has never come out with a database machine. DB2 doesn’t cluster, doesn’t scale, nothing. You cannot run an OLTP [online transaction processing] application on DB2, because it doesn’t scale.”

Spang [IBM]: “Let’s talk about the TPC-C [Transaction Processing Performance Council] benchmark. Over the last seven years, DB2 has been in the leadership position about twice as long as Oracle. This game with benchmarks is a leapfrog game. Companies use the latest hardware, [the results improve] and it depends on point in time. What really matters is looking over a period of time for the consistency in the leadership position. So seven years, about twice as many days in the leadership position [over Oracle].
“I’ll give you another one close to a real-world situation: In the three-tiered SAP benchmark, DB2 [on Power systems] has held the record there for almost five years now, doing more than 50 million SAP steps per hour.
“Let’s talk about the SAP apps themselves. Just last year we announced that more than 100 companies had switched from Oracle to DB2 to power their SAP applications. The stories we hear are: better performance—in the range of 20 percent better—while reducing costs 30 to 40 percent. Coca-Cola Bottling was one that was quoted back then, talking about migrating from Sun servers to Power systems. It just made sense to them from a money point of view. “Larry also said something else: That the [recent] uncertainty about Sun systems was just a blip [due to the acquisition process]. Well, Coca-Cola pointed out that they have been switching from Sun to Power systems over a number of years. “I would argue that the uncertainty about Sun systems versus IBM accelerated a trend, and frankly, the uncertainty remains.

IBM and Oracle are more alike than different. Neither seems ready to acknowledge that an ecosystem change may send both big birds to the butcher.

Stephen E Arnold, January 30, 2010

A freebie. I will report this food related post to the Department of Agriculture.

The Fragility of Microsoft

January 30, 2010

I think Microsoft is a $70 or $80 billion a year outfit. What struck me as interesting was the write up called “Earnings Take Away: Microsoft Is Still Powered by Windows.” (This is one zippy url, so it may go dead at any time.)

The author is Mary Jo Foley, a Microsoft expert, the author of a book about Microsoft, and a blogger with Microsoft ads on her Web column for ZDNet. In short, she is one of the go-to people about things Microsoft. She wrote:

Consumer sales of Windows 7 buoyed Microsoft to report record earnings, even after deferrals were figured in. Microsoft reported net income of $6.66 billion, or 74 cents a share, on revenue of $19.02 billion, which included $1.71 billion in Windows 7 deferred revenue for the quarter. As part of that announcement, Microsoft reported that it has sold more than 60 million Windows 7 licenses to date. The combined Windows and the Windows Live division had operating income of $5.39 billion on revenue of $6.9 billion, compared to the year-ago quarter’s operating income of $2.71 billion on revenue of $4.06 billion.
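
A quick back-of-the-envelope check on the numbers in that passage shows how much weight Windows carries. This is only arithmetic on the figures quoted above (billions of dollars); the `pct` helper and the variable names are mine, and the ratios are rough indicators, not anything Microsoft reported.

```python
def pct(part, whole):
    """Return part as a percentage of whole, rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Figures from the quoted passage, in billions of US dollars.
total_revenue = 19.02
windows_revenue = 6.9
windows_operating_income = 5.39
year_ago_windows_revenue = 4.06
year_ago_windows_operating_income = 2.71

# Share of quarterly revenue that came from Windows and Windows Live.
windows_share = pct(windows_revenue, total_revenue)

# Operating margin of the Windows division, this quarter and a year ago.
windows_margin = pct(windows_operating_income, windows_revenue)
year_ago_margin = pct(year_ago_windows_operating_income, year_ago_windows_revenue)

# Year over year growth in Windows division revenue.
windows_growth = pct(windows_revenue - year_ago_windows_revenue, year_ago_windows_revenue)

print("Windows share of revenue (%):", windows_share)
print("Windows operating margin (%):", windows_margin)
print("Year-ago operating margin (%):", year_ago_margin)
print("Windows revenue growth (%):", windows_growth)
```

On these figures, more than a third of the quarter’s revenue came from one division running an operating margin near 78 percent.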

I am no Microsoft expert. Heck, I couldn’t get Fast Search to work without a cast of dozens and lots of money. What’s that tell you?

When I read this, I realized that Google doesn’t have to do much to put Microsoft in a reverse naked choke hold. If Google incrementally improves its word processing, spreadsheet, presentation, contacts, and email services, there is a big downside potential for Microsoft. Google, on the other hand, has little, if any, risk. If Google opens some tiny slits in Microsoft’s Windows 7 money supply with Android, Chrome, and any other consumer service, another tiny slit opens. Get enough of these tiny slits, and the water buffalo wading in the river to cool off can be consumed by tiny fish. The process is painful, takes a long time, and can result in the water buffalo becoming too weak to get out of the river. What do you do with a dead water buffalo? Have a feast? Make burgers?

That’s the fragility of Microsoft that Google in its Googley way may be banking on.

Stephen E Arnold, January 30, 2010

Nope, no one paid me to write this. Someone connected in a tenuous way to Microsoft sent me an email, but it did not contain any money. It wanted information for free. What a fool am I. I will report this to the folks at NIH who sometimes research fools.

SSN Roll Out Set for February 1, 2010

January 29, 2010

The SSN team has been working hard to prepare the content for the SSN Web log and information service. You can get a taste of the types of information that will appear in the SSN blog. We have made one of our stories available for preview, “Ning: Revising Growth Estimates”.

Watch for our SSN mascot, a bird found at the beach, announcing its presence with a gentle squawk.

[Image: tern head]

We are planning on beginning with brief news and original features. We have compiled some lists of useful resources, and we have created some reference pages to make it easy to click through a number of sites offering social services. Our Facebook.com page will be one way for readers and critics to communicate with us. We will generate an RSS feed of the new information and send out tweets about our content. The editor is Jessica Bratcher, a former newspaper editor who joined the ArnoldIT.com team two years ago. You can contact her at ssnblog@gmail.com.

Stephen E Arnold, January 29, 2010

This is a shameless marketing pitch for ArnoldIT.com’s and the Beyond Search team’s new Web log. I will report this to the first holy person I see in Harrod’s Creek today. Marketing is shameful. I apologize.

Thomson Reuters Redefines Real Time

January 29, 2010

“Real time” is one of those phrases that is so easy to say but so, so difficult to deliver. Exalead has demonstrated to me a latency of 12 to 15 minutes. This means that when a change is made to the location of a package, that datum becomes available to a user of the client’s search enabled application within 12 to 15 minutes. In my experience, that’s fast. The old Excite.com (Architex) indexing system would grind for hours to update Adverworld pages. A mainstream search system labored for hours to update several million Web pages. But real time means no latency. Zero. Zip. Nada.

Thomson Reuters’ approach is explained in “Thomson Reuters Delivers Microsecond Access To News In London And Chicago.” Real time means that in London and Chicago, certain content is available in microseconds. The write up said:

Rich Brown, Global Business Manager, Machine Readable News, Thomson Reuters, said: “Being first to act on this information can dramatically affect a firm’s profit and loss. The launch of NewsScope Direct, the market’s fastest machine readable news service, into London and Chicago reflects our commitment to delivering the market moving information our clients need at the speed required by their high performance trading strategies.”

Some questions:

  1. Are the data numeric or text?
  2. What is the latency for the information prior to its being received at a Thomson Reuters’ data center?
  3. What does “microsecond” mean?
  4. What part of the system delivers “microsecond” access?
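
Questions 2 through 4 matter because of scale. The sketch below converts the 12 to 15 minute figure mentioned earlier into microseconds and then adds up a pipeline in which only the last hop is fast. The stage names and their durations are entirely hypothetical; I have no information about how NewsScope Direct is actually built or how it performs.

```python
MICROSECONDS_PER_SECOND = 1_000_000


def minutes_to_microseconds(minutes):
    """Convert whole minutes to microseconds."""
    return minutes * 60 * MICROSECONDS_PER_SECOND


def end_to_end_microseconds(stages):
    """Total latency is the sum of every stage, not just the fastest one."""
    return sum(stages.values())


# The 12 to 15 minute index latency, expressed in microseconds.
low = minutes_to_microseconds(12)
high = minutes_to_microseconds(15)
print("12 minutes =", low, "microseconds")
print("15 minutes =", high, "microseconds")

# Hypothetical pipeline, all values in microseconds (made up for illustration).
stages = {
    "event_to_data_center": 2 * MICROSECONDS_PER_SECOND,  # assumed: 2 seconds
    "machine_readable_tagging": 500_000,                  # assumed: 0.5 seconds
    "delivery_to_client": 50,                             # assumed: 50 microseconds
}

total = end_to_end_microseconds(stages)
delivery_share = stages["delivery_to_client"] / total
print("end to end:", total, "microseconds")
print("delivery share of total:", delivery_share)
```

If the numbers looked anything like these, “microsecond access” would describe a sliver of the journey from event to trader, which is why the question of what is being measured is worth asking.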

Until I know more, I think this is a marketing and PR play to differentiate Thomson Reuters from other financial trading data vendors. I wonder if Thomson Reuters is able to beat the pants off Exegy, another outfit with speedy systems for the financial services industry?

Stephen E Arnold, January 29, 2010

A post I wrote whilst watching Tyson shiver in front of the fire. I will report his chill and my lack of compensation to the sharp eyed folks at the SEC.
