Charging Backward by Charging for Content
October 25, 2009
In my years in the commercial online business, I learned a few things the hard way. When I got rolling in the commercial database business, there were few examples of proven money-making methods for digital content. As a result, one had to do some thinking, devise tests, and then go forward. Today there are some models to examine. For example, the idea of the third-party payer is a useful one. The model originated (I think) with GoTo.com; Google looked at it and made it the lifeblood of its revenue approach. Another model is what's called "must have" information. A lawyer engaged in patent litigation knows the USPTO system is less than perfect. Commercial services such as those available from Derwent and Questel provide an alternative, but an expensive one. In fact, these services are sufficiently costly to keep most online users in the dark about what the services contain and how they work.
This is an image of Stephen E Arnold when he worked at a traditional publishing company. The environment and the business processes created a case study in the nature vs nurture debate.
One lesson I learned was that an online company has to find a mix of business methods that produce revenue. There is no one size that fits all. Even the Google is pursuing subscriptions, license fees, and partner upfront payments as ways to keep the money pumping.
This is a picture of Stephen E. Arnold when he focused exclusively on electronic publishing. His Neanderthal characteristics have become less evident.
It is, therefore, not surprising that publishing companies want to charge for content. What I find interesting is that some of the publishers are taking a somewhat warlike stance toward what is little more than a business problem. For example, read "WSJ Editor: Those Who Believe Content Should Be Free Are Neanderthals". The idea that I took away from this article is that ad hominem arguments are in vogue. I am not sure that I am upset with the possibility that I am prehistoric.
The question I had when reading the article was, “Why are publishers so late to the pay for content party?”
I know that publishers have been trying to crack the online revenue code for a long time. I was a beta tester of the original Dow Jones desktop software. The idea was that I could use the software to search for content on the fledgling Dow Jones service in the 1990s. The Wall Street Journal is still trying. In fact, a person with whom I spoke last week told me, “Dow Jones is the only publisher making money from its online service.” That’s not true. One of the most successful online publishing companies is Thomson Reuters. Others include Reed Elsevier and Consumer Reports.
Reflections on SharePoint and Search
October 19, 2009
I had an interesting conversation with a librarian at the international library conference in Hammersmith on October 15, 2009. This professional from a central European country asked my views about SharePoint. As I understood her comments, she was in a SharePoint-centric environment and found that making small changes to improve information access was difficult, if not impossible.
One reason was her organization's bureaucratic set up. Her unit was in the same building as the information technology group, but IT reported to one senior manager and her library to another. Another reason was the reluctance of the IT team to make too many changes to SharePoint and its indifference to her legacy library systems. Her challenge was even more difficult because there were multiple legacy systems in the information center. One of these systems offered search, and she wanted SharePoint to make use of the content in that legacy system's repository. She was not sure which vendor's search system was inside the legacy system, but she thought it was from the "old" Fast Search & Transfer outfit.
The motto of IT and another unit’s management.
Okay, I thought. Fast Search & Transfer, the company Microsoft bought in 2008 and had spent the last 18 months converting into the world-class search system for SharePoint. Fast ESP would break through the 50 million document ceiling in SharePoint search and add a better user experience plus a nifty range of functions.
To make a long story short, she wanted to know, "Will Microsoft and SharePoint support the 'old' versions of Fast ESP?" I told her that I recalled reading that Microsoft would stand behind the Fast ESP technology for a decade. She replied, "Really?" Frankly, I do not know if that guarantee extends to OEM instances of the "old" Fast ESP. My hunch is that the problem will fall upon the vendor licensing the Fast ESP engine. But I have learned, and suffered from, the fact that it is very easy to write marketing collateral and somewhat more difficult to support a system, any system, for a decade. I know of mainframe systems that have been around for 30 years, possibly more. But the Linux-centric search systems built to index Web content are a different kettle of Norwegian herring.
My understanding of the trajectory of Fast ESP is that the company took the core of its high-speed Web indexing system and reworked it to handle enterprise content. The "old" Fast ESP abandoned the Web search market when the company's top brass realized that Google's technology was powering past the Fast technology. Enterprise search became the focus of the company. Over the years, the core of Fast's Web search system has been "wrapped" and "tweaked" to handle the rigors and quite different requirements of enterprise search. The problems with this approach, as I pointed out in the three editions of the Enterprise Search Report I wrote between 2003 and 2006, were:
- Lots of moving parts. My research revealed that a change in a Fast script "here" could produce an unexpected result elsewhere in the system "there". Not even my wizard son could deal with these discomforting technical killer bees. Chasing down these behaviors took time. With time converted to money, I concluded that lots of moving parts was not a net positive in my enterprise search engagements. Once a system is working, I share the attitude of the librarian's IT department: just leave a working system alone.
- Performance. In the late 1990s, Web search required a certain type of spidering. Today, indexing has become trickier. Updates are often larger than in the past, so the content processing subsystem has to do more work. Once the index has been updated, other processes can take longer because indexes are bigger or broken into chunks. Speeding up a system is not a simple matter of throwing hardware at the problem. In fact, adding more hardware may not improve performance because the bottleneck may be a consequence of poor engineering decisions made a long time ago, or because the hardware was added to the wrong subsystem; for example, the production server, not the indexing subsystem.
- Ignorance. Old systems were built by individuals who may no longer be at the company. Today, the loss of the engineers with implicit knowledge of a subsystem makes it very difficult, maybe even impossible, to resolve certain problems such as inefficient use of scratch space for index updates. I recall one job we did at a major telco several years ago. The legacy search system was what it was. My team froze it in place and worked with data the legacy system wrote to a file on a more modern server on a fixed schedule; a minimal sketch of that pattern appears after this list. Not perfect, but it was economical and it worked. No one at the company knew how the legacy search system worked. The client did not want to pay my team to figure out the mystery. And I did not want to make assumptions about how long-gone engineers cooked their code. Companies that buy some vendors' search systems may discover that there is a knowledge problem. Engineers often document code in ways that another engineer cannot figure out. So the new owner has to work through the code line by line to find out what is going on. Executives who buy companies make some interesting but often uninformed assumptions; for example, the software we just bought works, is documented, and is assembled in a textbook manner. Reality is less tidy than the fantasies of a spreadsheet jockey.
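Here is that sketch, assuming Python and a comma-delimited export file. The path, polling interval, and record layout are hypothetical stand-ins, not the actual telco configuration, and index_record is a placeholder for whatever ingest call the modern search system actually exposes.

```python
# Sketch: freeze the legacy system, consume the file it already writes on a schedule.
# Path, interval, and field names below are illustrative assumptions.
import csv
import time
from pathlib import Path

EXPORT_FILE = Path("/data/legacy_export/records.csv")  # file the legacy system writes on its own schedule
POLL_SECONDS = 3600  # check once an hour; the legacy side dictates the real cadence

def index_record(record: dict) -> None:
    # Placeholder for the modern search system's ingest call.
    print(f"indexing {record.get('id')}")

def run_once(last_seen_mtime: float) -> float:
    if not EXPORT_FILE.exists():
        return last_seen_mtime
    mtime = EXPORT_FILE.stat().st_mtime
    if mtime <= last_seen_mtime:
        return last_seen_mtime  # nothing new from the legacy system yet
    with EXPORT_FILE.open(newline="", encoding="utf-8") as handle:
        for record in csv.DictReader(handle):
            index_record(record)
    return mtime

if __name__ == "__main__":
    seen = 0.0
    while True:
        seen = run_once(seen)
        time.sleep(POLL_SECONDS)
```

The virtue of the approach is that nobody touches the legacy code. The bridge only reads what the old system already produces, which is why it was economical and why it worked.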
Image Search Headache Removal Service
October 18, 2009
Searching for images giving you a headache? We just might have found a golden goose.
Along with the data explosion on our hands comes a related problem: image explosion. There are just as many graphics out there as there are files, I bet, and searching through them is just as difficult a prospect, if not more so. Graphics like .jpg, .tiff, .gif and others don't necessarily have embedded data, so there's nothing to catch in a search, and graphics that do carry some form of metadata end up at the top of the search pile.
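To make the problem concrete, here is a small sketch, assuming Python and the Pillow imaging library; the file name is a hypothetical example. When the embedded metadata comes back empty, there is literally nothing textual for an indexing system to catch.

```python
# Sketch: check a JPEG for embedded metadata with Pillow (pip install Pillow).
# If the EXIF block is empty, a text-oriented indexer has nothing to work with.
from PIL import Image
from PIL.ExifTags import TAGS

def embedded_metadata(path: str) -> dict:
    with Image.open(path) as img:
        exif = img.getexif()
        # Map numeric EXIF tag ids to readable names; this dict is often empty.
        return {TAGS.get(tag_id, tag_id): value for tag_id, value in exif.items()}

if __name__ == "__main__":
    tags = embedded_metadata("example.jpg")  # hypothetical file
    if not tags:
        print("No embedded metadata: nothing for a search engine to catch.")
    else:
        for name, value in tags.items():
            print(name, value)
```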
Beyond Search was working on a recent project and needed a picture. Our top dog's comment: "I just had an Easter egg hunt for an image. What a mess." He's referring to the mess of figuring out which pictures can or cannot be used without securing rights. The Internet is just as visual as it is textual, and more images than you think actually belong to someone. But how do you know? And on the owner's part, how do you know if your image is being ripped off?
As I learned in my years in journalism, you can't just pick a pretty picture, slap it on a page (paper or pixel), and publish it. There are such things as digital rights. Yes, someone owns that picture. In a few cases, it may be a free-use image. In some cases, it may be a free-use image that you can use if you give the owner credit. But in many cases, it is a copyrighted image, and you need permission or need to pay to use it. So when you enter your search term "blooming moonflower" into the browser box, beware of that list of 1,443,873,899 images that comes back.
We were recently contacted by a representative of PicScout, http://www.picscout.com, a company that deals in image copyright solutions. She sent us a press release about several worldwide companies joining the effort to help safeguard digital rights. Her comment: “Until now, image ownership online was disregarded and disrespected, to say the least. Now that every image has the potential to be visibly associated and directly connected with its licensor or owner, value is restored.”
And it occurred to me that search is intrinsically involved in that effort. From a PicScout press release: “In 2008 an online survey from KRC Research, undertaken for iStockphoto, revealed that 33 percent of Americans use downloaded digital content, but nearly 30 percent are unaware that permission is required for its use.” In general, PicScout’s products produce a list of “tens of millions of images” that display online with the universal information symbol if a surfer is using the ImageExchange Firefox add-on (you can apply to beta test it at http://imagex.picscout.com). And ta-da! You can see right away which images you may or may not use without checking first. How simple is that?
So here's the scenario: Say ordinary Joe Schmo goes to Google Images and does a search for "pretty pony" to use in a home-grown PR campaign. The GOOG does its thing and returns a billion search results. Joe chooses one and goes on his merry little way… until ImageTracker catches up. Joe fell into the digital rights trap. Any search engine will return results, and unless you have some serious search savvy (and if it's possible at all), there's likely no way you can refine your search results to keep that from happening. Well, PicScout's got that fixed for you.
Now, keep in mind, I'm a lowly duckling dealing with this technology, so this is how I understand it to work. PicScout already had a program called ImageTracker, which searches for and finds images based on the algorithms it employs for image identification, as part of the Image IRC platform. The IRC is what associates the images with metadata (licensing information, owner, etc.) so that ImageTracker can work to police infringement. Those two products are connected by the ImageExchange add-on, which lets you, a person searching for an image, connect with the owner.
It’s a nifty little idea, quite simple, and it can go a lonnnnnnng way to solving your image search nightmares. So if you deal with online images a lot, you might check PicScout out.
Jessica Bratcher, October 18, 2009
The PR lady did not even thank me for this item. Sigh.
HP Analysis Urges Mainframe Rip and Replace
October 1, 2009
I found "Staying on Legacy Systems Ends Up Costing IT More" absolutely fascinating. The article appeared on the Ziff Davis Web site. There is a link to a podcast (latency and audio made this tough for me given my age, lousy hearing, and general impatience with serial info streams) and a series of excerpts from the "Briefings Direct" podcast discussion. The sponsor of the podcast, according to the Web site, was Hewlett Packard. HP is on my radar because the company just merged its personal computer and printer businesses. I suppose that will make it untenable for me to describe HP as "the printer cartridge company". I really liked that description, but now HP is a consulting firm and a PC company. Much better, I suppose.
I abandoned the audio show and jumped to the transcript which you can obtain by clicking http://interarborsolutions.books.officelive.com/Documents/DoingNothing901.pdf.
The premise of the podcast, in my interpretation, is that smart companies will want to dump legacy hardware and systems for the hot, new hardware and systems available from HP. I understand this type of message. I use it myself. The idea sounds good. The notion of progress is based on the idea that what's new is better than what came before. I won't drag out the Jacques Ellul argument that technology creates more technology and more unexpected problems. I will also ignore the studies of progress such as Gregg Easterbrook's The Progress Paradox: How Life Gets Better While People Feel Worse, originally published in December 2003, five years before the economic dominos started falling in April 2008. I won't point out that "legacy" is not defined in a way that helped me understand the premise of the discussion. And I won't beat too forcefully on the fuzziness of the word "cost" as the industry experts use the term. But costs are the core of the podcast, so I will have to make a quick dash through the thicket of accounting methods, but not yet.
HP red ink as metaphor for the cost problems of a mainframe to next generation platform solution.
The first idea that snagged me was "cost hasn't changed". What changed was the amount of cash available to organizations. I don't buy this. First, it is not clear what is included in the data to support the generalization. Without an indication of direct and indirect costs, capital, services, and any other costs associated with a legacy system, I can't let the generalization into the argument. Without this premise in place, the rest of the assertions are on thin ice, at least for me.
Second, consider this assertion by one of the HP “transformation” experts:
What’s still there, and is changing today, is the ability to look at a legacy source code application. We have the tools now to look at the code and visualize it in ways that are very compelling. That’s typically one of the biggest obstacles. If you look at a legacy application and the number of lines of code and number of people that are maintaining it, it’s usually obvious that large portions of the application haven’t really changed much. There’s a lot of library code and that sort of thing
My view is that "obvious" is a word that can be used to create a cloud of unknowing. Mainframe apps that are stable and doing a good-enough job may be useful precisely because the application has not changed. As one of my neighbors here in Harrods Creek said, "If it ain't broke, don't fix it." In my experience, that applies to mainframe apps that are working. If a mainframe app is broken, then an analysis is required to track down direct and indirect costs, opportunity costs, and, fuzzy to be sure but important, going-forward costs. Not much is obvious once one gets rolling down the path of the rip-and-replace approach. In my experience, the reason mainframe apps continue to chug along in insurance companies, certain travel sectors, and some manufacturing firms is that they are predictable, known, and stable. Jumping into a whizzy new world may be fun, but such a step may not be prudent within the context of the business. But HP and its wizards aren't known for their own rock-solid business decisions. I am thinking of the ball drop with AltaVista.com and the most recent mash-up of the printer and PC businesses. Ink revenue will make HP's PC revenues soar, but it won't change the nature of that low-margin business.
What If Google Books Goes Away?
September 21, 2009
I had a talk with one of my partners this morning. The article in TechRadar "Google Books Smacked Down by US Government" was the trigger. This Web log post captures the consequences portion of our discussion. I am not sure Google, authors, or any other pundit embroiled in the dust-up over Google Books will agree with these points. That's okay. I am capturing highlights for myself. If you have forgotten this function of the Beyond Search Web log, quit reading or look at the editorial policy for this marketing / diary publication.
Let's jump into the discussion in medias res. The battle is joined, and at this time Google is on the defensive. Keep in mind that Google has been plugging away at this Google Books "project" since 2000 or 2001, when it made a key hire from Caere (now folded into Nuance) to add a turbocharge to the Books project.
Who is David? Who is Goliath?
With nine years of effort under its belt, Google will get a broken snout if the Google Books project stops. Now, let’s assume that the courts stop Google. What might happen?
First, Google could just keep on scanning. Google lawyers will do lawyer-type things. The wheels of justice will grind forward. With enough money and lawyers, Google can buy time. Let’s face it. Publishers could run out of enthusiasm or cash. If the Google keeps on scanning, discourse will deteriorate, but the acquisition of data for the Google knowledge base and for Google repurposing keeps on keeping on.
Second, Google might comply: shut up shop and go directly to authors with an offer to buy rights to their work. I have four or five publishers right now. I would toss them overboard for a chance to publish my next monograph on the Google system, let Google monetize it any way it sees fit, and give me a percentage of the revenue. Heck, if I get a couple of hundred a month from the Google, I am ahead of the game. Note this: none of my publishers are selling very many expensive studies right now. The for-fee columns I write produce a pittance as well. One publisher cut my pay by 30 percent as part of a shift to a four-day week and a trimmed publishing schedule. Heck, I love my publishers, but I love an outfit that pays money more. I think quite a few authors would find publishing on the Google Press most interesting. If that happens, the Google Books project has a gap, but going forward Google has the info, and the publishers and non-participating authors have a different type of competitive problem.
Third, Google cuts a new deal, adjusts the terms, and keeps on scanning books. Google’s management throws enough bird feed to the flock. Google is secure in its knowledge that the future belongs to a trans-national digital information platform stuffed with digital information of various types. No publisher or group of publishers has a comparable platform. Microsoft and Yahoo were in the book game and bailed out. Perhaps their platforms can at some point in the future match Google’s. But my hunch is that the critics of Google’s book project are not looking at the value of the information to Google’s knowledge base, Google’s repurposing technologies, and Google’s next generation dataspace applications. Because these are dark corners, the bright light of protest is illuminating the dust and mice only.
One theme runs through these three possibilities. Google gets information. In this game, the publishers have lost but have not recognized it. Without a better idea and without an alternative to the irreversible erosion of libraries, Google is not the miserable little worm that so many want the company to be. Just my opinion.
Stephen Arnold, September 21, 2009
European Search Vendor Round Up
September 16, 2009
Updated at 8:29 am, September 17, 2009, to 23 vendors
I received a call from a very energetic, quite important investment wizard from a “big” financial firm yesterday. Based in Europe, the caller was having a bad hair day, and he seemed pushy, almost angry. I couldn’t figure out why he was out of sorts and why he was calling me. I asked him. He said, “I read your Web log and you annoy me with your poor coverage of European search vendors.”
I had to admit that I was baffled. I mentioned the companies that I tracked. But he wanted me to do more. I pointed out that the Web log is a marketing vehicle and he can pay me to cover his favorite investment in search. That really set him off. He wanted me to be a journalist (whatever that meant) and provide more detailed information about European vendors. And for free.
Right.
After the call, I took a moment and went through my files to see which European vendors I have mentioned and the general impression I have of each of these companies. The table below summarizes the companies I have either profiled in my for-fee studies or mentioned in this diary / marketing Web log. You may disagree with my opinions. I know that the azure chip consultants at Gartner, Ovum, Forrester, and others certainly do. But that's understandable. The addled geese here in Harrods Creek actually install systems and test them, a step that most of the azure chip crowd just don't have time for because of their exciting work to generate enough revenue to keep the lights on, advise clients, and conduct social network marketing events. Just my opinion, folks. I am entitled to those despite the widespread belief that I should be in the Happy Geese Retirement Home.
Vendor | Function | Opinion
Autonomy | Search and eDiscovery | One of the key players in content processing; good marketing
Bitext | Semantic components | Impressive technology
Brox | Open source semantic tools | Energetic, marketing-centric open source play
Empolis GmbH | Information management and business intelligence | No-cash tie-up with Attensity
Exalead | Next-generation application platform | The leader in search and content processing technology
Expert System | Semantic toolkit | Works; can be tricky to get working the way the goslings want
Fast ESP | Enterprise search, business intelligence, and everything else | Legacy of a police investigation hangs over the core technology
InfoFinder | Full-featured enterprise search system | My contact in Europe reports that this is a European technology; listed customers are mostly in Norway
Interse Scan Jour | SharePoint enterprise search alternative | Based in Copenhagen, the Interse system adds useful access functions to SharePoint; sold in December 2008
Intellisearch | Enterprise search; closed US office | Basic search positioned as a one-size-fits-all system
Lumur Consulting | Flax, a robust enterprise search system | I have written positively about this system; continues to improve with each release of the open source engine
Lexalytics | Sentiment analysis tools | A no-cash merger with a US company and UK-based Infonics
Linguamatics | Content processing focused on pharma | Insists that it does not have a price list
Living-e AG | Information management | No-cash tie-up with Attensity
Mindbreeze | Another SharePoint snap-in for search | Trying hard; interface confusing to some goslings
Neofonie | Vertical search | Founded in the late 1990s; created Fireball.de
Ontoprise GmbH | Semantic search | The firm's semantic Web infrastructure product, OntoBroker, is at Version 5.3
Pertimm | Enterprise search | Now positioned as information management
PolySpot | Enterprise search with workflow | Now at Version 4.8; search, workflow, and faceted navigation
SAP Trex | Search tool in NetWeaver; works with R/3 content | Works; getting long in the tooth
Sinequa | Enterprise search with workflow | Now at Version 7; the system includes linguistic tools
Sowsoft | High-speed desktop search | Excellent, lightweight desktop search
SurfRay | Now focused on SharePoint | Uncertain; emerging from some business uncertainties
Temis | Content processing and discovery | Original code and integrated components
Tesuji | Lucene enterprise search | Highly usable and speedy; recommended for open source installations
Updated at 8:29 am Eastern, September 17, 2009
Beyond the Database: Implications for Organizations
September 9, 2009
The challenge in information technology in general, and information management in particular, is that we face a "bridge" challenge. On one side are individuals using a wide range of devices. These include the Microsoft Zune HD, Google Android phones, and netbooks like the one I am using. Millions of the young-at-heart have full-scale computers like the Apple iPod Touch.
On the other side are large businesses with entrenched information technology infrastructures. Change is expensive, time-consuming, and often fiercely resisted by employees. Change means relearning methods that work.
When someone searches for information, a variety of sources is available. For example, if a NOAA professional gets a weather alert, he or she can pull from many sources. The problem is that the "answer" is not evident.
What about a search for “Florida severe weather”? Bing and Google return laundry lists of results. My research suggests that users do not want laundry lists. Users do want answers or a result that gets them closer to an answer and farther from the almost useless laundry list of results.
In this talk (converted to an essay), I will comment about some of Google’s new technology, but I want to point out that Microsoft is working in this field as well. Most of the major players in search, content processing, and business intelligence know that laundry lists are a dead end, of low value, and a commodity.
Google's corporate strategy looks unorganized. The Sunday, January 28, 2007, New York Times article about Steve Ballmer included a reference to Google's dependence on search advertising. The implication was that Google is a one-trick pony and therefore vulnerable. Google is in a tough spot because if advertising goes south, the company has to have a way to monetize its infrastructure. Google has spent billions building a global datasphere, a subject to which I will return at the end of this talk / essay.
Stand on the edge of a slice in the land near Antrim, Northern Ireland. You see a gap which you can cross using a rope bridge. Someday, a modern steel structure may be put in place. But for now, the Northern Ireland residents need a “good enough” solution.
That's the problem the Federal government and many organizations face. Instead of a gap in the terrain, there are many legacy systems inside the organization and new systems outside the organization. The systems gap creates major problems in information access, security, and efficiency. Given today's economic climate and the new Administration's commitment to serving citizens, a digital bridge is needed, sooner rather than later.
The opportunity is to bridge these two different sides of the river of technology that flows through our society. Similar gaps can be identified between structured and unstructured information, between legacy systems and Web-service-enabled systems, between Microsoft and Google, between archived data and real-time data, between semantic and statistical methods, and others.
The question is, “How can we get the bridge built?” and “How can we deal with these gaps?”
These are important issues, and the good news is that tools and approaches are now becoming available. I will highlight some of Google’s innovations and mention one company that has a product available that provides a functional “bridge” between existing IT infrastructure and Google’s services. Many tools are surprisingly affordable, so progress—in my opinion—will be picking up steam in the next six to 12 months.
Because I have limited time, I will focus on Google and make do with side references to other vendors working to build bridges between organizations’ internal systems and the fast-moving, sometimes more innovative world external to the organization.
I have written three monographs about Google technology: The Google Legacy in 2005, Google Version 2.0 in 2006, and Google: The Digital Gutenberg this year. Most of the information I am going to mention comes from my research for these monographs which are available from Infonortics, Ltd. (http://www.infonortics.com). The information in my monographs comes from open source intelligence.
In the Q&A session, I will take questions about IBM’s, Microsoft’s, and other companies’ part in this information drama, but in this talk most of my examples will be drawn from Google. I don’t work for Google and Google probably prefers that I retire, stop writing my monographs and blog posts about the company, and find a different research interest.
Let’s start with a query for an airplane flight.
Real Time Search: Point of View Important
September 3, 2009
Author’s Note: I wrote a version of this essay for Incisive Media, the company that operates an international online meeting. This version of the write up includes some additional information.
Real-time search is shaping up like a series of hurricanes forming off the coast of Florida. As soon as one crashes ashore, scattering Floridians like dry leaves, another hurricane revs up. Real-time search shares some similarities with individual hurricanes and the larger weather systems that create the conditions for hurricanes.
This is a local-global or micro-macro phenomenon. What real time search is and is becoming depends on where one observes the hurricane.
Look at the two pictures below. One shows you a local weather station. Most people check their local weather forecast and make important decisions on the data captured. I don’t walk my dogs when there is a local thunderstorm. Tyson, my former show ring boxer, is afraid of thunder.
Caption: The Local Weather: Easy to Monitor, Good for a Picnic
Image source: http://www.usa.gov
The other picture, taken from Earth orbit, shows a very different view of a weather system. Most people don't pay much attention to global weather systems unless they disrupt life with hurricanes or blizzards.
Local weather may be okay for walking a dog. Global weather may suggest that I need to prepare for a larger, more significant weather event.
The Weather from the International Space Station
Image source: http://www.usa.gov
I want to identify these two storms and put each in the context of a larger shift in the information ecosystem perturbed by real-time search. The first change in online is the momentum within the struggling traditional newspaper business to charge for content. Two traditional media oligopolies appear to be trying to sail out of the horse latitudes of declining revenue, shrinking profit, and technology change. Rupert Murdoch's News Corporation wants to charge for quality journalism, which is expensive. I am paraphrasing his views, which have been widely reported.
The Financial Times, confident from its experiments using information processing technology from Endeca (www.endeca.com) and Lexalytics (www.lexalytics.com), continues to move forward with its "pay for content" approach to its information. The fact that the Financial Times has been struggling to find a winning formula for online almost as long as the Wall Street Journal has not diminished the newspaper's appetite for online success. The notion of paying for content is gaining momentum among organizations that have to find a way to produce money to cover their baseline costs. Charging me for information seems, to these companies, to be the logical solution.
With these two international giants making a commitment to charge customers to access online content, this local storm system is easy to chart. I think it will be interesting to see how this shift in a newspaper’s traditional business model transfers to online. In a broader context, the challenge extends to book, magazine, and specialist publishers. No traditional print-on-paper company is exempt from inclement financial weather.
One cannot step into the same river twice, so I am reluctant to point out that both News Corporation and the Pearson company have struggled with online in various incarnations. News Corporation has watched as Facebook.com reached 350 million users while MySpace.com has shriveled. Not even the advertising tie-up with Google has been sufficient to give MySpace.com a turbo boost. The Wall Street Journal has embraced marketing with a vengeance. I have documented in my Web log (www.arnoldit.com/wordpress) how the Wall Street Journal spams paying subscribers to buy additional subscriptions. You may have noticed the innovation section of the Wall Street Journal that featured some information and quite a bit of marketing for a seminar series sponsored by a prestigious US university. I was not sure where "quality journalism" began and where the Madison Avenue slickness ended.
Ideal, Simple, and Good Enough
September 1, 2009
I just read this Web page headline: “SAP NetWeaver Enterprise Search: Simple and Secure Access to Information”. Wow, simple and search pushed together like peanut butter and jelly, ham and eggs, and hammer and nail. The problem is the word “simple”. Who does not want simplicity? Life today is too complicated. Make it simple.
The three meta issues swirling around simple search and content processing have their roots in the fecund soil of user annoyance. Most users have zero clue about the more sophisticated features in any desktop or Web application. The evidence is not far to seek. Look at these three questions. How many can you answer without recourse to Google, your friendly power user, or digging through books in the ever-smaller computer section of Barnes & Noble or Borders?
- How do you limit Google results to only those for US government and state information?
- How do you create a single, presentation quality graphic from Excel 2007?
- How do you delete unwanted colors in Framemaker 7.2 when you import a graphic format other than jpg?
The answer to the Google question is to navigate to Google.com, click on Advanced, scroll to the bottom of the page, and click on the Uncle Sam option.
The answer to the second question is to use a third party application from an outfit in France called GlobFX.
The answer to the Framemaker question is to open a version of the document with the correct color information. Go to File Import and select the option for importing a template. Make sure only the color information option is selected. Make the source the file with the “correct” color information and the target the file with the unwanted color information.
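For the first question, there is also a rough query-box shortcut, assuming Google's standard site: operator; the Uncle Sam option may apply additional filtering behind the scenes, so treat these as approximations rather than an exact substitute:

```
florida severe weather site:gov
prescription drug recalls site:fda.gov
```

The Advanced page and the Uncle Sam option essentially build this kind of domain restriction for you.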
An even better example can be found in the usage of the advanced search functions for Web search systems. In general, users enter 2.3 words on average per query and fewer than five percent of search users access the advanced search functions.
Who cares?
I care a little bit, but not enough to give a talk about the way in which those creating systems make life almost unbearable for users. I am sufficiently motivated to define three terms and offer some comments.
Ideal
I have sat in meetings in which requirements emerge from a group discussion. The focus jumps from a micro problem ("I can't find my most recent version of this document") to science fiction ("I need to see information from many sources on one screen so I don't have to hunt or scan for the information I need"). Unless there is a method for capturing these requirements and assigning some meaningful tag for difficulty or cost to each, the exercise is interesting but often not super productive.
In my experience, folks like to talk about ideal features and functions. The chatter is similar to that I recall from my freshman class in Philosophy in 1962.
The problem is that when a vendor or a developer charts a course for the ideal, the journey may be more expensive and time-consuming than Odysseus's return home from Troy.
When my team encounters a cost overrun and a system that is never completed, I think, “Ideal”.
Silobreaker Update
August 25, 2009
I was exploring usage patterns via Alexa. I wanted to see how Silobreaker, a service developed by some savvy Scandinavians, was performing against the brand name business intelligence companies. Silobreaker is one of the next generation information services that processes a range of content, automatically indexing and filtering the stream, and making the information available in “dossiers”. A number of companies have attempted to deliver usable “at a glance” services. Silobreaker has been one of the systems I have relied upon for a number of client engagements.
I compared the daily reach of LexisNexis (a unit of the Anglo-Dutch outfit Reed Elsevier), Factiva (originally a Reuters and Dow Jones "joint" effort in content and value-added indexing, now rolled back into the Dow Jones mothership), Ebsco (the online arm of the E.B. Stephens Co. subscription agency), and Dialog (a unit of the privately held database roll-up company Cambridge Scientific Abstracts / ProQuest and some investors). Keep in mind that Silobreaker is a next-generation system, and I was comparing it to the online equivalent of the Smithsonian's computer exhibit with the Univac and IBM keypunch machine sitting side by side:
Silobreaker is the blue line, which is chugging right along despite the challenging financial climate. I ran the same query on Compete.com, and that data showed LexisNexis with a growth uptick and more traffic in June 2009. Your mileage may vary. These types of traffic estimates are indicative, not definitive. But Silobreaker is performing and growing. One could ask, "Why aren't the big names showing stronger buzz?"
A better question may be, “Why haven’t the museum pieces performed?” I think there are three reasons. First, the commercial online services have not been able to bridge the gap between their older technical roots and the new technologies. When I poked under the hood in Silobreaker’s UK facility, I was impressed with the company’s use of next generation Web services technology. I challenged the R&D team regarding performance, and I was shown a clever architecture that delivers better performance than the museum piece services against which Silobreaker competes. I am quick to admit that performance and scaling remain problems for most online content processing companies, but I came away convinced that Silobreaker’s engineering was among the best I had examined in the real time content sector.
Second, I think the museum pieces – I could mention any of the services against which I compared Silobreaker – have yet to figure out how to deal with the gap between the old business model for online and the newer business models that exist. My hunch is that the museum pieces are reluctant to move quickly to embrace some new approaches because of the fear of [a] cannibalization of their for fee revenues from a handful of deep pocket customers like law firms and government agencies and [b] looking silly when their next generation efforts are compared to newer, slicker services from Yfrog.com, Collecta.com, Surchur.com, and, of course, Silobreaker.com.
Third, I think the established content processing companies are not in step with what users want. For example, when I visit the Dialog Web site here, I don't have a way to get a relationship map. I like nifty methods of providing me with an overview of information. Who has the time or patience to handcraft a Boolean query and then pay money whether or not the dataset contains useful information? I just won't play that "pay us to learn there is a null set" game any more. Here's the Dialog splash page. Not too useful to me because it is brochureware, almost a 1998 approach to an online service. The search function only returns hits from the site itself. There is no compelling reason for me to dig deeper into this service. I don't want a dialog; I want answers. What's a ProQuest? Even the name leaves me puzzled.
I wanted to make sure that I was not too harsh on the established “players” in the commercial content processing sector. I tracked down Mats Bjore, one of the founders of Silobreaker. I interviewed him as part of my Search Wizards Speak series in 2008, and you may find that information helpful in understanding the new concepts in the Silobreaker service.
What are some of the changes that have taken place since we spoke in June 2008?
Mats Bjore: There are several new things and plenty more in the pipeline. The layout and design of Silobreaker.com have been redesigned to improve usability; we have added an Energy section to provide a more vertically focused service around both fossil fuels and alternative energy; we have released Widgets and an API that enable anyone to embed Silobreaker functionality in their own web sites; and we have improved our enterprise software to offer corporate and government customers "local" customizable Silobreaker installations, as well as a technical platform for publishers who'd like to "silobreak" their existing or new offerings with our technology. Industry-wise, the recent statements by media moguls like Rupert Murdoch make it clear that the big guys want to monetize their information. The problem is that charging for information does not solve the problem of a professional already drowning in information. This is like trying to charge a man who has fallen overboard for water instead of offering a life jacket. Wrong solution. The marginal loss of losing a few news sources is really minimal for the reader, as there are thousands to choose from anyway, so unless you are a "must-have" publication, I think you'll find out very quickly that reader loyalty can be fickle or short-lived or both. Add to that that news reporting itself has changed dramatically. Blogs and other types of social media are already favoured over many newspapers, and we saw Twitter's role during the election demonstrations in Iran. Citizen journalism of that kind, immediate, straight from the action, and free, is extremely powerful. But whether old or new media, Silobreaker remains focused on providing sense-making tools.
What is it going to be, free information or for fee information?
Mats Bjore: I think there will be free, for-fee, and blended information, just like Starbucks coffee. The differentiators will be "smart software" like Silobreaker and some of the Google technology I have heard you describe. However, the future is not just lots of results. The services that generate value for the user will have multiple ways to make money. License fees, customization, and special processing services, to name just three, will differentiate what I can find on your Web log and what I can get from a Silobreaker "report".
What can the museum pieces like Dialog and Ebsco do to get out of their present financial swamp?
Mats Bjore: That is a tough question. I also run a management consultancy, so let me put on my consultant hat for a moment. If I were Reed Elsevier, Dow Jones/Factiva, Dialog, Ebsco, or a large publishing house, I would have to realize that I have to think out of the box. It is clear that these organizations define technology in a way that is different from many of the hot new information companies. Big information companies still define technology in terms of printing, publishing, or other traditional processes. The newer companies define technology in terms of solving a user's problem. The quick fix, therefore, ought to be to start working with new technology firms and see how they can add value for these big dragons today, not tomorrow.
What does Silobreaker offer a museum piece company?
Mats Bjore: The Silobreaker platform delivers access and answers without traditional searching. Users can spot what is hot and relevant. I would seriously look at solutions such as Silobreaker as a front end to reach new customers, capture revenue from the ad-sponsored free service, and give a wider audience a click path to premium content. (Most of us are unaware of the premium content that is out there, since the legacy contracts only reach big companies and organizations.) I am surprised that Google, Microsoft, and Yahoo have not moved more aggressively to deliver more than a laundry list of results with some pictures.
Is the US intelligence community moving more purposefully with access and analysis?
Mats Bjore: The interest in open source is rising. However, there is quite a bit of inertia when it comes to having one set of smart software pull information from multiple sources. I think there is a significant opportunity to improve the use of information with smart software like Silobreaker's.
Stephen Arnold, August 25, 2009