Start Ups Will Fail: Quite an Insight that’s 50 Years Too Late to Be New

September 29, 2008

One of the truisms for new companies, new products, and new girl friends in high school is:

80 to 90 percent will fail.

The fellow who gave currency to this notion was Conrad Jones, who died in 1992. If you don’t know about him and his work, click here for some information about this exceptional business analyst.

Imagine my surprise when I saw the Silicon Alley Insider’s article “Calacanis: Collapsing Economy Will Kill 50%-80% of Start Ups”. You can read this article here. Citing Scott Kurnit and other thought leaders, the author walked me through the trials and tribulations of starting a Web venture. I found the mini-consulting tutorial in the section “10 Specific Things You Can Do” more intriguing than the set up to this advice. For example, the author suggests “cutting spending wherever you can.” I wanted to email my former colleagues at Booz, Allen & Hamilton to clue them in on something the firm has not known since 1917, the year Mr. Booz sold his first gig to the Sears’s management team. Tip nine also caused my heart to palpitate. The advice: “Build marketshare.” I had to take a dose of blood pressure medicine to calm down.


Reflecting on this article, several thoughts went through my mind:

  1. The advice is probably going to be made into a “Start Ups for Dummies” book. It will sell millions of copies. PT Barnum and other business wizards of the past would be supportive.
  2. The recycling of old ideas as new may be what’s behind the rediscovery of human-intermediated indexing, taxonomies, and finding ways to let people and colleagues comment on one’s work. Getting input is now “social software.” When I was 20, it was called “letting someone comment on a paper or an idea.” Paper worked. The phone worked. Now we need enterprise search systems that allow a user to tag. Old wine, new plastic bottles.
  3. Data from many fields of inquiry–quantum mechanics to framing stores in strip malls–suggest that in 100 tries at anything, most fail. Some behaviors can be learned. So line up the MBAs who are investing money as the Wall Street Journal does periodically, and most of the best fail when compared to one another. “Fail” is tough to define. Ignoring the definition makes it easier to give advice.


On Premises Software: More and Worse Headaches

September 29, 2008

Before I urge you to read an article about a recent IDC report, you need to know that I have done work for IDC. Nevertheless, you will find LinuxPR’s summary “Quality Problems Cost Software Companies up to $22 Million Annually According to New Report” here. The report quantifies the magnitude of software hassles. For me, the key point was this statement, “the costs of debugging are significant, reaching up to $22 million each year for some companies.”

Cloud computing may not be ready for prime time, but in the months ahead, more organizations will be looking for relief for these high cost headaches. Cloud computing will get longer and harder looks. The baloney that passes for enterprise search and content management systems will create indigestion and then trigger a projectile response. This IDC report provides a hook on which one can hang a driver for growing antipathy to on premises software that doesn’t deliver on time, within budget, and to user satisfaction.

Stephen Arnold, September 29, 2008

Expert System: Morphing into an Online Advertising Tool Vendor

September 28, 2008

Several years ago, YourAmigo (an Australian search and content processing vendor) shifted from enterprise search to search engine optimization. I stopped following the company because I have zero interest in figuring out how to get traffic to my Web site or my Web log. Now Expert System has rolled out what it calls Cogito Advertiser. A brief write up appeared when I was in Europe. You can read that article here.

The new service, according to the write up:

automatically analyzes Web pages to identify the most relevant topics and extract the main themes included in the text. It classifies content by assigning the category related to the text in real time, based on an optimized taxonomy and high precision. By processing the text, it collects all useful data in an output format structured to be uploaded into a database and directly integrates it with the ad server.

Expert System has some interesting technology. The idea is that software that can “understand” will be able to do a better job of key word identification than a human, often fresh out of college with vocabulary flush with “ums”, “ers”, and “you knows”.
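For readers who want to see the shape of the idea, here is a toy sketch of classifying a page against a taxonomy and emitting a record a database could ingest. It is an illustration only, not Expert System’s method: the `TAXONOMY` categories and terms, the `classify` and `to_ad_record` functions, and the record layout are all hypothetical, and simple term counting stands in for whatever linguistic analysis the real product performs.

```python
import re
from collections import Counter

# Hypothetical mini taxonomy: category -> trigger terms (made up for this sketch).
TAXONOMY = {
    "finance": {"bank", "loan", "equity", "mortgage"},
    "travel": {"flight", "hotel", "airline", "resort"},
    "autos": {"sedan", "engine", "dealer", "hybrid"},
}

def classify(text):
    """Return (category, hits) pairs sorted by descending hit count."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for token in tokens:
        for category, terms in TAXONOMY.items():
            if token in terms:
                counts[category] += 1
    return counts.most_common()

def to_ad_record(url, text):
    """Build a flat record that could be loaded into a database table."""
    ranked = classify(text)
    top = ranked[0][0] if ranked else None  # None when no taxonomy term matched
    return {"url": url, "category": top, "scores": dict(ranked)}
```

The hard part, of course, is the taxonomy itself. The code is the easy ten minutes.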

You can learn more about the company here. As the financial and competitive pressures mount, I expect other vendors to repackage their technology in an effort to tap into more rapidly growing markets with shorter buying cycles than enterprise search typically merits.

Stephen Arnold, September 28, 2008

Exegy: Pushing Deeper in Financial Markets

September 28, 2008

Exegy is not a company that comes up when 20-something search experts kick back and trade stories about water pistol fights in the dorm. The company’s technology processes large volumes of data in near real time. This is not the near real time of the uninformed. Exegy crunches North American equity data feeds and option exchanges so that it can display the highest one second peak occurring every 60 seconds. Ivy Schmerken’s “Exegy Launches along with Xasax and Financial Information Forum” highlights Exegy’s content processing technology. You can read the write up here. Processing large content streams in near real time is a non trivial task. Most search vendors dance around the issue of machine infrastructure to perform the nifty tricks shown in Flash demos. Not Exegy. The company installs its proprietary appliance. Working with Xasax, the new service provides financial services firms with useful data that are otherwise difficult if not impossible to obtain.
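To make “the highest one second peak occurring every 60 seconds” concrete, here is a minimal sketch of the arithmetic. It assumes each message carries a non-negative arrival time in seconds, and the function name `peak_per_minute` is hypothetical. This is not Exegy’s implementation; the company does this work on a proprietary appliance at volumes a Python loop would not survive.

```python
from collections import Counter

def peak_per_minute(timestamps):
    """Map each minute index to its busiest one-second message count.

    timestamps: iterable of non-negative arrival times in seconds (assumed input).
    """
    # Count how many messages landed in each whole second.
    per_second = Counter(int(t) for t in timestamps)
    peaks = {}
    for second, count in per_second.items():
        minute = second // 60  # which 60-second window this second belongs to
        if count > peaks.get(minute, 0):
            peaks[minute] = count
    return peaks
```

Bucketing by whole seconds keeps the sketch short. A production system would more likely use a sliding window and a streaming data structure rather than holding every timestamp in memory.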

I profiled Exegy in my April 2008 study for the Gilbane Group. You can learn more here. The reason I included the company was to highlight the importance of matching hardware to the content processing task. Clearwell Systems, Google, Thunderstone, and Index Engines have taken a somewhat similar approach. The 20 somethings who are true search mavens are confident that a couple of Dell or HP servers can whip almost any content processing job. The kiddies are wrong. To learn more about the Exegy engineering behind its throughput, click here.

A final thought to the victors of the water pistol wars for search: infrastructure matters. Infrastructure makes or breaks many enterprise search systems.

Search Monopoly: The Users Are Guilty

September 28, 2008

After 21 days of travel, I enjoyed flicking through the digital fish my newsreader snags for me. One article in Seeking Alpha caught my eye. The title is “We Can’t Afford a Search Monopoly, Even If It Kills Yahoo”. You can read the article here. I think the article is by Michael Arrington, and it originally appeared on TechCrunch, but I can’t be sure. The pivot on which the article turns is Google’s deal with Yahoo. In the background are the data that most Internet users in North America turn to Google for search. I am supportive of Google, not because I love the 20 somethings who come up to me after my lectures and expect me to explain to them why I am describing a Google they don’t recognize. The reason is that Googlers are so involved with a tiny circle of other Googlers that these bright folks can’t see the Googzilla against the background of “do no evil”.

I side with Google because every day users vote with their mouse clicks and key taps to make Google number one. I don’t for a minute think advertisers care who gets them business. As long as the ad money delivers sales, advertisers are happy. I also don’t think that users think too much about how Google generates useful results for a query or a click on a Google canned search within Chrome. If users and advertisers were deeply dissatisfied, the GOOG would find itself just another vendor fighting for survival.

I think the reason folks are agitated about Google is that after a decade of indifference and casual dismissal of the company as a search engine that sells ads, some people are waking up to the reality of Google’s application infrastructure. Why is the infrastructure important? At this time, none of the competitors have what Google has. Microsoft is rushing to catch up, but it is tough to close a 60 percent market share gap and an infrastructure gap quickly. In fact, if Microsoft catches up, it will be in the difficult position of finding Google farther ahead. Microsoft or any other competitor has to leapfrog Google, not catch up.

Furthermore, trying to legislate away Google’s success may be tough too. The legal process can be long and drawn out. Google has enough money to keep its lawyers digging away for decades. Google is morphing into other business sectors, and it is not clear to me how a decision about Google can transfer from one sector to another or from one country to another. Toss in the fact that most people use Google of their own free will, and the problem of Google’s alleged search monopoly becomes more challenging.

Google now finds itself in a combination position. In some ways it is the 21st century version of the pre break up AT&T. In other ways, it is the child of the thumb typing generation. Google’s been chugging away for a decade, and it is going to be difficult to alter the company’s trajectory or bleed off its momentum with Web log postings, complaints from newspaper publishers, and objections from Microsoft and Yahoo that Google is not playing fair and square.

The problem with Google is that it is a service for users and advertisers. To kill Google, someone is going to have to get those users to stop using Google. That’s a big job and one that may be difficult without the aforementioned technical leap frog.

Stephen Arnold, September 28, 2008

Mercado: Healthy, Wealthy, Wise. Pick One.

September 27, 2008

The Marker IT Computerworld (Israel) ran an interesting story. It’s tough from my hollow in Kentucky to know if this is 100 percent accurate. I want to alert you that my source is the Hebrew language Web page here. My Hebrew is not much better than my Dutch, but I wanted to pass along the gist of this story by Guy Greeamelnde, “Mercado Israel Fired about 30 Workers”. If the Marker IT story is accurate, this is about 25 percent of the firm’s work force. According to the story, Mercado itself may be reeling from the economic down turn that is gaining momentum in the fragmented, fiercely competitive search and content processing sector.

Under the leadership of CEO Kari Leiebu, the company has been growing. Mercado generates somewhere in the range of $18.0 to $20.0 million per year. Despite the growth in the last two years, customers pay monthly because the firm’s business model is software as a service. Despite the bookings, cash remains a precious commodity. To conserve cash, employees have to go. The company received an infusion of cash, bringing the total to as much as $70 million.

Information about Mercado Software is here. You can get Mercado white papers here. As of September 26, 2008, no information about the economic problems afflicting Mercado appears on the company’s Web site. The firm’s investors include the Challenge Fund, Consensus Business Group, Eucalyptus Ventures, Pitango Venture Capital, Star Ventures, and Valley Venture Capital. I will keep my eyes open for confirmation of this story in Marker IT.

For now, Mercado seems to be healthy, wealthy, and wise. Tomorrow. Who knows?

Stephen Arnold, September 27, 2008

Taxonomy: Silver Bullet or Shallow Puddle

September 27, 2008

Taxonomy is hot. One of my few readers sent me a link to Fumsi, a Web log that contains a two part discussion of taxonomy. I urge you to read this post by James Kelway, whom I don’t know. You can find the article here. The write up is far better than most of the Webby discussions of taxonomies. After a quick pass at nodes and navigation, he jumps into information architecture requiring fewer than 125 words. The often unreliable Wikipedia discussion of taxonomy here chews up more than 6,000. Brevity is the soul of wit, and whoever contributed to the Wikipedia article must be SWD; that is, severely wit deprived.

Take a look at the Google Trends’ chart I generated at 8 pm on Friday, September 26, 2008. Taxonomy is generating more Google traffic than the now mud crawler enterprise search. Taxonomy is not as popular as “CMS”, the shorthand for content management system. But “taxonomy” is a specialist concept that seems to be moving into the mainstream. At the just concluded Information Today trifecta conference featuring search, knowledge management (whatever that is), and streaming media, taxonomy was a hot topic. At the Wednesday roof top cocktail, where I worked on my tan in the 90 degree ambient air temperature, I was asked four times about taxonomies. I know I worked on commercial taxonomies and controlled vocabularies for databases, but I learned from those years of experience that taxonomies are really tough, demanding, time consuming intellectual undertakings. I thought I was pretty good at making logical, coherent lists. Then I met the late Betty Eddison and the very active Marje Hlava. These two pros taught me a thing or 50.


In the dumper is the red line which maps “enterprise search” popularity. The blue line is the up and coming taxonomy popularity. The top line is the really popular, yet hugely disappointing, content management term traffic.

I heard people who have been responsible for failed search systems and non functional content management systems asking, “Will a taxonomy improve our content processing?” The answer is, “Sure, if you get an appropriate taxonomy.” I then excuse myself and head to the bar man for a Diet 7 Up. The kicker, of course, is “appropriate”. Figuring out what’s appropriate and then creating a taxonomy that users will actually exploit directly or indirectly is tough work. But today, you can learn how to do a taxonomy in a 40 minute presentation or, if you are really studious, a full eight hour seminar.

I remember talking with Betty Eddison and Marje Hlava about their learning how to craft appropriate taxonomies. Marje just laughed and turned to her business partner who also burst out laughing. Betty smiled and in her deep, pleasant voice said, “A life time, kiddo.” She called me “kiddo”, and I don’t think anyone else ever did. Marje Hlava chimed in and added, “Well, Jay [her business partner] and I have been at it for two life times.” I figured out pretty quickly that building “appropriate” taxonomies required more than persistence and blissfully ignorant confidence.

Why are taxonomies perceived as the silver bullet that will kill the vampire search or CMS system? A vampire system is one that will suck those working on it into endless nights and weekends and then gobble available budget dollars. In my opinion, here are the top five reasons:

  1. The notion of a taxonomy as a quick fix is easy to understand. Most people think of a taxonomy as the equivalent of the Dewey Decimal system or the Library of Congress subject headings and think, “How tough can this taxonomy stuff be?” After a couple of runs at the problem, the notion of a quick fix withers and dies.
  2. Vendors of lousy enterprise search systems wriggle off the hook by asserting, “You just need a taxonomy and then our indexing system will be able to generate an assisted navigation interface.” This is the search equivalent of “The check is in the mail.”
  3. CMS vendors, mired in sluggish performance, lost information, and users who can’t find their writings, can suggest, “A taxonomy and classification module makes it much easier to pinpoint the marketing collateral. If you search for a common term, our system displays those documents with that common term. Yes, a taxonomy will do the trick.” This is the same as “Let’s do lunch” repeated every week to a person whom you know but with whom you don’t want to talk for more than 30 seconds on a street corner in mid town Manhattan.
  4. A shill at a user group meeting–now called a “summit”–praises the usefulness of the taxonomy in making it easier for users to find information. Vendors work hard to get a system that works and win over the project manager. Put on center stage and pampered by the vendor’s PR crafts people, the star customer presents a Kodachrome version of the value of taxonomies. Those in the audience often swallow the tale the way my dog Tess goes after a hot dog that falls from the grill. There’s not much thinking in Tess’s actions either.
  5. Vendors of “automated” taxonomy systems demonstrate how their software chops a tough problem down to size in a matter of hours or days. Stuff in some sample content and the smart algorithms do the work of Betty Eddison and Marje Hlava in a nonce. Not on your life, kiddo. The automated systems really are not 100 percent automatic. The training corpus is tough to build. The tuning is a manual task. The smart software needs dummies like me to fiddle. Even more startling to licensees of automatic taxonomy systems is that you may have to buy a third party tool from Access Innovations, Marje Hlava’s company, to get the job done. That old phrase “If ignorance is bliss, hello, happy” comes to mind when I hear vendors pitch the “automated taxonomy” tale.

I assume that some readers may violently disagree with my view of 21st century taxonomy work. That’s okay. Use the comments section to teach this 65 year old dog some new tricks. I promise I will try to learn from those who bring hard data. If you make assertions, you won’t get too far with me.

Stephen Arnold, September 27, 2008

IBM: Another New Search System from Big Blue

September 27, 2008

IBM announced its eDiscovery Analyzer. You can read the IBM news release on the MarketWatch news release aggregation page here. Alternatively you can put up with the sluggish response of IBM’s Web site and read more details here. You won’t be able to locate this page using the IBM site’s search function. The eDiscovery Analyzer had not been indexed when I ran the query at 7:30 pm on September 27, 2008. I * was * able to locate the page using Google. If I were the IBM person running site search, I would shift to Google, which works.

The eDiscovery Analyzer, according to Big Blue:

… provides conceptual search and analysis of cases created by IBM eDiscovery Manager.

Translating: eDiscovery Manager assists with legal discovery, a formal investigation governed by court rules and conducted before trial, and internal investigations on possible violations of company policies, by enabling users to search e-mail documents that were archived from multiple mailboxes or Mail Journaling databases into a central repository. You license eDiscovery Manager, the bits and pieces needed to make it go and then you license the brand new eDiscovery Analyzer component.


I believe that this is the current interface for the “new” IBM eDiscovery Analyzer. Source: IBM’s Information Management Software IBM eDiscovery Analyzer 2.1 marketing collateral.

You will need FileNet, IBM’s aging content management system. The phrase I liked best in the IBM write up was, “[eDiscovery Analyzer] is easy to deploy and use, Web 2.0 based interface requires minimal user training.” I’m not sure about the easy to deploy assertion. And the system has to be easy to use because the intended users are attorneys. In my experience, which is limited, legal eagles are not too excited about complicated technology unless it boosts their billable hours. You can run your FileNet add in on AIX (think IBM servers) or Windows (think lots of servers).

You can read about IBM’s search and discovery technology here. You can tap into such “easy to deploy” systems as classification, content analysis, OmniFind search, and, if you are truly fortunate, DB2, IBM’s user friendly enterprise database management system. You might want to have a certified database administrator, an expert in SQL, and an IBM-trained optimization engineer on hand in case you run into problems with these user friendly systems. If these systems leave you with an appetite for more sophisticated functions, click here to learn about other IBM search and discovery products. You can, for example, read about four different versions of OmniFind and learn how to buy these products.

Remember: look for IBM products by searching Google. IBM’s own search system won’t do the job. Of course, IBM’s enterprise eDiscovery Analyzer is a different animal, and I assume it works. By the way, when you try to download the user guide, you get to answer a question about the usefulness of the information * before * you have received the file. I conclude that IBM prefers users who are able to read documents without actually having the document.

Stephen Arnold, September 27, 2008

Linguamatics Sells Bayer CropScience

September 27, 2008

My newsreader snagged this item, which I found interesting. The little-known Linguamatics (a content processing company based in the UK) retained its deal with the warm and friendly Bayer CropScience. The Linguamatics’ technology is called I2E, and Bayer has been using the I2E system since the summer of 2007. In September, Bayer CropScience decided to renew its license and process patent documents, scientific and technical information, and perform knowledge discovery. (I must admit I am not sure how one discovers knowledge, but I will believe the article that you can find here.)

For me, this small news item was interesting for several reasons. First, for many years a relatively small number of companies had been granted access to the inner circle of European pharma. I find it refreshing that after two centuries, upstarts like Linguamatics are able to follow in the footsteps of Temis and other firms who have worked to make sales in these somewhat conservative companies. “Conservative” might not be the correct word. Computational chemists are a fun-loving group. One computational chemist told me last October in Barcelona that computational chemists were pharma’s equivalent to Brazilian soccer football fans. On the off chance that a clinical trial goes off the rails, some pharma players prefer keeping “knowledge” quite undiscovered until an “issue” can be resolved.


A representative I2E results display. © Linguamatics, 2008.

Second, Linguamatics–a company I profiled after significant bother and effort–is profiled in my April 2008 study Beyond Search, published by the Gilbane Group. You can learn more about this study here because ferreting out information about I2E is not the walk in the park that I expected from a content processing company with a somewhat low profile. Linguamatics has some interesting technology, and I surmise that the uses of the system are somewhat more sophisticated and useful to Bayer CropScience than “discovering knowledge”.

Finally, Bayer CropScience is a subsidiary of the influential Bayer AG, an outfit with an annual turnover of about US$8.0 billion, give or take a billion because of the sad state of the dollar on the international market. My hunch is that if the CropScience deal feels good, other units of this chemical and pharmaceutical giant will learn to love the I2E system.

Stephen Arnold, September 27, 2008

BBC: Search Is a Backwater

September 27, 2008

I just read a quite remarkable essay by a gentleman named Richard Titus, Controller, User Experience & Design for BBC Future Media & Technology. (I like the word controller.) I am still confused by the time zone zipping I have experienced in the past seven days. At this moment in time, I don’t recall if I have met Mr. Titus or if I have read other writings by him. What struck me is that he was a keynote at a BBC Future Media & Technology Conference. My first reaction is that to learn the future a prestigious organization like the BBC might have turned toward the non-BBC world. The Beeb disagreed and looked to its staff to illuminate the gloomy passages of Christmas Yet to Come. You can read this essay “Search and Content Discovery” here. In fact, you must read it.

With enthusiasm I read the essay. Several points flew from the page directly into the dead letter office of my addled goose brain. There these hot little nuggets sat until I could approach them in safety. Here are the points that cooked my thinking:

  1. Key word search is brute force search.
  2. Yahoo BOSS is a way to embrace and extend search.
  3. The Xoogler system looked promising but possibly disappoints.
  4. Viewdle facial recognition software is prescient. (This is an outfit hooked up with Thomson Reuters, known for innovation by chasing markets before the revenue base crumbles away. I don’t associate professional publishers with innovation, however.)
  5. Naver from Korea is a super electronic game portal.
  6. Mahalo is a human-mediated system and also interesting, and the BBC has a topics page which also looks okay.
  7. SearchMe, also built by Xooglers, uses a flash-based interface.


Xooglers are inspired by Apple’s cover flow. Now how many hits did my query “beyond search” get? Can your father figure out how to view the next hit or make this one large enough to read? A brute force way to get information, of course.

These points were followed by this statement:

When you marry solid data and indexing (everyone forgets that Google’s code base is almost ten years old), useful new data points (facial recognition, behavioral targeting, historical precedent, trust, etc) with a compelling and useful user experience, we may see some changes in the market leadership of search.

I would like to comment on each of these points:

Read more
