July 31, 2014
At lunch yesterday, several search aware people discussed a July 2014 Gartner study. One of the folks had a crumpled image of the July 2014 “magic quadrant.” This is, I believe, report number G00260831. Like other mid tier consulting firms, Gartner works hard to find something that will hook customers’ and prospects’ attention. The Gartner approach is focused on companies that purport to have enterprise search systems. From my vantage point, the Gartner approach is miles ahead of the wild and illogical IDC report about knowledge, a “quotient,” and “unlocking” hidden value. See http://bit.ly/1rpQymz. Now I have not fallen in love with Gartner. The situation is more like my finding my content and my name for sale on Amazon. You can see what my attorney complained about via this link, http://bit.ly/1k7HT8k. I think I was “schubmehled,” not outwitted.
I am the really good looking person. Image source: http://bit.ly/1rPWjN3
Where the IDC report lacks comprehensiveness with regard to vendors, Gartner mentions quite a few companies allegedly offering enterprise search solutions. You must chase down your local Gartner sales person for more details. I want to summarize the points that surfaced in our lunch time pizza fest.
First, the Gartner “study” includes 18 or 19 vendors. Recommind is on the Gartner list even though a supremely confident public relations “professional” named Laurent Ionta insisted that Recommind was not in the July 2014 Gartner report. I called her attention to report number G00260831 and urged her to use her “bulldog” motivation to contact her client and Gartner’s experts to get the information from the horse’s mouth as it were. (Her firm is www.lewispr.com and it is supposed to be the Digital Agency of the Year and on the Inc 5000 list of the fastest growing companies in America.) I am impressed with the accolades she included in her emails to me. The fact that this person who may work on the Recommind account was unaware that Gartner pegged Recommind as a niche player seemed like a flub of the first rank. When it comes to search, not even those in the search sector may know who’s on first or among the chosen 19.
To continue with my first take away from lunch, there were several companies that those at lunch thought should be included in the Gartner “analysis.” As I recall, the companies to which my motley lunch group wanted Gartner to apply their considerable objective and subjective talents were:
- ElasticSearch. This in my view is the Big Dog in enterprise search at the moment. The sole reason is that ElasticSearch has received an injection of another $70 million to complement the $30 odd million it had previously gathered. Oh, ElasticSearch is a developer magnet. Other search vendors should be so popular with the community crowd.
- Oracle. This company owns and seems to offer Endeca solutions along with RightNow/InQuira natural language processing for enterprise customer support, the fading Secure Enterprise Search system, and still popping and snapping Oracle Text. I did not mention to the lunch crowd that Oracle also owns Artificial Linguistics and Triple Hop technology. This information was, in my view, irrelevant to my lunch mates.
- SphinxSearch. This system is still getting love from the MySQL contingent. Imagine no complex structured query language syntax to find information tucked in a cell.
There are some other information retrieval outfits that I thought of mentioning, but again, my free lunch group does not know what it does not know. Like many folks who discuss search with me, learning details about search systems is not even on the menu. Even when the information is free, few want to confuse fantasy with reality.
The second take away is that the rationale for putting most vendors in the niche category puzzled me. If a company really has an enterprise search solution, how is that solution a niche? The companies identified as those who can see where search is going are, as I heard, labeled “visionaries.” The problem is that I am not sure what a search visionary is; for example, how does a French aerospace and engineering firm qualify as a visionary? Was HP a visionary when it bought Autonomy, wrote off $8 billion, and initiated litigation against former colleagues? How does this Google supplied definition apply to enterprise search:
able to see visions in a dream or trance, or as a supernatural apparition?
The final takeaway for me was the failure to include any search system from China, Germany, or Russia. Interesting. Even my down on their heels lunch group was aware of Yandex and its effort in enterprise search via a Yandex appliance. Well, internationalization only goes so far I suppose.
I recall hearing one of my luncheon guests say that IBM was, according to the “experts” at Gartner, a niche player. Gentle reader, I can describe IBM many ways, but I am not sure it is a niche player like Exorbyte (eCommerce mostly) and MarkLogic (XML data management). Nope, IBM’s search embraces winning Jeopardy, creating recipes with tamarind, and curing assorted diseases. And IBM offers plain old search as part of DB2 and its content management products plus some products obtained via acquisition. Cybertap search, anyone? When someone installs what used to be OmniFind, I thought IBM was providing an enterprise class information retrieval solution. Guess I am wrong again.
Net net: Gartner has prepared the ground for a raft of follow on analyses. I would suggest that you purchase a copy of the July 2014 Gartner search report. You may be able to get your bearings so you can answer these questions:
- What are the functional differences among the enterprise search systems?
- How does the HP Autonomy “solution” compare to the pre-HP Autonomy solution?
- What is the cost of a Google Search Appliance compared to a competing product from Maxxcat or Thunderstone? (Yep, two more vendors not in the Gartner sample.)
- What causes a company to move from being a challenger in search to a niche player?
- What makes both a printer company and a Microsoft-centric solution qualified to match up with Google and HP Autonomy in enterprise search?
- What are the licensing costs, customizing costs, optimizing costs, and scaling costs of each company’s enterprise search solution? (You can find the going rate for the Google Search Appliance at www.gsaadvantage.gov. The other 18? Good luck.)
I will leave you to your enterprise search missions. Remember. Gartner, unlike some other mid-tier consulting firms, makes an effort to try to talk about what its consultants perceive as concrete aspects of information retrieval. Other outfits not so much. That’s why I remain confused about the IDC KQ (knowledge quotient) thing, the meaning of hidden value, and unlocking. Is information like a bike padlock?
Stephen E Arnold, July 31, 2014
July 28, 2014
Shortly after writing the first draft of Google: The Digital Gutenberg, “Enterprise Findability without the Complexity” became available on the Google Web site. You can find this eight page polemic at http://bit.ly/1rKwyhd or you can search for the title on—what else?—Google.com.
Six years after the document became available, Google’s anonymous marketer/writer raised several interesting points about enterprise search. The document appeared just as the enterprise search sector was undergoing another major transformation. Fast Search & Transfer struggled to deliver robust revenues and a few months before the Google document became available, Microsoft paid $1.2 billion for what was another enterprise search flame out. As you may recall, in 2008, Convera was essentially non operational as an enterprise search vendor. In 2005, Autonomy bought the once high flying Verity and was exerting its considerable management talent to become the first enterprise search vendor to top $500 million in revenues. Endeca was flush with Intel and SAP cash, passing on other types of financial instruments due to the economic downturn. Endeca lagged behind Autonomy in revenues and there was little hope that Endeca could close the gap between it and Autonomy.
Secondary enterprise search companies were struggling to generate robust top line revenues. Enterprise search was not a popular term. Companies from Coveo to Sphinx sought to describe their information retrieval systems in terms of functions like customer support or database access to content stored in MySQL. Vivisimo donned a variety of descriptions, culminating in its “reinvention” as a Big Data tool, not a metasearch system with a nifty on the fly clustering algorithm. IBM was becoming more infatuated with open source search as a way to shift development and bug fixes to a “community” working for the benefit of other like minded developers.
Google’s depiction of the complexity of traditional enterprise search solutions. The GSA is, of course, less complex—at least on the surface exposed to an administrator.
Google’s Findability document identified a number of important problems associated with traditional enterprise search solutions. To Google’s credit, the company did not point out that the majority of enterprise search vendors (regardless of the verbal plumage used to describe information retrieval) were either losing money or engaged in a somewhat frantic quest for financing and sales.
Here are the issues Google highlighted:
- Users of search systems are frustrated
- Enterprise search is complex. Google used the word “daunting”, which was and still is accurate
- Few systems handle file shares, Intranets, databases, content management systems, and real time business applications with aplomb. Of course, the Google enterprise search solution does deliver on these points, asserted Google.
Furthermore, Google provides integrated search results. The idea is that structured and unstructured information from different sources is presented in a form that Google called “integrated search results.”
Google also emphasized a personalized experience. Due to the marketing nature of the Findability document, Google did not point out that personalization was a feature of information retrieval systems lashed to an alert and work flow component. Fulcrum Technologies offered a clumsy option for personalization. iPhrase improved on the approach. Even Endeca supported roles, important for the company’s work at Fidelity Investments in the UK. But for Google, most enterprise search systems were not personalizing with Google aplomb.
Google then trotted out the old chestnuts gleaned from a lunch discussion with other Googlers and sifting competitors’ assertions, consultants’ pronouncements, and beliefs about search that seemed to be self-evident truths; for example:
- Improved customer service
- Speeding innovation
- Reducing information technology costs
- Accelerating adoption of search by employees who don’t get with the program.
Google concluded the Findability document with what has become a touchstone for the value of the Google Search Appliance. Kimberly Clark, “a global health and hygiene company,” reduced administrative costs for indexing 22 million documents. The costs of the Google Search Appliance, the consultant fees, and the extras like GSA fail over provisions were not mentioned. Hard numbers, even for Google, are not part of the important stuff about enterprise search.
One interesting semantic feature caught my attention. Google does not use the word knowledge in this 2008 document.
- Was Google unaware of the fusion of information retrieval and knowledge?
- Does the Google Search Appliance deliver a laundry list of results, not knowledge? (A GSA user has to scan the results, click on links, and figure out what’s important to the matter at hand, so the word “knowledge” is inappropriate.)
- Why did Google sidestep providing concrete information about costs, productivity, and the value of indexing more content that is allegedly germane to a “personalized” search experience? Are there data to support the implicit assertion “more is better”? Returning more results may mean that the poor user has to do more digging to find useful information. What about a few, on point results? Well, that’s not what today’s technology delivers. It is a fiction about which vendors and customers seem to suspend disbelief.
With a few minor edits (for example, a genuflection to “knowledge”), this 2008 Findability essay is as fresh today as it was when Google output its PDF version.
First, the freshness of the Findability paper underscores the staleness and stasis of enterprise search in the past six years. If you scan the free search vendor profiles at www.xenky.com/vendor-profiles, explanations of the benefits and functions of search from the 1980s are also applicable today. Search, the enterprise variety, seems to be like a Grecian urn which “time cannot wither.”
Second, the assertions about the strengths and weaknesses of search were and still are presented without supporting facts. Everyone in the enterprise search business recycles the same cant. The approach reminds me of my experience questioning a member of a sect. The answer “It just is…” is simply not good enough.
Third, the Google Search Appliance has become a solution that costs as much as, if not more than, other big dollar systems. Just run a query for the Google Search Appliance on www.gsaadvantage.gov and check out the options and pricing. Little wonder that low cost solutions, whether they are better or worse than expensive systems, are in vogue. Elasticsearch and Searchdaimon can be downloaded without charge. A hosted version is available from Qbox.com and is relatively free of headaches and seven figure charges.
Net net: Enterprise search is going to have to come up with some compelling arguments to gain momentum in a world of Big Data, open source, and once burned twice shy buyers. I wonder why venture / investment firms continue to pump money into what is same old search packaged with decades old lingo.
I suppose the idea that a venture funded operation like Attivio, BA Insight, Coveo, or any other company pitching information access will become the next Google is powerful. The problem is that Google does not seem capable of making its own enterprise search solution into another Google.
This is indeed interesting.
Stephen E Arnold, July 28, 2014
July 24, 2014
“Myths and Misreporting About Malaysia Airlines Flight 17” is an interesting article. I found the examples of misinformation, disinformation, and reformation thought provoking. The write up spotlights a few examples of fake or distorted information about an airline’s doomed flight.
As I considered the article and its appearance in a number of news alerting services, I shifted from the cleverness of the content to a larger and more interesting issue. From the revelations about software that can alter inputs to an online survey (see this link) to fake out “real” news, determining what’s sort of accurate from what’s totally bogus is becoming more and more difficult. I have professional researchers, librarians, and paralegals at my disposal. Most people do not. No longer surprising to me is the email from one of the editors working to fact check my for fee columns. The questions range from “Did IBM Watson invent a recipe with tamarind in its sauce?” to “Do you have a source for the purchase price of Vivisimo?” Now I include online links for the facts and let the editors look up my source without the intermediating email. Even then, there is a sense of wonderment when an editor expresses surprise that what he or she believed is, in fact, either partially true, bogus, or unexpected. Example: “Why do French search vendors feel compelled to throw themselves at the US market despite the historically low success rates?” The answer is anchored in [a] French tax regulations, [b] French culture, particularly when a scruffy entrepreneur from the wrong side of the educational tracks tries to connect with a French money source from the right side of the educational tracks, [c] the lousy financial environment for certain high technology endeavors, and [d] selling to the big US markets looks like a slam dunk, at least for a while.
The reason for the disconnect between factoids and information manipulation boils down to a handful of factors. Let me highlight several:
First, the need for traffic to Web sites (desktop, mobile, app instances, etc.) is climbing up the hierarchy of business / personal needs. You want traffic today? The choices are limited. Pay Google $25,000 or more a month. Pay an SEO (search engine optimization) “expert” whatever you can negotiate. Create content, do traditional marketing, and trust that the traffic follows the “if you build it they will come” pipedream. Most folks just whack at getting traffic and use increasingly SEOized headlines as a low cost way of attracting attention. Think headlines from the National Enquirer in the 1980s.
Second, Google has to pump lots of money into plumbing, infrastructure, moon shots, and operational costs (three months at the Stanford Psych unit, anyone?). At the same time, mobile is getting hot. Two problems plague the sunny world of the GOOG. [a] Revenue from mobile ads is less than from traditional ads. Therefore, Google has to find a way to keep that 2006 style revenue flowing. Because there is a systemic shift, the GOOG needs money. One way to get it is to think about Adwords as a machine that needs tweaking. How does one sell Adwords to those who do not buy enough today? You ponder the question, but it involves traffic to a Web site. [b] Google gets bigger so the “think cheap” days of yore are easier to talk about than deliver. A 15 year old company is getting more and more expensive to run. The upcoming battles with Amazon and Samsung will not be cheap. The housing developments, the Loon balloons, the jet fleet, the smart people, and other oddments of the company are money pits. If the British government can fiddle traffic, is it possible that others have this capability too?
Third, marketing, an easy whipping boy or girl as the case may be. After spending lots and lots on Web sites and apps, some outfits’ CFOs are asking, “What do we get for this spending?” In order to “prove” their worth and stop the whipping, marketers have kicked into overdrive. Baloney, specious, half baked, crazy, and recycled content is generated by the terabyte drive. The old fashioned ideas about verification, accuracy, and provenance are kicked to the side of the road.
Net net: running a query on a search engine, accepting the veracity of a long form article, or just finding out what happened at an event is very difficult. The fixes are not palatable to some people. Others are content to believe that their Internet or Internet search engine dispenses wisdom like the oracle at Delphi. Who knew the “oracles” relied on confusing entrances, various substances, and stage tricks to get their story across?
We now consult digital Delphis. How is that working out when you search for information to address a business problem, find a person who can use finger manipulation to relax a horse’s muscle, or determine if a company is what its Web site says it is?
Stephen E Arnold, July 24, 2014
July 10, 2014
Editor’s Note: This is information that did not make Stephen E Arnold’s bylined article in Information Today. That forthcoming Information Today story is about French search and content processing companies entering the US market. Spoiler alert: The revenue opportunities and taxes appear to be better in the US than in France. Maybe a French company will be the Next Big Thing in search and content processing. Few French companies have gained significant search and retrieval traction in the US in the last few years. Arguably, the most successful firm is the image recognition outfit called A2iA. It seems that, for French information retrieval companies, entering the US market has been lengthy, expensive, and difficult. One French company is trying a different approach, and that’s the core of the Information Today story.
In 1999, I learned about a Swiss enterprise search system. The working name, according to my Overflight archive, was AMI Albert. The “AMI” did not mean friend. AMI is shorthand for Automatic Message Interpreter.
Flash forward to 2014. Note that a Google query for “AMI” may return hits for AMI International, a defense oriented company, as well as hits to American Megatrends, Advanced Metering Infrastructure, ambient intelligence, the Association Montessori International, and dozens of other organizations sharing the acronym. In an age of Google, finding a specific company can be a challenge and may inhibit some potential customers’ ability to locate a specific vendor. (This is a problem shared by Thunderstone, for example. The game company makes it tough to locate information about the search appliance vendor.)
Basic search interface as of 2011.
Every time I update my files, I struggle to get specific information. Invariably I get an email from an AMI Software sales person telling me, “Yes, we are growing. We are very much a dynamic force in market intelligence.”
The UK Web site for the firm is www.amisw.co.uk. The French language Web site for the company is http://www.amisw.com/fr/. And the English language version of the French Web site is at http://www.amisw.com/fr/. The company’s blog is at http://www.amisw.com/fr/blog/, but the content is stale. The most recent update as of July 7, 2014, is from December 2013. The company seems to have shifted its dissemination of news to LinkedIn, where more than 30 AMI employees have a LinkedIn presence. The blog is in French. The LinkedIn postings are in English. Most of the AMI videos are in French as well.
Advanced Search Interface as of 2011.
The Managing Director, according to www.amisw.com/fr, is Alain Beauvieux. The person in charge of products is Eric Fourboul. The UK sales manager is Mike Alderton.
Mr. Beauvieux is a former IBMer and worked at LexiQuest, which was formerly Erli, S.A. LexiQuest (Clementine) was acquired by SPSS. SPSS was, in turn, acquired by IBM, joining other long-in-the-tooth technologies marketed today by IBM. Eric Fourboul is a former Dassault professional, and he has some Microsoft DNA in his background.
June 30, 2014
I returned from a brief visit to Europe to an email asking about Rocket Software’s breakthrough technology AeroText. I poked around in my archive and found a handful of nuggets about the General Electric Laboratories’ technology that migrated to Martin Marietta, then to Lockheed Martin, and finally in 2008 to the low profile Rocket Software, an IBM partner.
When did the text extraction software emerge? Is Rocket Software AeroText a “new kid on the block”? The short answer is that AeroText is pushing 30, maybe 35 years young.
Digging into My Archive of Search Info
As far as my archive goes, it looks as though the roots of AeroText are anchored in the 1980s. Yep, that works out to an innovation about the same age as the long in the tooth ISYS Search system, now owned by Lexmark. Over the years, the AeroText “product” has evolved, often in response to US government funding opportunities. The precursor to AeroText was an academic exercise at General Electric. Keep in mind that GE makes jet engines, so GE at one time had a keen interest in anything its aerospace customers in the US government thought was a hot tamale.
The AeroText interface circa mid 2000. On the left is the extraction window. On the right is the document window. From “Information Extraction Tools: Deciphering Human Language,” IT Pro, November December 2004, page 28.
The GE project, according to my notes, appeared as NLToolset, although my files contained references to different descriptions such as Shogun. GE’s team of academics and “real” employees developed a bundle of tools for its aerospace activities and in response to Tipster. (As a side note, in 2001, there were a number of Tipster related documents in the www.firstgov.gov system. But the new www.usa.gov index does not include that information. You will have to do your own searching to unearth these text processing jump start documents.)
The aerospace connection is important because the Department of Defense in the 1980s was trying to standardize on markup for documents. Part of this effort was processing content like technical manuals and various types of unstructured content to figure out who was named, what part was what, and what people, places, events, and things were mentioned in digital content. The utility of NLToolset type software was for cost reduction associated with documents and the intelligence value of processed information.
The need for a markup system that worked without 100 percent human indexing was important. GE got with the program and appears to have assigned some then-young folks to the project. The government speak for this type of content processing involves terms like “message understanding” or MU, “entity extraction,” and “relationship mapping.” The outputs of an NLToolset system were intended for use in other software subsystems that could count, process, and perform other operations on the tagged content. Today, this class of software would be packaged under a broad term like “text mining.” GE exited the business, which ended up in the hands of Martin Marietta. When the technology landed at Martin Marietta, the suite of tools was used in what was called, in the late 1980s and early 1990s, the Louella Parsing System. When Lockheed and Martin merged to form the giant Lockheed Martin, Louella was renamed AeroText.
Over the years, the AeroText system competed with LingPipe, SRA’s NetOwl and Inxight’s tools. In the heyday of natural language processing, there were dozens and dozens of universities and start ups competing for Federal funding. I have mentioned in other articles the importance of the US government in jump starting the craziness in search and content processing.
In 2005, I recall that Lockheed Martin released AeroText 5.1 for Linux, but I have lost track of the open source versions of the system. The point is that AeroText is not particularly new, and as far as I know, the last major upgrade took place in 2007 before Lockheed Martin sold the property to Rocket Software. At the time of the sale, AeroText incorporated a number of subsystems, including a useful time plotting feature. A user could see tagged events on a timeline, a function long associated with the original version of i2’s Analyst’s Notebook. A US government buyer can obtain AeroText via the GSA because Lockheed Martin seems to be a reseller of the technology. Before the sale to Rocket, Lockheed Martin followed SAIC’s push into Australia. Lockheed signed up NetMap Analytics to handle Australia’s appetite for US government accepted systems.
What does AeroText purport to do that caused the person who contacted me to see a 1980s technology as the next best thing to sliced bread?
AeroText is an extraction tool; that is, it has capabilities to identify and tag entities at somewhere between 50 percent and 80 percent accuracy. (See NIST 2007 Automatic Content Extraction Evaluation Official Results for more detail.)
The AeroText approach uses knowledgebases, rules, and patterns to identify and tag pre-specified types of information. AeroText references patterns and templates, both of which assume the licensee knows beforehand what is needed and what will happen to processed content.
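To make the knowledgebase-plus-rules idea concrete, here is a minimal Python sketch of pattern-based tagging. This is not AeroText’s code or API; the organization list, the rule names, and the `tag_entities` function are all hypothetical and exist only to illustrate the general approach of tagging pre-specified types of information.

```python
import re

# Hypothetical "knowledgebase": organization names the licensee already knows about.
ORGANIZATIONS = ["Lockheed Martin", "Rocket Software", "General Electric"]

# Hypothetical rules: each pre-specified entity type is paired with a hand-written pattern.
RULES = {
    "ORG": re.compile("|".join(re.escape(name) for name in ORGANIZATIONS)),
    "YEAR": re.compile(r"\b(?:19|20)\d{2}\b"),
}


def tag_entities(text):
    """Return (entity_type, matched_text, start_offset) tuples sorted by offset."""
    hits = []
    for entity_type, pattern in RULES.items():
        for match in pattern.finditer(text):
            hits.append((entity_type, match.group(0), match.start()))
    return sorted(hits, key=lambda hit: hit[2])


sample = "Lockheed Martin sold the technology to Rocket Software in 2008."
for hit in tag_entities(sample):
    print(hit)
```

A sentence that mentions an organization missing from the list comes back with no ORG tag at all, which is the practical cost of a system that only finds what it was told to look for.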
In my view, the licensee has to know what he or she is looking for in order to find it. This is a problem captured in the famous snippet, “You don’t know what you don’t know” and the “unknown unknowns” variation popularized by Donald Rumsfeld. Obviously without prior knowledge the utility of an AeroText-type of system has to be matched to mission requirements. AeroText pounded the drum for the semantic Web revolution. One of AeroText’s key functions was its ability to perform the type of markup the Department of Defense required of its XML. The US DoD used a variant called DAML or DARPA Agent Markup Language. Natural language processing, Louella, and AeroText collected the dust of SPARQL, unifying logic, RDF, OWL, ontologies, and other semantic baggage as the system evolved through time.
Also, staff (headcount) and on-going services are required to keep a Louella/AeroText-type system generating relevant and usable outputs. AeroText can find entities, figure out relationships like person to person and person to organization, and tag events like a merger or an arrest “event.” In one briefing about AeroText I attended, I recall that the presenter emphasized that AeroText did not require training. (The subtext for those in the know was that Autonomy required training to deliver actionable outputs.) The presenter did not dwell on the need for manual fiddling with AeroText’s knowledgebases and I did not raise this issue.
June 11, 2014
The news of the $70 million injected into Elasticsearch caused me to check out Crunchbase and some other sources of funding data. I looked at a handful of search and content processing vendors in the departures lounge. I am supposed to be retired, but Zurich beckons.
How large is the market for search and content processing software and services? As a former laborer in the vineyards of Halliburton Nuclear and Booz, Allen & Hamilton, the answer is, “You can charge as much as you want when the customer is in a corner.” The flipside of this adage is, “You can’t charge as much when there are many low cost options.”
In my view, search—regardless of the window dressing slapped on decades old systems and methods—is sort of yesterday. One of the goslings posted a list of Hewlett Packard’s verbal arabesques to explain IDOL search as everything EXCEPT search. The HP verbal arabesques make my point:
Search is not going to generate big money going forward.
Is search (regardless of the words used to describe it) a money pit like the one the Tom Hanks motion picture made vivid?
For that reason, I am wondering what investors are thinking as they pump money into search and content processing companies. The largest revenue generator in the search sector is either Google or Autonomy. Google, as you may know, is in the online advertising business. Search is a Trojan horse. Search is free and the clicks trigger the GoTo/Overture mechanism that caused Google’s moment of inspiration. Before the Google IPO, Google ponied up some dough to Yahoo regarding alleged borrowing of pay to play methods.
Autonomy focused on the enterprise. Between 1996 and October 2011, Sir Michael Lynch grew the company to about $1 billion in revenues. HP’s prescient and always interesting management paid $10.3 billion for Autonomy and then wrote off $8 billion, aimed allegations at the company, and, in general, made it clear that HP was essentially a printer ink business with what seems to be great faith in IDOL, DRE, and assorted rich media tools.
More recently, IBM, the subject of an entertaining analysis, The Decline and Fall of IBM by Robert X. Cringely, suggested that Watson would grow to be a $10 billion in revenue business. Not a goal to ignore. The fact is that Watson is a collection of home grown widgets and open source search technology. I think Watson’s last search contribution was creating a recipe for a tamarind flavored sauce. IBM is probably staffed with folks smarter than I. But a billion dollar bet with a goal of building a revenue stream 10 to 12 times greater than Autonomy’s in one third the time. Wowza.
Let’s do some simple addition in the elegant United lounge.
Let’s assume that IBM and HP actually generate the billions necessary to recover the cost of IDOL and hit the crazy IBM goal of $10 billion in four or five years. To make the math simple, skip interest, the cost of assuaging stakeholders, and the money needed to close deals that total $20 to $25 billion. HP pumps up Autonomy to $10 or $11 billion and IBM tallies another $10 to $12 billion.
So, HP and IBM need or want to build $10 billion or more in revenues from their respective search and content processing ventures. I estimated that the market for “search” was about $1.3 billion in 2006. I am not too sure that market has grown by a significant factor since the economic headwinds began blowing through carpetland.
Now consider the monies invested in some search and content processing companies.
Attensity (sentiment analysis), $90 million
BA Insight (Microsoft centric, search and business intelligence), $14.5 million
Content Analyst (text analysis, SAIC technology), $7.0 million
Coveo (originally all Microsoft all the time, now kitchen sink vendor), $34.7 million
Digital Reasoning (text analysis, no shipping product), $4.2 million
EasyAsk (natural language processing, several owners), $20 million
Elasticsearch (open source search and consulting), $104 million
Hakia (semantic search), $23.5 million
MarkLogic (XML data management and kitchen sink apps), $73.6 million
Recorded Future (text analysis of Web content), $20.9 million
Recommind (similar to Autonomy method), $15 million
Sinequa (proprietary search and widgets), $5.3 million
X1 (search and new management), $12.2 million
ZyLab (search and licensed visualizations), $2.4 million
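The simple addition can be sketched in a few lines of Python. This is only a back-of-the-envelope tally: it assumes the funding figures listed above are accurate, it uses the $1.3 billion 2006 market estimate mentioned earlier as a yardstick, and the names (`FUNDING_MILLIONS`, `total_funding`) are made up for illustration.

```python
# Funding figures in millions of US dollars, copied from the list above.
FUNDING_MILLIONS = {
    "Attensity": 90.0,
    "BA Insight": 14.5,
    "Content Analyst": 7.0,
    "Coveo": 34.7,
    "Digital Reasoning": 4.2,
    "EasyAsk": 20.0,
    "Elasticsearch": 104.0,
    "Hakia": 23.5,
    "MarkLogic": 73.6,
    "Recorded Future": 20.9,
    "Recommind": 15.0,
    "Sinequa": 5.3,
    "X1": 12.2,
    "ZyLab": 2.4,
}

# The 2006 "search" market estimate cited in the text, in millions of dollars.
MARKET_2006_MILLIONS = 1300.0


def total_funding(funding):
    """Add up the funding amounts, rounded to one decimal place."""
    return round(sum(funding.values()), 1)


total_millions = total_funding(FUNDING_MILLIONS)
share_of_market = total_millions / MARKET_2006_MILLIONS

print(f"Vendors listed: {len(FUNDING_MILLIONS)}")
print(f"Total invested: ${total_millions:.1f} million")
print(f"Share of the 2006 market estimate: {share_of_market:.0%}")
```

The tally for the 14 companies comes to roughly $427 million, about a third of that 2006 market estimate.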
May 24, 2014
Most people don’t know that I lived in Brazil in the period before the sheep’s foot rollers crunched through the Brazilian rain forest. The environmental adjustment was due to the need to prepare for the massive Trans Amazon Highway. When the project began to take shape, preparations had to be made. By the time Rodovia Transamazonia became “official,” decades of political and economic preparation had been underway. By the mid 1950s, the need for BR 153 was evident to anyone who tried to go west from any major Brazilian city. It was an airplane or weeks, maybe months, of multi-modal transportation. Need to get across a stream? Chop down trees and put up a “bridge.”
Pretty darned effective I learned first hand. Source: http://bit.ly/1r3uFMY
I recall riding in a Caterpillar bulldozer equipped with two sets of sheep’s foot rollers. Push through the jungle and then drag the rollers over the trees, slow moving animals, and the occasional native’s house, and you are ready to get down to road building. My father, never the environmentally sensitive type, explained that heavy equipment and bulldozing were beautiful: fast, cheap, effective, and potent. And even I, as a child, understood that the natives had to find their future elsewhere. Once the heavy equipment rolled through, the old ways were toast.
I fondly recalled these early lessons from my father, the giant US company for whom he labored as Managing Director, and the stunned look on the faces of the people who lived in the forest and scrubland as we rolled through. In my mind’s eye, I imagine the Hachette professionals have that same look: A mixture of surprise, anger, and confusion. The heavy equipment drivers just shifted gears and crushed forward.
I read “As Publishers Fight Amazon, Books Vanish.” Interesting because the company appears to be bulldozing its way through traditional book publishing. My thought is that when the bulldozers finish, the old way is either gone or too expensive to continue. Savvy natives packed up and moved to favelas and reinvented themselves. Some were entrepreneurs and others tried to recapture a life in a transformed environment.
Digital bulldozers transform business process landscapes with speed and brutal efficiency. My father would have been proud of this approach to business. His one regret would be that Amazon’s corporate colors were not the flashy yellow and black that he so loved.
There were a couple of points in the “real” journalism article I noted. Let me highlight each and make a short comment.
First, “The literary community is fearful and outraged, and practically begging for government intervention.” My thought, “Once the forest has been bulldozed, it is tough to regrow.”
Second, “But the real prize is control of e-books, the future of publishing.” My thought, isn’t the future clear? Hasn’t Amazon won? If it had not won, why then the surprise that the bulldozer crushed traditional business processes the way the bulldozer took out the natives’ houses?
Third, the statement “If this is the new American way [attributed to writer and former advertising professional James Patterson], then maybe it has to be changed—by law, if necessary—immediately, if not sooner.” Catchy statement, but I thought, isn’t it too late? Regrowing that jungle and moving the natives back is a somewhat tough task.
Fourth, Amazon allegedly has been making it tough to buy a biography critical of former Wall Street quant Jeff Bezos. My father did not give interviews either. Guess what? The highway was built through the gut of the Amazon.
And the parable?
Once the landscape is changed, going back gets tough. Modern life is not congruent to Rousseau’s fantasy.
Parts of the Transamazonian experience look like Paramus, New Jersey. Image source: http://bit.ly/1kdwdPz
Amazon, like Google, has been operating for many years, pursuing the same goals, using the mechanisms of online, and building support from people who spend money.
Maybe governments are more powerful than Amazon, Google, and Facebook? The reality, however, is that the bulldozers have already rolled through. The dispossessed, annoyed, and confused can talk. It is going to be very difficult to restore the jungle and the previous way of life.
By the way, search doesn’t work too well on Amazon to begin with. Not being able to find a book is par for Amazon’s course. Bad search helps sales and Amazon’s imperative. I have learned to live with it. Perhaps the publishers, authors, and real journalists should follow my example. Adapt and move on. Yelling at a bulldozer driver and throwing rocks doesn’t change reality.
Stephen E Arnold, May 23, 2014
May 22, 2014
I have a couple of alerts running for the phrase “enterprise search.” The information gathered is not particularly useful. Potentially interesting items like the rather amazing “Future of Search” are not snagged by either Google or Yahoo (Bing). I have noticed a surprising number of alerts about a company doing business as TopSEOS.com. The url is often presented as www.topseos.co.uk and there may be other variants.
Here’s a typical hit in a Google alert. This one appeared on May 22, 2014:
The link leads to a “story” in DigitalJournal.com, a “global media network.” The site is notable because it combines a wide range of topics, tweets, links, categories, and ads. If you want to know more about the service, you can read the about page and get precious little information about this Canadian company. This site appears to be a typical news aggregation service. The “story” is a news release distributed by Google-friendly PRWeb, located in San Francisco.
What is the TopSEOs’ story that appeared as an alert this morning?
The story is a news release about an independent team that evaluates search engine optimization companies. Here’s how the story in my alert looked to me on May 22, 2014:
Several things jumped out at me about the story. First, it lacks substance. The key point is that TopSEOS.co.uk “analyzes market and industry trends in order to remain information of the most important developments which affect the performance of competing companies.” I am not sure exactly what this means, but it sounds sort of important. The link to www.topseos.co.uk redirects to www.uk-topseos.com/rankings-of-best-seo-companies:
April 29, 2014
I read “Consolidation Looms in Business Intelligence, as Tibco Buys Jaspersoft for $185M.” The write up is interesting, but not exactly congruent with my views. May I explain?
The article points out:
Enterprise software vendor TIBCO has acquired Jaspersoft, an open source business intelligence company, for approximately $185 million. It’s not an earth-shaking deal, but it could be a sign of things to come in an analytics software market full of companies and products that have a hard time standing out from the crowd.
MBAs will be drooling at the thought of business intelligence deal making if the article’s premise is correct.
But there are several other angles in this Tibco Jaspersoft tie up.
First, check out the list of open source “leaders.” Jaspersoft appears in the list, but with its number six ranking on the “Top of Mind Emerging Companies in Data Discovery Chart,” the response to this deal might be “Who?” The other question I took away from the Gigaom Research chart was, “Who the heck are SiSense, Logi Analytics, and Roambi?” I can only wonder at what firms account for the “other” category. Tibco bought an open source analytics company that is one of those “we’re open source but commercial too” outfits. The purchase price, compared to the deal for Autonomy, is a rounding error in the Autonomy transaction. I find this interesting because Autonomy IDOL does business intelligence, visualization, and a number of other enterprise software functions as well. My take: Why is an open source business intelligence deal going for what seems to be a bargain price?
Second, Tibco did not buy a search company. Jaspersoft is a business intelligence outfit. But what does “business intelligence” mean? A review of Jaspersoft’s products and services points to analytics; that is to say, math. The cloud angle is interesting, but I am not sure how Tibco will convert open source into a hefty chunk of the astronomical $50 billion market the Gigaom research says is available for the taking. Is analytics business intelligence? At least, I can sort of define “analytics.” I am not so confident about “business intelligence.”
Third, the implications for search and retrieval are not particularly positive. Search vendors with odd ball product line ups are saying, “We are a business intelligence company.” Maybe so. Without a definition of “business intelligence”, search vendors can say almost anything and be “accurate.” For me, search is clearly a marginalized sector. IBM bought Vivisimo and, as one of my editors discovered, promptly discarded Vivisimo’s roots in clustering and metasearch for the foggy description of “information management.” I wonder if some search vendors are in the undefined Gigaom “other” category.
In my view, search and possibly some “business intelligence” vendors may be dismayed by Tibco’s deal. Can investors recoup their funding for their business intelligence bets? There is a big difference between the estimated $20 million IBM paid for the struggling Vivisimo and the $185 million Tibco paid for Jaspersoft when compared to the $1 billion Oracle paid for the aging Endeca technology. I don’t see consolidation. I see “everything must go” opportunities.
Stephen E Arnold, April 29, 2014
March 28, 2014
We learned on March 26, 2014, that the German search vendor Intrafind has been looking for the next big thing. The company may have found it, and we expect that this low profile vendor will be plugging into the Elasticsearch power cable. Wikipedia already has, joining hundreds of other firms looking for a solution to doggy indexing in some other open source centric solutions.
Elasticsearch repackager SearchBlox has rolled out Version 8 of its hosted Elasticsearch system, according to Timo Selvaraj, Co-Founder/VP Product Management of SearchBlox.
As if these two recent developments were not enough, GovWizely, a Washington, DC engineering services firm, has added Elasticsearch to its arsenal. GovWizely, operated by Erik S. Arnold (yep, that’s my boy), has moved adroitly to capitalize on the surging interest in Elasticsearch’s high performance system.
Contrast Elasticsearch’s rise as the go to open source enterprise search system with the struggles of other open source search vendors and some commercial outfits. LucidWorks has ingested $2 million in venture funding, according to Crunchbase. Elasticsearch has received $34 million in funding. Parity, right?
Not so “fast”. (A gentle nod to the fascinating proprietary system shoe horned by Microsoft into SharePoint.) Elasticsearch seems to be catching up to LucidWorks or winning the critical struggle for developers. Here’s the Elasticsearch pitch:
Understated and quiet, according to my engineering team. If the developments at Intrafind, SearchBlox, and Adhere Solutions, among others, are an early warning system, Elasticsearch certainly could be the “next big thing” in search, enterprise and otherwise.
What’s this mean for the proprietary and non open sourcey vendors like Coveo, Funnelback, Lexmark ISYS, and Hewlett Packard? I would suggest that these firms’ management have to adapt to what appears to be an emergent and disruptive force in information processing. If Elasticsearch does emulate the growth of the pre HP Autonomy, the likelihood is that the millions of venture funding pumped into search funding and search acquiring may never be repaid. Chilling thought for some stakeholders who may have jumped on the wrong horse and seem compelled to continue to feed the nag fresh, expensive, non recoverable “clover.” (Think millions in hard cash funding with little to show that a payback is imminent or even possible.)