Big Outfits Buy Search Vendors: Does Chaos Commence?
May 25, 2012
I don’t want to mention any specifics in this write up. I have a for-fee Overflight on the subject. I do want to highlight some of the preliminary thoughts the goslings and I collected before creating our client-focused analysis. This write up was sparked by the recent news that the founder of Autonomy, which HP acquired for $10 billion, is seeking new opportunities after eight months immersed in the HP way. See “Hewlett-Packard Can’t Say It Wasn’t Warned about Autonomy.” This write up contained a remarkable statement, even when measured against the work of other “real” journalists:
Some will say this is a classic case of an entrepreneurial business being bought by a hulking, bureaucratic institution which failed to integrate it and failed to understand its culture. Others will say HP, desperate to do a deal, simply overpaid for a company that was going to struggle to maintain its sales and earnings momentum and was deluded about its abilities. Certainly warnings about the latter were there for HP to see before it handed over all that cash. Here’s what Marc Geall, a Deutsche Bank analyst who used to work at Autonomy, said in October 2010 about the business model: “…investment in the business has lagged revenues… [which] could affect customer satisfaction towards the product and the value it delivers.” He went on to warn that Autonomy’s service business was “too lean” and that it “risks falling short of standards demanded by customers”. All of which prompted Geall to question whether the company needed to change its business model – “traditionally, software companies have needed to change their business models at around $1bn in revenues”.
Yep, now the issues are easy to identify: the brutal cost of customer support, the yawning maw of research and development, the time and cost of customizing a system. The problem is that these issues have been identified. However, senior managers looking for the next big thing are extremely confident of their business and technical acumen. Search is a slam dunk. Heck, I can find what I want in Google. How tough can it be to find that purchase order? That confidence may work in business school, but it has not worked in the wild-and-crazy world of enterprise search and content processing.
Think back to the notable search acquisitions over the last few years. Here are some to jump start your memory:
- IBM in 2005 and 2006 purchased iPhrase (a MarkLogic precursor with semantic components) and Language Analysis Systems (a next generation content processing vendor)
- Microsoft acquired Powerset and Fast Search & Transfer in the 2008 to 2009 period. Both vendors had next-generation systems with semantic, natural language processing, and other near-magical capabilities
- Oracle acquired TripleHop in 2005, focused on its less-and-less visible Secure Enterprise Search line up (SES10g and SES11g), then went on a buying spree to snap up InQuira (actually the company formed when two weaker players, Answerfriend Inc. and Electric Knowledge Inc., merged in 2002 or 2003), RightNow (which uses the Q-Go natural language processing system purchased in 2010 or 2011), and Endeca, an established search vendor with technology dating from the late 1990s
- SAP snagged some search functions with its NetWeaver buy in 2004 which coexisted in a truce of sorts with the SAP TREX system. When SAP bought Business Objects in 2007, the company inherited Inxight Software, a text analytics vendor with assorted wizardry explained in buzzwords by marketing mavens.
So what have we learned from these buy outs by big companies? Here are the observations:
First, search and content processing does not behave the way other types of software learn to sit, come, and roll over. The MBAs, lawyers, and accountants issue commands like good organizational team players. The enterprise search and content processing crowd listens to the management edicts with bemusement. Everyone thinks search is a slam dunk. How tough can a utility function be? Well, let me remind you, gentle reader, search is pretty darned difficult. Unlike a cloud service for managing contacts, search is not one thing. Furthermore, those who have to use search are generally annoyed because systems have since 1970 failed to generate answers. Search outputs create more work. Usually the outputs are mostly wide of the mark. Big companies want to sell a software product or service that solves a problem like what is the back log for the Midwestern region or when did I last call Mr. Jones? The big companies don’t get this type of system when they buy, often for a premium, companies which purport to make content findable, smart, and accessible. So we have a situation in which a sales presentation whets the appetite of the big company executive who perceives himself or herself as an expert in search. Then when anticipation is at its peak, the sales person closes the deal. In the aftermath, the executives realize that search just does not follow the groove of an accounting system, a videoconferencing system, or a security system. Panic sets in, and you get crazy actions. IBM pretty much jettisoned its search systems and fell in love with open source Lucene / Solr. Good enough was a lot better than trying to figure out the mysteries of proprietary search and how to pay for the brutal research and development costs search requires.
Second, search is a moving target. I find that as recently as my meetings with sleek MBAs from six major financial firms, search was assumed to be a no brainer. Google has figured out search. Move on. When I asked the group how many considered themselves experts in search, everyone replied, “Yes.” I submit that none of these well-paid movers-and-shakers are very good at search and retrieval. Few of them have the time or patience for old fashioned research. Most get information from colleagues, via phone calls which include “I have a hard stop in five minutes”, and emails sent to people whom they have met at social functions or at conferences. Search is not looking up a phone number. Search is not slamming the name of a company into Google. Search is not wandering around midtown Manhattan with an iPhone displaying the location of a pizza joint. Search is whatever the user wishes to find, access, know, or learn at any point in time and in any context. Google is okay at some search functions. Other vendors are okay at others. The problem is that virtually all search and retrieval solutions are okay. People have been trying for about 50 years to deliver responses to queries that are what the user requires. Most systems dissatisfy more than half their users and have for 50 years. A big company buying a next generation search system wants these problems solved. The big company wants to close deals, get client access licenses, or cloud transactions for queries. But the big companies don’t get these things, so the MBAs, lawyers, and accountants are really confused. Confused people make crazy decisions. You get the idea.
Third, search does not mean search. Search technology includes figuring out which words to index in a document. Search does a miserable job of indexing videos unless the video audio track is converted to ASCII and then that ASCII is indexed. Even with this type of content processing system, search does not deliver a usable output. What a user gets is garbled snippets and maybe the opportunity to look at a video to figure out if the information is relevant. Search includes figuring out what a user wants before the user asks the question or even knows what the question is. One company is collecting millions in venture money to achieve this goal. Good luck on that. Search includes providing outputs that answer an employee’s specific question. Most systems provide a horseshoe type of result; that is, the search vendor wants points for getting close to the answer. Employees who have to click, scan, close, and repeat the process are not amused. The employee wants the Smith invoice from April, not increased risk of carpal tunnel problems. The poobahs who acquire search companies want none of these excuses. The poobahs want sales. What search acquisitions generate are increased costs, long sales cycles, and much friction. Marketers overstate and search systems routinely under deliver.
Who cares?

Another enterprise search train wreck. The engineer was either an MBA, an accountant, or a lawyer. No big deal. Just get another search train. How tough can it be to run a search system? Thanks to http://www.eccchistory.org/CCRailroads.htm
Well, the executives selling big companies a search and content processing company just want the money. After years of backbreaking effort to generate revenues, the founders usually figure out that there are easier ways to earn a living. If the founders don’t bail out, they get a new job or become a guru at a venture capital firm.
Google and Going Beyond Search
May 17, 2012
The idea for this blog began when I worked through selected Ramanathan Guha patent documents. I have analyzed these in my 2007 Google Version 2. If you are not familiar with them, you may want to take a moment, download these items, and read the “background” and “claims” sections of each. Here are several filings I found interesting:
- US2007 003 8600
- US2007 003 8601
- US2007 003 8603
- US2007 003 8614
- US2007 003 8616
The utility of Dr. Guha’s invention is roughly similar to the type of question answering supported by WolframAlpha. However, there are a number of significant differences. I have explored these in the chapter in The Google Legacy “Google and the Programmable Search Engine.”
I read with interest the different explanations of Google’s most recent enhancement to its search results page. I am not too eager to highlight “Introducing the Knowledge Graph: Things, Not Strings” because it introduces terminology which is more poetic and metaphorical than descriptive. Nevertheless, you will want to take a look at how Google explains its “new” approach. Keep in mind that some of the functions appear in patent documents and technical papers which date from 2006 or earlier. The question this begs is, “Why the delay?” Is the roll out strategic in that it will have an impact on Facebook at a critical point in the company’s timeline, or is it evidence that Google experiences “big company friction” when it attempts to move from demonstration to production implementation of a mash up variant?
In the various analyses by experts, “real” journalists, and folks who are fascinated with how Google search is evolving, I am concerned that some experts describe the additional content as “junk” and others view the new approach as “firing back at Bing.”
You must reach your own conclusion. However, I want to capture my observations before they slip from my increasingly frail short term memory.
First, Google operates its own way and in a “Google bubble.” Because the engineers and managers are quite intelligent, clever actions and economy are highly prized. Therefore, the roll out of the new interface tackles several issues at one time. I think of the new interface and its timing as a Google multiple war head weapon. The interface takes a swipe at Facebook, Bing, and Wolfram Alpha. And it captures linkage, wordage, and puffage from the experts, pundits, and wizards. So far, all good for Google.
A MIRV deployment. A single delivery method releases a number of explosive payloads. One or more may hit a target.
Second, the action reveals that Google *had* fallen behind in relevancy, inclusion of new content types, and generating outputs which match the “I have no time or patience for research” user community. If someone types Lady Gaga, the new interface delivers Lady Gaga by golly. Even the most attention deprived Web or mobile user can find information about Lady Gaga, click, explore, and surf within a Guha walled garden. The new approach, in my view, delivers more time on Google outputs and increases the number of opportunities to display ads. Google needs to pump those ads for many reasons, not the least of which is maintaining revenue growth in the harsh reality of rising costs.
Third, the approach allows Google to weave in or at least make a case to advertisers that it is getting on its social pony, collecting more fine grained user data, and offering a “better search experience.” The sales pitch side of the new interface is part of Google’s effort to win and retain advertisers. I have to remind myself that some advertisers are starting to realize that “old fashioned” advertising still works for some products and concepts; for example, space advertising in certain publications, direct mail, and causing mostly anonymous Web surfers to visit a Web site and spit out a request for more information or, better yet, buy something.
The new interfaces, however, are dense. I point out in the Information Today column which runs next month that the density is a throw back to the portal approaches of the mid 1990s. There are three columns, dozens of links, and many things with which to entice the clueless user.
In short, we are now in the midst of the portalization of search. When I look for information, I want a list of relevant documents. I want to access those documents, read them, and in some cases, summarize or extract factoids from them. I do not want answers generated by someone else, even if that someone is tapping into the formidable intelligence of Ramanathan Guha.
http://www.billdolson.com/SkyGround/reentryseries/reentryseries.htm
So Google has gone beyond search. The problem is that I don’t want to go there via the Google, Bing, or any other intermediary’s intellectual training wheels. I want to read, think, decide, and formulate my view. In short, I like the dirty, painful research process.
Stephen E Arnold, May 17, 2012
Sponsored by Polyspot
Gartner, A Former Gartner Person, and Ego
May 14, 2012
Computerworld is supposed to be about computers. Now I don’t think too much about Computerworld era computers any more. I think that the owner of Computerworld was gung ho on Verity search once. That told me a great deal about Computerworld’s parent company.
The story is “Can a New Analyst Firm Take Down Gartner?” Wow. Quite an amazing write up. Sprawled across three pages, the story is written by a person about whom I know quite a lot after reading the “real” news in Computerworld; for example:
- The author of the story is Rob Enderle who is a big wheel and apparently the brains behind the Enderle Group.
- Mr. Enderle worked at Forrester (an azure chip outfit explaining what’s what in all things related to anything that computes) and Giga Information Group (ditto the Forrester services), and is a professional who has “worked for” IBM. He worked on audits, competitive analysis, marketing, finance, and security.
- Mr. Enderle is a TV talent type for CNBC, Fox (a Murdoch “real” journalism outfit), Bloomberg, and NPR.
- Mr. Enderle “knows” Gideon Gartner, the brains behind the Gartner we know and love today as a publicly traded azure chip consulting firm.
- Mr. Enderle “helped found” the Giga Information Group.
- Mr. Enderle knows that “line management…doesn’t listen to Gartner and, for that matter, often doesn’t listen to IT either.”
There are other biographical nuggets in the write up too. Mr. Enderle “knows” Gideon Gartner. Be still my heart!
The main point is that an outfit involved in social CRM, hypothetically and mostly without factual basis, just might be able to “take down Gartner.”
Yowza.
What does the kitty see when it looks in the mirror? A house pet or a wild lion?
The super hero in this story is a company called Ombud, which I assume is shorthand for ombudsman, a full time equivalent who is supposed to be a pair of ears with moist eyes and a warm nature able to solve a customer’s problem. I don’t know any ombudsmen, however. Those characteristics often match up with social workers in my experience.
There were several overt main points in the story about Ombud which I found more like search engine optimization and ego marketing. For instance:
First, I learned:
Gartner Group was conceived well before social networking, at a time when there not only was no Internet but no PCs. It seemed that it wouldn’t be long before someone would figure out how to blend experts, practitioners and vendors into a service that would be cheaper, more current and more focused on the unique needs of an individual company, thus providing more real value (regardless of price) than the older model.
Er, so what? Ombud is a Web site for a company which offers the same pay to play information which comes from most azure chip and blue chip consulting firms. Check ‘em out yourself at www.ombud.com.
Second, unlike Gartner and I assume any other consulting outfit, Ombud sells “access to RFPs which users create and vendors bid on.” I think the idea is that one can eliminate intermediaries, post a request for work, get bids, and pick a vendor. The organization just goes direct. I know how poorly the traditional procurement process works, but I am sure that a Fortune 50 company will experiment with Ombud. Anything that cuts the burdensome fees imposed by azure chip consultants is a good thing for most chief financial officers.
The Courier Journal: A Louisville Death Rattle
May 13, 2012
In 1981, I joined the Courier Journal and Louisville Times. That was 31 years ago. I am not sure how I made the decision to leave the Washington, DC, area to journey to a city whose zip code and telephone area code were unknown to me. I am a 212, 202, and 301 type of person.
I recall meeting Barry Bingham Jr. He asked me what I did in my spare time. I was thunderstruck. My former employers—Halliburton Nuclear Utility Services and Booz, Allen & Hamilton—never asked me those questions. Those high powered, hard charging outfits wanted to know how much revenue I had generated and how much money I had saved the company, when the next meeting with the Joint Committee on Atomic Energy was, and how the Cleveland Design & Development man trip vehicle was rolling along. The personal stuff floored me.
I did not have an answer. As a Type A, Midwestern, over-achieving, no-brothers-and-no-sisters worker bee, fun was not a big part of my personal repertoire.
I asked him, “Why?”
I recall to this day his answer, “I want our officers and employees to have time with their families, get involved in the community, and do great work without getting into that New York City thing.”
Interesting. The Courier Journal had a very good reputation. The newspaper was profitable, operated a wide range of businesses, printed the New York Times’s magazine for the Gray Lady, and operated a commercial database company. In fact, in 1980 the Courier Journal was one of the leaders in commercial online information, competing with a handful of other companies in the delivery of information via digital channels, not the dead-tree, ruin-the-environment, and dump-chemicals approach of most publishing companies.
In 1986, Gannett bought the Courier Journal. The commercial database unit was of zero interest to Gannett, so it and I were sold to Bell+Howell. After a short stint at a company entrenched in 16 mm motion film projectors, I headed back to New York City.
I retained my residence in Louisville, and I have watched the trajectory of the Courier Journal as it moved forward.
I have to be blunt. The Courier Journal is not the newspaper, the company, or the community force it was when I joined Mr. Bingham and a surprisingly diverse, bright, forward-looking team 31 years ago. The 1981 management approach of the Courier Journal was a culture shock to me. Think of the difference between Dick Cheney and Mr. Rogers. The 2012 approach saddens me.
This morning I read “Answering Your Questions on CJ Changes,” written by a person whom I do not know. The author of the article is Wesley Jackson, publisher of the Courier Journal. (I never liked the acronym CJ and still do not.)
The main point of the article is that the Courier Journal has to raise its prices. Last week, Mr. Jackson wrote a short article in the Courier Journal informing subscribers a letter would arrive explaining the new services that would be available. We received our letter on Wednesday, May 9, 2012. We called on Thursday, May 10, 2012, and cancelled our subscription. I am not sure how many other subscribers took this action, but a sufficient number of Courier Journal readers called to kill the phone system at the newspaper.
Mr. Jackson wrote this morning:
Unfortunately our Customer Service Center’s phone system had technical problems, and many of you had long wait times or could not get through to get your questions answered. That I know was frustrating.
I bet. I would love to see the data about the number of calls and the number of cancellations that the paper received when it announced the rate hike, a free iPad application for subscribers, and an email copy of the newspaper sent each day to paying customers.
The write up troubled me for several other reasons:
- Some of the word choices were of the touchy-feely school of communication. There are 19 “we’s.” The word “value” appears twice; there are seven categoricals (six all’s and one never); and the word “conversation” appears twice.
- There is at least one split infinitive: “to personally apologize.”
- An absolutely amazing promise expressed in this statement: “For those of you who would like to ask questions directly, please email me at publisher@courier-journal.com or send a letter to Publisher, Courier-Journal Media, 525 W. Broadway, Louisville, KY 40202. I promise you will each receive a response.”
“Promise,” “all,” and “never”—yep, I believe those assertions.
I would have included an image of Wesley Jackson but I had to pay for it. Not today, sorry.
My view is that I hear a death rattle from the Courier Journal. The reality of the newspaper is that it runs more and more syndicated content. The type of local coverage for which the paper was known when I joined in 1981 has decreased over the years. When I want news, I look at online services. What I have noticed is that what appears in the Courier Journal has been mentioned on Facebook, Twitter, or headline aggregation services two or three days before the information appears in either the Courier Journal’s hard copy edition or its online site, www.courier-journal.com.
Dave Kellogg, the former president of MarkLogic, used to chide me that I should not refer to major publishing operations as “dead tree publishers.” My view was and is that I am entitled to my opinion. Traditional publishing companies have failed to respond to new opportunities to disseminate and profit from information opportunities.
The list of mistakes includes:
- Belief that an app will generate new revenue. Unfortunately apps are not automatic money machines. (Print-centric apps are not the go-to medium for many digital device users.)
- Assumptions about a person’s appetite for paying for “nice to have content.” (One pays for “must have” content, not “nice to have” content.)
- Failure to control costs. (Print margins continue to narrow as traditional publishers try to regain the glory of the pre digital business models.)
- Firing staff who then go on to compete by generating content funded by a different business model. (This blog is an example. We do online advertising and inclusions and sell technical services. For some reason, this works for me thanks to my team which includes some former “real” journalists.)
- Assuming that new technology for printing color on newsprint equips an information technology department to handle other information technologies in an effective manner. (Skill in one technical area does not automatically transfer to another technical field.)
I can hear the labored breathing of a local newspaper struggling to stay alive. What do you hear?
Stephen E Arnold, May 13, 2012
Sponsored by HighGainBlog, which is ArnoldIT
Inktomi and Fast Search: Two Troubled Search Companies, One Lesson
May 8, 2012
I found the write up by Diego Basch interesting and thought provoking. I have a little experience with Inktomi. For the original FirstGov.gov system, the US government used Inktomi for the public facing index of US government unclassified information. (FirstGov.gov is now www.usa.gov)
Inktomi had in 2000 a “ready to go” index of content from Dot Gov Web sites. The firm’s business model matched the needs of the US government. There were the normal contracting and technical hurdles for a modestly sized US government project with a fairly tight timeline. No big deal. Job done. Inktomi worked.
When I read “A Relevant Tale: How Google Killed Inktomi,” I thought the write up had some useful information. However, I don’t think Google killed Inktomi or any other search system. Google did not kill Fast Search & Transfer, Excite, HotBot, or any other search system in its rise to its alleged 65 percent share of the search market. (Google share is actually much higher, based on my analyses.)
Excite’s early 1997 attempt at portalization. Can you spot the search box? Does this look like the current version of Google? Say, “No.” Now log into Google and run a query for rental car. Now do you see the similarity between the early portal craziness and the modern Google? I do.
What killed off these outfits was their business models. Let me explain using Inktomi and Fast Search as examples. I could cite other cases, but these two are okay for a free blog post for the two or three readers I have.
Inktomi, for whatever reason, concluded that people wanted to offer search, not do the heavy lifting. In the portal fever that was raging from 1998 to 2001, Web sites wanted to be the “front page” of the Internet. The result was that America Online, Excite, Lycos, and Yahoo among others jammed links on the splash page. At one time, I counted more than 60 links on the Excite home page. Once I hit 50 links, I quit counting. My eyes and patience can cope with three to five things. More than that, and I move on.
Inktomi’s analysts did the spreadsheet fever thing, making assumptions about how many Web sites would license Inktomi results, pay Inktomi’s fees, and generate revenue from the front page of the Internet craziness. The reality was that Inktomi did not have enough customers to support the cost of the spidering, bandwidth, investment in performance, research and development for precision and recall, and the other costs that are underestimated or just ignored. The result was the collapse of the company.
Oracle and SAP: The Milagro Database War
May 3, 2012
I received an email inducing me to read “Hana and Exalytics: SAP’s Hype Versus Oracle’s FUD.” The write up takes a serious or at least semi serious look at the Milagro database war. If you are not familiar with the Milagro Beanfield War, you might find the write up a loose allegory of what’s happening in traditional data management companies and the NoSQL farmers.

The Information Week write up does not talk about the real story, however. What we get is two giants of traditional enterprise software squabbling over which traditional data management system is most likely to keep the Fortune 1000, government agencies, and big educational institutions within the traditional enterprise software corral.
With regard to Oracle, the write up asserts:
Oracle’s Larry Ellison and Safra Catz have missed few opportunities to discredit Hana in recent months. But executive VP Thomas Kurian took the slams a level deeper on Friday with a one-hour Webinar clearly intended to sow seeds of fear, uncertainty and doubt in the minds of would-be Hana customers. The session was billed as an Exalytics seminar, but each point set up a contrast with Hana. Kurian claimed, among other things, that SAP’s product costs five times to 50 times more than Exalytics and that it doesn’t support SQL (relational) or MDX (multidimensional) query languages, requiring apps to be rewritten to run on the new database.
The Information Week write up reports:
SAP’s hype about these apps is getting a little ahead of deployed market reality. Both Hana and Oracle Exalytics can point to dramatic before-and-after differences in query speeds. (Even SAP grants that Exalytics can accelerate queries.) SAP says the real payoff from Hana will be in transforming business processes, not just accelerating queries. But we haven’t seen enough solid, real-world customer examples documenting transformed business competitiveness.
The Open Source Search Ostriches
April 30, 2012
ArnoldIT, located in Harrod’s Creek, Kentucky, has spotted a new species of search, content processing, and text mining vendor: The Scrutans Struthioniformes. Believed to be related to the ratites, this new subspecies is known to be indifferent to or ignorant of the predator from the open source jungle.
The proprietary search vendor, Scrutans Struthioniformes, ignores the impact of open source search and information retrieval systems.
ArnoldIT has completed a couple of exploratory expeditions through the wilds of open source search, clustering, and related disciplines. Sparked by the bimonthly feature on open source search which is currently appearing in Information Today’s Online Magazine, the discovery of the Scrutans Struthioniformes was unexpected.
For almost 50 years, information retrieval meant proprietary systems built upon innovations by academic researchers. Whether the influence was from the number crunching of the Cornell school or the semantic shenanigans from Stanford, search and retrieval translated to:
- Expensive to license, install, optimize, and maintain systems
- Licensing restrictions which prevented client-specific tailoring and fast cycle problem remediation or feature addition
- High levels of user dissatisfaction from the CFO’s office (the lady who pays the bills) to the user in the sales department (the person who has to find out what happened to a particular customer’s order).
What’s changed, according to ArnoldIT, is that open source options are readily available. Smart outfits like IBM killed off in house, brute force search efforts and embraced the open source Lucene/Solr technology. IBM is a proprietary outfit, but the use of Lucene/Solr allowed more effort to be put into value-adding projects such as the “wrappers” which make Watson a game show winner. IBM has also used its billions to purchase proprietary vendors to deliver “additional value.” The purchase of Vivisimo is a good example of a quick way to get clustering, deduping, and federating functions to bolt on the open source plumbing. IBM may disagree, but we have our views.
Other vendors have built businesses on open source search. One example is the emergence of Lucid Imagination and its Lucid Works Enterprise 2.0 solution. Licensees get speedy search and retrieval, a staff able to answer questions, and the rapid cycle innovation of the open source Lucene/Solr software.
Clever Amazon is a “sort of” open outfit. On one hand, the company uses open source software to make the Amazon cloud work. On the other, the CloudSearch solution is based on A9. Amazon, however, provides “sort of” open application programming interfaces. Open source as a business angle is part of the CloudSearch play along with making life easy for developers to deliver “good enough” search.
The Basho Riak Search angle is a variation. Riak Search is proprietary but Basho has made it open source. (A free profile of Basho is available by registering at TheSeed2020, an ArnoldIT content delivery Web site.) Good citizens and good marketing. For a company with a problem which requires Basho data management, the Riak Search solution is available, and it is open source.
There are other variations as well, and these are explained in the ArnoldIT briefing about open source search, its opportunities, and its challenges. Unlike the technology payloads delivered by blogs, the ArnoldIT briefing focuses on the business angle of open source search, and the research has delivered some shockers; for example:
- In a sample of 35 proprietary search vendors, 25 assert that their systems are in some way open source. Good marketing, better technology, or great hyperbole?
- In a sample of 100 search vendors, two thirds of those pinged by ArnoldIT know about or are on top of open source search. Quite an assertion as the Lucid Imagination Lucene Revolution approaches with dozens of case studies that reveal large companies’ willingness to shift from proprietary solutions to open source search. Are most vendors of proprietary search systems ignoring reality? Sure looks like some are confident the search world tomorrow will look the way it did in 2003.
- Hosted search is gaining traction in some specific niches. Two of these niches have long been dominated by proprietary systems. More surprising is the fact that the greatest inroads are being made among the Fortune 1000. That’s the market where the money often is for enterprise software vendors.
Will vendors of proprietary search and retrieval systems be able to keep their investors and stakeholders happy as open source becomes a greater force in 2013? The briefing considers the scenario when firms pour more funds into open source search and content processing start ups. If this happens, life becomes more difficult for “on the bubble” vendors of taxonomy, clustering, search, and basic information retrieval systems.
Net net: Another search revolution is brewing. Is your proprietary search vendor a Scrutans Struthioniformes? A better question: Are you? For more information about the ArnoldIT open source search briefing, write seaky2000 at yahoo dot com for options and fees. ArnoldIT may create an open source search ostrich T shirt. Stay tuned. Max and Tess are working on this project now.
Stephen E Arnold, April 30, 2012
Sponsored by Ikanow
IBM Buys Vivisimo Allegedly for Its Big Data Prowess
April 25, 2012
Big data. Wow. That’s an angle only a public relations person with a degree in 20th century American literature could craft. Vivisimo is many things, but a big data system? News to me for sure.
IBM has been a strong consumer and integrator of open source search solutions. Watson, the game show winner, used Lucene with IBM wrapper software to keep the folks in Jeopardy post production on their toes.
A screen shot of the Vivisimo Velocity system displaying search results for the RAND organization. Notice the folders in the left hand panel. The interface reveals Vivisimo’s roots in traditional search and retrieval. The federating function operates behind the scenes. The newest versions of Velocity permit a user to annotate a search hit so the system will boost it in subsequent queries if the comment is positive. A negative rating on a result suppresses that result.
I learned that IBM allegedly purchased Vivisimo, a company which I have covered in my various monographs about search and content processing. Forbes ran a story which was at odds with my understanding of what the Vivisimo technology actually does. Here’s the Forbes title: “IBM To Buy Vivisimo; Expands Bet On Big Data Analytics.” Notice the phrase “big data analytics.”
Why do I point out the “big data” buzzword? The reasons include:
- Vivisimo has a clustering method which takes search results and groups them, placing similar results identified by the method in “folders”
- Vivisimo has a federating method which, like Bright Planet’s and Deep Web Technologies’, takes a user’s query and sends the query to two or more indexing systems, retrieves the results, and displays them to the user
- Vivisimo has a clever de-duplication method which makes the results list present one item. This is important when one encounters a news story which appears on multiple Web sites.
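For readers who want to see how modest these three functions are in principle, here is a toy Python sketch of federating, de-duplicating, and foldering. This is not Vivisimo's method, which is proprietary and far more sophisticated. The two in-memory indexes, the sample documents, the folder terms, and every function name are assumptions invented for illustration.

```python
from collections import defaultdict

# Two toy "indexing systems". The documents and source names are invented.
SOURCES = {
    "news_index": [
        {"title": "Cell Battery Recall Announced", "url": "http://a.example/1"},
        {"title": "Prison cell overcrowding report", "url": "http://a.example/2"},
    ],
    "web_index": [
        {"title": "Cell battery recall announced", "url": "http://b.example/9"},
        {"title": "Cell biology primer", "url": "http://b.example/3"},
    ],
}

# Hypothetical folder labels; a real clustering engine derives these from the results.
FOLDER_TERMS = ("battery", "biology", "prison")


def search(index, query):
    """Naive substring match standing in for a real search engine."""
    q = query.lower()
    return [doc for doc in index if q in doc["title"].lower()]


def federate(query, sources):
    """Send the same query to every index and pool the hits."""
    hits = []
    for name, index in sources.items():
        for doc in search(index, query):
            hits.append({**doc, "source": name})
    return hits


def dedupe(hits):
    """Keep the first hit for each normalized title (case and spacing ignored)."""
    seen = {}
    for hit in hits:
        key = " ".join(hit["title"].lower().split())
        seen.setdefault(key, hit)
    return list(seen.values())


def cluster(hits, terms=FOLDER_TERMS):
    """Place each hit in the first folder whose term appears in its title."""
    folders = defaultdict(list)
    for hit in hits:
        words = hit["title"].lower().split()
        label = next((t for t in terms if t in words), "other")
        folders[label].append(hit["title"])
    return dict(folders)


def federated_search(query):
    return cluster(dedupe(federate(query, SOURCES)))
```

Calling `federated_search("cell")` pools four hits from the two indexes, collapses the recall story which appears in both, and files the three survivors into folders. Useful plumbing, but nothing in it involves large data volumes.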
According to the write up in Forbes, a “real” news outfit:
IBM this morning said it has agreed to acquire Vivisimo, a Pittsburgh-based provider of big data access and analysis tools.
Okay, but in Beyond Search we have documented that Vivisimo followed this trajectory in its sales and marketing efforts since the company opened for business in 2000. In fact, the Wikipedia write up about Vivisimo says this:
Vivisimo is a privately held enterprise search software company in Pittsburgh that develops and sells software products to improve search on the web and in enterprises. The focus of Vivisimo’s research thus far has been the concept of clustering search results based on topic: for example, dividing the results of a search for “cell” into groups like “biology,” “battery,” and “prison.” This process allows users to intuitively narrow their search results to a particular category or browse through related fields of information, and seeks to avoid the “overload” problem of sorting through too many results.
Conversation? I Think Not
April 23, 2012
In my dead tree edition of the New York Times, I read “The Flight from Conversation” by an MIT professor and author. The newspaper put the story on page one of the Sunday Review section with a jump to pages six and seven. The online version was visible to me this morning (April 23, 2012) as “Opinion. The Flight from Conversation.” I am never sure which New York Times story will be available to whom or for how long, so you are on your own if you get a 404 or a begging for dollars screen.
What I know is that “conversation” is idealized in today’s thumb typing world. Defining conversation is useful. Holding a conversation is getting to be an exercise in human interaction archaeology.
Does this Thomas Kinkade painting represent a real place? Does discourse today provide “conversation” or an idealized notion of give and take among and between individuals?
Straight away let me say that I found the write up interesting because it was chock full of “hooks”. I had a boss at Booz, Allen & Hamilton in the days when the firm had a pretty good reputation for management and technology consulting. This particular manager collected “hook phrases,” which he hoped to use in his reports, speeches, and his various writings. On my first pass through the Flight article I noted these keepers:
- Devices change what we do and who we are
- Turn desks into cockpits
- The Goldilocks [sic] effect
- Put ourselves on cable news
- Automatic listeners
- Confuse conversation with connection
- Illusion of companionship without the demands of relationship
- New devices have turned being alone into a problem that can be solved
- Device free zones
- Casual Fridays and conversational Thursdays
Quite a payload. Upon reviewing my collection of hooks from the essay, I concluded that the author should be working for CNN or CNBC.
The key point of the write up is that instead of engaging in conversation, the thumb typing generation likes being with people and being online. I agree. The notion of checking email in the middle of a face to face conversation with a person at KY Fry or lunch chatter at a trade show often warrants some digital supplements. I get paid to attend trade shows, but not even money can cut through the marketing blather, the pitches from consultants looking for work, and speakers who are nervous about giving a talk which will avoid controversy, make a good impression, and sell someone something.
My concern is not about the essay. The anomie of modern society has been an idea kicking around since I experienced college lectures from razor sharp academics. I started thinking about the assumptions on which the essay rests. For example, how easy is it at MIT or any big name university focused on funding, start ups, and getting faculty to function as magnets which pull cash for chairs to get faculty to make themselves available for students who want a conversation? Are those office hours real or the academic equivalent of vaporware?
New Open Source Search Information Service Available
April 16, 2012
Open source search was not a viable option for the enterprise in 2003 when ArnoldIT started work on the first Enterprise Search Report. Stephen E. Arnold wrote two more editions before he decided that proprietary search solutions were becoming “look alikes.” In the ArnoldIT 2011 study, The New Landscape of Enterprise Search, Stephen E Arnold and his editorial team decided not to cover open source search solutions because the sector was moving rapidly and no large players had emerged. Now almost a year after the New Landscape of Enterprise Search, the pace of innovation has increased significantly and there are some significant commercial open source search ventures in the US and elsewhere.
The ArnoldIT editorial team, which consists of librarians and technologists, recommended that we begin the task of identifying important articles to determine if there were sufficient mass to warrant a Beyond Search type of publication focused on open source search. We concluded that there was an increasing flow of information about open source search. Therefore, we want to share this information with others who have an interest in what is shaping up to be a disruptive force in information retrieval.
We want to help document that there is a new approach to enterprise search. The solutions involve the cloud, toolkits, and ready-to-run services available with a mouse click. The vendors pushing forward include companies which have an established profile in the business community; for instance, IBM and Lucid Imagination. There are also some open source search solutions which are not widely known in certain organizations; Xapian and Summa Summix come to mind. In between there are dozens of open source search, content processing, and hybrid services.
ArnoldIT recently completed a study of open source search options. After finishing our research for a client, we decided to move forward on a new information service. OpenSearchNews.com will discuss big data search solutions, including Amazon’s CloudSearch service, Basho Riak, and Constellio. If you are not familiar with these solutions and have an interest in search, you will want to check out OpenSearchNews.com.
The new microsite, now publicly available at www.opensearchnews.com, publishes Monday through Friday. It provides critical commentary and information about products, and it highlights additional sources about open source search. The information service will report about the companies, trends, and products which offer an alternative to the seven figure solutions from proprietary enterprise search vendors. The approach of the service will be similar to that taken by researchers who want information that provides essential facts and links to high-value sources of information. The service will provide up-to-date news and analysis about the dynamic market for open source search. Additional information about the new information service is available on the site’s About page. Keep in mind that we don’t do “real” news. We have more in common with researchers and analysts than those who work for organizations embracing the tenets of Mr. Murdoch.
Recent stories include:
- Enterprise Adoption of Solr Lucene Rises
- In the Future, Enterprise Search Will Be a Service
- Lucid Works 2.0 Attracts Enterprise Suitors
Emily Aldridge, the editor of the publication, is an MLS and expert searcher who demonstrated exceptional capabilities in tracking down information about products and projects with names like Hounder, Oxyus, and Piscator.
Emily Aldridge, editor of the new information service, said:
“Open source search has become a fast-growing segment of the enterprise search and big data markets. The number of companies competing in this segment is growing. Large commercial enterprises are embracing open source and providing useful software to anyone who wants to use it. Two good examples are the contributions of Lucid Imagination and LinkedIn. The Danish government has supported an open source search initiative which provides search features for libraries looking to provide a patron with a single search box for a range of content in different collections.”
The information service will cover cloud solutions, open source search appliances, and mention commercial services which have open source software under the glossy exteriors of products and services from Amazon and IBM. We will also cover related subjects such as proprietary cloud search services. Comments will be accepted, and like other ArnoldIT information services we hope to combine useful information with some pointed observations.
Like Beyond Search, we will roll out new features and functions over time. We plan to use Google’s AdSense to help offset the cost of producing the service. If you want to learn more about the publication, contact us at seaky2000 at yahoo dot com.
Don C. Anderson, Senior Engineer, ArnoldIT, April 16, 2012
Sponsored by Pandia.com









