Social Search: Don Quixote Is Alive and Well
January 18, 2013
Here I float in Harrod’s Creek, Kentucky, an addled goose. I am interested in other geese in rural Kentucky. I log into Facebook, using a faux human alias (easier than one would imagine), and run a natural language query (human language, of course). I peck with my beak on my iPad using an app, “Geese hook up 40027.” What do I get? Nothing. Zip, zilch, nada.
Intrigued, I query, “modern American drama.” What do I get? Nothing. Zip, zilch, nada.
I give up. Social search just does not work under my quite “normal” conditions.
First, I am a goose spoofing the world as a human. Not too many folks like this on Facebook, so my interests and my social graph are useless.
Second, the keywords in my natural language query do not match the Facebook patterns, crafted by former Googlers and 20-somethings to deliver hook-up heaven and links to the semi-infamous Actor’s Theater or the Kentucky Center.
Social search is not search. Social search is group centric, and it is an outstanding system for monitoring and surveillance. As information retrieval, social search covers only a subset of the field. How do semantic methods improve the validity of the information retrieved? I am not exactly sure. Perhaps the vendors will explain and provide documented examples?
Third, without context, my natural language queries shoot through the holes in the Swiss Cheese of the Facebook database.
After I read “The Future of Social Search,” I assumed that information was available at the peck of my beak. How misguided was I? Well, one more “next big thing” in search demonstrated that baloney production is surging in an ailing economy. Optimism is good. Crazy predictions about search are not so good. Look at the sad state of enterprise search, Web search, and email search. Nothing works exactly as I hope. The dust up between Hewlett Packard and Autonomy suggests that “meaning based computing” is a point of contention.
If social search does not work for an addled goose, for whom does it work? According to the wild and crazy write up:
Are social networks (or information networks) the new search engine? Or, as Steve Jobs would argue, is the mobile app the new search engine? Or, is the question-and-answer formula of Quora the real search 2.0? The answer is most likely all of the above, because search is being redefined by all of these factors. Because search is changing, so too is the still maturing notion of social search, and we should certainly think about it as something much grander than socially-enhanced search results.
Yep, Search 2.0.
But the bit of plastic floating in my pond is semantic search. Here’s what the Search 2.0 social crowd asserts:
Let’s embrace the notion that social search should be effortless on the part of the user and exist within a familiar experience — mobile, social or search. What this foretells is a future in which semantic analysis, machine learning, natural language processing and artificial intelligence will digest our every web action and organically spit out a social search experience. This social search future is already unfolding before our very eyes. Foursquare now taps its massive check in database to churn out recommendations personalized by relationships and activities. My6sense prioritizes tweets, RSS feeds and Facebook updates, and it’s working to personalize the web through semantic analysis. Even Flipboard offers a fresh form of social search and helps the user find content through their social relationships. Of course, there’s the obvious implementations of Facebook Instant Personalization: Rotten Tomatoes, Clicker and Yelp offer Facebook-personalized experiences, essentially using your social graph to return better “search” results.
Semantics. Better search results. How does that work on Facebook images and Twitter messages?
My view is that when one looks for information, there are some old fashioned yardsticks; for example, precision, recall, editorial policy, corpus provenance, etc.
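For the record, those yardsticks are not mysterious. Here is a minimal sketch of how precision and recall are computed; the result sets are invented for illustration, not pulled from any vendor’s system:

```python
def precision_recall(retrieved, relevant):
    """Compute precision and recall for one query.

    precision = |retrieved ∩ relevant| / |retrieved|
    recall    = |retrieved ∩ relevant| / |relevant|
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical query: the engine returns four documents, but only
# two of the five truly relevant ones are among them.
p, r = precision_recall(
    retrieved=["d1", "d2", "d3", "d4"],
    relevant=["d1", "d4", "d7", "d8", "d9"],
)
print(p, r)  # 0.5 0.4
```

No social graph required, just an editorial judgment about which documents are relevant. That judgment is exactly what the Search 2.0 crowd skips.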
When a clueless person asks about pop culture, I am not sure that traditional reference sources will provide an answer. But as information access is trivialized, the need for knowledge about the accuracy and comprehensiveness of content, the metrics of precision and recall, and the editorial policy or degree of manipulation baked into the system decreases.
See Advantech.com for details of a surveillance system.
Search has not become better. Search has become subject to self referential mechanisms. That’s why my goose queries disappoint. If I were looking for pizza or Lady Gaga information, I would have hit pay dirt with a social search system. When I look for information based on an idiosyncratic social fingerprint or when I look for hard information to answer difficult questions related to client work, social search is not going to deliver the input which keeps this goose happy.
What is interesting is that so many are embracing a surveillance based system as the next big thing in search. I am glad I am old. I am delighted my old fashioned approach to obtaining information is working just fine without the special advantages a social graph delivers.
Will today’s social search users understand the old fashioned methods of obtaining information? In my opinion, nope. Does it matter? Not to me. I hope some of these social searchers do more than run a Facebook query to study for their electrical engineering certification or to pass board certification for brain surgery.
Stephen E Arnold, January 18, 2013
Big Data and Search
January 1, 2013
A new year has arrived. Flipping a digit on the calendar prompts many gurus, wizards, failed Web masters, former real journalists, and unemployed English majors to identify trends. How can I resist a chrome-plated, Gangnam-style bandwagon? Big Data is no trend. It is, according to the smart set:
“the next big chapter of our business history.”
My approach is more modest. And I want to avoid silver-numbered politics and the monitoring business. I want to think about a subject of interest to a small group of techno-watchers: Big Data and search.
My view is that there has been Big Data for a long time. Marketers and venture hawks circle an issue. If enough birds block the sun, others notice. Big Data is now one of the official Big Trends for 2013. Search, as readers of this blog may know, experiences the best of times and the worst of times regardless of the year or the hot trends.
As the volume of unstructured information increases, search plays a part. What’s different for 2013 is that those trying to make better decisions need a helping hand, crutches, training wheels, and tools. Vendors of analytics systems like SAS and IBM SPSS should be in the driver’s seat. But these firms are not. An outfit like Palantir claims to be the leader of the parade. The company has snazzy graphics and $150 million in venture funding. Good enough for me, I suppose. The Palantirs suggest that the old dudes at SAS and SPSS still require individuals who understand math and can program for the “end user”. Not surprisingly, there are more end users than there are SAS and SPSS wizards. One way around the shortage is to make Big Data a point-and-click affair. Satisfying? The marketers say, “For sure.”
A new opportunity arises for those who want the benefits of fancy math without the cost, hassle, and delay of dealing with intermediaries who may not have an MBA or aspire to be independently wealthy before the age of 30. Toss in the health care data the US Federal government mandates, the avalanche of fuzzy thinking baloney from blogs like this one, and the tireless efforts of PR wizards to promote everything from antique abacuses to zebra striped fabrics. One must not overlook e-mail, PowerPoint presentations, and the rivers of video which have to be processed and “understood.” In these streams of real time and semi-fresh data, there must be gems which can generate diamond bright insights. Even a sociology major may have a shot at a permanent job.
The biggest of the Big Berthas are firing away at Big Data. Navigate to “Sure, Big Data Is Great. But So Is Intuition.” Harvard, MIT, and juicy details explain that the trend is now anchored into the halls of academe. There is even a cautionary quote from an academic who was able to identify just one example of Big Data going somewhat astray. Here’s the quote:
At the M.I.T. conference, a panel was asked to cite examples of big failures in Big Data. No one could really think of any. Soon after, though, Roberto Rigobon could barely contain himself as he took to the stage. Mr. Rigobon, a professor at M.I.T.’s Sloan School of Management, said that the financial crisis certainly humbled the data hounds. “Hedge funds failed all over the world,” he said. THE problem is that a math model, like a metaphor, is a simplification. This type of modeling came out of the sciences, where the behavior of particles in a fluid, for example, is predictable according to the laws of physics.
Sure Big Data has downsides. MBAs love to lift downsides via their trusty, almost infallible intellectual hydraulics.
My focus is search. The trends I wish to share with my two or three readers require some preliminary observations:
- Search vendors will just say they can handle Big Data. Proof not required. It is cheaper to assert a technology than actually develop a capability.
- Search vendors will point out that sooner or later a user will know enough to enter a query. Fancy math notwithstanding, nothing works quite like a well crafted query. Search may be a commodity, but it will not go away.
- Big Data systems are great at generating hot graphics. In order to answer a question, a Big Data system must be able to display the source document. Even the slickest analytics person has to find a source. Well, maybe not all of the time, but sometimes it is useful prior to a deposition.
- Big Data systems cannot process certain types of data. Search systems cannot process certain types of data. It makes sense to process whatever fits into each system’s intake system and use both systems. The charm of two systems which do not quite align is sweet music to a marketer’s ears. If a company has a search system, that outfit will buy a Big Data system. If a company has a Big Data system, the outfit will be shopping for a search system. Nice symmetry!
- Search systems and Big Data systems can scale. Now this particular assertion is true when one criterion is met: an unending supply of money. The Big Data thing has a huge appetite for resources. Chomp. Chomp. That’s the sound of a budget being consumed in a sprightly way.
Now the trends:
Trend 1. Before the end of 2013, Big Data will find itself explaining why the actual data processed were Small Data. The assertion that existing systems can handle whatever the client wants to process will be exposed as marketing; the systems are selective content processing systems. Big Data are big, and systems have finite capacity. Some clients may not be thrilled to learn that their ore did not include the tonnage that contained the gems. In short, say hello to aggressive sampling and indexes which are not refreshed in anything close to real time.
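For the curious, the classic way to do “aggressive sampling” over a stream too big to hold in memory is reservoir sampling. A hedged sketch follows; the function name and the toy stream are my inventions, not any vendor’s code:

```python
import random

def reservoir_sample(stream, k, seed=None):
    """Keep a uniform random sample of k items from a stream of
    unknown length, using O(k) memory (Vitter's Algorithm R)."""
    rng = random.Random(seed)
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)
        else:
            # Each later item replaces a reservoir slot with
            # probability k / (i + 1), keeping the sample uniform.
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir

# "Big Data" in, Small Data out: a million records become five.
sample = reservoir_sample(range(1_000_000), k=5, seed=42)
print(sample)
```

Statistically defensible, yes. But the client who thought every record was processed may still feel short-changed when the gems were in the other 999,995 rows.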
Trend 2. Big Data and search vendors will be tripping over themselves in an effort to explain which system does what under what circumstances. The assertion that a system can do both structured and unstructured while uncovering the meaning of the data is one I want to believe. Too bad the assertion is mushy in the accuracy department’s basement.
Trend 3. The talent pool for Big Data and search is less plentiful than the pool of art history majors. More bad news: the pool is not filling rapidly. As a result, quite a few data swimmers drown. Example: the financial crisis, perhaps? The talent shortage suggests some interesting cost overruns and project failures.
Trend 4. A new Big Thing will nose into the Big Data and search content processing space. Will the new Big Thing work? Nah. The reason is that extracting high value knowledge from raw data is a tough problem. Writing new marketing copy is a great deal easier. I am not sure what the buzzword will be. I am pretty sure vendors will need a new one before the end of 2013. Even PSY called it quits with Gangnam style. No such luck in Big Data and search at this time.
Trend 5. The same glassy eyed confusion which analytics and search presentations engender will lead to greater buyer confusion and slow down procurements. Not even the magic of the “cloud” will be able to close certain deals. In a quest for revenue, the vendors will wrap basic ideas in a cloud of unknowing.
I suppose that is a good thing. Thank goodness I am unemployed, clueless, and living in a rural Kentucky goose pond.
Stephen E Arnold, January 1, 2013
Another Beyond Search analysis for free
Predictive Coding: Who Is on First? What Is the Betting Game?
December 20, 2012
I am confused, but what’s new? The whole “predictive analytics” rah rah causes me to reach for my NRR 33 dB bell-shaped foam earplugs.
Look. If predictive methods worked, there would be headlines in the Daily Racing Form, in the Wall Street Journal, and in the Las Vegas sports books. The cheerleaders for predictive wizardry are pitching breakthrough technology in places where accountability is a little fuzzier than a horse race, stock picking, and betting on football games.
The godfather of cost cutting for legal document analysis. Reverend Thomas Bayes, 1701 to 1761. I heard he said, “Praise be, the math doth work when I flip the numbers and perform the old inverse probability trick. Perhaps I shall apply this to legal disputes when lawyers believe technology will transform their profession.” Yep, partial belief. Just the ticket for attorneys. See http://goo.gl/S5VSR.
I understand that there is PREDICTION which generates tons of money for the person who has an algorithm which divines which nag wins the Derby, which stock is going to soar, and which football team will win a particular game. Skip the fuzzifiers like 51 percent chance of rain. It either rains or it does not rain. In the harsh world of Harrod’s Creek, capital letter PREDICTION is not too reliable.
The lower case prediction is far safer. The assumptions, the unexamined data, the thresholds hardwired into the off-the-shelf algorithms, or the fiddling with Bayesian relaxation factors is aimed at those looking to cut corners, trim costs, or figure out which way to point the hit-and-miss medical research team.
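For readers who have forgotten the Reverend’s inverse probability trick, here is what “flipping the numbers” looks like in the document review setting. The base rate and hit rates below are invented for illustration, not drawn from any vendor’s benchmark:

```python
def posterior(prior, likelihood, false_alarm):
    """Bayes' rule: P(relevant | flagged) from P(flagged | relevant).

    prior       = P(relevant)             base rate of responsive documents
    likelihood  = P(flagged | relevant)   the system's hit rate
    false_alarm = P(flagged | not relevant)
    """
    evidence = likelihood * prior + false_alarm * (1 - prior)
    return likelihood * prior / evidence

# Invented numbers: 2% of the documents are responsive; the classifier
# flags 90% of responsive documents and 5% of everything else.
p = posterior(prior=0.02, likelihood=0.90, false_alarm=0.05)
print(round(p, 3))  # 0.269
```

Note the punchline: even with a 90 percent hit rate, roughly three out of four flagged documents are false positives when responsive documents are rare. The threshold fiddling I mention above decides which side of that trade the client lives on.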
Which is it? PREDICTION or prediction?
I submit that it is lower case prediction dressed up with upper case MARKETING wordsmithing.
Here’s why:
I read “The Amazing Forensic Tech behind the Next Apple, Samsung Legal Dust Up (and How to Hack It).” Now that is a headline. Skip the “amazing”, “Apple”, “Samsung,” and “Hack.” I think the message is that Fast Company has discovered predictive text analysis. I could be wrong here, but I think Fast Company might have been helped along by some friendly public relations type.
Let’s look at the write up.
First, the high profile Apple Samsung trial becomes the hook for “amazing” technology. The idea is that smart software can grind through the text spit out by a discovery process. In an era of ballooning digital data, it is really expensive to pay humans (even those working at a discount in India or the Philippines) to read the emails, reports, and transcripts.
Let a smart machine do the work. It is cheaper, faster, and better. (Shouldn’t one have to pick two of these attributes?)
Fast Company asserts:
“A couple good things are happening now,” Looby says. “Courts are beginning to endorse predictive coding, and training a machine to do the information retrieval is a lot quicker than doing it manually.” The process of “Information retrieval” (or IR) is the first part of the “discovery” phase of a lawsuit, dubbed “e-discovery” when computers are involved. Normally, a small team of lawyers would have to comb through documents and manually search for pertinent patterns. With predictive coding, they can manually review a small portion, and use the sample to teach the computer to analyze the rest. (A variety of machine learning technologies were used in the Madoff investigation, says Looby, but he can’t specify which.)
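The workflow Looby describes (lawyers hand-review a small portion, then the machine ranks the rest) can be sketched in a few lines. This toy keyword-weight scorer is my stand-in for the real machine learning, which Looby pointedly does not specify; the mini-corpus is invented:

```python
from collections import Counter

def train(sample):
    """Learn per-word weights from a hand-reviewed sample of
    (text, is_responsive) pairs -- a toy stand-in for the machine
    learning a real predictive-coding system would use."""
    pos, neg = Counter(), Counter()
    for text, responsive in sample:
        (pos if responsive else neg).update(text.lower().split())
    vocab = set(pos) | set(neg)
    # Add-one smoothing so unseen words stay neutral.
    return {w: (pos[w] + 1) / (neg[w] + 1) for w in vocab}

def score(text, weights):
    """Product of word weights; greater than 1.0 leans 'responsive'."""
    s = 1.0
    for w in text.lower().split():
        s *= weights.get(w, 1.0)
    return s

# The lawyers label four documents by hand...
reviewed = [
    ("patent license royalties", True),
    ("royalties due under the patent", True),
    ("lunch menu for friday", False),
    ("friday team lunch", False),
]
weights = train(reviewed)
# ...and the machine ranks the unreviewed remainder.
print(score("patent royalties owed", weights) > score("friday lunch plans", weights))  # True
```

Cheaper and faster than a room full of contract attorneys, certainly. Whether it is also “better” depends on the thresholds, the sample, and the court’s patience.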
Coveo Funding Hits a Deep Drift: $34 Million
December 14, 2012
I was slogging through the Swedish snow when my mobile phone beeped. I glanced at the tiny screen on my Canadian-engineered BlackBerry and saw an interesting headline: “Coveo, Canada’s Big Data Offering, Nabs $18 Million as It Edges Closer to Profitability.” The article told me:
Coveo, the Quebec-based big data analytics company, has received a massive $18 million from local growth equity firm, Tandem Expansion Fund. It is best known for its recently-launched Coveo for Salesforce product, a cloud-based application which delivers quantitative insights about customer interactions. The app works by pushing relevant real-time information to sales and marketing teams, whether its an account, a lead, or new opportunity.
Interesting positioning. I thought about Vivisimo—the company with the deduplication and on-the-fly categorization technology—transforming into a Big Data company after IBM’s PR department wordsmithed the company.
The Venture Beat story about Québec-based Coveo included another fascinating factoid, which I assume is accurate. How am I to judge coping with the brisk wind howling down the fjord? To wit:
The company will use the funding to build out its sales and marketing team, as it anticipates “hyper-growth” in 2013, according to a press release. It anticipates that it will be “operating at or close to profitability” in 2013.
I highlighted the words which stuck in my mind: hyper-growth and close to profitability. Both are good notions, particularly when there are millions of investors’ dollars waiting for the Canada goose to yield its down. (My arctic grade overcoat is stuffed with goose feathers from Canada, by the way.)
The pointy icicle which lodged in my mind was contained in this statement in the story by Christina Farr:
All major existing investors also participated in the round, including BDC Venture Capital, Propulsion Ventures SEC and Fonds de solidarité FTQ. This round of funding brings the company’s total investments to $34.7 million.
Coveo was founded in 2004, according to Wikipedia. Note that source. Assume the data are correct. Coveo has been generating revenue but has required the alleged $34 million to get near profitability. In eight years, the company has required $4 million a year plus whatever it generated from licenses to its software.
The Wikipedia write up is not clued into the actual Big Data functions of Coveo. In fact, that source, which may be out of sync with reality, points out that Coveo is in this business:
CRM and Contact Centers for sales & service, WCMs for one-to-one marketing, and Enterprise Content for engineering and operations. Coveo Role-based Insight Consoles™ provide advanced navigation into consolidated, correlated information mashups, within any application, including Coveo for Salesforce.
Coveo’s own Web site is the definitive source. Here’s what the company says is its core competency at www.coveo.com:
The three main lines of business, which I may be misreading as the snow collects on the screen of my outstanding BlackBerry mobile device, are:
Enterprise Search: Is a Golden Age Coming?
December 2, 2012
Let’s recap the enterprise search market. I am 68, so I remember the glory days of SDC Orbit, RECON, and SMART. If you are with me chronologically, think mainframes, batch processing, and the lousy bandwidth which was available within the computer room.
By 1982 even traditional publishers were trying to figure out what to do with digital information. Remember the original New York Times’ search system? Remember the original Dow Jones online system and its desktop search interface? Dow Jones used BRS Search, now part of the OpenText quiver of separate information retrieval arrows. IBM pushed STAIRS.
By the mid 1990s, there were university computing graduates who were on the search bandwagon, even though it was built on a Citroen Deux Chevaux. Think Personal Library Software and Backrub, the precursor of our beloved Google search appliance.
Most of today’s enterprise search systems are as modern as this Citroen Deux Chevaux. A happy quack to http://en.wikipedia.org/wiki/File:Deux-chevaux-rose-pink-2CV-citroen.JPG for this image.
In the late 1990s, the enterprise search market was pulling together threads of different ideas which sort of worked. The first wave of “brand” name vendors date from the 1996 to 2000 period and include Autonomy, Convera, Endeca, Fast Search & Transfer, and Verity. Most of these companies survive either as units of larger firms or as genetic strands woven into various search consulting firms like Comperio and Search Technologies. Google, I want to point out, is using technology which dates from the mid 1990s. So much for the difference between PR and enhanced CLEVER-ness.
By the mid 2000s, the landscape had become barren. There were plenty of innovations and entrepreneurs who embraced the magic of search sub-disciplines like latent semantic indexing, natural language processing, goosed Bayesian methods, and mish-mashes of every possible indexing and retrieval method. The notable shift in search since 2005 has been the emergence of Lucene, Solr, and Xapian, among other open source information retrieval options.
Have we reached the end of the line?
Nope. The Golden Age is coming.
In 2013, Beyond Search will add coverage of next generation vendors poised to rework search. On Tuesday, December 5, you will be able to read an interview with the chief technology officer of a little known search and content processing vendor named Cybertap. You can dip into the archives of my Search Wizards Speak series and get more insight about where search is headed by reading the interviews with such experts as:
- Christopher Ahlberg, www.recordedfuture.com
- Paul Doscher, www.lucidworks.com
- Mike Sorah, http://www.imtholdings.com/
- Luca Scagliarini, http://www.expersystem.net
- Chris Westphal, http://www.visualanalytics.com
HP Autonomy: Thoughts about Big Deals for Search Vendors
November 28, 2012
I just finished my Information Today column for next month (January 2013). I thought briefly about focusing on the Hewlett Packard Autonomy matter, which is a tad too much in the news at the moment.
Caveat emptor. Hasn’t anyone heard this reminder? The deal is over. Type A MBAs, whiz kid lawyers, and blue chip consultants crawled all over this deal. The HP board approved the deal which was roughly 10X more than Microsoft paid for the exciting Fast Search & Transfer technology thrill ride.
I chose not to tackle HP and Autonomy directly. What I decided to do was work through some of the business cases I have encountered over the years which make murky financial water the status quo. The players in these examples, which I characterize at a high level and as a non-accountant, are like the predators in the Amazon River. I wanted to point out that some of the deals related to search, content processing, and analytics can be models of complexity theory for math experts at the Santa Fe Institute to ponder. Normal lawyers and accountants and the run of the mill MBA are, in my experience, out of their depth when thinking about a search plus services tie up.
As I was finishing the article, my alert service beeped. The occasion was the arrival of articles about letters from Autonomy placed in “open source” and an equally public response from Hewlett Packard. You can find more information in the “Former Autonomy CEO Challenges HP” article in MarketWatch or you can wade through the lists of stories posted on Techmeme.
I don’t have a dog in this fight. I have several observations I want to capture before they slip away from me as I get ready to head to South America.
The 1957 Studebaker Golden Hawk I almost bought in 1963 came with a sidewalk guarantee. Search and content processing systems are warranted in a similar manner by their sellers. The Wikipedia explanation of caveat emptor makes the meaning of this Latin catchphrase clear: “Under the principle of caveat emptor, the buyer could not recover from the seller for defects on the property that rendered the property unfit for ordinary purposes. The only exception was if the seller actively concealed latent defects or otherwise made material misrepresentations amounting to fraud.”
First, some investors’ expectations for revenue from search and content processing greatly exceed reality. I have been around the information retrieval business for a week or two. In that time, I have encountered people who believe that their findability or indexing system will generate Google sized dollars. I tell these folks that Google generates Google sized dollars from ads, not its search technology. Only a handful of companies have been able to generate more than $100 million from search. These companies are the anomalies, not the rule. My hunch is that like the “smart money” that blew $50 million on one promising system, dreams can be expensive. As you may know, the folks who support the high expectations catch “spreadsheet fever”. The result is that when the money is finally sorted out, search is an expensive proposition. There’s a reason why IBM embraces open source search. May I suggest you read my IDC reports on this open source search subject.
Second, the crazy valuations are like the promises of teenagers in love. The parents, if they know, view such tie ups with skepticism. Just try and tell that to the two teens who have the force which through the green fuse drives the flower. In the grip of this “force”, history and hard facts have a modest role to play. What takes over is mutually reinforcing inputs from the youthful lovers on a hormone high. Deal lust works in the teen way. Is this why so many gray heads get into doing bigger and bigger deals under more and more false time constraints? Pant. Pant. Pant. I can hear the breathing now. Those contracts have to be signed, the commissions most definitely earned, and the money transferred pronto. Is it any surprise why so many acquisitions go off the rails? The parties to big deals include the buyer, the seller, the lawyers, the accountants, the partners, and the consultants. If that line up of professionals does not make clear how Voltaire’s bastards operate, read John Ralston Saul’s book on the subject.
Complexification: Is ElasticSearch Making a Case for a Google Search Solution?
November 24, 2012
I don’t have any dealings with Google, the GOOG, or Googzilla (a word I coined in the years before the installation of the predator skeleton on the wizard zone campus). In the briefings I once endured about the GSA (Google speak for the Google Search Appliance), I recall three business principles imparted to me; to wit:
- Search is far too complicated. The Google business proposition was and is that the GSA and other Googley things are easy to install, maintain, use, and love.
- Information technology people in organizations can often be like a stuck brake on a sports car. The institutionalized approach to enterprise software drags down the performance of the organization information technology is supposed to serve.
- The enterprise search vendors are behind the curve.
Now the assertions from the 2004 salad days of Google are only partially correct today. As everyone with a colleague under 25 years of age knows, Google is the go-to solution for information. A number of large companies have embraced Google’s all-knowing, paternalistic approach to digital information. However, others—many others, in fact—have not.
One company flush with $10 million in venture money is ElasticSearch. Based on the open source technology which certain university computer science departments hold in reverence, ElasticSearch is marketing its heart out. I learned that Searchblox, the brother-owned-and-operated cloud search service, has embraced ElasticSearch. Today I received a link to “Working with ElasticSearch in Scala.”
Scala, in case you are not hip to the brave new world, is a “general purpose programming language designed to express common programming patterns in a concise, elegant, and type-safe way. It smoothly integrates features of object-oriented and functional languages, enabling Java and other programmers to be more productive. Code sizes are typically reduced by a factor of two to three when compared to an equivalent Java application.”
Source: The Strategic Complexity Framework for Dummies by Vinay Gupta. See http://goo.gl/k042J. Who wants to be “borked”? Not I, when implementing an overly complex search solution. Your mileage may vary, of course.
Score one for Google. The article makes clear that Scala and ElasticSearch may require some technical skills which are not likely to be found in the local trucking company’s IT department. Truth be told, the expertise to work through the information in the write up can be found at Google type companies, a good sized state university, and in the noodle shops of Wuhan-like places.
Here’s a snippet from the write up:
Elasticsearch is schemaless. We can index any json to it. We have a bulk json file, each line is a json. For our implementation: Application reads file line by line and index json to the elasticsearch.
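For the record, “reads file line by line and index json” usually means assembling an Elasticsearch _bulk request: an action line followed by a source line for each record. A minimal sketch follows; the index name and the two toy records are my inventions, and no HTTP call is made, so the local trucking company’s IT department can try it offline:

```python
import json

def to_bulk_body(lines, index="docs"):
    """Turn newline-delimited JSON records into the body of an
    Elasticsearch _bulk request: one action line, then one source
    line, per record. The index name "docs" is an assumption;
    older Elasticsearch versions also wanted a _type field."""
    out = []
    for line in lines:
        record = json.loads(line)  # validate each line as JSON
        out.append(json.dumps({"index": {"_index": index}}))
        out.append(json.dumps(record))
    # _bulk bodies must end with a trailing newline.
    return "\n".join(out) + "\n"

ndjson = ['{"title": "a"}', '{"title": "b"}']
body = to_bulk_body(ndjson)
print(body)
# POST this body to http://localhost:9200/_bulk with the
# Content-Type header set to application/x-ndjson.
```

Schemaless, yes. But somebody still has to know about the action lines, the trailing newline, and the content type header, which rather makes Google’s simplicity argument for it.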
The Coming Financial Crunch on Search Vendors
November 17, 2012
I sat through three, maybe four, “We’re doing great!” teleconferences this week. In one of those teleconference Go To Beating things, I was told, “We think that there is great opportunity in enterprise search.”
I agreed, but one cannot call “search” search. Search, in my opinion, has become a new four-letter word. But this person insisted that he and his team had the solution to the revenue ceiling problem. Now the concept of a “revenue ceiling” may be unfamiliar to those running companies in a crazed effort to get enough cash to pay last month’s bills. To me, “revenue ceiling” is what keeps most enterprise search vendors below the $20 million in revenue benchmark. In fact, since I have been tracking the enterprise search sector, the companies which have blown past $20 million are no longer in play. These outfits are now part of larger firms, managed by people who are or, should I say, were confident that making oodles of money from enterprise search technology was a “no brainer.”
A happy quack to http://goo.gl/xQMP0 for this inspirational image.
So what’s happening to my through the ceiling outfits? HP owns Autonomy and based on the grim financial results HP continues to report, Autonomy is not lofting HP to new revenue heights. The Endeca crowd managed to get revenues north of $150 million before the sale to Oracle. I have not heard that the Endeca team is pushing Mark Hurd aside due to their financial performance. I do know that Endeca is now just one more arrow in the Oracle quiver of tools and solutions. And Fast Search & Transfer? Microsoft does not break out revenues from Fast, which once reported revenues of $170 million. The number was revised downward, and I picked up a rumor that some in the Sinofsky-free environment were looking at Fast Search as the technological equivalent of a 68-year-old soccer player. It’s great the fellow remembers to go to the game, but in a crunch, let’s let gramps watch the 20-somethings win the game.
So, search has been a tough sector in which to make a big payoff. Autonomy, much to the chagrin of the “real” consultants, sold for $10 billion. But the really important point is that no other firm, to my knowledge, has been able to make almost a billion from “search.” Keep in mind that giants like IBM and Google can make numbers dance the tango. But for most search and content processing companies, revenue life and cost control have been similar to earning enough in a war zone to buy a new Rolls Royce. It can be done, but a close look at how may not be a wise idea.
Smash cut to these interesting developments:
- Yahoo is putting more heat on employees and may fire thousands of Yahooligans. See Yahoo CEO Mayer Cuts End-of-Year “Week of Rest” for Employees, While Prepping Plans to Cull Bottom 20 Percent of Staff
- The brains behind Netflix’s brilliant pricing moves alleges that Amazon is losing $1 billion a year on streaming video. (My reaction was, “That number seems low.”) The reference is at Netflix CEO: Amazon Losing Up to $1 Billion a Year on Streaming Video
- Apple’s stock continues to decline. Some folks think there are worms in the cook’s Thanksgiving pie. See Apple’s Stock Price Falls to Lowest Point in Six Months
ElasticSearch: Was Google Right about Simplicity?
November 13, 2012
When the Google Search Appliance became available nine or 10 years ago, I was the victim of a Google briefing. The eager Googler showed me the functions of the original Google Search Appliance. I was not impressed. As I wrote in The Google Legacy, the GSA was a “good start” and showed promise.
But one thing jumped out at me. Google’s product planners had identified the key weakness, or maybe “flaw,” in most of the enterprise search solutions available a decade ago: complexity. No single Googler could install Autonomy, Endeca, Fast Search & Transfer, or Convera without help from the company. Once the system was up and running, not even a Googler could tune the system, perform reliable hit boosting, or troubleshoot indexers which could not update. Not surprisingly, most of the flagship enterprise search systems ran up big bills for the licensees. One vendor went down in flames because there were not enough engineers to keep the paying customers happy. So ended an era of complexity with the Google Search Appliance.
I may have been wrong.
I just read “Indexing BigData with ElasticSearch.” If you are not familiar with ElasticSearch (formerly Compass), think of it as one of the dozens of companies surfing on Lucene/Solr to get into the search game. Even IBM uses Lucene/Solr to slash development costs and free up expensive engineers for more value-added work like the wrappers that allow Watson to win a TV game show. I have completed for IDC an analysis of 13 open source search vendors; some of these profiles are available for only $3,500 each. See http://www.idc.com/getdoc.jsp?containerId=236511 for an example.
Is your search system as easy to learn to ride as a Big Wheel toy? If not, there may be some scrapes and risks ahead. In today’s business climate, who wants to incur additional risk or cost in pursuit of a short cut only a developer can appreciate? Not me or the CFOs I know. A happy quack to http://www.bigwheeltricycle.net/ for this image.
The write up explains how to perform Big Data indexing with ElasticSearch. I urge you to read the write up. Consider this key passage:
The solution finally appeared in the name of ElasticSearch, an open-source Java based full text indexing system, based on the also open-source Apache Lucene engine, that allows you to query and explore your data set as you collect it. It was the ideal solution for us, as doing BigData analysis requires a distributed architecture.
Sounds good. With a fresh $10 million in funding, ElasticSearch seems poised to revolutionize the world of enterprise search, big data, and probably business intelligence, search based applications, and unified information access. Why not? Most open source vendors exercise considerable license in an effort to differentiate themselves from next generation solutions such as CyberTap, Digital Reasoning, and others pushing the envelope of findability technology.
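For readers who have not poked at ElasticSearch, the “query your data set as you collect it” idea in the quoted passage boils down to shipping JSON documents and JSON queries over HTTP. Here is a minimal sketch of what those request bodies look like; the index name (“logs”), type, document fields, and content are all invented for illustration, and the paths follow the 2012-era REST API:

```python
import json

# Hypothetical log document; "logs", "event", and the field names are invented.
doc = {
    "message": "error in payment service",
    "timestamp": "2012-11-13T10:00:00",
}
# Indexing one document: PUT it to /{index}/{type}/{id}.
index_request = ("PUT", "/logs/event/1", doc)

# A basic full-text match query, usable while documents are still streaming in.
search_body = {"query": {"match": {"message": "payment error"}}}
search_request = ("GET", "/logs/_search", search_body)

print(json.dumps(search_body, indent=2))
```

No schema migration, no batch re-index: documents become searchable shortly after they are indexed, which is why the write up’s authors found it a fit for their distributed analysis work.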
Open Source Search: The Me Too Method Is Thriving
November 5, 2012
In the first three editions of The Enterprise Search Report (2003 to 2007), which my team and I wrote, we made it clear that the commercial enterprise search vendors were essentially a bunch of me-too services.
The diagrams for the various systems were almost indistinguishable. Some vendors used fancy names for their systems and others stuck with the same nomenclature used in the SMART system. I pointed out that every enterprise search system has to perform certain basic functions: content acquisition, indexing, query processing, and administration. But once those building blocks were in place, most of the two dozen vendors I profiled added wrappers which created a “marketing differentiator.” Examples ranged from Autonomy’s emphasis on neuro-linguistic processing to Endeca’s metadata for facets to Vivisimo’s building of a single results list from federated content.
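Those building blocks are so generic that a toy version fits in a few lines. A sketch, with all class names, documents, and queries invented for illustration, covering content acquisition, indexing (a plain inverted index), and query processing:

```python
from collections import defaultdict


class TinySearch:
    """Toy engine illustrating the generic building blocks of a search system."""

    def __init__(self):
        self.index = defaultdict(set)  # term -> set of document ids
        self.docs = {}                 # administration: what we have stored

    def acquire(self, doc_id, text):
        """Content acquisition plus indexing: tokenize and update postings."""
        self.docs[doc_id] = text
        for term in text.lower().split():
            self.index[term].add(doc_id)

    def query(self, terms):
        """Query processing: AND semantics over the posting lists."""
        postings = [self.index[t] for t in terms.lower().split()]
        if not postings:
            return set()
        return set.intersection(*postings)


engine = TinySearch()
engine.acquire(1, "enterprise search is hard")
engine.acquire(2, "open source search is mainstream")
print(engine.query("search mainstream"))  # → {2}
```

Every vendor I profiled implemented some industrial-strength variant of exactly this skeleton; the differentiation lived in the wrappers bolted on around it.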
The rota fortunae of the medieval software licensee. A happy quack to http://www.artlex.com/ArtLex/Ch.html for this image.
The reality was that it was very difficult for the engineers and marketers of these commercial vendors to differentiate their systems clearly from dozens of look-alikes. With the consolidation of the commercial enterprise search sector in the last 36 months, the proprietary vendors have not changed the plumbing. What is new and interesting is that many of them are now “analytics,” “text mining,” or “business intelligence” vendors.
The High Cost of Re-Engineering
The key to this type of pivot is what I call “wrappers” or “add ins.” The idea is that an enterprise search system is similar to the old Ford and GM assembly lines of the 1970s. The cost for changing those systems was too high. The manufacturers operated them “as is”, hoping that chrome and options would give the automobiles a distinctive quality. Under the paint and slightly modified body panels, the cars were essentially the same old vehicle.
Commercial enterprise search solutions are similar today, and none has been overhauled or re-engineered in a significant way. That is okay. When a company licenses an enterprise search solution from Microsoft or Oracle, the customer is getting the brand and the security which comes from an established enterprise search vendor.
Let’s face it. The RECON or SDC Orbit system is usable without too much hassle by a high school student today. The precision and recall are in the 80 to 85 percent range. The US government has sponsored a text retrieval program for many years. The results of the tests are not widely circulated. However, I have heard that the precision and recall scores mostly stick in the 80 to 85 percent range. Once in a while a system will perform better, but search technology has, in my opinion, hit a glass ceiling. The commercial enterprise search sector is like the airline industry. The old business model is not working. The basic workhorse of the airline industry delivers the same performance as a jet from the 1970s. The big difference is that the costs keep on going up and passenger satisfaction is going down.
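For readers fuzzy on what those percentages measure: precision is the share of retrieved documents that are relevant, recall is the share of relevant documents that were retrieved. A minimal sketch using the standard set-based definitions; the document counts and ids below are invented to land on the 85 percent figure:

```python
def precision_recall(retrieved, relevant):
    """Standard set-based precision and recall."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant documents actually retrieved
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical run: the system returns doc ids 0..99, while the judged
# relevant set is ids 15..114. 85 of the 100 results are relevant, and
# 85 of the 100 relevant documents were found.
p, r = precision_recall(range(100), range(15, 115))
print(p, r)  # → 0.85 0.85
```

A system stuck at these numbers misses roughly one relevant document in six and pads every result list with one irrelevant hit in six, which is the glass ceiling I am describing.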
Open Source: Moving to Center Stage
But I am not interested in commercial enterprise search systems. The big news is the emergence of open source search options. Until recently, open source search was not mainstream. Today, open source search solutions are mainstream. IBM relies on Lucene/Solr for some of its search functions. IBM also owns Web Fountain, STAIRS, iPhrase, Vivisimo, and the SPSS Clementine technology, among others. IBM is interesting because it has used open source search technology to reduce costs and tap into a source of developer talent. Attivio, a company which just raised $42 million in additional venture funding, relies on open source search. You can bet your bippy that the investors want Attivio to turn a profit. I am not sure the financial types dive into the intricacies of open source search technology. Their focus is on the payoff from the money pumped into Attivio. Many other commercial content processing companies rely on open source search as well.
The interesting development is the emergence of pure play search vendors built entirely on the Lucene/Solr code. Anyone can download this “joined at the hip” software from the Apache Software Foundation. We have completed an analysis of a dozen of the most interesting open source search vendors for a big time consulting firm. What struck the ArnoldIT research team was:
- The open source search vendors are following the same path as the commercial enterprise search vendors. The systems are pretty much indistinguishable.
- The marketing “battle” is being fought over technical nuances which are of great interest to developers and, in my opinion, almost irrelevant to the financial person who has to pay the bills.
- The significant differentiators among the dozen companies we analyzed boil down to the companies’ financial stability, full time staff, value-adding proprietary enhancements, customer support, training, and engineering services.
What this means is that the actual functionality of these open source search systems is similar to the enterprise proprietary solutions. In the open source sector, some vendors specialize by providing search for a Big Data environment or for remediating the poor search system in MySQL and its variants. Other companies sell a platform and leave the Lucene/Solr component as a utility service. Others just take the Lucene/Solr and go forward.
The Business View
In a conversation with Paul Doscher, president of LucidWorks, I learned that his organization is working through the Project Management Committee (PMC) Group of the Lucene/Solr project within the Apache Software Foundation to build the next-generation search technology. The effort is to help transform people’s ability to turn data into decision making information.
This next-generation search technology is foundational to building a big data technology stack that enables enterprises to reap the rewards of the latest wave of innovation.
The key point is that figuring out which open source search system does what is now as confusing and time consuming as figuring out the difference between the proprietary enterprise search systems was 10 years ago.
Will there be a fix for me-too’s in enterprise search? I think some technology will remain similar and probably indistinguishable to non-experts. What is now raising the stakes is that search systems are viewed as utilities. Customers want answers, visualizations, and software which predicts what will happen. In my opinion, this is search with fuzzy dice, 20 inch chrome wheels, and a 200 watt sound system.
The key points of differentiation for me will remain the company’s financial stability, its staff quality, its customer service, its training programs, and its ability to provide engineering services to licensees who require additional services. In short, the differentiators may boil down to making systems pay off for licensees, not marketing assertions.
In the rush to cash in on organizations’ need to cut costs, open source search is now the “new” proprietary search solution. Buyer beware? More than ever. The Wheel of Fortune in search is spinning again. Who will be a winner? Who will be a loser? Place your bets. I am betting on open source search vendors with the service and engineering expertise to deliver.
Stephen E Arnold, November 5, 2012