Enterprise Search: Fee Versus Free

November 25, 2014

I read a pretty darned amazing article “Is Free Enterprise Search a Game Changer?” My initial reaction was, “Didn’t the game change with the failures of flagship enterprise search systems?” And “Didn’t the cost and complexity of many enterprise search deployments fuel the emergence of the free and open source information retrieval systems?”

Many proprietary vendors are struggling to generate sustainable revenues and pay back increasingly impatient stakeholders. The reality is that the proprietary enterprise search “survivors” fear meeting the fate of Convera, Delphes, Entopia, Perfect Search, Siderean Software, TREX, and other proprietary vendors. These outfits went away.


Many vendors of proprietary enterprise search systems have left behind an environment in which revenues are simply not sustainable. Customers learned some painful lessons after licensing brand name enterprise search systems and discovering the reality of their costs and functionality. A happy quack to http://bit.ly/1AMHBL6 for this image of desolation.

Other vendors, faced with mounting costs and zero revenue growth, sold their enterprise search companies. The spate of sell-offs that began in the mid 2000s was stark evidence of how difficult it was to make a business of delivering information retrieval systems to commercial and governmental organizations.

Consider these milestones:

Autonomy sold to Hewlett Packard. HP promptly wrote off billions of dollars and launched a fascinating lawsuit that blamed Autonomy for the deal. HP quickly discovered that Autonomy, like other complex content processing companies, was difficult to sell, difficult to support, and difficult to turn into a billion dollar baby.

Convera, the product of Excalibur’s scanning legacy and ConQuest Software, captured some big deals in the US government and with outfits like the NBA. When the system did not perform like a circus dog, the company wound down. One upside for Convera alums was that they were able to set up a consulting firm to keep other companies from making the Convera-type mistakes. The losses were measured in the tens of millions.

Read more

LinkedIn Enterprise Search: Generalizations Abound

November 11, 2014

Three or four days ago I received a LinkedIn message that a new thread had been started on the Enterprise Search Engine Professionals group. You will need to be a member of LinkedIn and do some good old fashioned brute force search to locate the thread with this headline, “Enterprise Search with Chinese, Spanish, and English Content.”

The question concerned a LinkedIn user information vacuum job. A member of the search group wanted recommendations for a search system that would deliver “great results with content outside of English.” Most of the intelligence agencies have had this question in play for many years.

The job hunters, consultants, and search experts who populate the forum do not step forth with intelligence agency type responses. In a decision making environment where inputs in a range of languages are the norm and the users are risk averse, the suggestions offered to the LinkedIn member struck me as wide of the mark. I wouldn’t characterize the answers as incorrect. Uninformed or misinformed are candidate adjectives, however.

One suggestion offered to the questioner was a request to define “great.” Like love and trust, great is fuzzy and subjective. The definition of “great,” according to the member who asked the question, boils down to “precision, mainly that the first few results strike the user as correct.” Okay, the user must perceive results as “correct.” But as ambiguous as this answer remains, the operative term is precision.

In search, precision is not fuzzy. Precision has a definition that many students of information retrieval commit to memory and then include in various tests, papers, and public presentations. For a workable definition, see Wikipedia’s take on the concept or L. Egghe’s “The Measures Precision, Recall, Fallout, and Miss As a Function of the Number of Retrieved Documents and Their Mutual Interrelations,” Universiteit Antwerp, 2000.
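For the record, the textbook definitions are compact. Writing Rel for the set of relevant documents and Ret for the set the system returns, the two measures this thread turns on are:

```latex
% Set-based definitions: Rel = relevant documents, Ret = retrieved documents.
\mathrm{precision} = \frac{|Rel \cap Ret|}{|Ret|}
\qquad
\mathrm{recall} = \frac{|Rel \cap Ret|}{|Rel|}
```

Fallout and miss, the other two measures in Egghe’s title, are the analogous ratios computed over the non-relevant and the non-retrieved documents, respectively.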

In simple terms, the system matches the user’s query. The results are those documents that the system determines contain terms identical or statistically close to those in the user’s query. Old school brute force engines relied on string matching. Think RECON. More modern search systems toss in term matching after truncation, the nearness of the terms in the user’s query to the occurrence of terms in the documents, and dozens of other methods to determine likely relevant matches between the user’s query and the document set’s index.

With a known corpus like ABI/INFORM in the early 1980s, a trained searcher testing search systems can craft queries for that known result set. Then as the test queries are fed to the search system, the results can be inspected and analyzed. Running test queries was an important part of our analysis of a candidate search system; for example, the long-gone DIALCOM system or a new incarnation of the European Space Agency’s system. Rigorous testing and analysis make it easy to spot dropped updates or screw ups that routinely find their way into bulk file loads.

Our rule of thumb was that if an ABI/INFORM index contained a term, a high precision result set on SDC ORBIT would include that term in the respective hits. If the result set did not contain a match, it was pretty easy to pinpoint where the indexing process started dropping files.
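The mechanics are simple enough to sketch. Here is a minimal, illustrative harness in Python for the kind of known-corpus testing described above. The corpus, the queries, and the expected hits are hypothetical stand-ins, not artifacts of ORBIT or any real system:

```python
# Minimal sketch of known-corpus query testing (illustrative only).
# The corpus, queries, and expected document IDs are hypothetical.

CORPUS = {
    "doc-001": "cash flow forecasting for middle market firms",
    "doc-002": "japanese management methods in manufacturing",
    "doc-003": "quarterly cash flow reporting standards",
}

# Query terms mapped to the document IDs a trained searcher expects back.
TEST_QUERIES = {
    ("cash", "flow"): {"doc-001", "doc-003"},
    ("japanese", "management"): {"doc-002"},
}

def run_query(terms: tuple[str, ...]) -> set[str]:
    """Old school brute force AND match: every term must appear."""
    return {doc_id for doc_id, text in CORPUS.items()
            if all(term in text.split() for term in terms)}

for terms, expected in TEST_QUERIES.items():
    retrieved = run_query(terms)
    hits = retrieved & expected
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    # A known document missing from the result set points to a dropped
    # update or a botched bulk file load, per the rule of thumb above.
    print(terms, f"precision={precision:.2f}",
          "missing:", sorted(expected - retrieved))
```

Swap the brute force matcher for the system under test and the harness carries over unchanged.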

However, when one does not know what’s been indexed, precision drifts into murkier areas. After all, how can one know if a result is on point if one does not know what’s been indexed? One can assume that a result set is relevant via inspection and analysis, but who has time for that today? That’s the danger in defining precision as whatever the user perceives. The user may not know what he or she is looking for. The user may not know the subject area or the entities associated consistently with the subject area. Should anyone be surprised when the user of a system has no clue what a system output “means,” whether the results are accurate, or whether the content is germane to the user’s understanding of the information needed?

Against this somewhat drab backdrop, the suggestions offered to the LinkedIn person looking for a search engine that delivers precision over non-English content (or, more accurately, content not in the primary language of the person searching) are revelatory.

Here are some responses I noted:

  • Hire an integrator (Artirix, in this case) and let that outfit use the open source Lucene based Elasticsearch system to deliver search and retrieval. Sounds simplistic. Yep, it is a simple answer that ignores source language translation, connectors, index updates, and methods for handling the pesky issues related to how language is used. Figuring out what a source document says in a language in which the user is not fluent is fraught with challenges. Forget dictionaries. Think about the content processing pipeline. Search is almost the caboose at the end of a very long train. (A bare bones sketch of the analyzer wiring appears after this list.)
  • Use technology from LinguaSys. This is a semantic system that is probably not well known outside a narrow circle of customers, though it has some visibility within the defense sector. Keep in mind that it performs some of the content processing functions. The technology has to be integrated into a suitable information retrieval system. LinguaSys is the equivalent of adding a component to a more comprehensive system. Another person mentioned BASIS Technologies, another company providing multi language components.
  • Rely on LucidWorks. This is an open source search system based on Apache Solr. The company has spun the management revolving door a number of times.
  • License Dassault’s Exalead system. The idea is worth considering, but how many organizations are familiar with Exalead or willing to embrace the cultural approach of France’s premier engineering firm? After years of effort, Exalead is not widely known in some pretty savvy markets. But the Exalead technology is not 100 percent Exalead. Third party software delivers the goods, so Exalead is an integrator in my view.
  • Embrace the Fast Search & Transfer technology, now incorporated into Microsoft SharePoint. Unmentioned is the fact that Fast Search relied on a herd of human linguists in Germany and elsewhere to keep its 1990s multi lingual system alive and well. Fast Search, like many other allegedly multi lingual systems, relies on rules, and these have to be written, tweaked, and maintained.
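To make the Elasticsearch suggestion concrete, here is a minimal sketch of the per-language analyzer wiring an integrator would start from. The index and field names are hypothetical; Elasticsearch ships built-in english and spanish analyzers, while Chinese analysis typically requires a plugin such as analysis-smartcn. None of this touches translation, connectors, or the rest of the content processing train:

```python
# Sketch: per-language fields with language-specific analyzers.
# Index and field names are hypothetical. The "smartcn" analyzer is
# available only when the analysis-smartcn plugin is installed.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

es.indices.create(
    index="multilingual-docs",
    body={
        "mappings": {
            "properties": {
                "title_en": {"type": "text", "analyzer": "english"},
                "title_es": {"type": "text", "analyzer": "spanish"},
                "title_zh": {"type": "text", "analyzer": "smartcn"},
                "language": {"type": "keyword"},  # route queries per language
            }
        }
    },
)
```

Per-language fields keep stemming and tokenization correct for each language, but they do nothing for a user who cannot read the hits.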

So what did the LinkedIn member learn? The advice offers one popular approach: Hire an integrator and let that company deliver a “solution.” One can always fire an integrator, sue the integrator, or go to work for the integrator when the CFO tries to cap the cost of a system that must please a user who may not know the meaning of nus in Japanese from a now almost forgotten unit of Halliburton.

The other approach is to go open source. Okay. Do it. But as my analysis of the Danish Library’s open source search initiative in Online suggested, the work is essentially never done. Only a tolerant government and lax budget oversight make this avenue feasible for many organizations with a search “problem.”

The most startling recommendation was to use Fast Search technology. My goodness. Are there not other multi lingual capable search systems dating from the 1990s available? Autonomy, anyone?

Net net: The LinkedIn enterprise search threads often underscore one simple fact:

Enterprise search is assumed to be one system, an app if you will.

One reason for the frequent disappointment with enterprise search is this desire to buy an iPad app, not engineer a constellation of systems that solve quite specific problems.

Stephen E Arnold, November 11, 2014

Launching and Scaling Elasticsearch

August 21, 2014

Elasticsearch is widely hailed as an alternative to SharePoint and to many other open source systems, but it is not without its problems. Ben Hundley of StackSearch offers his input on the software in his Qbox article, “Thoughts on Launching and Scaling Elasticsearch.”

Hundley begins:

“Qbox is a dedicated hosting service for Elasticsearch.  The project began internally to find a more economical solution to Amazon’s Cloudsearch, but it evolved as we became enamored by the flexibility and power of Elasticsearch.  Nearly a year later, we’ve adopted the product as our main priority.  Admittedly, our initial attempt took the wrong approach to scale.  Our assumption was that scaling clusters for all customers could be handled in a generalized manner, and behind the scenes.”

Hundley walks the reader through several considerations that affect an implementation: knowing your application’s needs, deciding on hardware, monitoring, tuning, and knowing when to scale. These are all decisions that must be made up front, allowing for more effective customization. The upside of an open source solution like Elasticsearch is greater customization and control and less rigidity. Of course, for a small organization, that could also be the downside: time and staffing are more limited, and an out-of-the-box solution like SharePoint is more likely to be chosen.
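Monitoring and knowing when to scale are at least easy to automate in rough form. Here is a minimal sketch using the standard Elasticsearch Python client against a locally reachable cluster; the thresholds are arbitrary placeholders, not Hundley’s or Qbox’s numbers:

```python
# Sketch: poll cluster health and per-node disk use to flag scaling needs.
# Thresholds are arbitrary placeholders, not recommendations.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

health = es.cluster.health()
if health["status"] != "green":
    print("cluster degraded:", health["status"],
          "| unassigned shards:", health["unassigned_shards"])

# _cat/allocation reports disk use per node; format=json keeps it parseable.
for node in es.cat.allocation(format="json"):
    pct = node.get("disk.percent")
    if pct is not None and int(pct) > 75:  # placeholder threshold
        print(f"node {node['node']} at {pct}% disk: plan to add capacity")
```

A cron job and a few lines like these do not replace capacity planning, but they catch the obvious trouble early.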

Emily Rae Aldridge, August 21, 2014

Sponsored by ArnoldIT.com, developer of Augmentext

At the Top of the BI Stack

June 28, 2013

Business intelligence tools are becoming a big priority for even small businesses. TopCultured supplies some guidance for those considering their options in “The 4 Biggest Business Intelligence Companies.” We were a little surprised that writer Drew Hendricks included Microsoft on this list.

The write-up begins:

“Finding the meaning behind mountains of raw data can be a difficult task, especially for companies that have not been monitoring their processes on a regular basis. Keeping an eye on business intelligence can tell stories of new opportunities, potential verticals for growth, and identify dangerous problems, allowing companies to enact a solution.

“As business intelligence becomes more accessible to smaller companies and startups, with app developers driving mobile solutions, the need for BI-trained workers and software solutions goes up. Take a look at the four top business intelligence companies out there now.”

With that, the list begins. Roambi is lauded for being easy to use and interpret. YellowFin boasts a bird’s-eye view of a company’s strengths and weaknesses. In at number three, Domo is flexible enough to be used throughout an organization. Microsoft’s SharePoint—well, I suppose being “considered the industry standard” does give the veteran platform some standing.

See the article for more on each of these companies. Organizations would do well to carefully consider their needs and investigate all options before choosing a BI platform.

Cynthia Murrell, June 28, 2013

Sponsored by ArnoldIT.com, developer of Augmentext

HP, Autonomy, and a Context Free Expert Output about Search: The Bet on a Horse Approach to Market Analysis

May 4, 2013

I don’t think too much about:

  1. Azure chip consultants. You know, these are the firms which make a living from rah rahs, buzzwording, and pontification to sell reports. (I know. I labored at a non-azure chip outfit for what seems like decades. Experience is a good instructor. Oh, if you are a consultant, please, complain about my opinion using the comments section of this free blog.)
  2. Hewlett Packard. I recall that the company used to make lab equipment which was cool. Now I think the firm is in some other businesses but as quickly as I latch on to one like the Treo and mobile, HP exits the business. The venerable firm confuses my 69 year old mind.
  3. Autonomy. I think I did some work for the outfit but I cannot recall. Age and the lifestyle in rural Kentucky takes a toll on the memory I admit.

Nevertheless, I read “HP’s Autonomy Could Face Uphill Battle In Data Market.” There were some gems in the write up which I found amusing and illustrative of the problems which azure chip consulting firms and their experts have when tackling certain business issues.

The main idea of the write up for “investors” is that HP faces “challenges.” Okay. That’s a blinding insight. As you may recall, HP bought Autonomy for $11 billion and then a few months later roiled the “investors” by writing off billions on the deal. That was the mobile phone model, wasn’t it?

The write up then pointed out:

HP wanted Autonomy to jump-start its move into software and cloud-based computing. Autonomy is the No. 1 provider of search and retrieval software that companies use to find and share files and other information on their websites and document management systems.

Okay. But that too seems obvious.

Now here comes the kicker. The expert outfit providing inputs to the reporter doing the bull dog grip on this worn out bone is quoted as saying:

“Software license revenue (in this market) isn’t growing at the same rate as before, and we are beginning to see the rise of some new technologies, specifically content analytics and unified information access,” Schubmehl said. These new types of software can be used with types of business analytics software, business intelligence software and other software to help enterprises do a better job of locating specific information, he says, which is the job of search retrieval software.

I don’t know much about IDC but what strikes me from this passage is that there are some assertions in this snippet which may warrant a tiny bit of evaluation.


Will context free analyses deliver a winner? Will there be a Gamblers Anonymous for those who bet on what journalists and mid tier (second string) consultancies promulgate? For more about Gamblers Anonymous navigate to http://www.gamblersanonymous.org/ga/

Here goes:

Read more

Now You Are Talking: Can a Company Make Money with Enterprise Search?

January 22, 2013

I have better things to do than to capture my immediate thoughts about “Inside H-P’s Missed Chance to Avoid a Disastrous Deal.” You can find the article in a dead tree version of the Wall Street Journal on page 1 with a jump to page 16, where the “would not comment” phrase appears with alarming frequency.

The most interesting point in the write up is the quote, allegedly crafted by a Hewlett Packard Big Dog:

Now you’re talking.

Like much of the chatter about search, content processing, and Big Data analytics, on the surface these information retrieval software companies are like Kentucky Derby hopefuls on a crisp spring morning. The big pay day is two minutes away. How can the sleek, groomed, documented thoroughbreds lose?

The reality, documented in the Wall Street Journal, is that some companies with sure fire winning strategies can win. Now you’re talking.

How did HP get itself into the headline making situation? How can smart folks spend so much money, reverse course, and appear to be so scattered? Beats me.

I have, however, seen this before. As I read the Wall Street Journal’s story, I wrote down some thoughts in the margin of the dead tree instance of the story at the breakfast table.


A happy quack to Lubrisyn.com

Herewith are my notes to myself:

First, name one search vendor in the period from 1970 to the present which has generated more than $1 billion in revenue from search. Acquisitions like IBM’s purchase of iPhrase (er, what happened to that outfit?), Vivisimo (now a Big Data company!), or SPSS’s Clementine (ah, you don’t know Clementine. Shame on you.) do not change the arithmetic. Don’t toss Google and its search appliance into the mix. Google only hints at the great success of the product. When was the last time you searched using a Google Search Appliance?

Second, didn’t Microsoft purchase Fast Search & Transfer for $1.2 billion in January 2008? How is that working out? The legions of search add-in vendors for SharePoint are busy, but the core system has become a little bit like dear old Clementine. Fast Search was the subject of a couple of probes, but the big question which has not yet been answered, as far as I know, is, “How much revenue did Fast Search generate versus how much revenue Fast Search reported?” I heard that the revenues were, to some degree, inflated. I thought search was a sure fire way to make money.

Third, after more than a decade of top down marketing, why did Endeca need cash infusions from Intel and SAP venture units? How much did Oracle pay for Endeca? Some azure chip consultants have described Endeca as the leading vendor of enterprise search. Endeca added ecommerce and business intelligence to its line up of products. What was the firm’s revenue at the time of its sale to Oracle? I estimated about $150 million.

Fourth, Dassault, the company with the “system”, bought Exalead. What has happened to this promising technology? Is Exalead now a $200 million a year revenue producer for the prestigious French engineering firm? Perhaps the “system” has been so successful that Exalead is now infused into Dassault clients throughout the world? On the other hand, wouldn’t a solution with this type of impact make headlines every week, even in the US? Is it more difficult to cultivate information retrieval revenues than other types of software revenue? The good news is that Dassault paid a reasonable price for Exalead, avoiding the Autonomy, Endeca, and Fast Search purchase prices.

These examples reminded me of a question: even if my estimates are wide of the mark by 20 or 30 percent, how could any company generate the astounding growth required to pay the $11 billion acquisition cost, invest in search technology, and market a product which is pretty much available for free as open source software today? Answer: Long shot. Exercise that horse and make sure you have what it takes to pay the jockey, the stable hands, the vet, and the transportation costs. Without that cash cushion, a Derby hopeful will put a person in a financial hole. Similar to the search dreams of big acquirers? Yep. Maybe identical?

Two different points occurred to me.

On one hand, search and its bandwagon riders like Big Data analytics must seem to be a combination of the Klondike’s mother lode and a must-have function no matter what a professional does for a living. The reality is that of the 65 search and related vendors I have written about in my books and confidential reports, only three managed to break the $100 million in search revenue ceiling. The companies were Autonomy, Endeca, and Fast Search. Of the three, only Endeca emerged relatively unscathed from the process. The other 62 companies either went out of business (Convera, Delphes, Entopia) or stalled at revenues in the millions of dollars. If one totals the investments in these 65 firms to generate their revenues, search is not a break even investment. Companies like Attivio and Coveo have captured tens of millions of venture dollars. Those investors want a return. What are the odds that these companies can generate more revenues than Autonomy? Interesting question.

On the other hand, search and its child disciplines remain the most complex of modern computing problems. Whether it is voice to text to search and then to predictive analytics for voice call intercepts, or just figuring out what Buffy and Trent in the sales department need to understand a new competitor, software is just not up to the task. By that logic, money pumped into promising companies will pay big dividends. Now the logic may make sense to an MBA, but I have spent more than 35 years explaining that progress in search is tough to achieve, expensive to support, and disappointing to most system users. The notion that a big company could buy software that is essentially customized to each customer’s use cases (notice the plural of “cases”) and make big money is an assumption common to many firms and managers. The reality is that even governments lack the money to make search work.

Don’t get me wrong.

There are small firms which, because they focus on quite specific problems, can deliver value to a licensee. However, big money assumes that search technology will be universal, easily applied to many situations. Even Google, with its paid search model, is now facing innovation challenges. With lots of smart people, Google is hiring the aging wizards of search in an attempt to find something that works better than the voting methods in use today.

What do my jottings suggest? Search is a tough business. Assumptions about how much money one can make from search in an era of open source options and cost cutting need to be looked at in a different way. The current approach, as the Wall Street Journal write up makes clear, is not working particularly well. Does this search revenue track record suggest that the azure chip consultants, former middle school teachers, and real journalists miss the larger message of search, content processing, and Big Data analytics? My tentative answer is, “Yep.”

Stephen E Arnold, January 22, 2013

Get A Comprehensive Search Strategy Plan from Aspire

October 12, 2012

People tend to doubt the power of a good search application. They take it for granted that all out-of-the-box and Internet search engines are as accurate as Google (merely the most visible to the public). The truth of the matter is most businesses are losing productivity because they have not harnessed the true potential of search. Search Technologies, a leading IT company that specializes in search engine implementation, managed services, and consulting, is the innovator behind Aspire:

“Aspire is a powerful framework and application platform for acquiring both structured and unstructured data from just about any content source, processing / enriching that content, and then publishing it to the search engine or business analytics tool of your choice.”

Aspire uses a built-in indexing pipeline and proprietary code maintained to Search Technologies’ high standards. It is based on Apache Felix, the leading open source implementation of the OSGi standard. OSGi is a module system for Java and is supported by IT companies worldwide. Aspire can gather documents from a variety of sources, including relational databases, SharePoint, file systems, and many more. The metadata is captured, and then it can be enriched, combined, reformatted, or normalized to whatever the business needs before it is submitted to search engines, document repositories, or business analytics applications. Aspire performs content processing that cleans and repackages data for findability.
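The acquire, enrich, and publish flow is simple to picture in code. The sketch below is a generic Python rendering of that pipeline idea, not Aspire’s actual API; the stage names, the connector label, and the publish target are hypothetical:

```python
# Generic acquire -> enrich -> publish pipeline (illustrative; not Aspire's API).
from dataclasses import dataclass, field
from typing import Callable, Iterable

@dataclass
class Document:
    doc_id: str
    content: str
    metadata: dict = field(default_factory=dict)

def normalize_title(doc: Document) -> Document:
    # Enrichment stage: clean and reshape metadata for findability.
    doc.metadata["title"] = doc.metadata.get("title", "").strip().title()
    return doc

def tag_source(doc: Document) -> Document:
    doc.metadata.setdefault("source", "file-system")  # hypothetical connector
    return doc

def run_pipeline(docs: Iterable[Document],
                 stages: list[Callable[[Document], Document]],
                 publish: Callable[[Document], None]) -> None:
    for doc in docs:
        for stage in stages:
            doc = stage(doc)
        publish(doc)  # hand off to a search engine or analytics tool

# Usage: publish() just prints here; a real target would be an indexing API.
docs = [Document("1", "quarterly report text", {"title": "  q3 results  "})]
run_pipeline(docs, [normalize_title, tag_source], publish=print)
```

The point of the pattern is the one the vendor makes: the cleanup happens in the pipeline, outside the search engine, so the engine indexes consistent metadata.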

“Almost all structured data is originally created in a tightly controlled or automated way.

“By contrast, unstructured content is created interactively by individual people, and is infinitely variable in its format, style, quality and structure. Because of this, content processing techniques that were originally developed to work with structured data simply cannot cope with the unpredictability and variability of unstructured content.”

By implementing a content processing application like Aspire, unstructured content is “scrubbed,” then enriched, for better search results. Most commercial search engines do not have the same filters that separate relevant content from the bad. The results displayed to the user are thus of poor quality and of zero to little use. Vendors try to resolve the problem with custom coding and updates for every new data source that pops up, which is tedious. Aspire fixes tired coding problems by using automated metadata extraction and manipulation outside the search engine.

As powerful as commercial search engines are, they often lack the refined quality one gets from a robust ISV. Aspire does not follow the same search technology path as its competitors; rather, it has designed a new, original solution to provide its clients with a comprehensive search strategy plan to help improve productivity, organization, and data management.

Remember. Search Technologies is sponsoring a meet up at the October 2012 Enterprise Search Summit. More information is available at http://www.meetup.com/DC-Metro-Enterprise-Search-Network/

Iain Fletcher, October 12, 2012

Sponsored by ArnoldIT.com, developer of Augmentext

Deconstructing HP Autonomy and Its Eight Answers

September 26, 2012

All Things Digital ran a story called “Eight Questions for Hewlett Packard Software Head George Kadifa.” Let me nudge aside any thoughts that the interview and the questions were presented as public relations and marketing. I want to view the comments or “answers” as accurate. Once I have highlighted the points which caught my attention, I want to offer some side observations from my goose pond in rural Kentucky.

First, there were two passages which addressed the $12 billion Autonomy purchase.

The first was information about a recent planning meeting. The Autonomy staff were on deck and ready for duty. The key statement for me was this one:

Basically when you look at Autonomy, the core unit is the IDOL Engine, which is the unique capability of meaning-based computing. We’re going to double down on that. In our labs in Cambridge, England, we have 40 or 50 mathematicians writing algorithms. And we’re going to build a team here in the U.S. to productize it and create a platform around it because it has that potential. Frankly, the way Autonomy was managed previously, they put a lot more emphasis into enabling applications, which was fine, but our belief is that there’s a broad agenda, which is creating a platform around meaning-based computing. So we will maintain those apps, but at the same time we’ll open up the capabilities to a broader set of players outside HP.

Makes sense. Pay $12 billion for IDOL. Leverage it.

The second was semi-business school thinking about how to grow Autonomy’s business. Here’s the passage I noted:

In Europe, they tend to make things complex in order to create more value. For example, they saw the IDOL engine as too complex to just give it to people. Instead they thought they should acquire vendors and then create value by enabling applications. Here we take something that’s complex and we ask how we might simplify it in order to give it more scale for a bigger market. So some of that difference was cultural, and some of it was that I think they fell in love with these acquisitions. … We think Autonomy’s technology has broader implications.

I urge you to read the full “eight questions” and the answers. Now my observations:

  1. Productizing IDOL or any search engine can be difficult. When I use the word “difficult,” I mean time consuming, expensive, and timetable free. Buying a search engine and sticking it in a product or service looks easy. It is not. In fact, IBM has elected to use open source search to provide the basics. Now IBM is working hard to make money from its value add system, the game show winner Watson. There may be a product in “there”, but it is often hard to find a way to make money. HP has to pay back the $12 billion it spent and then grow the Autonomy business, which was within shouting distance of $1 billion.
  2. The notion that Europeans see the world differently from HP is interesting. I am not sure how European Autonomy was. My view is that Autonomy’s senior management acquired companies and did upselling. As a result, only Autonomy broke through the glass ceilings behind which Endeca, Exalead, ISYS, and Fast Search & Transfer were trapped. Before applying business school logic to Autonomy, perhaps one should look at how other acquired search vendors have paid off. The list is, based on my research, a short one indeed. Microsoft, for example, has made Fast Search a component of SharePoint. With Fast Search nearing or at its end of life, Microsoft faces more search challenges, not fewer. HP may find itself facing more challenges than it expects.
  3. The notion of “broader applications” is a popular one. Dassault Systèmes acquired Exalead, which is arguably better and more recent technology than IDOL. But Dassault’s senior managers continue to look for ways to convert a more modest expenditure for Exalead into a river of revenue. Dassault has a global approach and many excellent managers. Even for such an exceptional firm, search is not what it seemed to be; that is, a broad application which slots into many customer needs. Reality, based on my research for The New Landscape of Search, is different from the business school map.

HP is making a trip which other companies have taken before. My view is that HP will have to find answers to these questions, which were not part of the interview cited above:

First, how will HP pay off the purchase price, grow Autonomy’s revenue, and generate enough money to have an impact on HP’s net profit? My work has pointed out that cost control is the major problem search vendors face. It takes money to explain a system no matter how productized it becomes. It takes money to support that technology. It takes money to enhance that system. It takes money to hire people who can do the work. In short, search becomes a bright blip on most CFOs’ radar screens. HP may be different, but I am not sure that the cost issue will remain off the radar for very long.

Second, IDOL is a complex collection of software components. The core is Bayesian, but much of ancillary IDOL consists of the add ons, enhancements, and features which have been created and applied to the base system over the last two decades. Yep, two decades. In search, most of the systems which have figured in big deals in the last two years date from the mid to late 1990s. The more modern systems are not search at all. These new systems leap frog key word search and push into high value opportunities. HP may be forced to buy one or more of these next generation systems just to stay in the “beyond search” game.

Third, HP is a large company and it faces considerable competition in software. What makes HP interesting is that it has not been able to make its services business offset the decline in personal computers and ink. HP now wants to prove that it can make services work, but as the Inquirer pointed out in mid August 2012:

HP’s write-down of EDS might have resulted in just a paper loss – the firm didn’t actually lose $9bn in cash – but it provides an insight into how a decade of mismanagement has left HP in a bad situation. The fact is that HP cannot lay the blame on diminishing PC sales because its enterprise business, printing and services divisions all reported losses, too. For HP to write down the purchase of EDS, a company it paid $13.9bn for just four years ago, strongly suggests that those who were at the helm of HP in the run-up to that acquisition simply had no clue as to how much EDS was really worth and how to incorporate the company into HP. The value of any company can go down over time – just look at AOL, Microsoft or Yahoo – but for an established business such as EDS to be overvalued by almost $10bn just four years after being acquired is nothing short of gross incompetence by HP in both the purchase and the subsequent handling of the firm once it became a part of HP.

I don’t fully agree with the Inquirer’s viewpoint. But one fact remains: HP must demonstrate that it can manage a complex business based on IDOL, a technology which is not a spring chicken. The man who managed Autonomy to almost $1 billion in sales is no longer with HP. In the history of enterprise search and content processing, Mike Lynch was unique. Perhaps the loss of that talent will continue to impact HP’s plans for a different approach to the market for Autonomy’s technology?

Life extension treatments are available, but these often do not work as expected and can be expensive. Most fail in the end.

Stephen E Arnold, September 25, 2012

Sponsored by Augmentext

Search: A Persistent Disconnect between Reality and Innovation

August 17, 2012

Two years ago I wrote The New Landscape of Search. Originally published by Pandia in Norway, the book is now available without charge when you sign up for our new “no holds barred” search newsletter Honk!. In the discussion of Microsoft’s acquisition of Fast Search & Transfer SA in 2008, I cite documents which describe the version of Fast Search which the company hoped to release in 2009 or 2010. After the deal closed, the new version of Fast seemed to drop from view. What became available was “old” Fast.

I read the InfoWorld story “Bring Better Search to SharePoint.” Set aside the PR-iness of the write up. The main point is that SharePoint has a lousy search system. Think of the $1.2 billion Microsoft paid for what seems to be, according to the write up, a mongrel dog. My analysis of Fast Search focused on its age. The code dates from the late 1990s and relies on a mix of proprietary, third party, and open source components. Complexity and the 32 bit architecture were in need of attention beyond refactoring.

The InfoWorld passage which caught my attention was:

Longitude Search’s AptivRank technology monitors users as they search, then promotes or demotes content’s relevance rankings based on the actions the user takes with that content. In a nutshell, it takes Microsoft’s search-ranking algorithm and makes it more intelligent…
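The promote-and-demote mechanism is easy to sketch in outline. What follows is a minimal, hypothetical Python rendering of click-feedback re-ranking; it is not Longitude Search’s AptivRank code, and the boost values are invented for illustration:

```python
# Sketch: adjust document scores from user actions (illustrative only;
# not Longitude Search's implementation). Boost values are invented.
from collections import defaultdict

boosts: dict[str, float] = defaultdict(float)

def record_action(doc_id: str, action: str) -> None:
    """Promote on engagement, demote on quick abandonment."""
    if action in ("open", "download", "share"):
        boosts[doc_id] += 0.10
    elif action == "bounce":  # opened and immediately abandoned
        boosts[doc_id] -= 0.05

def rerank(results: list[tuple[str, float]]) -> list[tuple[str, float]]:
    """Blend the engine's base relevance score with accumulated boosts."""
    adjusted = [(doc, score + boosts[doc]) for doc, score in results]
    return sorted(adjusted, key=lambda pair: pair[1], reverse=True)

record_action("spec.docx", "open")
record_action("old-memo.docx", "bounce")
print(rerank([("old-memo.docx", 0.52), ("spec.docx", 0.50)]))
```

Note that the underlying engine and its index are untouched, which is exactly the point made below.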

The solution to SharePoint’s woes amounts to tweaking. In my experience, there are many vendors offering similar functionality and almost identical claims regarding fixing up SharePoint. You can chase down more at www.arnoldit.com/overflight.

The efforts are focused on a product with a large market footprint. In today’s dicey economic casino, it makes sense to trumpet solutions to long standing information retrieval challenges in a product like SharePoint. Heck, if I had to pick a market to pump up my revenue, SharePoint is a better bet than some others.

Contrast InfoWorld’s “overcome SharePoint weaknesses” angle with the search assertions in “Search Technology That Can Gauge Opinion and Predict the Future.” We are jumping from the reality of a Microsoft product which has an allegedly flawed search system into the exciting world of what everyone really, really wants—serious magic. Fixing SharePoint is pretty much hobby store magic. Predicting the future: That is big time, hide the Statue of Liberty magic.

Here’s the passage which caught my attention:

A team of EU-funded researchers have developed a new kind of internet search that takes into account factors such as opinion, bias, context, time and location. The new technology, which could soon be in use commercially, can display trends in public opinion about a topic, company or person over time — and it can even be used to predict the future…Future Predictor application is able to make searches based on questions such as ‘What will oil prices be in 2050?’ or ‘How much will global temperatures rise over the next 100 years?’ and find relevant information and forecasts from today’s web. For example, a search for the year 2034 turns up ‘space travel’ as the most relevant topic indexed in today’s news.

Yep, rich indexing, facets, and understanding text are in use.

What these two examples make clear, in my opinion, is that:

  • Search is broken. If an established product delivers inadequate findability, why hasn’t Microsoft just solved the problem? If off the shelf solutions are available from numerous vendors, why hasn’t Microsoft bought the ones which fix up SharePoint and called it a day? The answer is that none of the existing solutions deliver what users want. Sure, search gets a little better, but the SharePoint search problem has been around for a decade, and if search were such an easy problem to solve, Microsoft has the money to do the job. Still a problem? Well, that’s a clue that search is a tough nut to crack in my book. Marketers don’t have to make a system meet user needs. Columnists don’t even have to use the systems about which they write. Pity the users.

  • Writing about whiz bang new systems funded by government agencies is more fun than figuring out how to get these systems to work in the real world. If SharePoint search does not work, what effort and investment will be required to predict the future via a search query? I am not holding my breath, but the pundits can zoom forward.

The search and retrieval sector is in turmoil, and it will stay that way. The big news in search is that free and open source options are available which work as well as Autonomy- and Endeca-like systems. The proprietary and science fiction solutions illustrate on one hand the problems basic search has in meeting user needs and, on the other hand, the lengths to which researchers will go to convince their funding sources and regular people that search is going to get better real soon now.

Net net: Search is a problem and it is going to stay that way. Quick fixes, big data, and predictive whatevers are not going to perform serious magic quickly, economically, or reliably without significant investment. InfoWorld seems to see chipper descriptions and assertions as evidence of better search. The Science Daily write up mingles sci-fi excitement with a government funded program to point the way to the future.

Sorry. Search is tough and will remain a chunk of elk hide until the next round of magic is spooned by public relations professionals into the coffee mugs of the mavens and real journalists.

Stephen E Arnold, August 17, 2012

Sponsored by Augmentext


IBM Big Data Initiative Coming in Focus with Cloudera, Hadoop Partnerships

May 17, 2012

Big data management and analytics is becoming a key basis of competition as organizations look to turn their complex and large data sets into business assets. In “Analyst Commentary: IBM Adds Search and Broadens Hadoop Strategy with Big Data,” Stuart Lauchlan comments on IBM’s Vivisimo acquisition. Lauchlan says that the acquisition puts to rest the ambiguity of IBM’s Hadoop partnership strategy. He also has this to add about handling big data:

By definition, one of the major problems in discovering the information “nuggets” in Big Data environments is that the volume of data is large and consequently difficult to traverse or search using traditional enterprise search and retrieval (ESR) tools that require the creation and maintenance of indexes before a query can be made. Vivisimo’s offering indexes and clusters results in real time, and its scalability enables dynamic navigation across results delivered, as well as the automation of discovery, reducing the burden/time of analysis.
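Clustering a result list on the fly, rather than from a prebuilt index of categories, is easy to illustrate. Here is a minimal sketch with scikit-learn; it shows the general idea only and is in no way Vivisimo’s technology. The result titles and cluster count are invented:

```python
# Sketch: cluster a search result list at query time (illustrative only;
# not Vivisimo's code). Titles and cluster count are invented.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

results = [
    "hadoop cluster sizing for log analytics",
    "sizing a hadoop cluster: hardware notes",
    "federated search across sharepoint sites",
    "sharepoint search federation basics",
]

vectors = TfidfVectorizer().fit_transform(results)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)

for label, title in sorted(zip(labels, results)):
    print(label, title)
```

Because the grouping happens after retrieval, no taxonomy has to be built or maintained in advance.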

Even though the actual value of the acquisition has not been disclosed, we do know that IBM has spent $14 billion in the last seven years on analytics-related products and companies. And while IBM has acquired services like Vivisimo before, it seems that IBM saw value in the search software’s newer capabilities, such as federated discovery and navigation. IBM is no doubt trying to take on SharePoint, the major player in enterprise search.

Lauchlan’s article is a comprehensive overview of the IBM strategy. It may be a worthy read to keep in the loop on enterprise search news. But while IBM seeks to develop a comprehensive search solution with big acquisitions, organizations can turn to expert third party solutions to also get the power of efficient and federated search now.

The search experts at Fabasoft Mindbreeze offer a cost-effective suite of solutions to tame big data sprawl and connect your users to the right information at the right time. And with Folio connectors, organizations can access on-premise and cloud data with one easy search. Here you can read about the enterprise search solution:

The data often lies distributed across numerous sources. Fabasoft Mindbreeze Enterprise gains each employee two weeks per year through focused finding of data (IDC Studies). An invaluable competitive advantage in business as well as providing employee satisfaction…But an all-inclusive search is not everything. Creating relevant knowledge means processing data in a comprehensible form and utilizing relations between information objects. Data is sorted according to type and relevance. The enterprise search for professionals.

Navigate to http://www.mindbreeze.com/ to learn more.

Philip West, May 17, 2012

Sponsored by Pandia.com
