Autonomy: Leading the Push Beyond Enterprise Search

January 30, 2015

In “CyberOSINT: Next Generation Information Access,” I describe Autonomy’s math-first approach to content processing. The reason is that after the veil of secrecy was lifted with regard to the signal processing methods used for British intelligence tasks, Cambridge University became one of the hotbeds for the use of Bayesian, Laplacian, and Markov methods. These numerical recipes proved to be both important and controversial. Instead of relying on manual methods, humans selected training sets, tuned the thresholds, and then turned the smart software loose. Math is not required to understand what Autonomy packaged for commercial use: Signal processing separated noise in a channel and allowed software to process the important bits. Thank you, Claude Shannon and the good Reverend Bayes.
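
For readers who want a concrete picture of the "training set plus threshold" idea, here is a minimal sketch in Python. It is a generic naive Bayes text classifier, not Autonomy's actual method. The tiny training set, the function names, and the 0.5 threshold are all assumptions made up for illustration.

```python
import math
from collections import Counter

# Hypothetical, hand-picked training set: a human chooses what counts as signal.
TRAINING_SET = [
    ("shipment of centrifuge parts delayed at port", "signal"),
    ("wire transfer routed through shell company", "signal"),
    ("lunch menu for the cafeteria this week", "noise"),
    ("office party photos from last friday", "noise"),
]
THRESHOLD = 0.5  # assumed cutoff; a human tunes this value


def train(examples):
    """Count words per label and documents per label."""
    word_counts = {"signal": Counter(), "noise": Counter()}
    doc_counts = Counter()
    for text, label in examples:
        doc_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = set(word_counts["signal"]) | set(word_counts["noise"])
    return word_counts, doc_counts, vocab


def prob_signal(text, model):
    """Posterior probability that text is signal, via Bayes' rule."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    log_scores = {}
    for label in ("signal", "noise"):
        log_p = math.log(doc_counts[label] / total_docs)  # prior
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # ignore words never seen in training
            # Laplace (add-one) smoothing avoids zero probabilities.
            count = word_counts[label][word] + 1
            log_p += math.log(count / (total_words + len(vocab)))
        log_scores[label] = log_p
    # Normalize the two log scores into a probability.
    top = max(log_scores.values())
    weights = {k: math.exp(v - top) for k, v in log_scores.items()}
    return weights["signal"] / (weights["signal"] + weights["noise"])


def is_signal(text, model, threshold=THRESHOLD):
    return prob_signal(text, model) >= threshold


model = train(TRAINING_SET)
print(is_signal("wire transfer to shell company", model))
print(is_signal("cafeteria lunch photos", model))
```

The human judgment lives in two places: which examples go into `TRAINING_SET` and where `THRESHOLD` sits. Once those are set, the software runs without further hand holding.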

What did Autonomy receive for this breakthrough? Not much, but the company did generate more than $600 million in revenues about 10 years after opening for business. As far as I know, no other content processing vendor has reached this revenue target. Endeca, for the sake of comparison, flatlined at about $130 million in the year that Oracle bought the Guided Navigation outfit for about $1.0 billion.

For one thing, the British company BAE Systems (formerly British Aerospace) licensed the Autonomy system and began to refine its automated collection, analysis, and report systems. So what? By the late 1990s, the UK had become the de facto leader in automated content activities. Was BAE the only smart outfit in the late 1990s? Nope, there were other outfits that realized the value of the Autonomy approach. Examples range from US government entities to little known outfits like the Wynyard Group.

In the CyberOSINT volume, you can get more detail about why Autonomy was important in the late 1990s, including the name of the university professor who encouraged Mike Lynch to make contributions that have had a profound impact on intelligence activities. For color, let me mention an anecdote that is not in the 176 page volume. (Please keep in mind that Autonomy was, like i2, another Cambridge University spawned outfit, a client prior to my retirement.) IBM owns i2, and i2 is profiled in CyberOSINT in Chapter 5, “CyberOSINT Vendors.” I would point out that more than two thirds of the monograph contains information that is either not widely available or not available via a routine Bing, Google, or Yandex query. For example, Autonomy does not make publicly available a list of its patent documents. These contain specific information about how to think about cyber OSINT and moving beyond keyword search.

Some Color: A Conversation with a Faux Expert

In 2003 I had a conversation with a fellow who was an “expert” in content management, a discipline that is essentially a step child of database technology. I want to mention this person by name, but I will avoid the inevitable letter from his attorney rattling a saber over my head. This person publishes reports, engages in litigation with his partners, kowtows to various faux trade groups, and tries to keep secret his history as a webmaster with some Stone Age skills.

Not surprisingly, this canny individual had little good to say about Autonomy. The information I provided about the Lynch technology, its applications, and its importance in next generation search was dismissed with a comment I will not forget: “Autonomy is a pile of crap.”

Okay, that’s an informed opinion for a clueless person pumping baloney about the value of content management as a separate technical field. Yikes.

In terms of enterprise search, Autonomy’s competitors criticized Lynch’s approach. Instead of a keyword search utility that was supposed to “unlock” content, Autonomy delivered a framework. The framework operated in an automated manner and could deliver keyword search, point and click access like the Endeca system, and more sophisticated operations associated with today’s most robust cyber OSINT solutions. Enterprise search remains stuck in the STAIRS III and RECON era. Autonomy was the embodiment of the leap from putting the burden of finding on humans to shifting the load to smart software.

image

A diagram from Autonomy’s patents filed in 2001. What’s interesting is that this patent cites an invention by Dr. Liz Liddy with whom the ArnoldIT team worked in the late 1990s. A number of content experts understood the value of automated methods, but Autonomy was the company able to commercialize and build a business on technology that was not widely known 15 years ago. Some universities did not teach Bayesian and related methods because these were tainted by humans who used judgments to set certain thresholds. See US 6,668,256. There are more than 100 Autonomy patent documents. How many of the experts at IDC, Forrester, Gartner, et al. have actually located the documents, downloaded them, and reviewed the systems, methods, and claims? I would suggest a tiny percentage of the “experts.” Patent documents are not what English majors are expected to read.

That’s important and little appreciated by the mid tier outfits’ experts working for IDC (yo, Dave Schubmehl, are you ramping up to recycle the NGIA angle yet?), Forrester (one of whose search experts told me at a MarkLogic event that new hires for search were told to read the information on my ArnoldIT.com Web site, like that was a good thing for me), Gartner Group (the conference and content marketing outfit), Ovum (the UK counterpart to Gartner), and dozens of other outfits who understand search in terms of selling received wisdom, not insight or hands on facts.

Enterprise Search Problems: Why NGIA Systems Push Beyond Traditional Information Access Methods

January 29, 2015

Enterprise search has been useful. However, the online access methods have changed. Unfortunately, most enterprise search systems and the enterprise applications based on keyword and category access have lagged behind user needs.

The information highway is littered with the wrecks of enterprise search vendors who promised a solution to findability challenges and failed to deliver. Some of the vendors have been forgotten by today’s keyword and category access vendors. Do you know about the business problems that disappointed licensees and cost investors millions of dollars? Are you familiar with Convera, Delphes, Entopia, Fulcrum Technologies, Hakia, Siderean Software, and many other companies?

A handful of enterprise search vendors dodged implosion by selling out. Artificial Linguistics, Autonomy, Brainware, Endeca, Exalead, Fast Search, InQuira, iPhrase, ISYS Search Software, and Triple Hop were sold. Thus, their investors received their money back and in some cases received a premium. The $11 billion paid for Autonomy dwarfed the billion dollar purchase prices of Endeca and Fast Search and Transfer. But most of the companies able to sell their information retrieval systems sold for much less. IBM acquired Vivisimo for about $20 million and promptly justified the deal by describing Vivisimo’s metasearch system as a Big Data solution. Okay.

Today a number of enterprise search vendors walk a knife edge. A loss of a major account or a misstep that spooks investors can push a company over the financial edge in the blink of an eye. Recently I noticed that Dieselpoint has not updated its Web site for a while. Antidot seems to have faded from the US market. Funnelback has turned down the volume. Hakia went offline.

A few firms generate considerable public relations noise. Attivio, BA Insight, Coveo, and IBM Watson appear to be competing to become the leaders in today’s enterprise search sector. But today’s market is very different from the world of 2003-2004 when I wrote the first of three editions of the 400 page Enterprise Search Report. Each of these companies is asserting that its system provides business intelligence, customer support, and traditional enterprise search. Will any of these companies be able to match Autonomy’s 2008 revenues of $600 million? I doubt it.

The reason is not the availability of open source search. Elasticsearch, in fact, is arguably better than any of the for fee keyword and concept centric information retrieval systems. The problems of the enterprise search sector are deeper.

Enterprise Search: X1 Argues Search and Discovery Are the Cure to Findability Ills. Maybe Not?

January 26, 2015

I read a white paper from a search vendor called X1 or X1 Discovery. The company was incubated in the same hot house that produced GoTo.com. As a result of that pay to play model, Web search was changed from objectivity to advertising. X1 search, if I understand the white paper “Why Enterprise Search Fails in Most Cases and How to Fix It” (registration from this link required to access the paper) and the companion article “X1 CEO Message: A New Approach to Enterprise Search Resonates,” is the future of search.

The fix is an interface that looks like this:

image

Source: “Why Enterprise Search Fails in Most Cases and How to Fix It,” page 3.

In the “X1 CEO Message” I noted:

So in view of this customer and industry feedback, we coined the phrase “business productivity search” to differentiate what X1 focuses on verses most other enterprise search tools, which are typically re-fashioned big data analytics or web search appliances. And the feedback we’ve received on this from end-users and industry experts alike is that this assessment hits the nail on the head. Business productivity search is not big data analytics and it is not web retrieval. It is its own use case with a workflow and interface that is tailored to the end users. X1 provides the end-user with a powerful yet user-friendly and iterative means to quickly retrieve their business documents and emails using their own memory recall as opposed to generic algorithms that generate false positives and a workflow ill-suited to business productivity search.

I am not convinced that search and discovery as described is going to address the core issues that plague enterprise information access. Specifically, the last few decades have beaten keywords to death. The users have expressed their views by grousing about whatever keyword system is provided to them, finding alternatives to keyword search, and shifting attention from keywords to more actionable interfaces provided by a group of vendors largely unfamiliar to the keyword crowd.

There is a role for keyword search, but that utility function can be provided via open source solutions ranging from FLAX to Lucene to SphinxSearch and other options.

What is not provided is the automated collection, analysis, and report functions of the next generation information access systems. I have explained the characteristics of the next generation information access systems in CyberOSINT, described at www.xenky.com/cyberosint. In this study, I profile more than 18 next generation systems, provide a schematic of the functions included in these systems, and provide examples of the outputs these NGIA solutions provide to their users.

What’s interesting is that each of these vendors supports keyword search in some way. Just as a modern automobile provides a lever to display a turn signal, NGIA systems include utility functions. But, and this is a big “but,” the NGIA systems address the needs of the user. The idea is that the system provides actionable outputs without requiring the user to guess the keywords that unlock what’s in an index. A dashboard is one option. More useful outputs include dynamic PDF maps with data displayed on a mobile device. The maps update as the information arrives or the user moves around. There are outputs that show the key players in a deal and provide one click access to supporting data. No search is required. Many of the NGIA systems operate in a predictive manner. When the user looks at the device, the information is “just there.”

I appreciate the efforts of vendors like X1, Coveo, Attivio, and IBM Watson in their attempts to breathe new life into keyword search. Just as the old marketing essay about buggy whips made vivid to tens of thousands of MBA students, when the automobiles appear, the buggy whip outfits may want to make seat covers.

The fix for enterprise search problems is not more keyword and point and click suggestions. The solution is a shift to the NGIA approach. And that shift, whether or not traditional vendors of search grasp it, has already begun.

Stephen E Arnold, January 26, 2015

Enterprise Search: A Problem of Relevance to the Users

January 23, 2015

I enjoy email from those who read my for fee columns. I received an interesting comment from Australia about desktop search.

image

In a nutshell, the writer read one of my analyses of software intended for a single user looking for information on his local hard drives. The bigger the hard drives, the greater the likelihood the user will operate in squirrel mode. The idea is that it is easier to save everything because “you never know.” Right, one doesn’t.

Here’s the passage I found interesting:

My concern is that with the very volatile environment where I saw my last mini OpenVMS environment now virtually consigned to the near-legacy basket and many other viable engines disappearing from Desktop search that there is another look required at the current computing environment.

I referred this person to Gaviri Search, which I use to examine email, and Effective File Search, which is useful for looking in specific directories. These suggestions sidestepped the larger issue:

There is no fast, easy to use, stable, and helpful way to look for information on a couple of terabytes of local storage. The files are a mixed bag: Excels, PowerPoints, image and text embedded PDFs, proprietary file formats like Framemaker, images, music, etc.

Such this problem was in the old days and such this problem is today. I don’t have a quick and easy fix. But these are single user problems, not an enterprise scale problem.

An hour after I read the email about my column, I received one of those frequent LinkedIn updates. The title of the thread to which LinkedIn wished to call my attention was/is: “What would you guess is behind a drop in query activity?”

image

I was enticed by the word “guess.” Most assume that the specialist discussion threads on LinkedIn attract the birds with the brightest plumage, not the YouTube commenter crowd.

I navigated to the provided link which may require that you become a member of LinkedIn and then appeal for admission to the colorful feather discussion for “Enterprise Search Professionals.”

The situation is that a company’s enterprise search engine is not being used by its authorized users. There was a shopping list of ideas for generating traffic to the search system. The reason is that the company spent money, invested human resources, and assumed that a new search system would deliver a benefit that the accountants could quantify.

What was fascinating was the response of the LinkedIn enterprise search professionals. The suggestions for improving the enterprise search engine included:

  • Asking for more information about usage? (Interesting but the operative fact is that traffic is low and evident to the expert initiating the thread.)
  • A thought that the user interface and “global navigation” might be an issue.
  • The idea that an “external factor” was the cause of the traffic drop. (Intriguing because I would include the search for a personal search system described in the email about my desktop search column as an “external factor.” The employee looking for a personal search solution was making lone wolf noises to me.)
  • A former English major’s insight that traffic drops when quality declines. I was hoping for a quote from a guy like Aristotle who said, “Quality is not an act, it is a habit.” The expert referenced “social software.”
  • My tongue in cheek suggestion that the search system required search engine optimization. The question sparked sturm und drang about enterprise search as something different from the crass Web site marketing hoopla.
  • A comment about the need for users to understand the vocabulary required to get information from an index of content and “search friendly” pages. (I am not sure what a search friendly page is, however. Is it what an employee creates, an interface, or a canned, training wheels “report”?)

Let’s step back. The email about desktop search and this collection of statements about lack of usage strike me as different sides of the same information access coin.

Is Enterprise Search Exempt from Intellectual Dishonesty?

January 20, 2015

I read “Techmeme’s Gabe Rivera on Tech Media: A Lot of Intellectual Dishonesty.” I figured out that “intellectual dishonesty” covers a large swath of baloney information. I have been involved in “technology” since I was hired by Halliburton Nuclear in 1972. In that period, I have watched engineers try to explain to non-engineers the objective functions of processes, algorithms, systems, and methods. I learned quickly that those who were not informed had a tough time figuring out what the engineers were saying or “meant.” Thus, the task became recasting details into something easily understood. Yep, nothing like simplified nuclear fission. It’s just like boiling water over a campfire. There you go. Nuclear energy made simple.

This article is a brief interview with a Silicon Valley luminary. The point seems to be that today much of the information about technology is off the mark. Well, let me make this simple: Almost useless. Today, thanks to innovation and re-imagineering, anyone able to click a mouse button can assert, “I am a technologist.” Many mouse clickers add a corollary: “I can learn anything.” No doubt failed middle school teachers, unemployed webmasters, and knowledge management experts have confidence in their abilities. Gold stars in middle school affirm one’s excellence, right?

In this interview, there were two observations that I related to my field of interest: Information Access.

I noted this comment about technology information:

Another problem: lying by omission, hyperbole and other forms of intellectual dishonesty are creeping into more tech reporting.

Ah, lying, hyperbole, and “other forms of intellectual dishonesty.” Good stuff.

I found this remark on point as well:

Most of the people who can offer key insights for understanding the industry are not incentivized to write, so a lot of crucial knowledge just never appears online. It’s just passed along to certain privileged people in the know.

I think this means that those with high value information may not produce listicles every few days. Too bad.

So what about enterprise search? Some thoughts:

  1. Consultants and experts who write what the prospects or the clients want to get money, consideration, or self aggrandizement. Dave Schubmehl, are you done recycling my research without permission?
  2. Vendors who say almost anything to close a deal. That’s why enterprise search vendors hop from SharePoint utility to customer support to business intelligence to analytics. The idea is that once the money is in hand, the vendor can code up a good enough solution.
  3. Cheerleaders for failed concepts promise “value” or performance. The idea that knowledge management or innovation will be a direct consequence of finding information is only a partial truth.
  4. Open source cheerleaders. Open source is one source of information access technology. Open source requires glue code and scripting and often costs as much as a proprietary solution when direct and indirect expenses are tallied and summed. But free is “good”, right?
  5. Bloggers, experts, newly minted consultants, and unemployed English majors conclude that they are expert searchers and can learn anything.
  6. Job seekers. I find some of the information available on LinkedIn and Slideshare quite amazing, fascinating, and unfortunately disheartening.
  7. Unemployed search administrators. These folks want to use failure as a ladder to climb higher in their next job.

Net net: In enterprise search, the problems are significant because of the nature of human utterance. Those who are uninformed cater to the customers who may be uninformed. The result is the all-too-predictable rise and fall of companies like Delphes, Convera, Entopia, or Fast Search & Transfer, among many others. For example, Google tried to “fix” enterprise search with a locked down appliance. How is that working out?

The volume of misinformation, disinformation, and reformation makes accurate, objective analysis of search an almost impossible job. When everyone is an expert in search and content processing, most information about information access has almost zero knowledge value.

Stephen E Arnold, January 20, 2015

Enterprise Search: Is Search Big Data Ready?

January 17, 2015

At lunch on Thursday, January 15, 2015, one of my colleagues called my attention to “10 Hot Big Data Startups to Watch in 2015 from A to Z.” The story is by a professional at a company named Zementis. The story appears in or on a LinkedIn page, and I believe this may be from a person whom LinkedIn considers a thought leader.

The reason I perked up when my colleague read the list of 10 companies was twofold. First, the author put his company Zementis on the list. Second, the consulting services firm LucidWorks, which I write in this way: LucidWorks (Really?), turned up.

Straight away, here’s the list of the “hot start ups” I am enjoined to “watch” in 2015. I assume that start up means “a newly established business,” according to Google’s nifty, attribution free definition service. “New” means “not existing before; made, introduced, or discovered recently or now for the first time.” Okay, with the housekeeping out of the way, on to the list:

  • Alpine Data Labs, founded in 2010
  • Confluent, founded in 2014 by LinkedIn engineers
  • Databricks, founded in 2013
  • Datameer, founded in 2009
  • Hadoop, an open source project rather than a company, now about 10 years old (figure 2004)
  • Interana, founded in 2014 by former Facebook engineers
  • LucidWorks (Really?), né Lucid Imagination, founded in 2007
  • Paxata, founded in 2012
  • Trifacta, founded in 2012
  • Zementis, founded in 2004

Of these 10 companies, the one that is not a commercial enterprise is Hadoop. Wikipedia suggests that Hadoop is an open source implementation of ideas from Google’s MapReduce, which the search giant developed prior to 2004.

Okay, now we have nine hot data startups.

I am okay with Confluent and Interana being considered as new. Now we have seven companies that do not strike me as either “hot” or “new”. These non-hot and non-new outfits include Databricks (two years old), Datameer (six years old), LucidWorks Really? (eight years old), Paxata (three years old), and Zementis (11 years old).
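
The ages are simple subtraction from the founding years in the list above. Here is a quick sketch in Python, assuming 2015 as the reference year and ignoring the month of founding. The `FOUNDED` table and the `ages_in` helper are illustrative names, not anything from the LinkedIn listicle.

```python
# Founding years as given in the list above (Hadoop omitted: it is a project, not a company).
FOUNDED = {
    "Alpine Data Labs": 2010,
    "Confluent": 2014,
    "Databricks": 2013,
    "Datameer": 2009,
    "Interana": 2014,
    "LucidWorks": 2007,
    "Paxata": 2012,
    "Trifacta": 2012,
    "Zementis": 2004,
}


def ages_in(year, founded):
    """Return each company's age in whole years as of the given year."""
    return {name: year - start for name, start in founded.items()}


ages = ages_in(2015, FOUNDED)

# Print oldest first so the "new" claim is easy to eyeball.
for name, age in sorted(ages.items(), key=lambda item: -item[1]):
    print(f"{name}: {age} years old")
```

On this arithmetic only Confluent and Interana come in at about one year old. Everything else on the list is two years old or older.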

I guess I can see that one could describe five of these companies as startups, but I cannot accept the “new” or “hot” moniker without some client names, revenue data, or some sort of factual substantiation.

Now we have two companies to consider: LucidWorks Really? and Zementis.

LucidWorks Really? is a value added services firm based on Lucene/Solr. The company charges for its home-brew software and consulting and engineering services. According to Wikipedia, Lucene is:

Apache Lucene is a free open source information retrieval software library, originally written in Java by Doug Cutting. It is supported by the Apache Software Foundation and is released under the Apache Software License.

Apache offers this about Solr:

Solr is the popular, blazing-fast, open source enterprise search platform built on Apache Lucene. [Lucene is a trademark of Apache it seems]

As Elasticsearch’s success in combining several open source products as a mechanism for accessing large datasets shows, it is possible to use Lucene as a query tool for information. But, and this is a large but, both the thriving Elasticsearch and LucidWorks Really? are search and retrieval systems. Yep, good old keyword search with some frosting tossed in by various community members and companies repackaging and marketing special builds of what is free software. LucidWorks has been around for eight years. I have trouble perceiving this company and its repositionings as “new”. The Big Data label seems little more than a marketing move as the company struggles to generate revenues.

Now Zementis. Like Recorded Future (funded by the GOOG and In-Q-Tel), Zementis is in the predictive analytics game. The company focuses on “holistic and actionable customer insight across all channels.” I did not include this company in my CyberOSINT study because the company seems to focus on commercial clients like retail stores and financial services. CyberOSINT is an analysis of next generation information access companies primarily serving law enforcement and intelligence entities.

But the deal breaker for me is not the company’s technology. I find it difficult to accept that a company founded 11 years ago is new. Like LucidWorks Really?, the label start up has more to do with the need to find a positioning that allows the company to generate sales and sustainable revenue.

These are essential imperatives. I do not accept the assertions about new, startup, and, to some degree, Big Data.

Furthermore, the inclusion of a project as a startup just adds evidence to support this hypothesis:

The write up is a listicle with little knowledge value. See http://amzn.to/1rUoQyn.

Why am I summarizing this information? The volume of disinformation is growing, and companies engaged in next generation information access are making the same marketing mistakes that pushed Delphes, Fast Search & Transfer, Entopia, Fulcrum Technologies, iPhrase, and other hype oriented vendors into a corner.

Why not explain what a product does to solve a problem, offer specific case examples, and deal in concrete facts?

I assume that is just too much for the enterprise search and content processing “experts” to achieve in today’s business climate. Wow, what a confused listicle.

Stephen E Arnold, January 17, 2015

Enterprise Search: Evidence It Is a Commodity

January 17, 2015

I was browsing through some information gathered by Overflight last week. I came across an interesting page showing Libraries Australia Architecture Overview. Here’s a miniature of the diagram. The link provides a larger version. Where is search? Well, it is in the middle, represented by a purple storage icon.

image

The search system is Solr. I find this interesting for several reasons:

First, Solr replaced the Australian-developed TeraText search system, which I think is pretty good. TeraText was a commercial product, and Solr is an open source system.

Second, Solr is a component in a far larger system. No surprise here, but the diagram makes clear that search is a utility supporting many other library functions. For vendors who make search the fabric for a large-scale application, the Libraries Australia team may want you to give them a lecture about ways to improve their system.

Third, Libraries Australia has a number of systems, each of which presumably has its native search tools. The implication is that Solr provides one screen access to these diverse resources. I wonder if the Oracle DBA uses Solr instead of the native Oracle tools. My thought is that the Solr champions see no reason to fool with Oracle command lines. The DBA, on the other hand, may see information access from a different point of view.

Net net: A commercial account closes, and an open source account begins. Does this fact suggest that closing deals for proprietary search systems might be more difficult in 2015?

Stephen E Arnold, January 17, 2015

Worlds Apart: The Schism between Information Access and Old School Keyword Search

January 9, 2015

Ah, Dave Schubmehl. You may remember my adventures with this “expert” in search. He published four reports based on my research, and then without permission sold one of these recycled $3,500 gems on Amazon. A sharp eyed law librarian and my attorney were able to get this cat back into the bag.

He’s back with a 22 page report “The Knowledge Quotient: Unlocking the Hidden Value of Information Using Search and Content Analytics” that is free. Yep, free.

I was offered this report at a Yahoo email address I use to gather the spam and content marketing fluff that floods to me each day. I received the spam from Alisa Lipzen, an inside sales representative of Coveo. Ms. Lipzen is sufficiently familiar with me to call me “Ben”. That’s a familiarity that may be unwarranted. She wants me to “enjoy.” Okay, but how about some substance?

To put this report in perspective, it is free. To me this means that the report was written for Coveo (a SharePoint centric keyword search vendor) and Lexalytics (a unit of Infonic if this IDC item is accurate). IDC, in my view, was paid to write this report and then cooperated with Coveo and Lexalytics to pump out the document as useful information.

My interest is not in the content marketing and pay-for-fame methods of consulting firms and their clients. Nope. I am focused on the substance of the write up which I was able to download thanks to the link in the spam I received. Here’s the cover page.

image

For background, I have just finished CyberOSINT: Next Generation Information Access. Fresh in my mind are the findings from our original and objective research. That’s right. I funded the research and I did not seek compensation from any of the 21 companies profiled in the report. You can read about the monograph on my Xenky site.

What’s interesting to me is that the IDC “expert” generated marketing document misses the major shift that has taken place in information access.

Keyword search is based on looking at what happened. That’s the historical bias of looking for content that has been processed and indexed. One can sift through that index and look for words that suggest happiness or dissatisfaction. That’s the “sentiment” angle.

But these methods are retrospective.
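
To make “retrospective” concrete, here is a minimal sketch in Python of the index-then-sift pattern described above. It builds a toy inverted index over documents that already exist and then looks up words that hint at sentiment. The sample documents, the `SENTIMENT_TERMS` word list, and the function names are all invented for illustration; no vendor’s system is this simple.

```python
from collections import defaultdict

# Hypothetical documents that have already been collected and processed.
DOCS = {
    1: "customers love the new release",
    2: "users complain the search is slow",
    3: "the quarterly report is attached",
}

# Assumed toy word list mapping a keyword to a mood.
SENTIMENT_TERMS = {"love": "happy", "complain": "unhappy"}


def build_index(docs):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def sentiment_hits(index, terms):
    """Sift the index for sentiment words and group documents by mood."""
    hits = defaultdict(set)
    for word, mood in terms.items():
        hits[mood] |= index.get(word, set())
    return dict(hits)


index = build_index(DOCS)
print(sentiment_hits(index, SENTIMENT_TERMS))
```

Everything this code can report is something that was already written down and indexed. Nothing in it says what is likely to happen next.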

As CyberOSINT points out, the new approach that is gaining customers and the support of a number of companies like BAE and Google is forward looking.

One looks up information when one knows what one is seeking. But what does the real time flow of information mean for now and the next 24 hours or week? The difference is one that is now revolutionizing information access and putting old school vendors at a disadvantage.

Grand View Research Looks at Enterprise Search and Misses a Market Shift

January 7, 2015

Every time I write about a low-tier or mid-tier consulting firm’s reports, I get nastygrams. One outfit demanded that I publish an apology. Okay, no problem. I apologize for expressing that the research was at odds with my own work. So before I tackle Grand View Research’s $4,700 report called “Enterprise Search Market Analysis By End-Use (Government & Commercial Offices, Banking & Finance, Healthcare, Retail), By Enterprise Size (Small, Medium, Large) And Segment Forecasts To 2020,” let me say, I am sorry. Really, really sorry.

This is a report that is about a new Fantasyland loved by the naive. The year 2020 will not be about old school search.

fantasyland

Image source: http://www.themeparkreview.com/parks/photo.php?pageid=116&linkid=12739

I know I am taking a risk because my new report “CyberOSINT: Next Generation Information Access” will be available in a very short time. The fact that I elected to abandon search as an operative term is one signal that search is a bit of a dead end. I know that there are many companies flogging fixes for SharePoint, specialized systems that “do” business intelligence, and decades old information retrieval approaches packaged as discovery or customer service solutions.

But the reality is that plugging words into a search box means that the user has to know the terminology and what he or she needs to answer a question. Then the real work begins. Working through the results list takes time. Documents have to be read and pertinent passages copied and pasted into another file. Then the researcher has to figure out what is right or wrong, relevant or irrelevant. I don’t know about you, but most 20 somethings are spending more time thumb typing than doing old fashioned research.

What has Grand View Research figured out?

First off, the company knows it has to charge a lot of money for a report on a topic that has been beaten to death for decades. Grand View’s approach is to define “search” by some fairly broad categories; for example, by enterprise size (small, medium, and large) and by end use (government and commercial offices, banking and finance, healthcare, retail, and “others”).


Enterprise Search: Parkour for Venture Funded Enterprise Search Vendors

January 3, 2015

Parkour refers to the sport of jumping and climbing on man made constructions. Note that most of these “obstacles” have doors, staircases, and maybe elevators.

There are some terms that make this seemingly crazy activity sound really cool. For example, I learned whilst on vacation about the KONG. This is a saut de chat and involves “diving forward over an obstacle so that the body becomes horizontal, pushing off with the hands and tucking the legs such that the body is brought back to a vertical position, ready to land.” See Parkour Terminology.

I also found this maneuver fascinating:

Kash vault: This vault is a combination of two vaults; the cat pass and the dash vault. After pushing off with the hands in a cat pass, the body continues past vertical over the object until the feet are leading the body. The kash vault is then finished by pushing off the object at the end, as in a dash vault.

Here’s an image of a parkour expert doing parkour, of course:

image

Image source: http://parkourfreerunningblog.com/wp-content/uploads/2011/10/parkour.jpg

Now this looks like something a crazy person does: Jumping off a large concrete structure. Just my opinion, of course.

And, from my point of view, parkour is very similar to selling proprietary enterprise search and content processing solutions to commercial enterprises. The danger comes from having to pay stakeholders for the cash borrowed to keep the enterprise search company afloat. The thrill comes from the knife edge underfoot: one error and some serious pain results. I suppose this focuses the mind.

As 2015 gets underway, enterprise search “experts” and vendors are gearing up to make sales. Some of the antics are beneficial to the mid tier consulting firms and publications that list the “visionaries,” the “companies that matter”, and the “leaders.” There are individual experts who conflate search with mastering Big Data or delivering the fuzzy wuzzy notion of information governance. Then there are the search vendors who wrap keyword search and classification in Dollar General wrapping paper. The idea is that keyword search is customer relationship management, analytics, and business intelligence.

For me, this is search vendor parkour, and it is okay for the tiny percentage of the population who want to jump off man-made structures. But for a person with a bit of information retrieval perspective, there are some other ways to get some exercise, remain whole, and not look absolutely crazy to an outside observer.

Here are some enterprise search realities to ponder this weekend:

First, if IBM and HP actually hit their magical billion dollar goals for Watson and IDOL, how much money will be left for the hundreds and hundreds of smaller search system vendors? The answer is, “Generating billions from search is not possible, and the money available tends to be a tiny fraction of these behemoths’ projections.”

Second, why would a company pay for a commercial keyword search system when there are perfectly functional open source solutions like Elasticsearch, FLAX, and SphinxSearch?

Third, how can keyword search enriched with some clustering deliver actionable intelligence? There are companies specializing in delivering actionable intelligence. Such firms as BAE and Leidos have robust platforms that collect, analyze, and report automatically. Guessing which words unlock the treasures of an index seems somewhat old fashioned to me.

Fourth, how will the companies pouring millions upon millions into Attivio, BA Insight, Coveo, and a dozen other keyword search companies get their money back? I suppose there is the hope that Google, Microsoft, or Oracle will buy one of these firms. But that looks like a long shot. My view is that paying back the investors is going to be difficult, if not impossible.

Now these statements are sobering. One can immerse oneself in that baloney generated by the mid tier consultants (one of which Dave Schubmehls my research), the silliness generated by content management blogs about findability, and the wonkery of search engine optimization wizards.

The year 2015 will witness some significant shifts in the enterprise search landscape. In my forthcoming CyberOSINT: Next Generation Information Access, I explain the type of systems that are underpinning intelligence systems in the US and EC nations. I point out the specific functionalities of these next generation systems that make search a utility. Think of Mac OS X and its inclusion of Spotlight. Nice to have, for sure, but search is not OS X. My research team and I also identify some important lessons the NGIA vendors are teaching their customers. We also look ahead and identify some research areas that are likely to capture investors’ attention and yield measurable results.

Search is a utility. The fact that some brave people convert it to parkour does not change the fact that the activity itself is risky, entertaining, and useless. If I were an athlete, which I am not, I would focus on sports that generate the big bucks. Hoops. Football. Soccer. Parkour? That looks nuts from my vantage point in Harrod’s Creek.

Why not sell something the customer can see solves a problem? Crazy jumps just call attention to the last gasps of a software sector that needs life support.

Stephen E Arnold, January 3, 2015
