January 30, 2015
In “CyberOSINT: Next Generation Information Access,” I describe Autonomy’s math-first approach to content processing. The reason is that after the veil of secrecy was lifted from the signal processing methods used for British intelligence tasks, Cambridge University became one of the hotbeds for the use of Bayesian, Laplacian, and Markov methods. These numerical recipes proved to be both important and controversial. Instead of relying on manual methods throughout, humans selected training sets, tuned the thresholds, and then turned the smart software loose. Math is not required to understand what Autonomy packaged for commercial use: Signal processing separated noise in a channel and allowed software to process the important bits. Thank you, Claude Shannon and the good Reverend Bayes.
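To make the idea concrete, here is a toy Bayesian text classifier in Python. This is my own illustration, not Autonomy’s code; the labels, training snippets, and smoothing choices are invented. A human supplies labeled examples, the software learns word probabilities, and the highest score wins:

```python
import math
from collections import Counter

def train(docs):
    """docs: list of (label, text) pairs supplied by a human trainer."""
    counts, totals, labels = {}, Counter(), Counter()
    for label, text in docs:
        labels[label] += 1
        for word in text.lower().split():
            counts.setdefault(label, Counter())[word] += 1
            totals[label] += 1
    return counts, totals, labels

def score(text, counts, totals, labels):
    """Log-probability of each label, with add-one smoothing."""
    vocab = {w for c in counts.values() for w in c}
    results = {}
    for label in labels:
        logp = math.log(labels[label] / sum(labels.values()))
        for word in text.lower().split():
            p = (counts[label][word] + 1) / (totals[label] + len(vocab))
            logp += math.log(p)
        results[label] = logp
    return results

# Invented training set: a human decides what is "signal" and what is "noise."
training = [
    ("signal", "important message about the target account"),
    ("signal", "urgent report on the target"),
    ("noise", "lunch menu and office party notes"),
    ("noise", "party menu for the office lunch"),
]
model = train(training)
scores = score("report about the target", *model)
print(max(scores, key=scores.get))  # prints: signal
```

Swap in real training sets and add tuned thresholds, and you have the skeleton of the approach that Shannon’s signal-versus-noise framing made commercially interesting.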
What did Autonomy receive for this breakthrough? Not much, but the company did generate more than $600 million in revenues about 10 years after opening for business. As far as I know, no other content processing vendor has reached this revenue target. Endeca, for the sake of comparison, flatlined at about $130 million in the year that Oracle bought the Guided Navigation outfit for about $1.0 billion.
For one thing, the British company BAE Systems (formerly British Aerospace) licensed the Autonomy system and began to refine its automated collection, analysis, and reporting systems. So what? By the late 1990s the UK became the de facto leader in automated content activities. Was BAE the only smart outfit in the late 1990s? Nope. There were other outfits that realized the value of the Autonomy approach. Examples range from US government entities to little-known outfits like the Wynyard Group.
In the CyberOSINT volume, you can get more detail about why Autonomy was important in the late 1990s, including the name of the university professor who encouraged Mike Lynch to make contributions that have had a profound impact on intelligence activities. For color, let me mention an anecdote that is not in the 176-page volume. Please keep in mind that Autonomy was, like i2 (another Cambridge University spawned outfit), a client prior to my retirement. IBM owns i2, and i2 is profiled in CyberOSINT in Chapter 5, “CyberOSINT Vendors.” I would point out that more than two thirds of the monograph contains information that is either not widely available or not available via a routine Bing, Google, or Yandex query. For example, Autonomy does not make publicly available a list of its patent documents. These contain specific information about how to think about cyber OSINT and moving beyond keyword search.
Some Color: A Conversation with a Faux Expert
In 2003 I had a conversation with a fellow who was an “expert” in content management, a discipline that is essentially a stepchild of database technology. I would like to mention this person by name, but I will avoid the inevitable letter from his attorney rattling a saber over my head. This person publishes reports, engages in litigation with his partners, kowtows to various faux trade groups, and tries to keep secret his history as a webmaster with some Stone Age skills.
Not surprisingly, this canny individual had little good to say about Autonomy. The information I provided about the Lynch technology, its applications, and its importance in next generation search was dismissed with a comment I will not forget: “Autonomy is a pile of crap.”
Okay, that’s an informed opinion for a clueless person pumping baloney about the value of content management as a separate technical field. Yikes.
In terms of enterprise search, Autonomy’s competitors criticized Lynch’s approach. Instead of a keyword search utility that was supposed to “unlock” content, Autonomy delivered a framework. The framework operated in an automated manner and could deliver keyword search, point-and-click access like the Endeca system, and more sophisticated operations associated with today’s most robust cyber OSINT solutions. Enterprise search remains stuck in the STAIRS III and RECON era. Autonomy embodied the leap from putting the burden of finding on humans to shifting the load to smart software.
A diagram from Autonomy’s patents filed in 2001. What’s interesting is that this patent cites an invention by Dr. Liz Liddy, with whom the ArnoldIT team worked in the late 1990s. A number of content experts understood the value of automated methods, but Autonomy was the company able to commercialize and build a business on technology that was not widely known 15 years ago. Some universities did not teach Bayesian and related methods because these were tainted by humans who used judgments to set certain thresholds. See US 6,668,256. There are more than 100 Autonomy patent documents. How many of the experts at IDC, Forrester, Gartner, et al. have actually located the documents, downloaded them, and reviewed the systems, methods, and claims? I would suggest a tiny percentage of the “experts.” Patent documents are not what English majors are expected to read.
That’s important and little appreciated by the mid-tier outfits’ experts working for IDC (yo, Dave Schubmehl, are you ramping up to recycle the NGIA angle yet?), Forrester (one of whose search experts told me at a MarkLogic event that new hires for search were told to read the information on my ArnoldIT.com Web site, as if that were a good thing for me), Gartner Group (the conference and content marketing outfit), Ovum (the UK counterpart to Gartner), and dozens of other outfits that understand search in terms of selling received wisdom, not insight or hands-on facts.
January 29, 2015
Users Want More Than Hunting through Rubbish
CyberOSINT: Next Generation Information Access, the new study by Stephen E Arnold, is now available, according to Ric Manning, the publisher. You can order a copy at the Gumroad online store or via the link on Xenky.com.
One of the key chapters in the 176-page study of information retrieval solutions that move beyond search takes you under the hood of an NGIA system. Without reproducing the 10-page chapter and its illustrations, I want to highlight two important aspects of NGIA systems.
When a person requires information under time pressure, traditional systems pose a problem. The user must figure out which repository to query, craft a query or take a stab at which “facet” (category) may contain the information, scan the outputs the system displays, open a document that appears to be related to the query, and then figure out exactly which item of data is the one required. The time this takes makes traditional search a non-starter in many work situations. The bottleneck is the human’s ability to keep track of which digital repository contains what. Many organizations have idiosyncratic terminology, and users in one department may not be familiar with the terminology used in another unit of the organization.
Register for the seminar on the Telestrategies’ Web site.
Traditional enterprise search systems trip and skin their knees over the time issue and over the “locate what’s needed” issue. These are problems that have persisted in search-box-oriented systems since the days of RECON, SDC Orbit, and Dialcom. There is little a manager can do to create more time. Time is a very valuable commodity, and it often determines what type of decision is made and how risk-laden that decision may be.
There is also little one can do to change how a bright human works with a system that forces a busy individual to perform iterative steps that often amount to guessing the word or phrase to unlock what’s hidden in an index or indexes.
Little wonder that convincing a customer to license a traditional keyword system continues to bedevil vendors.
A second problem is the nature of access. There is news floating around that Facebook has been able to generate more ad growth than Google because Facebook has more mobile users. Whether Facebook or Google dominates social mobile, the key development is “mobile.” Workers need information access from devices which have smaller and different form factors than the multi-core, 3.5 gigahertz, three-screen workstation I am using to write this blog post.
January 29, 2015
Enterprise search has been useful. However, the online access methods have changed. Unfortunately, most enterprise search systems and the enterprise applications based on keyword and category access have lagged behind user needs.
The information highway is littered with the wrecks of enterprise search vendors who promised a solution to findability challenges and failed to deliver. Some of the vendors have been forgotten by today’s keyword and category access vendors. Do you know about the business problems that disappointed licensees and cost investors millions of dollars? Are you familiar with Convera, Delphes, Entopia, Fulcrum Technologies, Hakia, Siderean Software, and many other companies?
A handful of enterprise search vendors dodged implosion by selling out. Artificial Linguistics, Autonomy, Brainware, Endeca, Exalead, Fast Search, InQuira, iPhrase, ISYS Search Software, and Triple Hop were sold. Thus, their investors got their money back and in some cases a premium. The $11 billion paid for Autonomy dwarfed the billion-dollar purchase prices of Endeca and Fast Search & Transfer. But most of the companies able to sell their information retrieval systems sold for much less. IBM acquired Vivisimo for about $20 million and promptly justified the deal by describing Vivisimo’s metasearch system as a Big Data solution. Okay.
Today a number of enterprise search vendors walk a knife edge. A loss of a major account or a misstep that spooks investors can push a company over the financial edge in the blink of an eye. Recently I noticed that Dieselpoint has not updated its Web site for a while. Antidot seems to have faded from the US market. Funnelback has turned down the volume. Hakia went offline.
A few firms generate considerable public relations noise. Attivio, BA Insight, Coveo, and IBM Watson appear to be competing to become the leaders in today’s enterprise search sector. But today’s market is very different from the world of 2003-2004, when I wrote the first of three editions of the 400-page Enterprise Search Report. Each of these companies asserts that its system provides business intelligence, customer support, and traditional enterprise search. Will any of these companies be able to match Autonomy’s 2008 revenues of $600 million? I doubt it.
The reason is not the availability of open source search. Elasticsearch, in fact, is arguably better than any of the for-fee keyword and concept-centric information retrieval systems. The problems of the enterprise search sector run deeper.
January 23, 2015
I enjoy email from those who read my for fee columns. I received an interesting comment from Australia about desktop search.
In a nutshell, the writer read one of my analyses of software intended for a single user looking for information on his local hard drives. The bigger the hard drives, the greater the likelihood that the user will operate in squirrel mode. The idea is that it is easier to save everything because “you never know.” Right, one doesn’t.
Here’s the passage I found interesting:
My concern is that with the very volatile environment where I saw my last mini OpenVMS environment now virtually consigned to the near-legacy basket and many other viable engines disappearing from Desktop search that there is another look required at the current computing environment.
There is no fast, easy-to-use, stable, and helpful way to look for information on a couple of terabytes of local storage. The files are a mixed bag: Excel spreadsheets, PowerPoint decks, image and text PDFs, proprietary file formats like FrameMaker, images, music, etc.
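A minimal sketch, assuming nothing about any particular product, of the first step a desktop search tool must take: walking the drive and bucketing files by type. The function name and the "(none)" bucket are my inventions.

```python
import os
from collections import defaultdict

def inventory(root):
    """Walk a directory tree and bucket file paths by extension."""
    buckets = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Normalize the extension; files without one go in "(none)".
            ext = os.path.splitext(name)[1].lower() or "(none)"
            buckets[ext].append(os.path.join(dirpath, name))
    return buckets

# Example use (point it at any local folder):
#   buckets = inventory(os.path.expanduser("~/Documents"))
#   for ext, paths in sorted(buckets.items()):
#       print(ext, len(paths))
```

The hard part comes next: each bucket needs its own text extractor (PDF, XLSX, FrameMaker, and so on), and the proprietary formats are exactly where most desktop engines give up.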
Such was the problem in the old days, and such it is today. I don’t have a quick and easy fix. But this is a single-user problem, not an enterprise-scale one.
An hour after I read the email about my column, I received one of those frequent LinkedIn updates. The title of the thread to which LinkedIn wished to call my attention was/is: “What would you guess is behind a drop in query activity?”
I was enticed by the word “guess.” Most assume that the specialist discussion threads on LinkedIn attract the birds with the brightest plumage, not the YouTube commenter crowd.
I navigated to the provided link which may require that you become a member of LinkedIn and then appeal for admission to the colorful feather discussion for “Enterprise Search Professionals.”
The situation is that a company’s enterprise search engine is not being used by its authorized users. There was a shopping list of ideas for generating traffic to the search system. The concern is understandable: the company spent money, invested human resources, and assumed that a new search system would deliver a benefit that the accountants could quantify.
What was fascinating was the response of the LinkedIn enterprise search professionals. The suggestions for improving the enterprise search engine included:
- A request for more information about usage. (Interesting, but the operative fact is that traffic is low and evident to the expert initiating the thread.)
- A thought that the user interface and “global navigation” might be an issue.
- The idea that an “external factor” was the cause of the traffic drop. (Intriguing because I would include the search for a personal search system described in the email about my desktop search column as an “external factor.” The employee looking for a personal search solution was making lone wolf noises to me.)
- A former English major’s insight that traffic drops when quality declines. I was hoping for a quote from a guy like Aristotle, who said, “Quality is not an act, it is a habit.” The expert referenced “social software.”
- My tongue-in-cheek suggestion that the search system required search engine optimization. The question sparked Sturm und Drang about enterprise search as something different from the crass Web site marketing hoopla.
- A comment about the need for users to understand the vocabulary required to get information from an index of content and “search friendly” pages. (I am not sure what a search friendly page is, however. Is it what an employee creates, an interface, or a canned, training wheels “report”?)
Let’s step back. The email about desktop search and this collection of statements about lack of usage strike me as different sides of the same information access coin.
January 22, 2015
I was reviewing the France24.com item “Paris Attacks: Tracing Shadowy Terrorist Links.” I came across this graphic:
Several information-access thoughts crossed my mind.
First, France24 presented information that looks like a simplification of the outputs generated by a system like IBM’s i2. (Note: I was an advisor to i2 before its sale to IBM.) i2 is an NGIA or next generation information access system which dates from the 1990s. The notion that crossed my mind is that this relationship diagram presents information in a more useful way than a list of links. After 30 years, I wondered, “Why haven’t traditional enterprise search systems shifted from lists to more useful information access interfaces?” Many vendors have and the enterprise search vendors that stick to the stone club approach are missing what seems to be a quite obvious approach to information access.
A Google results list with one ad, two Wikipedia items, pictures, and redundant dictionary links. Try this query: “IBM Mainframe.” Not too helpful unless one is looking for information to use in a high school research paper.
Second, the use of this i2-type diagram, now widely emulated by outfits from the Fast Search-centric Attivio to the high-flying, venture-backed Palantir, permits one-click access to relevant information. The idea is that a click on a hot spot—a plus in the diagram—presents additional information. I suppose one could suggest that the approach is just a form of faceting or “Guided Navigation,” which is Endeca’s very own phrase. I think the differences are more substantive. (I discuss these in my new monograph CyberOSINT.)
Third, no time is required to figure out what’s important. i2 and some other NGIA systems present what’s important, identify key data points, and explain what is known and what is fuzzy. Who wants to scan, click, read, copy, paste, and figure out what is relevant and what is not? I don’t, for many of my information needs. The issue of “time spent searching” is an artifact of the era when Boolean reigned supreme. NGIA systems automatically generate indexes that permit alternatives to a high school term paper’s approach to research.
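For illustration only, here is the kind of data structure that sits behind an i2-style relationship diagram. The class and the entity names are my inventions, not anything from i2’s actual system. The point is that a click on a hot spot is a neighbor lookup in a graph, not a fresh keyword query:

```python
from collections import defaultdict

class LinkGraph:
    """A toy entity-relationship graph (invented, for illustration)."""

    def __init__(self):
        self.edges = defaultdict(set)

    def link(self, a, b, relation):
        # Relationships are stored in both directions so either
        # entity can be the starting point of an investigation.
        self.edges[a].add((b, relation))
        self.edges[b].add((a, relation))

    def expand(self, entity):
        """What a hot-spot click reveals: directly connected entities."""
        return sorted(self.edges[entity])

g = LinkGraph()
g.link("Person A", "Phone 555-0100", "uses")
g.link("Person B", "Phone 555-0100", "uses")
g.link("Person B", "Safe House", "visits")
print(g.expand("Phone 555-0100"))  # both people tied to the phone
```

One click on the phone node surfaces both people connected to it; no one had to guess a keyword to find the relationship.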
Little wonder why the participants in enterprise search discussion groups gnaw bones that have been chewed for more than 50 years. There is no easy solution to the hurdles that search boxes and lists of results present to many users of online systems.
France24 gets it. When will the search vendors dressed in animal skins and carrying stone tools figure out that the world has changed? Demographics, access devices, and information have moved on.
Most enterprise search vendors deliver systems that could be exhibited in the Smithsonian next to the Daystrom 046 Little Gypsy mainframe and the IBM punch card machine.
Stephen E Arnold, January 22, 2015
January 9, 2015
Ah, Dave Schubmehl. You may remember my adventures with this “expert” in search. He published four reports based on my research and then, without permission, sold one of these recycled $3,500 gems on Amazon. A sharp-eyed law librarian and my attorney were able to get this cat back into the bag.
He’s back with a 22-page report, “The Knowledge Quotient: Unlocking the Hidden Value of Information Using Search and Content Analytics,” that is free. Yep, free.
I was offered this report at a Yahoo email address I use to gather the spam and content marketing fluff that floods in each day. I received the spam from Alisa Lipzen, an inside sales representative of Coveo. Ms. Lipzen is sufficiently familiar with me to call me “Ben.” That’s a familiarity that may be unwarranted. She wants me to “enjoy.” Okay, but how about some substance?
To put this report in perspective, it is free. To me this means that the report was written for Coveo (a SharePoint centric keyword search vendor) and Lexalytics (a unit of Infonic if this IDC item is accurate). IDC, in my view, was paid to write this report and then cooperated with Coveo and Lexalytics to pump out the document as useful information.
My interest is not in the content marketing and pay-for-fame methods of consulting firms and their clients. Nope. I am focused on the substance of the write up which I was able to download thanks to the link in the spam I received. Here’s the cover page.
For background, I have just finished CyberOSINT: Next Generation Information Access. Fresh in my mind are the findings from our original and objective research. That’s right. I funded the research and I did not seek compensation from any of the 21 companies profiled in the report. You can read about the monograph on my Xenky site.
What’s interesting to me is that the IDC “expert” generated marketing document misses the major shift that has taken place in information access.
Keyword search is based on looking at what happened. That’s the historical bias of looking for content that has been processed and indexed. One can sift through that index and look for words that suggest happiness or dissatisfaction. That’s the “sentiment” angle.
But these methods are retrospective.
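A bare-bones sketch of the retrospective sentiment sifting described above. The word lists are invented and no vendor uses anything this crude, but it shows the mechanic: the system can only score content that has already been captured and indexed:

```python
# Invented word lists for illustration; real systems use learned models.
HAPPY = {"great", "love", "excellent", "happy"}
UNHAPPY = {"terrible", "hate", "broken", "refund"}

def sentiment(text):
    """Score already-captured text by counting sentiment-bearing words."""
    words = set(text.lower().split())
    pos, neg = len(words & HAPPY), len(words & UNHAPPY)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"

print(sentiment("I love this excellent gadget"))         # positive
print(sentiment("Broken on arrival so I want a refund")) # negative
```

Notice what the function cannot do: it says nothing about the message that will arrive tomorrow, which is precisely the gap the forward-looking approach addresses.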
As CyberOSINT points out, the new approach that is gaining customers and the support of companies like BAE and Google is forward-looking.
One looks up information when one knows what one is seeking. But what does the real-time flow of information mean for now and the next 24 hours or week? The difference is one that is now revolutionizing information access and putting old school vendors at a disadvantage.
January 7, 2015
Every time I write about a low-tier or mid-tier consulting firm’s reports, I get nastygrams. One outfit demanded that I publish an apology. Okay, no problem. I apologize for expressing that the research was at odds with my own work. So before I tackle Grand View Research’s $4,700 report called “Enterprise Search Market Analysis By End-Use (Government & Commercial Offices, Banking & Finance, Healthcare, Retail), By Enterprise Size (Small, Medium, Large) And Segment Forecasts To 2020,” let me say: I am sorry. Really, really sorry.
This report describes a new Fantasyland loved by the naive. The year 2020 will not be about old school search.
I know I am taking a risk because my new report “CyberOSINT: Next Generation Information Access” will be available in a very short time. The fact that I elected to abandon search as an operative term is one signal that search is a bit of a dead end. I know that there are many companies flogging fixes for SharePoint, specialized systems that “do” business intelligence, and decades old information retrieval approaches packaged as discovery or customer service solutions.
But the reality is that plugging words into a search box means that the user has to know the terminology and what he or she needs to answer a question. Then the real work begins. Working through the results list takes time. Documents have to be read and pertinent passages copied and pasted into another file. Then the researcher has to figure out what is right or wrong, relevant or irrelevant. I don’t know about you, but most 20-somethings are spending more time thumb-typing than doing old-fashioned research.
What has Grand View Research figured out?
First off, the company knows it has to charge a lot of money for a report on a topic that has been beaten to death for decades. Grand View’s approach is to define “search” by some fairly broad categories; for example, by enterprise size (small, medium, and large) and by end use (government and commercial offices, banking and finance, healthcare, retail, and “others”).
December 13, 2014
I have been following the “AI will kill us” chatter, the landscape of machine intelligence craziness, and “Artificial Intelligence Isn’t a Threat—Yet.”
The most recent big thinking on this subject appears in the Wall Street Journal, an organization in need of any type of intelligence: machine, managerial, fiscal, online, and sci-fi.
Harsh? Hmm. The Wall Street Journal has been running full page ads for Factiva. If you are not familiar with this for-fee service, think 1981. The system gathers “high value” content and makes it available to humans clever enough to guess the keywords that unlock, not answers, but a list of documents presumably germane to the keyword query. There are wrappers that make Factiva more fetching. But NGIA systems (what I call next generation information access systems) use the Factiva-style methods perfected decades ago as a mere utility.
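For readers who never used a 1981-style system, here is the retrieval model in miniature. This is my toy code, not Factiva’s: an inverted index plus a Boolean AND, returning a list of documents rather than answers:

```python
from collections import defaultdict

def build_index(docs):
    """Map each word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def boolean_and(index, *terms):
    """Classic Boolean AND: intersect the posting lists for each term."""
    sets = [index.get(t.lower(), set()) for t in terms]
    return sorted(set.intersection(*sets)) if sets else []

# Invented mini-corpus for illustration.
docs = {
    "d1": "merger talks between two banks",
    "d2": "bank merger collapses amid lawsuit",
    "d3": "weather delays grain shipments",
}
idx = build_index(docs)
print(boolean_and(idx, "merger", "banks"))  # ['d1']
```

Note that the user still has to guess that the index says “banks,” not “bank”; querying “merger” and “bank” would return only d2. That guessing game is the whole problem.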
These are Cheetos. Nutritious, right? Will your smart kitchen let you eat these when it knows you are 30 pounds overweight, have consumed a quart of alcohol-infused beverages, and ate a Snickers for lunch? Duh? What?
NGIA systems are sort of intelligent. The most interesting systems recurse through the previous indexes as the content processing system ingests data from users happily clicking, real time content streaming into the collection service, and threshold adjustments made either by savvy 18-year-olds or by numerical recipes documented by Google’s Dr. Norvig in the standard text Artificial Intelligence: A Modern Approach.
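Here is the ingest-and-adjust loop in miniature. This is my invention for illustration; real NGIA pipelines are vastly more elaborate. Content streams in, the index updates in place, and a tunable threshold, the kind a savvy 18-year-old or a numerical recipe might adjust, decides what surfaces:

```python
from collections import defaultdict

class StreamingIndex:
    """A toy incremental index (invented, for illustration)."""

    def __init__(self, threshold=2):
        self.freq = defaultdict(int)
        self.threshold = threshold  # tuned by an analyst or a recipe

    def ingest(self, text):
        """Update term frequencies as each new item streams in."""
        for word in text.lower().split():
            self.freq[word] += 1

    def significant_terms(self):
        """Terms whose frequency has crossed the current threshold."""
        return sorted(w for w, n in self.freq.items() if n >= self.threshold)

ix = StreamingIndex(threshold=2)
for doc in ["alpha device seized", "alpha cell active", "routine alpha report"]:
    ix.ingest(doc)
print(ix.significant_terms())  # ['alpha']
```

Lower the threshold and more terms surface; raise it and only the loudest signals remain. That single knob is the humble ancestor of the tuning the passage above describes.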
So should we be looking forward to the outputs of a predictive system pumping directly into an autonomous unmanned aerial vehicle? Will a nifty laser weapon find and do whatever the nifty gizmo does to a target? Will the money machine figure out why I need $300 for concrete repairs and decline to give it to me because the ATM “knows” the King of Concrete could not lie down in a feather bed? Forget real concrete.
The Wall Street Journal write up offers up this titbit:
December 7, 2014
I read “HP Takes Analytics to the Cloud in Comeback to IBM’s Watson.” The write up is darned interesting. Working through the analysis reminded me that HP does not realize that Autonomy’s 1999 customer BAE Systems has been working with analytics from the cloud for—what?—15 years. What about Recorded Future, SAIC, and dozens of other companies running successful businesses with this strategy?
The article points out that two large and somewhat pressured $100 billion companies are innovating like all get out. I learned:
Although it [Hewlett Packard] may not win any trivia contests in the foreseeable future, the hardware maker’s entry into the world of end-to-end analytics does hold up to Watson where the rubber meets the road in the enterprise…But the true equalizer for the company is IDOL, the natural language processing and search it obtained through the $11.7 billion acquisition of Autonomy Corp. PLC in 2011, which reduces the gap between human and machine interaction in a similar fashion to IBM’s cognitive computing platform.
Okay. IBM offers Watson, which was supposed to generate a billion or more by 2015 and then surge to $10 billion in revenue in another four or five years. What is Watson? As I understand it, Watson is open source code, some bits and pieces from IBM’s research labs, and wrappers that convert search into a towering giant of artificial intelligence. Why doesn’t IBM focus on its next generation information access units, which are exciting and delivering services that customers want? i2 does not produce recipes incorporating tamarind. Cybertap does not help sick teenagers.
HP, on the other hand, owns the Autonomy Dynamic Reasoning Engine (DRE) and the Intelligent Data Operating Layer (IDOL). These incorporate numerical recipes based on the work of Bayes, Laplace, and Markov, among others. The technology is not open source. Instead, IDOL is a black box. HP spent $11 billion for Autonomy, figured out that it overpaid, wrote off $5 billion or so, and launched a global scorched earth policy for its management methods. Recently, HP has migrated the DRE and IDOL to the cloud. Okay, but HP is putting more effort into accusing Autonomy of fooling HP. Didn’t HP buy Autonomy after experts reviewed the deal, the technology, and the financial statements? HP has lost years in an attempt to redress a perceived wrong. But HP decided to buy Autonomy.
December 3, 2014
In UK talk, a gritter is a giant machine that dumps sand (grit) on a highway to make it less slippery. Enterprise search gritters are ready to dump sand on my forthcoming report about next generation information access.
The reason is that enterprise search is running on a slippery surface. The potential customers are coated in Teflon. The dust-up between HP and Autonomy, the indictment of a former Fast Search & Transfer executive, and the dormancy of some high-flying vendors (Dieselpoint, Hakia, Siderean Software, et al.)—these are reasons why enterprise customers are looking for something that takes the company into information access realms beyond search. Here’s an example: “Accounting Differences, Not Fraud, Led to HP’s Autonomy Write Down.” True or false, the extensive coverage of the $11 billion deal and the subsequent billions in write-downs has not built confidence in the blandishments of the enterprise search vendors.
Enter the gritters. Enterprise search vendors are prepping to dump no-skid bits on their prospects. Among the non-skid silica will be pages from mid-tier consultants’ reports about fast movers and three-legged rabbits. There will be conference talks that pummel the audience with assertions about the primacy of search. There will be recycled open source technology and “Fast” thinking packaged as business intelligence. There will be outfits that pine for the days of libraries with big budgets, pitching rich metadata to trucking companies and small medical clinics who rightly ask, “What’s metadata?”