The AIIM Enterprise Search Study 2014

I worked through the 34-page report “Industry Watch. Search and Discovery. Exploiting Knowledge, Minimizing Risk.” The report is based on a sampling of 80,000 AIIM community members. The explanation of the process states:

Graphs throughout the report exclude responses from organizations with less than 10 employees, and suppliers of ECM products and services, taking the number of respondents to 353.

The demographics of the sample were tweaked to discard responses from organizations with fewer than 10 employees. The sample included respondents from North America (67 percent), Europe (18 percent) and “rest of world” (15 percent).
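To put the sample size in perspective, here is a minimal sketch of the arithmetic. The two counts come from the report's own description. The margin-of-error formula is the standard normal approximation, and it assumes a simple random sample, which a self-selected membership survey is not, so treat the result as a best case rather than as anything the report itself claims. The function names are mine.

```python
import math

# Figures quoted from the report's description of its method.
COMMUNITY_SIZE = 80_000   # AIIM community members in the sampling pool
RESPONDENTS = 353         # respondents counted after the report's exclusions


def response_share(respondents: int, pool: int) -> float:
    """Counted respondents as a fraction of the pool."""
    return respondents / pool


def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Normal-approximation margin of error for a proportion p with n responses.

    Assumes a simple random sample (an assumption, not a property of this survey).
    z = 1.96 corresponds to a 95 percent confidence level.
    """
    return z * math.sqrt(p * (1.0 - p) / n)


if __name__ == "__main__":
    share = response_share(RESPONDENTS, COMMUNITY_SIZE)
    worst_case = margin_of_error(0.5, RESPONDENTS)  # p = 0.5 gives the widest margin
    print(f"Counted respondents as share of pool: {share:.2%}")
    print(f"Worst-case margin of error: +/- {worst_case:.1%}")
```

Under those assumptions, the counted respondents are well under one percent of the pool, and any single percentage in the report carries a margin of roughly plus or minus five points.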

Some History for the Young Reader of Beyond Search

AIIM has roots in imaging (photographic and digital imaging). Years ago I spent an afternoon with Betty Steiger, then a well-known executive with a high profile in Washington, DC’s technology community. She explained that the association wanted to reach into the then somewhat new technology for creating digital content. Instead of manually indexing microfilm images, AIIM members would use personal computers. I think we connected in 1982 at her request. My work included commercial online indexing, experiments in full-text content online, a CD ROM produced in concert with Predicasts and Lotus, and automated indexing processes invented by Howard Flank, a sidekick of mine for a very long time. (Mr. Flank received the first technology achievement award from the old Information Industry Association, now the SIIA.)

AIIM had its roots in the world of microfilm. And the roots of microfilm reached back to University Microfilms at the close of World War II. After the war, innovators wanted to take advantage of the marvels of microimaging and silver-based film. The idea was to put lots of content on a new medium so users could “find” answers to questions.

The problem for AIIM (originally the National Micrographics Association) was indexing. Because I was an officer at a company considered in the 1980s to be one of the leaders in online and semi-automated indexing methods, Ms. Steiger and I had a great deal to discuss.

But AIIM evokes for me:

Microfilm —> Finding issues —> Digital versions of microfilm —> CD ROMs —> On premises online access —> Finding issues.

I find the trajectory from microfilm to pronouncements about enterprise search, content processing, and eDiscovery fascinating. The story of AIIM parallels the challenges facing the traditional publishing industry (what I call the “dead tree method”), which has, like Don Quixote, galloped into battle with ones and zeros.

Asking a trade association’s membership for insights about electronic information is a convenient idea. What’s wrong with sampling the membership and others in the AIIM database, discarding those who belong to organizations with fewer than 10 employees, and tallying up the survey “votes”? For most of those interested in search, absolutely nothing. And that may be part of the challenge for those who want to get smart about search, findability, and content processing.

Let’s look at two findings from the 30-plus-page study. (I have had to trim because the number of comments and notes I wrote when reading the report is too massive for Beyond Search.)

Finding: 25 percent have no advanced or dedicated search tools. 13 percent have five or more [advanced or dedicated search tools].

Talk about good news for vendors of findability solutions. If one thinks about the tens of millions of organizations in the US and discards the 10 percent with 10 or fewer employees, there is apparently quite a large percentage with simplistic tools. (Keep in mind that there are more small businesses than large businesses by a very wide margin. But that untapped market is too expensive for most companies to penetrate with marketing messages.) The study encourages the reader to conclude that a bonanza awaits the marketer who can identify these organizations and convince them to acquire an advanced or dedicated search tool.

There is a different view. The research Arnold IT (owner of Beyond Search) has conducted over the last couple of decades suggests that this finding conveys some false optimism. For example, in the organizations and samples with which we have worked, we found almost 90 percent saturation of search. The one-on-one interviews reveal that many employees were unaware of the search functions available for the organization’s database system or for specialized tools like those used for inventory, by the engineering department with AutoCAD, or by customer support. So search systems with advanced features are in fact in most organizations. A survey of a general population reveals a market that is quite different from what the chief financial officer perceives when he or she tallies up the money spent for software that includes a search solution.

The problem is providing one system to handle the engineering department’s drawings and specifications, the legal department’s confidential documents, the HR unit’s employee health data, and the Board of Directors’ documents revealing certain financial and management topics. These collections have to remain in silos. There is, we have found, neither an appetite to gather these data nor the money to figure out how to make images and other types of data searchable from a single system. Far better to use a text-oriented metasearch system and dismiss data from proprietary systems, images, videos, mobile messages, etc.

We know that most organizations have search systems about which most employees know nothing. When an organization learns about these systems and then gets an estimate for creating one big federated system, the motivation drains from those who write the checks. In our research, senior management perceives aggregation of content as increasing risk and putting an information time bomb under the president’s leather chair.

Finding: 47% feel that universal search and compliant e-discovery is becoming near impossible given the proliferation of cloud share and collaboration apps, personal note systems and mobile devices. 60% are firmly of the view that automated analytics tools are the only way to improve classification and tagging to make their content more findable.

The thrill of an untapped market fades when one considers the use of the word “impossible.” AIIM is correct in identifying the Sisyphean tasks vendors face when pitching “all” information available via a third party system. Not only are the technical problems stretching the wizards at Google, but the cost of generating meaningful “unified” search results is a tough nut to crack for intelligence and law enforcement entities. In general, some of these groups have motivation, money, and expertise. Even with these advantages, the hoo hah that many search and eDiscovery vendors pitch is increasing potential customers’ skepticism. The credibility of over-hyped findability solutions is squandered. Therefore, some vendors’ marketing efforts are making it more difficult for them to close deals and are causing a broader push back against solutions that prospects know to be a waste of money. Yikes. How does a trade association help its members with this problem? Well, I have some ideas. But as I recall, Ms. Steiger was not too thrilled to learn about the nitty gritty of shifting from micrographics to digital. Does the same characteristic exist within AIIM today? I don’t know.



Elasticsearch: A Platform for Third Party Revenue

Making money from search and content processing is difficult. One company has made a breakthrough. You can learn how Mark Brandon, one of the founders of Qbox, is using the darling of the open source search world to craft a robust findability business.

I interviewed Mr. Brandon, a graduate of the University of Texas at Austin, shortly after my return from a short trip to Europe. Compared with the state of European search businesses, Elasticsearch and Qbox are on to what diamond miners call a “pipe.”

In the interview, which is part of the Search Wizards Speak series, Mr. Brandon said:

We offer solutions that work and deliver the benefits of open source technology in a cost-effective way. Customers are looking for search solutions that actually work.

Simple enough, but I have ample evidence that dozens and dozens of search and content processing vendors are unable to generate sufficient revenue to stay in business. Many well-known firms would go belly up without continual infusions of cash from addled folks with little knowledge of search’s history and a severe case of spreadsheet fever.

Qbox’s approach pivots on Elasticsearch. Mr. Brandon said:

When our previous search product proved to be too cumbersome, we looked for an alternative to our initial system. We tested Elasticsearch and built a cluster of Elasticsearch servers. We could tell immediately that the Elasticsearch system was fast, stable, and customizable. But we love the technology because of its built-in distributed nature, and we felt like there was room for a hosted provider, just as Cloudant is for CouchDB, Mongolab and MongoHQ are for MongoDB, Redis Labs is for Redis, and so on. Qbox is a strong advocate for Elasticsearch because we can tailor the system to customer requirements, confident the system makes information more findable for users.

When I asked where Mr. Brandon’s vision for functional findability came from, he told me about an experience he had at Oracle. Oracle owns numerous search systems, ranging from the late 1980s Artificial Linguistics’ system to somewhat newer systems like the late 1990s Endeca system, and the newer technologies from Triple Hop. Combine these with the SES technology and the hybrid InQuira formed from two faltering NLP systems, and Oracle has some hefty investments.

Here’s Mr. Brandon’s moment of insight:

During my first week at Oracle, I asked one of my colleagues if they could share with me the names of the middleware buyer contacts at my 50 or so named accounts. One colleague said, “certainly”, and moments later an Excel spreadsheet popped into my inbox. I was stunned. I asked him if he was aware that “Excel is a Microsoft technology and we are Oracle.” He said, “Yes, of course.” I responded, “Why don’t you just share it with me in the CRM System?” (the CRM was, of course, Siebel, an Oracle product). He chortled and said, “Nobody uses the CRM here.” My head exploded. I gathered my wits to reply back, “Let me get this straight. We make the CRM software and we sell it to others. Are you telling me we don’t use it in-house?” He shot back, “It’s slow and unusable, so nobody uses it.” As it turned out, with around 10 million corporate clients and about 50 million individual names, if I had to filter for “just middleware buyers”, “just at my accounts”, “in the Northeast”, I could literally go get a cup of coffee and come back before the query was finished. If I added a fourth facet, forget it. The CRM system would crash. If it is that bad at one of the world’s biggest software companies, how bad is it throughout the enterprise?
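For readers who want to see why a search index copes with this kind of facet filtering better than a scan of every record, here is a minimal sketch. It is not Qbox’s, Elasticsearch’s, or Siebel’s implementation. The records, field names, and function names are invented for illustration. The technique is a plain inverted index: each facet value maps to a set of record ids, and a multi-facet filter becomes a set intersection.

```python
from collections import defaultdict

# Hypothetical contact records; every field name and value is invented.
CONTACTS = [
    {"id": 1, "role": "middleware buyer", "account": "Acme", "region": "Northeast"},
    {"id": 2, "role": "middleware buyer", "account": "Acme", "region": "West"},
    {"id": 3, "role": "database buyer", "account": "Acme", "region": "Northeast"},
    {"id": 4, "role": "middleware buyer", "account": "Globex", "region": "Northeast"},
]


def build_facet_index(records, fields):
    """Map each (field, value) pair to the set of record ids that carry it."""
    index = defaultdict(set)
    for record in records:
        for field in fields:
            index[(field, record[field])].add(record["id"])
    return index


def facet_filter(index, **facets):
    """Return the sorted ids that match every requested facet."""
    # .get() avoids creating empty entries in the defaultdict for unknown values.
    id_sets = [index.get((field, value), set()) for field, value in facets.items()]
    if not id_sets:
        return []
    return sorted(set.intersection(*id_sets))


# Build the index once; each query afterwards only touches the matching id sets.
FACET_INDEX = build_facet_index(CONTACTS, ["role", "account", "region"])

if __name__ == "__main__":
    matches = facet_filter(
        FACET_INDEX, role="middleware buyer", account="Acme", region="Northeast"
    )
    print(matches)
```

Adding a fourth facet in this scheme is one more set intersection, not another pass over every record.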

You can read the full interview at http://bit.ly/1mADZ29. Information about Qbox is at www.qbox.com.

Stephen E Arnold, July 2, 2014
