June 16, 2015
I saw a link this morning to an 11 month old report from an azure chip consulting firm. You know, azure chip. Not a Bain, BCG, Booz Allen, or McKinsey which are blue chip firms. A mid tier outfit. Business at the Boozer is booming is the word from O’Hare Airport, but who knows if airport gossip is valid.
Which enterprise search vendor will come up a winner in December 2015?
What is possibly semi valid are analyses of enterprise search vendors. The “Magic Quadrant for Enterprise Search” triggered some fond memories of the good old days in 2003 when the leaders in enterprise search were brands or almost brands. You probably recall the thrilling days of these information retrieval leaders:
- Autonomy, the math oriented outfit with component names like neuro linguistic programming and integrated data operating layer and some really big name customers like BAE
- Convera, formerly Excalibur with juice from ConQuest (developed by a former Booz, Allen person no less)
- Endeca, the all time champ for computationally intensive indexing
- Fast Search & Transfer, the outfit that dumped Web search in order to take over the enterprise search sector
- Verity, ah, truth be told, this puppy’s architecture ensured plenty of time to dash off and grab a can of Mountain Dew.
In 2014, if the azure chip firm’s analysis is on the money, the landscape was very different. If I understand the non analytic version of Boston Consulting Group’s matrix from 1970, the big players are:
- Attivio, another business intelligence solution using open source technology and polymorphic positioning for the folks who have pumped more than $35 million into the company. One executive told me via LinkedIn, that the SEC investigation of an Attivio board member had zero impact on the company. I like the attitude. Bold.
- BA Insight, a business software vendor focused on making SharePoint somewhat useful and some investors with deepening worry lines
- Coveo, a start up which is nudging close to a decade in age, and more than $30 million in venture backing. I wonder if those stakeholders are getting nervous.
- Dassault Systèmes, the owner of Exalead, who said in the most recent quarterly report that the company was happy, happy, happy with Exalead but provided no numbers and no detail about the once promising technology
- Expert System, an interesting company with a name that makes online research pretty darned challenging
- Google, ah, yes, the proud marketer of the ever thrilling Google Search Appliance, a product with customer support to make American Airlines jealous
- Hewlett Packard Autonomy, now a leader in the acrimonious litigation field
- IBM, ah, yes, the cognitive computing bunch from Armonk. IBM search is definitely a product that is on everyone’s lips because the major output of the Watson group is a book of recipes
- IHS, an outfit which is banking on its patent analysis technology to generate big bucks in the Goldmine cellophane
- LucidWorks (Really?), a repackager of open source search and a distant second to Elastic (formerly Elasticsearch, which did not make the list. Darned amazing to me.)
- MarkLogic, a data management system trying to grow with a proprietary XML technology that is presented as search, business intelligence, and a tool for running a restaurant menu generation system. Will MarkLogic buy Smartlogic? Do two logics make a rational decision?
- Mindbreeze, a side project at Fabasoft which is the darling of the Austrian government and frustrated European SharePoint managers
- Perceptive Software, which is Lexmark’s packaging of ISYS Search Software. ISYS incorporates technology from – what did the founder tell me in 2009? – oh, right, code from the 1980s. Might it not be tough to make big bucks on this code base? I have 70 or 80 million ideas about the business challenge such a deal poses
- PolySpot, like Sinequa, a French company which does infrastructure, information access, and, of course, customer support
- Recommind, a legal search system which has delivered a down market variation of the Autonomy-type approach to indexing. The company is spreading its wings and tackling enterprise search.
- Sinequa, another one of those quirky French companies which are more flexible than a leotard for an out of work acrobat
But this line up from the azure chip consulting omits some companies which may be important to those looking for search solutions but not so much for azure chip consultants angling for retainer engagements. Let me highlight some vendors the azure chip crowd elected to ignore:
June 11, 2015
I read “Lucidworks Accelerates Product Focused Mission with Major Fusion Upgrades.” LucidWorks (Really?), né Lucid Imagination, appears to be working on products. (Note that the company name appears in different ways: “Lucidworks” with variants “LucidWorks”, “Lucid Works,” and “lucidworks”.)
Lucidworks wants to accelerate its mission. Will this be a quick and easy task?
Flashback in time. Lucid Imagination was founded in 2007. You can read about the vision of the company in interviews with these Lucid (no pun intended) executives:
- Marc Krellenstein, formerly Northern Light and one of the founders of Lucid Imagination, March 17, 2009
- Brian Pinkerton, formerly, December 21, 2010, possibly Amazon?
- Paul Doscher, formerly with Exalead, April 16, 2012
- Miles Kehoe, formerly New Idea Engineering, January 29, 2013, now a consultant
- Mark Bennett, formerly New Idea Engineering, March 4, 2014
These interviews make clear the difficult journey that Lucid Imagination took. (What is interesting is that Lucid’s principal competitor was Elasticsearch, now Elastic. That company went from obscurity to the go-to provider of open source search. To be fair, Shay Banon, founder of Elastic, had compiled considerable experience with the Compass open source search system.)
Why did I cover Lucid in five interviews?
The reason is that open source search appeared to be the salve to soothe the wounds inflicted by proprietary search system vendors. Satisfaction with search was declining. Users were disaffected with high profile proprietary brands. The community approach addressed, in part, the brutal research, development, and customer support costs which search drags to each meeting with stakeholders.
Lucid had a lead; Elastic benefited. Lucid seeks a focus; Elastic is serving customers. Lucid would be an excellent business school case study, ranking at the top along with the Hewlett Packard Autonomy search situation and the Fast Search criminal charges matter. That is rarified case study company.
In the interviews cited above, it is clear that Lucid embraced Solr and made an attempt to emulate the full featured approach to content processing exemplified by Autonomy and Fast Search & Transfer. Elastic, on the other hand, took a more direct approach, relying on Lucene for the heavy lifting, and narrowing its focus to tools which were almost utilitarian. Want to search a log file? Go with Elastic.
The other key difference is the lack of managerial drama at Elastic. Elastic’s management team appears, at least to this observer in Kentucky, to be stable. Lucid, on the other hand, has seen the departure of founders early in the company’s history. Presidents arrived and departed. Marketers appeared and disappeared. Major committers joined and then jumped ship; for example, Brian Pinkerton ended up at Amazon, working on its search product. Yonik Seeley also left to start his own search company, Heliosearch. Dr. Krellenstein went from strong supporter of Lucid to a disaffected founder. He quit.
As recently as September 2014, Lucid Works featured in “Trouble at LucidWorks: Lawsuits, Lost Deals, and Layoffs Plague the Search Startup Despite Funding.” The headline makes several points. First, LucidWorks has ingested more than $40 million, which puts it on a par with Attivio and Coveo in the money department. But Elastic garnered about $70 million at about the same time. The headline also reveals the disjunctions among managers, regardless of which president was on watch. And, the headline focuses on the point that it is a search vendor, which is not in my opinion a particularly magnetic positioning for software.
According to the “Trouble at LucidWorks” article, The Guardian and Nordstrom abandoned Lucid’s software. The less than flattering VentureBeat story added:
The situation seems to have worsened following shakeups in the sales team, leaving young salespeople inexperienced in the enterprise-software game trying to win deals. “I don’t think any of the sales team hits (their) number except one guy,” said a former employee. And that one guy has resorted to “dropping his pants,” as the sales expression goes, promising to significantly chop the price of a service if his lead commits to buying right away, a different former employee said. The sales goals aren’t increasing. The revenue target for the year is $12 million, right in line with last year, that former employee said. And it doesn’t help that LucidWorks has fumbled with partnerships it was trying to get in place. It was working on alliances with Amazon Web Services, Intel, and Splunk, one former employee told VentureBeat. “Will [Hayes] imploded that with comments he made in the final agreement,” that former employee said of one partnership. And after Hayes stepped up as chief executive in June, he’s laid off people in marketing, sales, and business development. On the technology side of the company, meanwhile, employees have missed deadlines for shipping software to customers, month after month, another former employee said. Outside the office, the company has other distractions — in court, to be exact. Mike Moody, a former senior vice president of engineering at LucidWorks who was terminated in December, sued LucidWorks and certain executives in February for unlawful termination, according to documents submitted to the U.S. District Court for the Northern District of California. LucidWorks is also ensnared in a case it filed against Seeley, one of its founders, in the Superior Court of California, San Mateo County. “This is a case about double-dealing on an employer, which arises from the secretive founding and launching of the company Heliosearch by Yonik Seeley before his resignation from his former employer LucidWorks in October 2013,” the complaint begins. 
“Unknown to LucidWorks, while Seeley was still employed by LucidWorks, he simultaneously was working directly against LucidWorks’ interests by developing and promoting his new venture Heliosearch as a competing alternative to LucidWorks.”
May 28, 2015
For a moment, I thought search was undergoing a renascence. But I was wrong. I noted a chart which purports to illustrate that the future is not keyword search. You can find the illustration (for now) at this Twitter location. The idea is that keyword search is less and less effective as the volume of data goes up. I don’t want to be a spoil sport, but for certain queries key words and good old Boolean may be the only way to retrieve certain types of information. Don’t believe me. Log on to your organization’s network or to Google. Now look for the telephone number of a specific person whose name you know or a tire company located in a specific city with a specific name which you know. Would you prefer to browse a directory, a word cloud, a list of suggestions? I want to zing directly to the specific fact. Yep, key word search. The old reliable.
But the chart points out that the future is composed of three “webs”: The Social Web, the Semantic Web, and the Intelligent Web. The date for the Intelligent Web appears to be 2018 (the diagram at which I am looking is fuzzy). We are now perched half way through 2015. In 30 months, the Intelligent Web will arrive with these characteristics:
- Web scale reasoning (Don’t we have Watson? Oh, right. I forgot.)
- Intelligent agents (Why not tap Connotate? Agents ready to roll.)
- Natural language search (Yep, talk to your phone. How is that working out on a noisy subway train?)
- Semantics. (Embrace the OWL. Now.)
Now these benchmarks will arrive in the next 30 months, which implies a gradual emergence of Web 4.0.
The hitch in the git along, like most futuristic predictions about information access, is that reality behaves in some unpredictable ways. The assumption behind this graph is “Semantic technology help to regain productivity in the face of overwhelming information growth.”
May 8, 2015
I must admit that I knew very little about the collaborative economy. I used AirBnB one time and worried about my little test. I survived. I rode in an Uber car one time because my son is an aficionado. I am okay with the subway and walking. I ignore apps which allegedly make my life better, faster, and more expensive.
I saw a post which pointed me to the Chief Digital Officer Summit and that pointed me to this page with the amazing honeycomb shown below. The title is “Collaborative Economy Honeycomb 2: Watch It Grow”
The hexagons are okay, but the bulk of the write up is a listing of companies which manifest the characteristics of a collaborative honeycomb outfit.
Most of the companies were unfamiliar to me. I did recognize the names of a couple of the honeycombers; for example, Khan Academy, Etsy, eBay (ah, delightful eBay), Craigslist, Freelancer, the Crypto currencies (yep, my Dark Web work illuminated this hexagon in the honeycomb for me), and Indiegogo (I met the founder at a function in Manhattan).
But the other 150 companies in the list were news to me.
But what caused me to perk up and pay attention was one factoid:
There were zero search, content processing, or next generation information access companies in the list.
I formed a hypothesis which will probably give indigestion to the individuals and financial services firms pumping money into search and content processing companies. Here it is:
The wave of innovation captured in the wonky honeycomb is moving forward with search as an item on a checklist. The finding functions of these outfits boil down to social media buzz and niche marketing. Information access is application centric, not search centric.
If I am correct, why would honeycomb companies in collaboration mode want to pump money into a proprietary keyword search system? Why not use open source software and put effort into features for the app crowd?
Net net: Generating big money from organic license deals may be very difficult if the honeycomb analysis is on the beam. How hard will it be to sell a high priced search system to the companies identified in this analysis? I think that the task might be difficult and time consuming.
The good news is that the list of companies provides outfits like Attivio, BA Insight, Coveo, Recommind, Smartlogic, and other information retrieval firms with some ducks at which to shoot. How many ducks will fall in a fusillade of marketing?
One hopes that the search sharpshooters prevail.
Stephen E Arnold, May 8, 2015
April 25, 2015
Need patent information? Lots of folks believed that making sense of the public documents available from the USPTO was the road to riches. Before I kicked back to enjoy the sylvan life in rural Kentucky, I did some work on Fancy Dan patent systems. There was a brush with the IBM Intelligent Patent Miner system. For those who do not recall their search history, you can find a chunk of information in “Information Mining with the IBM Intelligent Miner Family.” Keep in mind that the write up is about 20 years old. (Please, notice that the LexisNexis system discussed below uses many of the same, time worn techniques.)
Patented dog coat.
Then there was the Manning & Napier “smart” patent analysis system with analyses’ output displayed in three-D visualizations. I bumped into Derwent (now Intellectual Property & Science) and other Thomson Corp. solutions as well. And, of course, there was my work for an unnamed, mostly clueless multi billion dollar outfit related to Google’s patent documents. I summarized the results of this analysis in my Google Version 2.0 monograph, portions of which were published by BearStearns before it met its thrilling end seven years ago. (Was my boss the fellow carrying a box out of the Midtown BearStearns’ building?)
Why the history?
Well, patents are expensive to litigate. For some companies, intellectual property is a revenue stream.
There is a knot in the headphone cable. Law firms are not the go go business they were 15 or 20 years ago. Law school grads are running gyms; some are Uber drivers. Like many modern post Reagan businesses, concentration is the name of the game. For the big firms with the big buck clients, money is no object.
The problem in the legal information business is that smaller shops, including the one and two person outfits operating in Dixie Highway type of real estate, do not want to pay the $200 and up per search that commercial online services charge. Even when I was working for some high rollers, the notion of a five or six figure online charge elicited what I would diplomatically describe as gentle push back.
I read “LexisNexis TotalPatent Keeps Patent Research out of the Black Box with Improved Version of Semantic Search.” For those out of touch with online history, I worked for a company in the 1980s which provided commercial databases to LexisNexis. I knew one of the founders (Don Wilson). I even had reasonably functional working relationships with Dan Prickett and people named “Jim” and “Sharon.” In one bizarre incident, a big wheel from LexisNexis wanted to meet with me in the Cherry Hill Mall’s parking lot across from the old Bell Labs’ facility where I was a consultant at the time. Err, no thanks. I was okay with the wonky environs of Bell Labs. I was not okay with the lash up of a Dutch and British company.
Snippet of code from a Ramanathan Guha invention. Guha used to be at IBM Almaden and he is a bright fellow. See US7593939 B2.
What does LexisNexis TotalPatent deliver for a fee? According to the write up:
TotalPatent, a web-based patent research, retrieval and analysis solution powered by the world’s biggest assortment of searchable full-text and bibliographic patent authorities, allows researchers to enter as much as 32,000 characters (comparable to more than 10 pages of text)—much over along a whole patent abstract—into its search industry. The newly enhanced semantic brains, pioneered by LexisNexis during 2009 and continually improved upon utilizing contextual information supplied by the useful patent data offered to the machine, current results in the form of a user-adjustable term cloud, where the weighting and positioning of terms may be managed for lots more precise results. And countless full-text patent documents, TotalPatent in addition utilizes systematic, technical also non-patent literature to go back the deepest, most comprehensive serp’s.
April 15, 2015
I have a view of Yahoo. Sure, it was formed when I was part of the team that developed The Point (Top 5% of the Internet). Yahoo had a directory. We had a content processing system. We spoke with Yahoo’s David Filo. Yahoo had a vision, he said. We said, No problem.
The Point became part of Lycos, embracing Fuzzy and his round ball chair. Yahoo, well, Yahoo just got bigger and generally went the way of general purpose portals. CEOs came and went. Stakeholders howled and then sulked.
I read or rather looked at “Yahoo. Semantic Search From Document Retrieval to Virtual Assistants.” You can find the PowerPoint “essay” or “revisionist report” on SlideShare. The deck was assembled by the director of research at Yahoo Labs. I don’t think this outfit is into balloons, self driving automobiles, and dealing with complainers at the European Commission. Here’s the link. Keep in mind you may have to sign up with the LinkedIn service in order to do anything nifty with the content.
The premise of the slide deck is that Yahoo is into semantic search. After some stumbles, semantic search started to become a big deal with Google and rich snippets, Bing and its tiles, and Facebook with its Like button and the magical Open Graph Protocol. The OGP has some fascinating uses. My book CyberOSINT can illuminate some of these uses.
And where is Yahoo in the 2008 to 2010 interval when semantic search was abloom? Patience, grasshopper.
Yahoo was chugging along with its Knowledge Graph. If this does not ring a bell, here’s the illustration used in the deck:
The date is 2013, so Yahoo has been busy since Facebook, Google, and Microsoft were semanticizing their worlds. Yahoo has a process in place. Again from the slide deck:
I was reminded of the diagrams created by other search vendors. These particular diagrams echo the descriptions of the now defunct Siderean Software server’s set up. But most content processing systems are more alike than different.
April 8, 2015
When a bean counter tallies up the cost of an enterprise search system, the reaction, in my experience, is, “How did we get to this number?” The question is most frequently raised in larger organizations, and it is one to which enterprise search staff and their consultants often have no acceptable answer.
Search-splainers position the cost overruns, diminish the importance of the employees’ dissatisfaction with the enterprise search system, and unload glittering generalities to get a consulting deal. Meanwhile, enterprise search remains a challenged software application.
Consulting engineers, upgrades, weekend crash recoveries, optimizing, and infrastructure hassles balloon the cost of an enterprise search system. At some point, a person is charged with investigating why employees are complaining, implementing workarounds, and not using the system. When answers are not satisfying, financial meltdowns put search vendors out of business. Examples range from Convera and the Intel and NBA matters to the unnoticed death of Delphes, Entopia, Siderean, et al.
Search to most professionals, regardless of occupation, means Google. Bang in a word or two and Google delivers the bacon or the soy bean paste substitute. Most folks do not know the difference, nor, in my view, do they care. Google is how one finds information.
The question, “Why can’t enterprise search be like Google?”
Another question, “How can a person with a dog in the search find search-splain; that is, ‘prove’ how important search is to kith and kin, truth and honor, sales and profit?”
For most professionals, search Google style is “free.” The perception is fueled with the logs of ignorance. Google is providing objective information. Google is good. Google is the yardstick by which enterprise search is measured. Enterprise search comes up short. Implement a Google Search Appliance, and the employees don’t like that solution either.
Inside an organization, finding information is an essential part of a job. One cannot work on a report unless that person can locate information about the topic. Most of the data are housed in emails, PowerPoints, multiple drafts of Word documents stuffed with change tracking emendations, and maybe some paper notes. In some cases, a professional will have to speak face to face or via the phone to a colleague. The information then requires massaging, analysis, and reformation.
Ah, the corporate life is little more than one more undergraduate writing assignment with some Excel tossed in.
March 31, 2015
I read an article from the outfit that relies on folks like Dave Schubmehl for expertise. The write up is “HP Links Vertica and IDOL Seeking Better Unstructured Data Analysis.” But I quite like the subtitle because it provides a timeline; to wit:
The company built a connector server for the products, which it acquired separately in 2011.
Let’s see. That is just about three years plus a few months. The story reminded me of Rip Van Winkle who woke to a different world when he emerged from his slumber. The Sleepy Hollow could be a large technology company in the act of performing mitosis in order to generate [a] excitement, [b] money, and [c] the appearance of progress. I wonder if the digital Sleepy Hollow is located near Hanover Street? I will have to investigate that parallel.
What’s a few years of intellectual effort in a research “cave” when you are integrating software that is expected to generate billions of dollars in sales? Existing Vertica and Autonomy licensees are probably dancing in the streets.
The write up states:
Promising more thorough and timelier data analysis, Hewlett-Packard has released a software package that combines the company’s Vertica database with its IDOL data analysis platform. The HP Haven Connector Framework Server may allow organizations to study data sets that were too large or unwieldy to analyze before. The package provides “a mixture of statistical and contextual understanding,” of data, said Jeff Veis, HP vice president of marketing for big data. “You can pull in any form of data, and then do real-time high performance analysis.”
Hmm. “Promising” and “may allow” are interesting words and phrases. It seems as if the employer of Mr. Schubmehl is hedging on the HP assertions. I wonder, “Why?”
March 19, 2015
I review a couple of times a week a free digital “newspaper” called Paper.li. I learned about this Paper.li “newspaper” when Vivisimo sent me its version of “search news.” The enterprise search newspaper I receive is assembled under the firm hand of Edwin Stauthamer. The stories are automatically assembled into “The Enterprise Search Daily.”
The publication includes a wide range of information. The referrer’s name appears with each article. The title page for the March 18, 2015, issue looks like this.
In the last week or so, I have noticed a stridency in the articles about search and the disciplines the umbrella term protects from would-be encroachers. Search is customer support, but from the enterprise search vendors’ viewpoint, enterprise search is the secret sauce for a great customer support soufflé. Enterprise search also does Big Data, business intelligence, and dozens of other activities.
The reason for the primacy of search, as I understand the assertions of the search companies and the self appointed search “experts” is that information retrieval makes the business work. Improve search. It follows, according to the logic, that revenues will increase, profits will rise, and employee and customer satisfaction will skyrocket.
Unfortunately enterprise search is difficult to position at the alpha and omega of enterprise software. Consider this article from the March 18 edition of The Enterprise Search Daily.
The article begins:
Enterprise search has notoriously been a problem in the content management equation. Various content and document management systems have made it possible to store files. But the ability to categorize that information intuitively and in a user-friendly way, and make that information easy to retrieve later, has been one of several missing pieces in the ECM market. When will enterprise search be as easy to use and insightful as Google’s external search engine? If enterprise search worked anywhere near as effectively as Google, it might be the versatile new item in our content management wardrobes, piecing content together with a clean sophistication that would appeal to users by making everything findable, accessible and easy to organize.
I am not sure how beginning with the general perception that enterprise search has been, is, and may well be a failure flips to a “must have” product. My view is that keyword search is a utility. For organizations with cash to invest, automated indexing and tagging systems can add some additional findability hooks. The caveat is that the licensee of these systems must be prepared to spend money on a professional who can ride herd on the automated system. The indexing strays have to be rounded up and meshed with the herd. But the title’s assertion is a dream, a wish. I don’t think enterprise content management is particularly buttoned up in most organizations. Even primitive search systems struggle to figure out what version is the one the user needs to find. Indexing by machine or human often leads to manual inspection of documents in order to locate the one the user requires. Google wanders into the scene because most employees give Google.com a whirl before undertaking a manual inspection job. If the needed document is on the Web somewhere, Google may surface it if the user is lucky enough to enter the secret combination of keywords. Google is deeply flawed, but for many employees, it is better than whatever their employer provides.
February 28, 2015
For years, I have posted a public indexing Overflight. You can examine the selected outputs at this Overflight link. (My non public system is more robust, but the public service is a useful temperature gauge for a slice of the content processing sector.)
When it comes to indexing, most vendors provide keyword, concept tagging, and entity extraction. But are these tags spot on? No, most are good enough.
A happy quack to Jackson Taylor for this “good enough” cartoon. The salesman makes it clear that good enough is indeed good enough in today’s marketing enabled world.
I chose about 50 companies that asserted their systems performed some type of indexing or taxonomy function. I learned that the taxonomy business is “about to explode.” I find that to be either an interesting investment tip or a statement that is characteristic of content processing optimists.
Like search and retrieval, plugging in “concepts” or other index terms is a utility function. For example, if one indexes each word in an article appearing in this blog, the article might be about another subject. For example, in this post, I am talking about Overflight, but the real topic is the broader use of metadata in information retrieval systems. I could assign the term “faceted navigation” to this article as a way to mark this article as germane to point and click navigation systems.
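The gap between indexing each word and assigning a concept term can be sketched in a few lines of Python. This is a minimal illustration, not how Overflight or any vendor's system works: the two sample documents, their text, the `assigned_terms` field, and every function name are assumptions made up for the example.

```python
import re
from collections import defaultdict

# Hypothetical mini-collection; text and assigned terms are invented for illustration.
DOCS = {
    "overflight-post": {
        "text": "Overflight reports track indexing and taxonomy vendors.",
        "assigned_terms": ["faceted navigation", "metadata"],
    },
    "search-costs-post": {
        "text": "Enterprise search costs balloon with upgrades and consulting.",
        "assigned_terms": ["enterprise search"],
    },
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_keyword_index(docs):
    """Inverted index built from every word that appears in the text."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for word in tokenize(doc["text"]):
            index[word].add(doc_id)
    return index


def build_term_index(docs):
    """Index built only from terms an indexer (human or machine) assigned."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for term in doc["assigned_terms"]:
            index[term.lower()].add(doc_id)
    return index


def keyword_search(index, query):
    """Boolean AND: return documents containing every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


def term_search(index, term):
    """Exact lookup of an assigned concept term."""
    return set(index.get(term.lower(), set()))


keyword_index = build_keyword_index(DOCS)
term_index = build_term_index(DOCS)

# The words "faceted" and "navigation" never appear in the text,
# so the word-level index finds nothing; the assigned term does.
keyword_hits = keyword_search(keyword_index, "faceted navigation")
term_hits = term_search(term_index, "faceted navigation")
print(keyword_hits)
print(term_hits)
```

The word-level index only knows what the article literally says. The assigned term records what the article is about, which is the extra findability hook, and also the part someone has to pay to create and maintain.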
If you examine the “reports” Overflight outputs for each of the companies, you will discover several interesting things as I did on February 28, 2015 when I assembled this short article.
- Mergers or buying failed vendors at fire sale prices are taking place. Examples include Lucidea’s purchase of Cuadra and InMagic. Both of these firms are anchored in traditional indexing methods and seemed to be within a revenue envelope until their sell out. Business Objects acquired Inxight and then SAP acquired Business Objects. Bouvet acquired Ontopia. Teradata acquired Revelytix.
- Moving indexing into open source. Thomson Reuters acquired ClearForest and made most of the technology available as OpenCalais. OpenText, a rollup outfit, acquired Nstein. SAS acquired Teragram. Smartlogic acquired Schemalogic. (A free report about Schemalogic is available at www.xenky.com/vendor-profiles.)
- A number of companies just failed, shut down, or went quiet. These include Active Classification, Arikus, Arity, Forth ICA, MaxThink, Millennium Engineering, Navigo, Progris, Protege, punkt.net, Questans, Quiver, Reuse Company, Sandpiper,
- The indexing sector includes a number of companies my non public system monitors; for example, the little known Data Harmony with six figure revenues after decades of selling really hard to traditional publishers. Conclusion: Indexing is a tough business to keep afloat.
There are numerous vendors who assert their systems perform indexing, entity, and metadata extraction. More than 18 of these companies are profiled in CyberOSINT, my new monograph. Oracle owns Triple Hop, RightNow, and Endeca. Each of these acquired companies performs indexing and metadata operations. Even the mashed potatoes search solution from Microsoft includes indexing tools. The proprietary XML data management vendor MarkLogic asserts that it performs indexing operations on content stored in its repository. Conclusion: More cyber oriented firms are likely to capture the juicy deals.
So what’s going on in the world of taxonomies? Several observations strike me as warranted:
First, none of the taxonomy vendors are huge outfits. I suppose one could argue that IBM’s Lucene based system is a billion dollar baby, but that’s marketing peyote, not reality. Perhaps MarkLogic which is struggling toward $100 million in revenue is the largest of this group. But the majority of the companies in the indexing business are small. Think in terms of a few hundred thousand in annual revenue to $10 million with generous accounting assumptions.
What’s clear to me is that indexing, like search, is a utility function. If a good enough search system delivers good enough indexing, then why spend for humans to slog through the content and make human judgments? Why not let Google funded Recorded Future identify entities, assign geo codes, and extract meaningful signals? Why not rely on Haystax or RedOwl or any one of the more agile firms to deliver higher value operations?
I would assert that taxonomies and indexing are important to those who desire the accuracy of a human indexed system. This assumes that the humans are subject matter specialists, the humans are not fatigued, and the humans can keep pace with the flow of changed and new content.
The reality is that companies focused on delivering old school solutions to today’s problems are likely to lose contracts to companies that deliver what the customer perceives as a higher value content processing solution.
What can a taxonomy company do to ignite its engines of growth? Based on the research we performed for CyberOSINT, the future belongs to those who embrace automated collection, analysis, and output methods. Users may, if the user so chooses, provide guidance to the system. But the days of yore, when monks with varying degrees of accuracy created catalog sheets for the scriptoria, have been washed to the margin of the data stream by today’s content flows.
What’s this mean for the folks who continue to pump money into taxonomy centric companies? Unless the cyber OSINT drum beat is heeded, the failure rate of the Overflight sample is a wake up call.
Buying Apple bonds might be a more prudent financial choice. On the other hand, there is an opportunity for taxonomy executives to become “experts” in content processing.
Stephen E Arnold, February 28, 2015