IDOL Is Back and with NLP
December 11, 2016
I must admit that I am confused. Hewlett Packard bought Autonomy, wrote off billions, and seemed to sell the Autonomy software (IDOL and DRE) to an outfit in England. Oh, HPE, the part of the Sillycon Valley icon, which sells “enterprise” products and services owns part of the UK outfit which owns Autonomy. Got that? I am not sure I have the intricacies of this stunning series of management moves straight in my addled goose brain.
Close enough for horseshoes, however.
I read what looks like a content marketing flufferoo called “HPE Boosts IDOL Data Analytics Engine with Natural Language Processing Tools.”
I thought that IDOL had NLP functions, but obviously I am wildly off base. To get my view of the Autonomy IDOL system, check out the free analysis at this link. (Nota bene: I have done a tiny bit of work for Autonomy and have had to wrestle with the system as a contractor on US government projects. I know that this is no substitute for the whizzy analysis included in the aforementioned write up. But, hey, it is what it is.)
The write up states in reasonably clear marketing lingo:
HPE has added natural language processing capabilities to its HPE IDOL data analytics engine, which could improve how humans interact with computers and data, the company announced Tuesday. By using machine learning technology, HPE IDOL will be able to improve the context around data insights, the company said.
A minor point: IDOL is based on machine learning processes. That’s the guts of the black box comprising the Bayesian, Laplacian, and Markovian methods in the Dynamic Reasoning Engine which underpins the Intelligent Data Operating Layer of the Autonomy system.
Here’s the killer statement:
… the company [I presume this outfit is Hewlett Packard Enterprise and not MicroFocus] has introduced HPE Natural Language Question Answering to its IDOL platform to help solve the problem. According to the release, the technology seeks to determine the original intent of the question and then “provides an answer or initiates an action drawing from an organization’s own structured and unstructured data assets in addition to available public data sources to provide actionable, trusted answers and business critical responses.”
I love the actionable, trusted bit. Autonomy’s core approach is based on probabilities. Trust is okay, but it is helpful to understand that probabilities are — well — probable. The notion of “trusted answers” is a quaint one to those who drink deep from the statistical springs of data.
I highlighted this quotation, presumably from the wizards at HPE:
“IDOL Natural Language Question Answering is the industry’s first comprehensive approach to delivering enterprise class answers,” Sean Blanchflower, vice president of engineering for big data platforms at HPE, said in the release. “Designed to meet the demanding needs of data-driven enterprises, this new, language-independent capability can enhance applications with machine learning powered natural language exchange.”
My hunch is that HPE or MicroFocus or an elf has wrapped a query system around the IDOL technology. The write up does not provide too many juicy details about the plumbing. I did note these features, however:
- An IDOL Answer Bank. Ah, ha. Ask Jeeves style canned questions. There is no better way to extract information than the use of carefully crafted queries. None of the variable, real life stuff that system users throw at search and retrieval systems. My experience is that maintaining canned queries can become a bit tedious and also expensive.
- IDOL Fact Bank. Ah, ha. A query that processes content to present “factoids.” Much better than a laundry list of results. What happens when the source data return factoids which are [a] not current, [b] not accurate, or [c] without context? Hey, don’t worry about the details. Take your facts and decide, folks.
- IDOL Passage Extract. Ah, ha. A snippet or key words in context! Not exactly new, but a time proven way to provide some context to the factoid. Now wasn’t that an IDOL function in 2001? Guess not.
- IDOL Answer Server. Ah, ha. A Google style wrapper; that is, leave the plumbing alone and provide a modernized paint job.
If you match these breakthroughs with the diagrams in the HP IDOL write up, you will note that these capabilities appear in the IDOL/DRE system diagram and features.
What’s important in this content marketing piece? The write up provides a takeaway section to help out those who are unaware of the history of IDOL, which dates from the late 1990s. Here you go. Revel in new features, enjoy NLP, and recognize that HPE is competing with IBM Watson.
There you go. Factual content in action. Isn’t modern technology analysis satisfying? IBM Watson, your play.
Stephen E Arnold, December 11, 2016
MC+A Is Again Independent: Search, Discovery, and Engineering Services
December 7, 2016
Beyond Search learned that MC+A has added a turbo-charger to its impressive search, content processing, and content management credentials. The company, based in Chicago, earned a gold star from Google for MC+A’s support and integration services for the now-discontinued Google Search Appliance. After working with the Yippy implementation of Watson Explorer, MC+A retains its search and retrieval capabilities but has expanded its scope. Michael Cizmar, the company’s president, told Beyond Search, “Search is incredibly important, but customers require more multi-faceted solutions.” MC+A provides the engineering and technical capabilities to cope with Big Data, disparate content, cloud and mixed-environment platforms, and the type of information processing needed to generate actionable reports. [For more information about Cizmar’s views about search and retrieval, see “An Interview with Michael Cizmar.”]
Cizmar added:
We solve organizational problems rooted in the lack of insight and accessibility to data that promotes operational inefficiency. Think of a support rep who has to look through five systems to find an answer for a customer on the phone. We are changing the way these users get to answers by providing them better insights from existing data securely. At a higher level we provide strategy support for executives looking for guidance on organizational change.
Alphabet Google’s decision to withdraw the Google Search Appliance has left more than 60,000 licensees looking for an alternative. Since the début of the GSA in 2002, Google trimmed the product line and did not move the search system to the cloud. Cizmar’s view of the GSA’s 12 year journey reveals that:
The Google Search Appliance was definitely not a failure. The idea that organizations wanted an easy-to-use, reliable Google-style search system was ahead of its time. Current GSA customers need some guidance on planning and recommendations on available options. Our point of view is that it’s not the time to simply swap out one piece of metal for another even if vendors claim “OEM” equivalency. The options available for data processing and search today all provide tremendous capabilities, including cognitive solutions which provide amazing capabilities to assist users beyond the keyword search use case.
Cizmar sees an opportunity to provide GSA customers with guidance on planning and recommendations on available options. MC+A understands the options available for data processing and information access today. The company is deeply involved in solutions which tap “smart software” to deliver actionable information.
Cizmar said:
Keyword search is a commodity at this point, and we are helping our customers put search where the user is without breaking an established workflow. Answers, not laundry lists of documents to read, are paramount today. Customers want to solve specific problems; for example, reducing average call time in customer support using smart software or adaptive, self service solutions. This is where MC+A’s capabilities deliver value.
MC+A is cloud savvy. The company realized that cloud and hybrid or cloud-on premises solutions were ways to reduce costs and improve system payoff. Cizmar was one of the technologists recognized by Google for innovation in cloud applications of the GSA. MC+A builds on that engineering expertise. Today, MC+A supports Google, Amazon, and other cloud infrastructures.
Cizmar revealed:
Amazon Elastic Cloud Search is probably doing as much business as Google did with the GSA but in a much different way. Many of these cloud-based offerings are generally solving the problem with the deployment complexities that go into standing up Elasticsearch, the open source version of Elastic’s information access system.
MC+A does not offer a one size fits all solution. He said:
The problem still remains of what should go into the cloud, how to get a solution deployed, and how to ensure usability of the cloud-centric system. The cloud offers tremendous capabilities in running and scaling a search cluster. However, with the API consumption model that we have to operate in, getting your data out of other systems into your search clusters remains a challenge. MC+A does not make security an afterthought. Access controls and system integrity have high priority in our solutions.
MC+A takes a business approach to what many engineering firms view as a technical problem. The company’s engineers examine the business use case. Only then does MC+A determine if the cloud is an option. If so, which product’s or project’s capabilities meet the general requirements? After that process, MC+A implements its carefully crafted, standard deployment process.
Cizmar noted:
If you are a customer with all of your data on premises or have a unique edge case, it may not make sense to use a cloud-based system. The search system needs to be near to the content most of the time.
MC+A offers its white-labeled search “Practice in a Box” to former Google partners and other integrators. High-profile specialist vendors like Onix in Ohio are able to resell the company’s technology backed by the MC+A engineering team.
In 2017, MC+A will roll out a search solution which is, at this time, shrouded in secrecy. This new offering will go “beyond the GSA” and offer expanded information access functionality. To support this new product, MC+A will announce a specialized search practice.
He said:
This international practice will offer depth and breadth in selling and implementing solutions around cognitive search, assist, and analytics with products other than Google throughout the Americas. I see this as beneficial to other Google and non-Google resellers because it allows them to utilize our award winning team, our content filters, and a wealth of social proofs on a just in time basis.
For 2017, MC+A offers a range of products and services. Based on the limited information provided by the secrecy-conscious Michael Cizmar, Beyond Search believes that the company will offer implementation and support services for Lucene and Solr, IBM Watson, and Microsoft SharePoint. The SharePoint support will embrace some vendors supplying SharePoint-centric solutions like Coveo. Plus, MC+A will continue to offer software to acquire content and perform extract-transform-load functions on premises, in the cloud, or in hybrid configurations.
MC+A offers a business-technology approach to information access.
For more information about MC+A, contact sales@mcplusa.com or 312-585-6396.
Stephen E Arnold, December 7, 2016
Search Competition Is Fiercer Than You Expect
December 5, 2016
In the United States, Google dominates the Internet search market. Bing has gained some traction, but the results are still muddy. In Russia, Yandex chases Google around in circles, but what about the enterprise search market? The enterprise search market has more competition than one would think. We recently received an email from SearchBlox, a cognitive platform developed to help organizations embed information in applications using artificial intelligence and deep learning models. SearchBlox is also a player in the enterprise software market as well as a text analytics and sentiment analysis tool.
Their email explained, “3 Reasons To Choose SearchBlox Cognitive Platform” and here they are:
1. EPISTEMOLOGY-BASED. Go beyond just question and answers. SearchBlox uses artificial intelligence (AI) and deep learning models to learn and distill knowledge that is unique to your data. These models encapsulate knowledge far more accurately than any rules based model can create.
2. SMART OPERATION. Building a model is half the challenge. Deploying a model to process big data can be even more challenging. SearchBlox is built on open source technologies like Elasticsearch and Apache Storm and is designed to use its custom models for processing high volumes of data.
3. SIMPLIFIED INTEGRATION. SearchBlox is bundled with over 75 data connectors supporting over 40 file formats. This dramatically reduces the time required to get your data into SearchBlox. The REST API and the security capabilities allow external applications to easily embed the cognitive processing.
To us, this sounds like what enterprise search has been offering even before big data and artificial intelligence became buzzwords. Not to mention, SearchBlox’s competitors have said the same thing. What makes SearchBlox different? The company claims to be less expensive, and it has won several accolades. SearchBlox is built on open source technology, which allows it to lower the price. Elasticsearch is the most popular open source search software, but what is funny is that SearchBlox is like a repackaged version of said Elasticsearch. Mind you, you are paying for a program that is already developed, but SearchBlox is trying to compete with other outfits like Yippy.
Whitney Grace, December 5, 2016
BA Insight and Its Ideas for Enterprise Search Success
October 25, 2016
I read “Success Factors for Enterprise Search.” The write up spells out a checklist to make certain that an enterprise search system delivers what the users want—on point answers to their business information needs. The reason a checklist is necessary after more than 50 years of enterprise search adventures is a disconnect between what software can deliver and what the licensee and the users expect. Imagine figuring out how to get across the Grand Canyon only to encounter the Iguazu Falls.
The preamble states:
I’ll start with what absolutely does not work. The “dump it in the index and hope for the best” approach that I’ve seen some companies try, which just makes the problem worse. Increasing the size of the haystack won’t help you find a needle.
I think I agree, but the challenge is multiple piles of data. Some data are in haystacks; some are in odd ball piles from the AS/400 that the old guy in accounting uses for an inventory report.
Now the check list items:
- Metadata. To me, that’s indexing. Lousy indexing produces lousy search results in many cases. But “good indexing” like the best pie at the state fair is a matter of opinion. When the licensee, users, and the search vendor talk about indexing, some parties in the conversation don’t know indexing from oatmeal. The cost of indexing can be high. Improving the indexing requires more money. The magic of metadata often leads back to a discussion of why the system delivers off point results. Then there is talk about improving the indexing and its cost. The cycle can be more repetitive than a Kenmore 28132’s.
- Provide the content the user requires. Yep, that’s easy to say. Yep, if it’s on a distributed network, content disappears or does not get input into the search system. Putting the content into a repository creates another opportunity for spending money. Enterprise search which “federates” is easy to say, but the users quickly discover what is missing from the index or stale.
- Deliver off point results. The results create work by not answering the user’s question. From the days of STAIRS III to the latest whiz kid solution from Sillycon Valley, users find that search and retrieval systems provide an opportunity to go back to traditional research tools such as asking the person in the next cube, calling a self-appointed expert, guessing, digging through paper documents, or hiring an information or intelligence professional to gather the needed information.
The check list concludes with a good question, “Why is this happening?” The answer does not reside in the check list. The answer does not reside in my Enterprise Search Report, The Landscape of Search, or any of the journal and news articles I have written in the last 35 years.
The answer is that vendors directly or indirectly reassure customers that their software will provide the information a user needs. That’s an easy hook to plant in the customer who behaves like a tuna. The customer has a search system or experience with a search system that does not work. Pitching a better, faster, cheaper solution can close the deal.
The reality is that even the most sophisticated search and content processing systems end up in trouble. Search remains a very difficult problem. Today’s solutions do a few things better than STAIRS III did. But in the end, search software crashes and burns when it has to:
- Work within a budget
- Deal with structured and unstructured data
- Meet user expectations for timeliness, precision, recall, and accuracy
- Operate without specialized user training
- Deliver zippy response time
- Avoid crashes and downtime due to maintenance
- Output usable, actionable reports without having to involve a programmer
- Provide an answer to a question.
Smart software can solve some of these problems for specific types of queries. Enterprise search will benefit incrementally. For now, baloney about enterprise search continues to create churn. The incumbent loses the contract, and a new search vendor inks a deal. Months later, the incumbent loses the contract, and the next round of vendors compete for the contract. This cycle has eroded the credibility of search and content processing vendors.
A check list with three items won’t do much to change the credibility gap between what vendors say, what licensees hope will occur, and what users expect. The Grand Canyon is a big hole to fill. The Iguazu Falls can be tough to cross. Same with enterprise search.
Stephen E Arnold, October 25, 2016
Attivio: Search and Almost Everything Else
October 24, 2016
I spent a few minutes catching up with the news on the Attivio blog. You can find the information at this link. As I worked through the write ups over the past five weeks, I was struck by the diversity of Attivio’s marketing messages. Here are the ones which I noted:
- Attivio is a cognitive computing company, not a search or database company
- Attivio has an interest in governance and risk / compliance
- Attivio is involved in Big Data management
- Attivio is active in anti fraud solutions
- Attivio embraces NoSQL
- Attivio knows about modernizing an organization’s data architecture
- Attivio is a business intelligence solution.
My reaction to these capabilities is two fold:
First, for a company which has its roots in Fast Search & Transfer type of software, Attivio has added a number of applications to basic content processing and information access. Attivio embodies the vision Fast Search articulated before the company ran into some “challenges” and sold to Microsoft in 2008. Fast Search, as I understood the vision, was a platform upon which information applications could be built. Attivio appears to be heading in that direction.
The second reaction is that Attivio is churning out capabilities which embody buzzwords, jargon, and trends. Like a fisherman in a bass boat, the Attivio approach is to use different lures in order to snag a decent sized bass. I find it difficult to accept the assertion that a company rooted in search can deliver in the array of technical niches the blog posts reference.
The major takeaway for me was that Attivio has hired a new Chief Revenue Officer whose job is to generate revenue from the company’s “data catalog” business. I learned from “Attivio Names New Chief Revenue Officer”:
Connon [the insider who took over the revenue job] sees his new role as a reflection of the growing demand for technology that can break down data silos and help successful companies answer, not just the question of “what” the data is reporting, but identify correlation and patterns to answer critical “why” questions. Connon is passionate when he talks about the value of Attivio’s newest technology solution—the Semantic Data Catalog–and its ability to unify a wide array of data for a diverse customer base. “The Semantic Data Catalog is not just for financial service industries. It’s truly a horizontal technology solution that can benefit companies in any industry with data—in other words, with any company, in any industry,” explains Connon. “Our established Cognitive Search and Insight technology provides the foundation for our Semantic Data Catalog to provide companies with a self-service, permission-based ability to locate, sort, and analyze key information across an unlimited number of data applications,” adds Connon.
For me, Attivio’s “momentum” in marketing has to be converted to sustainable revenue. My assumption is that almost every professional at a software / services company sells and generates revenue. When a company lags in revenue, will one person be able to generate revenue?
I don’t have an answer. Worth monitoring to learn if the Chief Revenue Officer can deliver the money.
Stephen E Arnold, October 24, 2016
Quote to Note: Enterprise Search As a Black Hole
October 19, 2016
Here’s a quote to note from “Slack CEO Describes Holy Grail of Virtual Assistants.” Slack seeks to create smart software capable of correlating information from enterprise applications. Good idea. The write up says:
Slack CEO Stewart Butterfield has an audacious goal: Turning his messaging and collaboration platform into an uber virtual assistant capable of searching every enterprise application to deliver employees pertinent information.
Got it. Employees cannot locate information needed for their job. Let me sidestep the issue of hiring people incapable of locating information in the first place.
Here’s the quote I noted:
And if Slack succeeds, it could seal the timeless black hole of wasted productivity enterprise search and other tools have failed to close.
I love the “timeless black hole of wasted productivity of enterprise search.” Great stuff, particularly because outfits like Wolters Kluwer continue to oscillate between proprietary search investments like Qwant.com and open source solutions like Lucene/Solr.
Do organizations create these black holes or is software to blame? Information is a slippery fish, which often finds “timeless black holes” inhospitable.
Stephen E Arnold, October 19, 2016
Definitions of Search to Die For. Maybe With?
October 13, 2016
I read “Search Terminology. Web Search, Enterprise Search, Real Time Search, Semantic Search.” I have included glossaries in some of my books about search. I did not realize that I could pluck out four definitions and present them as a stand alone article. Ah, the wonders of content marketing.
If you want to read the definition with which one can die, either for or with, have at it. May I suggest that you consider these questions prior to your perusing the content marketing write up thing:
Web search
- What’s the method for password protected sites and encrypted sites which exist under current Web technology?
- What Web search systems build their own indexes and which send a query to multiple search systems and aggregate the results? Does the approach matter?
- What is the freshness or staleness of Web indexes? Does it matter that one index may be a few minutes “old” and another index several weeks “old”?
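The second question above, about systems which send a query to multiple search engines and aggregate the results, can be made concrete with a minimal sketch. It assumes each engine returns an ordered list of URLs and merges them with reciprocal rank fusion, one common merging method and not necessarily what any particular metasearch system uses. The engine names, URLs, and the constant k=60 are illustrative assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(result_lists, k=60):
    """Merge ranked URL lists; each URL scores the sum of 1 / (k + rank)."""
    scores = defaultdict(float)
    for results in result_lists:
        for rank, url in enumerate(results, start=1):
            scores[url] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken alphabetically.
    return sorted(scores, key=lambda url: (-scores[url], url))


# Hypothetical result lists from two engines for the same query.
engine_a = ["a.example/1", "b.example/2", "c.example/3"]
engine_b = ["b.example/2", "d.example/4", "a.example/1"]

merged = reciprocal_rank_fusion([engine_a, engine_b])
print(merged)
```

The approach matters because the merged list can only be as fresh and as complete as the indexes the underlying engines maintain; the aggregator itself builds no index.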
Enterprise search
- How does an enterprise search system deliver internal content pointers and external content pointers?
- What is the consequence of an enterprise search user who accesses content which is incomplete or stale?
- What does the enterprise search system do with third party content such as consultants’ reports which someone in the organization has purchased? Ignore? Re-license? Index the content and worry later?
- What is the refresh cycle for changed and new content?
- What is the search function for locating database content or rich media residing on the organization’s systems?
Real time search
- What is real time? The indexing of content in the millisecond world of Wall Street? Indexing content when machine resources and network bandwidth permit?
- How does a user determine the latency in the search system because marketers can write “real time” while programmers implement index update options which the search administrator selects?
- What search system indexes videos in real time? YouTube struggles with 10 minute or longer latency, with some videos requiring hours before the index points to those videos.
Semantic search
- What is the role of human subject matter experts in semantic search?
- What is the benefit of human-intermediated systems versus person-machine or automated smart indexing?
- How does one address concept drift as a system “learns” from its indexing of information?
- What happens to taxonomies, dictionary lists of entities, and other artifacts of concept indexing?
- What does a system do when encountering documents, audio, and videos in a language different from the language of the majority of a system’s users?
Get the idea that zippy, brief definitions cannot deliver Gatorade to the college football players studying in the dorm the night before a big game?
Stephen E Arnold, October 13, 2016
Structured Search: New York Style
October 10, 2016
An interesting and brief search related content marketing white paper “InnovationQ Plus Search Engine Technology” attracted my attention. What’s interesting is that the IEEE is apparently in the search engine content marketing game. The example I have in front of me is from a company doing business as IP.com.
What does InnovationQ Plus do to deliver on point results? The write up says:
This engine is powered by IP.com’s patented neural network machine learning technology that improves searcher productivity and alleviates the difficult task of identifying and selecting countless keywords/synonyms to combine into Boolean syntax. Simply cut and paste abstracts, summaries, claims, etc. and this state-of-the art system matches queries to documents based on meaning rather than keywords. The result is a search that delivers a complete result set with less noise and fewer false positives. Ensure you don’t miss critical documents in your search and analysis by using a semantic engine that finds documents that other tools do not.
The use of snippets of text as the raw material for a behind-the-scenes query generator reminds me of the original DR-LINK method, among others. Perhaps there is some Syracuse University “old school” search DNA in the InnovationQ Plus approach? Perhaps the TextWise system has manifested itself as a “new” approach to patent and STEM (scientific, technology, engineering, and medical) online searching? Perhaps Manning & Napier’s interest in information access has inspired a new generation of search capabilities?
My hunch is, “Yep.”
If you don’t have a handy snippet encapsulating your search topic, just fill in the query form. Google offers a similar “fill in the blanks” approach even though a tiny percentage of those looking for information on Google use advanced search. You can locate the Google advanced search form at this link.
Part of the “innovation” is the use of fielded search. Fielded search is useful. It was the go to method for locating information in the late 1960s. The method fell out of favor with the Sillycon Valley crowd when the idea of talking to one’s mobile phone became the synonym for good enough search.
To access the white paper, navigate to the IEEE registration page and fill out the form at this link.
From my vantage point, structured search with “more like this” functions is a good way to search for information. There is a caveat. The person doing the looking has to know what he or she needs to know.
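For readers who have not met the two ideas, here is a minimal sketch of fielded search combined with a “more like this” function. It is not IP.com’s method. The records, field names, and the Jaccard token overlap used as a stand-in for similarity are all assumptions made for illustration; production systems typically weight terms rather than count raw overlap.

```python
import re


def tokens(text):
    """Lowercase word tokens as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def fielded_search(records, **field_terms):
    """Keep records in which every named field contains its term."""
    return [
        rec for rec in records
        if all(term.lower() in str(rec.get(field, "")).lower()
               for field, term in field_terms.items())
    ]


def more_like_this(seed, records, field="abstract"):
    """Rank the other records by Jaccard token overlap with the seed."""
    seed_tokens = tokens(seed[field])
    scored = []
    for rec in records:
        if rec is seed:
            continue
        other = tokens(rec[field])
        union = seed_tokens | other
        score = len(seed_tokens & other) / len(union) if union else 0.0
        scored.append((score, rec["id"]))
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))


# Hypothetical patent-style records.
records = [
    {"id": 1, "assignee": "Acme", "abstract": "neural network for patent search"},
    {"id": 2, "assignee": "Acme", "abstract": "neural network for image search"},
    {"id": 3, "assignee": "Globex", "abstract": "hydraulic pump valve design"},
]

hits = fielded_search(records, assignee="acme")
similar = more_like_this(records[0], records)
print([rec["id"] for rec in hits], similar)
```

Note that the searcher supplies both the field to query and the seed document, which is the caveat in action: the person doing the looking has to know what to ask for.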
Good enough search takes a different approach. The systems try to figure out what the searcher needs to know and then deliver it. The person looking for information is not required to do much thinking.
The InnovationQ Plus approach shifts the burden from smart software to smart searchers.
Good enough search is winning the battle. In fact, some Sillycon Valley folks, far from upstate New York, have embraced good enough search with both hands. Why use words at all? There are emojis, smart software systems predicting what the user wants to know, and Snapchat infused image based methods.
The challenge will be to find a way to bridge the gap between the Sillycon Valley good enough methods and the more traditional structured search methods.
IEEE seems to agree as long as the vendor “participates” in a suitable IEEE publishing program.
Stephen E Arnold, October 10, 2016
Crimping: Is the Method Used for Text Processing?
October 4, 2016
I read an article I found quite thought provoking. “Why Companies Make Their Products Worse” explains that reducing costs allows a manufacturer to expand the market for a product. The idea is that more people will buy a product if it is less expensive than a more sophisticated version of the product. The example which I highlighted in eyeshade green explained that IBM introduced an expensive printer in the 1980s. Then IBM manufactured a different version of the printer using cheaper labor. The folks from Big Blue added electronic components to make the cheaper printer slower. The result was a lower cost printer that was “worse” than the original.
Perhaps enterprise search and content processing is a hybrid of two or more creatures?
The write up explained that this approach to degrading a product to make more money has a name—crimping. The concept creates “product sabotage”; that is, intentionally degrading a product for business reasons.
The comments to the article offer additional examples and one helpful person with the handle Dadpolice stated:
The examples you give are accurate, but these aren’t relics of the past. They are incredibly common strategies that chip makers still use today.
I understand the hardware or tangible product application of this idea. I began to think about the use of the tactic by text processing vendors.
The Google Search Appliance may have been a product subject to crimping. As I recall, the most economical GSA was less than $2000, a price which was relatively easy to justify in many organizations. Over the years, the low cost option disappeared and the prices for the Google Search Appliances soared to Autonomy- and Fast Search-levels.
Other vendors introduced search and content processing systems, but the prices remained lofty. Search and content processing in an organization never seemed to get less expensive when one considered the resources required, the license fees, the “customer” support, the upgrades, and the engineering for customization and optimization.
My hypothesis is that enterprise content processing does not yield compelling examples like the IBM printer example.
Perhaps the adoption rate for open source content processing reflects a pent up demand for “crimping”? Perhaps some clever graduate student would take the initiative to examine the content processing product prices? Licensees spend for sophisticated solution systems like those available from outfits like IBM and Palantir Technologies. The money comes from the engineering and what I call “soft” charges; that is, training, customer support, and engineering and consulting services.
At the other end of the content processing spectrum are open source solutions. The middle between free or low cost systems and high end solutions does not have too many examples. I am confident there are some, but I could identify only Funnelback, dtSearch, and a handful of other outfits.
Perhaps “crimping” is not a universal principle? On the other hand, perhaps content processing is an example of technical software which has its own idiosyncrasies.
Content processing products, I believe, become worse over time. The reason is not “crimping.” The trajectory of lousiness comes from:
- Layering features on keyword retrieval in hopes of finding a way to generate keen buyer interest
- Adding features helps justify price increases
- The greater the complexity of the system, the less likely the licensee will be able to fiddle with the system
- A refusal to admit that content processing is a core component of many other types of software so “finding information” has become a standard component for other applications.
If content processing is idiosyncratic, that might explain why investors pour money into content processing companies which have little chance to generate sufficient revenue to pay off investors, generate a profit, and build a sustainable business. Enterprise search and content processing vendors seem to be in a state of reinventing or reimagining themselves. Guitar makers just pursue cost cutting and expand their market. It is not so easy for content processing companies.
Stephen E Arnold, October 4, 2016
Five Years in Enterprise Search: 2011 to 2016
October 4, 2016
Before I shifted from worker bee to Kentucky dirt farmer, I attended a presentation in which a wizard from Findwise explained enterprise search in 2011. In my notes, I jotted down the companies the maven mentioned (love that alliteration) in his remarks:
- Attivio
- Autonomy
- Coveo
- Endeca
- Exalead
- Fabasoft
- IBM
- ISYS Search
- Microsoft
- Sinequa
- Vivisimo.
There were nodding heads as the guru listed the key functions of enterprise search systems in 2011. My notes contained these items:
- Federation model
- Indexing and connectivity
- Interface flexibility
- Management and analysis
- Mobile support
- Platform readiness
- Relevance model
- Security
- Semantics and text analytics
- Social and collaborative features
I recall that I was confused about the source of the information in the analysis. Then the murky family tree seemed important. Five years later, I am less interested in who sired what child than the interesting historical nuggets in this simple list and collection of pretty fuzzy and downright crazy characteristics of search. I am not too sure what “analysis” and “analytics” mean. The notion that an index is required is okay, but the blending of indexing and “connectivity” seems a wonky way of referencing file filters or a network connection. With the Harvard Business Review pointing out that collaboration is a bit of a problem, it is an interesting footnote to acknowledge that a buzzword can grow into a time sink.
There are some notable omissions; for example, open source search options do not appear in the list. That’s interesting because Attivio was, I heard at that time, poking its toe into open source search. IBM was a fan of Lucene five years ago. Today the IBM marketing machine beats the Watson drum, but inside the Big Blue system resides that free and open source Lucene. I assume that the gurus and the mavens working on this list ignored open source because what consulting revenue results from free stuff? What happened to Oracle? In 2011, Oracle still believed in Secure Enterprise Search only to recant with purchases of Endeca, InQuira, and RightNow. There are other glitches in the list, but let’s move on.