New Enterprise Search Market Study

August 1, 2017

Don Quixote and Solving Death: No Problem, Amigo

I read “Global Enterprise Search Market 2017-2022.” I was surprised that a consulting firm would invest time and energy in writing about a market sector which has not been thriving. Now don’t start sending me email about my lack of cheerfulness about enterprise search. The sector is thriving, but it is doing so with approaches that are disguised as applications which deliver something other than inflated expectations, business closures, and lawsuits.


I will slay the beast that is enterprise search. “Hold still, you knave!”

First, let’s look at what the report covers, then I will tackle some of the issues about which I think as the author of the Enterprise Search Report and a number of search-related articles and analyses. (The articles are available from the estimable Information Today Web site, and the free analyses may be located at www.xenky.com/vendor-profiles.)

The write up told me that enterprise search boils down to these companies:

Coveo Corp
Dassault Systemes
IBM Corp
Microsoft
Oracle
SAP AG

Coveo is a fork of Copernic. Yep, it’s a proprietary system which originally was focused on providing search for Microsoft. Now the company has spread its wings to include a raft of functions which range from the cloud to customer support / help desk services.

Dassault Systèmes is the owner of Exalead. Since the acquisition, Exalead as a brand has faded. The desktop search system was killed, and its proprietary technology lives on mostly as a replacement for Dassault’s internal search system which was based on Autonomy. Most of the search wizards have left, but the Exalead technology was good before Dassault learned that selling search was indeed a challenge.

IBM offers a number of products which include open source Lucene, acquired technology like Vivisimo’s clustering engine, and home brew code from its IBM wizards. (Did you know that the precursor of PageRank was an IBM “invention”?) The key is that IBM uses search to sell services which have higher margins than providing a free version of brute force information access.


Study of Search: Weird Results Plus Bonus Errors

December 30, 2016

I was able to snag a copy of “Indexing and Search: A Peek into What Real Users Think.” The study appeared in October 2016, and it appears to be the work of IT Central Station, which is an outfit described as a source of “unbiased reviews from the tech community.” I thought, “Oh, oh, ‘real users.’” A survey. An IDC type or Gartner type sample which although suspicious to me seems to convey some useful information when the moon is huge. Nope. Nope. Unbiased. Nope.

Note that the report is free. One can argue that free does not translate to accurate, high value, somewhat useful information. I support this argument.

The report, like many of the “real” reports I have reviewed over the decades, is relatively harmless. In terms of today’s content payloads, the study fires blanks. Let’s take a look at some of the results, and you can work through the 16 pages to double check my critique.

First, who are the “top” vendors? This list reveals quite a bit about the basic flaw in the “peek.” The table below presents the list of “top” vendors along with my comment about each vendor. Companies with open source Lucene/Solr based systems are in dark red. Companies or brands which have retired from the playing field in professional search are in bold gray.

Vendor – Comment
Apache – This is not a search system. It is an open source umbrella for projects of which Lucene and Solr are two projects among many.
Attivio – Based on Lucene/Solr open source search software; positioned as a business intelligence vendor
Copernic – A desktop search and research system based on proprietary technology from the outfit known as Coveo
Coveo – A vendor of proprietary search technology now chasing Big Data and customer support
Dassault Systèmes – Owns Exalead which is now downgraded to a utility with Dassault’s PLM software
Data Design, now Ryft.com – Pitches search without indexing via proprietary “circuit module” method
Data Gravity – Search is a utility in a storage centric system
DieselPoint – Company has been “quiet” for a number of years
Expert System – Publicly traded and revenue challenged vendor of a metadata utility, not a search system
Fabasoft – Mindbreeze is a proprietary replacement for SharePoint search
Google – Discontinued the Google Search Appliance and exited enterprise search
Hewlett Packard Enterprise – Sold its search technology to Micro Focus; legal dispute in progress over alleged fraud
IBM OmniFind – Lucene and proprietary scripts plus acquired technology
IBM StoredIQ – Like DB2 search, a proprietary utility
ISYS Search Software – Now owned by Lexmark and marginalized due to alleged revenue shortfalls
Lookeen – Lucene based desktop and Outlook search
Lucidworks – Solr add ons with floundering to be more than enterprise search
MAANA – Proprietary search optimized for Big Data
Microsoft – Offers multiple search solutions. The most notorious are Bing and Fast Search & Transfer proprietary solutions
Oracle – Full text search is a utility for Oracle licenses; owns Artificial Linguistics, Triple Hop, Endeca, RightNow, InQuira, and the marginalized Secure Enterprise Search. Oh, don’t forget command line querying via PL/SQL
Polyspot, now CustomerMatrix – Now a customer service vendor
Siderean Software – Went out of business in 2008; a semantic search outfit
Sinequa – Now a Big Data outfit with hopes of becoming the “next big thing” in whatever sells
X1 Search – An eternal start up pitching eDiscovery and desktop search with a wild and crazy interface

What’s the table tell us about “top” systems? First, the list includes vendors not directly in the search and retrieval business. There is no differentiation among the vendors repackaging and reselling open source Lucene/Solr solutions. The listing is a fruit cake of desktop, database, and unstructured search systems. In short, the word “top” does not do the trick for me. I prefer “a list of eclectic and mostly unknown systems which include a search function.”

The report presents 10 bar charts which tell me absolutely nothing about search and retrieval. The bars appear to be a popularity contest based on visits to the author’s Web site. Only two of the search systems listed in the bar chart have “reviews.” Autonomy IDOL garnered three reviews and Lookeen one review. The other eight vendors’ products were not reviewed. Autonomy and Lookeen could not be more different in purpose, design, and features.

The report then tackles the “top five” search systems in terms of clicks on the author’s Web site. Yep, clicks. That’s a heck of a yardstick because what percentage of clicks were humans and what percentage was bot driven? No answer, of course.

The most popular “solutions” illustrate the weirdness of the sample. The number one solution is DataGravity, which is a data management system with various features and utilities. The next four “top” solutions are:

  • Oracle Endeca – eCommerce and business intelligence and whatever Oracle can use the ageing system for
  • The Google Search Appliance – discontinued with a cloud solution coming down the pike, sort of
  • Lucene – open source, the engine behind Elasticsearch, which is quite remarkably not on the list of vendors
  • Microsoft Fast Search – included in SharePoint to the delight of the integrators who charge to make the dog heel once in a while.

I find it fascinating that DataGravity (1,273) garnered more than 3X the “votes” of Microsoft Fast Search (404). I think there are more than 200 million SharePoint licensees. Many of these outfits have many questions about Fast Search. I would hazard a guess that DataGravity has a tiny fraction of the SharePoint installed base and its brand identity and company name recognition are a fraction of Microsoft’s. Weird data or meaningless.

The bulk of the report is comparisons of various search engines. I could not figure out the logic of the comparisons. What, for example, do Lookeen and IBM StoredIQ have in common? Answer: Zero.

The search report strikes me as a bit of silliness. The report may be an anti sales document. But your mileage will differ. If it does, good luck to you.

Stephen E Arnold, December 30, 2016

MC+A Is Again Independent: Search, Discovery, and Engineering Services

December 7, 2016

Beyond Search learned that MC+A has added a turbo-charger to its impressive search, content processing, and content management credentials. The company, based in Chicago, earned a gold star from Google for MC+A’s support and integration services for the now-discontinued Google Search Appliance. After working with the Yippy implementation of Watson Explorer, MC+A retains its search and retrieval capabilities, but expanded its scope. Michael Cizmar, the company’s president, told Beyond Search, “Search is incredibly important, but customers require more multi-faceted solutions.” MC+A provides the engineering and technical capabilities to cope with Big Data, disparate content, cloud and mixed-environment platforms, and the type of information processing needed to generate actionable reports. [For more information about Cizmar’s views about search and retrieval, see “An Interview with Michael Cizmar.”]

Cizmar added:

We solve organizational problems rooted in the lack of insight and accessibility to data that promotes operational inefficiency. Think of a support rep who has to look through five systems to find an answer for a customer on the phone. We are changing the way these users get to answers by providing them better insights from existing data securely. At a higher level we provide strategy support for executives looking for guidance on organizational change.


Alphabet Google’s decision to withdraw the Google Search Appliance has left more than 60,000 licensees looking for an alternative. Since the début of the GSA in 2002, Google trimmed the product line and did not move the search system to the cloud. Cizmar’s view of the GSA’s 12 year journey reveals that:

The Google Search Appliance was definitely not a failure. The idea that organizations wanted an easy-to-use, reliable Google-style search system was ahead of its time. Current GSA customers need some guidance on planning and recommendations on available options. Our point of view is that it’s not the time to simply swap out one piece of metal for another even if vendors claim “OEM” equivalency. The options available for data processing and search today all provide tremendous capabilities, including cognitive solutions which provide amazing capabilities to assist users beyond the keyword search use case.

Cizmar sees an opportunity to provide GSA customers with guidance on planning and recommendations on available options. MC+A understands the options available for data processing and information access today. The company is deeply involved in solutions which tap “smart software” to deliver actionable information.

Cizmar said:

Keyword search is a commodity at this point, and we are helping our customers put search where the user is without breaking an established workflow. Answers, not laundry lists of documents to read, is paramount today. Customers want to solve specific problems; for example, reducing average call time customer support using smart software or adaptive, self service solutions. This is where MC+A’s capabilities deliver value.

MC+A is cloud savvy. The company realized that cloud and hybrid or cloud-on premises solutions were ways to reduce costs and improve system payoff. Cizmar was one of the technologists recognized by Google for innovation in cloud applications of the GSA. MC+A builds on that engineering expertise. Today, MC+A supports Google, Amazon, and other cloud infrastructures.

Cizmar revealed:

Amazon Elastic Cloud Search is probably doing as much business as Google did with the GSA but in a much different way. Many of these cloud-based offerings are generally solving the problem with the deployment complexities that go into standing up Elasticsearch, the open source version of Elastic’s information access system.

MC+A does not offer a one size fits all solution. He said:

The problem still remains of what should go into the cloud, how to get a solution deployed, and how to ensure usability of the cloud-centric system. The cloud offers tremendous capabilities in running and scaling a search cluster. However, with the API consumption model that we have to operate in, getting your data out of other systems into your search clusters remains a challenge. MC+A does not make security an afterthought. Access controls and system integrity have high priority in our solutions.

MC+A takes a business approach to what many engineering firms view as a technical problem. The company’s engineers examine the business use case. Only then does MC+A determine if the cloud is an option. If so, the engineers identify which product’s or project’s capabilities meet the general requirements. After that process, MC+A implements its carefully crafted, standard deployment process.

Cizmar noted:

If you are a customer with all of your data on premises or have a unique edge case, it may not make sense to use a cloud-based system. The search system needs to be near to the content most of the time.

MC+A offers its white-labeled search “Practice in a Box” to former Google partners and other integrators. High-profile specialist vendors like Onix in Ohio are able to resell our technology backed by the MC+A engineering team.

In 2017, MC+A will roll out a search solution which is, at this time, shrouded in secrecy. This new offering will go “beyond the GSA” and offer expanded information access functionality. To support this new product, MC+A will announce a specialized search practice.

He said:

This international practice will offer depth and breadth in selling and implementing solutions around cognitive search, assist, and analytics with products other than Google throughout the Americas. I see this as beneficial to other Google and non-Google resellers because it allows them to utilize our award winning team, our content filters, and a wealth of social proofs on a just in time basis.

For 2017, MC+A offers a range of products and services. Based on the limited information provided by the secrecy-conscious Michael Cizmar, Beyond Search believes that the company will offer implementation and support services for Lucene and Solr, IBM Watson, and Microsoft SharePoint. The SharePoint support will embrace some vendors supplying SharePoint-centric solutions like Coveo. Plus, MC+A will continue to offer software to acquire content and perform extract-transform-load functions on premises, in the cloud, or in hybrid configurations.

MC+A offers a business-technology approach to information access.

For more information about MC+A, contact sales@mcplusa.com or 312-585-6396.

Stephen E Arnold, December 7, 2016

Five Years in Enterprise Search: 2011 to 2016

October 4, 2016

Before I shifted from worker bee to Kentucky dirt farmer, I attended a presentation in which a wizard from Findwise explained enterprise search in 2011. In my notes, I jotted down the companies the maven mentioned (love that alliteration) in his remarks:

  • Attivio
  • Autonomy
  • Coveo
  • Endeca
  • Exalead
  • Fabasoft
  • Google
  • IBM
  • ISYS Search
  • Microsoft
  • Sinequa
  • Vivisimo.

There were nodding heads as the guru listed the key functions of enterprise search systems in 2011. My notes contained these items:

  • Federation model
  • Indexing and connectivity
  • Interface flexibility
  • Management and analysis
  • Mobile support
  • Platform readiness
  • Relevance model
  • Security
  • Semantics and text analytics
  • Social and collaborative features

I recall that I was confused about the source of the information in the analysis. Then the murky family tree seemed important. Five years later, I am less interested in who sired what child than the interesting historical nuggets in this simple list and collection of pretty fuzzy and downright crazy characteristics of search. I am not too sure what “analysis” and “analytics” mean. The notion that an index is required is okay, but the blending of indexing and “connectivity” seems a wonky way of referencing file filters or a network connection. With the Harvard Business Review pointing out that collaboration is a bit of a problem, it is an interesting footnote to acknowledge that a buzzword can grow into a time sink.


There are some notable omissions; for example, open source search options do not appear in the list. That’s interesting because Attivio was at that time, I heard, poking its toe into open source search. IBM was a fan of Lucene five years ago. Today the IBM marketing machine beats the Watson drum, but inside the Big Blue system resides that free and open source Lucene. I assume that the gurus and the mavens working on this list ignored open source because what consulting revenue results from free stuff? What happened to Oracle? In 2011, Oracle still believed in Secure Enterprise Search only to recant with purchases of Endeca, InQuira, and RightNow. There are other glitches in the list, but let’s move on.


Microsoft and the Open Source Trojan Horse

March 30, 2016

Quite a few outfits embrace open source. There are a number of reasons:

  1. It is cheaper than writing original code
  2. It is less expensive than writing original code
  3. It is more economical than writing original code.

The article “Microsoft is Pretending to be a FOSS Company in Order to Secure Government Contracts With Proprietary Software in ‘Open’ Clothing” reminded me that there is another reason.

No kidding.

I know that IBM has snagged Lucene and waved its once magical wand over the information access system and pronounced, “Watson.” I know that deep inside the kind, gentle heart of Palantir Technologies, there are open source bits. And there are others.

The write up asserted:

For those who missed it, Microsoft is trying to EEE GNU/Linux servers amid Microsoft layoffs; selfish interests of profit, as noted by some writers [1,2] this morning, nothing whatsoever to do with FOSS (there’s no FOSS aspect to it at all!) are driving these moves. It’s about proprietary software lock-in that won’t be available for another year anyway. It’s a good way to distract the public and suppress criticism with some corny images of red hearts.

The other interesting point I highlighted was:

reject the idea that Microsoft is somehow “open” now. The European Union, the Indian government and even the White House now warm up to FOSS, so Microsoft is pretending to be FOSS. This is protectionism by deception from Microsoft and those who play along with the PR campaign (or lobbying) are hurting genuine/legitimate FOSS.

With some government statements of work requiring “open” technologies, Microsoft may be doing what other firms have been doing for a while. See points one to three above. Microsoft is just late to the accountants’ party.

Why not replace the SharePoint search thing with an open source solution? What about the $1.2 billion MSFT paid for the fascinating Fast Search & Transfer technology in 2008? It works just really well, right?

Stephen E Arnold, March 30, 2016

Enterprise Search Revisionism: Can One Change What Happened

March 9, 2016

I read “The Search Continues: A History of Search’s Unsatisfactory Progress.” I noted some points which, in my opinion, underscore why enterprise search has been problematic and why the menagerie of experts and marketers have put search and retrieval on the path to enterprise irrelevance. The word that came to mind when I read the article was “revisionism” for the millennials among us.

The write up ignores the fact that enterprise search dates back to the early 1970s. One can argue that IBM’s Storage and Information Retrieval System (STAIRS) was the first significant enterprise search system. The point is that enterprise search as a productized service has a history of over promising and under delivering of more than 40 years.

Enterprise search with a touch of Stalinist revisionism.

Customers said they wanted to “find” information. What those individuals meant was have access to information that provided the relevant facts, documents, and data needed to deal with a problem.

Because providing on point information was and remains a very, very difficult problem, the vendors interpreted “find” to mean a list of indexed documents that contained the users’ search terms. But there was a problem. Users were not skilled in crafting queries which were essentially computer instructions between words the index actually contained.

After STAIRS came other systems, many other systems which have been documented reasonably well in Bourne and Bellardo-Hahn’s A History of Online Information Services 1963-1976. (The period prior to 1970 describes for-fee research centric online systems. STAIRS was among the most well known early enterprise information retrieval systems.) I provided some history in the first three editions of the Enterprise Search Report, published from 2003 to 2007. I have continued to document enterprise search in the Xenky profiles and in this blog.

The history makes painful reading for those who invested in many search and retrieval companies and for the executives who experienced the crushing of their dreams and sometimes careers under the buzz saw of reality.

In a nutshell, enterprise search vendors heard what prospects, workers overwhelmed with digital and print information, and unhappy users of those early systems were saying.

The disconnect was that enterprise search vendors parroted back marketing pitches that assured enterprise procurement teams of these functions:

  • Easy to use
  • “All” information instantly available
  • Answers to business questions
  • Faster decision making
  • Access to the organization’s knowledge.

The result was a steady stream of enterprise search product launches. Some of these were funded by US government money like Verity. Sure, the company struggled with the cost of infrastructure the Verity system required. The work arounds were okay as long as the infrastructure could keep pace with the new and changed word-centric documents. Tossing in other types of digital information, making the system perform ever faster indexing, and keeping the Verity system responding quickly was another kettle of fish.

Research oriented information retrieval experts looked at the Verity type system and concluded, “We can do more. We can use better algorithms. We can use smart software to eliminate some of the costs and indexing delays. We can [ fill in the blank ].”

The cycle of describing what an enterprise search system could actually deliver was disconnected from the promises the vendors made. As one moves through the decades from 1973 to the present, the failures of search vendors made it clear that:

  1. Companies and government agencies would buy a system, discover it did not do the job users needed, and buy another system.
  2. New search vendors picked up the methods taught at Cornell, Stanford, and other search-centric research centers and wrapped on additional functions like semantics. The core of most modern enterprise search systems is unchanged from what STAIRS implemented.
  3. Search vendors like Convera came, failed, and went away. Some hit revenue ceilings and sold to larger companies looking for a search utility. The acquisitions hit a high water mark with the sale of Autonomy (a 1990s system) to HP for $11 billion.

What about Oracle, as a representative outfit? Oracle database has included search as a core system function since the day Larry Ellison envisioned becoming a big dog in enterprise software. The search language was Oracle’s version of the structured query language. But people found that difficult to use. Oracle purchased Artificial Linguistics in order to make finding information more intuitive. Oracle continued to try to crack the find information problem through the acquisitions of Triple Hop, its in-house Secure Enterprise Search, and some other odds and ends until it bought in rapid succession InQuira (a company formed from the failure of two search vendors), RightNow (technology from a Dutch outfit RightNow acquired), and Endeca. Where is search at Oracle today? Essentially search is a utility and it is available in Oracle applications: customer support, ecommerce, and business intelligence. In short, search has shifted from the “solution” to a component used to get started with an application that allows the user to find the answer to business questions.

I mention the Oracle story because it illustrates the consistent pattern of companies which are actually trying to deliver information that the user of a search system needs to answer a business or technical question.

I don’t want to highlight the inaccuracies of “The Search Continues.” Instead I want to point out the problem buzzwords create when trying to understand why search has consistently been a problem and why today’s most promising solutions may relegate search to a permanent role of necessary evil.

The write up concludes with the notions of answering questions, analytics, federation (that is, running a single query across multiple collections of content and file types), the cloud, and system performance.

Wrong.

The use of open source search systems means that good enough is the foundation of many modern systems. Palantir-type outfits, essentially enterprise search vendors describing themselves as “intelligence” providing systems, use open source technology in order to reduce costs and shift bug chasing to a community. The good enough core is wrapped with subsystems that deal with the pesky problems of video, audio, data streams from sensors or similar sources. Attivio, formed by professionals who worked at the infamous Fast Search & Transfer company, delivers active intelligence but uses open source to handle the STAIRS-type functions. These companies have figured out that open source search is a good foundation. Available resources can be invested in visualizations, generating reports instead of results lists, and graphical interfaces which involve the user in performing tasks smart software at this time cannot perform.

For a low cost enterprise search system, one can download Lucene, Solr, SphinxSearch, or any one of a number of open source systems. There are low cost (keep in mind that costs of search can be tricky to nail down) appliances from vendors like Maxxcat and Thunderstone. One can make do with the craziness of the search included with Microsoft SharePoint.

For a serious application, enterprises have many choices. Some of these are highly specialized like BAE NetReveal and Palantir Metropolitan. Others are more generic like the Elastic offering. Some are free like the Effective File Search system.

The point is that enterprise search is not what users wanted in the 1970s when IBM pitched the mainframe centric STAIRS system, in the 1980s when Verity pitched its system, in the 1990s when Excalibur (later Convera) sold its system, in the 2000s when Fast Search shifted from Web search to enterprise search and put the company on the road to improper financial behavior, and in the efflorescence of search sell offs (Dassault bought Exalead, IBM bought iPhrase and other search vendors, and Lexmark bought Brainware and ISYS Search Software).

Where are we today?

Users still want on point information. The solutions on offer today are application and use case centric, not the silly one-size-fits-all approach of the period from 2001 to 2011 when Autonomy sold to HP.

Open source search has helped create an opportunity for vendors to deliver information access in interesting ways. There are cloud solutions. There are open source solutions. There are small company solutions. There are more ways to find information than at any other time in the history of search as I know it.

Unfortunately, the same problems remain. These are:

  1. As the volume of digital information goes up, so does the cost of indexing and accessing the sources in the corpus
  2. Multimedia remains a significant challenge for which there is no particularly good solution
  3. Federation of content requires considerable investment in data grooming and normalizing
  4. Multi-lingual corpuses require humans to deal with certain synonyms and entity names
  5. Graphical interfaces still are stupid and need more intelligence behind the icons and links
  6. Visualizations have to be “accurate” because a bad decision can have significant real world consequences
  7. Intelligent systems are creeping forward but crazy Watson-like marketing raises expectations and exacerbates the credibility problems of enterprise search.

I am okay with history. I am not okay with analyses that ignore some very real and painful lessons. I sure would like some of the experts today to know a bit more about the facts behind the implosions of Convera, Delphis, Entopia, and many other companies.

I also would like investors in search start ups to know a bit more about the risks associated with search and content processing.

In short, for a history of search, one needs more than 900 words mixing up what happened with what is.

Stephen E Arnold, March 9, 2016

Oracle Suggests a PeopleSoft Upgrade

September 2, 2015

PeopleSoft is a popular human resources management software and like all software it occasionally needs to be upgraded. TriCore Solutions suggests that instead of using Verity, your next upgrade to PeopleSoft should be the Oracle Secure Enterprise Search (SES). TriCore Solutions brags about helping clients upgrade to SES in the article, “Oracle Secure Enterprise Search (SES) And PeopleSoft 9.2.”

Oracle SES offers a secure, high-quality search across all enterprise platforms as well as analytics, intuitive search interface, secure crawling, indexing, and searching. When SES is deployed into an enterprise system it also offers several key capabilities:

  • “Connectivity to Legacy Repositories. SES allows companies to access their most valuable assets – information about its specific business, its processes, products, customers, and documents that previously resided in proprietary repositories. Connectors include interfaces for EMC Documentum, Microsoft SharePoint, IBM Lotus Notes, Oracle’s E-Business Suite and Oracle Siebel among others.
  • Security: The ability to search password protected sources securely. Oracle’s search technology provides single-sign-on (SSO) based security where available, and can also employ application-specific security where SSO is not available.
  • High quality search results: Brings for the Intranet a high level of relevance that users associate with Internet searches.
  • Going beyond keywords. As the volume of information grows, users need advanced search techniques like the ability to categorize and cluster search results for iterative navigation.”

It is evident that Oracle SES offers a comprehensive search feature to PeopleSoft and may be a better product, but what does Verity have to offer?

 

Whitney Grace, September 2, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Attensity Adds Semantic Markup

April 3, 2015

You have been waiting for more markup. I know I have, and that is why I read “Attensity Semantic Annotation: NLP-Analyse für Unternehmensapplikationen.”

So your wait and mine—over.

Attensity, a leader in figuring out what human discourse means, has rolled out a software development kit so you can do a better job with customer engagement and business intelligence. Attensity offers Dynamic Data Discovery. Unlike traditional analysis tools, Attensity does not focus on keywords. You know, what humans actually use to communicate.

Attensity uses natural language processing in order to identify concepts and issues in plain language. I must admit that I have heard this claim from a number of vendors, including long forgotten systems like DR LINK, among others.

The idea is that the SDK makes it easier to filter data to evaluate textual information and identify issues. Furthermore, the SDK performs fast content fusion. The result is, as other vendors have asserted, insight. There was a vendor called Inxight which asserted quite similar functions in 1997. At one time, Attensity had a senior manager from Inxight, but I assume the attribution of functions is one of Attensity’s innovations. (Forgive me for mentioning vendors which some 20 somethings know quite well.)

If you are dependent upon Java, Attensity is an easy fit. I assume that if you are one of the 150 million plus Microsoft SharePoint outfits, Attensity integration may require a small amount of integration work.

According to Attensity, the benefit of its markup approach is that the installation is on site and therefore secure. I am not sure about this because security is person dependent, so cloud or on site, security remains an “issue” different from the ones Attensity’s system identifies.

Attensity, like Oracle, provides a knowledge base for specific industries. Oracle made term lists available for a number of years. Maybe since its acquisition of Artificial Linguistics in the mid 1990s?

Attensity supports five languages. For these five languages, Attensity can determine the “tone” of the words used in a document. Presumably a company like Bitext can provide additional language support if Attensity does not have these ready to install.

Vendors continue to recycle jargon and buzzwords to describe certain core functions available from many different vendors. If your metatagging outfit is not delivering, you may want to check out Attensity’s solution.

Stephen E Arnold, April 3, 2015

Enterprise Search: Security Remains a Challenge

February 11, 2015

Download an open source enterprise search system or license a proprietary system. Once the system has been installed, the content crawled, the index built, the interfaces set up, and the system optimized, the job is complete, right?

Not quite. Retrofitting a keyword search system to meet today’s security requirements is a complex, time consuming, and expensive task. That’s why “experts” who write about search facets, search as a Big Data system, and search as a business intelligence solution ignore security or reassure their customers that it is no big deal. Security is a big deal, and it is becoming a bigger deal with each passing day.

There are a number of security issues to address. The easiest of these is figuring out how to piggyback on access controls provided by a system like Microsoft SharePoint. Other organizations use different enterprise software. As I said, using access controls already in place and diligently monitored by a skilled security administrator is the easy part.
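For readers who want the “easy part” made concrete, here is a minimal sketch of query-time security trimming. It assumes the crawler copied each document’s access control list from the source repository into the index. The `Document` class, the `search` function, and the group names are hypothetical illustrations, not the API of SharePoint or of any search product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: str
    text: str
    # Assumption: the ACL was copied from the source repository at crawl time.
    allowed_groups: frozenset


def search(index, query, user_groups):
    """Return ids of documents that match every query term AND that the user may see."""
    terms = query.lower().split()
    user_groups = frozenset(user_groups)
    hits = []
    for doc in index:
        body = doc.text.lower()
        if not all(term in body for term in terms):
            continue  # not a keyword match
        if doc.allowed_groups & user_groups:
            hits.append(doc.doc_id)  # user shares at least one group with the ACL
    return hits


# Hypothetical three-document index for illustration only.
INDEX = [
    Document("hr-001", "Employee health plan summary", frozenset({"hr"})),
    Document("eng-014", "Search engine health check procedure",
             frozenset({"engineering", "it"})),
    Document("pub-100", "Company health and safety newsletter",
             frozenset({"everyone"})),
]

visible = search(INDEX, "health", {"engineering", "everyone"})
```

In this sketch a user in the engineering and everyone groups never sees the HR document, even though it matches the query. Note that ACLs copied at crawl time go stale when permissions change in the source system, which is one reason the skilled security administrator still matters.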

A number of sticky wickets remain; for example:

  • Some units of the organization may do work for law enforcement or intelligence entities. There may be different requirements. Some are explicit and promulgated by government agencies. Others may be implicit, acknowledged as standard operating procedure by those with the appropriate clearance and the need to know.
  • Specific administrative content must be sequestered. Examples range from information assembled for employee health to compliance requirements for pharma products or controlled substances.
  • Legal units may require that content be contained in a managed system, with administrative controls put in place to ensure that no changes are introduced into a content set, that access is provided only to those with specific credentials, or that the content is kept “off the radar” as the in house legal team tries to figure out how to respond to a discovery activity.
  • Some research units may be “black”; that is, no one in the company, including most information technology and security professionals, is supposed to know where an activity is taking place or what the information of interest to the research team is. Specialized security steps must be enforced. These can include dongles, air gaps, and unknown locations and staff.

An enterprise search system without NGIA security functions is like a 1960s Chevrolet project car. Buy it ready to rebuild for $4,500 and invest $100,000 or more to make it conform to 2015’s standards.  Source: http://car.mitula.us/impala-project

How do enterprise search systems deal with these access issues? Are not most modern systems positioned to index “all” content? Are the procedures for each of these four examples part of the enterprise search systems’ administrative tool kit?

Based on the research I conducted for CyberOSINT: Next Generation Information Access and my other studies of enterprise search, the answer is, “No.”

Enterprise Search: Confusing Going to Weeds with Being Weeds

November 30, 2014

I seem to run into references to the write up by an “expert”. I know the person is an expert because the author says:

As an Enterprise Search expert, I get a lot of questions about Search and Information Architecture (IA).

The source of this remarkable personal characterization is “Prevent Enterprise Search from going to the Weeds.” Spoiler alert: I am on record as documenting that enterprise search is at a dead end, unpainted, unloved, and stuck on the margins of big time enterprise information applications. For details, read the free vendor profiles at www.xenky.com/vendor-profiles or, if you can find them, read one of my books such as The New Landscape of Search.

Okay. Let’s assume the person writing the Weeds’ article is an “expert”. The write up is about misconcepts [sic]; specifically, crazy ideas about what a 50 year plus old technology can do. The solution to misconceptions is “information architecture.” Now I am not sure what “search” means. But I have no solid hooks on which to hang the notion of “information architecture” in this era of cloud based services. Well, the explanation of information architecture is presented via a metaphor:

The key is to understand: IA and search are business processes, rather than one-time IT projects. They’re like gardening: It’s up to you if you want a nice and tidy garden — or an overgrown jungle.

Gentle reader, the fact that enterprise search has been confused with search engine optimization is one thing. The fact that there are a number of companies happily leapfrogging the purveyors of utilities to make SharePoint better or improve automatic indexing is another.

Let’s look at each of the “misconceptions” and ask, “Is search going to the weeds or is search itself weeds?”

The starting line for the write up is that no one needs to worry about information architecture because search “will do everything for us.” How are thoughts about plumbing and a utility function equivalent? The issue is not whether a system runs on premises, from the cloud, or in some hybrid set up. The question is, “What has to be provided to allow a person to do his or her job?” In most cases, delivering something that addresses the employee’s need is overlooked. The reason is that the problem is one that requires the attention of individuals who know budgets, know goals, and know technology options. The confluence of these three characteristics is quite rare in my experience. Many of the “experts” working in enterprise search are either frustrated and somewhat insecure academics or individuals who bounced into a niche where the barriers to entry are a millimeter or two high.

Next there is a perception, asserts the “expert”, that search and information architecture are one time jobs. If one wants to win the confidence of a potential customer, explaining that the bills will just keep on coming is a tactic I have not used. I suppose it works, but the incredible turnover in organizations makes it easy for an unscrupulous person to just keep on billing. The high levels of dissatisfaction result from a number of problems. Pumping money into a failure is what prompted one French engineering company to buy a search system and sideline the incumbent. Endless meetings about how to set up enterprise systems are ones to which search “experts” are not invited. The information technology professionals have learned that search is not exactly a career building discipline. Furthermore, search “experts” are left out of meetings because information technology professionals have learned that a search system will consume every available resource and produce a steady flow of calls to the help desk. Figuring out what to build still occupies Google and Amazon. Few organizations are able to do much more than embrace the status quo and wait until a mid tier consultant, a cost consultant, or a competitor provides the stimulus to move. Search “experts” are, in my experience, on the outside of serious engineering work at many information access challenged organizations. That’s a good thing in my view.

The middle example is what the expert calls “one size fits all.” Yep, that was the pitch of some of the early search vendors. These folks packaged keyword search and promised that it would slice, dice, and chop. The reality is that even the next generation information access companies with which I work focus on making customization as painless as possible. In fact, these outfits provide some ready-to-roll components, but where the rubber meets the road is providing information tailored to each team or individual user. At Target last night, my wife and I bought Christmas gifts for needy people. One of the gifts was a 3X sweater. We had a heck of a time figuring out if the store offered such a product. Customization is necessary for more and more everyday situations. In organizations, customization is the name of the game. The companies pitching enterprise search today lag behind next generation information access providers in this very important functionality. The reason is that the companies lack the resources and insight needed to deliver. But what about information architecture? How does one cloud based search service differ from another? Can you explain the technical and cost and performance differences between SearchBlox and Datastax?

The penultimate point is just plain humorous: Search is easy. I agree that search is a difficult task. The point is that no one cares how hard it is. What users want are systems that facilitate their decision making or work. In this blog I reproduced a diagram showing one firm’s vision for indexing. Suffice it to say that few organizations know why that complexity is important. The vendor has to deliver a solution that fits the technical profile, the budget, and the needs of an organization. Here is the diagram. Draw your own conclusion:

[Diagram: InfoLibrarian metadata and data governance building blocks]

The final point is poignant. Search, the “expert” says, can be a security leak. No, people are the security leak. There are systems that process open source intelligence and take predictive, automatic action to secure networks. If an individual wants to leak information, even today’s most robust predictive systems struggle to prevent that action. The most advanced systems from Centripetal Networks and Zerofox offer robust systems, but a determined individual can allow information to escape. What is wrong with search has to do with the way in which provided security components are implemented. Again we are back to people. Information architecture can play a role, but it is unlikely that an organization will treat search differently from legal information or employee pay data. There are classes of information to which individuals have access. The notion that a search system provides access to “all information” is laughable.

I want to step back from this “expert’s” analysis. Search has a long history. If we go back and look at what Fulcrum Technologies or Verity set out to do, the journeys of the two companies are quite instructive. Both moved quickly to wrap keyword search with a wide range of other functions. The reason for this was that customers needed more than search. Fulcrum is now part of OpenText, and you can buy nubbins of Fulcrum’s 30 year old technology today, but it is wrapped in huge wads of wool that comprise OpenText’s products and services. Verity offered some nifty security features and what happened? The company chewed through CEOs, became hugely bloated, struggled for revenues, and ended up as part of Autonomy. And what about Autonomy? HP is trying to answer that question.

Net net: This weeds write up seems to have a life of its own. For me, search is just weeds, clogging the garden of 21st century information access. The challenges are beyond search. Experts who conflate odd bits of jargon are the folks who contribute to confusion about why Lucene is just good enough so those in an organization concerned with results can focus on next generation information access providers.

Stephen E Arnold, November 30, 2014
