Enterprise Search: You Cannot Do It Yourself, People.

July 31, 2015

I love write ups like “Don’t Settle When It Comes to Enterprise Search Platforms.” These articles are designed to arm consulting firms with the marketing flim flam which positions each as an “expert” in enterprise information access. I would not be surprised to find copies of this article in the peddler kit of search sales professionals.

The main point of the write up is that enterprise search is a “platform.” Because there are options, no self respecting company will try to implement search without the equivalent of the F Troop in mid tier or below consultants.

I noted:

Let’s look at two very common workarounds some have tried, and then we will talk about why you must go with a reputable developer when you make your final decision.

When I read this, I wondered if the “expert” were familiar with the Maxxcat line of enterprise search systems or the Blossom hosted solution.

The write up dismisses an open source solution, apparently unaware of the research by Diomidis Spinellis and Vaggelis Giannikas published in the Journal of Systems and Software, March 2012, pages 666 to 682. That’s okay. My hunch is that those finding the “Don’t Settle” article compelling are not likely to be interested in researchy type stuff.

One of the more interesting segments in the write up is the assertion that scalability is a “given.” Hmmm. In my experience, there are some ongoing enterprise search challenges: Scalability is one facet of a nest of vipers which includes my favorite reptile, indexing latency.

The article states:

Open source platforms are only as scalable as their code allows, so if the person who first made it didn’t have your company’s needs in mind, you’ll be in trouble. Even if they did, you could run into a problem where you find out that scaling up actually reveals some issues you hadn’t encountered before. This is the exact kind of event you want to avoid at all costs.

I don’t want to rain on this parade of “information,” but every enterprise search system which I have had the pleasure of procuring, managing, investigating, and analyzing has scalability problems.

The reason is simple: The volume of changed information and the flow of new information go up. Whatever one starts with is rather rapidly choked. The solutions are painful: Spend more or index less.
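The arithmetic behind “spend more or index less” can be sketched in a few lines. This is a minimal illustration, not a model of any particular search system. The function name and every number fed to it are assumptions made up for the example: a daily flow of new or changed documents that grows at a steady rate, and an indexer with a fixed daily capacity.

```python
def backlog_after(days, daily_changes, growth_rate, index_capacity):
    """Return the count of documents still waiting to be indexed after `days` days.

    daily_changes  -- new or changed documents arriving on the first day (assumed)
    growth_rate    -- fractional day-over-day growth of that flow (assumed)
    index_capacity -- documents the indexer can process per day (assumed fixed)
    """
    backlog = 0.0
    flow = float(daily_changes)
    for _ in range(days):
        # Whatever the indexer cannot absorb today carries over to tomorrow.
        backlog = max(0.0, backlog + flow - index_capacity)
        # The incoming flow keeps growing; the capacity does not.
        flow *= 1.0 + growth_rate
    return backlog


if __name__ == "__main__":
    # Hypothetical numbers: the system keeps up on day one, then falls behind.
    for horizon in (30, 90, 180):
        print(horizon, round(backlog_after(horizon, 100_000, 0.01, 120_000)))
```

Under these assumptions, once the flow passes the indexer's capacity the backlog only grows, and a growing backlog is indexing latency. The two levers are the ones named above: raise `index_capacity` (spend more) or cut `daily_changes` (index less).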

I am not confident that one who follows the advice of certain experts will find his or her enterprise search journey pleasant. On the other hand, there are opportunities as Uber drivers one can pursue.

Stephen E Arnold, July 31, 2015

IBM and Its Federated Search Camelot

July 25, 2015

Short honk: I scanned my Twitter feed this morning. What did I see? An impossible assertion from the marketing crazed folks at IBM Watson. Let me tell you, IBM Watson and its minions output a hefty flow of tweets. A year or so ago, IBM relied on mid tier consulting firms’ experts like Dave Schubmehl (yep, the fellow who sold my research on Amazon without my permission). Now there are other voices.


But the message, not just the medium, is important. IBM’s assertion is that there will be no more “data silos in enterprise search.” You can learn about IBM’s “reality” in a webcast.

Now, I am not planning on sitting through a webcast. I would, however, like to enumerate several learnings from my decades of enterprise information access work. You can use this list as a jump start for your questions to the IBM wizards. Here goes:

  1. In an enterprise, what happens when an indexing system makes available in a federated search system information related to a legal matter which is not supposed to be available to anyone except the attorneys involved in the matter?
  2. In an enterprise, what happens if information pertinent to a classified government project is made available in a federated search system which has not been audited for access control compliance?
  3. What happens when personnel information containing data about a medical issue is indexed and made available in an enterprise search system when email attachments are automatically indexed?
  4. How does the federated system deal with content in servers located in a research facility engaged in new product research?
  5. What happens when sales and pricing data shared among key account executives is indexed and made available to a contractor advising the company?
  6. What is the process for removing pointers to data which are not supposed to be in the enterprise search system?
  7. What security measures are in place to ensure that a lost or stolen mobile device does not have access to an enterprise search system?
  8. How much manual work is required before an organization turns on the Watson indexing system?

These will get you started on the cross silo issues.
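Several of the questions above come down to whether a federated system checks document-level permissions before it shows a hit. Here is a minimal sketch of that check, often called security trimming. The `Hit` class, the group names, and the notion of an “audited” source are hypothetical illustrations; none of this describes IBM's or any other vendor's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    doc_id: str
    source: str                 # the silo the document came from
    allowed_groups: frozenset   # groups permitted to see the document


def trim(hits, user_groups, audited_sources):
    """Return only the hits this user may see.

    A hit survives when its source has been audited for access control
    AND the user belongs to at least one group allowed on the document.
    A hit with no allowed groups is hidden (deny by default).
    """
    user_groups = frozenset(user_groups)
    return [
        hit for hit in hits
        if hit.source in audited_sources and hit.allowed_groups & user_groups
    ]


if __name__ == "__main__":
    hits = [
        Hit("memo-1", "legal", frozenset({"attorneys"})),
        Hit("price-7", "sales", frozenset({"key-accounts"})),
        Hit("lab-3", "research", frozenset({"staff"})),
    ]
    # A contractor in no privileged group should see nothing.
    print(trim(hits, {"contractors"}, {"legal", "sales"}))
```

The sketch denies by default: an unaudited silo or a document with no access list yields no hit. The hard part in a real deployment is not this filter but keeping `allowed_groups` accurate for every source system, which is where the manual work in question 8 comes in.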

Oh, the answer to these questions is that the person identified as responsible for making the data available may get to find a future elsewhere. Amazon warehouses are hiring in southern Indiana.

Alternatively one can saddle up a white stallion, snag a lance, and head for the nearest windmill.

Stephen E Arnold, July 25, 2015

SharePoint 2013 Enterprise Search Configuration

July 25, 2015

In just 14 easy steps, you too can configure “SharePoint 2013 for a SharePoint 2013” site. Now this is not enterprise search, but when it comes to Microsoft and information access, trivialities just don’t matter.

The screenshots show what options to select. There is no explanation in Step 4 for what to do if you click “Basic Search Center” instead of “Enterprise Search Center.” A real MSFT lover will know the difference between “basic” and “enterprise” for a SharePoint site.

Follow the clicks to Step 9. Note that under the category search one selects “Search Settings”, not “Search and offline availability.” Again the clarity is astounding.

Cut and paste your way to Step 13 where you configure search navigation. Just click “everything” and presumably the URL, the description, and the link will be locked and loaded. And if not? Well, there will be no errors, gentle reader.

The coup de grace is Step 14. Here’s the instruction which is crystal clear:

Just go and check “Use the same results page settings as my parent” is selected from the subsite search site settings.

You are good to go—directly to a consulting firm specializing in installing a third party search system into your SharePoint solution. Sorry, but that approach usually works. The Fast Search thing from the mid 1990s? Not exactly flawless in my experience. Configuration files are still nestled deep in the innards but the graphical interface may not get you where you need to be.

Stephen E Arnold, July 25, 2015

The Bestest Enterprise Search Diagram Ever. Really.

July 19, 2015

I like the word “bestest.” It is so right for diagrams that summarize the complex nature of search. The write up “The Best Enterprise Search Diagram You’ve Ever Seen” is a fetching smile to attract me, the person who has never seen a better diagram ever, to order a special report. What does the diagram look like? Here’s a not too legible version, but it is close enough for horseshoes:

[Image: Enterprise search Ishikawa diagram]

The pink boxes are contributing activities. The red boxes in the middle of the diagram are direct search management tasks, and the red box in the gray rectangle is the total experience of search. Have at it. Let me know how you fare with your strategy tasks. Also, fill me in on how “search strategy” meshes with the location of the search box. Just askin? I am eager to hear how the search log insight is going to work. I like insight.

For those following this diagram, may I offer a suggestion: Look for a lateral arabesque within your organization or get a Subway sandwich franchise.

Stephen E Arnold, July 19, 2015

Short Honk: Elasticsearch Information

July 10, 2015

Short honk: For information about Elastic’s Elasticsearch, the open source search system which has proprietary vendors of search systems cowering in fear, navigate to Elasticsearch: The Definitive Guide. Elasticsearch is not perfect, but what software is? Ask United Airlines, the New York Stock Exchange, and the Wall Street Journal about their systems. The book includes useful information about geolocation functions plus some meaty stuff about administering the system once you are up and running. Worth a look.

Stephen E Arnold, July 10, 2015

Enterprise Search and the Mythical Five Year Replacement Cycle

July 9, 2015

I have been around enterprise search for a number of years. In the research we did in 2002 and 2003 for the Enterprise Search Report, my subsequent analyses of enterprise search both proprietary and open source, and the ad hoc work we have done related to enterprise search, we obviously missed something.

Ah, the addled goose and my hapless goslings. The degrees, the experience, the books, and the knowledge had a giant lacuna, a goose egg, a zero, a void. You get the idea.

We did not know that an enterprise licensing an open source or proprietary enterprise search system replaced that system every 60 months. We did document the following enterprise search behaviors:

  • Users express dissatisfaction about any installed enterprise search system. Regardless of vendor, anywhere from 50 to 75 percent of users find the system a source of dissatisfaction. That suggests that enterprise search is not pulling the hay wagon for quite a few users.
  • Organizations, particularly the Fortune 500 firms we polled in 2003, had more than five enterprise search systems installed and in use. The reason for the grandfathering is that each system had its ardent supporters. Companies just grandfathered the system and looked for another system in the hopes of finding one that improved information access. No one replaced anything was our conclusion.
  • Enterprise search systems did not change much from year to year. In fact, the fancy buzzwords used today to describe open source and proprietary systems have been in use since the early 1980s. Dig out some of Fulcrum’s marketing collateral or the explanation of ISYS Search Software from 1986 and look for words like clustering, automatic indexing, semantics, etc. A short cut is to read some of the free profiles of enterprise search vendors on my Web site.

I learned about a white paper, which is 21st century jargon for a marketing essay, titled “Best Practices for Enterprise Search: Breaking the Five-Year Replacement Cycle.” The write up comes from a company called Knowledgent. The company describes itself this way on its Who We Are Web page:

Knowledgent [is] a precision-focused data and analytics firm with consistent, field-proven results across industries.

The essay begins with a reference to Lexis, which along with Don Wilson (may he rest in peace) and a couple of colleagues founded. The problem with the reference is that the Lexis search engine was not an enterprise search and retrieval system. The Lexis OBAR system (Ohio State Bar Association) was tailored to the needs of legal researchers, not general employees. Note that Lexis’ marketing in 1973 suggested that anyone could use the command line interface. The OBAR system required content in quite specific formats for the OBAR system to index it. The mainframe roots of OBAR influenced the subsequent iterations of the LexisNexis text retrieval system: Think mainframes, folks. The point is that OBAR was not a system that was replaced in five years. The dog was in the kennel for many years. (For more about the history of Lexis search, see Bourne and Hahn, A History of Online Information Services, 1963-1976.) By 2010, LexisNexis had migrated to XML and moved from mainframes to lower cost architectures. But the OBAR system’s methods can still be seen in today’s system. Five years. What are the supporting data?

The white paper leaps from the five year “assertion” to an explanation of the “cycle.” In my experience, what organizations do is react to an information access problem and then begin a procurement cycle. Increasingly, as the research for our CyberOSINT study shows, savvy organizations are looking for systems that deliver more than keyword and taxonomy-centric access. Words just won’t work for many organizations today. More content is available in videos, images, and real time almost ephemeral “documents” which can be difficult to capture, parse, and make findable. Organizations need systems which provide usable information, not more work for already overextended employees.

The white paper addresses the subject of the value of search. In our research, search is a commodity. The high value information access systems go “beyond search.” One can get okay search in an open source solution or whatever is baked in to a must have enterprise application. Search vendors have a problem because after decades of selling search as a high value system, the licensees know that search is a cost sinkhole and not what is needed to deal with real world information challenges.

What “wisdom” does the white paper impart about the “value” of search? Here’s a representative passage:

There are also important qualitative measures you can use to determine the value and ROI of search in your organization. Surveys can quickly help identify fundamental gaps in content or capability. (Be sure to collect enterprise demographics, too. It is important to understand the needs of specific teams.) An even better approach is to ask users to rate the results produced by the search engine. Simply capturing a basic “thumbs up” or “thumbs down” rating can quickly identify weak spots. Ultimately, some combination of qualitative and quantitative methods will yield an estimate of  search, and the value it has to the company.

I have zero clue how this set of comments can be used to justify the direct and indirect costs of implementing a keyword enterprise search system. The advice is essentially irrelevant to the acquisition of a more advanced system from a leading edge next generation information access vendor like BAE Systems (NetReveal), IBM (not the Watson stuff, however), or Palantir. The fact underscored by our research over the last decade is tough to dispute: Connecting an enterprise search system to demonstrable value is a darned difficult thing to accomplish.

It is far easier to focus on a niche like legal search and eDiscovery or the retrieval of scientific and research data for the firm’s engineering units than to boil the ocean. The idea of “boil the ocean” is that a vendor presents a text centric system (essentially a one trick pony) as an animal with the best of stallions, dogs, tigers, and grubs. The spam about enterprise search value is less satisfying than the steak of showing that an eDiscovery system helped the legal eagles win a case. That, gentle reader, is value. No court judgment. No fine. No PR hit. A grumpy marketer who cannot find a Web article is not value no matter how one spins the story.

Read more

Attivio Reaches Top 100 Status

June 29, 2015

The Data Dexterity Company announced the brand new Database Trends and Applications (DBTA) 100 and according to Yahoo Finance, Attivio is now on the list: “Attivio Named By Database Trends Applications To Its Prestigious Top 100 List.”

“We are pleased to be recognized by Database Trends and Applications as one of the most important firms in the data space; it further validates the type of feedback that our customers provide on a daily basis,” said Stephen Baker, CEO of Attivio. “As firms continue to be more reliant on maximizing their data to drive business-critical insights, we expect to play a critical role in driving this type of business innovation.”

Attivio joins the ranks of other companies that have made huge innovations in the data industry; they include EMC, Amazon, IBM, and more. Attivio is an industry leader in enterprise systems with its intelligence search platform. Attivio’s search platform enables users to make immediate insights with data visibility. Attivio has a well-known client list that encompasses such names as National Instruments, Nexen, GE, UBS, and Qualcomm. The company believes that there are many innovations to be made from all types of data, not just the type that is easily found in a database. Attivio uses its search platform to uncover insights in unstructured data that would otherwise be missed by other enterprise search platforms.

We have been following Attivio for many years, and having its name added to the DBTA 100 proves it can perform well and deliver useful results. Enterprise search continues to be an important factor for enterprise systems, though people often forget that today. Attivio’s addition to the DBTA 100 stresses that not everyone has forgotten.

Whitney Grace, June 29, 2015

Sponsored by, publisher of the CyberOSINT monograph

Major SharePoint Features Disclosed

June 23, 2015

SharePoint Server 2016 has caused quite a stir, with users wondering what features will come through in the final version. At Microsoft Ignite last month, rumors turned to legitimate features. Read more about separating fact from fiction in the newest SharePoint release in the CIO article, “Top 4 Revelations about SharePoint.”

The article begins:

“Some of the biggest news to come out of Microsoft Ignite last month was the introduction and the first public demonstration of SharePoint Server 2016 – a demo that quelled a lot of speculation and uneasiness in the SharePoint administrator community. Here are the biggest takeaways from the conference, with an emphasis on the on-premises product.”

The article goes on to say that users can look forward to a full on-premises version, bolstered administrative features, four roles to divide the workload, and an emphasis on hybrid functions. Users who need to stay in the loop with SharePoint updates and changes can turn to Stephen E. Arnold, a longtime leader in search, whose Web site offers a unique SharePoint feed that keeps all the latest tips, tricks, and news in one convenient location.

Emily Rae Aldridge, June 23, 2015

Cloud Search: Are Data Secure?

June 19, 2015

I have seen a flurry of news announcements about Coveo’s cloud based enterprise search. You can review a representative example by reading “Coveo Lassos the Cloud for Enterprise Search.” Coveo is also aware of the questions about security. See “How Does Coveo Secure Your Data and Services.”

With Coveo’s me-too cloud service, I thought about other vendors which offer cloud-based solutions. The most robust based on our tests is Blossom Search. The company was founded by Dr. Alan Feuer, a former Bell Labs wizard. When my team was active in government work, we used the Blossom system to index a Federal law enforcement agency’s content shortly after Blossom opened for business in 1999. As government procurements unfold, Blossom was nosed out by an established government contractor, but the experience made clear:

  1. Blossom’s indexing method delivered near real time updates
  2. Creating and building an initial index was four times faster than the reference systems against which we tested Dr. Feuer’s solution. (The two reference systems were Fast Search & Transfer and Verity.)
  3. The Blossom security method conformed to the US government guidelines in effect at the time we did the work.

I read “Billions of Records at Risk from Mobile App Data Flow.” With search shifting from the desktop to other types of computing devices, I formulated several questions:

  1. Are vendors deploying search on clouds similar to Amazon’s system and method ensuring the security of their customers’ data? Open source vendors like resellers of Elastic and proprietary vendors like MarkLogic are likely to be giving some additional thought to the security of their customers’ data.
  2. Are licensees of cloud based search systems performing security reviews as we did when we implemented the Blossom search system? I am not sure if the responsibility for this security review rests with the vendor, the licensee, or a third party contracted to perform the work.
  3. How secure are hybrid systems; that is, an enterprise search or content processing system which pulls, processes, and stores customer data across disparate systems? Google, based on my experience, does a good job of handling search security for the Google Search Appliance and for Site Search. Other vendors may be taking similar steps, but the information is not presented with basic marketing information.

My view is that certain types of enterprise search may benefit from a cloud based solution. There will be other situations in which the licensee has a contractual or regulatory obligation to maintain indexes and content in systems which minimize the likelihood that alarmist headlines like “Billions of Records at Risk from Mobile App Data Flow” will apply to the licensee.

Security is a search industry topic which is moving up to number one with a “bullet.”

Stephen E Arnold, June 19, 2015

Enterprise Search: The Last Half of 2015

June 16, 2015

I saw a link this morning to an 11 month old report from an azure chip consulting firm. You know, azure chip. Not a Bain, BCG, Booz Allen, or McKinsey which are blue chip firms. A mid tier outfit. Business at the Boozer is booming is the word from O’Hare Airport, but who knows if airport gossip is valid.


Which enterprise search vendor will come up a winner in December 2015?

What is possibly semi valid are analyses of enterprise search vendors. The “Magic Quadrant for Enterprise Search” triggered some fond memories of the good old days in 2003 when the leaders in enterprise search were brands or almost brands. You probably recall the thrilling days of these information retrieval leaders:

  • Autonomy, the math oriented outfit with component names like neuro linguistic programming and integrated data operating layer and some really big name customers like BAE
  • Convera, formerly Excalibur with juice from ConQuest (developed by a former Booz, Allen person no less)
  • Endeca, the all time champ for computationally intensive indexing
  • Fast Search & Transfer, the outfit that dumped Web search in order to take over the enterprise search sector
  • Verity, ah, truth be told, this puppy’s architecture ensured plenty of time to dash off and grab a can of Mountain Dew.

In 2014, if the azure chip firm’s analysis is on the money, the landscape was very different. If I understand the non analytic version of Boston Consulting Group’s matrix from 1970, the big players are:

  • Attivio, another business intelligence solution using open source technology and polymorphic positioning for the folks who have pumped more than $35 million into the company. One executive told me via LinkedIn that the SEC investigation of an Attivio board member had zero impact on the company. I like the attitude. Bold.
  • BA Insight, a business software vendor focused on making SharePoint somewhat useful and some investors with deepening worry lines
  • Coveo, a start up which is nudging close to a decade in age, and more than $30 million in venture backing. I wonder if those stakeholders are getting nervous.
  • Dassault Systèmes, the owner of Exalead, who said in the most recent quarterly report that the company was happy, happy, happy with Exalead but provided no numbers and no detail about the once promising technology
  • Expert System, an interesting company with a name that makes online research pretty darned challenging
  • Google, ah, yes, the proud marketer of the ever thrilling Google Search Appliance, a product with customer support to make American Airlines jealous
  • Hewlett Packard Autonomy, now a leader in the acrimonious litigation field
  • IBM, ah, yes, the cognitive computing bunch from Armonk. IBM search is definitely a product that is on everyone’s lips because the major output of the Watson group is a book of recipes
  • IHS, an outfit which is banking on its patent analysis technology to generate big bucks in the Goldmine cellophane
  • LucidWorks (Really?), a repackager of open source search and a distant second to Elastic (formerly Elasticsearch, which did not make the list. Darned amazing to me.)
  • MarkLogic, a data management system trying to grow with a proprietary XML technology that is presented as search, business intelligence, and a tool for running a restaurant menu generation system. Will MarkLogic buy Smartlogic? Do two logics make a rational decision?
  • Mindbreeze, a side project at Fabasoft which is the darling of the Austrian government and frustrated European SharePoint managers
  • Perceptive Software, which is Lexmark’s packaging of ISYS Search Software. ISYS incorporates technology from – what did the founder tell me in 2009? – oh, right, code from the 1980s. Might it not be tough to make big bucks on this code base? I have 70 or 80 million ideas about the business challenge such a deal poses
  • PolySpot, like Sinequa, a French company which does infrastructure, information access, and, of course, customer support
  • Recommind, a legal search system which has delivered a down market variation of the Autonomy-type approach to indexing. The company is spreading its wings and tackling enterprise search.
  • Sinequa, another one of those quirky French companies which are more flexible than a leotard for an out of work acrobat

But this line up from the azure chip consulting firm omits some companies which may be important to those looking for search solutions but not so much for azure chip consultants angling for retainer engagements. Let me highlight some vendors the azure chip crowd elected to ignore:

Read more
