
Machine Learning Resolves Enterprise Search

August 26, 2015

One of the main topics of discussion on Beyond Search is enterprise search.  We always try to find the juicy details behind enterprise search’s development, groundbreaking endeavors, and problems that search experts need to be aware of.  One thing we can all agree on is that enterprise search is full of problems.  The question is will all of enterprise search’s problems ever be solved?

Ron Miller proposed a possible solution on TechTarget’s Search Content Management blog, “Will Machine Learning Revamp Enterprise Search Software?”  Machine learning offers a bevy of solutions for many industries and what is very intriguing about the process is that we have yet to scratch the surface of its possible applications.  Miller points out that machine learning should deliver more accurate and broader search results than the traditional search index.

Miller imagines this scenario:

“I think we’re going to see tools where the machine can automatically generate results, based on what the user is working on. The information could perhaps populate onto a split screen, suggesting additional information that could potentially be helpful for the user, and then apply machine learning to the user’s response.”

He suggests machine learning driven enterprise search will anticipate a user’s information need and even help shape their daily work routine.  These are very feasible conjectures and machine learning has already shaped such industries as the medical field and engineering.  The main item to ask is when will machine learning become inexpensive enough to implement in enterprise search?

Whitney Grace, August 26, 2015
Sponsored by, publisher of the CyberOSINT monograph

Insights Into SharePoint 2013 Search

August 25, 2015

It has been a while since we have discussed SharePoint 2013 and enterprise search.  Upon reading “SharePoint 2013: Some Observations On Enterprise Search” from Steven Van de Craen’s Blog, we noticed some new insights into how users can locate information on the collaborative content platform.

The first item he brings our attention to is the “content source,” an out-of-the-box managed property option that creates result sources that aggregate content from different content sources, i.e. different store houses on the SharePoint.  Content source can become a crawled property.  What happens is that meta elements from Web pages made on SharePoint can be added to crawled properties and can be made searchable content:

“After crawling this Web site with SharePoint 2013 Search it will create (if new) or use (if existing) a Crawled Property and store the content from the meta element. The Crawled Property can then be mapped to Managed Properties to return, filter or sort query results.”
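The flow in that quote (meta element, then crawled property, then managed property) can be sketched outside SharePoint. The snippet below is only an illustration in plain Python, not SharePoint's crawler: `MetaCollector`, `MANAGED_MAP`, and the property names are hypothetical stand-ins for what an administrator would configure in the search schema.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collects <meta name=... content=...> pairs as stand-ins for crawled properties."""

    def __init__(self):
        super().__init__()
        self.crawled = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name, content = attr.get("name"), attr.get("content")
        if name and content is not None:
            # Property names are matched case-insensitively in this sketch.
            self.crawled[name.lower()] = content


# Hypothetical mapping from crawled property names to managed property names.
MANAGED_MAP = {"department": "DepartmentManaged", "doctype": "DocTypeManaged"}


def to_managed(html, mapping=MANAGED_MAP):
    """Return only the meta values that have a managed property mapped to them."""
    collector = MetaCollector()
    collector.feed(html)
    return {mapping[k]: v for k, v in collector.crawled.items() if k in mapping}


page = (
    '<html><head><meta name="Department" content="Legal">'
    '<meta name="author" content="someone"></head></html>'
)
managed = to_managed(page)
print(managed)
```

The unmapped `author` element is crawled but never becomes queryable, which is the point of the mapping step: nothing can be returned, filtered, or sorted until a managed property points at it.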

Another useful option was made possible by a user’s request: making it possible to add query string parameters to crawled properties.  This allows more information to be displayed in the search index.  Unfortunately this option is not available out-of-the-box and it has to be programmed using content enrichment.
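What the enrichment step has to do can be shown in a few lines. This is a rough sketch in plain Python, not the SharePoint content enrichment interface: the function name, the `qs_` prefix, and the example URL are assumptions made for illustration.

```python
from urllib.parse import parse_qs, urlsplit


def query_params_as_properties(url, prefix="qs_"):
    """Turn a URL's query string into a flat dict of candidate properties."""
    params = parse_qs(urlsplit(url).query, keep_blank_values=True)
    # Single values are unwrapped; repeated parameters stay as lists.
    return {
        prefix + key.lower(): (values[0] if len(values) == 1 else values)
        for key, values in params.items()
    }


props = query_params_as_properties(
    "http://intranet.example/page.aspx?Region=EMEA&tag=a&tag=b"
)
print(props)
```

In a real deployment the resulting values would be written back to properties the index knows about; here they are simply returned so the parsing step can be inspected on its own.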

Enterprise search on SharePoint 2013 still needs to be tweaked and fine-tuned, especially as users’ search demands become more complex.  It makes us wonder when Microsoft will release the next SharePoint installment and if the next upgrade will resolve some of these issues or will it unleash a brand new slew of problems?  We cannot wait for that can of worms.

Whitney Grace, August 25, 2015


More Enterprise Search Revisionism: Omitted Companies Are the Major News

August 24, 2015

A flurry of news items hit my Overflight system in the last couple of days. Gartner, one of the expert for hire mid tier consulting firms, issued a “Gartner’s Magic Quadrant for Enterprise Search.” I am not sure if you can access the report. I had to log in to LinkedIn and work through various screens until this gem presented itself to me.


I followed the link and learned that the “Magic Quadrant for 2015” includes these firms:

The Challengers. To me a challenger means a person or thing that engages in any contest, as of skill, strength, etc.

  • LucidWorks, founded in 2007
  • Mindbreeze, a unit of Microsoft centric Fabasoft in Austria. The search unit fired up a decade ago
  • Google, ah, dear old Google and its pricey Google Search Appliances. You can find the license fees for some devices via the GSAAdvantage service. Google has been sort of selling GSAs for a decade.
  • Dassault Systems, yep, the French engineering outfit working to convert Exalead’s ageing technology into a product component solution. Exalead dates from 2000. Yikes, that makes the technology 15 years old, an aeon in technology time.

The good news is that LucidWorks has its roots in open source. The other three outfits are proprietary technology.

The second group is Niche Players. The companies in this sector are:

  • Expert System. An outfit which opened its doors in 1992 and whose stock is publicly traded. The share price on August 23, 2015 was $2.13 a share
  • Recommind, founded in 2000, is a legal system whose technical approach often reminds me of Autonomy’s systems and methods. The firm now, according to this story, has $70 million in revenue
  • Squiz, which is, by golly, not an open source solution despite its origins in the 2001 P@noptic academic/research setting in Australia. Just try searching for that spelling “P@noptic.”

The third group is Visionaries which to me means “given to or characterized by fanciful, not presently workable, or unpractical ideas, views, or schemes.” The dictionary entry here also points out these clarifications: unreal, imaginary, idealistic, impractical, and unrealizable. Here are the search outfits in this category:

  • BA Insight. This is a company founded in 2004. The founder raised some venture money and then found himself looking for his future elsewhere. In the presentations I have heard, BA Insight is [a] an enterprise search system replacement for whatever you have running, [b] a business intelligence system, [c] a metatagging machine, [d] some combination of these functions.
  • IBM. Ah, dear, old IBM. The company does the home grown thing with scripts and algorithms from its research labs. It does the open source thing. IBM was founded in 1911. The company has had a long time to figure out what to do since the STAIRS III and Web Fountain days. Now IBM search means use of open source, community supported, free Lucene. Plus, it does the acquisition thing with SPSS Clementine (remember that, gentle reader), Vivisimo, i2, and Cybertap, among other information access companies IBM has purchased. At the end of the day, I am not sure what search means because IBM has been promoting the heck out of Watson. You remember Watson. It was a TV game show winner. Watson wrote a cook book. Watson is curing cancer. Watson is doing all sorts of wonderful things. I suppose that’s why it is a visionary with 13 consecutive quarters of revenue decline.
  • IHS (Information Handling Service). IHS leverages technology from The Invention Machine (founded in 1992) an acquisition built to locate systems and methods from patent documents. The IHS search system is called Goldfire and positioned as an enterprise search system. IHS, according to Attivio, licenses the Fast Search & Transfer influenced UIA technology platform. IHS for me is a publishing company, but I suppose that doesn’t matter in today’s fluid world.

The final group of search vendors is labeled leaders. So what’s a leader? According to my online dictionary, a leader is a person or thing that leads. And “lead” means to go before or with to show the way; conduct or escort. No, I will not refer to Ashley Madison, gentle reader. I will play this straight. The leaders are:

  • Attivio, founded in 2007. It must be a leader because a “visionary” uses the Attivio technology to be a visionary. Is that self referential like articles about Google’s right to be forgotten which must be forgotten?
  • Coveo, founded in 2004. This company has been, like Attivio, successful in attracting venture capital. The company once focused on Microsoft Windows as did BA Insight. Now the firm is into customer support but the mid tier consultants remember the good old days of enterprise search.
  • Hewlett Packard. Ah, HP, the company wrote a check for $11 billion in 2011, promptly wrote off billions, and embarked on a much loved legal challenge to Dr. Michael Lynch and some other favored individuals. HP, like IBM, has been racking up declining revenues for five consecutive quarters and is in the process of dividing itself into two separate companies. Does this suggest that HP has some challenges? Keep in mind Autonomy was founded in the mid 1990s.
  • Lexmark. This is a relative newcomer to enterprise search. The company bought Brainware of trigram fame. Lexmark bought the 1980s search darling ISYS Search Software, which was founded in 1988. The company also snagged Kofax, which got into the content processing game with its acquisition of Kapow. I did hear that Lexmark is looking at some shortfalls related to search and content processing. I reported on the chopping of 500 jobs a couple of months ago. But leaders must expect some setbacks like Hewlett Packard. Perhaps Lexmark will reveal the shortfall from its “search related” endeavors. I would peg the number somewhere in the $75 to $80 million range in the last 18 months.
  • Sinequa. This marketing centric, social media maven was founded in 2002. The company has some big European clients, but I am not certain that the push into the US has met with the “name in lights” success some French stakeholders expected. Sinequa is obviously a leader in search. I classify the company as a business process outfit, but the mid tier consultants are more informed than an old guy in rural Kentucky.

My view of the enterprise search sector is different. The companies in this list are oldies, a couple dating from the late 1980s and early 1990s. Let’s see. In Internet time, that pegs some technology as prehistoric.

There is a notable omission too. The list of companies identified by the mid tier outfit has missed the company which has been driving a bulldozer through deals.

What company is that?

Elastic, gentle reader. This outfit is in the process of providing the folks at Goldman Sachs with some information access love. The company has shoved aside the Lucid Works outfit which is scrambling to reposition itself as a Big Data spark something. There are cloud versions of Elastic available for a darned reasonable price. Check out SearchBlox, for example. Keep in mind that Elasticsearch was a second act to Compass, another search system.

A question which I asked myself is, “Why has a mid tier outfit which is so darned expert in enterprise search overlooked the big dog?” Frankly I have no evidence other than the odd little grid in the Linked In post. I assume that the experts at the mid tier firms don’t know much about what’s happening in search. Another thought is that the Elastic folks don’t buy much third party expert input about search. Whatever the reason, I suggest you, gentle reader, become familiar with Elasticsearch in the free or for fee variant.

Another gap I noticed is the omission of the appliance folks. Right off the bat, I think Index Engines, Maxxcat, and Thunderstone deserve a tiny footnote. Maxxcat, for example, is pretty good in the enterprise content indexing arena. Buy a box and plug it in. Index Engines does a great job making some specialized content instantly accessible. And Thunderstone? Well, the company has some darned good technology.

A third lacuna is the omission of the wild and crazy, Fast Search & Transfer tinged SharePoint search. There are upwards of 150 million SharePoint installations. Like it or not, Microsoft also shoves search down my throat each time I use Windows 10. Yikes. The system may have a legacy of considerable interest, but the darned thing is out there. Maybe a teeny tiny footnote? I would suggest that the mid tier outfit identify the vendor which sells more search into Microsoft installations than any other vendor. Nope. I won’t identify this outfit. The president agreed to a Search Wizards Speak interview and then backed out. Too bad for him. No life preserver from me again.

What’s the value of this league table or grid thing from the mid tier consulting firm?

First, it allows the companies in the list to issue a news release. I have already seen references to some of the companies. This post was inspired by the junk mail Linked In shoots at me on a regular basis. There’s nothing like PR which gets a company’s name in front of a bunch of red hot prospects.

Second, the mid tier consulting firm can visit with each company. I can imagine that on those visits, the mid tier consulting firm might just mention the firm’s strategic and tactical for fee services. Hey, if I worked for a mid tier consulting firm, I would be sure to explain why retaining me was the best darned thing since sliced bread. Oh, wait. I worked at Booz, Allen & Hamilton before it drifted into Snowden drifts. I responded to requests; I don’t recall making sales calls. Life is different now I suppose.

Third, the mid tier reports practically force me to write blog posts. I am delighted to be spurred into action.

Fourth, how much does it cost to use these systems? Why not make a table which presents the name of the company, the search system name so that I know what IBM asserts actually performs enterprise search and what HP calls its cloud stuff with Autonomy made ever so easy? Why not state that such and such a search system begins at $X for the license fee and $Y for the on going support, upgrades, and maintenance? Why not present average hourly engineering and technical service fees? Hey, even the best of this animal shelter of disparate systems fail. Did I say crash? Did I say flame out? Did I say deliver irrelevant results? Well, often in my experience.

To wrap up, the Visionaries, the Challengers, the Leaders, and the Niche Players can output news releases. Some may try to dismiss my observations, which is just peachy keen with me. I assume that failed webmasters, thwarted academicians, and unemployed home economics majors will explain that the best of the best appear in the league table.

Present reality any way one wants. I don’t have to make this stuff work anymore. I don’t have to explain to the CFO why the costs associated with enterprise search will continue to go up until the system is removed from the company. I will no longer have to attend a conference filled with cheerleaders for a utilitarian technology which most companies have learned is pretty much the same as it has been since the days of Fulcrum and Verity.

Remember. This is 2015. Most of the technology presented in the mid tier report is getting old. The world wants mobile. The world wants predictive outputs. The world wants search which actually delivers relevant results.

Maybe that is secondary today?

Will I read the complete report if a copy becomes available to me?

Nah. Marketing stuff bores me.

Stephen E Arnold, August 24, 2015

The Integration of Elasticsearch and SharePoint Adds Capabilities

August 24, 2015

The article on the IDM Blog titled BA Insight Brings Together Elasticsearch and Sharepoint describes yet another vendor embracing Elasticsearch and falling in love again with SharePoint. The integration of Elasticsearch and SharePoint enables customers to use Elasticsearch through SharePoint portals. The integration also made BA Insight’s portfolio accessible through open source Elasticsearch as well as Logstash and Kibana, Elastic’s data retrieval and reporting systems, respectively. The article quotes the Director of Product Management at Elastic,

“BA Insight makes it possible for Elasticsearch and SharePoint to work seamlessly together…By enabling Elastic’s powerful real-time search and analytics capabilities in SharePoint, enterprises will be able to optimize how they use data within their applications and portals.”  “Combining Elasticsearch and SharePoint opens up a world of exciting applications for our customers, ranging from geosearch and pattern search through search on machine data, data visualization, and low-latency search,” said Jeff Fried, CTO of BA Insight.

Specific capabilities that the integration will enable include connectors to over fifty systems, auto-classification, federation to improve the presentation of results within the SharePoint framework, and applications like Smart Previews and Matter Comparison. Users also have the ability to decide for themselves whether they want to use the SharePoint search engine or Elastic’s, or combine them and put the results together into a set. Empowering users to make the best choice for their data is at the heart of the integration.
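The article does not say how the two result lists are combined. One common way to put two ranked lists into a single set is reciprocal rank fusion, sketched below as an assumption rather than as BA Insight's method; the document identifiers and the constant `k=60` are illustrative only.

```python
from collections import defaultdict


def fuse(result_lists, k=60):
    """Merge ranked lists of document ids with reciprocal rank fusion."""
    scores = defaultdict(float)
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            # A document earns more credit the higher it ranks in each list.
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties are broken alphabetically for stable output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


sharepoint_hits = ["doc-a", "doc-b", "doc-c"]  # hypothetical SharePoint ranking
elastic_hits = ["doc-b", "doc-d", "doc-a"]     # hypothetical Elasticsearch ranking
combined = fuse([sharepoint_hits, elastic_hits])
print(combined)
```

Documents that both engines return float to the top, and duplicates collapse into one entry, which is roughly what a user expects from a combined set. The approach only needs rank positions, so the two engines' relevance scores never have to be comparable.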

Chelsea Kerwin, August 24, 2015



Enterprise Search: Failure Is a Synonym Whether on the Desktop or a Mobile Device

August 20, 2015

One of my favorite content management services has embraced enterprise search. With content management systems or CMS as they are called by the cognoscenti a source of information technology angst, enterprise search seems to be a complementary topic.

Both “disciplines” purport to make a trucking, chemical, or financial services firm into a more efficient information machine. The reality is persistent cost overruns, mismatches between user needs and what the systems actually deliver, and the deep thrum thrum of pumps outputting red ink.

I read “4 Ways Enterprise Mobile Repeats Intranet Mistakes.” I quite like the title. Four seems to undershoot the mistake score, but enterprise search has only been in the failure business since the early 1980s. My list of “challenges” is in pinball machine score range.

Here are the four mistakes viewed through the eyeballs of a CMS centric source:

  1. No dedicated program with a person who “owns” the project
  2. Regular information technology folks are running the car wash
  3. Those regular information technology folks are not too swift in the “user experience design” department
  4. Regular information technology folks and search experts — heck everyone — does not understand what users need. (I assume there are assorted experts, failed webmasters, unemployed middle school teachers, and out of work journalists who do understand what users need.)

So what’s the fix? How will organizations ever manage? The sky is falling and we have to build a space elevator, right?


The fix involves four actions:

  1. I have to quote this, since I lack the expertise to paraphrase the following: “Find a home within the organization for enterprise mobile leadership, and build up stakeholder engagement, governance, and change management capabilities.” Does this sound like horse features to you? I think this is different; these notions are balderdash. Your mileage may vary.
  2. You whoever you is simply “ensure your IT function operates at a strategic level.” Sure enough, boss.
  3. Beef up your “UXD” capabilities. The notion of UXD is supposed to evoke nifty stuff like unusable iPad apps, odd ball Google cards, and weird three line “hamburger” icons which are too small for my aged and clumsy fingers. I am into user experience; namely, a keyboard, a command prompt, and paper. Obviously I am a loser in the UXD game.
  4. Research what those frontline workers need. Oh, don’t forget to watch a frontline worker do work.

Let’s reflect on these fixes.

In my pre retirement years, I had the opportunity to work with a number of organizations. These ranged from lost in space tractor companies to outfits which were chock full of the smartest people in the world.

I learned that getting tasks completed was difficult. Few people, including the late lamented strategy officers, got much done. The design stuff emerged from marketing departments and most frontline folks ignored marketing departments. I learned that asking someone what they need produces features no one uses.

My hunch is that anyone who tries to implement an enterprise mobile solution is likely to convert that effort into the same slough of cost overruns, unhappy users, and technological mine fields associated with vanilla enterprise search.

For those who are looking for a better gig than implementing content management systems and enterprise search systems, the mobile thing dusted with user experience malarkey will remain marginalized or just ignored.

Install Elasticsearch. Use prebuilt templates. Move on. Senior management won’t care. Users won’t care. Maybe if a search project comes in under budget someone in accounting will be happy with enterprise search for once.

Stephen E Arnold, August 20, 2015

Microsoft Top Execs Reaffirm SharePoint Commitment

August 6, 2015

Doubts still remain among users as to whether or not Microsoft is fully committed to the on-premise version of SharePoint. While on-premise has been a big talking point for the SharePoint Server 2016 release, recent news points to more of a hybrid focus, and more excitement from executives regarding the cloud functions. Redmond Magazine sets the story straight with their article, “Microsoft’s Top Office Exec Affirms Commitment to SharePoint.”

The article sums up Microsoft’s stance:

“Microsoft realizes and has acknowledged that many enterprises will want to use SharePoint Server to keep certain data on premises. At the same time, it appears Microsoft is emphasizing the hybrid nature of SharePoint Server 2016, tying the new on-premises server with much of what’s available via Office 365 services.”

No one can know for sure exactly how to prepare for the upcoming SharePoint Server 2016 release, or even future versions of SharePoint. However, staying up to date on the latest news, and the latest tips and tricks, is helpful. For users and managers alike, a SharePoint feed managed by Stephen E. Arnold can be a great resource. The Web site is a one-stop-shop for all things search, and the SharePoint feed is particularly helpful for users who need an easy way to stay up to date.

Emily Rae Aldridge, August 6, 2015



Sorry, Experts. NLP and Semantic Technology Will Guarantee Higher Precision and Recall

August 3, 2015

I read “5 Reasons for Developers to Build NLP and Semantic Search Skills,” one of those bait and switch write ups. The title suggests that NLP and semantic search are “skills.” The content of the article presents without factual substantiation assertions about the differences between Web search and enterprise search. The reality is that both are more closely related than they appear to some “experts.” Neither works particularly well for reasons which have to do with cost control, system management, and focus. The technology is, from my point of view, more stable than some search mavens believe.

Here’s the passage I highlighted in pale mauve because I did not have purple:

It at times feels magical that Search engines know, with unbelievable accuracy, exactly what you are looking for. This is the result of a heavy investment in NLP and Semantic technologies. These, along with speech-recognition, have the potential of enabling a future where search will transform into a smart machine that uses “connected knowledge” to answer significantly complex questions – a Star Trek Computer may not be too far away after all, if Amit Singhal – brain behind Google’s search engine evolution, has be to believed.

More remarkable was the introduction of the phrase “big, unstructured data.” I also found the notion of “commoditization” of data science amusing.

One idea warrants comment. The article calls attention to the “widening gap between enterprise search platforms and general purpose search engines.” Anyone who has attempted to index Web content quickly learns that it is a fruit basket which is in the process of being shoved into a blender. The notion of the enterprise search system was to process the content normally found inside an organization. But guess what? After the first query run on a restricted domain of content, the user says, “I need access to Internet content.” The “gap” is one of perception. The underlying components of the system and much of the gee whiz technology are similar. The fact that the Web search systems have been shaped to handle a restricted body of content is lost on some folks. Similarly the enterprise search systems are struggling because they, like Web search engines, cannot handle efficiently and automatically certain types of content. In short, neither works particularly well.

Will NLP and semantic skills help a developer? Not too much if the search system is not focused, the content is not reliable, and functions are poorly defined. Forget big data, little data, and unstructured or structured data. Get the basics wrong and one has a lousy search system, which sadly, is more common than not.

Stephen E Arnold, August 3, 2015

Enterprise Search: You Cannot Do It Yourself, People.

July 31, 2015

I love write ups like “Don’t Settle When It Comes to Enterprise Search Platforms.” These articles are designed to provide consulting firms with the marketing flim flam which positions each as an “expert” in enterprise information access. I would not be surprised to find copies of this article in the peddler kit of search sales professionals.

The main point of the write up is that enterprise search is a “platform.” Because there are options, no self respecting company will try to implement search without the equivalent of the F Troop in mid tier or below consultants.

I noted:

Let’s look at two very common workarounds some have tried, and then we will talk about why you must go with a reputable developer when you make your final decision.

When I read this, I wondered if the “expert” were familiar with the Maxxcat line of enterprise search systems or the Blossom hosted solution.

The write up dismisses an open source solution, apparently unaware of research by Diomidis Spinellis and Vaggelis Giannikas published in Journal of Systems and Software, March 2012, pages 666 to 682. That’s okay. My hunch is that those finding the “Don’t Settle” article compelling are not likely to be interested in researchy type stuff.

One of the more interesting segments in the write up is the assertion that scalability is a “given.” Hmmm. In my experience, there are some on going enterprise search challenges: Scalability is one facet of a nest of vipers which includes my favorite reptile, indexing latency.

The article states:

Open source platforms are only as scalable as their code allows, so if the person who first made it didn’t have your company’s needs in mind, you’ll be in trouble. Even if they did, you could run into a problem where you find out that scaling up actually reveals some issues you hadn’t encountered before. This is the exact kind of event you want to avoid at all costs.

I don’t want to rain on this parade of “information,” but every enterprise search system which I have had the pleasure of procuring, managing, investigating, and analyzing has scalability problems.

The reason is simple: The volume of changed information and the flow of new information go up. Whatever one starts with is rather rapidly choked. The solutions are painful: Spend more or index less.
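The arithmetic behind that choke point is easy to sketch. The toy model below is an assumption-laden illustration, not a measurement of any product: the function name, the 30 day month, and every number in the example are made up to show how a fixed indexing capacity falls behind a compounding flow of new and changed documents.

```python
def backlog_by_month(initial_docs_per_day, monthly_growth, capacity_docs_per_day, months):
    """Return the unindexed backlog at the end of each month (30 day months assumed)."""
    backlog = 0.0
    inflow = float(initial_docs_per_day)
    history = []
    for _ in range(months):
        # Backlog grows when inflow exceeds capacity and can never drop below zero.
        backlog = max(0.0, backlog + 30 * (inflow - capacity_docs_per_day))
        history.append(round(backlog))
        # New and changed content compounds month over month.
        inflow *= 1 + monthly_growth
    return history


# Hypothetical numbers: 80,000 documents a day growing 10 percent a month,
# against a system sized to index 100,000 documents a day.
history = backlog_by_month(80_000, 0.10, 100_000, 6)
print(history)
```

With these made-up inputs the system keeps up for three months and then the backlog climbs every month after. Raising `capacity_docs_per_day` is the "spend more" option; lowering `initial_docs_per_day` is the "index less" option.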

I am not confident that one who follows the advice of certain experts will find his or her enterprise search journey pleasant. On the other hand, there are opportunities as Uber drivers one can pursue.

Stephen E Arnold, July 31, 2015

IBM and Its Federated Search Camelot

July 25, 2015

Short honk: I scanned my Twitter feed this morning. What did I see? An impossible assertion from the marketing crazed folks at IBM Watson. Let me tell you, IBM Watson and its minions output a hefty flow of tweets. A year or so ago, IBM relied on mid tier consulting firm experts like Dave Schubmehl (yep, the fellow who sold my research on Amazon without my permission). Now there are other voices.


But the message, not just the medium, is important. IBM’s assertion is that there will be no more “data silos in enterprise search.” You can learn about IBM’s “reality” in a webcast.

Now, I am not planning on sitting through a webcast. I would, however, like to enumerate several learnings from my decades of enterprise information access work. You can use this list as a jump start for your questions to the IBM wizards. Here goes:

  1. In an enterprise, what happens when an indexing system makes available in a federated search system information related to a legal matter which is not supposed to be available to anyone except the attorneys involved in the matter?
  2. In an enterprise, what happens if information pertinent to a classified government project is made available in a federated search system which has not been audited for access control compliance?
  3. What happens when personnel information containing data about a medical issue is indexed and made available in an enterprise search system when email attachments are automatically indexed?
  4. How does the federated system deal with content in servers located in a research facility engaged in new product research?
  5. What happens when sales and pricing data shared among key account executives is indexed and made available to a contractor advising the company?
  6. What is the process for removing pointers to data which are not supposed to be in the enterprise search system?
  7. What security measures are in place to ensure that a lost or stolen mobile device does not have access to an enterprise search system?
  8. How much manual work is required before an organization turns on the Watson indexing system?

These will get you started on the cross silo issues.
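Most of the questions in that list come down to whether results are trimmed against access controls before a user sees them. A minimal sketch of that check follows, assuming each federated hit carries the groups allowed to read it; the `Hit` class, the group names, and the deny-by-default rule are hypothetical choices for illustration and say nothing about how Watson or any other product behaves.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    doc_id: str
    source: str                # which silo the hit came from
    allowed_groups: frozenset  # groups permitted to see the document


def trim(hits, user_groups):
    """Keep only hits the user may see; a hit with no recorded ACL is denied."""
    groups = frozenset(user_groups)
    return [hit for hit in hits if hit.allowed_groups & groups]


hits = [
    Hit("matter-17-memo", "legal", frozenset({"matter-17-attorneys"})),
    Hit("cafeteria-menu", "intranet", frozenset({"all-staff"})),
    Hit("hr-medical-note", "email", frozenset()),  # ACL missing: hidden by default
]
visible = trim(hits, {"all-staff", "contractors"})
print([hit.doc_id for hit in visible])
```

The sketch only works if every silo reports accurate permissions at index time and keeps them current, which is where the manual work and the audits in the questions above come in.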

Oh, the answer to these questions is that the person identified as responsible for making the data available may get to find a future elsewhere. Amazon warehouses are hiring in southern Indiana.

Alternatively one can saddle up a white stallion, snag a lance, and head for the nearest windmill.

Stephen E Arnold, July 25, 2015

SharePoint 2013 Enterprise Search Configuration

July 25, 2015

In just 14 easy steps, you too can configure “SharePoint 2013 for a SharePoint 2013” site. Now this is not enterprise search, but when it comes to Microsoft and information access, trivialities just don’t matter.

The screenshots show what options to select. There is no explanation in Step 4 for what to do if you click “Basic Search Center” instead of “Enterprise Search Center.” A real MSFT lover will know the difference between “basic” and “enterprise” for a SharePoint site.

Follow the clicks to Step 9. Note that under the category search one selects “Search Settings”, not “Search and offline availability.” Again the clarity is astounding.

Cut and paste your way to Step 13 where you configure search navigation. Just click “everything” and presumably the URL, the description, and the link will be locked and loaded. And if not? Well, there will be no errors, gentle reader.

The coup de grace is Step 14. Here’s the instruction which is crystal clear:

Just go and check “Use the same results page settings as my parent” is selected from the subsite search site settings.

You are good to go—directly to a consulting firm specializing in installing a third party search system into your SharePoint solution. Sorry, but that approach usually works. The Fast Search thing from the mid 1990s? Not exactly flawless in my experience. Configuration files are still nestled deep in the innards but the graphical interface may not get you where you need to be.

Stephen E Arnold, July 25, 2015
