Google Information Retrieval: Not Just Acceptable. Insanely Better

June 12, 2015

Like the TSA’s perfect bag, Google’s search is the apex of findability, according to “Google Now Has Just Gotten Insanely Better and Very Freaky.” What causes such pinnacles of praise? According to the write up:

Google announced at an event in Paris a Location Aware Search feature that can answer a new set of questions, without the user having to ask questions that should include addresses or proper place names. Asking Google Now questions like “what is this museum?” or “when was this building built?” in proximity of the Louvre in Paris will get you answers about the Louvre, as Google will be able to use your location and understand what you meant by “this” or “this building”.

How does the feature work when one is looking for information about the location of a Dark Web hidden services server in Ashburn, Virginia? Ah, not so helpful perhaps? What’s the value of a targeted message in this insanely better environment? Good question.
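For readers curious how “this museum” might be tied to a place, here is a minimal sketch, not Google’s method. It assumes a tiny hand-made place list with approximate coordinates, a hypothetical 0.5 km cutoff, and two hypothetical helper names, `nearest_place` and `rewrite_query`. The idea is simply to find the closest known place and substitute its name for “this”.

```python
import math
import re

# Hypothetical place list; coordinates are approximate and for illustration only.
PLACES = [
    {"name": "Louvre", "lat": 48.8606, "lon": 2.3376},
    {"name": "Musée d'Orsay", "lat": 48.8600, "lon": 2.3266},
    {"name": "Eiffel Tower", "lat": 48.8584, "lon": 2.2945},
]


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def nearest_place(lat, lon, places=PLACES, max_km=0.5):
    """Return the closest place within max_km (an assumed cutoff), else None."""
    scored = [(haversine_km(lat, lon, p["lat"], p["lon"]), p) for p in places]
    distance, place = min(scored, key=lambda pair: pair[0])
    return place if distance <= max_km else None


def rewrite_query(query, lat, lon):
    """Replace 'this', 'this museum', or 'this building' with the nearest place name."""
    place = nearest_place(lat, lon)
    if place is None:
        return query  # nothing nearby is known, so the question stays ambiguous
    pattern = r"\bthis(?:\s+(?:museum|building|place))?\b"
    return re.sub(pattern, lambda _: "the " + place["name"], query, flags=re.IGNORECASE)


if __name__ == "__main__":
    print(rewrite_query("what is this museum?", 48.8610, 2.3360))
    print(rewrite_query("what is this building?", 39.0438, -77.4874))
```

Standing near the Louvre, the first question is rewritten with the museum’s name. From Ashburn, Virginia, nothing in the list is close enough, so the second question comes back unchanged, which is roughly the limitation raised above.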

Stephen E Arnold, June 12, 2015

CyberOSINT Videos

May 12, 2015

A single page now provides one click access to the three CyberOSINT videos. The videos provide highlights of Stephen E Arnold’s new monograph about next generation information access. You can explore the videos, which run a total of 30 minutes, on the Xenky site. One viewer said, “This has really opened my eyes. Thank you.”

Kenny Toth, May 12, 2015

Search Left Out of the Collaborative Economy Honeycomb

May 8, 2015

I must admit that I knew very little about the collaborative economy. I used AirBnB one time and worried about my little test. I survived. I rode in an Uber car one time because my son is an aficionado. I am okay with the subway and walking. I ignore apps which allegedly make my life better, faster, and more expensive.

I saw a post which pointed me to the Chief Digital Officer Summit and that pointed me to this page with the amazing honeycomb shown below. The title is “Collaborative Economy Honeycomb 2: Watch It Grow.”

[Image: collaborative honeycomb]

The hexagons are okay, but the bulk of the write up is a listing of companies which manifest the characteristics of a collaborative honeycomb outfit.

Most of the companies were unfamiliar to me. I did recognize the names of a couple of the honeycombers; for example, Khan Academy, Etsy, eBay (ah, delightful eBay), Craigslist, Freelancer, the Crypto currencies (yep, my Dark Web work illuminated this hexagon in the honeycomb for me), and Indiegogo (I met the founder at a function in Manhattan).

But the other 150 companies in the list were news to me.

But what caused me to perk up and pay attention was one factoid:

There were zero search, content processing, or next generation information access companies in the list.

I formed a hypothesis which will probably give indigestion to the individuals and financial services firms pumping money into search and content processing companies. Here it is:

The wave of innovation captured in the wonky honeycomb is moving forward with search as an item on a checklist. The finding functions of these outfits boil down to social media buzz and niche marketing. Information access is application centric, not search centric.

If I am correct, why would honeycomb companies in collaboration mode want to pump money into a proprietary keyword search system? Why not use open source software and put effort into features for the app crowd?

Net net: Generating big money from organic license deals may be very difficult if the honeycomb analysis is on the beam. How hard will it be to sell a high priced search system to the companies identified in this analysis? I think that the task might be difficult and time consuming.

The good news is that the list of companies provides outfits like Attivio, BA Insight, Coveo, Recommind, Smartlogic, and other information retrieval firms with some ducks at which to shoot. How many ducks will fall in a fusillade of marketing?

One hopes that the search sharpshooters prevail.

Stephen E Arnold, May 8, 2015

Did You Know Oracle and WCC Go Beyond Search?

January 10, 2015

I love the phrase “beyond search.” Microsoft uses it, working overtime to become the go-to resource for next generation search. I learned that Oracle also finds the phrase ideal for describing the lash up of traditional database technology, the decades old Endeca technology, and the Dutch matching system from WCC Group.

You can read about this beyond search tie up in “Beyond Search in Policing: How Oracle Redefines Real time Policing and Investigation—Complementary Capabilities of Oracle’s Endeca Information Discovery and WCC’s ELISE.”

The white paper explains in 15 pages how intelligence led policing works. I am okay with the assertions, but I wonder if Endeca’s computationally intensive approach is suitable for certain content processing tasks. The meshing of matching with Endeca’s outputs results in an “integrated policing platform.”

The Oracle marketing piece explains ELISE in terms of “Intelligent Fusion.” Fusion is quite important in next generation information access. The diagram explaining ELISE is interesting:


Endeca’s indexing makes use of the MDex storage engine, which works quite well for certain types of applications; for example, bounded content and point-and-click access. Oracle shows this in terms of Endeca’s geographic output as a mash up:


For me, the most interesting part of the marketing piece was this diagram. It shows how two “search” systems integrate to meet the needs of modern police work:


It seems that WCC’s technology, also used for matching candidates with jobs, looks for matches and then Endeca adds an interface component once the Endeca system has worked through its computational processes.

For Oracle, ELISE and Endeca provide two legs of Oracle’s integrated police case management system.

Next generation information access systems move “beyond search” by integrating automated collection, analytics, and reporting functions. In my new monograph for law enforcement and intelligence professionals, I profile 21 vendors who provide NGIA. Oracle may go “beyond search,” but the company has not yet penetrated NGIA, next generation information access. More streamlined methods are required to cope with the type of data flows available to law enforcement and intelligence professionals.

For more information about NGIA, navigate to

Stephen E Arnold, January 10, 2015

BA Insight on the Resurrection Trail

September 10, 2014

I read “Artificial Intelligence Is Resurrecting Enterprise Search.” The unstated foundation of this write up is that enterprise search is dead. I am not sure I buy into that assumption. Last time I checked ElasticSearch was thriving with its open source approach. In fact, one “expert” pointed out that the decline in the fortunes of certain Brand Name search systems coincided with the rise in ElasticSearch’s fortunes. Connection? I don’t know, but enterprise search is thriving.

What needs resurrection (either the Phoenix variety or the William James varieties of mystical experience type) is search vendors whose software does not deliver for licensees. In this category are outfits that have just gone out of business; for example, Convera, Delphes, Entopia, Kartoo, Perfect Search, Siderean Software, and others.

Then there are the vendors with aging technology that have sold out to outfits that pack information retrieval into umbrella applications in order to put up hurdles for competitors to scale. If lock in won’t work, then find a way to build a barricade. Outfits with this approach include Dassault, OpenText, Oracle, TeraText (now Leidos), among others.

Also, there are search vendors up to their ears in hock to venture funding firms. With stakeholders wanting some glimmer of a payout, the pressure is mounting. Companies in this leaky canoe include Attivio, BA Insight, Coveo, and Lucid Imagination, among others.

Another group of vendors are what I call long shots. These range from the quirky French search vendors like Antidot to Sinequa. There are some academic spin outs like Funnelback, which is now a commercial operation with its own unique challenges. And there are some other cats and dogs that live from deal to deal.

Finally, there are the giant companies looking for a way to make as much money as possible from the general ennui associated with proprietary search solutions. IBM is pitching Watson and using open source to get the basic findability function up and running. Microsoft is snagging technology from Jabber and bundling in various bits and pieces to deliver on the SharePoint vision of access to information in an organization. This Delve stuff is sort of smart, but until the product ships and provides access to a range of content types, I think Microsoft has a work in progress, not an enterprise solution upon which one can rely. The giant IHS is leveraging acquired technology into a search business, at least in the planners’ spreadsheets. Google offers its Search Appliance, which is one of the most expensive appliance solutions I have encountered. There is one witless mid tier consulting firm that believes a GSA is economical. Okay. And there is the name surfing Schubmehl from IDC who uses other people’s work to build a reputation.

To sum up, ElasticSearch is doing fine. Lots of other vendors are surviving or selling science fiction.

So what?

The “Artificial Intelligence Is Resurrecting Enterprise Search” is a write up from one of the outfits eager to generate big dollars to keep the venture capitalists happy. Hey, don’t take the money, if the recipients can’t generate big bucks.

Anyway, the premise of the write up is that enterprise search is dead and Microsoft’s Delve will give the software sector new life. The only folks who will get new life are the Microsoft savvy developers who can figure out how to set up, customize, optimize, and keep operational a grab bag of software.

Microsoft wants to provide a corporate SharePoint user with a single interface to the content needed to perform work. This is a pretty tough problem. SageMaker, now long gone, failed at this effort. Google asserted that its Search Appliance could pull off this trick. Google failed. Dozens of vendors talk about federated search and generally deliver results that are of the “close but no cigar” variety.

Now what’s artificial intelligence got to do with Delve? Well, the system uses personalization and cues to figure out what a business SharePoint user wants and needs. We know how well this works with the predictive services available from Apple, Google, and Microsoft Phone. Each time I use these services, I remember that they don’t work too well. Yep, Google really knows what I want about one out of 1,000 queries. For the other 999, Google generates laughable outputs.

Microsoft will be in the same rubber raft.

The write up does disagree with my viewpoint. Well, that’s okay because the BA Insight professional who tackles artificial intelligence is going to need more than inputs from Dave Schubmehl who recycles my information without my permission. If this write up is any indication, something has gone wrong somewhere along the line with regard to artificial intelligence, which is, I believe, an oxymoron.

Delve is, according to the write up, now “turning search on its head.” What? I need to find information about a specific topic. How will a SharePoint centric solution know I need that information? Well, that is not a viable scenario. Delve only knows what I have previously done. That’s the beauty of smart personalization. The problem is that my queries bounce from Ebola to silencers for tactical shotguns, from meth lab dispersion in Kentucky to the Muslim Brotherhood connections to certain political figures. Yep, Delve is going to be a really big help, right?

The write up asserts:

Companies need to get smarter about how they structure their information by addressing core foundational data layers. Pay attention to corporate taxonomies and introduce automated processes that add additional metadata where it’s left out from unstructured data sets. Doing this homework will make enterprise search results more relevant and will allow better results when interacting with enterprise data — whether it’s through text, voice or based on social distance. Access to enterprise data through intelligent interfaces is only getting better.

My reaction? My goodness. What the heck does this collection of buzzwords have to do with advanced software methods for information retrieval? Not much. That’s what the write up is conveying to me.

Hopefully the investors in BA Insight find more to embrace than I do. If I were an investor, I would demand that my money be spent for more impactful essays, not reminders that Microsoft, like IBM, thrives on services, certification, and customers who may not know how to determine if software is smart.

Stephen E Arnold, September 10, 2014

Connotate: Marketing by Listing Features

August 6, 2014

Connotate posted a page that lists 51 features. The title of the Web page is “What Connotate Does Better than Scripts, Scrapers, and Toolkits.” The 51 features are grouped into 10 categories. Several are standard content processing operations; for example, scaling, ease of use, and rapid deployment.

Several are somewhat fuzzy. A good example is the category “Efficiency”. Connotate explains this concept with these features:

  • Highly efficient code is automatically generated during Agent training
  • Agents bookmark the final destination and identify links that aren’t necessary, bypassing useless links and arriving at the desired data much faster
  • Optimized navigation also generates less traffic on target websites
  • Supports load balancing
  • Multi-threaded – supports simultaneous execution of multiple Agents on a single system
  • Optimizes resource usage by analyzing clues during runtime about the various intended uses of the extracted data

From my experience with training systems, I know that the process can be quite a job, particularly when the source content is not scientific, technical, and medical information. STM is somewhat easier because the terminology is less colorful than social media content, for example. The deployment of agents that do not trigger a block by a target is a good idea. But load balancing is a different type of efficiency and one that is becoming part of some vendors’ punch lists.

I found the 51 items useful as a thought starter for crafting a listicle.

Stephen E Arnold, August 6, 2014

Exclusive Interview: Miles Kehoe, LucidWorks

January 30, 2013

Miles Kehoe, formerly a senior manager at Verity and then the founder of New Idea Engineering, joined LucidWorks in late 2012. I worked with Miles on a project and found him a top notch resource for search and the tough technical area which was our concern.

I was able to interview Miles Kehoe on January 25, 2013. He was forthcoming and offered me insights which I found fresh and practical. For example, he told me:

You know I come from a ‘platform neutral’ background, and I know many of the folks involved with ElasticSearch. Their product addresses many of the shortcomings in Solr 3.x, and a year or two ago that would have been a coup. But now, Solr 4 completely addresses those shortcomings, and then some, with SolrCloud and Zoo Keeper. ES says it doesn’t require a pesky ‘schema’ to define fields; and when you’re playing with a product for the first time, that is kind of nice. On the other hand, folks I know who have attempted production projects with ES tell me there’s no way you want to go into production without a schema. Apache Lucene and Solr enjoy a much larger community of developers. If you check the Wikipedia page, you’ll see that Lucene and Solr both list the Apache Software Foundation as the developer; Elastic Search lists a single developer, who it turns out, has made the vast majority of updates to date. While it is based on Apache Lucene, Elastic Search is not an Apache project. Both products support RESTful API usage, but Elastic requires all transactions to use JSON. Solr supports JSON as well, but goes beyond to support transactions in many formats including XML, Java, PHP, CSV and Python. This lets you write applications to interact with Solr in any language and with any protocol you want to use. But the most noticeable difference is that Solr has an awesome Web Based Admin UI, ES doesn’t. If you’re only writing code, you might not care, but the second a project is handed over to an Admin group they’re bound to notice! It makes me smile every time somebody says ES and “ease of use” in the same sentence – you remember the MS DOS prompt back in 1990? Although early adopters enjoyed that “simplicity”, business people preferred mouse-based systems like the Mac and Windows. 
We’re seeing this play out all over again – busy IT people want an admin UI – they don’t want to spend all day at what amounts to a “web command line”, stitching together URLs and JSON commands.

I found this comment prescient. I learned about a possible issue triggered by ElasticSearch in “Github Search Exposes Passwords Then Crashes.”
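To make the “stitching together URLs and JSON commands” remark concrete, here is a minimal sketch of what each style of request looks like. It assumes default local ports (8983 for Solr, 9200 for ElasticSearch) and hypothetical collection, index, and field names. The helper functions are illustrative, not part of either product, and nothing is sent over the network. Solr selects its response format with the `wt` parameter, while the ElasticSearch search endpoint takes a JSON body.

```python
import json
from urllib.parse import urlencode


def solr_select_url(base, collection, query, wt="json", rows=10):
    """Build a Solr select URL; `wt` picks the response writer (json, xml, csv, ...)."""
    params = urlencode({"q": query, "wt": wt, "rows": rows})
    return f"{base}/solr/{collection}/select?{params}"


def es_search_request(base, index, field, text, size=10):
    """Build the URL and JSON body for an ElasticSearch match query."""
    url = f"{base}/{index}/_search"
    body = json.dumps({"query": {"match": {field: text}}, "size": size})
    return url, body


if __name__ == "__main__":
    # Hypothetical hosts and names, assuming default local installs.
    print(solr_select_url("http://localhost:8983", "collection1", "title:search", wt="csv"))
    url, body = es_search_request("http://localhost:9200", "docs", "title", "search")
    print(url)
    print(body)
```

Changing `wt="csv"` to `wt="xml"` is all it takes to switch Solr output formats, which is the flexibility Mr. Kehoe points to. The ElasticSearch request stays JSON either way.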

I pressed Mr. Kehoe for key points of differentiation in open source search. I pointed out that every vendor is rushing to embrace open source search. Some do it with lights flashing like IBM and others operate in a lower profile manner like Attivio. He told me:

Just as we have different products and services for our customers, we can customize our engagements to meet our customers’ needs. Some of our customers want to have deep product expertise in-house, and with training, best practice and advisory consulting, and operations/production consulting, we help them come up to speed. We also provide ongoing technical and production support for mission critical applications – just last month an eCommerce site ran into production problems on the Friday afternoon before Christmas. We were able to help them out and have them at full capacity before dinner. Not to dwell on it, but what sets LucidWorks apart is the people. We employ a large number of the team that created and enhances Lucene and Solr including Grant Ingersoll, Steve Rowe and Yonik Seeley. We also have significant expertise on the business side as well. At the top, Paul Doscher grew Exalead from an unknown firm into a major enterprise search player over just a few years; my former business partner Mark Bennett and I have built up deep understanding of search since our Verity days in the early 1990s.

Important information for those analyzing search systems, I believe.

You can read the full text of the interview in the ArnoldIT Search Wizards Speak series. Search Wizards Speak is the largest, no cost, freely available collection of interviews with experts in search and content processing. There are more than 60 interviews available. You can find the full series listing at and

Stephen E Arnold, January 30, 2013

Sponsored by

IBM Returns to Pure Software Roots As Technology Evolves

December 27, 2012

Since IBM ceased its production of applications and reorganized into two organizations, Middleware and Solutions, in 2011, it has been pumping out infrastructure software and the complementary integration components to go with it. These internal organizational changes have helped the company determine the type of solutions it can offer to companies as the industry itself evolves.

Seeking Alpha’s article “So What Does IBM Mean When It Says It’s In The Solutions Business?” explains what type of solutions IBM will be providing in the future:

“It is not individual packaged products per se, but groups of related software products, services, and systems. And we know at very high level where IBM is going to focus its solutions efforts. IBM has always been about software, services, and systems – although in recent years the first two have taken front stage. The flip side is that some of these solutions areas are overly broad. Smarter Analytics is a catch-all covering the familiar areas of business intelligence and performance management, predictive analytics and analytical decision management, and analytic applications.”

Given the need for sustainable ROI in technology, it is unsurprising that IBM returned to its software roots. IBM seeks opportunities with best in class partners, and its association with leading enterprise search companies such as Intrafind is a relationship that seems to be paying off well. Intrafind was an early IBM Pure integrator and both sides seem to be making the best of the relationship.

Jennifer Shockley, December 27, 2012

Sponsored by, developer of Augmentext

Companies Striving for Success Choose Proven Enterprise Search Software Providers

December 25, 2012

The days of limited mobile app options came to an end a few years ago with the increased popularity of BYOD (bring your own device) work options. A growing demand for products to simplify work processes brought about phenomenal improvements on tablets and mobile devices. In turn, the enterprise app market skyrocketed, not in price but in product offerings. Companies looking to invest in the most beneficial applications for their business will want to weigh their options carefully.

Enterprise Apps Today’s article “Choosing the Right Enterprise Apps for your Business” touches on the importance of all around support when filtering through application options:

“Today, a hefty proportion of cutting-edge applications can be found on cloud platforms in the form of SaaS (software-as-a-service). While a quick glance at the website of an enterprise software offering will tell a great deal about the maturity of a project, it is hardly the entire story. For the huge investment of time and money that a business expects to make in an enterprise software deployment, it’s important to first ensure that a supporting ecosystem is in place.”

The article offers good advice and guidance on choosing the best applications, but companies striving for success will choose a proven enterprise search software provider. Intrafind offers guidance on strategy, applications and use of enterprise search software that can help businesses make the most of their investment. Financial firms and pharmaceutical industry leaders are just a few examples of the types of enterprise that rely on Intrafind’s capabilities.

Jennifer Shockley, December 25, 2012

Sponsored by, developer of Augmentext

Hybrid Cloud with Cloning Capability May Not Bode Well for Cloud Platform Developers

December 17, 2012

The introduction of hybrid technology comes as no surprise, but one has to wonder how current developers will feel about being cloned in the future. TechCrunch’s article “CloudVelocity Launches With $5M from Mayfield to Bring the Hybrid Cloud to the Enterprise” discusses the introduction of a hybrid cloud and its growing potential, along with its cloud cloning ability.

This new technology could save companies a bundle on initial investments, but smart platform designers may take precautions against cloning in the future. One has to wonder what preparations have already been made, if any. Investors want to be certain the risk of this approach is worth the effort.

“One Hybrid Cloud platform, aims to extend the enterprise data center to the public cloud, by enabling multi-tier applications to run without modification in the cloud and access services that reside in the enterprise data center. In a nutshell, the startup allows enterprises to get the benefits of private clouds in the public cloud. Users can discover, blueprint, clone, and migrate applications between data centers and public clouds. Currently, CloudVelocity supports full server, networking, security and storage integration with AWS but plans to integrate other public clouds.”

The excitement around startups and cloud solutions is great but corporations are reluctant to take chances with sensitive data. Those enterprises seeking stability in the growing hybrid cloud universe may find some assurance in relying on a mature, capable enterprise provider. Intrafind offers consultative solutions and reliable cloud solutions with secure access.

Jennifer Shockley, December 17, 2012

Sponsored by, developer of Augmentext
