Featured

Autumn Approaches: Time for Realism about Search

Last week I had a conversation with a publisher who has a keen interest in software that “knows” what content means. Armed with that knowledge, a system can then answer questions.

The conversation was interesting. I mentioned my presentations for law enforcement and intelligence professionals about the limitations of modern and computationally expensive systems.

Several points crystallized in my mind. One of these is addressed, in part, in a diagram created by people interested in machine learning methods. Here’s the diagram, from the scikit-learn project:

[Image: scikit-learn flowchart for choosing an estimator]

The diagram is designed to help a developer select from different methods of performing estimation operations. The author states:

Often the hardest part of solving a machine learning problem can be finding the right estimator for the job. Different estimators are better suited for different types of data and different problems. The flowchart below is designed to give users a bit of a rough guide on how to approach problems with regard to which estimators to try on your data.

First, notice that there is a selection process for choosing a particular numerical recipe. Now who determines which recipe is the right one? The answer is the coding chef. A human exercises judgment about a particular sequence of operations that will be used to fuel machine learning. Is that sequence of actions the best one, the expedient one, or the one that seems to work for the test data? The answer to these questions determines a key threshold for the resulting “learning system.” Stated another way, “Does the person licensing the system know if the numerical recipe is the most appropriate for the licensee’s data?” Nah. Does a mid-tier consulting firm like Gartner, IDC, or Forrester dig into this plumbing? Nah. Does it matter? Oh, yeah.

As I point out in my lectures, the “accuracy” of a system’s output depends on this type of plumbing decision. Unlike a backed-up drain, flaws in smart systems may never be discerned. For certain operational decisions, financial shortfalls or the loss of an operations team in a war theater can be attributed to one of many variables. As decision makers chase the Silver Bullet of smart, thinking software, who really questions the output in a slick graphic? In my experience, darned few people. That includes cheerleaders for smart software, azure chip consultants, and former middle school teachers looking for a job as a search consultant.
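To make the plumbing point concrete, here is a minimal sketch in plain Python. It does not use scikit-learn, and it is not taken from the flowchart. The two toy "recipes" (a nearest-centroid rule and a nearest-neighbor rule), the one-dimensional data, and the labeling rule are all invented for illustration. The only thing it is meant to show is that two reasonable-looking recipes can score very differently on the same data.

```python
# Toy illustration only: the data, the labeling rule, and both "recipes"
# are hypothetical stand-ins, not any vendor's or library's method.

def nearest_centroid_predict(train, x):
    """Assign x to the class whose mean training value is closest."""
    sums, counts = {}, {}
    for value, label in train:
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    centroids = {label: sums[label] / counts[label] for label in sums}
    return min(centroids, key=lambda label: abs(x - centroids[label]))


def nearest_neighbor_predict(train, x):
    """Assign x the label of the single closest training point."""
    _, label = min(train, key=lambda pair: abs(x - pair[0]))
    return label


def accuracy(predict, train, test):
    """Fraction of test points for which the recipe returns the true label."""
    hits = sum(1 for x, label in test if predict(train, x) == label)
    return hits / len(test)


def true_label(x):
    """Invented ground truth: class 1 is a band in the middle of the range."""
    return 1 if 2.5 <= x < 5.5 else 0


TRAIN = [(float(k), true_label(k)) for k in range(10)]
TEST = [(k + 0.2, true_label(k + 0.2)) for k in range(10)]

centroid_accuracy = accuracy(nearest_centroid_predict, TRAIN, TEST)
neighbor_accuracy = accuracy(nearest_neighbor_predict, TRAIN, TEST)

print(f"nearest centroid: {centroid_accuracy:.1f}")
print(f"nearest neighbor: {neighbor_accuracy:.1f}")
```

On this invented data the nearest-centroid rule gets 6 of 10 test points right and the nearest-neighbor rule gets all 10. Neither recipe is "wrong" in general. The gap comes from which one happens to fit the shape of the data, which is exactly the decision a licensee rarely gets to inspect.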

Second, notice the reference to a “rough guide.” The real guide is an understanding of how specific numerical recipes work on a set of data that allegedly represents what the system will process when operational. Furthermore, there are plenty of mathematical methods available. The problem is that some of the more interesting procedures lead to increased computational cost. In a worst case, the more interesting procedures cannot be computed on available resources. Some developers know about P versus NP and Big O. Others know to use the same nine or ten mathematical procedures taught in computer science classes. After all, why worry about math based on mereology if the machine resources cannot handle the computations within time and budget parameters? This means that most modern systems are based on a set of procedures that are computationally affordable, familiar, and convenient. Does this similarity of procedures matter? Yep. The generally squirrelly outputs from many very popular systems are perceived as completely reliable. Unfortunately, the systems are performing within a narrow range of statistical confidence. Stated more harshly, the outputs are just not particularly helpful.
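The cost argument can be sketched with simple arithmetic. The snippet below is a rough illustration, not a benchmark: the growth classes are the textbook Big O examples, and the operation budget of one billion is a hypothetical figure chosen only to show how quickly the "interesting" procedures fall out of reach as the data grows.

```python
import math

# Hypothetical budget: how many basic operations we can afford to run.
BUDGET_OPS = 10 ** 9


def operation_counts(n):
    """Approximate operation counts for common growth classes at input size n."""
    return {
        "O(n)": n,
        "O(n log n)": int(n * math.log2(n)),
        "O(n^2)": n ** 2,
        "O(2^n)": 2 ** n,
    }


def affordable(n, budget_ops=BUDGET_OPS):
    """Names of the growth classes whose operation count fits within the budget."""
    return [name for name, ops in operation_counts(n).items() if ops <= budget_ops]


for size in (1_000, 100_000):
    print(size, affordable(size))
```

With a thousand items, everything except the exponential procedure fits the budget. At a hundred thousand items the quadratic procedure drops out too, leaving only the cheap and familiar ones.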

In my conversation with the publisher, I asked several questions:

  1. Is there a smart system like Watson that you would rely upon to treat your teenaged daughter’s cancer? Or, would you prefer the human specialist at the Mayo Clinic or comparable institution?
  2. Is there a smart system that you want directing your only son in an operational mission in a conflict in a city under ISIS control? Or, would you prefer the human-guided decision near the theater about the mission?
  3. Is there a smart system you want managing your retirement funds in today’s uncertain economy? Or, would you prefer the recommendations of a certified financial planner relying on a variety of inputs, including analyses from specialists in whom your analyst has confidence?

When I asked these questions, the publisher looked uncomfortable. The reason is that the massive hyperbole and marketing craziness about fancy new systems creates what I call the Star Trek phenomenon. People watch Captain Kirk talking to devices, transporting himself from danger, and traveling between far flung galaxies. Because a mobile phone performs some of the functions of the fictional communicator, it sure seems as if many other flashy sci-fi services should be available.

Well, this Star Trek phenomenon does help direct some research. But in terms of products that can be used in high risk environments, the sci-fi remains a fiction.

Believing and expecting are different from working with products that are limited by computational resources, expertise, and informed understanding of key factors.

Humans, particularly those who need money to pay the mortgage, ignore reality. The objective is to close a deal. When it comes to information retrieval and content processing, today’s systems are marginally better than those available five or ten years ago. In some cases, today’s systems are less useful.


Interviews

Elasticsearch: A Platform for Third Party Revenue

Making money from search and content processing is difficult. One company has made a breakthrough. You can learn how Mark Brandon, one of the founders of QBox, is using the darling of the open source search world to craft a robust findability business.

I interviewed Mr. Brandon, a graduate of the University of Texas at Austin, shortly after my return from a short trip to Europe. Compared with the state of European search businesses, Elasticsearch and QBox are on to what diamond miners call a “pipe.”

In the interview, which is part of the Search Wizards Speak series, Mr. Brandon said:

We offer solutions that work and deliver the benefits of open source technology in a cost-effective way. Customers are looking for search solutions that actually work.

Simple enough, but I have ample evidence that dozens and dozens of search and content processing vendors are unable to generate sufficient revenue to stay in business. Many well-known firms would go belly up without continual infusions of cash from addled folks with little knowledge of search’s history and a severe case of spreadsheet fever.

Qbox’s approach pivots on Elasticsearch. Mr. Brandon said:

When our previous search product proved to be too cumbersome, we looked for an alternative to our initial system. We tested Elasticsearch and built a cluster of Elasticsearch servers. We could tell immediately that the Elasticsearch system was fast, stable, and customizable. But we love the technology because of its built-in distributed nature, and we felt like there was room for a hosted provider, just as Cloudant is for CouchDB, Mongolab and MongoHQ are for MongoDB, Redis Labs is for Redis, and so on. Qbox is a strong advocate for Elasticsearch because we can tailor the system to customer requirements, confident the system makes information more findable for users.

When I asked where Mr. Brandon’s vision for functional findability came from, he told me about an experience he had at Oracle. Oracle owns numerous search systems, ranging from the late 1980s Artificial Linguistics’ system to somewhat newer systems like the late 1990s Endeca system, and the newer technologies from Triple Hop. Combine these with the SES technology and the hybrid InQuira formed from two faltering NLP systems, and Oracle has some hefty investments.

Here’s Mr. Brandon’s moment of insight:

During my first week at Oracle, I asked one of my colleagues if they could share with me the names of the middleware buyer contacts at my 50 or so named accounts. One colleague said, “certainly”, and moments later an Excel spreadsheet popped into my inbox. I was stunned. I asked him if he was aware that “Excel is a Microsoft technology and we are Oracle.” He said, “Yes, of course.” I responded, “Why don’t you just share it with me in the CRM System?” (the CRM was, of course, Siebel, an Oracle product). He chortled and said, “Nobody uses the CRM here.” My head exploded. I gathered my wits to reply back, “Let me get this straight. We make the CRM software and we sell it to others. Are you telling me we don’t use it in-house?” He shot back, “It’s slow and unusable, so nobody uses it.” As it turned out, with around 10 million corporate clients and about 50 million individual names, if I had to filter for “just middleware buyers”, “just at my accounts”, “in the Northeast”, I could literally go get a cup of coffee and come back before the query was finished. If I added a fourth facet, forget it. The CRM system would crash. If it is that bad at one of the world’s biggest software companies, how bad is it throughout the enterprise?
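The query Mr. Brandon describes is a multi-facet filter. As a rough sketch of why a search-style index handles this better than scanning every record, the Python below builds a tiny inverted index keyed by facet value and answers a query by intersecting sets of record IDs. This is a toy under stated assumptions: the contact records, field names, and function names are invented, and it is not a description of how Siebel, Elasticsearch, or Qbox actually work.

```python
from collections import defaultdict

# Hypothetical contact records; every field and value here is invented.
CONTACTS = [
    {"id": 1, "role": "middleware buyer", "account": "Acme", "region": "Northeast"},
    {"id": 2, "role": "middleware buyer", "account": "Acme", "region": "West"},
    {"id": 3, "role": "database buyer", "account": "Acme", "region": "Northeast"},
    {"id": 4, "role": "middleware buyer", "account": "Globex", "region": "Northeast"},
]


def build_facet_index(records, fields):
    """Map each (field, value) pair to the set of record IDs that carry it."""
    index = defaultdict(set)
    for record in records:
        for field in fields:
            index[(field, record[field])].add(record["id"])
    return index


def facet_query(index, **facets):
    """Return sorted IDs of records matching every requested facet value."""
    id_sets = [index.get((field, value), set()) for field, value in facets.items()]
    if not id_sets:
        return []
    return sorted(set.intersection(*id_sets))


INDEX = build_facet_index(CONTACTS, ["role", "account", "region"])

print(facet_query(INDEX, role="middleware buyer", account="Acme", region="Northeast"))
```

Adding a fourth facet in this scheme means one more set intersection rather than another pass over every record, which is the general idea behind faceted search, whatever a given product does under the hood.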

You can read the full interview at http://bit.ly/1mADZ29. Information about QBox is at www.qbox.com.

Stephen E Arnold, July 2, 2014

Latest News

Cloud Based Mathematica from Wolfram

I read “Launching Today: Mathematica Online.” The interface is similar to the desktop application. The benefits of having the Mathematica tool accessible on non...

September 16, 2014

Xooglers Craft a New Search Engine

If you are interested in searching for off color content, you will be thrilled to learn that Boodigo awaits your input. Gizmodo reports that the system surfaces...

September 16, 2014

Alternatives to Windows Search

For some common searches, Windows’ built-in desktop search function works just fine. Other times, though, our hard-drive hunts call for something more. Reporter...

September 16, 2014

Android in the Enterprise

Can a deal with HP help Google’s Android catch up to Apple’s iOS in the business market? According to Business Insider, “Google Is Chasing Apple’s Mega Deal...

September 16, 2014

SharePoint Simplifies Cloud Use with Better Updates

The next major update to SharePoint functionality will not occur until sometime in 2015, but for now users can get the most function out of their current implementation...

September 16, 2014

Autonomy Technology a Good Buy Says HP Big Dog Isherwood

If you follow the HP Autonomy firefights, you will enjoy “Autonomy Deal Fallout ‘More Extreme’ Than Hoped, says HP’s UK boss Andy Isherwood: In spite...

September 15, 2014

Short Honk: Goggle Intervention-Objectivity?

Short honk: Navigate to “How Google’s Autonomous Car Passed the First U.S. State Self-Driving Test.” Do you find this statement interesting? Google chose the...

September 15, 2014

Bing Can Now Converse

Microsoft’s Bing is spinning, I mean, sporting a nifty new feature; The Next Web reveals, “Microsoft Updates Bing’s Conversational Understanding to Let You...

September 15, 2014

Feds Warned to Sweat the Small Stuff When Considering Big Data Solutions

Say, here’s a thought: After spending billions for big-data software, federal managers are being advised to do their research before investing in solutions. We...

September 15, 2014

HP, Bribes, and an Autonomy Flip

Try as I might, I cannot avoid learning about Hewlett Packard. For a $100 billion outfit, the flow of information is not overwhelmingly positive. Earlier today,...

September 13, 2014