Search Sink Hole Identified and Allegedly Paved and Converted to a Data Convenience Store
May 20, 2016
I try to avoid reading more than one write up a day about alleged revolutions in content processing and information analytics. My addled goose brain cannot cope with the endlessly recycled algorithms dressed up in Project Runway finery.
I read “Ryft: Bringing High Performance Analytics to Every Enterprise,” and I was pleased to see a couple of statements which resonated with my dim view of information access systems. There is an accompanying video in the write up. I, as you may know, gentle reader, am not into video. I prefer reading, which is the old fashioned way to suck up useful factoids.
Here’s the first passage I highlighted:
Any search tool can match an exact query to structured data—but only after all of the data is indexed. What happens when there are variations? What if the data is unstructured and there’s no time for indexing? [Emphasis added]
The answer to the question is increasing costs for sales and marketing. The early warning for amped up baloney are the presentations given at conferences and pumped out via public relations firms. (No, Buffy, no, Trent, I am not interested in speaking with the visionary CEO who hired you.)
I also highlighted:
With the power to complete fuzzy search 600X faster at scale, Ryft has opened up tremendous new possibilities for data-driven advances in every industry.”
I circled the 600X. Gentle reader, I struggle to comprehend a 600X increase in content processing. Dear Mother Google has invested to create a new chip to get around the limitations of our friend Von Neumann’s approach to executing instructions. I am not sure Mother Google has this nailed because Mother Google, like IBM, announces innovations without too much real world demonstration of the nifty “new” things.
I noted this statement too:
For the first time, you can conduct the most accurate fuzzy search and matching at the same speed as exact search without spending days or weeks indexing data.
Okay, this strikes me as a capability I would embrace if I could get over or around my skepticism. I was able to take a look at the “solution” which delivers the astounding performance and information access capability. Here’s an image from Ryft’s engineering professionals:
Notice that we have Spark and pre built components. I assume there are myriad other innovations at work.
The hitch in the git along is that in order to deal with certain real world information processing challenges, the inputs come from disparate systems, each generating substantial data flows in real time.
Here’s an example of a real world information access and understanding challenge, which, as far as I know, has not been solved in a cost effective, reliable, or usable manner.
Image source: Plugfest 2016 Unclassified.
This unclassified illustration makes clear that the little things in the sky pump out lots of data into operational theaters. Each stream of data must be normalized and then converted to actionable intelligence.
The assertion about 600X sounds tempting, but my hunch is that the latency in normalizing, transferring, and processing will not meet the need for real time, actionable, accurate outputs when someone is shooting at a person with a hardened laptop in a threat environment.
In short, perhaps the spark will ignite a fire of performance. But I have my doubts. Hey, that’s why I spend my time in rural Kentucky where reasonable people shoot squirrels with high power surplus military equipment.
Stephen E Arnold, May 20, 2016