How-To Overview of Building a Data Platform to Handle Real-Time Datasets
March 11, 2016
The Insight Data Engineering article “Building a Streaming Search Platform” offers a glimpse into the Fellows Program, in which graduate students and software engineers build data platforms and learn cutting-edge open source technologies. The article walks through the components of a platform that enables near-real-time search of a streaming text data source, using Twitter as the example. It also explores the usefulness of such a platform:
“On average, Twitter users worldwide generate about 6,000 tweets per second. Obviously, there is much interest in extracting real-time signal from this rich but noisy stream of data. More generally, there are many open and interesting problems in using high-velocity streaming text sources to track real-time events. … Such a platform can have many applications far beyond monitoring Twitter. … All code for the platform I describe here can be found on my github repository Straw.”
Ryan Walker, a Casetext Data Engineer, describes how such a platform might deliver major results in the hands of a skilled developer. He gives the example of a speech-to-text monitor that transcribes radio or TV feeds and sends the transcriptions to the platform. The platform would then search the transcriptions for key phrases and could even be set up to respond with real-time event management. Industries that depend on real-time information processing, including finance and marketing, are likely to find this capability especially appealing.
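The key-phrase monitoring idea can be illustrated with a minimal Python sketch. This is not the code from the Straw repository and does not use any real library's API; the names `StoredQueryMatcher`, `register`, `match`, and `monitor` are hypothetical, and the sample phrases and transcriptions are made up for illustration. It assumes each incoming transcription is a plain string and that a "key phrase" is an exact sequence of words.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class StoredQueryMatcher:
    """Hypothetical helper: stores key phrases and checks each document against them."""

    def __init__(self):
        # Index phrases by their first token so that each document token
        # is only compared against the phrases that could start there.
        self._by_first_token = defaultdict(list)

    def register(self, query_id, phrase):
        """Store a key phrase under an id chosen by the caller."""
        tokens = tokenize(phrase)
        if not tokens:
            raise ValueError("phrase must contain at least one word")
        self._by_first_token[tokens[0]].append((query_id, tokens))

    def match(self, document):
        """Return the sorted ids of all stored phrases that occur in the document."""
        tokens = tokenize(document)
        hits = set()
        for i, token in enumerate(tokens):
            for query_id, phrase in self._by_first_token.get(token, ()):
                if tokens[i:i + len(phrase)] == phrase:
                    hits.add(query_id)
        return sorted(hits)


def monitor(stream, matcher, on_match):
    """Run every document in the stream past the stored phrases and report hits."""
    for document in stream:
        for query_id in matcher.match(document):
            on_match(query_id, document)


if __name__ == "__main__":
    matcher = StoredQueryMatcher()
    matcher.register("rates", "interest rates")      # made-up phrase
    matcher.register("recall", "product recall")     # made-up phrase

    transcriptions = [
        "The central bank held interest rates steady today.",
        "Sunny weather is expected this weekend.",
    ]
    monitor(transcriptions, matcher, lambda qid, doc: print(qid, "->", doc))
```

The design point is that the phrases are stored ahead of time and each new document is checked against them as it arrives, which is the reverse of a conventional search engine that stores documents and runs queries on demand. A production system would distribute this work and use a real search library rather than exact token comparison.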
Chelsea Kerwin, March 11, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
Comments
One Response to “How-To Overview of Building a Data Platform to Handle Real-Time Datasets”
Part of the system Ryan developed is built on our Luwak library, which is based on Apache Lucene and implements a high-performance stored search engine. Luwak is also being used by Bloomberg for their very high-traffic financial news monitoring system: http://www.flax.co.uk/blog/2016/03/08/helping-bloomberg-build-real-time-news-search-engine/