Digital Reasoning: A New Generation of Big Data Tools

December 31, 2011

I read “Tool Detects Patterns Hidden in Vast Data Sets.” The Broad Institute’s Web site reported that a group of researchers in the US and Israel “have developed a tool that can tackle large data sets in a way that no other software program can.”

What seems exciting to me is that the mathematical procedure, which involves creating a space and grids into which certain discerned patterns are placed, provides a fascinating potential enhancement to companies like ours, Digital Reasoning. Our proprietary methods have performed similar associative analytics in order to reduce the uncertainty associated with processing large flows of data and distilling meaningful relationships from them. Some day computers and associated systems will be able to cope with exabytes of data from the Internet of things. Today, the Broad Institute validates the next-generation numerical methods that its researchers, Digital Reasoning’s engineers, and a handful of other organizations have been exploring.

The technical information about the method, which is called MIC, shorthand for Maximal Information Coefficient, is available to members of the AAAS. To get a copy of the original paper and its mathematical exegesis you will want the full bibliographic information:

“Detecting Novel Associations in Large Data Sets” by David N. Reshef, Yakir A. Reshef, Hilary K. Finucane, Sharon R. Grossman, Gilean McVean, Peter J. Turnbaugh, Eric S. Lander, Michael Mitzenmacher, and Pardis C. Sabeti, Science, 16 December 2011, Volume 334, Number 6062, pages 1518-1524.

The core of the authors’ work is:

Identifying interesting relationships between pairs of variables in large data sets is increasingly important. Here, we present a measure of dependence for two-variable relationships: the maximal information coefficient (MIC). MIC captures a wide range of associations both functional and not, and for functional relationships provides a score that roughly equals the coefficient of determination (R²) of the data relative to the regression function. MIC belongs to a larger class of maximal information-based nonparametric exploration (MINE) statistics for identifying and classifying relationships. We apply MIC and MINE to data sets in global health, gene expression, major-league baseball, and the human gut microbiota and identify known and novel relationships.
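To make the grid idea concrete, here is a minimal sketch of a MIC-style score. It is not the authors' algorithm and not Digital Reasoning's method. The published approach searches for the best grid partitions, while this simplification only tries equal-frequency grids. The function name approx_mic, the helper names, and the grid budget of n to the power 0.6 are assumptions chosen for illustration. For each grid size, the sketch computes the mutual information between the binned variables, divides by the log of the smaller grid dimension so the score falls between 0 and 1, and keeps the highest value.

```python
import math
from collections import Counter


def _equal_frequency_bins(values, k):
    """Assign each value to one of k bins holding roughly equal counts.

    Assumes mostly distinct values; heavy ties are split arbitrarily by rank.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    bins = [0] * n
    for rank, i in enumerate(order):
        bins[i] = rank * k // n
    return bins


def _mutual_information(bx, by):
    """Mutual information (in nats) between two equal-length bin label lists."""
    n = len(bx)
    joint = Counter(zip(bx, by))
    px = Counter(bx)
    py = Counter(by)
    mi = 0.0
    for (a, b), count in joint.items():
        mi += (count / n) * math.log((count * n) / (px[a] * py[b]))
    return mi


def approx_mic(xs, ys, exponent=0.6):
    """Hypothetical, simplified MIC-style score between 0 and 1.

    Tries every kx-by-ky equal-frequency grid with kx * ky <= n ** exponent
    (the budget is an illustrative assumption) and returns the largest
    normalized mutual information found.
    """
    n = len(xs)
    if n != len(ys) or n < 4:
        raise ValueError("need two equal-length sequences of at least 4 points")
    budget = n ** exponent
    best = 0.0
    kx = 2
    while kx * 2 <= budget:
        bx = _equal_frequency_bins(xs, kx)
        ky = 2
        while kx * ky <= budget:
            by = _equal_frequency_bins(ys, ky)
            score = _mutual_information(bx, by) / math.log(min(kx, ky))
            best = max(best, score)
            ky += 1
        kx += 1
    return best


if __name__ == "__main__":
    import random

    rng = random.Random(0)
    xs = [i / 199 for i in range(200)]
    print("line     ", approx_mic(xs, [2 * x + 1 for x in xs]))
    print("parabola ", approx_mic(xs, [(x - 0.5) ** 2 for x in xs]))
    print("noise    ", approx_mic(xs, [rng.random() for _ in xs]))
```

The point of the sketch is the one the abstract makes: a straight line and a parabola both score high, even though an ordinary correlation coefficient would miss the parabola, while unrelated noise scores much lower.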

Digital Reasoning’s application of similar mathematical methods underpins our entity-oriented analytics. You can read more about our methods in our description of Synthesys, a platform for performing automated understanding of the meaning of Big Data in real time.

The significance of this paper is that it shines a spotlight on the increasing importance of research into applications of next-generation numerical methods. Public discussion of methods like MIC will serve to accelerate innovation and the diffusion of knowledge. At Digital Reasoning we see this as further evidence of the potential of algorithmic, unaided approaches like ours to achieve true “automated understanding” of all forms of text regardless of volume, velocity or variety. As we shift to IPv6, the “Internet of things” will dramatically increase the flows of real-time data. With automobiles and consumer devices transmitting data continuously or on demand, the digital methods of ten or even five years ago fall short.

Three other consequences of MIC-style innovations will accrue:

First, at Digital Reasoning, we will be able to enhance our existing methods with the new insights, forming partnerships and investing in research that turns demonstrations into solutions for real-world problems. The confidence shown by SilverLake partners’ investment in Digital Reasoning has provided us with capital to extend our commercial system quickly and in new directions such as financial services, health care, legal, and other verticals.

Second, we see the MIC method fueling additional research into methods that make Big Data more accessible and useful; that is, research that consumerizes applications for problems that today lack solutions. Big Data will eventually be part of a standard information process, not something discussed as “new” and “unusual.”

Third, greater awareness of the contribution of mathematics will, I believe, stimulate young men and women to make mathematics and statistics a career. With more talent entering the workforce, the pace of innovation and integration will accelerate. That’s good for many companies, not just Digital Reasoning.

Kudos to the MIC team. What’s next?

Tim Estes, December 31, 2011

Sponsored by Pandia.com

