Statisticians Weigh In on Big Data

September 5, 2011

The Joint Statistical Meetings, the largest gathering of statisticians in North America, provided fertile ground this summer for a survey by Revolution Analytics on the state of Big Data technologies. Revolution Analytics presents the results in “97 Percent of Data Scientists Say ‘Big Data’ Technology Solutions Need Improvement.”

As the headline suggests, the vast majority of these experts crave improvement in the field:

The survey revealed nearly 97 percent of data scientists believe big data technology solutions need improvement and the top three obstacles data scientists foresee when running analytics on Big Data are: complexity of big data solutions; difficulty of applying valid statistical models to the data; and having limited insight into the meaning of the data.

Results also show a lack of consensus on the definition of “Big Data.” Is the threshold a terabyte? Petabyte? Or does it vary by the job? No accepted standard exists.

Survey respondents were asked about their future use of five existing analytics platforms: SPSS, SAS, R, S+, and MATLAB. Most expected to increase their use of only one of these, the open source R project (a.k.a. GNU S).

Revolution Analytics bases its data management software and services on the R project. The company also sponsors Inside-R.org, a resource for the R project community. I’d have to see the survey questions to know whether that emphasis on R was skewed by the sponsor’s interests, but let’s give them the benefit of the doubt for now.

Cynthia Murrell, September 5, 2011

Sponsored by Pandia.com, publishers of The New Landscape of Enterprise Search

