Learn One of These Programming Languages, Crunch Big Data. Easy, Right?

October 3, 2015

I read a listicle called “Ten Top Languages for Crunching Big Data.” The list is interesting, but the underlying assumption about the languages and “crunching” Big Data is remarkable.

The core of the write up is a list of 10 programming languages which make it possible (maybe semi-easy) to “generate insights.” The list has some old, familiar programming languages; for example, SQL, or Structured Query Language. There’s SAS, the fave of graduate students in psychology. Some might argue that SPSS Clem is the way to chop Big Data down to size. There is a toolkit in the list too. Remember Matlab, which for a “student” is only $49. For the sportier crowd, I would add Mathematica to the list, but I don’t want to melt the listicle.

Also on the list are Python and R. Both get quite a bit of love from some interesting cyber OSINT outfits.

For fans of Java, the list points to Scala. The open source fan can use HiveQL, Julia, or Pig Latin.

The listicle includes a tip of the hat to Alphabet Google. According to the write up:

Go has been developed by Google and released under an open source license. Its syntax is based on C, meaning many programmers will be familiar with it, which has aided its adoption. Although not specifically designed for statistical computing, its speed and familiarity, along with the fact it can call routines written in other languages (such as Python) to handle functions it can’t cope with itself, means it is growing in popularity for data programming.

Yep, a goodie from the GOOG spells Big Data magic. For how long? Well, I don’t know.

However, the assumption from which the listicle hangs is that a programming language allows Big Data to be crunched.

Whoa, Nellie.

There may be a couple of “trivial” intermediary steps required. Let me mention one. The Big Data cruncher has to code up something to get useful outputs. Now that “code up” step may require some other bothersome tasks; for example, dealing with messy data to ensure that the garbage in, garbage out problem does not arise. The mathematically inclined may suggest that the coded up “script” actually work within available computer time and memory resources. Wow, that might make a script to crunch Big Data either not work or output results which are dead wrong. What if the script implements algorithmic bias?
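To make the point concrete, here is a minimal Python sketch of those “trivial” steps. Everything in it is my own illustration, not anything from the listicle: the function name `clean_mean`, the `value` column, and the sample rows are all hypothetical. It reads rows one at a time so memory use stays flat no matter how big the file is, and it counts the garbage it rejects instead of quietly averaging it in.

```python
import csv
import io
import math


def clean_mean(lines, column="value"):
    """Stream CSV rows and return (mean, kept, rejected) for one column.

    `lines` is any iterable of text lines (an open file works), so only
    one row is held in memory at a time.
    """
    total = 0.0
    kept = 0
    rejected = 0
    for row in csv.DictReader(lines):
        # Missing cells come back as None or ""; treat both as garbage.
        raw = (row.get(column) or "").strip()
        try:
            number = float(raw)
        except ValueError:
            rejected += 1
            continue
        # float() happily parses "nan" and "inf"; keep them out of the sum.
        if not math.isfinite(number):
            rejected += 1
            continue
        total += number
        kept += 1
    mean = total / kept if kept else None
    return mean, kept, rejected


# Hypothetical messy input: two usable values, three bad ones.
SAMPLE = "id,value\n1,10\n2,n/a\n3,30\n4,\n5,nan\n"

if __name__ == "__main__":
    mean, kept, rejected = clean_mean(io.StringIO(SAMPLE))
    print(f"mean={mean} kept={kept} rejected={rejected}")
```

The language is the easy part. Deciding what counts as garbage, and noticing that three of five rows were thrown away, is where the work is.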

Whoa, whoa, Nellie.

I know that programming languages are important. But, in my experience, some other tasks deserve attention too.

Stephen E Arnold, October 3, 2015
