Spark: An Easy Way to Burn Through Big Data?

November 14, 2017

I read “What is Apache Spark? The Big Data Analytics Platform Explained.” Interesting approach. The publishing outfit IDC seized upon the idea that the Wikipedia entry for Spark was not making the open source project easy enough to understand. I know that Wikipedia is chock full of craziness, but the Spark write up in the free encyclopedia struck me as reasonably good as far as Wikipedia content goes. There are code samples, links, and statements which balance the wonderfulness of open source with the grim realities of fiddling with the goodies the community provides. If I were a college professor (which I most certainly am not!), I would caution my students about applying the tenets of recycling to their class assignments. Apparently my old-fashioned ideas are irrelevant.

Let’s look at three points from the IDC “explainer” that I found intriguing:

Apache Spark is the leading platform for large-scale SQL, batch processing, stream processing, and machine learning

The statement seems to be factual. I would ask, from my shack in rural Kentucky, what data back up this claim. I hate to rain on everyone’s parade, but I was under the impression that the numero uno tool for wrestling with data was Excel. There are other software solutions which are popular among the crunching crowd; for example, the much loved SAS and SPSS systems. And there are others. Many others.

A second interesting statement warranted a blue circle on my printed copy of the article:

The second advantage is the developer-friendly Spark API. As important as Spark’s speed-up is, one could argue that the friendliness of the Spark API is even more important.

If I understand the title, the write up is about making Spark easy. The explanation of “easy” is to use the “developer-friendly Spark API.” Easy means friendly. Hmmm.

The third statement I noted was:

By providing bindings to popular languages for data analysis like Python and R, as well as the more enterprise-friendly Java and Scala, Apache Spark allows everybody from application developers to data scientists to harness its scalability and speed in an accessible manner.

It seems that “easy” means that one needs knowledge of specific programming languages. Yep, easy. For “everybody” too.

What a simple thing is Spark! I will stick with Wikipedia. Maybe IDC should too?

Stephen E. Arnold, November 14, 2017

Written by Stephen E. Arnold · Filed Under Big data, News



Stephen E. Arnold monitors search, content processing, text mining and related topics from his high-tech nerve center in rural Kentucky. He tries to winnow the goose feathers from the giblets. He works with colleagues worldwide to make this Web log useful to those who want to go "beyond search". Contact him at sa [at] arnoldit.com. His Web site with additional information about search is arnoldit.com.