Can Analytics Be Cloud Friendly?
August 24, 2016
One problem with storing data in the cloud is that it is difficult to run analytics on it. Sure, you can measure how the cloud service itself is being used, but analyzing the data stored there is another story. Developers have been looking for a solution to this problem, and the open source community has produced some software that might be the ticket. Ideata wrote about one such Apache project in “Apache Spark-Comparing RDD, Dataframe, and Dataset.”
Ideata is a data software company that built many of its headlining products on the open source software Apache Spark. The company has used Apache Spark since 2013 and likes it because it offers a rich abstraction, lets developers build complex workflows, and makes data analysis easy.
Apache Spark works like this:
Spark revolves around the concept of a resilient distributed dataset (RDD), which is a fault-tolerant collection of elements that can be operated on in parallel. An RDD is Spark’s representation of a set of data, spread across multiple machines in the cluster, with API to let you act on it. An RDD could come from any datasource, e.g. text files, a database via JDBC, etc. and can easily handle data with no predefined structure.
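To make that description more concrete, here is a minimal sketch in plain Python. It is not Spark's API: the ToyRDD class and its methods are hypothetical names invented for this illustration, and threads on one machine stand in for the machines of a cluster. It only shows the general idea of a collection split into partitions, transformed in parallel, and rebuilt from a recorded chain of steps when needed.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


class ToyRDD:
    """Toy stand-in for an RDD (hypothetical, not Spark's API)."""

    def __init__(self, partitions, lineage=()):
        # Source data, one list per simulated "machine".
        self._partitions = [list(p) for p in partitions]
        # Recorded transformations, applied in order when results are needed.
        self._lineage = tuple(lineage)

    @classmethod
    def parallelize(cls, data, num_partitions=2):
        data = list(data)
        # Split the data into contiguous chunks (ceiling division for the size).
        size = max(1, -(-len(data) // num_partitions))
        return cls([data[i:i + size] for i in range(0, len(data), size)])

    def map(self, fn):
        # Lazy: record the step instead of running it right away.
        return ToyRDD(self._partitions, self._lineage + (fn,))

    def _compute_partition(self, part):
        # Replay the recorded steps on one partition. Because the steps are
        # kept, a lost result can be recomputed from the source data.
        for fn in self._lineage:
            part = [fn(x) for x in part]
        return part

    def collect(self):
        # Process the partitions in parallel, then merge them in order.
        with ThreadPoolExecutor() as pool:
            results = pool.map(self._compute_partition, self._partitions)
        return [x for part in results for x in part]

    def reduce(self, fn):
        return reduce(fn, self.collect())


# Unstructured input: raw lines of text with no predefined schema.
lines = ToyRDD.parallelize(
    ["spark makes rdds", "rdds are resilient", "and distributed"],
    num_partitions=2,
)
word_counts = lines.map(str.split).map(len)
print(word_counts.collect())                   # [3, 3, 2]
print(word_counts.reduce(lambda a, b: a + b))  # 8
```

In this sketch nothing runs until collect or reduce is called, and every call replays the recorded steps from the source partitions. That replay is a simplified picture of how keeping a record of the transformations lets a system recover a lost piece of data instead of storing every intermediate copy.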
Spark can be used as the basis for a user-friendly cloud analytics platform, especially if you are familiar with what can go wrong with a dataset.