Spark: An Easy Way to Burn Through Big Data?

November 14, 2017

I read “What is Apache Spark? The Big Data Analytics Platform Explained.” Interesting approach. The publishing outfit IDC seized upon the idea that the Wikipedia entry for Spark was not making the open source project easy enough to understand. I know that Wikipedia is chock full of craziness, but the Spark write up in the free encyclopedia struck me as reasonably good as far as Wikipedia content goes. There are code samples, links, and statements which balance the wonderfulness of open source with the grim realities of fiddling with the goodies the community provides. If I were a college professor (which I most certainly am not!), I would caution my students about applying the tenets of recycling to their class assignments. Apparently the old fashioned ideas I have are irrelevant.

Let’s look at three points from the IDC “explainer” that I found intriguing:

Apache Spark is the leading platform for large-scale SQL, batch processing, stream processing, and machine learning

The statement seems to be factual. I would ask, from my shack in rural Kentucky, what is the source of data backing up this claim. I hate to rain on everyone’s parade, but I was under the impression that the numero uno tool for wrestling with data was Excel. There are some software solutions which are popular among the crunching crowd; for example, the much loved SAS and SPSS systems. And there are others. Many others.

A second interesting statement warranted a blue circle on my printed copy of the article:

The second advantage is the developer-friendly Spark API. As important as Spark’s speed-up is, one could argue that the friendliness of the Spark API is even more important.

If I understand the title, the write up is about making Spark easy. The explanation of “easy” is to use the “developer-friendly Spark API.” Easy means friendly. Hmmm.

The third statement I noted was:

By providing bindings to popular languages for data analysis like Python and R, as well as the more enterprise-friendly Java and Scala, Apache Spark allows everybody from application developers to data scientists to harness its scalability and speed in an accessible manner.

It seems that “easy” means that one needs knowledge of specific programming languages. Yep, easy. For “everybody” too.

What a simple thing is Spark! I will stick with Wikipedia. Maybe IDC should too?

Stephen E Arnold, November 14, 2017

Big Data Less Accessible for Small and Mid-Size Businesses

October 31, 2017

Even as the term “Big Data” grows stale, small and medium-sized businesses (SMB’s) are being left behind in today’s data-driven business world. The SmartData Collective examines the issue in, “Is Complexity Strangling the Real-World Benefits of Big Data for SMB’s?” Writer Rehan Ijaz supplies this example:

Imagine a local restaurant chain fighting to keep the doors open as a national competitor moves into town. The national competitor will already have a competent Cloud Data Manager (CDM) in place to provide insight into what should be offered to customers, based on their past interactions. A multi-million-dollar technology is affordable, due to scale, for a national chain. The same can’t be said for a smaller, mom and pop type restaurant. They’ve relied on their gut instinct and hometown roots to get them this far, but it may not be enough in the age of Big Data. Large companies are using their financial muscle to get information from large data sets, and take targeted action to outmaneuver local competitors.

Pointing to an article from Forbes, Ijaz observes that the main barrier for these more modestly-sized enterprises is not any hesitation about the technology itself, but rather a personnel issue: their existing marketing employees were not hired for their IT prowess, and even the most valuable data requires analysis to be useful. Few SMB’s are eager to embrace the cost and disruption of hiring data scientists and reorganizing their marketing teams; they have to be sure it will be worth the trouble.

Ijaz hopes that the recent increase in scalable, cloud-based analysis solutions will help SMB’s with these challenges. The question is, he notes, whether it is too late for many SMB’s to recover from their late foray into Big Data.

Cynthia Murrell, October 31, 2017

A Handy Collection of References on AI Topics

October 24, 2017

Ever wish there were a centralized resource with all you need to know about AI, clearly presented? If so, check out the “Cheat Sheets for AI, Neural Networks, Machine Learning, Deep Learning & Big Data” at Becoming Human. Chatbot pro Stefan Kojouharov shares his selections of graphic aids and includes a summary list of links at the end. He briefly introduces his assemblage:

Over the past few months, I have been collecting AI cheat sheets. From time to time I share them with friends and colleagues and recently I have been getting asked a lot, so I decided to organize and share the entire collection. To make things more interesting and give context, I added descriptions and/or excerpts for each major topic. This is the most complete list and the Big-O is at the very end, enjoy…

The offerings begin with illustrations of neural networks and machine learning in general, then progress to detailed information on relevant software, like Python for Data Science and TensorFlow, and topics like data wrangling and data visualization. As promised, graphics on Big-O notation conclude the presentation. This is a page to bookmark; it could save some time hunting for the right resource down the line, if not today.

Cynthia Murrell, October 24, 2017

Big Data Might Just Help You See Through Walls

October 18, 2017

It might sound like science fiction or, worse, like a waste of time, but scientists are developing cameras that can see around corners. More importantly, these visual aids will fill in our human blind spots. According to an article in MIT News, “An Algorithm For Your Blind Spot,” it may have a lot of uses, but needs some serious help from big data and search.

According to the piece about the algorithm, “CornerCameras,”

CornerCameras generates one-dimensional images of the hidden scene. A single image isn’t particularly useful since it contains a fair amount of “noisy” data. But by observing the scene over several seconds and stitching together dozens of distinct images, the system can distinguish distinct objects in motion and determine their speed and trajectory.
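For readers wondering why dozens of noisy images beat a single one, here is a minimal sketch. It is not the CornerCameras algorithm; it is plain averaging of repeated noisy readings of a static, made-up one-dimensional scene. The scene values, the noise level, and the function names are all assumptions chosen for illustration.

```python
import math
import random


def rmse(estimate, truth):
    """Root-mean-square error between an estimate and the true signal."""
    squared = sum((e - t) ** 2 for e, t in zip(estimate, truth))
    return math.sqrt(squared / len(truth))


def noisy_frame(truth, sigma, rng):
    """One hypothetical 1-D image: the true scene plus Gaussian noise."""
    return [t + rng.gauss(0.0, sigma) for t in truth]


def average_frames(frames):
    """Combine frames by averaging each position across all frames."""
    count = len(frames)
    return [sum(column) / count for column in zip(*frames)]


rng = random.Random(0)             # fixed seed so runs are repeatable
truth = [0.0] * 16 + [1.0] * 16    # made-up scene: dark half, bright half
frames = [noisy_frame(truth, 1.0, rng) for _ in range(64)]

single_error = rmse(frames[0], truth)
averaged_error = rmse(average_frames(frames), truth)

print("error of one frame:      ", round(single_error, 3))
print("error of 64-frame average:", round(averaged_error, 3))
```

The averaged estimate should land much closer to the true scene than any single frame, which is the general reason watching for several seconds helps. The real system goes further and tracks objects in motion, which this sketch does not attempt.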

Seems like a pretty neat tool. Especially, when you consider that this algorithm could help firefighters find people in burning buildings or help bus drivers spot a child running onto the street. However, it is far from perfect.

The system still has some limitations. For obvious reasons, it doesn’t work if there’s no light in the scene, and can have issues if there’s low light in the hidden scene itself. It also can get tripped up if light conditions change, like if the scene is outdoors and clouds are constantly moving across the sun. With smartphone-quality cameras the signal also gets weaker as you get farther away from the corner.

Seems like they have a brilliant idea in need of a big data boost. We can envision a world where these folks partner with big data and search giants to help fill in the gaps of the algorithm and provide a powerful tool that can save lives. Here’s to hoping we’re not the only ones making that connection.

Patrick Roland, October 18, 2017

Artificial Intelligence Will Make Humans Smarter at Work

October 17, 2017

Virtually no industry has been untouched by the past decade’s advances in artificial intelligence. We could go on and make a laundry list of which businesses in particular, but we have a hunch you are very close to one right now. According to a recent Enterprise CIO article, “From Buzzword to Boardroom: What’s Next for Machine Learning?” human intelligence is becoming obsolete in certain fields.

As demonstrated in previous experiments, no human brain is able to process as much data at comparable speed and accuracy as machine-learning systems can and as a result, deliver a sound, data-based result within nanoseconds.

While that should make you sit up and take notice, the article is not as apocalyptic as that quote might lead you to believe. In fact, there is a silver lining in all this AI. We humans will just have to work hard to get there. The story continues:

It must also leave room for creativity and innovation. Insights and suggestions gained with the aid of artificial intelligence should stimulate, not limit. Ultimately, real creativity and genuine lateral thinking still comes from humans.

We have to agree with this optimistic line of thinking. These machines are not exactly stealing our jobs, but forcing humans to reevaluate their roles. If you can properly combine AI, big data, and search in your role, chances are you will become invaluable instead of obsolete.

Patrick Roland, October 17, 2017

CEOs AI Hyped but Not Many Deploy It

October 17, 2017

How long ago was big data the popular buzzword? It was not that long ago, but now it has been replaced with artificial intelligence and machine learning. Whenever a buzzword is popular, CEOs and other leaders become obsessed with implementing it within their own organizations. Fortune opens up about the truth of artificial intelligence and its real deployment in the editorial, “The Hype Gap In AI”.

Organization leaders have high expectations for artificial intelligence, but the reality is well below them.  According to a survey cited in the editorial, 85% of executives believe that AI will change their organizations for the better, but only one in five executives have actually implemented AI into any part of their organizations.  Only 39% actually have an AI strategy plan.

Hype about AI and its potential is all over the business sector, but very few really understand the current capabilities.  Even fewer know how they can actually use it:

But actual adoption of AI remains at a very early stage. The study finds only about 19% of companies both understand and have adopted AI; the rest are in various stages of investigation, experimentation, and watchful waiting. The biggest obstacle they face? A lack of understanding —about how to adapt their data for algorithmic training, about how to alter their business models to take advantage of AI, and about how to train their workforces for use of AI.

Organizations view AI as an end-all solution, similar to how big data was the end all solution a few years ago.  What is even worse is that while big data may have had its difficulties, understanding it was simpler than understanding AI.  The way executives believe AI will transform their companies is akin to a science fiction solution that is still very much in the realm of the imagination.

Whitney Grace, October 17, 2017

Big Data and Big Money Are on a Collision Course

October 16, 2017

A recent Forbes article has started us thinking about the similarities between long-haul truckers and Wall Street traders. Really! The editorial penned by JP Morgan, “Informing Investment Decisions Using Machine Learning and Artificial Intelligence,” showcases the many ways in which investing is about to be overrun with big data machines. Depending on your stance, it is either thrilling or frightening.

The story claims:

Big data and machine learning have the potential to profoundly change the investment landscape. As the quantity and the access to data available have grown, many investors continue to evaluate how they can leverage data analysis to make more informed investment decisions. Investment managers who are willing to learn and to adopt new technologies will likely have an edge.

Sounds an awful lot like the news we have been reading recently about how almost two million truck drivers could be out of work in the next decade thanks to self-driving cars. If you have money in trucking, the amount saved is amazing, but if that’s how you make your living, things have suddenly become chilly. Sounds like the future of Wall Street, according to this story.

It continues:

Big data and machine learning strategies are already eroding some of the advantage of fundamental analysts, equity long-short managers and macro investors, and systematic strategies will increasingly adopt machine learning tools and methods.

If you ask us, it’s not a matter of if but when. Nobody wants to lose their job due to efficiency, but it’s pretty much impossible to stop. Money talks and saving money talks loudest to companies and business owners, like investment firms.

Patrick Roland, October 16, 2017

Blockchain Quote to Note: The Value of Big Data as an Efficient Error Reducer

September 6, 2017

I read “Blockchains for Artificial Intelligence: From Decentralized Model Exchanges to Model Audit Trails.” The foundation of the write up is that blockchain technology can be used to bring more control to data and models. The idea is an interesting one. I spotted a passage tucked into the lower 20 percent of the article which I judged to be a quote to note. Here’s the passage I highlighted:

as you added more data — not just a bit more data but orders of magnitude more data — and kept the algorithms the same, then the error rates kept going down, by a lot. By the time the datasets were three orders of magnitude larger, error was less than 5%. In many domains, there’s a world of difference between 18% and 5%, because only the latter is good enough for real-world application. Moreover, the best-performing algorithms were the simplest; and the worst algorithm was the fanciest. Boring old perceptrons from the 1950s were beating state-of-the-art techniques.

Bayesian methods date from the 18th century and work well. Despite Laplacian and Markovian bolt-ons, the drift problem bedevils some implementations. The solution? Pump in more training data, and the centuries-old techniques work like a jazzed millennial with a bundle of venture money.

Care to name a large online outfit which may find this an idea worth nudging forward? I don’t think it will be Verizon Oath or Tronc.

Stephen E Arnold, September 6, 2017

Big Data Visualization the Open Source Way

August 10, 2017

Though Big Data was hailed in a big way, it has yet to gain full steam because of a shortage of talent. Companies working in this domain are taking another swipe by offering visualization tools for free.

The Customize Windows notes in an article titled List of Open Source Big Data Visualization Tools:

There are some growing number of websites which write about Big Data, cloud computing and spread wrong information to sell some others paid things.

Many industries have tried the freemium route to attract talent and promote the industry. For instance, Linux OS maker Penguin Computing offered its product for free to users. This move sparked interest among users who wanted to try something other than Windows and Mac.

The move created a huge user base of Linux users and also attracted talent to promote research and development.

Big Data players, it seems, are following the same strategy by offering data visualization tools for free, which they will monetize later. All that is needed now is patience.

Vishal Ingole, August 10, 2017

Big Data Can Reveal Darkest Secrets

August 8, 2017

Surveys have long been used by social and commercial organizations to collect data. However, with easy access to Big Data, it seems that people lie more when responding to surveys than previously thought.

In an article by Seth Stephens-Davidowitz published by The Guardian and titled Everybody Lies: How Google Search Reveals Our Darkest Secrets, the author says:

The power in Google data is that people tell the giant search engine things they might not tell anyone else. Google was invented so that people could learn about the world, not so researchers could learn about people, but it turns out the trails we leave as we seek knowledge on the internet are tremendously revealing.

According to the author, the impersonal nature and anonymity of the Internet, along with its ease of access, are among the primary reasons why Internet users reveal their darkest secrets to Google (in the form of queries).

Big Data, which is data scoured from various sources, can be a reliable source of information. For instance, surveys say that around 10% of American men are gay. Big Data, however, reveals that only 2-3% of men are actually gay. To know more about interesting insights on Big Data, courtesy of Google, read the article here.

Vishal Ingole, August 8, 2017
