September 21, 2016
I read “Basic Understanding of Big Data. What Is This and How It Is Going to Solve Complex Problems.” My initial reaction was that the article recycled Hewlett Packard Enterprise marketing exhaust. The author describes himself as:
a Software Geek, graduated in Computer Science and Engineering, and currently working as a BigData Developer in one of the leading MultiNational Company. (sic)
For me, the main idea in the article is that there is a great deal of Big Data sloshing around the datasphere. For proof, the write up reproduces an image from Hewlett Packard, an outfit now focused on infrastructure and consulting. The software has been spun off into an arm’s length outfit to the applause of lawyers, investment bankers, and accountants.
It appears that HPE has calculated that every 60 seconds these digital emissions take place:
- More than 98,000 tweets
- 595,000 Facebook status updates
- 11 million instant messages
- 698,445 Google searches
- More than 168 million emails sent
- 1,820 terabytes of data created (no word on whether this is double counting of the tweets, instant messages, etc.)
- 217 new mobile Web users.
Nifty numbers but no footnote. My question? What are the sources of these data?
The article then trots out the Dracula lurking in the shadows—Big Data. Again Hewlett Packard Enterprise becomes the source of the fangs. A visual which looks like a PowerPoint slide says that we have “gone beyond the decimal systems.” Really? I particularly liked the introduction of the “brontobyte.” According to the art work, a brontobyte is a one followed by 27 zeros. Oh, I thought the decimal system was dead yet we are using it to explain this big numeric concept. Strikes me as goose feathers.
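The arithmetic behind the art work can be checked with ordinary decimal integers. Here is a minimal sketch in Python. It assumes the slide's definition of a brontobyte (a one followed by 27 zeros, an informal unit and not an SI one) and the unsourced HPE figure of 1,820 terabytes per 60 seconds from the list above. The function name and the constant-rate assumption are mine, not the article's.

```python
# Decimal powers for storage units. "Brontobyte" is informal, not an SI unit;
# the 10**27 value is the one shown on the slide.
TERABYTE = 10**12
BRONTOBYTE = 10**27

# HPE's unsourced figure: 1,820 terabytes created every 60 seconds.
BYTES_PER_MINUTE = 1_820 * TERABYTE

MINUTES_PER_YEAR = 60 * 24 * 365


def years_to_fill(target_bytes: int, bytes_per_minute: int) -> float:
    """Years needed to produce target_bytes at a constant per-minute rate."""
    return target_bytes / bytes_per_minute / MINUTES_PER_YEAR


if __name__ == "__main__":
    # The decimal system handles a brontobyte without complaint.
    print(f"One brontobyte in plain decimal: {BRONTOBYTE:,} bytes")
    years = years_to_fill(BRONTOBYTE, BYTES_PER_MINUTE)
    print(f"Years to reach it at the quoted rate: {years:,.0f}")
```

At the quoted rate, and assuming the rate never changes, the first brontobyte arrives in roughly a million years. The decimal system appears to be holding up.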
I was thrilled to see that the article then shifts gears to embrace IBM’s jargon about volume, variety, and velocity or the three Vs of Big Data. Data problems are really bad when the three Vs come into the game.
The article then introduces some technical concepts which have been kicking around since the late 1990s.
The author then tells me:
this article talks about only a glass of water from the entire ocean. Go get started and take a dip dive in the bigdata world or if i can say BigData Planet
- Recycling the marketing of a company like Hewlett Packard does not inspire confidence in the factual foundation of the article
- Dipping into IBM’s marketing jargon just makes me nervous. IBM has been lining up declining revenues for years
- Urging me to “take a dip” in the “bigdata world” (sic) is advice that is wonky.
What I liked about the write up was the inclusion of the report that “we have gone beyond the decimal system.” Sounds crazy to me. What if the brontobyte is carnivorous and hungry for Big Data expert stew?
Stephen E Arnold, September 21, 2016
September 12, 2016
In the article “The Man Who Lit the Dark Web,” the author states that
An estimated 21 million people are being trafficked around the planet. More than half are women and girls. More than 1 million are children. Nearly one-quarter are bought and sold as sex slaves. Only 1-in-100 victims of human trafficking is ever rescued. It’s a booming business. High profits and low risk make human trafficking one of the fastest-growing and most lucrative crimes on the planet; the U.N. recently estimated that trafficking nets $150 billion a year.
With the Dark Web, traffickers have realized that it’s easier for them to operate away from the eyes of law enforcement. The article asserts:
The “surface” Web, or open Web, represents between 5 and 20 percent of what’s out there. The rest resides in places that most crawlers can’t reach or index. Some data are “deep,” in password-protected places like social media and message boards, or in increasingly common dynamic websites—which are more like apps than pages from a book, and change when you interact with them, like Kayak. The rest of the Web is “dark.”
White’s approach is to tackle the Dark Web with Big Data. The author of the article spent a decade of his life helping the US Army track, penetrate, and destroy the financial networks of terrorist organizations. Will the Big Data approach actually work?
Certainly White, a Microsoft employee, is helping. Agencies like the Defense Advanced Research Projects Agency (DARPA) and similar organizations may have to channel research funds into initiatives like White’s. Otherwise, the payoff from commercial innovations will put a lid on efforts like White’s.
Vishal Ingole, September 12, 2016
September 7, 2016
The size and volume that characterize an information set as big data, and the tools used to process it, are relative to the current era. A story from NPR reminds us of this in “Can Web Search Predict Cancer? Promise And Worry Of Big Data And Health.” In 1600s England, a statistician essentially founded demography by compiling details of death records into tables. Today, trends from big data are drawn through a combination of assistance from computer technology and people’s analytical skills. Microsoft scientists conducted a study showing that Bing search queries may hold clues to a future diagnosis of pancreatic cancer.
The Microsoft scientists themselves acknowledge this [lack of comprehensive knowledge and predictive abilities] in the study. “Clinical trials are necessary to understand whether our learned model has practical utility, including in combination with other screening methods,” they write. Therein lies the crux of this big data future: It’s a logical progression for the modern hyper-connected world, but one that will continue to require the solid grounding of a traditional health professional, to steer data toward usefulness, to avoid unwarranted anxiety or even unnecessary testing, and to zero in on actual causes, not just correlations within particular health trends.
As the producers of data points in many social-related data sets, and as the original analyzers of big data, it makes sense that people remain a key part of big data analytics. While this may be especially pertinent in matters related to health, it may be more intuitively understood in this sector in contrast to others. Whether health or another sector, can the human variable ever be taken out of the data equation? Perhaps such a world will give rise to whatever is beyond the current buzz around the phrase big data.
Megan Feil, September 7, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph
There is a Louisville, Kentucky Hidden Web/Dark Web meet up on September 27, 2016.
Information is at this link: https://www.meetup.com/Louisville-Hidden-Dark-Web-Meetup/events/233599645/
August 25, 2016
The idea that Everyman can tap into a real time data stream and perform “analyses” is like catnip for some. The concept appeals to those in the financial sector, but these folks often have money (yours and mine) to burn. The idea seems to snag the attention of some folks in the intelligence sector who want to “make sense” out of Twitter streams and similar flows of “social media.” In my experience, big outfits with a need to tap into data streams have motivation and resources. Most of those who fit into my pigeonhole have their own vendors, systems, and methods in place.
The question is, “Does Tom’s Trucking need to tap into real time data flows to make decisions about what paint to stock or what marketing pitch to use on the business card taped to the local grocery’s announcement board?”
I plucked from my almost real time Web information service (Overflight) two articles suggesting that there is money in “them thar hills” of data.
The first is “New Amazon Service Uses SQL To Query Streaming Big Data.” Amazon is a leader in the cloud space. The company may not be number one on the Gartner hit parade, but some of those with whom I converse believe that Amazon continues to be the cloud vendor to consider and maybe use. The digital Wal-Mart has demonstrated both revenue and innovation with its cloud business.
The article explains that Amazon has picked up the threads of Hadoop, SQL, and assorted enabling technologies and woven Amazon Kinesis Analytics. The idea is that Amazon delivers a piping hot Big Data pizza via a SQL query. The write up quotes an Amazon wizard as saying:
“Being able to continuously query and gain insights from this information in real-time — as it arrives — can allow companies to respond more quickly to business and customer needs,” AWS said in a statement. “However, existing data processing and analytics solutions aren’t able to continuously process this ‘fast moving’ data, so customers have had to develop streaming data processing applications — which can take months to build and fine-tune — and invest in infrastructure to handle high-speed, high-volume data streams that might include tens of millions of events per hour.”
Additional details appear in Amazon’s blog post here. The idea is that anyone with some knowledge of things Amazon, coding expertise, and a Big Data stream can use the Amazon service.
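For readers who wonder what “continuously query” means in practice, the usual pattern is a windowed aggregate: group the events that arrive in each fixed time slice and emit a result when the slice closes. Below is a minimal sketch in plain Python, not Amazon’s service or its SQL dialect. The event shape, the function name `tumbling_counts`, and the 60 second window are my assumptions for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical event shape: (timestamp in seconds, key such as a page name).
Event = Tuple[float, str]


def tumbling_counts(
    events: Iterable[Event], window_seconds: float
) -> Iterator[Tuple[float, Dict[str, int]]]:
    """Yield (window_start, counts per key) each time a window closes.

    Assumes events arrive in timestamp order, which real streams do not
    guarantee. Windows that contain no events are skipped.
    """
    current_start = None
    counts: Counter = Counter()
    for timestamp, key in events:
        # Align the event to the start of its fixed-size window.
        window_start = (timestamp // window_seconds) * window_seconds
        if current_start is None:
            current_start = window_start
        elif window_start != current_start:
            # The previous window is complete, so emit it and start fresh.
            yield current_start, dict(counts)
            current_start, counts = window_start, Counter()
        counts[key] += 1
    if current_start is not None:
        # Flush the final, possibly partial, window.
        yield current_start, dict(counts)


if __name__ == "__main__":
    stream = [(0.5, "home"), (12.0, "cart"), (59.9, "home"), (61.0, "home"), (125.0, "cart")]
    for start, per_key in tumbling_counts(stream, window_seconds=60):
        print(start, per_key)
```

The sketch is roughly what a GROUP BY over a one minute window expresses in a streaming SQL product. The hard parts a managed service sells (late events, scaling, durable state) are deliberately left out.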
The second write up is “Microsoft Power BI Dashboards Deliver Real Time Data.” The idea seems to be that Microsoft is in the real time data analysis poker game as well. The write up reveals:
Power BI’s real-time dashboards — known as Real-Time Dashboard tiles — builds on the earlier Power BI REST APIs release to create real-time tiles within minutes. The tiles push data to the Power BI REST APIs from streams of data created in PubNub, a real-time data streaming service currently used widely for building web, mobile and IoT applications.
The idea is that a person knows the Microsoft methods, codes the Microsoft way, and has a stream of Big Data. The user then examines the outputs via “tiles.” These are updated in real time. As mentioned above, Microsoft is the Big Data Big Dog in the Gartner kennel. Obviously Microsoft will be price competitive with the service priced at about $10 per month. The original price was about $40 a month, but the cost cutting fever is raging in Redmond.
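To make “push data to the Power BI REST APIs” less abstract, here is a hedged sketch of what the pushing side might look like in Python. It only builds the HTTP request and does not send anything. The endpoint URL is a placeholder the caller must supply, the helper name `build_push_request` is hypothetical, and the `{"rows": [...]}` body shape is my assumption about the push interface, to be checked against Microsoft’s documentation.

```python
import json
import urllib.request
from typing import Dict, List


def build_push_request(
    rows_url: str, token: str, rows: List[Dict]
) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying a batch of rows.

    rows_url is whatever endpoint the service assigns to the dataset table;
    it is a caller-supplied placeholder here. The {"rows": [...]} wrapper
    is an assumption, not a confirmed contract.
    """
    body = json.dumps({"rows": rows}).encode("utf-8")
    return urllib.request.Request(
        rows_url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )


if __name__ == "__main__":
    request = build_push_request(
        "https://example.invalid/placeholder/rows",  # placeholder, not a real endpoint
        token="PLACEHOLDER_TOKEN",
        rows=[{"truck": "Tom-1", "miles": 42}],
    )
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
```

Sending the request would be a call to `urllib.request.urlopen(request)` once a real endpoint and token are in hand. Whether Tom’s Trucking needs any of this is a separate question.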
The question is, “Which of these services will dominate?” Who knows? Amazon has a business and a real time pitch which makes sense to those who have come to depend on the AWS services. Microsoft has business customers, Windows 10, and a reseller/consulting community eager to generate revenue.
My thought is, “Pick your horse, put down your bet, and head to the Real Time Data Analytics race track.” Tomorrow’s $100 ticket is only a few bucks today. The race to low cost entry fees is about to begin.
Stephen E Arnold, August 25, 2016
August 16, 2016
The weekly news program about search, online, and content processing is now available at https://youtu.be/mE3MGlmrUWc. In addition to comments about Goo!Hoo, IBM, and Microsoft, you will learn about grilling squirrel over a wood fire. Live from Harrod’s Creek.
Stephen E Arnold, August 16, 2016
August 10, 2016
I read an interview with a wizard from Talend, which I did not know had French roots. The write up is “Interview: Christophe Toum, Talend on Why Big Data Needs Big Governance.” I noted two passages which I found refreshing.
The first addresses the unpleasant topic of being organized. The code word for this all-too-human characteristic is “governance.” I highlighted this passage:
At Talend we believe Big Data without governance will quickly become a big problem…Big Data needs even more governance.
My view is that the more annoying, administrative, human subject matter intensive investment required, the less governance will be applied. Just a thought based on my experience.
The second comment elicited one exclamation point from my subdued pale blue highlighter:
Controlling who can access and use this data, what data is verified and trusted, by whom and how, is a big deal.
Stephen E Arnold, August 10, 2016
August 9, 2016
I read “Cracking the Data Conundrum: How Successful Companies Make Big Data Operational.” The high level, super sophisticated, MBA quivering report is free. Does that mean that Capgemini Consulting is trying to drum up business? I thought these top level outfits generated 90 percent of their annual revenue from repeat business? Perhaps today’s economic climate is different?
The report is interesting because the premise is that Capgemini has solved a “conundrum.” This is a nifty word which I learned when I was a wee lad trying to keep my tutor in Campinas, Brazil, happy. I recall that the word was used by one Thomas Nash (no, not a relative of the Nash made famous with the quip “the golden trashery of Ogden Nashery”). But that neologistic meaning has a fresh charge of meaning for me; to wit:
A term of abuse for a crank or a pedant.
Today the word is popular among the MBA set as a solvable problem. However, a conundrum can be another word for dilemma. That’s a logical word for illogical statements; for example,
Bruno was gored on the horns of a big, angry dilemma.
What does the Capgemini document suggest is the resolution to the problem of Big Data?
The write up tells the reader that most outfits trying to integrate Big Data into every day work life screw up. The fancy wording is:
Successful Big Data implementations elude most organizations.
That’s bad for the organizations, and I assume really good for consultants who know how to deal with wasted money.
The problem? Organizations’ management are not able to manage. I learned:
Our research revealed that the top challenges that organizations face include: dealing with scattered silos of data, ineffective coordination of analytics initiatives, the lack of a clear business case for Big Data funding, and the dependence on legacy systems to process and analyze Big Data.
Imagine organizations have these flaws. What are they to do?
Step one is to get their act together; that is, organize for Big Data. Sounds good. But what if the organization is set up to do something else; for instance, make men’s shirts or do publicity of a Hollywood motion picture?
Well, these outfits need to have a systematic approach to Big Data. And one size does not fit every organization. Capgemini identifies four ways to put the ponies in the circus wagon. These are:
- Scattered pockets of Big Data stuff
- Decentralized Big Data stuff. (How is this different from “scattered pockets”?)
- Centralized Big Data stuff
- A Big Data business unit. (This is the one that delivers the most “success.” I am not sure for whom however.)
How does an organization move from total loser in Big Data to a successful outfit integrating Big Data into operations? This effort, which will be billed as a flat fee, a retainer, or on a time and materials basis, is an “implementation journey.” I have a hunch that this trip will not be a 10 minute walk to the convenience store for a bottle of Big Red soda pop. The trip will be a hike through the Ural mountains in winter.
The write up includes a test. This makes it easy for the shirt maker in Bangladesh or the 20 somethings working from a trailer in Orange County to put their act in the circus’ center ring.
The write up references a survey conducted in 2014. I suppose in the slow moving world of the shirt makers and Hollywood publicists a year and a half is a reasonable time interval.
If you want to test your understanding of the word “conundrum,” you will want to read this free report. Only you can answer this question: Does conundrum reference a crank or pedant or a hapless MBA dangling from a sharp horn? Whenever horns of a bull enter a conversation, other stuff may follow.
Stephen E Arnold, August 9, 2016
August 9, 2016
You can view the August 8, 2016, HonkinNews program at this link. The video comes from Goodwill-grade 8 mm film equipment. The program highlights recent stories from the free (yep, no cost whatsoever) Beyond Search Web log. Learn about how one Google executive “escaped” life in the fast lane. The Verizon acquisition of Yahoo reminds Stephen of Washington’s wooden false teeth. The deal allows Verizon to own two Internet artifacts. Hewlett Packard Enterprise, owner of Autonomy, faces an uncertain future as it sells units and thinks about selling itself. And there’s more in the six minute news program; for example, a restrained MBA cheer for Big Data. But that’s a sotto voce rah, rah. Like Beyond Search, the honking video version tries to separate the giblets from the goose feathers in the thrilling world of search, content processing, and related disciplines. That’s not easy in today’s search-centric world where relevance is mostly good enough and jargon is its own virtual reality.
Ken Toth, August 9, 2016
August 2, 2016
I read “Advisors and Big Data: The Disconnect.” Stunned am I. Consultants not listening to their clients. Systems with severed communication channels to those who pay their licensing bills. Unbelievable.
But while many companies have big dough invested in this ongoing project, they still rely far too much on intuition and gut instinct instead of using their data to operate. This is often due to a fundamental disconnection between the actual needs of the business versus what the data analytics are designed to deliver.
The write up makes a number of statements which suggest there is some snake oil laced with ineptitude in the Big Data world; for example:
- Analytics enable. What if analytics enable poor decision making?
- Algorithms are not a “magic kit.” I thought algorithms were really smart.
- Bad data are bad. Really?
- Data are not insights. I thought data were chock full of insight.
- Moving big data from Point A to Point B is not a slam dunk. What about a three point shot?
If these points resonate with you, you are probably not getting with the Big Data program. I thought Big Data was a silver bullet and a magic potion blended in one tasty for fee meal. Stunned, I tell you. Stunned. Imagine. Disconnect advisors.
Stephen E Arnold, August 2, 2016
August 1, 2016
Remember in the 1979 hit The Muppet Movie there was a running gag where Kermit the Frog kept saying, “It’s a myth. A myth!” Then a woman named Myth would appear out of nowhere and say, “Yes?” It was a funny random gag, but while it is a myth that frogs give warts, most of the myths related to big data may or may not be. Data Science Central decided to explain some of the myths in “Debunking The 68 Most Common Myths About Big Data-Part 2.”
Some of the prior myths debunked in the first part were that big data was the newest power word, an end all solution for companies, only meant for big companies, and that it was complicated and expensive. In truth, anyone can benefit from big data with a decent implementation plan and with someone who knows how to take charge of it.
Big data, in fact, can be integrated with preexisting systems, although it takes time and knowledge to link the new and the old together (it is not as difficult as it seems). Keeping on that same thought, users need to realize that there is not a one size fits all big data solution. Big data is a solution that requires analytical, storage, and other software. It cannot be purchased like other proprietary software and it needs to be individualized for each organization.
One myth that has converted into truth is that big data relies on Hadoop storage. It used to be that Hadoop managed a market of many, but now it is an integral bit of software needed to get the big data job done. One of the most prevalent myths is that it only belongs in the IT department:
“Here’s the core of the issue. Big Data gives companies the greatly enhanced ability to reap benefits from data-driven insights and to make better decisions. These are strategic issues.
You know who is most likely to be clamoring for Big Data? Not IT. Most likely it’s sales, marketing, pricing, logistics, and production forecasting. All areas that tend to reap outsize rewards from better forward views of the business.”
Big data is becoming more of an essential tool for organizations in every field as it tells them more about how they operate and their shortcomings. Big data offers a very detailed examination of these issues; the biggest issue users need to deal with is how they will use it.