Big Data: Trade Offs Necessary

September 14, 2015

I read “How to Balance the Five Analytic Dimensions.” The article presents information which reminded me of a college professor’s introductory lecture about data analysis.

The basics are definitely important. As the economy slips toward 2016, the notion of trade offs is an important one to keep in mind. According to the article, making sense of data via analytics involves understanding and balancing:

  1. The complexity of the data. Yep, data are often complex.
  2. Speed. Yep, getting results when the outputs are needed is important.
  3. The complexity of the analytics. Yep, adding a column of numbers and calculating the mean may be easy but not what the data doctor ordered.
  4. Accuracy and precision. The idea is that some outputs may be inappropriate for the task at hand. In theory, results should be accurate, or at least accurate enough.
  5. Data size. Yep, crunching lots of data can entail a number of “big” and “complex” tasks.

I agree with these points.

The problem is that making a big or small data project succeed, with simple or complex analytics, is quite different from reciting a laundry list of points to keep in mind. Knowing the five points is helpful if one is taking a test in a junior college information management class.

The write up does not address the rock upon which many an analytics project crashes; that is:

What are the systems and methods for balancing resources across these five dimensions?

Without addressing this fundamental question, how can good decisions be made when the foundation is assumed to be level and stable?

Most analytics work just like the textbook said they would. The outputs are often baloney because the underlying assumptions were taken to be spot on.

Why not just guess and skip the lecture? I know. Is this an acceptable answer: “That’s too time consuming and above our pay grade”?

The professional who offers this answer may get an A in class but an F in decision making.

Stephen E Arnold, September 14, 2015

Big Data Shockers: Not Big

September 14, 2015

I read “Big Data Doesn’t Exist.” Now “data” is plural, but why get involved in grammar? This is the mobile, thumb typing era.

The write up states:

I’ve found it’s a good rule of thumb to assume a company has one one-thousandth of the data they say they do.

Yep, the data perception anomaly is alive and well. The folks who have too much data are too busy as well. Many of the individuals with whom I come in contact have no time to think new thoughts, complete projects on time, return phone calls, or answer email. Quartz offers “We’re Not Actually That Busy, But We’re Great at Pretending We Are.”

The factors causing the razzle-dazzle view of an organization’s data and busy-ness are similar. The inability to look at information or tasks from an informed vantage point creates uncertainty. The easiest way to escape criticism for a strategic failure is to embrace a crazy generalization and protest too much about work that must be completed.

Hence, there is a boom in time management and automatic scheduling. I hear, “My calendar is full.” No kidding. That tells me the person has abdicated responsibility.

The statement that “we have too much data” underscores the individual’s inability to think about information in a way that is helpful. The consequence is the mad dash to software that does the thinking for a professional. There are visualization tools. These make it easy to see what the data allegedly say.

Baloney.

Both the craziness about Big Data and the too much to do approach to work are cover ups.

The issue is rooted deep within many individuals who are unable to cope with the mundane activities of life in the 21st century. The fix is within individuals. Stated another way, there is no fix when there is little or no incentive or desire to take responsibility for work.

I asked a Kentucky Fried Chicken store clerk, “Why do I have to wait for the biscuits to be cooked before I can have two pieces of chicken for my beloved boxers?” The boxers don’t eat biscuits on my watch.

The answer, “That’s what I was told.” Judgment and the instinct to use common sense are absent in the executive suite just as they are at retail fast food outlets.

No sale. Many professionals want a short cut and no responsibility. That’s a mushy foundation for a digital work ethic. Analytics will miss this important nuance when it processes declining revenues.

Stephen E Arnold, September 14, 2015

Computers Learn Discrimination from Their Programmers

September 14, 2015

One of the greatest lessons one can learn from the Broadway classic South Pacific is that children aren’t born racist; rather, they learn about racism from their parents and other adults.  Computers are supposed to be infallible, objective machines, but according to Gizmodo’s article, “Computer Programs Can Be As Biased As Humans.”  In this case, computers are “children” and they observe discriminatory behavior from their programmers.

As an example, the article explains how companies use job application software to sift through prospective employees’ resumes.  Algorithms are used to search for keywords related to experience and skills, with the goal of remaining unbiased with respect to sex and ethnicity.  The algorithms could also be used to sift out resumes that contain certain phrases and other information.

“Recently, there’s been discussion of whether these selection algorithms might be learning how to be biased. Many of the programs used to screen job applications are what computer scientists call machine-learning algorithms, which are good at detecting and learning patterns of behavior. Amazon uses machine-learning algorithms to learn your shopping habits and recommend products; Netflix uses them, too.”

The machine learning algorithms are mimicking the same discrimination habits as humans.  To catch these computer-generated biases, other machine learning algorithms are being implemented to keep the first set in check.  Another option to avoid the biases is to reload the data in a different manner so the algorithms do not fall into the old habits.  From a practical standpoint it makes sense: if something does not work the first few times, change the way it is done.
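
For readers who want a concrete picture of what such a check can look like, here is a minimal sketch that compares a screener’s pass rates across groups and flags a large gap. The data, column names, and the four-fifths threshold are my own illustrative assumptions, not details from the Gizmodo article.

```python
# Minimal sketch: auditing a resume screener's decisions for group bias.
# All data and thresholds below are hypothetical.
import pandas as pd

def selection_rates(df, group_col, decision_col):
    """Fraction of applicants the screener advanced, per group."""
    return df.groupby(group_col)[decision_col].mean()

def disparate_impact_ratio(df, group_col, decision_col):
    """Ratio of the lowest group selection rate to the highest."""
    rates = selection_rates(df, group_col, decision_col)
    return rates.min() / rates.max()

# Hypothetical screener output: 1 = resume passed the filter, 0 = rejected.
applicants = pd.DataFrame({
    "group": ["A", "A", "A", "B", "B", "B"],
    "passed": [1, 1, 0, 1, 0, 0],
})

ratio = disparate_impact_ratio(applicants, "group", "passed")
print(f"Disparate impact ratio: {ratio:.2f}")
if ratio < 0.8:  # the common four-fifths rule of thumb
    print("Flag: pass rates differ sharply across groups.")
```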

Whitney Grace, September 14, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

GIGO: A Reminder That Your Statistics 101 Class Was Important

September 12, 2015

You are familiar with GIGO, aren’t you? This is an old school acronym which is allegedly shorthand for “garbage in, garbage out.” Do not use this acronym with a civil engineer with a minor in wastewater treatment. Make a joke about a nuclear engineer.

Plus, gentle reader, I assume you remember your Statistics 101 class. The yap about sample size, data quality, various validity checks, and other assorted disturbances of an otherwise normal academic journey.

Now navigate to “The Most Important Thing to Know About Big Data: It’s Not About the Tools.” The write up in a rather pleasant way reminds me that software tools are less important than dealing with current information, accurate data, and complete, normalized data sets.
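
Here is a minimal sketch, under my own assumptions, of the sort of pre-flight checks that “current information, accurate data, and complete, normalized data sets” imply: staleness, missing values, near-duplicates, and normalization. The column names, sample rows, and checks are illustrative, not anything from the article.

```python
# Minimal GIGO guard: basic data hygiene before any analytics run.
# Column names, sample data, and checks are illustrative assumptions.
import pandas as pd

def data_quality_report(df, date_col="updated_at"):
    """Quick sanity checks: size, exact duplicates, missing values, staleness."""
    return {
        "rows": len(df),
        "duplicate_rows": int(df.duplicated().sum()),
        "missing_by_column": df.isnull().sum().to_dict(),
        "newest_record": df[date_col].max(),  # is the data current?
    }

def normalize(df):
    """Trim and lowercase text columns so 'ACME Inc ' and 'acme inc' match."""
    out = df.copy()
    for col in out.select_dtypes(include="object"):
        out[col] = out[col].str.strip().str.lower()
    return out.drop_duplicates()

# Hypothetical input with the usual problems: a near-duplicate row,
# a missing value, inconsistent capitalization, and a stale record.
sales = pd.DataFrame({
    "customer": ["ACME Inc ", "acme inc", "Globex"],
    "amount": [100.0, 100.0, None],
    "updated_at": pd.to_datetime(["2015-01-05", "2015-01-05", "2014-06-30"]),
})

print(data_quality_report(sales))
clean = normalize(sales)  # the two ACME rows collapse into one
print(clean)
```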

I know the “data lake” crowd dismisses these issues as trivial, irrelevant, or old fashioned (just like me).

I learned:

So handling big data isn’t really at all about the tools but, instead, it’s about using them as a part of the process to arrive at the right decisions to meet the organization’s needs. Anyone who doubts that would do well to bear in mind the 10/90 rule put forward nearly 10 years ago by Google co-founder and digital evangelist Avinash Kaushik. He suggested that for every $10 invested in data analytics tools a business should invest $90 in people to actually extract value from the data.

I am not sure most of the Big Data hypesters agree.

Stephen E Arnold, September 12, 2015

Big Data: The McKinsey Way

September 11, 2015

I read “6 Observations from a New Survey on the State of Big Data Analytics.” The data come from a study underwritten by a magazine outfit, a blue chip consulting firm, and a company selling storage and related bright and shiny things.

I found the write up suggestive. The first finding was a bombshell.

The hype gone, big data is alive and doing well.

Aside from the subject-verb error (“data is” when data is the plural of datum), the information is revolutionary. Big Data is no longer subject to hyperbole. I did not know that. Topsy.com tallied 3,154 tweets about Big Data in the 24 hours of September 8, 2015. For comparison, Big Data is in a dead heat with the tweets about the Bentley Bentayga SUV. Good company. FYI: Katy Perry managed only 1,468 tweets in the same time period. Nevertheless, in Harrod’s Creek, Big Data, expensive autos, and a musical 30 year old are buzz machines.

The write up reports:

No matter how many times you say “data-driven,” decisions are still not based on data. Sounds familiar? 51% of executives said that adapting and refining a data-driven strategy is the single biggest cultural barrier and 47% reported putting big data learning into action as an operational challenge.

Yikes. More consulting is needed to get this cultural change thing underway.

Other findings that underpin the article are:

  • If the CEO is into Big Data, the company is into Big Data…mostly. If the CEO is like the airline executives in the news, the CEO may have other interests.
  • I love this: “Even if you have top leadership sponsorship and the right culture, getting data to drive action and strategy is a challenge.  48% of executives surveyed regard making fact-based business decisions based on data as a key strategic challenge, and 43% cite developing a corporate strategy as a significant hurdle.” Maybe Big Data is not the slam dunk consultants and journalists wish it to be?
  • Brontobyte data. Hey, we have perfectly useful words to suggest unimaginably large quantities. I like yottabyte. The study sponsors seem to be okay with the brontobyte coinage. Very hip, but I would have created a variant of Diplodocus. More colorful for sure.
  • There is a shortage of “big data miners.” Okay, I understand. The user friendly analytics tools are just not too helpful unless a company has someone who actually paid attention in statistics classes.

The only thing missing from this write up is links to the sponsors’ product pages. By the way, the article pumps up Big Data. Amusing stuff.

Stephen E Arnold, September 11, 2015

Subjective Big Data: Marginalized Hype from a Mid Tier Outfit

September 4, 2015

I read “Why Gartner Dropped Big Data Off the Hype Curve.” The article purports to explain why Gartner Group, a mid tier consulting firm, eliminated Big Data from its hype cycle. Let me ask, “Perhaps Big Data reports do not sell to executives who have zero clue what Big Data means to a struggling business?” The write up is an analytics and data clean room. Facts are tough to discern.

The article included a chart without numbers to help knowledge hungry folks figure out which technology is an innovation trigger, which is at the peak of inflated expectations, which has fallen (gasp!) into the trough of disillusionment, which is on the slope of enlightenment, and which has reached the plateau of productivity.

The write up fills the empty vessel of my mind with this insight from a mid tier wizard, Betsy Burton. She allegedly revealed:

“There’s a couple of really important changes,” Burton says. “We’ve retired the big data hype cycle. I know some clients may be really surprised by that because the big data hype cycle was a really important one for many years.” “But what’s happening is that big data has quickly moved over the Peak of Inflated Expectations,” she continues, “…and has become prevalent in our lives across many hype cycles. So big data has become a part of many hype cycles.”

I like that observation about Big Data becoming part of many hype cycles.

That’s reassuring. I don’t know what Big Data is, but it is now part of many hype cycles.

I like subjective statements about what is moving through a hype cycle. When one hype cycle is not enough, then put the fuzzy wuzzy statement into many hype cycles. Neat.

The article explains that other “notable subtractions” took place; for example, dropouts include:

  • Prescriptive analytics, which I presume are numbers which are not used in this article’s graphics. Numbers are so annoying because one must explain where the numbers came from, figure out if the numbers are accurate, and then make decisions about how to extract valid outputs from numerical recipes. Who has time for that?
  • Data science. I am not sure what this means, but it’s off the hype cycle hit parade.
  • Complex event processing. Sounds great but it too is a victim of the delete button.

I view the listing as subjective. Subjectivity is useful, particularly when discussing which painting in the Wildenstein Collection is the best one or which of Mozart’s variations is the hot one.

Objective analyses, in my opinion, are needed to make a case that virtual reality is on the slope of enlightenment or that affective computing is lifting off like a hyperbole fueled rocket.

Am I the only one who finds these subjective lists silly? My hunch is that the reason concepts get added to the list is to create some demand for a forthcoming study. The reason stuff disappears is because reports about the notion do not sell.

I wonder if there are data available from mid tier consulting firms to back up my hypothesis. Well, we can argue whether pale ivory is more attractive than honey milk.

Interior design professionals will go to the mattresses over a tint like white wisp to defend their subjective color choice. Do mid tier consultants share this passion?

Stephen E Arnold, September 4, 2015

Big Data Vendors Don’t Understand Big Data

August 27, 2015

Sit back and absorb this article’s title for a moment: big data vendors don’t understand big data.  How can IT vendors not understand one of the IT industry’s largest selling products?  That is the very bold claim Computing makes in “SAP, Oracle, and HP ‘Don’t Get’ Big Data, Claims Massive Analytic Chairman.”

Executive chairman and founder of the Oscar AP platform George Frangou claims that companies like Oracle, HP, and SAP do not know how to help their customers take advantage of their big data and are more interested in getting customers hooked into their ecosystems than in providing true analytical insight.

One of the reasons Frangou says this is that his Oscar AP is more “advanced” and allows users to foretell the future with various outcomes.  The Oscar AP platform is part of the next round of big data called massive analytics.  HP, Oracle, and SAP cannot wrap their heads around massive analytics yet, because they are more concerned with selling their product.

“Because of this, Frangou said Massive Analytic is ‘quite unashamedly following a displacement strategy to displace the incumbents because they’re not getting it.’  He added that SAP HANA, Oracle Exalytics and HP Haven are essentially the same product because they’re built on the same base code.”

Frangou went on to say that big data customers are spending more money than they need to and are getting sucked into purchasing more products in order to make their big data plans work.  It appears to be a vicious cycle.  Frangou said that cloud analytics are the best option for customers and to go with SAP, although more barriers remain to getting a decent cloud analytics platform off the ground.

It is not surprising that big data products are falling short of their promised results.  A similar comparison would be the Windows OS falling well below desired performance and users spending more time troubleshooting than getting their projects done.

Whitney Grace, August 27, 2015
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Indeed. One Can Fix Government Economic Forecasts

August 26, 2015

Big Data is magic. Big Data is revolutionary. Big Data is a good consulting angle.

But Big Data is not going to fix government forecasts. I hate to rain on the parade of a distinguished academic and chief economist, but those raindrops keep a-falling.

Navigate to “Economic Forecasts in the Age of Big Data.” The passage I highlighted with my sea of red ink colored marker was:

Properly used, new data sources have the potential to revolutionize economic forecasts. In the past, predictions have had to extrapolate from a few unreliable data points. In the age of Big Data, the challenge will lie in carefully filtering and analyzing large amounts of information. It will not be enough simply to gather data; in order to yield meaningful predictions, the data must be placed in an analytical framework. The Fed may have blundered in releasing its data ahead of schedule. But its mistake offers us an important opportunity. In order to improve economic predictions, economists must be encouraged to seek new sources of data and develop new forecasting models. As we learn how to harness the power of big data, our chances of predicting – and perhaps even preventing – the next recession will improve.

I am thrilled with job opening analyses in Boston, demand for rentals in San Francisco, and housing starts in Los Angeles (you know, the water crisis city).

However, government economic analyses are not into reality. In Washington, DC, there is a big building adjacent to the train station. It is filled with folks who do economic forecasts among other things. There are economic forecasts cranked out by lobbyists. There are super grades in Federal entities crunching numbers. The numbers get reviewed, shaped, and tweaked. Eventually the numbers emerge in a news release which may or may not be widely distributed. The government process for creating economic forecasts is institutionalized. Like an aircraft carrier, the system carries some momentum.

A person who wants to inject real time Big Data into these procedures can go through the normal process. Get involved in an allocation for an initiative. Find a way to work on a statement of work. Compete for a Big Data economic forecast project. Do the work. Have the work reviewed and taken under advisement.

End of the day: The existing system keeps on generating forecasts.

Net net: Economic forecasts from DC and other world capitals drift above real time. Rome had the same problem.

Stephen E Arnold, August 26, 2015

How to Search the Ashley-Madison Data and Discover If You Had an Affair Too

August 26, 2015

If you haven’t heard about the affair-promoting website Ashley Madison’s data breach, you might want to crawl out from under that rock and learn about the millions of email addresses exposed by hackers to be linked to the infidelity site. In spite of claims by parent company Avid Life Media that users’ discretion was secure, and that the servers were “kind of untouchable,” as many as 37 million customers have been exposed. Perhaps unsurprisingly, a huge number of government and military personnel have been found on the list. The article on Reuters titled “Hacker’s Ashley Madison Data Dump Threatens Marriages, Reputations” also mentions that the dump has divorce lawyers clicking their heels with glee at their good luck. As for the motivation of the hackers? The article explains,

“The hackers’ move to identify members of the marital cheating website appeared aimed at maximum damage to the company, which also runs websites such as Cougarlife.com and EstablishedMen.com, causing public embarrassment to its members, rather than financial gain. “Find yourself in here?,” said the group, which calls itself the Impact Team, in a statement alongside the data dump. “It was [Avid Life Media] that failed you and lied to you. Prosecute them and claim damages. Then move on with your life. Learn your lesson and make amends. Embarrassing now, but you’ll get over it.”

If you would like to “find yourself” or at least check to see if any of your email addresses are part of the data dump, you are able to do so. The original data was put on the dark web, which is not easily accessible for most people. But the website Trustify lets people search for themselves and their partners to see if they were part of the scandal. The website states,

“Many people will face embarrassment, professional problems, and even divorce when their private details were exposed. Enter your email address (or the email address of your spouse) to see if your sexual preferences and other information was exposed on Ashley Madison or Adult Friend Finder. Please note that an email will be sent to this address.”

It’s also important to keep in mind that many of the email accounts registered to Ashley Madison seem to be stolen. However, the ability to search the data has already yielded some embarrassment for public officials and, of course, “family values” activist Josh Duggar. The article on the Daily Mail titled “Names of 37 Million Cheating Spouses Are Leaked Online: Hackers Dump Huge Data File Revealing Clients of Adultery Website Ashley Madison - Including Bankers, UN and Vatican Staff” goes into great detail about the company, the owners (married couple Noel and Amanda Biderman) and how hackers took it upon themselves to be the moral police of the internet. But the article also mentions,

“Ashley Madison’s sign-up process does not require verification of an email address to set up an account. This means addresses might have been used by others, and doesn’t prove that person used the site themselves.”

Some people are already claiming that they had never heard of Ashley Madison in spite of their emails being included in the data dump. Meanwhile, the Errata Security Blog entry titled “Notes on the Ashley-Madison Dump” defends the cybersecurity of Ashley Madison. The article says,

“They tokenized credit card transactions and didn’t store full credit card numbers. They hashed passwords correctly with bcrypt. They stored email addresses and passwords in separate tables, to make grabbing them (slightly) harder. Thus, this hasn’t become a massive breach of passwords and credit-card numbers that other large breaches have lead to. They deserve praise for this.”

Praise for this, if for nothing else. The impact of this data breach is still only beginning, with millions of marriages and reputations in the most immediate trouble, and the public perception of the cloud and cybersecurity close behind.
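
For readers wondering what “hashed passwords correctly with bcrypt” looks like in practice, here is a minimal sketch using the Python bcrypt library. It illustrates the general technique only; it is not Ashley Madison’s actual code.

```python
# Minimal sketch of the practice the Errata Security post praises:
# storing bcrypt hashes instead of plaintext passwords.
import bcrypt

password = b"correct horse battery staple"

# gensalt() embeds a per-password salt and a work factor in the hash.
hashed = bcrypt.hashpw(password, bcrypt.gensalt(rounds=12))

# At login time, checkpw re-hashes the attempt and compares the result.
print(bcrypt.checkpw(password, hashed))        # True
print(bcrypt.checkpw(b"wrong guess", hashed))  # False
```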

 

Chelsea Kerwin, August 26, 2015

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Elasticsearch is the Jack of All Trades at Goldman Sachs

August 25, 2015

The article titled “Goldman Sachs Puts Elasticsearch to Work” on Information Week discusses how programmers at Goldman Sachs are using Elasticsearch. Programmers there are working on applications that exploit both its data retrieval capabilities and its facility with unstructured data. The article explains,

“Elasticsearch and its co-products — Logstash, Elastic’s server log data retrieval system, and Kibana, a dashboard reporting system — are written in Java and behave as core Java systems. This gives them an edge with enterprise developers who quickly recognize how to integrate them into applications. Logstash has plug-ins that draw data from the log files of 165 different information systems. It works natively with Elasticsearch and Kibana to feed them data for downstream analytics, said Elastic’s Jeff Yoshimura, global marketing leader.”

The article provides detailed examples of how Elastic is being used in legal, finance, and engineering departments within Goldman Sachs. For example, rather than hiring a “platoon of lawyers” to comb through Goldman’s legal contracts, a single software engineer was able to build a system that digitized everything and flagged contract documents that needed revision. With over 9,000 employees, Goldman currently has several thousand using Elasticsearch. The role of search has expanded, and it is important that companies recognize the many functions it can provide.
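
For those who have not touched Elasticsearch, here is a minimal sketch of the index-and-query round trip the article describes, written against a recent elasticsearch Python client (keyword arguments vary across client versions). The index name, fields, and query are illustrative assumptions, not details of the Goldman Sachs system.

```python
# Minimal sketch: index a contract document, then run a full-text query.
# Index name, fields, and query text are illustrative, not Goldman's schema.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

es.index(index="contracts", id="msa-001", document={
    "title": "Master services agreement",
    "body": "Either party may terminate this agreement with 30 days notice...",
    "needs_revision": True,
})

es.indices.refresh(index="contracts")  # make the document searchable now

results = es.search(index="contracts", query={"match": {"body": "terminate"}})
for hit in results["hits"]["hits"]:
    print(hit["_score"], hit["_source"]["title"])
```

In the stack the article describes, Logstash and Kibana sit on either side of this round trip: Logstash feeds log data into the index, and Kibana charts what comes back out.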

Chelsea Kerwin, August 25, 2015

Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

 
