
Big Data Vendors Don’t Understand Big Data

August 27, 2015

Sit back and absorb this article’s title for a moment: big data vendors don’t understand big data.  How can IT vendors not understand one of the IT industry’s largest selling products?  According to Computing, “SAP, Oracle, and HP ‘Don’t Get’ Big Data, Claims Massive Analytic Chairman” in a very bold statement.

Executive chairman and founder of the Oscar AP platform George Frangou claims that companies like Oracle, HP, and SAP do not know how to help their customers take advantage of their big data and are more interested in getting customers hooked into their ecosystems than providing true analytical insight.

One reason Frangou says this is that his Oscar AP is more “advanced” and allows users to foretell the future with various outcomes.  The Oscar AP platform is part of the next round of big data called massive analytics.  HP, Oracle, and SAP cannot wrap their heads around massive analytics yet, because they are more concerned with selling their product.

“Because of this, Frangou said Massive Analytic is ‘quite unashamedly following a displacement strategy to displace the incumbents because they’re not getting it.’  He added that SAP HANA, Oracle Exalytics and HP Haven are essentially the same product because they’re built on the same base code.”

Frangou went on to say that big data customers are spending more money than they need to and are getting sucked into purchasing more products in order to make their big data plans work.  It appears to be a vicious cycle.  Frangou said that cloud analytics are the best option for customers and to go with SAP, although barriers remain to getting a decent cloud analytics platform off the ground.

It is not surprising that big data products are falling short of their promised results.  A similar case would be the Windows OS falling well below performance expectations, with users spending more time troubleshooting than getting their projects done.

Whitney Grace, August 27, 2015
Sponsored by, publisher of the CyberOSINT monograph

Indeed. One Can Fix Government Economic Forecasts

August 26, 2015

Big Data is magic. Big Data is revolutionary. Big Data is a good consulting angle.

But Big Data is not going to fix government forecasts. I hate to rain on the parade of a distinguished academic and chief economist, but those rain drops keep a falling.

Navigate to “Economic Forecasts in the Age of Big Data.” The passage I highlighted with my sea of red ink colored marker was:

Properly used, new data sources have the potential to revolutionize economic forecasts. In the past, predictions have had to extrapolate from a few unreliable data points. In the age of Big Data, the challenge will lie in carefully filtering and analyzing large amounts of information. It will not be enough simply to gather data; in order to yield meaningful predictions, the data must be placed in an analytical framework. The Fed may have blundered in releasing its data ahead of schedule. But its mistake offers us an important opportunity. In order to improve economic predictions, economists must be encouraged to seek new sources of data and develop new forecasting models. As we learn how to harness the power of big data, our chances of predicting – and perhaps even preventing – the next recession will improve.

I am thrilled with job opening analyses in Boston, demand for rentals in San Francisco, and housing starts in Los Angeles (you know the water crisis city).

However, government economic analyses are not into reality. In Washington, DC, there is a big building adjacent to the train station. It is filled with folks who do economic forecasts among other things. There are economic forecasts cranked out by lobbyists. There are super grades in Federal entities crunching numbers. The numbers get reviewed, shaped, and tweaked. Eventually the numbers emerge in a news release which may or may not be widely distributed. The government process for creating economic forecasts is institutionalized. Like an aircraft carrier, the system carries some momentum.

A person who wants to inject real time Big Data into these procedures can go through the normal process. Get involved in an allocation for an initiative. Find a way to work on a statement of work. Compete for a Big Data economic forecast project. Do the work. Have the work reviewed and taken under advisement.

End of the day: The existing system keeps on generating forecasts.

Net net: Economic forecasts from DC and other world capitals drift above real time. Rome had the same problem.

Stephen E Arnold, August 26, 2015

How to Search the Ashley-Madison Data and Discover If You Had an Affair Too

August 26, 2015

If you haven’t heard about the affair-promoting website Ashley Madison’s data breach, you might want to crawl out from under that rock and learn about the millions of email addresses exposed by hackers to be linked to the infidelity site. In spite of claims by parent company Avid Life Media that users’ discretion was secure, and that the servers were “kind of untouchable,” as many as 37 million customers have been exposed. Perhaps unsurprisingly, a huge number of government and military personnel have been found on the list. The article on Reuters titled Hacker’s Ashley Madison Data Dump Threatens Marriages, Reputations also mentions that the dump has divorce lawyers clicking their heels with glee at their good luck. As for the motivation of the hackers? The article explains,

“The hackers’ move to identify members of the marital cheating website appeared aimed at maximum damage to the company, which also runs websites such as, causing public embarrassment to its members, rather than financial gain. “Find yourself in here?,” said the group, which calls itself the Impact Team, in a statement alongside the data dump. “It was [Avid Life Media] that failed you and lied to you. Prosecute them and claim damages. Then move on with your life. Learn your lesson and make amends. Embarrassing now, but you’ll get over it.”

If you would like to “find yourself” or at least check to see if any of your email addresses are part of the data dump, you are able to do so. The original data was put on the dark web, which is not easily accessible for most people. But the website Trustify lets people search for themselves and their partners to see if they were part of the scandal. The website states,

“Many people will face embarrassment, professional problems, and even divorce when their private details were exposed. Enter your email address (or the email address of your spouse) to see if your sexual preferences and other information was exposed on Ashley Madison or Adult Friend Finder. Please note that an email will be sent to this address.”

It’s also important to keep in mind that many of the email accounts registered to Ashley Madison seem to be stolen. However, the ability to search the data has already yielded some embarrassment for public officials and, of course, “family values” activist Josh Duggar. The article on the Daily Mail titled Names of 37 Million Cheating Spouses Are Leaked Online: Hackers Dump Huge Data File Revealing Clients of Adultery Website Ashley Madison- Including Bankers, UN and Vatican Staff goes into great detail about the company, the owners (married couple Noel and Amanda Biderman) and how hackers took it upon themselves to be the moral police of the internet. But the article also mentions,

“Ashley Madison’s sign-up process does not require verification of an email address to set up an account. This means addresses might have been used by others, and doesn’t prove that person used the site themselves.”

Some people are already claiming that they had never heard of Ashley Madison in spite of their emails being included in the data dump. Meanwhile, the Errata Security Blog entry titled Notes on the Ashley-Madison Dump defends the cybersecurity of Ashley Madison. The article says,

“They tokenized credit card transactions and didn’t store full credit card numbers. They hashed passwords correctly with bcrypt. They stored email addresses and passwords in separate tables, to make grabbing them (slightly) harder. Thus, this hasn’t become a massive breach of passwords and credit-card numbers that other large breaches have lead to. They deserve praise for this.”

Praise for this, if for nothing else. The impact of this data breach is still only beginning, with millions of marriages and reputations in the most immediate trouble, and the public perception of the cloud and cybersecurity close behind.


Chelsea Kerwin, August 26, 2015


Elasticsearch is the Jack of All Trades at Goldman Sachs

August 25, 2015

The article titled Goldman Sachs Puts Elasticsearch to Work on Information Week discusses how programmers at Goldman Sachs are using Elasticsearch. Programmers there are working on applications that exploit both its data retrieval capabilities and its facility with unstructured data. The article explains,

“Elasticsearch and its co-products — Logstash, Elastic’s server log data retrieval system, and Kibana, a dashboard reporting system — are written in Java and behave as core Java systems. This gives them an edge with enterprise developers who quickly recognize how to integrate them into applications. Logstash has plug-ins that draw data from the log files of 165 different information systems. It works natively with Elasticsearch and Kibana to feed them data for downstream analytics, said Elastic’s Jeff Yoshimura, global marketing leader.”

The article provides detailed examples of how Elastic is being used in legal, finance, and engineering departments within Goldman Sachs. For example, rather than hiring a “platoon of lawyers” to comb through Goldman’s legal contracts, a single software engineer was able to build a system that digitized everything and flagged contract documents that needed revision. With over 9,000 employees, Goldman currently has several thousand using Elasticsearch. The role of search has expanded, and it is important that companies recognize the many functions it can provide.

Chelsea Kerwin, August 25, 2015



Oh, Oh. Big Data Has Problems. Impossible.

August 21, 2015

A happy quack to the reader who alerted me to “5 Problems with Big Data.” How can this be? Big Data is the new black, the new enterprise search, the new information management opportunity.

The write up states:

But when data gets big, big problems can arise.

The article identifies five issues. Most of these strike me as trivial for MBAs and failed middle school teachers to resolve before lunch. The alleged problems are:

  • Storage. Hey, hey. I thought storage and the management thereof were a no brainer. But I have heard rumors that finding useful items and moving them around may contribute to digital heart burn.
  • Bias. What! Incredible. I heard an MBA say at a conference not long ago that with Big Data little issues get smoothed out. Imagine. Big Data works like an electric iron with a spritz feature.
  • False positives. Yo, dude. Those are things one talks about in Statistics 101. So a method says Tom and Betty have Ebola. After a quick check up at the doc in the box, both seem to be suffering from bad pizza and a sleepless night caused by worrying about the mid term statistics test. So what if a financial model predicts that GOOG and GOOGL shares have no upward boundary? Hello, infinity.
  • Complexity. Gasp. Layering SAP with SAS components within a SharePoint environment is complex. No way, José. This is century 21. We can crash a lander on an asteroid. We can handle a simple upgrade to an air traffic control system.
  • Outputs which answer a question no one asked. Look, gentle reader, we have IBM Watson. That system can answer the question, “What sauce will tamarind enhance?” The answer which made perfect sense to me was barbeque sauce. Who worries if the question was a coded string intercepted from an anonymous post on a Dark Web forum?
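For readers who skipped Statistics 101, the false positive item can be put into numbers. The sketch below applies Bayes' rule to a screening test. The prevalence, sensitivity, and specificity values are hypothetical, chosen only for illustration; they are not from the write up, and the function name is mine.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Return the share of positive results that are true positives (Bayes' rule)."""
    true_positives = prevalence * sensitivity
    false_positives = (1.0 - prevalence) * (1.0 - specificity)
    return true_positives / (true_positives + false_positives)


# Hypothetical inputs: 1 person in 1,000 has the condition, the test catches
# 99 percent of real cases, and it clears 95 percent of healthy people.
ppv = positive_predictive_value(prevalence=0.001, sensitivity=0.99, specificity=0.95)
print(f"Chance a flagged person is actually sick: {ppv:.1%}")
```

Under these assumed inputs, only about two in every hundred flagged people are actually ill. The rest are Tom and Betty with bad pizza. The rarer the condition, the worse the ratio gets, no matter how big the data set is.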

Stepping back I have complete confidence in the confidence men and women pitching the Big Data thing. Five speed bumps presented as real, live problems. Big Data is the answer. Enterprise search vendors like Lucid Imagination and wizards like the IDC crowd which sold some of my work without my permission on Amazon (Dave Schubmehl, where are you?) know that Big Data will do the revenue trick.

Problems are just too darned negative. I want a happy face on that flawed, incomprehensible, irrelevant, and expensive report. This is the modern world, not a tout at the chariot races pitching Nero’s team.

Get real. We have no “problems.” We have opportunities.

Stephen E Arnold, August 21, 2015

Quote to Note: Confluent

August 20, 2015

I read “Meet Confluent, The Big-Data Startup That Has Silicon Valley Buzzing.” Confluent can keep “the data flowing at some of the biggest and most information-rich firms in Silicon Valley.” The company’s Web site is The company uses Apache Kafka to deliver its value to customers.

Here’s the passage I noted:

Experts suggest Confluent’s revenue could approach $10 million next year and pass $50 million in 2017. The company could echo the recent success of another open-source darling, Docker, which has turned record adoption of its computing tools called “containers” into a growing enterprise suite and a $1 billion valuation. Confluent is likely worth about one-sixth that today but not for long. “Every person we hire uncovers millions of dollars in sales,” says early investor Eric Vishria of Benchmark. “There’s real potential [for Confluent] to be an enterprise phenomenon.”

I noted the congruence of Docker and Confluent. I enjoyed the word “every”. Categorical affirmatives are thrilling. I liked also “phenomenon.” The article’s omission of a reference to Palantir surprised me.

Nevertheless, I have a question: “Has another baby unicorn been birthed?” According to Crunchbase, the company has raised more than $50 million. With 17 full time employees, Confluent may be hiring. Perhaps some lucid engineers will see the light?

Stephen E Arnold, August 20, 2015

Data Lake Alert: Tepid Water, High Concentration of Agricultural Runoff

August 13, 2015

Call me skeptical. Okay, call me a person who is fed up with silly jargon. You know what a database is, right? You know what a data warehouse is, well, sort of, maybe? Do you know what a data lake is? I don’t.

A lake, according to the search engine du jour Gibiru:

An area prototypically filled with water, also of variable size.

A data lake, therefore, is an area filled with zeros and ones, also of variable size. How does a data lake differ from a database or a data warehouse?

According to the write up “Sink or Swim – Why your Organization Needs a Data Lake”:

A Data Lake is a storage repository that holds a vast amount of raw data in its native format for processing later by the business.

The magic in this unnecessary jargon is, in my opinion, a quest (perhaps Quixotic?) for sales leads. The write up points out that a data lake is available. A data lake is accessible. A data lake is—wait for it—Hadoop.

What happens if the water is neither clear nor pristine? One cannot unleash the hounds of the EPA to resolve the problem of data which may not be very good until validated, normalized, and subjected to the ho hum tests which some folks want to have me believe may be irrelevant steps in the land of a marketer’s data lakes.

My admonition, “Don’t drink the water until you know it won’t make life uncomfortable—or worse. Think fatal.”

Stephen E Arnold, August 13, 2015

Coauthoring Documents in SharePoint to Save Time

August 4, 2015

SharePoint users are often looking for ways to save time and streamline the process of integration from other programs. Business Management Daily has devoted some attention to the topic with their article, “Co-authoring Documents in SharePoint and Office.” Read on for the full details of how to make the most of this feature.

The article begins:

“One of the best features of SharePoint 2010 and 2013 is the way it permits co-authoring. Co-authoring means more than one person is in a document, workbook or presentation at the same time editing different parts. It works differently in Word, Excel and PowerPoint . . . With Word 2013/SharePoint 2013, co-authors may edit either in Word Online (Word Web App) or the desktop version.”

SharePoint is a powerful but complicated solution that requires quite a bit of energy to maintain and use to the best of its ability. For those users and managers that are tasked with daily work in SharePoint, staying in touch with the latest tips and tricks is vital. Those users may benefit from Stephen E. Arnold’s Web site, A longtime leader in search, Arnold brings the latest SharePoint news together in one easy to digest news feed.

Emily Rae Aldridge, August 4, 2015


Data Science, Senior Managers, and the Ever Interesting Notion of Truth

August 3, 2015

I read “Data Scientists to CEOs: You Can’t Handle the Truth.” I enjoy write ups about data science which start off with the notion of truth. I know that the “truth” referenced is the outputs of analytics systems.

Call me skeptical. If the underlying data are not normalized, validated, and timely, the likelihood of truth becomes even murkier than it was in my college philosophy class. Roger Ailes allegedly said:

Truth is whatever people will believe.

Toss in the criticism of a senior manager who in the US is probably a lawyer or an accountant, and you have a foul brew. Why would a manager charged with hitting quarterly targets or generating enough money to meet payroll quiver with excitement when a data scientist presents “truth”?

There is that pesky perception thing. There are frames of reference. There are subjective factors in play. Think of the dentist who killed Cecil. I am not sure data science will solve his business and personal challenges. Do you?

The write up is a silly fan rant for the fuzzy discipline of data science. Data science does not pivot on good old statisticians with their love of SAS and SPSS, fancy math, and 17th century notions of what constitutes a valid data set. Nope.

The data scientist has to communicate the known unknowns to his or her CEO. Shades of Rumsfeld. Does today’s CEO want to know more about the uncertainty in the business? The answer is, “Maybe.” But senior managers often get information that is filtered, shaped, and presented to create an illusion. Shattering those illusions can have some negative career consequences even for data scientists, assuming there is such a discipline as data science.

Evoking the truth from statistical processes which are output from systems configured by others can be interesting. Those threshold settings are not theoretical. Those settings determine what the outputs are and what they are “about.”

Connecting an automated output to something that the data scientist asserts should be changed strikes me as somewhat parental. How does that work on a manager like Dick Cheney? How does that work on the manager of a volunteer committee working on a parent teacher luncheon?

I thought the Jack Benny program from the 1930s to 1960s was amusing. Some of the output about data science suggests that comedy may be a more welcoming profession than management based on truth from data science. Truth and statistics. Amazing comedy.

Stephen E Arnold, August 3, 2015

Big Data Lake: Are the Data Safe to Consume?

August 2, 2015

I read “The Analytics Journey Leading to the Business Data Lake.” Data lake is one of the terms floating around (pun definitely intended!) to stimulate sales. If one has a great deal of water, one needs a place to put it. Even though water is dammed, piped, used, recycled, and dumped—storage is the key.

Enter EMC, a company which is in the business of helping those with water store it and make use of that substance.

The write up reflects effort. I assume there was a PowerPoint slide deck in the mix. There are some snazzy graphics. Here’s one that caught my eye:


Instead of enterprise search being the go-to enterprise software solution, EMC has slugged in the following umbrella terms:

  • Information ecosystem
  • Business intelligence (perhaps an oxymoron in light of this article)
  • Advanced analytics (obviously because regular analytics just aren’t zippy enough)
  • Knowledge layer (I remain puzzled about knowledge because I have a tough time defining it. In fact, I resigned from my for fee knowledge management column because I just don’t know what the heck “knowledge” means.)
  • The unfathomable data lake (yep, pun intended). What’s wrong with the word “storage” or “database” by the way?
  • Master data which is also baffling. Is there servant data too?
  • Machine data. Again I have no clue what this means.

The chart scatters undefined and fuzzy buzzwords like a crazed Jethro Tull, a water soluble blend of Jethro Tull (inventor of the seed drill) and Jethro Tull (the commercially successful and eccentric rock band).

The write up is important because EMC has sucked in the jargon and assertions once associated with enterprise search and applied them to the dark and mysterious data lake.

I highlighted:

Our data lake is one logical data platform with multiple tiers of performance and storage levels to optimally serve various data needs based on Service Level Agreements (SLA). It will provide a vast amount of structured and unstructured data at the Hadoop and Greenplum layers to data scientists for advanced analytics innovation. The higher performance levels powered by Greenplum and in-memory caching databases will serve mission-critical and real-time analytics and application solutions. With more robust data governance and data quality management, we can ensure authoritative, high-quality data driving all of EMC business insights and analytics driven applications using data services from the lake.

Ah, the Mariana Trench of enterprise information: Governance. Like “knowledge” and “advanced analytics”, governance has euphony. I think of the water lapping against the shore of Lake Paseco.

So what? Several observations:

  1. This type of “suggest lots” marketing ended poorly for a number of companies who used this type of rhetoric when marketing search.
  2. The folks who swallow this bait are likely to find themselves in a most uncomfortable spot.
  3. The problems associated with making use of information to improve decision making by reducing risk are not going to be solved by crazy diagrams and unsupported assertions.

EMC has been able to return to revenue growth. But the company’s profit margin has flat lined.


I am not sure that increasing the buzzword density in marketing write ups will help angle the red lines to low earth orbit. With better margins, it is much easier to check out the topographic view and see where lakes meet land.

Stephen E Arnold, August 2, 2015
