
Bad News for Instant Analytics Sharpies

June 28, 2016

I read “Leading Statisticians Establish Steps to Convey Statistics a Science Not Toolbox.” I think “steps” are helpful. The challenge will be to corral the escaped ponies who are making fancy analytics a point and click, drop down punch list. Who needs to understand anything? Hit the button and generate visualizations until something looks really super. Does anyone know a general who engages in analytic one-upmanship? Content and clarity sit in the backseat of the JLTV.

The write up is similar to teens who convince their less well liked “pals” to go on a snipe hunt. I noted this passage:

To this point, Meng [real statistics person] notes “sound statistical practices require a bit of science, engineering, and arts, and hence some general guidelines for helping practitioners to develop statistical insights and acumen are in order. No rules, simple or not, can be 100% applicable or foolproof, but that’s the very essence that I find this is a useful exercise. It reminds practitioners that good statistical practices require far more than running software or an algorithm.”

Many vendors emphasize how easy smart analytics systems are to use. The outputs are presentation ready. Checks and balances are mostly pushed to the margins of the interface.

Here are the 10 rules.

  1. Statistical Methods Should Enable Data to Answer Scientific Questions
  2. Signals Always Come with Noise
  3. Plan Ahead, Really Ahead
  4. Worry about Data Quality
  5. Statistical Analysis Is More Than a Set of Computations
  6. Keep it Simple
  7. Provide Assessments of Variability
  8. Check Your Assumptions
  9. When Possible, Replicate!
  10. Make Your Analysis Reproducible

I think I can hear the guffaws from the analytics vendors now. I have tears in my eyes when I think about “statistical methods should enable data to answer scientific questions.” I could have sold that line to Jack Benny if he were still alive and doing comedy. Scientific questions from data which no human has checked for validity. Oh, my goodness. Then reproducibility. That’s a good one too.
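For what rules 7 and 10 might look like in practice, here is a minimal sketch. The measurements are invented for illustration, the function name is hypothetical, and a percentile bootstrap is only one of several ways to assess variability. The fixed seed is what makes a rerun produce the same numbers.

```python
import random
import statistics

def bootstrap_mean_interval(values, n_resamples=2000, seed=2016, level=0.95):
    """Percentile bootstrap interval for the mean; a fixed seed makes reruns identical."""
    rng = random.Random(seed)  # local generator, so global random state is not touched
    means = []
    for _ in range(n_resamples):
        resample = [rng.choice(values) for _ in values]  # sample with replacement
        means.append(statistics.mean(resample))
    means.sort()
    lower_index = int((1 - level) / 2 * n_resamples)
    upper_index = int((1 + level) / 2 * n_resamples) - 1
    return means[lower_index], means[upper_index]

# Made-up measurements, for illustration only
sample = [12.1, 9.8, 11.4, 10.9, 13.2, 8.7, 10.1, 11.8, 9.5, 12.6]
low, high = bootstrap_mean_interval(sample)
print(f"mean = {statistics.mean(sample):.2f}, 95% interval = ({low:.2f}, {high:.2f})")
```

Reporting the interval alongside the mean covers rule 7. Keeping the seed and the resample count in the function signature, rather than buried in a menu, is one modest step toward rule 10.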

Stephen E Arnold, June 28, 2016

Forbes, News Coverage, and Google Love

June 24, 2016

Short honk: US news coverage has “faves.” I assume that the capitalist tool avoids bias in its admirable reporting about business.

Navigate to “Television As Data: Mapping 6 Years of American Television News.” The write up uses Big Data from television news to reveal what gets air time. When I read the article, I must admit I thought about the phrase “If it bleeds, it leads.”

The bottom line is not that countries and cities are used to characterize an event. For me the most interesting comment was the thanks bestowed on Google for assisting with the analysis.

I circled twice in honest blue this statement:

In the end, these maps suggest that the bigger story that is being missed in all the conversation about media fragmentation and bias is that media has always been biased geographically, culturally and linguistically.

Note the “all” and the “always.” Nifty generalizations from an analysis of six years of data.

Biased coverage? I cannot conceive of biased coverage. Film at 11.

Stephen E Arnold, June 24, 2016

Data Wrangling Market Is Self-Aware and Growing, Study Finds

June 20, 2016

The article titled “Self-Service Data Prep is the Next Big Thing for BI” on Datanami digs into the quickly growing data preparation industry by reviewing the Dresner Advisory Services study. The article provides a list of the major insights from the study and paints a vivid picture of the current circumstances. Most companies frequently perform end-user data preparation, but only a small percentage (12%) consider themselves proficient in the area. The article states,

“Data preparation is often challenging, with many organizations lacking the technical resources to devote to comprehensive data preparation. Choosing the right self-service data preparation software is an important step…Usability features, such as the ability to build/execute data transformation scripts without requiring technical expertise or programming skills, were considered “critical” or “very important” features by over 60% of respondents. As big data becomes decentralized and integrated into multiple facets of an organization, users of all abilities need to be able to wrangle data themselves.”

Ninety percent of respondents agreed on the importance of two key features: the capacity to aggregate and group data, and a straightforward interface for imposing structure on raw data. Trifacta earned the top ranking among just under 30 vendors for the second year in a row. The article concludes by suggesting that many users are already aware that data preparation is not an independent activity, and that data prep software must be integrated with other resources to succeed.
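To make those two features concrete, here is a small sketch of putting structure on raw text and then grouping and aggregating it. The pipe-delimited format, the field names, and the figures are all invented for illustration. They are not drawn from the Dresner study or from any vendor's product.

```python
from collections import defaultdict

# Hypothetical raw export: region, product, and sale amount packed into one string
raw_lines = [
    "east|widgets|120.50",
    "west|widgets|80.00",
    "east|gadgets|42.25",
    "west|widgets|19.50",
]

def parse(line):
    """Impose structure: split one raw line into a typed record."""
    region, product, amount = line.split("|")
    return {"region": region, "product": product, "amount": float(amount)}

def total_by(records, key):
    """Group records by the given field and sum the amounts in each group."""
    totals = defaultdict(float)
    for record in records:
        totals[record[key]] += record["amount"]
    return dict(totals)

records = [parse(line) for line in raw_lines]
print(total_by(records, "region"))
print(total_by(records, "product"))
```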


Chelsea Kerwin, June 20, 2016

Sponsored by, publisher of the CyberOSINT monograph

The Value of Data: The Odd Isolation of Little Items

June 17, 2016

I read “Determining the Economic Value of Data.” The author is a chief technology officer, a dean of Big Data, and apparently a college professor training folks to be MBAs. The idea is that data are intangible. How does one value an intangible when writing from the perspective of a “dean”?

The answer is to seize on some applications of Big Data which can be converted to measurable entities. Examples include boosting the number of bank products a household “holds”, reducing customer churn, and making folks happier. Happiness is a “good” and one can measure it; for example, “How happy are you with the health care plan?”

One can then collect data, do some Excel fiddling, and output numbers. The comparative figures (one hopes) provide a handle upon which to hang “value.”

This is the standard approach used to train business wizards in MBA programs based on my observations. We know the method works, just check out the economic performance of the US economy in the last quarter.

The problem I have with this isolationist approach is that it ignores the context of any perceived value. I don’t want to hobble through The Knowledge Value Revolution by Taichi Sakaiya. I would suggest that any analysis of value may want to acknowledge the approach taken by Sakaiya about three decades ago. One can find a copy of the book for one penny on good old Amazon. How’s that for knowledge value?

Old ideas are not exactly the fuel that fires the imaginations of some “deans” or MBAs. Research is the collection of data which one can actually locate. Forget about the accuracy of the data or the validity of the analyses of loosey goosey notions of “satisfaction”.

I would suggest that the “dean’s” approach is a bit wobbly. Consider Sakaiya, who seems to be less concerned with creating busy work and more with coming to grips with why certain products and services command high prices and others are almost valueless.

I know that reading a book written in the 1980s is a drag. Perhaps it is better to ignore prescient thought and just go with whatever can be used to encourage the use of Excel and the conversion of numbers into nifty visualizations.

Stephen E Arnold, June 17, 2016

Enterprise Search Vendor Sinequa Partners with MapR

June 8, 2016

In the world of enterprise search and analytics, everyone wants in on the clients who have flocked to Hadoop for data storage. Virtual Strategy shared an article titled “Sinequa Collaborates With MapR to Power Real-Time Big Data Search and Analytics on Hadoop.” Sinequa, a firm specializing in big data, has become certified with the MapR Converged Data Platform. The interoperation of Sinequa’s solutions with MapR will enable actionable information to be gleaned from data stored in Hadoop. We learned,

“By leveraging advanced natural language processing along with universal structured and unstructured data indexing, Sinequa’s platform enables customers to embark on ambitious Big Data projects, achieve critical in-depth content analytics and establish an extremely agile development environment for Search Based Applications (SBA). Global enterprises, including Airbus, AstraZeneca, Atos, Biogen, ENGIE, Total and Siemens have all trusted Sinequa for the guidance and collaboration to harness Big Data to find relevant insight to move business forward.”

Beyond all the enterprise search jargon in this article, the collaboration between Sinequa and MapR appears to offer an upgraded service to customers. As we all know at this point, unstructured data indexing is key to data intake. However, when it comes to output, technological solutions that can support informed business decisions will be unparalleled.


Megan Feil, June 8, 2016



Search Vendor Identifies Big Data Failings

June 5, 2016

Talk about the pot calling the kettle a deep fryer? I read “Attivio Survey Exposes Disconnect Between Big Data Excitement and Organizations’ Ability to Execute.” On one hand, the idea that a buzzword does not, like Superman, transform into truth, justice, and the American way is understandable. On the other hand, the survey underscores one of the gaps in the marketing invasion force search vendors have when selling information access as business intelligence.

The write up points out that Big Data is going like gangbusters. However:

64 percent of respondents said that process bottlenecks prevent large data sets from being accessed quickly and efficiently. This dissonance highlights a growing gulf between the desire to embrace Big Data and their ability to operationalize it.

With a sample size of 150, I am not sure how solid these results are, but the point is poignant. Doing “stuff” with data is great. But how is the “stuff” relevant to closing a sale?
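A back-of-the-envelope check on that sample size is possible. The sketch below assumes the 150 respondents were a simple random sample, which the write up does not say, and uses the usual normal approximation for a proportion. The function name is hypothetical.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Half-width of an approximate 95% confidence interval for a proportion."""
    return z * math.sqrt(p * (1 - p) / n)

# 64 percent of 150 respondents, the figures from the quoted passage
moe = margin_of_error(0.64, 150)
low, high = 0.64 - moe, 0.64 + moe
print(f"64% plus or minus {moe * 100:.1f} points ({low * 100:.0f}% to {high * 100:.0f}%)")
# prints: 64% plus or minus 7.7 points (56% to 72%)
```

Under those assumptions the 64 percent could plausibly sit anywhere from the mid 50s to the low 70s. If the respondents were self-selected, the real uncertainty is larger and this arithmetic does not capture it.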

Attivio, the apparent sponsor of the study, seems to see a glass that is more than half full, maybe overflowing. Three key findings from the study allegedly were:

  • Legacy systems are not up to the task of Big Data crunching. The fix? Not provided but my hunch is that the “cloud” will be a dandy solution.
  • Finding folks who can actually “do” Big Data and provide useful operational outputs is a very difficult task. The fix? I assume one can hire an outfit like the study’s sponsor, but this is just a wild guess on my part.
  • Governance is an issue. The fix? If I were working at Booz, Allen, the answer is obvious: Hire Booz, Allen to manage. If that’s not an option, well, floundering may work.

Net net: Search vendors need to find a source of sustainable revenue. Big Data is a possibility, but the market is not exactly confident about the payoff and how to use the outputs. The demos are often interesting.

Stephen E Arnold, June 5, 2016

Financial Institutes Finally Realize Big Data Is Important

May 30, 2016

One of the fears of automation is that human workers will be replaced and there will be no jobs left for humanity. Blue-collar jobs are believed to be the first that will be automated, but bankers, financial advisors, and other workers in the financial industry have cause to worry. Algorithms might replace them, because apparently people are getting faster and better responses from automated bank “workers”.

Perhaps one reason bankers and financial advisors are being replaced is that they have only now grasped what ABA Banking Journal spells out in “Big Data And Predictive Analytics: A Big Deal, Indeed.” One would think that the financial sector would be the first to embrace big data and analytics in order to keep an upper hand on the competition, earn more money, and stay relevant in an ever-changing world. Banks, however, have been slow to adapt, slower than retail, search, and insurance.

The article explains why the financial sector has been holding back:

“There’s a host of reasons why banks have held back spending on analytics, including privacy concerns and the cost for systems and past merger integrations. Analytics also competes with other areas in tech spending; banks rank digital banking channel development and omnichannel delivery as greater technology priorities, according to Celent.”

After the above quote, the article makes a statement about how customers are moving to online banking over visiting branches, but it is a very insipid observation. Big data and analytics offer banks the opportunity to invest in developing better relationships with their customers and even to offer more individualized services as a way to one up Silicon Valley competition. Big data also helps financial institutions comply with banking laws and standards to avoid violations.

Banks do need to play catch up, but this is probably a lot of moan and groan for nothing. The financial industry will adapt, especially when it is at risk of losing more money. This will be the same for all industries: adapt or get left behind. The further we move from the twentieth century and generations that are not used to digital environments, the more we will see technology integration.

Whitney Grace, May 30, 2016

Now Big Data Has to Be Fast

May 15, 2016

I read “Big Data Is No Longer Enough: It’s Now All about Fast Data.” The write up is interesting because it shifts the focus from having lots of information to infrastructure which can process the data in a timely manner. Note that “timely” means different things in different contexts. For example, to a crazed MBA stock market maven, next week is not too useful. To a clueless marketing professional with a degree in art history, “next week” might be just speedy enough.

The write up points out:

Processing data at these breakneck speeds requires two technologies: a system that can handle developments as quickly as they appear and a data warehouse capable of working through each item once it arrives. These velocity-oriented databases can support real-time analytics and complex decision-making in real time, while processing a relentless incoming data feed.

The point omitted from the article is that speed comes at a cost. That cost includes the humans required to figure out what’s needed to go fast, the engineers to build the system, and the time required to complete the task. The “cloud” is not a solution to the cost.

Another omission in the article is that the numerical recipes required to “make sense” of large volumes of data require specialist knowledge. A system which outputs nifty charts may be of zero utility when it comes to making a decision.

The write up ignores the information in “What Beats Big Data? Small Data.” Some organizations cannot afford the cost of fast data. Even outfits which have the money can find themselves tripping over their analyses. See, for example, “Amazon Isn’t Racist, It’s Just Been an Unfortunate Victim of Big Data.” Understanding the information is important. Smart software often lacks the ability to discern nuances or issues with data quality, poor algorithm selection, or knowing what to look for in the first place.

Will the write up cause marketers and baloney makers to alter their pitches about Big Data and smart software? Not a chance. Vendors’ end game is revenue; licensees have a different agenda. When the two do not meet, there may be some excitement.

Stephen E Arnold, May 15, 2016

Deep Learning: Old Wine, New Labels

May 13, 2016

I read “Deep Learning: Definition, Resources, and Comparison with Machine Learning.” The most useful segment of the article to me is the list of resources. I did highlight this statement and its links:

Many deep learning algorithms (clustering, pattern recognition, automated bidding, recommendation engine, and so on)  — even though they appear in new contexts such as IoT or machine to machine communication — still rely on relatively old-fashioned techniques such as logistic regression, SVM, decision trees, K-NN, naive Bayes, Bayesian modeling, ensembles, random forests, signal processing, filtering, graph theory, gaming theory, and many others. Click here and here for details about the top 10 algorithms.

The point is that folks are getting interested in established methods hooked together in interesting ways. Perhaps new methods will find their way into the high flying vehicles for smart software? But wait. Are computational barriers acting like a venturi in the innovation flow? What about that vacuum?

Stephen E Arnold, May 13, 2016

DARPA Seeks Keys to Peace with High-Tech Social Science Research

May 11, 2016

Strife has plagued the human race since the beginning, but the Pentagon’s research arm thinks it may be able to get to the root of the problem. Defense Systems informs us, “DARPA Looks to Tap Social Media, Big Data to Probe the Causes of Social Unrest.” Writer George Leopold explains:

“The Defense Advanced Research Projects Agency (DARPA) announced this week it is launching a social science research effort designed to probe what unifies individuals and what causes communities to break down into ‘a chaotic mix of disconnected individuals.’ The Next Generation Social Science (NGS2) program will seek to harness steadily advancing digital connections and emerging social and data science tools to identify ‘the primary drivers of social cooperation, instability and resilience.’

“Adam Russell, DARPA’s NGS2 program manager, said the effort also would address current research limitations such as the technical and logistical hurdles faced when studying large populations and ever-larger datasets. The project seeks to build on the ability to link thousands of diverse volunteers online in order to tackle social science problems with implications for U.S. national and economic security.”

The initiative aims to blend social science research with the hard sciences, including computer and data science. Virtual reality, Web-based gaming, and other large platforms will come into play. Researchers hope their findings will make it easier to study large and diverse populations. Funds from NGS2 will be used for the project, with emphases on predictive modeling, experimental structures, and boosting interpretation and reproducibility of results.

Will it be the Pentagon that finally finds the secret to world peace?


Cynthia Murrell, May 11, 2016


