Big Data: The Crawfish Approach to Meaningful Information

March 21, 2017

Have you ever watched a crawfish (sometimes called a crawdad or a crayfish) get away from trouble? The freshwater crustaceans can go backwards. Members of the Astacidae can be found in parts of the south, so you will have to wander in a Georgia swamp to check out the creature’s behavior.

The point is that crawfish go backwards to protect themselves and achieve their tiny lobster-like goals. Big time consultants also crawfish in order to sell more work and provide “enhanced” insight into a thorny business or technical problem other consultants have created.

To see this in action, navigate to “The Conundrum of Big Data.” A super consultant explains that Big Data is not exactly the home run, silver bullet, or magic potion some lesser consultants said Big Data would be. I learned:

Despite two decades of intensive IT investment in data [mining] applications, recent studies show that companies continue to have trouble identifying metrics that can predict and explain performance results and/or improve operations. Data mining, the process of identifying patterns and structures in the data, has clear potential to identify prescriptions for success but its wide implementation fails systematically. Companies tend to deploy ‘unsupervised-learning’ algorithms in pursuit of predictive metrics, but this automated [black box] approach results in linking multiple low-information metrics in theories that turn out to be improbably complex.

Big surprise. For folks who are not trained in the nuts and bolts of data analysis and semi fancy math, Big Data is a giant vacuum cleaner for money. The cash has to pay for “experts,” plumbing, software, and more humans. The outputs are often fuzzy wuzzy probabilities which more “wizards” interpret. Think of a Greek religious authority looking at the ancient equivalent of road kill.

The write up cites the fizzle that was Google Flu Trends. Cough. Cough. But even that sneeze could be fixed with artificial intelligence. Yep, when smart humans make mistakes, send in smart software. That will work.

In my opinion, the highlight of the write up was this passage:

When it comes to data, size isn’t everything because big data on their own cannot just solve the problem of ‘insight’ (i.e. inferring what is going on). The true enablers are the data-scientists and statisticians who have been obsessed for more than two centuries to understand the world through data and what traps lie in wait during this exercise. In the world of analytics (AaaS), it is agility (using science, investigative skills, appropriate technology), trust (to solve the client’s real business problems and build collateral), and ‘know-how’ (to extract intelligence hidden in the data) that are the prime ‘assets’ for competing, not the size of the data. Big data are certainly here but big insights have yet to arrive.

Yes. More consulting is needed to make those payoffs arrive. But first, hire more advisers. What could possibly go wrong? Cough. Sneeze. One goes forwards with Big Data by going backwards for more analysis.

Stephen E Arnold, March 21, 2017

Big Data Requires More Than STEM Skills

March 13, 2017

It will require training Canada’s youth in design and the arts, as well as STEM subjects, if that country is to excel in today’s big-data world. That is the advice of a trio of academic researchers in that country, Patricio Davila, Sara Diamond, and Steve Szigeti, who declare, “There’s No Big Data Without Intelligent Interface” at the Globe and Mail. The article begins by describing why data management is now a crucial part of success throughout society, then emphasizes that we need creative types to design intuitive user interfaces and effective analytics representations. The researchers explain:

Here’s the challenge: For humans, data are meaningless without curation, interpretation and representation. All the examples described above require elegant, meaningful and navigable sensory interfaces. Adjacent to the visual are emerging creative, applied and inclusive design practices in data “representation,” whether it’s data sculpture (such as 3-D printing, moulding and representation in all physical media of data), tangible computing (wearables or systems that manage data through tactile interfaces) or data sonification (yes, data can make beautiful music).

Infographics is the practice of displaying data, while data visualization or visual analytics refers to tools or systems that are interactive and allow users to upload their own data sets. In a world increasingly driven by data analysis, designers, digital media artists, and animators provide essential tools for users. These interpretive skills stand side by side with general literacy, numeracy, statistical analytics, computational skills and cognitive science.

We also learn about several specific projects undertaken by faculty members at OCAD University, where our three authors are involved in the school’s Visual Analytics Lab. For example, the iCity project addresses transportation network planning in cities, and the Care and Condition Monitor is a mobile app designed to help patients and their healthcare providers better work together in pursuit of treatment goals. The researchers conclude with an appeal to their nation’s colleges and universities to develop programs that incorporate data management, data numeracy, data analysis, and representational skills early and often. Good suggestion.

Cynthia Murrell, March 13, 2017

To Make Data Analytics Sort of Work: Attention to Detail

March 10, 2017

I read “The Much-Needed Business Facet for Modern Data Integration.” The write up presents some useful information. Not many of the “go fast and break things” crowd will relate to some of the ideas and suggestions, but I found the article refreshing.

What does one do to make modern data centric activities sort of work? The answers are ones that I have found many more youthful wizards often elect to ignore.

Here they are:

  1. Do data preparation. Yikes. Normalization of data. I have fielded this question in the past, “Who has time for that?” Answer: Too few, gentle reader. Too few.
  2. Profile the data. Another gasp. In my experience it is helpful to determine what data are actually germane to the goal. Think about the polls for the recent
  3. Create data libraries. Good idea. But it is much more fun to just recreate data sets. Very Zen like.
  4. Have rules which are now explained as “data governance.” The jargon does not change the need for editorial and data guidelines.
  5. Take a stab at data quality. This is another way of saying, “Clean up the data.” Even whiz bang modern systems are confused by differences like I.B.M and International Business Machines or numbers with decimal points in the incorrect place.
  6. Get colleagues in the game. This is a good idea, but in many organizations in which I have worked “team” is spelled “my bonus.”
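
For the code-minded, items 1 and 5 can be sketched in a few lines of Python. This is a minimal illustration under my own assumptions, not anything taken from the article: the alias table, the function names, and the expected value range are all hypothetical. It shows one way to collapse name variants like I.B.M and International Business Machines into a single form, and one crude way to flag a number whose decimal point may have wandered.

```python
import re

# Hypothetical alias table: maps cleaned-up variants to one canonical name.
# The entries are illustrative only.
ALIASES = {
    "ibm": "International Business Machines",
    "international business machines": "International Business Machines",
}


def normalize_company(name: str) -> str:
    """Return a canonical company name, or the trimmed input if unknown."""
    # Lowercase, drop periods so "I.B.M" and "IBM" match, collapse whitespace.
    key = re.sub(r"\s+", " ", name.lower().replace(".", "")).strip()
    return ALIASES.get(key, name.strip())


def plausible(value: float, low: float, high: float) -> bool:
    """Return True if value falls inside the expected range.

    A value far outside the range (1999.0 where 19.99 was expected)
    is a hint that a decimal point is in the wrong place.
    """
    return low <= value <= high


if __name__ == "__main__":
    print(normalize_company("I.B.M"))
    print(normalize_company("International  Business Machines"))
    print(plausible(1999.0, 5.0, 50.0))
```

The lookup table is the dull part, and it is also the part that does the work. Someone has to build and maintain it, which is the dog work the checklist is talking about.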

Useful checklist. I fear that those who color unicorns will not like the dog work which accompanies implementing the ideas. That’s what makes search and content processing so darned interesting.

Stephen E Arnold, March 10, 2017

Intelligence Industry Becoming Privatized and Concentrated

March 10, 2017

Monopolies aren’t just for telecoms and zipper manufacturers. The Nation reveals a much scarier example in its article, “5 Corporations Now Dominate Our Privatized Intelligence Industry.” Reporter Tim Shorrock outlines the latest merger that brings us to this point, one between Pentagon and NSA contractor Leidos Holdings and a division of Lockheed Martin called Information Systems and Global Solutions. Shorrock writes:

The sheer size of the new entity makes Leidos one of the most powerful companies in the intelligence-contracting industry, which is worth about $50 billion today. According to a comprehensive study I’ve just completed on public and private employment in intelligence, Leidos is now the largest of five corporations that together employ nearly 80 percent of the private-sector employees contracted to work for US spy and surveillance agencies.

Yes, that’s 80 percent. For the first time since spy agencies began outsourcing their core analytic and operational work in the late 1990s, the bulk of the contracted work goes to a handful of companies: Leidos, Booz Allen Hamilton, CSRA, SAIC, and CACI International. This concentration of ‘pure plays’—a Wall Street term for companies that makes one product for a single market—marks a fundamental shift in an industry that was once a highly diverse mix of large military contractors, small and medium technology companies, and tiny ‘Beltway Bandits’ surrounding Washington, D.C.

I should mention that our beloved leader, Stephen E Arnold, used to work as a gopher for one of these five companies, Booz Allen Hamilton. Shorrock details the reasons such concentrated power is a problem in the intelligence industry, and shares the profile he has made on each company. He also elaborates on the methods he used to analyze the shadowy workforce they employ. (You’ll be unsurprised to learn it can be difficult to gather data on intelligence workers.) See the article for those details, and for Shorrock’s discussion of negligence by the media and by Congress on this matter. We can agree that most folks don’t seem to be aware of this trend, or of its potential repercussions.

Cynthia Murrell, March 10, 2017

Cambridge Analytica: Buzz, Buzz, Buzz

March 9, 2017

The idea that software can make sense of information is a powerful one. Many companies tout the capabilities of their business processes, analytical tools, and staff to look at data and get a sense of the future. The vast majority of these firms have tools and methods which provide useful information.

What happens when a person who did not take a course in analytics learns about the strengths and limitations of these systems?

Answer: You get some excitement.

I read “Big Data’s Power Is Terrifying. That Could Be Good News for Democracy.” The idea that companies with nifty analytic systems and methods can control life is magnetic. Lots of folks want to believe that a company’s analyses can have a significant impact on elections, public opinion, and maybe the stock market.

The write up asserts:

Online information already lends itself to manipulation and political abuse, and the age of big data has scarcely begun. In combination with advances in cognitive linguistics and neuroscience, this data could become a powerful tool for changing the electoral decisions we make. Our capacity to resist manipulation is limited.

My view is that one must not confuse the explanations from marketing mavens, alarmists, and those who want to believe that Star Trek is “real” with what today’s systems can do. Firms like Cambridge Analytica and others generate reports. In fact, companies have been using software to figure out what’s what for many years.

What’s interesting is that folks learn about these systems and pick up the worn ball and carry it down field while screaming, “Touchdown.”

Sorry. The systems don’t warrant that type of excitement. Reality is less exciting. Probabilities are useful, not reality. But why not carry the ball? It is easier than learning what analytics firms do.

Stephen E Arnold, March 9, 2017

IBM and Root Access Misstep?

March 2, 2017

Maybe this is fake news? Maybe. Navigate to “Big Blue’s Big Blunder: IBM Accidentally Hands Over Root Access to Its Data Science Servers.” When I read the title, my first reaction was, “Hey, Yahoot is back in the security news.” Wrong.

According to the write up, which I assume to be exposing the “truth”:

IBM left private keys to the Docker host environment in its Data Science Experience service inside freely available containers. This potentially granted the cloud service’s users root access to the underlying container-hosting machines – and potentially to other machines in Big Blue’s Spark computing cluster. Effectively, Big Blue handed its cloud users the secrets needed to potentially commandeer and control its service’s computers.

IBM hopped to it. Two weeks after the stumble was discovered, IBM fixed the problem.

The write up includes this upbeat statement, attributed to the person using a demo account which exposed the glitch:

I think that IBM already has some amazing infosec people and a genuine commitment to protecting their services, and it’s a matter of instilling security culture and processes across their entire organization. That said, any company that has products allowing users to run untrusted code should think long and hard about their system architecture. This is not to imply that containers were poorly designed (because I don’t think they were), but more that they’re so new that best practices in their use are still being actively developed. Compare a newer-model table saw to one decades old: The new one comes stock with an abundance of safety features including emergency stopping, a riving knife, push sticks, etc, as a result of evolving culture and standards through time and understanding.

Bad news. Good news.

Let’s ask Watson about IBM security. Hold that thought, please. Watson is working on health care information. And don’t forget the March 2017 security conference sponsored by those security pros at IBM.

Stephen E Arnold, March 2, 2017

Finding Meaning in Snapchat Images, One Billion at a Time

February 27, 2017

The article on InfoQ titled Amazon Introduces Rekognition for Image Analysis explores the managed service aimed at the explosive image market. According to research cited in the article, over 1 billion photos are taken every single day on Snapchat alone, compared to the 80 billion total taken in the year 2000. Rekognition’s deep learning power is focused on identifying meaning in visual content. The article states,

The capabilities that Rekognition provides include Object and Scene detection, Facial Analysis, Face Comparison and Facial Recognition. While Amazon Rekognition is a new public service, it has a proven track record. Jeff Barr, chief evangelist at AWS, explains: Powered by deep learning and built by our Computer Vision team over the course of many years, this fully-managed service already analyzes billions of images daily. It has been trained on thousands of objects and scenes. Rekognition was designed from the get-go to run at scale.

The facial analysis features include markers for image quality, facial landmarks like facial hair and open eyes, and sentiment expressed (smiling = happy). The face comparison feature includes a similarity score that estimates the likelihood of two pictures being of the same person. Perhaps the most useful feature is object and scene detection, which Amazon believes will help users find specific moments by searching for certain objects. The use cases also span vacation rental markets and travel sites, which can now tag images with key terms for improved classifications.
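
To give a rough sense of what a similarity score between two faces can look like, here is a minimal sketch in Python. It is an assumption for illustration only: the article does not say how Rekognition computes its score, and this is not Amazon's method or API. The sketch supposes each face has already been reduced to a numeric feature vector (the vectors and function names here are hypothetical) and compares the two with cosine similarity, rescaled to a 0 to 100 range.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("vectors must be non-zero")
    return dot / (norm_a * norm_b)


def similarity_percent(a, b):
    """Map cosine similarity from [-1, 1] onto a 0-100 scale."""
    return 50.0 * (cosine_similarity(a, b) + 1.0)


if __name__ == "__main__":
    # Hypothetical feature vectors standing in for two face images.
    face_one = [0.12, 0.80, 0.35]
    face_two = [0.10, 0.78, 0.40]
    print(round(similarity_percent(face_one, face_two), 1))
```

A real service would also need a threshold above which two faces count as a match, and choosing that threshold is a trade-off between false matches and missed ones.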

Chelsea Kerwin, February 27, 2017

Upgraded Social Media Monitoring

February 20, 2017

Analytics are catching up to content. In a recent ZDNet article, Digimind partners with Ditto to add image recognition to social media monitoring, we are reminded images reign supreme on social media. Between Pinterest, Snapchat and Instagram, messages are often conveyed through images as opposed to text. Capitalizing on this, intelligence software company Digimind has announced a partnership with Ditto Labs to introduce image-recognition technology into its social media monitoring software, called Digimind Social. We learned,

The Ditto integration lets brands identify the use of their logos across Twitter no matter the item or context. The detected images are then collected and processed on Digimind Social in the same way textual references, articles, or social media postings are analysed. Logos that are small, obscured, upside down, or in cluttered image montages are recognised. Object and scene recognition means that brands can position their products exactly where there customers are using them. Sentiment is measured by the amount of people in the image and counts how many of them are smiling. It even identifies objects such as bags, cars, car logos, or shoes.

It was only a matter of time before these types of features emerged in social media monitoring. For years now, images have been shown to increase engagement even on platforms that began focused more on text. Will we see more watermarked logos on images? More creative ways to visually identify brands? Both are likely and we will be watching to see what transpires.

Megan Feil, February 20, 2017

Why Do We Care More About Smaller Concerns? How Quantitative Numbing Impacts Emotional Response

February 14, 2017

The affecting article on Visual Business Intelligence titled When More is Less: Quantitative Numbing explains the phenomenon that many of us have probably witnessed on the news, in our friends and family, and even personally experienced in ourselves. A local news story about the death of an individual might provoke a stronger emotional response than news of a mass tragedy involving hundreds or thousands of deaths. Scott Slovic and Paul Slovic explore this in their book Numbers and Nerves. According to the article, this response is “built into our brains.” Another example explains the Donald Trump effect,

Because he exhibits so many examples of bad behavior, those behaviors are having relatively little impact on us. The sheer number of incidents creates a numbing effect. Any one of Trump’s greedy, racist, sexist, vulgar, discriminatory, anti-intellectual, and dishonest acts, if considered alone, would concern us more than the huge number of examples that now confront us. The larger the number, the lesser the impact…This tendency… is automatic, immediate, and unconscious.

The article suggests that the only way to overcome this tendency is to engage with large quantities in a slower, more thoughtful way. An Abel Hertzberg quote helps convey this approach when considering the large-scale tragedy of the Holocaust: “There were not six million Jews murdered: there was one murder, six million times.” The difference between that consideration of individual murders vs. the total number is stark, and it needs to enter into the way we process daily events that are happening all over the world if we want to hold on to any semblance of compassion and humanity.

Chelsea Kerwin, February 14, 2017

Data Mining Firm Cambridge Analytica Set to Capture Trump White House Communications Contract and Trump Organization Sales Contract

February 13, 2017

The article titled Data Firm in Talks for Role in White House Messaging — And Trump Business on The Guardian discusses the future role of Cambridge Analytica in both White House communications and the Trump Organization. Cambridge Analytica is a data company based in London that boasts crucial marketing and psychological data on roughly 230 million Americans. The article points out,

Cambridge’s data could be helpful in both “driving sales and driving policy goals”, said the digital source, adding: “Cambridge is positioned to be the preferred vendor for all of that.”… The potential windfall for the company comes after the Mercers and Cambridge played key roles in Trump’s victory. Cambridge Analytica was tapped as a leading campaign data vendor as the Mercers… The Mercers reportedly pushed for the addition of a few top campaign aides, including Bannon and Kellyanne Conway, who became campaign manager.

Robert Mercer is a major investor in Cambridge Analytica as well as Breitbart News, Steve Bannon’s alt-right news organization. Steve Bannon is also on the board of Cambridge Analytica. The entanglements mount. Prior to potentially snagging these two wildly conflicting contracts, Cambridge Analytica helped Trump win the presidency with their data modeling and psychological profiling that focuses on building intimate relationships between brands and consumers to drive action.

Chelsea Kerwin, February 13, 2017
