January 15, 2017
McKinsey & Co., the blue chip consulting firm, is doing its part to motivate students to ace their SATs. In “The Age of Analytics: Competing in a Data Driven World,” you can glimpse the future awaiting those who are not overachievers able to get hired at an oligopoly. If your firm is a McKinsey customer, you can wrangle a briefing and get even juicier insights. But for the folks who live in Harrod’s Creek, we have to make do with the free write up.
The main point is that organizations that embrace analytics can just be more successful. More money, more influence, more, more, more. In today’s uncertain business climate, the starving cats are going to pay attention to this catnip.
The write up reveals:
Leading companies are using their capabilities not only to improve their core operations but also to launch entirely new business models. The network effects of digital platforms are creating a winner-take-most situation in some markets. The leading firms have remarkably deep analytical talent taking on various problems—and they are actively looking for ways to enter other industries. These companies can take advantage of their scale and data insights to add new business lines, and those expansions are increasingly blurring traditional sector boundaries.
Net net: hire McKinsey to help you take advantage of this opportunity. For those who are not working hard to be perceived as smart enough to work at a blue chip outfit like McKinsey, there may be universal basic income in your not so bright future.
Stephen E Arnold, January 15, 2017
January 5, 2017
I read “Really? Most Americans Don’t Suffer Information Overload.” The main idea is that folks in the know, in the swim, and in the top one percent suffer from too much information. The rest of the ignorance-is-bliss crowd has a different perception.
The write up explains, reports, states:
A new report from the Pew Research Center says that most Americans do not suffer from information overload—even though many of us frequently say otherwise.
What’s up with that?
The write up points out:
Many people complain about the volume of information coming at us. But we want it. Adweek reported earlier this year that the average person consumes almost 11 hours of media per day. That’s everything from text messages to TV programs to reading a newspaper.
Well, the Pew outfit interviewed 1,520 people, a sample size that would satisfy those who rely on the tables in the back of Statistics 101 textbooks. I have no details, however, about the demographics of the sample, its geographic distribution, or the reasons these folks took time out from watching Netflix to answer the Pew questions.
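The arithmetic behind that textbook-approved sample size is short. A minimal sketch, assuming simple random sampling, a 95 percent confidence level (z = 1.96), and the worst-case proportion p = 0.5 (none of which the write up confirms for Pew), computes the margin of error for 1,520 respondents:

```python
import math


def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Approximate margin of error for a proportion from a simple random sample.

    Assumptions: simple random sampling, normal approximation, and
    z = 1.96 for roughly 95% confidence. p = 0.5 gives the widest margin.
    """
    return z * math.sqrt(p * (1 - p) / n)


# 1,520 is Pew's reported sample size; the other inputs are textbook defaults.
moe = margin_of_error(1520)
print(f"+/- {moe:.1%}")  # +/- 2.5%
```

Roughly plus or minus 2.5 percentage points. The formula rewards sample size and says nothing about who was sampled, which is the gap the paragraph above points to.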
The answer that lots of people don’t suffer from information overload seems wrong when viewed from the perspective of a millennial struggling to buy a house while working as a customer support rep until the automated system is installed.
But wait. The write up informs me:
the recent national election showed that “in a lot of ways people live in small information bubbles. They get information on social media that has been filtered for them. It is filtered by the network they belong to. In a lot of ways, there’s less information and much of it is less diverse than it was in an earlier era.” The public’s hunger for that information is reflected in a study conducted by Bank of America. The bank found that 71 percent of the people they surveyed sleep within arm’s reach of their smartphone. And 3 percent of those people hold their smartphone while they’re in dreamland.
Too much information for me.
Stephen E Arnold, January 5, 2017
January 3, 2017
I read a chunk of what looks to me like content marketing called “The Death of Prediction.” Prediction seems like a soft target. There were the polls which made clear that Donald J. Trump was a loser. Well, how did that work out? For some technology titans, the predictions omitted a grim pilgrimage to Trump Tower to meet the deal maker in person. Happy faces? Not so many, judging from the snaps of the Sillycon Valley crowd and one sycophant from Armonk.
The write up points out that predictive analytics are history. The future is “explanatory analytics.” An outfit called Quantifind has figured out that explaining is better than predicting. My hunch is that explaining is a little less risky. Saying that the Donald would lose is tough to explain when the Donald allegedly “won.”
Explaining is looser. The black-white, one-two, or yes-no thing is a bit less gelatinous.
So what’s the explainer explaining? The checklist is interesting:
- Alert me when it matters. The idea is that a system or smart software will proactively note when something important happens and send one of those mobile phone icon things to get a human to shift attention to the new thing. Nothing like distraction, I say.
- Explore why on one’s own. Yep, this works really well for spelunkers who find themselves trapped. Exploration is okay, but it is helpful to [a] know where one is, [b] know where one is going, and [c] know the territory. Caves can be killers, not just dark and damp.
- Quantify impact in “real” dollars. The notion of quantifying strikes me as important. But doesn’t one quantify to determine whether the prediction was on the money? I sniff a bit of flaming contradiction. The notion of knowing something in real time is good too. Now the problem becomes, “What’s real time?” I have tilled this field before: saying “real time” is different from delivering what one expects, what the system can do, and what the outfit can afford.
It’s not even 2017, and I have learned that “prediction” is dead. I hope someone tells the folks at Recorded Future and Palantir Technologies. Will they listen?
Buzzwording with cacophony is definitely alive and kicking.
Stephen E Arnold, January 3, 2017
December 29, 2016
A British charity is teaming up with an online intelligence startup specializing in Bitcoin. The Register reports on this in its piece, “Bitcoin child abuse image pervs will be hunted down by the IWF.” The Internet Watch Foundation (IWF), with the help of UK blockchain forensics startup Elliptic, aims to identify individuals who use Bitcoin to purchase child abuse images online. The IWF will provide Elliptic with a database of Bitcoin addresses, and Elliptic takes care of the rest. We learned:
The IWF has identified more than 68,000 URLs containing child sexual abuse images. UNICEF Malaysia estimates two million children across the globe are affected by sexual exploitation every year. Susie Hargreaves, IWF CEO, said, “Over the past few years, we have seen an increasing amount of Bitcoin activity connected to purchasing child sexual abuse material online. Our new partnership with Elliptic is imperative to helping us tackle this criminal use of Bitcoin.” The collaboration means Elliptic’s clients will be able to automatically monitor transactions they handle for any connection to proceeds of child sex abuse.
Elliptic uses machine learning and data analytics technologies to collect actionable evidence for law enforcement and intelligence agencies. The interesting aspect of this technology, and others like it, is that it may run as surreptitiously in the background as those who use the Dark Web and Bitcoin for criminal activity believe they do.
Megan Feil, December 29, 2016
December 25, 2016
I read “Don’t Blame Big Data for Pollsters’ Failings.” The news about the polls predicting a victory for Hillary Clinton reached me in Harrod’s Creek five days after the election. Hey, Beyond Search is in rural Kentucky. It looks from the news reports and the New York Times’s odd letter about doing “real” journalism that the pundits predicted that the mare would win the US derby.
The write up explains that Big Data did not fail. The reason? The pollsters were not using Big Data. The sample sizes were about 1,000 people. Check your statistics book. In the back will be sample sizes for populations. If you have an older statistics book, you have to work out the number with a formula.
Big Data doesn’t fool around with formulas. Big Data just uses “big data.” Is the idea that the bigger the data, the better the output?
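The specific formula from the statistics book did not survive in this post, so what follows is not a reconstruction of it. It is a minimal sketch of one common textbook approach, Cochran’s formula for estimating a proportion, assuming simple random sampling, 95 percent confidence (z = 1.96), and worst-case p = 0.5. It shows why pollsters land near 1,000 respondents:

```python
import math
from typing import Optional


def sample_size(margin: float, p: float = 0.5, z: float = 1.96,
                population: Optional[int] = None) -> int:
    """Cochran's sample-size formula for estimating a proportion.

    n0 = z^2 * p * (1 - p) / margin^2
    If a population size N is given, apply the finite population
    correction n = n0 / (1 + (n0 - 1) / N). Round up to whole people.
    """
    n0 = z ** 2 * p * (1 - p) / margin ** 2
    if population is not None:
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)


print(sample_size(0.03))                          # 1068 for +/- 3 points
print(sample_size(0.03, population=200_000_000))  # 1068; correction is negligible
```

For a population in the hundreds of millions, the correction barely moves the number, which is why about 1,000 people can stand in for an electorate, provided the sample is actually random.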
The write up states that the problem was the sample itself: The actual humans.
The write up quotes a mid tier consultant from an outfit called Ovum which reminds me of eggs. I circled this statement:
“When you have data sets that are large enough, you can find signals for just about anything,” says Tony Baer, a big data analyst at Ovum. “So this places a premium on identifying the right data sets and asking the right questions, and relentlessly testing out your hypothesis with test cases extending to more or different data sets.”
The write up tosses in social media. Facebook takes the position that its information had minimal effect on the election. Nifty assertion that.
The solution is, as I understand the write up, to use a more real time system, different types of data, and math. The conclusion is:
With significant economic consequences attached to political outcomes, it is clear that those companies with sufficient depth of real-time behavioral data will likely increase in value.
My view is that hope and other distinctly human behaviors certainly threw an egg at reality. It is great to know that there is a fix and that Big Data emerges as the path forward. More work ahead for the consultants who often determine sample sizes by looking at Web sites like SurveySystem and get their samples from lists of contributors, a 20-something’s mobile phone contact list, or lists available from friends.
If you use Big Data, tap into real time streams of information, and do the social media mining, you will be able to predict the future. Sound logical? Now, about that next Kentucky Derby winner? Happy or unhappy holiday?
Stephen E Arnold, December 25, 2016
December 19, 2016
Over at Hacker Noon, blogger “movrcx” reveals a potential vulnerability chain that he says threatens the entire Tor Browser ecosystem in, “Tor Browser Exposed: Anti-Privacy Implantation at Mass Scale.” Movrcx says the potential avenue for a massive hack has existed for some time, but taking advantage of these vulnerabilities would require around $100,000. This could explain why movrcx’s predicted attack seems not to have taken place. Yet. The write-up summarizes the technique:
Anti-Privacy Implantation at Mass Scale: At a high-level the attack path can be described by the following:
*Attacker gains custody of an addons.mozilla.org TLS certificate (wildcard preferred)
*Attacker begins deployment of malicious exit nodes
*Attacker intercepts the NoScript extension update traffic for addons.mozilla.org
*Attacker returns a malicious update metadata file for NoScript to the requesting Tor Browser
*The malicious extension payload is downloaded and then silently installed without user interaction
*At this point remote code execution is gained
*The attacker may use an additional stage to further implant additional software on the machine or to cover any signs of exploitation
This attack can be demonstrated by using Burp Suite and a custom compiled version of the Tor Browser which includes a hardcoded root certificate authority for transparent man-in-the-middle attacks.
See the article for movrcx’s evidence, reasoning, and technical details. He emphasizes that he is revealing this information in the hope that measures will be taken to nullify the potential attack chain. Preferably before some state or criminal group decides to invest in leveraging it.
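As the steps describe it, the chain works because the browser accepts whatever update metadata and payload arrive over a TLS connection, so whoever holds a valid certificate for addons.mozilla.org controls the update. One generic way to break that link, offered here as a hypothetical illustration and not as what Mozilla, the Tor Project, or movrcx proposed, is to accept a downloaded payload only if its hash matches a value obtained through a separate channel. A minimal sketch with Python’s standard library; the payload bytes and the pinning arrangement are invented for the example:

```python
import hashlib
import hmac


def verify_payload(payload: bytes, pinned_sha256_hex: str) -> bool:
    """Return True only if the payload's SHA-256 digest matches a pinned value.

    Hypothetical design: the pinned digest must come from a channel the
    man-in-the-middle cannot forge, not from the intercepted metadata file.
    """
    digest = hashlib.sha256(payload).hexdigest()
    # compare_digest avoids leaking timing information during comparison
    return hmac.compare_digest(digest, pinned_sha256_hex.lower())


genuine = b"noscript-update-bytes"  # stand-in for a real extension package
pinned = hashlib.sha256(genuine).hexdigest()

print(verify_payload(genuine, pinned))                   # True
print(verify_payload(b"tampered-update-bytes", pinned))  # False
```

The check only helps if the pinned digest travels separately; if it rides along in the same metadata file the attacker returns, the attacker simply supplies a matching hash.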
Cynthia Murrell, December 19, 2016
December 14, 2016
The article on ScienceDaily titled “New Study Highlights Power of Crowd to Transmit News on Twitter” shows that Twitter is, in fact, good at something. That something is driving recommendations of news stories. A study conducted by Columbia University and the French National Institute found that most clicks on news stories come from reader referrals rather than from the outlets’ own promotion. The article details the findings:
Though far more readers viewed the links news outlets promoted directly on Twitter… most of what readers shared and read was crowd-curated. Eighty-two percent of shares, and 61 percent of clicks, of the tweets in the study sample referred to content readers found on their own. But the crowd’s relative influence varied by outlet; 85 percent of clicks on tweets tied to a BBC story came from reader recommendations while only 10 percent of tweets tied to a Fox story did.
It will come as no shock that people are getting a lot more of their news through social media, but the study also suggests that people are often sharing stories without reading them at all. Indeed, one of the scientists stated that the correlation between likes, shares, and actual reads is very low. The problem inherent in this system is that readers will inevitably only look at content that they already agree with in a news loop that results in an even less informed public with even more information at their fingertips than ever before. Thanks Twitter.
Chelsea Kerwin, December 14, 2016
December 12, 2016
I read “5 Unexpected Sources of Bias in Artificial Intelligence.” Was I surprised? Yep, but the five examples seemed a bit more pop psychology than substantive. In my view, the bias in smart software originates with the flaws or weaknesses in the common algorithms used to build artificially intelligent systems. I have a lecture about the ways in which a content creator can fiddle with algorithms to generate specific results. I call the lecture “Weaponizing Information: Using Words to Fiddle with Algorithms.” (Want to know more? Write benkent2020 at yahoo dot com. Be aware that this is a for fee presentation.)
This “5 Unexpected…” write up offers these ideas:
- Data driven bias. The notion is that Stats 101 injunctions are happily ignored, forgotten, or just worked around. See what I mean? Human intent, not really mathy at its core.
- Bias through interaction. The idea is that humans interact. If the humans are biased, guess what? The outputs are biased, which dominoes down the line. Key word: Human.
- Emergent bias. This is the filter bubble. I view this as feedback looping, which is a shortcut to figuring out stuff. I ran across this idea years ago in Minneapolis. A startup there was explaining how to let me do one thing to inform the somewhat dull system about what to present. Does this sound like Amazon’s method to you?
- Similarity bias. Now we are getting close to a mathy notion. But the write up wanders back to the feedback notion and does not ask questions about the wonkiness of clustering. Sigh.
- Conflicting goals bias. Now that puzzled me. I read the paragraphs in the original article and highlighted stereotyping. This struck me as a variant of feedback.
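The feedback looping behind emergent bias fits in a few lines. A toy sketch, with made-up story names and click counts and the deliberately crude assumption that users click only what the system shows them:

```python
from typing import Dict


def simulate_feedback(clicks: Dict[str, int], rounds: int) -> Dict[str, int]:
    """Toy feedback loop: each round the system shows only the current
    click leader, and the shown item receives the next click."""
    clicks = dict(clicks)  # copy so the caller's dict is not mutated
    for _ in range(rounds):
        shown = max(clicks, key=clicks.get)  # rank purely by past clicks
        clicks[shown] += 1                   # users click what they are shown
    return clicks


start = {"story_a": 11, "story_b": 10, "story_c": 10}
print(simulate_feedback(start, 100))
# {'story_a': 111, 'story_b': 10, 'story_c': 10}
```

A one-click lead becomes a hundred-click lead with no change in the stories themselves. Real recommenders add randomness and exploration, but the same pull can show up in them.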
Math is sort of objective, but this write up sticks to some broad and somewhat repetitive ideas. The bias enters when thresholds are set, data are selected, processes structured to deliver what the programmer [a] desires, [b] ze’s boss desires, [c] what can be made to run and sort of work in the time available, or [d] what the developer remembers from a university class, a Hacker News post, or a bit of open source goodness.
The key to bias is to keep the key word “human” in mind.
Stephen E Arnold, December 12, 2016
December 9, 2016
Digital Reasoning has released the latest iteration of its Synthesys platform, we learn from Datanami’s piece, “Cognitive Platform Sharpens Focus on Unstructured Data.” Readers may recall that Digital Reasoning provides tools to the controversial US Army intelligence system known as DCGS. The write-up specifies:
Version 4 of the Digital Reasoning platform released on Tuesday (June 21) is based on proprietary analytics tools that apply deep learning neural network techniques across text, audio and images. Synthesys 4 also incorporates behavioral analytics based on anomaly detection techniques.
The upgrade also reflects the company’s push into user and ‘entity’ behavior analytics, a technique used to leverage machine learning in security applications such as tracking suspicious activity on enterprise networks and detecting ransomware attacks. ‘We are especially excited to expand into the area of entity behavior analytics, combining the analysis of structured and unstructured data into a person-centric, prioritized profile that can be used to predict employees at risk for insider threats,’ Bill DiPietro, Digital Reasoning’s vice president of product management noted in a statement.
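Synthesys’s anomaly detection is proprietary, and nothing in the write-up says how it works. For readers unfamiliar with the general idea, here is a deliberately simple stand-in (a z-score test, not Digital Reasoning’s method) that flags values far from the mean; the daily login counts are invented:

```python
import statistics
from typing import List


def zscore_anomalies(values: List[float], threshold: float = 3.0) -> List[int]:
    """Return indices of values whose |z-score| exceeds the threshold.

    Uses the population standard deviation. Returns [] when all values
    are identical, since every z-score would be undefined.
    """
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mean) / stdev > threshold]


daily_logins = [20, 22, 19, 21, 23, 20, 18, 22, 21, 95]
print(zscore_anomalies(daily_logins, threshold=2.5))  # [9]
print(zscore_anomalies(daily_logins))                 # [] at the default 3.0
```

The threshold matters more than it looks. With only ten points and the population standard deviation, no z-score can exceed 3 (the bound is the square root of n minus 1), so the default setting misses the obvious spike. Someone has to pick that number.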
The platform has added Spanish and Chinese to its supported languages, which come with syntactic parsing. There is also now support for Elasticsearch, included in the pursuit of leveraging unstructured data in real time. The company emphasizes the software’s ability to learn from context, as well as enhanced tools for working with reports.
Digital Reasoning was founded in 2000, and makes its primary home in Nashville, Tennessee, with offices in Washington, DC, and London. The booming company is also hiring, especially in the Nashville area.
Cynthia Murrell, December 9, 2016
December 8, 2016
Machine learning tools like IBM’s Watson artificial intelligence can and will improve healthcare access and diagnosis, but the problem is getting on the road to improvement. Implementing new technology is costly, including the equipment itself and staff training, and there is always the chance it could create more problems than it resolves. However, if the new technology makes a job easier and resolves problems, then you are on the path to improvement. The UK is heading that way, says TechCrunch in “DeepMind Health Inks New Deal With UK’s NHS To Deploy Streams App In Early 2017.”
London’s NHS Royal Free Hospital will employ DeepMind Health in 2017, taking advantage of its data sharing capabilities. Google owns DeepMind Health, which focuses on driving the application of machine learning algorithms in preventative medicine. The NHS and DeepMind Health had a prior agreement, but when New Scientist made a freedom of information request, their use of patients’ personal information came into question. The information was used to power the Streams app to send alerts to acute kidney injury patients. However, the ICO and MHRA shut down Streams when it was discovered that it had never been registered as a medical device.
The eventual goal is to relaunch Streams, which is part of the deal, but DeepMind has to repair its reputation. DeepMind is already on the mend with the new deal, and registering Streams as a medical device also helped. In order for healthcare apps to function properly, they need to be tested:
The point is, healthcare-related AI needs very high-quality data sets to nurture the kind of smarts DeepMind is hoping to be able to build. And the publicly funded NHS has both a wealth of such data and a pressing need to reduce costs — incentivizing it to accept the offer of “free” development work and wide-ranging partnerships with DeepMind…
Streams is the first step towards a healthcare system powered by digital healthcare products. As already seen, the stumbling block is protecting personal information while still powering the apps so they can work. Where should the fine line between the two be drawn?
Whitney Grace, December 8, 2016