January 3, 2017
I read a chunk of what looks to me like content marketing called “The Death of Prediction.” Prediction seems like a soft target. There were the polls which made clear that Donald J. Trump was a loser. Well, how did that work out? For some technology titans, the predictions omitted a grim pilgrimage to Trump Tower to meet the deal maker in person. Happy faces? Not so many, judging from the snaps of the Sillycon Valley crowd and one sycophant from Armonk.
The write up points out that predictive analytics are history. The future is “explanatory analytics.” An outfit called Quantifind has figured out that explaining is better than predicting. My hunch is that explaining is little less risky. Saying that the Donald would lose is tough to explain when the Donald allegedly “won.”
Explaining is like looser. The black-white, one-two, or yes-no thing is a bit less gelatinous.
So what’s the explainer explaining? The checklist is interesting:
- Alert me when it matters. The idea is that a system or smart software will proactively note when something important happens and send one of those mobile phone icon things to get a human to shift attention to the new thing. Nothing like distraction I say.
- Explore why on one’s own. Yep, this works really well for spelunkers who find themselves trapped. Exploration is okay, but it is helpful to [a] know where one is, [b] know where one is going, and [c] know the territory. Caves can be killers, not just dark and damp.
- Quantify impact in “real” dollars. The notion of quantifying strikes me as important. But doesn’t one quantify to determine if the prediction were on the money. I sniff a bit of flaming contradiction. The notion of knowing something in real time is good too. Now the problem becomes, “What’s real time?” I have tilled this field before and saying “real time” is different from delivering what one expects and what the system can do and what the outfit can afford.
It’s not even 2017, and I have learned that “prediction” is dead. I hope someone tells the folks at Recorded Future and Palantir Technologies. Will they listen?
Buzzwording with cacaphones is definitely alive and kicking.
Stephen E Arnold, January 3, 2017
December 29, 2016
A British charity is teaming up with an online intelligence startup specializing in Bitcoin. The Register reports on this in their piece called, Bitcoin child abuse image pervs will be hunted down by the IWF. The Internet Watch Foundation, with the help of a UK blockchain forensics start-up, Elliptic, aims to identify individuals who use Bitcoin to purchase child abuse images online. The IWF will provide Elliptic with a database of Bitcoin addresses and Elliptic takes care of the rest. We learned,
The IWF has identified more than 68,000 URLs containing child sexual abuse images. UNICEF Malaysia estimates two million children across the globe are affected by sexual exploitation every year. Susie Hargreaves, IWF CEO, said, “Over the past few years, we have seen an increasing amount of Bitcoin activity connected to purchasing child sexual abuse material online. Our new partnership with Elliptic is imperative to helping us tackle this criminal use of Bitcoin.” The collaboration means Elliptic’s clients will be able to automatically monitor transactions they handle for any connection to proceeds of child sex abuse.
Machine learning and data analytics technologies are used by Elliptic to collect actionable evidence for law enforcement and intelligence agencies. The interesting piece of this technology, and others like it, is that it runs perhaps as surreptitiously in the background as those who use the Dark Web and Bitcoin for criminal activity believe they do.
Megan Feil, December 29, 2016
December 25, 2016
I read “Don’t Blame Big Data for Pollsters’ Failings.” The news about the polls predicting a victory for Hillary Clinton reached me in Harrod’s Creek five days after the election. Hey, Beyond Search is in rural Kentucky. It looks from the news reports and the New York Times’s odd letter about doing “real” journalism that the pundits predicted that the mare would win the US derby.
The write up explains that Big Data did not fail. The reason? The pollsters were not using Big Data. The sample sizes were about 1,000 people. Check your statistics book. In the back will be samples sizes for populations. If you have an older statistics book, you have to use the formula like
Big Data doesn’t fool around with formulas. Big Data just uses “big data.” Is the idea is that the bigger the data, the better the output?
The write up states that the problem was the sample itself: The actual humans.
The write up quotes a mid tier consultant from an outfit called Ovum which reminds me of eggs. I circled this statement:
“When you have data sets that are large enough, you can find signals for just about anything,” says Tony Baer, a big data analyst at Ovum. “So this places a premium on identifying the right data sets and asking the right questions, and relentlessly testing out your hypothesis with test cases extending to more or different data sets.”
The write up tosses in social media. Facebook takes the position that its information had minimal effect on the election. Nifty assertion that.
The solution is, as I understand the write up, to use a more real time system, different types of data, and math. The conclusion is:
With significant economic consequences attached to political outcomes, it is clear that those companies with sufficient depth of real-time behavioral data will likely increase in value.
My view is that hope and other distinctly human behaviors certainly threw an egg at reality. It is great to know that there is a fix and that Big Data emerge as the path forward. More work ahead for the consultants who often determine sample sizes by looking at Web sites like SurveySystem and get their sample from lists of contributors, a 20 something’s mobile phone contact list, or lists available from friends.
If you use Big Data, tap into real time streams of information, and do the social media mining—you will be able to predict the future. Sounds logical? Now about that next Kentucky Derby winner? Happy or unhappy holiday?
Stephen E Arnold, December 25, 2016
December 19, 2016
Over at Hacker Noon, blogger “movrcx” reveals a potential vulnerability chain that he says threatens the entire Tor Browser ecosystem in, “Tor Browser Exposed: Anti-Privacy Implantation at Mass Scale.” Movrcx says the potential avenue for a massive hack has existed for some time, but taking advantage of these vulnerabilities would require around $100,000. This could explain why movrcx’s predicted attack seems not to have taken place. Yet. The write-up summarizes the technique:
Anti-Privacy Implantation at Mass Scale: At a high-level the attack path can be described by the following:
*Attacker gains custody of an addons.mozilla.org TLS certificate (wildcard preferred)
*Attacker begins deployment of malicious exit nodes
*Attacker intercepts the NoScript extension update traffic for addons.mozilla.org
*Attacker returns a malicious update metadata file for NoScript to the requesting Tor Browser
*The malicious extension payload is downloaded and then silently installed without user interaction
*At this point remote code execution is gained
*The attacker may use an additional stage to further implant additional software on the machine or to cover any signs of exploitation
This attack can be demonstrated by using Burp Suite and a custom compiled version of the Tor Browser which includes a hardcoded root certificate authority for transparent man-in-the-middle attacks.
See the article for movrcx’s evidence, reasoning, and technical details. He emphasizes that he is revealing this information in the hope that measures will be taken to nullify the potential attack chain. Preferably before some state or criminal group decides to invest in leveraging it.
Cynthia Murrell, December 19, 2016
December 14, 2016
The article on ScienceDaily titled New Study Highlights Power of Crowd to Transmit News on Twitter shows that Twitter is, in fact, good at something. That something is driving recommendations of news stories. A study executed by Columbia University and the French National Institute found that the vast majority of clicks on news stories is based on reader referrals. The article details the findings:
Though far more readers viewed the links news outlets promoted directly on Twitter… most of what readers shared and read was crowd-curated. Eighty-two percent of shares, and 61 percent of clicks, of the tweets in the study sample referred to content readers found on their own. But the crowd’s relative influence varied by outlet; 85 percent of clicks on tweets tied to a BBC story came from reader recommendations while only 10 percent of tweets tied to a Fox story did.
It will come as no shock that people are getting a lot more of their news through social media, but the study also suggests that people are often sharing stories without reading them at all. Indeed, one of the scientists stated that the correlation between likes, shares, and actual reads is very low. The problem inherent in this system is that readers will inevitably only look at content that they already agree with in a news loop that results in an even less informed public with even more information at their fingertips than ever before. Thanks Twitter.
Chelsea Kerwin, December 14, 2016
December 12, 2016
I read “5 Unexpected Sources of Bias in Artificial Intelligence.” Was I surprised? Yep, but the five examples seemed a bit more pop psychology than substantive. In my view, the bias in smart software originates with the flaws or weaknesses in the common algorithms used to build artificially intelligent systems. I have a lecture about the ways in which a content creator can fiddle with algorithms to generate specific results. I call the lecture “Weaponizing Information: Using Words to Fiddle with Algorithms.” (Want to know more? Write benkent2020 at yahoo dot com. Be aware that this is a for fee presentation.)
This “5 Unexpected…” write up offers these ideas:
- Data driven bias. The notion is that Stats 101 injunctions are happily ignored, forgotten, or just worked around. See what I mean? Human intent, not really mathy at its core.
- Bias through interaction. The idea is that humans interact. If the humans are biased, guess what? The outputs are biased, which dominoes down the line. Key word: Human.
- Emergent bias. This is the filter bubble. I view this as feedback looping, which is a short cut to figuring out stuff. I ran across this idea years ago in Minneapolis. A start up there was explaining how to let me do one thing to inform the somewhat dull system about what to present. Does this sound like Amazon’s method to you?
- Similarity bias. Now we are getting close to a mathy notion. But the write up wanders back to the feedback notion and does not ask questions about the wonkiness of clustering. Sigh.
- Conflicting goals bias. Now that puzzled me. I read the paragraphs in the original article and highlighted stereotyping. This struck me as a variant of feedback.
Math is sort of objective, but this write up sticks to some broad and somewhat repetitive ideas. The bias enters when thresholds are set, data are selected, processes structured to deliver what the programmer [a] desires, [b] ze’s boss desires, [c] what can be made to run and sort of work in the time available, or [d] what the developer remembers from a university class, a Hacker News post, or a bit of open source goodness.
The key to bias is to keep the key word “human” in mind.
Stephen E Arnold, December 12, 2016
December 9, 2016
Digital Reasoning has released the latest iteration of its Synthesys platform, we learn from Datanami’s piece, “Cognitive Platform Sharpens Focus on Untructured Data.” Readers may recall that Digital Reasoning provides tools to the controversial US Army intelligence system known as DCGS. The write-up specifies:
Version 4 of the Digital Reasoning platform released on Tuesday (June 21) is based on proprietary analytics tools that apply deep learning neural network techniques across text, audio and images. Synthesys 4 also incorporates behavioral analytics based on anomaly detection techniques.
The upgrade also reflects the company’s push into user and ‘entity’ behavior analytics, a technique used to leverage machine learning in security applications such as tracking suspicious activity on enterprise networks and detecting ransomware attacks. ‘We are especially excited to expand into the area of entity behavior analytics, combining the analysis of structured and unstructured data into a person-centric, prioritized profile that can be used to predict employees at risk for insider threats,’ Bill DiPietro, Digital Reasoning’s vice president of product management noted in a statement.
The platform has added Spanish and Chinese to its supported languages, which come with syntactic parsing. There is also now support for Elasticsearch, included in the pursuit of leveraging unstructured data in real time. The company emphasizes the software’s ability to learn from context, as well as enhanced tools for working with reports.
Digital Reasoning was founded in 2000, and makes its primary home in Nashville, Tennessee, with offices in Washington, DC, and London. The booming company is also hiring, especially in the Nashville area.
Cynthia Murrell, December 9, 2016
December 8, 2016
Machine learning tools like the artificial intelligence Watson from IBM can and will improve healthcare access and diagnosis, but the problem is getting on the road to improvement. Implementing new technology is costly, including the actual equipment and training staff, and there is always the chance it could create more problems than resolving them. However, if the new technology makes a job easier and resolves situations then you are on the path to improvement. The UK is heading that way says TechCrunch in, “DeepMind Health Inks New Deal With UK’s NHS To Deploy Streams App In Early 2017.”
London’s NHS Royal Free Hospital will employ DeepMind Health in 2017, taking advantage of its data sharing capabilities. Google owns DeepMind Health and it focuses on driving the application of machine learning algorithms in preventative medicine. The NHS and DeepMind Health had a prior agreement in the past, but when the New Scientist made a freedom of information request their use of patients’ personal information came into question. The information was used to power the Streams app to sent alerts to acute kidney injury patients. However, ICO and MHRA shut down Streams when it was discovered it was never registered as a medical device.
The eventual goal is to relaunch Streams, which is part of the deal, but DeepMind has to repair its reputation. DeepMind is already on the mend with the new deal and registering Streams as a medical device also helped. In order for healthcare apps to function properly, they need to be tested:
The point is, healthcare-related AI needs very high-quality data sets to nurture the kind of smarts DeepMind is hoping to be able to build. And the publicly funded NHS has both a wealth of such data and a pressing need to reduce costs — incentivizing it to accept the offer of “free” development work and wide-ranging partnerships with DeepMind…
Streams is the first step towards a healthcare system powered by digital healthcare products. As already seen is the stumbling block protecting personal information and powering the apps so they can work. Where does the fine line between the two end?
Whitney Grace, December 8, 2016
December 8, 2016
Kaspersky Lab researchers exposed a massive global underground market selling more than 70,000 hacked servers from government entities, corporations and universities for as little as $6 each.
The cybersecurity firm said the newly discovered xDedic marketplace currently has a listing of 70,624 hacked Remote Desktop Protocol (RDP) servers for sale. It’s reported that many of the servers either host or provide access to consumer sites and services, while some have software installed for direct mail, financial accounting and POS processing, Kaspersky Lab confirmed.
Kapersky’s Costin Raiu notes the study is evidence that “cybercrime-as-a-service” is growing, and has been developing its own, well-organized infrastructure. He also observes that the victims of these criminals are not only the targets of attack, but the unwitting server-owners. xDedic, he says, represents a new type of cybercriminal marketplace.
Kapersky Lab recommends organizations take these precautions:
*Implement multi-layered approach to IT infrastructure security that includes a robust security solution
*Use of strong passwords in server authentication processes
*Establish an ongoing patch management process
*Perform regular security audits of IT infrastructures
*Invest in threat intelligence services”
Stay safe, dear readers.
Cynthia Murrell, December 8, 2016
November 30, 2016
The development team behind the Tor Project recently announced the release of Tor 0.2.9.5 that is almost bug-free, stable and secure.
Softpedia in a release titled New Tor “The Onion Router” Anonymity Network Stable Branch Getting Closer says:
Tor 0.2.9.5 Alpha comes three weeks after the release of the 0.2.9.4 Alpha build to add a large number of improvements and bug fixes that have been reported by users since then or discovered by the Tor Project’s hard working development team. Also, this release gets us closer to the new major update of The Onion Router anonymity network.
Numerous bugs and loopholes were being reported in Tor Network that facilitated backdoor entry to snooping parties on Tor users. With this release, it seems those security loopholes have been plugged.
The development team is also encouraging users to test the network further to make it completely bug-free:
If you want to help the Tor Project devs polish the final release of the Tor 0.2.9 series, you can download Tor 0.2.9.5 Alpha right now from our website and install it on your GNU/Linux distribution, or just fetch it from the repositories of the respective OS. Please try to keep in mind, though, that this is a pre-release version, not to be used in production environments.
Though it will always be a cat and mouse game between privacy advocates and those who want to know what goes on behind the veiled network, it would be interesting to see who will stay ahead of the race.