An Apologia for People. Big Data Are Just Peachy Keen

December 25, 2016

I read “Don’t Blame Big Data for Pollsters’ Failings.” The news about the polls predicting a victory for Hillary Clinton reached me in Harrod’s Creek five days after the election. Hey, Beyond Search is in rural Kentucky. Judging from the news reports and the New York Times’s odd letter about doing “real” journalism, the pundits predicted that the mare would win the US derby.

The write up explains that Big Data did not fail. The reason? The pollsters were not using Big Data. The sample sizes were about 1,000 people. Check your statistics book. In the back will be sample sizes for populations. If you have an older statistics book, you have to use a formula like

[image: sample size formula]
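
The formula in question is presumably the standard sample size calculation for estimating a proportion, n = z^2 * p(1 - p) / e^2. Here is a minimal sketch of the arithmetic in Python (the confidence level and margins of error are assumptions chosen for illustration):

import math

def sample_size(z: float = 1.96, p: float = 0.5, e: float = 0.05) -> int:
    """Classic sample size for a proportion: n = z^2 * p * (1 - p) / e^2, rounded up."""
    return math.ceil((z ** 2) * p * (1 - p) / (e ** 2))

# 95 percent confidence (z = 1.96), worst-case p = 0.5
print(sample_size(e=0.05))  # 385 respondents for a 5 percent margin of error
print(sample_size(e=0.03))  # 1068 respondents for a 3 percent margin of error

A three percent margin of error lands at roughly the 1,000-person samples the pollsters used; no Big Data required.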

Big Data doesn’t fool around with formulas. Big Data just uses “big data.” Is the idea that the bigger the data, the better the output?

The write up states that the problem was the sample itself: The actual humans.

The write up quotes a mid-tier consultant from an outfit called Ovum, which reminds me of eggs. I circled this statement:

“When you have data sets that are large enough, you can find signals for just about anything,” says Tony Baer, a big data analyst at Ovum. “So this places a premium on identifying the right data sets and asking the right questions, and relentlessly testing out your hypothesis with test cases extending to more or different data sets.”

The write up tosses in social media. Facebook takes the position that its information had minimal effect on the election. Nifty assertion that.

The solution is, as I understand the write up, to use a more real-time system, different types of data, and math. The conclusion is:

With significant economic consequences attached to political outcomes, it is clear that those companies with sufficient depth of real-time behavioral data will likely increase in value.

My view is that hope and other distinctly human behaviors certainly threw an egg at reality. It is great to know that there is a fix and that Big Data emerge as the path forward. More work ahead for the consultants who often determine sample sizes by looking at Web sites like SurveySystem and get their sample from lists of contributors, a 20 something’s mobile phone contact list, or lists available from friends.

If you use Big Data, tap into real-time streams of information, and do the social media mining, you will be able to predict the future. Sound logical? Now about that next Kentucky Derby winner? Happy or unhappy holiday?

Stephen E Arnold, December 25, 2016

Potential Tor Browser Vulnerability Reported

December 19, 2016

Over at Hacker Noon, blogger “movrcx” reveals a potential vulnerability chain that he says threatens the entire Tor Browser ecosystem in, “Tor Browser Exposed: Anti-Privacy Implantation at Mass Scale.” Movrcx says the potential avenue for a massive hack has existed for some time, but taking advantage of these vulnerabilities would require around $100,000. This could explain why movrcx’s predicted attack seems not to have taken place. Yet. The write-up summarizes the technique:

Anti-Privacy Implantation at Mass Scale: At a high-level the attack path can be described by the following:

*Attacker gains custody of an addons.mozilla.org TLS certificate (wildcard preferred)

*Attacker begins deployment of malicious exit nodes

*Attacker intercepts the NoScript extension update traffic for addons.mozilla.org

*Attacker returns a malicious update metadata file for NoScript to the requesting Tor Browser

*The malicious extension payload is downloaded and then silently installed without user interaction

*At this point remote code execution is gained

*The attacker may use an additional stage to further implant additional software on the machine or to cover any signs of exploitation

This attack can be demonstrated by using Burp Suite and a custom compiled version of the Tor Browser which includes a hardcoded root certificate authority for transparent man-in-the-middle attacks.

See the article for movrcx’s evidence, reasoning, and technical details. He emphasizes that he is revealing this information in the hope that measures will be taken to nullify the potential attack chain. Preferably before some state or criminal group decides to invest in leveraging it.
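
The weak link in the chain above is trust in the certificate presented for addons.mozilla.org, so one generic countermeasure is certificate pinning. Here is a minimal sketch in Python, assuming a defender has recorded a known-good SHA-256 fingerprint of the leaf certificate; the hostname comes from the write up, the pinned digest is a placeholder, and this illustrates the general idea rather than Tor Browser’s or Mozilla’s actual update mechanism:

import hashlib
import socket
import ssl

# Illustrative values: the pinned digest is a placeholder, not a real fingerprint.
HOST = "addons.mozilla.org"
PORT = 443
PINNED_SHA256 = "0000000000000000000000000000000000000000000000000000000000000000"

def certificate_matches_pin(host: str, port: int, pinned_digest: str) -> bool:
    """Fetch the server's leaf certificate, hash it, and compare it to the pin."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=10) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            der_cert = tls.getpeercert(binary_form=True)
    return hashlib.sha256(der_cert).hexdigest() == pinned_digest

if __name__ == "__main__":
    if certificate_matches_pin(HOST, PORT, PINNED_SHA256):
        print("Certificate matches the pin; the update channel looks intact.")
    else:
        print("Certificate does not match the pin; a man-in-the-middle may be in play.")

A mismatch does not prove an attack, but it is the kind of signal that would surface the interception step in the chain movrcx describes.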

Cynthia Murrell, December 19, 2016

Twitter Fingers May Be Sharing News Story Links but That Does Not Mean Anyone Read the Article

December 14, 2016

The article on ScienceDaily titled New Study Highlights Power of Crowd to Transmit News on Twitter shows that Twitter is, in fact, good at something. That something is driving recommendations of news stories. A study conducted by Columbia University and the French National Institute found that the vast majority of clicks on news stories are based on reader referrals. The article details the findings:

Though far more readers viewed the links news outlets promoted directly on Twitter… most of what readers shared and read was crowd-curated. Eighty-two percent of shares, and 61 percent of clicks, of the tweets in the study sample referred to content readers found on their own. But the crowd’s relative influence varied by outlet; 85 percent of clicks on tweets tied to a BBC story came from reader recommendations while only 10 percent of tweets tied to a Fox story did.

It will come as no shock that people are getting a lot more of their news through social media, but the study also suggests that people are often sharing stories without reading them at all. Indeed, one of the scientists stated that the correlation between likes, shares, and actual reads is very low. The problem inherent in this system is that readers will inevitably look only at content they already agree with, a news loop that results in an even less informed public with even more information at their fingertips than ever before. Thanks, Twitter.

Chelsea Kerwin, December 14, 2016

Smart Software and Bias: Math Is Not Objective, Right?

December 12, 2016

I read “5 Unexpected Sources of Bias in Artificial Intelligence.” Was I surprised? Yep, but the five examples seemed a bit more pop psychology than substantive. In my view, the bias in smart software originates with the flaws or weaknesses in the common algorithms used to build artificially intelligent systems. I have a lecture about the ways in which a content creator can fiddle with algorithms to generate specific results. I call the lecture “Weaponizing Information: Using Words to Fiddle with Algorithms.” (Want to know more? Write benkent2020 at yahoo dot com. Be aware that this is a for fee presentation.)

This “5 Unexpected…” write up offers these ideas:

  • Data driven bias. The notion is that Stats 101 injunctions are happily ignored, forgotten, or just worked around. See what I mean? Human intent, not really mathy at its core.
  • Bias through interaction. The idea is that humans interact. If the humans are biased, guess what? The outputs are biased, which dominoes down the line. Key word: Human.
  • Emergent bias. This is the filter bubble. I view this as feedback looping, which is a short cut to figuring out stuff. I ran across the idea years ago in Minneapolis, where a start up explained how letting me do one thing would inform its somewhat dull system about what to present next. Does this sound like Amazon’s method to you?
  • Similarity bias. Now we are getting close to a mathy notion. But the write up wanders back to the feedback notion and does not ask questions about the wonkiness of clustering. Sigh.
  • Conflicting goals bias. Now that puzzled me. I read the paragraphs in the original article and highlighted stereotyping. This struck me as a variant of feedback.

Math is sort of objective, but this write up sticks to some broad and somewhat repetitive ideas. The bias enters when thresholds are set, data are selected, and processes are structured to deliver [a] what the programmer desires, [b] what the programmer’s boss desires, [c] what can be made to run and sort of work in the time available, or [d] what the developer remembers from a university class, a Hacker News post, or a bit of open source goodness.
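
A tiny, hypothetical example of the threshold point: the relevance scores below are invented, but nudging the cut-off changes which items the “objective” system returns, and the cut-off is a human choice.

# Hypothetical relevance scores from some model (values invented for illustration)
scores = {"doc_a": 0.81, "doc_b": 0.79, "doc_c": 0.52, "doc_d": 0.48}

def select(scores: dict, threshold: float) -> list:
    """Return the items that clear the threshold a human picked."""
    return [name for name, score in scores.items() if score >= threshold]

print(select(scores, 0.80))  # ['doc_a']
print(select(scores, 0.50))  # ['doc_a', 'doc_b', 'doc_c']

Same data, same math, different “truth” depending on where a person drew the line.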

The key to bias is to keep the key word “human” in mind.

Stephen E Arnold, December 12, 2016

Digital Reasoning Releases Synthesys Version 4

December 9, 2016

Digital Reasoning has released the latest iteration of its Synthesys platform, we learn from Datanami’s piece, “Cognitive Platform Sharpens Focus on Unstructured Data.” Readers may recall that Digital Reasoning provides tools to the controversial US Army intelligence system known as DCGS. The write-up specifies:

Version 4 of the Digital Reasoning platform released on Tuesday (June 21) is based on proprietary analytics tools that apply deep learning neural network techniques across text, audio and images. Synthesys 4 also incorporates behavioral analytics based on anomaly detection techniques.

The upgrade also reflects the company’s push into user and ‘entity’ behavior analytics, a technique used to leverage machine learning in security applications such as tracking suspicious activity on enterprise networks and detecting ransomware attacks. ‘We are especially excited to expand into the area of entity behavior analytics, combining the analysis of structured and unstructured data into a person-centric, prioritized profile that can be used to predict employees at risk for insider threats,’ Bill DiPietro, Digital Reasoning’s vice president of product management noted in a statement.

The platform has added Spanish and Chinese to its supported languages, which come with syntactic parsing. There is also now support for Elasticsearch, included in the pursuit of leveraging unstructured data in real time. The company emphasizes the software’s ability to learn from context, as well as enhanced tools for working with reports.
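
The write up does not spell out Digital Reasoning’s algorithms, but entity behavior analytics of the kind described typically comes down to flagging activity that deviates from an entity’s own baseline. A generic, minimal sketch of that idea in Python; the counts and the three-sigma rule are assumptions for illustration, not the vendor’s method:

from statistics import mean, stdev

# Hypothetical daily document-access counts for one employee
daily_access_counts = [12, 9, 15, 11, 10, 13, 14, 95]

def flag_anomalies(counts: list, sigmas: float = 3.0) -> list:
    """Flag counts that sit more than `sigmas` standard deviations from the baseline mean."""
    baseline = counts[:-1]  # treat the earlier days as the behavioral baseline
    mu, sd = mean(baseline), stdev(baseline)
    return [c for c in counts if abs(c - mu) > sigmas * sd]

print(flag_anomalies(daily_access_counts))  # [95] stands out against the baseline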

Digital Reasoning was founded in 2000, and makes its primary home in Nashville, Tennessee, with offices in Washington, DC, and London. The booming company is also hiring, especially in the Nashville area.

Cynthia Murrell, December 9, 2016

The Data Sharing of Healthcare

December 8, 2016

Machine learning tools like IBM’s Watson artificial intelligence can and will improve healthcare access and diagnosis, but the problem is getting on the road to improvement. Implementing new technology is costly, between the actual equipment and staff training, and there is always the chance it could create more problems than it resolves. However, if the new technology makes a job easier and resolves real problems, then you are on the path to improvement. The UK is heading that way, says TechCrunch in “DeepMind Health Inks New Deal With UK’s NHS To Deploy Streams App In Early 2017.”

London’s NHS Royal Free Hospital will employ DeepMind Health in 2017, taking advantage of its data sharing capabilities. Google owns DeepMind Health, which focuses on applying machine learning algorithms to preventative medicine. The NHS and DeepMind Health had a prior agreement, but when the New Scientist made a freedom of information request, the use of patients’ personal information came into question. The information was used to power the Streams app to send alerts to acute kidney injury patients. However, the ICO and MHRA shut down Streams when it was discovered the app had never been registered as a medical device.

The eventual goal is to relaunch Streams, which is part of the deal, but DeepMind has to repair its reputation. DeepMind is already on the mend with the new deal, and registering Streams as a medical device also helped. In order for healthcare apps to function properly, they need to be tested:

The point is, healthcare-related AI needs very high-quality data sets to nurture the kind of smarts DeepMind is hoping to be able to build. And the publicly funded NHS has both a wealth of such data and a pressing need to reduce costs — incentivizing it to accept the offer of “free” development work and wide-ranging partnerships with DeepMind…

Streams is the first step towards a healthcare system powered by digital healthcare products. As already seen, the stumbling block is protecting personal information while still powering the apps so they can work. Where does the fine line between the two fall?

Whitney Grace, December 8, 2016

Increasingly Sophisticated Cybercrime

December 8, 2016

What a deal! Pymnts.com tells us that “Hacked Servers Sell for $6 On The Dark Web.” Citing recent research from Kaspersky Lab, the write-up explains:

Kaspersky Lab researchers exposed a massive global underground market selling more than 70,000 hacked servers from government entities, corporations and universities for as little as $6 each.

The cybersecurity firm said the newly discovered xDedic marketplace currently has a listing of 70,624 hacked Remote Desktop Protocol (RDP) servers for sale. It’s reported that many of the servers either host or provide access to consumer sites and services, while some have software installed for direct mail, financial accounting and POS processing, Kaspersky Lab confirmed.

Kaspersky’s Costin Raiu notes the study is evidence that “cybercrime-as-a-service” is growing and has been developing its own, well-organized infrastructure. He also observes that the victims of these criminals are not only the targets of attack, but also the unwitting server owners. xDedic, he says, represents a new type of cybercriminal marketplace.

Kaspersky Lab recommends organizations take these precautions:

*Implement multi-layered approach to IT infrastructure security that includes a robust security solution

*Use of strong passwords in server authentication processes

*Establish an ongoing patch management process

*Perform regular security audits of IT infrastructures

*Invest in threat intelligence services

Stay safe, dear readers.

Cynthia Murrell, December 8, 2016

Bug-Free, Efficient Tor Network Inching Towards Completion

November 30, 2016

The development team behind the Tor Project recently announced the release of Tor 0.2.9.5 Alpha, a build that is almost bug-free, stable, and secure.

Softpedia, in an article titled New Tor “The Onion Router” Anonymity Network Stable Branch Getting Closer, says:

Tor 0.2.9.5 Alpha comes three weeks after the release of the 0.2.9.4 Alpha build to add a large number of improvements and bug fixes that have been reported by users since then or discovered by the Tor Project’s hard working development team. Also, this release gets us closer to the new major update of The Onion Router anonymity network.

Numerous bugs and loopholes had been reported in the Tor network that gave snooping parties backdoor access to Tor users. With this release, it seems those security loopholes have been plugged.

The development team is also encouraging users to test the network further to make it completely bug-free:

If you want to help the Tor Project devs polish the final release of the Tor 0.2.9 series, you can download Tor 0.2.9.5 Alpha right now from our website and install it on your GNU/Linux distribution, or just fetch it from the repositories of the respective OS. Please try to keep in mind, though, that this is a pre-release version, not to be used in production environments.

Though it will always be a cat and mouse game between privacy advocates and those who want to know what goes on behind the veiled network, it will be interesting to see who stays ahead in the race.

Vishal Ingole, November 30, 2016
Sponsored by ArnoldIT.com, publisher of the CyberOSINT monograph

Examples of Visualizations

November 20, 2016

If you want a quick look at which visualizations to use for which use cases, you may find “An Overview of Text Mining Visualizations Possibilities with R on the CETA Trade Agreement” worth a look. The article focuses on trade agreement data, but the graphics provide a darned good refresher about visualization options. One caveat: some of the links in the write up do not work. Nevertheless, we found the illustrations and commentary helpful.
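
The linked article works in R; for readers who live in Python, a rough equivalent of its simplest chart, a term frequency bar plot, might look like the following (the toy corpus is invented for illustration):

from collections import Counter

import matplotlib.pyplot as plt

# Invented stand-in for a mined document collection
documents = [
    "tariff reductions apply to industrial goods",
    "industrial goods and agricultural goods face different tariff schedules",
    "dispute resolution covers tariff schedules",
]

# Tokenize crudely and count terms across the collection
terms = Counter(word for doc in documents for word in doc.split())
labels, counts = zip(*terms.most_common(8))

plt.bar(labels, counts)
plt.title("Most frequent terms in the sample corpus")
plt.xticks(rotation=45, ha="right")
plt.tight_layout()
plt.show()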

Stephen E Arnold, November 20, 2016

Palantir Technologies: Less War with Gotham?

November 9, 2016

I read “Peter Thiel Explains Why His Company’s Defense Contracts Could Lead to Less War.” I noted that the write up appeared in the Washington Post, a favorite of Jeff Bezos, I believe. The write up referenced a refrain which I have heard before:

Washington “insiders” currently leading the government have “squandered” money, time and human lives on international conflicts.

What I highlighted as an interesting passage was this one:

a spokesman for Thiel explained that the technology allows the military to have a more targeted response to threats, which could render unnecessary the wide-scale conflicts that Thiel sharply criticized.

I also put a star by this statement from the write up:

“If we can pinpoint real security threats, we can defend ourselves without resorting to the crude tactic of invading other countries,” Thiel said in a statement sent to The Post.

The write up pointed out that Palantir booked about $350 million in business between 2007 and 2016 and added:

The total value of the contracts awarded to Palantir is actually higher. Many contracts are paid in a series of installments as work is completed or funds are allocated, meaning the total value of the contract may be reflected over several years. In May, for example, Palantir was awarded a contract worth $222.1 million from the Defense Department to provide software and technical support to the U.S. Special Operations Command. The initial amount paid was $5 million with the remainder to come in installments over four years.

I was surprised at the Washington Post’s write up. No ads for Alexa and no Beltway snarkiness. That too was interesting to me. And I don’t have a dog in the fight. For those with dogs in the fight, there may be some billability worries ahead. I wonder if the traffic jam at 355 and Quince Orchard will now abate when IBM folks do their daily commute.

Stephen E Arnold, November 9, 2016
