Quote to Note: Statistics May Spoil Like Bananas

April 13, 2018

I noticed this synopsis for a talk by Andrew Gelman, a wizard who teaches at Columbia University. You can find the summary in “Do Statistical Methods Have an Expiration Date?” Here’s the quote I noted:

The statistical methods which revolutionized science in the 1930s-1950s no longer seem to work in the 21st century. How can this be? It turns out that when effects are small and highly variable, the classical approach of black-box inference from randomized experiments or observational studies no longer works as advertised.

What happens when these methods are bolted into next-generation data analytics systems which humans use to make decisions? My great uncle Vladimir I. Arnold and his colleague Andrey Kolmogorov could calculate an answer, I assume.

Stephen E Arnold, April 13, 2018

The AI Spy Who Photographed Me

March 29, 2018

Artificial intelligence is one of the tools that law enforcement uses to thwart potential terrorist attacks and other illegal activities.  Applications use AI to run data analysis, scan the Dark Web, and monitor identity theft.  One major use for AI is image analysis and facial recognition.  IEEE Spectrum looks at the huge demand for more accurate image AI in “Wanted: AI That Can Spy.”  While fear of spy satellites is not much of a plot point anymore, the US has hundreds of satellites orbiting the planet capturing photographic data.  Humans can review only so much of that imagery, and the US government has FOMO (“fear of missing out”) on something important.

US intelligence officials sponsored an AI challenge to identify objects of interest in satellite images.  The entire goal is to improve AI standards and capabilities:

Since July, competitors have trained machine-learning algorithms on one of the world’s largest publicly available data sets of satellite imagery—containing 1 million labeled objects, such as buildings and facilities. The data is provided by the U.S. Intelligence Advanced Research Projects Activity (IARPA). The 10 finalists will see their AI algorithms scored against a hidden data set of satellite imagery when the challenge closes at the end of December.

The agency’s goal in sponsoring the Functional Map of the World Challenge aligns with statements made by Robert Cardillo, director of the U.S. National Geospatial-Intelligence Agency, who has pushed for AI solutions that can automate 75 percent of the workload currently performed by humans analyzing satellite images.

Lockheed research scientist Mark Pritt surmised that the US government wants to generate maps automatically instead of relying on manual labor.  Pritt’s Lockheed team is one of many competing for the $100,000 prize for the best deep-learning algorithm that can recognize specific patterns and identify objects of interest in satellite images.  Satellite images are more complex than other images because they are shot from multiple angles, cloud coverage is a problem, and they come in a variety of resolutions.

Even a successful deep-learning algorithm would not be enough on its own, because the algorithm lacks the ability to refine its own judgments.  Think sentiment analysis, except with images.  The practical solution for the moment is a combination of AI and human interaction: the AI does the bulk of the work, while humans examine flagged photos for further investigation.
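
The human-in-the-loop workflow described above can be sketched in a few lines. This is a hypothetical illustration, not any vendor’s actual system: a detector function scores each image, and only detections above a confidence threshold are queued for human analysts. The detector, threshold value, and image names are all assumptions for demonstration.

```python
def triage_images(images, detector, threshold=0.8):
    """Return the subset of images the AI flags for human review."""
    flagged = []
    for image in images:
        # Confidence that an object of interest is present, in [0, 1].
        score = detector(image)
        if score >= threshold:
            flagged.append((image, score))
    # Present the highest-confidence detections to analysts first.
    flagged.sort(key=lambda pair: pair[1], reverse=True)
    return flagged
```

With a toy detector that simply looks up precomputed scores, `triage_images(["img_a", "img_b", "img_c"], lambda image: scores[image])` would drop the low-confidence image and return the other two in descending order of confidence.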

Whitney Grace, March 29, 2018

Importance of Good Data to AI Widely Underappreciated

March 27, 2018

Reliance on AI has now become embedded in our culture, even as we struggle with issues of algorithmic bias and data-driven discrimination. Tech news site CIO reminds us, “AI’s Biggest Risk Factor: Data Gone Wrong.” In the detailed article, journalist Maria Korolov begins with some early examples of “AI gone bad” that have already occurred, and explains how this happens; hard-to-access data, biases lurking within training sets, and faked data are all concerns. So is building an effective team of data management workers who know what they are doing. Regarding the importance of good data, Korolov writes:

Ninety percent of AI is data logistics, says JJ Guy, CTO at Jask, an AI-based cybersecurity startup. All the major AI advances have been fueled by advances in data sets, he says. ‘The algorithms are easy and interesting, because they are clean, simple and discrete problems,’ he says. ‘Collecting, classifying and labeling datasets used to train the algorithms is the grunt work that’s difficult — especially datasets comprehensive enough to reflect the real world.’… However, companies often don’t realize the importance of good data until they have already started their AI projects. ‘Most organizations simply don’t recognize this as a problem,’ says Michele Goetz, an analyst at Forrester Research. ‘When asked about challenges expected with AI, having well curated collections of data for training AI was at the bottom of the list.’ According to a survey conducted by Forrester last year, only 17 percent of respondents say that their biggest challenge was that they didn’t ‘have a well-curated collection of data to train an AI system.’

Eliminating bias gleaned from training sets (like one AI’s conclusion that anyone who’s cooking must be a woman) is tricky, but certain measures could help. For example, tools that track how an algorithm came to a certain conclusion can help developers correct its impression. Also, independent auditors bring in a fresh perspective. These delicate concerns are part of why, says Korolov, AI companies are “taking it slow.” This is slow? We’d better hang on to our hats whenever (they decide) they’ve gotten a handle on these issues.
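
One of the transparency tools mentioned above can be made concrete for the simplest case. For a linear scoring model, a prediction decomposes exactly into per-feature contributions, so a developer can see which input drove the outcome. The feature names and weights below are invented for illustration; real model-explanation tooling handles far more complex models.

```python
def explain_prediction(weights, features):
    """Break a linear model's score into per-feature contributions."""
    contributions = {name: weights[name] * value
                     for name, value in features.items()}
    total = sum(contributions.values())
    return total, contributions

# Hypothetical features for the "anyone cooking must be a woman" example:
weights = {"subject_is_cooking": 2.0, "kitchen_scene": 1.5}
total, parts = explain_prediction(
    weights, {"subject_is_cooking": 1, "kitchen_scene": 1})
# Reviewing `parts` reveals which (possibly biased) feature dominates.
```

The point is not the arithmetic but the audit trail: once contributions are visible, a biased feature can be spotted and its weight corrected.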

Cynthia Murrell, March 27, 2018

Cambridge Analytica and Fellow Travelers

March 26, 2018

I read Medium’s “Russian Analyst: Cambridge Analytica, Palantir and Quid Helped Trump Win 2016 Election.” Three points straight away:

  1. The write up may be a nifty piece of disinformation
  2. The ultimate source of the “factoids” in the write up may be a foreign country with interests orthogonal to those of the US
  3. The story I saw is dated July 2017, but dates – like other metadata – can be fluid unless in a specialized system which prevents after the fact tampering.

Against this background of what may be hefty problems, let me highlight several of the points in the write up I found interesting.

More than one analytics provider. The linkage of Cambridge Analytica, Palantir Technologies, and Quid is not a surprise. Multiple tools, each selected for its particular utility, are a best practice in some intelligence analytics operations.

A Russian source. The data in the write up appear to arrive via a blog by a Russian familiar with the vendors, the 2016 election, and how analytic tools can yield actionable information.

Attributing “insights.” Palantir allegedly output data which suggested that Mr. Trump could win “swing” states. Quid’s output suggested, “Focus on the Midwest.” Cambridge Analytica suggested, “Use Twitter and Facebook.”

If you are okay with the source and have an interest in what might be applications of each of the identified companies’ systems, definitely read the article.

My April 3, 2018, DarkCyber video program focuses on my research team’s reconstruction of a possible workflow. And, yes, the video accommodates inputs from multiple sources. We will announce the location of the Cambridge Analytica, GSR, and Facebook “reconstruction” in Beyond Search.

Stephen E Arnold, March 26, 2018

Algorithm Positions Microsoft on Top of Global Tech Field

March 23, 2018

This is quite a surprise. Reporting the results of its own analysis, Reuters announces, “Microsoft Tops Thomson Reuters Top 100 Global Tech Leaders List.” The write-up tells us that second and third place went to:

… Chipmaker Intel and network gear maker Cisco Systems. The list, which aims to identify the industry’s top financially successful and organizationally sound organizations, features US tech giants such as Apple, Alphabet, International Business Machines and Texas Instruments, among its top 10. Microchip maker Taiwan Semiconductor Manufacturing, German business software giant SAP and Dublin-based consultant Accenture round out the top 10. The remaining 90 companies are not ranked, but the list also includes the world’s largest online retailer Amazon and social media giant Facebook.


The results are based on a 28-factor algorithm that measures performance across eight benchmarks: financial, management and investor confidence, risk and resilience, legal compliance, innovation, people and social responsibility, environmental impact, and reputation. The assessment tracks patent activity for technological innovation and sentiment in news and selected social media as the reflection of a company’s public reputation. The set of tech companies is restricted to those that have at least $1 billion in annual revenue.
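
The description suggests a roll-up scheme: factor scores average into benchmark scores, which combine with weights into one composite number. Here is a minimal sketch of that kind of calculation. The benchmark names come from the article; the weights, factor groupings, and scores are illustrative assumptions, since Thomson Reuters has not published its exact formula.

```python
def composite_score(factor_scores, benchmark_weights):
    """Roll factor scores up into one weighted composite.

    factor_scores: {benchmark: [factor scores in [0, 1]]}
    benchmark_weights: {benchmark: weight}, weights summing to 1.
    """
    score = 0.0
    for benchmark, scores in factor_scores.items():
        benchmark_avg = sum(scores) / len(scores)
        score += benchmark_weights[benchmark] * benchmark_avg
    return score
```

With 28 factors spread over eight benchmarks, small choices in the weights can reorder the list considerably, which is why the factor mix matters as much as the raw data.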

That is an interesting combination of factors; I’d like to see that Venn diagram. Some trends emerged from the report. For example, 45 of those 100 companies are based in the US (but 47 in North America); 38 are headquartered in Asia, 14 in Europe, and one in Australia.

Cynthia Murrell, March 23, 2018

What Happens When Intelligence Centric Companies Serve the Commercial and Political Sectors?

March 18, 2018

Here’s a partial answer:

Years ago, certain types of companies with specific LE and intel capabilities maintained low profiles and, in general, focused on sales to government entities.

How times have changed!

In the DarkCyber video news program for March 27, 2018, I report on the Madison Avenue type marketing campaigns. These will create more opportunities for a Cambridge Analytica “activity.”

Net net: Sometimes discretion is useful.

Stephen E Arnold, March 18, 2018

Crime Prediction: Not a New Intelligence Analysis Function

March 16, 2018

We noted “New Orleans Ends Its Palantir Predictive Policing Program.” The interest in this Palantir Technologies’ project surprised us from our log cabin with a view of the mine drainage run off pond. The predictive angle is neither new nor particularly stealthy. Many years ago when I worked for one of the outfits developing intelligence analysis systems, the “predictive” function was a routine function.

Here’s how it works:

  • Identify an entity of interest (person, event, organization, etc.)
  • Search for other items including the entity
  • Generate near matches. (We called this “fuzzification” because we wanted hits which were “near” the entity in which we had an interest. Plus, the process worked reasonably well in reverse too.)
  • Punch the analyze function.

Once one repeats the process several times, the system dutifully generates reports which make it easy to spot:

  • Exact matches; for example, a “name” has a telephone number and a dossier
  • Close matches; for example, a partial name or organization is associated with the telephone number of the identity
  • Predicted matches; for example, based on available “knowns”, the system can generate a list of highly likely matches.

The particular systems with which I am familiar allow the analyst, investigator, or intelligence professional to explore the relationships among these pieces of information. Timeline functions make it trivial to plot when events took place and retrieve from the analytics module highly likely locations for future actions. If an “organization” held a meeting with several “entities” at a particular location, the geographic component can plot the actual meetings and highlight suggestions for future meetings. In short, prediction functions work in a manner similar to Excel’s filling in items in a number series.
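
The Excel analogy can be made concrete. Given the dates of past meetings, a timeline module can project the next likely one the same way a spreadsheet fills in a number series. This is a deliberately simple sketch under the assumption of roughly regular spacing; real intelligence analytics weigh many more signals than the average interval.

```python
from datetime import date, timedelta

def predict_next_meeting(past_dates):
    """Extrapolate one average interval past the last known meeting."""
    intervals = [(b - a).days for a, b in zip(past_dates, past_dates[1:])]
    avg_interval = sum(intervals) / len(intervals)
    return past_dates[-1] + timedelta(days=round(avg_interval))
```

Given meetings on January 1, 8, and 15, the sketch projects January 22, the same kind of "highly likely location and time" suggestion the geographic and timeline components surface for analysts.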

[Image: heat map with histogram]

What would you predict as a “hot spot” based on this map? The red areas, the yellow areas, the orange areas, or the areas without an overlay? Prediction is facilitated with some outputs from intelligence analysis software. (Source: Palantir via Google Image search)


The Flaws in Smart Software Methods

March 15, 2018

I read “Machine Learning Models Keep Getting Spoofed by Adversarial Attacks and It’s Not Clear If This Can Ever Be Fixed.” About four years ago I gave a series of lectures about the mathematical procedures most commonly used in smart software. The lectures included equations which, I learned, are not high on the list of law enforcement and intelligence professionals’ favorite types of information.

Despite the inclusion of this lecture in some of my conference talks, only since the allegations, assertions, and counter assertions about interference via social media has the topic of flawed methods become popular.

The write up “Machine Learning Models…” is okay. It covers the basics, but it omits specific information about why clustering can be disrupted or why anomaly detection numerical recipes can go off the rails.
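
To illustrate why spoofing is hard to dismiss, even a toy nearest-centroid classifier flips its answer when an input is nudged slightly across the decision boundary. The two classes, centroid positions, and perturbation below are invented purely for demonstration; real adversarial attacks exploit the same brittleness in far higher dimensions.

```python
def nearest_centroid(point, centroids):
    """Classify a point by its closest class centroid."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: dist2(point, centroids[label]))

# Hypothetical classes: the decision boundary sits midway, at x = 2.
centroids = {"benign": (0.0, 0.0), "malicious": (4.0, 0.0)}
# A point at x = 1.9 is "benign"; shift it by 0.2 and the label flips.
```

The lesson generalizes: any model with a sharp decision boundary can be pushed across it by a perturbation small enough to look innocuous to a human.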

My point is that models can be enhanced and improved. However, in order to make even incremental progress, the companies, universities, and individuals involved in cooking up warmed over mathematical procedures have to take the initiative; for example:

  1. Question the use of textbook methods. Does Google’s struggle to identify faces in images reflect a dependence on Dr. Norvig’s recipe book?
  2. Become more demanding when threshold settings are implemented by an intern or an engineer who thinks the defaults are just dandy.
  3. Examine outputs in the context of a user who has subject matter expertise in the content and can identify wonky outputs.
  4. Encourage developers to move beyond copying and pasting routines from college courses or methods invoked from a library someone said was pretty good.
  5. Evaluate the workflow sequence for its impact on system outputs.
  6. Stop assuming that “more data” works around flaws in the data by magic.
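
Point 2 above can be demonstrated with a trivial anomaly detector: flag values more than k standard deviations from the mean. The "just dandy" default of k = 3 behaves very differently from k = 2 on the same data, which is exactly why defaults deserve scrutiny. The readings below are invented for illustration.

```python
import statistics

def anomalies(values, k=3.0):
    """Return values more than k population standard deviations from the mean."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    return [v for v in values if abs(v - mean) > k * stdev]
```

On the readings `[10, 11, 9, 10, 12, 30]`, the default `k=3.0` flags nothing because the outlier itself inflates the standard deviation, while `k=2.0` catches the 30. Neither setting is "right" in the abstract; only someone who understands the data can choose.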

Until these types of shifts take place, smart software — whether for machine learning or making sense of real time flows of data — will remain less than perfect for many use cases.

Stephen E Arnold, March 15, 2018

Google Accused of Censorship

March 13, 2018

Google, Facebook, and other social media and news outlets are concerned with fake news.  They have taken preliminary measures to curb false information, but Live Mint says, “Google Is Filtering News For The Wrong Reason.”  Google, like other news outlets and social media platforms, is a business. While it delivers products and services, its entire goal is to turn a profit.  Anything that affects the bottom line, such as false information, is deemed inappropriate.

Google deemed the Russian government-owned news Web sites RT and Sputnik false information generators, so the search engine giant reworked its ranking algorithm.  The new ranking algorithm pushes RT and Sputnik way down in news searches.  Live Mint explained that this made RT and Sputnik victims, but Google does not want to ban these Web sites.  Instead, Google has other ideas:

Schmidt’s words are a riff on an April post by Google vice president of engineering Ben Gomes, who teased changes to how Google searches for news. New instructions targeted “deceptive web pages” that look like news but seek to “manipulate users” with conspiracy theories, hoaxes, and inaccurate information. ‘We’ve adjusted our signals to help surface more authoritative pages and demote low-quality content,’ Gomes wrote.
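
The kind of signal adjustment Gomes describes can be sketched as blending a relevance score with an authoritativeness signal, so that low-quality sources sink without being removed. The signal names, weights, and scores below are all assumptions for illustration; Google's actual ranking signals are not public.

```python
def rank_results(results, authority_weight=0.5):
    """Order results by a blend of relevance and source authority.

    results: list of (url, relevance, authority), each signal in [0, 1].
    """
    def blended(item):
        _, relevance, authority = item
        return (1 - authority_weight) * relevance + authority_weight * authority
    return sorted(results, key=blended, reverse=True)
```

With this blend, a highly relevant page from a low-authority source is demoted below a slightly less relevant page from an authoritative one, which is demotion rather than outright banning.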

The author makes a pointed argument about why it is bad for businesses to alter their services, such as a news aggregator, to avoid bad press and increased regulation.  He also argues that while false information Web sites are harmful, it is not Google’s responsibility to censor them.

It is a good point, but when people take everything printed on the Internet as fact, someone has to make the moral case for promoting the truth.

Whitney Grace, March 13, 2018

Racism and Artificial Intelligence Become a Hot Topic

March 8, 2018

Here’s a scary thought: What if AI and machine learning inadvertently (or purposely) discriminate? Impossible, you say. How can an algorithm see race? Some of the brightest minds in the business have some shocking insight into this idea and it isn’t pretty, as we learned in a recent NextWeb story, “The future of FinTech is racist, according to this anonymous data scientist.”

According to the story:

Anybody that says, “We’re an AI company that’s making smarter loans”: racist. Absolutely, 100%.

I was actually floored, during the last Super Bowl I saw this SoFi ad that said, “We discriminate.” I was just sitting there watching this game like I cannot believe it — it’s either they don’t know, which is terrifying, or they know and they don’t give a shit, which is also terrifying.

I don’t know how that court case is going to work out, but I can tell you in the next ten years, there’s going to be a court case about it. And I would not be surprised if SoFi lost for discrimination. And in general, I think it’s going to be an increasingly important question about the way that we handle protected classes generally, and maybe race specifically, in data science models of this type.

It doesn’t end there. One forward-looking scientist had similar things to say, stating that if AI recognizes racism as a pattern, it might not have the intelligence to avoid proliferating it in many aspects of life. Haunting. Ideally, this will be the point in history where ethicists step in and help guide this crucial moment in our world.

Patrick Roland, March 8, 2018
