Cambridge Analytica and Fellow Travelers

March 26, 2018

I read Medium’s “Russian Analyst: Cambridge Analytica, Palantir and Quid Helped Trump Win 2016 Election.” Three points straight away:

  1. The write up may be a nifty piece of disinformation
  2. The ultimate source of the “factoids” in the write up may be a foreign country with interests orthogonal to those of the US
  3. The story I saw is dated July 2017, but dates, like other metadata, can be fluid unless stored in a specialized system that prevents after-the-fact tampering.

Against this background of what may be hefty problems, let me highlight several of the points in the write up I found interesting.

More than one analytics provider. The linkage of Cambridge Analytica, Palantir Technologies, and Quid is not a surprise. Multiple tools, each selected for its particular utility, are a best practice in some intelligence analytics operations.

A Russian source. The data in the write up appear to arrive via a blog by a Russian familiar with the vendors, the 2016 election, and how analytic tools can yield actionable information.

Attributing “insights.” Palantir allegedly output data which suggested that Mr. Trump could win “swing” states. Quid’s output suggested, “Focus on the Midwest.” Cambridge Analytica suggested, “Use Twitter and Facebook.”

If you are okay with the source and have an interest in what might be applications of each of the identified companies’ systems, definitely read the article.

On April 3, 2018, my DarkCyber video program focuses on my research team’s reconstruction of a possible workflow. And, yes, the video accommodates inputs from multiple sources. We will announce the location of the Cambridge Analytica, GSR, and Facebook “reconstruction” in Beyond Search.

Stephen E Arnold, March 26, 2018

What Happens When Intelligence Centric Companies Serve the Commercial and Political Sectors?

March 18, 2018

Here’s a partial answer:

[image]

And

[image]

Plus

[image]

Years ago, certain types of companies with specific LE and intel capabilities maintained low profiles and, in general, focused on sales to government entities.

How times have changed!

In the DarkCyber video news program for March 27, 2018, I report on these Madison Avenue-style marketing campaigns. They will create more opportunities for a Cambridge Analytica-type “activity.”

Net net: Sometimes discretion is useful.

Stephen E Arnold, March 18, 2018

Crime Prediction: Not a New Intelligence Analysis Function

March 16, 2018

We noted “New Orleans Ends Its Palantir Predictive Policing Program.” The interest in this Palantir Technologies project surprised us from our log cabin with a view of the mine drainage run-off pond. The predictive angle is neither new nor particularly stealthy. Many years ago, when I worked for one of the outfits developing intelligence analysis systems, the “predictive” function was routine.

Here’s how it works:

  • Identify an entity of interest (person, event, organization, etc.)
  • Search for other items including the entity
  • Generate near matches. (We called this “fuzzification” because we wanted hits which were “near” the entity in which we had an interest. Plus, the process worked reasonably well in reverse too.)
  • Punch the analyze function.

Once one repeats the process several times, the system dutifully generates reports which make it easy to spot:

  • Exact matches; for example, a “name” has a telephone number and a dossier
  • Close matches; for example, a partial name or organization is associated with the telephone number of the identity
  • Predicted matches; for example, based on available “knowns,” the system can generate a list of highly likely matches.

The particular systems with which I am familiar allow the analyst, investigator, or intelligence professional to explore the relationships among these pieces of information. Timeline functions make it trivial to plot when events took place and retrieve from the analytics module highly likely locations for future actions. If an “organization” held a meeting with several “entities” at a particular location, the geographic component can plot the actual meetings and highlight suggestions for future meetings. In short, prediction functions work in a manner similar to Excel’s filling in items in a number series.
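The four-step workflow and the match categories above can be sketched as a toy matcher. This is a minimal illustration, assuming invented records and a crude string-similarity score for “fuzzification”; real systems use far richer entity models.

```python
from difflib import SequenceMatcher

# Invented toy records; a real system would hold dossiers, not dicts.
records = [
    {"name": "Ivan Petrov", "phone": "555-0101"},
    {"name": "I. Petrov", "phone": "555-0101"},
    {"name": "Anna Weiss", "phone": "555-0199"},
]

def similarity(a, b):
    """Crude 'fuzzification' score: 1.0 means an exact string match."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def classify_matches(entity, records, close_threshold=0.6):
    """Bucket records into exact and close (near) matches for an entity of interest."""
    report = {"exact": [], "close": []}
    for rec in records:
        score = similarity(entity, rec["name"])
        if score == 1.0:
            report["exact"].append(rec)
        elif score >= close_threshold:
            report["close"].append(rec)
    return report

report = classify_matches("Ivan Petrov", records)
# "Ivan Petrov" lands in the exact bucket; "I. Petrov" lands in the close bucket.
```

Repeating the query over fresh data and noticing the shared telephone number is what lets an analyst promote a close match toward a predicted one.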

heat map with histogram

What would you predict as a “hot spot” based on this map? The red areas, the yellow areas, the orange areas, or the areas without an overlay? Prediction is facilitated with some outputs from intelligence analysis software. (Source: Palantir via Google Image search)


Schmidt Admits It Is Hard to Discern Between Fact and Fiction

March 15, 2018

One basic research essential is learning how to tell the difference between fact and fiction.  It used to be easier to control and verify news because information dissemination was limited to physical media.  The Internet blew everything out of the water and made it more difficult to discern fact from fiction.  Humans can be taught tricks, but AI still has a lot to learn.  The Daily Mail reports that, “Alphabet Chairman Eric Schmidt Admits It Is ‘Very Difficult’ For Google’s Algorithm To Separate Fact From Fiction In Its Search Results.”

Millions of articles and other pieces of content are posted online daily.  Google’s job is to sift through them and deliver the most accurate results.  When opposing viewpoints are shared, Google’s algorithm has difficulty figuring out the truth.  Eric Schmidt says that can be fixed with tweaking.  He viewed fact-versus-fiction problems as bugs that need repair and, with some work, can be fixed.  The article highlights some of the more infamous examples of Google’s failings, such as the AutoComplete feature and how conspiracy theories can be regarded as fact.

Search results displaying only hard truth will be as elusive as accurate sentiment analytics.

Schmidt added:

That is a core problem of humans that they tend to learn from each other and their friends are like them.  And so until we decide collectively that occasionally somebody not like you should be inserted into your database, which is sort of a social values thing, I think we are going to have this problem.

Or we can just wait until we make artificial intelligence smarter.

Whitney Grace, March 15, 2018

Come on Google, Stop Delivering Offensive Content

March 14, 2018

Sentiment analytics is notoriously hard to program and leads to more chuckles than accurate results.  Throughout the year, Google, Facebook, and other big names have dealt with their own embarrassing sentiment analytics fiascos and they still continue.  The Verge shares, “Google’s Top Search Results Promote Offensive Content, Again” in an unsurprising headline.

One recent example took an offensive meme from a subreddit and made it the first thing displayed when “gender fluid” was queried.  Yes, it is funny, but incidents like this keep happening without any sign of stopping:

The slip-up comes just a month after Google briefly gave its “top stories” stamp of approval to two 4chan threads identifying the wrong suspect in the recent Las Vegas mass shooting tragedy. This latest search result problem appears to be related to the company’s snippet feature. Featured snippets are designed to answer queries instantly, and they’ve often provided bad answers in the past. Google’s Home device, for example, used a featured snippet to answer the question ‘are women evil?’ with the horrendously bad answer ‘every woman has some degree of prostitute in her.’

The ranking algorithm was developed to pull the most popular stories and deliver them regardless of their accuracy.  Third parties and inaccurate sources can manipulate the ranking algorithm for their own benefit or amusement.  Google is considered the de facto source of information.  With that position comes a responsibility to purvey the truth, but there will always be people who take advantage of news outlets.

Whitney Grace, March 14, 2018

Facebook Fails Discrimination Test

March 12, 2018

While racism and discrimination still plague society, the average person does not participate in them.  The Internet exacerbates hatred to the point that people believe it is more powerful today than it was in the past.  Social media Web sites do their best to prevent these topics from spreading by using sentiment analytics.  Sentiment analytics is still in its infancy and, on more than one occasion, has proven to work against its intended purpose.  TechCrunch shares that “Facebook’s Ad System Shown Failing To Enforce Its Own Anti-Discriminatory Policy” is a recent example.

Facebook demands to be allowed to regulate itself when it comes to abuse of its services, such as ads.  Despite the claims that Facebook can self-regulate, current events have proven the contrary.  The article points to Facebook’s claim that it disabled its ethnic affinity ad targeting for employment, housing, and credit.  ProPublica ran a test case by creating fake rental housing ads.  What did they discover?  Facebook continues to discriminate:

However instead of the platform blocking the potentially discriminatory ad buys, ProPublica reports that all its ads were approved by Facebook “within minutes” — including an ad that sought to exclude potential renters “interested in Islam, Sunni Islam and Shia Islam”. It says that ad took the longest to approve of all its buys (22 minutes) — but that all the rest were approved within three minutes.

It also successfully bought ads that it judged Facebook’s system should at least flag for self-certification because they were seeking to exclude other members of protected categories. But the platform just accepted housing ads blocked from being shown to categories including ‘soccer moms’, people interested in American sign language, gay men and people interested in wheelchair ramps.

Facebook reiterated its commitment to anti-discrimination, and ProPublica responded that if an outside research team had been called in to regulate Facebook, these ads would never have reached the Web.  Maybe Facebook should follow Google’s example and hire content curators to read every single ad to prevent the bad stuff from getting through.

Whitney Grace, March 12, 2018

Palantir Executive Reveals How Silicon Valley Really Works

March 5, 2018

I usually ignore the talking heads on the US television cable channels. I did perk up when I heard a comment made by Alex Karp, one of the founders of Palantir Technologies. The company’s Gotham and Metropolitan product lines (now evolved to a platform called Foundry), its licensing deals with Thomson Reuters, and the company’s work for commercial organizations is quite interesting. Most consumers and many users of high profile online services are unaware of Palantir. Some click centric outfits like Buzzfeed rattle the Palantir door knob with revelations about the super low profile company. The reality is that Palantir is not all that secret. In fact, a good online researcher can unearth examples of the company’s technology, including its plumbing, its interfaces, and its outputs. Dig further, and one can pinpoint some of the weaknesses in the company’s technology, systems, methods, and management approach.

In the CNBC interview, which appeared as an online story “CNBC Exclusive: CNBC Transcript: Palantir Technologies Co-Founder & CEO Alex Karp Joins CNBC’s Josh Lipton for Rare Interview Airing Today,” I noted several insights. Critics of Palantir might describe these comments in another way, but for me, the ideas expressed were interesting and suggestive.

Here’s the first one:

I believe that Silicon Valley is creating innovation without jobs, and it’s really hurting our world.

I read this to mean that job seekers who cannot get hired in a world infused with smart software are dead in the water. Those homeless people, by extension, will replicate the living conditions of shanties in Third World countries. Most Silicon Valley cheerleaders sidestep what is a massive phase change in society.

The second statement I noted is:

Realize that most Silicon Valley companies don’t care and nor do they have a corporate responsibility to care.

For me, Mr. Karp is making clear that chatter from FAGMA (Facebook, Amazon, Google, Microsoft, and Apple) about doing the right thing, trying to be better, and accepting the responsibility which falls upon the shoulders of quasi-monopolies is just that—chatter. Palantir, it seems, is a critic of the Silicon Valley way. I find this fascinating.

The third statement I circled is:

We are primarily a creative organization, so that means we create, we try not to look at what other people are doing, or obviously not overly.

This statement does not hint at the out-of-court settlement with i2 Group. The legal dust-up, which I discussed in this post, was not mentioned by either the interlocutor or Mr. Karp. The omission was notable. I don’t want to be skeptical of this “creative organization” phrase, but as many people who emerged from the start-up scene know, the “history” of innovation often has an important story to tell. Unless the CNBC interviewer knows about the allegations related to the ANB file format and other Analyst’s Notebook functions, the assertion creeps toward becoming a fact about Palantir’s innovation. (Note: I was an adviser to i2 Group Ltd. before the company’s founders sold the enterprise.)

The original interview is interesting and quite revelatory. Today, I believe that history can be reshaped. It’s not fake news; it’s a consequence of how information is generated, disseminated, and consumed in an uncritical way.

Stephen E Arnold, March 5, 2018

An Upside to Fake Data

February 2, 2018

We never know if “data” are made up or actual factual. Nevertheless, we read “How Fake Data Can Help the Pentagon Track Rogue Weapons.” The main idea, from our point of view, is predictive analytics which can adapt to that which has not yet happened. We circled this statement from the company contracted by the US government to make “fake” data useful:

IvySys Founder and Chief Executive Officer James DeBardelaben compared the process to repeatedly finding a needle in a haystack, but making both the needle and haystack look different every time. Using real-world data, agencies can only train algorithms to spot threats that already exist, he said, but constantly evolving synthetic datasets can train tools to spot patterns that have yet to occur.
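As a toy illustration of the “different needle, different haystack” idea (my sketch, not IvySys’s actual method), both the background data and the embedded anomaly can be regenerated on every run:

```python
import random

def make_haystack(n=1000, seed=None):
    """Generate synthetic 'routine' values plus one disguised needle.

    Background and needle are re-randomized on every call, so a detector
    trained on many such datasets cannot memorize one fixed signature.
    """
    rng = random.Random(seed)
    haystack = [rng.gauss(100, 15) for _ in range(n)]  # routine activity
    needle_pos = rng.randrange(n)                 # position changes each run
    haystack[needle_pos] = rng.uniform(300, 600)  # magnitude changes each run
    return haystack, needle_pos

def find_needle(haystack):
    """Naive detector: flag the largest deviation from the sample mean."""
    mean = sum(haystack) / len(haystack)
    return max(range(len(haystack)), key=lambda i: abs(haystack[i] - mean))

data, truth = make_haystack(seed=42)
```

A detector exercised against thousands of such datasets must learn the shape of anomalies in general rather than one historical threat.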

Worth monitoring IvySys at https://www.ivysys.com/.

Stephen E Arnold, February 2, 2018

Averaging Information Is Not Cutting It Anymore

January 16, 2018

Here is something interesting that comes after the headline “People From Around The Globe Met For The First Flat Earth Conference” and beliefs that white supremacists are gaining more power.  Frontiers Media shares “Rescuing Collective Wisdom When The Average Group Opinion Is Wrong,” an article that pokes at the fanaticism running rampant in the news.  Beyond that fanaticism, there is a real concern with averaging when it comes to data science and other fields that rely heavily on data.

The article breaks down the different ways averaging is used and the different theorems developed from it.  The introduction is a bit wordy, but it sets the tone:

The total knowledge contained within a collective supersedes the knowledge of even its most intelligent member. Yet the collective knowledge will remain inaccessible to us unless we are able to find efficient knowledge aggregation methods that produce reliable decisions based on the behavior or opinions of the collective’s members. It is often stated that simple averaging of a pool of opinions is a good and in many cases the optimal way to extract knowledge from a crowd. The method of averaging has been applied to analysis of decision-making in very different fields, such as forecasting, collective animal behavior, individual psychology, and machine learning. Two mathematical theorems, Condorcet’s theorem and Jensen’s inequality, provide a general theoretical justification for the averaging procedure. Yet the necessary conditions which guarantee the applicability of these theorems are often not met in practice. Under such circumstances, averaging can lead to suboptimal and sometimes very poor performance. Practitioners in many different fields have independently developed procedures to counteract the failures of averaging. We review such knowledge aggregation procedures and interpret the methods in the light of a statistical decision theory framework to explain when their application is justified. Our analysis indicates that in the ideal case, there should be a matching between the aggregation procedure and the nature of the knowledge distribution, correlations, and associated error costs.

Understanding how data can be corrupted is half the battle of figuring out how to correct the problem.  This is one of the complications related to artificial intelligence and machine learning.  One example is trying to build sentiment analysis engines.  These require terabytes of data, and the Internet provides an endless supply, but the usual result is that the sentiment analysis engines end up racist, misogynist, and all-around trolls.  It might lead to giggles, but it does not produce very accurate results.
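A toy example of the failure mode (mine, not from the paper): when a correlated minority errs in the same direction, the plain average is dragged off target, while a robust aggregator such as the median is not.

```python
import statistics

# Crowd estimates of a true value of 100; three correlated members
# anchored on the same misleading cue and guessed far too high.
estimates = [96, 98, 101, 103, 99, 400, 410, 395]

mean_guess = statistics.mean(estimates)      # dragged toward the correlated errors
median_guess = statistics.median(estimates)  # stays near the truth
```

This is the matching the article calls for: the aggregation procedure (here, the median) has to fit the knowledge distribution and its correlations, not be applied blindly.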

Whitney Grace, January 17, 2018

Apple’s Orchard of AI Talent

January 1, 2018

Here’s an analysis that will be of interest to competitive artificial intelligence professionals. Fast Company reports on its own research in the piece, “Where Apple Recruits Its AI Talent, According to LinkedIn.” Writer Jared Newman begins:

Apple appears to have doubled its headcount in artificial intelligence and related fields since 2014–and more than tripled its number of PhD holders in the sector–as tech companies race to build a generation of smarter products. That’s one conclusion from an analysis of more than 600 Apple employees who specialize in machine learning, computer vision, natural language processing, and other disciplines related to AI. To help us understand where Apple is getting its AI talent, Fast Company created a database from publicly available LinkedIn profiles, searching for employees who either defined their jobs as “scientist” or “researcher” or listed AI-related skills in their resumes. This analysis certainly does have some limitations: It won’t account for employees who have defined their jobs in vague terms on their profiles, self-reported inaccurately or incompletely, or have avoided sharing their employment information on LinkedIn entirely. Apple has reportedly discouraged employees from announcing their AI jobs on LinkedIn in the past, so blind spots in our study are inevitable. Still, this analysis provides a broad snapshot of Apple’s response to a growing AI arms race in the tech industry.

The article goes on to share several graphs representing Apple AI hiring trends, like the proportion of Ph.D. to non-Ph.D. hires by year, or the percentages of employees obtained from acquisitions, universities or government organizations, and other businesses. We can also see from which businesses and universities Apple has hired most, and which acquisitions brought the company the most AI talent. See the article for all the details.
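The screening step described in the quote amounts to a keyword filter over job titles and listed skills. A hypothetical reconstruction (the field names and skill list are my assumptions, not Fast Company’s actual code):

```python
# Skills and title words the quote says the researchers searched for.
AI_SKILLS = {"machine learning", "computer vision", "natural language processing"}
TITLE_WORDS = {"scientist", "researcher"}

def looks_like_ai_hire(profile):
    """Flag a profile whose title or listed skills suggest AI work."""
    title = profile.get("title", "").lower()
    skills = {s.lower() for s in profile.get("skills", [])}
    return any(w in title for w in TITLE_WORDS) or bool(AI_SKILLS & skills)

# Invented sample profiles standing in for scraped LinkedIn data.
profiles = [
    {"title": "Research Scientist", "skills": []},
    {"title": "Software Engineer", "skills": ["Computer Vision"]},
    {"title": "Product Manager", "skills": ["Marketing"]},
]
hits = [p for p in profiles if looks_like_ai_hire(p)]
```

As the article concedes, such a filter inherits the blind spots of self-reported data: vague titles and hidden profiles never make it into the pool.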

Cynthia Murrell, January 1, 2018
