A Peek into Google and Palantir Contracts: The UK National Health Service Versions
June 8, 2020
Curious about the legalese, terms, and conditions involved when US companies license and service government entities in the United Kingdom? Good news. You can (at least as of June 6, 2020, at 0600 US Eastern time) read allegedly complete contracts for software and services.
- The Google NHS agreement is at https://tinyurl.com/y88tzdqq
- The Microsoft NHS agreement is at https://tinyurl.com/y8vzj5ye
- The Palantir NHS agreement is at https://tinyurl.com/ybgdl82p
A contract from Faculty.ai is also available. Founded in 2014, Faculty.ai does not have the cachet of a Google. If you want to look at that contract, it is for now at https://tinyurl.com/ya3kzolw.
The deals are between these firms and an entity doing government business under the name of NHSX, which seems to mean “a joint unit bringing together teams from the Department of Health and Social Care and NHS England and NHS Improvement to drive the digital transformation of care. COVID-19 Response.”
Are there some interesting details in these documents? Yep. Will these be shared in this blog post? Nope. You will learn some of the DarkCyber team’s insights if you attend our National Crime Conference presentation about investigative tools and systems.
Not invited? For-fee briefings are still offered. Contact benkent2020 at yahoo dot com.
Stephen E Arnold, June 8, 2020
GoAccess: A Log Analyzer
June 4, 2020
We are updating our tools section of an upcoming National Crime Conference lecture. If you have access to a Web server and a log, you may want to take a look at GoAccess. The software
was designed to be a fast, terminal-based log analyzer. Its core idea is to quickly analyze and view web server statistics in real time without needing to use your browser.
“Analytics without Google” provides additional information about the software and includes helpful pointers. The article states:
What I further liked about GoAccess is I could run it on a separate machine, transferring logs from multiple servers into one place, then creating my necessary dashboards; this isn’t a specific feature of GoAccess, but a feature of the Unix philosophy. This flexibility works well with my seemingly ephemeral Digital Ocean Droplets, which don’t go kaboom on their own, but rather suffer from my own tendencies to erase and start from scratch. GoAccess reminded me how beautiful composable tools are. Its feature set is minimal and it plays nicely with the tools already available to us on a *nix platform. Do one thing and do it well — words of wisdom.
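The kind of tally a log analyzer produces can be sketched in a few lines of Python. This is not GoAccess code; it is a minimal illustration that assumes the logs use the common Combined Log Format, and the sample lines, the `LOG_PATTERN` expression, and the `summarize` helper are all made up for the example.

```python
import re
from collections import Counter

# Assumed layout (Combined Log Format):
# host ident user [time] "request" status bytes "referer" "agent"
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)

def summarize(lines):
    """Tally requested paths, status codes, and client hosts from log lines."""
    paths, statuses, hosts = Counter(), Counter(), Counter()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip lines that do not fit the assumed format
        paths[match["path"]] += 1
        statuses[match["status"]] += 1
        hosts[match["host"]] += 1
    return paths, statuses, hosts

# Hypothetical log lines, merged from more than one server.
SAMPLE_LOG = [
    '203.0.113.5 - - [04/Jun/2020:10:00:01 +0000] "GET /index.html HTTP/1.1" 200 1024 "-" "curl/7.68.0"',
    '203.0.113.5 - - [04/Jun/2020:10:00:02 +0000] "GET /about.html HTTP/1.1" 200 2048 "-" "curl/7.68.0"',
    '198.51.100.7 - - [04/Jun/2020:10:00:03 +0000] "GET /index.html HTTP/1.1" 200 1024 "-" "Mozilla/5.0"',
    '198.51.100.7 - - [04/Jun/2020:10:00:04 +0000] "GET /missing HTTP/1.1" 404 0 "-" "Mozilla/5.0"',
]

paths, statuses, hosts = summarize(SAMPLE_LOG)
print(paths.most_common(3))
print(statuses.most_common())
```

Because the function takes any iterable of lines, logs copied from several servers can be concatenated and fed in together, which is the composability the quoted article praises.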
Worth a look.
Stephen E Arnold, June 4, 2020
Anonymized Location Data: an Oxymoron?
May 13, 2020
Location data. To many the term sounds innocuous, boring really. Perhaps that is why society has allowed apps to collect and sell it with no significant regulation. An engaging (and well-illustrated) piece from Norway’s NRK News, “Revealed by Mobile,” shares the minute details journalists were able to put together about one citizen from location data purchased on the open market. Graciously, this man allowed the findings to be published as a cautionary tale. We suggest you read the article for yourself to absorb the chilling reality. (The link we share above runs through Google Translate.)
Vendors of location data would have us believe the information is completely anonymized and cannot be tied to the individuals who generated it. It is only good for general uses like statistics and regional marketing, they assert. Intending to put that claim to the test, NRK purchased a batch of Norwegian location data from the British firm Tamoco. Their investigation shows anonymization is an empty promise. Though the data is stripped of directly identifying information, buyers are a few Internet searches away from correlating location patterns with individuals. Journalists Trude Furuly, Henrik Lied, and Martin Gundersen tell us:
“All modern mobile phones have a GPS receiver, which with the help of satellite can track the exact position of the phone with only a few meters distance. The position data NRK acquired consisted of a table with four hundred million map coordinates from mobiles in Norway. …
“All the coordinates were linked to a date, time, and specific mobile. Thus, the coordinates showed exactly where a mobile or tablet had been at a particular time. NRK coordinated the mobile positions with a map of Norway. Each position was marked on the map as an orange dot. If a mobile was in a location repeatedly and for a long time, the points formed larger clusters. Would it be possible for us to find the identity of a mobile owner by seeing where the phone had been, in combination with some simple web searches? We selected a random mobile from the dataset.
“NRK searched the address where the mobile had left many points about the nights. The search revealed that a man and a woman lived in the house. Then we searched their Facebook profiles. There were several pictures of the two smiling together. It seemed like they were boyfriend and girlfriend. The man’s Facebook profile stated that he worked in a logistics company. When we searched the company in question, we discovered that it was in the same place as the person used to drive in the morning. Thus, we had managed to trace the person who owned the cell phone, even though the data according to Tamoco should have been anonymized.”
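The first step the journalists describe, finding where a phone sits at night, can be sketched as follows. This is not NRK’s method or code; it is a minimal illustration that assumes each record is a timestamp plus latitude and longitude for a single device. The `likely_home_cell` helper, the night-hour cutoffs, the grid size, and the sample points are all invented for the example.

```python
from collections import Counter
from datetime import datetime

def likely_home_cell(points, night_start=22, night_end=6, precision=3):
    """Return the grid cell a device visits most during night hours, or None.

    points: iterable of (iso_timestamp, latitude, longitude) for ONE device.
    precision=3 rounds to cells of roughly 100 m in latitude (an arbitrary choice).
    """
    cells = Counter()
    for stamp, lat, lon in points:
        hour = datetime.fromisoformat(stamp).hour
        if hour >= night_start or hour < night_end:
            cells[(round(lat, precision), round(lon, precision))] += 1
    if not cells:
        return None
    return cells.most_common(1)[0][0]

# Made-up points for one hypothetical device.
DEVICE_POINTS = [
    ("2020-01-10T23:15:00", 59.91301, 10.75202),
    ("2020-01-11T02:40:00", 59.91304, 10.75198),
    ("2020-01-11T05:05:00", 59.91298, 10.75204),
    ("2020-01-11T09:30:00", 59.95011, 10.80120),
    ("2020-01-11T13:00:00", 59.95013, 10.80118),
    ("2020-01-11T22:45:00", 59.91302, 10.75201),
]

home = likely_home_cell(DEVICE_POINTS)
print(home)
```

The output is only a coordinate cell. Tying it to a name takes the address lookup and web searches the journalists describe, which is why stripping direct identifiers offers so little protection.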
The journalists went on to put together a detailed record of that man’s movements over several months. It turns out they knew more about his trip to the zoo, for example, than he recalled himself. When they revealed their findings to their subject, he was shocked and immediately began deleting non-essential apps from his phone. Read the article; you may find yourself doing the same.
Cynthia Murrell, May 12, 2020
Google Apple Contact Tracing Interface
May 9, 2020
Now Toronto published “Here’s What Apple and Google’s COVID-19 Contact Tracing App Looks Like.” The article includes sample screenshots and some explanation about the data displayed. Worth a look. Much more is possible in terms of tracking, contact mapping, and analytics, of course. Who or what will have access to these more useful views of the collected data?
Stephen E Arnold, May 9, 2020
Sigma Gets $30 Million In Key Funding
April 30, 2020
Once the economic ramifications of the COVID-19 pandemic are clear and you are adjusting your investment portfolio, data analytics company stocks should not lose any value. Why? Data analytics platforms are in high demand, and Sigma Computing recently nabbed new money: “Sigma Computing Raises $30 Million More For Cloud Data Analytics Tools,” says Venture Beat.
Sigma Computing held a Series B funding round and added another $30 million to its funding. Investors in the second funding round include Sutter Hill Ventures and Altimeter Capital. Sigma Computing CEO Rob Woollen said the money would be used for product development and product support.
Woollen stated that data is useless unless it is comprehensible and capable of delivering actionable BI insights. Sigma makes data usable while keeping in mind the importance of governance, security issues, and compliance. Sigma uses a spreadsheet-like UI that transforms data from any source into useful insights, and the search tool is powerful:
“Searches can be performed by natural language and by filter, the results of which can be compiled in an embeddable report and delivered via email. Where collaboration is concerned, Sigma’s link feature enables users to map data relationships and add linked data to documents. The platform’s workspaces are conducive to sharing — they can be circulated among teams, departments, or entire organizations — and spotlight important data blocks, worksheets, and interfaces with visual badges and a range of visualizations.”
Sigma Computing counts Zumper, Navis, LendUp, Clover, Volta, and Olivela among its clients. The company sells software for data visualization and big data/business analytics; the two markets combined are worth over $11 million. It sounds like a good investment.
Whitney Grace, April 30, 2020
GeoSpark Analytics: Real Time Analytics
April 6, 2020
In late 2017, OGSystems chopped out some of the firm’s analytics capabilities. The new company was Geospark Analytics. The service provided enabled customers like the US Department of Defense and FEMA to obtain information about important new events. “Events” is jargon for an alert plus data about something that is important.
“FEMA Contractor Tracing Coronavirus Deaths Uses Web Scraping, Social Media Monitoring” explains one use of the system. The write up says:
Geospark Analytics combines machine learning and big data to analyze events in real-time and warn of potential disruptions to the businesses of high-dollar private and public clientele…
Like Bluedot in Canada, Geospark was one of the monitoring companies analyzing open source and some specialized data to find interesting events. The write up continues:
Geospark Analytics’ product, called Hyperion, the namesake of the Titan son of Uranus (meaning, “watcher from above”), fingered Wuhan as a “hotspot,” in the company’s parlance, within hours after news of the virus first broke. “Hotspots tracks normal patterns of activity across the globe and provides a visual cue to flag disruptive events that could impact your employees, operations, and investments and result in billions of dollars in economic losses,” the company’s website says.
Engadget points out that there are a couple of companies with the name “Geospark.” DarkCyber finds this interesting. This statement provides more color about the Geospark approach:
Geospark Analytics claims to have processed “6.8 million” sources of information; everything from tweets to economic reports. “We geo-position it, we use natural language processing, and we have deep learning models that categorize the data into event and health models,” Goolgasian [Geospark’s CEO] said. It’s through these many millions of data points that the company creates what it calls a “baseline level of activity” for specific regions, such as Wuhan. A spike of activity around any number of security-, military-, or health-related topics and the system flags it as a potential disruption.
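The “baseline level of activity” idea can be illustrated with a simple trailing-window check. This is not Geospark’s method, which the article does not disclose in detail; it is a minimal sketch that assumes activity has already been reduced to one count per day for a region. The `flag_spikes` helper, the seven-day window, the threshold of three standard deviations, and the counts are all assumptions made for the example.

```python
from statistics import mean, stdev

def flag_spikes(daily_counts, baseline_days=7, threshold=3.0):
    """Return indexes of days whose count sits more than `threshold`
    standard deviations above the mean of the preceding `baseline_days`."""
    flagged = []
    for i in range(baseline_days, len(daily_counts)):
        window = daily_counts[i - baseline_days:i]
        base, spread = mean(window), stdev(window)
        if spread == 0:
            spread = 1.0  # avoid dividing by zero on a perfectly flat baseline
        if (daily_counts[i] - base) / spread > threshold:
            flagged.append(i)
    return flagged

# Hypothetical daily counts of items mentioning one region.
COUNTS = [10, 12, 11, 9, 10, 11, 12, 10, 11, 48, 12]

spikes = flag_spikes(COUNTS)
print(spikes)
```

A production system would presumably track many topics per region and account for weekly cycles and reporting lags; the sketch shows only the core comparison of today against recent history.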
How does Geospark avoid the social media noise, bias, and disinformation that finds its way into open source content? The article states:
“We rely more on traditional data sources and we don’t do anything that isn’t publicly available,” Goolgasian said, echoing a common refrain among data firms that fuel surveillance products by mining the internet itself.
Providing specialized services to government agencies is not much of a surprise in DarkCyber’s opinion. Financial firms can also be avid consumers of real-time data. The idea is to get the jump on the competition which probably has its own source of digital insights.
Other observations:
- The apparent “surprise” threading through the Engadget article is a bit off-putting. DarkCyber is aware of a number of social media and specialized content monitoring services. In fact, there is a surplus of these operations and not all will survive in the present business climate.
- Detecting and alerting are helpful but the messengers failed to achieve impact. How does DarkCyber know? Well, there is the lockdown.
- Publicizing what companies like Geospark and others do to generate income can have interesting consequences.
Net net: Some types of specialized services are difficult to explain in a way that reduces blowback. Some of the blowback has significant impact on social media analytics companies. The Geofeedia case is a reminder. I know. I know. “What’s a Geofeedia?” some may ask.
Good question and DarkCyber thinks few know the answer. Plucking insights from information many people believe to be privileged can be fraught with business shock waves.
Stephen E Arnold, April 6, 2020
Forget Weak Priors, Certain Predictive Methods Just Fail
April 2, 2020
Nope. No equations. No stats speak. Tested predictive models were incorrect.
Navigate to “Researchers Find AI Is Bad at Predicting GPA, Grit, Eviction, Job Training, Layoffs, and Material Hardship.” Here’s the finding, which is delightfully clear:
A paper coauthored by over 112 researchers across 160 data and social science teams found that AI and statistical models, when used to predict six life outcomes for children, parents, and households, weren’t very accurate even when trained on 13,000 data points from over 4,000 families.
So what? The write up states in the form of a quote from the author of the paywalled paper:
“Here’s a setting where we have hundreds of participants and a rich data set, and even the best AI results are still not accurate,” said study co-lead author Matt Salganik, a professor of sociology at Princeton and interim director of the Center for Information Technology Policy at the Woodrow Wilson School of Public and International Affairs. “These results show us that machine learning isn’t magic; there are clearly other factors at play when it comes to predicting the life course.”
We noted this comment from a researcher at Princeton University:
In the end, even the best of the over 3,000 models submitted — which often used complex AI methods and had access to thousands of predictor variables — weren’t spot on. In fact, they were only marginally better than linear regression and logistic regression, which don’t rely on any form of machine learning.
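One way to make a comparison like “only marginally better than linear regression” concrete is to score every model against the same trivial baseline on held-out data. The sketch below assumes a holdout R² in which 0 means “no better than always predicting the training mean” and 1 means perfect; whether the paper scored submissions exactly this way is not stated in the write up, and the `holdout_r2` helper and all numbers are invented for illustration.

```python
def holdout_r2(y_true, y_pred, train_mean):
    """Holdout R^2: 1.0 is perfect, 0.0 matches always guessing the training mean."""
    ss_model = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_baseline = sum((y - train_mean) ** 2 for y in y_true)
    return 1 - ss_model / ss_baseline

# Hypothetical held-out outcomes and two sets of predictions.
Y_TRUE = [2.0, 3.0, 4.0, 3.0]
TRAIN_MEAN = 3.0
COMPLEX_MODEL = [2.5, 3.0, 3.5, 3.0]   # stand-in for an elaborate model
SIMPLE_MODEL = [2.6, 3.0, 3.4, 3.0]    # stand-in for a plain regression

r2_complex = holdout_r2(Y_TRUE, COMPLEX_MODEL, TRAIN_MEAN)
r2_simple = holdout_r2(Y_TRUE, SIMPLE_MODEL, TRAIN_MEAN)
print(round(r2_complex, 2), round(r2_simple, 2))
```

The useful number is the gap between the two scores. If the elaborate model beats the plain one by only a sliver, the extra complexity has bought very little.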
Several observations:
- Nice work AAAS. Keep advancing science with a paywall germane to criminal justice and policeware.
- Over inflation of the “value” of outputs from models is common in marketing. DarkCyber thinks that the weaknesses of these methods need more than a few interviews with people like Cathy O’Neil, author of Weapons of Math Destruction.
- Are those afflicted with innumeracy willing to delegate certain important actions to procedures which are worse than relying on luck, flipping a coin, or Monte Carlo methods?
Net net: No one made accurate predictions. Yep, no one. Thought stimulating research with implications for predictive analytics adherents. This open source paper provides some of the information referenced in the AAAS paper: Measuring the Predictability of Life Outcomes with a Scientific Mass Collaboration
Stephen E Arnold, April 2, 2020
Wolfram Mathematica
March 19, 2020
DarkCyber noted that “In Less Than a Year, So Much New: Launching Version 12.1 of Wolfram Language & Mathematica” contains highly suggestive information. Yes, this is a mathy program. The innovations are significant for analysts and some government professionals. To cite one example:
I’ve been recording hundreds of hours of video in connection with a new project I’m working on. So I decided to try our new capabilities on it. It’s spectacular! I could take a 4-hour video, and immediately extract a bunch of sample frames from it, and then—yes, in a few hours of CPU time—“summarize the whole video”, using SpeechRecognize to do speech-to-text on everything that was said and then generating a word cloud…
DarkCyber reacts positively to other additions and enhancements to the Mathematica “system.” Version 12.1 will make it easier to develop specific functions for policeware and intelware use cases.
Remarkable because the “system” can geo-everything. That’s important in many situations.
Stephen E Arnold, March 19, 2020
Israel and Mobile Phone Data: Some Hypotheticals
March 19, 2020
DarkCyber spotted a story in the New York Times: “Israel Looks to Repurpose a Trove of Cell Phone Data.” The story appeared in the dead tree edition on March 17, 2020, and you can access the online version of the write up at this link.
The write up reports:
Prime Minister Benjamin Netanyahu of Israel authorized the country’s internal security agency to tap into a vast, previously undisclosed trove of cell phone data to retrace the movements of people who have contracted the corona virus and identify others who should be quarantined because their paths crossed.
Okay, cell phone data. Track people. Paths crossed. So what?
Apparently not much.
The Gray Lady does the handwaving about privacy and the fragility of democracy in Israel. There’s a quote about the need for oversight when certain specialized data are retained and then made available for analysis. Standard journalism stuff.
DarkCyber’s team talked about the write up and what the real journalists left out of the story. Remember. DarkCyber operates from a hollow in rural Kentucky and knows zero about Israel’s data collection realities. Nevertheless, my team was able to identify some interesting use cases.
Let’s look at a couple and conclude with a handful of observations.
First, the idea of retaining cell phone data is not exactly a new one. What if these data can be extracted using an identifier for a person of interest? What if a time-series query could extract the geolocation data for each movement of the person of interest captured by a cell tower? What if this path could be displayed on a map? Here’s a dummy example of what the plot for a single person of interest might look like. Please note that these graphics are examples selected from open sources. Examples are not related to a single investigation or vendor. These are for illustrative purposes only.
Source: Standard mobile phone tracking within a geofence. Map with blue lines showing a person’s path. SPIE at https://bit.ly/2TXPBby
Useful indeed.
Second, what if the intersection of two or more individuals can be plotted? Here’s a simulation of such a path intersection:
Source: Map showing the location of a person’s mobile phone over a period of time. Tyler Bell at https://bit.ly/2IVqf7y
Would these data provide a way to identify an individual with a mobile phone who was in “contact” with a person of interest? Would the authorities be able to perform additional analyses to determine who is in either party’s social network?
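The “paths crossed” question can be sketched as a search for devices seen near a person of interest at about the same time. This is a hypothetical illustration, not a description of any agency’s or vendor’s system. It assumes records of the form (device ID, timestamp, latitude, longitude); the `find_contacts` helper, the 50 meter and 10 minute limits, and the sample records are all made up. Positions derived from cell towers would be far coarser than this example implies.

```python
from datetime import datetime, timedelta
from math import asin, cos, radians, sin, sqrt

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * 6371000 * asin(sqrt(a))

def find_contacts(records, person, max_meters=50, max_gap=timedelta(minutes=10)):
    """Return IDs of devices seen within max_meters of `person` within max_gap.

    records: list of (device_id, iso_timestamp, latitude, longitude).
    """
    parsed = [(d, datetime.fromisoformat(t), la, lo) for d, t, la, lo in records]
    target = [r for r in parsed if r[0] == person]
    contacts = set()
    for dev, when, lat, lon in parsed:
        if dev == person:
            continue
        for _, t_when, t_lat, t_lon in target:
            close_in_time = abs(when - t_when) <= max_gap
            if close_in_time and haversine_m(lat, lon, t_lat, t_lon) <= max_meters:
                contacts.add(dev)
                break  # one qualifying sighting is enough for this device
    return contacts

# Made-up sightings; "A" is the hypothetical person of interest.
RECORDS = [
    ("A", "2020-03-17T09:00:00", 32.0853, 34.7818),
    ("A", "2020-03-17T12:00:00", 32.0900, 34.7900),
    ("B", "2020-03-17T09:05:00", 32.0854, 34.7819),  # near A in space and time
    ("C", "2020-03-17T15:00:00", 32.0853, 34.7818),  # same place, hours later
    ("D", "2020-03-17T09:02:00", 32.1500, 34.8500),  # same time, far away
]

contacts = find_contacts(RECORDS, "A")
print(contacts)
```

The nested loop is fine for a handful of records. A real trove would need time and location indexes, but the logic of the query stays the same.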
Third, could these relationship data be mined so that connections can be further explored?
Source: Diagram of people who have crossed paths visualized via Analyst Notebook functions. Globalconservation.org
Can these data be arrayed on a timeline? Can the routes be converted into an animation that shows a particular person of interest’s movements at a specific window of time?
Source: Vertical dots diagram from Recorded Future showing events on a timeline. https://bit.ly/39Xhbex
These hypothetical displays of data derived from cross correlations, geotagging, and timeline generation based on date stamps seem feasible. If earnest individuals in rural Kentucky can see the value of these “secret” data disclosed in the New York Times’ article, why didn’t the journalist and the others who presumably read the story see it too?
What’s interesting is that systems, methods, and tools clearly disclosed in open source information are overlooked, ignored, or just not understood.
Now the big question: Do other countries have these “secret” troves of data?
DarkCyber does not know; however, it seems possible. Log files are a useful function of data processes. Data exhaust may have value.
Stephen E Arnold, March 19, 2020
First Counting Bees, Now Predicting Parrots
March 5, 2020
DarkCyber found the write up “Parrots Can Make Predictions Based on Probabilities” amusing. With the corona virus data widely available, will these poly-nomial avians lend their expertise to global health administrators?
The write up asserts:
They [scientists] discovered the kea, a species of large parrot found in New Zealand, can make inferences and predict events based on previous knowledge or experience. They [yep, this is a reference to the parrots] even performed better than chimps in some experiments.
The write up states:
The team said it is the first time this complex cognitive ability has been demonstrated in an animal outside of the great apes, which could help shed light on the “evolutionary history of statistical inference”.
Now is the time to apply parrot intelligence to tough computing problems like the Corona virus research. Polly, do you want a protein predictive output?
Stephen E Arnold, March 5, 2020