GeoSpark Analytics: Real Time Analytics

April 6, 2020

In late 2017, OGSystems chopped out some of the firm’s analytics capabilities. The new company was Geospark Analytics. The service enabled customers like the US Department of Defense and FEMA to obtain information about important new events. “Events” is jargon for an alert plus data about something that is important.

“FEMA Contractor Tracing Coronavirus Deaths Uses Web Scraping, Social Media Monitoring” explains one use of the system. The write up says:

Geospark Analytics combines machine learning and big data to analyze events in real-time and warn of potential disruptions to the businesses of high-dollar private and public clientele…

Like Bluedot in Canada, Geospark was one of the monitoring companies analyzing open source and some specialized data to find interesting events. The write up continues:

Geospark Analytics’ product, called Hyperion, the namesake of the Titan son of Uranus (meaning, “watcher from above”), fingered Wuhan as a “hotspot,” in the company’s parlance, within hours after news of the virus first broke. “Hotspots tracks normal patterns of activity across the globe and provides a visual cue to flag disruptive events that could impact your employees, operations, and investments and result in billions of dollars in economic losses,” the company’s website says.

Engadget points out that there are a couple of companies with the name “Geospark.” DarkCyber finds this interesting. This statement provides more color about the Geospark approach:

Geospark Analytics claims to have processed “6.8 million” sources of information; everything from tweets to economic reports. “We geo-position it, we use natural language processing, and we have deep learning models that categorize the data into event and health models,” Goolgasian [Geospark’s CEO] said. It’s through these many millions of data points that the company creates what it calls a “baseline level of activity” for specific regions, such as Wuhan. A spike of activity around any number of security-, military-, or health-related topics and the system flags it as a potential disruption.
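The “baseline level of activity” idea in the quote can be illustrated with a toy example. What follows is a minimal sketch, not Geospark’s method: it assumes a trailing seven-day average as the baseline and a z-score cutoff as the spike test. The function name, the window, the threshold, and the counts are all hypothetical.

```python
from statistics import mean, stdev


def flag_spikes(daily_counts, window=7, threshold=3.0):
    """Return the indexes of days that spike above a trailing baseline.

    The baseline for a day is the mean of the previous `window` days.
    A day is flagged when its count sits more than `threshold` sample
    standard deviations above that mean.
    """
    flagged = []
    for i in range(window, len(daily_counts)):
        history = daily_counts[i - window:i]
        baseline = mean(history)
        spread = stdev(history)
        # A perfectly flat history has zero spread; avoid dividing by zero.
        if spread == 0:
            spread = 1.0
        if (daily_counts[i] - baseline) / spread > threshold:
            flagged.append(i)
    return flagged


# Hypothetical daily counts of health-related mentions for one region.
counts = [12, 15, 11, 14, 13, 12, 16, 14, 13, 95]
print(flag_spikes(counts))
```

A production system would presumably keep separate baselines per region and per topic; this sketch handles one series only.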

How does Geospark avoid the social media noise, bias, and disinformation that finds its way into open source content? The article states:

“We rely more on traditional data sources and we don’t do anything that isn’t publicly available,” Goolgasian said, echoing a common refrain among data firms that fuel surveillance products by mining the internet itself.

Providing specialized services to government agencies is not much of a surprise in DarkCyber’s opinion. Financial firms can also be avid consumers of real-time data. The idea is to get the jump on the competition which probably has its own source of digital insights.

Other observations:

  • The apparent “surprise” threading through the Engadget article is a bit off putting. DarkCyber is aware of a number of social media and specialized content monitoring services. In fact, there is a surplus of these operations and not all will survive in the present business climate.
  • Detecting and alerting are helpful but the messengers failed to achieve impact. How does DarkCyber know? Well, there is the lockdown.
  • Publicizing what companies like Geospark and others do to generate income can have interesting consequences.

Net net: Some types of specialized services are difficult to explain in a way that reduces blowback. Some of the blowback has a significant impact on social media analytics companies. The Geofeedia case is a reminder. I know. I know. “What’s a Geofeedia?” some may ask.

Good question and DarkCyber thinks few know the answer. Plucking insights from information many people believe to be privileged can be fraught with business shock waves.

Stephen E Arnold, April 6, 2020

Forget Weak Priors, Certain Predictive Methods Just Fail

April 2, 2020

Nope. No equations. No stats speak. Tested predictive models were incorrect.

Navigate to “Researchers Find AI Is Bad at Predicting GPA, Grit, Eviction, Job Training, Layoffs, and Material Hardship.” Here’s the finding, which is delightfully clear:

A paper coauthored by over 112 researchers across 160 data and social science teams found that AI and statistical models, when used to predict six life outcomes for children, parents, and households, weren’t very accurate even when trained on 13,000 data points from over 4,000 families.

So what? The write up states in the form of a quote from the author of the paywalled paper:

“Here’s a setting where we have hundreds of participants and a rich data set, and even the best AI results are still not accurate,” said study co-lead author Matt Salganik, a professor of sociology at Princeton and interim director of the Center for Information Technology Policy at the Woodrow Wilson School of Public and International Affairs. “These results show us that machine learning isn’t magic; there are clearly other factors at play when it comes to predicting the life course.”

We noted this comment from a researcher at Princeton University:

In the end, even the best of the over 3,000 models submitted — which often used complex AI methods and had access to thousands of predictor variables — weren’t spot on. In fact, they were only marginally better than linear regression and logistic regression, which don’t rely on any form of machine learning.

Several observations:

  1. Nice work AAAS. Keep advancing science with a paywall germane to criminal justice and policeware.
  2. Over inflation of the “value” of outputs from models is common in marketing. DarkCyber thinks that the weaknesses of these methods need more than a few interviews with people like Cathy O’Neil, author of Weapons of Math Destruction.
  3. Are those afflicted with innumeracy willing to delegate certain important actions to procedures which are worse than relying on luck, flipping a coin, or Monte Carlo methods?

Net net: No one made accurate predictions. Yep, no one. Thought stimulating research with implications for predictive analytics adherents. This open source paper provides some of the information referenced in the AAAS paper: “Measuring the Predictability of Life Outcomes with a Scientific Mass Collaboration.”

Stephen E Arnold, April 2, 2020

Wolfram Mathematica

March 19, 2020

DarkCyber noted that “In Less Than a Year, So Much New: Launching Version 12.1 of Wolfram Language & Mathematica” contains highly suggestive information. Yes, this is a mathy program. The innovations are significant for analysts and some government professionals. To cite one example:

I’ve been recording hundreds of hours of video in connection with a new project I’m working on. So I decided to try our new capabilities on it. It’s spectacular! I could take a 4-hour video, and immediately extract a bunch of sample frames from it, and then—yes, in a few hours of CPU time—“summarize the whole video”, using SpeechRecognize to do speech-to-text on everything that was said and then generating a word cloud…

DarkCyber reacts positively to other additions and enhancements to the Mathematica “system.” Version 12.1 will make it easier to develop specific functions for policeware and intelware use cases.

Remarkable because the “system” can geo-everything. That’s important in many situations.

Stephen E Arnold, March 19, 2020

Israel and Mobile Phone Data: Some Hypotheticals

March 19, 2020

DarkCyber spotted a story in the New York Times: “Israel Looks to Repurpose a Trove of Cell Phone Data.” The story appeared in the dead tree edition on March 17, 2020, and you can access the online version of the write up at this link.

The write up reports:

Prime Minister Benjamin Netanyahu of Israel authorized the country’s internal security agency to tap into a vast, previously undisclosed trove of cell phone data to retrace the movements of people who have contracted the corona virus and identify others who should be quarantined because their paths crossed.

Okay, cell phone data. Track people. Paths crossed. So what?

Apparently not much.

The Gray Lady does the handwaving about privacy and the fragility of democracy in Israel. There’s a quote about the need for oversight when certain specialized data are retained and then made available for analysis. Standard journalism stuff.

DarkCyber’s team talked about the write up and what the real journalists left out of the story. Remember. DarkCyber operates from a hollow in rural Kentucky and knows zero about Israel’s data collection realities. Nevertheless, my team was able to identify some interesting use cases.

Let’s look at a couple and conclude with a handful of observations.

First, the idea of retaining cell phone data is not exactly a new one. What if these data can be extracted using an identifier for a person of interest? What if a time-series query could extract the geolocation data for each movement of the person of interest captured by a cell tower? What if this path could be displayed on a map? Here’s a dummy example of what the plot for a single person of interest might look like. Please note that these graphics are examples selected from open sources. Examples are not related to a single investigation or vendor. These are for illustrative purposes only.

image

Source: Standard mobile phone tracking within a geofence. Map with blue lines showing a person’s path. SPIE at https://bit.ly/2TXPBby

Useful indeed.
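The time-series query in the first hypothetical can be sketched in a few lines. This is an illustration only: the record layout, the field names, the `path_for` helper, the identifiers, and the coordinates are invented for the example and say nothing about how any real system stores cell tower data.

```python
from datetime import datetime
from typing import NamedTuple


class TowerPing(NamedTuple):
    """One hypothetical record: a handset seen by a tower at a moment in time."""
    subscriber_id: str
    timestamp: datetime
    lat: float
    lon: float


def path_for(pings, subscriber_id, start, end):
    """Return one subscriber's pings between start and end, oldest first."""
    selected = [
        p for p in pings
        if p.subscriber_id == subscriber_id and start <= p.timestamp <= end
    ]
    return sorted(selected, key=lambda p: p.timestamp)


# Made-up records; identifiers and coordinates are placeholders.
pings = [
    TowerPing("poi-1", datetime(2020, 3, 17, 9, 0), 32.0850, 34.7900),
    TowerPing("poi-1", datetime(2020, 3, 17, 8, 0), 32.0800, 34.7800),
    TowerPing("other-2", datetime(2020, 3, 17, 8, 30), 32.0900, 34.8000),
    TowerPing("poi-1", datetime(2020, 3, 18, 8, 0), 32.1000, 34.8100),
]

path = path_for(
    pings, "poi-1",
    datetime(2020, 3, 17, 0, 0), datetime(2020, 3, 17, 23, 59),
)
for ping in path:
    print(ping.timestamp.isoformat(), ping.lat, ping.lon)
```

The ordered list of latitude and longitude pairs is what a mapping tool would need to draw the blue line in a plot like the one above.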

Second, what if the intersection of the paths of two or more individuals can be plotted? Here’s a simulation of such a path intersection:

image

Source: Map showing the location of a person’s mobile phone over a period of time. Tyler Bell at https://bit.ly/2IVqf7y

Would these data provide a way to identify an individual with a mobile phone who was in “contact” with a person of interest? Would the authorities be able to perform additional analyses to determine who is in either party’s social network?
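One way to think about “contact” is two handsets near the same place at nearly the same time. The sketch below assumes each path is a list of (timestamp, latitude, longitude) tuples and uses the haversine formula for distance. The `crossings` helper, the 100 meter and 15 minute cutoffs, and the sample points are hypothetical; real tower-level locations are likely much coarser than the coordinates used here.

```python
from datetime import datetime, timedelta
from math import asin, cos, radians, sin, sqrt


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    earth_radius_m = 6371000.0
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = (sin(dlat / 2) ** 2
         + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2)
    return 2 * earth_radius_m * asin(sqrt(a))


def crossings(path_a, path_b, max_meters=100.0, max_gap=timedelta(minutes=15)):
    """Return pairs of timestamps where two paths were close in space and time."""
    hits = []
    for time_a, lat_a, lon_a in path_a:
        for time_b, lat_b, lon_b in path_b:
            close_in_time = abs(time_a - time_b) <= max_gap
            close_in_space = haversine_m(lat_a, lon_a, lat_b, lon_b) <= max_meters
            if close_in_time and close_in_space:
                hits.append((time_a, time_b))
    return hits


# Made-up paths for two handsets on the same day.
path_a = [
    (datetime(2020, 3, 17, 10, 0), 32.0800, 34.7800),
    (datetime(2020, 3, 17, 10, 30), 32.0850, 34.7900),
]
path_b = [
    (datetime(2020, 3, 17, 10, 5), 32.0801, 34.7801),   # near A's first point
    (datetime(2020, 3, 17, 12, 0), 32.0850, 34.7900),   # same spot as A, too late
]

for time_a, time_b in crossings(path_a, path_b):
    print("possible contact:", time_a.isoformat(), time_b.isoformat())
```

The pairwise loop is fine for two short paths. Comparing every handset against every other handset in a national data set would call for spatial indexing, which is outside the scope of this sketch.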

Third, could these relationship data be mined so that connections can be further explored?

image

Source: Diagram of people who have crossed paths visualized via Analyst Notebook functions. Globalconservation.org

Can these data be arrayed on a timeline? Can the routes be converted into an animation that shows a particular person of interest’s movements at a specific window of time?

image

Source: Vertical dots diagram from Recorded Future showing events on a timeline. https://bit.ly/39Xhbex

These hypothetical displays of data derived from cross correlations, geotagging, and timeline generation based on date stamps seem feasible. If earnest individuals in rural Kentucky can see the value of these “secret” data disclosed in the New York Times’ article, why didn’t the journalist and the others who presumably read the story?

What’s interesting is that systems, methods, and tools clearly disclosed in open source information are overlooked, ignored, or just not understood.

Now the big question: Do other countries have these “secret” troves of data?

DarkCyber does not know; however, it seems possible. Log files are a useful function of data processes. Data exhaust may have value.

Stephen E Arnold, March 19, 2020

First Counting Bees, Now Predicting Parrots

March 5, 2020

DarkCyber found the write up “Parrots Can Make Predictions Based on Probabilities” amusing. With the corona virus data widely available, will these poly-nomial avians lend their expertise to global health administrators?

The write up asserts:

They [scientists] discovered the kea, a species of large parrot found in New Zealand, can make inferences and predict events based on previous knowledge or experience. They [yep, this is a reference to the parrots] even performed better than chimps in some experiments.

The write up states:

The team said it is the first time this complex cognitive ability has been demonstrated in an animal outside of the great apes, which could help shed light on the “evolutionary history of statistical inference”.

Now is the time to apply parrot intelligence to tough computing problems like the Corona virus research. Polly, do you want a protein predictive output?

Stephen E Arnold, March 5, 2020

Amazon: Buying More Innovation

February 26, 2020

DarkCyber noted the article “Amazon Acquires Turkish Startup Datarow.” The word “startup” is rather loosely applied. Datarow was founded in 2016. Not a spring chicken in DarkCyber’s view is a four year old outfit.

What’s interesting about this acquisition is that it provides the sometimes unartful Amazon with an outfit that specializes in making easier-to-use data tools. The firm appears to have been built around AWS Redshift.

image

The company’s quite wonky Web site says:

We’re proud to have created an innovative tool that facilitates data exploration and visualization for data analysts in Amazon Redshift, providing users with an easy to use interface to create tables, load data, author queries, perform visual analysis, and collaborate with others to share SQL code, analysis, and results. Together with AWS, we look forward to taking our tool to the next level for customers.

The company provides what it calls “data governance,” a term which DarkCyber takes to mean “get your act together” with regard to information. This is easier said than done, but it is a hot button among companies struggling to reduce costs, comply with assorted rules and regulations, and figure out what’s actually happening in their lines of business. Profit and loss statements are not up to the job of dealing with diverse content, audio, video, real time data, and tweets. Well, neither is Amazon, but that’s not germane.

Will Amazon AWS Redshift (love the naming, don’t you?) become easier to use? Perhaps Datarow will become responsible for the AWS Web site?

Stephen E Arnold, February 26, 2020

Facial Recognition: Those Error Rates? An Issue, Of Course

February 21, 2020

DarkCyber read “Machines Are Struggling to Recognize People in China.” The write up asserts:

The country’s ubiquitous facial recognition technology has been stymied by face masks.

One of the unexpected consequences of the Covid 19 virus is that citizens with face masks cannot be recognized.

“Unexpected” when adversarial fashion has been getting some traction among those who wish to move anonymously.

The write up adds:

Recently, Chinese authorities in some provinces have made medical face masks mandatory in public and the use and popularity of these is going up across the country. However, interestingly, as millions of masks are now worn by Chinese people, there has been an unintended consequence. Not only have the country’s near ubiquitous facial-recognition surveillance cameras been stymied, life is reported to have become difficult for ordinary citizens who use their faces for everyday things such as accessing their homes and bank accounts.

Now an “admission” by a US company:

Companies such as Apple have confirmed that the facial recognition software on their phones need a view of the person’s full face, including the nose, lips and jaw line, for them to work accurately. That said, a race for the next generation of facial-recognition technology is on, with algorithms that can go beyond masks. Time will tell whether they work. I bet they will.

To sum up: Masks defeat facial recognition. The future is a method of identification that can work with what is not covered plus any other data available to the system; for example, pattern of walking and geo-location.

For now, though, the remedy for the use of masks is lousy facial recognition and more effort to find innovations.

The author of the write up is a — wait for it — venture capital professional. And what country leads the world in facial recognition? China, according to the VC professional.

The future is better person recognition of which the face is one factor.

Stephen E Arnold, February 21, 2020

Map Economics: Useful Content and One Major Omission

February 13, 2020

DarkCyber spotted a paper called “The Economics of Maps.” The authors have presented some extremely useful and interesting information about depicting the real world.

One of the most useful aspects of the article is the list of companies providing different types of mapping services and data. The list of firms in this business includes such providers, vendors, and technology companies as:

Airbus

Farmers Edge

Mapbox

Pitney Bowes

There are some significant omissions; for example, the category of geo-analytics for law enforcement and intelligence applications, which includes the low profile Geogence and investigative tools like those available from Verint.

Worth reading and tucking into one’s intelligence folder in our opinion.

Stephen E Arnold, February 13, 2020

Easy Facial Recognition

February 11, 2020

DarkCyber spotted a Twitter thread. You can view it here (verified on February 8, 2020). The main point is that using open source software, an individual was able to obtain (scrape; that is, copy) images from publicly accessible services. Then the images were “processed.” The idea was to identify a person from an image. Net net: People can object to facial recognition, but once a technology migrates from “little known” to publicly available, there may be difficulty putting the tech cat back in the black bag.

Stephen E Arnold, February 11, 2020

Math Resources

January 27, 2020

One of the DarkCyber team spotted a list of math resources available. Some cost money; others are free. Math Vault lists courses, platforms, tools, and question-answering sites. Some are relatively mainstream like Wolfram Alpha; others, less well publicized like ProofWiki. You can find the listing at this link.

Kenny Toth, January 26, 2020
