After 20 Plus Years, Whoa! Surveillance by Big Tech
August 10, 2020
DarkCyber has noted a flurry of write ups expressing surprise, rage, indignation, and blusterification at the idea of a commercial company collecting data. Hello, services are free for a basic reason: Making money. Part of making money is to have something that other companies and organizations will purchase. A good example is personal information about users of free services. The way big companies work is that there is a constant pressure to find new ways to generate money. Thus, there are data sucking apps; there are advertisements and more advertisements; there are subscriptions which lock in revenue while providing an Amazon-style “we know a lot about those who shop on Amazon” view of customers; and there are many ornaments on these methods.
I got a kick out of “Silicon Valley’s Vast Data Collection Should Worry You More Than TikTok.” We know the story well. Commercial firms in the US gather data and license it, often to marketing firms and to other organizations. After two decades of blissful ignorance a devoted band of “real” journalists are now probing the core business model of many technology centric companies.
Give me a break. We are talking decades of business processes designed to generate useful reports from flows of actions by individuals. In some countries, the government performs this task. In others, commercial enterprises do the work and license the normalized data to governments.
This passage from the write up tickled my funny bone:
And none of this is unreasonable. We should be worried about private companies and governments potentially collecting data on millions of unsuspecting people and censoring content they don’t like. But those based in China represent just a sliver of that threat.
Yep, the old “woulda, coulda, shoulda” ploy. May I remind you, gentle reader, that we are decades into the automation of data about the actions of individuals. These are the happy and often ignorant humanoids who download apps, run queries, click on videos, and send personal messages while leaving a data trail a foot deep and a mile wide.
And now the need for something?
And data collection is not a technical and economic issue. Nope. Data collection is politics; for example:
TikTok’s critics might point to the increasingly scary behavior of China’s government as to why Chinese control of information is particularly alarming. They’re right about the behavior, but they curiously ignore the fact that the United States itself is currently governed by a far-right demagogue with his own concentration camps and authoritarian repression, and that the party behind him, which aligns entirely with his politics, reliably cycles into power at least once every eight years.
What’s the fix? Well, “oppose it all.”
Where were the regulators, the users, and the competitors 20 years ago? Probably in grade school, blissfully unaware that those handheld gadgets would become more important than other activities. Okay, adult thumbtypers, your outrage is interesting. Step back, and perhaps you can see why there are howls of outrage, references to evil forms of government, and talk of the horrors of toting around a device that usually provides real time documentation of one’s actions.
But after 20 years, is it surprising that personal data actions are captured, analyzed, and used to provide more data “stuff” to consume? As I said, it’s been 20 years with no lessening of the processes. Complain to your parents. Maybe they dropped the ball? Commercial enterprises and governments are like beavers. And beavers do what beavers do.
Stephen E Arnold, August 10, 2020
Stratifyd: Marketing Push
August 6, 2020
Stratifyd or Taste Analytics competes in the analytics sector. The company has raised about $55 million since it opened for business in 2015. I read “Stratifyd Launches Next Generation Data Analytics Platform.” The write up confused me. The company’s Web site is clear: “Blazing fast data insights that reveal your hidden story.”
The article about Stratifyd says:
Stratifyd, a technology company that democratizes data science and artificial intelligence (AI) through self-service data analytics, today announced a revolution in data analytics with the launch of its next generation platform. This powerful analytics engine was re-designed from the ground up to be intuitive and easy-to-use, enabling business users – regardless of education, skill, or job function – to harness the power of proprietary and third-party data to easily reveal and understand hidden stories represented within the data, thus delivering the benefits of a data science team to every organization.
The write up reports this:
The Stratifyd platform now provides the functionality to meet the demanding data science needs of an organization, but is specifically designed to be easy to use for those with limited data analytics experience. It empowers users of all skill levels to connect data sources to the platform, perform in depth analysis and data modeling, and discover insightful stories faster and more easily than previously possible. Through a graphical user interface, pre-built and customizable data analytics models, and simplified dashboards, the platform enables business users to extract insights (i.e., stories) that are hidden in the data and essential in helping companies improve customer service, better understand customer requirements, deliver product enhancements that address gaps in the market, solve problems experienced by customers, rollout new product and service offerings that deliver a competitive advantage, and more.
DarkCyber’s view is that one click access to data can lead to interesting decisions even for a company with “data science in its DNA.” We also noted that, like Amazon, Stratifyd has a “flywheel.” Instead of a business model which generates new businesses by selling online products, Stratifyd’s approach is providing a “data storytelling flywheel.”
Yep, stories and lots of buzzwords.
Stephen E Arnold, August 6, 2020
Quantexa: A Better Way to Nail a Money Launderer?
July 29, 2020
We noted the TechCrunch article “Quantexa Raises $64.7M to Bring Big Data Intelligence to Risk Analysis and Investigations.” There were a number of interesting statements or factoids in the write up; for example:
Altogether, Quantexa has “thousands of users” across 70+ countries, it said, with additional large enterprises, including Standard Chartered, OFX and Dunn & Bradstreet.
We also circled in true blue marker this passage:
As an example, typically, an investigation needs to do significantly more than just track the activity of one individual or one shell company, and you need to seek out the most unlikely connections between a number of actions in order to build up an accurate picture. When you think about it, trying to identify, track, shut down and catch a large money launderer (a typical use case for Quantexa’s software) is a classic big data problem.
And lastly:
Marria [the founder] says that it has a few key differentiators from these. First is how its software works at scale: “It comes back to entity resolution that [calculations] can be done in real time and at batch,” he said. “And this is a platform, software that is easily deployed and configured at a much lower total cost of ownership. It is tech and that’s quite important in the current climate.”
Some “real time” systems require time consuming and often elaborate configuration to produce useful outputs. The buzzwords take precedence over the nuts and bolts of installing, herding data, and tuning the outputs of this type of system.
Worth monitoring how the company’s approach moves forward.
Stephen E Arnold, July 29, 2020
EU Wants Google to Promise It Will Not Use Fitbit Data to Enhance Search
July 27, 2020
We noted “Europe Wants Google to Pledge That Fitbit Data Won’t Further Enhance Search.” Let’s see what “pledge” means:
Your Dictionary says: “The definition of a pledge is something held as security on a contract, a promise, or a person who is in a trial period before joining an organization. An example of a pledge is a cash down payment on a car. An example of a pledge is a promise that you’ll buy a person’s car.”
Dictionary.com says: “A solemn promise or agreement to do or refrain from doing something: a pledge of aid; a pledge not to wage war. Something delivered as security for the payment of a debt or fulfillment of a promise, and subject to forfeiture on failure to pay or fulfill the promise.”
Wordsense.eu says: “From Middle English plege, from Anglo-Norman plege, from Old French plege (Modern French pleige) from Medieval Latin plevium, plebium, from Medieval Latin plebiō (“I pledge”), from Frankish *plegan (“to pledge; to support; to guarantee”), from Proto-Germanic *plehaną (“to care about, be concerned with”). Akin to Old High German pflegan (“to take care of, be accustomed to”), Old Saxon plegan (“to vouch for”), Old English plēon (“to risk, endanger”).”
The write up says:
EU regulators are asking Google to pledge that Fitbit information will not be used to “further enhance its search advantage.” Another demand involves letting third-parties have “equal” access to that data.
DarkCyber’s comment: Ho, ho, ho. Guarantee? Data are ingested and processed. Ho, ho, ho. No humans involved. Ho, ho, ho. It’s an artificial intelligence system. Ho, ho, ho. Let the lawyers figure it out. Ho, ho, ho. Fitbit users buy products, and Google wants to sell like Amazon. Ho, ho, ho.
Stephen E Arnold, July 27, 2020
Physics Embraces AI: A Development for One Percent of the One Percenters
July 22, 2020
Say what you like about Newton. Teachers have made gravity “real” to indifferent students with the apple on the noggin metaphor.
Physics teachers today face a different challenge. The “old school” ideas are not going to win promotions, grants, or — even better — a prize. Cash! Fame! The cash thing may work among those who are work from home dads and colleagues without a tenure track ticket.
What can physicists who are the one percent of the one percenters do? The answer is to combine esoteric mathematical concepts with the future forward concept “artificial intelligence.”
AI can do physics and “AI in Physics: Are We Facing a Scientific Revolution?” explains this shift. Now between you and me, there are a number of revolutions underway, but the “real life” stuff is of scant interest to physicists in my experience. Einstein anecdotes notwithstanding, physicists are an interesting chunk of the one percent’s one percenters.
Your homework? Verify the over density equation and show each step. No shortcuts! This is forward leaning physics with “real” representations, simulations, and predicted properties. No apple either.
The write up states with significant seriousness that symbolic regression:
can be used to derive mathematical formulas from the internally represented relationships in the network. Symbolic regression is carried out as a genetic algorithm. Equipped with variables and mathematical operators, the algorithm searches for the simplest mathematical formula with which known data can be reproduced.
Like many helpful mathy statements, this statement illuminates the process:
Their result clearly shows that the mixture of data, neural graph networks and symbolic regression is actually suitable for extracting mathematical formulas – in this case an already known natural law – from data with AI.
I enjoy the “clearly.”
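For the curious, the search the write up describes can be shown in miniature. The Python sketch below is an illustration only, not the researchers’ code: it swaps the genetic algorithm for a brute-force enumeration of small formulas, and the names `candidates` and `fit`, the three operators, and the toy force data are all assumptions made for the example. It looks for the shortest formula that reproduces known data, here Newton’s F = m × a.

```python
import itertools
import operator

# Assumed operator set for the toy search; a real system would use more.
OPERATORS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def combine(left, right, op):
    """Build a function that applies op to the results of two sub-formulas."""
    return lambda env: op(left(env), right(env))


def candidates(depth, variables):
    """Yield (text, function) pairs for every formula up to the given depth."""
    if depth == 0:
        for name in variables:
            yield name, (lambda env, n=name: env[n])
        return
    smaller = list(candidates(depth - 1, variables))
    yield from smaller
    for (ltext, lfn), (rtext, rfn) in itertools.product(smaller, repeat=2):
        for symbol, op in OPERATORS.items():
            yield f"({ltext} {symbol} {rtext})", combine(lfn, rfn, op)


def fit(rows, target, variables, depth=2):
    """Return the formula with the lowest squared error, preferring shorter text."""
    def score(candidate):
        text, fn = candidate
        error = sum((fn(row) - row[target]) ** 2 for row in rows)
        return error, len(text)
    return min(candidates(depth, variables), key=score)


# Toy "known data": mass, acceleration, and the measured force.
rows = [
    {"m": 2, "a": 3, "F": 6},
    {"m": 1, "a": 5, "F": 5},
    {"m": 4, "a": 2, "F": 8},
]
formula, predict = fit(rows, "F", ["m", "a"])
print(formula, predict({"m": 6, "a": 7}))
```

On this toy data the search should settle on `(m * a)` and predict 42 for the unseen pair. A genetic algorithm earns its keep when the space of formulas is far too large to enumerate this way.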
But the future is not the stuff one can see, touch, feel, sniff, or think about in substantive ways. The future is tackling Dark Matter with AI.
I learned:
The researchers used the neural grapheme network again. Each node contains information about a dark matter halo such as position, speed and mass and is connected to other halos at a distance of 50 Mpc / h. The network was trained with data from the Quijote Dark Matter Simulation, a collection of generated dark matter structures.
And there is a payoff. Ready?
After the training, the GNN was able to predict the desired property of the halos more accurately than previous models. Using symbolic regression, the researchers were then able to produce a previously unknown mathematical formula that has a lower error rate than the currently most commonly used human-made formula for the same task. The resulting formula was also better able to deal with previously unknown data. For Cranmer, this is a clear sign that the mathematical formula generalizes much better than the neural graph network from which it was derived. This coincides with our previous experience in physics, says Cranmer: “The language of simple symbolic models describes the universe correctly.”
Forget the apple falling on Newton’s slightly addled brain carrier. Think in terms of this metaphor:
If AI is like Columbus, computing power is Santa Maria
And Big Data? Of course, of course. One percent of one percenters know this.
Stephen E Arnold, July 22, 2020
Do Those Commercial Satellites Just Provide Internet? Maybe Not
July 12, 2020
Much has changed since the early days of the Civil Rights Movement, not the least of which is the state of observation technology. We learn from Bloomberg that “Satellites Are Capturing the Protests, and Just About Everything Else on Earth.” Satellite-captured images of protests pervade recent news coverage, particularly a photo of D.C.’s yellow “Black Lives Matter” street mural captured by Planet Labs, Inc. This company, founded in 2010, brings satellite imagery to the masses. Journalist Ashlee Vance reports:
“The company that took the photo, Planet Labs Inc., has hundreds of satellites floating around Earth, enough that it can snap at least one photo of every spot on the planet every day, according to the startup. Such imagery used to be rare, expensive and controlled by governments. Now, Planet has built what amounts to a real-time accounting system of the earth that just about anyone can access by paying a fee.
Over the next couple months, Planet is embarking on a project that will dramatically increase the number of photos it takes and improve the quality of the images by 25% in terms of resolution. To do that, the company is lowering the orbits of some of its larger, high-resolution satellites and launching a half-dozen more devices. As a result, Planet will go from photographing locations twice a day to as many as 12 times a day in some places. Customers will also be able to aim the satellites where they want using an automated system developed by Planet. ‘The schedule is shipped to the satellite, and it knows the plan it needs to follow,’ said Jim Thomason, the vice president of products at Planet.”
The implications are both amazing and alarming. The very concept of privacy may become hypothetical when anyone willing to pay can see just about anything and anyone, anywhere, at nearly any time. On the other hand, there are more benign possibilities, like the investors who examine parking lots to determine how lucrative certain retail businesses are. And, of course, there is the ability to chronicle a large scale social-justice movement. During the Covid-19 pandemic, analysts have also used satellite imagery to track activity slowdowns, military activity, and shipments of goods.
Planet Labs is not the only private company in the satellite imagery market. Rivals include Capella Space and Iceye. As the competition heats up, how many more objects will be placed into orbit around our planet? As I recall, we already have too much stuff flying around out there. I suppose, though, that concern is beyond the purview of companies looking to cash in on the technology.
Cynthia Murrell, July 12, 2020
The Myth of Data Federation: Not a New Problem, Not One Easily Solved
July 8, 2020
I read “A Plan to Make Police Data Open Source Started on Reddit.” The main point of this particular article is:
The Police Data Accessibility Project aims to request, download, clean, and standardize public records that right now are overly difficult to find.
Interesting, but I interpreted the Silicon Valley centric write up differently. If you are a marketer of systems which purport to normalize disparate types of data, aggregate them, federate indexes, and make the data accessible, analyzable, retrievable, and bang on dead simple — stop reading now. I don’t want to deal with squeals from vendors about their superior systems.
For the individual reading this sentence, a word of advice. Fasten your seat belt.
Some points to consider when reading the article cited above, listening to a Vimeo “insider” sales pitch, or just doing techno babble with your Spin class pals:
- Dealing with disparate data requires time and money as well as NOT ONE but multiple software tools.
- Even with a well resourced and technologically adept staff, exceptions require attention. A failure to deal with the stuff in the Exceptions folder can skew the outputs of some Fancy Dan analytic systems. Example: How about that Detroit facial recognition system? Nifty, eh?
- The flows of real time data are a big problem — are you ready for this — a challenge to the Facebooks, Googles, and Microsofts of the world. The reason is that the volume of data and CHANGES TO THOSE ALREADY PROCESSED ITEMS OF INFORMATION is a very, very tough problem. No, faster processors, bigger pipes, and zippy SSDs won’t do the job. The trouble lies within, the intradevice and intra software module flow. The fix is to sample, and sampling increases the risk of inaccuracies. Example: Remember Detroit’s facial recognition accuracy. The arrested individual may share some impressions with you.
- The baloney about “all” data or “any” type is crazy talk. When one deals with more than 18,000 police forces in the US, outputs from surveillance devices from different vendors, and the geodumps of individuals and their ad tracking beacons — this is going to be mashed up and made usable. Noble idea. There are many noble ideas.
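To make the Exceptions folder point concrete, here is a minimal Python sketch of normalizing records from two sources. Everything specific in it is hypothetical: the agency names, field names, date formats, and the `federate` helper are invented for illustration and do not come from the Police Data Accessibility Project or any vendor’s system.

```python
from datetime import datetime

# Hypothetical field layouts for two agencies; real exports vary far more.
FIELD_MAPS = {
    "agency_a": {"date": "incident_date", "kind": "offense"},
    "agency_b": {"date": "Date Reported", "kind": "Category"},
}
# Assumed date formats; anything else is treated as an exception.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def parse_date(text):
    """Try each known format; raise ValueError if none fits."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")


def normalize(source, row):
    """Map one raw row onto a shared schema."""
    fields = FIELD_MAPS[source]
    return {
        "source": source,
        "date": parse_date(row[fields["date"]]),
        "kind": row[fields["kind"]].strip().lower(),
    }


def federate(batches):
    """Return (clean_rows, exceptions); nothing is silently dropped."""
    clean, exceptions = [], []
    for source, rows in batches.items():
        for row in rows:
            try:
                clean.append(normalize(source, row))
            except (KeyError, ValueError, AttributeError) as err:
                exceptions.append((source, row, str(err)))
    return clean, exceptions


batches = {
    "agency_a": [{"incident_date": "2020-07-01", "offense": "Burglary"}],
    "agency_b": [
        {"Date Reported": "07/02/2020", "Category": " THEFT "},
        {"Date Reported": "July 3rd", "Category": "Theft"},  # format nobody planned for
    ],
}
clean, exceptions = federate(batches)
print(len(clean), "normalized;", len(exceptions), "in the Exceptions folder")
```

Even in this toy case one of three rows lands in the exceptions list. Multiply by 18,000 police forces and the time-and-money point makes itself.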
Why am I taking the time to repeat what anyone with experience in large scale data normalization and analysis knows?
Baloney can be thinly sliced, smeared with gochujang, and served on Delft plates. Know what? Still baloney.
Gobble this:
Still, data is an important piece of understanding what law enforcement looks like in the US now, and what it could look like in the future. And making that information more accessible, and the stories people tell about policing more transparent, is a first step.
But the killer assumption is that the humans involved don’t make errors, systems remain online, and file formats are forever.
That baloney. It really is incredible. Just not what you think.
Stephen E Arnold, July 8, 2020
Secrets of Popular YouTube Videos Revealed. Are You Excited!
July 8, 2020
We found “Analysis of YouTube Trending Videos of 2019 (US)” amusing. Here are several of the chucklers we spotted:
First, hot YouTube videos use CAPITAL letters in TITLES.
Second, here are the words you need to use in your YouTube titles and descriptions:
Third, use emojis. The fire emoji is a “hot” addition.
Fourth, rely on “official” as in “official video.” What if the video is not official? Hey, what is this, a courtroom? You just need to pass Judge Google, and you are good to go with rehab ads, wonky food info, and nifty fashion ideas.
Fifth, your video title must be 36 to 64 characters. Something like “Macbeth” would suck as a YouTube click magnet.
Sixth, when do you publish your video? Saturday is for losers, gentle reader.
There are more astounding insights. You are officially ON YOUR OWN.
Stephen E Arnold, July 8, 2020
A Peek into Google and Palantir Contracts: The UK National Health Service Versions
June 8, 2020
Curious about what the legalese, terms, and conditions of US companies licensing and servicing government entities in the United Kingdom require? Good news. You can (at least as of June 6, 2020 at 0600 US Eastern time) read allegedly complete contracts for software and services.
- The Google NHS agreement is at https://tinyurl.com/y88tzdqq
- The Microsoft NHS agreement is at https://tinyurl.com/y8vzj5ye
- The Palantir NHS agreement is at https://tinyurl.com/ybgdl82p
A contract from Faculty.ai is also available. Founded in 2014, Faculty.ai does not have the cachet of a Google. If you want to look at that contract, it is for now at https://tinyurl.com/ya3kzolw.
The deals are between these firms and an entity doing government business under the name of NHSX which seems to mean “a joint unit bringing together teams from the Department of Health and Social Care and NHS England and NHS Improvement to drive the digital transformation of care. COVID-19 Response.”
Are there some interesting details in these documents? Yep. Will these be shared in this blog post? Nope. You will learn some of the DarkCyber team’s insights if you attend our National Crime Conference presentation about investigative tools and systems.
Not invited? For fee briefings are still offered. Contact benkent2020 at yahoo dot com.
Stephen E Arnold, June 8, 2020
GoAccess: A Log Analyzer
June 4, 2020
We are updating our tools section of an upcoming National Crime Conference lecture. If you have access to a Web server and a log, you may want to take a look at GoAccess. The software
was designed to be a fast, terminal-based log analyzer. Its core idea is to quickly analyze and view web server statistics in real time without needing to use your browser.
“Analytics without Google” provides additional information about the software and includes helpful pointers. The article states:
What I further liked about GoAccess is I could run it on a separate machine, transferring logs from multiple servers into one place, then creating my necessary dashboards; this isn’t a specific feature of GoAccess, but a feature of the Unix philosophy. This flexibility works well with my seemingly ephemeral Digital Ocean Droplets, which don’t go kaboom on their own, but rather suffer from my own tendencies to erase and start from scratch. GoAccess reminded me how beautiful composable tools are. Its feature set is minimal and it plays nicely with the tools already available to us on a *nix platform. Do one thing and do it well — words of wisdom.
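For readers who want a feel for what a log analyzer does under the hood, here is a minimal Python sketch. It is not GoAccess and does not reflect how GoAccess is implemented; it assumes logs in the common Apache/Nginx “combined” format, and the `summarize` function and sample lines are made up for the example.

```python
import re
from collections import Counter

# Assumed input: Apache/Nginx "combined" log format.
LINE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def summarize(lines):
    """Tally requested paths, status codes, and client hosts."""
    paths, statuses, hosts = Counter(), Counter(), Counter()
    skipped = 0
    for line in lines:
        match = LINE.match(line)
        if match is None:
            skipped += 1  # count unparseable lines instead of hiding them
            continue
        paths[match["path"]] += 1
        statuses[match["status"]] += 1
        hosts[match["host"]] += 1
    return {"paths": paths, "statuses": statuses, "hosts": hosts, "skipped": skipped}


# Made-up sample lines using documentation-reserved IP addresses.
sample = [
    '203.0.113.5 - - [04/Jun/2020:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "curl/7.68"',
    '203.0.113.5 - - [04/Jun/2020:10:00:02 +0000] "GET /about HTTP/1.1" 200 734 "-" "curl/7.68"',
    '198.51.100.7 - - [04/Jun/2020:10:00:05 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    '198.51.100.7 - - [04/Jun/2020:10:00:09 +0000] "GET /missing HTTP/1.1" 404 0 "-" "Mozilla/5.0"',
    "not a log line",
]
report = summarize(sample)
print(report["paths"].most_common(1), dict(report["statuses"]), report["skipped"])
```

Because `summarize` accepts any iterable of lines, logs copied from several servers can be chained into one pass, which is the composable approach the article praises.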
Worth a look.
Stephen E Arnold, June 4, 2020