Does Google Manifest Addiction to Personal Data?
March 31, 2021
I read an amusing “we don’t do that!” write up in “Google Collects 20 Times More Telemetry from Android Devices Than Apple from iOS.” The cyber security firm Recorded Future points to academic research asserting:
The study unearthed some uncomfortable results. For starters, Prof. Leith said that “both iOS and Google Android transmit telemetry, despite the user explicitly opting out of this [option].” Furthermore, “this data is sent even when a user is not logged in (indeed even if they have never logged in),” the researcher said. [Weird bold face in original text removed.]
Okay, this is the stuff of tenure. The horrors of monopolies and clueless users who happily gobble up free services.
What’s amazing is that the write up does not point out the value of these data for predictive analytics. That’s the business of Recorded Future, right? Quite an oversight. That’s what happens when “news” stumbles over the business model paying for marketing via content. Clever? Of course.
The reliability of the probabilities generated by the Recorded Future methods pivots on having historical and real time data. No wonder Google and Apple suggest that “we don’t do that.”
Recorded Future’s marketing is one thing, but Google’s addiction to data is presenting itself in quite fascinating ways. Navigate to “Google’s New App Automagically Organizes Your Scanned Documents.” The write up states:
The app lets you scan documents and then it uses AI to automatically name and sort them into different categories such as bills, IDs, and vehicles.
And what happens?
To make it easy to find documents, you can also search through the full text of the document.
What types of documents does a happy user scan? Maybe the Covid vaccination card? Maybe legal documents like mortgages, past due notices from a lawyer, divorce papers, and similar tough-to-obtain information of a quite private and personal nature?
My point is that mobile devices are data collection devices. The data are used to enable the Apple and Google business models. Ads, information about preferences, clues to future actions, and similar insights are now routinely available to those with access to the data and analytic systems.
The professor on the tenure track or gunning for an endowed chair can be surprised by practices which have been refined over many years. Not exactly ground breaking research.
Google obtaining access to scanned personal documents? No big deal. Think how easy and how convenient the free app makes taming old fashioned paper. I wonder if Google has an addiction to data and can no longer help itself?
Without meaningful regulation, stunned professors and mobile device users in love with convenience are cementing monopoly control over information flows.
Oh, Recorded Future was once a start up funded by Google and In-Q-Tel. Is that a useful fact?
Stephen E Arnold, March 31, 2021
Let Us Now Consider Wonky Data and Tagging
March 31, 2021
As you may know, I find MIT endlessly amusing. From the Jeffrey Epstein matter to smart people who moonlight for other interesting entities, the esteemed university does not disappoint. I noted an article about an MIT finding which is interesting. “MIT’s AI Dataset Study and Startling Findings” reports:
MIT Researchers analyzed 10 test sets from datasets, including ImageNet, and found over 2,900 errors in the ImageNet validation set alone. When used as a benchmark data set, the errors in the dataset were proved to have an incorrect position in correlation to direct observation or ground truth.
So what?
Garbage in, garbage out.
This is not a surprise and it certainly seems obvious. If anything, the researchers’ error rate seems low. There is no information about data pushed into the “exception” folder for indexing systems.
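For those who want a number to go with “seems low,” the count in the quote can be turned into a rate. This is a minimal sketch, not the researchers’ method. It assumes the ImageNet validation set holds 50,000 images, a figure the write up does not state, and it treats “over 2,900” as a floor.

```python
# Assumption: the ImageNet validation set contains 50,000 images.
# The write up reports only an error count ("over 2,900"), used here as a floor.
VALIDATION_IMAGES = 50_000
REPORTED_ERRORS = 2_900


def error_rate(errors: int, total: int) -> float:
    """Return the share of mislabeled items as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * errors / total


if __name__ == "__main__":
    rate = error_rate(REPORTED_ERRORS, VALIDATION_IMAGES)
    print(f"At least {rate:.1f}% of the validation labels are wrong")
```

Under that assumption, roughly one label in 17 is wrong in a set that others use as ground truth.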
Stephen E Arnold, March 31, 2021
SolarWinds: Making Security a Priority. After the Barn Burned and Running in the Crime Derby
March 31, 2021
I read a remarkable write up called “SolarWinds CEO Gives Chief Security Officer Authority and Air Cover to Make Software Security a Priority.” The article is notable for the information omitted. Here’s a passage I noted:
He created a cybersecurity committee for the board that includes him and two sitting board members. He also said that he has given the company’s chief security officer the power to stop any software release if necessary to address security concerns.
A security committee. Will the group produce a security solution which is elegant, effective, and able to restore trust?
The write up identifies the causes of security breaches. These are managerial missteps. Obviously SolarWinds believes a committee is the optimal way to deal with wonky management by those with an eye on the bottom line, bonuses, and a responsibility-free tenure as top dog.
The technical causes are not really causes. Sorry, but phishing is not a cause. Phishing is a method implemented because employees have inadequate training and the organizations employing these people drop the ball in setting up a defensible perimeter.
Why is this remarkable? Misdirection, blame shifting, and a belief a committee can overcome MBA thinking, compensation incentives, and what I call a high school science club sense of exceptionalism.
Stephen E Arnold, March 31, 2021
So You Wanna Be a Google?
March 31, 2021
Just a short item which may be of interest to Web indexing wannabes: Datashake has rolled out its Web Scraper API. You can read about how to:
Scrape the web with proxies, CAPTCHA solving, headless browsers and more to avoid being blocked.
You will have to sign up to get “early access” to the service. The service is not free … because scraping Web sites is neither easy nor inexpensive.
There’s not much info about this API as of March 23, 2021, but this type of service beats the pants off trying to cook up our content acquisition scripts in 1993 for The Point (Top 5% of the Internet). You remember that, don’t you?
Of course, thumbtypers will say, “Okay, boomer, what’s up with that ancient history?”
Sigh.
Stephen E Arnold, March 31, 2021
Intellectual Cohesiveness: A Reading List
March 30, 2021
Why do liberal arts graduates struggle to understand the logic of a Facebook-type engineer or a Google-like wizard or the demeanor of a Twitter-like senior manager? Easy. The reading list for engineers includes books about math, physics, and programming. The well-rounded humanoid educated in the currents of Western culture reads other books. Which other books? I am delighted you asked. You can find a list of the 1,138,841 most frequently assigned texts. Just click this link and view the Open Syllabus Galaxy. Yes, the diagram is not a list. Listicles are not popular with some of the thumbtypers, so behold a visualization.
Let’s return to the notion of intellectual cohesiveness, shall we? In order to build a shared knowledge base, educated individuals should have some familiarity with the most assigned college texts. That way, when someone references Napoleon and a winter walk, the others engaged in the conversation will know that the little emperor skipped a lesson about winter in Eastern Europe.
Without a shared knowledge base, it is difficult to know what the other person is talking about. For a recent example, consider the questioning of big tech’s luminaries by the oh, so wise elected officials.
One observation. Assigning a book to a person does not guarantee that the book was read.
Cohesiveness must be obtained in some other way in our zip zip world, I think.
Stephen E Arnold, March 30, 2021
MSFT Exchange Excitement: Another Jolt of Info
March 30, 2021
I read “Exchange Server Attacks: Microsoft Shares Intelligence on Post-Compromise Activities.” Interesting. Weeks, maybe longer, have passed since what one of my analysts described as another digital Chernobyl, and there has not been much substantive information.
This “real” news story reports:
Microsoft is raising an alarm over potential follow-on attacks targeting already compromised Exchange servers, especially if the attackers used web shell scripts to gain persistence on the server, or where the attacker stole credentials during earlier attacks.
Interesting. A massive attack which may have distributed malware, possibly as yet undetected, poses a risk. That’s good to know.
This statement attributed to Microsoft is intriguing as well:
In a new blog post, Microsoft reiterated its warning that “patching a system does not necessarily remove the access of the attacker”.
Does this mean that Microsoft’s remediation is not fixing the “problem”? What sorts of malware could be lurking? Microsoft provides some measured answers to this particular question in “Analyzing Attacks Taking Advantage of the Exchange Server Vulnerabilities.”
But the problem is that Microsoft’s foundational software build and deploy business process seems to be insecure.
Dribs and drabs about the consequences of a major security breach are PR and hand waving, not the actions I craved.
Stephen E Arnold, March 30, 2021
Section 230: Just Flip the Regulation of Big Tech Around
March 30, 2021
I read “No One Agrees on How to Fix Big Tech.” The main point seems to be embodied in this quote from the article attributed to an elected US official:
The time for self-regulation is over. It’s time we legislate to hold you accountable.
Let’s look at the need for regulation in a different way.
Big tech is more democratic than some other systems. Big tech’s users are voting on its value, viability, and virtue with each click. Elected officials and the historical laws are essentially out of step with what people want.
The write up asserts:
You could suggest that each company’s statement on s230 is a reflection of their general values and attitude. Facebook wants to tweak the law to potentially weaken competitors, Google is hoping not to make waves, but won’t shout for the status quo too loudly, while Twitter is already mentally elsewhere. Unfortunately for Zuckerberg, Pichai and Dorsey, none of those positions are likely to sate politicians who understand that something needs to change, but aren’t sure what.
Another view is that big tech is a manifestation of the “new” democracy. The organizations are nation states, have support, and operate above the no longer meaningful laws of historical artifices.
It is increasingly clear that it is a thumbtyping world. Self regulation is not needed when the constituents vote to keep big tech in office.
Stephen E Arnold, March 30, 2021
AI: Are Algorithms House Trained?
March 30, 2021
“Containment Algorithms Don’t Work for Our Machines” includes a thought-provoking passage; namely:
Director of the Center for Humans and Machines, Iyad Rahwan, described it this way: “If you break the problem down to basic rules from theoretical computer science, it turns out that an algorithm that would command an AI not to destroy the world could inadvertently halt its own operations. If this happened, you would not know whether the containment algorithm is still analyzing the threat, or whether it has stopped to contain the harmful AI. In effect, this makes the containment algorithm unusable.”
What’s the write up’s take on this “challenge”? Here’s the statement in the article:
The lesson of the study’s computability theory is that we do not know how or if we will be able to build a program that eliminates the risk associated with a sufficiently advanced artificial intelligence. As some AI theorists and scientists believe, no advanced AI systems can ever be guaranteed entirely safe. But their work continues; nothing in our lives has ever been guaranteed safe to begin with.
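The “still analyzing or stopped?” problem in the quote can be seen in a toy. This is a minimal sketch, not the study’s proof. The names `watch`, `finishes_after`, and `never_finishes` are hypothetical and appear nowhere in the write up. The watcher runs a program for a fixed number of steps. Inside that budget it can report “halted.” Outside it, the only honest answer is “unknown,” because a slow program and an endless one look identical.

```python
from typing import Callable, Iterator


def watch(program: Callable[[], Iterator[None]], max_steps: int) -> str:
    """Run `program` one step at a time for at most `max_steps` steps.

    Returns "halted" if the program finished within the budget, otherwise
    "unknown". "unknown" does not mean the program runs forever; the
    watcher simply cannot tell the difference from the outside.
    """
    steps = program()
    for _ in range(max_steps):
        try:
            next(steps)
        except StopIteration:
            return "halted"
    return "unknown"


def finishes_after(n: int) -> Callable[[], Iterator[None]]:
    """Hypothetical toy program that takes n steps and then stops."""
    def program() -> Iterator[None]:
        for _ in range(n):
            yield
    return program


def never_finishes() -> Iterator[None]:
    """Hypothetical toy program that loops forever."""
    while True:
        yield


if __name__ == "__main__":
    print(watch(finishes_after(3), max_steps=10))
    print(watch(finishes_after(1000), max_steps=10))
    print(watch(never_finishes, max_steps=10))
```

Raising the budget only moves the line. The halting problem says no general procedure removes it for arbitrary programs.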
With the US doing yoga to maintain its perceived lead in smart software, the trajectory of smart software and its receptivity to house training may reside elsewhere.
Stephen E Arnold, March 30, 2021
Prodaft: Chasing the Bad Actors of SolarWinds
March 29, 2021
I read “Swiss Firm Says It Accessed SolarWinds Attackers’ Servers.” The idea is that the cyber security outfit explored the intermediary servers employed by the SolarWinds’ bad actors. The result was a successful penetration of some of these systems. The result? Prodaft, according to the report, has learned that “these attackers continue to target large corporations and public institutions worldwide.” The targets? The US and Europe.
Furthermore, the attackers have been given the handle “SilverFish Group.” One discovery is explained this way:
[The attackers have] designed an unprecedented malware detection sandbox formed by actual enterprise victims, which enables the adversaries to test their malicious payloads on actual live victim servers with different enterprise AV and EDR solutions, further expanding the high success rate of the SilverFish group attacks.
From my vantage point in rural Kentucky, this sounds similar to the methods revealed in the disclosure of the Hacking Team’s Remote Control System. The approach makes it possible to “spin” malware in a controlled manner across compromised systems.
The main point, despite the radio silence from certain organizations affected by the months-long attacks, is:
confirmation of the ongoing nature of the attack validates industry concerns. Once attackers establish persistence within an environment, it is difficult to remove them without considerable resources.
Interesting and not particularly reassuring.
Stephen E Arnold, March 29, 2021
The Google and Web Indexing: An Issue of Control or Good, Old Fear?
March 29, 2021
I read “Google’s Got A Secret.” No kidding, but Google has many, many secrets. Most of them are unknown to today’s Googlers. After 20 plus years, even Xooglers are blissfully unaware of the “big idea,” the logic of low profiling data slurping, how those with the ability to make “changes” to search from various offices around the world can have massive consequences for those who “trust” the company, and the increasing pressure to control Googzilla’s appetite for cash. But enough of these long-ignored issues.
The “secret” in the article is that Google actively pushes as many buttons and pulls as many levers as its minions can to make it tough for competitors to “index” the publicly accessible Web. The write up states:
Only a select few crawlers are allowed access to the entire web, and Google is given extra special privileges on top of that.
The write up adds:
This isn’t illegal and it isn’t Google’s fault, but this monopoly on web crawling that has naturally emerged prevents any other company from being able to effectively compete with Google in the search engine market.
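How does a site admit one crawler and turn away the rest? The quoted passage does not name a mechanism, so treat this as an assumption: one common route is a robots.txt file with different rules per user agent. The sketch below uses Python’s standard `urllib.robotparser`. The robots.txt text, the `allowed` helper, and the example.com URL are all made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: Googlebot may fetch everything, all others nothing.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""


def allowed(agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    page = "https://example.com/some/page"
    for agent in ("Googlebot", "UpstartSearchBot"):
        print(agent, allowed(agent, page))
```

robots.txt is a request, not an enforcement mechanism. A polite upstart crawler obeys it and indexes nothing, which is the competitive problem the write up describes.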
I am not in total agreement with these assertions. For example, consider the world of public relations distribution agencies. Do a search for “news release distribution” and you get a list of outfits. Now write a news release which reveals previously unknown and impactful information about a publicly traded company. PR Underground-type operations will explain that a one-day or longer review process is needed. The “reason” is that these firms have “rules of the road.” Is it possible that these distribution outlets are conforming to some vague guidelines imposed by Google? Crossing a blurry line means that the releases won’t be indexed. No indexing in Google means the agency failed and, of course, the information is effectively censored.
What about a company which publishes information on a consumer-type topic like automobiles? What if that Web site operator uses advertising from sources not linked to the Google combine? That Web site is indexed on a less frequent basis by the friendly Google crawler. Then those citations are suppressed for some unknown reason by a Google algorithm. (These are written by humans, but the Google never talks much about the capabilities of a person in a Google office laboring on core search to address issues.)
The ideas of the Knuckleheads’ Club are interesting. Implementing them, however, is going to require some momentum to overcome the Google habit which has become part of the online user’s DNA over the last 20 years.
The real questions remain. Is Google in control of the public Web? Are people fearful of irritating Mother Google (It’s not nice to anger Mother Google)?
Stephen E Arnold, March 29, 2021