TikTok: Innocuous? Maybe Not Among Friends

January 5, 2022

Short videos. No big deal.

The data about one’s friends are a big deal. A really big deal. TikTok may be activating a network effect. “TikTok Tests Its Own Version of the Retweet with a New Repost Button” suggests that a Twitter-style function is chugging along at TikTok. What if the “friend” is not a registered user of TikTok? Perhaps the Repost function is a way to expand a user’s social network. What can one do with such data? Building out a social graph and cross-correlating those data with other information might be a high value exercise. What other uses can be made of these data a year or two down the road? That’s an interesting question to consider, particularly from the point of view of Chinese intelligence professionals.
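To make the social graph point concrete, here is a minimal Python sketch of how repost-style events could be folded into a graph that also contains people who never signed up. The event layout (reposter, original poster, recipient) and every function name are hypothetical assumptions for illustration; nothing here describes TikTok’s actual data model.

```python
from collections import defaultdict


def build_social_graph(events):
    """Build an undirected adjacency map from hypothetical repost events.

    Each event is (reposter, original_poster, recipient). Every participant
    becomes a node, registered or not.
    """
    graph = defaultdict(set)
    for reposter, original, recipient in events:
        for a, b in ((reposter, original), (reposter, recipient)):
            if a != b:
                graph[a].add(b)
                graph[b].add(a)
    return graph


def shadow_nodes(graph, registered):
    """Nodes that appear in the graph but are not registered users."""
    return {node for node in graph if node not in registered}


def second_degree(graph, node):
    """Friends of friends: nodes two hops away, excluding direct contacts."""
    direct = graph.get(node, set())
    two_hop = set()
    for friend in direct:
        two_hop |= graph[friend]
    return two_hop - direct - {node}


events = [("alice", "bob", "carol"), ("bob", "dave", "erin")]
g = build_social_graph(events)
print(shadow_nodes(g, {"alice", "bob", "dave"}))  # carol and erin never signed up
print(second_degree(g, "alice"))                  # dave and erin, reached via bob
```

Even this toy version shows the point: two repost events yield nodes for people who are not users, plus second-degree links that no one explicitly shared.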

“China Harvests Masses of Data on Western Targets, Documents Show” explains that China acquires data for strategic and tactical reasons. The write up does not identify specific specialized software products, services, and tools. Furthermore, the price tags for surveillance expenditures seem modest. Nevertheless, there is a suggestive passage in the write up:

Highly sensitive viral trends online are reported to a 24-hour hotline maintained by the Cyberspace Administration of China (CAC), the body that oversees the country’s censorship apparatus…

What’s interesting is that China uses both software and human-intermediated systems.

Net net: Pundits and users have zero clue about China’s data collection activities in general. When it comes to specific apps and their functions on devices, users have effectively zero knowledge of the outflow of personal data which can be used to generate a profile for possible coercion. Pooh-poohing TikTok? Not a great idea.

Stephen E Arnold, January 5, 2022

Quantitative vs Qualitative Data, Defined

January 4, 2022

Sounding almost philosophical, The Future of Things posts, “What is Data? Types of Data, Explained.” Distinguishing between types of data can mean many things. One distinction we are always curious about: what data does one emit via mobile devices, and which types feed the surveillance machines? This write-up, though, is more of a primer on a basic data science concept: the difference between quantitative and qualitative data. Writer Kris defines quantitative data:

“As the name suggests, quantitative data can be quantified — that is, it can be measured and expressed in numerical values. Thus, it is easy to manipulate quantitative data and represent it through statistical graphs and charts. Quantitative data usually answers questions like ‘How much?’, ‘How many?’ and ‘How often?’ Some examples of quantitative data include a person’s height, the amount of time they spend on social media and their annual income. There are two key types of quantitative data: discrete and continuous.”

Here is the difference between those types of quantitative data: Discrete data cannot be divided into parts smaller than a whole number, like customers (yikes!) or orders. Continuous data is measured on a scale and can include fractions; the height or weight of a product, for example.
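A minimal Python sketch of the discrete/continuous split, reusing the article’s own examples (orders as counts, product weight as a measurement). The `Product` class and the two validator functions are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from statistics import mean


def is_valid_discrete(value):
    """Discrete data (customers, orders) must be a non-negative whole number."""
    return float(value).is_integer() and value >= 0


def is_valid_continuous(value, minimum=0.0):
    """Continuous data (height, weight) may take any value on the scale, fractions included."""
    return value >= minimum


@dataclass
class Product:
    orders: int       # discrete: counted in whole units; there is no "2.5 orders"
    weight_kg: float  # continuous: measured on a scale


products = [Product(3, 1.25), Product(5, 0.8), Product(4, 2.0)]

total_orders = sum(p.orders for p in products)    # 12, still a whole number
avg_weight = mean(p.weight_kg for p in products)  # any real value is meaningful here

print(is_valid_discrete(3), is_valid_discrete(2.5))  # True False
print(is_valid_continuous(1.25))                     # True
```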

Kris goes on to define qualitative data, which is harder to process and analyze but can provide insights that quantitative data simply cannot:

“Qualitative data … exists as words, images, objects and symbols. Sometimes qualitative data is called categorical data because the information must be sorted into categories instead of represented by numbers or statistical charts. Qualitative data tends to answer questions with more nuance, like ‘Why did this happen?’ Some examples of qualitative data business leaders might encounter include customers’ names, their favorite colors and their ethnicity. The two most common types of qualitative data are nominal data and ordinal data.”

As the names suggest: Nominal data names a piece of data without assigning it any order in relation to other pieces of data. Ordinal data ranks bits of information in an order of some kind. The difference is important when performing any type of statistical analysis.
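Why the nominal/ordinal difference matters for analysis can be shown with Python’s standard `statistics` module: nominal labels support counting and the mode, while ordinal labels also support a median, because they can be sorted. The color list echoes the article’s “favorite colors” example; the satisfaction scale is a hypothetical ordinal example added for illustration.

```python
from enum import IntEnum
from statistics import median_low, mode

# Nominal: labels with no inherent order.
favorite_colors = ["blue", "green", "blue", "red", "blue"]


# Ordinal: labels with a meaningful order (hypothetical survey scale).
class Satisfaction(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


ratings = [Satisfaction.HIGH, Satisfaction.LOW, Satisfaction.MEDIUM,
           Satisfaction.HIGH, Satisfaction.MEDIUM]

# Nominal data: the mode is meaningful; a median is not, since there is
# no order in which to find a middle value.
most_common_color = mode(favorite_colors)

# Ordinal data: the order makes a median meaningful. median_low returns an
# observed category instead of averaging two neighbors, which would assume
# the gaps between categories are equal.
median_rating = median_low(ratings)

print(most_common_color)     # blue
print(median_rating.name)    # MEDIUM
```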

Cynthia Murrell, January 4, 2022

Data Science Information at Your Fingertips

December 27, 2021

Just a brief honk about a useful resource. Data scientist, engineer, and blogger Manpreet Singh draws our attention to an “Amazing List of Data Science Cheat Sheets.” Singh begins with a word to those wondering what, exactly, data science is—linking to UC Berkeley’s page on the subject. He then reveals the trove of quick-access info is located at GitHub, posted there by engineer Favio Vazquez. Singh includes a series of screenshots that give a taste of the collection. He writes:

“When you load up this repo you will see a few different folders, these folders house a ton of different cheat sheets for different disciplines: [screenshot 1] You can also scroll down a bit to see a breakdown of these sheets: [screenshot 2] These cheat sheets range in use, but they all offer a ton of value for your data science needs. All you have to do is click on the cheat sheets you want to see, you will then be redirected to some awesome looking cheat sheets: [screenshots 3 and 4] Without a doubt, if you’re planning on learning data science, I would highly recommend checking out these cheat sheets.”

With topics as wide-ranging as business science, calculus, SQL, and machine learning, this list is a one-stop source of reference material for the current or aspiring data scientist. Savvy readers may wish to bookmark the useful page.

Cynthia Murrell, December 27, 2021

Big Data Creates Its Own Closed Mind

December 23, 2021

New ideas that challenge established theories are routinely ridiculed. Depending on the circumstances, they are also deemed “heretical” against all accumulated knowledge. In “Why Big Data Can Be The Enemy Of New Ideas,” Mind Matters News interviews author Erik J. Larson about the harm of failing to explore new ideas. Asked what the ethos of past innovations can teach us today, Larson cited Copernicus’s heliocentric model of the solar system as an example.

Copernicus’s heliocentric model was not accepted by his colleagues, who believed in the Ptolemaic, Earth-centered model. There were plenty of data to support the Ptolemaic model, while Copernicus’s model was less predictive and resolved only a few open questions. Copernicus innovated because he questioned scientific doctrine. Big data AI systems are incapable of thinking differently because they are only as smart as their programming. In other words, AI is incapable of thinking outside the box.

Computer technology cannot replicate the human brain. Millions of dollars were invested in an attempt to do so, dubbed the Human Brain Project:

“Of course it was a total failure… The guy who started it actually ended up getting fired for a variety of reasons but tech didn’t solve that problem in science because focusing on technology rather than the actual natural world turns out to have not been a good idea. It’s almost like inserting an artificial layer. Trying to convert basic research in neuroscience into a software development project just means you’re going to end up with software ideas and ideas that are programmable on a computer. Your scientists are going to be working with existing theories because those are the ones you can actually write and code. And they’re not going to be looking for gaps in our existing theoretical knowledge in the brain.”

If people accept big data software as smarter than actual humans, that is a huge problem. It is comparable to the way religious dogma (in all cultures) is used to exert control. Religion itself is not the problem, but blind obedience to its doctrine is dangerous. Religious fundamentalists of all kinds, including Abrahamic, Buddhist, and Hindu followers, are examples.

Big data does solve and prevent problems, but it cannot be a replacement for the human brain. Thinking creatively does not compute for AI.

Whitney Grace, December 23, 2021

Veraset: Another Data Event

November 22, 2021

Here is a good example of how personal data, in this case tracking data, can be used without one’s knowledge. In its article “Files: Phone Data Shared,” the Arkansas Democrat-Gazette reports that data broker Veraset provided phone location data to the US Department of Health last year as part of a free trial. The transaction was discovered by the digital-rights group Electronic Frontier Foundation. The firm marketed the data as valuable for COVID research, but after the trial period was up the agency declined to move forward with a partnership. The data was purportedly stripped of names and other personal details, and the researchers found no evidence it was misused. However, Washington Post reporter Drew Harwell writes:

“[Foundation technologist Bennett Cyphers] noted that Veraset’s location data includes sequences of code, known as ‘advertising identifiers,’ that can be used to pinpoint individual phones. Researchers have also shown that such data can be easily ‘de-anonymized’ and linked to a specific person. Apple and Google announced changes earlier this year that would allow people to block their ID numbers from being used for tracking. Veraset and other data brokers have worked to improve their public image and squash privacy concerns by sharing their records with public health agencies, researchers and news organizations.”

Amidst a pandemic, that tactic just might work. How do data brokers get this information in the first place? We learn:

“Data brokers pay software developers to include snippets of code in their apps that then sent a user’s location data back to the company. Some companies have folded their code into games and weather apps, but Veraset does not say which apps it works with. Critics have questioned whether users are aware that their data is being shared in such a way. The company is a spinoff of the location-data firm SafeGraph, which Google banned earlier this year as part of an effort to restrict covert location tracking.”

Wow, banned by Google—that is saying something. Harwell reports SafeGraph shared data with the CDC during the first few weeks of the pandemic. The agency used that data to track how many people were staying home for its COVID Data Tracker.

App users, often unwittingly, agree to data sharing in those opaque user agreements most of us do not read. The alternative, of course, is to deprive oneself of technology that is increasingly necessary to operate in today’s world. It is almost as if that were by design.
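Why an “anonymous” advertising identifier is not very anonymous can be sketched in a few lines of Python. The heuristic below (treat the grid cell a device reports most often at night as its likely home) is a commonly described home-inference approach, not Veraset’s, SafeGraph’s, or the CDC’s method; the record layout, the grid size, the night hours, and all names are hypothetical assumptions. The second function shows how the same traces could feed a “stayed home” share of the kind a tracker might report.

```python
from collections import Counter, defaultdict
from datetime import datetime


def grid_cell(lat, lon, precision=3):
    """Snap coordinates to a coarse grid (3 decimals is roughly 100 m of latitude)."""
    return (round(lat, precision), round(lon, precision))


def infer_home_cells(pings, night_start=22, night_end=6):
    """Guess each device's home as the grid cell it reports most often at night."""
    night_counts = defaultdict(Counter)
    for ad_id, ts, lat, lon in pings:
        hour = datetime.fromisoformat(ts).hour
        if hour >= night_start or hour < night_end:
            night_counts[ad_id][grid_cell(lat, lon)] += 1
    return {ad_id: counts.most_common(1)[0][0] for ad_id, counts in night_counts.items()}


def share_stayed_home(pings, homes, day):
    """Among devices with an inferred home seen on `day`, the fraction seen only at home."""
    cells_seen = defaultdict(set)
    for ad_id, ts, lat, lon in pings:
        if ts.startswith(day) and ad_id in homes:
            cells_seen[ad_id].add(grid_cell(lat, lon))
    if not cells_seen:
        return 0.0
    stayed = sum(1 for ad_id, cells in cells_seen.items() if cells == {homes[ad_id]})
    return stayed / len(cells_seen)


# Hypothetical pings: (advertising ID, local timestamp, latitude, longitude).
pings = [
    ("ad-1", "2021-04-01T23:10:00", 34.7461, -92.2891),
    ("ad-1", "2021-04-02T02:30:00", 34.7462, -92.2892),
    ("ad-1", "2021-04-02T13:00:00", 34.7301, -92.2701),
    ("ad-2", "2021-04-01T22:45:00", 34.7511, -92.3011),
    ("ad-2", "2021-04-02T12:00:00", 34.7512, -92.3012),
]
homes = infer_home_cells(pings)
print(homes["ad-1"])                                  # (34.746, -92.289)
print(share_stayed_home(pings, homes, "2021-04-02"))  # 0.5
```

No name appears anywhere in the data, yet a likely home location falls out of a simple count. Joining that cell to a property record or an address list is the step that turns an identifier into a person.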

Cynthia Murrell, November 22, 2021

China Realities: No Fortnite after More Than 20 Days

November 3, 2021

Google, as I recall, wanted China to change its approach to high technology. Facebook – oops, Meta – had a senior manager who learned Chinese and gave a talk in actual Chinese, I believe. Yahoo bailed. And now Fortnite has decided to leave the Middle Kingdom to its own devices.

What’s interesting about the Fortnite decision to abandon the world’s largest market is that it was never serious about China. China was a “test.” The “test” began in 2018 and involved an interesting partner, Tencent, which owns a chunk of Epic Games, the company behind the popular online game.

China is rethinking its approach to online activities. The CNN report states that:

companies were “urged to break from the solitary focus of pursuing profit or attracting players and fans”…

Yep, just a test. But of what? Fortnite’s will? The company can take on Apple, but it swizzled its push into China as a test.

Who or what failed? My answer: The marketing/PR wizard thinking up this Fancy Dance move. It’s an F.

Stephen E Arnold, November 3, 2021

Data Slurping Gluttons: Guess Who, Please?

October 19, 2021

Apple’s iOS enjoys a reputation of being more respectful of users’ privacy than Google’s Android. However, announces Tom’s Guide, “New Study Reveals iPhones Aren’t as Private as You Think.” The recent paper was published by Trinity College’s School of Computer Science & Statistics. Unlike the many studies that have covered what kind of data apps collect, this research focusses on data reaped by core operating systems.

The researchers found Android does collect a higher volume of data, but iPhones collect more types of information. This includes data about other devices that could allow Apple to make a relationship graph of all devices in a local network, whether a home, office, or public space like a café. Creepy. Not only that, both operating systems collect telemetry and other data even when users explicitly opt out. Much of this collection happens when the phone is powered up. The rest occurs the whole time the device is on, even when sitting idle. Writer Paul Wagenseil specifies:

“Both the iPhone and Android phone called home to Apple and Google servers every 4 or 5 minutes while the phones were left idle and unused for several days. The phones were powered on and plugged in, but the users had not yet logged into Apple or Google accounts. Even when the iPhone user stayed logged out of their Apple account, the iPhone still sent identifying cookies to iCloud, Siri, the iTunes Store and Apple’s analytics servers while the iPhone was idle. It also sent information about nearby devices sharing the same Wi-Fi network. When location services were enabled on the iPhone, its latitude and longitude were transmitted to Apple servers. On Android, data is sent to Google Play servers every 10 to 20 minutes even when the user is not logged in. Certain Google apps also send data, including Chrome, Docs, Messaging, Search and YouTube, although only YouTube sends unique device identifiers.”
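Quick arithmetic puts the quoted intervals in perspective. The intervals come from the quote; the three-day idle period is an assumed stand-in for the study’s “several days.”

```python
MINUTES_PER_DAY = 24 * 60


def check_ins_per_day(interval_minutes):
    """Whole number of check-ins per day at a fixed interval, in minutes."""
    return MINUTES_PER_DAY // interval_minutes


# "Every 4 or 5 minutes" (both phones, idle): the slower interval gives the low end.
both_phones = (check_ins_per_day(5), check_ins_per_day(4))    # (288, 360)
# "Every 10 to 20 minutes" (Android to Google Play servers).
google_play = (check_ins_per_day(20), check_ins_per_day(10))  # (72, 144)

# Assumed idle period; the study only says "several days".
days = 3
both_phones_idle = tuple(days * n for n in both_phones)       # (864, 1080)
print(both_phones, google_play, both_phones_idle)
```

Roughly 300 calls home per day from a phone nobody is touching is the practical meaning of “every 4 or 5 minutes.”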

Unfortunately, researchers concluded, there is not much one can do to prevent this data from being harvested. The best Android users can do is to start their phone with network connections disabled. The study found disabling Google Play Services and the Google Play and YouTube apps before connecting to a network prevented the vast majority of data sharing. But then, users would have to visit other app stores to download apps, each of which has its own privacy issues. Apple users do not even have that option, as their device must connect to a network to activate.

See the article for a summary of the researchers’ process. They reached out to both companies for comment. Google responded by comparing its data collection to the statistics modern vehicles send back to manufacturers: they just want to make sure everything is working properly. Apple’s spokesperson quibbled with the researchers’ findings and insisted users’ personal data was safe and could not be traced to individuals. I suppose we will just have to take their word for it.

Cynthia Murrell, October 19, 2021

Data Confidence: The Check Is in the Mail

October 15, 2021

Why are we not surprised? SeattlePI reports, “Americans Have Little Trust in Online Security: AP-NORC Poll.” Writer Matt O’Brien reveals:

“The poll by The Associated Press-NORC Center for Public Affairs Research and MeriTalk shows that 64% of Americans say their social media activity is not very or not at all secure. About as many have the same security doubts about online information revealing their physical location. Half of Americans believe their private text conversations lack security. And they’re not just concerned. They want something done about it. Nearly three-quarters of Americans say they support establishing national standards for how companies can collect, process and share personal data.”

Few have any hope such standards will be enacted by federal officials, however. Even after years filled with private sector hacks and scandals, we’re told 56% of respondents would trust corporations to safeguard their data before they would the government. The write-up continues:

“About 71% of Americans believe that individuals’ data privacy should be treated as a national security issue, with a similar level of support among Democrats and Republicans. But only 23% are very or somewhat satisfied in the federal government’s current efforts to protect Americans’ privacy and secure their personal data online. ‘This is not a partisan issue,’ said Colorado state Rep. Terri Carver, a Republican who co-sponsored a consumer data privacy bill signed into law by Democratic Gov. Jared Polis in July. It takes effect in 2023.”

The law gives Colorado residents the right to access and delete personal information online, echoing similar legislation in Virginia and California. Predictably, Facebook and other tech companies opposed the bill.

Cynthia Murrell, October 15, 2021

TikTok: Privacy Spotlight

September 15, 2021

There is nothing like rapid EU response to privacy matters. “TikTok Faces Privacy Investigations by EU Watchdog” states:

The watchdog is looking into its processing of children’s personal data, and whether TikTok is in line with EU laws about transferring personal data to other countries, such as China.

The data-hoovering capabilities of a TikTok-type app have been known for what — a day or two or a decade? My hunch is that we are leaning toward the multi-year awareness side of the privacy fence. The write up points out:

TikTok said privacy was “our highest priority”.

Plus, about a year ago an EU-affiliated unit poked into the TikTok privacy matter.

However, the write up fails to reference a brilliant statement by a Swisher-type of thinker. My recollection is that the gist of the analysis of the TikTok privacy issue in the US was, “Hey, no big deal.”

We’ll see. I await a report on this topic. Perhaps a TikTok-indifferent journalist will make a TikTok summary of the report findings.

Stephen E Arnold, September 15, 2021

Not an Onion Report: Handwaving about Swizzled Data

August 24, 2021

At the suggestion of a friend, I read “These Data Are Not Just Excessively Similar. They Are Impossibly Similar.” At first glance, I thought the write up was a column in an Onion-type of publication. Nope, someone copied the same data set and pasted it into itself.

Here’s what the write up says:

The paper’s Excel spreadsheet of the source data indicated mathematical malfeasance.

Malfeasance. Okay.
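How does a copy-and-paste job show up in a spreadsheet? A minimal Python sketch, assuming the simplest case: pasted rows repeat earlier rows on the measured columns even if the row IDs were renumbered. Real tampering may alter the copies (for example, by adding small amounts of noise), which an exact-match check like this would miss. The column layout and function names are hypothetical.

```python
from collections import Counter


def _keys(rows, columns):
    """Project each row onto the chosen column indexes (all columns if None)."""
    if columns is None:
        return [tuple(row) for row in rows]
    return [tuple(row[i] for i in columns) for row in rows]


def duplicated_values(rows, columns=None):
    """Map each repeated value combination to how many times it occurs."""
    counts = Counter(_keys(rows, columns))
    return {key: n for key, n in counts.items() if n > 1}


def duplicate_share(rows, columns=None):
    """Fraction of rows that repeat another row on the chosen columns."""
    if not rows:
        return 0.0
    counts = Counter(_keys(rows, columns))
    extra = sum(n - 1 for n in counts.values())
    return extra / len(rows)


# Hypothetical rows: (row_id, score_before, score_after).
# Rows 4 to 6 repeat rows 1 to 3 under new IDs, mimicking a copy-paste.
rows = [
    (1, 12000, 15500),
    (2, 30250, 31900),
    (3, 8100, 9400),
    (4, 12000, 15500),
    (5, 30250, 31900),
    (6, 8100, 9400),
]
print(duplicate_share(rows))                  # 0.0: the fresh IDs hide the copies
print(duplicate_share(rows, columns=(1, 2)))  # 0.5: half the rows are repeats
```

The design choice worth noting: checking whole rows finds nothing, while checking only the measured columns exposes the paste immediately.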

But what caught my interest was the inclusion of this name: Dan Ariley. If this is the Dan Ariely who wrote the popular books on irrationality and dishonesty, that fact alone is suggestive. If it is a different person, then we are dealing with routine data dumbness or data dishonesty.


The write up contains what I call academic ducking and covering. You may enjoy this game, but I find it boring. Non-reproducible results, swizzled data, and massaged numerical recipes are the status quo.

Is there a fix? Nope, not as long as most people cannot make change or add up the cost of items in a grocery basket. Smart software depends on data. And if those data are like those referenced in this Metafilter article, well. Excitement.

Stephen E Arnold, August 24, 2021
