The Many Ways Police Can Access User Data

January 14, 2021

We hope that by now, dear reader, you understand digital privacy is an illusion. For those curious about the relationship between big tech, personal data, and law enforcement, we suggest “How Your Digital Trails Wind Up in the Hands of the Police,” shared by Ars Technica. The article, originally published by Wired, begins by describing how police used a Google keyword warrant to track down one high-profile suspect. We’re reminded that data gathered for one ostensible purpose, like building an online profile, can be repurposed as evidence. From the smart speakers and wearable devices that record us to apps that track location and other data, users are increasingly signing away their privacy rights. Writer Sidney Fussell notes:

“The problem isn’t just any individual app, but an over-complicated, under-scrutinized system of data collection. In December, Apple began requiring developers to disclose key details about privacy policies in a ‘nutritional label’ for apps. Users ‘consent’ to most forms of data collection when they click ‘Agree’ after downloading an app, but privacy policies are notoriously incomprehensible, and people often don’t know what they’re agreeing to. An easy-to-read summary like Apple’s nutrition label is useful, but not even developers know where the data their apps collect will eventually end up.”

Amid protests over policing and racial profiling, several tech companies are reevaluating their cooperation with law enforcement. Amazon hit pause on sales of facial recognition tech to police even as it noted an increase in requests for user data by law enforcement. Google vowed to focus on better representation, education, and support for the Black community. Even so, it continues to supply police with data in response to geofence warrants. These requests are being made of Google and other firms more and more often. Fussell writes:

“As with keyword warrants, police get anonymized data on a large group of people for whom no tailored warrant has been filed. Between 2017 and 2018, Google reported a 1,500 percent increase in geofence requests. Apple, Uber, and Snapchat also have received similar requests for the data of a large group of anonymous users. … These warrants allow police to rapidly accelerate their ability to access our private information. In some cases, the way apps collect data on us turns them into surveillance tools that rival what police could collect even if they were bound to traditional warrants.”

Civil rights groups are pushing back on these practices. Meanwhile, users would do well to pause and consider before hitting “Agree.”

Cynthia Murrell, January 14, 2021

Traffic: Can a Supercomputer Make It Like Driving in 1930?

January 12, 2021

Advertisers work long and hard to find roads which are scenic and can be “managed” with the assistance of some government authorities to be perfect. The idea is that a zippy new vehicle zooms along a stretch of tidy highway (no litter or obscene slogans spray painted on billboards, please). Behind the wheel or the semi-autonomous driver seat is a happy person. Zoom, zoom, zoom. (I once knew a poet named Alex Kuo. He wrote poems about driving. I found this interesting, but I hate driving, flying, or moving anywhere outside of my underground office in rural Kentucky.)

I also read a book called Traffic: Why We Drive the Way We Do (and What It Says about Us). I recall the information about Los Angeles’ super duper traffic management computer. If my memory is working this morning, the super duper traffic computer made traffic worse. An individual with some numerical capability can figure out why. Let those chimpanzees throw darts at a list of publicly traded securities and match the furry entity with the sleek MBA. Who wins? Yeah.

I thought about the hapless people who have to deal with driving, riding trains, or whatever during the Time of Rona. Better than pre Rona, but not by much. Humans travel according to habit, the age old “work when the sun shines” adage, or because clumping is baked into our DNA.

The problem is going to be solved, at least that’s the impression I obtained from “Could a Supercomputer Help Fix L.A.’s Traffic Problems?” Now traffic in Chicago sucks, but the wizards at the Argonne National Laboratory are going to remediate LaLa Land. I learned:

The Department of Energy’s Argonne National Laboratory is leading a project to examine traffic data sets from across the Los Angeles region to develop new strategies to reduce traffic congestion.

And what will make the difference this time? A supercomputer. How is that supercomputer doing with the Covid problem? Yeah, right.

The write up adds:

Super computers at the Argonne Laboratory are able to take a year’s worth of traffic data gathered from some 11,160 sensors across southern California, as well as movement data from mobile devices, to build forecasting models. They can then be applied to simulation projects.

Who in LA has the ball?

Not the LA Department of Transportation. Any other ideas?

And how was driving in LA in 1930? Pretty awful according to comments made by my mother.

Stephen E Arnold, January 12, 2021

Soros: Just in Time 20-20 Hindsight

November 18, 2020

Here’s an interesting quote (if it is indeed accurate):

“SFM [a George Soros financial structure] made this investment [in Palantir Technologies] at a time when the negative social consequences of big data were less understood,” the firm said in a statement Tuesday. SFM would not make an investment in Palantir today.

The investment concerns Palantir Technologies. The write up about George Soros, who is 90 years young, is “Soros Regrets Early Investment in Peter Thiel’s Palantir.” It includes this statement:

Soros has sold all the shares it’s permitted to sell at this time and will keep selling, according to the statement. “SFM does not approve of Palantir’s business practices,” the firm said.

Hindsight is 20-20. Or is it?

Hindsight bias can cause memory distortion. Because the event happened like you thought it would, you go back and revise your memory of what you were thinking right before the event. You re-write history, so to speak, and revise the probability in hindsight. Going forward, you use that new, higher probability to make future decisions. When in fact, the probabilities haven’t changed at all. That leads to poor judgment.—“Innovators: Beware the Hindsight Bias”

Stephen E Arnold, November 18, 2020

Hard Data Predicts Why Songs Are Big Hits

August 26, 2020

Hollywood has a formula system to make blockbuster films and the music industry has something similar. It is harder to predict hit music than films, but Datanami believes someone finally has the answer: “Hooktheory Uses Data To Quantify What Makes Songs ‘Great’.”

Berkeley startup Hooktheory knows that many songs have similar melodies and lyrics. Hooktheory makes software and other learning materials for songwriters and musicians. With their technology, the startup wants to prove what makes music popular is quantifiable. Hooktheory started a crowdsourced database dubbed “Theorytabs” that analyses popular songs and the plan is to make it better with machine learning.

Theorytabs is a beloved project:

“The Hooktheory analysis database began as a ‘labor of love’ by Hooktheory co-founders Dave Carlton, Chris Anderson and Ryan Miyakawa, based on the idea that ‘conventional tabs and sheet music are great for showing you how to play a song, but they’re not ideal for understanding how everything fits together.’ Over time, the project snowballed into a community effort that compiled tens of thousands of Theorytabs, which Hooktheory describes as ‘similar to a guitar tab but powered by a simple yet powerful notation that stores the chord and melody information relative to the song’s key.’ ”

Theorytabs users can view popular songs from idol singers to videogame themes. They can play around with key changes, tempos, mixers, and loops, along with listening to piano versions and syncing the songs up with YouTube music videos.

Hooktheory owns over 20,000 well-formatted tabs for popular music. The startup is working with Carnegie Mellon University and New York University to take Theorytabs to the next level. The music community has welcomed Theorytabs and people are eager to learn about the data behind great music.

Whitney Grace, August 27, 2020

Yes, Elegance in Language Explains Big Data in a More Satisfying Way for Some

July 14, 2020

I was surprised and then uncomfortable with the information in a tweet thread from Abebab. The tweet explained that “Big Dick Data” is a formal academic term. Apparently this evocative and polished turn of phrase emerged from a write up by “D’Ignazio and F. Klein”.

Here’s the definition:

a formal, academic term that D’Ignazio & F. Klein have coined to denote big data projects that are characterized by masculinist, totalizing fantasies of world domination as enacted through data capture and analysis.

To prove the veracity of the verbal innovation, an image from a publication is presented; herewith a copy:


When I came upon the tweet, the item had accrued 119 likes.


  • Is the phrase a contribution to the discussion of Big Data, or is the phrase a political statement?
  • Will someone undertake a PhD dissertation on the subject, using the phrase as the title or will a business publisher crank out an instant book?
  • What mid tier consulting firm will offer an analysis of this Big Data niche and rank the participants using appropriate categories to communicate each particular method?

Outstanding, tasteful, and one more — albeit quite small — attempt to make clear that discourse is being stretched.

Above all, classy or possibly a way to wrangle a job writing one liners for a comedian looking for Big Data chuckles.

Stephen E Arnold, July 14, 2020

CFO Surprises: Making Smart Software Smarter

April 27, 2020

“The Cost of Training NLP Models” is a useful summary. However, the write up leaves out some significant costs.

The focus of the paper is to:

review the cost of training large-scale language models, and the drivers of these costs.

The cost factors discussed include:

  • The paradox of compute costs going down yet the cost of processing data goes up—a lot. The reason is that more data are needed and more data can be crunched more quickly. Zoom go the costs.
  • The unknown unknowns associated with processing the appropriate amount of data to make the models work as well as they can
  • The wide use of statistical models which have a voracious appetite for training data.

These are valid points. However, the costs of training include other factors, and these are significant as well; for example:

  1. The directs and indirects associated with creating training sets
  2. The personnel costs required to assess and define retraining and the information assembly required for that retraining
  3. The costs of normalizing training corpuses.

More research into the costs of smart software training and tuning is required.

Stephen E Arnold, April 28, 2020


Homeland Security Wants To Make the Most of Its Data

April 24, 2020

The US Department of Homeland Security gathers terabytes of data relating to national security. One of the department’s biggest quandaries is figuring out how to share that information across all law enforcement agencies. FedTech explains how Homeland Security discovered a solution in the article, “DHS’ CDM Program Focuses On Shared Services Dashboard.”

The project for sharing data comes from the Department of Homeland Security and is called the Continuous Diagnostics and Mitigation (CDM) program. The program’s dashboard gives IT leaders keener insights into cybersecurity vulnerabilities and into how their IT security compares to that of other agencies. From April 2020 to September 2020 (the end of the fiscal year), the Department of Homeland Security will pilot the dashboard. The Continuous Diagnostics and Mitigation program uses Elasticsearch to power its enterprise search, metrics, and business analytics.

Kevin Cox is the manager for the Continuous Diagnostics and Mitigation program. Cox states that the program will be expanded beyond regular law enforcement agencies:

“DHS is also focused on bringing in more agencies that were not originally participating in the CDM program, Cox tells Federal News Network. DHS needed to make sure they had asset management capabilities, awareness of the devices connected to their networks and identity and access management capabilities, according to Cox.

For 34 smaller, non-CFO Act agencies, DHS has provided them with a common shared service platform to serve as their CDM dashboard, although each small agency can see its own data individually as well, which is summarized in the larger federal dashboard.

Cox notes that this process has not been easy, and DHS benefits when it has flexibility to meet each individual agency’s cybersecurity data needs.”

One of the program’s goals is to see if the tool meets the desired requirements. Cox wants the data to be recorded, used on the dashboard, mined for insights, and shared with agencies across the dashboard. It sounds like the Continuous Diagnostics and Mitigation program is a social media platform that specializes in cybersecurity threats.

Whitney Grace, April 24, 2020

Smart Software: What Is Wrong?

April 8, 2020

We have the Google not solving death. We have the IBM Watson thing losing its parking spot at a Houston cancer center. We have a Department of Justice study reporting issues with predictive analytics. And, the supercomputers and their smart software have not delivered a solution to the coronavirus problem. Yep. What’s up?

“Data Science: Reality Doesn’t Meet Expectations” explains some of the reasons. DarkCyber recommends this write up. The article provides seven reasons why the marketing fluff generated by former art history majors for “bros” of different ilk is not delivering; to wit:

  1. People don’t know what “data science” does.
  2. Data science leadership is sorely lacking.
  3. Data science can’t always be built to specs.
  4. You’re likely the only “data person.”
  5. Your impact is tough to measure; data doesn’t always translate to value.
  6. Data & infrastructure have serious quality problems.
  7. Data work can be profoundly unethical. Moral courage required.

DarkCyber has nothing to add.

Stephen E Arnold, April 8, 2020

Big Data Gets a New Term: DarkCyber Had to Look This One Up

April 2, 2020

In our feed this morning (April 1, 2020) we skipped over the flood of news about Zoom (a Middle Kingdom inspired marvel), the virus stories (output by companies contributing their smart software to find a solution), and the trend of Amazon bashing (firing a worker who wanted to sanitize a facility, and Amazon’s organizational skills are wobbling).

What stopped our scanning eyes was “Why Your Business May Be on a Data-Driven Coddiwomple.” DarkCyber admits that one of our team wrote a story for an old school publisher which used the word “cuculus” in its title: “Google in the Enterprise 2009: The Cuculus Strategy.” A “cuculus,” as you probably know, gentle reader, is a remarkable bird, sort of a thief.

But Coddiwomple? That word means to travel in a purposeful manner toward a vague destination. Most of the YouTube train ride and the Kara and Nate trips qualify. Other examples include the aimless wandering of enterprise search vendors who travel to the lands of customer service, analytics, and business process engineering, and only occasionally return to their home base of the 50 year old desert of proprietary enterprise search.

What’s the point of “Why Your Business May Be on a Data-Driven Coddiwomple”? DarkCyber believes the main point is valid:

In practical terms the lack of clarity on the starting point can involve a lack of vision into what the specific objectives of the team are, or what human resources and skills are already in house. Meanwhile, the diverse and siloed stakeholders in a “destination” for the data-driven endeavor may all have slightly different ideas on what the result should be, leading to a divergent and fuzzy path to follow.

In DarkCyber’s lingo, these data and analytics journeys are just hand waving and money spending.

Are businesses and other entities data driven?

Ho ho ho. Most organizations are not sure what the heck is going on. The data are easy to interpret, and no fancy, little understood analytics system is needed to figure out that an iceberg has nicked the good ship Silicon Lollipop.

There are interesting uses of data and clever applications of systems and methods that are quite old.

Like the cuculus, opportunism is important. The coddiwomple is a secondary effect. The cuculus gets into a company’s nest and raises money consumers. When the money suckers are bigger, each flies to another nest and the cycle repeats.

Data driven is a metaphor for doing something even though results are often difficult to explain: Higher costs, increased complexity, and an inability to adapt to the business environment.

I support the cuculus inspired consultants. The management of the nest can enjoy the coddiwomple as they seek a satisfying place to begin again.

Stephen E Arnold, April 2, 2020

The Problem of Too Much Info

March 17, 2020

The belief is that the more information one has the better decision one can make. Is this really true? The Eurasia Review shares how too much information might be a bad thing in the article, “More Information Doesn’t Necessarily Help People Make Better Decisions.”

According to the Stevens Institute of Technology, too much knowledge causes people to make worse decisions. The finding points to a critical gap in how people assimilate new information with past knowledge and beliefs. Samantha Kleinberg, Associate Professor of Computer Science at the Stevens Institute, is studying the phenomenon, using AI and machine learning to investigate how financial advisors and healthcare professionals communicate information to their clients. She discovered:

“ ‘Being accurate is not enough for information to be useful,’ said Kleinberg. ‘It’s assumed that AI and machine learning will uncover great information, we’ll give it to people and they’ll make good decisions. However, the basic point of the paper is that there is a step missing: we need to help people build upon what they already know and understand how they will use the new information.’

For example: when doctors communicate information to patients, such as recommending blood pressure medication or explaining risk factors for diabetes, people may be thinking about the cost of medication or alternative ways to reach the same goal. ‘So, if you don’t understand all these other beliefs, it’s really hard to treat them in an effective way,’ said Kleinberg, whose work appears in the Feb. 13 issue of Cognitive Research: Principles and Implications.”

Kleinberg and her team studied the decision making processes of 4,000 participants, using scenarios that ranged from ones they would be familiar with to ones they would not. When confronted with an unusual problem, participants focused on the problem without any extra knowledge, but if they were asked to deal with a regular scenario such as healthcare or finances, their prior knowledge got in the way.

Information overload and not being able to merge old information with the new is a problem. How do you fix it? Your answer is as good as mine.

Whitney Grace, March 17, 2020
