
September 20, 2020

One of the DarkCyber research team came across this chart on the Datawrapper Web site. Datawrapper provides millennial-ready analysis tools. With some data and the firm’s software, anyone can produce a chart like this one with green bars for negative numbers.

datawrapper chicago

What is the chart displaying? The odd green bar shows the decline in job postings. Why green? No idea. What is the source of the data? Glassdoor, a job listings site. The data apply only to Chicago, Illinois. The time period is August 2020 versus August 2019. The idea is that the longer the bar, the greater the decline. Why is the bar green? Isn’t red a more suitable color for negative numbers?

Shown in this image are the top 12 sectors for job loss. To be clear, the longer the bar, the fewer job postings. Fewer job postings, one assumes, translates to reduced opportunities for employment.

What’s interesting is that accounting, consulting, information technology, telecommunications, and computer software and hardware are big losers. Those expensive MBAs, the lost hours studying for the CPA examination, and thumb typing through man pages are gone for now.


  • The colors? Red maybe.
  • The decline in high technology work and knowledge work is interesting.
  • The “open jobs” numbers are puzzling. Despite declines, Chicago – the city of big shoulders and big challenges – has thousands of jobs in declining sectors.

Net net: IT and computer software and hardware look promising. The chart doesn’t do the opportunities justice. And the color?

Stephen E Arnold, September 20, 2020

Count Bayesie Speaks Truth

September 10, 2020

Navigate to “Why Bayesian Stats Needs More Monte Carlo Methods.” Each time I read an informed write up about the 18th century Presbyterian minister who could do some math, I think about a fellow who once aspired to be the Robert Maxwell of content management. Noble objective, is it not?

That person grew apoplectic when I explained how Autonomy in the early 1990s was making use of mathematical procedures crafted in the 18th century. I wish I had made a TikTok video of his comical attempt to explain that a human or software system should not under any circumstances inject a data point that was speculative.

Well, my little innumeric content management person, get used to Bayes. Plus there’s another method at which you can rage and bay. Yep, Monte Carlo. If you were horrified by the good Reverend’s idea, wait until you dig into Monte Carlo. Strapping these two statistical stallions to the buggy called predictive analytics is commonplace.

The write up closes poetically, which may be more in line with the fuzzy wuzzy discipline of content management:

It may be tempting to blame the complexity of the details of Bayesian methods, but it’s important to realize that when we are taught the beauty of calculus and analytical methods we are often limited to a relatively small set of problems that map well to the solutions of calc 101. When trying to solve real world problems mathematically complex problems pop up everywhere and analytical solutions either escape or fail us.

Net net: Use what matches the problem. Also, understand the methods. Key word: Understand.
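
For those who want to see the two stallions pulling together, here is a minimal sketch. It is not from the write up. It assumes a made-up coin flip problem: a uniform prior on the coin's bias, seven heads in ten flips, and a Monte Carlo estimate of the posterior mean obtained by weighting random draws from the prior by their likelihood. The function names and the numbers are illustrative assumptions. This toy case happens to have an exact answer, which makes it easy to check the simulation.

```python
import random


def posterior_mean_mc(heads, flips, draws=100_000, seed=0):
    """Monte Carlo estimate of the posterior mean of a coin's bias.

    Assumes a Uniform(0, 1) prior (an illustrative choice). Each draw
    from the prior is weighted by the binomial likelihood of the data.
    """
    rng = random.Random(seed)
    weighted_sum = 0.0
    weight_total = 0.0
    for _ in range(draws):
        theta = rng.random()  # one draw from the uniform prior
        likelihood = theta ** heads * (1.0 - theta) ** (flips - heads)
        weighted_sum += theta * likelihood
        weight_total += likelihood
    return weighted_sum / weight_total


def posterior_mean_exact(heads, flips):
    """Closed-form posterior mean for the same uniform-prior problem."""
    return (heads + 1) / (flips + 2)


if __name__ == "__main__":
    print("Monte Carlo:", posterior_mean_mc(7, 10))
    print("Exact:      ", posterior_mean_exact(7, 10))
```

The point of the exercise: when the closed form does not exist, the loop still runs. That is why the two methods end up hitched to the same buggy.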

Stephen E Arnold, September 10, 2020

Data Brokers: A Partial List

September 7, 2020

DarkCyber has fielded several inquiries in the last three months about data brokers. My response has been to point out that some data brokers are like quinoa farmers near Cusco: Small, subsistence data reselling; others are like Consolidated Foods, the industrialized outfits.

You can review a partial list of data brokers on this Github page. However, I want to point out:

  • Non US data brokers have information as well. Some of that information is particularly interesting, and it is unlikely that the average email phisher or robocall outfit will have access to these data. (No, I am not listing some of these interesting firms.)
  • There are several large data brokers not on this list. In my lectures I mention a giant data broker wanna be, but in most cases when I say “Amazon”, the response is, “My family uses Amazon a couple of times a week.” I don’t push back. I just move forward. What one does not know does not exist for some people.
  • Aggregating services with analytics plumbing are probably more important than individual chunks of data from either the quinoa farmers or from a combine. Why? With three items of data and a pool of “maybe useful” content, it is possible to generate some darned interesting outputs.

Putting the focus on a single type of digital artifact is helpful, sometimes interesting, and may be a surprise to some uninformed big time researcher. But the magic of applied analytics is where the oomph is.

Stephen E Arnold, September 7, 2020

Facial Recognition: Who Is Against Early Diagnosis of Heart Disease?

September 3, 2020

The anti-facial recognition cohort may have a new challenge on their capable hands. Facial recognition is controversial. What if analysis of a face, for instance in a selfie, can lead to an early diagnosis of heart disease? The person is alerted to visit a doctor. What if a life is saved? Is facial recognition granted a hall pass for a medical application?

I don’t want to dwell on fencing applications of pattern recognition. I would suggest that a quick look at “AI Expected to Detect Heart Disease via Selfies: Chinese Researchers” might be interesting. The write up states:

Facial appearance has long been identified as an indicator of cardiovascular risk. Features such as male pattern baldness, earlobe crease, xanthelasmata (yellowish deposit of fat around or on the eyelids) and skin wrinkling are the most common predictors.

And what about accuracy?

According to the results published in the European Heart Journal, the algorithm had a sensitivity of 80 percent and specificity of 54 percent, outperforming the traditional prediction model of coronary artery disease. Sensitivity refers to the algorithm’s ability to designate a patient with a disease as positive, while specificity is the test’s ability to designate a patient without disease as negative.
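
The two definitions reduce to simple ratios. Here is a minimal sketch, assuming a hypothetical group of 100 people with the disease and 100 without. Those counts are invented for illustration and are not from the study. They are chosen only so the ratios match the reported 80 percent and 54 percent.

```python
def sensitivity(true_positives, false_negatives):
    """Share of people with the disease whom the test flags as positive."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives, false_positives):
    """Share of people without the disease whom the test clears as negative."""
    return true_negatives / (true_negatives + false_positives)


if __name__ == "__main__":
    # Hypothetical counts, not study data: 100 sick people, 100 healthy people.
    print("Sensitivity:", sensitivity(true_positives=80, false_negatives=20))
    print("Specificity:", specificity(true_negatives=54, false_positives=46))
```

In this invented group, a specificity of 54 percent means 46 of the 100 healthy people would be told to see a doctor.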

Interesting. How will anti-FR cohorts deal with medical technology which finds its way into different government agencies? DarkCyber does not have an answer, but perhaps pattern recognition will be banned? Perhaps not, however?

Stephen E Arnold, September 3, 2020

Bringing IT Department into Analytics Decisions: Seems Reasonable

August 25, 2020

Woe to the company that implements a data analysis solution without consulting its IT department. That is the moral of the IT Brief write-up, “Extracting Insights from Data Requires More than Just a Pretty Dashboard.” A slick dashboard is nice to have, and it can offer non-technical workers the comfort of pretty graphs, projections, and generated reports. But what happens when users do not understand the data that underlies these results? Contributor Steve Singer writes:

“If you’re not sure where your data comes from, or how clean it is, you can’t trust the reports you generate from it. In some cases, if you don’t know what you have, you don’t even know how to ask the right questions. Somehow, we all have to get smarter about our approaches to all the data in our organizations and our development of the skill sets needed to capitalize on dashboard analytics. … In some businesses, decisions on dashboard purchases and deployment are made with little or no consultation with the IT department and data specialists. No one carefully considers whether the stores of data are in a suitable form or location to support the new tools. All too often they are not. Business decision-makers then find themselves disappointed when the tools fail to deliver the benefits they expected. Avoiding this scenario requires business units discuss their objectives with IT so that together they can decide on the most effective products and approaches. Data specialists must be able to assess whether tools are fit for purpose and able to be linked to the organization’s existing IT infrastructure.”

A company’s IT department is (or should be) a wealth of technical expertise at decision-makers’ fingertips. Singer offers four tips for working together to make the best choices: Begin with a clear plan that defines objectives, then decide whether infrastructure changes are needed; examine data sources and stores; establish a trust score for available data; then, and only then, select the appropriate dashboard or toolset. Though such collaboration would be a drastic change for some companies, it is well worth the effort when data projects actually produce the desired results. That beats flashy but meaningless graphs any day.

Cynthia Murrell, August 25, 2020

Predictive Analytics: A Time and a Place, Not Just in LE?

August 17, 2020

The concept seems sound: analyze data from past crimes to predict future crimes and stop them before they happen. However, in practice the reality is not so simple. That, as Popular Mechanics explains, is “Why Hundreds of Mathematicians Are Boycotting Predictive Policing.” Academic mathematicians are in a unique position: many were brought into the development of predictive policing algorithms in 2016 by The Institute for Computational and Experimental Research in Mathematics (ICERM). One of the partners, PredPol, makes and sells predictive policing tools. Reporter Courtney Linder informs us:

“Several prominent academic mathematicians want to sever ties with police departments across the U.S., according to a letter submitted to Notices of the American Mathematical Society on June 15. The letter arrived weeks after widespread protests against police brutality, and has inspired over 1,500 other researchers to join the boycott. These mathematicians are urging fellow researchers to stop all work related to predictive policing software, which broadly includes any data analytics tools that use historical data to help forecast future crime, potential offenders, and victims. … Some of the mathematicians include Cathy O’Neil, author of the popular book Weapons of Math Destruction, which outlines the very algorithmic bias that the letter rallies against. There’s also Federico Ardila, a Colombian mathematician currently teaching at San Francisco State University, who is known for his work to diversify the field of mathematics.”

Linder helpfully explains what predictive policing is and how it came about. The embedded four-minute video is a good place to start (interestingly, it is produced from a pro-predictive policing point of view). The article also details why many object to the use of this technology. Chicago’s Office of the Inspector General has issued an advisory with a list of best practices to avoid bias, while Santa Cruz has banned the software altogether. We’re told:

“The researchers take particular issue with PredPol, the high-profile company that helped put on the ICERM workshop, claiming in the letter that its technology creates racist feedback loops. In other words, they believe that the software doesn’t help to predict future crime, but instead reinforces the biases of the officers.”

Structural bias also comes into play, as well as the consideration that some crimes go underreported, skewing data. The piece wraps up by describing how widespread this technology is, an account that can be summarized by quoting PredPol’s own claim that one in 33 Americans are “protected” by its software.

With physics and other disciplines like Google online advertising based on probabilities and predictive analytics, what’s the scientific limit on real world applications? Subjective perceptions?

Cynthia Murrell, August 17, 2020

After 20 Plus Years, Whoa! Surveillance by Big Tech

August 10, 2020

DarkCyber has noted a flurry of write ups expressing surprise, rage, indignation, and blusterification at the idea of a commercial company collecting data. Hello, services are free for a basic reason: Making money. Part of making money is to have something that other companies and organizations will purchase. A good example is personal information about users of free services. The way big companies work is that there is a constant pressure to find new ways to generate money. Thus, there are data sucking apps; there are advertisements and more advertisements; there are subscriptions which lock in revenue while providing an Amazon-style “we know a lot about those who shop on Amazon” view of customers; and there are many ornaments on these methods.

I got a kick out of “Silicon Valley’s Vast Data Collection Should Worry You More Than TikTok.” We know the story well. Commercial firms in the US gather data and license it, often to marketing firms and to other organizations. After two decades of blissful ignorance a devoted band of “real” journalists is now probing the core business model of many technology centric companies.

Give me a break. We are talking decades of business processes designed to generate useful reports from flows of actions by individuals. In some countries, the government performs this task. In others, commercial enterprises do the work and license the normalized data to governments.

This passage from the write up tickled my funny bone:

And none of this is unreasonable. We should be worried about private companies and governments potentially collecting data on millions of unsuspecting people and censoring content they don’t like. But those based in China represent just a sliver of that threat.

Yep, the old “woulda, coulda, shoulda” ploy. May I remind you, gentle reader, that we are decades into the automation of data about the actions of individuals. These are the happy and often ignorant humanoids who download apps, run queries, click on videos, and send personal messages while leaving a data trail a foot deep and a mile wide.

And now the need for something?

And data collection is not a technical and economic issue. Nope. Data collection is politics; for example:

TikTok’s critics might point to the increasingly scary behavior of China’s government as to why Chinese control of information is particularly alarming. They’re right about the behavior, but they curiously ignore the fact that the United States itself is currently governed by a far-right demagogue with his own concentration camps and authoritarian repression, and that the party behind him, which aligns entirely with his politics, reliably cycles into power at least once every eight years.

What’s the fix? Well, “oppose it all.”

Where were the regulators, the users, and the competitors 20 years ago? Probably in grade school, blissfully unaware that those handheld gadgets would become more important than other activities. Okay, adult thumbtypers, your outrage is interesting. Step back, and perhaps you can see why the howls of outrage and the references to evil forms of government cast toting around a device that usually provides real time documentation of one’s actions as a bad thing.

But after 20 years, is it surprising that personal data actions are captured, analyzed, and used to provide more data “stuff” to consume? As I said, it’s been 20 years with no lessening of the processes. Complain to your parents. Maybe they dropped the ball? Commercial enterprises and governments are like beavers. And beavers do what beavers do.

Stephen E Arnold, August 10, 2020

Stratifyd: Marketing Push

August 6, 2020

Stratifyd or Taste Analytics competes in the analytics sector. The company has raised about $55 million since it opened for business in 2015. I read “Stratifyd Launches Next Generation Data Analytics Platform.” The write up confused me. The company’s Web site is clear: “Blazing fast data insights that reveal your hidden story.”


The article about Stratifyd says:

Stratifyd, a technology company that democratizes data science and artificial intelligence (AI) through self-service data analytics, today announced a revolution in data analytics with the launch of its next generation platform. This powerful analytics engine was re-designed from the ground up to be intuitive and easy-to-use, enabling business users – regardless of education, skill, or job function – to harness the power of proprietary and third-party data to easily reveal and understand hidden stories represented within the data, thus delivering the benefits of a data science team to every organization.

The write up reports this:

The Stratifyd platform now provides the functionality to meet the demanding data science needs of an organization, but is specifically designed to be easy to use for those with limited data analytics experience. It empowers users of all skill levels to connect data sources to the platform, perform in depth analysis and data modeling, and discover insightful stories faster and more easily than previously possible. Through a graphical user interface, pre-built and customizable data analytics models, and simplified dashboards, the platform enables business users to extract insights (i.e., stories) that are hidden in the data and essential in helping companies improve customer service, better understand customer requirements, deliver product enhancements that address gaps in the market, solve problems experienced by customers, rollout new product and service offerings that deliver a competitive advantage, and more.

DarkCyber’s view is that one click access to data can lead to interesting decisions even for a company with “data science in its DNA.” We also noted that, like Amazon, Stratifyd has a “flywheel.” Instead of a business model which generates new businesses by selling online products, Stratifyd’s approach is providing a “data storytelling flywheel.”

Yep, stories and lots of buzzwords.

Stephen E Arnold, August 6, 2020

Quantexa: A Better Way to Nail a Money Launderer?

July 29, 2020

We noted the Techcrunch article “Quantexa Raises $64.7M to Bring Big Data Intelligence to Risk Analysis and Investigations.” There were a number of interesting statements or factoids in the write up; for example:

Altogether, Quantexa has “thousands of users” across 70+ countries, it said, with additional large enterprises, including Standard Chartered, OFX and Dunn & Bradstreet.

We also circled in true blue marker this passage:

As an example, typically, an investigation needs to do significantly more than just track the activity of one individual or one shell company, and you need to seek out the most unlikely connections between a number of actions in order to build up an accurate picture. When you think about it, trying to identify, track, shut down and catch a large money launderer (a typical use case for Quantexa’s software) is a classic big data problem.

And lastly:

Marria [the founder] says that it has a few key differentiators from these. First is how its software works at scale: “It comes back to entity resolution that [calculations] can be done in real time and at batch,” he said. “And this is a platform, software that is easily deployed and configured at a much lower total cost of ownership. It is tech and that’s quite important in the current climate.”

Some “real time” systems require time consuming and often elaborate configuration to produce useful outputs. The buzzwords take precedence over the nuts and bolts of installing, herding data, and tuning the outputs of this type of system.
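
To make “entity resolution” less of a buzzword, here is a minimal sketch of the basic idea: decide which records refer to the same real-world entity. It is a toy under stated assumptions and not Quantexa's method. The suffix list, the normalization rule, and the sample company names are all hypothetical. A production system would need fuzzy matching, addresses, dates, and the configuration and tuning work mentioned above.

```python
import re
from collections import defaultdict

# Hypothetical list of corporate suffixes to ignore when comparing names.
SUFFIXES = {"ltd", "limited", "inc", "llc", "corp"}


def normalize(name):
    """Lowercase a name, drop punctuation, and remove corporate suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(token for token in tokens if token not in SUFFIXES)


def resolve(records):
    """Group (record_id, name) pairs whose normalized names are identical."""
    groups = defaultdict(list)
    for record_id, name in records:
        groups[normalize(name)].append(record_id)
    return dict(groups)


if __name__ == "__main__":
    # Invented sample records for illustration only.
    sample = [
        (1, "Acme Holdings Ltd."),
        (2, "ACME HOLDINGS LIMITED"),
        (3, "Acme-Holdings, Inc"),
        (4, "Bolt Trading LLC"),
    ]
    print(resolve(sample))
```

Even this toy shows where the elbow grease goes: every rule in the normalizer is a configuration decision someone has to make, test, and maintain.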

Worth monitoring how the company’s approach moves forward.

Stephen E Arnold, July 29, 2020

EU Wants Google to Promise It Will Not Use Fitbit Data to Enhance Search

July 27, 2020

We noted “Europe Wants Google to Pledge That Fitbit Data Won’t Further Enhance Search.” Let’s see what “pledge” means:

Your Dictionary says: “The definition of a pledge is something held as security on a contract, a promise, or a person who is in a trial period before joining an organization. An example of a pledge is a cash down payment on a car. An example of a pledge is a promise that you’ll buy a person’s car.”

Dictionary.com says: “A solemn promise or agreement to do or refrain from doing something: a pledge of aid; a pledge not to wage war. Something delivered as security for the payment of a debt or fulfillment of a promise, and subject to forfeiture on failure to pay or fulfill the promise.”

Wordsense.eu says: “From Middle English plege, from Anglo-Norman plege, from Old French plege (Modern French pleige) from Medieval Latin plevium, plebium, from Medieval Latin plebiō (“I pledge”), from Frankish *plegan (“to pledge; to support; to guarantee”), from Proto-Germanic *plehaną (“to care about, be concerned with”). Akin to Old High German pflegan (“to take care of, be accustomed to”), Old Saxon plegan (“to vouch for”), Old English plēon (“to risk, endanger”).”

The write up says:

EU regulators are asking Google to pledge that Fitbit information will not be used to “further enhance its search advantage.” Another demand involves letting third-parties have “equal” access to that data.

DarkCyber’s comment: Ho, ho, ho. Guarantee? Data are ingested and processed. Ho, ho, ho. No humans involved. Ho, ho, ho. It’s an artificial intelligence system. Ho, ho, ho. Let the lawyers figure it out. Ho, ho, ho. Fitbit users buy products, and Google wants to sell like Amazon. Ho, ho, ho.

Stephen E Arnold, July 27, 2020
