China: A Digital Currency Forecast

September 27, 2020

DarkCyber noted “One Day Everyone Will Use China’s Digital Currency.” If you have read Beyond Search/Dark Cyber before, you may know that words like “all,” “every,” and similar categorical affirmatives are irritants. We live in an era of “black swans” and words like “never” are tough to accept as characterizing the present datasphere. Nevertheless, we have an “everyone” from the Beeb.

The main idea is that Chinese digital currency will become the big dog. Hasta la vista, cash dollars. The Delphic statement comes from Chandler Guo, a “pioneer in cryptocurrency.” The Chinese DCEP is coming. DCEP stands for Digital Currency Electronic Payment, and it seems destined to become the way to pay.

The write up notes:

But many question whether it will succeed and there are concerns that it will be used by Beijing to spy on citizens.

And there is the Chinese spy thing.

The article includes an anonymous source, a now standard journalistic convention:

“The Chinese government believes that if some other countries can also use the Chinese currency it can break the United States’ monetary sovereignty. The United States has built the current global financial system and the instruments,” says an anonymous Chinese crypto currency observer known as Bitfool.

Are Guo and Bitfool correct? Sure, why not. It is 2020, the Year of the Black Swan.

Stephen E Arnold, September 27, 2020

Automated Databases: Data Sets for Sale

September 25, 2020

The word “scrape” refers to software. The code copies data from a Web site. The data are converted to a standard format. These data are gathered in a database. The key point is that the data are accessible on a public Web site, although some scraping processes can log on and enter passwords. The result of “scraping” is the digital equivalent of an old-fashioned researcher’s note cards.

Scrapehunt provides scraping as a service. The magic of Scrapehunt’s service is that you can purchase a set of scraped data created by the company. You can also use the service to generate a custom database. One example is the firm’s news database, which can cost a couple of hundred dollars, or less if there is a sale underway. The data can be used to provide grist for a machine learning mill. For more information, navigate to Scrapehunt’s Web site.
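The copy, normalize, and store sequence described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Scrapehunt’s code: the sample HTML, the `headline` CSS class, and the `headlines` table are hypothetical, and the page is supplied as a string rather than fetched from a live Web site.

```python
# Minimal scraping sketch: parse HTML, normalize records, store them in a database.
# SAMPLE_PAGE is a hypothetical stand-in for a downloaded page; a real scraper
# would fetch it over the network and respect the site's terms of use.
import sqlite3
from html.parser import HTMLParser

SAMPLE_PAGE = """
<html><body>
  <a class="headline" href="/story/1">Data brokers expand</a>
  <a class="nav" href="/about">About</a>
  <a class="headline" href="/story/2">  New scraping service launches </a>
</body></html>
"""


class HeadlineParser(HTMLParser):
    """Collects (title, href) pairs from <a class="headline"> elements."""

    def __init__(self):
        super().__init__()
        self.records = []
        self._current_href = None  # set only while inside a headline link
        self._text_parts = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a" and attributes.get("class") == "headline":
            self._current_href = attributes.get("href", "")
            self._text_parts = []

    def handle_data(self, data):
        if self._current_href is not None:
            self._text_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current_href is not None:
            # Collapse whitespace so every record has the same clean format.
            title = " ".join("".join(self._text_parts).split())
            self.records.append((title, self._current_href))
            self._current_href = None


def scrape_to_db(html, conn):
    """Parse the page and load the normalized rows into a 'headlines' table."""
    parser = HeadlineParser()
    parser.feed(html)
    parser.close()
    conn.execute("CREATE TABLE IF NOT EXISTS headlines (title TEXT, url TEXT)")
    conn.executemany("INSERT INTO headlines VALUES (?, ?)", parser.records)
    conn.commit()
    return len(parser.records)


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    print(scrape_to_db(SAMPLE_PAGE, db))
    for row in db.execute("SELECT title, url FROM headlines"):
        print(row)
```

The parser skips the navigation link and trims stray whitespace, which is the “standard format” step; the rows in the database are the digital note cards.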

Stephen E Arnold, September 25, 2020

Data Brokers: A Partial List

September 7, 2020

DarkCyber has fielded several inquiries in the last three months about data brokers. My response has been to point out that some data brokers are like quinoa farmers near Cusco: Small, subsistence data reselling; others are like Consolidated Foods, the industrialized outfits.

You can review a partial list of data brokers on this GitHub page. However, I want to point out:

  • Non-US data brokers have information as well. Some of that information is particularly interesting, and it is unlikely that the average email phisher or robocall outfit will have access to these data. (No, I am not listing some of these interesting firms.)
  • There are several large data brokers not on this list. In my lectures I mention a giant data broker wannabe, but in most cases when I say “Amazon”, the response is, “My family uses Amazon a couple of times a week.” I don’t push back. I just move forward. What one does not know does not exist for some people.
  • Aggregating services with analytics plumbing are probably more important than individual chunks of data from either the quinoa farmers or from a combine. Why? With three items of data and a pool of “maybe useful” content, it is possible to generate some darned interesting outputs.

Putting the focus on a single type of digital artifact is helpful, sometimes interesting, and may be a surprise to some uninformed big time researcher. But the magic of applied analytics is where the oomph is.

Stephen E Arnold, September 7, 2020

Another Data Marketplace: Amazon, Microsoft, Oracle, or Other Provider for This Construct?

August 31, 2020

The European Union is making a sharp U-turn on data privacy, we learn from MIT Technology Review’s article, “The EU Is Launching a Market for Personal Data. Here’s What That Means for Privacy.” The EU has historically protected its citizens’ online privacy with vigor, fighting tooth and nail against the commercial exploitation of private information. As of February, though, the European Commission has decided on a completely different data strategy (PDF). Reporter Anna Artyushina writes:

The Trusts Project, the first initiative put forth by the new EU policies, will be implemented by 2022. With a €7 million [8.3 million USD] budget, it will set up a pan-European pool of personal and nonpersonal information that should become a one-stop shop for businesses and governments looking to access citizens’ information. Global technology companies will not be allowed to store or move Europeans’ data. Instead, they will be required to access it via the trusts. Citizens will collect ‘data dividends,’ which haven’t been clearly defined but could include monetary or nonmonetary payments from companies that use their personal data. With the EU’s roughly 500 million citizens poised to become data sources, the trusts will create the world’s largest data market. For citizens, this means the data created by them and about them will be held in public servers and managed by data trusts. The European Commission envisions the trusts as a way to help European businesses and governments reuse and extract value from the massive amounts of data produced across the region, and to help European citizens benefit from their information.

It seems shifty that they have yet to determine just how citizens will benefit from this data exploitation, I mean, value-extraction. There is no guarantee people will have any control over their information, and there is currently no way to opt out. This change is likely to ripple around the world, as the way the EU approaches data regulation has long served as an example to other countries.

The concept of data trusts has been around since 2018, when Sir Tim Berners-Lee proposed it. Such a trust could be for-profit, for a charitable cause, or simply for data storage and protection. As Artyushina notes, whether this particular trust actually protects citizens depends on the wording of its charter and the composition of its board of directors. See the article for examples of other trusts gone wrong, as well as possible solutions. Let us hope this project is set up and managed in a way that puts citizens first.

Cynthia Murrell, August 31, 2020

Forget Structured Query Language Commands? Yeah, Not Yet

August 29, 2020

One of the DarkCyber team spotted a demonstration service. The idea is that the system will accept natural language queries of information stored in structured databases. According to the DarkCyber person, the queries launched into the natural language box were:

Sheva War with Whom

Sheva Frequency

The sparse interface sports a Content button which displays the information in the system.

How did this work?


Not well. NLP systems still pose challenges, it seems.

Interesting idea but some rough edges need a bit of touch up.

Stephen E Arnold, August 29, 2020

Amazon and Toyota: Tacoma Connects to AWS

August 20, 2020

This is just a very minor story. For most people, the information reported in “Toyota, Amazon Web Services Partner On Cloud-Connected Vehicle Data” will be irrelevant. The value of the data collected by the respective firms and their partners is trivial and will not have much impact. Furthermore, any data processed within Amazon’s streaming data marketplace and made available to some of the firm’s customers will be of questionable value. That’s why I am not immediately updating my Amazon reports to include the Toyota and insurance connection.

Now to the minor announcement:

Toyota will use AWS’ services to process and analyze data “to help Toyota engineers develop, deploy, and manage the next generation of data-driven mobility services for driver and passenger safety, security, comfort, and convenience in Toyota’s cloud-connected vehicles. The MSPF and its application programming interfaces (API) will enable Toyota to use connected vehicle data to improve vehicle design and development, as well as offer new services such as rideshare, full-service lease, proactive vehicle maintenance notifications and driving behavior-based insurance.”

Are there possible implications from this link up? Sure, but few people care about Amazon’s commercial, financial, and governmental services. So why think about issues like these:

  • Value of the data to the AWS streaming data marketplace
  • Link analytics related to high risk individuals or fleet owners
  • Significance of the real time data to predictive analytics, maybe to insurance carriers and others?

Nope, not much of a big deal at all. Who cares? Just mash that Buy Now button and move on. Curious about how Amazon ensures data integrity in such a system? If you are, you can purchase our 50-page report about Amazon’s advanced data security services. Just write darkcyber333 at yandex dot com.

But I know firsthand, after two years of commentary, that shopping is more fun than thinking about Amazon examined from a different viewshed.

Stephen E Arnold, August 20, 2020

Instagram: What Does Suspicious Mean at This Facebook Outfit?

August 19, 2020

DarkCyber noted what could be construed as a baby step toward adulting or a much bigger step toward Facebook obtaining more fine-grained information. “Instagram Will Make Suspicious Accounts Verify Their Identity” states:

Instagram is taking new steps to root out bots and other accounts trying to manipulate its platform. The company says it will start asking some users to verify their identities if it suspects “potential inauthentic behavior.” Instagram stresses that the new policy won’t affect most users, but that it will target accounts that seem suspicious.

It seems that “inauthentic” means “suspicious.” Okay, what is that exactly? The write up quotes an Instagram something as saying:

This includes accounts potentially engaged in coordinated inauthentic behavior, or when we see the majority of someone’s followers are in a different country to their location, or if we find signs of automation, such as bot accounts.

What addresses inauthenticity? How about this?

Under the new rules, these accounts will be asked to verify their identity by submitting a government ID. If they don’t, the company may down-rank their posts in Instagram’s feed or disable their account entirely.

Whether a moment of adulting or a data grab, the Facebook continues to be Facebook.

Stephen E Arnold, August 19, 2020

Data Federation: K2View Seizes Lance, Mounts Horse, and Sallies Forth

August 13, 2020

DarkCyber noted “K2View Raises $28 million to Automate Enterprise Data Unification.”

Here’s the write up’s explanation of K2View:

K2View’s “micro-database” Fabric technology connects virtually to sources (e.g., internet of things devices, big data warehouses and data lakes, web services, and cloud apps) to organize data around segments like customers, stores, transactions, and products while storing it in secure servers and exposing it to devices, apps, and services. A graphical interface and auto-discovery feature facilitate the creation of two-way connections between app data sources and databases via microservices, or loosely coupled software systems. K2View says it leverages in-memory technology to perform transformations and continually keep target databases up to date.

The write up contains a block diagram. DarkCyber has a few concerns:



  1. It is difficult to determine how much manual (human) work will be required to deal with content objects not recognized by the K2View system.
  2. What happens if the Internet connection to a data source goes down?
  3. What is the fall back when a microservice is not available or removed from service?
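One common answer to questions 2 and 3 is to fall back to the last good copy of the data and flag it as stale. The Python sketch below illustrates that pattern for a per-customer view of the sort the write up describes. It is a hypothetical illustration, not K2View’s implementation: the source functions, the cache layout, and the “stale” and “missing” flags are assumptions.

```python
# Hypothetical sketch of a per-customer view assembled from several sources,
# with a fallback to the last cached copy when a source is unreachable.
# Source names, record layouts, and the stale/missing flags are illustrative
# assumptions; this is not K2View's design or API.
from typing import Callable, Dict

Source = Callable[[str], dict]


def build_customer_view(customer_id: str,
                        sources: Dict[str, Source],
                        cache: Dict[str, Dict[str, dict]]) -> dict:
    """Merge each source's data for one customer into a single record.

    On ConnectionError, reuse the cached copy for that source if one exists
    and list the source under "stale"; otherwise list it under "missing".
    """
    view = {"customer_id": customer_id, "stale": [], "missing": []}
    customer_cache = cache.setdefault(customer_id, {})
    for name, fetch in sources.items():
        try:
            data = fetch(customer_id)
            customer_cache[name] = data  # refresh the cache on success
        except ConnectionError:
            if name not in customer_cache:
                view["missing"].append(name)  # nothing to fall back to
                continue
            data = customer_cache[name]
            view["stale"].append(name)
        view[name] = data
    return view


def billing_source(customer_id: str) -> dict:
    # Stand-in for a reachable system.
    return {"balance": 42.0}


def crm_source(customer_id: str) -> dict:
    # Stand-in for a system whose Internet connection has gone down.
    raise ConnectionError("CRM unreachable")


def orders_source(customer_id: str) -> dict:
    # Stand-in for a microservice removed from service, with no cached copy.
    raise ConnectionError("orders service removed")


if __name__ == "__main__":
    cache = {"c1": {"crm": {"name": "A. Customer"}}}
    sources = {"billing": billing_source, "crm": crm_source, "orders": orders_source}
    print(build_customer_view("c1", sources, cache))
```

The sketch does not answer question 1; content the system cannot recognize still needs a human, or at least a rule someone writes by hand.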

Many organizations offer solutions to disparate types of data scattered across many systems. Perhaps K2View will slay the digital windmills of silos, different types of data, and unstable connections? Silos have been part of the data landscape as long as Don Quixote has been spearing windmills.

Stephen E Arnold, August 13, 2020

After 20 Plus Years, Whoa! Surveillance by Big Tech

August 10, 2020

DarkCyber has noted a flurry of write ups expressing surprise, rage, indignation, and blusterification at the idea of a commercial company collecting data. Hello, services are free for a basic reason: Making money. Part of making money is to have something that other companies and organizations will purchase. A good example is personal information about users of free services. The way big companies work is that there is constant pressure to find new ways to generate money. Thus, there are data sucking apps; there are advertisements and more advertisements; there are subscriptions which, Amazon style, lock in revenue and reveal a lot about those who shop; and there are many ornaments on these methods.

I got a kick out of “Silicon Valley’s Vast Data Collection Should Worry You More Than TikTok.” We know the story well. Commercial firms in the US gather data and license it, often to marketing firms and to other organizations. After two decades of blissful ignorance, a devoted band of “real” journalists are now probing the core business model of many technology centric companies.

Give me a break. We are talking decades of business processes designed to generate useful reports from flows of actions by individuals. In some countries, the government performs this task. In others, commercial enterprises do the work and license the normalized data to governments.

This passage from the write up tickled my funny bone:

And none of this is unreasonable. We should be worried about private companies and governments potentially collecting data on millions of unsuspecting people and censoring content they don’t like. But those based in China represent just a sliver of that threat.

Yep, the old “woulda, coulda, shoulda” ploy. May I remind you, gentle reader, that we are decades into the automation of data about the actions of individuals. These are the happy and often ignorant humanoids who download apps, run queries, click on videos, and send personal messages while leaving a data trail a foot deep and a mile wide.

And now the need for something?

And data collection is not a technical and economic issue. Nope. Data collection is politics; for example:

TikTok’s critics might point to the increasingly scary behavior of China’s government as to why Chinese control of information is particularly alarming. They’re right about the behavior, but they curiously ignore the fact that the United States itself is currently governed by a far-right demagogue with his own concentration camps and authoritarian repression, and that the party behind him, which aligns entirely with his politics, reliably cycles into power at least once every eight years.

What’s the fix? Well, “oppose it all.”

Where were the regulators, the users, and the competitors 20 years ago? Probably in grade school, blissfully unaware that those handheld gadgets would become more important than other activities. Okay, adult thumbtypers, your outrage is interesting. Step back, and perhaps you can see that the howls of outrage, the references to evil forms of government, and the horror at toting around a device that usually provides real time documentation of one’s actions arrive about two decades late.

But after 20 years, is it surprising that data about personal actions are captured, analyzed, and used to provide more data “stuff” to consume? As I said, it’s been 20 years with no lessening of the processes. Complain to your parents. Maybe they dropped the ball? Commercial enterprises and governments are like beavers. And beavers do what beavers do.

Stephen E Arnold, August 10, 2020

Quantexa: A Better Way to Nail a Money Launderer?

July 29, 2020

We noted the TechCrunch article “Quantexa Raises $64.7M to Bring Big Data Intelligence to Risk Analysis and Investigations.” There were a number of interesting statements or factoids in the write up; for example:

Altogether, Quantexa has “thousands of users” across 70+ countries, it said, with additional large enterprises, including Standard Chartered, OFX and Dun & Bradstreet.

We also circled in true blue marker this passage:

As an example, typically, an investigation needs to do significantly more than just track the activity of one individual or one shell company, and you need to seek out the most unlikely connections between a number of actions in order to build up an accurate picture. When you think about it, trying to identify, track, shut down and catch a large money launderer (a typical use case for Quantexa’s software) is a classic big data problem.

And lastly:

Marria [the founder] says that it has a few key differentiators from these. First is how its software works at scale: “It comes back to entity resolution that [calculations] can be done in real time and at batch,” he said. “And this is a platform, software that is easily deployed and configured at a much lower total cost of ownership. It is tech and that’s quite important in the current climate.”

Some “real time” systems require time-consuming and often elaborate configuration to produce useful outputs. The buzzwords take precedence over the nuts and bolts of installing, herding data, and tuning the outputs of this type of system.
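Marria’s reference to entity resolution, and the earlier point about linking one individual to several shell companies, can be illustrated with a small Python sketch. This is a generic union-find approach, not Quantexa’s software: the record fields (`phone`, `address`), the normalization rule, and the sample records are hypothetical, and production systems add fuzzy matching, scoring, and a good deal of the tuning mentioned above.

```python
# Minimal entity-resolution sketch: records sharing any normalized identifier
# (here, hypothetical "phone" and "address" fields) are merged into one entity
# using a union-find (disjoint set) structure. Illustration only.
from collections import defaultdict

SAMPLE_RECORDS = [
    {"id": "acct-1", "phone": "+44 20 7946 0000", "address": "1 High St"},
    {"id": "shell-A", "phone": "442079460000", "address": "PO Box 9"},
    {"id": "shell-B", "phone": "", "address": "P.O. Box 9"},
    {"id": "acct-2", "phone": "555 0100", "address": "2 Low Rd"},
]


def normalize(value: str) -> str:
    """Lowercase and keep only letters and digits so formatting differences vanish."""
    return "".join(ch for ch in value.lower() if ch.isalnum())


def resolve_entities(records, keys=("phone", "address")):
    """Return sorted groups of record ids that resolve to the same entity."""
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_seen = {}  # (field, normalized value) -> first record index with it
    for idx, record in enumerate(records):
        for key in keys:
            raw = record.get(key)
            if not raw:
                continue  # empty fields link nothing
            token = (key, normalize(raw))
            if token in first_seen:
                union(first_seen[token], idx)
            else:
                first_seen[token] = idx

    groups = defaultdict(list)
    for idx in range(len(records)):
        groups[find(idx)].append(records[idx]["id"])
    return sorted(groups.values())


if __name__ == "__main__":
    for entity in resolve_entities(SAMPLE_RECORDS):
        print(entity)
```

In the sample, `acct-1` shares a phone number with `shell-A`, which shares a post office box with `shell-B`, so all three collapse into one entity even though `acct-1` and `shell-B` have nothing in common directly. This version runs in batch; a real time variant would keep the `first_seen` index and the union-find state between calls and fold in records as they arrive.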

It will be worth monitoring how the company’s approach moves forward.

Stephen E Arnold, July 29, 2020
