IHS Markit Data Lake “Catalog”

July 14, 2020

One of the DarkCyber research team spotted this product announcement from IHS Markit, a diversified information company: “IHS Markit’s New Data Lake Delivers Over 1,000 Datasets in an Integrated Catalogued Platform.” The article states:

The cloud-based platform stores, catalogues, and governs access to structured and unstructured data. Data Lake solutions include access to over 1,000 proprietary data assets, which will be expanded over time, as well as a technology platform allowing clients to manage their own data. The IHS Markit Data Lake Catalogue offers robust search and exploration capabilities, accessed via a standardized taxonomy, across datasets from the financial services, transportation and energy sectors.

The idea is consistently organized information. Queries can run across the content to which the customer has access.

Similar services are available from other companies; for example, Oracle BlueKai.

One question which comes up is, “What exactly are the data on offer?” Another is, “How much does it cost to use the service?”

Let’s tackle the first question: Scope.

None of the aggregators make it easy to scan a list of datasets, click on an item, and get a useful synopsis of the content, content elements, number of items in the dataset, update frequency (annual, monthly, weekly, near real time), and the cost method applicable to a particular “standard” query.

A search of Bing and Google reveals the names of particular datasets; for example, Carfax. However, getting answers to the scope question can require direct interaction with the company. Some aggregators operate in a similar manner.

The second question: Cost?

The answer to the cost question is a tricky one. The data aggregators have adopted a set or a cluster of pricing scenarios. It is up to the customer to look at the disclosed data and do some figuring. In DarkCyber’s experience, the data aggregators know much more than their customers about which content processes, functions, or operations generate the maximum profit for the vendor. The customer does not have this insight. Only by using the system, analyzing the invoices, and paying them is it possible to get a grip on costs.

DarkCyber’s view is that data marketplaces are vulnerable to disruption. With demand growing for a wide range of information, some potential customers want answers before signing a contract and shelling out big bucks.

Aggregators are a participant in what DarkCyber calls “professional publishing.” The key to this sector is mystery and a reluctance to spell out exact answers to important questions.

What company is poised to disrupt the data aggregation business? Is it the small-scale specialist like the firms pursued relentlessly by “real” journalists seeking a story about violations of privacy? Is it a giant company casting about for a new source of revenue and, therefore, easily overlooked? Aggregation is not exactly exciting for many people.

DarkCyber does not know. One thing seems highly likely: The professional publishing data aggregation sector will face competitive pressure in the months ahead.

Some customers may be fed up with the secrecy and lack of clarity, and entrepreneurs will spot the opportunity and move forward. Rich innovators will just buy the vendors and move in new directions.

Stephen E Arnold, July 14, 2020

The Covidization of Good Enough

July 14, 2020

I have written numerous times about the zippy young PhD with an attitude. After my talk about declining “findability,” Zippy (not his real name) spoke with me. He had one point and repeated it to me several times:

Search is good enough.

In my first lecture at the National Cyber Crime Conference on Monday, July 13, 2020, I made the point that the difficulty of locating information or getting software updates that work indicates that “good enough” has become the target.

I read “A Moment of Clarity Regarding the Raison d’Etre for the App Store.” The article, which is critical of the giant weird technology company, includes this statement:

I worry that this sort of “Who cares, it’s better than nothing” attitude has seeped into Apple itself, and explains how we wound up with barely modified iPad apps shipping as system apps on the Mac. But more than anything I worry that this exemplifies where Apple has lost its way with the App Store.

What we have is a big American company allowing another big American company to deliver “good enough” products and services.

Couple this attitude with the challenges in education and pandemics. What do you get?

Hey, a good enough mindset. The problem is that good enough isn’t.

Stephen E Arnold, July 14, 2020

Amazon: We Love the Cheery Smile, But Does It Have a Darker Meaning?

July 13, 2020

Who needs the Dark Web when one has Amazon? The Markup reveals, “Amazon’s Enforcement Failures Leave Open a Back Door to Banned Goods—Some Sold and Shipped by Amazon Itself.” Investigators at The Markup began combing the site for banned goods after a series of deaths and illnesses attributed to one counterfeit pill maker. The fake-Percocet maker, now in prison, revealed he’d bought his pill press right off Amazon. The journalists were dismayed to find nearly 100 dangerous and/or illegal items readily available on the site. All of these products are explicitly banned in Amazon’s third-party seller rules and prohibitions for the U.S. market. Reporters Annie Gilbertson and Jon Keegan write:

“The Markup filled a shopping cart with a bounty of banned items: marijuana bongs, ‘dab kits’ used to inhale cannabis concentrates, ‘crackers’ that can be used to get high on nitrous oxide, and compounds that reviews showed were used as injectable drugs. We found two pill presses and a die used to shape tablets into a Transformers logo, which is among the characters that have been found imprinted on club drugs such as ecstasy. We found listings for prohibited tools for picking locks and jimmying open car doors. And we found AR-15 gun parts and accessories that Amazon specifically bans. Almost three dozen listings for banned items were sold by third parties but available to ship from Amazon’s own warehouses. At least four were listed as ‘Amazon’s Choice.’ The phrase ‘ships from and sold by Amazon.com’ appeared beneath the buy button of five of the banned items we found, which two former employees confirmed means those products are, in fact, sold by Amazon. In addition, one of the sellers we were able to reach also confirmed it sold the items to Amazon.”

Of course, “Amazon’s Choice” labels are often assigned by algorithm, which is part of the problem. The site does have a process for finding and removing banned products, but the human reviewers cannot keep up with the onslaught of third-party uploads. The journalists found several products that evaded detection by being listed as something they are not, like the AR-15 vise block masquerading as a desk accessory, complete with paperclips and pencil erasers in the image. Other items simply avoid telltale keywords but are plain as day to anyone who views the listing. Apparently even the algorithm has a clue, because it frequently recommends items related to the product at hand. See the article for more examples.

What will Amazon do about this alarming issue? Well, if we take spokesperson Patrick Graham’s responses as a guide, the answer is it will downplay the problem. Seems about right.

Cynthia Murrell, July 13, 2020

Google and the Middle Kingdom

July 10, 2020

Remember when Google nosed into China and suggested that the country change how it approached life, business, and online? Few do. Suffice it to say that Google’s Silicon Valley inputs did produce one reaction: A small dish of day old sweet red bean dumplings. Yummy.

Flash forward to the present. “Google Shuts Down Cloud Project, Says No Plan to Offer Cloud Services in China” reports that Google

has shut down its cloud project named ‘Isolated Region’ and added that it was not weighing options to offer its cloud platform in China.

The article states:

The search engine giant, however, said that the project’s shutdown was not due to either of those two reasons and that it has not offered cloud platform services in China.

Perhaps Google became impatient waiting for China to modify its methods?

Stephen E Arnold, July 10, 2020

Huawei and Its Sci-Fi Convenience Vision

July 9, 2020

One of the DarkCyber research team spotted what looked like a content marketing, rah rah article called “Huawei’s 1+8+N Strategy Will Be a Big Success in China As It Has No Competitors.”

We talked about the article this morning and dismissed its words as less helpful than most recycled PR. The gem in the write up is this diagram which was tough to read in the original. We poked around and came across a Huawei video which you can view on the Sparrow News Web site.

Here’s a version of the 1+8+N diagram. In case you are puzzling over the small print, “sphygmomanometer” means blood pressure gizmo. The term is shorthand for “smart medical devices.”


The idea is that the smartphone is the de facto surveillance device. It provides tags for the device itself and a “phone number” for the device owner. Burner phones registered to sock puppets require extra hoops, and government authorities are going to come calling when the identity of the burner phone’s owner is determined via cross correlation of metadata.

The diagram has three parts, right? Sort of. First, the “plus” sign in the 1+8+N is Huawei itself. Think of Huawei as Ma Bell, just very cozy with the Chinese government. The “plus” means glue. The glue unites or fuses the data from the little icons.

The focal point of the strategy is the individual.

From the individual, the diagram fans out to non-phone computing devices. There are nine devices identified, but more can be added. These devices connected to an individual are all smart; that is, Internet of Things, mobile aware, surveillance centric, and related network connected products.

The 1

The “1” refers to the smartphone.

The 8

The eight refers to the smart devices an individual uses. (The smartphone is interacting with these eight devices either directly or indirectly as long as there is battery and electrical power.)

Augmented / virtual reality “glasses”


Personal computers






The connection between and among the devices is enabled by Huawei HiLink or mobile WiFi, although Bluetooth and other wireless technologies are an option.

The N

The N, like the variable in a math formula, refers to any number of ecologies. An ecology could be a person riding in a vehicle, watching a presentation displayed by a connected projector, or using a smart printer, a separate but modern smart camera, a Chinese Roomba type robot, a smart scale for weighing a mobile phone owner, a medical device connected to or embedded in an individual, a device streaming a video, a video game played on a device or online, or a digital map.

These use cases cluster; for example, mobile, smart home, physical health, entertainment, and travel. Other categories can, of course, be added.

Is 1+8+N the 21st Century E=mc^2?

Possibly. What is clear is that Huawei has done a very good job of mapping out the details of the Chinese intelligence and surveillance strategy. By extension, one can view the diagram as one that could be similar to those developed by the governments of Iran, North Korea, Russia, and a number of other nation states.

The smartphone delivers on its potential in the 1+8+N diagram, if the Huawei vision gets traction.


The 1+8+N equation has been around since 2019. Its resurfacing may have more to do with Huawei’s desire to be quite clear about what its phones and other products and services can deliver.

The company uses the phrase “full scene” instead of the American jargon “360-degree view.”

Neither phrase captures the import of data in multiple dimensions. Tracking and analyzing data through time enables a number of interesting dependent features, services, and functions.

The 1+8+N may be less about math and more about intelligence than some of the write ups about the diagram discuss.

Stephen E Arnold, July 9, 2020

Subscriptions: Spreadsheet Fever Fuels the Magazine Model

July 9, 2020

Nothing is easier. Plug in a series of four numbers, highlight the cells, and drag the little black box. Excel spits out the “projected next number.” Magic.
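For the curious, here is roughly what the little black box does. When several numbers are selected and dragged, Excel extends them along a straight best-fit line. The Python sketch below assumes that behavior and reproduces it with ordinary least squares; the function name project_next and the four sign-up counts are made up for illustration.

```python
def project_next(values, steps=1):
    """Extend a series along its least-squares straight line.

    The x positions are assumed to be 1, 2, ..., n, one per spreadsheet cell.
    """
    n = len(values)
    xs = range(1, n + 1)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    # Slope and intercept of the ordinary least-squares fit
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Evaluate the line at the next `steps` cell positions
    return [intercept + slope * (n + k) for k in range(1, steps + 1)]


# Four hypothetical monthly sign-up counts, "dragged" two more cells
print(project_next([100, 120, 150, 170], steps=2))  # → [195.0, 219.0]
```

Note what the straight line cannot know: saturation, churn, or a bored public. The projection keeps climbing as long as you keep dragging.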

Think about this. Mail out 10 million snail mail pitches for a year’s subscription to a jazzy magazine, maybe Psychology Today or something similar. Fire up the spreadsheet, plug in the estimated number of sign ups, and project how much money will flow into the coffers of the magazine publisher or the third party handling the campaign from an office in Hoboken.

Subscriptions are the “next big thing” for many businesses. Here in rural Kentucky, our single car wash sells a “subscription.” The idea is that the car wash gets upfront money, and the lucky buyer can drive in once every two weeks and get the horse and buggy hosed down. Working good? Not so much.

BMW is selling subscriptions to features like heated steering wheels. Tesla, the auto company led by Joe Rogan podcast star Elon Musk, has subscriptions on its radar too.

Twitter, the socially positive and continually uplifting information service, may be going to a subscription model, according to Bloomberg. The DarkCyber research team has long considered Twitter a very useful tool for misinformation, disinformation, and reformation. Asking “fake personas” to pay for the service may work. On the other hand, industrious individuals may find the steady stream of innovations in encrypted messaging apps a possible complement. But look at those Excel projections. Imagine 1,000,000 subscribers at $10 US a month. Wow, drag those tiny black squares. Count your bonus now.

The Quibi short form video service is subscription based. No one on the DarkCyber team has downloaded the app nor peered over someone’s shoulder while social distancing outside the general store in our small town. (It is near the vacant subscription car wash.)

According to a possibly specious, wildly incorrect, and statistically flawed report, Quibi’s subscription model is not selling like Rona N95 masks. The rock solid “real” news outfit The Verge published “Quibi Reportedly Lost 90 Percent of Early Users after Their Free Trials Expired.”

The marketing technique is the magazine playbook: get six issues free and then pay only $10 US a month. How are magazines doing these days? Yep, stunning business.

The write up recycles data from a “research firm” named Sensor Tower and reports:

Streaming service Quibi only managed to convert a little under 10 percent of its early wave of users into paying subscribers, says mobile analytics firm Sensor Tower. According to the firm’s new report on Quibi’s early growth, the short-form video platform signed up about 910,000 users in its first few days back in April. Of those users, only about 72,000 stuck around after the three-month free trial, indicating the app had about an 8 percent conversion rate.
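The Sensor Tower figures quoted above are easy to check. A few lines of Python, using only the two numbers in the excerpt:

```python
# Figures quoted from Sensor Tower via The Verge
trial_users = 910_000
paying_users = 72_000

conversion = paying_users / trial_users  # share who paid after the free trial
churn = 1 - conversion                   # share who walked away

print(f"converted {conversion:.1%}, lost {churn:.1%}")  # → converted 7.9%, lost 92.1%
```

That works out to about 7.9 percent converted and roughly 92 percent gone, which squares with both the “about 8 percent” in the report and the “90 percent” in the headline.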

Short form video content is available mostly for free. Ever hear of Funimate?

Let’s step back. Advertising online is a monopoly game with two outstanding firms managing the dice, the money, and the cute little tokens. Direct mail is more expensive. With creative, list rentals, and fulfillment house fees, figure $5 to $7 per envelope delivered by snail mail. The promo can be cheaper if you go with a single “please, subscribe” flier in a ValPak envelope. Inserts in a daily newspaper. Okay, that’s a great idea. Door knob hanging? Nope. Banner ads on the Adf.ly network. Yeah, maybe?
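Back to the envelope. Here is a rough sketch of the spreadsheet math for the 10 million piece mailing, using the $5 to $7 per envelope range. The 2 percent response rate is a hypothetical assumption, not a figure from any real campaign, and the $10 monthly price is borrowed from the magazine offer:

```python
def cost_per_subscriber(cost_per_piece, response_rate):
    """Mailing cost divided across the share of recipients who subscribe."""
    return cost_per_piece / response_rate


def months_to_break_even(cost_per_piece, response_rate, monthly_price):
    """Months of fees needed to pay back one subscriber's acquisition cost."""
    return cost_per_subscriber(cost_per_piece, response_rate) / monthly_price


PIECES = 10_000_000     # the 10 million snail mail pitches
RESPONSE_RATE = 0.02    # hypothetical 2 percent response rate
MONTHLY_PRICE = 10.0    # $10 US a month

for cost in (5.0, 7.0):  # $5 to $7 per envelope delivered
    print(
        f"${cost:.0f} per piece: campaign ${PIECES * cost:,.0f}, "
        f"${cost_per_subscriber(cost, RESPONSE_RATE):,.0f} per subscriber, "
        f"{months_to_break_even(cost, RESPONSE_RATE, MONTHLY_PRICE):.0f} months to break even"
    )
```

Under these assumptions, each paying subscriber costs $250 to $350 to acquire, or 25 to 35 months of fees before the mailing pays for itself. The sketch also assumes nobody cancels, which makes it optimistic. Quibi’s numbers suggest otherwise.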

Subscription plays are looking good when viewed through the bloodshot eyes of someone with spreadsheet fever.

Reality may be different. Even National Geographic is a nonprofit. Hey, there’s an idea for BMW, Twitter, and Quibi. When this bout of spreadsheet fever winds down, consider the benefits of becoming a nongovernmental organization: Donations, fund raisers, merchandise, and more.

Stephen E Arnold, July 9, 2020

Misunderstanding the Google Hidden URL Play

July 4, 2020

I read “Where Am I?” The write up addresses the void in the browser’s address bar. The point is that Google hides urls.

The author addresses the “problem” this way:

Based on the contents of the page, I’m clearly on a NYTimes property, but based on the address bar I’m clearly on google.com. If I click in the address bar I see https://www.google.com/amp/s/www.nytimes.com/2020/05/22/technology/google-antitrust.amp.html.

The write up points out that Google wants the user to click on the “address bar” and then try to figure out who owns the Web page displayed.

Phishing is a popular sport, and it seems that Google’s blank or modified address bar is a giant opaque lake for bad actors.

The author of the write up points out:

Google serves NYTimes’ controlled content on a Google domain.

The write up adds:

In work security trainings and guides on the Internet we are trained to look at the URL bar to help make a decision on whether to trust a site, but the Google AMP Cache requires contradictory assumptions.
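The /amp/s/ pattern in the example URL can be unpacked mechanically. Here is a minimal Python sketch, assuming (as the example suggests) that everything after /amp/s/ is the publisher’s https address with the scheme stripped. The function name amp_viewer_origin is hypothetical, and the sketch ignores other AMP URL formats.

```python
from urllib.parse import urlsplit

AMP_PREFIX = "/amp/s/"  # assumed: "s" marks an https publisher URL


def amp_viewer_origin(url):
    """Return the publisher URL wrapped in a Google AMP viewer URL, else None."""
    parts = urlsplit(url)
    if parts.hostname not in ("google.com", "www.google.com"):
        return None
    if not parts.path.startswith(AMP_PREFIX):
        return None
    # Rebuild the publisher address from the rest of the path
    return "https://" + parts.path[len(AMP_PREFIX):]


shown = "https://www.google.com/amp/s/www.nytimes.com/2020/05/22/technology/google-antitrust.amp.html"
origin = amp_viewer_origin(shown)
print("address bar host:", urlsplit(shown).hostname)   # → www.google.com
print("content host:    ", urlsplit(origin).hostname)  # → www.nytimes.com
```

That is the gap a security trainer has to explain: the address bar says google.com, but the content belongs to www.nytimes.com.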

Here’s a diagram of Google as the Internet. What’s “in” Google becomes the Internet:


Stephen E Arnold, The Google Legacy and Google Version 2, both published by Infonortics (now defunct like many publishing houses). Users, partners, advertisers, and developers only “know” what Google decides to provide. Blank urls are an overt indication of Google’s “ownership” of the “Internet.” The diagram was first created for an Arnold lecture about Google in 2003.

Several observations:

  1. Google’s apparent objective is to become the gateway to the Internet. This is a variation of its walled garden approach. What you “receive” and “see” is the Internet. Obfuscating urls is one step toward this goal.
  2. The way to “find” certain content is to buy ads. Scrubbing urls for PDFs means that if someone wants content found, there is a road. That road is Google Advertising.
  3. Confusion in a Google service is understood by the happy Googlers. The confusion increases dependence on Google to locate information.

This is what some might characterize as “just business.” DarkCyber’s view is the Google is creating opportunities for bad actors to make phishing easier than ever.

Hey, how hard is it to create a spoofed page, SEO that puppy, and display it to one of my neighbors’ bridge partners?

Easy, gentle reader. Without ethical control or meaningful guidelines, the Google is, in case you have not figured it out, the Internet.

A blank address bar is just the beginning too. Think of this control as a form of “independence.” Life is simpler when it is controlled.

Stephen E Arnold, July 4, 2020

Algolia Pricing

July 3, 2020

Years ago I listened to a wizard from Verity explain that a query should cost the user per cell. Now that struck me as a really stupid idea. Data sets were getting larger. The larger the data set, the more cells even extremely well crafted narrow queries would “touch.” In a world of real time queries and stream processing, the per cell model would be more than just interesting; it would be a deal breaker.
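To see why the per cell model breaks, assume a narrow query that filters on one column and returns another, with no index, so every row is read. The tariff of one hundredth of a cent per cell is a hypothetical number for illustration only:

```python
def per_cell_cost(rows, columns_touched, rate_per_cell):
    """Bill for one query under a per cell tariff.

    Assumes no index, so every row is read in each touched column.
    """
    return rows * columns_touched * rate_per_cell


RATE = 0.0001  # hypothetical: one hundredth of a cent per cell

# Same narrow query (filter on one column, return another) as the data grows
for rows in (10_000, 1_000_000, 100_000_000):
    print(f"{rows:>11,} rows: ${per_cell_cost(rows, 2, RATE):,.2f} per query")
```

The answer might be a single row, but the bill grows in lockstep with the size of the data set: $2 per query at ten thousand rows becomes $20,000 per query at one hundred million.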

Pricing digital anything has been difficult. In the good old days of the late 1970s and early 1980s, one paid in many different ways — within the same system. The best example of this was the AT&T/British Telecom approach to online data.

Here’s what was involved. I am 77 and working from memory:

  1. Installation, set up, or preparation fee. This was dependent on factors such as location, distance from a node, etc.
  2. Base rate; that is, what one paid simply to be connected. This could be an upfront fee or calculated on some measurement which was intentionally almost impossible to audit or verify.
  3. Service required. Today this would be called bandwidth or connect time. The definition was slippery, but it was a way for the telcos of that era to add a fee.

If a connection went to a data center housing data, then other fees would kick in; for example:

  1. Hourly fee billed fractionally for the connect time to the database
  2. Per item fee when extracting data from the database
  3. A “print” or “type” fee which applied to the format of the data extracted
  4. A “report” fee because reports required cost recovery for the pre-coded template, query time, formatting, and outputting.

There were other fees, but the most fascinating one was the “threshold fee.” The idea is that a customer paid for 60 minutes of connect time. When the 61st minute was required, the threshold was crossed, and the billing could go up, often by factors of 2X or more. No warning, of course. And the mechanism for calculating threshold fees was not disclosed to the normal customer. (After I became a contractor to Bell Communications Research, I learned that the threshold fees were determined based on “outside” or exogenous factors. In Bell Head speak this seemed to mean, “This is where we make even more money.”)
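The threshold trick is easy to model, although the exact mechanics were never disclosed. The sketch below assumes a hypothetical $1 per minute rate and a 2X multiplier, and it shows two plausible readings: only the minutes past the threshold cost more, or crossing the threshold reprices the entire session.

```python
def connect_charge(minutes, rate_per_minute, threshold=60, multiplier=2.0,
                   reprice_all=False):
    """Connect-time bill with a hypothetical threshold fee.

    reprice_all=False: only minutes past the threshold cost `multiplier` x.
    reprice_all=True:  crossing the threshold reprices every minute.
    """
    if minutes <= threshold:
        return minutes * rate_per_minute
    if reprice_all:
        return minutes * rate_per_minute * multiplier
    extra = minutes - threshold
    return threshold * rate_per_minute + extra * rate_per_minute * multiplier


for minutes in (60, 61):
    print(minutes,
          connect_charge(minutes, 1.0),
          connect_charge(minutes, 1.0, reprice_all=True))
```

At 60 minutes both readings bill $60. At 61 minutes, the marginal reading bills $62 and the repricing reading bills $122. One extra minute, double the invoice.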

To sum up, online pricing was a remarkable swamp. Little wonder that outsiders would be baffled at the online invoices generated by the online providers. Exciting, yes. Happy customers, nah. No one at the AT&T/British Telecom type outfits cared about non Bell Heads. No Young Pioneer T shirt? Ho, ho, ho. Pay your bill or we kill your account. Ho ho ho.

Algolia announced a new pricing plan. You can read about it here. The idea is to reduce confusion and be more “customer friendly.” What’s interesting to me is the string of comments on the Hacker News site. You can read these comments at this link.

There’s some back and forth with Algolia participating.

Some of the comments underscore the type of “surprise” that certain types of pricing models spark; for example, from alooPotato:

We (Streak) are in the same boat. Looks like we’d be paying approx half a million dollars a month on their new pricing which would be ~100x more than we are paying now. Haven’t heard from our enterprise rep but starting to get nervous… Sounds like the new pricing is for their ecommerce customers given how much value they provide them, doesn’t seem to make sense anymore for SaaS use cases.

ysavir takes a balanced view; that is, some good, some bad:

Not the GP, but I figure their point is as follows: If I’m running an e-commerce website, I don’t mind pay-per-search since those searches may turn into sales, so the cost is justified. My income scales with search count, and the Algolia price is part of user acquisition costs. If I’m running a SaaS business, the search is a feature for customers who have already paid, so I don’t see any further returns from the search being used. The more a client uses search, the less I’m profiting from having them as a client. They could potentially even cost me money to service them!
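ysavir’s argument is a little spreadsheet of its own. Here is a sketch in Python. The $1 per 1,000 searches price, the $50 monthly SaaS fee, the 1 percent conversion rate, and the $5 profit per sale are all hypothetical values, not Algolia’s actual rates:

```python
def search_cost(searches, price_per_1000):
    """Vendor bill for a month of searches at a per-1,000 rate."""
    return searches / 1000 * price_per_1000


def saas_margin(monthly_fee, searches, price_per_1000):
    """SaaS: revenue is a flat fee, so heavy searchers eat the margin."""
    return monthly_fee - search_cost(searches, price_per_1000)


def ecommerce_margin(searches, conversion_rate, profit_per_sale, price_per_1000):
    """E-commerce: some searches turn into sales, so revenue scales with use."""
    return searches * conversion_rate * profit_per_sale - search_cost(searches, price_per_1000)


PRICE = 1.0  # hypothetical $ per 1,000 searches
for searches in (10_000, 50_000, 100_000):
    print(searches,
          round(saas_margin(50.0, searches, PRICE), 2),
          round(ecommerce_margin(searches, 0.01, 5.0, PRICE), 2))
```

Under these made-up numbers, the SaaS customer turns unprofitable past 50,000 searches a month, while every e-commerce search still earns more than it costs.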

The point is that any pricing model — whether the AT&T/British Telecom type pricing “simplification” or a made-up, wacko approach like the IBM J1, J2, J3, etc. approach — is not going to meet the requirements of every customer.

The modern approach to pricing is to obfuscate and generate opaque variable prices. You can see this model in action by navigating to Amazon and running a query for “men’s golf shirt,” then zipping over to AWS and checking out the prices for SageMaker models to drive Athena. Got the difference, gentle reader?

The nifty world of enterprise search has been a wonderland of pricing methods. I flipped through the pricing data files for the three editions of the Enterprise Search Report which I began writing in 2002. Here are some highlights:

  • Base fee plus engineering services. Upgrades priced individually.
  • Base fee plus fixed price over a period of time.
  • Variable elements like the crazy “per cell” idea from the guy who is now the head of Google Search (Oh, yeah!)
  • Free if the customer (the US government) licensed other software
  • One time charge. Upgrades are easy. Buy another license.
  • Free. The vendor is in the business of selling engineering support, training, and custom widgets to make the search system sort of work.
  • Whatever can be billed. This is extremely popular because the negotiation process reveals the allocated funds and the search system vendor angles to get as much of the allocated cash as humanly possible.
  • Free for the first budget cycle. Then when funds become available, prices are negotiated.
  • Custom quote only. NDA required.

Today, life is easier. One can download a free and open source search system, hit the local university for some “interns”, and let ‘er rip. Another alternative is to look for a hosted search service. Blossom.com maybe?

Net net: Pricing has one goal: Generate revenue and lock in for the vendor. That’s one reason why vendors of what I call search centric services are so darned lovable.

Stephen E Arnold, July 3, 2020

Facebook Ad Boycott Risk: The Mark of El Zucko

July 2, 2020

I have a general rule: Those with power are likely to stomp on little people like me. What happens when companies that need access to Facebook users get cute with El Zucko?

Mr. Zuckerberg may not have a sword like El Zorro’s, but he has a digital cattle probe, and he can crank up the voltage.

Moral: A big advertiser better be a heck of a lot bigger than El Zucko, or the advertiser will end up with some memorable Facebook moments. Not all of these love taps with the cattle probe will be “likes.”

The trust outfit published “Facebook Frustrates Advertisers As Boycott over Hate Speech Kicks Off.” The message I carried away from the trust outfit’s “real” news story was that Facebook keeps on being Facebook.

Let’s consider the advertisers’ options:

First, advertisers can route their digital advertising to services which disseminate content on AdF.ly type networks. If you are not familiar with this fine option, check it out. If AdF.ly is a bit too avant garde, there is lovable Alphabet Google YouTube. Ads can appear in interesting contexts. Because the AGY systems are dynamic, one may not know where ads appear. Not to worry, right?

Second, advertisers can run into the arms of those lovable Amazonians. Pitching consulting services on Amazon is tricky, but it is not impossible. Options range from zippy videos for Twitch.tv consumers to teaming up with a vendor of something and packaging one’s consulting service with the tangible product as an after purchase “training” or “support” option.

Third, advertisers can hunt down the ad sales professionals at print publications. These individuals are easy to spot. Their schedules are vacant like their eyes. Well, maybe that is a haunted look related to fear. Just buy space in ever popular publications like the local newspaper. Alternatively, why not buy double truck ads in the Wall Street Journal and the New York Times? Those must work. IBM ran its “we are in a yellow submarine” ad a few days ago.

Fourth, advertisers can pay search engine optimization experts to pump their message hither and yon using every conceivable type of digital channel available. Everyone loves irrelevant content and links to big company Web sites where emails can be provided and money spent.

Fifth, hang it up. Emulate the businesses which are closing. Blame it on the pandemic, the surge, or whatever.

Net net: Facebook for the foreseeable future has considerable power. El Zucko can keep on doing what he does best; that is, whatever he wants. When he decides to raise ad rates and change the rules of his game, he will. There are ways to implement differential pricing and other types of hair shirt freebies for certain advertisers.

The mark of El Zucko may be a painful burn and a giant Z on an expanse of advertiser skin in the game.

Stephen E Arnold, July 2, 2020

Interesting Supercomputer Item: Lenovo

July 2, 2020

“Lenovo, Top of the World Chinese Supercomputer Supplier, Sweeps All Markets” contains an interesting statement:

In the Top 500 list for June 2020, China is shown with a home installed base of 228 machines, whereas 20 years ago, in 2000, the country had just two of the top 500 machines installed. In comparison, the US had 258 machines in place 20 years ago, now it has just 117 supercomputers – of which 44, or 38%, are Chinese Lenovo machines. And to further hammer home China’s success, not a single one of the country’s own huge installed base of 228 machines is an American machine – there are no Crays, no IBMs, no Dells. Plenty of American chips, but no American supplier presence.

But wait. Were Lenovo’s computer businesses once IBM units?

The answer is, “Yes. IBM sold its PC division to Lenovo in 2005 and its x86 server business in 2014.”

The question is, “What was Lenovo’s management able to do with a unit IBM deemed surplus?”

Answer: Nose into new markets.

Why? Let’s ask Watson.

Stephen E Arnold, July 2, 2020
