IHS Markit Data Lake “Catalog”

July 14, 2020

One member of the DarkCyber research team spotted this product announcement from IHS, a diversified information company: “IHS Markit’s New Data Lake Delivers Over 1,000 Datasets in an Integrated Catalogued Platform.” The article states:

The cloud-based platform stores, catalogues, and governs access to structured and unstructured data. Data Lake solutions include access to over 1,000 proprietary data assets, which will be expanded over time, as well as a technology platform allowing clients to manage their own data. The IHS Markit Data Lake Catalogue offers robust search and exploration capabilities, accessed via a standardized taxonomy, across datasets from the financial services, transportation and energy sectors.

The idea is consistently organized information. Queries can run across the content to which the customer has access.

Similar services are available from other companies; for example, Oracle BlueKai.

One question which comes up is, “What exactly are the data on offer?” Another is, “How much does it cost to use the service?”

Let’s tackle the first question: Scope.

None of the aggregators make it easy to scan a list of datasets, click on an item, and get a useful synopsis of the content, content elements, number of items in the dataset, update frequency (annual, monthly, weekly, near real time), and the cost method applicable to a particular “standard” query.

A search of Bing and Google reveals the names of particular datasets; for example, Carfax. However, getting answers to the scope question can require direct interaction with the company, and some other aggregators operate in the same manner.

The second question: Cost?

The answer to the cost question is a tricky one. The data aggregators have adopted a set or a cluster of pricing scenarios. It is up to the customer to look at the disclosed data and do some figuring. In DarkCyber’s experience, the data aggregators know much more about which content, processes, functions, or operations generate the maximum profit for the vendor. The customer does not have this insight. Only by using the system, analyzing the invoices, and paying them is it possible to get a grip on costs.

DarkCyber’s view is that data marketplaces are vulnerable to disruption. With demand growing for a wide range of information, some potential customers want answers before signing a contract and laying out big bucks.

Aggregators are participants in what DarkCyber calls “professional publishing.” The key to this sector is mystery and a reluctance to spell out exact answers to important questions.

What company is poised to disrupt the data aggregation business? Is it the small-scale specialist like the firms pursued relentlessly by “real” journalists seeking a story about violations of privacy? Is it a giant company casting about for a new source of revenue and, therefore, easily overlooked? Aggregation is not exactly exciting for many people.

DarkCyber does not know. One thing seems highly likely: the professional publishing data aggregation sector will face competitive pressure in the months ahead.

Some customers may be fed up with the secrecy and lack of clarity, and entrepreneurs will spot the opportunity and move forward. Rich innovators will just buy the vendors and move in new directions.

Stephen E Arnold, July 14, 2020
