Data Federation? Loser. Go with a Data Lake House
February 8, 2022
I have been the phrase “data lake house” or “datalake house.” I noted some bold claims about a new data lake house approach in “Managed Data Lakehouse Startup Onehouse Launches with $8M in Funding.” The write up states:
One of the flagship features of Onehouse’s lakehouse service is a technology called incremental processing. It allows companies to start analyzing their data soon after it’s generated, which is difficult when using traditional technologies.
The write up adds:
The company’s lakehouse service automatically optimizes customers’ data ingestion workflows to improve performance, the startup says. Because the service is delivered via the cloud on a fully managed basis, customers don’t have to manage the underlying infrastructure.
The idea of course is that traditional methods of handling data are [a] slow, [b] expensive, and [c] difficult to implement.
The premise is that the data lake house delivers more efficient use of data and a way to “future proof the data architected for machine learning / data science down the line.”
When I read this I thought of Vivisimo’s explanation of its federating method. IBM bought Vivisimo, and I assume that it is one of the ingredient in IBM’s secret big data sauce. MarkLogic also suggested in one presentation I sat through that its system would ingest data and the MarkLogic system (once eyed by the Google as a possible acquisition) would allow near real time access to the data. One person in the audience was affiliated with the US Library of Congress, and that individual seemed quite enthused about MarkLogic. And there are companies which facilitate data manipulation; for example, Kofax and its data connectors.
From my point of view, the challenge is that today large volumes of data are available. These data have to be moved from point A to point B. Ideally data do not require transformation. At some point in the flow, data in motion can be processed. There are firms which offer real time or near real time data analytics; for example, Trendalyze.com.
Conversion, moving, saving, and then doing something “more” with the data remain challenges. Maybe Onehouse has the answer?
Stephen E Arnold, February 8, 2022