Data Transformation and the Problem of Fixes

September 24, 2009

I read “Fix Data before Warehousing It” by Marty Moseley and came away with the sense that some important information was omitted from the article. The essay was well written. My view is that the write-up should have anchored its argument in a bedrock of cost analysis.

Data shoved into a data warehouse are supposed to reduce costs. Stuffing inconsistent data into a warehouse does the opposite. My research, as well as information I have heard, suggests that data transformation (which includes normalization and the other “fixing tasks”) can consume up to one third of an information technology budget. Compliance is important. Access is important. But the cost of fixing data can be too high for many organizations. As a result, the data in the data warehouse are not clean. I prefer the word “broken” because that word makes one point explicit: the outputs from a data warehouse with broken data may be misleading or incorrect.

The ComputerWorld article is prescriptive, but it does not come right out and nail the cost issue or the lousy outputs issue. I think that these two handmaidens of broken data deserve center stage. Until the specific consequences of broken data are identified and made clear to management, prescriptions won’t resolve what is a large and growing problem. In my world, the failure of traditional warehousing systems to enforce or provide transformation and normalization tools makes it easier for a disruptive data management system to overthrow the current data warehousing world order. Traditional databases and data warehousing systems allow broken data and, even worse, permit outputs from these broken data. Poor data management practices cannot be corrected by manual methods because of the brutal costs such remediation actions incur. My opinion is that data warehousing is reaching a critical point in its history.

Automated methods combined with smart software are part of the solution. Next-generation data management systems can provide cost-cutting features so that today’s market leaders very quickly become tomorrow’s market followers. Just my opinion.

Stephen Arnold, September 24, 2009

Comments

One Response to “Data Transformation and the Problem of Fixes”

  1. Marty Moseley on October 1st, 2009 3:01 pm

    Hey Stephen –

    Your points are good ones, so thanks for weighing in!

    The topic you suggest is an entirely different article, in my opinion. Such an article would address the business case for “fixing” or preventing broken data (the costs, the risks, the benefits/value & the opportunities), whereas my article suggested a better approach for preventing broken data in the first place, and was geared towards data warehouse folks.

    Thanks for your thoughtful comments!

    Marty
