Business Intelligence: The Grunt Work? Time for a Latte
July 10, 2015
I read “One Third of BI Pros Spend Up to 90% of Time Cleaning Data.” Well, well, well. Good old and frail eWeek has reported what those involved in data work have known for what? Decades, maybe centuries? The write up states with typical feather duster verbiage:
A recent survey commissioned by data integration platform provider Xplenty indicates that nearly one-third of business intelligence (BI) professionals are little more than “data janitors,” as they spend a majority of their time cleaning raw data for analytics.
What this means is that the grunt work in analytics still has to be done. This is difficult and tedious work even with normalization tools and nifty hand-crafted scripts. Who wants to do this work? Not the MBAs who need slick charts to nail their bonus. Not the frantic marketer who has to add some juice to the pale and wan vice president’s talk at the Rotary Club. Not anyone, except those who understand the importance of scrutinizing data.
The write up points out that extract, transform, and load functions, or ETL in the jargon of Sillycon Valley, are work. Guess what? The eWeek story uses these words to explain what the grunt work entails:
- Integrating data from different platforms
- Transforming data
- Cleansing data
- Formatting data
But here’s the most important item in the article: If the report on which the article is based is correct, 21 percent of the data require special care and feeding. How’s that grab you for a task when you are pumping a terabyte of social media or intercept data a day? Right. Time for a bit of Facebook and a trip to Starbucks.
What happens if the data are not shipshape? Well, think about the fine decisions flowing from organizations which are dependent on data analytics. Why not chase down good old United Airlines and ask the outfit if anyone processed log files for the network whose failure effectively grounded all flights? Know anyone at the Office of Personnel Management? You might ask the same question.
Ignoring data or looking at outputs without going through the grunt work is little better than guessing. No, wait. Guessing would probably return better outcomes. Time for some Foosball.
Stephen E Arnold, July 10, 2015