Importance of Good Data to AI Widely Underappreciated
March 27, 2018
Reliance on AI has become embedded in our culture, even as we struggle with issues of algorithmic bias and data-driven discrimination. Tech news site CIO reminds us, “AI’s Biggest Risk Factor: Data Gone Wrong.” In that detailed article, journalist Maria Korolov opens with some early examples of “AI gone bad” and explains how it happens: hard-to-access data, biases lurking within training sets, and faked data are all concerns. So is assembling an effective team of data management workers who know what they are doing. Regarding the importance of good data, Korolov writes:
Ninety percent of AI is data logistics, says JJ Guy, CTO at Jask, an AI-based cybersecurity startup. All the major AI advances have been fueled by advances in data sets, he says. ‘The algorithms are easy and interesting, because they are clean, simple and discrete problems,’ he says. ‘Collecting, classifying and labeling datasets used to train the algorithms is the grunt work that’s difficult — especially datasets comprehensive enough to reflect the real world.’… However, companies often don’t realize the importance of good data until they have already started their AI projects. ‘Most organizations simply don’t recognize this as a problem,’ says Michele Goetz, an analyst at Forrester Research. ‘When asked about challenges expected with AI, having well curated collections of data for training AI was at the bottom of the list.’ According to a survey conducted by Forrester last year, only 17 percent of respondents say that their biggest challenge was that they didn’t ‘have a well-curated collection of data to train an AI system.’
Eliminating bias gleaned from training sets (like one AI’s conclusion that anyone who’s cooking must be a woman) is tricky, but certain measures can help. For example, tools that trace how an algorithm reached a particular conclusion let developers correct its mistaken assumptions, and independent auditors bring in a fresh perspective. These delicate concerns are part of why, Korolov says, AI companies are “taking it slow.” This is slow? We’d better hang on to our hats when they decide they’ve gotten a handle on these issues.
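To make the training-set bias point concrete, here is a minimal, hypothetical sketch (not from Korolov’s article) of the kind of sanity check a data team might run before training: counting how often an activity label co-occurs with a gender label and flagging lopsided pairings like the cooking example above. The data, function name, and 70 percent threshold are invented for illustration.

```python
from collections import Counter

# Hypothetical training labels: each record pairs an activity with the
# annotated gender of the person shown (toy data, for illustration only).
training_labels = [
    ("cooking", "woman"), ("cooking", "woman"), ("cooking", "man"),
    ("cooking", "woman"), ("repairing", "man"), ("repairing", "man"),
    ("repairing", "woman"), ("cooking", "woman"),
]

def cooccurrence_shares(labels, activity):
    """Return the share of each gender among examples of one activity."""
    counts = Counter(gender for act, gender in labels if act == activity)
    total = sum(counts.values())
    return {gender: n / total for gender, n in counts.items()}

# Flag activities where one group dominates the training examples --
# a model trained on this data may simply learn the correlation.
for activity in {act for act, _ in training_labels}:
    shares = cooccurrence_shares(training_labels, activity)
    if max(shares.values()) > 0.7:
        print(f"'{activity}' looks skewed: {shares}")
```

A flag like this does not fix the bias, but it tells the team which labels need more balanced examples before the model learns the shortcut.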
Cynthia Murrell, March 27, 2018