Synthetic Data: The Future Because Real Data Is Too Inefficient
January 28, 2022
One of the biggest problems with AI advancement is the lack of comprehensive datasets. AI algorithms use datasets to learn how to interpret and understand information. The lack of datasets has resulted in biased aka faulty algorithms. The most notorious examples are “racist” photo recognition or light sensitivity algorithms that are unable to distinguish dark skin tones. VentureBeat shares that a new niche market has sprung up: “Synthetic Data Platform Mostly AI Lands $25M.”
Mostly AI is an Austria startup that specializes in synthetic data for AI model testing and training. The company recently acquired $25 million in funding from Molten Ventures with plans to invest the funds to accelerate the industry. Mostly AI plans to hire more employees, create unbiased algorithms, and increase their presence in Europe and North America.
It is difficult for AI developers to roundup comprehensive datasets, because of privacy concerns. There is tons of data available for AI got learn from, but it might not be anonymous and it could be biased from the get go.
Mostly AI simulates real datasets by replicating the information for data value chains but removing the personal data points. The synthetic data is described as “good as the real thing” without violating privacy laws. The synthetic data algorithm works like other algorithms:
“The solution works by leveraging a state-of-the-art generative deep neural network with an in-built privacy mechanism. It learns valuable statistical patterns, structures, and variations from the original data and recreates these patterns using a population of fictional characters to give out a synthetic copy that is privacy compliant, de-biased, and just as useful as the original dataset – reflecting behaviors and patterns with up to 99% accuracy.”
Mostly AI states that their platform also accelerates the time it takes to access the datasets. They claim their technology reduces the wait time by 90%.
Demands for synthetic data are growing as the AI industry burgeons and there is a need for information to advance the technology. Efficient, acceptable error rates, objective methods: What could go wrong?
Whitney Grace, January 27, 2022