Averaging Information Is Not Cutting It Anymore

January 16, 2018

Here is something interesting to read after headlines like “People From Around The Globe Met For The First Flat Earth Conference” and reports that white supremacists are gaining more power.  Frontiers Media has published “Rescuing Collective Wisdom When The Average Group Opinion Is Wrong,” an article that pokes fun at the fanaticism running rampant in the news.  Beyond that fanaticism, there is a real concern about averaging in data science and other fields that rely heavily on data.

The article breaks down the different ways averaging is used and the theorems that are cited to justify it.  The introduction is a bit wordy, but it sets the tone:

The total knowledge contained within a collective supersedes the knowledge of even its most intelligent member. Yet the collective knowledge will remain inaccessible to us unless we are able to find efficient knowledge aggregation methods that produce reliable decisions based on the behavior or opinions of the collective’s members. It is often stated that simple averaging of a pool of opinions is a good and in many cases the optimal way to extract knowledge from a crowd. The method of averaging has been applied to analysis of decision-making in very different fields, such as forecasting, collective animal behavior, individual psychology, and machine learning. Two mathematical theorems, Condorcet’s theorem and Jensen’s inequality, provide a general theoretical justification for the averaging procedure. Yet the necessary conditions which guarantee the applicability of these theorems are often not met in practice. Under such circumstances, averaging can lead to suboptimal and sometimes very poor performance. Practitioners in many different fields have independently developed procedures to counteract the failures of averaging. We review such knowledge aggregation procedures and interpret the methods in the light of a statistical decision theory framework to explain when their application is justified. Our analysis indicates that in the ideal case, there should be a matching between the aggregation procedure and the nature of the knowledge distribution, correlations, and associated error costs.
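
The abstract's warning about unmet conditions can be made concrete with Condorcet's theorem. The short Python sketch below is not from the article; it assumes independent voters who each pick the right answer with the same probability, and the function name `majority_accuracy` is made up for illustration. It computes exactly how often a simple majority is correct.

```python
from math import comb


def majority_accuracy(n_voters: int, p_correct: float) -> float:
    """Probability that a strict majority of independent voters is right.

    Assumes every voter is correct with the same probability p_correct
    and that votes are independent (the textbook Condorcet setting).
    """
    if n_voters % 2 == 0:
        raise ValueError("use an odd number of voters to avoid ties")
    needed = n_voters // 2 + 1  # smallest number of correct votes that wins
    return sum(
        comb(n_voters, k) * p_correct**k * (1 - p_correct) ** (n_voters - k)
        for k in range(needed, n_voters + 1)
    )


if __name__ == "__main__":
    # Compare voters who are slightly better than chance with voters
    # who are slightly worse than chance, for growing crowd sizes.
    for p in (0.6, 0.4):
        accuracies = [round(majority_accuracy(n, p), 3) for n in (1, 11, 101)]
        print(f"p_correct={p}: {accuracies}")
```

Under these assumptions, a bigger crowd helps only when the typical voter beats a coin flip. When the typical voter is worse than chance, adding more voters makes the majority more reliably wrong, which is one simple way the averaging procedure can fail.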

Understanding how data can be corrupted is half the battle of figuring out how to correct the problem.  This is one of the complications of artificial intelligence and machine learning.  One example is building sentiment analysis engines.  These require terabytes of data, and the Internet provides an endless supply, but the usual result is that the sentiment analysis engines end up racist, misogynist, and all-around trolls.  That might lead to giggles, but it does not produce very accurate results.

Whitney Grace, January 17, 2018

