Why Analyzing Amazon EBook Reading Lists Is Useful

October 30, 2019

An intriguing study built on machine-learning models suggests that what we read may shape our language behavior more than previously thought. Neuroscience News covers the work in “What 26,000 Books Reveal When it Comes to Learning Language.” Brendan T. Johns, an assistant professor at the University at Buffalo, and Randall K. Jamieson, a professor at the University of Manitoba, created the models. The article explains:

“The models, called distributional models, serve as analogies to the human language learning process. The 26,000 books that support the analysis of this research come from 3,000 different authors (about 2,000 from the U.S. and roughly 500 from the U.K.) who used over 1.3 billion total words. George Bernard Shaw is often credited with saying Britain and America are two countries separated by a common language. But the languages are not identical, and in order to establish and represent potential cultural differences, the researchers considered where each of the 26,000 books was located in both time (when the author was born) and place (where the book was published). With that information established, the researchers analyzed data from 10 different studies involving more than 1,000 participants, using multiple psycholinguistic tasks. ‘The question this paper tries to answer is, “If we train a model with similar materials that someone in the U.K. might have read versus what someone in the U.S. might have read, will they become more like these people?”’ says Johns. ‘We found that the environment people are embedded in seems to shape their behavior.’”

The researchers have developed what they call their “selective reading hypothesis.” They report that culture-specific and time-specific collections of books represent different language environments, and that exposure to these environments produces different behaviors. Conversely, they say one could predict what kinds of material people have read based on their language behavior.
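For readers curious about what a distributional model does, here is a minimal sketch in Python. It is not the model Johns and Jamieson built. It is a toy co-occurrence counter, and the two corpora, the function names, and the window size are all invented for illustration. The idea it shows is the one described above: train the same simple model on different reading material, and the word associations it learns come out different.

```python
from collections import Counter, defaultdict
from math import sqrt


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            start = max(0, i - window)
            stop = min(len(tokens), i + window + 1)
            for j in range(start, stop):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * b[key] for key, count in a.items() if key in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Invented toy "reading lists"; real corpora would hold millions of words.
us_corpus = [
    "she ate a biscuit with gravy",
    "he ate a cookie with milk",
    "the biscuit and gravy were hot",
]
uk_corpus = [
    "she ate a biscuit with tea",
    "he ate a cookie with tea",
    "the biscuit and tea were sweet",
]

us_model = cooccurrence_vectors(us_corpus)
uk_model = cooccurrence_vectors(uk_corpus)

# How closely does each "reader" associate the two words?
us_sim = cosine(us_model["biscuit"], us_model["cookie"])
uk_sim = cosine(uk_model["biscuit"], uk_model["cookie"])
print(f"US-trained similarity: {us_sim:.3f}")
print(f"UK-trained similarity: {uk_sim:.3f}")
```

In this toy setup the model fed the invented U.K. sentences treats “biscuit” and “cookie” as closer neighbors than the model fed the U.S. sentences does, simply because the words keep similar company in one set of texts and not in the other. The real research compares model output against human performance on psycholinguistic tasks, which this sketch does not attempt.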

Informed by the results, Johns is now working to build machine-learning frameworks for education that would pinpoint information to enhance each individual’s learning. He also sees potential here to help people at risk of developing Alzheimer’s: researchers might be able to create exercises and stimuli that help such patients retain semantic associations longer, for example, or at least develop more personalized assessments. It is nice to see machine-learning language models being put to such worthwhile purposes.

Now, about that Kindle library some individuals have amassed?

Cynthia Murrell, October 30, 2019

Written by Stephen E. Arnold · Filed Under intelware, News
