How Search Moves Forward

September 8, 2017

Researchers at UT Austin are certainly into search engines and are eager to build improved neural models. The piece “The Future of Search Engines” at Innovation Toronto examines two approaches, proposed by associate professor Matthew Lease, for creating more effective information retrieval systems. The article begins by describing how search engines currently generate their results:

The outcome is the result of two powerful forces in the evolution of information retrieval: artificial intelligence — especially natural language processing — and crowdsourcing. Computer algorithms interpret the relationship between the words we type and the vast number of possible web pages based on the frequency of linguistic connections in the billions of texts on which the system has been trained. But that is not the only source of information. The semantic relationships get strengthened by professional annotators who hand-tune results — and the algorithms that generate them — for topics of importance, and by web searchers (us) who, in our clicks, tell the algorithms which connections are the best ones. Despite the incredible, world-changing success of this model, it has its flaws. Search engine results are often not as ‘smart’ as we’d like them to be, lacking a true understanding of language and human logic. Beyond that, they sometimes replicate and deepen the biases embedded in our searches, rather than bringing us new information or insight.
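The click-feedback loop the quoted passage describes can be illustrated with a toy sketch. This is not any search engine's actual ranking code; the query, page names, and scoring scheme are invented for illustration. The idea is simply that each click strengthens a query-page association, so later rankings favor pages users chose:

```python
# Toy sketch of click-based relevance feedback (illustrative only).
from collections import defaultdict

scores = defaultdict(float)  # (query, page) -> learned association strength

def rank(query, pages):
    """Order pages by how strongly past clicks tie them to the query."""
    return sorted(pages, key=lambda p: scores[(query, p)], reverse=True)

def record_click(query, page, boost=1.0):
    """Each click strengthens the query-page connection."""
    scores[(query, page)] += boost

pages = ["a.example", "b.example", "c.example"]
record_click("flu symptoms", "b.example")
record_click("flu symptoms", "b.example")
record_click("flu symptoms", "a.example")
# Two clicks on b.example now outrank one click on a.example.
```

Real systems weight many more signals (position bias, dwell time, language models), but the feedback principle is the same: user clicks tell the algorithm which connections are the best ones.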

The first paper, Learning to Effectively Select Topics For Information Retrieval Test Collections (PDF), details a way to pluck and combine the best work of several annotators, professional and crowdsourced alike, for each text. The Innovation Toronto article spends more time on the second paper, Exploiting Domain Knowledge via Grouped Weight Sharing with Application to Text Categorization (PDF). The approach detailed there taps existing resources like WordNet, a lexical database for the English language, and domain ontologies like the Unified Medical Language System. See the article for the team’s suggestions on using weight sharing to blend machine learning and human knowledge.
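The grouped-weight-sharing idea can be sketched in a few lines. This is a simplified illustration, not the paper's implementation: the word groups are hypothetical stand-ins for what a resource like WordNet or the UMLS would supply, and the model is reduced to additive vectors. The point is that words a knowledge resource places in the same group share one group vector on top of their own, so a training update to the shared vector moves every member word at once:

```python
# Hedged sketch of grouped weight sharing (illustrative, not the paper's code).
import random

random.seed(0)
DIM = 4

# Hypothetical groups standing in for WordNet synsets or UMLS concepts.
groups = {
    "illness": ["flu", "cold", "fever"],
    "remedy": ["aspirin", "rest"],
}
word_to_group = {w: g for g, ws in groups.items() for w in ws}

def rand_vec():
    return [random.uniform(-0.1, 0.1) for _ in range(DIM)]

word_vecs = {w: rand_vec() for w in word_to_group}    # word-specific weights
group_vecs = {g: rand_vec() for g in groups}          # shared group weights

def embed(word):
    """Effective embedding = word-specific part + shared group part."""
    wv = word_vecs[word]
    gv = group_vecs[word_to_group[word]]
    return [a + b for a, b in zip(wv, gv)]

# One gradient step on the shared "illness" vector shifts the effective
# embeddings of "flu", "cold", and "fever" together -- this is how the
# domain knowledge encoded in the grouping constrains what is learned.
grad = [0.05] * DIM
g = word_to_group["flu"]
group_vecs[g] = [v - 0.1 * x for v, x in zip(group_vecs[g], grad)]
```

Because the shared component is learned from data rather than fixed, the model can still override the resource when the corpus disagrees with it.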

The researchers’ work was helped by grants from the National Science Foundation, the Institute of Museum and Library Services, and the Defense Advanced Research Projects Agency, three government organizations hoping for improvements in the quality of crowdsourced information. We’re reminded that, though web-search companies do perform their own research, it is necessarily focused on commercial applications and short-term solutions. The sort of public investment we see at work here can pave the way to more transformative, long-term developments, the article concludes.

Cynthia Murrell, September 8, 2017
