Tireless Readers Those Bots
September 18, 2020
AI bots do marvelous things such as facial recognition, document analysis, and creating false videos of world leaders singing pop songs. AI bots, however, are only as smart as their programming allows. The MIT Technology Review explores how smart AI bots can be in the article, “This Know-It-All AI Learns By Reading The Entire Web Nonstop.”
Most AI bots are good at consuming and regurgitating information, but lack the knowledge to interpret the content. If AI is going to be more integral to society, algorithms need to be smarter and also trustworthy. Diffbot is supposed to be different from its brethren, because it is designed to be factual. Diffbot reads everything on the public Internet in multiple languages and extracts as many facts as possible. It sounds like Diffbot knows how to double- and triple-check facts.
Diffbot takes the facts and transforms each one into a three-part factoid that relates the pieces of information to one another: subject, verb, object. The factoids link together to form a knowledge graph of facts. Knowledge graphs have been used for years and are the basis for the semantic web. Google implemented knowledge graphs a few years ago, but only uses them for popular search terms. Diffbot wants to make knowledge graphs for everything on the Internet. How does Diffbot read everything?
“To collect its facts, Diffbot’s AI reads the web as a human would—but much faster. Using a super-charged version of the Chrome browser, the AI views the raw pixels of a web page and uses image-recognition algorithms to categorize the page as one of 20 different types, including video, image, article, event, and discussion thread. It then identifies key elements on the page, such as headline, author, product description, or price, and uses NLP to extract facts from any text.
Every three-part factoid gets added to the knowledge graph. Diffbot extracts facts from pages written in any language, which means that it can answer queries about Katy Perry, say, using facts taken from articles in Chinese or Arabic even if they do not contain the term ‘Katy Perry.’”
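For readers who think in code, the subject-verb-object idea can be sketched in a few lines of Python. This is only a toy illustration and not Diffbot's actual implementation: the `Factoid` and `KnowledgeGraph` names are hypothetical, and the sample facts are simply restated from this post.

```python
from collections import defaultdict
from typing import NamedTuple


class Factoid(NamedTuple):
    """One three-part fact: subject, verb, object (hypothetical type)."""
    subject: str
    verb: str
    obj: str


class KnowledgeGraph:
    """Toy in-memory graph; entities are nodes, factoids are labeled edges."""

    def __init__(self):
        self._facts = set()                   # deduplicates repeated facts
        self._by_subject = defaultdict(set)   # entity -> facts where it is the subject
        self._by_object = defaultdict(set)    # entity -> facts where it is the object

    def add(self, subject, verb, obj):
        fact = Factoid(subject, verb, obj)
        self._facts.add(fact)
        self._by_subject[subject].add(fact)
        self._by_object[obj].add(fact)

    def about(self, entity):
        """Return every fact that mentions the entity, in sorted order."""
        as_subject = self._by_subject.get(entity, set())
        as_object = self._by_object.get(entity, set())
        return sorted(as_subject | as_object)

    def __len__(self):
        return len(self._facts)


# Sample facts restated from this post, for illustration only.
kg = KnowledgeGraph()
kg.add("Diffbot", "reads", "public web")
kg.add("DuckDuckGo", "uses", "Diffbot")
kg.add("Snapchat", "uses", "Diffbot")
kg.add("Snapchat", "uses", "Diffbot")  # a repeated fact is stored only once

facts_about_diffbot = kg.about("Diffbot")
```

Because each fact is indexed under both its subject and its object, asking about one entity pulls in facts that were stated about another, which is what makes the collection a graph and not just a list.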
Diffbot rebuilds its knowledge graph every four to five days and adds 100 million to 150 million entities each month. Machine learning allows Diffbot to merge old information with new. Diffbot must also add new hardware as the knowledge graph grows.
Diffbot already has customers: DuckDuckGo uses it to make Google-like boxes, Snapchat uses it to feed its news pages, Adidas and Nike use it to track counterfeit shoes, and Zola uses it to assist people making wedding lists. For the moment, Diffbot only interacts with people in code, but the plan is to make it a universal factoid question-answering system.
That sounds familiar, doesn’t it?
Whitney Grace, September 18, 2020