Instagram Algorithm to Recognize Cruelty and Kindness
September 14, 2017
Instagram is using machine learning to make its platform a kinder place, we learn from the CBS News article, “How Instagram is Filtering Out Hate.” Contributor (and Wired Editor-In-Chief) Nick Thompson interviewed Instagram’s CEO Kevin Systrom, and learned the company is using about 20 humans to teach its algorithm to distinguish naughty from nice. The article relates:
Systrom has made it his mission to make kindness itself the theme of Instagram through two new phases: first, eliminating toxic comments, a feature that launched this summer; and second, elevating nice comments, which will roll out later this year. ‘Our unique situation in the world is that we have this giant community that wants to express themselves,’ Systrom said. ‘Can we have an environment where they feel comfortable to do that?’ Thompson told ‘CBS This Morning’ that the process of ‘machine learning’ involves teaching the program how to decide what comments are mean or ‘toxic’ by feeding in thousands of comments and then rating them.
It is smarter censorship, if you will. Systrom seems comfortable embracing a little censorship in favor of kindness, and we sympathize; "trolls" are a real problem, after all. Still, the technology could, theoretically, be used to delete or elevate certain ideological or political content. The line between censoring and not censoring is a fine but important one, and those who manage social media sites are the ones who must walk it. No pressure.
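For readers curious what "feeding in thousands of comments and then rating them" looks like in practice, here is a minimal sketch of supervised comment classification. To be clear, this is only an illustration of the general technique, not Instagram's actual system, and the sample comments and labels are invented:

```python
# Minimal sketch of supervised "toxic vs. nice" comment classification,
# in the spirit of the training process Systrom describes. NOT Instagram's
# actual system; the comments and labels below are invented for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Human raters label example comments (1 = toxic, 0 = nice).
comments = ["you are an idiot", "love this photo",
            "nobody likes you", "great shot, congrats"]
labels = [1, 0, 1, 0]

# Turn comments into word-frequency features and fit a classifier.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(comments, labels)

# Score a new comment; a high probability suggests it should be filtered.
print(model.predict_proba(["what a lovely picture"])[0][1])
```

With thousands of rated examples instead of four, the same basic loop (humans rate, the model learns, new comments get scored) is what lets the filter generalize to comments it has never seen.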
Cynthia Murrell, September 14, 2017
An Algorithm with Unintended Consequences
September 12, 2017
Some of us who follow developments in AI wondered about this: apparently, the algorithm YouTube tasked with eliminating “extremist content” on its platform goes too far. Business Insider reports, “YouTube’s Crackdown on Extremist Content and ISIS Is Also Hurting Researchers and Journalists.” It is a good thing there now exist commercial services that can meet the needs of analysts, researchers, and government officials; many of these services are listed in Stephen E Arnold’s Dark Web Notebook.
In this case, the problem is an algorithm that cannot always distinguish between terrorist propaganda and coverage of terrorism. Since the site implemented its new steps to combat terrorist content, several legitimate researchers and journalists have protested that their content was caught in the algorithm’s proverbial net and summarily removed; some of it had been available on the site for years. Reporter Rob Price writes:
Open-source researcher Eliot Higgins says he has had his old videos about Syria deleted and his account was suspended as the Google-owned video platform attempts to tackle material that supports terrorism. Middle East Eye reports that Syrian opposition news site Orient News was also deleted, as was a video uploaded by one of the publication’s own journalists. ‘YouTube has now suspended my account because of videos of Syria I uploaded 2-3 years ago. Nice anti-ISIS AI you’ve got there, YouTube,’ Higgins tweeted on Saturday. ‘Ironically, by deleting years-old opposition channels YouTube is doing more damage to Syrian history than ISIS could ever hope to achieve.’ In another incident, a video from American journalist Alexa O’Brien that was used in Chelsea Manning’s trial was deleted, according to Middle East Eye.
Higgins, whose account has since been reinstated, has an excellent point—ultimately, tools that destroy important documentation along with propaganda are counter-productive. Yes, algorithms are faster (and cheaper) than human workers. But do we really want to sacrifice first-hand footage of crucial events for the sake of speedy sanitization? There must be a better way.
Cynthia Murrell, September 12, 2017
My Feed Personalization a Step Too Far
September 8, 2017
In an effort to be even more user-friendly (and to further encourage a narcissistic society), Google now allows individuals to ‘follow’ or ‘unfollow’ topics, delivered daily to their devices, as they deem them interesting or uninteresting. SEJ explains the new feature, an enhancement of Google’s ‘my feed’ intended to personalize the news.
As explained in the article,
Further advancements to Google’s personalized feed include improved machine learning algorithms, which are said to be more capable of anticipating what an individual may find interesting. In addition to highlighting stories around manually and algorithmically selected topics of interest, the feed will also display stories trending in your area and around the world.
That seems like a great way to keep people current on topics ranging across geography, politics, and culture, but with the addition of ‘follow’ and ‘unfollow’, once again, individuals can reduce their world to a series of pop-star updates and YouTube hits. Isn’t it contradictory to suggest topics and stories in an effort to keep individuals informed of the world around them, yet allow them to stop the suggestions as soon as they appear boring or unfamiliar? Come now, Google, you can do better.
Catherine Lamsfuss, September 8, 2017
How Search Moves Forward
September 8, 2017
Researchers at UT Austin are certainly into search engines, and are eager to build improved neural models. The piece “The Future of Search Engines” at Innovation Toronto examines two approaches, suggested by associate professor Matthew Lease, to create more effective information retrieval systems. The article begins by describing how search engines currently generate their results:
The outcome is the result of two powerful forces in the evolution of information retrieval: artificial intelligence — especially natural language processing — and crowdsourcing. Computer algorithms interpret the relationship between the words we type and the vast number of possible web pages based on the frequency of linguistic connections in the billions of texts on which the system has been trained. But that is not the only source of information. The semantic relationships get strengthened by professional annotators who hand-tune results — and the algorithms that generate them — for topics of importance, and by web searchers (us) who, in our clicks, tell the algorithms which connections are the best ones. Despite the incredible, world-changing success of this model, it has its flaws. Search engine results are often not as ‘smart’ as we’d like them to be, lacking a true understanding of language and human logic. Beyond that, they sometimes replicate and deepen the biases embedded in our searches, rather than bringing us new information or insight.
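To make those two forces concrete, here is a toy ranking function that blends word-statistics relevance with click feedback from past searchers. The documents, click counts, and blending weights are invented for illustration; no real engine is this simple:

```python
# Toy illustration of the two forces the article describes: text relevance
# (from word statistics) blended with click feedback from past searchers.
# Not any real engine's formula; documents and click data are invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = ["neural models for information retrieval",
        "cooking pasta at home",
        "crowdsourcing relevance judgments for search"]
clicks = [120, 3, 45]          # how often past users clicked each result
total = sum(clicks)

vec = TfidfVectorizer()
doc_vectors = vec.fit_transform(docs)

def rank(query):
    text_score = cosine_similarity(vec.transform([query]), doc_vectors)[0]
    # Blend linguistic relevance with the click-through prior.
    blended = [0.7 * t + 0.3 * (c / total) for t, c in zip(text_score, clicks)]
    return sorted(zip(blended, docs), reverse=True)

for score, doc in rank("search relevance"):
    print(round(score, 3), doc)
```

The click term is also where the flaws creep in: whatever biases drive our clicks get folded back into the ranking.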
The first paper, Learning to Effectively Select Topics For Information Retrieval Test Collections (PDF), details a way to pluck and combine the best work of several annotators, professional and crowd-sourced alike, for each text. The Innovation Toronto article spends more time on the second paper, Exploiting Domain Knowledge via Grouped Weight Sharing with Application to Text Categorization (PDF). The approach detailed here taps into existing resources like WordNet, a lexical database for the English language, and domain ontologies like the Unified Medical Language System. See the article for the team’s suggestions on using weight sharing to blend machine learning and human knowledge.
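To give a flavor of the grouped weight sharing idea, here is a heavily simplified sketch in which words sharing a WordNet synset share a single embedding vector, so the lexical database constrains what the model must learn from data. This illustrates only the general concept, not the authors’ actual architecture:

```python
# Heavily simplified illustration of grouped weight sharing: words that
# share a WordNet synset share one embedding vector, injecting lexical
# knowledge into the model. NOT the paper's actual architecture.
import numpy as np
from nltk.corpus import wordnet as wn  # requires: nltk.download("wordnet")

words = ["car", "automobile", "dog", "puppy"]

def group_of(word):
    """Map a word to a group id: its first WordNet synset, else the word itself."""
    synsets = wn.synsets(word)
    return synsets[0].name() if synsets else word

groups = {group_of(w) for w in words}
rng = np.random.default_rng(0)
shared = {g: rng.normal(size=8) for g in groups}  # one weight vector per group

embedding = {w: shared[group_of(w)] for w in words}
# "car" and "automobile" now share identical weights via their common synset.
print(np.allclose(embedding["car"], embedding["automobile"]))
```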
The researchers’ work was helped by grants from the National Science Foundation, the Institute of Museum and Library Services, and the Defense Advanced Research Projects Agency, three government organizations hoping for improvements in the quality of crowdsourced information. We’re reminded that, though web-search companies do perform their own research, it is necessarily focused on commercial applications and short-term solutions. The sort of public investment we see at work here can pave the way to more transformative, long-term developments, the article concludes.
Cynthia Murrell, September 8, 2017
Google to Further Predict Relevant News to Subscribers
August 30, 2017
It’s no surprise that these days most people rely on something other than themselves to find relevant news stories, be it social media, a news feed or even Google. For many, it’s easier to let others determine what is truly important. Google, a leader in pointing out useful information and news, has stepped up their steering game and announce the update to their app which will further think for each user.
According to a recent Liliputing article:
…the feed still shows things like news, videos, and sports scores. But Google isn’t just choosing content based on the way you interact with Google search, apps, and services anymore. The company will also surface items that are trending locally and around the globe, helping you stay up to date on things that you might otherwise have missed. The company says it uses machine learning algorithms to predict which things you’ll be most interested in seeing.
For those uncomfortable with seeing only the news stories Google’s algorithms deem worthy of consideration, there are steps one can take to delete one’s preferences and habits. Perhaps Google’s intentions are altruistic and the app will be Big Brother approved, er, really helpful to the masses. We sure hope so!
Catherine Lamsfuss, August 30, 2017
Learn About Machine Learning
August 30, 2017
For an in-depth look at the technology behind Google Translate, turn to Stats and Bots’ write-up, “Machine Learning Translation and the Google Translate Algorithm.” Part of a series that aims to educate users about the technology behind machine learning (ML), the illustrated article delves into the details behind Google’s deep learning translation tools. Writer Daniil Korbut explains the factors that make it problematic to “teach” human language to an AI, then describes Long Short-Term Memory (LSTM) networks, bidirectional RNNs, sequence-to-sequence models, and how Google put those tools together. See the article for those details that are a bit above this writer’s head. There’s just one thing missing—any acknowledgment of the third parties that provide Google with language technology. Oh well.
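For readers who want to see the skeleton of the sequence-to-sequence idea in code, here is a bare-bones encoder-decoder sketch in PyTorch, fed random toy data. It shows only the shape of the approach, assuming an untrained toy vocabulary; it is nothing close to Google’s production system:

```python
# Bare-bones sequence-to-sequence skeleton: an LSTM encoder compresses the
# source sentence into a state, and an LSTM decoder unrolls the translation
# from that state. Toy vocabulary and random data; purely illustrative.
import torch
import torch.nn as nn

SRC_VOCAB, TGT_VOCAB, EMB, HID = 100, 100, 32, 64

class Seq2Seq(nn.Module):
    def __init__(self):
        super().__init__()
        self.src_emb = nn.Embedding(SRC_VOCAB, EMB)
        self.tgt_emb = nn.Embedding(TGT_VOCAB, EMB)
        self.encoder = nn.LSTM(EMB, HID, batch_first=True)
        self.decoder = nn.LSTM(EMB, HID, batch_first=True)
        self.out = nn.Linear(HID, TGT_VOCAB)

    def forward(self, src, tgt):
        _, state = self.encoder(self.src_emb(src))           # compress source
        dec_out, _ = self.decoder(self.tgt_emb(tgt), state)  # unroll target
        return self.out(dec_out)                             # per-step word scores

model = Seq2Seq()
src = torch.randint(0, SRC_VOCAB, (2, 7))   # batch of 2 source sentences
tgt = torch.randint(0, TGT_VOCAB, (2, 5))   # corresponding target prefixes
logits = model(src, tgt)
print(logits.shape)  # torch.Size([2, 5, 100]): a word distribution per step
```

The production systems layer attention mechanisms and many other refinements on top of this encoder-decoder core, which is what the article’s discussion of LSTMs and bidirectional RNNs builds toward.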
Another valuable resource on machine learning, found at Y Combinator, is Google researcher Jeff Dean’s Lecture for YC AI. The post includes a video that is over an hour long, but it also shares the informative slides from Dean’s presentation. They touch on scientific and medical applications for machine learning, then examine sequence-to-sequence models, automated machine learning, and “higher performance” ML models. One early slide reproduces a Google blog post in which Dean gives a little history (and several relevant links):
Allowing computers to better understand human language is one key area for our research. In late 2014, three Brain team researchers published a paper on Sequence to Sequence Learning with Neural Networks, and demonstrated that the approach could be used for machine translation. In 2015, we showed that this approach could also be used for generating captions for images, parsing sentences, and solving computational geometry problems. In 2016, this previous research (plus many enhancements) culminated in Brain team members working closely with members of the Google Translate team to wholly replace the translation algorithms powering Google Translate with a completely end-to-end learned system (research paper). This new system closed the gap between the old system and human quality translations by up to 85% for some language pairs. A few weeks later, we showed how the system could do “zero-shot translation”, learning to translate between languages for which it had never seen example sentence pairs (research paper). This system is now deployed on the production Google Translate service for a growing number of language pairs.
These surveys of Google’s machine translation tools offer a lot of detailed information for those interested in the topic. Just remember that Google is not (yet?) the only game in town.
Cynthia Murrell, August 30, 2017
Google: Ethics, Algorithms and Pirates, Oh My!
July 28, 2017
That search engines have changed the way the world looks for information, no one will dispute, but for as much good as Google and other major search engines have done in promoting the free sharing of all types of information, they have also allowed and, at times, encouraged the illegal sharing of private property. Namely, search engines are encouraging, even if inadvertently, the piracy of music and video.
Recently, despite vowing the contrary, Google has been called out after pirate sites were found not only in the coveted ‘top 10’ of search results but even highlighted by Google itself. As one report put it:
The findings aren’t going to help Google’s already contentious relationship with the music and video industries, both of which have spent years accusing Google of doing too little to prevent piracy. They’ve routinely argued that Google should outright remove pirate sites from its results, not just demote them…
Google claims the algorithm is to blame – it is too good at finding the results people want. This puts Google in yet another tough spot. Should they continue to let the algorithm rule, come what may, or tinker with it to punish pirates? If the latter is to be, what does this say for other controversial sites?
Catherine Lamsfuss, July 28, 2017
Western in Western Out
July 26, 2017
A thoughtful piece at Quartz looks past filter bubbles to other ways mostly Western developers are gradually imposing their cultural perspectives on the rest of the world—“Silicon Valley Has Designed Algorithms to Reflect Your Biases, Not Disrupt Them.” Search will not get you objective information, but rather the content your behavior warrants. Writer Ramesh Srinivasan introduces his argument:
Silicon Valley dominates the internet—and that prevents us from learning more deeply about other people, cultures, and places. To support richer understandings of one another across our differences, we need to redesign social media networks and search systems to better represent diverse cultural and political perspectives. The most prominent and globally used social media networks and search engines—Facebook and Google—are produced and shaped by engineers from corporations based in Europe and North America. As a result, technologies used by nearly 2 billion people worldwide reflect the design perspectives of the limited few from the West who have power over how these systems are developed.
It is worth reading the whole article for its examination of the issue, and suggestions for what to do about it. Algorithm transparency, for example, would at least let users know what principles guide a platform’s content selections. Taking input from user communities in other cultures is another idea. My favorite is a proposal to prioritize firsthand sources over Western interpretations, even ones with low traffic or that are not in English. As Srinivasan writes:
Just because this option may be the easiest for me to understand doesn’t mean that it should be the perspective I am offered.
That sums up the issue nicely.
Cynthia Murrell, July 26, 2017
AI Feeling a Little Sentimental
July 24, 2017
Big data was one of the popular buzzwords a couple of years ago, but one conundrum remained: how were organizations going to use all that mined data? One answer has presented itself: sentiment analysis. Science shares the article “AI In Action: How Algorithms Can Analyze the Mood of the Masses,” about how artificial intelligence is being used to gauge people’s emotions.
Social media presents a constant stream of emotional information about products, services, and places that could be useful to organizations. The problem in the past was that no one knew how to fish all of that useful information out of social media Web sites and make it usable. By using artificial intelligence algorithms and natural language processing, data scientists are finding associations between words, the language used, posting frequency, and more to determine everything from a person’s mood to their personality, income level, and political associations.
‘There’s a revolution going on in the analysis of language and its links to psychology,’ says James Pennebaker, a social psychologist at the University of Texas in Austin. He focuses not on content but style, and has found, for example, that the use of function words in a college admissions essay can predict grades. Articles and prepositions indicate analytical thinking and predict higher grades; pronouns and adverbs indicate narrative thinking and predict lower grades…’Now, we can analyze everything that you’ve ever posted, ever written, and increasingly how you and Alexa talk,’ Pennebaker says. The result: ‘richer and richer pictures of who people are.’
AI algorithms can take a person’s online social media accounts and construct more than a digital fingerprint of that person. The algorithms act like digital mind readers, recreating a person from the data they publish.
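Pennebaker’s style-over-content approach lends itself to a simple demonstration. The toy sketch below counts a few function-word categories and compares their rates; the word lists are abbreviated and the scoring is invented, purely to illustrate the idea:

```python
# Toy version of function-word style analysis in the spirit of Pennebaker's
# work: articles/prepositions suggest analytical style, pronouns/adverbs
# suggest narrative style. Word lists abbreviated; not a real scoring model.
ANALYTIC = {"the", "a", "an", "of", "in", "on", "to", "with", "by"}
NARRATIVE = {"i", "you", "he", "she", "they", "we", "really", "very", "just"}

def style_score(text):
    """Positive score leans analytical; negative leans narrative."""
    words = text.lower().split()
    analytic = sum(w in ANALYTIC for w in words)
    narrative = sum(w in NARRATIVE for w in words)
    return (analytic - narrative) / max(len(words), 1)

essay = "The analysis of the data in the report supports the conclusion."
post = "I really just think you and we should hang out, it was very fun."
print(style_score(essay), style_score(post))  # positive vs. negative
```

Real systems use far richer word categories and statistical models, but the principle is the same: how we write, not just what we write, gives us away.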
Whitney Grace, July 24, 2017
Instagram Reins in Trolls
July 21, 2017
Photo-sharing app Instagram has successfully implemented DeepText, a program that can weed out nasty and spammy comments from people’s feeds.
Wired, in an article titled “Instagram Unleashes an AI System to Blast Away Nasty Comments,” says:
DeepText is based on recent advances in artificial intelligence, and a concept called word embeddings, which means it is designed to mimic the way language works in our brains.
DeepText was initially built by Facebook, Instagram’s parent company, to keep abusers, trolls, and spammers at bay. Buoyed by its success, Facebook soon implemented it on Instagram.
The development process was arduous: for months, a large number of employees and contractors taught the DeepText engine how to identify abuse, telling the algorithm which words can be abusive based on their context.
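The “word embeddings” at DeepText’s core can be illustrated with a toy example: train vectors on a small corpus and observe that words used in similar contexts land near one another, which is what lets a classifier judge a word based on its context. This sketch uses gensim on an invented corpus far too small for real use:

```python
# Toy word-embedding demo: words that appear in similar contexts get
# similar vectors, the foundation DeepText-style classifiers build on.
# The corpus here is invented and far too small to be genuinely useful.
from gensim.models import Word2Vec

sentences = [
    ["you", "are", "a", "loser"],
    ["you", "are", "an", "idiot"],
    ["what", "a", "lovely", "photo"],
    ["what", "a", "beautiful", "photo"],
] * 50  # repeat to give the tiny corpus some statistical weight

model = Word2Vec(sentences, vector_size=16, window=2, min_count=1, seed=0)

# Words used in the same slots ("loser"/"idiot", "lovely"/"beautiful")
# should end up closer to each other than to unrelated words.
print(model.wv.similarity("loser", "idiot"))
print(model.wv.similarity("loser", "lovely"))
```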
At the moment, the tools are being tested and rolled out to a limited number of users in the US and are available only in English. They will subsequently be rolled out to other markets and languages.
Vishal Ingole, July 21, 2017