Algorithmic Recommendations and Real Journalists: Volatile Combination

September 22, 2017

I love the excitement everyone has for mathy solutions to certain problems. Sure, the math works. What is tough for some to grasp is that a probability is different from a fact, such as driving one’s automobile into a mine drainage ditch. Fancy math used to figure out who likes what, whether via clustering or by mixing previous choices with information about what “similar” people purchased, is another matter. The car is in the slime: yes or no. The recommendation is correct: well, somewhere between 70 and 85 percent of the time.

That’s a meaningful difference.
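To make the difference concrete, here is a minimal sketch of the “people like you bought X” arithmetic, in plain Python. All shoppers, items, and numbers are invented for illustration. The point is the output type: the car-in-the-ditch question returns true or false; the recommender returns a score that is only probably right.

```python
from math import sqrt

# Invented purchase history: 1 = bought, 0 = did not buy.
purchases = {
    "alice": {"drill": 1, "bearings": 1, "gloves": 1, "paint": 0},
    "bob":   {"drill": 1, "bearings": 1, "gloves": 0, "paint": 0},
    "carol": {"drill": 0, "bearings": 0, "gloves": 1, "paint": 1},
}

def cosine(u, v):
    """Cosine similarity between two shoppers' purchase vectors."""
    dot = sum(u[item] * v.get(item, 0) for item in u)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def recommend_score(user, item):
    """Similarity-weighted vote of the other shoppers: a probability-like
    score between 0 and 1, never a yes-or-no fact."""
    votes = [(cosine(purchases[user], purchases[other]), purchases[other].get(item, 0))
             for other in purchases if other != user]
    total = sum(sim for sim, _ in votes)
    return sum(sim * bought for sim, bought in votes) / total if total else 0.0

score = recommend_score("alice", "paint")  # roughly 0.33: a guess, not a fact
```

Run it and the recommender hands Alice a score of about one third for paint. A hedge, not a fact.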

I thought about the “car in the slime” example when I read “Anatomy of a Moral Panic”. The write up states:

The idea that these ball bearings are being sold for shrapnel is a reporter’s fantasy. There is no conceivable world in which enough bomb-making equipment is being sold on Amazon to train an algorithm to make this recommendation.

Excellent point.

However, the issue is that many people, not just “real” journalists, overlook the fact that a probability is not the same as the car in the slime. As smart software becomes the lazy person’s way to get information, it is useful to recall that some individuals confuse the outputs of a statistical numerical recipe with reality.

I find this larger issue a bit more frightening than the fact that recommendation engines spit out guesses about what is similar and the humans who misunderstand.

Stephen E Arnold, September 22, 2017

Trust the Search Black Box and Only the Black Box

September 21, 2017

This article reads like an infomercial for a kitchen appliance.  It asks the same old question, “How much time do you waste searching for relevant content?”  Then it leads into a pitch for Microsoft and some other companies.  BA Insight wrote “The Increasingly Intelligent Search Experience” as an original article, but frankly it sounds like every spiel designed to sell a new search algorithm.

After the “hook,” the article runs down the history of Microsoft and faceted search, along with refiners, and how revolutionary they were at the time.  Do not get me wrong: faceted search was a revolutionary move, but the article makes it sound as if Microsoft invented the entire tool rather than simply adopting it as a strategy.  There is also a brief mention of faceted navigation; then they throw “intelligent search” at us:

Microsoft’s definition of “intelligence” may still be vague, but it’s clear that the company believes its work in machine-learning, when combined with its cloud platform, can give it a leg up over its competitors. The Microsoft Graph and these new intelligent machine-learning capabilities provide personalized insights based on a user’s personal network, project assignments, meeting schedule, and other search and collaboration activities. These features make it possible not only to search using traditional methods and take action based on those results, but for the tools and systems to proactively provide intelligent, personalized, and timely information before you ask for it – based on your profile, permissions, and activity history.

Oh!  Microsoft is so smart that it has come up with something brand new that companies specializing in search have never thought of before.  Come on, how many times have we seen and read claims like this before?  Microsoft is doing revolutionary things, but not so much in the field of search technology.  The company has contributed to search’s improvement over the years, but if this were such a revolutionary piece of black box software, why hasn’t anyone else picked it up?

Black box software has its uses, but mostly for enterprise and closed systems, not the bigger Web.

Whitney Grace, September 21, 2017

A Write Up about Facebook Reveals Shocking Human Weakness

September 18, 2017

What do I need with another write up about Facebook? We use the service to post links to stories in this blog, Beyond Search. My dog has an account to use when a site demands a Facebook user name and password. That’s about it. For me, Facebook is an online service which sells ads and provides useful information to some analysts and investigators. Likes, mindless uploading of images, and obsessive checking of the service? Sorry, not for a 74 year old in rural Kentucky, thank you very much.

I did read “How Facebook Tricks You Into Trusting Algorithms.”

I noted this statement, which I think is interesting:

The [Facebook] News Feed is meant to be fun, but also geared to solve one of the essential problems of modernity—our inability to sift through the ever-growing, always-looming mounds of information.

Why use Facebook instead of a service like Talkwalker? Here’s the answer:

Who better, the theory goes, to recommend what we should read and watch than our friends? Zuckerberg has boasted that the News Feed turned Facebook into a “personalized newspaper.”
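The “personalized newspaper” is, mechanically, just a ranking function. Facebook’s early, publicly described EdgeRank formula multiplied a user-to-poster affinity, a content-type weight, and a time decay. The sketch below implements that shape with invented posts and numbers; the real signals and weights are proprietary and far more numerous.

```python
import math

# Invented posts with made-up engagement signals.
posts = [
    {"id": "cat-video",  "affinity": 0.9, "weight": 1.0, "age_hours": 2},
    {"id": "news-link",  "affinity": 0.3, "weight": 0.6, "age_hours": 1},
    {"id": "old-status", "affinity": 0.9, "weight": 0.4, "age_hours": 48},
]

def edge_score(post, half_life_hours=12.0):
    """Affinity x content weight x exponential time decay."""
    decay = math.exp(-math.log(2) * post["age_hours"] / half_life_hours)
    return post["affinity"] * post["weight"] * decay

feed = sorted(posts, key=edge_score, reverse=True)
# Fresh content from close "friends" floats to the top of the feed.
```

Note what the formula rewards: recency and attachment, not accuracy or importance. That is the anomie engine in three multiplications.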

Several observations:

  1. The success of Facebook is less about “friends” and more about anomie, a word I think Émile Durkheim used to describe one aspect of “modern” life.
  2. The human mind, it seems, can form attachments to inanimate objects like Teddy Bears, animate objects like a human or dog, or to simulacra which intermediate for the user between the inanimate and the animate.
  3. By assembling large populations of “customers,” Facebook has a way to sell ads based on human actions identified by its monitoring software.

So what?

As uncertainty spikes, the “value” of Facebook will go up. No online service is invulnerable. Ennui, competition, management missteps, or technological change can undermine even the most dominant company.

I am not sure that Facebook “tricks” anyone. The company simply responds to the social opportunity modern life presents to people in many countries.

Build a life via the gig economy? Nah, pretty tough to do.

Generate happiness via Likes? Nah, ping ponging between angst and happiness is the new normal.

Become a viral success? Nah, most folks have a better chance at a Las Vegas casino.

Facebook, therefore, is something that would have to be created if the real Facebook did not exist.

Will Facebook gain more “power”? Absolutely. Human needs are forever. Likes are transient. Keep on clicking. Algorithms will do the rest.

Stephen E Arnold, September 18, 2017

Yandex Adds Deep Neural Net Algorithm

September 18, 2017

One of Google’s biggest rivals, at least in Asia, is the Russian search engine Yandex, and in an effort to stay on top of search, Yandex has added a new algorithm and a few other upgrades.  Neowin explains the upgrades in the article, “Yandex Rolls Out Korolev Neural Net Search Algorithm.”  Yandex named its upgraded deep neural network search algorithm Korolev, and the company also launched Yandex.Toloka, a mass-scale crowdsourcing platform that feeds search assessments into MatrixNet.

Korolev was designed to handle long-tail queries in two ways its predecessor, Palekh, could not.  Korolev delves into a Web page’s entire content rather than just its title, and it can analyze documents in real time, roughly a thousand times faster.  It is also designed to learn from use, so its accuracy should improve as more queries flow through it.
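Matching a query against a page’s full text rather than its title is, at bottom, a similarity computation. The toy below uses raw word counts and cosine similarity to show why full-text matching finds what title-only matching misses. Korolev, of course, uses learned neural embeddings rather than word counts, and the page and query here are invented, so treat this purely as an illustration of the principle.

```python
from collections import Counter
from math import sqrt

def vectorize(text):
    """Bag-of-words stand-in for a learned semantic vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b.get(w, 0) for w in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

page = {
    "title": "Planetarium event announcements",
    "body": "The talk covers the Sputnik launch and the history of Soviet rocketry",
}
query = vectorize("sputnik launch history")

title_score = cosine(query, vectorize(page["title"]))  # title-only matching
body_score = cosine(query, vectorize(page["body"]))    # full-content matching
# The page is invisible to title matching but found by full-text matching.
```

The long-tail query scores zero against the title and well above zero against the body; that gap is the whole pitch for Korolev over Palekh.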

Korolev had an impressive release and namesake:

The new Korolev algorithm was announced at a Yandex event at the Moscow Planetarium. Korolev is of course named after the Soviet rocket engineer, Sergei Korolev, who oversaw the Sputnik project, 60 years ago, and the mission that saw Yuri Gagarin get to space. Yandex teleconferenced with Fyodor Yurchikhin and Sergey Ryazansky who are currently representing Russia on the International Space Station.

Yandex is improving its search engine results and services to keep on top of the industry and technology.

Whitney Grace, September 18, 2017

My Feed Personalization a Step Too Far

September 15, 2017

In an effort to be even more user-friendly, and to further encourage a narcissistic society, Google now allows individuals to ‘follow’ or ‘unfollow’ topics, delivered daily to their devices, as they deem them interesting or uninteresting. SEJ explains the new feature, an enhancement of Google’s ‘my feed’ intended to personalize the news.

As explained in the article,

Further advancements to Google’s personalized feed include improved machine learning algorithms, which are said to be more capable at anticipating what an individual may find interesting. In addition to highlighting stories around manually and algorithmically selected topics of interest, the feed will also display stories trending in your area and around the world.

That seems like a great way to keep people current on topics ranging geographically, politically, and culturally, but with the addition of ‘follow’ and ‘unfollow’, once again, individuals can reduce their world to a series of pop-star updates and YouTube hits. Isn’t it an oxymoron to suggest topics and stories in an effort to keep an individual informed about the world around them, yet allow them to stop any suggestions that appear boring or unfamiliar? Come on, Google, you can do better.
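Mechanically, follow/unfollow amounts to a filter plus a boost, which is exactly how the bubble forms. A minimal sketch with invented stories and topic choices:

```python
# Invented stories, each tagged with a topic.
stories = [
    {"title": "Election results analyzed",   "topic": "politics"},
    {"title": "Pop star drops new single",   "topic": "music"},
    {"title": "Viral cat video compilation", "topic": "entertainment"},
]

followed = {"music", "entertainment"}  # topics the user opted into
unfollowed = {"politics"}              # topics the user opted out of

def personalize(stories, followed, unfollowed):
    """Drop unfollowed topics outright, then float followed topics up."""
    kept = [s for s in stories if s["topic"] not in unfollowed]
    return sorted(kept, key=lambda s: s["topic"] in followed, reverse=True)

my_feed = personalize(stories, followed, unfollowed)
# The election coverage is gone; the pop star and the cat remain.
```

One line of filtering, and the uncomfortable topic never reaches the screen again.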

Catherine Lamsfuss, September 15, 2017

Markov: Maths for the Newspaper Reader

September 14, 2017

Remarkable. I read a pretty good write up called “That’s Maths: Andrey Markov’s Brilliant Ideas Are Still Bearing Fruit.” I noted the source of the article: The Irish Times. A “real” newspaper. Plus, it’s Irish. Quick, name a great Irish mathematician. I like Sir William Rowan Hamilton, whom my slightly addled mathy relative Vladimir Igorevich Arnold and his boss/mentor/leader of semi-clothed winter hikes, Andrey Kolmogorov, thought was an okay guy.

Markov liked literature. Well, more precisely, he liked to count letter frequencies and occurrences in Russian novels like everyone’s fave Eugene Onegin. His observations fed his insight that a Markov Process or Markov Chain was a useful way to analyze probabilities in certain types of data. Applications range from making IBM Watson great again to helping outfits like Sixgill generate useful outputs. (Not familiar with Sixgill? I cover the company in my forthcoming lecture at the TechnoSecurity & Digital Forensics Conference next week.)

I noted this passage which I thought was sort of accurate or at least close enough for readers of “real” newspapers:

For a Markov process, only the current state determines the next state; the history of the system has no impact. For that reason we describe a Markov process as memoryless. What happens next is determined completely by the current state and the transition probabilities. In a Markov process we can predict future changes once we know the current state.
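That memoryless property can be demonstrated with the very experiment Markov ran on Eugene Onegin: classify each letter as vowel or consonant and count transitions. The snippet below (plain Python, with an invented English sentence standing in for Pushkin’s Russian) estimates the two-state transition probabilities; nothing about a letter’s predecessors enters the calculation, only its own class.

```python
from collections import Counter

VOWELS = set("aeiou")  # treating 'y' as a consonant for simplicity

def transition_probabilities(text):
    """Count vowel/consonant transitions between consecutive letters and
    normalize them into probabilities, as Markov did for Eugene Onegin."""
    letters = [c for c in text.lower() if c.isalpha()]
    state = lambda c: "V" if c in VOWELS else "C"
    counts = Counter((state(a), state(b)) for a, b in zip(letters, letters[1:]))
    probs = {}
    for s in ("V", "C"):
        total = counts[(s, "V")] + counts[(s, "C")]
        probs[s] = {t: counts[(s, t)] / total for t in ("V", "C")} if total else {}
    return probs

p = transition_probabilities("onegin is a novel in verse by alexander pushkin")
# p["V"] and p["C"] each hold the odds of the NEXT letter's class given only
# the CURRENT letter's class; no earlier history enters the calculation.
```

Each row of the transition table sums to one: given the current state, the future is fully specified.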

The write up does not point out that the Markov Process becomes even more useful when applied to Bayesian methods enriched with some Laplacian procedures. Now toss in the nuclear industry’s number one with a bullet, the Monte Carlo method, and stir the ingredients. In my experience and that of my dear but departed relative, one can do a better job of predicting what’s next than a bookie at the Churchill Downs Racetrack. MBAs on Wall Street have other methods for predicting the future; namely, chatter at the NYAC or some interactions with folks in the know about an important financial jet blast before ignition.
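For the curious, here is the “stir the ingredients” recipe in miniature: take a Markov chain (invented weather transition probabilities here) and let the Monte Carlo method grind out a long-run prediction by brute simulation rather than algebra.

```python
import random

random.seed(7)  # reproducible run

# Invented two-state Markov chain: weather transition probabilities.
P = {"sunny": {"sunny": 0.8, "rainy": 0.2},
     "rainy": {"sunny": 0.5, "rainy": 0.5}}

def step(state):
    return "sunny" if random.random() < P[state]["sunny"] else "rainy"

def monte_carlo_sunny_share(n_steps=100_000, start="sunny"):
    """Estimate the long-run share of sunny days by brute-force simulation."""
    state, sunny = start, 0
    for _ in range(n_steps):
        state = step(state)
        sunny += state == "sunny"
    return sunny / n_steps

estimate = monte_carlo_sunny_share()
# Analytically the stationary share is 0.5 / (0.2 + 0.5) = 5/7, about 0.714;
# the simulation lands close without ever solving an equation.
```

The bookie at Churchill Downs does the analytic version in his head; the Monte Carlo method just runs the race a hundred thousand times.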

A happy quack to the Irish Times for running a useful write up. My great uncle would emit a grunt, which is as close as he came to saying, “Good job.”

Stephen E Arnold, September 14, 2017

Instagram Algorithm to Recognize Cruelty and Kindness

September 14, 2017

Instagram is using machine learning to make its platform a kinder place, we learn from the CBS News article, “How Instagram is Filtering Out Hate.” Contributor (and Wired Editor-In-Chief) Nick Thompson interviewed Instagram CEO Kevin Systrom and learned the company is using about 20 humans to teach its algorithm to distinguish naughty from nice. The article relates:

Systrom has made it his mission to make kindness itself the theme of Instagram through two new phases: first, eliminating toxic comments, a feature that launched this summer; and second, elevating nice comments, which will roll out later this year. ‘Our unique situation in the world is that we have this giant community that wants to express themselves,’ Systrom said. ‘Can we have an environment where they feel comfortable to do that?’ Thompson told ‘CBS This Morning’ that the process of ‘machine learning’ involves teaching the program how to decide what comments are mean or ‘toxic’ by feeding in thousands of comments and then rating them.
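The feed-in-thousands-of-rated-comments process Thompson describes is classic supervised text classification. Here is a bare-bones naive Bayes version with six invented comments as the training set; Instagram’s actual system is a far larger model trained on vastly more data, so this is only a sketch of the idea.

```python
import math
from collections import Counter

# Six invented hand-labeled comments standing in for the human raters' work.
labeled = [
    ("you are wonderful and talented", "nice"),
    ("what a lovely photo", "nice"),
    ("great work keep it up", "nice"),
    ("you are an idiot", "toxic"),
    ("this is garbage and you are awful", "toxic"),
    ("nobody likes you", "toxic"),
]

def train(data):
    """Tally word counts per label: the whole 'learning' step."""
    word_counts = {"nice": Counter(), "toxic": Counter()}
    class_counts = Counter()
    for text, label in data:
        class_counts[label] += 1
        word_counts[label].update(text.split())
    return word_counts, class_counts

def classify(text, word_counts, class_counts):
    """Naive Bayes with add-one smoothing: return the likelier label."""
    vocab = set(word_counts["nice"]) | set(word_counts["toxic"])
    scores = {}
    for label in class_counts:
        logp = math.log(class_counts[label] / sum(class_counts.values()))
        total = sum(word_counts[label].values())
        for word in text.split():
            logp += math.log((word_counts[label][word] + 1) / (total + len(vocab)))
        scores[label] = logp
    return max(scores, key=scores.get)

word_counts, class_counts = train(labeled)
verdict = classify("you are awful", word_counts, class_counts)
```

Whatever the humans label “toxic,” the model learns to suppress. That is also exactly why the censorship worry in the next paragraph is not hypothetical.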

It is smarter censorship, if you will. Systrom seems comfortable embracing a little censorship in favor of kindness, and we sympathize; “trolls” are a real problem, after all. Still, the technology could, theoretically, be used to delete or elevate certain ideological or political content. To censor or not to censor is a fine and important line, and those who manage social media sites will be the ones who must walk it. No pressure.

Cynthia Murrell, September 14, 2017

An Algorithm with Unintended Consequences

September 12, 2017

Some of us who follow developments in AI wondered about this: apparently, the algorithm YouTube tasked with eliminating “extremist content” on its platform goes too far. Business Insider reports, “YouTube’s Crackdown on Extremist Content and ISIS Is Also Hurting Researchers and Journalists.”  It is a good thing there now exist commercial services that can meet the needs of analysts, researchers, and government officials; many of these services are listed in Stephen E Arnold’s Dark Web Notebook.

In this case, the problem is an algorithm that cannot always distinguish between terrorist propaganda and coverage of terrorism. Since the site implemented its new measures to combat terrorist content, several legitimate researchers and journalists have protested that their content was caught in the algorithm’s proverbial net and summarily removed; some of it had been available on the site for years. Reporter Rob Price writes:

Open-source researcher Eliot Higgins says he has had his old videos about Syria deleted and his account was suspended as the Google-owned video platform attempts to tackle material that supports terrorism. Middle East Eye reports that Syrian opposition news site Orient News was also deleted, as was a video uploaded by one of the publication’s own journalists. ‘YouTube has now suspended my account because of videos of Syria I uploaded 2-3 years ago. Nice anti-ISIS AI you’ve got there, YouTube,’ Higgins tweeted on Saturday. ‘Ironically, by deleting years-old opposition channels YouTube is doing more damage to Syrian history than ISIS could ever hope to achieve.’ In another incident, a video from American journalist Alexa O’Brien that was used in Chelsea Manning’s trial was deleted, according to Middle East Eye.

Higgins, whose account has since been reinstated, has an excellent point—ultimately, tools that destroy important documentation along with propaganda are counter-productive. Yes, algorithms are faster (and cheaper) than human workers. But do we really want to sacrifice first-hand footage of crucial events for the sake of speedy sanitization? There must be a better way.
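The Higgins episode is the precision/recall tradeoff in action: tune the algorithm to catch every propaganda video (high recall) and it sweeps up journalism too (low precision). With invented moderation outcomes, the two numbers fall out of a few counts:

```python
# Invented moderation outcomes: (actually_propaganda, removed_by_algorithm).
outcomes = [
    (True, True), (True, True), (True, False),      # three propaganda videos
    (False, True), (False, True),                   # two journalism videos removed
    (False, False), (False, False), (False, False), # three journalism videos kept
]

true_positives = sum(actual and removed for actual, removed in outcomes)
false_positives = sum(removed and not actual for actual, removed in outcomes)
false_negatives = sum(actual and not removed for actual, removed in outcomes)

precision = true_positives / (true_positives + false_positives)
recall = true_positives / (true_positives + false_negatives)
# Here precision is 0.5: half of everything removed was legitimate coverage,
# even though two thirds of the actual propaganda was caught.
```

A “better way” means pushing both numbers up at once, which is the hard part no press release mentions.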

Cynthia Murrell, September 12, 2017

How Search Moves Forward

September 8, 2017

Researchers at UT Austin are certainly into search engines and are eager to build improved neural models. The piece “The Future of Search Engines” at Innovation Toronto examines two approaches, suggested by associate professor Matthew Lease, to create more effective information retrieval systems. The article begins by describing how search engines currently generate their results:

The outcome is the result of two powerful forces in the evolution of information retrieval: artificial intelligence — especially natural language processing — and crowdsourcing. Computer algorithms interpret the relationship between the words we type and the vast number of possible web pages based on the frequency of linguistic connections in the billions of texts on which the system has been trained. But that is not the only source of information. The semantic relationships get strengthened by professional annotators who hand-tune results — and the algorithms that generate them — for topics of importance, and by web searchers (us) who, in our clicks, tell the algorithms which connections are the best ones. Despite the incredible, world-changing success of this model, it has its flaws. Search engine results are often not as ‘smart’ as we’d like them to be, lacking a true understanding of language and human logic. Beyond that, they sometimes replicate and deepen the biases embedded in our searches, rather than bringing us new information or insight.

The first paper, Learning to Effectively Select Topics For Information Retrieval Test Collections (PDF), details a way to pluck and combine the best work of several annotators, professional and crowd-sourced alike, for each text. The Innovation Toronto article spends more time on the second paper, Exploiting Domain Knowledge via Grouped Weight Sharing with Application to Text Categorization (PDF). The approach detailed here taps into existing resources like WordNet, a lexical database for the English language, and domain ontologies like the Unified Medical Language System. See the article for the team’s suggestions on using weight sharing to blend machine learning and human knowledge.
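The grouped weight sharing idea can be sketched in a few lines: give every word a private vector plus a vector shared with its WordNet-style synonym group, so evidence about one synonym benefits the others. The groups and the tiny fixed vectors below are invented for illustration; in the paper these are learned parameters in a neural text categorizer.

```python
from math import sqrt

# Hypothetical WordNet-style synonym groups; members share one group vector.
groups = {
    "illness": ["disease", "sickness"],
    "doctor": ["physician", "surgeon"],
}

# Tiny fixed vectors for illustration; a real model learns these weights.
group_vecs = {"illness": [1.0, 0.0], "doctor": [0.0, 1.0]}
word_vecs = {
    "disease":   [0.2, 0.1],
    "sickness":  [0.1, -0.1],
    "physician": [-0.1, 0.2],
    "surgeon":   [0.0, 0.1],
}

def embedding(word):
    """Grouped weight sharing: a word's vector is its private weights plus
    the weights shared across its group."""
    group = next(g for g, members in groups.items() if word in members)
    return [w + g_ for w, g_ in zip(word_vecs[word], group_vecs[group])]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

# Synonyms end up closer together than unrelated words.
same_group = cosine(embedding("disease"), embedding("sickness"))
cross_group = cosine(embedding("disease"), embedding("physician"))
```

The shared component pulls group members toward one another, which is precisely how human knowledge (the grouping) shapes what the machine learns (the weights).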

The researchers’ work was helped by grants from the National Science Foundation, the Institute of Museum and Library Services, and the Defense Advanced Research Projects Agency, three government organizations hoping for improvements in the quality of crowdsourced information. We’re reminded that, though web-search companies do perform their own research, it is necessarily focused on commercial applications and short-term solutions. The sort of public investment we see at work here can pave the way to more transformative, long-term developments, the article concludes.

Cynthia Murrell, September 8, 2017
