Facebook and Humans: Reality Is Not Marketing
May 16, 2016
I read “Facebook News Selection Is in Hands of Editors Not Algorithms, Documents Show.” The story’s main point is that Facebook uses humans to do the work; algorithms, it seems, are not a big part of picking out what’s important.
The write up comes from a “real” journalism outfit. The article points out:
The boilerplate about its [Facebook’s] news operations provided to customers by the company suggests that much of its news gathering is determined by machines: “The topics you see are based on a number of factors including engagement, timeliness, Pages you’ve liked and your location,” says a page devoted to the question “How does Facebook determine what topics are trending?”
After reading this, I thought of Google’s poetry created by its artificial intelligence system. Here’s the line which came to mind:
I started to cry. (Source: Quartz)
I vibrate with the annoyance bubbling under the surface of the newspaper article. Imagine. Facebook has great artificial intelligence. Facebook uses smart software. Facebook open sources its systems and methods. The company says it is at the cutting edge of replacing humans with objective procedures.
The article’s belief in baloney is fried and served cold on stale bread. Facebook uses humans. The folks at real journalism outfits may want to work through articles like “Different Loci of Semantic Interference in Picture Naming vs. Word-Picture Matching Tasks” to get a sense of why smart systems go wandering.
So what’s new? Palantir Technologies uses humans to index content. Without that human input, the “smart” software does some useful work, but humans remain part of the workflow.
Other companies use humans too. But the marketing collateral and the fizzy presentations at fancy conferences paint a picture of a world in which cognitive, artificially intelligent, smart systems do the work that subject matter experts used to do. Humans, such as indexers and editors, are no longer needed.
Now reality pokes its rose-tinted fingertips into the real world.
Let me be clear. I am not happy with the verbiage generated about smart software for one simple reason.
Most smart software systems require humans to fiddle at the beginning when a system is set up, while it operates to deal with exceptions, and after an output is produced to figure out what’s what. In short, smart software is not that smart yet.
There are many reasons, but the primary one is that the math and procedures underpinning many of the systems with which I am familiar are immature. Smart software works well when certain caveats are accepted. For example, the vaunted Watson must be trained. Watson, therefore, is not much different from the training Autonomy baked into its IDOL system in the mid-1990s.

Palantir uses humans for one simple reason: figuring out what’s important to a team under fire works much better if the humans with skin in the game provide indexing terms and identify important points, like local names for stretches of highway where bombs can be placed without too much hassle.

Dig into any of the search and content processing systems and you find expenditures for human work. Companies licensing smart systems which index automatically face significant budget overruns, operational problems caused by lousy outputs, and piles of exceptions to either ignore or deal with. The result is that the smoke and mirrors of marketers, speaking to people who want a silver bullet, cannot perform like the carefully crafted demonstrations.

IBM i2 Analyst’s Notebook requires humans. Fast Search (now an earlobe in SharePoint) requires humans. Coveo’s system requires humans. Attivio’s system requires humans. OpenText’s suite of search and content processing tools requires humans. Even Maxxcat benefits from informed setup and deployment. Out of the box, dtSearch can index, but one needs to know how to set it up and make it work in a specific Microsoft environment. Every search and content processing system that asserts it is automatic is spackling flawed wallboard.
For years, I have given a lecture about the essential sameness of search and content processing systems. These systems use the same well-known and widely taught mathematical procedures. The great breakthroughs at SRCH2 and similar firms amount to optimization of certain operations, but the whizziest system is pretty much like the others. As a result, these systems perform in a similar manner. They require humans to create term lists, look-up tables of aliases for persons of interest, hand-crafted taxonomies to represent the chunk of reality the system is supposed to know about, and other “libraries” and “knowledgebases.”
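To make the point concrete, here is a minimal sketch of the kind of human-supplied “knowledgebase” described above. The alias table and term list are hypothetical, illustrative entries of the sort an analyst (not an algorithm) must provide before indexing produces useful output; the function names and data are my own invention, not any vendor’s API.

```python
# Human-curated inputs: an analyst supplies these, the software does not.
ALIASES = {
    # Illustrative local names for a stretch of highway (hypothetical entries).
    "route irish": "Baghdad Airport Road",
    "the airport road": "Baghdad Airport Road",
}

# Analyst-chosen index terms (again, illustrative).
TERM_LIST = {"ied", "checkpoint", "convoy"}

def index_document(text: str) -> dict:
    """Return index terms and resolved entities found in one document."""
    lowered = text.lower()
    # Resolve informal aliases to a canonical entity name.
    entities = {canonical for alias, canonical in ALIASES.items()
                if alias in lowered}
    # Tag the document with any terms from the human-built term list.
    terms = {term for term in TERM_LIST if term in lowered}
    return {"entities": sorted(entities), "terms": sorted(terms)}

doc = "Convoy reported an IED on Route Irish near the checkpoint."
print(index_document(doc))
# → {'entities': ['Baghdad Airport Road'], 'terms': ['checkpoint', 'convoy', 'ied']}
```

Strip out the two human-built dictionaries and the “smart” indexer returns nothing useful, which is the whole point: the intelligence sits in the curated lists, not in the code.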
Watson is a source of amusement to me precisely because the human effort required to make a smart system work is never converted into cost and time statements. People assume Watson won Jeopardy because it was smart. People assume Google knows what ads to present because Google’s software is so darned smart. People assume Facebook mines its data to select news for an individual. Sure, certain processes are automated, but humans are needed. Omit the human and you get the crazy Microsoft Tay system, which humans taught to be crazier than some US politicians.
For decades I have reminded those who listened to my lectures not to confuse what they see in science fiction films with reality. Progress in smart software is evident. But the progress is very slow, hampered by the computational limits of today’s hardware and infrastructure. Like “real time,” the concept is easy to say but expensive and difficult to implement in a meaningful way. There’s a reason millisecond access to trading data costs so much that only certain financial operations can afford the bill. Smart software is the same.
How about less outrage from those covering smart software and more critical thinking about what’s required to get a system to produce a useful output? In short, more information and less puffery, more analysis and less sawdust. Maybe I imagined it, but both the Google and Tesla self-driving vehicles have crashed, right? Humans are essential because smart software is not as smart as those who believe in unicorns assume. Demos, like TV game shows, require pre- and post-production, gentle reader.
What happens when humans are involved? Isn’t bias part of the territory?
Stephen E Arnold, May 16, 2016