Big Shock: Social Media Algorithms Are Not Your Friend
December 11, 2017
One of Facebook’s founding fathers, Sean Parker, has done a surprising about-face on the online platform that earned him billions of dollars. Parker has begun speaking out against social media and the hidden machinery that keeps people interested. We learned more from a recent Axios story, “Sean Parker Unloads on Facebook ‘Exploiting’ Human Psychology.”
According to the story:
Parker’s I-was-there account provides priceless perspective in the rising debate about the power and effects of the social networks, which now have scale and reach unknown in human history. He’s worried enough that he’s sounding the alarm.
According to Parker:
The thought process that went into building these applications, Facebook being the first of them, … was all about: ‘How do we consume as much of your time and conscious attention as possible?’
And that means that we need to sort of give you a little dopamine hit every once in a while, because someone liked or commented on a photo or a post or whatever. And that’s going to get you to contribute more content, and that’s going to get you … more likes and comments.
Exploited human psychology is a major part of the story, but it is not all that is at stake. As Forbes pointed out, we are on the cusp of social engineering via social media. If more people like Parker don’t stand up and offer a solution, we fear there could be serious repercussions.
Patrick Roland, December 11, 2017
Google Told to Rein in Profits
December 5, 2017
Google makes a lot of money with their advertising algorithms. Every quarter their profit climbs higher and higher, but the San Francisco Gate reports that might change in the article, “Google Is Flying High, But Regulatory Threats Loom.” Google and Facebook are being told they need to rein in their hyper-efficient advertising machines. Why? Possible Russian interference in the 2016 elections and the widespread dissemination of fake news.
New regulations would require Google and Facebook to add more human oversight to their algorithms. Congress already has a bill on the floor that would impose new transparency requirements on online political ads. Social media sites like Twitter and Facebook are already making changes, but Google has done nothing and will not get a free pass.
‘It’s hard to know whether Congress or regulators will actually step up and regulate the company, but there seems to be a newfound willingness to consider such action,’ says Daniel Stevens, executive director of the Campaign for Accountability, a nonprofit watchdog that tracks Google spending on lobbyists and academics. ‘Google, like every other industry, should not be left to its own devices.’
Google has remained mostly silent, but it has stated that it will increase “efforts to improve transparency, enhance disclosures, and reduce foreign abuse.” Google is out for profit like any other company in the world. The question is whether it will comply in good conscience or find a way around the rules.
Whitney Grace, December 5, 2017
Big Data and Search Solving Massive Language Processing Headaches
December 4, 2017
Written language can be a massive headache for search systems, and different spoken languages complicate things further when you need to harness a massive amount of data. Thankfully, language processing is the answer, as software architect Federico Thomasetti wrote in his essay, “A Guide to Natural Language Processing.”
According to the story:
…the relationship between elements can be used to understand the importance of each individual element. TextRank actually uses a more complex formula than the original PageRank algorithm, because a link can be only present or not, while textual connections might be partially present. For instance, you might calculate that two sentences containing different words with the same stem (e.g., cat and cats both have cat as their stem) are only partially related.
The original paper describes a generic approach, rather than a specific method. In fact, it also describes two applications: keyword extraction and summarization. The key differences are:
- the units you choose as a foundation of the relationship
- the way you calculate the connection and its strength
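To make the TextRank idea concrete, here is a minimal Python sketch of the summarization flavor. It is a toy version of the general approach, not the paper’s reference implementation; the similarity formula follows the paper’s spirit, and names like `sentence_similarity` are ours.

```python
# Minimal TextRank-style summarizer (a toy sketch, not the paper's code).
# Sentences are nodes; edge weight is normalized word overlap, so a
# connection can be "partially present," unlike a binary PageRank link.
import math
import re

def sentence_similarity(a, b):
    """Word-set overlap, normalized by sentence lengths (the paper's idea)."""
    wa = set(re.findall(r"\w+", a.lower()))
    wb = set(re.findall(r"\w+", b.lower()))
    if len(wa) < 2 or len(wb) < 2:
        return 0.0
    return len(wa & wb) / (math.log(len(wa)) + math.log(len(wb)))

def textrank(sentences, damping=0.85, iterations=50):
    n = len(sentences)
    w = [[sentence_similarity(sentences[i], sentences[j]) if i != j else 0.0
          for j in range(n)] for i in range(n)]
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [(1 - damping) + damping * sum(
                      w[j][i] / sum(w[j]) * scores[j]
                      for j in range(n) if w[j][i] > 0.0)
                  for i in range(n)]
    return scores

sentences = ["The cat sat on the mat.",
             "The cat chased the dog.",
             "Quarterly profits rose sharply."]
best = max(zip(textrank(sentences), sentences))
print(best[1])  # the most "central" sentence doubles as a one-line summary
```

Swap the units (words instead of sentences) and the connection measure (co-occurrence instead of overlap), and the same loop performs keyword extraction instead of summarization.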
Natural language processing is a tricky concept to wrap your head around, but it is becoming something people have to recognize. Millions of dollars are currently being funneled into perfecting this technology. Those who lead the pack here will undoubtedly have a place at the international tech table and possibly take it over. This is a big deal.
Patrick Roland, December 4, 2017
Google Relevance: A Light Bulb Flickers
November 20, 2017
The Wall Street Journal published “Google Has Chosen an Answer for You. It’s Often Wrong” on November 17, 2017. The story is online, but you have to pay money to read it. I gave up on the WSJ’s online service years ago because at each renewal cycle, the WSJ kills my account. Pretty annoying because the pivot of the WSJ write up about Google implies that Google does not do information the way “real” news organizations do. At least Google does not annoy me the way “real” news outfits’ online services do.
For me, the WSJ is a collection of folks who find themselves looking at the exhaust pipes of the Google Hellcat. A source for a story like “Google Has Chosen an Answer for You. It’s Often Wrong” is a search engine optimization expert. Now that’s a source of relevance expertise! Another useful source is the terse posts by Googlers authorized to write vapid, cheery comments in Google’s “official” blogs. The guts of Google’s technology are described in wonky technical papers, the background and claims sections of Google’s patent documents, and systematic queries run against Google’s multiple content indexes over time. A few random queries do not reveal the shape of the Googzilla in my experience. Toss in a lack of understanding about how Google’s algorithms work and their baked in biases, and you get a write up that slips on a banana peel of the imperative to generate advertising revenue.
I found the write up interesting for three reasons:
- Unusual topic. Real journalists rarely address the question of relevance in ad-supported online services from a solid knowledge base. But today everyone is an expert in search. Just ask any millennial, please. Jonathan Edwards had less conviction about his beliefs than a person skilled at locating a pizza joint on a Google Map.
- SEO is an authority. SEO (search engine optimization) experts have done more to undermine relevance in online search than any other group. The one exception is the teams who have to find ways to generate clicks from advertisers who want to shove money into the Google slot machine in the hopes of an online traffic pay day. Using SEO experts’ data as evidence grinds against my belief that old fashioned virtues like editorial policies, selectivity, comprehensive indexing, and a bear hug applied to precision and recall calculations are helpful when discussing relevance, accuracy, and provenance.
- You don’t know what you don’t know. The presentation of the problems of converting a query into a correct answer reminds me of the many discussions I have had over the years with search engine developers. Natural language processing is tricky. Don’t believe me? Grab your copy of Gramatica didactica del espanol and check out the “rules” for el complemento circunstancial. Online systems struggle with what seems obvious to a reasonably informed human; toss in multiple languages for automated question answering, and “Houston, we have a problem” echoes.
I urge you to read the original WSJ article yourself. You decide how bad the situation is at ad-supported online search services, big time “real” news organizations, and among clueless users who believe that what’s online is, by golly, the truth dusted in accuracy and frosted with rightness.
Humans often take the path of least resistance; therefore, performing high school term paper research is a task left to an ad supported online search system. “Hey, the game is on, and I have to check my Facebook” takes precedence over analytic thought. But there is a free lunch, right?
In my opinion, this particular article fits in the category of dead tree media envy. I find it amusing that the WSJ is irritated that Google search results may not be relevant or accurate. There’s 20 years of search evolution under Googzilla’s scales, gentle reader. The good old days of the juiced up CLEVER methods and Backrub’s old fashioned ideas about relevance are long gone.
I spoke with one of the earlier Googlers in 1999 at a now defunct (thank goodness) search engine conference. As I recall, that confident and young Google wizard told me in a supercilious way that truncation was “something Google would never do.”
What? Huh?
Guess what? Google introduced truncation because it was a required method to deliver features like classification of content. Mr. Page’s comment to me in 1999 and the subsequent embrace of truncation makes clear that Google was willing to make changes to increase its ability to capture the clicks of users. Kicking truncation to the curb and then digging through the gutter trash told me two things: [a] Google could change its mind for the sake of expediency prior to its IPO and [b] Google could say one thing and happily do another.
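For readers who have not thought about truncation since the dial-up era, here is a minimal sketch of the idea; it is our illustration, not Google’s method. Chopping terms to a stem lets one index entry match several surface forms, which is handy for classifying content but looser on precision, exactly what a purist circa 1999 would sniff at.

```python
# Toy right-hand truncation (crude stemming) -- an illustration, not Google's
# method. Chopping terms to a fixed-length stem lets "searching", "searched",
# and "searches" all land on one index entry: good for grouping content into
# classes, looser on precision.
def truncate(term, stem_length=6):
    return term.lower()[:stem_length]

index = {}
documents = ["Searching the web", "He searched all night"]
for doc_id, text in enumerate(documents):
    for word in text.split():
        index.setdefault(truncate(word), set()).add(doc_id)

print(index[truncate("searches")])  # {0, 1}: both documents match the stem
```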
I thought that Google would sail into accuracy and relevance storms almost 20 years ago. Today Googzilla may be facing its own Ice Age. Articles like the one in the WSJ are just belated harbingers of push back against a commercial company that now has to conform to “standards” for accuracy, comprehensiveness, and relevance.
Hey, Google sells ads. Algorithmic methods refined over the last two decades make that process slick and useful. Selling ads does not pivot on investing money in identifying valid sources and the provenance of “facts.” Not even the WSJ article probes too deeply into the SEO experts’ assertions and survey data.
I assume I should be pleased that the WSJ has finally realized that algorithms integrated with online advertising generate a number of problematic issues for those concerned with factual and verifiable responses.
Google and Search Trust: Math Is Objective, Right?
November 11, 2017
I trust Google. No, I really trust Google. The reason is that I have a reasonable grasp of the mechanism for displaying search results. I have also developed some okay workarounds for when I cannot locate PowerPoint files, PDF files, or current information from pastesites. I try to look quickly at the ads on a page and then discard hits which point to those “relevant” inclusions. I even avoid Google’s free services because, despite some Xoogler protests, these can and do disappear without warning.
Trust, however, seems to mean different things to different people. Consider the write up “It’s Time to Stop Trusting Google Search Already.” The write up suggests that people usually trust Google. The main point is that those people should not. I like the “already” too. Very hip. Almost breezy, gentle reader.
I noted this passage:
Alongside pushing Google to stop “fake news,” we should be looking for ways to limit trust in, and reliance on, search algorithms themselves. That might mean seeking handpicked video playlists instead of searching YouTube Kids, which recently drew criticism for surfacing inappropriate videos.
I find the notion of trusting algorithms interesting. Perhaps the issue is not “algorithms” but the levers around them (see the sketch after this list):
- Threshold values which determine what’s in and what’s out
- Data quality
- Administrative controls which permit “overrides” by really bright sales “engineers”
- The sequence in the work flow for implementing particular algorithms or methods
- Inputs from other Google systems which function in a manner similar to human user clicks
- Quarterly financial objectives
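Here is the promised toy sketch. Every signal name, weight, and threshold below is hypothetical; the point is that each lever on the list above can move results without anyone rewriting “the algorithm” itself.

```python
# Hypothetical scoring pipeline -- not Google's code. Every weight, threshold,
# and signal name here is invented to show how the levers listed above can
# move results without touching "the algorithm" itself.
def score(doc, weights, threshold=0.2):
    s = sum(weights[name] * value for name, value in doc["signals"].items())
    if s < threshold:              # threshold decides what's in and what's out
        return None
    return s + doc.get("override", 0.0)  # administrative "override" slips in here

weights = {"relevance": 0.5, "freshness": 0.2, "ad_value": 0.3}  # quarterly goals?
docs = [
    {"url": "a.example", "signals": {"relevance": 0.9, "freshness": 0.1, "ad_value": 0.0}},
    {"url": "b.example", "signals": {"relevance": 0.4, "freshness": 0.2, "ad_value": 0.9},
     "override": 0.1},
]
ranked = sorted((d for d in docs if score(d, weights) is not None),
                key=lambda d: score(d, weights), reverse=True)
print([d["url"] for d in ranked])  # ['b.example', 'a.example']
```

Note that the ad-heavy page with a quiet override outranks the more relevant one. Change the threshold, the weights, or where the override is applied, and the “objective” output changes too.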
Trust is good; knowledge of systems and methods, engineer bias, sequence influence, and similar topics might be more fruitful than this fatalistic viewpoint:
But when something like search screws up, we can’t just tell Google to offer the right answers. We have to operate on the assumption that it won’t ever have them.
By the way, was Google’s search system and method “objective” when it integrated the GoTo, Overture, Yahoo pay to play methods which culminated in the hefty payment to the Yahooligans in 2004? Was Google ever more than “Clever”?
Stephen E Arnold, November 11, 2017
Facebook Image Hashing
November 8, 2017
This is a short post. I read “Revenge Porn: Facebook Teaming Up with Government to Stop Nude Photos Ending Up on Messenger, Instagram.” The method referenced in the write up involves “hashing.” Without getting into the weeds, the approach reminded me of the system and method developed by Terbium Labs for its Matchlight innovation. If you are curious about these techniques, you might want to take a quick look at the Terbium Web site. Based on the write up, it is not clear if the Facebook approach was developed by that company or if a third party was involved. Worth watching how this Facebook attempt to deal with some of its interesting content issues evolves.
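For the curious, the general shape of image hashing looks something like this difference-hash sketch. To be clear, this is a generic textbook technique, not Facebook’s or Terbium’s actual method, and the file names are placeholders.

```python
# Generic difference-hash (dHash) sketch -- a textbook technique, not
# Facebook's or Terbium's actual method. The fingerprint changes little when
# an image is resized or recompressed, so near-copies hash close together.
from PIL import Image  # pip install pillow

def dhash(path, hash_size=8):
    """Shrink, grayscale, then record whether each pixel beats its right neighbor."""
    img = Image.open(path).convert("L").resize((hash_size + 1, hash_size))
    px = list(img.getdata())
    bits = [px[row * (hash_size + 1) + col] > px[row * (hash_size + 1) + col + 1]
            for row in range(hash_size) for col in range(hash_size)]
    return sum(1 << i for i, bit in enumerate(bits) if bit)

def hamming(h1, h2):
    """Few differing bits suggests the images are copies of one another."""
    return bin(h1 ^ h2).count("1")

# Hypothetical usage; the file names are placeholders:
# if hamming(dhash("reported.jpg"), dhash("upload.jpg")) <= 10: flag the upload
```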
Stephen E Arnold, November 8, 2017
Great Moments in Image Recognition: Rifle or Turtle?
November 7, 2017
I read “AI Image Recognition Fooled by Single Pixel Change.” The write up explains:
In their research, Su Jiawei and colleagues at Kyushu University made tiny changes to lots of pictures that were then analyzed by widely used AI-based image recognition systems…The researchers found that changing one pixel in about 74% of the test images made the neural nets wrongly label what they saw. Some errors were near misses, such as a cat being mistaken for a dog, but others, including labeling a stealth bomber a dog, were far wider of the mark.
Let’s assume that these experts are correct. My thought is that neural networks may need a bit of tweaking.
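To see why a single pixel can matter, consider the toy sketch below. It uses a deliberately simple linear “classifier,” not the Kyushu team’s neural nets or their differential-evolution attack, but it shows the same flavor of fragility: one heavily weighted input can swing the whole decision.

```python
# Toy illustration of single-pixel fragility -- a trivial linear "classifier,"
# not the Kyushu team's neural nets or their actual attack method.
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 8))       # per-pixel weights of a toy linear model
image = rng.normal(size=(8, 8))   # a random 8x8 "image"

def label(img):
    return "cat" if (w * img).sum() > 0 else "dog"

print("original:", label(image))

# Nudge the single pixel with the heaviest weight hard against the current score.
score = (w * image).sum()
r, c = np.unravel_index(np.abs(w).argmax(), w.shape)
perturbed = image.copy()
perturbed[r, c] -= np.sign(score) * np.sign(w[r, c]) * 100.0
print("one pixel changed:", label(perturbed))  # the label flips
```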
What about facial recognition? I don’t want to elicit the ire of Xooglers, Apple iPhone X users, or the savvy folks at universities honking the neural network horns. Absolutely not. My goodness. What if I, at age 74, wanted to apply via LinkedIn and its smart software for a 9-to-5 job sweeping floors?
Years ago I prepared a series of lectures pointing out how widely used algorithms were vulnerable to directed flows of shaped data. Exciting stuff.
The write up explains that the mavens are baffled:
There is certainly something strange and interesting going on here, we just don’t know exactly what it is yet.
May I suggest that the assumption that these methods work as sci fi and tech cheerleaders say they do is incorrect?
Stephen E Arnold, November 7, 2017
Queries Change Ranking Factors
October 26, 2017
Did you ever wonder how Google determines which Web pages to send to the top of search results? According to the Search Engine Journal, how Google decides on page rankings depends on the query. See more in the article: “Google: Top Ranking Factors Change Depending On Query.” The article contains screenshots of a Twitter conversation between people at Google as they discuss search rankings.
Gary Illyes explains that there are not three fixed ranking factors that apply to all search results. John Mueller joined the conversation and said that Google’s algorithm’s job is to display the most relevant content, but the other factors vary. Mueller also adds that trying to optimize content for specific ranking factors is simply short-term thinking. Illyes mentioned that links (backlinking, presumably) are not much of a factor either.
In summary:
That’s why it’s important for Google’s algorithms to be able to adjust and recalculate for different ranking signals.
Ranking content based on the same 3 ranking signals at all times would result in Google not always delivering the most ‘relevant’ content to users.
As John Mueller says, at the end of the day that’s what Google search is trying to accomplish.
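A back-of-the-envelope way to picture query-dependent ranking appears below. The signal names and weights are invented for illustration; Google does not publish its actual signal weights.

```python
# Hypothetical illustration of query-dependent ranking -- the signal names and
# weights are invented; Google does not publish its actual signal weights.
WEIGHT_PROFILES = {
    "news":      {"freshness": 0.6, "relevance": 0.3, "links": 0.1},
    "reference": {"freshness": 0.1, "relevance": 0.6, "links": 0.3},
}

def rank(results, query_type):
    weights = WEIGHT_PROFILES[query_type]
    return sorted(results,
                  key=lambda r: sum(weights[s] * r[s] for s in weights),
                  reverse=True)

results = [
    {"url": "old-classic.example", "freshness": 0.1, "relevance": 0.9, "links": 0.9},
    {"url": "breaking.example",    "freshness": 0.9, "relevance": 0.6, "links": 0.2},
]
print(rank(results, "news")[0]["url"])       # breaking.example
print(rank(results, "reference")[0]["url"])  # old-classic.example
```

The same two pages trade places depending on which weight profile the query triggers.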
There is no magic formula for appearing at the top of Google search results. Content is still key, as are paid results.
Whitney Grace, October 26, 2017
Wave of Fake News Is Proving a Boon for the Need for Humans in Tech
October 20, 2017
We are often the first to praise the ingenious algorithms and tools that utilize big data and search muscle for good. But we are also one of the first to admit when things need to be scaled back a bit. The current news climate makes a perfect argument for that, as we discovered in a fascinating Yahoo! Finance piece, “Fake News is Still Here, Despite Efforts by Google and Facebook.”
The article lays out the many ways that search giants like Google and social media outlets like Facebook have failed to stop the flood of fake news. Despite the world’s sharpest algorithms and computer programs, they cannot seem to curb the onslaught of odd news.
The article wisely points out that it is not a computer problem anymore, but, instead, a human one. The solution is proving to be deceptively simple: human interaction.
Facebook said last week that it would hire an extra 1,000 people to help vet ads after it found a Russian agency bought ads meant to influence last year’s election. It’s also subjecting potentially sensitive ads, including political messages, to ‘human review.’
In July, Google revamped guidelines for human workers who help rate search results in order to limit misleading and offensive material. Earlier this year, Google also allowed users to flag so-called ‘featured snippets’ and ‘autocomplete’ suggestions if they found the content harmful.
Bravo, we say. There is a limit to what high powered search and big data can do. Sometimes it feels as if those horizons are limitless, but there is still a home for humans and that is a good thing. A balance of big data and beating human hearts seems like the best way to solve the fake news problem and perhaps many others out there.
Patrick Roland, October 20, 2017
Big Data Might Just Help You See Through Walls
October 18, 2017
It might sound like science fiction or, worse, like a waste of time, but scientists are developing cameras that can see around corners. More importantly, these visual aids will fill in our human blind spots. According to an article in MIT News, “An Algorithm For Your Blind Spot,” the technology may have a lot of uses, but it needs some serious help from big data and search.
According to the piece about the algorithm, “CornerCameras,”
CornerCameras generates one-dimensional images of the hidden scene. A single image isn’t particularly useful since it contains a fair amount of “noisy” data. But by observing the scene over several seconds and stitching together dozens of distinct images, the system can distinguish distinct objects in motion and determine their speed and trajectory.
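The “stitching” step is, at its heart, noise reduction through repetition. The sketch below is a toy simulation of the principle, not the MIT group’s code: average enough noisy one-dimensional frames and a faint object’s position, and hence its trajectory, emerges.

```python
# Toy simulation of the averaging idea -- not the MIT group's code. Each 1-D
# "frame" is noisy; averaging a window of frames makes the hidden object's
# column stand out, and comparing windows reveals its direction of travel.
import numpy as np

rng = np.random.default_rng(1)
n_frames, width = 60, 100
frames = rng.normal(0.0, 1.0, size=(n_frames, width))  # sensor noise
for t in range(n_frames):
    frames[t, 20 + t // 10] += 2.0  # faint object drifting slowly rightward

window = 10
for start in (0, 50):
    avg = frames[start:start + window].mean(axis=0)  # noise shrinks by sqrt(10)
    print(f"frames {start}-{start + window - 1}: object near column {avg.argmax()}")
```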
Seems like a pretty neat tool, especially when you consider that this algorithm could help firefighters find people in burning buildings or help bus drivers spot a child running onto the street. However, it is far from perfect.
The system still has some limitations. For obvious reasons, it doesn’t work if there’s no light in the scene, and can have issues if there’s low light in the hidden scene itself. It also can get tripped up if light conditions change, like if the scene is outdoors and clouds are constantly moving across the sun. With smartphone-quality cameras the signal also gets weaker as you get farther away from the corner.
Seems like they have a brilliant idea in need of a big data boost. We can envision a world where these folks partner with big data and search giants to help fill in the gaps of the algorithm and provide a powerful tool that can save lives. Here’s to hoping we’re not the only ones making that connection.
Patrick Roland, October 18, 2017