Big Shock: Social Media Algorithms Are Not Your Friend

December 11, 2017

One of Facebook’s founding fathers, Sean Parker, has done a surprising about-face on the online platform that earned him billions of dollars. Parker has begun speaking out against social media and the hidden machinery that keeps people interested. We learned more from a recent Axios story, “Sean Parker Unloads on Facebook ‘Exploiting’ Human Psychology.”

According to the story:

Parker’s I-was-there account provides priceless perspective in the rising debate about the power and effects of the social networks, which now have scale and reach unknown in human history. He’s worried enough that he’s sounding the alarm.

According to Parker:

The thought process that went into building these applications, Facebook being the first of them, … was all about: ‘How do we consume as much of your time and conscious attention as possible?’


And that means that we need to sort of give you a little dopamine hit every once in a while, because someone liked or commented on a photo or a post or whatever. And that’s going to get you to contribute more content, and that’s going to get you … more likes and comments.

What’s at stake here isn’t just human psychology being exploited, though. That is a major part of the story, but, as Forbes pointed out, we are on the cusp of social engineering via social media. If more people like Parker don’t stand up and offer solutions, we fear there could be serious repercussions.

Patrick Roland, December 11, 2017

Google Told to Rein in Profits

December 5, 2017

Google makes a lot of money with its advertising algorithms. Every quarter its profits climb higher and higher, but SFGate reports that might change in the article, “Google Is Flying High, But Regulatory Threats Loom.” Google and Facebook are being told they need to hold back their hyper-efficient advertising machines. Why? Possible Russian interference in the 2016 elections and the widespread dissemination of fake news.

New regulations would require Google and Facebook to add more human oversight to their algorithms. Congress already has a bill on the floor that would impose transparency requirements on online political ads. Social media sites like Twitter and Facebook are already making changes, but Google has not done anything and will not get a free pass.

‘It’s hard to know whether Congress or regulators will actually step up and regulate the company, but there seems to be a newfound willingness to consider such action,’ says Daniel Stevens, executive director of the Campaign for Accountability, a nonprofit watchdog that tracks Google spending on lobbyists and academics. ‘Google, like every other industry, should not be left to its own devices.’

Google has remained mostly silent, but it has stated that it will increase “efforts to improve transparency, enhance disclosures, and reduce foreign abuse.” Google is out for profit like any other company in the world. The question is whether it has the conscience to comply or will find a way around the rules.

Whitney Grace, December 5, 2017


Big Data and Search Solving Massive Language Processing Headaches

December 4, 2017

Written language can be a massive headache for anyone who needs search strength, and multiple spoken languages complicate things further when you need to harness a massive amount of data. Thankfully, natural language processing can help, as software architect Federico Tomassetti wrote in his essay, “A Guide to Natural Language Processing.”

According to the story:

…the relationship between elements can be used to understand the importance of each individual element. TextRank actually uses a more complex formula than the original PageRank algorithm, because a link can be only present or not, while textual connections might be partially present. For instance, you might calculate that two sentences containing different words with the same stem (e.g., cat and cats both have cat as their stem) are only partially related.


The original paper describes a generic approach, rather than a specific method. In fact, it also describes two applications: keyword extraction and summarization. The key differences are:

  • the units you choose as a foundation of the relationship
  • the way you calculate the connection and its strength
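The mechanics are easier to see in code. Below is a minimal, self-contained Python sketch of a TextRank-style sentence ranker. It is not the formula from the original paper: the connection strength here is simple length-normalized word overlap, with a crude suffix-stripping stand-in for real stemming, but it shows the two choices the excerpt highlights: the units (sentences) and the partially weighted connections fed into a PageRank-style iteration.

```python
import re

def tokenize(sentence):
    # Crude stand-in for stemming: lowercase and strip a trailing "s"
    # so that "cat" and "cats" share the stem "cat".
    return {word.rstrip("s") for word in re.findall(r"[a-z]+", sentence.lower())}

def similarity(s1, s2):
    # Partial connection strength: length-normalized word overlap.
    # (The TextRank paper normalizes by log sentence lengths instead.)
    w1, w2 = tokenize(s1), tokenize(s2)
    if not w1 or not w2:
        return 0.0
    return len(w1 & w2) / (len(w1) + len(w2))

def textrank(sentences, damping=0.85, iterations=50):
    # Weighted PageRank over the sentence graph: edges carry partial
    # similarity weights rather than the 0/1 links of classic PageRank.
    n = len(sentences)
    w = [[similarity(sentences[i], sentences[j]) if i != j else 0.0
          for j in range(n)] for i in range(n)]
    out = [sum(row) or 1.0 for row in w]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [(1 - damping) / n
                  + damping * sum(scores[j] * w[j][i] / out[j] for j in range(n))
                  for i in range(n)]
    return sorted(zip(scores, sentences), reverse=True)

docs = [
    "The cat sat on the mat.",
    "Cats often sit on mats.",
    "Stock prices rose sharply today.",
]
for score, sentence in textrank(docs):
    print(f"{score:.3f}  {sentence}")
```

The two cat sentences reinforce each other through their shared stems and outrank the unrelated sentence, which is exactly the behavior a summarizer wants.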

Natural language processing is a tricky concept to wrap your head around, but it is becoming something people have to recognize. Millions of dollars are currently being funneled into perfecting these techniques. Those who can really lead the pack here will undoubtedly have a place at the international tech table and possibly take over. This is a big deal.

Patrick Roland, December 4, 2017

Google Relevance: A Light Bulb Flickers

November 20, 2017

The Wall Street Journal published “Google Has Chosen an Answer for You. It’s Often Wrong” on November 17, 2017. The story is online, but you have to pay money to read it. I gave up on the WSJ’s online service years ago because at each renewal cycle, the WSJ kills my account. Pretty annoying, because the pivot of the WSJ write up about Google implies that Google does not do information the way “real” news organizations do. Google does not annoy me the way “real” news outfits do with their online services.

For me, the WSJ is a collection of folks who find themselves looking at the exhaust pipes of the Google Hellcat. A source for a story like “Google Has Chosen an Answer for You. It’s Often Wrong” is a search engine optimization expert. Now that’s a source of relevance expertise! Another useful source is the terse posts by Googlers authorized to write vapid, cheery comments in Google’s “official” blogs. The guts of Google’s technology are described in wonky technical papers, the background and claims sections of Google’s patent documents, and systematic queries run against Google’s multiple content indexes over time. A few random queries do not reveal the shape of the Googzilla in my experience. Toss in a lack of understanding about how Google’s algorithms work and their baked-in biases, and you get a write up that slips on the banana peel of the imperative to generate advertising revenue.

I found the write up interesting for three reasons:

  1. Unusual topic. Real journalists rarely address the question of relevance in ad-supported online services from a solid knowledge base. But today everyone is an expert in search. Just ask any millennial, please. Jonathan Edwards had less conviction about his beliefs than a person skilled in the art of locating a pizza joint on a Google Map.
  2. SEO is an authority. SEO (search engine optimization) experts have done more to undermine relevance in online search than any other group. The one exception is the teams who have to find ways to generate clicks from advertisers who want to shove money into the Google slot machine in the hopes of an online traffic pay day. Using SEO experts’ data as evidence grinds against my belief that old fashioned virtues like editorial policies, selectivity, comprehensive indexing, and a bear hug applied to precision and recall calculations are helpful when discussing relevance, accuracy, and provenance.
  3. You don’t know what you don’t know. The presentation of the problems of converting a query into a correct answer reminds me of the many discussions I have had over the years with search engine developers. Natural language processing is tricky. Don’t believe me. Grab your copy of Gramática didáctica del español and check out the “rules” for el complemento circunstancial. Online systems struggle with what seems obvious to a reasonably informed human, but toss in multiple languages for automated question answering, and “Houston, we have a problem” echoes.

I urge you to read the original WSJ article yourself. You decide how bad the situation is at ad-supported online search services, big time “real” news organizations, and among clueless users who believe that what’s online is, by golly, the truth dusted in accuracy and frosted with rightness.

Humans often take the path of least resistance; therefore, performing high school term paper research is a task left to an ad-supported online search system. “Hey, the game is on, and I have to check my Facebook” takes precedence over analytic thought. But there is a free lunch, right?


In my opinion, this particular article fits in the category of dead tree media envy. I find it amusing that the WSJ is irritated that Google search results may not be relevant or accurate. There’s 20 years of search evolution under Googzilla’s scales, gentle reader. The good old days of the juiced up CLEVER methods and Backrub’s old fashioned ideas about relevance are long gone.

I spoke with one of the earlier Googlers in 1999 at a now defunct (thank goodness) search engine conference. As I recall, that confident and young Google wizard told me in a supercilious way that truncation was “something Google would never do.”

What? Huh?

Guess what? Google introduced truncation because it was a required method to deliver features like classification of content. Mr. Page’s comment to me in 1999 and the subsequent embrace of truncation makes clear that Google was willing to make changes to increase its ability to capture the clicks of users. Kicking truncation to the curb and then digging through the gutter trash told me two things: [a] Google could change its mind for the sake of expediency prior to its IPO and [b] Google could say one thing and happily do another.

I thought that Google would sail into accuracy and relevance storms almost 20 years ago. Today Googzilla may be facing its own Ice Age. Articles like the one in the WSJ are just belated harbingers of push back against a commercial company that now has to conform to “standards” for accuracy, comprehensiveness, and relevance.

Hey, Google sells ads. Algorithmic methods refined over the last two decades make that process slick and useful. Selling ads does not pivot on investing money in identifying valid sources and the provenance of “facts.” Not even the WSJ article probes too deeply into the SEO experts’ assertions and survey data.

I assume I should be pleased that the WSJ has finally realized that algorithms integrated with online advertising generate a number of problematic issues for those concerned with factual and verifiable responses.


Google and Search Trust: Math Is Objective, Right?

November 11, 2017

I trust Google. No, I really trust Google. The reason is that I have a reasonable grasp of the mechanism for displaying search results. I have also developed some okay workarounds for when I cannot locate PowerPoint files, PDF files, or current information from pastesites. I try to look quickly at the ads on a page and then discard hits which point to those “relevant” inclusions. I even avoid Google’s free services because, despite some Xoogler protests, these can and do disappear without warning.

Trust, however, seems to mean different things to different people. Consider the write up “It’s Time to Stop Trusting Google Search Already.” The write up suggests that people usually trust Google. The main point is that those people should not trust Google. I like the “already” too. Very hip. Breezy, almost, gentle reader.

I noted this passage:

Alongside pushing Google to stop “fake news,” we should be looking for ways to limit trust in, and reliance on, search algorithms themselves. That might mean seeking handpicked video playlists instead of searching YouTube Kids, which recently drew criticism for surfacing inappropriate videos.

I find the notion of trusting algorithms interesting. Perhaps the issue is not “algorithms” but the factors below (a toy sketch follows the list):

  1. Threshold values which determine what’s in and what’s out
  2. Data quality
  3. Administrative controls which permit “overrides” by really bright sales “engineers”
  4. The sequence in the work flow for implementing particular algorithms or methods
  5. Inputs from other Google systems which function in a manner similar to human user clicks
  6. Quarterly financial objectives
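To make the point concrete, here is a toy Python sketch (my own invention, not Google’s code; every site name, score, and boost is hypothetical) showing how two of the items above, threshold values and administrative overrides, can change what a user sees while the scoring “algorithm” itself never changes:

```python
# Toy result filter: the scoring "algorithm" never changes, yet a
# threshold tweak plus a manual boost decide what the user actually sees.
# All site names, scores, and boosts below are hypothetical.

results = {
    "site-a.example": 0.91,
    "site-b.example": 0.52,
    "site-c.example": 0.49,
}

# An administrative override of the kind item 3 hints at.
boosts = {"site-c.example": 0.10}

def visible(results, boosts, threshold):
    scored = {url: score + boosts.get(url, 0.0) for url, score in results.items()}
    return [url for url in sorted(scored, key=scored.get, reverse=True)
            if scored[url] >= threshold]

print(visible(results, boosts, threshold=0.50))
# ['site-a.example', 'site-c.example', 'site-b.example']
print(visible(results, boosts, threshold=0.55))
# ['site-a.example', 'site-c.example']  -- site-b silently drops out
```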

Trust is good; knowledge of systems and methods, engineer bias, sequence influence, and similar topics might be more fruitful than this fatalistic viewpoint:

But when something like search screws up, we can’t just tell Google to offer the right answers. We have to operate on the assumption that it won’t ever have them.

By the way, was Google’s search system and method “objective” when it integrated the GoTo, Overture, Yahoo pay-to-play methods which culminated in the hefty payment to the Yahooligans in 2004? Was Google ever more than “Clever”?

Stephen E Arnold, November 11, 2017

Facebook Image Hashing

November 8, 2017

This is a short post. I read “Revenge Porn: Facebook Teaming Up with Government to Stop Nude Photos Ending Up on Messenger, Instagram.” The method referenced in the write up involves “hashing.” Without getting into the weeds, the approach reminded me of the system and method developed by Terbium Labs for its Matchlight innovation. If you are curious about these techniques, you might want to take a quick look at the Terbium Web site. Based on the write up, it is not clear if the Facebook approach was developed by that company or if a third party was involved. Worth watching how this Facebook attempt to deal with some of its interesting content issues evolves.
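The write up does not describe Facebook’s actual algorithm, but a common family of techniques for this job is perceptual hashing. As a purely illustrative sketch (not Facebook’s or Terbium’s method), here is a minimal difference hash (dHash) in Python using Pillow; two copies of the same photo produce hashes that differ in only a few bits even after resizing or recompression. The match threshold and the helper in the usage comment are hypothetical.

```python
from PIL import Image

def dhash(image_path, hash_size=8):
    # Difference hash: shrink, grayscale, then compare adjacent pixels.
    img = (Image.open(image_path)
                .convert("L")
                .resize((hash_size + 1, hash_size), Image.LANCZOS))
    pixels = list(img.getdata())  # row-major, width = hash_size + 1
    bits = []
    for row in range(hash_size):
        for col in range(hash_size):
            left = pixels[row * (hash_size + 1) + col]
            right = pixels[row * (hash_size + 1) + col + 1]
            bits.append(1 if left > right else 0)
    return sum(bit << i for i, bit in enumerate(bits))

def hamming(h1, h2):
    # A small distance means the images are very likely the same picture.
    return bin(h1 ^ h2).count("1")

# Hypothetical usage: compare a new upload against a stored hash.
# known = dhash("reported_image.jpg")
# candidate = dhash("new_upload.jpg")
# if hamming(known, candidate) <= 5:
#     flag_for_review()  # hypothetical helper
```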

Stephen E Arnold, November 8, 2017

Great Moments in Image Recognition: Rifle or Turtle?

November 7, 2017

I read “AI Image Recognition Fooled by Single Pixel Change.” The write up explains:

In their research, Su Jiawei and colleagues at Kyushu University made tiny changes to lots of pictures that were then analyzed by widely used AI-based image recognition systems… The researchers found that changing one pixel in about 74% of the test images made the neural nets wrongly label what they saw. Some errors were near misses, such as a cat being mistaken for a dog, but others, including labeling a stealth bomber a dog, were far wider of the mark.

Let’s assume that these experts are correct. My thought is that neural networks may need a bit of tweaking.
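The logic of the attack is simple to sketch. The Kyushu team used differential evolution against real neural networks; the toy Python version below substitutes a random linear model for the classifier and a brute-force search, which is enough to show how changing a single pixel can flip a prediction:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in "classifier": a fixed random linear model over a flattened
# 8x8 image with two output classes. A real attack targets a trained
# neural network; only the search idea carries over.
weights = rng.normal(size=(2, 64))

def predict(flat_image):
    return int(np.argmax(weights @ flat_image))

def one_pixel_attack(flat_image, low=0.0, high=1.0):
    # Brute force: push each pixel to an extreme value and report the
    # first change that flips the predicted class. (The paper uses
    # differential evolution instead of exhaustive search.)
    original = predict(flat_image)
    for idx in range(flat_image.size):
        for value in (low, high):
            candidate = flat_image.copy()
            candidate[idx] = value
            if predict(candidate) != original:
                return idx, value
    return None  # no single-pixel flip found

image = rng.uniform(size=64)
print("original class:", predict(image))
print("pixel flip found:", one_pixel_attack(image))
```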

What about facial recognition? I don’t want to elicit the ire of Xooglers, Apple iPhone X users, or the savvy folks at universities honking the neural network horns. Absolutely not. My goodness. What if I, at age 74, wanted to apply via LinkedIn and its smart software for a 9-to-5 job sweeping floors?

Years ago I prepared a series of lectures pointing out how widely used algorithms were vulnerable to directed flows of shaped data. Exciting stuff.

The write up explains that the mavens are baffled:

There is certainly something strange and interesting going on here, we just don’t know exactly what it is yet.

May I suggest that the assumption that these methods work as sci-fi and tech cheerleaders say they do is incorrect?

Stephen E Arnold, November 7, 2017

Queries Change Ranking Factors

October 26, 2017

Did you ever wonder how Google determines which Web pages to send to the top of search results? According to the Search Engine Journal, how Google decides on page rankings depends on the query itself; see more in the article: “Google: Top Ranking Factors Change Depending On Query.” The article contains screenshots of a Twitter conversation between people at Google as they discuss search rankings.

Gary Illyes explains that there are no three ranking factors that apply to all search results. John Mueller joined the conversation and said that Google’s algorithm’s job is to display the most relevant content, but the other factors vary. Mueller also adds that trying to optimize content for specific ranking factors is simply short-term thinking. Illyes mentioned that links (backlinks, presumably) are not much of a factor either.

In summary:

That’s why it’s important for Google’s algorithms to be able to adjust and recalculate for different ranking signals.

Ranking content based on the same 3 ranking signals at all times would result in Google not always delivering the most ‘relevant’ content to users.

As John Mueller says, at the end of the day that’s what Google search is trying to accomplish.

There is no magic formula for appearing at the top of Google search results. Content is still key, as are paid results.
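A toy sketch makes the idea concrete: if the weights on ranking signals shift with the query’s intent, no fixed “top three factors” can exist. Everything below, signal names and weights alike, is made up for the example, not Google’s actual system.

```python
# Toy illustration: the same page signals get different weights depending
# on query intent, so no universal "top 3 ranking factors" exist.
# Every signal name and weight here is hypothetical.

SIGNAL_WEIGHTS = {
    "news":      {"freshness": 0.6, "links": 0.1, "relevance": 0.3},
    "reference": {"freshness": 0.1, "links": 0.4, "relevance": 0.5},
}

def score(page_signals, intent):
    weights = SIGNAL_WEIGHTS[intent]
    return sum(weight * page_signals[name] for name, weight in weights.items())

page = {"freshness": 0.9, "links": 0.2, "relevance": 0.5}
print(score(page, "news"))       # 0.71 -- freshness dominates
print(score(page, "reference"))  # 0.42 -- links and relevance dominate
```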

Whitney Grace, October 26, 2017

Wave of Fake News Proves the Need for Humans in Tech

October 20, 2017

We are often the first to praise the ingenious algorithms and tools that utilize big data and search muscle for good. But we are also among the first to admit when things need to be scaled back a bit. The current news climate makes a perfect argument for that, as we discovered in a fascinating Yahoo! Finance piece, “Fake News is Still Here, Despite Efforts by Google and Facebook.”

The article lays out the many ways that search giants like Google and social media outlets like Facebook have failed to stop the flood of fake news. Despite the world’s sharpest algorithms and computer programs, they cannot seem to curb the onslaught.

The article wisely points out that it is not a computer problem anymore, but, instead, a human one. The solution is proving to be deceptively simple: human interaction.

Facebook said last week that it would hire an extra 1,000 people to help vet ads after it found a Russian agency bought ads meant to influence last year’s election. It’s also subjecting potentially sensitive ads, including political messages, to ‘human review.’

In July, Google revamped guidelines for human workers who help rate search results in order to limit misleading and offensive material. Earlier this year, Google also allowed users to flag so-called ‘featured snippets’ and ‘autocomplete’ suggestions if they found the content harmful.

Bravo, we say. There is a limit to what high-powered search and big data can do. Sometimes it feels as if those horizons are limitless, but there is still a home for humans, and that is a good thing. A balance of big data and beating human hearts seems like the best way to solve the fake news problem and perhaps many others out there.

Patrick Roland, October 20, 2017

Big Data Might Just Help You See Through Walls

October 18, 2017

It might sound like science fiction or, worse, like a waste of time, but scientists are developing cameras that can see around corners. More importantly, these visual aids will fill in our human blind spots. According to an article in MIT News, “An Algorithm For Your Blind Spot,” it may have a lot of uses, but needs some serious help from big data and search.

According to the piece about the algorithm, “CornerCameras,”

CornerCameras generates one-dimensional images of the hidden scene. A single image isn’t particularly useful since it contains a fair amount of “noisy” data. But by observing the scene over several seconds and stitching together dozens of distinct images, the system can distinguish distinct objects in motion and determine their speed and trajectory.

Seems like a pretty neat tool, especially when you consider that this algorithm could help firefighters find people in burning buildings or help bus drivers spot a child running onto the street. However, it is far from perfect.

The system still has some limitations. For obvious reasons, it doesn’t work if there’s no light in the scene, and can have issues if there’s low light in the hidden scene itself. It also can get tripped up if light conditions change, like if the scene is outdoors and clouds are constantly moving across the sun. With smartphone-quality cameras the signal also gets weaker as you get farther away from the corner.
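The “stitching” step in the quoted passage is, at its core, classic noise averaging: a signal buried in noise in any single frame emerges when many frames are stacked. A toy one-dimensional Python sketch (mine, not the MIT code; all sizes and noise levels are invented) shows the effect:

```python
import numpy as np

rng = np.random.default_rng(1)

# A hidden one-dimensional "scene": a single bright object at index 40.
true_scene = np.zeros(100)
true_scene[40] = 1.0

# Each frame is the scene buried in noise, like a single corner image.
frames = [true_scene + rng.normal(scale=2.0, size=100) for _ in range(200)]

single = frames[0]
stacked = np.mean(frames, axis=0)  # "stitching together dozens of images"

print("object index from one frame:", int(np.argmax(single)))   # unreliable
print("object index after stacking:", int(np.argmax(stacked)))  # very likely 40
```

Averaging 200 frames shrinks the noise by a factor of roughly the square root of 200, which is why the hidden object pops out of the stacked signal but not out of any single frame.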

Seems like they have a brilliant idea in need of a big data boost. We can envision a world where these folks partner with big data and search giants to help fill in the gaps of the algorithm and provide a powerful tool that can save lives. Here’s to hoping we’re not the only ones making that connection.

Patrick Roland, October 18, 2017

