When AI Spreads Propaganda

February 28, 2017

We thought Google was left-leaning, but an article at the Guardian, “How Google’s Search Algorithm Spreads False Information with a Rightwing Bias,” seems to contradict that assessment. The article cites recent research by the Observer, which found neo-Nazi and anti-Semitic views prominently featured in Google search results. The Guardian followed up with its own research and documented more examples of right-leaning misinformation, like climate-change denials, anti-LGBT tirades, and Sandy Hook conspiracy theories. Reporters Olivia Solon and Sam Levin tell us:

The Guardian’s latest findings further suggest that Google’s searches are contributing to the problem. In the past, when a journalist or academic exposes one of these algorithmic hiccups, humans at Google quietly make manual adjustments in a process that’s neither transparent nor accountable.

At the same time, politically motivated third parties including the ‘alt-right’, a far-right movement in the US, use a variety of techniques to trick the algorithm and push propaganda and misinformation higher up Google’s search rankings.

These insidious manipulations – both by Google and by third parties trying to game the system – impact how users of the search engine perceive the world, even influencing the way they vote. This has led some researchers to study Google’s role in the presidential election in the same way that they have scrutinized Facebook.

Robert Epstein from the American Institute for Behavioral Research and Technology has spent four years trying to reverse engineer Google’s search algorithms. He believes, based on systematic research, that Google has the power to rig elections through something he calls the search engine manipulation effect (SEME).

Epstein conducted five experiments in two countries to find that biased rankings in search results can shift the opinions of undecided voters. If Google tweaks its algorithm to show more positive search results for a candidate, the searcher may form a more positive opinion of that candidate.
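
The effect Epstein describes depends on searchers giving most of their attention to the top few results. Below is a minimal sketch that assumes a simple position-based click model. The click-through rates in CTR_BY_RANK are invented for illustration; they are not from Epstein's research. The sketch shows how reordering the same ten results changes how much attention lands on pages favoring one candidate.

```python
# Hypothetical click-through rates for ranks 1-10 (invented for illustration, not measured data)
CTR_BY_RANK = [0.30, 0.15, 0.10, 0.07, 0.05, 0.04, 0.03, 0.03, 0.02, 0.02]


def exposure_share(ranking, ctr=CTR_BY_RANK):
    """Share of expected clicks that land on results favoring candidate "A".

    ranking: list of "A" / "B" labels in result order, rank 1 first.
    """
    weights = ctr[: len(ranking)]
    total = sum(weights)
    favoring_a = sum(w for w, label in zip(weights, ranking) if label == "A")
    return favoring_a / total


# The same ten results, five favoring each candidate, in two different orders
a_on_top = ["A"] * 5 + ["B"] * 5
b_on_top = ["B"] * 5 + ["A"] * 5

print(f"A-favoring pages on top:    {exposure_share(a_on_top):.0%} of clicks go to A")
print(f"A-favoring pages at bottom: {exposure_share(b_on_top):.0%} of clicks go to A")
```

With these made-up numbers, putting the pro-A pages on top sends roughly 83 percent of expected clicks to them. Flipping the order drops that to about 17 percent. The pages themselves did not change, only their order.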

This does add a whole new, insidious dimension to propaganda. Did Orwell foresee algorithms? Further complicating the matter is the element of filter bubbles, through which many consume only information from homogeneous sources, allowing no room for contrary facts. The article delves into how propagandists are gaming the system and describes Google’s response, so interested readers may wish to navigate there for more information.

One particular point gives me chills: Epstein states that research shows the vast majority of readers are not aware that bias exists within search rankings; they have no idea they are being manipulated. Perhaps those of us with some understanding of search algorithms can spread that insight to the rest of the multitude. It seems such education is sorely needed.

Cynthia Murrell, February 28, 2017

Written by Stephen E. Arnold · Filed Under AI, algorithms, Facebook, Google, News, Search, search engine

Forecasting Methods: Detail without Informed Guidance

February 27, 2017

Let’s create a scenario. You are a person trying to figure out how to index a chunk of content. You are working with cancer information sucked down from PubMed or a similar source. You run an extraction process and push the text through an indexing system. You use a system like Leximancer and look at the results. Hmmm.

Next you take a corpus of blog posts dealing with medical information. You suck down the content and run it through your extractor, your indexing system, and your Leximancer set up. You look at the results. Hmmm.

How do you figure out what terms are going to be important for your next batch of mixed content?

You might navigate to “Selecting Forecasting Methods in Data Science.” The write up does a good job of outlining some of the numerical recipes taught in university courses and discussed in textbooks. For example, it provides an overview of the methods in a nifty graphic, and it shows sample outputs from the different methods it identifies.

Useful.

What’s missing? If you are floundering away like an employee at one government agency where I worked years ago, you pick the trend line you want. Then you try to plug in the numbers and generate some useful data. If that is too tough, you hire your friendly GSA schedule consultant to do the work for you. Yep, that’s how I ended up looking at:

Manually selected data
Lousy controls
Outputs from different systems
Misindexed text
Entities which were not really entities
A confused government employee

Here’s the takeaway. Software is available to output stuff in a log file, and Excel makes it easy to wrangle most of the data into rows and columns. That does not mean any of the information is useful, valid, or even in the same ball game.

When one then applies different forecasting methods without understanding them, we have an example of how an individual can create a pretty exciting data analysis.

Descriptions of algorithms do not correlate with high-value outputs. Data quality, sampling, understanding why curves are “different”, and other annoying details don’t fit into some busy work lives.
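
To see why curves come out “different,” here is a minimal sketch that runs four textbook methods over the same short history. The methods are a naive forecast, a three-period moving average, a least-squares linear trend, and simple exponential smoothing with an assumed alpha of 0.5. The monthly series is hypothetical. These are illustrative choices, not the specific methods or code from the “Selecting Forecasting Methods” write up.

```python
def naive(series, horizon):
    """Repeat the most recent observation."""
    return [series[-1]] * horizon


def moving_average(series, horizon, window=3):
    """Flat forecast at the mean of the last `window` observations."""
    avg = sum(series[-window:]) / window
    return [avg] * horizon


def linear_trend(series, horizon):
    """Ordinary least-squares straight line through the history, extended forward."""
    n = len(series)
    x_mean = (n - 1) / 2
    y_mean = sum(series) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(series))
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return [intercept + slope * (n + h) for h in range(horizon)]


def exp_smoothing(series, horizon, alpha=0.5):
    """Simple exponential smoothing: a flat forecast at the final smoothed level."""
    level = series[0]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return [level] * horizon


def forecast_all(series, horizon=3):
    """Run every method on the same history so the outputs can be compared."""
    return {
        "naive": naive(series, horizon),
        "moving_average": moving_average(series, horizon),
        "linear_trend": linear_trend(series, horizon),
        "exp_smoothing": exp_smoothing(series, horizon),
    }


history = [10, 12, 11, 15, 14, 18]  # hypothetical monthly term counts
for name, values in forecast_all(history).items():
    print(f"{name:>15}: {[round(v, 2) for v in values]}")
```

On this series, the next-period forecasts run from about 15.7 to about 18.3. None of the methods is “wrong.” Each one encodes a different assumption about the data, and someone picking a trend line needs to understand that before trusting the output.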

Stephen E Arnold, February 27, 2017

Written by Stephen E. Arnold · Filed Under algorithms, Indexing, News

Software Bias Is Being Addressed

February 27, 2017

Researchers are working to fix the problem of bias in software, we learn from the article, “He’s Brilliant, She’s Lovely: Teaching Computers to Be Less Sexist” at NPR’s blog, All Tech Considered. Writer Byrd Pinkerton begins by noting that this issue of software reflecting human biases is well-documented, citing an article by a colleague. He then informs us that Microsoft, for one, is doing something about it:

Adam Kalai thinks we should start with the bits of code that teach computers how to process language. He’s a researcher for Microsoft and his latest project — a joint effort with Boston University — has focused on something called a word embedding. ‘It’s kind of like a dictionary for a computer,’ he explains. Essentially, word embeddings are algorithms that translate the relationships between words into numbers so that a computer can work with them. You can grab a word embedding ‘dictionary’ that someone else has built and plug it into some bigger program that you are writing. …

Kalai and his colleagues have found a way to weed these biases out of word embedding algorithms. In a recent paper, they’ve shown that if you tell the algorithms to ignore certain relationships, they can extrapolate outwards.

And voila, a careful developer can teach an algorithm to fix its own bias. If only the process were so straightforward for humans. See the article for more about the technique.
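
The NPR piece does not spell out the math. The Microsoft and Boston University paper (Bolukbasi, Chang, Zou, Saligrama, and Kalai, 2016) describes a “neutralize” step: find a gender direction in the embedding space, then subtract each gender-neutral word’s component along that direction. Below is a minimal sketch of that projection idea. It assumes a tiny, hypothetical 3-dimensional embedding with invented vectors. It also takes the gender direction from a single he/she pair. The paper combines several definitional pairs and adds an “equalize” step, neither of which is shown here.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    return math.sqrt(dot(u, u))


def cosine(u, v):
    """Cosine similarity: how closely two word vectors point in the same direction."""
    return dot(u, v) / (norm(u) * norm(v))


def neutralize(word_vec, bias_dir):
    """Subtract the component of word_vec that lies along bias_dir."""
    length = norm(bias_dir)
    unit = [x / length for x in bias_dir]
    projection = dot(word_vec, unit)
    return [w - projection * u for w, u in zip(word_vec, unit)]


# Hypothetical 3-dimensional "dictionary"; real embeddings have hundreds of dimensions
embeddings = {
    "he": [1.0, 0.2, 0.1],
    "she": [-1.0, 0.2, 0.1],
    "brilliant": [0.4, 0.8, 0.3],
}

# Gender direction from a single definitional pair (the paper combines several)
gender_dir = [a - b for a, b in zip(embeddings["he"], embeddings["she"])]

before = cosine(embeddings["brilliant"], gender_dir)
after = cosine(neutralize(embeddings["brilliant"], gender_dir), gender_dir)
print(f"before: {before:.2f}, after: {after:.2f}")
```

In this toy space, “brilliant” starts out leaning toward “he” (a cosine of about 0.42 with the gender direction). After neutralizing, the cosine is zero, and the word’s other components are left untouched.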

Ultimately, though, the problem lies less with the biased algorithms themselves and more with the humans who seek to use them in decision-making. Researcher Kalai points to the marketing of health-related products as a project for which a company might actually want to differentiate between males and females. Pinkerton concludes:

For Kalai, the problem is not that people sometimes use word embedding algorithms that differentiate between gender or race, or even algorithms that reflect human bias. The problem is that people are using the algorithms as a black box piece of code, plugging them in to larger programs without considering the biases they contain, and without making careful decisions about whether or not they should be there.

So, though discoveries about biased software are concerning, it is good to know the issue is being addressed. We shall see how fast the effort progresses.

Cynthia Murrell, February 27, 2017

Written by Stephen E. Arnold · Filed Under algorithms, Microsoft, News, Security, software

Bing Improvements

February 17, 2017

Online marketers are usually concerned with the latest Google algorithm, but Microsoft’s Bing is also a viable SEO target. Business2Community shares recent upgrades to that Internet search engine in its write-up, “2016 New Bing Features.” The section on the mobile app seems to be the most relevant to those interested in Search developments. Writer Asaf Hartuv tells us:

For search, product and local results were improved significantly. Now when you search using the Bing app on an iPhone, you will get more local results with more information featured right on the page. You won’t have to click around to get what you want.

Similarly, when you search for a product you want to buy, you will get more options from more stores, such as eBay and Best Buy. You won’t have to go to as many websites to do the comparison shopping that is so important to making your purchase decision.

While these updates were made to the app, the image and video search results were also improved. You get far more options in a more user-friendly layout when you search for these visuals.

The Bing app also includes practical updates that go beyond search. For example, you can choose to follow a movie and get notified when it becomes available for streaming. Or you can find local bus routes or schedules based on the information you select on a map.

Hartuv also discusses upgrades to Bing Ads (a bargain compared to Google Ads, apparently), and the fact that Bing is now powering AOL’s search results (after being dropped by Yahoo). He also notes that, while not a new feature, Bing Trends is always presenting newly assembled, specialized content to enhance users’ understanding of current events. Hartuv concludes by prompting SEO pros to remember the value of Bing.

Cynthia Murrell, February 17, 2017

Written by Stephen E. Arnold · Filed Under algorithms, Applications, Bing, Google, Microsoft, News, Search, search engine, SEO, Technology

Enterprise Heads in the Sand on Data Loss Prevention

February 16, 2017

Enterprises could be doing so much more to protect themselves from cyber attacks, asserts Auriga Technical Manager James Parry in his piece, “The Dark Side: Mining the Dark Web for Cyber Intelligence” at Information Security Buzz. Parry informs us that most businesses fail to do even the bare minimum they should to protect against hackers. This minimum, as he sees it, includes monitoring social media and underground chat forums for chatter about their company. After all, hackers are not known for their modesty, and many do boast about their exploits in the relative open. Most companies just aren’t bothering to look in that direction. Such an effort can also reveal those impersonating a business by co-opting its slogans and trademarks.
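
The bare-minimum monitoring Parry describes boils down to watching posts for a company’s name, slogans, and trademarks. Below is a minimal sketch. It assumes the posts have already been collected as plain text; collecting them from forums is the hard part and is not shown. The company name, slogan, and posts are all hypothetical.

```python
import re


def find_mentions(posts, watch_terms):
    """Return (post_index, term) pairs for each watched term that appears in a post.

    Matching is case-insensitive and on word boundaries, so "ExampleCorp" does not
    match inside "ExampleCorporation".
    """
    patterns = {
        term: re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for term in watch_terms
    }
    hits = []
    for index, text in enumerate(posts):
        for term, pattern in patterns.items():
            if pattern.search(text):
                hits.append((index, term))
    return hits


# Hypothetical company name, slogan, and scraped posts
watch_terms = ["ExampleCorp", "Built to Last"]
posts = [
    "Dumped the ExampleCorp customer DB last night, selling soon",
    "Anyone have a working exploit for the new router firmware?",
    "Our new shop: built to last, prices to match",
]
print(find_mentions(posts, watch_terms))
```

The first hit is the kind of boast Parry mentions. The second may be someone co-opting a slogan. Both would still need a human to judge them.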

Companies who wish to go beyond the bare minimum will need to expand their monitoring to the dark web (and expand their data-processing capacity). From “shady” social media to black markets to hacker libraries, the dark web can reveal much about compromised data to those who know how to look. Parry writes:

Yet extrapolating this information into a meaningful form that can be used for threat intelligence is no mean feat. The complexity of accessing the dark web combined with the sheer amount of data involved, correlation of events, and interpretation of patterns is an enormous undertaking, particularly when you then consider that time is the determining factor here. Processing needs to be done fast and in real-time. Algorithms also need to be used which are able to identify and flag threats and vulnerabilities. Therefore, automated event collection and interrogation is required and for that you need the services of a Security Operations Centre (SOC).

The next generation SOC is able to perform this type of processing and detect patterns, from disparate data sources, real-time, historical data etc. These events can then be threat assessed and interpreted by security analysts to determine the level of risk posed to the enterprise. Forewarned, the enterprise can then align resources to reduce the impact of the attack. For instance, in the event of an emerging DoS attack, protection mechanisms can be switched from monitoring to mitigation mode and network capacity adjusted to weather the attack.
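
One way to picture the switch from monitoring to mitigation mode described above is as a threshold on a sliding window of request timestamps. Below is a minimal sketch. It assumes a single stream of request times in seconds. The RateMonitor class and its window and threshold values are hypothetical, and a real SOC correlates far more signals than request volume.

```python
from collections import deque


class RateMonitor:
    """Tracks request timestamps in a sliding window and reports a protection mode.

    Mode is "mitigation" while the number of requests inside the window exceeds
    the threshold, and "monitoring" otherwise.
    """

    def __init__(self, window_seconds=10.0, threshold=1000):
        self.window_seconds = window_seconds
        self.threshold = threshold
        self.timestamps = deque()
        self.mode = "monitoring"

    def record(self, ts):
        """Register one request at time ts (seconds, non-decreasing) and return the mode."""
        self.timestamps.append(ts)
        # Evict requests that have fallen out of the window (ts - window, ts]
        while self.timestamps[0] <= ts - self.window_seconds:
            self.timestamps.popleft()
        self.mode = "mitigation" if len(self.timestamps) > self.threshold else "monitoring"
        return self.mode


# Hypothetical settings: more than 5 requests in any 1-second window counts as an attack
monitor = RateMonitor(window_seconds=1.0, threshold=5)
for t in range(5):  # steady trickle: one request per second
    monitor.record(float(t))
print(monitor.mode)
for i in range(10):  # burst: ten requests in a tenth of a second
    monitor.record(5.0 + i * 0.01)
print(monitor.mode)
```

A production system would add hysteresis, so the mode does not flip back on the first quiet second. It would also act on the mode, for example by adjusting network capacity, rather than merely reporting it.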

Note that Parry’s company, Auriga, supplies a variety of software and R&D services, including a Security Operations Center platform, so he might be a tad biased. Still, he has some good points. The article notes SOC insights can also be used to predict future attacks and to prioritize security spending. Typically, SOC users have been big businesses, but, Parry notes, scalable and entry-level packages are making such tools available to smaller companies.

From monitoring mainstream social media to setting up an SOC to comb through dark web data, tools exist to combat hackers. The question, Parry observes, is whether companies will recognize the growing need to embrace those methods.

Cynthia Murrell, February 16, 2017

Written by Stephen E. Arnold · Filed Under algorithms, Data, News, Security, software

Google Battling Pirates More and More Each Year

February 10, 2017

So far, this has been a booming year for DMCA takedown requests, we learn from TorrentFreak’s article, “Google Wipes Record Breaking Half Billion Pirate Links in 2016.” The number of wiped links has been growing rapidly over the last several years, but is that good or bad news for copyright holders? That depends on whom you ask. Writer Ernesto reveals the results of TorrentFreak’s most recent analysis:

Data analyzed by TorrentFreak reveals that Google recently received its 500 millionth takedown request of 2016. The counter currently [in mid-July] displays more than 523,000,000, which is yet another record. For comparison, last year it took almost the entire year to reach the same milestone. If the numbers continue to go up at the same rate throughout the year, Google will process a billion allegedly infringing links during the whole of 2016, a staggering number.

According to Google roughly 98% of the reported URLs are indeed removed. This means that half a billion links were stripped from search results this year alone. However, according to copyright holders, this is still not enough. Entertainment industry groups such as the RIAA, BPI and MPAA have pointed out repeatedly that many files simply reappear under new URLs.
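
The “billion links” projection is straight-line arithmetic. Below is a minimal sketch that extrapolates the 523 million year-to-date count and then applies the roughly 98 percent removal rate Google reports. It assumes “mid-July” means July 15, 2016; note that 2016 is a leap year.

```python
import datetime


def project_annual(count_so_far, as_of):
    """Straight-line extrapolation of a year-to-date count to a full-year total."""
    start = datetime.date(as_of.year, 1, 1)
    days_elapsed = (as_of - start).days + 1  # count as_of itself
    days_in_year = (datetime.date(as_of.year + 1, 1, 1) - start).days
    return count_so_far * days_in_year / days_elapsed


ytd_urls = 523_000_000
as_of = datetime.date(2016, 7, 15)  # assumption: "mid-July" read as July 15

projected = project_annual(ytd_urls, as_of)
removed = 0.98 * ytd_urls  # Google's reported ~98% removal rate

print(f"projected 2016 total: {projected / 1e6:,.0f} million URLs")
print(f"removed so far:       {removed / 1e6:,.0f} million URLs")
```

The projection comes out a little under a billion, at about 970 million. Ninety-eight percent of 523 million is roughly 513 million, which squares with the “half a billion links” already stripped.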

Indeed, copyright holders continue to call for Google to take stronger measures. For its part, the company insists that the rising number of link removals is evidence its process is working quite well. It has issued an update of its report, “How Google Fights Piracy.” The two sides remain deeply divided and will likely be at odds for some time. Ernesto tells us some copyright holders are calling for the government to step in. That could be interesting.

Cynthia Murrell, February 10, 2017

Written by Stephen E. Arnold · Filed Under algorithms, Copyright, cybercrime, Google, News, Search quality, Technology

Probability Algorithms: Boiling Prediction Down

February 6, 2017

I read “The Algorithms Behind Probabilistic Programming.” Making a somewhat less than familiar topic accessible is a good idea. If you want to get a sense for predictive analytics, why not read a blog post about Bayesian methods with a touch of Markov? The write up pitches a more in-depth report about predictive analytics. The “The Algorithms Behind…” write up makes clear that prediction rests on methods which continue to confound some “real” consultants. I like the mentions of Monte Carlo methods and the aforementioned sporty Markov. I did not see a reference to Laplace. Will you be well on your way to understanding predictive analytics after working through the article from Fast Forward Labs? No, but you will have some useful names to Google. When I read explanations of these methods, I like to reflect on Autonomy’s groundbreaking products from the 1990s.
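
For readers who want to see the Markov and Monte Carlo names in action, here is a minimal sketch of a random-walk Metropolis sampler, one of the basic Markov chain Monte Carlo algorithms. It uses a toy problem: 7 heads in 10 coin flips with a uniform prior. That problem was chosen because the exact answer is known: the posterior is Beta(8, 4), with mean 8/12, or about 0.667. This illustrates the general technique and is not code from the Fast Forward Labs report.

```python
import math
import random


def log_posterior(theta, heads, flips):
    """Log of the unnormalized posterior for a coin's heads probability.

    With a uniform Beta(1, 1) prior this is just the binomial log likelihood.
    """
    if not 0.0 < theta < 1.0:
        return float("-inf")  # zero posterior outside (0, 1)
    return heads * math.log(theta) + (flips - heads) * math.log(1.0 - theta)


def metropolis(heads, flips, n_samples=50_000, burn_in=5_000, step=0.1, seed=42):
    """Random-walk Metropolis: a Markov chain whose long-run draws follow the posterior."""
    rng = random.Random(seed)
    theta = 0.5
    current = log_posterior(theta, heads, flips)
    samples = []
    for i in range(burn_in + n_samples):
        proposal = theta + rng.gauss(0.0, step)  # symmetric proposal
        candidate = log_posterior(proposal, heads, flips)
        log_ratio = candidate - current
        # Accept with probability min(1, posterior ratio)
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            theta, current = proposal, candidate
        if i >= burn_in:
            samples.append(theta)  # a rejected move repeats the current value
    return samples


draws = metropolis(heads=7, flips=10)
print(f"posterior mean estimate: {sum(draws) / len(draws):.3f} (exact: {8 / 12:.3f})")
```

The sample mean should land close to 0.667. The step size and burn-in length are arbitrary choices. A step that is too small makes the chain crawl, and one that is too large gets most proposals rejected.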

Stephen E Arnold, February 6, 2017

Written by Stephen E. Arnold · Filed Under algorithms, News

Smart Software Recipe Fiesta

February 2, 2017

I read “140 Machine Learning Formulas.” The listing hits the top 10 most popular algorithms and adds 130 more. The summary of the formulas is at this link. A happy quack to Rubens Zimbres who compiled the list. A profile of Mr. Zimbres is available at this link. FYI. He’s looking for a new challenge.

Stephen E Arnold, February 2, 2017

Written by Stephen E. Arnold · Filed Under algorithms, News

Now Watson Wants to Be a Judge

December 27, 2016

IBM has deployed Watson in many fields, including the culinary arts, sports, and medicine. The big data supercomputer can be used in any field or industry that creates a lot of data. Watson, in turn, will digest the data and, depending on the algorithms, spit out results. Now IBM wants Watson to take on the daunting task of judging, says The Drum in “Can Watson Pick A Cannes Lion Winner? IBM’s Cognitive System Tries Its Arm At Judging Awards.”

According to the article, judging is a cognitive process that requires special algorithms, not to mention that it is colored by the biases of certain judges. In other words, it should be right up Watson’s alley (perhaps the results will be less subjective as well). The Drum decided to put Watson to the ultimate creative test and fed it thousands of previous Cannes Lions entries. Watson then predicted which entry would win in the Outdoor category at this year’s Cannes Lions festival.

This could change the way contests are judged:

The Drum’s magazine editor Thomas O’Neill added: “This is an experiment that could massively disrupt the awards industry. We have the potential here of AI being able to identify an award winning ad from a loser before you’ve even bothered splashing out on the entry fee. We’re looking forward to seeing whether it proves as accurate in reality as it did in training.”

I would really like to see this applied to the Academy Awards, which are often criticized for their lack of diversity and for being dominated by older white men. It would be great to see whether Watson would yield different results than what the Academy actually selects.

Whitney Grace, December 27, 2016

Written by Stephen E. Arnold · Filed Under algorithms, Data, IBM Watson, News, Technology

An Apologia for People. Big Data Are Just Peachy Keen

December 25, 2016

I read “Don’t Blame Big Data for Pollsters’ Failings.” The news about the polls predicting a victory for Hillary Clinton reached me in Harrod’s Creek five days after the election. Hey, Beyond Search is in rural Kentucky. It looks from the news reports and the New York Times’s odd letter about doing “real” journalism that the pundits predicted that the mare would win the US derby.

The write up explains that Big Data did not fail. The reason? The pollsters were not using Big Data. The sample sizes were about 1,000 people. Check your statistics book. In the back will be tables of sample sizes for populations. If you have an older statistics book, you have to work out the sample size with a formula.
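
The older-textbook route is most likely Cochran’s formula for estimating a proportion, n = z^2 * p * (1 - p) / e^2, with an optional finite population correction. The choice of formula is our assumption. Below is a minimal sketch showing why pollsters settle on roughly 1,000 respondents. At 95 percent confidence (z = 1.96), the worst case p = 0.5, and a ±3 point margin, the formula asks for 1,068 people. The formula assumes a simple random sample. It says nothing about whether the humans in that sample are representative, which is where the write up locates the problem.

```python
import math


def sample_size(z=1.96, p=0.5, margin=0.03, population=None):
    """Cochran's sample size for estimating a proportion.

    n0 = z^2 * p * (1 - p) / margin^2; if a population size N is given, apply the
    finite population correction n = n0 / (1 + (n0 - 1) / N). Rounds up.
    """
    n0 = z ** 2 * p * (1 - p) / margin ** 2
    if population is not None:
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)


def margin_of_error(n, z=1.96, p=0.5):
    """Half-width of the confidence interval for a proportion from a simple random sample."""
    return z * math.sqrt(p * (1 - p) / n)


print(sample_size())  # 95% confidence, worst-case p = 0.5, +/-3 points
print(f"{margin_of_error(1000):.3f}")
```

Running the calculation the other way, 1,000 respondents buy a margin of error of about ±3.1 points. For a population the size of the US electorate, the finite population correction barely moves the answer.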

Big Data doesn’t fool around with formulas. Big Data just uses “big data.” Is the idea that the bigger the data, the better the output?

The write up states that the problem was the sample itself: The actual humans.

The write up quotes a mid-tier consultant from an outfit called Ovum which reminds me of eggs. I circled this statement:

“When you have data sets that are large enough, you can find signals for just about anything,” says Tony Baer, a big data analyst at Ovum. “So this places a premium on identifying the right data sets and asking the right questions, and relentlessly testing out your hypothesis with test cases extending to more or different data sets.”

The write up tosses in social media. Facebook takes the position that its information had minimal effect on the election. Nifty assertion that.

The solution is, as I understand the write up, to use a more real time system, different types of data, and math. The conclusion is:

With significant economic consequences attached to political outcomes, it is clear that those companies with sufficient depth of real-time behavioral data will likely increase in value.

My view is that hope and other distinctly human behaviors certainly threw an egg at reality. It is great to know that there is a fix and that Big Data have emerged as the path forward. More work ahead for the consultants who often determine sample sizes by looking at Web sites like SurveySystem and get their samples from lists of contributors, a 20-something’s mobile phone contact list, or lists available from friends.

If you use Big Data, tap into real time streams of information, and do the social media mining—you will be able to predict the future. Sounds logical? Now about that next Kentucky Derby winner? Happy or unhappy holiday?

Stephen E Arnold, December 25, 2016

Written by Stephen E. Arnold · Filed Under algorithms, Analytics, Big data, News

Stephen E. Arnold monitors search, content processing, text mining and related topics from his high-tech nerve center in rural Kentucky. He tries to winnow the goose feathers from the giblets. He works with colleagues worldwide to make this Web log useful to those who want to go "beyond search". Contact him at sa [at] arnoldit.com. His Web site with additional information about search is arnoldit.com.
