Smart Software Writes: Fake News May Improve Its Fakiness

March 17, 2018

Humans are harsh critics, but could AI become even harsher? We think AI is still too dumb to read stories and understand their context. Software cannot yet reliably assign context to words (the relevant field is sentiment analysis), but it can distinguish news stories from personal opinions and experiences. Motherboard, Vice’s tech channel, wrote an article that discusses how many news stories are really new and noteworthy, called “AI System Sorts News Articles By Whether Or Not They Contain Actual Information.”

In order to separate the white noise from the signal, a machine learning system would need an objective metric of content density and an objective way to evaluate news stories against that metric. An AI that could deduce real stories from fake ones would be built like any other machine learning program: gather a pile of data and split it into appropriate groups. One team built an AI based on this model and came back with decent returns:

“In a recent paper published in the Journal of Artificial Intelligence Research, computer scientists Ani Nenkova and Yinfei Yang, of Google and the University of Pennsylvania, respectively, describe a new machine learning approach to classifying written journalism according to a formalized idea of “content density.” With an average accuracy of around 80 percent, their system was able to accurately classify news stories across a wide range of domains, spanning from international relations and business to sports and science journalism, when evaluated against a ground truth dataset of already correctly classified news articles.”

When the system was evaluated against a subset labeled expressly for validation, accuracy dropped to about fifty percent.
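
To make the gather-and-split idea concrete, here is a minimal sketch of a generic bag-of-words text classifier with an invented toy training set. This is our illustration of the general approach, not the authors’ actual features or model:

# Generic text classification sketch: label examples, vectorize, fit, predict.
# The tiny training set is invented; the paper's feature set differs.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "Officials confirmed the agreement was signed on Tuesday.",  # content-dense
    "The quarterly report lists revenue figures by region.",     # content-dense
    "I feel like everything is just so amazing these days!",     # not dense
    "Wow, you will not believe what happened to me today.",      # not dense
]
labels = ["dense", "dense", "not_dense", "not_dense"]

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)

print(model.predict(["The ministry announced new tariffs on imported steel."]))

Real systems train on thousands of labeled articles; with four examples, the model above merely demonstrates the pipeline.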

Our view of this “progress”: it seems the software can be trained by feeding it fake news. If the information in an article is accurate, the smart software could then improve the fake news. Does this mean that the improved fake news will be “better”?

The problem is that AI still has trouble deciphering human intent and the true meaning behind words. It is similar to how some autistic people have trouble understanding human social cues. AI needs to become much more human to understand human language intricacies.

Whitney Grace, March 17, 2018

Crime Prediction: Not a New Intelligence Analysis Function

March 16, 2018

We noted “New Orleans Ends Its Palantir Predictive Policing Program.” The interest in this Palantir Technologies project surprised us in our log cabin with its view of the mine drainage runoff pond. The predictive angle is neither new nor particularly stealthy. Many years ago, when I worked for one of the outfits developing intelligence analysis systems, the “predictive” function was routine.

Here’s how it works:

  • Identify an entity of interest (person, event, organization, etc.)
  • Search for other items including the entity
  • Generate near matches. (We called this “fuzzification” because we wanted hits which were “near” the entity in which we had an interest. Plus, the process worked reasonably well in reverse too.)
  • Punch the analyze function.

Once one repeats the process several times, the system dutifully generates reports which make it easy to spot:

  • Exact matches; for example, a “name” has a telephone number and a dossier.
  • Close matches; for example, a partial name or organization is associated with the telephone number of the identity.
  • Predicted matches; for example, based on available “knowns,” the system can generate a list of highly likely matches.

The particular systems with which I am familiar allow the analyst, investigator, or intelligence professional to explore the relationships among these pieces of information. Timeline functions make it trivial to plot when events took place and retrieve from the analytics module highly likely locations for future actions. If an “organization” held a meeting with several “entities” at a particular location, the geographic component can plot the actual meetings and highlight suggestions for future meetings. In short, prediction functions work in a manner similar to Excel’s filling in items in a number series.
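
Here is a toy sketch of the near-match (“fuzzification”) and fill-the-series ideas described above. The entity names and meeting dates are invented for illustration; real systems use far richer features than string similarity and a linear date trend:

# Toy sketch of "fuzzification" and series-style prediction.
# Entities and dates below are invented for illustration only.
from difflib import SequenceMatcher
from datetime import date

KNOWN_ENTITIES = ["Acme Trading GmbH", "J. Smith", "+1-555-0100"]

def near_matches(candidate, entities, threshold=0.6):
    """Return entities whose similarity to the candidate clears the threshold."""
    scored = [(e, SequenceMatcher(None, candidate.lower(), e.lower()).ratio())
              for e in entities]
    return [(e, round(s, 2)) for e, s in scored if s >= threshold]

# Close match for a partial name pulled from a new document.
print(near_matches("acme trading", KNOWN_ENTITIES))  # hits the GmbH entry

# "Prediction" in the Excel fill-series sense: if meetings occurred at a
# steady interval, project the next one from the observed spacing.
meetings = [date(2018, 1, 5), date(2018, 1, 19), date(2018, 2, 2)]
interval = (meetings[-1] - meetings[0]) / (len(meetings) - 1)
print(meetings[-1] + interval)  # highly likely date for the next meeting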

[Image: heat map with histogram]

What would you predict as a “hot spot” based on this map? The red areas, the yellow areas, the orange areas, or the areas without an overlay? Prediction is facilitated with some outputs from intelligence analysis software. (Source: Palantir via Google Image search)


Google and France: A Dust Up Escalates

March 16, 2018

The addled goose knows that most Google watchers are mesmerized by the GOOG’s about-face with regard to editorial responsibility. There’s the ban on cryptocurrency ads. If you missed that news, Google is doing what other firms and some governments have failed to do: taking steps to check the rampant craziness about calculated “money” designed to work around countries’ banking systems, laws, and procedures. Then there is the linkage of the ever-accurate Wikipedia to some YouTube videos. The idea is to provide some sort of knowledge-based balance to what is now the Alexandria of cat videos.

Put those decisions aside, gentle reader.

Tucked into the flow of news, almost hidden beneath the editorial responsibility stories, was “France Will Sue Apple and Google over ‘Abusive’ Treatment of Developers.” The article states:

Speaking on RTL Radio on Wednesday, Finance Minister Bruno Le Maire said that he “believes in an economy based on justice” and “will take Google and Apple before the Paris Commercial Court for abusive business practices”, Reuters reports. These allegedly-abusive business practices relate to the way that the tech giants impose tariffs on developers who sell their apps via the iTunes App Store and Google Play, respectively.

There are some flashpoint words in this report. We noted “abusive.” We also think the reference to “tariffs” is interesting.

The signal from France merits attention. The consequences could be interesting.

Oh, the subtitle to the story is:

Firms could face fines in the ‘millions of Euros’ if found guilty

That might be why this signal cannot be dismissed with « Bah laisse tomber ! » (“Bah, forget it!”).

Stephen E Arnold, March 16, 2018

BigQuery Equals Big Data Transfers for Google

March 16, 2018

Google provides hundreds of services for its users, including YouTube, AdWords, DoubleClick Campaign Manager, and more. Google, however, is mainly used as a search engine, and the content on its other services is fed into the search algorithm so it can be queried. For all of that content to be searchable, it needs to be dumped and mined. That requires a lot of push power, so what does Google use? According to Smart Data Collective’s “Big Query Service: Next Big Thing Unveiled by Google on Big Data,” the answer is BigQuery.

Google and big data have not been in the news together for a while, but the BigQuery Data Transfer Service shows how the company moves data out of its SaaS applications and into its data warehouse. How exactly does this work?

According to a Google blog post, the new service automates the migration of data from these apps into BigQuery in a scheduled and managed manner. So far so good: the service will support data transfers from AdWords, DoubleClick Campaign Manager, DoubleClick for Publishers, and YouTube Content and Channel Owner Reports, and so forth. As soon as the data gets to BigQuery, users can begin querying it immediately. With the help of Google Cloud Dataprep, users can not only clean and prep the data for analysis but also analyze other data alongside the information kept in BigQuery.

The data moves from the apps within 24 hours, and BigQuery customers can schedule their own data deliveries so they occur regularly. Customers who already use BigQuery include Trivago and Zenith.
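
For readers who want to poke at data the transfer service has landed, here is a minimal sketch using the BigQuery Python client; the project, dataset, table, and column names are invented for illustration:

# Minimal sketch: query a table that the Data Transfer Service has already
# landed in BigQuery. Project, dataset, and table names are hypothetical.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # hypothetical project ID

query = """
    SELECT campaign_id, SUM(clicks) AS total_clicks
    FROM `my-project.adwords_transfer.daily_report`  -- hypothetical table
    GROUP BY campaign_id
    ORDER BY total_clicks DESC
    LIMIT 10
"""

for row in client.query(query).result():  # runs the job and waits for rows
    print(row.campaign_id, row.total_clicks)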

The article then turns into a press release for other Google services related to machine learning and explains how Google is the leading company in the industry. It is simply an advertisement for cloud migration and yet another Google service.

Whitney Grace, March 16, 2018

Twitter: Designed for Interesting Messaging

March 15, 2018

Fake news is getting harder to control, and social media networks make it harder to weed the truth from the lies. Engadget shares how “Twitter’s Fake News Problem Is Getting Worse” and how tragedy exacerbates the problem. For example, when a crazed shooter opened fire at a high school in Parkland, Florida, social media, including Twitter, helped spread fake news. The fake news misidentified the gunman, misstated the number of gunmen, claimed a comedian was one of the shooters when that was just a meme, and misidentified missing people.

The problem is getting worse, with doctored news stories, reporters being accused of false claims, and misinformed readers reposting the information again and again. The most ironic thing is that this is what social media, especially Twitter, was designed for:

“It’s just further evidence that Twitter’s fake-news problem is getting worse. After all, Twitter’s very nature is to spread information at lightning speed with little to no oversight. And ironically, it is this quality that brought Twitter to prominence in the first place. One of Twitter’s defining moments was when Janis Krum tweeted about U.S. Airways Flight 1549 landing in the Hudson River on January 15th, 2009 — he was the first to have reported it, and the tweet soon went viral. “It changed everything,” Twitter co-founder Jack Dorsey told CNBC in 2013. “Suddenly the world turned its attention because we were the source of news — and it wasn’t us, it was this person in the boat using the service.” Twitter was no longer just a place for discussing what you had for lunch. It became a place where you could get news from real people experiencing events first-hand, which was often faster than mainstream news.”

Using Twitter and other social media networks as news aggregators has brought a fresh perspective to Internet social interactions. Fake news is easily generated and shared not only by bots but by Internet trolls, and then multiplied by people who do not think critically about content.

Here’s a Beyond Search tip: Refrain from reposting anything you are unsure about. Most people, however, have neither the filters nor the skills to distinguish fact from fiction. For many, their beliefs make the facts. Twitter and similar tools become easy-to-use amplifiers.

Twitter and other social networks do have a responsibility to curb false news. In the past, old-fashioned newspapers were, at least ideally, held accountable, and reporters strove to fact check their articles. Whatever happened to the fact checking department? (Tip: Library reference desks might be an oasis for some fact checkers.) Maybe we need a tool that runs everything through Wikipedia first, which seems to be the easy way for Google to wriggle off the hook for certain types of content.

Whitney Grace, March 15, 2018

The Flaws in Smart Software Methods

March 15, 2018

I read “Machine Learning Models Keep Getting Spoofed by Adversarial Attacks and It’s Not Clear If This Can Ever Be Fixed.” About four years ago I gave a series of lectures about the mathematical procedures most commonly used in smart software. The lectures included equations, which, I learned, are not high on the list of information law enforcement and intelligence professionals favor.

Despite the inclusion of this lecture in some of my conference talks, only since the allegations, assertions, and counter assertions about interference via social media has the topic of flawed methods become popular.

The “Machine Learning Models…” write up is okay. It covers the basics, but it does not include specifics about why clustering can be disrupted or why anomaly detection numerical recipes can go off the rails.
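
To make the clustering point concrete, here is a toy sketch with invented data (not an example from the write up) showing how a small, deliberate nudge flips a point’s cluster assignment:

# Toy illustration: a small perturbation near the decision boundary flips
# a point's cluster assignment. The data and the nudge are invented.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
cluster_a = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2))
cluster_b = rng.normal(loc=[4.0, 0.0], scale=0.5, size=(50, 2))
X = np.vstack([cluster_a, cluster_b])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

point = np.array([[1.9, 0.0]])           # sits near the boundary at x = 2
nudged = point + np.array([[0.3, 0.0]])  # an adversarial-style small shift

print(kmeans.predict(point))   # assigned to one cluster
print(kmeans.predict(nudged))  # assigned to the other after a 0.3 shift

Anomaly detection can be gamed the same way: an adversary who keeps scores just under the alert cutoff never triggers a flag.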

My point is that models can be enhanced and improved. However, in order to make even incremental progress, the companies, universities, and individuals involved in cooking up warmed-over mathematical procedures have to take the initiative; for example:

  1. Question the use of textbook methods. Does Google’s struggle to identify faces in images reflect a dependence on Dr. Norvig’s recipe book?
  2. Become more demanding when threshold settings are implemented by an intern or an engineer who thinks the defaults are just dandy. (A toy example of a dandy-looking default appears after this list.)
  3. Examine outputs in the context of a user who has subject matter expertise in the content and can identify wonky outputs.
  4. Encourage developers to move beyond copying and pasting routines from college courses or methods invoked from a library someone said was pretty good.
  5. Evaluate the work flow sequence for its impact on system outputs.
  6. Stop assuming that “more data” works around flaws in the data by magic.
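
Here is the threshold sketch promised in point two, with invented numbers: on imbalanced data, the customary 0.5 cutoff can look fine while missing every event of interest.

# Invented scores: 990 routine events near 0.1, 10 anomalies near 0.4.
# A default 0.5 cutoff flags nothing; a data-driven cutoff surfaces them.
import numpy as np

rng = np.random.default_rng(1)
routine = rng.normal(0.10, 0.03, 990)
anomalies = rng.normal(0.40, 0.03, 10)
scores = np.concatenate([routine, anomalies])

default_cutoff = 0.5
tuned_cutoff = np.percentile(scores, 99)  # flag the top one percent

print((scores > default_cutoff).sum())  # 0 alerts: the default misses all
print((scores > tuned_cutoff).sum())    # about 10 alerts: anomalies surface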

Until these types of shifts take place, smart software — whether for machine learning or making sense of real time flows of data — will remain less than perfect for many use cases.

Stephen E Arnold, March 15, 2018

Schmidt Admits It Is Hard to Discern Between Fact and Fiction

March 15, 2018

One basic research essential is learning how to tell the difference between fact and fiction. It used to be easier to control and verify news because information dissemination was limited to physical media. The Internet blew everything out of the water and made it more difficult to discern fact from fiction. Humans can be taught the tricks, but AI still has a lot to learn. The Daily Mail reports that, “Alphabet Chairman Eric Schmidt Admits It Is ‘Very Difficult’ For Google’s Algorithm To Separate Fact From Fiction In Its Search Results.”

Millions of articles and other pieces of content are posted online daily. Google’s job is to sift through them and deliver the most accurate results. When opposing viewpoints are shared, Google’s algorithm has difficulty figuring out the truth. Eric Schmidt says that can be fixed with tweaking. He views fact-versus-fiction problems as bugs that need repair and believes that, with some work, they can be fixed. The article highlights some of the more infamous examples of Google failing, such as the Autocomplete feature and how conspiracy theories can be presented as fact.

Search results displaying only hard truth will be as elusive as accurate sentiment analytics.

Schmidt added:

That is a core problem of humans that they tend to learn from each other and their friends are like them. And so until we decide collectively that occasionally somebody not like you should be inserted into your database, which is sort of a social values thing, I think we are going to have this problem.

Or we can just wait until we make artificial intelligence smarter.

Whitney Grace, March 15, 2018

Is Change Coming to High Tech Lobbying in Washington, DC?

March 14, 2018

The received wisdom in Washington, DC is that when it comes to politics, money talks.

The idea is simple: Donate money to a politician’s campaign or a politician’s favorite “cause” and get your email and phone calls answered.

The Independent explains that, “Google Outspends All Rival Washington Lobbyists For First Time In 2017.”

In 2017, Google spent $18 million to lobby Congress on a slew of issues, ranging from immigration and tax reform to antitrust and online advertising. Tech companies have big bucks and the power to take on Congress over government policies. Lawmakers, on the other hand, fire back with pot shots about tech companies allowing Russian operatives to share content and about how their software and other technology lets them abuse their power.

Google’s Washington operation proposed legislation that would require Web companies to collaborate on a public database of political ads that run on their platforms. The idea is that the database would prevent foreign nations from exploiting online platforms. Other companies, like Amazon and Facebook, have ramped up their lobbying spending too.

Despite the power tech companies wield, their roles in society are changing and there is some fear associated with it:

“‘These are companies that are touching so many parts of the economy, they are touching so many parts of our geography. So it’s inevitable that they are going to engage in a host of political and policy issues,’ said Julie Samuels, the executive director of Tech: NYC, a group that represents New York-based tech firms. Samuels added that Silicon Valley has also had to adjust to a new political order, under a Republican administration. ‘Many tech companies had only been real players during the Obama administration. They had a lot to learn.’”

Now the received wisdom may have to be modified. Beyond Search noted that Palantir has landed a chunk of a US government contract to create a DCGS which meets the needs of the US Army.

We think that Google will continue to support lobbying, but it will seek more deals like its tie up with the US government’s push for artificial intelligence. What may emerge is a new approach to influencing procurement decisions and legislation in Washington.

Whitney Grace, March 14, 2018

Facebook: Now Expectations for Responsibility Are Rising

March 14, 2018

Recently, British Prime Minister Theresa May spoke out against the vengeful and often dangerous way in which social media has been utilized. According to one account she stood up for women and minorities and other groups being disenfranchised online. Good, right? Apparently, it was a little too late, as a fiery Guardian piece told us in, “Theresa May Thinks Facebook Will Police Itself? Some hope.”

In typical British journalistic tradition, the piece heavily criticizes the PM’s statement:

“This is typical Mayspeak: it mimes determination but is devoid of substance. It’s like hoping that the alcohol industry will help to stamp out binge drinking or that food manufacturers will desist from encouraging childhood obesity. Neither industry will comply for the simple reason that their continued prosperity depends on people drinking more alcohol and consuming more sugar and fat.”

While a politician saying that they trust Facebook and social media to police themselves is laughable no matter what country you live in, it raises an interesting question. Wired recently took up the same topic with an interesting spin. While its author acknowledges Facebook’s attempts at correcting its mistakes and becoming a safer platform for users, the piece points out that there is a really simple way to handle this: more transparency. Social media giants may find themselves forced to shift from “utility” mode to “responsible publisher” mode. When this occurs, the algorithms which help generate revenue may be found to have an unacceptable social downside.

Patrick Roland, March 14, 2018

Come on Google, Stop Delivering Offensive Content

March 14, 2018

Sentiment analytics is notoriously hard to program and leads to more chuckles than accurate results. Throughout the year, Google, Facebook, and other big names have dealt with their own embarrassing sentiment analytics fiascos, and the fiascos continue. The Verge shares “Google’s Top Search Results Promote Offensive Content, Again” in an unsurprising headline.

One recent example took an offensive meme from a subreddit and made it the first thing displayed when “gender fluid” was queried. Yes, it is funny, but stuff like this keeps happening without any sign of stopping:

The slip-up comes just a month after Google briefly gave its “top stories” stamp of approval to two 4chan threads identifying the wrong suspect in the recent Las Vegas mass shooting tragedy. This latest search result problem appears to be related to the company’s snippet feature. Featured snippets are designed to answer queries instantly, and they’ve often provided bad answers in the past. Google’s Home device, for example, used a featured snippet to answer the question ‘are women evil?’ with the horrendously bad answer ‘every woman has some degree of prostitute in her.’

The ranking algorithm was developed to pull the most popular stories and deliver them regardless of their accuracy. Third parties and inaccurate sources can manipulate the ranking algorithm for their own benefit or humor. Google is considered the de facto source of information, and with that comes a responsibility to purvey the truth, but there will always be people who take advantage of the news outlets.

Whitney Grace, March 14, 2018
