Smart Software: Chess Is One Thing, Bias Another

December 13, 2017

I enjoyed learning how Google’s smart software taught itself chess in four hours and was able to perform at a high level against mere humans. I also got a kick out of the news that Google’s smart software cannot filter YouTube videos for objectionable material. Google is in the process of hiring 10,000 humans to wade through the hours of video uploaded every minute to YouTube. Ironic? No, just different PR teams.

I read “Researchers Combat Gender and Racial Bias in Artificial Intelligence.” The write up assumes that everyone knows that algorithms contain biases. Sure, that’s a good assumption for most people.

The reality is that a comparatively small number of algorithmic approaches dominate smart software today. The building blocks are arranged in different sequences. The Facebookers and Googlers chug away at setting thresholds, working with subsets to chop Big Data down to something affordable, and doing other algorithmic donkey work.

But it appears that some folks have now realized that smart software contains biases. I would toss in ethics, but that’s another epistemological challenge to keep “real” journalists on the hunt for stories.

The write up asserts:

While the AI algorithms did a credible job of predicting income levels and political leanings in a given area, Gebru [a Stanford AI wizard] says her work was susceptible to bias — racial, gender, socioeconomic.

Well, Microsoft and IBM are tackling this interesting challenge:

Researchers at Microsoft, IBM and the University of Toronto identified the need for fairness in AI systems back in 2011. Now in the wake of several high-profile incidents — including an AI beauty contest that chose predominantly white faces as winners — some of the best minds in the business are working on the bias problem.

I was tickled to learn that the smart software outfit Google has a different approach:

Google researchers are studying how adding some manual restrictions to machine learning systems can make their outputs more understandable without sacrificing output quality, an initiative nicknamed GlassBox.

Yep, humans. Chess is an easier problem to solve than bias. But in comparison to ethics, bias strikes me as a lower hurdle.

Ah, the irony. Humans instead of software at the GOOG.

Stephen E Arnold, December 13, 2017

Knowledge Supposedly the Best Investment

December 13, 2017

Read, read, read, read! You are told it is good for you, but, much like eating vegetables, no one wants to do it. School children loathe their primers, adults say they do not have the time, and senior citizens explain it puts them to sleep. Reading, however, is the single best investment an individual can make. This is not new, but the Observer treats reading like some epiphany in the article, “If You’re Not Spending Five Hours Per Week Learning, You’re Being Irresponsible.”

The article opens with snippets about famous smart people and how they take the time to read at least an hour a day.  The stories are followed by these wise words:

The answer is simple: Learning is the single best investment of our time that we can make. Or as Benjamin Franklin said, ‘An investment in knowledge pays the best interest.’  This insight is fundamental to succeeding in our knowledge economy, yet few people realize it. Luckily, once you do understand the value of knowledge, it’s simple to get more of it. Just dedicate yourself to constant learning.

The standard excuse follows: in today’s modern world, we are too busy making money to survive to learn new things. Then we are slugged with the counterpoint that demonetization is making previously expensive technology cheaper or even free. Examples are provided, such as video conferencing, video game consoles, cameras, encyclopedias, and anything digital. All of these are found on a smartphone.

Technology that was once gold is now cheap, making knowledge more valuable. Then we are told that technology will make certain jobs obsolete and the only way to survive in the future will be to gain more knowledge and apply it, because knowledge can never be taken from you. The bottom line is to read, learn, apply knowledge, and then make that a daily ritual. The message is not anything new, but does learning via filtered and censored online search results count?

Whitney Grace, December 13, 2017

Google’s Data Police Fail with Creepy Videos

December 13, 2017

YouTube is suffering from a really strange problem lately. In various children’s programming feeds, inappropriate knockoff videos of popular cartoon characters keep appearing. It has parents outraged, as we learned in a Fast Company article, “Creepy Kids Videos Like These Keep Popping Up on YouTube.”

The videos feature things like Elsa from “Frozen” firing machine guns. According to the story:

A YouTube policy imposed this year says that videos showing “family entertainment characters” being “engaged in violent, sexual, vile, or otherwise inappropriate behavior” can’t be monetized with ads on the platform. But on Monday evening Fast Company found at least one violent, unlicensed superhero video, entitled “Learn Colors With Superheroes Finger Family Song Johny Johny Yes Papa Nursery Rhymes Giant Syringe,” still included ads. A YouTube spokesperson didn’t immediately comment, but by Tuesday the video’s ads had been removed.

The videos may well draw ire from legislators, as Congress takes an increasingly close look at user-generated content online in the wake of Russian election manipulation.

It feels like YouTube really needs a tighter rein on content. But it would surprise us if this Congress imposed too much on YouTube’s parent company, Google. With Net Neutrality likely being erased by the FCC, the idea of any deeper oversight is unlikely. If anything, we think Google will be given less oversight.

Patrick Roland, December 13, 2017

Dark Cyber for December 12, 2017, Now Available

December 12, 2017

The HonkinNews Dark Cyber program for December 12, 2017, presents a snapshot of a next-generation investigation analysis system, data about illegal drugs on the Dark Web, and news about a secure chat system which runs within Tor.

Most analysts and investigators have access to a range of software and hardware devices designed to make sense of data from a range of computing devices. However, the newer systems offer visual analyses which often surprise with their speed, power, and ability to deliver “at a glance” insights. This week’s Dark Cyber examines Brainspace, now a unit of Cyxtera. Brainspace’s graphics are among the most striking in the intelligence analysis market. The role that Cyxtera plays is perhaps more important. The company is a roll up of existing businesses and is focused on cloud delivery of advanced software and services.

Dark Cyber also provides facts from a recent European Union report about illegal substances on the Dark Web. What’s interesting about the report is that the data it presents seems to understate the volume of drug sales via the Dark Web. You can download the report without charge from the url included in this week’s program.

The final story addresses a growing challenge for law enforcement and intelligence authorities: secure chat within Tor. The Dark Cyber team reports that Anonymous Portugal has made this alleged breakthrough. (The second edition of the Dark Web Notebook will include a new chapter about chat and related services plus ways to compromise these communications.) You can view the program at this link: https://youtu.be/E2jNuJXblOI.

Kenny Toth, December 12, 2017

IBM AI: Speeding Up One Thing, Ignoring a Slow Thing

December 12, 2017

I read “IBM Develops Preprocessing Block, Makes Machine Learning Faster Tenfold.” I read this statement and took out my trusty Big Blue marketing highlight felt tip:

“To the best of our knowledge, we are first to have generic solution with a 10x speedup. Specifically, for traditional, linear machine learning models — which are widely used for data sets that are too big for neural networks to train on — we have implemented the techniques on the best reference schemes and demonstrated a minimum of a 10x speedup.” [Emphasis added to make it easy to spot certain semantically rich verbiage.]

I like the traditional, linear, and demonstrated lingo.

From my vantage point, this is useful, but it is one modest component of a traditional, linear machine learning “model”.

The part which sucks up subject matter experts, time, and money (lots of money) includes these steps:

  1. Collecting domain specific information, figuring out what’s important and what’s not, and figuring out how to match what a person or subsystem needs to know against this domain knowledge
  2. Collecting the information. Sure, this seems easy, but it can be a slippery fish for some domains. Tidy, traditional domains like a subset of technical information make it easier and cheaper to fiddle with word lists, synonym expansion “helpers”, and sources which are supposed to be accurate. Accuracy, of course, is a bit like mom’s apple pie.
  3. Converting the source information into a format which the content processing system can use without choking storage space with exceptions or engaging in computationally expensive conversions which have to be checked by software or humans before pushing the content to the content processing subsystem. (Some outfits fudge by limiting content types. The approach works in some eDiscovery systems because the information is in more predictable formats.)

What is the time and money relationship of dealing with these three steps versus the speed up for the traditional machine learning models? In my experience, the cost of the three steps identified above is often greater than the cost of the downstream processes. So a 10x speed up in a single process is helpful, but it won’t pay for pizza for the development team.
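
The three upstream steps can be sketched as a minimal pipeline. Everything below is a hypothetical illustration of the shape of the work, not IBM’s implementation: the vocabulary, synonym list, and class names are invented for the sketch.

```python
# Sketch of the three upstream steps the post describes: domain vocabulary
# (step 1), source collection (step 2), and format conversion that raises
# on exceptions instead of choking the downstream system (step 3).
from dataclasses import dataclass

@dataclass
class Document:
    source: str
    raw: bytes
    mime_type: str

DOMAIN_TERMS = {"turbine", "rotor", "bearing"}   # step 1: what matters in this domain
SYNONYMS = {"impeller": "rotor"}                 # synonym expansion "helpers"

def collect(sources):
    """Step 2: gather raw documents from vetted sources (stubbed here)."""
    return [Document(source=s, raw=b"sample text", mime_type="text/plain")
            for s in sources]

def normalize(doc: Document) -> str:
    """Step 3: convert to a format the content processor accepts,
    flagging unsupported inputs rather than passing them downstream."""
    if doc.mime_type != "text/plain":
        raise ValueError(f"unsupported format: {doc.mime_type}")
    text = doc.raw.decode("utf-8", errors="replace")
    for variant, canonical in SYNONYMS.items():
        text = text.replace(variant, canonical)
    return text
```

The expensive part the post describes, building the domain vocabulary, the synonym helpers, and the list of vetted sources, all happens before a line of this code runs; the speedup IBM reports applies only to what comes after.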

Just my view from Harrod’s Creek, which sees things in a way which is different from IBM marketing and IBM Zurich wizards. Shoot those squirrels before eating them, you hear.

Stephen E Arnold, December 12, 2017

Google Is Taught Homosexuality Is Bad

December 12, 2017

The common belief is that computers and software are objective, inanimate objects capable of greater intelligence than humans. The truth is that humans developed computers and software, so the objective, inanimate objects are only as smart as their designers. What is even more hilarious is that the sentiment analysis AI development process requires tons of data for the algorithms to read and from which to teach themselves to recognize patterns. The data used is “contaminated” with human emotion and prejudice. Motherboard wrote about how human bias pollutes AI in the article, “Google’s Sentiment Analyzer Thinks Being Gay Is Bad.”

The problem when designing AI is that if it is programmed with polluted and biased data, then these super intelligent algorithms will discriminate against people rather than being objective.  Google released its Cloud Natural Language API that allows developers to add Google’s deep learning models into their own applications.  Along with entity recognition, the API included a sentiment analyzer that detected when text contained a positive or negative sentiment.  However, it has a few bugs and returns biased results, such as saying being gay is bad, certain religions are bad, etc.

It looks like Google’s sentiment analyzer is biased, as many artificially intelligent algorithms have been found to be. AI systems, including sentiment analyzers, are trained using human texts like news stories and books. Therefore, they often reflect the same biases found in society. We don’t know yet the best way to completely remove bias from artificial intelligence, but it’s important to continue to expose it.

The problem with programming AI algorithms is that it is difficult to feed them data free of human prejudices. It is difficult to work around these prejudices because they are so ingrained in most data. Programmers are kept on their toes to find a solution, but there is no one-size-fits-all fix. Too bad they cannot just stick with numbers and dictionaries.
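
The mechanism the article describes can be sketched with a toy lexicon scorer. The word scores below are hypothetical stand-ins for weights a real system would learn from text, not anything from Google’s model; the point is that if the training text pairs a neutral identity term with negative contexts, the learned score for that term drifts negative.

```python
# Toy lexicon-based sentiment scorer. The scores are invented: they mimic
# how a model trained on biased text ends up assigning a negative score
# to a word that should be neutral.

BIASED_LEXICON = {
    "good": 0.7,
    "bad": -0.7,
    "happy": 0.8,
    "gay": -0.2,       # should be 0.0; the "learned" bias
    "straight": 0.1,   # should be 0.0
}

def sentiment(text: str) -> float:
    """Average the scores of known words; 0.0 means neutral."""
    scores = [BIASED_LEXICON[w] for w in text.lower().split()
              if w in BIASED_LEXICON]
    return sum(scores) / len(scores) if scores else 0.0

print(sentiment("i am gay"))       # negative, though the sentence is neutral
print(sentiment("i am straight"))  # slightly positive
```

Real sentiment models are far more complex than a word lookup, but the failure mode is the same: the bias lives in the numbers the data produced, not in any line of code a programmer could simply delete.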

Whitney Grace, December 12, 2017

China Has an AI Police Station and That Is Not a Good Thing

December 12, 2017

The range of things artificial intelligence can do is amazing. In China, officials are even handling law enforcement with intelligent machines. While this might be a boon for efficiency, people like Stephen Hawking are not happy. We learned more from the Sanvada article, “Check Out The Artificial Intelligence-Powered Police Station in China.”

According to the story:

Recently China announced the opening of an AI-powered police station in Wuhan illustrating its plans to fully incorporate artificial intelligence as a functional part of its systems.

But the most interesting turn comes later, stating:

Artificial intelligence may not yet be up to the task. After all, not every case in the designated area will relate to car or driving related issues. Artificial intelligence has yet to be proven to have the capability of solving complex disputes. It may not use of all of the facts or comprehend the intricate dynamics of human relationships or the damage which can be caused to people whether it is in the case of molestation or rape and hence, may not have the sensitivity to deal with such scenarios.

We love the multitude of uses for AI but have to agree with the skepticism of Sanvada. One of the smartest people on the planet also agrees. Stephen Hawking recently commented that “AI could be the worst event in human history.” Let’s hope he’s not right and let’s hope wise guidance proves that AI police stations stay a novelty in the world of AI.

Patrick Roland, December 12, 2017

Progress: From Selling NLP to Providing NLP Services

December 11, 2017

Years ago, Progress Software owned an NLP system. I recall conversations with natural language processing wizards from EasyAsk. Larry Harris developed a natural language system in 1999 or 2000. Progress purchased EasyAsk in 2005, if memory serves. I interviewed Craig Bassin in 2010 as part of my Search Wizards Speak series.

The recollection I have is that Progress divested itself of EasyAsk in order to focus on enterprise applications other than NLP. No big deal. Software companies are bought and sold every day.

However, what makes this recollection interesting to me is the information in “Beyond NLP: 8 Challenges to Building a Chatbot.” Progress went from a software company that owned an NLP system to a company which is advising people like me how challenging a chatbot system can be to build and make work. (I noted that the Wikipedia entry for Progress does not mention the EasyAsk acquisition and subsequent de-acquisition.) Either small potatoes or a milestone best jumped over, I assume.

Presumably it is easier to advise and get paid to implement than funding and refining an NLP system like EasyAsk. If you are not familiar with EasyAsk, the company positions itself in eCommerce site search with its “cognitive eCommerce” technology. EasyAsk’s capabilities include voice enabled natural language mobile search. This strikes me as a capability which is similar to that of a chatbot as I understand the concept.

History is history, one of my high school teachers once observed. Let’s move on.

What are the eight challenges to standing up a chatbot which sort of works? Here they are:

  1. The chat interface
  2. NLP
  3. The “context” of the bot
  4. Loops, splits, and recursions
  5. Integration with legacy systems
  6. Analytics
  7. Handoffs
  8. Character, tone, and persona.

As I review this list, I note that I have to decide whether to talk to a chatbot or type into a box so a “customer care representative” can assist me. The “representative,” one assumes, is a smart software robot.

I also notice that the bot has to have context. Think of a car dealer and the potential customer. The bot has to know that I want to buy a car. Seems obvious. But okay.

“Loops, splits, and recursions.” Frankly, I have no idea what this means. I know that chatbot-centric companies use jargon. I assume that this means “programming” so the NLP system returns a semi-on-point answer.

Integration with legacy systems and handoffs seem to be similar to me. I would just call these two steps “integration” and be done with it.

The “character, tone, and persona” item seems to apply to how the chatbot sounds; for example, the nasty, imperious tone of a Kroger automated checkout system.
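
For what it is worth, the “loops, splits, and handoffs” jargon can be sketched as a tiny routing function. Everything here is a hypothetical stand-in: the keyword intent detector plays the part of the NLP system, and the strings are invented for the sketch.

```python
# Sketch of a chatbot turn: split on intent, loop (re-prompt) on misses,
# and hand off to a human after repeated failures. The keyword matcher
# below is a stub standing in for a real NLP intent classifier.

def detect_intent(utterance: str) -> str:
    """Stub NLP: map an utterance to a coarse intent."""
    text = utterance.lower()
    if "buy" in text:
        return "purchase"
    if "human" in text or "agent" in text:
        return "handoff"
    return "unknown"

def respond(utterance: str, retries: int = 0) -> str:
    intent = detect_intent(utterance)
    if intent == "purchase":                   # split: branch into a flow
        return "Which model are you interested in?"
    if intent == "handoff" or retries >= 2:    # handoff: escalate to a human
        return "Transferring you to a representative."
    return "Sorry, could you rephrase that?"   # loop: caller retries with retries + 1
```

Even this toy shows why the integration items on the list matter: the handoff branch is only useful if there is a legacy system with a human on the other end.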

Net net: Progress is in the business of selling advisory and engineering services. The reason, in my opinion, was that Progress could not crack the code to make search and retrieval generate expected payoffs. Like some Convera executives, selling search related services was a more attractive path.

Stephen E Arnold, December 11, 2017

Cricket More Popular Than Koran

December 11, 2017

In the West, we tend to think that people in Islamic countries spend all their waking hours praying, reading the Koran, and doing other religious activities. We forget that these people are just as human as the rest of the world and have genuine interests in other things, like sports. While not the most popular sport in North America, cricket has billions of fans and is very popular in Pakistan, reports Research Snipers in the article, “Most Popular Keywords Searched On Google Pakistan.”

Google Trends is a free service that shows how popular a search query is across the globe. When it comes to Pakistan, the most popular search terms of 2017 are as follows:

Top keywords searched in Pakistan in 2017, till now are

  • Pakistan

  • Cricket Pakistan

  • Pakistan Cricket Team

  • India

  • Pakistan India

  • News Pakistan

  • Pakistan Jobs

People in Pakistan are apparently huge fans of the British sport, and of shopping. The Google AutoComplete tool suggests search terms based on the letters users type into the search box. When “A” is typed into a Pakistani Google search box, Amazon pops up. Pakistanis love to shop and love the sport of cricket. They are not any different from the rest of the world.

Whitney Grace, December 11, 2017

Big Shock: Social Media Algorithms Are Not Your Friend

December 11, 2017

One of Facebook’s founding fathers, Sean Parker, has done a surprising about-face on the online platform that earned him billions of dollars. Parker has begun speaking out against social media and the hidden machinery that keeps people interested. We learned more from a recent Axios story, “Sean Parker Unloads on Facebook ‘Exploiting’ Human Psychology.”

According to the story:

Parker’s I-was-there account provides priceless perspective in the rising debate about the power and effects of the social networks, which now have scale and reach unknown in human history. He’s worried enough that he’s sounding the alarm.

According to Parker:

The thought process that went into building these applications, Facebook being the first of them, … was all about: ‘How do we consume as much of your time and conscious attention as possible?’

And that means that we need to sort of give you a little dopamine hit every once in a while, because someone liked or commented on a photo or a post or whatever. And that’s going to get you to contribute more content, and that’s going to get you … more likes and comments.

What’s at stake here isn’t just the exploitation of human psychology, though that is a major part of the story. As Forbes pointed out, we are on the cusp of social engineering via social media. If more people like Parker do not stand up and offer a solution, we fear there could be serious repercussions.

Patrick Roland, December 11, 2017
