Google's Data Police Fail with Creepy Videos

December 13, 2017

YouTube is suffering from a really strange problem lately. In various children’s programming feeds, inappropriate knockoff videos of popular cartoon characters keep appearing. It has parents outraged, as we learned in a Fast Company article, “Creepy Kids Videos Like These Keep Popping Up on YouTube.”

The videos feature things like Elsa from “Frozen” firing machine guns. According to the story:

A YouTube policy imposed this year says that videos showing “family entertainment characters” being “engaged in violent, sexual, vile, or otherwise inappropriate behavior” can’t be monetized with ads on the platform. But on Monday evening Fast Company found at least one violent, unlicensed superhero video, entitled “Learn Colors With Superheroes Finger Family Song Johny Johny Yes Papa Nursery Rhymes Giant Syringe,” still included ads. A YouTube spokesperson didn’t immediately comment, but by Tuesday the video’s ads had been removed.

The videos may well draw ire from legislators, as Congress takes an increasingly close look at user-generated content online in the wake of Russian election manipulation.

It feels like YouTube really needs a tighter rein on content. But it would surprise us if this Congress imposed much on YouTube’s parent company, Google. With Net Neutrality likely to be erased by regulators, the idea of any deeper oversight is unlikely. If anything, we think Google will be given less oversight.

Patrick Roland, December 13, 2017

Dark Cyber for December 12, 2017, Now Available

December 12, 2017

The HonkinNews Dark Cyber program for December 12, 2017, presents a snapshot of a next-generation investigation analysis system, data about illegal drugs on the Dark Web, and news about a secure chat system which runs within Tor.

Most analysts and investigators have access to a range of software and hardware devices designed to make sense of data from computing devices. However, the newer systems offer visual analyses which often surprise with their speed, power, and ability to deliver “at a glance” insights. This week’s Dark Cyber examines Brainspace, now a unit of Cyxtera. Brainspace’s graphics are among the most striking in the intelligence analysis market. The role that Cyxtera plays is perhaps more important: the company is a roll up of existing businesses focused on cloud delivery of advanced software and services.

Dark Cyber also provides facts from a recent European Union report about illegal substances on the Dark Web. What’s interesting about the report is that the data it presents seems to understate the volume of drug sales via the Dark Web. You can download the report without charge from the URL included in this week’s program.

The final story addresses a growing challenge for law enforcement and intelligence authorities: secure chat within Tor. The Dark Cyber team reports that Anonymous Portugal has made this alleged breakthrough. (The second edition of the Dark Web Notebook will include a new chapter about chat and related services plus ways to compromise these communications.) You can view the program at this link: https://youtu.be/E2jNuJXblOI.

Kenny Toth, December 12, 2017

IBM AI: Speeding Up One Thing, Ignoring a Slow Thing

December 12, 2017

I read “IBM Develops Preprocessing Block, Makes Machine Learning Faster Tenfold” and took out my trusty Big Blue marketing highlight felt tip for this statement:

“To the best of our knowledge, we are first to have generic solution with a 10x speedup. Specifically, for traditional, linear machine learning models — which are widely used for data sets that are too big for neural networks to train on — we have implemented the techniques on the best reference schemes and demonstrated a minimum of a 10x speedup.” [Emphasis added to make it easy to spot certain semantically-rich verbiage.]

I like the traditional, linear, and demonstrated lingo.

From my vantage point, this is useful, but it is one modest component of a traditional, linear machine learning “model”.

The part which sucks up subject matter experts, time, and money (lots of money) includes these steps:

  1. Collecting domain specific information, figuring out what’s important and what’s not, and figuring out how to match what a person or subsystem needs to know against this domain knowledge
  2. Collecting the information. Sure, this seems easy, but it can be a slippery fish for some domains. Tidy, traditional domains like a subset of technical information make it easier and cheaper to fiddle with word lists, synonym expansion “helpers”, and sources which are supposed to be accurate. Accuracy, of course, is a bit like mom’s apple pie.
  3. Converting the source information into a format which the content processing system can use without choking storage space with exceptions or engaging in computationally expensive conversions which have to be checked by software or humans before pushing the content to the content processing subsystem. (Some outfits fudge by limiting content types. The approach works in some eDiscovery system because the information is in more predictable formats.)
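The three steps above can be sketched in miniature. This is a toy illustration, not anyone’s production pipeline: the domain word list, synonym “helpers”, and record formats are all hypothetical.

```python
# Toy sketch of the three pre-processing steps: domain vocabulary,
# synonym expansion "helpers", and format normalization.
# All word lists and document shapes here are hypothetical examples.

# Step 1: domain-specific vocabulary -- what matters in this domain.
DOMAIN_TERMS = {"turbine", "rotor", "blade"}

# Step 2: synonym expansion helpers used when collecting and matching.
SYNONYMS = {"fan": "rotor", "vane": "blade"}

def normalize_term(token: str) -> str:
    """Map a raw token onto the domain vocabulary via the synonym list."""
    token = token.lower().strip(".,")
    return SYNONYMS.get(token, token)

# Step 3: convert heterogeneous source records into one common format.
def to_common_format(record) -> dict:
    """Accept a plain string or a {'body': ...} dict; emit one shape."""
    text = record["body"] if isinstance(record, dict) else record
    tokens = [normalize_term(t) for t in text.split()]
    return {"tokens": tokens,
            "domain_hits": [t for t in tokens if t in DOMAIN_TERMS]}

doc = to_common_format({"body": "Replace the fan and the vane."})
print(doc["domain_hits"])  # ['rotor', 'blade']
```

Even this toy version shows where the expert time goes: someone has to build and maintain `DOMAIN_TERMS` and `SYNONYMS`, and someone has to handle every source format `to_common_format` cannot.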

What is the time and money relationship of these three steps versus the speed up for the traditional machine learning models? In my experience, the cost of the three steps identified above is often greater than the cost of the downstream processes. So a 10x speed up in a single process is helpful, but it won’t pay for pizza for the development team.
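Amdahl’s law makes the point concrete. If the accelerated training stage accounts for, say, only 20 percent of total pipeline time (a hypothetical split, not IBM’s figures), a 10x speedup there barely moves the overall number:

```python
# Amdahl's law: overall speedup when only a fraction f of the total
# work is accelerated by a factor s. The 20/80 split below is a
# hypothetical illustration, not a measured workload.
def overall_speedup(f: float, s: float) -> float:
    return 1.0 / ((1.0 - f) + f / s)

# 20% of wall-clock time in training, sped up 10x:
print(round(overall_speedup(0.2, 10.0), 2))  # 1.22
```

A 1.22x end-to-end gain is the kind of number that does not buy pizza.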

Just my view from Harrod’s Creek, which sees things in a way which is different from IBM marketing and IBM Zurich wizards. Shoot those squirrels before eating them, you hear.

Stephen E Arnold, December 12, 2017

Google Is Taught Homosexuality Is Bad

December 12, 2017

The common belief is that computers and software are objective, inanimate objects capable of greater intelligence than humans.  The truth is that humans develop computers and software, so the objective, inanimate objects are only as smart as their designers.  What is even more hilarious is that the sentiment analysis AI development process requires tons of data for the algorithms to read in order to teach themselves to recognize patterns.  The data used is “contaminated” with human emotion and prejudice.  Motherboard wrote about how human bias pollutes AI in the article, “Google’s Sentiment Analyzer Thinks Being Gay Is Bad.”

The problem when designing AI is that if it is programmed with polluted and biased data, then these super intelligent algorithms will discriminate against people rather than being objective.  Google released its Cloud Natural Language API that allows developers to add Google’s deep learning models into their own applications.  Along with entity recognition, the API included a sentiment analyzer that detected when text contained a positive or negative sentiment.  However, it has a few bugs and returns biased results, such as saying being gay is bad, certain religions are bad, etc.
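The mechanism is easy to reproduce in miniature. A naive lexicon built by co-occurrence with labeled documents inherits whatever slant the corpus has. This toy scorer (a hypothetical sketch, not Google’s API, with an invented three-document corpus) “learns” that a perfectly neutral word is negative simply because the training texts use it in negative contexts:

```python
from collections import defaultdict

# Toy sentiment lexicon learned by co-occurrence with document labels.
# The tiny "corpus" is a hypothetical illustration of how a neutral
# word inherits the slant of the texts it appears in.
corpus = [
    ("the weather ruined our picnic", -1),
    ("weather delays made everyone miserable", -1),
    ("what a wonderful sunny day", +1),
]

scores = defaultdict(float)
counts = defaultdict(int)
for text, label in corpus:
    for word in text.split():
        scores[word] += label
        counts[word] += 1

def sentiment(word: str) -> float:
    """Average label of the documents the word appeared in."""
    return scores[word] / counts[word] if counts[word] else 0.0

# "weather" is neutral in reality, but the corpus taught it otherwise.
print(sentiment("weather"))  # -1.0
```

Scale the same mechanism up to millions of news stories and books, and words describing people inherit society’s prejudices the same way “weather” inherited the picnic’s.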

It looks like Google’s sentiment analyzer is biased, as many artificially intelligent algorithms have been found to be. AI systems, including sentiment analyzers, are trained using human texts like news stories and books. Therefore, they often reflect the same biases found in society. We don’t know yet the best way to completely remove bias from artificial intelligence, but it’s important to continue to expose it.

The problem with programming AI algorithms is that it is difficult to feed them data free of human prejudices.  It is difficult to work around these prejudices because they are so ingrained in most data.  Programmers are kept on their toes looking for a solution, but there is no one-size-fits-all fix.  Too bad they cannot just stick with numbers and dictionaries.

Whitney Grace, December 12, 2017

China Has an AI Police Station and That Is Not a Good Thing

December 12, 2017

The range of things artificial intelligence can do is amazing. In China, they are even handling law enforcement with intelligent machines. While this might be a boon for efficiency, people like Stephen Hawking are not happy. We learned more from the Sanvada article, “Check Out The Artificial Intelligence-Powered Police Station in China.”

According to the story:

Recently China announced the opening of an AI-powered police station in Wuhan illustrating its plans to fully incorporate artificial intelligence as a functional part of its systems.

But the most interesting turn comes later, stating:

Artificial intelligence may not yet be up to the task. After all, not every case in the designated area will relate to car or driving related issues. Artificial intelligence has yet to be proven to have the capability of solving complex disputes. It may not use of all of the facts or comprehend the intricate dynamics of human relationships or the damage which can be caused to people whether it is in the case of molestation or rape and hence, may not have the sensitivity to deal with such scenarios.

We love the multitude of uses for AI but have to agree with Sanvada’s skepticism. One of the smartest people on the planet also agrees. Stephen Hawking recently commented that “AI could be the worst event in human history.” Let’s hope he’s not right, and that wise guidance keeps AI police stations a novelty in the world of AI.

Patrick Roland, December 12, 2017

Progress: From Selling NLP to Providing NLP Services

December 11, 2017

Years ago, Progress Software owned an NLP system. I recall conversations with natural language processing wizards from EasyAsk. Larry Harris developed a natural language system in 1999 or 2000. Progress purchased EasyAsk in 2005, if memory serves. I interviewed Craig Bassin in 2010 as part of my Search Wizards Speak series.

The recollection I have was that Progress divested itself of EasyAsk in order to focus on enterprise applications other than NLP. No big deal. Software companies are bought and sold everyday.

However, what makes this recollection interesting to me is the information in “Beyond NLP: 8 Challenges to Building a Chatbot.” Progress went from a software company which owned an NLP system to a company which is advising people like me about how challenging a chatbot system can be to build and make work. (I noted that the Wikipedia entry for Progress does not mention the EasyAsk acquisition and subsequent de-acquisition. Either small potatoes or a milestone best jumped over, I assume.)

Presumably it is easier to advise and get paid to implement than to fund and refine an NLP system like EasyAsk. If you are not familiar with EasyAsk, the company positions itself in eCommerce site search with its “cognitive eCommerce” technology. EasyAsk’s capabilities include voice enabled natural language mobile search. This strikes me as a capability which is similar to that of a chatbot as I understand the concept.

History is history one of my high school teachers once observed. Let’s move on.

What are the eight challenges to standing up a chatbot which sort of works? Here they are:

  1. The chat interface
  2. NLP
  3. The “context” of the bot
  4. Loops, splits, and recursions
  5. Integration with legacy systems
  6. Analytics
  7. Handoffs
  8. Character, tone, and persona
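Items 3 and 7, context and handoff, can be sketched in a few lines. This is a hypothetical toy bot for the car-dealer scenario discussed below, not any vendor’s product: it keeps a context dictionary across turns and escalates to a human when no rule fires.

```python
# Toy illustration of chatbot "context" (item 3) and "handoff" (item 7).
# The car-dealer scenario, rules, and replies are all hypothetical.
class DealerBot:
    def __init__(self):
        self.context = {}  # persists across turns: this is the "context"

    def reply(self, message: str) -> str:
        text = message.lower()
        if "suv" in text or "sedan" in text:
            self.context["model"] = "suv" if "suv" in text else "sedan"
            return "Great, an " + self.context["model"] + ". What is your budget?"
        digits = text.replace("$", "").replace(",", "").strip()
        if digits.isdigit():
            self.context["budget"] = int(digits)
            return "Showing " + self.context.get("model", "car") + "s under that price."
        # Handoff: no rule fires, so escalate to a human.
        return "HANDOFF: connecting you to a human representative."

bot = DealerBot()
print(bot.reply("I want an SUV"))
print(bot.reply("$20,000"))   # context remembers the SUV
print(bot.reply("Can you explain my lease tax situation?"))  # handoff
```

The hard part the article gestures at is everything this sketch omits: the NLP that replaces the keyword checks, the legacy-system lookups behind “showing matches,” and the analytics around every turn.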

As I review this list, I note that I have to decide whether to talk to a chatbot or type into a box so a “customer care representative” can assist me. The “representative” is, one assumes, a smart software robot.

I also notice that the bot has to have context. Think of a car dealer and the potential customer. The bot has to know that I want to buy a car. Seems obvious. But okay.

“Loops, splits, and recursions.” Frankly, I have no idea what this means. I know that chatbot-centric companies use jargon. I assume this means “programming” so the NLP system returns a semi-on-point answer.

Integration with legacy systems and handoffs seem to be similar to me. I would just call these two steps “integration” and be done with it.

The “character, tone, and persona” seems to apply to how the chatbot sounds; for example, the nasty, imperious tone of a Kroger automated check out system.

Net net: Progress is in the business of selling advisory and engineering services. The reason, in my opinion, was that Progress could not crack the code to make search and retrieval generate expected payoffs. Like some Convera executives, selling search related services was a more attractive path.

Stephen E Arnold, December 11, 2017

Cricket More Popular Than Koran

December 11, 2017

In the West, we tend to think that Islamic countries spend all waking hours of the day praying, reading the Koran, and doing other religious-based activities.  We forget that these people are just as human as the rest of the world and have a genuine interest in other things, like sports.  While not the most popular sport in North America, cricket has billions of fans and is very popular in Pakistan, reports Research Snipers in the article, “Most Popular Keywords Searched On Google Pakistan.”

Google Trends is a free service the search engine provides that allows people to see how popular a search query is across the globe.  When it comes to Pakistan, the most popular search terms of 2017 are as follows:

The top keywords searched in Pakistan in 2017, to date, are:

  • Pakistan

  • Cricket Pakistan

  • Pakistan Cricket Team

  • India

  • Pakistan India

  • News Pakistan

  • Pakistan Jobs

People in Pakistan are, apparently, huge fans of the British sport and of shopping.  The Google Autocomplete tool suggests search terms based on the letters users type into the search box.  When “A” is typed into a Pakistani Google search box, Amazon pops up.  Pakistanis love shopping and the sport of cricket.  They are not any different from the rest of the world.
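At its simplest, the autocomplete behavior described above is prefix matching over popularity-ranked past queries. A hypothetical sketch (the query list and counts are invented, not Google’s data):

```python
# Toy autocomplete: rank stored queries by popularity, suggest by prefix.
# Query strings and counts are invented for illustration.
QUERY_COUNTS = {
    "amazon": 900,
    "apple": 400,
    "cricket pakistan": 1200,
    "pakistan cricket team": 1100,
}

def suggest(prefix: str, k: int = 3) -> list:
    """Return the top-k stored queries starting with the typed prefix."""
    matches = [q for q in QUERY_COUNTS if q.startswith(prefix.lower())]
    return sorted(matches, key=lambda q: -QUERY_COUNTS[q])[:k]

print(suggest("a"))  # ['amazon', 'apple']
```

With counts like these, typing “a” surfaces the shopping site first, which is the behavior the article describes.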

Whitney Grace, December 11, 2017

Big Shock: Social Media Algorithms Are Not Your Friend

December 11, 2017

One of Facebook’s founding fathers, Sean Parker, has done a surprising about-face on the online platform that earned him billions of dollars. Parker has begun speaking out against social media and the hidden machinery that keeps people interested. We learned more from a recent Axios story, “Sean Parker Unloads on Facebook ‘Exploiting’ Human Psychology.”

According to the story:

Parker’s I-was-there account provides priceless perspective in the rising debate about the power and effects of the social networks, which now have scale and reach unknown in human history. He’s worried enough that he’s sounding the alarm.

According to Parker:

The thought process that went into building these applications, Facebook being the first of them, … was all about: ‘How do we consume as much of your time and conscious attention as possible?’


And that means that we need to sort of give you a little dopamine hit every once in a while, because someone liked or commented on a photo or a post or whatever. And that’s going to get you to contribute more content, and that’s going to get you … more likes and comments.

What’s at stake here isn’t just human psychology being exploited, though. It’s a major part of the story, but, as Forbes pointed out, we are on the cusp of social engineering via social media. If more people like Parker don’t stand up and offer a solution, we fear there could be serious repercussions.

Patrick Roland, December 11, 2017

Cloud Computing Resources: Cost Analysis for Machine Learning

December 8, 2017

Information about the cost of performing a specific task in a cloud computing setup can be tough to get. Reliable cross-platform, apples-to-apples cost analyses are even more difficult to obtain.

A tip of the hat to the author of “Machine Learning Benchmarks: Hardware Providers.” The article includes some useful data about the costs of performing tasks on the cloud services available from Amazon, Google, Hetzner, and IBM.

My suggestion is to make a copy of the article.

The big surprise: Amazon was the high-cost service. Google was less expensive.

One downside: No Microsoft costs.
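When comparing providers, the number that matters is cost per completed task, not the hourly rate: a cheaper instance that runs longer can cost more per job. A hypothetical normalization (provider names, prices, and run times invented, not figures from the article):

```python
# Normalize hourly instance prices to cost per training run.
# All prices and run times below are hypothetical illustrations,
# not figures from the benchmark article.
providers = {
    "provider_a": {"usd_per_hour": 3.06, "hours_per_run": 2.0},
    "provider_b": {"usd_per_hour": 2.48, "hours_per_run": 2.5},
}

def cost_per_run(p: dict) -> float:
    return p["usd_per_hour"] * p["hours_per_run"]

# Cheapest per task first -- note the pricier hourly rate can win.
for name, p in sorted(providers.items(), key=lambda kv: cost_per_run(kv[1])):
    print(name, round(cost_per_run(p), 2))
```

This is the apples-to-apples comparison the benchmark article makes possible, and the reason it is worth keeping a copy.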

Stephen E Arnold, December 8, 2017

Twitter Changes API Offerings and Invites Trouble

December 8, 2017

Twitter has beefed up its API offerings to users, but the change comes with an increasing price tag. While that is not a huge issue for many people, it will invite some problems if not played properly. We discovered this interesting change in a recent Venture Beat piece, “Twitter’s New Premium APIs Give Developers Access to More Tweets, Higher Rate Limits.”

According to the story:

Twitter is offering a solution for developers who are angry about limitations imposed on their apps when they use the service’s free APIs. The company has now introduced premium APIs to bridge the gap between the free service and the enterprise-level tools it provides through Gnip.


Developers will likely welcome this solution, though many will also say it’s long overdue. After the company’s mea culpa at its Flight conference in 2015, Twitter has made efforts to understand developers’ needs and has reallocated resources, including selling its Fabric mobile developer platform to Google.

Time will tell if this uptick in API accessibility will help Twitter financially. The company has long been seeking a financial home run since going public. While there are several ways APIs can solve outside problems and bring stability to a company, this can also fall flat on its face, especially if developers don’t want to pay the fees or if the APIs don’t live up to the hype. Fingers crossed.

Patrick Roland, December 8, 2017
