Smart Software: Chess Is One Thing, Bias Another

December 13, 2017

I enjoyed learning how Google’s smart software taught itself chess in four hours and was able to perform at a high level against mere humans. I also got a kick out of the news that Google’s smart software cannot filter YouTube videos for objectionable material. Google is in the process of hiring 10,000 humans to wade through the hours of video uploaded every minute to YouTube. Ironic? No, just different PR teams.

I read “Researchers Combat Gender and Racial Bias in Artificial Intelligence.” The write up assumes that everyone knows that algorithms contain biases. Sure, that’s a good assumption for most people.

The reality is that a comparatively few algorithmic approaches dominate smart software today. The building blocks are arranged in different sequences. The Facebookers and Googlers chug away at setting thresholds, working with subsets to chop Big Data down to something affordable, and other algorithmic donkey work.

But it appears that some folks have now realized that smart software contains biases. I would toss in ethics, but that’s another epistemological challenge to keep “real” journalists on the hunt for stories.

The write up asserts:

While the AI algorithms did a credible job of predicting income levels and political leanings in a given area, Gebru [a Stanford AI wizard] says her work was susceptible to bias — racial, gender, socioeconomic.

Well, Microsoft and IBM are tackling this interesting challenge:

Researchers at Microsoft, IBM and the University of Toronto identified the need for fairness in AI systems back in 2011. Now in the wake of several high-profile incidents — including an AI beauty contest that chose predominantly white faces as winners — some of the best minds in the business are working on the bias problem.

I was tickled to learn that the smart software outfit Google has a different approach:

Google researchers are studying how adding some manual restrictions to machine learning systems can make their outputs more understandable without sacrificing output quality, an initiative nicknamed GlassBox.

Yep, humans. Chess is an easier problem to solve than bias. But in comparison to ethics, bias strikes me as a lower hurdle.

Ah, the irony. Humans instead of software at the GOOG.

Stephen E Arnold, December 13, 2017

IBM AI: Speeding Up One Thing, Ignoring a Slow Thing

December 12, 2017

I read “IBM Develops Preprocessing Block, Makes Machine Learning Faster Tenfold.” I read this statement and took out my trusty Big Blue marketing highlight felt tip:

“To the best of our knowledge, we are first to have generic solution with a 10x speedup. Specifically, for traditional, linear machine learning models — which are widely used for data sets that are too big for neural networks to train on — we have implemented the techniques on the best reference schemes and demonstrated a minimum of a 10x speedup.” [Emphasis added to make it easy to spot certain semantically-rich verbiage.]

I like the traditional, linear, and demonstrated lingo.

From my vantage point, this is useful, but it is one modest component of a traditional, linear machine learning “model”.

The part which sucks up subject matter experts, time, and money (lots of money) includes these steps:

  1. Collecting domain-specific information, figuring out what’s important and what’s not, and determining how to match what a person or subsystem needs to know against this domain knowledge
  2. Collecting the information. Sure, this seems easy, but it can be a slippery fish for some domains. Tidy, traditional domains like a subset of technical information make it easier and cheaper to fiddle with word lists, synonym expansion “helpers”, and sources which are supposed to be accurate. Accuracy, of course, is a bit like mom’s apple pie.
  3. Converting the source information into a format which the content processing system can use without choking storage space with exceptions or engaging in computationally expensive conversions which have to be checked by software or humans before pushing the content to the content processing subsystem. (Some outfits fudge by limiting content types. The approach works in some eDiscovery systems because the information is in more predictable formats.)
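The conversion chore in step three can be sketched in a few lines. This is a toy illustration, not IBM’s code; the record layouts and function names are invented for the example. The point is the shape of the work: normalize what you can, and route everything else to humans before it reaches the content processing subsystem.

```python
# Toy sketch of step 3: normalizing mixed source formats into one schema,
# routing anything the converter cannot handle to a human-review queue.
# Field names and helpers here are illustrative assumptions.

def normalize(record):
    """Return a (title, text) pair, or None if the record layout is unknown."""
    if "title" in record and "body" in record:        # one source's native format
        return (record["title"], record["body"])
    if "headline" in record and "content" in record:  # a second source's format
        return (record["headline"], record["content"])
    return None                                       # exception: unknown layout

def convert_corpus(records):
    clean, review_queue = [], []
    for rec in records:
        result = normalize(rec)
        if result is None:
            review_queue.append(rec)  # humans check these before indexing
        else:
            clean.append(result)
    return clean, review_queue
```

The expensive part in practice is not this loop; it is writing and maintaining a `normalize` branch for every source, which is exactly where the subject matter experts and the money go.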

What is the time and money relationship of dealing with these three steps versus the speed up for the traditional machine learning models? In my experience, the cost of the three steps identified above is often greater than the cost of the downstream processes. So a 10X speedup in a single process is helpful, but it won’t pay for pizza for the development team.

Just my view from Harrod’s Creek, which sees things in a way which is different from IBM marketing and IBM Zurich wizards. Shoot those squirrels before eating them, you hear.

Stephen E Arnold, December 12, 2017

Google Is Taught Homosexuality Is Bad

December 12, 2017

The common belief is that computers and software are objective, inanimate objects capable of greater intelligence than humans.  The truth is that humans developed computers and software, so the objective, inanimate objects are only as smart as their designers.  What is even more hilarious is that the sentiment analysis AI development process requires tons of data for the algorithms to read so they can teach themselves to recognize patterns.  The data used is “contaminated” with human emotion and prejudice.  Motherboard wrote about how this bias pollutes AI in the article, “Google’s Sentiment Analyzer Thinks Being Gay Is Bad.”

The problem when designing AI is that if it is programmed with polluted and biased data, then these super intelligent algorithms will discriminate against people rather than being objective.  Google released its Cloud Natural Language API that allows developers to add Google’s deep learning models into their own applications.  Along with entity recognition, the API included a sentiment analyzer that detected when text contained a positive or negative sentiment.  However, it has a few bugs and returns biased results, such as saying being gay is bad, certain religions are bad, etc.

It looks like Google’s sentiment analyzer is biased, as many artificially intelligent algorithms have been found to be. AI systems, including sentiment analyzers, are trained using human texts like news stories and books. Therefore, they often reflect the same biases found in society. We don’t know yet the best way to completely remove bias from artificial intelligence, but it’s important to continue to expose it.

The problem with programming AI algorithms is that it is difficult to feed them data free of human prejudices.  It is difficult to work around these prejudices because they are so ingrained in most data.  Programmers are kept on their toes searching for a solution, but there is no one-size-fits-all fix.  Too bad they cannot just stick with numbers and dictionaries.
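The mechanism is easy to see in a toy model. The sketch below is emphatically not Google’s system; it is a naive per-word sentiment scorer, with an invented three-sentence corpus, that shows how a neutral identity term inherits the sentiment of the sentences it happens to appear in:

```python
# Toy illustration of how corpus bias leaks into a sentiment model.
# A naive scorer learns per-word sentiment from labeled training sentences;
# a neutral term that co-occurs mostly with negative text ends up scored
# negative. Corpus and scoring scheme are invented for this example.

from collections import defaultdict

def train(labeled_sentences):
    """labeled_sentences: list of (text, score) pairs, score in {-1, +1}."""
    totals, counts = defaultdict(float), defaultdict(int)
    for text, score in labeled_sentences:
        for word in text.lower().split():
            totals[word] += score
            counts[word] += 1
    return {w: totals[w] / counts[w] for w in totals}

def sentiment(model, text):
    words = [w for w in text.lower().split() if w in model]
    return sum(model[w] for w in words) / len(words) if words else 0.0

# If the training corpus pairs an identity term mostly with negative text...
corpus = [
    ("straight people are great", +1),
    ("gay people face harassment", -1),  # negative context, neutral subject
    ("gay people face abuse", -1),
]
model = train(corpus)
# ...the term itself inherits a negative score, though it is not negative.
```

Nothing in the code is prejudiced; the training data is, and the model faithfully reproduces it.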

Whitney Grace, December 12, 2017

China Has an AI Police Station and That Is Not a Good Thing

December 12, 2017

The range of things artificial intelligence can do is amazing. In China, authorities are even handling law enforcement with intelligent machines. While this might be a boon for efficiency, people like Stephen Hawking are not happy. We learned more from the Sanvada article, “Check Out The Artificial Intelligence-Powered Police Station in China.”

According to the story:

Recently China announced the opening of an AI-powered police station in Wuhan illustrating its plans to fully incorporate artificial intelligence as a functional part of its systems.

But the most interesting turn comes later, stating:

Artificial intelligence may not yet be up to the task. After all, not every case in the designated area will relate to car or driving related issues. Artificial intelligence has yet to be proven to have the capability of solving complex disputes. It may not use of all of the facts or comprehend the intricate dynamics of human relationships or the damage which can be caused to people whether it is in the case of molestation or rape and hence, may not have the sensitivity to deal with such scenarios.

We love the multitude of uses for AI but have to agree with the skepticism of Sanvada. One of the smartest people on the planet also agrees. Stephen Hawking recently commented that “AI could be the worst event in human history.” Let’s hope he’s not right and let’s hope wise guidance proves that AI police stations stay a novelty in the world of AI.

Patrick Roland, December 12, 2017

Neural Network Revamps Search for Research

December 7, 2017

Research is a pain, especially when you have to slog through millions of results to find specific and accurate information.  It takes time and a lot of reading, but neural networks could cut down on the investigation phase.  The Economist wrote about how AI will benefit research in “A Better Way To Search Through Scientific Papers.”

The Allen Institute for Artificial Intelligence developed Semantic Scholar to aid scientific research.  Semantic Scholar’s purpose is to discover the scientific papers most relevant to a particular problem.  How does Semantic Scholar work?

Instead of relying on citations in other papers, or the frequency of recurring phrases to rank the relevance of papers, as it once did and rivals such as Google Scholar still do, the new version of Semantic Scholar applies AI to try to understand the context of those phrases, and thus achieve better results.

Semantic Scholar relies on a neural network, a system loosely modeled on biological neural networks that learns through trial and error.  To build it, the Allen Institute team hand-annotated a set of abstracts.  From this sample, they found 7,000 medical terms, of which 2,000 could be paired.  The information was fed into Semantic Scholar’s neural network, which then found more relationships in the data.  Through trial and error, the neural network learns more patterns.
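The pairing step described above can be sketched crudely as co-occurrence counting: terms that keep showing up in the same abstracts become candidate pairs. The real system uses a trained neural network rather than raw counts; the abstracts, terms, and threshold below are invented for illustration:

```python
# Illustrative sketch of pairing known terms by co-occurrence in abstracts.
# Term pairs that appear together in at least min_count abstracts become
# candidate relationships. A toy stand-in for the learned model, not the
# Allen Institute's actual method.

from itertools import combinations
from collections import Counter

def candidate_pairs(abstracts, terms, min_count=2):
    """abstracts: list of strings; terms: set of known annotated terms."""
    pair_counts = Counter()
    for text in abstracts:
        found = sorted({t for t in terms if t in text.lower()})
        for a, b in combinations(found, 2):  # every pair seen in this abstract
            pair_counts[(a, b)] += 1
    return [pair for pair, n in pair_counts.items() if n >= min_count]
```

A neural approach improves on this by generalizing: it can flag a plausible pair even when the two terms never literally co-occur, which is the “finding more relationships based on the data” the article describes.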

The Allen Institute added 26 million biomedical research papers to the 12 million already in the database.  The plan is to make scientific and medical research more readily available not only to professionals but also to regular people.

Whitney Grace, December 7, 2017

Filtered Content: Tactical Differences between Dow Jones and Thomson Reuters

December 5, 2017

You may know that Dow Jones has an online search company. The firm is called Factiva, and it is an old-school approach to finding information. The company recently announced a deal with an outfit called Curation. Founded by a former newspaper professional, Curation uses mostly humans to assemble reports on hot topics. Factiva is reselling these services and advertising for customers in the Wall Street Journal. Key point: This is mostly a manual method. The approach is more in line with the types of “reports” available from blue chip consulting firms.

You may also know that Thomson Reuters has been rolling out machine curated reports. These have many different product names. Thomson Reuters has a large number of companies and brands. Not surprisingly, Thomson’s approach has to serve many companies managed by executives who compete not only with outside rivals like Dow Jones but also among themselves. Darwin would have loved Thomson Reuters. The point is that Thomson Reuters’ approach relies on “smart” software.

You can read about Dow Jones’ play here.

You can read about Thomson Reuters’ play here.

My take is that these two different approaches reflect the painful fact that there is no clear path forward for professional publishing companies. In order to make money from electronic information, two of the major players are still experimenting. The digital revolution began, what, about 40 years ago?

One would have thought that leading companies like Dow Jones and Thomson Reuters would have moved beyond the experimental stage and into cash cow land.

Not yet, it seems. The reason for my pointing out these two different approaches is that there are more innovative methods available. For snapshots of companies which move beyond the Factiva and Thomson methods, watch Dark Cyber, a new program available every Tuesday via YouTube at this link.

Stephen E Arnold, December 5, 2017

Microsoft Bing Has the Last AI Laugh

December 1, 2017

Nobody likes Bing, but because it is a Microsoft product it continues to endure.  It chugs along as the second most used search engine in the US, but apparently first is the worst and second is the best when it comes to creating a database of useful information for AI.  India News 24 shares in “Microsoft Bing: The Redmond Giant’s Overlooked Tool” that the search engine is worth far more than commonly thought.

Every day millions of people use Bing, inputting search queries as basic keywords, questions, and even images.  To train an AI algorithm, huge datasets are needed so the algorithm can learn and discover patterns.  Bing is the key to creating the necessary datasets.  You also might be using Bing without knowing it: it powers Yahoo search and appears on Amazon tablets.

All of this has helped Microsoft better understand language, images and text at a large scale, said Steve Clayton, who as Microsoft’s chief storyteller helps communicate the company’s AI strategy.  It is amazing how Bing serves a dual purpose:

Bing serves dual purposes, he said, as a source of data to train artificial intelligence and a vehicle to be able to deliver smarter services.  While Google also has the advantage of a powerful search engine, other companies making big investments in the AI race – such as IBM or Amazon – do not.

Amazon has access to search queries centered on e-commerce, but it lacks data on everything that is not available in one of its warehouses.  This is where Bing comes in.  Bing’s feeding of Microsoft’s AI projects has yet to turn a profit, but AI is still a new market and new projects are always in the works.

Whitney Grace, December 1, 2017

The Thing Holding AI Back Is the Thing It Needs Most, Data

November 30, 2017

Here’s an interesting problem: for artificial intelligence and machine learning to thrive, they need a massive amount of information. However, they need so much data that it causes hiccups in the system. Google has a really interesting solution to this problem, as we learned in the Reuters article, “Google’s Hinton Outlines New AI Advance That Requires Less Data.”

The bundling of neurons working together to determine both whether a feature is present and its characteristics also means the system should require less data to make its predictions.


The leader of Google Brain said, “The hope is that maybe we might require less data to learn good classifiers of objects, because they have this ability of generalizing to unseen perspectives or configurations of images.”

Less data for big data? It’s just crazy enough to work. In fact, some of the brightest minds in the business are trying, as ComputerWorld put it, to “do more with less.” The piece focuses on Fuzzy LogiX and its attempts to do exactly what Google is hypothetically describing. It will be interesting to see what happens, but we are betting on technology cracking this nut.

Patrick Roland, November 30, 2017


Semantic Scholar Expanding with Biomedical Lit

November 29, 2017

Academic publishing is the black hole of the publishing world.  While it is a prestigious honor to have your work published by a scholarly press or journal, it will not have a high circulation.  One reason is that academic material is blocked behind expensive paywalls; another is that papers are not indexed well.  Tech Crunch has some good news for researchers: “Allen Institute For AI’s Semantic Scholar Adds Biomedical Papers To Its AI-Sorted Corpus.”

The Allen Institute for AI started Semantic Scholar as an effort to index scientific literature with NLP and other AI algorithms.  Semantic Scholar will now include biomedical texts in the index.  There is far too much content available for individuals to read and catalog by hand.  AI helps catalog papers and create keywords by scanning an entire text, pulling key themes, and assigning it to the right topic.

There’s so much literature being published now, and it stretches back so far, that it’s practically impossible for a single researcher or even a team to adequately review it. What if a paper from six years ago happened to note a slight effect of a drug byproduct on norepinephrine production, but it wasn’t a main finding, or was in a journal from a different discipline?

Scientific studies are being called into question, especially when the tests are funded by corporate entities.  It is important to separate truth from false information as we consume more and more each day.  Tools like Semantic Scholar are key to uncovering the truth.  It is too bad it does not receive more attention.

Whitney Grace, November 29, 2017


Experts Desperately Seeking the Secret to Big Data Security

November 28, 2017

As machine learning and AI become more prevalent in our day-to-day lives, the risk of a security breach grows. This is a major concern for AI experts, and you should be concerned too. We learned how scary the fight feels from a recent Tech Target article, “Machine Learning’s Training Is Security Vulnerable.”

According to the story:

To tune machine learning algorithms, developers often turn to the internet for training data — it is, after all, a virtual treasure trove of the stuff. Open APIs from Twitter and Reddit, for example, are popular training data resources. Developers scrub them of problematic content and language, but the data-cleansing techniques are no match for the methods used by adversarial actors…
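The gap between data cleansing and adversarial injection can be shown with a toy model. The sketch below is an invented word-count spam filter, not any real system: an attacker who can slip a few mislabeled examples into scraped training data flips the model’s verdict on a targeted phrase, without touching the model itself.

```python
# Toy data-poisoning sketch: a word-count spam filter trained on scraped,
# "cleaned" data. An attacker injects a few mislabeled examples so a chosen
# phrase drags messages toward the "ham" label. Corpus and labels are
# invented; real poisoning attacks are far subtler than this.

from collections import Counter

def train(examples):
    """examples: list of (text, label) pairs, label 'spam' or 'ham'."""
    counts = {"spam": Counter(), "ham": Counter()}
    for text, label in examples:
        counts[label].update(text.lower().split())
    return counts

def classify(model, text):
    words = text.lower().split()
    spam_score = sum(model["spam"][w] for w in words)
    ham_score = sum(model["ham"][w] for w in words)
    return "spam" if spam_score > ham_score else "ham"

clean_data = [("win free money now", "spam"),
              ("free money fast", "spam"),
              ("lunch at noon", "ham")]

# Adversary slips mislabeled copies into the scraped training set.
poison = [("free money", "ham")] * 3
```

No data-cleansing pass that merely strips profanity or broken markup would catch the poison rows; they look like ordinary, well-formed training examples.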

What could address that risk? Some experts have proposed a very interesting solution: a global security framework. While this seems like a great way to roadblock hackers, it may also pose a threat. As the Tech Target piece states, hacking technology usually moves at the same speed as normal tech. So a global security framework would look like a mighty tempting prize for hackers looking to cause global chaos. Proceed with caution!

Patrick Roland, November 28, 2017
