Google: A Question of Judgment
July 3, 2019
In the realm of unintended consequences, this one is a doozy. MIT Technology Review reports, “YouTube’s Algorithm Makes it Easy for Pedophiles to Find More Videos of Children.” The brief write-up provides just-the-facts coverage of the disturbing issue. Writer Charlotte Jee summarizes:
“YouTube’s automated recommendation system has gathered a collection of prepubescent, partially clothed children and is recommending it to people who have watched similar videos, the New York Times reports. While some of the recommendations have been switched off on certain videos, the company has refused to end the practice. …
We noted:
“YouTube disabled comments on many videos of children in February after an outcry over pedophiles using the comment section to guide each other. It doesn’t let kids under 13 open accounts. However, it won’t stop recommending videos of children because it is worried about negative impact on family vloggers, some of whom have many millions of followers. In a blog post responding to the New York Times story, YouTube said that it was ‘limiting’ recommendations on some videos that may put children at risk.”
Those limits are to be applied to videos with minors in “risky situations,” though the blog post does not specify who, or what, will make that judgment. Jee is suspicious of YouTube’s motivations, noting that the site’s goal is to capture and keep “eyeballs.” Despite what else is allowed to thrive across the platform, the company apparently decided to draw a (dotted) line at this issue.
Cynthia Murrell, July 3, 2019
Machine Learning: Whom Does One Believe?
June 28, 2019
Ah, another day begins with mixed messages. Just what the relaxed, unstressed modern decider needs.
First, navigate to “Reasons Why Machine Learning can Prove Beneficial for Your Organization.” The reasons include:
- Segment customer coverage. No, I don’t know what this means either.
- Accurate business forecasts. No, machine learning systems cannot predict horse races or how a business will do. How about the impact of tariffs or a Fed interest rate change?
- Improved customer experience. No, experiences are not improving. How do I know? Ask a cashier to make change? Try to get an Amazon professional to explain how to connect a Mac laptop to an Audible account WITHOUT asking, “May I take control of your computer with our software?”
- Make decisions confidently. Yep, that’s what a decider does in the stable, positive, uplifting work environment of an electronic exchange when a bug costs millions in two milliseconds.
- Automate your routine tasks. Absolutely. Automation works well. Ask the families of those killed by “intelligence stoked” automobiles or smart systems on a 737 Max.
But there’s a flip side to these cheery “beneficial” outcomes. Navigate to “Machine Learning Systems Are Stuck in a Rut.” We noted these statements. First a quote from a technical paper.
In this paper we argue that systems for numerical computing are stuck in a local basin of performance and programmability. Systems researchers are doing an excellent job improving the performance of 5-year-old benchmarks, but gradually making it harder to explore innovative machine learning research ideas.
Next, this comment from the author of the “Learning Systems” article:
The thrust of the argument is that there’s a chain of inter-linked assumptions / dependencies from the hardware all the way to the programming model, and any time you step outside of the mainstream it’s sufficiently hard to get acceptable performance that researchers are discouraged from doing so.
Which is better? Which is correct?
Be a decider either using a black box or the stuff between your ears.
Stephen E Arnold, June 28, 2019
Handy List of Smart Software Leaders
June 27, 2019
As the field of AI grows, it can be difficult to keep track of the significant players. Datamation shares a useful list in, “Top 45 Artificial Intelligence Companies.” If you skim the lineup, just keep in mind—entries are not ranked in any way, simply listed in alphabetical order. Writer Andy Patrizio begins with some observations about the industry:
“AI is driving significant investment from venture capitalist firms, giant firms like Microsoft and Google, academic research, and job openings across a multitude of sectors. All of this is documented in the AI Index, produced by Stanford University’s Human-Centered AI Institute. …
We noted:
“Consulting giant Accenture believes AI has the potential to boost rates of profitability by an average of 38 percentage points and could lead to an economic boost of US$14 trillion in additional gross value added (GVA) by 2035. In truth, artificial intelligence holds a plethora of possibilities—and risks. ‘It will have a huge economic impact but also change society, and it’s hard to make strong predictions, but clearly job markets will be affected,’ said Yoshua Bengio, a professor at the University of Montreal, and head of the Montreal Institute for Learning Algorithms.”
For its selections, Datamation chose companies of particular note and those that have invested heavily in AI. Many names are ones you would expect to see, like Amazon, Google, IBM, and Microsoft. Others are more specialized—robotics platforms Anki and CloudMinds, for example, or iCarbonX, Tempus, and Zebra Medical Vision for healthcare. Several entries are open source. Check out the article for more.
Cynthia Murrell, June 27, 2019
DeepMind Studies Math
June 27, 2019
It’s like magic! ExtremeTech reports, “Google Fed a Language Algorithm Math Equations. It Learned How to Solve New Ones.” While Google’s DeepMind is, indeed, used as a language AI, its neural network approach enables it to perform myriad tasks, like beating humans at games from Go to Capture the Flag. Writer Adam Dachis describes how researchers taught DeepMind to teach itself math:
“For training data, DeepMind received a series of equations along with their solutions—like a math textbook, only without any explanation of how those solutions can be reached. Google then created a modular system to procedurally generate new equations to solve, with a controllable level of difficulty, and instructed the AI to provide answers in any form. Without any structure, DeepMind had to intuit how to solve new equations solely based on seeing a limited number of completed examples. Challenging existing deep learning algorithms with modular math presents a very difficult challenge to an AI and existing neural network models performed at relatively similar levels of accuracy. The best-performing model, known as Transformer, managed to provide correct solutions 50 percent of the time and it was designed for the purpose of natural language understanding—not math. When only judging Transformer on its ability to answer questions that utilized numbers seen in the training data, its accuracy shot up to 76 percent.”
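A crude sketch of that procedural generation idea, written by us for illustration (DeepMind’s actual generator is far more elaborate and covers algebra, calculus, probability, and more):

```python
# Hypothetical sketch: generate arithmetic question/answer pairs with a
# difficulty knob, loosely in the spirit of DeepMind's modular generator.
import operator
import random

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def make_problem(difficulty: int) -> tuple[str, int]:
    """Return a (question, answer) pair; difficulty widens the operand range."""
    hi = 10 ** difficulty
    a, b = random.randint(1, hi), random.randint(1, hi)
    sym = random.choice(list(OPS))
    return f"What is {a} {sym} {b}?", OPS[sym](a, b)

for level in (1, 2, 3):
    print(make_problem(level))
```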
Furthermore, Dachis writes, DeepMind’s approach to math suggests a solution to a persistent problem facing those who would program computers to do math—while our mathematics is built on a base-10 system, software “thinks” in binary. The article goes into detail, with illustrations, about why this is such a headache. See the write-up for those details, but here is the upshot—computers cannot represent every possible number on the number line. They rely on strategic rounding to get as close as they can. Usually this works out fine, but on occasion it does produce a significant rounding error. Dachis hopes analysis of the Transformer language model will point the way toward greater accuracy, through both changes to the algorithm and new training data. Perhaps.
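The rounding issue Dachis describes is easy to see firsthand. This is a generic Python demonstration of binary floating point, not an example from the article:

```python
# 0.1 has no exact binary representation, so a tiny rounding error
# appears as soon as base-10 fractions are added.
print(0.1 + 0.2)            # 0.30000000000000004
print(0.1 + 0.2 == 0.3)     # False

# Decimal reveals the value Python actually stores for the literal 0.1.
from decimal import Decimal
print(Decimal(0.1))
```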
Cynthia Murrell, June 27, 2019
How Smart Software Goes Off the Rails
June 23, 2019
Navigate to “How Feature Extraction Can Be Improved With Denoising.” The write up seems like a straightforward analytics explanation. Lots of jargon, buzzwords, and hippy dippy references to length squared sampling in matrices. The concept is not defined in the article. And if you remember statistics 101, you know that there are five types of sampling: convenience, cluster, random, systematic, and stratified. Each has its strengths and weaknesses. How does one avoid the issues? Use length squared sampling obviously: Just sample rows with probability proportional to the square of their Euclidean norms. Got it?
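For the curious, here is a minimal NumPy sketch of that length squared recipe. This is our illustration of the textbook method, not code from the write up; the matrix and sample size are arbitrary:

```python
# Length-squared sampling: pick rows of A with probability proportional
# to the square of their Euclidean norms, then rescale so the sample
# yields an unbiased estimate of A.T @ A.
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(100, 20))            # toy matrix

row_norms_sq = np.sum(A * A, axis=1)      # squared Euclidean norm per row
probs = row_norms_sq / row_norms_sq.sum()

s = 10                                    # number of rows to keep
rows = rng.choice(A.shape[0], size=s, p=probs)
sketch = A[rows] / np.sqrt(s * probs[rows])[:, None]

approx = sketch.T @ sketch                # approximates A.T @ A
```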
However, the math is not the problem. Math is a method. The glitch is in defining “noise.” Like love, “noise” can be defined in many different ways. The write up points out:
Autoencoders with more hidden layers than inputs run the risk of learning the identity function – where the output simply equals the input – thereby becoming useless. In order to overcome this, Denoising Autoencoders(DAE) was developed. In this technique, the input is randomly induced by noise. This will force the autoencoder to reconstruct the input or denoise. Denoising is recommended as a training criterion for learning to extract useful features that will constitute a better higher level representation.
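To make the quoted recipe concrete, here is a toy denoising autoencoder in PyTorch. This is our sketch, not the article’s code; the random data and layer sizes are stand-ins:

```python
# Minimal denoising autoencoder: corrupt the input, train the network
# to reconstruct the clean version.
import torch
from torch import nn

torch.manual_seed(0)
x_clean = torch.rand(256, 32)             # toy data: 256 samples, 32 features

model = nn.Sequential(
    nn.Linear(32, 8), nn.ReLU(),          # encoder: compress to 8 dimensions
    nn.Linear(8, 32),                     # decoder: reconstruct 32 dimensions
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for _ in range(200):
    noisy = x_clean + 0.1 * torch.randn_like(x_clean)  # inject noise
    opt.zero_grad()
    loss = loss_fn(model(noisy), x_clean)  # target is the CLEAN input
    loss.backward()
    opt.step()
```

Note that the network only ever sees the noise the trainer chose to inject, which is exactly where the trouble described next creeps in.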
Can you spot the flaw in this approach? Consider what happens if the training set is skewed for some reason. The system will learn based on the inputs smoothed by statistical sanding. When the system encounters real world data, the system will, by golly, interpret the “real” inputs in terms of the flawed denoising method. As one wit observed, “So s?c^2 p gives us a better estimation than the zero matrix.” Yep.
To sum up, the system just generates “drifting” outputs. The fix? Retraining. This is expensive and time-consuming. Not good when the method is applied to real-time flows of data.
In a more colloquial turn of phrase, the denoiser may not be denoising correctly.
As more complex numerical recipes are embedded in “smart” systems, there will be some interesting consequences. Does the phrase “chain of failure” ring a bell? What about “good enough”?
Stephen E Arnold, June 23, 2019
Sure, Computers Are Psychic
June 12, 2019
Psychics, mentalism, divination, and other ways to communicate with the dead or see the future are not real. These so-called gifts are actually ancient arts grounded in human behavior, psychology, and nature. With practice and skill, anyone can learn to manipulate and predict someone’s future movements; that is basically all algorithms are doing. According to Entrepreneur, humans leave bread crumb trails online that algorithms watch and then use to predict an individual’s behavior: “How Algorithms Can Predict Our Intentions Faster Than We Can.”
While artificial intelligence (AI) and natural language processing (NLP) are still developing technologies, they are advancing quickly. Simply by tracking an individual’s Web activities, AI and NLP can learn behavior patterns and “predict” intentions, thoughts, and even our next move.
Social media is a big predictor of future events, too. Take the 2016 contest between Hillary Clinton and Donald Trump, or Brett Kavanaugh’s hearings and confirmation to the Supreme Court. When Paul Nemirovsky’s dMetrics analyzed unstructured social media data, it found the data skewed in favor of Kavanaugh’s confirmation to the court, which later came to pass. On the positive side of things, this could mean better investment outcomes, improved marketing messaging, higher customer satisfaction, and deeper insights into anything we choose.
Algorithms are literally dumb pieces of code. They only do what they are programmed to do. In order for them to understand user data, algorithms need NLP:
“Natural Language Processing, or NLP, is a neuro-network that essentially teaches itself the way we say things. By being exposed to different conversational experiences, the machine learns. Simply put, once you tell the machine what each sentence means, it records each meaning in order to process it in the future. By processing this information, it learns the skills to better understand our intentions than we do.”
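As a toy illustration of this sort of “prediction” (our sketch, not dMetrics’ method), even a simple bigram counter over a logged click trail will “predict” a next move:

```python
# Count which action follows which in a user's click trail, then
# "predict" the most frequent successor of the current action.
from collections import Counter, defaultdict

trail = ["home", "search", "product", "search", "product", "cart"]

follows = defaultdict(Counter)
for prev, nxt in zip(trail, trail[1:]):
    follows[prev][nxt] += 1

def predict(action: str):
    """Most frequent next action observed after `action`, or None."""
    successors = follows.get(action)
    return successors.most_common(1)[0][0] if successors else None

print(predict("search"))   # -> 'product'
```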
NLP is not magic and needs to be programmed like any piece of software. Predictive analytics is, and will remain, a work in progress for some time because of costs, applications, and ethical concerns. Will predictive analytics powered by AI and NLP be used for evil? Er, yeah. They will also be used for good, like cars, guns, computers, and putting words in the mouths of people who never made a particular statement.
Whitney Grace, June 12, 2019
Google: Can Semantic Relaxing Display More Ads?
June 10, 2019
For some reason, vendors of search systems have shuddered if a user’s query returns a null set. The idea is that a user sends a query to a system or, more correctly, to an index. The terms in the query do not match entries in the database. The system displays a message which says, “No results match your query.”
For some individuals, that null set response is high value information. One can bump into null sets when running queries on a Web site; for example, send the “anti fungicide” query to the Arnold Information Technology blog at this link. The result is the null set message.
From this response, one knows that there is no content containing the search phrase. That’s valuable for some people.
To address this problem, modern systems “relax” the query. The idea is that the user did not want what he or she typed in the search box. The search system then changes the query and displays those results to the stupid user. Other systems take action and display results which the system determines are related to the query. You can see these relaxed results when you enter the query shadowdragon into Google. Here are the results:
Google ignored my spelling and displayed information about a video game, not the little-known company Shadowdragon. At least Google told me what it did and offered a way to rerun the query using the word I actually entered. But the point is that the search was “relaxed.”
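To make the mechanics concrete, here is a minimal sketch of query relaxing. It is our hypothetical illustration, not Google’s actual pipeline:

```python
# Hypothetical query relaxation: try the literal query first; if it
# misses, fall back to a union of the individual terms rather than
# show the user a null set.
def search(index: dict, query: str) -> list:
    exact = index.get(query, [])
    if exact:
        return exact                      # literal match wins
    hits = []
    for term in query.split():            # relax: OR the terms together
        hits.extend(index.get(term, []))
    return hits

postings = {
    "anti": ["doc-7"],
    "fungicide": ["doc-3", "doc-7"],
}
print(search(postings, "anti fungicide") or "No results match your query.")
```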
Semantic expansion is a variation of Endeca’s facets. The idea is that a key word belongs to a category. If a system can identify a category, then the user can get more results by selecting the category and maybe finding something useful. Endeca’s wine demonstration makes this function and its value clear.
Stephen E Arnold, June 10, 2019
A Math Cheat Sheet with 212 Pages
May 30, 2019
When I was in high school, there was one student who wrote on his arm. Fortunately I just remembered the information. I wonder if this person would be interested in the “Mathematics Cheat Sheet.” The “sheet” contains 200-plus pages. I assume that if one could write tiny numbers and letters, a page or two might be recorded on an arm, the back of one’s hand, one’s palm, and maybe another body part. On the other hand, it is probably easier to use a smart phone and look for the information surrounded by ads for one of those “help your children learn” services. If you fancy a cheat “sheet” for math which will consume roughly two fifths of a ream of paper (plus or minus a percent or two), enjoy. (I must confess that I browsed the “sheet” and was stunned to learn how much I have forgotten. Powers? When did I confront those equations? When I was 14? Maybe 15?
But at age 75, I am lucky if I can remember how to get money from an automatic teller machine which asks me which language I prefer. Still thinking. Thinking.)
Stephen E Arnold, May 30, 2019
Chain of Failure: A Reminder about Logic
May 26, 2019
I spotted a reference to a blog post on Yodaiken. Its title is “Von Neumann’s Critique of Automata Theory and Logic in Computer Science.” Do we live in a von Neumann world? I was delighted to be reminded of the observations in this passage. Here’s the snippet I circled in yellow highlighter:
In a sufficiently long chain of operations the cumulative effect of these individual probabilities of failure may (if unchecked) reach the order of magnitude of unity, at which point it produces, in effect, complete unreliability.
Interesting. Perhaps failure is part of the DNA of smart software?
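A back-of-the-envelope sketch of the arithmetic behind that passage, assuming independent failures and a made-up per-operation error rate:

```python
# Even a one-in-a-million error rate compounds over a long enough chain.
p_fail = 1e-6            # assumed probability a single operation fails
n_ops = 3_000_000        # length of the chain of operations

p_all_ok = (1 - p_fail) ** n_ops
print(f"P(all {n_ops:,} operations succeed) = {p_all_ok:.3f}")   # ~0.050
```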
Stephen E Arnold, May 26, 2019
Data Science Book: Free for Now
May 24, 2019
We spotted a post by Capri Granville which points to a free data science book. The post also provides a link to other free books. The book is “Foundations of Data Science” by Avrim Blum, John Hopcroft, and Ravindran Kannan, who is with Microsoft Research India. You can, as of May 24, 2019, download the book without charge at this link: https://www.cs.cornell.edu/jeh/book.pdf. Cornell charges students about $55,188 for an academic year. DarkCyber believes that “free” may not be an operative word where the Theory Center used to love those big IBM computers. No, they were not painted Azure.
Stephen E Arnold, May 24, 2019