Calculus Made Almost Easy

December 2, 2019

Just a quick tip of the hat to 0a.io. You have to love that url. Navigate to “Calculus Explained with Pics and Gifs.”

The site provides an overview of calculus. Pictures and animations make it easy to determine if one was sleeping in calculus class or paying attention.

The site went live with the information five years ago. One of the DarkCyber team spotted it and sent along the link. Worth a visit.

Stephen E Arnold, December 2, 2019

Can Machine Learning Pick Out The Bullies?

November 13, 2019

In Walt Disney’s 1942 classic Bambi, Thumper the rabbit was told, “If you can’t say something nice, don’t say nothing at all.”

Poor grammar aside, the thumping rabbit did deliver wise advice to the audience. Then came the Internet and anonymity, and the trolls were released upon the world. Internet bullying is one of the world’s top cyber crimes, along with identity theft and financial fraud. Passionate anti-bullying campaigners, particularly individuals who were themselves cyber-bullying victims, want social media Web sites to police their users and prevent the abuse. Trying to police the Internet is like herding cats. It might be possible with the right type of fish, but cats are not herd animals and scatter once the tasty fish is gone.

Technology might have advanced enough to detect bullying, and AI could be the answer. Innovation Toronto wrote, “Machine Learning Algorithms Can Successfully Identify Bullies And Aggressors On Twitter With 90 Percent Accuracy.” AI’s biggest problem is that while algorithms can identify and harvest information, they lack the ability to understand emotion and context. Many bullying actions on the Internet are sarcastic or hidden within metaphors.

Computer scientist Jeremy Blackburn and his team from Binghamton University analyzed bullying behavior patterns on Twitter. They discovered useful information for understanding the trolls:

“ ‘We built crawlers — programs that collect data from Twitter via variety of mechanisms,’ said Blackburn. ‘We gathered tweets of Twitter users, their profiles, as well as (social) network-related things, like who they follow and who follows them.’ ”

The researchers then performed natural language processing and sentiment analysis on the tweets themselves, as well as a variety of social network analyses on the connections between users. The researchers developed algorithms to automatically classify two specific types of offensive online behavior: cyber bullying and cyber aggression. The algorithms were able to identify abusive users on Twitter with 90 percent accuracy. These are users who engage in harassing behavior; e.g., those who send death threats or make racist remarks to other users.
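DarkCyber cannot show the Binghamton code, but the general recipe is easy to sketch. The snippet below is purely illustrative: the tweets, the network counts, and the model choice are assumptions, not the researchers’ actual pipeline. It simply shows text features and follower statistics being combined and handed to a classifier.

```python
import numpy as np
from scipy.sparse import hstack, csr_matrix
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy data: tweet text, simple network counts, and a label (1 = abusive user).
tweets = ["you are wonderful", "I will find you and hurt you",
          "great game last night", "nobody wants you here, leave"]
network = np.array([[120, 80], [5, 300], [45, 60], [9, 250]])  # [followers, following]
labels = np.array([0, 1, 0, 1])

# Combine TF-IDF text features with the network features into one matrix.
vectorizer = TfidfVectorizer()
X = hstack([vectorizer.fit_transform(tweets), csr_matrix(network)])

model = LogisticRegression().fit(X, labels)

# Score a new user the same way the training users were scored.
new_X = hstack([vectorizer.transform(["get lost or else"]), csr_matrix([[3, 400]])])
print("probability the user is abusive:", model.predict_proba(new_X)[0, 1])
```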

“‘In a nutshell, the algorithms ‘learn’ how to tell the difference between bullies and typical users by weighing certain features as they are shown more examples,’ said Blackburn.”

Blackburn and his team’s algorithm only detects aggressive behavior; it does not do anything to prevent cyber bullying. The victims still see and are harmed by the comments and the bullying users, but the tool does give Twitter a heads-up for removing the trolls.

The anti-bullying algorithm kicks in only after there are victims. It does little to assist those victims, but it may prevent future attacks. What steps need to be taken to prevent bullying altogether? Maybe schools need to teach classes on Internet etiquette alongside the Common Core; then again, if it is not on the test, it will not be in a classroom.

Whitney Grace, November 13, 2019

Tech Backlash: Not Even Apple and Goldman Sachs Exempt

November 11, 2019

Times are indeed interesting. Two powerful outfits, Apple (the privacy outfit with a thing for Chinese food) and Goldman Sachs (the we-make-money-every-way-possible organization), are the subject of “Viral Tweet about Apple Card Leads to Goldman Sachs Probe.” The would-be president’s news machine stated, “Tech entrepreneur alleged inherent bias in algorithms for card.” The card, of course, is the Apple-Goldman revenue-generating credit card. Navigate to the Bloomberg story. Get the scoop.

On the other hand, just look at one of the dozens and dozens of bloggers commenting on this bias, algorithm, big-name story. Even more intriguing is that the aggrieved tweeter’s wife had her credit score magically changed. Remarkable how smart algorithms work.

DarkCyber does not want to retread truck tires. We do have three observations:

  1. The algorithm part may be more important than the bias angle. The reason is that algorithms embody bias, and now non-technical and non-financial people are going to start asking questions: superficial at first, then increasingly on point. Not good for algorithms when humans obviously can fiddle the outputs. (A rough sketch of how bias gets baked in appears after this list.)
  2. Two usually untouchable companies are now in the spotlight for subjective, touchy-feely issues with which neither company is particularly associated. This may lead to some interesting information about what’s up in the clubby world of the planet’s richest companies. Discrimination maybe? Carelessness? Indifference? Greed? We have to wait and listen.
  3. Even those who may have worked at these firms and who now may be in positions of considerable influence may find themselves between a squash wall and sweaty guests who aren’t happy about an intentional obstruction. Those corporate halls which are often tomb-quiet may resound with stressed voices. “Apple” carts which allegedly sell to anyone may be upset. Cleaning up after the spill may drag the doubles partners from two exclusive companies into a task similar to cleaning sea birds after the Gulf oil spill.
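To make point one concrete, here is a hypothetical sketch (not the Apple or Goldman Sachs model, just synthetic numbers) showing how a model trained on historically skewed credit limits reproduces that skew for identical incomes.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
n = 1000
income = rng.normal(80_000, 20_000, n)
group = rng.integers(0, 2, n)                       # 0/1 proxy for a protected attribute
# Historical limits: identical incomes, but group 1 was granted roughly 30% less.
past_limit = 0.2 * income * np.where(group == 1, 0.7, 1.0) + rng.normal(0, 500, n)

model = LinearRegression().fit(np.column_stack([income, group]), past_limit)

# Same income, different group: the model faithfully reproduces the historical gap.
print(model.predict([[90_000, 0], [90_000, 1]]))
```

Dropping the group column does not necessarily fix the problem, because correlated features can leak the same signal back in. That is why “the algorithm did it” is such a thin defense.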

Will this issue get news traction? Will it become a lawyer-powered railroad handcar creeping down the line?

Fascinating stuff.

Stephen E Arnold, November 11, 2019

Visual Data Exploration via Natural Language

November 4, 2019

New York University announced a natural language interface for data visualization. You can read the rah rah from the university here. The main idea is that a person can use simple English to create complex machine learning based visualizations. Sounds like the answer to a Wall Street analyst’s prayers.

The university reported:

A team at the NYU Tandon School of Engineering’s Visualization and Data Analytics (VIDA) lab, led by Claudio Silva, professor in the department of computer science and engineering, developed a framework called VisFlow, by which those who may not be experts in machine learning can create highly flexible data visualizations from almost any data. Furthermore, the team made it easier and more intuitive to edit these models by developing an extension of VisFlow called FlowSense, which allows users to synthesize data exploration pipelines through a natural language interface.
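FlowSense has its own grammar and dataflow engine, described in the paper linked below. As a rough, hypothetical illustration of what “natural language in, visualization out” means in practice, a toy version might look like the following; the keyword parsing and the pandas pipeline are DarkCyber’s assumptions, not the NYU code.

```python
import re
import pandas as pd
import matplotlib.pyplot as plt

def answer(query: str, df: pd.DataFrame):
    # Extremely naive "parser": show average <measure> by <dimension> as a <chart> chart.
    measure, dimension, chart = re.search(
        r"average (\w+) by (\w+) as a (\w+)", query.lower()).groups()
    result = df.groupby(dimension)[measure].mean()          # the data pipeline
    result.plot(kind="bar" if chart == "bar" else "line")   # the visualization step
    plt.show()
    return result

sales = pd.DataFrame({"month": ["Jan", "Jan", "Feb", "Feb"],
                      "revenue": [100, 120, 90, 130]})
answer("Show average revenue by month as a bar chart", sales)
```

The real system does far more, but the principle is the same: English is translated into a pipeline of data operations plus a chart specification.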

You can download (as of November 3, 2019, but no promises the document will be online after this date) “FlowSense: A Natural Language Interface for Visual Data Exploration within a Dataflow System.”

DarkCyber wants to point out that talking to a computer to get information continues to be of interest to many researchers. Will this innovation put human analysts out of their jobs?

Maybe not tomorrow, but in the future? Absolutely. And what will those newly unemployed people do for money?

Interesting question and one some may find difficult to consider at this time.

Stephen E Arnold, November 4, 2019

 

Bias: Female Digital Assistant Voices

October 17, 2019

It was a seemingly benign choice based on consumer research, but there is an unforeseen complication. TechRadar considers, “The Problem with Alexa: What’s the Solution to Sexist Voice Assistants?” From smart speakers to cell phones, voice assistants like Amazon’s Alexa, Microsoft’s Cortana, Google’s Assistant, and Apple’s Siri generally default to female voices (and usually sport female-sounding names) because studies show humans tend to respond best to female voices. Seems like an obvious choice—until you consider the long-term consequences. Reporter Olivia Tambini cites a report UNESCO issued earlier this year that suggests the practice sets us up to perpetuate sexist attitudes toward women, particularly subconscious biases. She writes:

“This progress [society has made toward more respect and agency for women] could potentially be undone by the proliferation of female voice assistants, according to UNESCO. Its report claims that the default use of female-sounding voice assistants sends a signal to users that women are ‘obliging, docile and eager-to-please helpers, available at the touch of a button or with a blunt voice command like “hey” or “OK”.’ It’s also worrying that these voice assistants have ‘no power of agency beyond what the commander asks of it’ and respond to queries ‘regardless of [the user’s] tone or hostility’. These may be desirable traits in an AI voice assistant, but what if the way we talk to Alexa and Siri ends up influencing the way we talk to women in our everyday lives? One of UNESCO’s main criticisms of companies like Amazon, Google, Apple and Microsoft is that the docile nature of our voice assistants has the unintended effect of reinforcing ‘commonly held gender biases that women are subservient and tolerant of poor treatment’. This subservience is particularly worrying when these female-sounding voice assistants give ‘deflecting, lackluster or apologetic responses to verbal sexual harassment’.”

So what is a voice-assistant maker to do? Certainly, male voices could be used and are, in fact, selectable options for several models. Another idea is to give users a wide variety of voices to choose from—not just different genders, but different accents and ages, as well. Perhaps the most effective solution would be to use a gender-neutral voice; one dubbed “Q” has now been created, proving it is possible. (You can listen to Q through the article or on YouTube.)

Of course, this and other problems might have been avoided had there been more diversity on the teams behind the voices. Tambini notes that just seven percent of information- and communication-tech patents across G20 countries are generated by women. As more women move into STEM fields, will unintended gender bias shrink as a natural result?

Cynthia Murrell, October 17, 2019

The Roots of Common Machine Learning Errors

October 11, 2019

It is a big problem when faulty data analysis underpins big decisions or public opinion, and it is happening more often in the age of big data. Data Science Central outlines several “Common Errors in Machine Learning Due to Poor Statistics Knowledge.” Easy to make mistakes? Yep. Easy to manipulate outputs? Yep. We believe the obvious fix is to make math point and click and let the developers decide for the clueless user.

Blogger Vincent Granville describes what he sees as the biggest problem:

“Probably the worst error is thinking there is a correlation when that correlation is purely artificial. Take a data set with 100,000 variables, say with 10 observations. Compute all the (99,999 * 100,000) / 2 cross-correlations. You are almost guaranteed to find one above 0.999. This is best illustrated in my article How to Lie with P-values (also discussing how to handle and fix it.) This is being done on such a large scale, I think it is probably the main cause of fake news, and the impact is disastrous on people who take for granted what they read in the news or what they hear from the government. Some people are sent to jail based on evidence tainted with major statistical flaws. Government money is spent, propaganda is generated, wars are started, and laws are created based on false evidence. Sometimes the data scientist has no choice but to knowingly cook the numbers to keep her job. Usually, these ‘bad stats’ end up being featured in beautiful but faulty visualizations: axes are truncated, charts are distorted, observations and variables are carefully chosen just to make a (wrong) point.”
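The effect Granville describes is easy to reproduce. The scaled-down simulation below uses 2,000 random variables instead of 100,000 so it runs in seconds; even at that size, pure noise with only ten observations typically yields pairs correlated above 0.95.

```python
import numpy as np

rng = np.random.default_rng(0)
n_obs, n_vars = 10, 2000                 # Granville's example uses 100,000 variables
data = rng.standard_normal((n_obs, n_vars))

corr = np.corrcoef(data, rowvar=False)   # 2000 x 2000 correlation matrix of pure noise
np.fill_diagonal(corr, 0)                # ignore the trivial self-correlations
print("largest spurious correlation:", np.abs(corr).max())
```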

Granville goes on to specify several other sources of mistakes. Analysts sometimes take for granted the accuracy of their data sets, for example, instead of performing a walk-forward test. Relying too much on the old standbys, R-squared measures and normal distributions, can also lead to errors. Furthermore, he reminds us, scale-invariant modeling techniques must be used when data is expressed in different units (like yards and miles). Finally, one must be sure to handle missing data correctly; do not assume bridging the gap with an average will produce accurate results. See the post for more explanation on each of these points.
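The units point deserves a quick illustration. In the assumed example below (not taken from Granville’s post), a distance-based method like k-means is dominated by whichever column carries the bigger numbers unless the features are standardized first.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
yards = rng.normal(0, 5000, 200)      # large numbers
miles = rng.normal(0, 3, 200)         # small numbers, a different unit
X = np.column_stack([yards, miles])

raw = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
scaled = KMeans(n_clusters=2, n_init=10, random_state=0).fit(StandardScaler().fit_transform(X))

# Raw-unit centres differ only along the yards axis; the miles column is effectively ignored.
print("raw-unit centres:\n", raw.cluster_centers_)
print("standardized centres:\n", scaled.cluster_centers_)
```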

Cynthia Murrell, October 11, 2019

Information and the More Exposure Effect

October 1, 2019

The article “Why Do Older People Hate New Music?” caught my attention. Music is not a core interest at DarkCyber. We do mention in our Dark Web 2 lecture that beat sharing and selling sites which permit message exchange are an important source of social content.

This “oldsters hate the new” angle is important. The write up contains this assertion:

One of the most researched laws of social psychology is something called the “mere exposure effect.” In a nutshell, it means that the more we’re exposed to something, the more we tend to like it. This happens with people we know, the advertisements we see and, yes, the songs we listen to.

Like many socio-psycho-econo assertions, this idea sounds plausible. Let’s assume that it is correct and apply the insight to online information.

Online news services purport to provide news for me, world news, and other categories. When I review outputs from several services like SmartNews, News360, and Google News, for example, it is clear that the information presented looks the same and conveys the same information.

If the exposure point is accurate, these services are conditioning me to accept and feel comfortable with specific information. SmartNews shows me soccer news, reports about cruise ship deaths, and write ups which underscore the antics of certain elected officials.

These services do not coordinate, but they do rely on widely used numerical recipes and feedback about what I click on or ignore. What’s interesting is that each of these services delivers a package of content which reflects each service’s view of what interests me.
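A toy simulation makes the feedback loop visible. The update rule below is DarkCyber’s own invention, not any service’s actual recipe; it only shows how rewarding clicks tends to collapse the mix of topics shown over time.

```python
import numpy as np

rng = np.random.default_rng(5)
topics = ["soccer", "cruise ships", "politics", "science", "art"]
weights = np.ones(len(topics))
click_rate = np.array([0.5, 0.3, 0.2, 0.0, 0.0])   # what this reader actually clicks

for _ in range(500):
    shown = rng.choice(len(topics), p=weights / weights.sum())
    if rng.random() < click_rate[shown]:
        weights[shown] += 1.0                           # reward clicked topics
    else:
        weights[shown] = max(weights[shown] - 0.05, 0.1)  # slowly demote ignored ones

# The final mix is dominated by the few topics that were clicked early and often.
print({t: round(w, 2) for t, w in zip(topics, weights / weights.sum())})
```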

The problem is that I look at less and less content on these services. Familiarity means that I don’t need to know more about certain topics.

Consequently, as the services become smarter, I move away from these services.

The psychological write up reports:

Psychology research has shown that the emotions that we experience as teens seem more intense than those that come later. We also know that intense emotions are associated with stronger memories and preferences. All of this might explain why the songs we listen to during this period become so memorable and beloved.

Is familiarity making me more content with online news? Sorry, no.

The familiarity makes it easier to recognize that significant content is not being presented. That’s an interesting issue if my reaction is not peculiar to me.

How does one find additional information about the unfamiliar? Search does not deliver effectively in my opinion.

Stephen E Arnold, October 2, 2019

Should Social Media Algorithms be Used to Predict Crime?

September 18, 2019

Do we want Thought Police? Because this is how you get Thought Police. Though tragedies like the recent mass shootings in El Paso and Dayton are horrifying, some “solutions” are bound to do more harm than good. President Trump’s recent call for social-media companies to predict who will become a mass shooter so authorities can preemptively move against them is right out of Orwell’s 1984. Digital Trends asks, “Can Social Media Predict Mass Shootings Before They Happen?” Technically, it probably can, but with limited accuracy. Journalist Mathew Katz writes:

“Companies like Google, Facebook, Twitter, and Amazon already use algorithms to predict your interests, your behaviors, and crucially, what you like to buy. Sometimes, an algorithm can get your personality right – like when Spotify somehow manages to put together a playlist full of new music you love. In theory, companies could use the same technology to flag potential shooters. ‘To an algorithm, the scoring of your propensity [to] purchase a particular pair of shoes is not very different from the scoring of your propensity to become a mass murderer—the main difference is the data set being scored,’ wrote technology and marketing consultant Shelly Palmer in a newsletter on Sunday. But preventing mass shootings before they happen raises some thorny legal questions: how do you determine if someone is just angry online rather than someone who could actually carry out a shooting? Can you arrest someone if a computer thinks they’ll eventually become a shooter?”

That is what we must decide as a society. We also need to ask whether algorithms are really up to the task. We learn:

“The Partnership on AI, an organization looking at the future of artificial intelligence, conducted an intensive study on algorithmic tools that try to ‘predict’ crime. Their conclusion? ‘These tools should not be used alone to make decisions to detain or to continue detention.’”

But we all know that once people get an easy-to-use tool, ease of use can quickly trump accuracy. Think of how often you see ads online for products you would never buy, Katz prompts. Then consider how it would feel to be arrested for a crime you would never commit.

Cynthia Murrell, September 18, 2019

Handy Visual Reference of Data Model Evaluation Techniques

September 12, 2019

There are many ways to evaluate one’s data models, and Data Science Central presents an extensive yet succinct reference in visual form—“Model Evaluation Techniques in One Picture.” Together, the image and links make for a useful resource. Creator Stephanie Glen writes:

“The sheer number of model evaluation techniques available to assess how good your model is can be completely overwhelming. As well as the oft-used confidence intervals, confusion matrix and cross validation, there are dozens more that you could use for specific situations, including McNemar’s test, Cochran’s Q, Multiple Hypothesis testing and many more. This one picture whittles down that list to a dozen or so of the most popular. You’ll find links to articles explaining the specific tests and procedures below the image.”
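Two of the staples Glen mentions, cross validation and the confusion matrix, take only a few lines with scikit-learn. The snippet below is a generic example on the built-in iris data, not something lifted from her post.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, cross_val_predict
from sklearn.metrics import confusion_matrix

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

print("fold accuracies:", cross_val_score(model, X, y, cv=5))   # 5-fold cross validation

predictions = cross_val_predict(model, X, y, cv=5)              # out-of-fold predictions
print(confusion_matrix(y, predictions))                         # rows = true, columns = predicted
```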

Glen may be underselling her list of links after the graphic; it would be worth navigating to her post for that alone. The visual, though, elegantly simplifies a complex topic. It is divided into these subtopics: general tests and tools; regression; classification: visual aids; and classification: statistics and tools. Interested readers should check it out; you might just decide to bookmark it for future reference, too.

Cynthia Murrell, September 12, 2019

Disrupting Neural Nets: Adversarial Has a More Friendly Spin Than Weaponized

August 28, 2019

In my lecture about manipulation of algorithms, I review several methods for pumping false signals into a data set in order to skew outputs.

The basic idea is that if an entity generates content pulses which are semantically or otherwise related in a way the smart software “counts”, then the outputs are altered.
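Here is a minimal sketch of that signal-pumping idea. The setup is invented for illustration: a block of coordinated fake training points with the “wrong” label shifts where a simple classifier draws its boundary.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
clean_X = np.concatenate([rng.normal(-2, 1, (200, 1)), rng.normal(2, 1, (200, 1))])
clean_y = np.array([0] * 200 + [1] * 200)

# Attacker injects 100 coordinated points: class-1 labels planted deep in class-0 territory.
poison_X = rng.normal(-2, 0.3, (100, 1))
poison_y = np.ones(100, dtype=int)

clean_model = LogisticRegression().fit(clean_X, clean_y)
poisoned_model = LogisticRegression().fit(
    np.concatenate([clean_X, poison_X]), np.concatenate([clean_y, poison_y]))

probe = np.array([[-1.0]])   # a point that clearly belongs to class 0
print("clean model, class-1 probability:   ", clean_model.predict_proba(probe)[0, 1])
print("poisoned model, class-1 probability:", poisoned_model.predict_proba(probe)[0, 1])
```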

A good review of some of these flaws in neural network classifiers appears in “How Reliable Are Neural Networks Classifiers Against Unforeseen Adversarial Attacks.”

DarkCyber noted this statement in the write up:

attackers could target autonomous vehicles by using stickers or paint to create an adversarial stop sign that the vehicle would interpret as a ‘yield’ or other sign. A confused car on a busy day is a potential catastrophe packed in a 2000 pound metal box.

Dramatic, yes. Far-fetched? Not too much.

Providing weaponized data objects to smart software can screw up the works. Examples range from adversarial clothing, discussed in the DarkCyber video program for August 27, 2019, to the wonky predictions that Google makes when displaying personalized ads.

The article reviews an expensive and time-consuming method for minimizing the probability of weaponized data mucking up the outputs.

The problem, of course, is that smart software is supposed to handle the tricky, expensive, and slow process of assembling and refining a training set of data. Talk about smart software is really cheap. Delivering systems which operate in the real world is another kettle of what appear to be fish as determined by a vector’s norm.

The Analytics India article is neither broad nor deep. It does raise awareness of the rather interesting challenges which lurk within smart software.

Understanding how smart software can get off base and drift into LaLa Land begins with identifying the problem.

Smart software cannot learn and discriminate with the accuracy many people assume it delivers. Humans assume a system’s output is 99 percent accurate when it answers a question like “Is it raining?”

The reality is that adversarial inputs can reduce the accuracy rate significantly.

On good days, smart software can hit 85 to 90 percent accuracy. That’s good enough unless a self-driving car hits you. But with adversarial or weaponized data, that accuracy rate can drop below the 65 percent level which most of the systems DarkCyber has tested can reliably achieve.

To sum up, smart software makes mistakes. Weaponized data fed into smart software increases the likelihood of an error.

The methods can be used in commercial and military theaters.

Neither humans nor software can prevent this from happening on a consistent basis.

So what? Yes, that’s a good question.

Stephen E Arnold, August 29, 2019
