The Google: Geofence Misdirection a Consequence of Good Enough Analytics?
March 18, 2020
What a surprise—the use of Google tracking data by police nearly led to a false arrest, we’re told in the NBC News article, “Google Tracked his Bike Ride Past a Burglarized Home. That Made him a Suspect.” Last January, programmer and recreational cyclist Zachary McCoy received an email from Google informing him, as it does, that the cops had demanded information from his account. He had one week to try to block the release in court, but he had no idea what had prompted the warrant. Writer Jon Schuppe reports:
“There was one clue. In the notice from Google was a case number. McCoy searched for it on the Gainesville Police Department’s website, and found a one-page investigation report on the burglary of an elderly woman’s home 10 months earlier. The crime had occurred less than a mile from the home that McCoy … shared with two others. Now McCoy was even more panicked and confused.”
After hearing of his plight, McCoy’s parents sprang for an attorney:
“The lawyer, Caleb Kenyon, dug around and learned that the notice had been prompted by a ‘geofence warrant,’ a police surveillance tool that casts a virtual dragnet over crime scenes, sweeping up Google location data — drawn from users’ GPS, Bluetooth, Wi-Fi and cellular connections — from everyone nearby. The warrants, which have increased dramatically in the past two years, can help police find potential suspects when they have no leads. They also scoop up data from people who have nothing to do with the crime, often without their knowing — which Google itself has described as ‘a significant incursion on privacy.’ Still confused — and very worried — McCoy examined his phone. An avid biker, he used an exercise-tracking app, RunKeeper, to record his rides.”
Aha! There was the source of the “suspicious” data—RunKeeper tapped into his Android phone’s location service and fed that information to Google. The records show that, on the day of the break-in, his exercise route had taken him past the victim’s house three times in an hour. Eventually, the lawyer was able to convince the police that his client (still never unmasked by Google) was not the burglar. Perhaps ironically, it was RunKeeper data showing he had been biking past the victim’s house for months, not just on the day of the burglary, that removed suspicion.
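For the curious, here is a minimal sketch, in Python, of the kind of radius-and-time query a geofence warrant implies. The data layout and function names are our own assumptions for illustration; Google’s actual pipeline (reportedly a database called Sensorvault) is not public.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in meters."""
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 6_371_000 * 2 * asin(sqrt(a))

def devices_near(points, scene_lat, scene_lon, radius_m, t_start, t_end):
    """points: iterable of (anon_device_id, lat, lon, unix_timestamp).

    Returns the anonymized IDs of every device with at least one
    location fix inside the circle during the window -- burglars,
    bystanders, and cyclists who pedaled past three times in an hour.
    """
    hits = set()
    for device_id, lat, lon, t in points:
        if t_start <= t <= t_end and haversine_m(lat, lon, scene_lat, scene_lon) <= radius_m:
            hits.add(device_id)
    return hits
```

The point: the query cannot distinguish a suspect from a passer-by. That sorting happens afterward, by humans.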
Luck, and a good lawyer, were on McCoy’s side, but the larger civil rights issue looms large. Though such tracking data is anonymized until law enforcement finds something “suspicious,” this case illustrates how easy it can be to attract that attention. Do geofence warrants violate our protections against unreasonable searches? See the article for more discussion.
Cynthia Murrell, March 18, 2020
Math Resources
January 27, 2020
One of the DarkCyber team spotted a list of math resources. Some cost money; others are free. Math Vault lists courses, platforms, tools, and question-answering sites. Some are relatively mainstream, like Wolfram Alpha; others, like ProofWiki, are less well publicized. You can find the listing at this link.
Kenny Toth, January 26, 2020
Quadratic Equations: A New Method
December 15, 2019
If you deal with quadratic equations, you will want to read “A New Way to Make Quadratic Equations Easy.” The procedure is straightforward and apparently has been either overlooked, lost in time, or dismissed as out of step with current teaching methods. Worth a look, but my high school mathematics teacher Ms. Blackburn would not approve. She liked old-school methods, including whacking teenaged boys on the head with her wooden ruler.
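For readers who skip the article, the gist is an averaging trick: the two roots of x^2 + bx + c = 0 sum to -b, so they sit symmetrically about -b/2, and the requirement that the roots multiply to c pins down the offset. A minimal Python sketch of that idea, in our own notation rather than the article’s:

```python
import cmath

def roots_by_averaging(b, c):
    """Solve x^2 + b*x + c = 0 via the averaging trick.

    The roots average to -b/2, so write them as -b/2 + u and -b/2 - u.
    Their product, (-b/2)^2 - u^2, must equal c, which gives u directly.
    """
    mean = -b / 2
    u = cmath.sqrt(mean * mean - c)   # complex sqrt handles negative cases too
    return mean + u, mean - u

print(roots_by_averaging(-6, 8))  # x^2 - 6x + 8 = 0 -> roots 4 and 2
```

No formula to memorize; the familiar quadratic formula falls out as a by-product.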
Stephen E Arnold, December 15, 2019
Calculus Made Almost Easy
December 2, 2019
Just a quick tip of the hat to 0a.io. You have to love that URL. Navigate to “Calculus Explained with Pics and Gifs.”
The site provides an overview of calculus. Pictures and animations make it easy to determine if one was sleeping in calculus class or paying attention.
The site went live with the information five years ago. One of the DarkCyber team spotted it and sent along the link. Worth a visit.
Stephen E Arnold, December 2, 2019
Can Machine Learning Pick Out The Bullies?
November 13, 2019
In Walt Disney’s 1942 classic Bambi, Thumper the rabbit was told, “If you can’t say something nice, don’t say nothing at all.”
Poor grammar aside, the thumping rabbit did deliver wise advice to the audience. Then came the Internet and anonymity, and the trolls were released upon the world. Internet bullying is one of the world’s top cyber crimes, along with identity and money theft. Passionate anti-bullying campaigners, particularly individuals who were themselves cyber-bullying victims, want social media Web sites to police their users and prevent the abuse. Trying to police the Internet is like herding cats: it might be possible with the right type of fish, but cats are not herd animals, and they scatter once the tasty fish is gone.
Technology might have advanced enough to detect bullying, and AI could be the answer. Innovation Toronto wrote, “Machine Learning Algorithms Can Successfully Identify Bullies And Aggressors On Twitter With 90 Percent Accuracy.” AI’s biggest problem is that while algorithms can identify and harvest information, they lack the ability to understand emotion and context. Many bullying actions on the Internet are sarcastic or hidden within metaphors.
Computer scientist Jeremy Blackburn and his team from Binghamton University analyzed bullying behavior patterns on Twitter. They discovered information useful for understanding the trolls:
“ ‘We built crawlers — programs that collect data from Twitter via a variety of mechanisms,’ said Blackburn. ‘We gathered tweets of Twitter users, their profiles, as well as (social) network-related things, like who they follow and who follows them.’ ”
The researchers then performed natural language processing and sentiment analysis on the tweets themselves, as well as a variety of social network analyses on the connections between users. They developed algorithms to automatically classify two specific types of offensive online behavior: cyber bullying and cyber aggression. The algorithms were able to identify abusive users on Twitter with 90 percent accuracy. These are users who engage in harassing behavior, e.g., those who send death threats or make racist remarks to other users.
“‘In a nutshell, the algorithms ‘learn’ how to tell the difference between bullies and typical users by weighing certain features as they are shown more examples,’ said Blackburn.”
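To make that concrete, here is a minimal sketch of a supervised text classifier of the general kind described. Everything below is illustrative: the tiny training set is invented, and the published work weighed sentiment and social-network features in addition to the text itself.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented examples: label 1 = abusive, label 0 = typical user.
tweets = [
    "hope you have a great ride today",
    "you are worthless, just quit",
    "loved the new trail, thanks for the tip",
    "nobody wants you here, leave or else",
]
labels = [0, 1, 0, 1]

# Bag-of-words features plus a linear classifier: shown more labeled
# examples, the model "learns" which patterns separate the two groups.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(tweets, labels)

print(model.predict(["quit now, nobody wants you"]))  # likely [1]
```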
Blackburn and his team’s algorithm only detects the aggressive behavior; it does not do anything to prevent cyber bullying. The victims still see and are harmed by the comments and the bullying users, but the tool does give Twitter a heads-up for removing the trolls.
The anti-bullying algorithm kicks in only after there are victims. It does little to assist them, but it does help prevent future attacks. What steps need to be taken to prevent bullying altogether? Maybe schools need to teach classes on Internet etiquette alongside the Common Core; then again, if it is not on the test, it will not be in a classroom.
Whitney Grace, November 13, 2019
Tech Backlash: Not Even Apple and Goldman Sachs Exempt
November 11, 2019
Times are indeed interesting. Two powerful outfits—Apple (the privacy outfit with a thing for Chinese food) and Goldman Sachs (the we-make-money-every way possible organization) are the subject of “Viral Tweet about Apple Card Leads to Goldman Sachs Probe.” The would-be president’s news machine stated, “Tech entrepreneur alleged inherent bias in algorithms for card.” The card, of course, is the Apple-Goldman revenue-generating credit card. Navigate to the Bloomberg story. Get the scoop.
On the other hand, just look at one of the dozens and dozens of bloggers commenting on this bias-meets-algorithms, big-name story. Even more intriguing is that the aggrieved tweeter’s wife had her credit limit magically changed. Remarkable how smart algorithms work.
DarkCyber does not want to retread truck tires. We do have three observations:
- The algorithm part may be more important than the bias angle. The reason is that algorithms embody bias, and now non-technical and non-financial people are going to start asking questions: superficial at first, then increasingly on point. Not good for the algorithms when humans obviously can fiddle the outputs. (See the sketch after this list.)
- Two usually untouchable companies are now in the spotlight for subjective, touchy-feely matters with which neither is particularly associated. This may surface some interesting information about what’s up in the clubby world of two of the richest companies anywhere. Discrimination maybe? Carelessness? Indifference? Greed? We have to wait and listen.
- Even those who may have worked at these firms and who now hold positions of considerable influence may find themselves between a squash wall and sweaty guests who aren’t happy about an intentional obstruction. Those corporate halls, often tomb-quiet, may resound with stressed voices. “Apple” carts which allegedly sell to anyone may be upset. Cleaning up after the spill may drag the doubles partners from two exclusive companies into a task similar to cleaning sea birds after the Gulf oil spill.
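On the first observation, a toy model, entirely our own construction with made-up numbers, shows how a credit model that never sees gender can still learn it through a correlated proxy feature. Nothing below reflects the actual Apple Card system:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 5000
gender = rng.integers(0, 2, n)              # hidden: never given to the model
proxy = gender + rng.normal(0, 0.3, n)      # some feature correlated with gender
income = rng.normal(60, 15, n)              # in thousands, say

# Fabricated historical credit limits that encode a gender gap.
limit = 0.5 * income - 5 * gender + rng.normal(0, 2, n)

# Train on income and the proxy only: no gender column in sight.
model = LinearRegression().fit(np.column_stack([income, proxy]), limit)
print("weight on proxy:", round(model.coef_[1], 2))   # strongly negative
```

The model reproduces the gap via the proxy. “We don’t use gender” can be true and beside the point.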
Will this issue get news traction? Will it become a lawyer-powered railroad handcar creeping down the line?
Fascinating stuff.
Stephen E Arnold, November 11, 2019
Visual Data Exploration via Natural Language
November 4, 2019
New York University announced a natural language interface for data visualization. You can read the rah-rah from the university here. The main idea is that a person can use simple English to create complex machine-learning-based visualizations. Sounds like the answer to a Wall Street analyst’s prayers.
The university reported:
A team at the NYU Tandon School of Engineering’s Visualization and Data Analytics (VIDA) lab, led by Claudio Silva, professor in the department of computer science and engineering, developed a framework called VisFlow, by which those who may not be experts in machine learning can create highly flexible data visualizations from almost any data. Furthermore, the team made it easier and more intuitive to edit these models by developing an extension of VisFlow called FlowSense, which allows users to synthesize data exploration pipelines through a natural language interface.
You can download (as of November 3, 2019, but no promises the document will be online after this date) “FlowSense: A Natural Language Interface for Visual Data Exploration within a Dataflow System.”
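To give a feel for the idea, here is a toy sketch of turning a plain-English request into a single dataflow step. The keyword matching is our own illustration; FlowSense itself uses a semantic parser over a dataflow grammar, per the paper.

```python
import pandas as pd

def nl_to_step(query: str, df: pd.DataFrame) -> pd.DataFrame:
    """Naive slot filling for queries shaped like 'average <value> by <key>'."""
    words = query.lower().split()
    if "average" in words and "by" in words:
        value = words[words.index("average") + 1]
        key = words[words.index("by") + 1]
        return df.groupby(key)[value].mean().reset_index()
    return df  # unrecognized request: pass the data through unchanged

sales = pd.DataFrame({"month": ["jan", "jan", "feb"], "price": [10.0, 20.0, 30.0]})
print(nl_to_step("average price by month", sales))
```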
DarkCyber wants to point out that talking to a computer to get information continues to be of interest to many researchers. Will this innovation put human analysts out of their jobs?
Maybe not tomorrow but in the future. Absolutely. And what will those newly unemployed people do for money?
Interesting question and one some may find difficult to consider at this time.
Stephen E Arnold, November 4, 2019
Bias: Female Digital Assistant Voices
October 17, 2019
It was a seemingly benign choice based on consumer research, but there is an unforeseen complication. TechRadar considers, “The Problem with Alexa: What’s the Solution to Sexist Voice Assistants?” From smart speakers to cell phones, voice assistants like Amazon’s Alexa, Microsoft’s Cortana, Google’s Assistant, and Apple’s Siri generally default to female voices (and usually sport female-sounding names) because studies show humans tend to respond best to female voices. Seems like an obvious choice—until you consider the long-term consequences. Reporter Olivia Tambini cites a report UNESCO issued earlier this year that suggests the practice sets us up to perpetuate sexist attitudes toward women, particularly subconscious biases. She writes:
“This progress [society has made toward more respect and agency for women] could potentially be undone by the proliferation of female voice assistants, according to UNESCO. Its report claims that the default use of female-sounding voice assistants sends a signal to users that women are ‘obliging, docile and eager-to-please helpers, available at the touch of a button or with a blunt voice command like “hey” or “OK”.’ It’s also worrying that these voice assistants have ‘no power of agency beyond what the commander asks of it’ and respond to queries ‘regardless of [the user’s] tone or hostility’. These may be desirable traits in an AI voice assistant, but what if the way we talk to Alexa and Siri ends up influencing the way we talk to women in our everyday lives? One of UNESCO’s main criticisms of companies like Amazon, Google, Apple and Microsoft is that the docile nature of our voice assistants has the unintended effect of reinforcing ‘commonly held gender biases that women are subservient and tolerant of poor treatment’. This subservience is particularly worrying when these female-sounding voice assistants give ‘deflecting, lackluster or apologetic responses to verbal sexual harassment’.”
So what is a voice-assistant maker to do? Certainly, male voices could be used and are, in fact, selectable options for several models. Another idea is to give users a wide variety of voices to choose from—not just different genders, but different accents and ages, as well. Perhaps the most effective solution would be to use a gender-neutral voice; one dubbed “Q” has now been created, proving it is possible. (You can listen to Q through the article or on YouTube.)
Of course, this and other problems might have been avoided had there been more diversity on the teams behind the voices. Tambini notes that just seven percent of information- and communication-tech patents across G20 countries are generated by women. As more women move into STEM fields, will unintended gender bias shrink as a natural result?
Cynthia Murrell, October 17, 2019
The Roots of Common Machine Learning Errors
October 11, 2019
It is a big problem when faulty data analysis underpins big decisions or public opinion, and it is happening more often in the age of big data. Data Science Central outlines several “Common Errors in Machine Learning Due to Poor Statistics Knowledge.” Easy to make mistakes? Yep. Easy to manipulate outputs? Yep. We believe the “obvious” fix is to make math point-and-click: let the developers decide for the clueless user.
Blogger Vincent Granville describes what he sees as the biggest problem:
“Probably the worst error is thinking there is a correlation when that correlation is purely artificial. Take a data set with 100,000 variables, say with 10 observations. Compute all the (99,999 * 100,000) / 2 cross-correlations. You are almost guaranteed to find one above 0.999. This is best illustrated in my article How to Lie with P-values (which also discusses how to handle and fix it). This is being done on such a large scale, I think it is probably the main cause of fake news, and the impact is disastrous on people who take for granted what they read in the news or what they hear from the government. Some people are sent to jail based on evidence tainted with major statistical flaws. Government money is spent, propaganda is generated, wars are started, and laws are created based on false evidence. Sometimes the data scientist has no choice but to knowingly cook the numbers to keep her job. Usually, these ‘bad stats’ end up being featured in beautiful but faulty visualizations: axes are truncated, charts are distorted, observations and variables are carefully chosen just to make a (wrong) point.”
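Granville’s point is easy to verify. The sketch below scales his example down, 2,000 random variables instead of 100,000 so it runs in seconds, and still finds near-perfect “correlations” in pure noise:

```python
import numpy as np

rng = np.random.default_rng(42)
data = rng.normal(size=(10, 2000))       # 10 observations, 2,000 random variables
corr = np.corrcoef(data, rowvar=False)   # all pairwise cross-correlations
np.fill_diagonal(corr, 0.0)              # drop the trivial self-correlations
print("max |correlation|:", np.abs(corr).max())  # typically above 0.97
```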
Granville goes on to specify several other sources of mistakes. Analysts sometimes take for granted the accuracy of their data sets, for example, instead of performing a walk-forward test. Relying too much on the old standbys R-squared measures and normal distributions can also lead to errors. Furthermore, he reminds us, scale-invariant modeling techniques must be used when data is expressed in different units (like yards and miles). Finally, one must be sure to handle missing data correctly—do not assume bridging the gap with an average will produce accurate results. See the post for more explanation on each of these points.
Cynthia Murrell, October 11, 2019
Information and the Mere Exposure Effect
October 1, 2019
The article “Why Do Older People Hate New Music?” caught my attention. Music is not a core interest at DarkCyber. We do mention in our Dark Web 2 lecture that beat-sharing and beat-selling sites which permit message exchange are an important source of social content.
This “oldsters hate new” angle is important. The write up contains this assertion:
One of the most researched laws of social psychology is something called the “mere exposure effect.” In a nutshell, it means that the more we’re exposed to something, the more we tend to like it. This happens with people we know, the advertisements we see and, yes, the songs we listen to.
Like many socio-psycho-econo assertions, this idea sounds plausible. Let’s assume that it is correct and apply the insight to online information.
Online news services purport to provide news “for me,” world news, and other categories. When I review outputs from services like SmartNews, News360, and Google News, it is clear that the information presented looks the same and conveys the same information.
If the exposure point is accurate, these services are conditioning me to accept and feel comfortable with specific information. SmartNews shows me soccer news, reports about cruise ship deaths, and write ups which underscore the antics of certain elected officials.
These services do not coordinate, but they do rely on widely used numerical recipes and on feedback about what I click on or ignore. What’s interesting is that each service delivers a package of content which reflects its view of what interests me.
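A back-of-the-envelope simulation, our own toy model rather than any service’s actual recipe, shows how such a feedback loop narrows a feed: clicked topics get boosted, ignored topics decay, and the mix converges on more of the same.

```python
import random

topics = {"soccer": 1.0, "cruise ship deaths": 1.0, "politics": 1.0, "science": 1.0}
clicks = {"soccer", "cruise ship deaths", "politics"}   # what this reader taps on

random.seed(1)
for _ in range(500):
    # serve one story, chosen in proportion to current topic weights
    shown = random.choices(list(topics), weights=list(topics.values()))[0]
    topics[shown] *= 1.1 if shown in clicks else 0.9    # boost or decay

print({t: round(w) for t, w in topics.items()})   # "science" withers toward zero
```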
The problem is that I look at less and less content on these services. Familiarity means that I don’t need to know more about certain topics.
Consequently, as the services become smarter, I move away from these services.
The psychological write up reports:
Psychology research has shown that the emotions that we experience as teens seem more intense than those that come later. We also know that intense emotions are associated with stronger memories and preferences. All of this might explain why the songs we listen to during this period become so memorable and beloved.
Is familiarity making me more content with online news? Sorry, no.
The familiarity makes it easier to recognize that significant content is not being presented. That’s an interesting issue if my reaction is not peculiar to me.
How does one find additional information about the unfamiliar? Search does not deliver effectively in my opinion.
Stephen E Arnold, October 2, 2019