Data Visualizations: An Opportunity Converted into a Border Wall
May 18, 2020
I read “Understanding Uncertainty: Visualizing Probabilities.” The information in the article is useful. Helpful examples make clear how easy it is to create a helpful representation of certain statistical data.
The opportunity today is to make representations of numeric data, probabilities, and “uncertainty” more easily understandable.
The barrier is that “good enough” visualizations can be output with the click of a mouse. The graphic may be attractive, but it may distort the information allegedly presented in a helpful way.
But appearance may be more important than substance. Need examples? Check out the Covid19 “charts.” Most of these are confusing and ignore important items of information.
Good enough is not good enough.
Stephen E Arnold, May 18, 2020
Bayesian Math: Useful Book Is Free for Personal Use
May 11, 2020
The third edition of Bayesian Data Analysis (updated on February 13, 2020) is available at this link. The authors are Andrew Gelman, John B. Carlin, Hal S. Stern, David B. Dunson, Aki Vehtari, and Donald B. Rubin. With the Bayes’ principles in hand, making sense of some of the modern smart systems becomes somewhat easier. The book covers the basics and advanced computation. One of the more interesting sections is Part V: Nonlinear and Nonparametric Models. You may want to add this to your library.
Stephen E Arnold, May 11, 2020
Facebook Is Definitely Evil: Plus or Minus Three Percent at a 95 Percent Confidence Level
March 2, 2020
The Verge Tech Survey 2020 allegedly and theoretically reveals the deepest thoughts, preferences, and perceptions of people in the US. The details of these people are sketchy, but that’s not the point of the survey. The findings suggest that Facebook is a problem. Amazon is a problem. Other big tech companies are problems. Trouble right here in digital city.
The survey findings come from a survey of 1123 people “nationally representative of the US.” There was no information about income, group with which the subject identifies, or methodology. But the result is a plus or minus three percent at a 95 percent confidence level. That sure seems okay despite DarkCyber’s questions about:
- Sample selection. Who pulled the sample, from where, were people volunteers, etc.
- “Nationally representative” means what? Was it the proportional representation method? How many people from Montana and the other “states”? What about Puerto Rico? Who worked for which company?
- Plus or minus three percent. That’s a swing at a 95 percent confidence level. In terms of optical character recognition that works out to three to six errors per page about 95 percent of the time. Is this close enough for a drone strike or an enforcement action? Oh, right, this is a survey about big tech. Big tech doesn’t think the DarkCyber way, right?
- What were the socio economic strata of the individuals in the sample?
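For readers who want to check the plus or minus three percent figure, here is a minimal sketch of the standard margin of error calculation for a proportion. It assumes a simple random sample and the worst case split of 50/50, neither of which the survey write-up confirms; the function name margin_of_error is made up for this illustration.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Half-width of a normal-approximation confidence interval for a proportion.

    Assumes a simple random sample (an assumption, not something the survey states).
    z=1.96 corresponds to a 95 percent confidence level; p=0.5 is the worst case.
    """
    return z * math.sqrt(p * (1 - p) / n)

# 1,123 respondents, as reported for the survey
moe = margin_of_error(1123)
print(f"n=1123: plus or minus {moe * 100:.1f} percentage points")
```

Under those assumptions the arithmetic lands close to the reported three percent. The number says nothing about how the sample was pulled, which is the real question.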
What’s revealed or discovered?
First, people love most of the high profile “names” or “brands.” Amazon is numero uno, the Google is number two, and YouTube (which is the Google in case you have forgotten) is number three. So far, the data look like a name recognition test. “Do you prefer this unknown lye soap or Dove?” Yep, people prefer Dove. But lye soap may be making a comeback.
The stunning finding is that Facebook and Twitter impact society in a negative way. Contrast this to lovable Google and Amazon: 72 percent are favorable to the Google and 70 percent are favorable to Amazon.
Here’s the data about which companies people trust. Darned Amazing. People trust Microsoft and Amazon the most.
Which companies do the homeless and people in rural West Virginia trust?
Plus 72 percent of the sample believe Facebook has too much “power.” What does power mean? No clue for the context of this survey.
Gentle reader, please, examine the article containing these data. I want to go back in time and reflect on the people who struggled in my statistics classes. Painful memories but I picked up some cash tutoring. I got out of that business because some folks don’t grasp numerical recipes.
Stephen E Arnold, March 2, 2020
Global CIO Survey: Surprises and Yawns
February 23, 2020
IT Brief published a story about a global CIO survey conducted by Logicalis, an integrator services IT company. The data appeared in “44% of CIOs Think Their AI Comprehension Not Very Successful.” Quite a headline. CIOs are able to give themselves a C minus or D plus in understanding artificial intelligence? Interesting. DarkCyber assumed it was As all the way.
Let’s look at some of the findings, and DarkCyber urges you to check out the original story for more of the data.
- Nine percent of the respondents “believe that their organization is very successful at comprehending the advantages of AI technology”
- 44 percent “believe their organization is not very successful at all” at comprehending the advantages of AI technology
- Internet of Things technologies (not defined, by the way) are used for creating new products, creating operational efficiencies, and enhancing existing products.
These data are based on a sample of 888 CIOs. DarkCyber does not know how these individuals were selected, how the questions were administered, or what methods were used to calculate the nice round percentages.
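How much wiggle room might those nice round percentages carry? A rough sketch follows. It assumes the 888 CIOs were a simple random sample, which the story does not say, and it uses the textbook normal approximation; the helper proportion_interval is hypothetical and not anything Logicalis published.

```python
import math

def proportion_interval(p, n, z=1.96):
    """Normal-approximation 95 percent interval for a reported share p from n responses.

    Assumes simple random sampling, which the survey write-up does not confirm.
    """
    half_width = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half_width), min(1.0, p + half_width)

SAMPLE_SIZE = 888  # number of CIOs reported in the story

# The two headline shares: 9 percent "very successful", 44 percent "not very successful"
intervals = {share: proportion_interval(share, SAMPLE_SIZE) for share in (0.09, 0.44)}

for share, (low, high) in intervals.items():
    print(f"reported {share:.0%}: roughly {low:.1%} to {high:.1%}")
```

If the selection method was anything other than random, these intervals understate the uncertainty.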
Stephen E Arnold, February 23, 2020
Want Facebook Statistics?
February 19, 2020
If you want a round up of Facebook statistics, take a look at “Facebook Statistics You Need to Know.” The data come from secondary sources. You may want to verify the factoids before you head to a job interview at Facebook. If you are applying for work at a social media company or a mid tier consulting firm, go with the numbers. Here are three which DarkCyber noted:
An okay, boomer number: People aged 65 and over are the fastest-growing demographic on Facebook
An Amazon wake up call: In the U.S., 15% of social media users use Facebook to shop
TV executive, are you in touch with viewer preferences? Square Facebook videos get 35% more views than landscape videos
No data are presented about the percentage of Mr. Zuckerberg’s neighbors in Palo Alto who dislike him, however.
Stephen E Arnold, February 19, 2020
Why Society Emulates Sheep: Quick Look That Up on Your Mobile Device
September 24, 2019
On a recent visit to Eastern Europe, I learned that in several countries, there was a hierarchy among shepherds. The job of watching sheep fell to those lower in the shepherd hierarchy. The person who could train horses and dogs, knew the ins and outs of the less-than-brilliant sheep, and showed some moxie — that individual was at the top of the sheep heap.
I read “The Distribution of Users’ Computer Skills: Worse Than You Think” and thought about shepherds and sheep. The Nielsen Norman Group reported:
Across 33 rich countries, only 5% of the population has high computer-related abilities, and only a third of people can complete medium-complexity tasks.
The idea is that those without expertise are likely to be sheep-like. Now sheepness is not a bad thing. Sheep are docile and seem content to go along with whatever the shepherd hierarchy decides. Even when getting shorn, the sheep can be controlled, and they don’t seem to form a group and wait for the person with the shears to turn his back so a stampede can nuke the individual with the shears.
But in today’s world with its technical hierarchy, the Nielsen Norman Group data suggest that a hierarchy exists for technology.
This is useful information for those at the top of the technology skill heap.
Think about the shepherd hierarchy. Which is better? The person with expertise or the freshly-shorn sheep? What is the likelihood that those with limited technical expertise can accurately perceive what today’s digital shepherds are doing?
Herding, shearing, or anticipating grilled lamb shank?
Stephen E Arnold, September 24, 2019
Factualities for August 14, 2019
August 14, 2019
Kick back at the beach, grab a pen, and craft some numbers.
The number of the week is:
3. The rank of medical error as a cause of death in the US. Source: Science Alert
Other notable confections, examples of sleeping in Statistics 101, and the deliria from spreadsheet fever are:
40. Number of Windows drivers which contain privilege escalation vulnerabilities. Source: Neowin
60. The percent increase in fraud attacks on the food and beverage market. Source: Restaurant Technology
74. Percent of digital transactions handled by Amazon. Source: Search Engine Watch
90. Percentage of startups which fail. Source: Inventiva
200. The percentage increase in destructive malware attacks since January 2019. Source: Silicon Angle from IBM
$880. Amount Verizon charged a library for less than 500 megabytes of “roaming” data. Source: ArsTechnica
10,000. Number of medical records lost by the New York Fire Department. Source: Engadget
42,000. Number of fake soldiers receiving pay in Afghanistan. These fakes are called “ghost soldiers.” Source: Military.com
$1 million. Amount Apple will pay for a specific iPhone exploit. Source: Digital Trends
$1.05 million. Amount the US Department of Energy has allocated to a blockchain energy management program. Source: Coin Telegraph
$3 million. Amount Facebook has allegedly promised specific news publishers to participate in a Facebook “news” service. Source: Apple Insider
$8.6 million. Amount Cisco Systems paid as a fine because its security product did not secure. Source: TechDirt
$1.5 billion. Palantir’s government contracts. Source: BizJournals from Lantinx (Note: Paywalls used to protect this high value data about a privately held company doing business related to some low profile work.)
$2 billion. The amount North Korea allegedly stole from cyber crime victims in order to pay for weapons. Source: Computing
$4.25 billion. Amount Apple spent on research and development in the June 2019 quarter. Source: Apple Insider
$5.24 billion. Uber’s loss in a single 90 day period. In case you are wondering, that works out to more than $50 million per day. Source: MarketWatch
$16 billion. That’s the size of the blockchain solution market in 2023, a mere four years in the future. Evidence? Nope. Source: Crypto browser.io
20 billion. The number of data events Badoo handles each day. Yep, Badoo, not Baido. Evidence: Nah. Source: Infoq
Stephen E Arnold, August 12, 2019
DarkCyber for August 13, 2019, Now Available
August 13, 2019
DarkCyber for August 13, 2019, is now available at www.arnoldit.com/wordpress and on Vimeo at https://www.vimeo.com/353202530. The program is a production of Stephen E Arnold. It is the only weekly video news show focusing on the Dark Web, cybercrime, and lesser known Internet services.
DarkCyber (August 6, 2019) reviews one way for an organization compromised via ransomware to address the problem. The approach is free and can work in many cases. Europol, a number of national police agencies, and more than 20 commercial vendors have created NoMoreRansom.org. The site provides specific information and decryption methods for more than 100 widely used ransomware systems. Each of the decryption tools is available with a how-to user manual and links to the code required to decrypt the encrypted data. If a user cannot identify the specific malware used to attack an organization, the site includes a feature which can identify the specific ransomware used in an attack. For those unfamiliar with the mechanics of ransomware, the site includes a Frequently Asked Questions section. The information is clear, concise, and designed for a person with average computing expertise. Most system professionals will find the site intuitive and designed to allow quick access to the needed decryption tools.
Other stories in this week’s DarkCyber include:
Setting up a front company. DarkCyber reports that an online information service has published information explaining how to set up a front company in the US. Front companies or “fronts” are useful for tax evasion, money laundering, and fraud. Few states in the US require basic information about those setting up the front company. Data about directors of the company is not required in dozens of states. The procedure is simple, and in some states, the registration of the front company can be handled by a representative such as a law firm. Front companies are used to hide ownership of assets; for example, other companies.
The US government has published a report about the security lapses at Equifax, a credit checking service. The company lost millions of customers’ personally identifying information. DarkCyber provides a direct link to this informative government report. Bad actors, however, may find the information in the report useful in determining how to attack a financial services firm in the US.
The United States Postal Service cyber intelligence team is adding tactics. The USPS will make use of some of the techniques popular with cyber criminals. The mail services in Western Europe and the US have been used to deliver contraband and enable other illegal activities. The new approach will make it possible for investigators to join closed forums and discussion groups and adopt other behaviors in wide use by bad actors.
Researchers at the University of California-Berkeley have developed a method for enhancing solar cells. With the new technology, drones could greatly extend their flight time. The technique enhances the voltage generated by solar cells using sophisticated reflective coatings and new manufacturing procedures. Surveillance drones, for example, could remain aloft for weeks or months, not hours and days.
A new multipart series about Amazon’s policeware initiative begins on November 1, 2019. Programs are available on Vimeo.com and YouTube.com.
The last program in this series will be on August 27, 2019. DarkCyber will return in November 2019 with a new series focused on Amazon’s policeware.
Kenny Toth, August 13, 2019
Factualities for August 7, 2019
August 7, 2019
The summer doldrums have had no suppressing effect on those spreadsheet jockeys, wizards of pop up surveys, and latte charged predictors.
Here’s our fanciest number of the week. It comes from an outfit called The Next Web:
1 billion. The number of people who watch esports. Esports are video games. Do Amazon Twitch, Google YouTube, and Ninja’s new home report verifiable data? Yeah, sure. Source: TNW
There was a close race for craziest. We have recognized a runner up because we marveled at this figure:
13. Percentage of apps on the Google Play app store which have more than 1,000 installs. And 13 apps have more than 10 million users. (How many Android phones are there in the world? More than 2 billion, if NewZoo data are “sort of correct.”) Source: ZDNet
Here’s our “normal” rundown of factualities:
(20). The percentage decrease in malware. Source: Computing UK
12. Minutes per hour devoted to TV commercials on the AT&T owned Turner television network. Source: Los Angeles Times
$5. The amount Google paid people for permission to scan their faces. Source: The Verge
33. The percentage of businesses running Windows XP which was rolled out in 2001. Source: Slashdot
50. The percentage of companies which do not know if their security procedures are working. Source: IT Pro Portal
50. The percentage of the cloud market which Amazon has. Source: Marketwatch
50. The percentage of “workers” who waste half their time struggling with data. Source: ZDNet
82. Percentage of people who will connect to any free WiFi service available to them. Source: Slashdot
89. Percentage of Germans who think France is a trustworthy partner. Source: Reddit
100,000. Estimated staff IBM terminated. An unknown percentage of these professionals were too old to make IBM hip again. Source: Bloomberg
$8.6 million. Amount Cisco Systems had to pay for selling a security product which was not secure. Source: DarkReading
106 million. Number of people whose personal details were stolen in the Capital One breach of an Amazon AWS system. Source: Washington Post
250 million. Number of email accounts stolen by trickbot. Source: Forbes
1 billion. Number of people who watch esports (online games). Source: Next Web
$4.769 trillion. The net worth of 13,650 Harvard grads. Source: MarketWatch
Stephen E Arnold, August 7, 2019
Flawed Data In, Bias Out
August 3, 2019
Artificial intelligence is biased. AI algorithms are biased against non-white people as well as females. The reason is that the programmers are usually white males who overlook adding data that would make their AI algorithms diverse. Silicon Republic shares a brand new way that AI is biased, this time against poorer individuals: “Biased AI Reportedly Struggles To Identify Objects From Poorer Households.”
The biggest biased AI culprits are visual recognition algorithms built to identify people and objects. The main cause behind their biases is the lack of diverse data. The article points out how Facebook’s AI research lab discovered how biased data exists in internationally used visual object recognition systems. Microsoft Azure, Google Cloud Vision, Amazon Rekognition, Clarifai, and IBM Watson use algorithms that were tasked with identifying common household items from a global dataset. Information in the dataset included:
“The dataset covers 117 categories of different household items and documents the average monthly income of households from various countries across the world, ranging from $27 in Burundi to $10,098 in China. When the algorithms were shown the same product but from different parts of the world, the researchers found that there was a 10pc increase in chance they would fail to identify items from a household earning less than $50 versus one making more than $3,500 a month.”
This raises an interesting view on how the AI are programmed to identify objects. One example is identifying soap on different surfaces. In richer countries, soap was identified when it was in a soap pump dispenser on a tiled counter, but in poorer countries it was bar soap on a dirty surface. The AI was 20% more likely to identify objects in richer countries than poor ones. The gap widens for living rooms, with a 40% accuracy difference attributed to the lack of items in poorer homes. The programmers believe the bias arises because most of the data comes from wealthier countries and little information comes from poorer ones.
Is this another finding from Captain Obvious’ research lab? Is it possible to generate more representative datasets? Obviously not.
Whitney Grace, August 3, 2019