Whom Do We Trust? Facebook, Google, Others?
June 10, 2020
Internet giants Google and Facebook keep assuring us they respect our privacy, but can we trust them? Facebook, for example, just promised that the personal data it supplies to Covid-19 researchers, academics, and humanitarian agencies is stripped of any identifying information. Daijiworld reports, "Facebook Says Not Sharing Users' Data with Researchers, Academics." We're told:
“Over the past few months, public health researchers have used data sets released by Facebook to inform decisions around Covid-19 across Asia, Europe and North America.”
However, we are assured, Facebook’s Data for Good program protects users’ anonymity:
“The social networking giant said it has created a differential privacy framework that protects the privacy of individuals in aggregated datasets by ensuring no one can identify specific people in these datasets. In 2017, the company launched ‘Data for Good’ with the goal of empowering partners with data to help make progress on major social issues. … Facebook said the research partners enrolled in the ‘Data for Good’ programme only have access to aggregate information from Facebook and it does not share any individual information.”
Sounds great—but are we to simply take Facebook’s word for it? The company is not exactly known for its transparency.
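The material quoted above does not explain how Facebook's "differential privacy framework" works. What follows is therefore only a minimal sketch of the textbook idea behind differential privacy, the Laplace mechanism, applied to a single count. Whether Facebook's framework works this way is an assumption. The records, the `region` field, and the epsilon values are hypothetical. The point is that random noise is added before an aggregate is released, and that noise is calibrated to how much one person can change the answer.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two independent exponential draws, each with mean `scale`,
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon: float, rng: random.Random) -> float:
    """Release a count under epsilon-differential privacy (textbook Laplace mechanism)."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for r in records if predicate(r))
    # Adding or removing one person changes a count by at most 1 (sensitivity = 1).
    sensitivity = 1.0
    return true_count + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical records: 30 people in region A, 12 in region B.
    records = [{"region": "A"}] * 30 + [{"region": "B"}] * 12
    print(private_count(records, lambda r: r["region"] == "A", epsilon=0.5, rng=rng))
```

A smaller epsilon means more noise and stronger protection. Each additional query on the same data also uses up more of the privacy budget. An outsider cannot check either setting without documentation, which is the reason for asking whether we should simply take Facebook's word for it.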
Meanwhile, Inventiva reports, “Google Is Sued for Secretly Amassing a Vast Trove of User Web Data.” Despite that company’s pledge that users are in complete control of their data, a complaint recently filed in federal court in San Jose claims otherwise. The plaintiffs accuse Google of invasion of privacy and violations of federal wiretapping law. Writer Apurva Saxena reports:
“Google surreptitiously amasses billions of bits of information –every day — about internet users even if they opt out of sharing their information, three consumers alleged in a proposed class action lawsuit. … According to the suit, the company collects information, including IP addresses and browsing histories, whenever users visit web pages or use an app tied to common Google services, such as Google Analytics and Google Ad Manager. This makes ‘Google “one stop shopping” for any government, private, or criminal actor who wants to undermine individuals’ privacy, security, or freedom,’ the consumers allege.”
Companies like Facebook and Google (one might add in Amazon for good measure) have obtained a great deal of power and revenue through data collection, and we have only their promises that they are not violating user privacy. Who will hold them accountable? We shall see how this lawsuit pans out; similar suits have been summarily dismissed.
Cynthia Murrell, June 10, 2020
A Smarter Captioning AI
June 10, 2020
Algorithms have been used to caption images for some time now. However, the results tend to be rather generic and treat images as separate from accompanying text. Tech Xplore reports on “A System to Produce Context-Aware Captions for News Images.” Journalist Ingrid Fadelli writes:
“Alasdair Tran, Alexander Mathews and Lexing Xie at the Australian National University have been trying to develop new systems that can generate more sophisticated and descriptive image captions. In a paper recently pre-published on arXiv, they introduced an automatic captioning system for news images that takes the general context behind an image into account while generating new captions. The goal of their study was to enable the creation of captions that are more detailed and more closely resemble those written by humans. … The three researchers went on to develop and implement the first end-to-end system that can generate captions for news images. The main advantage of end-to-end models is their simplicity. This simplicity ultimately allows the researchers’ model to be linguistically rich and generate real-world knowledge such as the names of people and places.”
Instead of ignoring rare words, the model analyzes them. The team set aside the typical LSTM (long short-term memory) architecture in favor of the Transformer, a more recent architecture now used by language modeling and machine translation researchers. The shift allows for a richer vocabulary and sentence structure. The team also worked to improve their model's accuracy in identifying individuals in photos. This is particularly useful since, they found, most newspaper images feature people. The curious can check out a demo of the system, titled Transform and Tell.
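An LSTM reads text one step at a time. The core operation of the Transformer is scaled dot-product attention, in which every position scores every other position at once. The following is a toy, pure-Python sketch of that operation using made-up two-dimensional vectors. It is not code from Transform and Tell. A real model learns the query, key, and value projections and runs many attention heads in parallel.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    # Subtract the max before exponentiating for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries: List[Vector], keys: List[Vector],
              values: List[Vector]) -> Tuple[List[Vector], List[Vector]]:
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        w = softmax(scores)
        weights.append(w)
        # The output is the attention-weighted average of the value vectors.
        outputs.append([sum(wj * v[c] for wj, v in zip(w, values))
                        for c in range(len(values[0]))])
    return outputs, weights


if __name__ == "__main__":
    keys = [[1.0, 0.0], [0.0, 1.0]]      # hypothetical: two tokens in an article
    values = [[5.0, 0.0], [0.0, 5.0]]
    out, w = attention([[4.0, 0.0]], keys, values)
    print(w[0], out[0])
```

The query leans toward the first key, so most of the weight and most of the output come from the first value vector. The researchers suggest that the same kind of weights could help locate a good spot for an image within an article.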
Fadelli describes the researchers’ hopes for the system’s future:
“Tran, Mathews and Xie would also like to train their model to complete a slightly different task to that tackled in their recent work, namely, that of picking an image that could go well with an article from a large database, based on the article text. Their model’s attention mechanism could also allow it to identify the best place for the image within the text, which could ultimately speed up news publishing processes.”
The team also suggests their system could be used to generate longer passages or summarize related background information. We are curious to see how this technology evolves.
Cynthia Murrell, June 10, 2020
Is AI Really Working?
June 10, 2020
I am tempted to point out that Microsoft's artificially intelligent editorial system made an error. Confusing mixed-race singers and doing some fancy dance steps behind the scenes reminds me that the company cannot update Windows 10 reliably.
But the more interesting AI (smart software) story is IBM's announcement, to the US Congress no less, that it is no longer in the facial recognition business. Adios has arrived. "IBM Is Canceling Its Facial Recognition Programs" reports:
In a letter to Congress on Monday (June 8, 2020), IBM CEO Arvind Krishna said the company wants to work with lawmakers to advance justice and racial equity through police reform, educational opportunities and the responsible use of technology.
What the write up and similar ones did not mention is that IBM is the proud owner of IBM Watson. The question is, "Why is Watson, the billion-dollar poster child of smart software, not able to deliver facial recognition that passes muster?"
There's the virtue signaling value, of course. But doesn't the reliability of Watson and other IBM "smart" technology have to be viewed as falling short of the cornhole target?
The reference to bias is a convenient way to get around accuracy issues, training costs, and the inability of Watson-infused IBM solutions to deliver a nicely grilled burger when serving up fresh-from-the-can Spam.
Please recall that IBM Watson wrote a cookbook. No, there wasn't a recipe for "spam" in the tome. But there was tamarind, which seems to beg for baloney.
Stephen E Arnold, June 10, 2020
Mobile Security Is Possible, But It Is Work
June 10, 2020
Ads are a pain on desktop devices, but they are even more annoying on mobile devices. The worst type of ads are the ones where the X is hidden, making it impossible to close the ad. Mobile ads are only getting worse as mobile devices become SOP, and IT-Online shares more insight in "Mobile Adware: The Silent Plague With No Origin."
The article draws on research from Check Point's Cyber Security Report 2020, and the insights are alarming. According to the security report, 27% of companies experienced a security breach through a mobile device. Even worse, most companies do not prioritize mobile security, making mobile devices the most vulnerable area. Check Point's regional director stated:
“ ‘It only takes one compromised mobile device for cybercriminals to steal confidential information and access an organisation’s corporate network,’ explains Pankaj Bhula, Regional Director: Africa at Check Point. ‘More and more mobile threats are created each day, with higher levels of sophistication and larger success rates. Mobile adware, a form of malware designed to display unwanted advertisements on a user’s screen, is utilised by cybercriminals to execute sixth-generation cyberattacks.’”
Adware is like a plague because it can be downloaded onto a phone secretly and collect a user's personal information, from location to banking details. Adware is designed to sneak onto a phone, and deleting it is harder than finding the X on an annoying ad. Adware sneaks onto mobile devices through applications, usually ones obtained from the app store specific to the device.
It is smart to avoid downloading third-party apps from unverified companies, especially ones promoted in ads or with low download counts. Do not trust anything without researching it first.
Whitney Grace, June 10, 2020
Conferences: A Juicy Source of Intelligence?
June 9, 2020
Conferences are interesting. These face-to-face experiences are becoming virtual. After decades of operating off the radar for most attendees, the content of conferences is “suddenly” getting some love.
Decades ago, I worked at a company which produced a database called CPI or Conference Papers Index. That database was sold to another firm, and I am not sure if the original product persists 39 years later. Only a handful of customers accessed this product compared to our flagship databases ABI/INFORM and Business Dateline.
"Potential Organized Fraud in ACM/IEEE Computer Architecture Conferences" caused me to think about who (the people) and which companies (the outfits hiring the people) used CPI. Almost 40 years ago, the users were government agencies from countries which now provide high technology to the US and other nation states, plus companies that were either based in the US with non-US owners or outfits with names difficult to connect to a particular discipline. Did I care 40 years ago? Nope. We wanted to sell that database for several reasons:
- Conference organizers were among the most disorganized and distracted outfits we tapped for information; for example, copies of talks, abstracts, and names and affiliations of speakers. Getting that information took much effort and many "let's have lunch" and "yes, we will send that information tomorrow" promises. Sorry, lesson learned. Conferences 40 years ago were a different content animal: fiefdoms, egocentric owners who wanted "total control," trade associations eager to serve their members and preserve their mostly concierge-type jobs, and similar flora and fauna. Much remains unchanged even as conferences undergo Rona-ization.
- Customers were not plentiful. The customers CPI attracted wanted more: more images, more full text, more presentation foils. Delivering more cost money, and it was not clear that investing the money to get "more" information would produce a profitable operation. My hunch is that indexes of conferences, including the wonky listings one can find on the Internet, are essentially useless. Why? Sponsors are not indexed consistently. Names of speakers are not included as searchable content. The presentations, if one is lucky, become YouTube videos, usually delivered with both lousy audio and video. Sigh. Conferences are today a black hole of content. Going into the virtual conference business just makes the black hole deeper and weirder than before Rona.
- Conference organization is a remarkable exercise in rejecting, begging, and scrambling. Each conference wants stars for the keynotes. Each conference wants new talent to deliver hot information. Each conference desperately needs sponsors; that is, people to pay for snacks (yuck), liquor (much loved by attendees except for virtual presentations unless a company FedExes bottles to an attendee-with-a-budget’s home), and lunch (now a weird buffet brown bag thing which hopefully will disappear from real and virtual events completely). The organizer wants to put on a stellar show but lacks the expertise, money, and organizational talent to pull off most events.
What’s the fix?
If the information in the write up is accurate, it seems — note the hedge word “seems” — that individuals, companies, and countries are doing everything in their power to get their hands on the same information that people told us to include in our Conference Papers Index.
Valuable data include:
- Abstracts of proposed talks, some submitted a year before an event in certain event cycles
- The actual draft presentations: Text, PDFs of the visuals, author’s biography, and author details
- Names of speakers, addresses, email, etc.
The blog post suggests that some fancy dancing has been underway in the rarefied world of big tech at the ACM and IEEE computer architecture conferences.
The article is worth reading.
However, there is context for what amounts to intelligence exploitation.
The question is, “Will most conference organizers care?” Another question, “Will most conference organizers be sufficiently adept at addressing the alleged problem?”
DarkCyber has a tentative answer: "Nope. The sucking of conference data is an institutionalized behavior for many 'experts,' their employers, some government entities, and even employees of conference companies."
Net net: Squeeze the fruit for informational juice.
Stephen E Arnold, June 9, 2020
Rounding Error? Close Enough for Horseshoes in Michigan
June 9, 2020
Ah, Michigan. River Rouge, the bridge to Canada, and fresh, sparkling water. These cheerful thoughts diminished when I read “Government’s Use of Algorithm Serves Up False Fraud Charges.”
The write up describes a smart system. The smart system was not as smart as some expected. The article states:
While the agency still hasn't publicly released details about the algorithm, class action lawsuits allege that the system searched unemployment datasets and used flawed assumptions to flag people for fraud, such as deferring to an employer who said an employee had quit — and was thus ineligible for benefits — when they were really laid off.
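The agency has not published how MiDAS works. What follows is a hypothetical illustration, not the system's code. It takes one flawed assumption named in the lawsuits, deferring to the employer's account of why a job ended, and shows how that assumption turns an ordinary disagreement into a fraud flag. It then shows how routing disagreements to a person changes the outcome. The `Claim` fields and the reason strings are invented for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Claim:
    claimant_id: str
    claimant_reason: str            # what the worker reported, e.g. "laid_off"
    employer_reason: Optional[str]  # what the employer reported, or None if no reply


def naive_fraud_flag(claim: Claim) -> bool:
    # Flawed assumption (hypothetical rule): the employer's account is always
    # treated as true, so any disagreement becomes an automatic fraud finding.
    return claim.employer_reason is not None and claim.employer_reason != claim.claimant_reason


def safer_triage(claim: Claim) -> str:
    # Disagreement is treated as a reason to investigate, not as proof of fraud.
    if claim.employer_reason is None or claim.employer_reason == claim.claimant_reason:
        return "no_flag"
    return "human_review"


if __name__ == "__main__":
    claim = Claim("1001", claimant_reason="laid_off", employer_reason="quit")
    print(naive_fraud_flag(claim), safer_triage(claim))
```

The second function only helps if someone is available to do the review. That is worth keeping in mind, because the agency laid off hundreds of fraud investigators when the new system arrived.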
Where did the system originate? A D student in the University of Michigan’s Introduction to Algorithms class? No. The article reports:
The state’s unemployment agency hired three private companies to develop MiDAS, as well as additional software. The new system was intended to replace one that was 30 years old and to consolidate data and functions that were previously spread over several platforms, according to the agency’s 2013 self-nomination for an award with the National Association of State Chief Information Officers. The contract to build the system was for more than $47 million. At the same time as the update, the agency also laid off hundreds of employees who had previously investigated fraud claims.
Cathy O’Neil may want to update her 2016 “Weapons of Math Destruction.” Michigan has produced some casualties. What other little algorithmic surprises are yet to be discovered? Will online learning generate professionals who sidestep these types of mathiness? Sure.
Stephen E Arnold, June 9, 2020
What Type of Content Is Plentiful? Ever Been to a Cow Barn?
June 9, 2020
DarkCyber enjoyed “Most Tech Content Is Bullshit.” The write up explains:
I saw developers taking other people’s solutions for granted. Not thinking twice about the approach, not bothering about analyzing it.
The article lists four common answers developers gave when asked about the behavior:
- It was in some article.
- I copy-pasted it from X.
- I was doing it in my previous project.
- Someone told me so.
Unfortunately, these four points cover the bases for odd, wrong, and off-base information.
The logical error is “appeal to authority.” Information issued from someone perceived as authoritative may be accepted readily. Today some people believe just about anything available online.
Why is this human failing taking place? The write up provides four reasons:
- We are lazy.
- We don’t have time.
- It’s comfortable.
- We don’t believe in ourselves.
The problem is unlikely to be resolved. There are some minor concerns: Money, the pandemic, civil disturbances, and international tensions. Plus, I want to make clear that search engine optimization and a desire to be perceived as an expert are darned significant factors.
Net net: There's little likelihood of rapid change. Social distance, wait for a bailout check, and be confident in your children's future. No big deal. And that online fix for sluggish DNS lookups? Not to worry.
Stephen E Arnold, June 9, 2020
Brave Browsing Sniping
June 9, 2020
DarkCyber noted “The Brave Web Browser Is Hijacking Links, and Inserting Affiliate Codes.” The write up explains that the Brave browser is behaving in a way that is unseemly. The point is that a free Web browser is pitching privacy and at the same time performs some underhanded actions to generate revenue. The explanation of the digital sleight of hand is interesting and illustrates that those “gee, stuff is free” online users assume one thing and may find something different. The write up includes this list and suggestions for accessing Web sites in a non-Brave way. We quote:
There is no good reason to use Brave. Use Chromium — the open-source core of Chrome — with the uBlock Origin ad blocker. [Chromium download, uBO Chrome]
Or use Firefox with uBlock Origin — ‘cos it blocks more ads than the Chromium framework will let anything block. [uBO Firefox]
Or, if you want a really cleaned-out Chrome — ungoogled-chromium, with uBlock Origin. [GitHub]
If you’re on Android, use Firefox with uBlock Origin, or the new Firefox Focus browser. [Mozilla]
Brave is a browser for suckers who want to keep getting played — so it’s a 100% crypto enterprise. As Eich’s pinned tweet still tells us: “Who gets paid? If not you, then you’re ‘product’.” [Twitter]
DarkCyber is not sure if this comment is as ominous as it sounded to one DarkCyber researcher:
Brendan Eich has responded to this post by claiming “David lies about us all the time.” I have pointed out that this is a prima facie defamatory statement, and asked him to detail these claimed lies. [Twitter, archive]
Mr. Eich is the alleged perpetrator of the Brave misdeeds. Online marketing and advertising are fascinating disciplines.
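The write up describes links being rewritten to carry affiliate codes. The following sketch shows, in general terms, how little code such a rewrite requires. It also shows how a user could compare the address typed with the address actually loaded. The host name `exchange.example`, the `ref` parameter, and the code value are hypothetical. This is not Brave's implementation.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical host -> (query parameter, affiliate code) mapping.
AFFILIATE_PARAMS = {"exchange.example": ("ref", "PARTNER123")}


def insert_affiliate(url: str) -> str:
    """Append an affiliate parameter if the host is on the list and none is present."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host not in AFFILIATE_PARAMS:
        return url
    key, code = AFFILIATE_PARAMS[host]
    query = parse_qsl(parts.query, keep_blank_values=True)
    if any(k == key for k, _ in query):
        return url  # leave an existing referral parameter alone
    query.append((key, code))
    return urlunsplit(parts._replace(query=urlencode(query)))


def added_params(typed: str, loaded: str) -> dict:
    """Report query parameters present in the loaded URL but not in the typed one."""
    before = dict(parse_qsl(urlsplit(typed).query, keep_blank_values=True))
    after = dict(parse_qsl(urlsplit(loaded).query, keep_blank_values=True))
    return {k: v for k, v in after.items() if before.get(k) != v}


if __name__ == "__main__":
    typed = "https://exchange.example/trade?pair=btc"
    loaded = insert_affiliate(typed)
    print(loaded, added_params(typed, loaded))
```

The check in `added_params` works only if the user looks at the final address bar. Most "gee, stuff is free" users will not.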
Stephen E Arnold, June 8, 2020
DarkCyber for June 9, 2020, Is Now Available: AI and Music Composition
June 9, 2020
The DarkCyber for June 9, 2020, presents a critical look at music generated by artificial intelligence. The focus is the award-winning song in the Eurovision AI 2020 competition. The interview discusses the characteristics of AI-generated music, its impact on music directors, how professional musicians deal with machine-created music, and the implications of non-human music. The program is a critique of the state of the art in smart software. Instead of focusing on often over-hyped startups and large companies making increasingly exaggerated claims, the program uses the Australian song and the two musicians to make clear that AI is a work in progress. You can view the video at https://vimeo.com/427227666.
Kenny Toth, June 9, 2020
Google Docs: More Than an Enabler of Student Messages via Its Comments Function
June 8, 2020
Teachers are often befuddled by their students. Google Docs makes it possible for students to use the comments feature to exchange interesting messages. When an adult approaches, a click makes the content disappear. Great for students, not so good for some teachers and parents.
“How Google Docs Became the Social Media of the Resistance” explains what may be another facet of the Googlers’ code Byzantium. Google is ubiquitous and most people don’t think too much or too long about the implications of collaborative tools for word processing and Excel-like software.
Boring, right?
The write up explains:
… Google Docs has emerged as a way to share everything from lists of books on racism to templates for letters to family members and representatives to lists of funds and resources that are accepting donations. Shared Google Docs that anyone can view and anyone can edit, anonymously, have become a valuable tool for grassroots organizing during both the coronavirus pandemic and the police brutality protests sweeping the US. It’s not the first time. In fact, activists and campaigners have been using the word processing software for years as a more efficient and accessible protest tool than either Facebook or Twitter.
Let's assume that the article is accurate. Will Google take some action to control what its "users" do with its Microsoft Office clone? WWGD (what will Google do) is a new company-watching sport. DarkCyber believes that it may become more interesting than bird watching.
Is information stored in Google Docs accessible for monitoring? Maybe Google is not responsible for what its users do? Hmmmm.
Stephen E Arnold, June 8, 2020