Quite an Allegation: Google, Brute Force!
November 23, 2019
I have been around very bright, quite proficient technologists for more than 50 years. Out of college, I worked at Halliburton’s nuclear unit. Real wizards were running around: John Gray, Julian Steyn, Paul Goldstein, and others. I did a stint at Booz, Allen. Maybe not in the nuclear physicist quartile but pretty bright. I did some other work at places with lots of bright people too. No Ewok hunts. No weapons.
I can’t recall any threatening behavior. The only intimidation I experienced was a result of my boss and his brain. Dr. Sommers could pose questions which made me vow to spend more time reading and studying.
But Dr. William P. Sommers was soft spoken. He asked good questions and nudged me forward in my career. Maybe questions are brutal? Yikes! Questions.
I cruised along fat, dumb, and happy. But now I learn that a wizard Disneyland is into brutal behavior; specifically, “Google Workers Accuse Tech Giant of Using Brute Force Intimidation Tactics to Silence Employees.”
The headline seems a bit energetic, tinged with some inner motivation to stimulate bad vibes toward everyone’s favorite online advertising vendor. Here’s what the write up says: a protest was planned for November 22, 2019. DarkCyber learned:
The protest, which will see full-time employees, temporary workers, vendors and contractors come together outside the Google building, comes following the company’s decision to fire an employee for allegedly leaking workers’ personal information to the media and place two other workers on leave over their alleged access of documents unrelated to their roles.
Now this quote:
“The company is claiming that it is for looking up calendars and documents, which is something we all do but we know that it is punishment for speaking up for themselves and others,” workers organizing at Google who requested that their names be withheld, said in a statement shared with Newsweek on Thursday.
Poor Google. Newsweek asserts:
Google has faced widespread scrutiny over its internal culture, particularly after thousands of workers around the world staged walkouts over claims of sexual harassment, racism and gender inequality within the company.
Yeah, brutality, intimidation? Okay, employees take money to do work for their employer. Employers create rules so work can be completed. Employees who run afoul of the rules can and will face some feedback.
But the headline evokes an image of a Google executive dressed like one of Darth Vader’s minions firing plasma weapons at hapless Ewoks.
As the professionals on sports programs say, “Come on, man.”
Stephen E Arnold, November 23, 2019
The Cost of Indifference and the Value of Data Governance
November 23, 2019
The DarkCyber team suggests a peek at “Unsecured Server Exposes 4 Billion Records, 1.2 Billion People.” The write up states:
The data itself comes from the data aggregator and enrichment companies People Data Labs (PDL) and OxyData.Io and contains basic personal information, such as names, home and mobile phone numbers and email addresses and what may be information scraped from LinkedIn, Facebook and other social media sources.
The write up points out that the data losses included:
- Over 1.5 billion unique people, including close to 260 million in the U.S.
- Over 1 billion personal email addresses. Work email for 70%+ of decision makers in the US, UK, and Canada.
- Over 420 million LinkedIn URLs.
- Over 1 billion Facebook URLs and IDs.
- 400 million plus phone numbers with more than 200 million U.S.-based valid cell phone numbers.
The hosting provider may have been Amazon AWS. The software system was Elasticsearch. The individuals were those who set up the system.
Without reploughing a somewhat rocky field, one might suggest that default settings for cloud services, software, and passwords need a rethink. One might want to think about the staff assigned to the job of setting up the system. One might want to think about the sources of the information the company named in the article tapped. In short, one could think about quite a few points of failure.
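One small way to rethink defaults is to check a node’s settings before it goes live. The sketch below is hypothetical and is not drawn from the write up: it assumes an Elasticsearch node’s elasticsearch.yml has already been parsed into a flat dictionary, and it looks at only two real settings, `network.host` and `xpack.security.enabled`. Actual defaults vary by Elasticsearch version and license, so the check errs on the side of warning.

```python
"""Flag an Elasticsearch node that is exposed without security enabled.

A minimal sketch, assuming settings were already parsed from
elasticsearch.yml into a flat dict (setting name -> value). Only
network.host and xpack.security.enabled are examined; a list-valued
network.host is not handled here.
"""

# Values that keep the node bound to the local machine only.
LOOPBACK_VALUES = {"_local_", "localhost", "127.0.0.1", "::1", "[::1]"}


def audit_settings(settings: dict) -> list[str]:
    """Return human-readable warnings for a parsed settings dict."""
    warnings = []

    # network.host defaults to a loopback address; anything else is reachable
    # from other machines.
    host = str(settings.get("network.host", "_local_"))
    exposed = host not in LOOPBACK_VALUES

    # A missing value is treated as "not enabled" so the audit errs toward
    # warning; str() also normalizes a YAML boolean True to "True".
    security_on = str(settings.get("xpack.security.enabled", "false")).lower() == "true"

    if exposed and not security_on:
        warnings.append(
            f"network.host={host} is reachable beyond loopback "
            "but xpack.security.enabled is not true"
        )
    return warnings


# One warning: the host is exposed and security is not switched on.
print(audit_settings({"network.host": "0.0.0.0"}))
```

A check this crude will not stop a determined misconfiguration, but it turns “who set up the system?” into a step someone has to sign off on.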
Another approach might be to raise the question of responsibility. I suppose this is a type of governance, a term which refers to figuring out what’s to be done and how to complete tasks without creating this all-too-common situation of whizzy systems’ functioning as convenience stores for those who want data.
A few observations:
First, the individuals involved in setting up this system were not, it seems, managed particularly well. That’s a problem when managers don’t know what to require of their contractors and employees to secure online services. These “individuals” work at different organizations. Thus, coordination and checks are difficult. But the alternative? Loss of data.
Second, the developers of the software understand the security implications of certain user actions. The fix is to purchase additional security. Security is not baked in. Security is an option. That approach may generate revenue, but the quest for revenue seems to have a downside. Loss of data.
Third, the operators of the cloud system continue to follow the “just a platform” approach to business. The idea is that the functionality of a cloud system makes it easy to deploy an application. In a hurry? No problem. Use the basics. Want something special? That takes time, and when done in a careless or partial way, loss of data.
It seems that “loss of data” may be preventable but loss of data is part of the standard operating procedure in the present managerial environment.
How does the problem become lessened? Governance. Will companies and individuals step up and go through the difficult task of figuring out what and how before losing data?
Unlikely. Painful lessons like the one revealed in the source article slip like rainwater off the windshield of a car speeding down the information superhighway.
Dangerous? Sure. Will drivers slow down? Nope. The explanation after an accident was, “I don’t know. Car just skidded.” There’s insurance for automobile accidents. For cloud data wrecks, no consequences of a meaningful nature. Just blog posts. These are effective?
I will be talking about how the tendrils of the Dark Web and security lapses may create a greater interest in data governance. Exciting? Only if you were one of the billion or so whose personally identifiable information was put online in a less than secure way. I will be at the DG Vision Conference in Washington, DC, early in December 2019.
Stephen E Arnold, November 23, 2019
AWS: Working Like a Mainframe? Maybe
November 22, 2019
Is it possible to mainframe-ize AWS? Batch operations and similar useful methods? If you think the answer is, “No,” you may want to read “How NextRoll Leverages AWS Batch for Daily Business Operations.” The DarkCyber team has been laboring in the AWS data marketplace, and we were delighted to read this post from NextRoll.
We noted this passage:
The freedom of stack is one of them. As long as you can package your requirements into a Dockerfile, and build that docker image, you can deploy it to Batch. This enables us to use a wide variety of technologies in our stack. We have jobs that are written using C/C++, Python, Rust, GoLang, Haskell, Java and other programming languages. Freedom in the way that data is being processed is another reason.
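The Dockerfile-to-Batch workflow NextRoll describes can be sketched from the submitting side with boto3’s Batch client, whose `submit_job` call takes `jobName`, `jobQueue`, `jobDefinition`, and optional `containerOverrides`. This is a minimal sketch, not NextRoll’s code: it assumes a job queue and a job definition pointing at the Docker image already exist, and the queue name, definition name, and `etl.py` script are hypothetical. The client is passed in so the logic can be tried without an AWS account.

```python
"""Submit one day's containerized job to AWS Batch (a sketch).

Assumes the job queue and job definition already exist; the default
names and the etl.py command below are hypothetical placeholders.
"""
from datetime import date


def submit_daily_job(batch_client, run_date: date,
                     queue: str = "daily-queue",           # hypothetical queue
                     definition: str = "etl-job:1") -> str:  # hypothetical definition
    """Submit the run for run_date and return the job ID Batch assigns."""
    day = run_date.isoformat()
    response = batch_client.submit_job(
        jobName=f"daily-etl-{day}",
        jobQueue=queue,
        jobDefinition=definition,
        # Override the container command so one image serves every day's run.
        containerOverrides={"command": ["python", "etl.py", "--date", day]},
    )
    return response["jobId"]
```

With credentials configured, the call would be `submit_daily_job(boto3.client("batch"), date.today())`. The language inside the container does not matter to Batch, which is the “freedom of stack” point in the quote.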
The write up invokes the open source goodness some associate with AWS.
Net net: AWS delivers some useful functions. Mainframers may disagree, but batch is batch.
Stephen E Arnold, November 22, 2019
Light Bulb On. Consumers Not Thrilled with What They See
November 22, 2019
We cannot say this comes as much of a surprise. Citing a recent Pew survey, Fortune reports, “Americans to Companies: We Don’t Trust You With Our Personal Data.” Any confidence the public had that companies can safeguard personal data has been eroded by news of data breach after data breach. On top of that, many consumers have noticed how eerily accurate targeted ads have become due to unannounced data sharing by the likes of Facebook and Google. Writer Danielle Abril tells us:
“The Pew survey, based on responses from 4,272 U.S. adults between June 3 and June 17, found that most Americans doubt that companies will publicly admit to and take responsibility for mismanaging their data. Seventy-nine percent of respondents said they have little to no confidence that businesses will do the right thing. And even though many continue to exchange their data for services and products, 81% of people feel the risks now outweigh the benefits of the exchange. The sentiments appear to have intensified over time, as 70% of those surveyed said they feel that their personal information is less secure than it was five years ago. … The survey found that 83% of respondents frequently or occasionally see ads that appear to be based on profiles companies created using their personal data. And of that group, 61% say that the ads are somewhat or very good at accurately reflecting their interests. But that doesn’t mean that people actually want companies using their data this way. More than eight in 10 people are concerned about the information social media companies and advertisers know about them.”
Pointing to user agreements, companies insist they are playing by the rules. They are not wrong, but they are quite aware of how opaque those agreements are to most consumers. Over 80 percent of respondents say they are asked each month to agree to one privacy policy or another, and a third say they do so weekly. However, most only skim the policies, at best. Of those who do read them through, more than 85 percent only partially understand them. While it is true that, legally, it is on the consumers to understand what they are signing, tech companies could certainly make it easier. They won’t, though, as long as they can profit from users’ confusion.
Cynthia Murrell, November 22, 2019
Google Discovers Radio
November 22, 2019
We noted The Hindu BusinessLine story “As News Consumption Patterns Change, Google Launches Audio Streaming of News.” Now radio is broadcast. The Google approach uses the Internet. But audio is audio. And audio evokes images of radio technology. The family may not huddle around the glowing vacuum tubes, but the experience is similar. Yes, we know that radio brought some families together in a shared auditory experience.
No gray T-shirts and khakis for this family unit of radio listeners.
The Google approach surfs on the trend for isolation. Hey, islands of existence are good, right?
The write up points out:
As people take to podcasts and digital audio content, Google has launched audio news broadcasts. All a user needs to do is to ask Google Assistant to ‘Play the news’, and it begins streaming news. Google has tied up with global media houses such as BBC to provide content to users.
DarkCyber wonders if any other high tech companies have stumbled upon this innovation. Yes, yes, Amazon Alexa can do radio. I think I saw the neighbor’s kid asking his iPhone to play something, maybe Cambridge University’s Naked Scientists. Disappointed kid? I don’t know.
The write up quotes a Googler as stating:
“The audio web is like the text web of the 1990s. At Google, we saw an opportunity to help move digital audio forward by focusing on audio news.”
Does anyone hear Eureka arising over the sounds of employee protests?
Stephen E Arnold, November 22, 2019
Google Management Method Called Interrogation by CNBC
November 21, 2019
DarkCyber, happily ensconced in rural Kentucky, does not know if the information in “Google Employees Protested the Interrogation of Two Colleagues by Company’s Investigations Team, Memo Says” is accurate.
But the headline alone is quite interesting. The news story states:
The memo said Berland’s [a Google employee objecting to certain Google projects] questioning lasted 2.5 hours and was conducted by Google’s global investigations team, which allegedly told the employees that they were “not decision-makers” but that they would relay the workers’ message “up the chain.”
The memo seems to have been written by Googlers unhappy with the interaction of some Google professionals and two employees who had voiced concerns about the company’s work for the US government.
Please, read the original CNBC story.
DarkCyber jotted down several observations while two members of my team and I tried to figure out who was on first:
1. The meeting was described as an interrogation. That in itself is an interesting word. Maybe interrogation is the wrong word, but it is clear that the meeting was not the equivalent of what my mother called a “kaffeeklatsch.”
2. The meeting involved an investigations team. DarkCyber did not know that Google had such a team, but presumably CNBC is confident that the ever popular online advertising company does. Does the investigations team have a uniform or maybe a badge with the cheerful Google logo?
3. Two and a half hours. My goodness. That’s longer than many feature films. The length of time brings some images to the forefront of the DarkCyber team’s hive mind. Here’s one that one of the programmer analysts called up from his Apple iPhone. (The objectivity of the iPhone search function must be considered, if not investigated.)
A cheerful setting for an informal chat or not?
Net net: If the CNBC story is accurate, Google’s management methods are quite interesting. Not even the high school science club to which I belonged in 1958 considered interrogation of non science club members. Grilling a science club member was simply not on our club members’ radar.
How times have changed!
Stephen E Arnold, November 21, 2019
Amazon Rolls Out an Online Data Market
November 21, 2019
Here is some interesting news from Amazon Web Services. Inside Big Data reports, “Introducing AWS Data Exchange.” Third-party data has become integral to the processes of research, analytics, and machine-learning models for businesses and academic institutions, but the process of tapping into that data has been cumbersome and time-consuming. Organizations have had to establish and manage relationships with disparate data providers, and those providers have had to invest fortunes in marketing and technology to reach and serve customers. The AWS Data Exchange brings all these processes together on Amazon’s cloud platform. This will bring welcome simplicity to data providers and consumers alike while positioning AWS as an indispensable resource.
Oracle has a data marketplace too.
Through the AWS Marketplace, customers will be able to subscribe to data from popular providers including Reuters (news data), Change Healthcare (healthcare transactions and claims), Dun & Bradstreet (global business records), Foursquare (location data), TruFactor (anonymized consumer data), and Pitney Bowes (demographics). Clearly, these data vendors represent a diverse assortment of data types to meet a wide range of needs. The API also integrates into certain third-party analytics platforms, like Databricks and Deloitte’s ConvergeHEALTH Miner. See the write-up for more on each of these resources. We also learn:
“Prior to subscribing to a data product, customers can review the price and terms of use that providers make publicly available. Once subscribed, customers can use the AWS Data Exchange API or console to ingest data they subscribe to directly into Amazon Simple Storage Service (Amazon S3) to use across the broadest and deepest portfolio of cloud services in AWS. Each time a provider publishes a new revision of their data, AWS Data Exchange notifies all subscribers via an Amazon CloudWatch Event, allowing them to automatically consume new revisions in their data lakes, applications, analytics, and machine-learning models running on AWS. Data subscription costs are consolidated in customers’ existing AWS invoice. Additionally, customers can ask their data providers to deliver their existing subscriptions to them using AWS Data Exchange at no cost. This enables customers to use AWS Data Exchange to consume all their third-party data in the AWS cloud using a single API. AWS Data Exchange also makes it easy for qualified data providers to securely package, license, and deliver data products to millions of AWS customers worldwide. AWS knows that customers care deeply about privacy and data security. AWS Data Exchange prohibits sharing sensitive personal data (e.g. personal health information) as well as any personal data that is not already lawfully and publicly available.”
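The “automatically consume new revisions” step can be sketched as an event handler that exports each newly published revision to S3 with the boto3 DataExchange client’s `create_job` (type `EXPORT_REVISIONS_TO_S3`) and `start_job` calls. Treat this as a hedged sketch: the event shape (data set ID in `resources`, revision IDs in `detail["RevisionIds"]`) is an assumption to verify against current AWS documentation, the bucket name is hypothetical, and the client is injected so the logic can be exercised offline.

```python
"""Export newly published AWS Data Exchange revisions to S3 (a sketch).

Assumed event shape (verify against AWS docs):
  {"resources": ["<data set id>"], "detail": {"RevisionIds": ["<revision id>", ...]}}
The default bucket name is a hypothetical placeholder.
"""


def handle_revision_event(event: dict, dx_client,
                          bucket: str = "my-data-lake") -> list[str]:
    """Create and start one S3 export job per revision; return the job IDs."""
    data_set_id = event["resources"][0]
    job_ids = []
    for revision_id in event["detail"]["RevisionIds"]:
        job = dx_client.create_job(
            Type="EXPORT_REVISIONS_TO_S3",
            Details={
                "ExportRevisionsToS3": {
                    "DataSetId": data_set_id,
                    "RevisionDestinations": [
                        {"RevisionId": revision_id, "Bucket": bucket}
                    ],
                }
            },
        )
        # create_job only registers the export; start_job actually runs it.
        dx_client.start_job(JobId=job["Id"])
        job_ids.append(job["Id"])
    return job_ids
```

In practice this function would sit behind an event rule in a Lambda, called with `boto3.client("dataexchange")`; keeping the client as a parameter is a design choice for testability, not an AWS requirement.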
The exchange also lets data providers publish their data on their terms, including private offers and custom terms for certain customers. They have the ability to review use cases and manage compliance needs, and will receive daily, weekly, and monthly reports on subscription activity. Perhaps most welcome to some, AWS will manage billing, collection, and secure data delivery. This development will make a big difference for many organizations; Amazon must be pretty pleased with itself.
Cynthia Murrell, November 21, 2019
Info Extraction: Improving?
November 21, 2019
Information extraction (IE) is key to machine learning and artificial intelligence (AI), especially for natural language processing (NLP). The problem with information extraction is that, while information is pulled from datasets, it often lacks context, so the system fails to categorize and rationalize the data properly. Good Men Project shares some hopeful news for IE in the article, “Measuring Without Labels: A Different Approach To Information Extraction.”
Current IE relies on an AI programmed with a specific schema that states what information needs to be extracted. A retail Web site like Amazon probably uses an IE AI programmed to extract product names, UPCs, and prices, while a travel Web site like Kayak uses an IE AI to find prices, airlines, dates, and hotel names. For law enforcement officials, it is particularly difficult to design schemas for human trafficking, because labeled datasets on that subject do not exist. Also, traditional approaches such as crowdsourcing the labels do not work because of the sensitivity of the material.
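The schema idea can be illustrated with a toy extractor in which the schema names each field and how to find it. This is a hypothetical sketch only: production IE systems rely on trained models rather than hand-written regular expressions, and the travel fields and patterns below are made up.

```python
"""Schema-driven extraction: the schema decides which fields are pulled out.

A toy sketch; the schema, field names, and patterns are hypothetical.
"""
import re

TRAVEL_SCHEMA = {
    "price": r"\$\d+(?:\.\d{2})?",
    "date": r"\d{4}-\d{2}-\d{2}",
    "airline": r"\b(?:Delta|United|Lufthansa)\b",
}


def extract(text: str, schema: dict[str, str]) -> dict[str, list[str]]:
    """Return every match for every field the schema defines."""
    return {field: re.findall(pattern, text) for field, pattern in schema.items()}


listing = "Delta flight on 2019-12-03, fares from $249.00"
print(extract(listing, TRAVEL_SCHEMA))
# {'price': ['$249.00'], 'date': ['2019-12-03'], 'airline': ['Delta']}
```

Anything the schema does not name is ignored, which is why a new domain needs a new schema, and why checking the output still requires labeled examples.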
To sidestep the need for a labeled human trafficking dataset, the researchers turned to the dependencies between extractions. The idea works this way:
“Consider the network illustrated in the figure above. In this kind of network, called attribute extraction network (AEN), we model each document as a node. An edge exists between two nodes if their underlying documents share an extraction (in this case, names). For example, documents D1 and D2 are connected by an edge because they share the extraction ‘Mayank.’ Note that constructing the AEN only requires the output of an IE, not a gold standard set of labels. Our primary hypothesis in the article was that, by measuring network-theoretic properties (like the degree distribution, connectivity etc.) of the AEN, correlations would emerge between these properties and IE performance metrics like precision and recall, which require a sufficiently large gold standard set of IE labels to compute. The intuition is that IE noise is not random noise, and that the non-random nature of IE noise will show up in the network metrics. Why is IE noise non-random? We believe that it is due to ambiguity in the real world over some terms, but not others.”
Using three attributes (names, phone numbers, and locations), the researchers found such correlations. The dependencies among an AI system’s extractions therefore offer a new way to evaluate it. Network science usually studies concrete interactions, but the AEN is an abstract network of IE interactions. The non-random pattern of the mistakes, in fact, allows law enforcement to gauge how well an IE AI performs without a labeled dataset.
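The network construction in the quoted passage is simple enough to sketch. Each document becomes a node, and two documents are linked when their extractions overlap; the degree distribution is one of the network properties the authors compare against precision and recall. This sketch only builds the network and measures degrees; the document IDs and extracted names are made up, and the correlation step is the researchers’ hypothesis, not something the code demonstrates.

```python
"""Build an attribute extraction network (AEN) from IE output alone (a sketch)."""
from collections import Counter
from itertools import combinations


def build_aen(extractions: dict[str, set[str]]) -> dict[str, set[str]]:
    """Map each document ID to the documents it shares an extraction with."""
    graph = {doc: set() for doc in extractions}
    for a, b in combinations(extractions, 2):
        if extractions[a] & extractions[b]:  # at least one shared extracted value
            graph[a].add(b)
            graph[b].add(a)
    return graph


def degree_distribution(graph: dict[str, set[str]]) -> Counter:
    """Count how many nodes have each degree."""
    return Counter(len(neighbors) for neighbors in graph.values())


# Hypothetical name extractions for four documents.
names = {
    "D1": {"Mayank"},
    "D2": {"Mayank", "Anna"},
    "D3": {"Anna"},
    "D4": {"Lee"},
}
aen = build_aen(names)
print(sorted(aen["D2"]))                             # ['D1', 'D3']
print(dict(sorted(degree_distribution(aen).items())))  # {0: 1, 1: 2, 2: 1}
```

Note that no gold-standard labels appear anywhere: the network is built purely from what the extractor emitted, which is the point of the approach.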
Whitney Grace, November 21, 2019
Microsoft Search: Still Playing an Old Eight Track Cassette?
November 20, 2019
How many times has DarkCyber heard about Microsoft’s improved search? Once, twice? Nope, dozens upon dozens. Think of the yip yap about Fast Search & Transfer, Colloquis and its natural language processing, Powerset and its semantic search system, Semantic Machines for natural voice functions, and the home brew solutions from hither and yon in the Microsoft research and development empire. There’s Outlook search and Bing search and probably a version of LinkedIn’s open source search kicking around too.
But that’s irrelevant in today’s “who cares about the past?” datasphere. DarkCyber noted “Here’s How Microsoft Is Looking to Make Search Smarter and More Natural.” What is smart search? An abrogation of user intentions? What is more natural? Boolean logic, field codes, date and time metadata, and similar artifacts of a long lost era seem okay for the DarkCyber team.
The write up explains in its own surrealistic way:
Microsoft’s ultimate goal with Microsoft Search is to provide answers not just to simple queries, but also more personalized, complex ones, such as “Can I bring my pet to work?”. The Microsoft Graph API, semantic knowledge understanding from Bing, machine-reading comprehension and the Office 365 storage and services substrate all are playing a role in bringing this kind of search to Microsoft’s apps.
Yeah, okay. But enterprise SharePoint users still complain that current content cannot be located. The current tools are blind to versions of content residing on departmental servers or parked in a cloud account owned by the legal department. And what about the prices just quoted by an enterprise sales professional? Sorry. You are out of luck, but Microsoft is… trying.
Now grab this peek into the future of Microsoft search:
Turing in Bing already has helped Microsoft to understand semantics via searching by concept instead of keyword. Natural-language processing also has helped with understanding query intent, she noted. Semantic understanding means users don’t have to expect exact word matches. (When searching for Coke, matches with “canned soda,” also could be part of the set of results generated, for example.) The Turing researchers are employing machine reading, as well, to help with contextual search/results.
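The Coke and “canned soda” example boils down to matching on concepts rather than on exact words. The toy below shows only that difference; it is not how Microsoft’s Turing models work (they learn semantics from data), and the hand-written concept table is hypothetical.

```python
"""Keyword match versus concept match (a toy sketch, not Microsoft's method)."""

CONCEPTS = {  # hypothetical term -> concept table
    "coke": "soft_drink",
    "soda": "soft_drink",
    "pepsi": "soft_drink",
}


def terms(text: str) -> set[str]:
    """Lowercase words with simple punctuation stripped."""
    return {word.strip(".,?!").lower() for word in text.split()}


def to_concepts(words: set[str]) -> set[str]:
    """Replace each word with its concept when the table knows one."""
    return {CONCEPTS.get(w, w) for w in words}


def keyword_match(query: str, doc: str) -> bool:
    return bool(terms(query) & terms(doc))


def concept_match(query: str, doc: str) -> bool:
    return bool(to_concepts(terms(query)) & to_concepts(terms(doc)))


doc = "Canned soda is stocked in the break room."
print(keyword_match("Coke", doc))  # False
print(concept_match("Coke", doc))  # True
```

The hard part, of course, is the table. Whoever builds it (or trains the model that replaces it) decides what counts as “the same thing,” which is where user intentions can get lost.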
The chaotic and often misfiring Microsoft search technologies do one thing well: Generate revenue for the legions of certified Microsoft partners.
Users? Yeah, Microsoft may help you too. In the meantime, the lawyers will manage their own contract drafts and eDiscovery materials. The engineers will stick with the tools baked into AutoCAD-type systems? The marketers will do what marketers in many companies do? Stuff data on USBs, into the Google cloud, or copy the files to a shared folder on a former employee’s desktop. Yes, it happens.
Microsoft and search. Getting better. Here’s a snippet about Powerset (CNET, 2008):
Much of what Powerset has enabled with its technology is a superior user experience for searching. Powerset’s Wikipedia search, which surfaces concepts, meanings, and relationships (like subject, verbs, and objects in a language), is the very small tip of the iceberg.
Time for a new eight track tape?
Stephen E Arnold, November 20, 2019
Turkey Surveillance: No, Not the Bird Watching Context
November 20, 2019
FinFisher, a company that makes surveillance software and sells it to assorted governments, is fighting back against Netzpolitik, a website working to hold such companies accountable. Bloomberg declares, “Clash Over Surveillance Software Turns Personal in Germany.” Netzpolitik and several advocacy groups filed a criminal complaint against FinFisher, alleging it had sold its spyware to Turkey without the required German federal license. Such complaints are not new, but this one named names within FinFisher as responsible parties. An investigation has been opened by Munich prosecutors.
Not only does FinFisher deny supplying Turkey with spyware, it also claims Netzpolitik is unjustly prejudicing the investigation. It issued a cease-and-desist letter demanding an article about the Turkey allegations be taken down. Though the site’s owner insists the reporting is accurate, he removed the article to avoid the legal fight and a potential injunction. Reporter Ryan Gallagher writes:
“Netzpolitik filed the complaint against FinFisher in collaboration with Reporters Without Borders Germany, the Society for Civil Rights and the European Center for Constitutional and Human Rights. It alleges that covert operators of FinFisher’s technology set up a fake Turkish-language opposition website and Twitter accounts that were used to lure government critics into clicking on a malicious link. It isn’t clear who created the website and social media profiles. FinFisher says it ‘partners exclusively with Law Enforcement and Intelligence Agencies,’ according to its website.
“People who clicked the link — sent through the fake Twitter accounts to supporters of the opposition Republican People’s Party — were prompted to download an Android application that was in fact surveillance software, which would monitor their calls, text messages, photos, and location data, according to a technical report published by the digital rights group Access Now. Source code found on the website used to target the Turkish activists was ‘practically identical’ to the source code of FinSpy, surveillance software developed by FinFisher, the complaint alleges.”
FinFisher is no stranger to scrutiny. News articles have been written, advocacy group reports have been issued, and a WikiLeaks data release has been lobbed. Just recently, Reuters linked the company’s tech to an Uzbekistan agency’s effort to spy on activists and journalists. FinFisher claims it no longer trucks with governments outside the EU unless they are an “EU-001” designated country. (That list includes the likes of Australia, Canada, Japan, New Zealand, Norway, Switzerland, and the U.S.) Though other countries may retain old versions of the technology, Access Now’s chief technologist notes that licensing restrictions and required updates would make them difficult or impossible to use without FinFisher’s support.
Cynthia Murrell, November 20, 2019