Building Data Sets

March 14, 2019

I read “Why Is It Legal to Collect Data on Kids, Let Alone Sell It?” The write up comes from a person who contributed in some way to the fine operation that Facebook embodies. Now that person is asking questions about building databases.

I noticed this quote, allegedly made by one of the Facebookers past:

“Why is it okay for credit card companies to sell financial records?” McNamee said at the South by Southwest conference in Austin over the weekend. “Why is it legal for cell companies to sell location data? Why is it legal for companies that make apps for health and wellness to sell or trade our data? Why is it legal for anybody on the web to transact in our web history? Why is it legal to collect data on kids under 18, much less sell it?”

You can read the rest of the article, but I want to offer some answers to these questions; to wit:

  1. Because we can collect and build databases. Users are too stupid to know what we are doing.
  2. Because there are no consequences. Regulators and lawyers are as clueless as the users.
  3. Because it is easy if you are smart like us. Anyone not working at a Google- or Facebook-type company or an adviser to one of these outfits is not going to be able to keep up with us.
  4. Because we want to do things which make people like us feel cool. Snort, snort, snort.
  5. Because we never understood the silliness related to any philosophical bedrock other than the mantra of “me, me, me” or what I call digital existentialism. One is what one does to attract attention from those just like “one.”
  6. Because it is cool to do the mea culpa thing in public.

Stephen E Arnold, March 14, 2019

OpenAI Smart Software: Too Good, Too Dangerous, Now a Company

March 14, 2019

In February 2019 we tracked articles like this:

“Elon Musk-Backed Software Can Churn Fake News Stories and Is ‘Too Dangerous to Release’.”

Allegedly an open-source project backed by the Tesla titan, called OpenAI, can write believable bologna in under 20 seconds with just a small seed from which to extrapolate. For example, we’re told the software generated a plausible, seven-paragraph news story after being fed these two (false) sentences: “A train carriage containing controlled nuclear materials was stolen in Cincinnati today. Its whereabouts are unknown.” See the article for a couple more examples; my favorite is the one about a long-lost herd of unicorns supposedly found in the Andes. On the plus side, the technology could be effectively used for tasks like creative writing, proofreading, translations, and summaries, if it is ever deemed safe to release, that is. Writer Tyler Durden describes some of the technical details:

“The software creation is trained in language modeling, which involves predicting the next word or piece of text based on knowledge of all previous words, the same way your auto-complete works on your phone, Gmail account or in Skype. … “As Gizmodo notes, the researchers used 40GB of data pulled from 8 million web pages to train the GPT-2 software. That’s ten times the amount of data they used for the first iteration of GPT. The dataset was pulled together by trolling through Reddit and selecting links to articles that had more than three upvotes. When the training process was complete, they found that the software could be fed a small amount of text and convincingly continue writing at length based on the prompt. It has trouble with ‘highly technical or esoteric types of content’ but when it comes to more conversational writing it generated ‘reasonable samples’ 50 percent of the time.”
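
For readers who want to see what “predicting the next word” means in practice, here is a minimal Python sketch. It is not OpenAI’s method: GPT-2 is a large neural network which takes all previous words into account, while this toy only counts which word most often follows a single word. The function names and the one-sentence corpus are invented for illustration.

```python
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count, for each word, which words follow it in the training text."""
    words = text.lower().split()
    followers = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        followers[current][nxt] += 1
    return followers


def predict_next(followers, word):
    """Return the most frequent follower of `word`, or None if the word is unseen."""
    counts = followers.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


def continue_text(followers, seed, length=5):
    """Greedily extend the seed one predicted word at a time."""
    out = seed.lower().split()
    if not out:
        return ""
    for _ in range(length):
        nxt = predict_next(followers, out[-1])
        if nxt is None:
            break  # no known follower, so stop writing
        out.append(nxt)
    return " ".join(out)


# Invented toy corpus, for illustration only (assumption, not OpenAI data).
corpus = "the train left the station and the train was late"
model = train_bigrams(corpus)
print(predict_next(model, "the"))
print(continue_text(model, "station", length=2))
```

Swap the counting table for a neural network trained on 40GB of text and the same loop, predict a word, append it, repeat, is roughly how a short seed turns into paragraphs.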

The nonprofit OpenAI was formed in 2015 by Musk and Sam Altman. We’re told they do hope to release the software eventually, but are seeking advice from the AI community on how to handle it.

What’s the most recent development? Venture Beat ran this story:

“OpenAI Launches New Company for Funding Safe Artificial General Intelligence.”

Innovation or marketing? We know what DarkCyber believes is the answer.

Cynthia Murrell, March 14, 2019

A Justification of Making Things Up?

March 13, 2019

I read “Gut Feelings Often Trump Real Data in Driving Business Decisions, Says Forrester.” The write up is interesting for several reasons. First, Forrester, like other mid tier consulting firms, generates reports about companies with more subjective than objective data. Examples of the objective data which would help include pricing data, information from customers about the product or service offered by a company, and concrete information about management compensation, financial performance, and similar data. The metaphor of a wave is compelling but data within would be helpful.

Second, the notion of “real data” underscores that talk about data is often just that—chatter, jargon, baloney. “Real data” are difficult to obtain. For example, a company provides a system which tracks and indexes content in the “hidden Web.” What’s the benchmark? How much data are tracked? How much are not indexable? Other questions like this can be answered but time and money are one hurdle. The real reason is that no one wants to make the effort to get data which can be analyzed and then evaluated in head to head comparisons. “Real data”, such as information spewed from financial analysis spreadsheets, is not examined with care. Dig in and the numbers can wobble. Did a scrutinized company actually cut expenses, or does the spreadsheet report that data in bucket A went away and data in bucket B became larger?

Third, the write up itself emphasizes that visualization, not grubby numbers, is where the action is. The future of analysis may be an anigif showing the harried decision maker what he or she needs to know. Who has time to work through data by hand, then compare those data to other information from other sources?

Quite a write up. Interesting implications. Subjective analysis washes away facts in my experience.

Stephen E Arnold, March 13, 2019

Factualities for March 13, 2019

March 13, 2019

Ah, accurate data. Some gems to ponder.

68 percent. The percentage of those 18 to 37 who are comfortable letting smart software access personal data to improve a “customer experience”. Source: Forbes

$17 million. Cost of voice search errors. Source: Forbes

44 percent. The percentage of votes cast online in Estonia’s most recent election. Source: ZDNet

2 million. Uyghurs not detained in China, just in the process of reeducation. Source: USNews

15 million. Number of Android devices with fraudulent advertising software installed. Source: Bitsight

$137 million. Amount of money not in the deceased crypto CEO’s laptop digital currency account. Source: Business Insider

700. Number of digital currency money laundering cases in 2018. Source: Finance Magnates

15 million. Number of Facebook users lost between 2017 and 2019. Source: Marketplace

18 percent. Percentage of teens using Facebook, an increase of 13 percent from a previous statement. Source: Techcrunch

6,000 gigabytes. Amount of data stolen from Citrix’s internal server. Source: The Register

Stephen E Arnold, March 13, 2019

US Government Slow In Adopting Big Data?

March 13, 2019

We are not sure if this is good news or bad news. But the United States may be slow in adopting new technology and policies. The IRS is one government branch that is leveraging big data with actual results. Mondaq shares the IRS’s data analysis in the article, “United States: States Follow The IRS In Joining The Big Data Revolution.”

The IRS has used data analysis since the 1960s to select tax returns to audit. As the technology advanced over the years, it has caught more errors and corrected them without any human involvement. The IRS created a new data analysis project dubbed the Nationally Coordinated Investigation Unit (NCIU). NCIU will focus on using external data and IRS data to select criminal investigations. The IRS also signed a $99 million deal with Palantir. With Palantir’s technology, the IRS will analyze and search terabytes of data from internal and external data sources on a single platform. The IRS is not only data mining for criminal activities. Big data is also being used for civil audits and to predict outcomes on cases referred to the IRS Office of Appeals.

State governments have followed the IRS and implemented their own tax data analysis projects. Many of them have already caught fraudulent returns and so far state governments have saved sizable chunks of cash. These data analysis implementations are great, but there are still limitations. We learned:

“Like the IRS, many state departments of revenue have faced significant budgetary pressure in recent years, as governments have tried to cut down the size and cost of government, and have turned to technology to fill the gap. As powerful as data analytics are, however, there is a limit to the extent they can replace human investigators. In 2016, for example, the Arizona Department of Revenue began to lay off dozens of auditors and tax collectors, citing budget cuts. The result was a catastrophe, as audit collections dropped nearly 47 percent—$82 million—in 2017. The IRS itself has taken a markedly different approach: IRS CI has recently announced a hiring blitz, in the course of which it will hire 250 special agents, a number of data scientists, and over 100 professional staff.”

Big data analysis will become a significant tool in the future for the IRS and local tax offices. Good or bad? Excellent question.

Whitney Grace, March 13, 2019

The Search Wars: When Open Starts to Close

March 12, 2019

Compass Search. The precursor. The result? Elasticsearch. No proprietary code. Free and open source. The world of enterprise search shifted.

As a result of Shay Banon’s efforts, an alternative to proprietary search and interesting financial maneuvers, an individual or organization could download code and set up a functional enterprise search system.

There are proprietary search systems available like Coveo. But most of the offerings are sort of open sourcey. It is a marketing ploy. The forward leaning companies do not use the word search to market their products because zippier functionality is what brings tire kickers and some buyers.

The landscape of search seems to be doing its Hawaii volcano act. No real eruption but shakes, hot gas, and cracks have begun to appear. The lava flows will come soon enough.

The path is clear to the intrepid developer.

The tip off is Amazon’s announcement that it now offers an open distro for Elasticsearch. Why is Amazon taking this step? The company explains:

Elasticsearch has become an essential technology for log analytics and search, fueled by the freedom open source provides to developers and organizations. Our goal is to ensure that open source innovation continues to thrive by providing a fully featured, 100% open source, community-driven distribution that makes it easy for everyone to use, collaborate, and contribute.

DarkCyber’s briefings about Amazon’s policeware initiative suggest that the online bookstore is adding another component to its robust intelligence system and services.

The move involves or will involve:

  • Entrepreneurs who will see Amazon as creating low friction for new products and services
  • Partners because implementing search can be a consulting gold mine
  • Users
  • Developers who will use an Amazon “off the shelf” solution
  • Competitors who may find the “other open source” Elasticsearch lagging behind the Amazon “house brand”.

The move is not much of a surprise. Amazon seeks to implement its version of IBM’s 1960s style vendor lock in. Open source is open source, isn’t it? Amazon now has a version of the popular Elasticsearch system which has utility in everything from commercial products to add ons which help make log files more mine-able. Plus search snaps into the DNA of the Amazon jungle of services, functions, and features. Where there is confusion, there are opportunities to make money.

Adding a house brand to its ecosystem is a basic tactic in the Amazon playbook. Those T shirts with the great price are Amazon’s, not the expensive stuff with a fancy brand name. T shirts and search? Who cares?

What’s the play mean for over extended proprietary search systems which may never generate a pay day for investors? A lot of explaining seems likely.

What’s the play mean for Elastic, the company which now operates the son of Compass Search? Some long off site meetings may be ahead and maybe some chats with legal eagles.

What’s the play mean for vendors using Amazon as back end plumbing for their enterprise or policeware services? A swap out of the Elasticsearch system for the Amazon version could be in the cards. Amazon Elasticsearch will probably deliver fewer headaches and lost weekends than using the Banon-Elastic version. Who wants headaches in an already complex, expensive implementation?

The Register quotes an evangelist from AWS as saying:

“We will continue to send our contributions and patches upstream to advance these projects.”

DarkCyber interprets this action and Amazon’s explanations from the perspective and context of a high school football coach:

“Front line, listen up, fork that QB. I want that guy put down. Hard. Let’s go.”

Amazon. The best defense is a good offense, right?

The coach shouts:

“Let’s hit those Sheep hard. Arrrgh.”

Stephen E Arnold, March 12, 2019

DarkCyber for March 12, 2019, Now Available

March 12, 2019

DarkCyber for March 12, 2019, is now available at www.arnoldit.com/wordpress and on Vimeo at https://www.vimeo.com/322579803.

The program is a production of Stephen E Arnold. It is the only weekly video news show focusing on the Dark Web, cyber crime, and lesser known Internet services.

This week’s story line up includes: Cellebrite devices for sale on eBay; emojis can activate app functions; and sources selling bulk personal data.

The feature this week discusses speech analysis. Reports have surfaced which reveal that some US correctional facilities are building databases of inmates’ voice prints. The news appeared coincident with rumors that the US National Security Agency was curtailing its voice collection activities. Companies like Securus Technologies provide tools and services related to prison telephone and unauthorized mobile device use. The Securus Investigator Pro has been available and in use for almost a decade. Voice print technology which is analogous to a digital fingerprint system makes it possible to identify those on a call. Inclusion of behavioral tags promises to make voice print systems more useful. With a tag for the caller’s emotional state, investigators can perform cross correlation and other analytic functions to obtain useful information related to a person of interest.

Links are provided to explanations of Amazon’s policeware system which can be used to perform these types of analytic operations.

The final story provides a snapshot of a 100 page field manual about online deception. Published by the US Army, this document is a comprehensive review of systems and methods for military use of deception in an online environment. Checklists and procedural diagrams make clear why social media operations are successful in civilian and military contexts. The DarkCyber video includes a link so viewers can download this unclassified publication.

Kenny Toth, March 12, 2019

MIT Watson Widget That Allegedly Detects Machine Generated Text

March 11, 2019

The venerable IBM and the even more venerable MIT have developed a widget that allegedly detects machine generated texts. You can feed AP stories into the demo system available at this link. To keep things academic, a bogus text will have a preponderance of green highlights. Human generated texts like academic research papers have some green but more yellow, orange, and purple words. A clue for natural language generation system developers to exploit? Just a thought.

Here’s the report for the text preceding this sentence. It seems that I wrote the sentence which is semi reassuring. I am on autopilot when dealing with smart software purporting to know when a Twitter, Facebook, Twitch or Discord post is generated by a human or a bot.

I deleted the digital heart because I don’t think a humanoid at either IBM or MIT generated the icon. The system does not comprehend emojis but presents one to a page visitor.

Watson, can you discern the true from the false? I have an IBM Watson ad somewhere. Perhaps I will feed its text into the system.

Stephen E Arnold, March 11, 2019

MSFT Harbors Crypto Mining in Third Party Apps

March 11, 2019

For those people not deep in the weeds, crypto currency mines are these shadowy pockets of servers that are out of our grasp, literally and figuratively. However, it was recently discovered this type of operation is a lot closer to home than most of us assume, and that’s a problem for security and intelligence professionals. We learned more from a recent TechRadar story, “Microsoft Store Apps Caught Illegally Mining Crypto Currency.”

According to the story:

“[U]nbeknownst to the users that download these apps, they secretly use the processors of the PC they are installed on to mine for crypto currency. According to Symantec, these apps come from three developers: DigiDream, 1clean and Findoo, and it is likely they were developed by the same person or group due to the malicious code Symantec found.”

A more meaningful review of apps in the Microsoft Store seems to be needed. Expensive? Yes. Likely to happen? Maybe.

Patrick Roland, March 11, 2019

Amazonia, March 11, 2019

March 11, 2019

Chug chug chug goes the Bezos bulldozer.

Pop Ups Go Flat

Amazon said that it will shutter 87 of its pop up stores. Source: CNBC

All Hail, Annapurna

Amazon’s AWS success is a result of an acquisition. Forbes makes the complex simple. “How an Acquisition Made by Amazon in 2016 Became the Company’s Secret Sauce.” The “sauce” is Infrastructure as a Service or IaaS. The idea is managing hardware via meta-software: knitting together diverse entities and custom chips so one can manage services more efficiently.

Going to War for JEDI

The JEDI deal has been chugging along for … too long. Amazon, according to Bloomberg, is becoming more aggressive in an old fashioned way. “Amazon Is Flooding DC with Money and Muscle: the Influence Game” reports that

Federal records show that Amazon.com Inc. lobbied more government entities than any other tech company in 2018 and sought to exert its influence over more issues than any of its tech peers except Alphabet Inc.’s Google. Last year, Amazon spent $14.2 million on lobbying, a record for the company, up from its previous high mark of $12.8 million in 2017. The $77 million that the nine tech companies in the charts below spent in 2018 to lobby Washington looks minuscule next to the $280 million spent by pharmaceutical and health-care products companies. Tech has, however, pulled ahead of the $64 million that commercial banks spent—and Amazon in particular has a cachet that allows it to punch above its weight at times. Of the nine, only the $21 million Google spent on lobbying beat Amazon’s total. Since 2012, Amazon has ramped up spending by more than 460 percent—much faster than its rivals.
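
DarkCyber ran the numbers in the quote through a few lines of Python to see how big the jump really is. The figures come straight from the Bloomberg passage above; the helper names are ours, and the output is back-of-the-envelope arithmetic, not an audit of the federal records.

```python
def percent_change(old, new):
    """Percentage change going from `old` to `new`."""
    return (new - old) / old * 100.0


def share(part, whole):
    """`part` expressed as a percentage of `whole`."""
    return part / whole * 100.0


# Figures quoted by Bloomberg, in millions of US dollars.
amazon_2017 = 12.8
amazon_2018 = 14.2
nine_tech_2018 = 77.0
pharma_2018 = 280.0

growth = percent_change(amazon_2017, amazon_2018)
amazon_share = share(amazon_2018, nine_tech_2018)
tech_vs_pharma = share(nine_tech_2018, pharma_2018)

print(f"Amazon 2017 to 2018 increase: {growth:.1f} percent")
print(f"Amazon share of the nine companies' spend: {amazon_share:.1f} percent")
print(f"Nine tech companies versus pharma: {tech_vs_pharma:.1f} percent")
```

The one year bump works out to roughly 11 percent, and Amazon accounts for a bit less than a fifth of what the nine companies spent.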

Surfacing Amazon Partners Is a Little Easier

Amazon appears to be taking baby steps to make its partner network more visible. For some reason, Amazon partners were not too eager to talk about their activities with the online bookstore. “Amazon Debuts AWS Digital CS Competency” includes a partial list of partners; for example, this list, edited for clarity:

Content Management: Acquia, Brightspot, Censhare, Cloudinary, Contentful, Crownpeak, Pagely, Solodev, WP Engine

Marketing Automation: Braze, HubSpot, Localytics, MoEngage, SendGrid, Sigstr, Vidyard

Digital Commerce: Magento, Skava

Customer 360: Adverity, Amplitude, Chartio, Content Square, InsideView, Looker, Manthan, Segment, Tealium, Tickr, Upshot.AI

Consulting Partners: Bulletproof, CloudHesive, G-AsiaPacific, Infosys, Megazone, Metal Toad, Mobiquity, Silver Lining, Vector IT Group.

Complete? No.

AWS Fees: Lyft Version

We noted this fact in CNBC’s headline: “Lyft Plans to Spend $300 Million on Amazon Web Services through 2021.” What’s this buy? The report included this quote from an Amazon professional:

Lyft “is leveraging the breadth and depth of AWS’s services, including database, serverless, machine learning, and analytics, to automate and enhance on-demand, multimodal transportation for riders and drive innovation in its autonomous vehicles business.”

DarkCyber understands that Uber also uses AWS.

AWS Fees: Controlling Costs

AWS makes cloud services easy. That is the viewpoint of some. However, there are nooks and crannies in which services hide or cower. Some of these are overlooked but continue to generate billing. “How to Reduce the Cost of Your Amazon EC2 Service” explains that one has to manage Amazon. The write up explains that significant charges can be accrued from EBS volumes, Elastic IP Addresses, and Snapshots. Who’s on top of these stealthy costs? A Microsoft MVP.
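
What might hunting for these stealthy costs look like? Below is a minimal Python sketch, not the method from the write up. It assumes you have already pulled lists of volumes, addresses, and snapshots from the EC2 API; the field names (`State`, `AssociationId`, `StartTime`) follow our understanding of the EC2 describe responses and should be checked against the AWS documentation. The sample records and the 90 day threshold are invented for illustration.

```python
from datetime import datetime, timedelta, timezone


def unattached_volumes(volumes):
    """EBS volumes in the 'available' state are not attached to any instance."""
    return [v["VolumeId"] for v in volumes if v.get("State") == "available"]


def unassociated_addresses(addresses):
    """Elastic IPs without an AssociationId are allocated but not in use."""
    return [a["PublicIp"] for a in addresses if "AssociationId" not in a]


def old_snapshots(snapshots, now, max_age_days=90):
    """Snapshots started more than `max_age_days` before `now`."""
    cutoff = now - timedelta(days=max_age_days)
    return [s["SnapshotId"] for s in snapshots if s["StartTime"] < cutoff]


# Hypothetical sample data shaped like EC2 describe responses (assumption).
utc = timezone.utc
now = datetime(2019, 3, 11, tzinfo=utc)
volumes = [
    {"VolumeId": "vol-aaa", "State": "in-use"},
    {"VolumeId": "vol-bbb", "State": "available"},
]
addresses = [
    {"PublicIp": "203.0.113.10", "AssociationId": "eipassoc-111"},
    {"PublicIp": "203.0.113.11"},
]
snapshots = [
    {"SnapshotId": "snap-old", "StartTime": datetime(2018, 6, 1, tzinfo=utc)},
    {"SnapshotId": "snap-new", "StartTime": datetime(2019, 3, 1, tzinfo=utc)},
]

print("Unattached volumes:", unattached_volumes(volumes))
print("Idle Elastic IPs:", unassociated_addresses(addresses))
print("Old snapshots:", old_snapshots(snapshots, now))
```

The sketch only lists candidates. Whether an idle volume or an old snapshot is waste or a backup someone needs is a judgment call a script cannot make.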

Comparing Cloud Services

Consultants charge big bucks for comparisons with some facts about cloud services. “Comparing Serverless Architecture Providers: AWS, Azure, Google, IBM, and Other FaaS Vendors” offers some information on an ad supported Web site featuring an ad for Microsoft Azure. The comparison is more of a two or three sentence statement of what each vendor asserts. There is a pricing comparison of FaaS offerings, but these may not fit most use cases.

Helpful? Somewhat. Readable? Nope.

N2WS does offer some cost optimization tools. More information appears in “N2WS Expands Cost Optimization for Amazon Web Services with Amazon EC2 Resource Scheduling.”

Penetration Testing Amazon Gets Easier

Is Amazon confident, or is Amazon quietly hoping its security gaps will be discovered and reported more quickly? We learned about the policy change in “Amazon Web Services Will No Longer Require Security Pros Running Penetration Tests on Their Cloud-Based Apps to Get Permission First.” As cloud services like Amazon and Azure gather more customers, their systems are likely to become increasingly attractive targets.

Amazon Emits Pollution

Not a surprise. CNBC reported “Jeff Bezos Is Finally Ending Secrecy over Amazon’s Role in Carbon Emissions.” DarkCyber noted this statement from the article:

Amazon recently announced its Shipment Zero goal under which the company aims to have 50 percent of all deliveries reach net zero carbon emissions by 2030.

Amazon has been less forthcoming than some other big shippers, according to the write up.

Ignored News? Bezos Considered Buying AMI

DarkCyber is not sure if this is accurate, but capturing the headline and the link seems appropriate. The story “Jeff Bezos Considered Buying the National Enquirer’s Parent Company After Photo Leak” appeared in Town and Country Magazine. Interesting.

Stephen E Arnold, March 11, 2019
