Open Source Software: A Digital Snail Darter

November 26, 2019

Years ago I worked on a project. The focus was the snail darter, a little fish. A commercial initiative intruded on the habitat of the creature. The bureaucratic process chugged forward. I lost track of the snail darter. Probably there are a few of the creatures around, but their future was impinged upon by the need and desire to convert streams and “undeveloped” land into a wonderland of EPA compliant effluent, asphalt, and industrial facilities.


Wikipedia’s image shows a paper clip next to a snail darter. This reminds me of my mobile phone next to an Amazon data center.

I thought about the snail darter when I read “Dining Preferences of the Cloud and Open Source: Who Eats Who?” Not surprisingly, the write up does not mention the snail darter or its obstruction of “progress.” But the article describes how open source has found its digital manifestations threatened by large commercial firms.

There is a description of Amazon’s method, which has, to some degree, disrupted the happiness of Elastic (developers and maintainers of Elasticsearch) and MongoDB (a DBaaS service). No, I don’t know what DBaaS is. It may be a way to make community supported software tough in a cloud-eat-cloud datasphere.

We noted this passage:

Most of the current debate focuses on Amazon and a few open source companies they have startled, like gazelles on the savannah, specifically Elastic and MongoDB. All while chronically prefacing their messaging with “customers tell us…”, AWS is offering its own services that are built on (Elastic) or are compatible with (MongoDB) popular open source projects, thereby competing with the relatively successful commercial open source companies associated with those projects. In the case of Elastic, AWS has generously created a new open source distribution of the features that Elastic had held back as proprietary software. The prey have responded with both pluckily defiant blog posts and a frenzy of license engineering to impede AWS’ ability to use their ostensibly open source software. Others, like Cockroach Labs and Redis Labs, have followed with their own new licenses. This has renewed an existential and philosophical debate about open source: is it about free speech or does it also include the right to a free moat for key project contributors? In the end, the high priests of open source do not seem to be endorsing the “open except for people who compete with us” approach.

The main point is that the business model is in place, working, and becoming more important to many developers and organizations.

But Amazon is not unique. Google and Microsoft are following the lead of AWS. Sheep do not appear to be at risk when they tag along, content to generate revenue by playing the me-too game.

The write up concludes on an upbeat note; specifically:

Open source is here to stay as a development model. It is hard to imagine any kind of infrastructure or developer software that isn’t open source. But there is work to do on the accompanying business strategy. The next great open source endeavor may be to make multi-cloud a reality, at least for key workloads. But the new associated business models will have to embrace services as the primary delivery model and make a serious commitment to a level of integration that is the hallmark of cloud services.

Net net: There are still some snail darters.

Stephen E Arnold, November 26, 2019

The Cost of Indifference and the Value of Data Governance

November 23, 2019

The DarkCyber team suggests a peek at “Unsecured Server Exposes 4 Billion Records, 1.2 Billion People.” The write up states:

The data itself comes from the data aggregator and enrichment companies People Data Labs (PDL) and OxyData.Io and contains basic personal information, such as names, home and mobile phone numbers and email addresses and what may be information scraped from LinkedIn, Facebook and other social media sources.

The write up points out that the data losses included:

  • Over 1.5 billion unique people, including close to 260 million in the U.S.
  • Over 1 billion personal email addresses. Work email for 70%+ of decision makers in the US, UK, and Canada.
  • Over 420 million LinkedIn URLs.
  • Over 1 billion Facebook URLs and IDs.
  • 400 million plus phone numbers with more than 200 million U.S.-based valid cell phone numbers.

The hosting provider may have been Amazon AWS. The software system was Elasticsearch. The individuals were those who set up the system.

Without reploughing a somewhat rocky field, one might suggest that default settings for cloud services, software, and passwords need a rethink. One might want to think about the staff assigned to the job of setting up the system. One might want to think about the sources of information tapped by the companies named in the article. In short, one could think about quite a few points of failure.
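To make “default settings” concrete, here is a minimal Python sketch of the kind of check an operator might run before exposing an Elasticsearch node. It assumes the node’s elasticsearch.yml has already been parsed into a flat dictionary, and it assumes (as with many pre-8.0 default installs) that security is off unless `xpack.security.enabled` is explicitly set to true. The function `audit_es_settings` and the loopback list are illustrative assumptions, not part of Elasticsearch; `xpack.security.enabled` and `network.host` are real Elasticsearch setting names.

```python
# Hypothetical pre-deployment audit of Elasticsearch settings.
# Assumption: settings come from a parsed elasticsearch.yml as a flat dict.
LOOPBACK = {"localhost", "127.0.0.1", "::1", "_local_"}


def audit_es_settings(settings: dict) -> list[str]:
    """Return findings for two risky settings; an empty list means none found."""
    findings = []
    # Assumption: security counts as off unless explicitly enabled
    # (the pre-8.0 style default for many installs).
    security = str(settings.get("xpack.security.enabled", "false")).lower()
    if security != "true":
        findings.append("xpack.security.enabled is not true: no authentication required")
    # network.host may be one value or a list; a non-loopback address
    # makes the node potentially reachable from other machines.
    hosts = settings.get("network.host", "_local_")
    if isinstance(hosts, str):
        hosts = [hosts]
    exposed = [h for h in hosts if h not in LOOPBACK]
    if exposed:
        findings.append(f"network.host binds non-loopback address(es): {exposed}")
    return findings


print(audit_es_settings({"network.host": "0.0.0.0"}))
print(audit_es_settings({"xpack.security.enabled": True, "network.host": "localhost"}))  # []
```

A node bound to 0.0.0.0 with security left at the assumed default produces two findings; the same node with security enabled and a loopback binding produces none. A real audit would cover far more (passwords, TLS, firewall rules), which is rather the point of the paragraph above.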

Another approach might be to raise the question of responsibility. I suppose this is a type of governance, a term which refers to figuring out what’s to be done and how to complete tasks without creating this all-too-common situation of whizzy systems functioning as convenience stores for those who want data.

A few observations:

First, the individuals involved in setting up this system were not, it seems, managed particularly well. That’s a problem when managers don’t know what to require of their contractors and employees in order to secure online services. These “individuals” work at different organizations. Thus, coordination and checks are difficult. But the alternative? Loss of data.

Second, the developers of the software understand the security implications of certain user actions. The fix is to purchase additional security. Security is not baked in. Security is an option. That approach may generate revenue, but the quest for revenue seems to have a downside. Loss of data.

Third, the operators of the cloud system continue to follow the “just a platform” approach to business. The idea is that the functionality of a cloud system makes it easy to deploy an application. In a hurry? No problem. Use the basics. Want something special? That takes time, and when done in a careless or partial way, loss of data.

It seems that “loss of data” may be preventable but loss of data is part of the standard operating procedure in the present managerial environment.

How can the problem be lessened? Governance. Will companies and individuals step up and go through the difficult task of figuring out what and how before losing data?

Unlikely. Painful lessons like the one revealed in the source article slip like rain water off the windshield of a car speeding down the information superhighway.

Dangerous? Sure. Will drivers slow down? Nope. The explanation after an accident was, “I don’t know. Car just skidded.” There’s insurance for automobile accidents. For cloud data wrecks, no consequences of a meaningful nature. Just blog posts. These are effective?

I will be talking about how the tendrils of the Dark Web and security lapses may create a greater interest in data governance. Exciting? Only if you were one of the billion or so whose personally identifiable information was put online in a less than secure way. I will be at the DG Vision Conference in Washington, DC, early in December 2019.

Stephen E Arnold, November 23, 2019

Amazon Rolls Out an Online Data Market

November 21, 2019

Here is some interesting news from Amazon Web Services. Inside Big Data reports, “Introducing AWS Data Exchange.” Third-party data has become integral to the processes of research, analytics, and machine-learning models for businesses and academic institutions, but the process of tapping into that data has been cumbersome and time-consuming. Organizations have had to establish and manage relationships with disparate data providers, and those providers have had to invest fortunes in marketing and technology to reach and serve customers. The AWS Data Exchange brings all these processes together on Amazon’s cloud platform. This will bring welcome simplicity to data providers and consumers alike while positioning AWS as an indispensable resource.

Oracle has a data marketplace too.

Through the AWS Marketplace, customers will be able to subscribe to popular data providers including Reuters (news data), Change Healthcare (healthcare transactions and claims), Dun & Bradstreet (global business records), Foursquare (location data), TruFactor (anonymized consumer data), and Pitney Bowes (demographics). Clearly, these data vendors represent a diverse assortment of data types to meet a wide range of needs. The API also integrates into certain third-party analytics platforms, like Databricks and Deloitte’s ConvergeHEALTH Miner. See the write-up for more on each of these resources. We also learn:

“Prior to subscribing to a data product, customers can review the price and terms of use that providers make publicly available. Once subscribed, customers can use the AWS Data Exchange API or console to ingest data they subscribe to directly into Amazon Simple Storage Service (Amazon S3) to use across the broadest and deepest portfolio of cloud services in AWS. Each time a provider publishes a new revision of their data, AWS Data Exchange notifies all subscribers via an Amazon CloudWatch Event, allowing them to automatically consume new revisions in their data lakes, applications, analytics, and machine-learning models running on AWS. Data subscription costs are consolidated in customers’ existing AWS invoice. Additionally, customers can ask their data providers to deliver their existing subscriptions to them using AWS Data Exchange at no cost. This enables customers to use AWS Data Exchange to consume all their third-party data in the AWS cloud using a single API. AWS Data Exchange also makes it easy for qualified data providers to securely package, license, and deliver data products to millions of AWS customers worldwide. AWS knows that customers care deeply about privacy and data security. AWS Data Exchange prohibits sharing sensitive personal data (e.g. personal health information) as well as any personal data that is not already lawfully and publicly available.”
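The publish-and-notify flow described in the quotation can be modeled in a few lines. The sketch below is a toy, in-memory Python model, not the AWS Data Exchange API: the `DataSet` and `Subscriber` classes, the callback signature, and the dictionary standing in for an S3 bucket are all hypothetical. It shows only the pattern: a provider publishes a revision, every subscriber is notified, and each subscriber copies the revision into its own storage.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical callback signature: (data set name, revision number, payload) -> None
Notify = Callable[[str, int, dict], None]


@dataclass
class DataSet:
    """Provider side: holds revisions and a list of subscriber callbacks."""
    name: str
    revisions: list[dict] = field(default_factory=list)
    subscribers: list[Notify] = field(default_factory=list)

    def subscribe(self, callback: Notify) -> None:
        self.subscribers.append(callback)

    def publish_revision(self, payload: dict) -> int:
        """Store a new revision, then notify every subscriber."""
        self.revisions.append(payload)
        revision = len(self.revisions)
        for notify in self.subscribers:
            notify(self.name, revision, payload)
        return revision


class Subscriber:
    """Consumer side: a dict stands in for the subscriber's S3 bucket."""

    def __init__(self) -> None:
        self.bucket: dict[str, dict] = {}

    def on_new_revision(self, dataset: str, revision: int, payload: dict) -> None:
        self.bucket[f"{dataset}/revision-{revision}"] = payload


foot_traffic = DataSet("foot-traffic")
analyst, modeler = Subscriber(), Subscriber()
foot_traffic.subscribe(analyst.on_new_revision)
foot_traffic.subscribe(modeler.on_new_revision)
foot_traffic.publish_revision({"store": "A", "visits": 120})
print(sorted(analyst.bucket))  # ['foot-traffic/revision-1']
```

In the service as quoted, the notification is an Amazon CloudWatch Event and the delivery target is Amazon S3; the toy model only mirrors the shape of that flow.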

The exchange also lets data providers publish their data on their terms, including private offers and custom terms for certain customers. They have the ability to review use cases and manage compliance needs, and will receive daily, weekly, and monthly reports on subscription activity. Perhaps most welcome to some, AWS will manage billing, collection, and secure data delivery. This development will make a big difference for many organizations; Amazon must be pretty pleased with itself.

Cynthia Murrell, November 21, 2019

Microsoft Search: Still Playing an Old Eight Track Cassette?

November 20, 2019

How many times has DarkCyber heard about Microsoft’s improved search? Once, twice? Nope, dozens upon dozens. There was the yip yap about Fast Search & Transfer, Colloquis and its natural language processing, Powerset and its semantic search system, Semantic Machines for natural voice functions, and the home brew solutions from hither and yon in the Microsoft research and development empire. There’s Outlook search and Bing search and probably a version of LinkedIn’s open source search kicking around too.

But that’s irrelevant in today’s “who cares about the past?” datasphere. DarkCyber noted “Here’s How Microsoft Is Looking to Make Search Smarter and More Natural.” What is smart search? An abrogation of user intentions? What is more natural? Boolean logic, field codes, date and time metadata, and similar artifacts of a long lost era seem okay for the DarkCyber team.
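For readers who share that preference, here is a toy Python sketch of what Boolean logic, field codes, and date metadata look like in practice: a query that requires a term in one field, excludes a term in another field, and filters on a modification date. The record layout and the `boolean_search` function are illustrative assumptions, not the query syntax of Microsoft Search or any other product.

```python
from datetime import date

# Hypothetical records with named fields and date metadata.
DOCS = [
    {"id": 1, "title": "Loan contract review",
     "body": "contract terms for the loan", "modified": date(2019, 10, 2)},
    {"id": 2, "title": "Contract draft",
     "body": "draft contract language", "modified": date(2019, 11, 1)},
    {"id": 3, "title": "Pricing memo",
     "body": "contract pricing quoted by sales", "modified": date(2018, 5, 20)},
]


def has_term(doc: dict, field: str, term: str) -> bool:
    """Field-restricted match on whole, lowercase, whitespace-separated words."""
    return term.lower() in doc[field].lower().split()


def boolean_search(docs, must=(), must_not=(), since=None) -> list[int]:
    """AND every (field, term) in must, NOT every (field, term) in must_not,
    and keep only docs modified on or after `since`."""
    hits = []
    for doc in docs:
        if not all(has_term(doc, f, t) for f, t in must):
            continue
        if any(has_term(doc, f, t) for f, t in must_not):
            continue
        if since is not None and doc["modified"] < since:
            continue
        hits.append(doc["id"])
    return hits


# body:contract AND NOT title:draft AND modified >= 2019-01-01
print(boolean_search(DOCS, must=[("body", "contract")],
                     must_not=[("title", "draft")], since=date(2019, 1, 1)))  # [1]
```

The user sees exactly why each document came back: it satisfied every stated condition, nothing more.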

The write up explains in its own surrealistic way:

Microsoft’s ultimate goal with Microsoft Search is to provide answers not just to simple queries, but also more personalized, complex ones, such as “Can I bring my pet to work?”. The Microsoft Graph API, semantic knowledge understanding from Bing, machine-reading comprehension and the Office 365 storage and services substrate all are playing a role in bringing this kind of search to Microsoft’s apps.

Yeah, okay. But enterprise SharePoint users still complain that current content cannot be located. The current tools are blind to versions of content residing on departmental servers or parked in a cloud account owned by the legal department. And what about the prices just quoted by an enterprise sales professional? Sorry. You are out of luck, but Microsoft is… trying.

Now grab this peek into the future of Microsoft search:

Turing in Bing already has helped Microsoft to understand semantics via searching by concept instead of keyword. Natural-language processing also has helped with understanding query intent, she noted. Semantic understanding means users don’t have to expect exact word matches. (When searching for Coke, matches with “canned soda,” also could be part of the set of results generated, for example.) The Turing researchers are employing machine reading, as well, to help with contextual search/results.
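The “Coke” and “canned soda” example can be illustrated with a deliberately crude Python sketch. Systems like the one described learn concept relationships from data; here a hand-built concept table (a hypothetical stand-in) expands the query before matching, so a document that never says “Coke” can still be retrieved. The function names and sample documents are illustrative.

```python
# Hand-built concept table: a hypothetical stand-in for learned semantics.
CONCEPTS: dict[str, set[str]] = {
    "coke": {"coke", "cola", "canned soda", "soft drink"},
}


def keyword_search(query: str, docs: list[str]) -> list[str]:
    """Return docs containing any literal query word (substring match)."""
    words = query.lower().split()
    return [d for d in docs if any(w in d.lower() for w in words)]


def expand_query(query: str) -> set[str]:
    """Replace each query word with its concept set, if one is known."""
    terms: set[str] = set()
    for word in query.lower().split():
        terms |= CONCEPTS.get(word, {word})
    return terms


def concept_search(query: str, docs: list[str]) -> list[str]:
    """Return docs containing any term from the expanded query."""
    terms = expand_query(query)
    return [d for d in docs if any(t in d.lower() for t in terms)]


docs = [
    "Vending machine restocked with canned soda",
    "Coke Zero price list",
    "Quarterly travel policy",
]
print(keyword_search("coke", docs))  # ['Coke Zero price list']
print(concept_search("coke", docs))  # ['Vending machine restocked with canned soda', 'Coke Zero price list']
```

The trade-off is visible even at this scale: expansion finds more, and the user no longer controls exactly what counts as a match.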

The chaotic and often misfiring Microsoft search technologies do one thing well: Generate revenue for the legions of certified Microsoft partners.

Users? Yeah, Microsoft may help you too. In the meantime, the lawyers will manage their own contract drafts and eDiscovery materials. The engineers will stick with the tools baked into AutoCAD type systems? The marketers will do what marketers in many companies do? Stuff data on USBs, into the Google cloud, or copy the files to a shared folder on a former employee’s desktop. Yes, it happens.

Microsoft and search. Getting better. Here’s a snippet about Powerset (CNET, 2008):

Much of what Powerset has enabled with its technology is a superior user experience for searching. Powerset’s Wikipedia search, which surfaces concepts, meanings, and relationships (like subject, verbs, and objects in a language), is the very small tip of the iceberg.

Time for a new eight track tape?

Stephen E Arnold, November 20, 2019

The Sharp Toothed MSN Gnaws on the Google Search Carcass

November 18, 2019

Search and retrieval is fraught with challenges. In the enterprise search sector, fraud has been popular as a way to deal with difficulties. In the Web search sector, the methods have been more chimerical.

MSN, a property of Microsoft, published “How Google Interferes With Its Search Algorithms and Changes Your Results.” The write up appears to recycle the work of the Wall Street Journal. The authors allegedly are Kirsten Grind, Sam Schechner, Robert McMillan and John West. It is unlikely that Alphabet Google will invite these people to the firm’s holiday bash this year.

What’s in the write up? The approximately 8,500 word article takes the kitchen sink approach to sins. Religious writers boil evil down to seven issues. Google, it seems, requires thousands of words to cover the online advertising firm’s transgressions.

DarkCyber will not engage in the naming of evils. Several observations are warranted:

  1. Google’s waterproof coating has become permeable.
  2. After decades, “search experts” are starting to comprehend the intellectual impact of search results that have been shaped.
  3. The old-fashioned approach of published editorial policies, details about updating indexes, and user control of queries via Boolean logic is not what fuels the Google method.

But so what? With more than 60 percent of search queries to the Google flowing from mobile devices, old school approaches won’t work. Figuring out what works depends on defining “works”.

Finding information is a big deal. What happens when one tries to hide information? The answers may be observed in the action of Google employees who have forced the company to stop communicating in “all hands” Friday meetings.

What’s Microsoft doing? For one thing, it pokes Googzilla in the eye with MSN articles, an example of Microsoft’s tactical approach. For another, it ignores problematic Windows 10 updates and tries to “ignite” people to embrace a hybrid cloud paradigm.

And what about Microsoft’s own search technologies? One pundit apologist continues to explain that Microsoft search is just getting more efficient, not better.

Net net: Google and Microsoft may have more in common than some individuals realize. Maybe envy? Maybe techno-attraction? Maybe two black holes circling? Whatever. The situation is interesting.

Stephen E Arnold, November 18, 2019

Remounting the Pegasus Named NSO

November 15, 2019

Those who care about security will want to check out the article, “Pegasus Spyware: All You Need to Know” from the Deccan Herald. Approximately 1,400 smartphones belonging to activists, lawyers, and journalists across four continents suffered cyber attacks that exploited a WhatsApp vulnerability, according to a statement from that company. They say the attacks used the Pegasus software made by (in)famous spyware maker NSO Group. Though the Israeli spyware firm insists only licensed government intelligence and law enforcement agencies use their products, WhatsApp remains unconvinced; the messaging platform is now suing NSO over this.

The article gives a little history on Pegasus and the investigation Citizen Lab and Lookout Security undertook in 2016. We learn the spyware takes two approaches to hacking into a device. The first relies on a familiar technique: phishing. The second, and much scarier, was not a practical threat until now. Writer David Binod Shrestha reports:

“The zero-click vector is far more insidious as it does not require the target user to click or open a link. Until the WhatsApp case, no example of this was seen in real-world usage. Zero-click vectors generally function via push messages that automatically load links within the SMS. Since a lot of recent phones can disable or block push messages, a workaround has evidently been developed. WhatsApp, in its official statement, revealed that a vulnerability in their voice call function was exploited, which allowed for ‘remote code execution via specially crafted series of packets sent to a target phone number.’ Basically, the phones were infected via an incoming call, which even when ignored, would install Pegasus on the device. The data packets containing the spyware code were carried via the internet connection and a small backdoor for its installation was immediately opened when the phone rang. The call would then be deleted from the log, removing any visible trace of infection. The only way you will know if your phone has been infected in the recent attacks is once WhatsApp notifies you via a message on the platform.”

Pegasus itself targets iPhones, but Android users are not immune; a version Google has called Chrysaor focuses on Android. Both versions immediately compromise nearly all the phone’s data (like personal data and passwords) and give hackers access to the mike and camera, live GPS location, keystroke logging, and phone calls. According to the Financial Times, the latest version of Pegasus can also access cloud-based accounts and bypass two-factor authentication. Perhaps most unnerving is the fact that all this activity is undetectable by the user. See the article for details on the spyware’s self-destruct mechanism.

Shrestha shares a list of suggestions for avoiding a Pegasus attack. They are oft-prescribed precautions, but they bear repeating:

“*Never open links or download or open files sent from an unknown source

*Switch off push SMS messages in your device settings

*If you own an iPhone, do not jailbreak it yourself to get around restrictions

*Always install software updates and patches on time

*Turn off Wi-Fi, Bluetooth and locations services when not in use

*Encrypt any sensitive data located on your phone

*Periodically back up your files to a physical storage

*Do not blindly approve app permission requests”

For those who do fall victim to Pegasus, Citizen Lab suggests these remedies: delink their cloud accounts, replace the device altogether, change all their passwords, and take security more seriously on the new device. Ouch! Best to avoid the attacks altogether.

Cynthia Murrell, November 15, 2019

Parsing Documents: A Shift to Small Data

November 14, 2019

DarkCyber spotted “Eigen Nabs $37M to Help Banks and Others Parse Huge Documents Using Natural Language and Small Data.” The folks chasing the enterprise search pot of gold may need to pay attention to figuring out specific problems. Eigen uses search technology to identify the important items in long documents. The idea is “small data.”

The write up reports:

The basic idea behind Eigen is that it focuses what co-founder and CEO Lewis Liu describes as “small data”. The company has devised a way to “teach” an AI to read a specific kind of document — say, a loan contract — by looking at a couple of examples and training on these. The whole process is relatively easy to do for a non-technical person: you figure out what you want to look for and analyze, find the examples using basic search in two or three documents, and create the template which can then be used across hundreds or thousands of the same kind of documents (in this case, a loan contract).

Interesting, but the approach seems similar to identifying several passages in a text and submitting these to a search engine. This used to be called “more like this.” But today? Small data.
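The older “more like this” technique can be sketched in Python: pool the words from a few example passages into one query vector, then rank candidate passages by cosine similarity of raw term counts. This is the classic technique the post compares Eigen to, not Eigen’s own method, which the article does not describe at the code level; the function names and sample loan-contract text are illustrative.

```python
import math
import re
from collections import Counter


def tokens(text: str) -> list[str]:
    """Lowercase alphabetic tokens; digits and punctuation are dropped."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def more_like_this(examples: list[str], passages: list[str], top_k: int = 1) -> list[str]:
    """Rank passages by similarity to the pooled example passages."""
    query = Counter()
    for example in examples:
        query.update(tokens(example))
    scored = [(cosine(query, Counter(tokens(p))), p) for p in passages]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in scored[:top_k]]


examples = [
    "The interest rate shall be 5% per annum.",
    "Interest accrues at the rate of LIBOR plus 2%.",
]
passages = [
    "Governing law: England and Wales.",
    "The Borrower shall pay interest at a rate of 4% per annum.",
    "Notices must be sent in writing.",
]
print(more_like_this(examples, passages))  # ['The Borrower shall pay interest at a rate of 4% per annum.']
```

Two or three examples are enough to pull the interest clause out of the pile, which is roughly the “small data” pitch, minus the branding.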

With the cloud coming back on premises and big data becoming user identified small data, what’s next? Boolean queries?

DarkCyber hopes so.

Stephen E Arnold, November 14, 2019

Google: Chronicle Is Not a Sci Fi Disaster Film. It Just Seems Like It

November 12, 2019

“Google’s Cybersecurity Project ‘Chronicle’ Imploding” may not be true. If the information in the Economic Times is accurate, Google has created another business school case study about Silicon Valley management methods, what DarkCyber describes with the acronym HSSCMM (high school science club management methods).

In 2018, Alphabet, the rejiggered “owner” of Google, created Chronicle to be what the write up called “an independent start up.”

Yeah, that sounds good.

The goal of Chronicle was modest: “Revolutionize cybersecurity.”

Yeah, that sounds even better.

Engadget reported in June 2019:

The cybersecurity company launched in January 2018, and it released its first commercial product, Backstory, in March. In a blog post, Chronicle CEO and co-founder Stephen Gillett said Google Cloud’s cybersecurity tools and Chronicle’s Backstory and VirusTotal are complementary and will be leveraged together.

The Economic Times’ write up states:

Google’s cybersecurity project named “Chronicle” is imploding in trouble and some employees feel its management “abandoned and betrayed” the original vision, media reports said.

Staff, including the CEO, have looked for greener pastures elsewhere. Chronicle was moved back to the Google mother ship. Salaries were a sore point. It seems Chronicle employees were paid less than other “real” Googlers.

Let’s assume that the information is maybe, sort of accurate. In this non sci-fi thought space, here are some observations:

  1. Thinking, assembling, announcing, and doing can be enhanced with management. No management, problems. Google seems beset with some non-linear challenges.
  2. The life span of this Google activity seems brief: January 2018 to November 2019. Is the time between launch and problems becoming more abbreviated?
  3. Google’s moon shot factory may be veering more and more into a boundary world: Big ideas fail due to the humans working on creating a reality.

To sum up: Chronicle may be another marker on the management superhighway. On the other hand, the Chronicle issue is real.

We’re back to Jorge Luis Borges, the Argentinean writer, who observed:

Reality is not always probable, or likely.

My high school science club was unreal but real as well. Cue the theme song to Chronicle. Sorry, I meant Twilight Zone.

Stephen E Arnold, November 11, 2019

The UAE and AI: What Will Students Learn?

November 7, 2019

DarkCyber noted “Abu Dhabi AI University Is Key to UAE’s Future As the Oil Dries Up.” The write up states:

The Gulf state is developing healthcare, financial services, renewable energy and materials technology sectors, which will make up the UAE economy when the oil runs out. But first, it needs to ensure its citizens have the skills to drive them. The long-term nature of the UAE government’s initiative is what stands out for Oxford University professor Michael Brady, who is interim president of Abu Dhabi’s Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), which was set up to ensure the UAE has the right skills to drive these industries. The Masdar City-based university has just opened to applications for its first intake of 50 students.

Amazon, Google, and Microsoft, among others, have a presence in UAE. The article quoted Professor Brady as saying:

But it was the ambition that he saw when he visited Abu Dhabi, which puts UK government planning to shame, that cemented his interest “There is a stark difference between the short-termism that characterizes so much of government policy in the UK, where politicians worry about the headlines tomorrow morning,” he said. “It is so refreshing to be part of a government-led initiative that has a 30-year vision to transform the economy and the culture.”

The AI university is important. The question the write up did not address is:

What cloud AI service will be the core of the curriculum?

It seems obvious that the go-to cloud system for students will have an advantage in deploying next-generation solutions.

Worth monitoring: which of these three cloud aspirants will capture the hearts and minds of the students, UAE officials, and investors who want to cash in on this investment in the future?

Stephen E Arnold, November 7, 2019

Microsoft Displays Its Amazon AWS Neutralizer

November 5, 2019

I read about Microsoft’s victory over the evil neighbor Amazon. What was Microsoft’s trump card, its AWS neutralizer, its technology innovation?

The answer may have appeared in “Microsoft Unveils Azure Arc, Aiming to Fend Off Google and Amazon with New Hybrid Cloud Tech.” Here’s the once closely-held diagram.


Like most AWS-hostile diagrams, it includes three features which customers like the Pentagon and other entities desire:

  1. The ability to integrate multiple clouds, on premises computers, and edge computers into one homogeneous system. (Latency? Don’t bring that up, please.)
  2. The Azure stack in one’s own computer center where it can be managed by an Azure-certified staff with the assistance of Azure-certified Microsoft partners. (Headcount implications. Don’t bring that up, please.)
  3. An Azure administrative system which provides a bird’s-eye view of the client’s Azure-centric system. (Permissions and access controls. Don’t bring that up, please.)

Microsoft has rolled out a comprehensive vision. The challenge is that Amazon and Google have similar visions.

Microsoft may want to check out Amazon’s security and access control technology. But that’s a minor point for a company which struggles to update Windows 10 without disabling users’ computers.

Great diagram though. Someone once observed, “The map is not the territory.” And then there is the increasingly relevant Argentinean writer Jorge Luis Borges who wrote:

Nothing is built on stone; All is built on sand, but we must build as if the sand were stone.

Borges was a surrealist who could see societal trends despite his blindness.

Stephen E Arnold, November 4, 2019
