AWS AI Improves Its Accuracy According to Amazon

January 31, 2020

An interesting bit of jargon creeps into “On Benchmark Data Set, Question-Answering System Halves Error Rate.” That word is “transfer.” Amazon, it seems, is trying to figure out how to reuse data, threshold settings, and workflow outputs.

Think about IBM’s Deep Blue defeating Garry Kasparov in 1997, or IBM Watson allegedly besting Ken Jennings in 2011 without any help from post production or judicious video editing. Two IBM systems and zero “transfer” (or, in more Ivory Towerish jargon, “transference”).

Humans learn via transfer. Artificial intelligence, despite marketers’ assurances, does not transfer very well. One painful and expensive fact of life which many venture funding outfits ignore is that most AI innovations start from ground zero for each new application of a particular AI technology mash up.

Imagine if Deep Blue had been able to transfer its “learnings” to Watson. IBM might have avoided becoming a poster child for inept technology marketing. Watson is now a collection of software modules, but these don’t transfer particularly well. Hand crafting, retraining, testing, tweaking, and tuning are required, and then must be reapplied as data drift causes “accuracy” scores to erode like a 1971 Vega.

Amazon suggests that it is making progress on the smart software transference challenge. The write up states:

Language models can be used to compute the probability of any given sequence (even discontinuous sequences) of words, which is useful in natural-language processing. The new language models are all built atop the Transformer neural architecture, which is particularly good at learning long-range dependencies among input data, such as the semantic and syntactic relationships between individual words of a sentence.
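The Transformer’s key trick, attention, is easy to sketch: every word’s representation is rebuilt as a similarity-weighted blend of all the others, so distance within the sentence stops mattering. A toy scaled dot-product attention in plain Python (illustrative only, not Amazon’s code):

```python
import math

def attention(query, keys, values):
    """Scaled dot-product attention over one query vector.

    Every key contributes to the output in proportion to its
    similarity with the query, which is how a Transformer lets
    any word attend to any other word, however far apart.
    """
    d = len(query)
    # Similarity of the query to each key, scaled by sqrt(d).
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    # Softmax turns raw scores into attention weights that sum to 1.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Output is the weight-averaged blend of the value vectors.
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]

# Three "words", each a 2-d vector; the query matches the first key best,
# so the output leans toward the first value vector.
out = attention([1.0, 0.0],
                keys=[[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]],
                values=[[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]])
```

Real Transformers do this with learned query, key, and value projections across many heads and layers, but the blending operation is the same.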

DarkCyber has dubbed some of these efforts as Bert and Ernie exercises, but that point of view is DarkCyber’s, not the views of those with skin in the AI game.

Amazon adds:

Our approach uses transfer learning, in which a machine learning model pretrained on one task — here, word sequence prediction — is fine-tuned on another — here, answer selection. Our innovation is to introduce an intermediate step between the pre-training of the source model and its adaptation to new target domains.

Yikes! A type of AI learning. The Amazon approach is named Tanda (transfer and adapt), not Ernie, thankfully. Here’s a picture of how Tanda works:

[Image: diagram of the Tanda (transfer and adapt) workflow]

The write up reveals more about how the method functions.

The key part of the write up, in DarkCyber’s opinion, is the “accuracy” data; to wit:

On WikiQA and TREC-QA, our system’s MAP was 92% and 94.3%, respectively, a significant improvement over the previous records of 83.4% and 87.5%. MRR for our system was 93.3% and 97.4%, up from 84.8% and 94%, respectively.
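For readers who have not met the metrics: MAP averages the precision at each rank where a correct answer appears, and MRR averages the reciprocal rank of the first correct answer. A minimal stdlib computation (toy data, not the WikiQA or TREC-QA harness):

```python
def average_precision(ranked_relevance):
    """Average precision for one question: mean of precision@k
    at each rank k where a relevant answer appears."""
    hits, precisions = 0, []
    for k, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(precisions) if precisions else 0.0

def reciprocal_rank(ranked_relevance):
    """1 / rank of the first relevant answer; 0 if none appears."""
    for k, rel in enumerate(ranked_relevance, start=1):
        if rel:
            return 1.0 / k
    return 0.0

# Two questions; each list marks whether the answer at that rank is relevant.
runs = [[True, False, True], [False, True, False]]
map_score = sum(average_precision(r) for r in runs) / len(runs)
mrr_score = sum(reciprocal_rank(r) for r in runs) / len(runs)
```

On this toy data MAP works out to 2/3 and MRR to 0.75; the benchmark numbers above are the same calculations over thousands of questions.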

If true, Amazon has handed a problem to Google, Microsoft, and the other outfits working to reduce the costs of training machine learning systems while delivering many wonderful services.

Most smart systems are fortunate to hit 85 percent accuracy in carefully controlled lab settings. Amazon is nosing into an accuracy range few humans can consistently deliver when indexing, classifying, or identifying if a picture that looks like a dog is actually a dog.

DarkCyber generally doubts data produced by a single research team. That rule holds for these data. Since the author of the report works on Alexa search, maybe Alexa will be able to answer this question, “Will Amazon overturn Microsoft’s JEDI contract award?”

Jargon is one thing. Real world examples are another.

Stephen E Arnold, January 31, 2020

Ontotext: GraphDB Update Arrives

January 31, 2020

Semantic knowledge firm Ontotext has put out an update to its graph database, The Register announces in, “It’s Just Semantics: Bulgarian Software Dev Ontotext Squeezes Out GraphDB 9.1.” Some believe graph databases are The Answer to a persistent issue. The article explains:

“The aim of applying graph database technology to enterprise data is to try to overcome the age-old problem of accessing latent organizational knowledge; something knowledge management software once tried to address. It’s a growing thing: Industry analyst Gartner said in November the application of graph databases will ‘grow at 100 per cent annually over the next few years’. GraphDB is ranked at eighth position on DB-Engines’ list of most popular graph DBMS, where it rubs shoulders with the likes of tech giants such as Microsoft, with its Azure Cosmos DB, and Amazon’s Neptune. ‘GraphDB is very good at text analytics because any natural language is very ambiguous: a project name could be a common English word, for example. But when you understand the context and how entities are connected, you can use these graph models to disambiguate the meaning,’ [GraphDB product manager Vassil] Momtchev said.”

The primary feature of this update is support for the Shapes Constraint Language, or SHACL, which the World Wide Web Consortium recommends for validating data graphs against a set of conditions. This support lets the application validate data against the schema whenever new data is loaded to the database instead of having to manually run queries to check. A second enhancement allows users to track changes in current or past database transactions. Finally, the database now supports network authentication protocol Kerberos, eliminating the need to store passwords on client computers.
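For the curious, a SHACL shape is a declarative bundle of constraints over a data graph. A minimal hypothetical example in Turtle (the `ex:` names are invented for illustration) of the kind of schema check GraphDB 9.1 can now run as data loads:

```turtle
@prefix sh:  <http://www.w3.org/ns/shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:  <http://example.org/> .

ex:PersonShape
    a sh:NodeShape ;
    sh:targetClass ex:Person ;      # validate every ex:Person node
    sh:property [
        sh:path ex:email ;          # each person needs exactly one email
        sh:minCount 1 ;
        sh:maxCount 1 ;
        sh:datatype xsd:string ;
    ] .
```

Any incoming triple set whose `ex:Person` nodes lack an email, or carry two, would be rejected at load time instead of surfacing later in a hand-written audit query.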

Cynthia Murrell, January 31, 2020

Amazon and Open Source: A Wee Bit Sensitive

January 31, 2020

Amazon Web Services (AWS) is one of the nation’s leading cloud computing services, and its dominance increases every day. Computer Weekly commented on how AWS might be taking advantage of open source technology in the article “AWS Hits Back At Open Source Theft Allegations.” Throughout 2019, open source software companies accused AWS of undermining them by “stealing” the free versions of their software, then hosting them on its cloud computing service.

The accusations were serious enough that The New York Times picked up the story, noting that AWS integrated Elasticsearch from Elastic into its offerings in 2015; Elastic and AWS are now rivals for customers. MongoDB and Redis have had to differentiate their open source software from their licensed software so customers know the difference. For example, the free version of MongoDB is integrated into AWS, but it lacks certain features found in the licensed version.

The article recounts:

“In October 2018, Eliot Horowitz, chief technology officer and founder of MongoDB, changed the open source licensing used for MongoDB to reflect the risk of the company’s service revenue being gobbled up by public cloud providers. In response, AWS introduced a MongoDB-compatible service, DocumentDB, in January 2019.”

While open source technology is free, developers behind such offerings usually offer a licensed version with more bells and whistles. These include customer support, free upgrades, patches, and specific features.

AWS is strip mining open source code, then reconfiguring it for its own services. AWS’s vice president of analytics and ElastiCache states that AWS is only responding to its clients’ demands, and its clients want open source software in AWS. He also said that AWS gives back to the open source community:

“AWS contributes mightily to open source projects such as Linux, Java, Kubernetes, Xen, KVM, Chromium, Robot Operating System, Apache Lucene, Redis, s2n, FreeRTOS, AWS Amplify, Apache MXNet, AWS SageMaker NEO, Firecracker, the OpenJDK with Corretto, Elasticsearch, and Open Distro for Elasticsearch. AWS has not copied anybody’s software or services.”

Many of the projects aim to make it easier for developers to build on top of AWS services. SageMaker is its machine learning cloud service; Greengrass extends the AWS cloud to the internet of things (IoT) edge; and Firecracker is its kernel virtual machine. The s2n project, however, is an open source implementation of the TLS encryption protocol, which AWS made publicly available under the terms of the Apache Software License 2.0.

While AWS might be a singular provider for multiple services and products, organizations do not want to be locked into one supplier.

Whitney Grace, January 31, 2020

Annual Report for Artificial Intelligence: Nothing Fake Here

January 30, 2020

Ladies and gentlemen, it is a brand new year and time to view 2019 in perspective. If you are interested in artificial intelligence, check out the yearly AI index from Stanford University’s Institute for Human-Centered Artificial Intelligence (HAI). The official name is the AI Index 2019 Report, a comprehensive roundup of AI studies; learn more in the HAI News post, “Introducing The AI Index 2019 Report.” The AI Index Report is in its third year, with the prior two publications establishing it as a prominent source of information about AI developments. The latest version has nine chapters spanning a broad range of topics and delivering insights.

The AI Index Report takes an interdisciplinary approach, analyzing and distilling patterns of AI’s global impact across industries, research, and public perception. The report is written by the AI Index Steering Committee, consisting of experts across academia and industry, with over thirty-five sponsors and data contributors. As with any report, there are many wonderful facts to learn about AI:

  • “China now publishes as many AI journals and conference papers per year as Europe, having passed the USA in 2006. The Field-Weighted Citation Impact of USA publications is still about 50% higher than China’s.
  • In the US, the share of AI jobs grew from 0.3% in 2012 to 0.8% of total jobs in 2019. AI labor demand is growing, especially in high-tech services and the manufacturing sector.
  • At the graduate level, AI has rapidly become the most popular specialization among computer science PhD students in North America, with over twice as many students as the second most popular specialization (security/information assurance). In 2018, over 21% of graduating Computer Science PhDs specialized in Artificial Intelligence/Machine Learning.
  • There is a significant increase in AI-related legislation in congressional records, committee reports, and legislative transcripts around the world.”

The AI Index Report is bound to stay around and grow as an important resource for the AI industry, especially if it proves to be free of ads and pushing products. Honest reports about industries are becoming rarer and even academia is polluted by dollar signs.

Whitney Grace, January 30, 2020

Former Amazonian Suggests the Pre-Built Models Are Pipe Dreams

January 30, 2020

I read a PR-infused write up with some interesting, presumably accurate information. The article is from ZDNet.com (an outfit somewhat removed from Mr. Ziff’s executive dining room). Its title? “Reality Engines Offers a Deep Learning Tour de Force to Challenge Amazon et al in Enterprise AI”. Here’s a passage which warranted an Amazon orange highlighter circle:

The goal, Reddy told ZDNet, is a service that “automatically creates production-ready models from data in the wild,” to ease the labor of corporations that don’t have massive teams of data scientists and deep learning programmers. “While other companies talk about offering this service, it is still largely a pipe-dream,” wrote Reddy in an email exchange with ZDNet. “We have made significant strides towards this goal,” she said.

Who will care about this assertion? Since the founder of the company is a former top dog of “AI verticals” at Amazon’s AWS cloud service, Amazon may care. Amazon asserts that SageMaker and related tools make machine learning easier, faster, and better (cheaper may depend on one’s point of view). A positive summary of some of Amazon’s machine learning capabilities appears in “Building Fully Custom Machine Learning Models on AWS SageMaker: A Practical Guide.”

Because the sweeping generalization about “pipe dreams” includes most of the machine learning honchos and honchettes, Facebook, Google, IBM, and others are probably going to pay attention. After all, Reality Engines has achieved “significant strides” with 18 people, some advisers, and money from Google’s former adult, Eric Schmidt, who invested $5.25 million.

The write up provides a glimpse of some of the ingredients in the Reality Engines’ secret sauce:

… The two pillars of the offering are “generative adversarial networks,” known as “GANs,” and “network architecture search.” Those two technologies can dramatically reduce the effort needed to build machine learning for enterprise functions, the company contends. GANs, of course, are famous for making fake faces by optimizing a competition between two neural networks based on the encoding and decoding of real images. In this case, Reality Engines has built something called a “DAGAN,” a GAN that can be used for data augmentation, the practice of making synthetic data sets when not enough data is available to train a neural network in a given domain. DAGANs were pioneered by Antreas Antoniou of the Institute for Adaptive and Neural Computation at the University of Edinburgh in 2018. The Reality Engines team has gone one better: They built a DAGAN by using network architecture search, or “NAS,” in which the computer finds the best architecture for the GAN by trying various combinations of “cells,” basic primitives composed of neural network modules.
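Stripped of the neural network machinery, architecture search reduces to “generate candidate cell combinations, score each one, keep the best.” A deliberately toy random-search sketch in plain Python (the cell names and the scoring stand-in are invented; a real NAS trains and evaluates every candidate):

```python
import random

# Hypothetical primitive "cells" an architecture can be assembled from.
CELLS = ["conv3x3", "conv5x5", "skip", "pool"]

def score(architecture, rng):
    """Stand-in for 'train the candidate and measure validation
    accuracy' -- here just a deterministic pseudo-random score
    derived from the architecture itself."""
    rng.seed("|".join(architecture))
    return rng.random()

def random_search(n_cells=3, trials=20, seed=0):
    """Sample random cell sequences and keep the best scorer."""
    rng = random.Random(seed)
    scorer = random.Random()
    best, best_score = None, -1.0
    for _ in range(trials):
        candidate = [rng.choice(CELLS) for _ in range(n_cells)]
        s = score(candidate, scorer)
        if s > best_score:
            best, best_score = candidate, s
    return best, best_score

best_arch, best_val = random_search()
```

Real NAS systems replace the random sampling with smarter strategies (evolutionary search, Bayesian optimization, reinforcement learning), but the outer loop looks much like this.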

For those not able to visualize a GAN and DAGAN system, the write up includes an allegedly accurate representation of some of the Reality Engines’ components. The diagram in the write up is for another system, and authored in part by a wizard working at another firm, but let’s assume we are in the ballpark conceptually:

[Image: GAN/DAGAN architecture diagram reproduced in the write up]

It appears that there is a training set. The data are fed to a DenseNet classifier and a validator. Then the DAGAN generator kicks in and processes data piped from the data sources. What’s interesting is that there are two process blocks (maybe Bayesian at the core with the good old Gaussian stuff mixed in) which “discriminate.” DarkCyber thinks this means that the system tries to reduce its margin of error for metatagging and other operations. The “Real Synthetic” block may be an error checking component, but the recipe is incomplete.

The approach is a mash up: Reality Engines’ code with software called “Bananas,” presumably developed by the company Petuum and possibly experts at the University of Toronto.

How accurate is the system? DarkCyber typically ignores vendor’s assertions about accuracy. You can make up your own mind about this statement:

“The NAS-improved DAGAN improves classification accuracy on the target dataset by as much as 20.5% and can transfer between tasks,” they write.

The “reality” of most machine learning systems is that accuracy of 85 percent is attainable under quite specific conditions: Content from a bounded domain, careful construction of training data, calibration, and on-going retraining when what DarkCyber calls Bayesian drift kicks in. If a system is turned on and just used, accuracy degrades over time. At some point, the outputs are sufficiently wide of the mark that a ZDNet journalist may spot problems.
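The erosion is easy to caricature: a frozen model keeps only part of its above-chance skill as the data distribution shifts each month. A toy stdlib simulation (the starting point and decay rate are invented, purely to illustrate the drift DarkCyber describes):

```python
def simulate_drift(initial=0.85, retain=0.95, months=12):
    """Measured accuracy of a model that is never retrained.

    Each month only `retain` of the model's above-chance skill
    survives the shifting data, so accuracy decays geometrically
    toward the coin-flip floor of 0.5 (binary classification).
    """
    acc = initial
    history = [acc]
    for _ in range(months):
        acc = 0.5 + (acc - 0.5) * retain
        history.append(acc)
    return history

curve = simulate_drift()  # month-by-month accuracy, sliding downward
```

The exact curve in production depends on the domain, but the direction is always the same without retraining: down.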

What does the system output? It seems to DarkCyber that the information in the write up focuses on classifiers. If our interpretation is narrowed to that function, content is dumped into buckets. These buckets make it easy to extract content and perform additional analysis. If each step in a work flow works, the final outputs will have a greater likelihood of being “accurate” or “right.” But there are many slips between the cup and the lip, as a famous plagiarizer once repeated.

What type of data can the system process? The answer is structured data, presumably cleansed and validated data.

If the Reality Engines’ approach is of interest, the company’s Web site offers a “Request Access” button. Click it and you are probably free to test the system or kick its off road tires.

Will bananas and backpropagation be on your machine learning menu in the future?

Stephen E Arnold, January 30, 2020

Ivy Covered Irony: MIT Reports about Harvard

January 30, 2020

DarkCyber has mentioned MIT’s enthusiastic but mostly covert embrace of the late Mr. Epstein’s donations. One of the research team noted this article in the MIT Technology Review: “A Harvard Super Chemist Has Been Arrested Over Lying about Secret China Payments.” The main point of the Epstein-supported MIT Technology Review struck the DarkCyber team as:

According to a charging document written by an FBI agent, Lieber received more than $15 million in US grant funding from the National Institutes of Health and the Department of Defense, among other sources. Researchers are supposed to disclose if they also have foreign funding. But Lieber didn’t do so and then, when confronted, gave “false, fictitious, and fraudulent statements” to the DOD and to the NIH as recently as this month.

Yep, the Epstein-interacting institution is reporting that a Harvard professor allegedly engaged in illegal activities.

Several observations:

  • The write up may have more to do with making sure readers of MIT Technology Review know that Harvard University has a bad actor on the payroll
  • Another prestigious institution struggles to provide a reasonable example of ethical behavior
  • An interesting philosophical question can be discussed in a law school class at Suffolk University: “Which is more desirable: taking money from an accused human trafficker or selling information to a foreign power?”

DarkCyber is disappointed that two institutions of higher education are teaching by example, just not positive example.

Stephen E Arnold, January 30, 2020

Search Vendor Comparison

January 30, 2020

The Finland-based AddSearch published a comparison of its search and retrieval service with two competitors: Algolia and Swiftype. Each of these is a for-fee solution. The write up appeared in 2019, but DarkCyber wanted to call attention to the article because it does a good job of outlining some of the main characteristics of commercial search solutions. You can locate the article by Anna Pogrebniak in the AddSearch Blog.

Kenny Toth, January 30, 2020

Amazon: Some Trouble Down Under?

January 29, 2020

DarkCyber noted “Case Study: Why the Australian Electoral Commission Migrated to Microsoft Azure.” On the surface, the write up is another PR output. Considered in terms of the competition between Amazon and Microsoft for juicy non-commercial jobs, however, the article provides a checklist of what’s lacking in Amazon AWS. DarkCyber identified these “advantages” for the Redmond brain trust, which finds questionable methods for altering a user’s Windows 7 machine amusing. (The black screen incident provides a reminder that PR check points may not match a firm’s actual behavior.)

Here are the upsides for Azure, presumably without a black screen on Luddites’ Windows 7 computers:

  • Quick turnaround
  • Publicly exposed APIs
  • API management tools
  • API creation tools
  • Real time information feeds
  • Ability to create an “express route” for speedy data communications
  • Zero failure
  • Ability to support self service from users
  • A customer or user service portal
  • A much loved integrator.

What was the deciding factor? The much loved integrator, it seems.

Does Amazon match up on these check points? Sure.

Marketing presentations are one thing. The much loved vendor is another.

Stephen E Arnold, January 29, 2020

Amazon Security in the News: AWS Documentation

January 29, 2020

Curious about Amazon’s security features? Navigate to this link and review AWS Security Documentation by Category. In order to make sense of the information, one needs to speak Amazonian; for example, Glacier, Snowball, ECR, ECS, and SQS, plus another bulldozer blade of product and service nomenclature. Because an Amazon phone breach allegedly took place, DarkCyber entered the query “mobile” into the AWS Security Documentation search function. Here’s the result:

[Image: AWS Security Documentation search results for the query “mobile”]

The query returned 1,379 results spread across 138 pages.

A somewhat cursory review of the information provided zero guidance related to the security issue encountered by Mr. Bezos. Perhaps if he had used an Amazon phone, the documentation would have provided some guidance? Perhaps.

Stephen E Arnold, January 29, 2020

Google Translate: Some Improvements Arrive

January 29, 2020

Google may be struggling with A B testing, but it is improving its translation capabilities.

While Google Translate is more or less accurate, depending on the language, it does have its flaws, especially when it comes to offline translation. SlashGear shares an update on the translation service in “Google Translate Now Offers Higher Quality Offline Translations.”

Google Translate’s offline service premiered a few years ago, but its quality could not compare with the online counterpart. The newest update improves the offline translation service by 12 percent when it comes to grammar and sentence structure. Asian languages have seen improved accuracy with the update.

“Google Translate is best when used with an Internet connection, but there are times the app will prove useful in the absence of WiFi or mobile data. While traveling in a foreign country, for example, someone who doesn’t have mobile data access will find Google Translate’s offline support useful, though the results are often less accurate and polished.”

Google improved its NLP for fifty-nine languages. The old offline translations were understandable, but they sounded awkward, as if someone had learned the language from a textbook. The new update is more grammatically correct and actually makes sense in modern vernacular. There is also new offline transliteration support for ten new languages. Users who translate their own language into one of the ten see the original script alongside the transliteration for accuracy. This helps people who cannot read the language’s writing system by rendering the sentences in the Latin alphabet.

This is useful for tourists, researchers, polyglots, and students who need to finish their foreign language homework. Now about that mythical A B testing, which is part of the data driven environment for Googlers, right?

Whitney Grace, January 29, 2020
