Google Does Waymo Than Online Advertising

October 22, 2021

If Google’s Waymo smart vehicles are confused, are other components of Google’s smart software system off track as well? That’s a good question, and it is one that those fond of snorkeling may have to do a deep dive to answer.

“Confused Waymo Robotaxis Keep Flooding Dead-End Street in San Francisco” reports:

Residents of an otherwise quiet neighborhood in San Francisco have been dealing lately with a very weird affliction: the constant buzzing of several Waymo vehicles crowding a dead-end street. The self-driving taxis are flooding the end of 15th Avenue, appearing rather “confused” as they enter the area ….

San Francisco is an interesting city in which to drive. I am easily confused and when I commuted from Berkeley to San Mateo in Plastic Fantastic County, I would end up in some fascinating places. The Cow Palace parking lot was memorable after a bit of congestion on the 101 forced people like me to seek an option.

The write up points out:

What we know for sure is that Waymo has been trialing its autonomous vehicles in San Francisco since 2008. But as we’ve seen other instances of Alphabet’s robotaxis freaking out, the situation begs the question, what’s going on?

Yep, beta testing, trying to minimize crashing into things, and giving those safety drivers something to enter into their Waymo app.

How long has the Google been wrestling with smart software for smart vehicles? Not long enough maybe?

Stephen E Arnold, October 22, 2021

Auditing Algorithms: A Semi-Tough Task

October 22, 2021

Many years ago, I did a project for a large outfit. The goal was to figure out why one of its initiatives was a flop. I assembled an okay team, which beavered away. The finding: a number of small things had gone wrong. Each added a bit of friction to what on the surface seemed a doable project, the friction accumulated, and the effort went nowhere.

I thought about this after I read “Twitter’s Own Research Shows That It’s a Megaphone for the Right. But It’s Complicated.”

I circled this statement from the article:

We can see that it is happening. We are not entirely sure why it is happening. To be clear, some of it could be user-driven, people’s actions on the platform, we are not sure what it is.

Now back to failure. Humans expect a specific construct to work in a certain way. When it doesn’t, humans either embrace root cause analysis or just shrug their shoulders and move on.

Several questions:

  • If those closest to a numerical recipe are not sure what’s causing the unexpected outcome, how will third-party algorithm auditors figure out what is happening?
  • Engineering failures, like using a material which cannot tolerate a particular amount of stress, are relatively easy to figure out. Social media “smart” algorithms may be a more difficult challenge. What tools are available for this kind of failure analysis? Do they work, or are they unable to look at a result and pinpoint one or more points of inappropriate performance?
  • When humans and social media interact with complex algorithmic systems, do researchers have the meta-models capable of identifying the cause of failures or performance factors resulting from tiny operations in the collective system?

My hunch is that something new exists to be studied. Was Timnit Gebru, the former Google researcher, on the right track?

Stephen E Arnold, October 22, 2021

What Can Slow Down the GOOG? Lawyers Reviewing AI Research Papers

October 21, 2021

I spotted an allegedly true factoid in “Google’s AI Researchers Say Their Output Is Being Slowed by Lawyers after a String of High-Level Exits: Getting Published Really Is a Nightmare Right Now.” Here is the paywalled item:

According to Google’s own online records, the company published 925 pieces of AI research in 2019, and a further 962 in 2020. But the company looks to have experienced a moderate slowdown this year, publishing just 618 research papers in all of 2021 thus far.

Quite a decrease, particularly in the rarefied atmosphere of the smartest people in the world who want to be in a position to train, test, deploy, and benefit from their smart software.

With management and legal cooks in the Google AI kitchen, the production of AI delicacies seems to be going down. Bad for careers? Good for lawyers? Yes and yes.

Is this a surprise? It depends on whom one asks.

At a time when there is chatter that VCs want to pump money into smart software and when some high profile individuals suggest China is the leader in artificial intelligence, the Google downturn in this facet of research is not good news for the GOOG.

Is there a fix? Sure, but none is going to include turning back the hands of time to undo what I call the Battle of Timnit. The decision to try to swizzle around the issue of baked-in algorithmic bias appears to have blocked some Google researchers’ snorkels. Deep dives without free-flowing research oxygen can be debilitating.

Stephen E Arnold, October 21, 2021

Is Self-Driving Ready for Regular Humanoids?

October 21, 2021

Teslas are popular and expensive cars. Tesla owners love their cars with the same fervor as Prius owners, except with a more elitist attitude. Fears of Elon Musk acting as Big Brother have been placated, but Electrek shows how that subject comes into question again in “Tesla Will Make Sure You Are A Good Driver Before Giving You Access To Full Self-Driving Beta.”

Musk said that his company will use telemetry data to verify its customers are “good” drivers before giving them access to the cars’ self-driving option. That is a problematic approach, because Tesla owners have already paid for the software. In September, Tesla released an update to its Full Self-Driving Beta v10 software. The software moves toward the dream of self-driving cars; however, drivers are still required to pay attention at all times and keep their hands on the steering wheel.

Tesla’s self-driving feature still has bugs, but Musk promises the upgrade is “mind-blowing.” He only wants “good” drivers with high safety records to use the beta. Since Teslas are linked to a “hive mind,” Musk has access to their driving data. The “good” driving requirement is a way for Musk to prevent accidents and deaths, but it raises the question of whether it is legal.
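The mechanics are easy to imagine. Here is a minimal, purely hypothetical sketch of a telemetry-based gate; the field names, weights, and the 98-point threshold are illustrative assumptions, not Tesla’s actual Safety Score formula.

```python
# Hypothetical sketch of gating beta access on a telemetry-derived score.
# Field names, weights, and threshold are illustrative, not Tesla's formula.

from dataclasses import dataclass

@dataclass
class TripTelemetry:
    miles: float
    hard_braking_events: int
    aggressive_turns: int
    forward_collision_warnings: int

def safety_score(trips: list[TripTelemetry]) -> float:
    """Return a 0-100 score; more risky events per mile lowers the score."""
    total_miles = sum(t.miles for t in trips) or 1.0
    weighted_events = sum(
        t.hard_braking_events + t.aggressive_turns + 2 * t.forward_collision_warnings
        for t in trips
    )
    events_per_100_miles = 100.0 * weighted_events / total_miles
    return max(0.0, 100.0 - 5.0 * events_per_100_miles)

def eligible_for_beta(trips: list[TripTelemetry], threshold: float = 98.0) -> bool:
    """Gate access on the telemetry-derived score."""
    return safety_score(trips) >= threshold

# Example: a cautious stretch of driving clears the made-up threshold.
week = [TripTelemetry(miles=500.0, hard_braking_events=0,
                      aggressive_turns=1, forward_collision_warnings=0)]
print(safety_score(week), eligible_for_beta(week))   # 99.0 True
```

The point of the sketch is that whoever picks the weights and the threshold decides who gets the feature they already paid for.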

Insurance companies already monitor their customers with safe-driving applications in exchange for discounts. Courts also order breathalyzer interlocks installed in cars to prevent drunk driving. Limiting the self-driving beta is in the same vein, but the pros and cons must be weighed.

It also brings Musk’s intentions into question. Will he take responsibility if a Tesla terminates an annoying humanoid?

Whitney Grace, October 21, 2021

China, Smart Software, and Different Opinions

October 21, 2021

I spotted “China Isn’t the AI Juggernaut the West Fears.” The main idea of the story is that China has not cornered smart software applications and innovation, and that the future (at least some of it) is not firmly in the grip of the Chinese Communist Party.

My hunch is that this article in the Japan Times is a response to articles like “Former Senior Pentagon Official Says China is Kicking Our Ass in Artificial Intelligence.” Nicolas Chaillan, a former Pentagon official, suggested that China is making significant progress in AI. If China continues on its present path, that country may surpass the US and its allies in smart software.

What’s interesting is that quite different viewpoints are zooming around the interwebs.

The Japan Times’ take, which channels Bloomberg, includes this statement:

On paper, the U.S. and China appear neck and neck in artificial intelligence. China leads in the share of journal citations — helped by the fact that it also publishes more — while the U.S. is far ahead in the more qualitative metric of cited conference papers, according to a recent report compiled by Stanford University. So while the world’s most populous country is an AI superpower, investors and China watchers shouldn’t put too much stock in the notion that its position is unassailable or that the U.S. is weaker. By miscalculating the others’ abilities, both superpowers risk overestimating their adversary’s strengths and overcompensating in a way that could lead to a Cold War-style AI arms race.

Yep, citation analysis.

I don’t have a dog in this fight. I want to point out that citation analysis, like analysis of patent documents, may not tell a comprehensive story.

I would suggest that citation analysis may be distorted by the search engine optimization techniques used by some academics and government-funded researchers. In addition, the publication flow from what I call AI cabals (loose federations of like-minded researchers who cross-cite one another) provides a fun house mirror opportunity.

That is, what’s reflected is a version of reality, not the reality that a person like myself would perceive without the mirrors.
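To see why cross-citation matters, here is a toy sketch with invented data (not drawn from any real study) showing how raw citation counts diverge once in-group citations are set aside.

```python
# Toy illustration (invented data): how a small group of authors who cite one
# another can inflate raw citation counts relative to outside interest.

from collections import Counter

# (citing_author, cited_author) pairs -- hypothetical
citations = [
    ("A", "B"), ("B", "A"), ("A", "C"), ("C", "A"), ("B", "C"), ("C", "B"),  # in-group
    ("X", "A"),                                   # one citation from outside the group
    ("Y", "Q"), ("Z", "Q"),                       # an unrelated, independent author
]
cabal = {"A", "B", "C"}

raw = Counter(cited for _, cited in citations)
external = Counter(cited for citing, cited in citations
                   if not (citing in cabal and cited in cabal))

for author in sorted(raw):
    print(f"{author}: raw={raw[author]}, outside-group={external.get(author, 0)}")
# A out-cites Q on raw counts (3 vs. 2) yet has only one citation from
# outside its own circle; Q's two citations are both independent.
```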

Net net: The Japan Times’ write up may be off the mark. As a result, the viewpoint of Nicolas Chaillan may warrant serious consideration.

Stephen E Arnold, October 21, 2021

Facebook and Synthetic Data

October 13, 2021

What’s Facebook thinking about its data future?

A partial answer may be that the company is doing some contingency planning. When regulators figure out how to trim Facebook’s data hoovering, the company may have less primary data to mine, refine, and leverage.

The solution?

Synthetic data. The jargon means annotated data that computer simulations output. Run the model. Fiddle with the thresholds. Get good enough data.
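As a rough illustration of the idea, here is a minimal sketch in which a simple simulation emits records that arrive already annotated. The fraud scenario, field names, and parameters are invented for illustration and have nothing to do with Facebook’s or AI.Reverie’s actual systems.

```python
# Minimal sketch of "synthetic data": a simulation emits records that arrive
# already labeled. Scenario, fields, and parameters are invented.

import random

def simulate_transaction(fraud_rate: float) -> dict:
    """One simulated transaction; the annotation comes free from the simulation."""
    is_fraud = random.random() < fraud_rate
    amount = random.gauss(500, 200) if is_fraud else random.gauss(60, 30)
    return {
        "amount": round(max(amount, 1.0), 2),
        "foreign_ip": is_fraud and random.random() < 0.7,
        "label": "fraud" if is_fraud else "legit",
    }

def synthetic_dataset(n: int, fraud_rate: float = 0.05) -> list[dict]:
    # "Fiddle with the thresholds": changing fraud_rate (or the gauss
    # parameters above) reshapes the generated dataset.
    return [simulate_transaction(fraud_rate) for _ in range(n)]

rows = synthetic_dataset(10_000)
print(rows[0])
```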

How does one get a signal about Facebook’s interest in synthetic data?

Facebook, the responsible social media company, has acquired AI.Reverie, according to VentureBeat.

Was this a straightforward deal? Sure, just via a Facebook entity called Dolores Acquisition Sub, Inc. If this sounds familiar, the social media leader may have borrowed the name from the science fiction series “Westworld,” which features a character named Dolores.

The write up states:

AI.Reverie — which competed with startups like Tonic, Delphix, Mostly AI, Hazy, Gretel.ai, and Cvedia, among others — has a long history of military and defense contracts. In 2019, the company announced a strategic alliance with Booz Allen Hamilton with the introduction of Modzy at Nvidia’s GTC DC conference. Through Modzy — a platform for managing and deploying AI models — AI.Reverie launched a weapons detection model that ostensibly could spot ammunition, explosives, artillery, firearms, missiles, and blades from “multiple perspectives.”

Booz Allen may be kicking itself. Perhaps the wizards at the consulting firm should have purchased AI.Reverie. But Facebook aced out the century-old other people’s business outfit. (Note: I used to labor in the BAH vineyards, and I feel sorry for the individuals who were not enthusiastic about acquiring AI.Reverie. Where did that bonus go?)

Several observations are warranted:

  1. Synthetic data is the ideal dating partner for Snorkel-type machine learning systems (a sketch of the labeling-function idea follows this list)
  2. Some researchers believe that real data is better than synthetic data, but that is a fight like the spats between those who love Windows and those who love macOS
  3. The uptake of “good enough” data for smart statistical systems which aim for 60 percent or better “accuracy” appears to be a mini-trend.
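For item 1, here is a minimal sketch of the labeling-function idea behind Snorkel-type systems: several noisy heuristics vote on each record, and a simple majority produces a “good enough” label. The heuristics and fields are invented, and this is a concept illustration, not the Snorkel library’s actual API.

```python
# Minimal sketch of weak supervision, Snorkel-style: noisy labeling functions
# vote on each record; a majority vote yields a "good enough" label.

from collections import Counter

ABSTAIN, LEGIT, FRAUD = None, 0, 1

def lf_large_amount(row):
    return FRAUD if row["amount"] > 400 else ABSTAIN

def lf_foreign_ip(row):
    return FRAUD if row["foreign_ip"] else ABSTAIN

def lf_small_amount(row):
    return LEGIT if row["amount"] < 100 else ABSTAIN

LABELING_FUNCTIONS = [lf_large_amount, lf_foreign_ip, lf_small_amount]

def weak_label(row):
    votes = [v for v in (lf(row) for lf in LABELING_FUNCTIONS) if v is not ABSTAIN]
    return Counter(votes).most_common(1)[0][0] if votes else ABSTAIN

sample = [
    {"amount": 520.0, "foreign_ip": True},    # two functions vote FRAUD
    {"amount": 45.0, "foreign_ip": False},    # one function votes LEGIT
]
print([weak_label(r) for r in sample])        # -> [1, 0]
```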

Worth watching?

Stephen E Arnold, October 13, 2021

Stanford Google AI Bond?

October 12, 2021

I read “Peter Norvig: Today’s Most Pressing Questions in AI Are Human-Centered.” It appears, based on the interview, that Mr. Norvig will work at Stanford’s Institute for Human-Centered AI.

Here’s the quote I found interesting:

Now that we have a great set of algorithms and tools, the more pressing questions are human-centered: Exactly what do you want to optimize? Whose interests are you serving? Are you being fair to everyone? Is anyone being left out? Is the data you collected inclusive, or is it biased?

These are interesting questions, and ones to which I assume Dr. Timnit Gebru will offer answers.

Will Stanford’s approach to artificial intelligence advance its agenda and address such issues as bias in the Snorkel-type approach to machine learning? Will Stanford and Google expand their efforts to provide the solutions Mr. Norvig describes this way?

You don’t get credit for choosing an especially clever or mathematically sophisticated model, you get credit for solving problems for your users.

Like ads, maybe? Like personnel problems? Like augmenting certain topics for teens? Maybe?

Stephen E Arnold, October 12, 2021

AI: The Answer to Cyberthreats Existing Systems Cannot Perceive?

October 12, 2021

This article from The Next Web gives us reason to hope: “Computer Vision Can Help Spot Cyber Threats with Startling Accuracy.” Researchers at the University of Portsmouth and the University of Peloponnese have combined machine learning with binary visualization to identify malware and phishing websites. Both processes involve patterns of color.

Traditional methods of detecting malware involve searching files for known malicious signatures or looking for suspicious behavior during runtime, both of which have their flaws. More recently, several machine learning techniques have been tried but have run into their own problems. Writer Ben Dickson describes these researchers’ approach:

“Binary visualization can redefine malware detection by turning it into a computer vision problem. In this methodology, files are run through algorithms that transform binary and ASCII values to color codes. … When benign and malicious files were visualized using this method, new patterns emerge that separate malicious and safe files. These differences would have gone unnoticed using classic malware detection methods. According to the paper, ‘Malicious files have a tendency for often including ASCII characters of various categories, presenting a colorful image, while benign files have a cleaner picture and distribution of values.’”

See the article for an illustration of this striking difference. The team trained their neural network to recognize these disparities. It became especially good at spotting malware in .doc and .pdf files, both of which are preferred vectors for ransomware attacks.
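As a rough approximation of the idea in the quoted passage (not the researchers’ actual pipeline), the sketch below maps each byte of a file to a color and writes the result as a small image; the palette is an arbitrary illustrative choice.

```python
# Rough sketch of binary visualization: map each byte of a file to a color and
# write a small square image (plain PPM, no dependencies). The palette is an
# illustrative choice, not the one used in the cited paper.

import math
import sys

def byte_to_rgb(b: int) -> tuple[int, int, int]:
    # Printable ASCII -> green channel, control bytes -> blue, high bytes -> red.
    if 0x20 <= b <= 0x7E:
        return (0, b, 0)
    if b < 0x20:
        return (0, 0, 64 + b * 6)
    return (b, 0, 0)

def visualize(path: str, out_path: str) -> None:
    data = open(path, "rb").read()
    side = max(1, math.isqrt(len(data)) + 1)      # near-square image
    pixels = bytearray()
    for i in range(side * side):
        r, g, b = byte_to_rgb(data[i]) if i < len(data) else (0, 0, 0)
        pixels += bytes((r, g, b))
    with open(out_path, "wb") as f:
        f.write(f"P6 {side} {side} 255\n".encode("ascii"))
        f.write(pixels)

if __name__ == "__main__":
    visualize(sys.argv[1], sys.argv[2])   # e.g. visualize("sample.doc", "sample.ppm")
```

A classifier trained on such images is then doing ordinary computer vision: it looks for the color and texture patterns that, per the paper, tend to differ between benign and malicious files.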

A phishing attack succeeds when a user is tricked into visiting a malicious website that poses as a legitimate service. Companies have used website blacklists and whitelists to combat the problem. However, blacklists can only be updated once someone has fallen victim to a particular site, and whitelists restrict productivity and are time-consuming to maintain. Then there are heuristics, an approach that is more accurate than blacklists but still misses many malicious sites. Here is how the binary visualization – machine learning approach may save the day:

“The technique uses binary visualization libraries to transform website markup and source code into color values. As is the case with benign and malign application files, when visualizing websites, unique patterns emerge that separate safe and malicious websites. The researchers write, ‘The legitimate site has a more detailed RGB value because it would be constructed from additional characters sourced from licenses, hyperlinks, and detailed data entry forms. Whereas the phishing counterpart would generally contain a single or no CSS reference, multiple images rather than forms and a single login form with no security scripts. This would create a smaller data input string when scraped.’”

Again, the write-up shares an illustration of this difference—it would make for a lovely piece of abstract art. The researchers were able to train their neural network to identify phishing websites with an impressive 94% accuracy. Navigate to the article for more details on their methods. The paper’s co-author Stavros Shiaeles says the team is getting its technique ready for real-world applications as well as adapting it to detect malware traffic on the growing Internet of Things.

Cynthia Murrell, October 12, 2021

Models, Models Everywhere: Not a Doubt in Sight

October 7, 2021

In 2017, computers became better at generating and understanding human language when Google researchers introduced the Transformer architecture. Fast Company explains why natural language is important in the article “Ex-Googlers Raise $40 Million To Democratize Natural-Language AI.”

Aidan Gomez, a co-author of the Transformer paper, along with Nick Frosst and Ivan Zhang, started Cohere and raised $40 million in funding. They founded Cohere to commercialize and further develop natural language processing AI. The Cohere team plans to address biases inadvertently baked into AI models trained on flawed datasets. These biases are unfavorable to ethnic minorities and women, basically anyone who is not a white man.

Transformer models need huge amounts of data and computing power to train, so only organizations with supercomputer-scale resources have been able to build high-quality natural language models. The Cohere team wants to democratize NLP models and make them available to organizations that otherwise would not have the funds for the technology. Cohere also wants to keep biases out of its NLP AI:

“To address the risks, Cohere’s engineers have implemented quality control tests to look for any issues with the model before release, and the company continues to monitor its models after launch as well. In addition, Gomez says Cohere will publish ‘data statements,’ which will include information about training data, its limitations, and any risks—a concept first popularized by Gebru. Cohere has also established an external Responsibility Council that will help oversee the safe application of the company’s AI. The company declined to share who is part of the council.”

Frosst, Zhang, and Gomez acknowledge the biases baked into AI technology, but instead of reacting poorly, as Google did with Timnit Gebru, they admit the problem and are actively building a solution. They also have their own company, will probably earn handsome salaries, and will help shape future AI.

Whitney Grace, October 7, 2021

Key Words: Useful Things

October 7, 2021

In the middle of nowhere in the American Southwest, lunch-time conversation turned to surveillance. I mentioned a couple of characteristics of modern smartphones, and people put down their sandwiches. I changed the subject. Later, when a wispy LTE signal permitted, I read “Google Is Giving Data to Police Based on Search Keywords, Court Docs Show.” This is an example of information which I don’t think should be made public.

The write up states:

Court documents showed that Google provided the IP addresses of people who searched for the arson victim’s address, which investigators tied to a phone number belonging to Williams. Police then used the phone number records to pinpoint the location of Williams’ device near the arson, according to court documents. 

I want to point out that any string could contain actionable information; to wit:

  • The name or abbreviation of a chemical substance
  • An address of an entity
  • A slang term for a controlled substance
  • A specific geographic area or a latitude and longitude designation on a Google map.

With data federation and cross correlation, some specialized software systems can knit together disparate items of information in a useful manner.
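Mechanically, the “knitting” amounts to little more than joining records from different sources on shared identifiers. The toy sketch below uses invented data and field names to show the idea; it is not any vendor’s system.

```python
# Toy sketch of data federation / cross-correlation: records from different
# sources are joined on shared identifiers. All data and field names are invented.

keyword_hits = [        # e.g., a search log: who queried a string of interest
    {"query": "123 Main St", "ip": "203.0.113.7"},
]
subscriber_records = [  # e.g., a carrier record: which account used that IP
    {"ip": "203.0.113.7", "phone": "555-0100"},
]
location_pings = [      # e.g., cell records: where that phone was, and when
    {"phone": "555-0100", "cell_site": "Tower-17", "time": "2020-03-04T02:11"},
]

def correlate(hits, subscribers, pings):
    by_ip = {s["ip"]: s for s in subscribers}
    by_phone = {}
    for p in pings:
        by_phone.setdefault(p["phone"], []).append(p)
    linked = []
    for h in hits:
        sub = by_ip.get(h["ip"])
        if sub:
            linked.append({**h, **sub, "pings": by_phone.get(sub["phone"], [])})
    return linked

print(correlate(keyword_hits, subscriber_records, location_pings))
```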

The data and the analytic tools are essential for some government activities. Careless release of such sensitive information has unanticipated downstream consequences. Old-fashioned secrecy has some upsides in my opinion.

Stephen E Arnold, October 7, 2021
