Algorithms Are Neutral. Well, Sort of Objective Maybe?

October 12, 2018

I read “Amazon Trained a Sexism-Fighting, Resume-Screening AI with Sexist Hiring data, So the Bot Became Sexist.” The main point is that if the training data are biased, the smart software will be biased.

No kidding.

The write up points out:

There is a “machine learning is hard” angle to this: while the flawed outcomes from the flawed training data was totally predictable, the system’s self-generated discriminatory criteria were surprising and unpredictable. No one told it to downrank resumes containing “women’s” — it arrived at that conclusion on its own, by noticing that this was a word that rarely appeared on the resumes of previous Amazon hires.
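The dynamic the excerpt describes is easy to reproduce in miniature. The sketch below uses an invented five-resume dataset; nothing here is Amazon's actual data or code. It scores each token by how much more often it appears in "hired" resumes than "rejected" ones, and a token like "women's" that never shows up among past hires picks up a negative score, so any resume containing it gets downranked automatically.

```python
from collections import Counter
import math

# Toy historical data, invented for illustration only.
hired = ["chess club captain", "software engineer", "chess engineer"]
rejected = ["women's chess club captain", "women's college software"]

def token_log_odds(pos_docs, neg_docs):
    """Laplace-smoothed log-odds of each token appearing in hired vs. rejected docs."""
    pos = Counter(tok for d in pos_docs for tok in d.split())
    neg = Counter(tok for d in neg_docs for tok in d.split())
    vocab = set(pos) | set(neg)
    n_pos, n_neg = sum(pos.values()), sum(neg.values())
    return {
        t: math.log((pos[t] + 1) / (n_pos + len(vocab)))
         - math.log((neg[t] + 1) / (n_neg + len(vocab)))
        for t in vocab
    }

scores = token_log_odds(hired, rejected)
# "women's" never appears in the hired set, so it receives the most
# negative score: the model penalizes it without being told to.
print(sorted(scores, key=scores.get)[:1])  # -> ["women's"]
```

No one hard-coded the bias; it falls out of the word counts, which is exactly the "surprising but predictable" behavior the article describes.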

Now the company discovering that its smart software became automatically biased was Amazon.

That’s right.

The same Amazon which has invested significant resources in its SageMaker machine learning platform. This is part of the infrastructure which, Amazon hopes, will propel the US Department of Defense forward for the next five years.

Hold on.

What happens if the system and method produces wonky outputs when a minor dust up is automatically escalated?

Discriminating in hiring is one thing. Fluffing a global matter is another.

Do the smart software systems from Google, IBM, and Microsoft have similar tendencies? My recollection is that this type of “getting lost” has surfaced before. Maybe those innovators pushing narrowly scoped rule-based systems were on to something?

Stephen E Arnold, October 12, 2018

Smart Software: There Are Only a Few Algorithms

September 27, 2018

I love simplicity. The write up “The Algorithms That Are Currently Fueling the Deep Learning Revolution” certainly makes deep learning much simpler. Hey, learn these methods and you too can fire up your laptop and chop Big Data down to size. Put digital data into the digital juicer and extract wisdom.

Ah, simplicity.

The write up explains that there are four algorithms that make deep learning tick. I like this approach because it does not require one to know what “deep learning” means. That’s a plus.

The algorithms are:

  • Back propagation
  • Deep Q Learning
  • Generative adversarial network
  • Long short term memory
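None of these labels is a single formula. Take the first one: even a one-weight "network" needs a forward pass, a loss, a derivative, and an update rule before back propagation does anything. A minimal sketch, not any particular framework's implementation:

```python
# Minimal back propagation sketch: one weight, squared-error loss.
# Illustrative only; real networks repeat this across millions of weights.
def train(xs, ys, lr=0.1, epochs=100):
    w = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            pred = w * x               # forward pass
            grad = 2 * (pred - y) * x  # backward pass: dLoss/dw
            w -= lr * grad             # update step
    return w

# Learn y = 3x from four data points.
w = train([1, 2, 3, 4], [3, 6, 9, 12])
print(round(w, 3))  # converges to 3.0
```

Scale this up to millions of weights, add the data validation and compute the write up glosses over, and the "just four algorithms" pitch looks less breezy.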

Are these algorithms or are these suitcase words?

The view from Harrod’s Creek is that once one looks closely at these phrases one will discover multiple procedures, systems and methods, and math slightly more complex than tapping the calculator on one’s iPhone to get a sum. There is, of course, the issue of data validation, bandwidth, computational resources, and a couple of other no-big-deal things.

Be a deep learning expert. Easy. Just four algorithms.

Stephen E Arnold, September 27, 2018

IBM Embraces Blockchain for Banking: Is Amazon in the Game Too?

September 9, 2018

IBM recently announced the creation of LedgerConnect, a Blockchain-powered banking service. This is an interesting move for a company that previously seemed to waver on whether it wanted to associate with this technology most famous for its links to cryptocurrency. However, the pairing actually makes sense, as we discovered in a recent IT Pro Portal story, “IBM Reveals Support Blockchain App Store.”

According to an IBM official:

“On LedgerConnect financial institutions will be able to access services in areas such as, but not limited to, know your customer processes, sanctions screening, collateral management, derivatives post-trade processing and reconciliation and market data. By hosting these services on a single, enterprise-grade network, organizations can focus on business objectives rather than application development, enabling them to realize operational efficiencies and cost savings across asset classes.”

This is in addition to recent news that some of the biggest banks on the planet are already using Blockchain for a variety of needs. This includes the story that the Agricultural Bank of China has started issuing large loans using the technology. In fact, out of the 26 publicly owned banks in China, nearly half are using Blockchain. IBM looks pretty conservative when you think of it like that, which is just where IBM likes to be.

Amazon supports Ethereum, HyperLedger, and a host of other financial functions. For how long? Years.

Patrick Roland, September 9, 2018

Algorithms Can Be Interesting

September 8, 2018

Navigate to “As Germans Seek News, YouTube Delivers Far-Right Tirades” and consider the consequences of information shaping. I have highlighted a handful of statements from the write up to prime your critical thinking pump. Here goes.

I circled this statement in true blue:

…[a Berlin-based digital researcher] scraped YouTube databases for information on every Chemnitz-related video published this year. He found that the platform’s recommendation system consistently directed people toward extremist videos on the riots — then on to far-right videos on other subjects.

I noted:

[The researcher] found that the platform’s recommendation system consistently directed people toward extremist videos on the riots — then on to far-right videos on other subjects.

The write up said:

A YouTube spokeswoman declined to comment on the accusations, saying the recommendation system intended to “give people video suggestions that leave them satisfied.”

The newspaper story revealed:

Zeynep Tufekci, a prominent social media researcher at the University of North Carolina at Chapel Hill, has written that these findings suggest that YouTube could become “one of the most powerful radicalizing instruments of the 21st century.”

With additional exploration, the story asserts a possible mathematical idiosyncrasy:

… The YouTube recommendations bunched them all together, sending users through a vast, closed system composed heavily of misinformation and hate.

You may want to read the original write up and consider the implications of these interesting numerical recipes’ behavior.

Smart Software: Just Keep Adding Layers between the Data and the Coder

September 6, 2018

What could be easier? Clicking or coding.

Give up. Clicking wins.

A purist might suggest that training smart software requires an individual with math and data analysis skills. A modern hippy dippy approach is to suggest that pointing and clicking is the way of the future.

Amazon is embracing that approach and other firms are too.

I read “Baidu Launches EZDL, an AI Model Training Platform That Requires No Coding Experience.” Even Baidu, based in China where technical talent is slightly more abundant than in Harrod’s Creek, Kentucky, is on the bandwagon.

I learned:

Baidu this week launched an online tool in beta — EZDL — that makes it easy for virtually anyone to build, design, and deploy artificial intelligence (AI) models without writing a single line of code.

Why slog through courses? Point and click. The future.

There’s not much detail in the write up, but this passage conveys the general idea of what’s up:

To train a model, EZDL requires 20-100 images, or more than 50 audio files, assigned to each label, and training takes between 15 minutes and an hour. (Baidu claims that more than two-thirds of models get accuracy scores higher than 90 percent.) Generated algorithms can be deployed in the cloud and accessed via an API, or downloaded in the form of a software development kit that supports iOS, Android, and other operating systems.

Oh, oh. The “API” buzzword is in the paragraph, so life is not completely code free.

Baidu, like Amazon, has a bit of the competitive spirit. The write up explains:

Baidu’s made its AI ambitions clear in the two years since it launched Baidu Brain, its eponymous platform for enterprise AI. The company says more than 600,000 developers are currently using Brain 3.0 — the newest version, released in July 2018 — for 110 AI services across 20 industries.

What could go wrong? Nothing, I assume. Absolutely nothing.

Stephen E Arnold, September 6, 2018

Online with Smart Software, Robots, and Obesity

September 1, 2018

I recall a short article called “A Starfish-Killing, Artificially Intelligent Robot Is Set to Patrol the Great Barrier Reef.” The story appeared in 2016. I clipped this item a few days ago: “Centre for Robotic Vision Uses Bots to Cull Starfish.” The idea is that environmental protection becomes easier with killer robots.

Now combine that technology application with “Artificial Intelligence Spots Obesity from Space.” The main idea is that smart software can piece together items of data to figure out who is fat and where fat people live.

What happens if a clever tinkerer hooks together robots which can take action to ensure termination with smart software able to identify a target?

I mention this technology confection because the employees who object to an employer’s technology may be behind the curve. The way technology works is that innovations work a bit like putting Lego blocks together. Separate capabilities can be combined in interesting ways.

Will US employees’ refusal to work on certain projects act like a stuck brake on a rental car?

Worth thinking about before a killer satellite identifies a target and makes an autonomous decision about starfish or other entities. Getting online has interesting applications.

Why search when one can target via algorithms?

Stephen E Arnold, September 1, 2018

A Glimpse of Random

August 30, 2018

I found “The Unreasonable Effectiveness of Quasirandom Sequences” interesting. Random number generators are important to certain cyber analytics systems. The write up puts the spotlight on the R2 method. Without turning a blog post into a math lesson, I want to suggest that you visit the source document and look at how different approaches to random number generation appear when graphed.

My point is that the selection of a method and then the decision to seed a method with a particular value can have an impact on how the other numerical recipes behave when random numbers are fed into a process. The outputs of a system in which the user has great confidence may, in fact, be constructs and one way to make sense of data. What’s this mean? Pinpointing algorithmic “bias” is a difficult job. It is often useful to keep in mind that decisions made by a developer or math whiz for what seems like a no brainer process can have a significant impact on outputs.
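For the curious, the R2 sequence highlighted in the source post takes only a few lines to generate. The constant below is the “plastic” generalization of the golden ratio from that post; the seed argument is exactly the kind of quiet developer choice, discussed above, that shifts every downstream output:

```python
# R2 quasirandom sequence sketch, after the post cited above.
# g is the "plastic constant," the real root of x**3 = x + 1.
g = 1.32471795724474602596
a1, a2 = 1 / g, 1 / g ** 2

def r2_sequence(n, seed=0.5):
    """First n points of the 2-D R2 low-discrepancy sequence."""
    return [((seed + a1 * i) % 1, (seed + a2 * i) % 1) for i in range(1, n + 1)]

pts = r2_sequence(5)
for x, y in pts:
    print(f"{x:.3f} {y:.3f}")
```

Change the seed and every point moves, yet the sequence still looks perfectly plausible when graphed, which is why a downstream analyst may never notice the choice was made.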

Stephen E Arnold, August 30, 2018

Mathematical Recipes Revealed: Oh, Oh, Trouble

August 26, 2018

I don’t read the Times Literary Supplement. When I worked in London, I was able to flip through the printed version. In Harrod’s Creek, nope. I did spot a link to an essay with the snappy title “God Is in the Machine.” I took a look.

The write up belongs to the genre of non fiction essays which I call “Yep, that’s all there is.”

The focus is how algorithms work and why some are simple and others are complicated.

Think of the essay as explaining how math works to people who know right off the starting block who Eratosthenes was.

The main point of the first chunk of the write up is that algorithms are recipes, procedures which are implemented one at a time. The input yields an output.

The guts of the argument surface in this passage, attributed to a real algorithm wizard:

The researcher knew, of course, what data he’d fed into the process. He knew why he’d designed it, the problem it was trying to solve and the outputs that it produced. However, after he’d been trying to explain it for over an hour, he sat back in his chair, exhausted. “Yes, as you can see, the gap between input and output is difficult to understand,” he said. He’d flooded the algorithm with a huge amount of information, “a trend”, he said, because in the tech giant he could, and everyone did. But the amount of data meant it was hard to tell what the salient inputs within it were. “From a human perspective you’re not sure which of the inputs is significant; it’s hard to know what is actually driving the outputs. It’s hard to trace back, as a human, to know why a decision was made.”

The complexity emerges when:

  1. Algorithms are stuck together
  2. Data (which may or may not be consistent, accurate, or timely) are stuffed into the numerical recipe as “inputs”
  3. Outputs emerge which may or may not be what the user understands, wants, or can use.

The complexity is manageable if the creator or numerical poets are what the essay calls “rigorous.” Is rigor possible in Silicon Valley with professionals who focus on mobile phones, laptops, and lunch options?

Where’s this going?

Not surprisingly, I will have to read a forthcoming book called The Death of the Gods. Like other clarion calls, it argues that numerical recipes can now do what humans once thought required sufficient education, experience, and judgment: algorithms are the future.

Questions I want to toss out when I meet with my research team next week: What if the algorithms are already in charge? Are search results objective? Can you explain why some data are not available from commercial sources? What control do you have over content when ads and “information” are freely mixed?

Perhaps the numerical recipe mechanisms are locked and loaded and firing millions of times a day? What if few hear, know, or understand that the big guns are blazing without sound or a flash? What if people do not care?

Stephen E Arnold, August 26, 2018

Can IBM Watermark Neural Networks?

August 8, 2018

Leave it to IBM to figure out how to put their stamp on their AI models. Of course, as with other intellectual property, AI code can be stolen, so this is a welcome development for the field. In the article, “IBM Patenting Watermark Technology to Protect Ownership of AI Models” at Neowin, we learn the technology is still in development, and the company hasn’t even implemented it in-house yet. However, if all goes well, the technology may find its way into customer products someday. Writer Usama Jawad reports:

“IBM says that it showcased its research regarding watermarking models developed by deep neural networks (DNNs) at the AsiaCCS ’18 conference, where it was proven to be highly robust. As a result, it is now patenting the concept, which details a remote verification mechanism to determine the ownership of DNN models using simple API calls. The company explains that it has developed three watermark generation algorithms…”

These use different methods; specifically:

  • Embedding meaningful content together with the original training data as watermarks into the protected DNNs,
  • Embedding irrelevant data samples as watermarks into the protected DNNs,
  • Embedding noise as watermarks into the protected DNNs.

We learned:

“IBM says that in its internal testing using several datasets such as MNIST, a watermarked DNN model triggers an ‘unexpected but controlled response’.”
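All three variants boil down to the same trick: slip a secret "trigger set" into the training data so the finished model answers those inputs in a way only the owner can predict, then verify ownership remotely by querying them. Here is a toy sketch in which a nearest-neighbor classifier stands in for a DNN; the data, labels, and function names are invented for illustration, not IBM's method:

```python
# Trigger-set watermarking sketch. A 1-nearest-neighbor "model" stands in
# for a deep neural network; training is just memorizing the samples.
def train(samples):  # samples: list of (feature_vector, label)
    return list(samples)

def predict(model, x):
    # Return the label of the closest training sample (squared distance).
    return min(model, key=lambda s: sum((a - b) ** 2 for a, b in zip(s[0], x)))[1]

normal = [((0.0, 0.0), "cat"), ((1.0, 1.0), "dog")]
# Watermark: out-of-distribution inputs deliberately given owner-chosen
# labels (the "irrelevant data samples as watermarks" idea).
triggers = [((9.0, 9.0), "cat"), ((-9.0, 9.0), "dog")]

model = train(normal + triggers)

def verify_ownership(model, triggers):
    """Remote check via queries: does the model give the secret responses?"""
    return all(predict(model, x) == label for x, label in triggers)

print(verify_ownership(model, triggers))  # True only for the watermarked model
```

A model trained without the trigger set fails the check, which is the "unexpected but controlled response" IBM describes.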

Jawad notes one drawback so far: though the software works well online, it still fails to detect ownership when a model is deployed internally. From another article, “IBM Came Up With a Watermark for Neural Networks” at TheNextWeb, we spotted an interesting tidbit: writer Tristan Greene points out a distinct lack of code bloat from the watermark. This is an important factor in neural networks, which can be real resource hogs.

For more information, you may want to see IBM’s blog post on the subject or check out the associated research paper. Beyond Search wonders which smart software developers will use these techniques. Amazon, Facebook, Google, Oracle, Palantir Technologies? Universities with IBM research support may be more likely candidates, but that is, of course, speculation from rural Kentucky.

Cynthia Murrell, August 8, 2018

Insurance Risk? Let an Algorithm Decide

August 7, 2018

Perhaps Big Data will save us from the vexing problem of credit reports, in one industry at least. The SmartDataCollective posits, “Is Big Data Causing Insurance Actuaries to Move Away from Using Credit Scores?” For twenty-some-odd years, insurance companies have relied on credit scores to assess risks and set premiums. Whether bad credit really means someone is more likely to, say, get into a car accident is debatable, but no matter. It seems some actuaries now think predictive analytics will provide better gauges, but we suggest that could lead to a larger and more complex can of worms. What data do they consider, and what conclusions do they draw? I doubt we can expect much transparency here.

Writer Annie Qureshi explores why the use of credit scores by insurance agencies is problematic, then describes:

“This is why insurers are using big data to make more nuanced decisions about the credit risks that their customers present. They may find that certain variables that are incorporated into credit scoring algorithms overstate a customer’s dependability. A customer could have a high credit score, because they have made the vast majority of their payments on time over the past seven years and have used little of their debt. However, they may have recently started using more of their credit card debt and missed three of the last seven payments on their existing insurance policy. This could be an indication that they have recently suffered a job loss or other financial setback, which is not reflected in their current credit score. There are other reasons that insurers are skeptical of using credit scores in the age of big data. One analysis shows that big data has helped insurers recognize that credit-based insurance policies are increasing the risk of unjust racial profiling.”

Indeed, but at the moment the data analytics field is suffering its own bias crisis (though a solution may be at hand). It will be interesting to see where this goes. Meanwhile, many of us would do well to be more careful what details we share online, since we cannot be sure how any tidbit may be used against us down the line.

Cynthia Murrell, August 7, 2018
