An Algorithm with Unintended Consequences
September 12, 2017
Some of us who follow developments in AI wondered about this: apparently, the algorithm YouTube tasked with eliminating “extremist content” on its platform goes too far. Business Insider reports, “YouTube’s Crackdown on Extremist Content and ISIS Is Also Hurting Researchers and Journalists.” It is a good thing there now exist commercial services that can meet the needs of analysts, researchers, and government officials; many of these services are listed in Stephen E Arnold’s Dark Web Notebook.
In this case, the problem is an algorithm that cannot always distinguish between terrorist propaganda and terrorist coverage. Since the site implemented its new steps to combat terrorist content, several legitimate researchers and journalists have protested that their content was caught in the algorithm’s proverbial net and summarily removed; some of it had been available on the site for years. Reporter Rob Price writes:
Open-source researcher Eliot Higgins says he has had his old videos about Syria deleted and his account was suspended as the Google-owned video platform attempts to tackle material that supports terrorism. Middle East Eye reports that Syrian opposition news site Orient News was also deleted, as was a video uploaded by one of the publication’s own journalists. ‘YouTube has now suspended my account because of videos of Syria I uploaded 2-3 years ago. Nice anti-ISIS AI you’ve got there, YouTube,’ Higgins tweeted on Saturday. ‘Ironically, by deleting years-old opposition channels YouTube is doing more damage to Syrian history than ISIS could ever hope to achieve.’ In another incident, a video from American journalist Alexa O’Brien’s video that was used in Chelsea Manning’s trial was deleted, according to Middle East Eye.
Higgins, whose account has since been reinstated, has an excellent point: ultimately, tools that destroy important documentation along with propaganda are counterproductive. Yes, algorithms are faster (and cheaper) than human workers. But do we really want to sacrifice first-hand footage of crucial events for the sake of speedy sanitization? There must be a better way.
Cynthia Murrell, September 12, 2017
Is China the New Los Angeles Trend Machine?
August 28, 2017
I was last in China in 2007 and then in Hong Kong in 2010. My information is, therefore, out of date. That’s no big whoop for me, since I am ready to tally 74 years in our thrilling world.
I read “In China You Now Have to Provide Your Real Identity If You Want to Comment Online.” The main point of the write up is that the free and open Internet is going the way of the dodo. The goal of “real name registration” is to make it easy for certain officials to track down individuals without the expensive, time-consuming, and sometimes messy “traditional” identity investigations.
I noted this passage:
So what exactly constitutes forbidden topics on the Chinese internet? An unnamed CAC official told a journalist the following when asked about the new rules (first translated by The Diplomat):
- opposing the principles of the constitution of China
- endangering national security, revealing state secrets, subverting state power, and undermining national reunification
- damaging national honor and interests
- inciting national hatred, ethnic discrimination, and undermining national unity
- undermining the state’s policies on religion or promoting cults and feudal superstitions
- spreading rumors or disrupting social order
- spreading obscenity, pornography, violence, or terror, or abetting a crime
- insulting or slandering others and infringing upon the lawful rights and interests of others
- violating any other laws and regulations
My reaction to the write up is that censorship, China-style, may be the latest trend to emerge from the Middle Kingdom. Los Angeles on the left coast once generated the “in” fads, which would then roll toward Harrod’s Creek.
My thought is that censorship may be the new black or whatever the hot color is for fall fashion. I am not particularly surprised because similar governmental actions seem to have emerged from the deliberative bodies in Russia, Turkey, and other countries. One African nation state just turned off the Internet, an Iran-style touch.
One idea struck me. Is now the time for individuals to generate an alternative or optional Internet identity? Creating a “legend” or an alternate Internet identity is important. Just ask the person who ran the illegal Dark Web site AlphaBay. The mistake that individual made was to use an identity which was not “clean.”
The procedure for setting up a legend or clean Internet identity is not easy. There are a number of steps. Human mistakes can render a clean identity traceable; that is, dirty. If you are able to verify that you are working for a recognized law enforcement or intelligence entity, you can obtain a legend from the Beyond Search Overflight team. This is our WITSEC Light bundle. More comprehensive legends are also available to qualified LE and intel professionals.
To explore this package, which contains an alias, matching email address, and other necessary elements like a Walmart pay-as-you-go phone, just write darkwebnotebook at yandex dot com. Remember. We verify that you have a legitimate LE or intel role prior to providing the legend, a workable biography, and summary of what one has to do to build out the legend.
Those who do not qualify will have to look elsewhere for a way to deal with censorship constraints in countries other than the US. If the China censorship trend moves outward from that country, more than one online identity may be needed for some operations.
Stephen E Arnold, August 28, 2017
Russia Argues with Encrypted Telegram
August 23, 2017
One reason that the Dark Web flourishes is that it offers people an anonymous, encrypted way to communicate. Governments dislike encrypted services, especially when they are trying to keep an eye on their citizens. The Register explains how Russia is unhappy with encrypted messenger service Telegram: “Encrypted Chat App Telegram Warned By Russian Regulator: ‘Comply Or Goodbye.’”
One hot argument between governments and their citizens is how much leeway the former has to monitor the latter’s communication. Russia is one country with a poor history of respecting its people’s privacy. It currently is very angry with encrypted chat app Telegram. Alexander Zharov, head of communications regulator Roskomnadzor, stated that Telegram is violating Russian legislation because it is not providing any information about its parent company.
Telegram’s parent company only has to complete a questionnaire with information that will be published in the country’s register of service providers. According to the regulator, this is not an attack on encrypted communication. If the questionnaire remains unanswered, Telegram will be banned.
Telegram founder Pavel Durov told newswire Reuters a ban would mean Russian government officials will be entrusting their communications to messenger apps written in other countries.
In playing the nationalism card, Durov cited WhatsApp, Viber, Apple and Google as companies who might carry messages from Russian officials and their friends.
He is skeptical that the regulator is mostly cranky about corporate structure.
The communication bureau and Telegram should stop fighting over the petty red tape. Playing the nationalist card is a good move on Telegram’s part, but why is it so hard to answer a standard questionnaire? If Russia’s security and government officials lose their home-brewed encryption app, would they turn to something not from Mother Russia? This is yet another example of why people use the Dark Web over regular Web services.
Whitney Grace, August 23, 2017
Take a Hint Amazon, Bing Is Not That Great
August 22, 2017
It recently hit the newsstands that Google Home was six times more likely than Amazon Alexa to answer questions. The Inquirer shares more about this development in the article, “Google Home Is Six Times Smarter Than Amazon’s Echo.”
360i conducted a test using their proprietary software that asked Amazon Alexa and Google Home 3,000 questions. We don’t know what the 3,000 questions were, but some of them did involve retail information. Google pulled on its Knowledge Graph to answer questions, while Amazon used Bing for its search. Amazon currently controls 70% of the voice assistant market and has many skills from other manufacturers. Google, however, is limited in comparison:
By comparison, Google Home has relatively few smart home control chops, relying primarily on IFTTT, which is limited in what it can achieve and often takes a long time between request and execution.
Alexa, on the other hand, can carry out native skill commands in a second or two.
The downside of the two, however, is that Google is Google and Amazon is just not as good. If Echo was able to access the Knowledge Graph, Google Music, and control Chromecasts, then it would be unassailable.
Amazon Alexa and Google Home are rivals, and the fact is that one is a better shopper and the other is better at search. While 360i has revealed its results, we need to see the test questions to fully understand how it arrived at the “six times smarter” statement.
Whitney Grace, August 22, 2017
Google Home Still Knows More
August 21, 2017
Amazon has infiltrated our lives as our main shopping destination. Amazon is also trying to become our best friend, information source, and digital assistant via Alexa. Alexa provides a wealth of services, such as scheduling appointments, filling shopping orders, playing music, answering questions, and more. While Amazon Alexa has a steady stream of users, Adweek says, “Google Home Is 6 Times More Likely To Answer Your Questions Than Amazon Alexa.”
The company 360i developed software that would determine which digital assistant was more accurate: Google Home or Amazon Alexa. Apparently Google Home is six times more likely to answer a question than Amazon Alexa. 360i arrived at this conclusion by using its software to ask both devices 3,000 questions. Alexa won when it came to questions related to retail information, but Google Home won overall with its search algorithms.
It’s relatively surprising, considering that RBC Capital Markets projects Alexa will drive $10 billion of revenue to Amazon by 2020—not to mention the artificial intelligence-based system currently owns 70 percent of the voice market.
Amazon might be the world’s largest marketplace, so Alexa would, of course, be the world’s best shopping assistant. The Internet is much larger than shopping, and Google scours the entire Web. What does Amazon use to power Alexa’s searches?
Whitney Grace, August 21, 2017
Analytics for the Non-Tech Savvy
August 18, 2017
I regularly encounter people who say they are too dumb to understand technology. When people tell themselves this, they hinder their ability to learn and are unable to adapt to a society that is growing more dependent on mobile devices, the Internet, and instantaneous information. This is especially harmful for business entrepreneurs. The Next Web explains, “How Business Intelligence Can Help Non-Techies Use Data Analytics.”
The article starts with the statement that business intelligence is changing in a manner equivalent to how Windows 95 made computers more accessible to ordinary people. The technology gatekeeper is being removed. Proprietary software and licenses are expensive, but cloud computing and other endeavors are driving the costs down.
Voice interaction is another way BI is coming to the masses:
Semantic intelligence-powered voice recognition is simply the next logical step in how we interact with technology. Already, interfaces like Apple’s Siri, Amazon Alexa and Google Assistant are letting us query and interact with vast amounts of information simply by talking. Although these consumer-level tools aren’t designed for BI, there are plenty of new voice interfaces on the way that are radically simplifying how we query, analyze, process, and understand complex data.
One important component here is the idea of the “chatbot,” a software agent that acts as an automated guide and interface between your voice and your data. Chatbots are being engineered to help users identify data and guide them into getting the analysis and insight they need.
I see this as smart people making their technology available to the rest of us, and it could augment or even improve businesses. We are on the threshold of this technology becoming commonplace, but does it have practicality attached to it? Many products and services are commonplace, but if they only have flashing lights and whistles, what good are they?
Whitney Grace, August 18, 2017
Google and Microsoft AI Missteps
August 14, 2017
I read an interesting article called “Former Microsoft Exec Reveals Why Amazon’s Alexa Voice Assistant Beat Cortana.” The passage I noted as thought-provoking was this one:
Qi Lu, formerly a Microsoft wizard and now a guru at Baidu, allegedly said in this passage from the Verge’s article:
Lu believes Microsoft and Google “made the same mistake” of focusing on the phone and PC for voice assistants, instead of a dedicated device. “The phone, in my view, is going to be, for the foreseeable future, a finger-first, mobile-first device,” explains Lu. “You need an AI-first device to solidify an emerging base of ecosystems.”
Apparently Lu repeated what I think is a key point:
“The phone, in my view, is going to be, for the foreseeable future, a finger-first, mobile-first device,” explains Lu. “You need an AI-first device to solidify an emerging base of ecosystems.”
Several questions occurred to me:
- Do Google and Microsoft share a similar context for evaluating high value technologies? Perhaps these two companies are more like each other in how they see the world than either is like Amazon?
- Are Google and Microsoft reactive; that is, do the companies act reflexively when figuring out how to apply a magnetic technology?
- Is Amazon’s competitive advantage an ability to think about an interesting technology in terms of the technology’s ability to augment an existing revenue stream and open new revenue streams?
I don’t have the answers to these questions. If Lu is correct, Amazon has done an end run around Google and Microsoft in terms of talking to gizmos. Can Amazon sustain its technological momentum? With Microsoft floundering with Windows 10 and hardware reliability, it is possible that its applied research is mired in the Microsoft management morass. Google, on the other hand, has its hands full with Amazon taking more product search traffic at a time when Google has to figure out how to solve emotional, political, and ideological issues. Need I say “damore”?
Stephen E Arnold, August 14, 2017
A Wonky Analysis of Search Today: The SEO Wizard View
July 24, 2017
I read what one of my goslings described as a “wonky” discussion of search. You will have to judge for yourself, gentle reader. In an era of fake news, I am not sure what to make of a semi-factual, incomplete write up with the title “How Search Reveals the World.” Search does not reveal “the world”; search provides some — note the word “some” — useful information about the behaviors of individuals who run queries or make use of systems like the oh, so friendly Amazon Alexa.
I learned that there are three types of search, and I have to tell you that these points were not particularly original. Here they are:
- Navigational search queries. Don’t think about Endeca’s “guided navigation.” Think about Google Maps, which is going to morph into a publishing platform, a fact not included in the write up which triggered ruffled gosling feathers.
- Information search queries. Ah, now we’re talking. A human types 2.4 words in a search box and feels lucky or just looks at the first few hits on the first search page. Could these hits be ads unrelated or loosely related to the user’s query? Sure, absolutely.
- Transactional search queries. I am not sure what this phrase “transactional search queries” means, but that’s not too surprising. The confusion rests with me when I think of looking for a product like a USB C plug on Amazon versus navigating to my bank’s fine, fine Web site and using a fine, fine interface to move money from Point A to Point B. Close enough for horseshoes.
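To make the three query types listed above a bit more concrete, here is a minimal Python sketch of a rule-based intent labeler. The cue lists and the `classify_query` function are hypothetical and purely illustrative; commercial search systems infer intent from click logs and other behavioral signals, not from a handful of keywords. Note that the transactional bucket lumps together buying a USB C plug and moving money at a bank, the same fuzziness noted in the transactional item.

```python
import re

# Hypothetical cue lists for illustration only; real engines infer intent
# from click logs, sessions, and other behavioral signals.
NAVIGATIONAL_PHRASES = ("directions to", "near me", "log in", "login", "homepage")
TRANSACTIONAL_WORDS = {"buy", "order", "price", "cheap", "deal", "transfer", "pay"}
SITE_PATTERN = re.compile(r"\.(com|org|net)\b")


def classify_query(query: str) -> str:
    """Label a query as navigational, transactional, or informational."""
    text = query.lower()
    tokens = set(re.findall(r"[a-z0-9]+", text))
    # Navigational: the user wants to reach a specific place or site.
    if any(phrase in text for phrase in NAVIGATIONAL_PHRASES) or SITE_PATTERN.search(text):
        return "navigational"
    # Transactional: the user wants to do something, such as buy or pay.
    if tokens & TRANSACTIONAL_WORDS:
        return "transactional"
    # Everything else is treated as a request for information.
    return "informational"


if __name__ == "__main__":
    for q in ("directions to Harrod's Creek", "buy USB C plug", "what is pervasive search"):
        print(f"{q!r}: {classify_query(q)}")
```

Run as a script, the three sample queries come out navigational, transactional, and informational. The rule order matters: a query that names a site is treated as navigational even if it also contains a word like "buy."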
Skimming the surface is good for seaplanes but not a plus for an analysis of search and retrieval.
But the most egregious argument in the write up is that search becomes little more than a rather clumsy manipulative tool for “marketers, advertisers, and business owners.” Why clumsy? The write up is happily silent about Facebook’s alleged gaming of its system for various purposes. Filtering hate speech, for example, seems admirable until someone has to define “hate speech.” Filtering live streaming of a suicide or crime in progress is a bit more problematic. But search is a sissy compared with the alleged Facebook methods. With marketers looking to make a buck, Facebook seems to slip the papier-mâché noose of the write up’s argument.
But there is a far larger omission. One of the most important types of search is “pervasive, predictive search.” The idea is a nifty one. Using various “signals” a system presents information automatically to a user who is online and looking at an output. No specific action on the part of the user is required. The user sees what he or she presumably wants. Search without search! The marketer’s Holy Grail.
There are some important components of this type of search.
Perhaps an SEO expert will explain them instead of recycling old information and failing to define 33 percent of the bedrock statements. But that may be a bridge too far for those who would try to manipulate the systems and methods of some of the providers of free, ad supported search systems. The longest journey begins with a single step. Didn’t an SEO expert say that too?
Stephen E Arnold, July 24, 2017
AI Feeling a Little Sentimental
July 24, 2017
Big data was one of the popular buzzwords a couple years ago, but one conundrum was how organizations were going to use all that mined data. One answer has presented itself: sentiment analysis. Science shares the article, “AI In Action: How Algorithms Can Analyze The Mood Of The Masses” about how artificial intelligence is being used to gauge people’s emotions.
Social media presents a constant stream of emotional information about products, services, and places that could be useful to organizations. The problem in the past was that no one knew how to fish all of that useful information out of the social media Web sites and make it usable. By using artificial intelligence algorithms and natural language processing, data scientists are finding associations between words, the language used, posting frequency, and more to determine everything from a person’s mood to their personality, income level, and political associations.
‘There’s a revolution going on in the analysis of language and its links to psychology,’ says James Pennebaker, a social psychologist at the University of Texas in Austin. He focuses not on content but style, and has found, for example, that the use of function words in a college admissions essay can predict grades. Articles and prepositions indicate analytical thinking and predict higher grades; pronouns and adverbs indicate narrative thinking and predict lower grades…’Now, we can analyze everything that you’ve ever posted, ever written, and increasingly how you and Alexa talk,’ Pennebaker says. The result: ‘richer and richer pictures of who people are.’
AI algorithms can take a person’s online social media accounts and construct more than a digital fingerprint of that person. The algorithms act like digital mind readers and recreate a person based on the data he or she publishes.
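Pennebaker’s point is that style, not content, carries the signal: the share of articles and prepositions versus pronouns and adverbs. A minimal Python sketch of that idea follows. The tiny word lists, the `style_profile` function, and the `analytic_score` formula are all hypothetical stand-ins for illustration; they are not Pennebaker’s validated dictionaries or his published measure.

```python
import re

# Hypothetical, deliberately tiny word lists; research tools rely on large,
# validated dictionaries instead.
CATEGORIES = {
    "articles": {"a", "an", "the"},
    "prepositions": {"of", "in", "on", "at", "by", "for", "with", "from", "into", "about"},
    "pronouns": {"i", "me", "my", "we", "us", "our", "you", "your",
                 "he", "she", "him", "her", "it", "they", "them", "their"},
    "adverbs": {"very", "really", "just", "then", "actually", "always", "never"},
}


def style_profile(text: str) -> dict[str, float]:
    """Share of tokens in each function-word category (style, not content)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens) or 1  # avoid division by zero on empty input
    return {
        name: sum(tok in words for tok in tokens) / total
        for name, words in CATEGORIES.items()
    }


def analytic_score(text: str) -> float:
    """Crude hypothetical index: positive leans analytical, negative narrative."""
    p = style_profile(text)
    return (p["articles"] + p["prepositions"]) - (p["pronouns"] + p["adverbs"])


if __name__ == "__main__":
    print(analytic_score("The analysis of the data in the report"))
    print(analytic_score("I really think we just went home and then I slept"))
```

The first sample sentence scores positive (articles and prepositions dominate); the second scores negative (pronouns and adverbs dominate). Note that none of the content words affect the score at all.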
Whitney Grace, July 24, 2017
IBM Watson: Predicting the Future
July 12, 2017
I enjoy IBM’s visions of the future, with one exception: the company’s revenue estimates for the Watson product line. I read “IBM Declares AI the Key to Making Unstructured Data Useful.” For me, the “facts” in the write up are a bit like a Payday candy bar. Some nuts squished into a squishy core of questionable nutritional value.
I noted this factoid:
80 percent of company data is unstructured, including free-form documents, images, and voice recordings.
I have been interested in the application of the 80-20 rule to certain types of estimates. The problem is that the “principle of factor sparsity” gets disconnected from the underlying data. Generalizations are just so darned fun and easy. The problem is that the mathematical rigor necessary to validate the generalization is just too darned much work. The “hey, I’ve got a meeting” or the more common “I need to check my mobile” get in the way of figuring out if the 80-20 statement makes sense.
My admittedly inept encounters with data suggest that the volume of unstructured data is high, higher than the 80 percent in the rule. The problem is that today’s systems struggle to:
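The validation step that gets skipped can be sketched in a few lines of Python: sort the items, take the top 20 percent, and see what share of the total they actually hold. The `top_share` function and the sample figures below are hypothetical; the point is only that an 80-20 claim is testable against real data rather than assumed.

```python
import math


def top_share(values: list[float], top_fraction: float = 0.2) -> float:
    """Return the fraction of the total contributed by the largest items.

    With top_fraction=0.2 this asks the 80-20 question directly: do the
    top 20 percent of items really account for about 80 percent of the total?
    """
    if not values:
        raise ValueError("need at least one value")
    ordered = sorted(values, reverse=True)
    # Round up so small samples still include at least one item.
    k = max(1, math.ceil(len(ordered) * top_fraction))
    total = sum(ordered)
    return sum(ordered[:k]) / total if total else 0.0


if __name__ == "__main__":
    skewed = [80, 5, 5, 5, 5]    # hypothetical data that follows the rule
    flat = [20, 20, 20, 20, 20]  # hypothetical data that does not
    print(top_share(skewed), top_share(flat))
```

The skewed sample yields 0.8 and the flat one 0.2. Rounding `k` up is a design choice that slightly overstates concentration for very small samples.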
- Make sense of massive streams of unstructured data from outfits like YouTube, clear text and encrypted text messages, and the information blasted about on social media
- Identify the important items of content directly germane to a particular matter
- Figure out how to convert content processing into useful elements like named entities and relate those entities to code words and synonyms
- Perform cost effective indexing of content streams in near real time.
At this time, systems designed to extract actionable information from relatively small chunks of content are improving. But these systems typically break down when the volume exceeds the budget and computing resources available to those trying to “make sense” of the data in a finite amount of time. This type of problem is difficult due to constraints on the systems. These constraints are financial as in “who has the money available right now to process these streams?” These constraints are problematic when someone asks “what do we do with the data in this dialect from northern Afghanistan?” And there are other questions.
My problem with the IBM approach is that the realities of volume, interrelating structured and semi-structured data, and multilingual content are bumps in the information superhighway, and marketing fluffiness absorbs them as Watson seems to speed along.
I loved this passage:
Chatterjee highlighted Macy’s as an example of an IBM customer that’s using the company’s tools to better personalize customers’ shopping experiences using AI. The Macy’s On Call feature lets customers get information about what’s in stock and other key details about the contents of a retail store, without a human sales associate present. It uses Watson’s natural language understanding capabilities to process user queries and provide answers. Right now, that feature is available as part of a pilot in 10 Macy’s stores.
Yep, I bet that Macy’s is going to hit a home run against the fastball pitching of Jeff Bezos’ Amazon Prime team. Let’s ask Watson. On the other hand, let’s ask Alexa.
Stephen E Arnold, July 12, 2017