Smart Software and Clever Humans

September 23, 2018

Online translation works pretty well. If you want 70 to 85 percent accuracy, you are home free. Most online translation systems handle routine communications like short blog posts written in declarative sentences and articles written in technical jargon just fine. Stick to mainstream languages, and the services work okay.

But if you want an online system to translate my pet phrases like HSSCM or azure chip consultant, you have to attend more closely. HSSCM refers to the way in which some Silicon Valley outfits run their companies. You know. Like a high school science club which decides that proms are for goofs and football players are not smart. The azure chip thing refers to consulting firms which lack the big time reputation of outfits like Bain, BCG, Booz, etc. (Now don’t get me wrong. The current incarnations of these blue chip outfits are far from stellar. Think questionable practices. Maybe criminal behavior.) The azure chip crowd means second string, maybe third string, knowledge work. Just my opinion, but online translation systems don’t get my drift. My references to Harrod’s Creek are geocoding nightmares when I reference squirrel hunting and bourbon in cereal. Savvy?

I was, therefore, not surprised when I read “AI Company Accused of Using Humans to Fake Its AI.” The main point seems to be:

[An] interpreter accuses leading voice recognition company of ripping off his work and disguising it as the efforts of artificial intelligence.

There are rumors that some outfits use Amazon’s far from mechanical Turk or just use regular employees who can translate that which baffles the smart software.

The person making the allegation, a human formerly disguised as smart software, offered this information to Sixth Tone, a blog publishing the article:

In an open letter posted on Quora-like Q&A platform Zhihu, interpreter Bell Wang claimed he was one of a team of simultaneous interpreters who helped translate the 2018 International Forum on Innovation and Emerging Industries Development on Thursday. The forum claimed to use iFlytek’s automated interpretation service.

Trust me, you zippy millennials, smart software can be fast. It can be efficient. It can be less expensive than manual methods. But it can be wrong. Not just off base. Playing a different game with expensive Ronaldo types.

Why not run this blog post through Google Translate and check out the French or Spanish the system produces? Better yet, aim the system at a poor quality surveillance video or a VoIP call laden with insider talk between a cartel member and the Drug Llama?

Stephen E Arnold, September 23, 2018

Natural Language Processing: Brittle and Spurious

August 24, 2018

I read “NLP’s Generalization Problem, and How Researchers Are Tackling It.” From my vantage point in rural Kentucky, the write up seems to say, “NLP does not work particularly well.”

For certain types of content in which terminology is constrained, NLP systems work okay. But, like clustering, the initial assignment of any object determines much about the system. Problem examples range from jargon to code words to phrases which are aliases. NLP systems struggle even in a single language system.
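The clustering comparison can be made concrete with a minimal sketch. It assumes plain k-means on made-up one-dimensional numbers; the function name kmeans_1d, the data, and the starting centers are illustrative and do not come from the write up. The only point is that the same data, started from two different initial centers, settles into two different groupings.

```python
def kmeans_1d(points, centers, max_iter=100):
    """Plain Lloyd-style k-means on 1-D data, starting from the given centers."""
    centers = list(centers)
    labels = []
    for _ in range(max_iter):
        # Assign each point to the index of its nearest center.
        labels = [
            min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            for p in points
        ]
        # Recompute each center as the mean of the points assigned to it.
        new_centers = []
        for i in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == i]
            new_centers.append(sum(members) / len(members) if members else centers[i])
        if new_centers == centers:
            break  # assignments have stopped changing
        centers = new_centers
    return centers, labels


# Hypothetical data: three tight groups, but we only ask for two clusters.
points = [0, 1, 2, 10, 11, 12, 20, 21, 22]

centers_a, labels_a = kmeans_1d(points, [0, 10])   # one starting guess
centers_b, labels_b = kmeans_1d(points, [12, 20])  # a different starting guess

# Is the middle value (11) grouped with the lowest value (0)?
middle_with_low_a = labels_a[4] == labels_a[0]
middle_with_low_b = labels_b[4] == labels_b[0]
print(centers_a, middle_with_low_a)
print(centers_b, middle_with_low_b)
```

In the first run the middle group is lumped with the high values; in the second it is lumped with the low values. Nothing about the data changed, only the starting point.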

The write up provides interesting examples of NLP failure.

The fixes, alas, are not likely to deliver the bacon any time soon. Yep, “bacon” means a technical breakthrough. NLP systems struggle with this type of utterance. I refer to local restaurants as the nasty caballero, which is my way of saying “the local Mexican restaurant on the river.”

I like the suggestion that NLP systems should use common sense. Isn’t that the method that AskJeeves tried when it allegedly revolutionized NLP question answering? The problem, of course, was the humans had to craft rules and that took money, time, and even more money.

The suggestion to “Evaluate unseen distributions and unseen tasks” is interesting as well. The challenge is the one that systems like IBM Watson face. Humans have to make decisions about dicey issues like clustering, then identify relevant training data, and index the text with metadata.

Same problem: Time and money.

For certain applications, NLP can be helpful. For other types of content comprehension, one ends up with the problem of getting Gertie (the NLP system) up and running. Then after a period of time (often a day or two), hooking Gertie to the next Star Trek innovation from Sillycon Valley.

How do you think NLP systems handle my writing style? Let’s ask some NLP systems. DR LINK? IBM Watson? Volunteers?

Stephen E Arnold, August 24, 2018

The Social Vendor ATM: Governments Want to Withdraw Cash

August 21, 2018

I read “Social Networks to Be Fined for Hosting Terrorist Content.” My first reaction is, “Who is going to define terrorist content?” Without an answer swirling into my mind, I looked to the article for insight.

I learned:

… the EC’s going to follow through on threats to fine companies like Twitter, Facebook and YouTube for not deleting flagged content post-haste. The commission is still drawing up the details…

I assume that one of the details will be a definition of terrorist content.

How long will a large, mostly high school science club type company have to remove the identified content?

The answer:

One hour for platforms to delete terrorist content.

My experience, though hardly representative, is that it is difficult to get much accomplished in one hour in my home office. A 60 minute turnaround time may be as challenging for a large outfit operating under the fluid principles of high school science club management.

Programmers sort of work in a combination of intense focus and general confusion. My hunch is that it may be difficult to saddle up the folks at a giant social vendor to comply with a take down request in 3,600 seconds.

My thought is that the one hour response time may be one way to get the social media ATM to eject cash.

By the way, some of Google’s deletion success can be viewed at this page on YouTube. Note that there are some interesting videos which are not deleted. One useful way to identify some interesting videos is to search for the word “nashid” or “nasheed.”

The results list seems to reveal at least one facet of terrorism’s definition.

Stephen E Arnold, August 21, 2018

Chatbots: Yak, Yak, Yak

May 24, 2018

We want to keep an open mind about smart software and the go-to application designed to terminate the folks with thrilling phone and email customer support jobs.

Just the name “chatbot” is likely to elicit eye rolls from readers. While we have frequently been told these online oddities will be stepping up into the big leagues of usability, they don’t seem to have really found their niche. That’s what made it all the more surprising when their creators began demanding a little respect in a recent Qrius piece, “Chatbots Deserve More Than Being a Joke, Here’s Why.”

“In the most successful (and useful) applications we were able to schedule meetings and order pizza. …

“[But] We remember the failures. And when Microsoft’s Tay turned into a racist within 24 hours of release, we all laughed. If one of the biggest technology companies in existence couldn’t prevent a chatbot from becoming an anti-semite, what hope was there for the technology writ large?”

The reason we remember the failures and not the successes is because the benefits of one are outweighed by the regret of the other. However, more and more businesses are aiming to change this. Forbes recently reported on how AI was helping make chatbots more useful (go figure!). It’s a compelling point and maybe one that is finally on the verge of becoming relevant. Relevant is not the same as annoying and sometimes very, very dumb.

Patrick Roland, May 25, 2018

Short Honk: Online Translation Services

May 10, 2018

I read “Five of the Best Free Online Translators to Translate Foreign Languages.” Not a great headline, but I pulled out the list of services. Here they are:

I would suggest that you take a look at SDL’s FreeTranslation.com service at https://www.freetranslation.com/. Sometimes useful.

For accurate translations, one needs a native language speaker. Software is okay, but it does not do well with jargon, insider lingo, and words with loaded meanings.

Stephen E Arnold, May 10, 2018

Houston, We May Want to Do Fake News

May 2, 2018

The fake news phenomenon might be in the public eye more, thanks to endless warnings and news stories; however, that has not dulled its impact. In fact, this shadowy form of propaganda seems to flourish under the spotlight, according to a recent ScienceNews story, “On Twitter, The Lure of Fake News is Stronger than Truth.”

According to the research:

“Discussions of false stories tended to start from fewer original tweets, but some of those retweet chains then reached tens of thousands of users, while true news stories never spread to more than about 1,600 people. True news stories also took about six times as long as false ones to reach 1,500 people. Overall, fake news was about 70 percent more likely to be retweeted than real news.”

That’s an interesting set of data. However, anyone quick to blame spambots for this amazing proliferation of fake news needs to give it a second look. According to research, bots are not as much to blame for this trend as humans. This is actually good news. Ideally, changes can be made on the personal level and we can eventually stamp out this misleading trend of fake news.

But if fake news “works”, why not use it? Not even humans can figure out what’s accurate, allegedly accurate, and sort of correct but not really. Smart software plus humans makes curation complex, slow, and costly.

That sounds about right, or does it?

Patrick Roland, May 2, 2018

Fake News: Magnetic Content with Legs

April 30, 2018

The fake news phenomenon might be in the public eye more, thanks to endless warnings and news stories; however, that has not dulled its impact. In fact, this shadowy form of propaganda seems to flourish under the spotlight, according to a recent ScienceNews story, “On Twitter, The Lure of Fake News is Stronger than Truth.”

According to the research:

“Discussions of false stories tended to start from fewer original tweets, but some of those retweet chains then reached tens of thousands of users, while true news stories never spread to more than about 1,600 people. True news stories also took about six times as long as false ones to reach 1,500 people. Overall, fake news was about 70 percent more likely to be retweeted than real news.”

That’s a shocking set of statistics. However, anyone quick to blame spambots for this amazing proliferation of fake news needs to give it a second look. According to research, bots are not as much to blame for this trend as humans. This is actually good news. Ideally, changes can be made on the personal level and we can eventually stamp out this misleading trend of fake news.

Patrick Roland, April 30, 2018

Text Classification: Established Methods Deliver Good Enough Results

April 26, 2018

Short honk: If you are a cheerleader for automatic classification of text centric content objects, you are convinced that today’s systems are home run hitters. If you have some doubts, you will want to scan the data in “Machine Learning for Text Categorization: Experiments Using Clustering and Classification.” The paper was free when I checked at 9:20 am US Eastern time. For the test sets, Latent Dirichlet Allocation performed better than other widely used methods. Worth a look. From my vantage point in Harrod’s Creek, automated processes, regardless of method, perform in a manner one expert explained to me at Cebit several years ago: “Systems are good enough.” Improvements are now incremental, and, like getting the last few percentage ticks of pollutants from a catalytic converter, each one is an expensive and challenging engineering task.

Stephen E Arnold, April 26, 2018

Picking and Poking Palantir Technologies: A New Blood Sport?

April 25, 2018

My reaction to “Palantir Has Figured Out How to Make Money by Using Algorithms to Ascribe Guilt to People, Now They’re Looking for New Customers” is a sigh and a groan.

I don’t work for Palantir Technologies, although I have been a consultant to one of its major competitors. I do lecture about next generation information systems at law enforcement and intelligence centric conferences in the US and elsewhere. I also wrote a book called “CyberOSINT: Next Generation Information Access.” That study has spawned a number of “experts” who are recycling some of my views and research. A couple of government agencies have shortened my word “cyberosint” to “cyint.” In a manner of speaking, I have an information base which can be used to put the actions of companies which offer services similar to those available from Palantir in perspective.

The article in Boing Boing falls into the category of “yikes” analysis. Suddenly, it seems, the idea that cook book mathematical procedures can be used to make sense of a wide range of data has become news. Let me assure you that this is not a new development, and Palantir is definitely not the first of the companies developing applications for law enforcement and intelligence professionals to land customers in financial and law firms.

A Palantir bubble gum card shows details about a person of interest and links to underlying data from which the key facts have been selected. Note that this is from an older version of Palantir Gotham. Source: Google Images, 2015

Decades ago, a friend of mine (Ev Brenner, now deceased) was one of the pioneers using technology and cook book math to make sense of oil and gas exploration data. How long ago? Think 50 years.

The focus of “Palantir Has Figured Out…” is that:

Palantir seems to be the kind of company that is always willing to sell magic beans to anyone who puts out an RFP for them. They have promised that with enough surveillance and enough secret, unaccountable parsing of surveillance data, they can find “bad guys” and stop them before they even commit a bad action.

Okay, that sounds good in the context of the article, but Palantir is just one vendor responding to the need for next generation information access tools from many commercial sectors.

Real Time Translation: Chatbots Emulate Sci Fi

April 16, 2018

The language barrier is still one of the world’s major problems. Translation software, such as Google Translate, is accurate, but it still makes mistakes that native speakers are needed to correct. Instantaneous translation is still a pipe dream, but the technology is improving with each new development. Mashable shares a current translation innovation and it belongs to Google: “Google Pixel Buds Vs. Professional Interpreters: Which Is More Accurate?”

Apple angered many devout users when it deleted the headphone jack on phones, instead replacing it with Bluetooth headphones called AirPods. They have the same minimalist sleek design as other Apple products, but Google’s Pixel Buds are far superior to them because of real time translation, or so we are led to believe. Author Raymond Wong tested the Pixel Buds translation features at the United Nations to see how they fared against professional translators. He and his team tested French, Arabic, and Russian. The Pixel Buds did well with simple conversations, but certain words and phrases caused errors.

One hilarious example was when Google translated the Arabic for “I want to eat salad” to “I want to eat power” in English. When it comes to real time translation, the experts are still the best because they can understand the context and other intricacies, such as tone, that come with human language. The professional translators liked the technology, but it still needs work:

“Ayad and Ivanova both agreed that Pixel Buds and Google Translate are convenient technologies, but there’s still the friction of holding out a Pixel phone for the other person to talk into. And despite the Pixel Buds’ somewhat speedy translations, they both said it doesn’t compare to a professional conference interpreters, who can translate at least five times faster Google’s cloud.”

Keep working on those foreign language majors, kids. Marketing noses in front of products that deliver, in my view.

Whitney Grace, April 17, 2018
