Chatbots: The Negatives Seem to Abound

September 26, 2017

I read “Chatbots and Voice Assistants: Often Overused, Ineffective, and Annoying.” I enjoy a Hegelian antithesis as much as the next veteran of Dr. Francis Chivers’ course in 19th Century European philosophy. Unlike some of Hegel’s fans, I am not confident that taking the opposite tack in a windstorm is the ideal tactic. There are anchors, inboard motors, and distress signals.

The article points out that quite a few people are excited about chatbots. Yep, sales and marketing professionals earn their keep by creating buzz in order to keep their often-exciting corporate Beneteau 22s afloat. With VCs getting pressured by those folks who provided the cash to create chatbots, the motive force for an exciting ride hurtles onward.

The big Sillycon Valley guns have been arming the chatbot army for years. Anyone remember Ask Jeeves when it pivoted from a human powered question answering machine into a customer support recruit? My recollection is that the recruit washed out, but your mileage may vary.

With Amazon, Facebook, Google, IBM, and dozens and dozens of companies with hard-to-remember names on the prowl, chatbots are “the future.” The Infoworld article is a thinly disguised “be careful” presented as “real news.”

That’s why I wrote a big exclamation point and the words “A statement from the Captain Obvious crowd” next to this passage:

Most of us have been frustrated with misunderstandings as the computer tries to take something as imprecise as your voice and make sense of what you actually mean. Even with the best speech processing, no chatbots are at 100-percent recognition, much less 100-percent comprehension.

I am baffled by this fragment, but I am confident it makes sense to those who were unaware that dealing with human utterances is a pretty tough job for the Googlers and Microsofties who insist their systems are the cat’s pajamas. Note this indication of Infoworld quality in thought and presentation:

It seems very inefficient to resort to imprecise systems when we have [sic]

Yep, an incomplete thought which my mind filled in as saying, “humans who can maybe answer a question sometimes.”

The technology for making sense of human utterance is complex. Baked into the systems is the statistical imprecision that undermines the value of some chatbot implementations.
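A back-of-the-envelope calculation shows why that imprecision matters. The per-stage accuracy figures in this sketch are assumptions for illustration, not benchmarks from any vendor:

```python
# Illustrative sketch: how per-stage accuracy compounds in a voice chatbot
# pipeline. The accuracy figures are assumptions for demonstration, not
# benchmarks from any vendor.

stages = {
    "speech recognition": 0.95,  # assumed word-level accuracy
    "intent detection": 0.90,    # assumed intent classification accuracy
    "slot filling": 0.92,        # assumed entity/slot extraction accuracy
}

end_to_end = 1.0
for name, accuracy in stages.items():
    end_to_end *= accuracy
    print(f"after {name}: {end_to_end:.3f}")

# Even with optimistic per-stage numbers, the chance that one utterance
# survives every stage intact is roughly 79 percent.
print(f"end-to-end success rate: {end_to_end:.1%}")
```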

My thought is that Infoworld might help its readers if it were to answer questions like these:

  • What are the components of a chatbot system? Which introduce errors on a consistent basis?
  • How can error rates of chatbot systems be reduced in a cost-effective manner?
  • What companies are providing third-party software to the big girls and boys in the chatbot dodge ball game?
  • Which mainstream chatbot systems have exemplary implementations? What are the metrics behind “exemplary”?
  • What companies are making chatbot technology strides for languages other than English?

I know these questions are somewhat more difficult to answer than a write up which does little more than make Captain Obvious roll his eyes. Perhaps Infoworld and its experts might throw a bone to their true believers?

Stephen E Arnold, September 26, 2017

New Beyond Search Overflight Report: The Bitext Conversational Chatbot Service

September 25, 2017

Stephen E Arnold and the team at Arnold Information Technology analyzed Bitext’s Conversational Chatbot Service. The BCBS taps Bitext’s proprietary Deep Linguistic Analysis Platform to provide greater accuracy for chatbots regardless of platform.

Arnold said:

The BCBS augments chatbot platforms from Amazon, Facebook, Google, Microsoft, and IBM, among others. The system uses specific DLAP operations to understand conversational queries. Syntactic functions, semantic roles, and knowledge graph tags increase the accuracy of chatbot intent and slotting operations.

One unique engineering feature of the BCBS is that specific Bitext content processing functions can be activated to meet specific chatbot applications and use cases. DLAP supports more than 50 languages. A BCBS licensee can activate additional language support as needed. A chatbot may be designed to handle English language queries, but Spanish, Italian, and other languages can be activated via an instruction.
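For readers who want a concrete picture of the “intent and slotting operations” mentioned above, here is a minimal sketch of what an intent detection and slot filling step produces. The intents, regex patterns, and slot names are hypothetical illustrations of the general technique, not Bitext’s DLAP API:

```python
import re

# Toy intent detection and slot filling, the two chatbot operations the quoted
# passage says DLAP-style tagging improves. The intents, regex patterns, and
# slot names are hypothetical examples, not the Bitext API.

INTENT_PATTERNS = {
    "book_flight": re.compile(r"\b(book|reserve)\b.*\bflight\b", re.I),
    "check_weather": re.compile(r"\bweather\b", re.I),
}

DESTINATION_SLOT = re.compile(r"\bto\s+(?P<destination>[A-Z][a-z]+)")

def parse(utterance: str) -> dict:
    """Return the detected intent and any filled slots for one utterance."""
    intent = next((name for name, pattern in INTENT_PATTERNS.items()
                   if pattern.search(utterance)), "unknown")
    slots = {}
    match = DESTINATION_SLOT.search(utterance)
    if match:
        slots["destination"] = match.group("destination")
    return {"intent": intent, "slots": slots}

print(parse("Please book a flight to Madrid"))
# {'intent': 'book_flight', 'slots': {'destination': 'Madrid'}}
```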

Dr. Antonio Valderrabanos said:

People want devices that understand what they say and intend. BCBS (Bitext Chatbot Service) allows smart software to take the intended action. BCBS allows a chatbot to understand context and leverage deep learning, machine intelligence, and other technologies to turbo-charge chatbot platforms.

Based on ArnoldIT’s test of the BCBS, tagging accuracy jumped by as much as 70 percent. Another surprising finding was that the time required to perform content tagging decreased.

Paul Korzeniowski, a member of the ArnoldIT study team, observed:

The Bitext system handles a number of difficult content processing issues easily. Specifically, the BCBS can identify negation regardless of the structure of the user’s query. The system can understand double intent; that is, a statement which contains two or more intents. BCBS is one of the most effective content processing systems to deal correctly  with variability in human statements, instructions, and queries.
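To see why negation and double intent trip up simpler systems, consider a toy keyword-based intent matcher. This is a deliberately naive illustration of the problem Korzeniowski describes, not the BCBS approach:

```python
# A deliberately naive keyword-based intent matcher, shown only to illustrate
# why negation and "double intent" are hard. Not the BCBS approach.

def naive_intents(utterance: str) -> list[str]:
    keywords = {"cancel": "cancel_order", "refund": "request_refund"}
    found = [intent for word, intent in keywords.items() if word in utterance.lower()]
    return found or ["unknown"]

# Negation: the user explicitly does NOT want to cancel, yet the keyword fires.
print(naive_intents("Do not cancel my order"))
# ['cancel_order']  <- wrong

# Double intent: two requests in one statement; the second, phrased without the
# expected keyword, is silently dropped.
print(naive_intents("Cancel my order and get my money back"))
# ['cancel_order']  <- the refund request is missed
```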

Bitext’s BCBS and DLAP solutions deliver higher accuracy, enable more reliable sentiment analyses, and even output critical actor-action-outcome content processing. Such data are invaluable for disambiguation in Web and enterprise search applications, in content processing for discovery solutions used in fraud detection and law enforcement, and in consumer-facing mobile applications.

Because Bitext was one of the first platform solution providers, the firm was able to identify market trends and create its unique BCBS service for major chatbot platforms. The company focuses solely on solving problems common to companies relying on machine learning and, as a result, has done a better job delivering such functionality than other firms have.

A copy of the 22-page Beyond Search Overflight analysis is available directly from Bitext at this link on the Bitext site.

Once again, Bitext has broken through the barriers that block multi-language text analysis. The company’s Deep Linguistics Analysis Platform supports more than 50 languages at a lexical level and more than 20 at a syntactic level, and it makes the company’s technology available for a wide range of applications in Big Data, Artificial Intelligence, social media analysis, text analytics, and the new wave of products designed for voice interfaces supporting multiple languages, such as chatbots. Bitext’s breakthrough technology solves many complex language problems and integrates machine learning engines with linguistic features. The Deep Linguistics Analysis Platform allows seamless integration with commercial, off-the-shelf content processing and text analytics systems. Bitext’s innovative system reduces costs for processing multilingual text for government agencies and commercial enterprises worldwide. The company has offices in Madrid, Spain, and San Francisco, California. For more information, visit www.bitext.com.

Kenny Toth, September 25, 2017

Quote to Note: The Role of US AI Innovators

March 24, 2017

I read “Opening a New Chapter of My Work in AI.” After working through the non-AI output, I concluded that money beckons the fearless leader, Andrew Ng. However, I did note one interesting quotation in the apologia:

The U.S. is very good at inventing new technology ideas. China is very good at inventing and quickly shipping AI products.

What this suggests to me is that the wizard of AI sees the US as good at “ideas” and China as an implementer. A quick implementer at that.

My take is that China sucks up intangibles like information and ideas. Then China cranks out products. Easy to monetize things, avoiding the question, “What’s the value of that idea, pal?”

Ouch. On the other hand, software is the new electricity. So who is Thomas Edison? I wish I “knew”.

Stephen E Arnold, March 24, 2017

Search Like Star Trek: The Next Frontier

February 28, 2017

I enjoy the “next frontier”-type article about search and retrieval. Consider “The Next Frontier of Internet and Search,” a write up in the estimable “real” journalism site Huffington Post. As I read the article, I heard “Scotty, give me more power.” I thought I heard 20 somethings shouting, “Aye, aye, captain.”

The write up told me, “Search is an everyday part of our lives.” Yeah, maybe in some demographics and geo-political areas. In others, search is associated with finding food and water. But I get the idea. The author, Gianpiero Lotito of FacilityLive, is talking about people with computing devices, an interest in information like finding a pizza, and the wherewithal to pay the fees for zip zip connectivity.

And the future? I learned:

The future of search appears to be in the algorithms behind the technology.

I understand algorithms applied to search and content processing. Since humans are expensive beasties, numerical recipes are definitely the go-to way to perform many tasks: fact checking, curating, and indexing textual information, for example. The math does not work the way some expect when algorithms are applied to images and other rich media. Hey, sorry about that false drop in the face recognition program used by Interpol.

I loved this explanation of keyword search:

The difference among the search types is that: the keyword search only picks out the words that it thinks are relevant; the natural language search is closer to how the human brain processes information; the human language search that we practice is the exact matching between questions and answers as it happens in interactions between human beings.

This is as fascinating as the fake information about Boolean being a probabilistic method. What happened to string matching and good old truncation? The truism about people asking questions is intriguing as well. I wonder how many mobile users ask questions like, “Do manifolds apply to information spaces?” or “What is the chemistry allowing multi-layer ion deposition to take place?”

Yeah, right.
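For the record, “string matching and good old truncation” are not exotic. A few lines of code illustrate a right-truncated keyword query; this sketch is my own illustration, not anything from the Huffington Post write up:

```python
import re

# "Good old truncation": a right-truncated query term such as librar* matches
# library, libraries, librarians by plain string matching, with no probabilistic
# machinery involved. My own toy illustration, not from the cited article.

def truncation_match(query_term: str, document: str) -> list[str]:
    """Return the document words matched by a trailing-wildcard query term."""
    if query_term.endswith("*"):
        pattern = re.compile(r"\b" + re.escape(query_term[:-1]) + r"\w*", re.I)
    else:
        pattern = re.compile(r"\b" + re.escape(query_term) + r"\b", re.I)
    return pattern.findall(document)

doc = "The library hired two librarians to index the libraries' rare collections."
print(truncation_match("librar*", doc))
# ['library', 'librarians', 'libraries']
```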

The write up drags in the Internet of Things. Talk to one’s Alexa or one’s thermostat via Google Home. That’s sort of natural language; for example, Alexa, play Elvis.

Here’s the paragraph I highlighted in NLP crazy red:

Ultimately, what the future holds is unknown, as the amount of time that we spend online increases, and technology becomes an innate part of our lives. It is expected that the desktop versions of search engines that we have become accustomed to will start to copy their mobile counterparts by embracing new methods and techniques like the human language search approach, thus providing accurate results. Fortunately these shifts are already being witnessed within the business sphere, and we can expect to see them being offered to the rest of society within a number of years, if not sooner.

Okay. No one knows the future. But we do know the past. There is little indication that mobile search will “copy” desktop search. Desktop search is a bit like digging in an archeological pit on Cyprus: Fun, particularly for the students and maybe a professor or two. For the locals, there often is a different perception of the diggers.

There are shifts in “the business sphere.” Those shifts are toward monopolistic, choice limited solutions. Users of these search systems are unaware of content filtering and lack the training to work around the advertising centric systems.

I will just sit here in Harrod’s Creek and let the future arrive courtesy of a company like FacilityLive, an outfit engaged in changing Internet searching so I can find exactly what I need. Yeah, right.

Stephen E Arnold, February 28, 2017

CREST Includes Additional Documents

January 22, 2017

Short honk: The CIA has responded to a Freedom of Information Act request and posted additional documents. These are searchable via the CREST system. The content is accessible at this link.

Stephen E Arnold, January 22, 2017

Smart Software: An Annoying Flaw Will Not Go Away

December 22, 2016

“Machines May Never Master the Distinctly Human Elements of Language” captures one of the annoying flaws in smart software. Machines are not human—at least not yet. The write up explains that “intelligence is mysterious.” Okay, big surprise for some of the Sillycon Valley crowd.

The larger question is, “Why are some folks skeptical about smart software and its adherents’ claims?” Part of the reason is that publications have to show some skepticism after cheerleading.  Another reason is that marketing presents a vision of reality which often runs counter to one’s experience. Try using that voice stuff in a noisy subway car. How’s that working out?

The write up caught my attention with this statement from the Google, one of the leaders in smart software’s ability to translate human utterances:

“Machine translation is by no means solved. GNMT can still make significant errors that a human translator would never make, like dropping words and mistranslating proper names or rare terms, and translating sentences in isolation rather than considering the context of the paragraph or page.”

The write up quotes a Stanford wizard as saying:

She [wizard Li] isn’t convinced that the gap between human and machine intelligence can be bridged with the neural networks in development now, not when it comes to language. Li points out that even young children don’t need visual cues to imagine a dog on a skateboard or to discuss one, unlike machines.

My hunch is that quite a few people know that smart software works in some use cases and not in others. The challenge is to get those with vested interests and the marketing millennials to stick with “as is” without confusing the “to be” with what can be done with available tools. I am all in on research computing, but the assertions of some of the cheerleaders spell S-I-L-L-Y. Louder now.

Stephen E Arnold, December 22, 2016

Smart Software Figures Out What Makes Stories Tick

November 28, 2016

I recall sitting in high school when I was 14 years old and listening to our English teacher explain the basic plots used by fiction writers. The teacher was Miss Dalton and she seemed quite happy to point out that fiction depended upon: Man versus man, man versus the environment, man versus himself, man versus belief, and maybe one or two others. I don’t recall the details of a chalkboard session in 1959.

Not to fear.

I read “Fiction Books Narratives Down to Six Emotional Story Lines.” Smart software and some PhDs have cracked the code. Ivory Tower types processed digital versions of 1,327 books of fiction. I learned:

They [the Ivory Tower types] then applied three different natural language processing filters used for sentiment analysis to extract the emotional content of 10,000-word stories. The first filter—dubbed singular value decomposition—reveals the underlying basis of the emotional storyline, the second—referred to as hierarchical clustering—helps differentiate between different groups of emotional storylines, and the third—which is a type of neural network—uses a self-learning approach to sort the actual storylines from the background noise. Used together, these three approaches provide robust findings, as documented on the hedonometer.org website.
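For the technically curious, here is a toy sketch of the first filter, singular value decomposition applied to a pile of sentiment arcs. The arcs are synthetic; the 1,327-book corpus and the hedonometer pipeline are not reproduced here:

```python
import numpy as np

# Toy sketch of the first filter described above: stack the sentiment arcs of
# many stories into a matrix and use singular value decomposition to pull out
# the dominant emotional "modes." The arcs are synthetic, not the 1,327-book
# corpus or the hedonometer.org pipeline.

rng = np.random.default_rng(0)
n_windows = 100                      # sentiment scored over sliding text windows
t = np.linspace(0, np.pi, n_windows)

rise = np.sin(t / 2)                 # "rags to riches" shape
man_in_hole = -np.sin(t)             # fall then rise

# Sixty synthetic stories, each a noisy mixture of the basic shapes.
arcs = np.vstack([
    rng.normal(scale=0.1, size=n_windows)
    + rng.choice([1, -1]) * rise
    + 0.3 * man_in_hole
    for _ in range(60)
])

# Rows of vt are the underlying arc shapes, ordered by explained variance.
u, s, vt = np.linalg.svd(arcs - arcs.mean(axis=0), full_matrices=False)
explained = s**2 / np.sum(s**2)
print("variance explained by the first three modes:", np.round(explained[:3], 2))
```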

Okay, and what’s the smart software say today that Miss Dalton did not tell me more than 50 years ago?

[The Ivory Tower types] determined that there were six main emotional storylines. These include ‘rags to riches’ (sentiment rises), ‘riches to rags’ (fall), ‘man in a hole’ (fall-rise), ‘Icarus’ (rise-fall), ‘Cinderella’ (rise-fall-rise), ‘Oedipus’ (fall-rise-fall). This approach could, in turn, be used to create compelling stories by gaining a better understanding of what has previously made for great storylines. It could also teach common sense to artificial intelligence systems.

Ah, progress.

Stephen E Arnold, November 28, 2016

French Smart Software Companies: Some Surprises

November 15, 2016

I read “French AI Ecosystem.” Most of the companies have zero or a low profile in the United States. The history of French high technology outfits remains a project for an enterprising graduate student with one foot in La Belle France and one in the USA. This write up is a bit of a sales pitch for venture capital in my opinion. The reason that VC inputs are needed is that raising money in France is — how shall I put this? — not easy. There is no Silicon Valley. There is Paris and a handful of other acceptable places to be intelligent. In the Paris high tech setting, there are a handful of big outfits and lots and lots of institutions which keep the French one percent in truffles and the best the right side of the Rhone has to offer. The situation is dire unless the start up is connected by birth, by education at one of the acceptable institutions, or hooked up with a government entity. I want to mention that there is a bit of French ethnocentrism at work in the French high tech scene. I won’t go into detail, but you can check it out yourself if you attend a French high tech conference in one of the okay cities. Ars-en-Ré and Gémenos do not qualify. Worth a visit, however.

Now to the listings. You will have to work through the almost unreadable graphic or contact the outfit creating the listing, which is why the graphic is unreadable, I surmise. From the version of the graphic I saw, I did discern a couple of interesting points. Here we go:

Three outfits were identified as having natural language capabilities. These are Proxem, syJLabs (no, I don’t know how to pronounce this “syjl” string. I can do “abs”, though.), and Yseop (maybe Aesop, from the fable?). Proxem offers its Advanced Natural Language Object-Oriented Processing Environment (Antelope). The company was founded in 2007. syJLabs does not appear in my file of French outfits, and we drew a blank when looking for the company’s Web site. Sigh. Yseop has been identified as a “top IT innovator” by an objective, unimpeachable, high value, super credible, wonderful, and stellar outfit (Ventana Research). Yseop, also founded in 2007, offers a system which “turns data into narrative in English, French, German, and Spanish, all at the speed of thousands of pages per second.”

As I worked through a graphic containing lots of companies, I spotted two interesting inclusions. The first is Sinequa, a vendor of search founded in 2002, now positioned as an important outfit in Big Data and machine learning. Fascinating. The reinvention of Sinequa is a logical reaction to the implosion of the market for search and retrieval for the enterprise. The other company I noted was Antidot, which mounted a push to the US market several years ago. Antidot, like Sinequa, focused on information access. It too is “into” Big Data and machine learning.

I noted some omissions; for example, Hear&Know, among others. Too bad the listing is almost unreadable and does not include a category for law enforcement, surveillance, and intelligence innovators.

Stephen E Arnold, November 15, 2016

Entity Extraction: No Slam Dunk

November 7, 2016

There are differences among these three use cases for entity extraction:

  1. Operatives reviewing content for information about watched entities prior to an operation
  2. Identifying people, places, and things for a marketing analysis by a PowerPoint ranger
  3. Indexing Web content to add concepts to keyword indexing.

Regardless of your experience with software which identifies “proper nouns,” events, meaningful digits like license plate numbers, organizations, people, and locations (accepted and colloquial), you will find the information in “Performance Comparison of 10 Linguistic APIs for Entity Recognition” thought provoking.
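If you have not watched an entity extraction pass run, here is a minimal example using spaCy, one widely available open source library. spaCy is my choice for illustration and is not necessarily among the ten APIs compared in the write up:

```python
# A minimal look at what an entity recognition pass produces, using spaCy as
# one widely available open source library. spaCy is my example; it is not
# necessarily among the ten APIs compared in the cited write up.
# Setup assumed: pip install spacy && python -m spacy download en_core_web_sm

import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Stephen Arnold reviewed the Interpol filing from London on 7 November 2016.")

for ent in doc.ents:
    print(ent.text, ent.label_)

# Expected output along these lines (exact labels depend on the model version):
#   Stephen Arnold    PERSON
#   Interpol          ORG
#   London            GPE
#   7 November 2016   DATE
```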

The write up identifies the systems which perform the best and the worst.

Here are the five systems and the number of errors each generated in a test corpus. The “scores” are based on a test which contained 150 targets. The “best” system got more correct than incorrect. I find the results interesting but not definitive.

The five best performing systems on the test corpus were:

The five worst performing systems on the test corpus were:

There are some caveats to consider:

  1. Entity identification works quite well when the entities and their synonyms are included in the training set
  2. Multi-language entity extraction requires additional training set preparation. “Learn as you go” is often problematic when dealing with social messages, certain intercepted content, and colloquialisms
  3. Identification of content used as a code—for example, Harrod’s teddy bear for contraband—is difficult even for smart software operating with subject matter experts’ input. (Bad guys are often not stupid and understand the concept of using one word to refer to another thing based on context or previous interactions).

Net net: Automated systems are essential. The error rates may be fine for some use cases and potentially dangerous for others.

Stephen E Arnold, November 7, 2016

More and More about NLP

August 31, 2016

Natural language processing is not a new term in the IT market, but NLP technology has only become commonplace in the last year or so. When I say commonplace, I mean that most computers and mobile devices now have some form of NLP tool, including digital assistants and voice-to-text. Business 2 Community explains the basics of NLP technology in the article, “Natural Language Processing: Turning Words into Data.”

The article acts as a primer for understanding how NLP works and is redundant until you get into the text about how it is applied in the real world; that is, tied to machine learning. I found this paragraph helpful:

“This has changed with the advent of machine learning. Machine learning refers to the use of a combination of real-world and human-supplied characteristics (called “features”) to train computers to identify patterns and make predictions. In the case of NLP, using a real-world data set lets the computer and machine learning expert create algorithms that better capture how language is actually used in the real world, rather than on how the rules of syntax and grammar say it should be used. This allows computers to devise more sophisticated—and more accurate—models than would be possible solely using a static set of instructions from human developers.”
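A minimal sketch makes the quoted point concrete: word-count features plus a handful of labeled, real-world examples, with the classifier, not a grammar, doing the work. The tiny training set below is invented for illustration and assumes scikit-learn is installed:

```python
# Minimal sketch of the "features plus real-world data" idea in the quoted
# passage: turn raw text into word-count features and let a classifier learn
# the pattern from labeled examples instead of hand-written grammar rules.
# The tiny training set is invented for illustration; assumes scikit-learn.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_texts = [
    "the service was quick and helpful",
    "great support, solved my problem",
    "terrible experience, nobody answered",
    "slow, unhelpful, and rude support",
]
train_labels = ["positive", "positive", "negative", "negative"]

model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(train_texts, train_labels)

print(model.predict(["the support team was helpful and quick"]))
# ['positive']  (a toy result; four examples prove nothing, which is the point
# about needing real-world data at scale)
```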

It then goes into further detail about how NLP is applied to big data technology and explains the practical applications. It makes some reference to open source NLP technologies, but only in passing.

The article sums up the NLP and big data information in regular English vernacular. The technology gets even more complicated when you delve into further research on the subject.

Whitney Grace, August 31, 2016
