Like Life, Chatbots Are Semi Perfect

September 22, 2020

Chatbots are notoriously dumb pieces of AI that parrot information coded into their programs. They are also annoying because they rarely have the correct information. Chatbots, however, can be useful tools, and developers are working to make them genuinely useful. Medium runs down the differences between chatbot environments: “Updated: A Comparison Of Eight Chatbot Environments.”

Most chatbot environments take the same approach to a conversational interface, but there are four distinct development groups: avant-garde, NLU/NLP tools, use-the-cloud-you’re-in, and leading commercial cloud offerings. There are trends that cut across these groups:

“ The merging of intents and entities

• Contextual entities. Hence entities sans a finite list and which is detected by their context within a user utterance.

• Deprecation of the State Machine. Or at least, towards a more conversational like interface.

• Complex entities; introducing entities with properties, groups, roles etc.”
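The merged intents-and-entities and contextual-entity ideas can be illustrated with a toy sketch. The intent names, keyword sets, and order-number pattern below are invented for illustration; they are not taken from any of the eight environments:

```python
import re

# Hypothetical intents, each defined by a bag of keywords.
INTENTS = {
    "check_order": {"order", "package", "shipping", "delivery"},
    "reset_password": {"password", "login", "reset", "locked"},
}

def classify(utterance):
    """Score each intent by keyword overlap with the user utterance."""
    tokens = set(re.findall(r"[a-z]+", utterance.lower()))
    scores = {name: len(tokens & keywords) for name, keywords in INTENTS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

def extract_order_number(utterance):
    """A 'contextual entity': there is no finite list of valid order numbers;
    the entity is detected by its position after the word 'order'."""
    match = re.search(r"order\s*#?\s*(\d+)", utterance.lower())
    return match.group(1) if match else None
```

For example, `classify("Where is my package?")` resolves to the `check_order` intent even though no keyword list enumerates every possible phrasing.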

Beyond the industry trends, chatbots are transitioning from stupid instant messaging programs into interactive, natural language driven digital employees that “think and act” like real humans. Companies want chatbots that can help them grow by comprehending past and current conversations drawn from multiple sources, including CRM systems.

Chatbot frameworks are so different that direct comparison is difficult, but there are five consideration points: NLU features, ecosystem maturity, licensing and usage costs, graphic call flow front-end development and editing, and scalability and enterprise readiness.

Chatbots are becoming smarter and already handle many customer service jobs. If they can actually resolve the problems customers contact companies for, then science fiction truly has become reality.

Whitney Grace, September 22, 2020

Web Scraping: Better Than a Library for Thumbtypers

September 22, 2020

Modern research. The thumbtyper way.

Nature explains the embrace of a technology that, when misused, causes concern in the post, “How We Learnt to Stop Worrying and Love Web Scraping.” The efficiency and repeatability of automation are a boon to researchers Nicholas J. DeVito, Georgia C. Richards, and Peter Inglesby, who write:

“You will end up with a sharable and reproducible method for data collection that can be verified, used and expanded on by others — in other words, a computationally reproducible data-collection workflow. In a current project, we are analyzing coroners’ reports to help to prevent future deaths. It has required downloading more than 3,000 PDFs to search for opioid-related deaths, a huge data-collection task. In discussion with the larger team, we decided that this task was a good candidate for automation. With a few days of work, we were able to write a computer program that could quickly, efficiently and reproducibly collect all the PDFs and create a spreadsheet that documented each case. … [Previously,] we could manually screen and save about 25 case reports every hour. Now, our program can save more than 1,000 cases per hour while we work on other things, a 40-fold time saving. It also opens opportunities for collaboration, because we can share the resulting database. And we can keep that database up to date by re-running our program as new PDFs are posted.”

The authors explain how scraping works to extract data from web pages’ HTML and describe how to get started. One could adopt a pre-made browser extension or write a customized scraper—a challenging task but one that gives users more control. See the post for details on that process.

With either option, we are warned, there are several considerations to keep in mind. For some projects, those who possess the data have created an easier way to reach it, so scraping would be a waste of time and effort. Conversely, other websites hold their data so tightly that it is not available directly in the HTML or is guarded by protections like captchas. Those considering scraping should also take care to avoid making requests of a web server so rapidly that it crashes (an accidental DoS attack) or running afoul of scraping rules or licensing and copyright restrictions. The researchers conclude by encouraging others to adopt the technique and share any custom code with the community.
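A minimal sketch of the HTML-extraction step, using only Python's standard library. The PDF-link pattern and listing-page structure are assumptions for illustration; the Nature authors' actual code is not shown in the post:

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect href targets that end in .pdf from a page's HTML."""
    def __init__(self):
        super().__init__()
        self.pdf_links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href", "")
            if href.lower().endswith(".pdf"):
                self.pdf_links.append(href)

def extract_pdf_links(html):
    parser = LinkCollector()
    parser.feed(html)
    return parser.pdf_links

# Downloading each report politely (one request per second) might look like:
#   import time, urllib.request
#   for url in extract_pdf_links(listing_html):
#       urllib.request.urlretrieve(url, url.rsplit("/", 1)[-1])
#       time.sleep(1.0)  # avoid hammering the server (accidental DoS)
```

The `time.sleep` call in the commented loop is the simplest guard against the rate-related pitfall the researchers mention.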

Cynthia Murrell, September 22, 2020

Predictive Analytics: Follow These Puffy Thought Bubbles

September 21, 2020

Predictive analytics is about mathematics; for instance, Bayesian confections and Markov doodling. The write up “Predictive Analytics: 4 Primary Aspects of Predictive Analytics” uses the bound phrase “predictive analytics” twice in one headline and cheerfully ignores the mathy reality of the approach.
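As a reminder of what that “Markov doodling” actually involves, here is a minimal two-state chain. The states and transition probabilities are invented for illustration:

```python
# Hypothetical customer states and one-period transition probabilities.
P = {
    "active":  {"active": 0.8, "churned": 0.2},
    "churned": {"active": 0.1, "churned": 0.9},
}

def step(dist, P):
    """Advance a probability distribution over states by one period."""
    out = {state: 0.0 for state in P}
    for state, prob in dist.items():
        for nxt, p in P[state].items():
            out[nxt] += prob * p
    return out

# A customer known to be active today: where are they in two periods?
dist = {"active": 1.0, "churned": 0.0}
for _ in range(2):
    dist = step(dist, P)
# dist["active"] = 0.8*0.8 + 0.2*0.1 = 0.66
```

Even this toy requires someone to estimate the transition probabilities from data, which is exactly the mathy work the marketing write up glides past.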

Does this marshmallow approach make a difference? Yes, I believe it does. Consider this statement from the write up:

These predictive models can be used by enterprise marketers to more effectively develop predictions of future user behaviors based on the sourced historical data. These statistical models are growing as a result of the wide swaths of available current data as well as the advent of capable artificial intelligence and machine learning.

Okay, marketers. Predictive analytics are right in your wheelhouse. The assumption that “statistical models are growing” is interesting. The statistical models with which I am familiar require work to create, test, refine, and implement. Yep, work, mathy work.

The source of data is important. However, data have to be accurate or verifiable or have some attribute that tries to ensure that garbage in does not become the mode of operation. Unfortunately data remain a bit of a challenge. Do marketers know how to identify squishy data? Do marketers care? Yeah, sure they do in a meeting during which smartphone fiddling is taking place.

The idea of data utility is interesting. If one is analyzing nuclear fuel pool rod placement, it does help to have data relevant to that operation. But are marketers concerned about “data utility”? Once again, thumbtypers say, “Yes.” Then what? Acquire data from a third party and move on with life? It happens.

The thrill of “deep learning” is like the promise of spring. Everyone likes spring? Who remembers the problems? Progress is evident in the application of different smart software methods. However, there is a difference between saying “deep learning” or “machine learning” and making a particular application benefit from available tools, libraries, and methods. The whiz kids who used smart software to beat a human fighter pilot got the job done. The work required to achieve the digital victory was significant, took time, and was difficult. Very difficult. Marketers, were you on the team?

Finally, what’s the point of predictive analytics? Good question. For the article, the purpose of predictive analytics is to refine a guess-timate. And the math? Just use a smart solution, click an icon, and see the future.

Yikes, puffy thought bubbles.

Stephen E Arnold, September 21, 2020

Tireless Readers, Those Bots

September 18, 2020

AI bots do marvelous things such as facial recognition, document analysis, and creating false videos of world leaders singing pop songs. AI bots, however, are only as smart as they are programmed. The MIT Technology Review shares how smart AI bots are in the article, “This Know-It-All AI Learns By Reading The Entire Web Nonstop.”

Most AI bots are good at consuming and regurgitating information but lack the knowledge to interpret the content. If AI is going to be more integral in society, algorithms need to be smarter and also trustworthy. Diffbot is supposed to be different from its brethren because it is designed to be factual. Diffbot reads everything on the public Internet in multiple languages and extracts as many facts as possible. It sounds like Diffbot knows how to double and triple check facts.

Diffbot transforms each fact into a three-part factoid that relates the information together: subject, verb, object. These factoids link to one another, forming a knowledge graph of facts. Knowledge graphs have been used for years and are the basis for the semantic web. Google implemented knowledge graphs a few years ago, but only uses them for popular search terms. Diffbot wants to make knowledge graphs for everything on the Internet. How does Diffbot read everything?

“To collect its facts, Diffbot’s AI reads the web as a human would—but much faster. Using a super-charged version of the Chrome browser, the AI views the raw pixels of a web page and uses image-recognition algorithms to categorize the page as one of 20 different types, including video, image, article, event, and discussion thread. It then identifies key elements on the page, such as headline, author, product description, or price, and uses NLP to extract facts from any text.

Every three-part factoid gets added to the knowledge graph. Diffbot extracts facts from pages written in any language, which means that it can answer queries about Katy Perry, say, using facts taken from articles in Chinese or Arabic even if they do not contain the term ‘Katy Perry.’”
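The subject-verb-object representation is easy to sketch. The facts below are invented placeholders, not actual Diffbot output:

```python
from collections import defaultdict

# Knowledge graph: subject -> list of (verb, object) edges.
graph = defaultdict(list)

def add_factoid(subject, verb, obj):
    graph[subject].append((verb, obj))

def query(subject, verb):
    """Return every object linked to the subject by the given verb."""
    return [o for v, o in graph[subject] if v == verb]

# Factoids extracted from different pages (and, per the article,
# potentially different languages) merge under one subject node.
add_factoid("Katy Perry", "recorded", "Teenage Dream")
add_factoid("Katy Perry", "born_in", "Santa Barbara")
add_factoid("Teenage Dream", "released_in", "2010")
```

Because every factoid shares the same shape, facts about the same subject accumulate on one node regardless of which page or language they came from.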

Diffbot rebuilds its knowledge graph every four to five days by adding 100 million to 150 million entities each month. Machine learning allows Diffbot to merge old information with new. Diffbot must also add new hardware as the knowledge graph grows.

Diffbot is currently used by DuckDuckGo to make Google-like boxes, Snapchat uses it to feed its news pages, Adidas and Nike use it to track counterfeit shoes, and Zola uses it to assist people making wedding lists. For the moment, Diffbot only interacts with people in code, but the plan is to make it a universal factoid question answering system.

That sounds familiar, doesn’t it?

Whitney Grace, September 18, 2020

The AI Landscape from Topbots

September 17, 2020

DarkCyber finds logo collections fascinating.


The most recent “infographic” from Topbots groups companies engaged in artificial intelligence into categories; specifically:

B2B sales and marketing
Business intelligence
Consumer marketing
Customer management
Data science and machine learning
Digital commerce
Engineering and information technology
Finance and operations
HR and recruiting
Health care
Industrials and manufacturing
Legal and compliance
Logistics and supply chain
Security and risk
Service providers

Is the list of categories exhaustive? No, for example, there is no category for policeware and intelware.

Is the list of companies comprehensive? No, for example, Anduril, Geospark Analytics, and similar firms are not included.

It does not appear that Amazon, Google, or Microsoft are included. Each of these firms is active in artificial intelligence across a spectrum of applications and use cases.

Nevertheless, the diagram is attractive. That’s important for the millennials and MBA go-getters.

Stephen E Arnold, September 17, 2020

Machine Learning Like A Psychic: Sounds Scientific for 2020

September 8, 2020

DarkCyber thinks most psychics are frauds. They are observers and manipulators of human behavior. They take people’s weaknesses and turn them into profit. In other words, they do not know the winning lottery numbers, they cannot predict stock options, and they cannot find missing pets.

Machine learning algorithms built on artificial intelligence, however, might have the “powers” psychics claim to have. EurekaAlert! has a brand new story: “Study: Machine Learning Can Predict Market Behavior.” Machine learning algorithms are smart because they are programmed to find and interpret patterns. They can also assess how effective mathematical tools are at predicting financial markets.

Cornell University researchers used a large dataset to determine if a machine learning algorithm could predict future financial events. It is a large task to undertake, because financial markets have tons of information and high volatility. Maureen O’Hara, the Robert W. Purcell Professor of Management at the SC Johnson College of Business said:

“ ‘Trying to estimate these sorts of things using standard techniques gets very tricky, because the databases are so big. The beauty of machine learning is that it’s a different way to analyze the data,’ O’Hara said. ‘The key thing we show in this paper is that in some cases, these microstructure features that attach to one contract are so powerful, they can predict the movements of other contracts. So we can pick up the patterns of how markets affect other markets, which is very difficult to do using standard tools.’”

Companies exist solely on the basis of understanding how financial markets work, and they have developed their own machine learning algorithms for that very purpose. Cornell’s study used a random forest machine learning algorithm to examine these models on a dataset of 87 futures contracts. The study used every single trade, tens of millions in all, for its analysis. The researchers discovered that some of the variables worked, while others did not.
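The random forest idea (many simple trees trained on bootstrap resamples of the data, combined by majority vote) can be sketched in miniature. One-feature “stumps” stand in for full decision trees here, and the toy trade features and labels are invented; the Cornell models are far richer:

```python
import random
from collections import Counter

def train_stump(X, y):
    """Find the single (feature, threshold) split with the best accuracy."""
    best, best_acc = None, -1.0
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            for left, right in ((0, 1), (1, 0)):
                preds = [left if row[f] <= t else right for row in X]
                acc = sum(p == label for p, label in zip(preds, y)) / len(y)
                if acc > best_acc:
                    best_acc, best = acc, (f, t, left, right)
    return best

def stump_predict(stump, row):
    f, t, left, right = stump
    return left if row[f] <= t else right

def train_forest(X, y, n_trees=25, seed=0):
    """Each stump sees a bootstrap resample of the training trades."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        forest.append(train_stump([X[i] for i in idx], [y[i] for i in idx]))
    return forest

def forest_predict(forest, row):
    """Majority vote across the stumps."""
    votes = Counter(stump_predict(s, row) for s in forest)
    return votes.most_common(1)[0][0]

# Invented microstructure features per trade: [order imbalance, spread];
# label 1 = price moved up next interval, 0 = moved down.
X = [[1, 5], [2, 3], [3, 8], [8, 1], [9, 4], [10, 2]]
y = [0, 0, 0, 1, 1, 1]
forest = train_forest(X, y)
```

The bootstrap resampling is what lets the ensemble average away the quirks of any single tree, which is part of why the method copes with noisy market data.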

There are millions of datasets available, since every trade has been recorded since 1980. Machine learning interprets this data and makes predictions, but it acts more like a black box. In other words, the algorithms find patterns but do not reveal how they reach their determinations.

Psychics have tried to predict the future for centuries and have failed. Machine learning algorithms are better at it, but they still are not 100% accurate. Predicting the future still remains consigned to fantasy and science fiction.

Whitney Grace, September 8, 2020

Smart Software: Automating Duplicitous Behavior

August 31, 2020

Dark patterns in software can be found. What about dark patterns in artificial intelligence libraries and apps? Finding them is likely to be difficult if not impossible, particularly if those trying to figure out the AI’s process are not well informed.

“All That Glitters Is Not Gold: Misuse of AI by Big Tech Can Harm Developing Countries” provides some information about a facet of smart software not often considered by users, API users, or regulators. The write up states:

The biggest concern with AI is a lack of governance, which gives large companies (popularly called as the “Big Tech”) unlimited access to private data.

That’s a safe statement. The write up continues:

In his study, Dr, Truby [Qatar University] discusses three examples to show how unregulated AI can be detrimental to SDGs. To begin with, he focuses on SDG 16, a goal that was developed to tackle corruption, organized crime, and terrorism. He explains that because AI is commonly used in national security databases, it can be misused by criminals to launder money or organize crime. This is especially relevant in developing countries, where input data may be easily accessible because of poor protective measures. Dr Truby suggests that, to prevent this, there should be a risk assessment at each stage of AI development. Moreover, the AI software should be designed such that it is inaccessible when there is a threat of it being hacked. Such restrictions can minimize the risk of hackers obtaining access to the software.

According to the write up, Dr. Truby asserts:

He concludes, “The risks of AI to the society and the possible detriments to sustainable development can be severe if not managed correctly. On the flip side, regulating AI can be immensely beneficial to development, leading to people being more productive and more satisfied with their employment and opportunities.”

Scrutiny is likely in some countries. In others, the attitude is, “How are my investments doing today?”

Stephen E Arnold, August 31, 2020

Facial Recognition: Recognizing Elsie the Cow

August 28, 2020

Facial recognition remains a contentious subject. In one of my 2020 National Cyber Crime Conference presentations, I showed a video snip. In Australia, facial recognition systems have been adapted to spot sharks. When the system “recognizes” a shark, an alert is relayed to individuals who patrol a beach. The idea is that shark threats can be minimized. That’s animal recognition.

“Orwell’s Nightmare? Facial Recognition for Animals Promises a Farmyard Revolution” is a different type of story. I presented an example of an intelligent application of pattern recognition. The write up evokes images of George Orwell and presents a different picture of these “recognition” technologies.

The write up states:

China has led the world in developing facial recognition capabilities. There are almost 630 million facial recognition cameras in use in the country, for security purposes as well as for everyday conveniences like entering train stations and paying for goods in stores. But authorities also use the technology for sinister means, such as monitoring political dissidents and ethnic minorities.

The write up points out:

One Chinese AI company, Megvii, which has been blacklisted by the Department of Commerce for alleged involvement in the Chinese government’s repression of Uighurs in Xinjiang, is applying its technology to a program to recognize dogs by their nose prints. Other tech companies around the world have had a go at identifying chimpanzees, dolphins, horses and lions, with varying degrees of success.

The article reluctantly turns its attention to the animal recognition “hook” for the reporter’s political commentary:

Farmers load information such as health conditions, insemination dates and pregnancy test results into the system, which syncs up with cameras installed above troughs and milking stations. If everything works, farmers can amass valuable data without lifting a finger.

So what? It seems that the reporter (possibly working for the Washington Post, a Jeff Bezos property) was unaware that the Australian shark recognition example was built on Amazon technology. Yep, Mr. Bezos has a stake in Amazon as well.

Interesting stuff. Perhaps the ace reporter could have explored the use of pattern recognition applied to animals? That’s work, of course.

Stephen E Arnold, August 28, 2020

The Possibilities of GPT-3 from OpenAI Are Being Explored

August 27, 2020

Unsurprisingly, hackers have taken notice of the possibilities presented by OpenAI’s text-generating software. WibestBroker News reports, “Fake Blog Posts Land at the Top of Hacker News.” The post was generated by college student Liam Porr, who found it easy to generate content with OpenAI’s latest iteration, GPT-3, that could fool readers into thinking it had been crafted by a person. Writer John Marley describes the software:

“GPT-3, like all deep learning systems, looks for patterns in data. To simplify, the program has been trained on a huge corpus of text mined for statistical regularities. These regularities are unknown to humans. Between the different nodes in GPT-3’s neural network, they are stored as billions of weighted connections. There’s no human input involved in this process. Without any guidance, the program looks and finds patterns.”
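The “statistical regularities” idea scales down to a toy illustration: a bigram model counts which word follows which in a training corpus and samples from those counts. GPT-3's billions of weighted connections play the role these simple counts play here, and the training sentence is invented:

```python
import random
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count, for each word, how often each successor word follows it."""
    words = text.split()
    model = defaultdict(Counter)
    for cur, nxt in zip(words, words[1:]):
        model[cur][nxt] += 1
    return model

def generate(model, start, length, seed=0):
    """Sample successor words in proportion to their observed counts."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length):
        successors = model.get(out[-1])
        if not successors:
            break
        choices, weights = zip(*successors.items())
        out.append(rng.choices(choices, weights=weights)[0])
    return " ".join(out)

corpus = "the cat sat on the mat and the cat ran off the mat"
model = train_bigrams(corpus)
```

The output of such a tiny model is gibberish, of course; the point is only that "look for patterns, then sample from them" is the same loop, at vastly different scale.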

Rather than being unleashed upon the public at large, the software has been released to select researchers in a private beta. Marley continues:

“Porr is a computer science student at the University of California, Berkeley. He was able to find a PhD student who already had access to the API. The student agreed to work with him on the experiment. Porr wrote a script that gave GPT-3 a headline and intro for the blog post. It generated some versions of the post, and Porr chose one for the blog. He copy-pasted from GPT-3’s version with very little editing. The post went viral in a matter of a few hours and had more than 26,000 visitors. Porr wrote that only one person reached out to ask if the post was AI-generated. Albeit, several commenters did guess GPT-3 was the author. But, the community down voted those comments, Porr says.”

Little did the down-voters know. Porr reports he applied for his own access to the tool, but it has yet to be granted. Perhaps OpenAI is not too pleased with his post, he suggests. We wonder whether this blogger received any backlash from the software’s creators.

Cynthia Murrell, August 27, 2020

IDC Has a New Horse to Flog: Artificial Intelligence

August 26, 2020

Okay, it is official. IDC has a new horse to flog. “Artificial Intelligence” will be carrying a load. Navigate to “Worldwide Spending on AI Expected to Double in 4 Years, Says IDC.” Consulting firms and specialized research outfits need to have a “big thing” about which to opine. IDC has discovered one: AI. The write up states:

Global spending on artificial intelligence (AI) is forecast to double over the next four years, growing from US$50.1 billion in 2020 to more than US$110 billion in 2024, according to the IDC. Spending on AI systems will accelerate over the next several years as organizations deploy AI as part of their digital transformation efforts, said IDC. The CAGR for the 2019-2024 period will be 20.1%.
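The figures can be sanity-checked with the standard CAGR formula. Note that the quoted 20.1% uses a 2019-2024 window whose 2019 base value the write up does not give, so the four-year 2020-2024 rate computed below comes out somewhat higher:

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate implied by start and end values."""
    return (end_value / start_value) ** (1 / years) - 1

# IDC: US$50.1 billion in 2020 to more than US$110 billion in 2024.
implied = cagr(50.1, 110.0, 4)  # roughly 0.217, i.e. about 21.7% per year
```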

In the Age of Rona, we have some solid estimates. A 2X jump in 48 months.

Why, pray tell, is AI now moving into the big leagues of hyper growth? Check out this explanation:

Two of the leading drivers for AI adoption are delivering a better customer experience and helping employees to get better at their jobs.

Quite interesting. My DarkCyber research team believes that AI growth will be encouraged by these factors:

  • Government investments in smart weapons and aggressive pushes for projects like “loyal wingman”
  • A sense that staff must be terminated and replaced with systems which do not require health care, retirement plans, vacations, and special support for issues like addiction
  • Packaged “smart” solutions like Amazon’s off the shelf products and services for machine learning.

These are probably trivial in the opinion of the IDC estimators, but DarkCyber is not convinced that baloney like customer experience and helping employees “get better at their jobs” are providing much oomph.

Stephen E Arnold, August 26, 2020
