Natural Language Generation: Sort of Made Clear

February 28, 2019

I don’t want to spend too much time on NLG (natural language generation). This is a free Web log. Providing the acronym should be enough of a hint.

If you are interested in the subject and can deal with wonky acronyms, you may want to read “Beyond Local Pattern Matching: Recent Advances in Machine Reading.”

Search sucks, so bright young minds want to tell you what you need to know. What if the system is only 75 to 80 percent accurate? The path is a long one, but the direction information retrieval is heading seems clear.

Stephen E Arnold, February 28, 2019

Has Search Evolved or Is It Spinning in Circles?

February 28, 2019

We have long been frustrated that search technology has changed into “we will tell you what you need to know.” Search is asking. Providing answers based on behavior is manipulation or a digital version of “mother knows best.”

Smart software or “AI” technology may fundamentally change how we find information online. Forbes asks, “Might AI Spell the Death of Search?” Writer Michael Ashley observes:

“‘This is the first time since 1994 when the search paradigm has changed,’ says David Seuss, CEO of Northern Light, a Boston-based strategic research portal provider I consult with that offers a cloud-based SaaS to global enterprises. ‘In 1994, you went to a search box, filled in a query, hit the search button, and received a list of documents. You manually reviewed these, picking the most relevant item to download. Fast forward to 2019 and it’s still the same thing. Find me one other part of the tech landscape that has not changed since the ’90s, whether it be broadband, wireless, mobile cloud computing, artificial intelligence—everything has changed. Everything except search.’”

The former consultant is doing consultant type thinking. There is a problem, and the consultant can ride to the rescue. A digital Lone Ranger can kill the useless system outputs. Well, that’s the story line.

Seuss claims it is Millennials that are pushing for change. Older users were just so happy to search from their desks instead of in the library stacks, he posits, that most of them remain satisfied with 90s-style online search.

The younger generation, though, find manually reviewing search results inefficient, and they recognize that a lot of good information tends to get buried later in the search results—especially as paid listings claim the top spots. Ashley writes:

“With the help of A.I., tasks once relegated to flesh and blood researchers can be now accomplished by computers. Drawing on the latter’s pattern-forming and predictive abilities, it can observe users’ actions, discerning their interests based on what they download, share, comment on or bookmark. Informed by this knowledge, an A.I. can proactively—and without manual prompting—recommend relevant content to users. Disrupting the traditional search model to its page ranking core, content can seek out the user instead of the other way around.”

Not surprisingly, the piece cites Northern Light’s platform as an example of the new, AI-powered possibilities: it quickly examines documents relevant to a query and presents a summary of pertinent information. The author ponders a time, close at hand, when the information we need finds us when we need it.

That sounds good, but I wonder—how can one be sure the algorithms are choosing wisely? What’s the old adage about consultants? Keep your hand on your wallet?

Cynthia Murrell, February 28, 2019

Antitrust: Tension between Consumer Interests and Government Needs

February 28, 2019

Are we facing the next great Ma Bell case? That breakup, likely the biggest antitrust case the general population recalls, was a landmark for the difference between having a big company and having a monopoly. In this technological age, however, we’re dealing with far greater impact than phone lines and long distance charges. We learned more about this interesting battle over at Salon in an article called “Do Google and Facebook Need to Be Broken Up?”

According to the story:

“Under the antitrust laws, it’s not a violation to be large, to be successful and to have sort of want a competition in the market place providing you have engaged in anticompetitive conduct, either to obtain or to maintain that dominant position, so that the real question is what’s the conduct that it so much to be a problem. That I think is the only way you’d have to focus.”

Why does this matter? Because antitrust and national security are becoming more and more deeply intertwined. The dangers of fake news, hacking, deepfake videos and more have become harder to monitor because of the sheer size of Google and Facebook.

But some government agencies depend on social media and the de facto control points. Is the line blurring or is the line disappearing?

Patrick Roland, February 28, 2019

ChemNet: Pre Training and Rules Can Work but Time and Cost Can Be a Roadblock

February 27, 2019

I read “New AI Approach Bridges the Slim Data Gap That Can Stymie Deep Learning Approaches.” The phrase “slim data” caught my attention. Pairing the phrase with “deep learning” seemed to point the way to the future.

The method described in the document reminded me that creating rules for “smart software” works on narrow domains with constraints on terminology. No emojis allowed. The method of “pre training” has been around since the early days of smart software. Autonomy in the mid 1990s relied upon training its “black box.”

Creating a training set which represents the content to be processed or indexed can be a time consuming, expensive business. Plus, because content “drifts,” retraining is required. For some types of content, the training process must be repeated and verified.

So the cost of the rule creation, tuning and tweaking is one thing. The expense of training, training set tuning, and retraining is another. Add them up, and the objective of keeping costs down and accuracy up becomes a bit of a challenge.

The article focuses on the benefits of the new system as it crunches and munches its way through chemical data. The idea is to let software identify molecules for their toxicity.

Why hasn’t this type of smart software been used to index outputs at scale?

My hunch is that the time, cost, and accuracy of the indexing itself are a challenge. Eighty percent accuracy may be okay for some applications like identifying patients with a risk of diabetes. Identifying substances that will not kill one outright is another matter.

In short, the slim data gap remains largely unbridged for deep learning, even for a constrained content domain.

Stephen E Arnold, February 27, 2019

Google: Fighting the Good Fight in the Valley of Truth

February 27, 2019

Readers of DarkCyber (formerly Beyond Search) know that I am skeptical about identifying and eradicating false news. Humans have a difficult time figuring out what’s “right” and “accurate” even when the “facts” are presumably “correct.” Convert this type of “judgment” into a series of statements and the decision remains, how shall I phrase it, a low precision, low accuracy process.

One view is that Google and Facebook are drowning in fake news. The tech giants are routinely chastised in the media and on Capitol Hill for failing to maintain customer trust over fake news and privacy. So, the search king has taken it upon itself to fight these fights. But, is it enough? We gathered information from a recent Engadget story, “Google Explains How It’s Fighting Fake News.”

According to the story:

“Google is not immune to the scourge of fake news that has dominated headlines over the last few years. The company has taken various steps in fighting the problem — from partnering with fact-checking networks to launching the $300 million Google News Initiative. Now it’s expanded its transparency efforts further by detailing at length the steps it takes to fight disinformation across its services.”

DarkCyber sometimes entertains the thought that well informed humans or really smart software could step in and address this challenge.

We don’t doubt that Google means well, but we are also suspicious as to whether the online ad giant can do more than generate online advertising revenue despite the firm’s effort to convince people about its prowess in non advertising domains.

Can humans handle the job? News has surfaced that Facebook is causing psychic stress among its global team of fake news fighters, hate speech identifiers, and interesting content reviewers.

The FTC, according to the Verge, took action against Cure Encapsulations. According to the online news service, the company:

has agreed to never again make a “weight-loss, appetite-suppression, fat-blocking, or disease-treatment claims for any dietary supplement, food, or drug” unless the company has “competent and reliable scientific evidence in the form of human clinical testing” to support its claims. The settlement also prohibits the company from misrepresenting endorsements, including whether a review or testimonial is from a real customer who purchased the product.

Liver failure and worse can result from “fake” information.

And where are the human editors? What about smart software?

The human editors are under stress and complaining. The software chugs along—ineffectually.

State of play: “Fake news” is a challenge.

Patrick Roland, February 27, 2019

Factualities for February 27, 2019

February 27, 2019

Data are wonderful. Especially when each and every fact is presented without context and without the constraints of Statistics 101. Here are some gems from the last few days:

24 percent. Percentage of adults who listen to podcasts each day. Country? Sample size? Demographics of “adults”? Not relevant. Source: Chartable

51 percent. Percentage of technology industry workers who believe that fake news is in line with President Trump’s view of the media. Source: Buzzfeed

$522 billion. Revenue from global sales of smartphones in 2018. Source: GFK

$3,500. Cost of Microsoft HoloLens 2 mixed reality headset. (The price is $900 more than the Huawei Mate X foldable 5G mobile phone.) Source: The Verge

1020. Number of hate groups tallied in 2018. Source: USA Today

$1,000. Initial fee for a Google dot dev domain name. Source: The Register

930. The number of exabytes consumed by mobile users in a single year by 2020. Source: Enterprise Networking Planet

500 or 50? Number of Microsoft employees who did not sign up to develop weapons. Source: Buzzfeed

47th. The rank of the US among 77 countries in 4G download speeds. Source: 9to5Mac. Note: The original document has been “disappeared.”

65. Number of days Signal has spent trying to contact Google about an app updating issue. Source: Twitter

1 terabyte. Size of new SanDisk micro secure digital card. Source: Engadget

Stephen E Arnold, February 27, 2019

Watson Weakly: Recruitment the Smart Way

February 26, 2019

IBM is working overtime to become the cloud alternative to Amazon. IBM Watson is back to recipes, health care, and background noise. IBM, however, knows how to capture the attention of the DarkCyber and Beyond Search team in rural Kentucky.

We noted an article in the Register, an online publication, with the interesting title “IBM So Very, Very Sorry after Jobs Page Casually Asks Hopefuls: Are You White, Black… or Yellow?”

The Register asserts:

IBM has apologized after its recruitment web pages asked applicants whether their ethnicity was, among other options, the racial slurs Yellow and Mulatto.

The article describes the wording as a “baffling error.” My hunch is that either IBM Watson or one of his acolytes consumed outputs with the diligence one expects of millennials and smart software, possibly working in tandem.

An IBM professional is quoted as telling the Register:

“Those questions were removed immediately when we became aware of the issue and we apologize. IBM hiring is based on skills and qualifications. We do not use race or ethnicity in the hiring process and any responses we received to those questions will be deleted. IBM has long rejected all forms of racial discrimination and we are taking appropriate steps to make sure this does not happen again.”

Watson? What about that cancer diagnosis? What about inappropriate questions? What about those old people who used to work in personnel?

Stephen E Arnold, February 26, 2019

DARPA Looks Into Nano AI

February 26, 2019

The fields of nanotechnology and artificial intelligence have never been hotter. Just look at the recent call for ideas from the Defense Advanced Research Projects Agency (DARPA), which the agency is calling the “Micro-Brain Project.” No, this has nothing to do with the IQ level of politicians. It is actually a groundbreaking idea in defense, as we learned from a recent Newsweek story, “US Military is Building Smarter Robots and Thinks Insects Might Be The Key to New Artificial Intelligence.”

According to the story:

“The proposal…[looks for a plan] capable of mapping out the insect’s brain and its decision-making functions as part of the Artificial Intelligence Exploration program which…’constitutes a series of high-risk, high-payoff projects where researchers will work to establish the feasibility of new AI concepts within 18 months of award.’”

Obviously, the idea of mapping an insect’s brain and nervous system would pay dividends in the spy industry. Imagining microscopic drones has tongues wagging everywhere from the Pentagon to Langley. And they’re not alone. MI6 recently hailed this as the next frontier that they, too, are focusing on. We are on the cusp of a new arms race, where the weapons are getting smaller, not bigger. Any agency that could harness the AI capability and size of insects will, obviously, gain a major advantage over other nations.

Now imagine the mix of DARPA’s nano intelligence with the FLIR nano drone explained in this week’s DarkCyber. Interesting stuff.

Patrick Roland, February 26, 2019

DarkCyber for February 26, 2019, Now Available

February 26, 2019

DarkCyber for February 26, 2019, is now available at www.arnoldit.com/wordpress and on Vimeo at https://www.vimeo.com/77362226.

The program is a production of Stephen E Arnold. It is the only weekly video news show focusing on the Dark Web and lesser known Internet services.

This week’s story line up includes: a nano drone for US Army operators; lonely heart cyber cons; a major denial of service takedown; and a snapshot of Cyberheist, a deep dive into financial cyber crime.

The first story explores FLIR’s Black Hornet nano drones. These devices are the size of one half sheet of paper and weigh as much as a single slice of bread. US Army operators will use the devices to see around corners and look over the next ridge. Each drone can transmit high definition video and still images and remain aloft for 30 minutes. The operator can fly the nearly invisible drones from a handheld mobile phone sized controller. The nano drones will be used by military forces in France as well as by US military personnel.

The second story explains how romance cons have become a growth business for cyber criminals. The method exploits online dating or “hook up” sites. Individuals seek females over the age of 50, build trust via online communications, and then use that relationship to obtain cash or financial information. Losses average, according to the UK authorities, about $10,000 per successful con. Victims are often reluctant to go to the authorities because they are embarrassed about their behavior.

The third story provides information about the recent takedown of individuals responsible for more than 200,000 denial of service attacks. One of the individuals arrested began his business based on making it easy to knock a Web site offline when he was 17. The method used flooded a Web site or service with a large number of requests. If the targeted service was not correctly configured, the DDOS attack would cause the Web site or service to become unresponsive.

The final story provides a summary of a free book called “Cyberheist.” The 260 page document provides a wealth of information about the mechanisms used for stealing bank account information, credit card data, and other personal financial information. The volume reviews numerous types of online methods for deceiving an individual into providing information or for allowing the attacker to install malware on the target’s computing device. DarkCyber provides information about how to download this useful volume without charge.

Kenny Toth, February 26, 2019

Good News about Big Data and AI: Not Likely

February 25, 2019

I read a write up which was a bit of a downer. The story appeared in Analytics India and was titled “10 Challenges That Data Science Industry Still Faces.” Oh, oh. Maybe not good news?

My first thought was, “Only 10?”

The write up explains that the number one challenge is humans. The idea was that smart software would solve these types of problems: sluggish workers at fast food restaurants, fascinating decisions made by entry level workers in some government bureaus, and the often remarkable statements offered by talking heads on US cable TV “real news” programs, among others.

Nope. The number one challenge is finding humans who can do data science work.

What’s number two after this somewhat thorny problem? The answer is finding the “right data” and then getting a chunk of data one can actually process.

So one and two are what I would call bedrock issues: Expertise and information.

What about the other eight challenges? Here are three of them. I urge you to read the original article for the other five issues.

  • Informing people why data science and its related operations are good for you. Is this similar to convincing a three year old that lima beans are just super?
  • Storytelling. I think this means, “These data mean…” One hopes the humans (who are in short supply) draw the correct inferences. One hopes.
  • Models. This is a shorthand way of saying, “What’s assembled will work.” Hopefully the answer is, “Sure, our models are great.”

Analytics India has taken a risk with their write up. None of the data science acolytes want to hear “bad news.”

Let’s federate and analyze that with great data we can select to generate a useful output. Maybe 80 percent “accuracy” on a good day?

Stephen E Arnold, February 25, 2019
