GitHub: Amusing Security Management

April 8, 2021

I got a kick out of “GitHub Investigating Crypto-Mining Campaign Abusing Its Server Infrastructure.” I am not sure if the write up is spot on, but it is entertaining to think about Microsoft’s security systems struggling to identify an unwanted service running in GitHub. The write up asserts:

Code-hosting service GitHub is actively investigating a series of attacks against its cloud infrastructure that allowed cybercriminals to implant and abuse the company’s servers for illicit crypto-mining operations…

In the wake of the SolarWinds’ and Exchange Server “missteps,” Microsoft has been making noises about the tough time it has dealing with bad actors. I think one MSFT big dog said there were 1,000 hackers attacking the company.

The main idea is that attackers allegedly mine cryptocurrency on GitHub’s own servers.

This is post-SolarWinds and post-Exchange Server “missteps,” right?

What’s the problem with cyber security systems that monitor real-time threats and uncertified processes?

Oh, I forgot. These aggressively marketed cyber systems still don’t work it seems.

Stephen E Arnold, April 8, 2021

DarkCyber for January 12, 2021, Now Available

January 12, 2021

DarkCyber is a twice-a-month video news program about online, the Dark Web, and cyber crime. You can view the video on Beyond Search or at this YouTube link.

The program for January 12, 2021, includes a featured interview with Mark Massop, DataWalk’s vice president. DataWalk develops investigative software which leapfrogs such solutions as IBM’s i2 Analyst Notebook and Palantir Gotham. In the interview, Mr. Massop explains how DataWalk delivers analytic reports with two or three mouse clicks, federates or brings together information from multiple sources, and slashes training time from months to several days.

Other stories include DarkCyber’s report about the trickles of information about the SolarWinds’ “misstep.” US Federal agencies, large companies, and a wide range of other entities were compromised. DarkCyber points out that Microsoft’s revelation that bad actors were able to view the company’s source code underscores the ineffectiveness of existing cyber security solutions.

DarkCyber highlights remarkable advances in smart software’s ability to create highly accurate images from poor imagery. The focus of DarkCyber’s report is not on what AI can do to create faked images. DarkCyber provides information about how and where to determine if a fake image is indeed “real.”

The final story makes clear that flying drones can be an expensive hobby. One audacious drone pilot flew in restricted air zones in Philadelphia and posted the exploits on a social media platform. And the cost of this illegal activity? Not too much. Just $182,000. The good news is that the individual appears to have avoided one of the comfortable prisons available to authorities.

One quick point: DarkCyber accepts zero advertising and no sponsored content. Some have tried, but begging for dollars and getting involved in the questionable business of sponsored content is not for the DarkCyber team.

Finally, this program begins our third series of shows. We have removed DarkCyber from Vimeo because that company insisted that DarkCyber was a commercial enterprise. Stephen E Arnold retired in 2017, and he is now 77 years old and not too keen to rejoin the GenX and Millennials in endless Zoom meetings and what he calls “blatant MBA craziness.” (At least that’s what he told me.)

Kenny Toth, January 12, 2021

Ah, Chatbots. Unfortunately, Inevitable Because Who Wants to Support Customers?

December 2, 2020

Lest one think AI is here to make our lives easier, one should think again. Though the technology may bring new capabilities and insights, users must put in work and surmount frustration to get results. Bizcommunity.com discusses “The Unsuspected Stumbling Blocks of AI for Customer Experience.” Writer Mathew Conn specifically examines the use of chatbots here. He writes:

“While chatbots successfully enable one-to-one conversations with customers through automated interfaces and are a great way to deliver immediate responses, they are not right for any and all customer interactions. The first, and possibly most important failure of chatbots, is a direct result of the organization in question not identifying what customer interactions are right for enhancement with chatbots. … Because chatbots use open source libraries, most won’t be customized to the organization’s specific industry or customers. Pre-trained bots will be limited to their pre-programmed decision path and are limited by the designer or programmer’s understanding of customer behaviors and requests. While chatbots don’t reason, smarter bots can cope better with some language nuances; however, without human judgment, chatbot accuracy will always be limited. Pre-trained chatbots follow a structured conversation plan and can lose the flow fairly easily. With more access to customer history and data, smarter chatbots can ‘learn’ customer preferences. However, to keep context, chatbots need every possible response to every possible customer request.”

The more complex the interaction, the more likely customers will want to converse with a human. It can be useful to begin interactions with a chatbot then shift to a human worker, but a problem can occur when such a shift means changing platforms from a chat window to phone or email. If the company does not maintain consistency across all its channels, the customer must restart their explanation from the beginning. This does not make for a happy customer or, by extension, a good reputation for the business.
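To make Conn’s point concrete, here is a minimal sketch of a “pre-trained” bot that walks a fixed decision path and, when a customer steps off that path, hands off to a human while keeping the transcript. The menu, the keyword matching, and the names (`DECISION_PATH`, `Conversation`, `HANDOFF`) are hypothetical illustrations, not any vendor’s product.

```python
from dataclasses import dataclass, field

# Hypothetical, hand-built decision path: node -> (bot message, keyword routes).
# Nodes with no routes are final answers; every value here is made up.
DECISION_PATH = {
    "start": ("Billing or shipping?", {"billing": "billing", "shipping": "shipping"}),
    "billing": ("Refund or invoice?", {"refund": "refund", "invoice": "invoice"}),
    "shipping": ("A tracking link has been emailed to you.", {}),
    "refund": ("Refunds post within 5 business days.", {}),
    "invoice": ("Invoices are under Account > Documents.", {}),
}

HANDOFF = "Transferring you to an agent, with this conversation attached."


@dataclass
class Conversation:
    node: str = "start"
    transcript: list = field(default_factory=list)
    handed_off: bool = False

    def reply(self, utterance: str) -> str:
        """Advance one step along the decision path, or escalate to a human."""
        self.transcript.append(("customer", utterance))
        _, routes = DECISION_PATH[self.node]
        words = [w.strip(".,!?") for w in utterance.lower().split()]
        # The bot "understands" only the keywords its designer anticipated.
        next_node = next((routes[w] for w in words if w in routes), None)
        if next_node is None:
            # Off the path: hand off, keeping the transcript so the customer
            # does not have to start over with the human agent.
            self.handed_off = True
            answer = HANDOFF
        else:
            self.node = next_node
            answer = DECISION_PATH[next_node][0]
        self.transcript.append(("bot", answer))
        return answer


if __name__ == "__main__":
    chat = Conversation()
    print(chat.reply("I have a billing question"))
    print(chat.reply("Where is my refund?"))
    print(Conversation().reply("My package arrived damaged"))
```

A customer who says “my package arrived damaged” falls off the path on the first turn, which is the limitation the article describes. Passing the transcript along is the part that spares the customer from restarting the explanation.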

Chatbots are not the only AI function that is less of a panacea than vendors would like us to believe. Before investing in any AI solution, businesses should do their research and make certain they understand what they are getting, whether it will truly address their unique needs, and how to make the most of it.

Just cut costs and move on.

Cynthia Murrell, December 2, 2020

NLP Survey: Grains of Salt Helpful

November 30, 2020

Curious about the “state” of natural language processing? Surveys dependent on participants who self-recruit or receive a questionnaire as a result of signing up for a newsletter have to be consumed with a grain of salt and a bottle of monosodium glutamate. You can get a copy of a survey sponsored by John Snow Labs via this url. This is a Medium content object, so be prepared to provide information of value to certain large organizations.

The principal findings from the survey of 571 respondents include:

  • People are spending money on entity recognition and document classification.
  • Spark NLP and spaCy are popular.
  • One third of those responding use an indexing “helper” tool.

Data about budgets are scant. Percentages are not what fuels a salesperson’s interest.

For Beyond Search, the single most important finding is that four cloud services do the heavy lifting for those into NLP: AWS, Azure, Google, and IBM. Which cloud service is most popular among the NLP crowd? Give up? The survey says, “Google.”

Not surprisingly, cost and complexity are holding back NLP adoption and expansion. And what is John Snow Labs? An NLP outfit. Index term: Marketing.

Stephen E Arnold, November 30, 2020

Google: Poetry Creation Made Eneasy

November 25, 2020

I spotted “Google’s Verse by Verse AI Can Help You Write in the Style of Famous Poets.” The subtitle illustrates why this Google innovation is probably going to find some Silicon Valley Shakespeares:

Quoth the Bugdroid, “Nevermore.”

The write up guides the reader to this url. Then the page displays:

[Screenshot: the Verse by Verse start page]

Okay, let’s write a poem with the Google smart software. I am skeptical because Google set out to solve death. So far, no luck with that project. For poetic style, I quite like the approach of William Alabaster, who wrote a remarkable tribute to Queen Elizabeth called Elisaeis, Apotheosis poetica, in Latin when he was trying to avoid arrest for religious heresy. (For more info on William Alabaster, navigate to your local university library and chase down “The Elisæis” of William Alabaster, Vol. 76, No. 5, Texts and Studies (Winter, 1979). Oh, the poem is a tribute to Elizabeth the First. Did I mention the poem was an epic? Thousands upon thousands of lines. In Latin too. Hot stuff.)

Well, bummer. Mr. Alabaster is not listed as a stylistic choice on the Google write a poem Web site. I thought AI was smart. Well, let us sally forth with the clever and sometimes interesting Edwin Arlington Robinson, who wrote:

Miniver loved the Medici,
Albeit he had never seen one;
He would have sinned incessantly
Could he have been one.

Yep, sin. But I had to pick other poets with whom the smart Google AI is familiar. Trepidatiously, I selected the fave of elderly literature teachers: Henry Wadsworth Longfellow. Plus, in a nod to the Rona and rising infection rates, I plunked my mouse cursor on the liquor-loving and raven-loving Edgar Allan Poe. Yep, I noted the “nevermore” in the article’s subtitle. Then I clicked “Next.”

I specified a quatrain in iambic pentameter with the rhyme scheme ABAB.

Google’s smart software wanted a chunk of poesy as a “seed.” I provided:

Whoa, teenaged mind, cause no sorrow or pain

I want to point out that this is the first line of a poem my junior class English teacher Edwardine Sperling required us to write. (She loved cardinals, the bird, not the baseball team.) My poetic flight of fancy at age 15 on this line motivated Ms. Sperling to try to get me expelled from high school. No sense of humor had she. (The compromise proposed by the assistant principal was that Ms. Sperling could ban me from the National Honor Society as a result of my inappropriate writing, and I had to sit outside the class in the hallway for the remainder of the semester.)

And what was my “Spirit of Nature” poem about? Nothing much. Just sitting in the woods on a sunny day in early autumn. Then the Spirit of Nature emerged from a pile of leaves. I explained that my Spirit of Nature was the October 1959 Playmate of the Month from Playboy magazine. I elaborated via metaphors (terrible metaphors, I must confess) how the Spirit of Nature, or Miss October, helped me move away from “sorrow or pain.” I will leave the details to your imagination. My poem was a hoot. But I got the boot.

Back to the Google smart poetry writer, a system which I hypothesized would have zero imagination and would have been an A student in dear Ms. Sperling’s literature class.

I clicked the Next button again. Magic. After some prompting, Google’s fine system spit out a quatrain. I provided the first line, in red. Google goodness is in blue:

Whoa, teenaged mind, cause no sorrow or pain
Enlife a phantom of an idle love;
Yet in a fancy I could now attain
Look on the beauty of that world above!

Great stuff, those words in blue, crafted sharp and true by Lord Google.

Ms. Sperling would have relished the word “enlife.” The prefix “en” leads to many coinages; for example, enbaloney, enstupid, and enmarketing. Maybe enAI? Sure. But no Playboy bunnies. No filthy innuendo. No double entendre. The meaning thing eludes me, but, hey, Google couldn’t solve death either. The GOOG is not doing too well in poesy either, I opine. Any questions about Google’s query ad matching semantic system? Good.

Stephen E Arnold, November 24, 2020

BERT: It Lives

November 2, 2020

I wrote about good old BERT before.

I was impressed with the indexing and context cues in BERT. The acronym does not refer to the interesting cartoon character. This BERT is Bidirectional Encoder Representations from Transformers. If you want more information about this approach to making sense of text, just navigate to the somewhat turtle-like Stanford University site and retrieve the 35-page PDF.

BERT popped up again in a somewhat unusual search engine optimization context (obviously recognized by Google’s system at least seven percent of the time): “Could Google Passage Indexing Be Leveraging BERT?”

I worked through the write up twice. It was, one might say, somewhat challenging to understand. I think I figured it out:

Google is trying to index the context in which an “answer” to a user’s query resides. Via Google parsing magic, the answer may be presented to the lucky user.
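For readers who want the idea in concrete terms, here is a minimal sketch of passage-level scoring: cut each page into fixed-size chunks and rank the chunks, not the pages, so a good answer buried in a long page can surface. The scoring is plain term overlap weighted by an IDF-style factor, swapped in for illustration. It is not BERT, and it makes no claim about how Google’s passage indexing is built; the window size and function names are assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; a deliberately crude tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def split_passages(doc, size=40):
    """Cut a page into consecutive windows of `size` words (an assumed passage unit)."""
    words = doc.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def best_passages(query, docs, size=40, top_k=3):
    """Score every passage of every page; return (score, page_id, passage) tuples."""
    passages = [(page_id, p) for page_id, doc in docs.items()
                for p in split_passages(doc, size)]
    n = len(passages)
    # Passage frequency of each term, used as an IDF-style weight:
    # rare terms count for more than common ones.
    pf = Counter()
    for _, p in passages:
        pf.update(set(tokenize(p)))
    q_terms = set(tokenize(query))
    scored = []
    for page_id, p in passages:
        tf = Counter(tokenize(p))
        score = sum(tf[t] * math.log(1 + n / pf[t]) for t in q_terms if tf[t])
        if score > 0:
            scored.append((score, page_id, p))
    scored.sort(key=lambda s: s[0], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    docs = {
        "seo-page": "cats dogs birds fish frogs google passage indexing uses bert models",
        "muppets": "bert is a muppet on sesame street",
    }
    top = best_passages("passage indexing bert", docs, size=5)
    print(top[0][1], "|", top[0][2])  # seo-page | google passage indexing uses bert
```

The unit of ranking is the passage; the page identifier simply rides along. Swapping the scoring line for a neural model changes the quality of the match, not the shape of the pipeline.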

I pulled out several gems from the article, which is designed to be converted into manipulations to fool Google’s indexing system. SEO is focused on eroding relevance to make a page appear in a Google search result list whether or not the content answers the user’s query.

The gems:

  • BERT does not always mean the ‘BERT’. Ah, ha. A paradox. That’s helpful.
  • Former Verity and Yahoo search wizard Prabhakar Raghavan allegedly said: “Today we’re excited to share that BERT is now used in almost every query in English, helping you get higher quality results for your questions.” And what percentage of Google queries is “almost every”? And what percentage of Google queries are in English? Neither the Googler nor the author of the article answers these questions.
  • It’s called passage indexing, but not as we know it. The “passage indexing” announcement caused some confusion in the SEO community with several interpreting the change initially as an “indexing” one. Confusion. No kidding?
  • And how about this statement about “almost every”? “Whilst only 7% of queries will be impacted in initial roll-out, further expansion of this new passage indexing system could have much bigger connotations than one might first suspect. Without exaggeration, once you begin to explore the literature from the past year in natural language research, you become aware this change, whilst relatively insignificant at first (because it will only impact 7% of queries after all), could have potential to actually change how search ranking works overall going forward.”

That’s about it because the contradictions and fascinating logic of the article have stressed my 76-year-old brain’s remaining neurons. The write up concludes with this statement:

Whilst there are currently limitations for BERT in long documents, passages seem an ideal place to start toward a new ‘intent-detection’ led search. This is particularly so, when search engines begin to ‘Augment Knowledge’ from queries and connections to knowledge bases and repositories outside of standard search, and there is much work in this space ongoing currently.  But that is for another article.

Plus, there’s a list of references. Oh, did I mention that this essay/article in its baffling wonderfulness is only 15,000 words long? Another article? Super.

Stephen E Arnold, November 2, 2020

     

Like Life, Chatbots Are Semi-Perfect

September 22, 2020

Chatbots are notoriously dumb pieces of AI that parrot information coded into their programs. They are also annoying, because they never have the correct information. Still, chatbots can be useful tools, and developers are improving them so that they actually deliver. Medium runs down the differences between chatbot environments: “Updated: A Comparison Of Eight Chatbot Environments.”

Most chatbot environments have the same approach for a conversational interface, but there are four distinct development groups: avant-garde, NLU/NLP tools, use-the-cloud-you’re-in, and leading commercial cloud offerings. There are cross-industry trends across these groups:

“• The merging of intents and entities

• Contextual entities. Hence entities sans a finite list and which is detected by their context within a user utterance.

• Deprecation of the State Machine. Or at least, towards a more conversational like interface.

• Complex entities; introducing entities with properties, groups, roles etc.”
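The difference between a list entity and a contextual entity is easier to see in a toy. In this sketch (a hypothetical illustration, not how any of the eight platforms implements it), the list entity only recognizes cities someone enumerated in advance, while the “contextual” entity infers origin and destination from the words around them, using the assumed pattern “from X to Y.”

```python
import re

# Conventional list entity: the bot knows only values enumerated in advance.
# The city list is hypothetical.
KNOWN_CITIES = {"boston", "denver", "austin"}


def list_entities(utterance):
    """Return words that appear in the predefined value list."""
    return [w for w in re.findall(r"[a-z]+", utterance.lower()) if w in KNOWN_CITIES]


# Crude contextual entity: no finite list; the value is inferred from the
# surrounding words, here the assumed pattern "from <X> to <Y>".
ROUTE = re.compile(r"\bfrom\s+([A-Z][a-z]+)\s+to\s+([A-Z][a-z]+)")


def contextual_entities(utterance):
    """Return origin/destination pulled from context, or an empty dict."""
    match = ROUTE.search(utterance)
    if not match:
        return {}
    return {"origin": match.group(1), "destination": match.group(2)}


if __name__ == "__main__":
    text = "Fly me from Boston to Lisbon"
    print(list_entities(text))        # ['boston']
    print(contextual_entities(text))  # {'origin': 'Boston', 'destination': 'Lisbon'}
```

The list entity misses Lisbon because nobody put it on the list. The regex catches it but would miss “to Lisbon, leaving from Boston,” which is why platforms of the kind the article compares typically learn such cues from annotated examples rather than hand-written patterns.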

Beyond the industry trends, chatbots are transitioning from stupid instant messaging programs to an interactive, natural-language-driven digital employee that “thinks and acts” like a real human. Companies want chatbots that help them grow by comprehending past and current conversations drawn from multiple sources, including CRM systems.

Chatbot environments are hard to compare directly because their frameworks are so different, but the write up identifies five points to consider: NLU features, ecosystem maturity, licensing and usage costs, a graphical front end for developing and editing call flows, and scalability and enterprise readiness.

Chatbots are becoming smarter and already handle many customer service jobs. If they can actually resolve the problems customers contact companies for, then science fiction truly has become reality.

Whitney Grace, September 22, 2020

Forget Structured Query Language Commands? Yeah, Not Yet

August 29, 2020

One of the DarkCyber team spotted a demonstration service called NaturalSQL.com. The idea is that the system will accept natural language queries of information stored in structured databases. According to the DarkCyber person, the queries launched into the natural language box were:

Sheva War with Whom

Sheva Frequency

The sparse interface sports a Content button which displays the information in the system.

How did this work?

[Screenshot: NaturalSQL.com results for the two queries]

Not well. NLP systems still pose challenges, it seems.

Interesting idea, but some rough edges need a bit of touching up.
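Why do such systems stumble? One common baseline maps question patterns to SQL templates. The sketch below is a hypothetical illustration under stated assumptions: the `conflicts` table, its rows, and the two templates are invented, loosely shaped like the DarkCyber test queries, and nothing here describes how NaturalSQL.com actually works.

```python
import re
import sqlite3

# Hypothetical table and rows, invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE conflicts (side TEXT, opponent TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO conflicts VALUES (?, ?, ?)",
    [("Alpha", "Beta", 1901), ("Alpha", "Gamma", 1915), ("Beta", "Gamma", 1920)],
)

# Hand-written templates: a question pattern paired with parameterized SQL.
TEMPLATES = [
    (re.compile(r"^(\w+) war with whom\??$", re.IGNORECASE),
     "SELECT opponent FROM conflicts WHERE side = ? COLLATE NOCASE ORDER BY year"),
    (re.compile(r"^(\w+) frequency\??$", re.IGNORECASE),
     "SELECT COUNT(*) FROM conflicts WHERE side = ? COLLATE NOCASE"),
]


def to_sql(question):
    """Return (sql, params) for the first matching template, else None."""
    for pattern, sql in TEMPLATES:
        match = pattern.match(question.strip())
        if match:
            return sql, (match.group(1),)
    return None


def answer(question):
    """Run the translated query; None means no template understood the question."""
    parsed = to_sql(question)
    if parsed is None:
        return None
    sql, params = parsed
    return [row[0] for row in conn.execute(sql, params)]


if __name__ == "__main__":
    print(answer("Alpha war with whom"))  # ['Beta', 'Gamma']
    print(answer("alpha frequency"))      # [2]
    print(answer("Who fought Alpha?"))    # None
```

Any phrasing the template author did not anticipate falls through, which is one source of rough edges. Learned parsers widen coverage but bring their own failure modes.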

Stephen E Arnold, August 29, 2020

NLP: A Time for Reflection or a Way to Shape Decades of Hyperbole and Handwaving?

August 2, 2020

The most unusual GoCurrent.com online information service published “The Field of Natural Language Processing Is Chasing the Wrong Goal.” The article comments on the Association for Computational Linguistics Conference held in July 2020.

The point of the write up is to express concern about the whither and why of NLP; for example:

My colleagues and I at Elemental Cognition, an AI research firm based in Connecticut and New York, see the angst as justified. In fact, we believe that the field needs a transformation, not just in system design, but in a less glamorous area: evaluation.

Evaluation?

Yep, the discipline appears to be chasing benchmarks. DarkCyber believes this is a version of the intra-squad rivalries as players vie to start the next game.

The write up raises this question:

How did the NLP community end up with such a gap between on-paper evaluations and real-world ability? In an ACL position paper, my colleagues and I argue that in the quest to reach difficult benchmarks, evaluations have lost sight of the real targets: those sophisticated downstream applications. To borrow a line from the paper, the NLP researchers have been training to become professional sprinters by “glancing around the gym and adopting any exercises that look hard.”

The answer, in part, is for NLP developers to follow this path:

But our argument is more basic: however systems are implemented, if they need to have faithful world models, then evaluations should systematically test whether they have faithful world models.

DarkCyber’s view is that NLP, like other building blocks of content analysis and access systems, has some characteristics which cause intra-squad similarities; that is, the players are more similar than even they understand:

  1. Reliance on methods widely taught in universities. Who wants to go in a new direction, fail, and, therefore, be perceived as a dead ender?
  2. Competing with one’s teammates, peers, and fellow travelers is comfortable. Who wants to try to explain why NLP from A is better than NLP from B when the results are more of the same?
  3. NLP, like other content functions, is positioned as the big solution to tough content challenges. The reality is that language is slippery, and less fancy methods often deliver good enough results. Who wants to admit that a particular approach is “good enough”? It is better to get out the pink wrapping paper and swathe the procedures in colorful garb.

NLP can be and is useful in many situations. The problem is that making sense of human utterances remains a difficult challenge. DarkCyber is suspicious of appeals emitted by the Epstein-funded MIT entity.

Jargon is jargon. NLP is one of those disciplines which works overtime to deliver on promises that have been made for many years. Does NLP pay off? This is like MIT asking, “Epstein who?”

Stephen E Arnold, August 2, 2020

Natural Language Processing: Useful Papers Selected by an Informed Human

July 28, 2020

Nope, no artificial intelligence involved in this curated list of papers from a recent natural language conference. Ten papers are available with a mouse click. Quick takeaway: Adversarial methods seem to be a hot ticket. Navigate to “The Ten Must Read NLP/NLU Papers from the ICLR 2020 Conference.” Useful editorial effort and a clear, adult presentation of the bibliographic information. Kudos to jakubczakon.

Stephen E Arnold, July 27, 2020
