Monopolies Know Best: The Amazon Method Involves a Better Status Page

December 13, 2021

Here’s the fix for the Amazon AWS outage: An updated status page. “Amazon Web Services Explains Outage and Will Make It Easier to Track Future Ones” reports:

A major Amazon Web Services outage on Tuesday started after network devices got overloaded, the company said on Friday [December 10, 2021]. Amazon ran into issues updating the public and taking support inquiries, and now will revamp those systems.

Several questions arise:

  1. How are those two pizza technical methods working out?
  2. What about automatic regional load balancing and redundancy?
  3. What is up with replicating the mainframe single point of failure in a cloudy world?

Neither the write up nor Amazon has answers. I have a thought, however. Monopolies see efficiency arising from:

  1. Streamlining by shifting human intermediated work to smart software which sort of works until it does not.
  2. Talking about technical prowess via marketing centric content and letting the engineering sort of muddle along until it eventually, if ever, catches up to the Mad Ave prose, PowerPoints, and rah rah speeches at bespoke conferences.
  3. Cutting costs where one can; for example, robust network devices and infrastructure.

The AT&T approach is a goner, but it seems to be back, just in the form of Baby Bell thinking applied to an online bookstore which dabbles in national security systems and methods, selling third party products with mysterious origins, and promoting audio books to those who have cancelled the service due to endless email promotions.

Yep, outstanding, just from Wall Street’s point of view. From my vantage point, another sign of deep seated issues. What outfit is up next? Google, Microsoft, or some back office provider of which most humans have never heard?

The new and improved approach to an AT&T type business is just juicy with wonderfulness. Two pizzas. Yummy.

Stephen E Arnold, December 13, 2021

Semantics and the Web: A Snort of Pisco?

November 16, 2021

I read a transcript for the video called “Semantics and the Web: An Awkward History.” I have done a little work in the semantic space, including a stint as an advisor to a couple of outfits. I signed confidentiality agreements with the firms and even though both have entered the well-known Content Processing Cemetery, I won’t name these outfits. However, I thought of the ghosts of these companies as I worked my way through the transcript. I don’t think I will have nightmares, but my hunch is that investors in these failed outfits may have bad dreams. A couple may experience post traumatic stress. Hey, I am just suggesting people read the document, not go bonkers over its implications in our thumbtyping world.

I want to highlight a handful of gems I identified in the write up. If I get involved in another world-saving semantic project, I will want to have these in my treasure chest.

First, I noted this statement:

“Generic coding”, later known as markup, first emerged in the late 1960s, when William Tunnicliffe, Stanley Rice, and Norman Scharpf got the ideas going at the Graphics Communication Association, the GCA.  Goldfarb’s implementations at IBM, with his colleagues Edward Mosher and Raymond Lorie, the G, M, and L, made him the point person for these conversations.

What’s not mentioned is that some in the US government became quite enthusiastic. Imagine the benefit of putting tags in text and providing electronic copies of documents. Much better than loose-leaf notebooks. I wish I had a penny for every time I heard this statement. How does the government produce documents today? The only technology not in wide use is hot metal type. It’s been, what, a half century?

Second, I circled this passage:

SGML included a sample vocabulary, built on a model from the earliest days of GML. The American Association of Publishers and others used it regularly.

Indeed wonderful. The phrase “slicing and dicing” captured the essence of SGML. Why have human editors? Use SGML. Extract chunks. Presto! A new book. That worked really well but for one drawback: The proliferating wild and crazy “books” were tough to sell. Experts in SGML were and remain a rare breed of cat. There were SGML ecosystems but adding smarts to content was and remains a work in progress. Yes, I am thinking of Snorkel too.
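For readers who never watched “slicing and dicing” in action, here is a minimal sketch of the idea. It uses XML rather than SGML because Python's standard library parses XML out of the box. The element names, the `topic` attribute, and the sample text are all invented for illustration; no real publisher's vocabulary is implied.

```python
import xml.etree.ElementTree as ET

# Hypothetical tagged source; element and attribute names are invented.
SOURCE = """
<book title="Field Guide">
  <chapter topic="birds"><title>Owls</title><para>Owls hunt at night.</para></chapter>
  <chapter topic="fish"><title>Trout</title><para>Trout prefer cold water.</para></chapter>
  <chapter topic="birds"><title>Herons</title><para>Herons wade in shallows.</para></chapter>
</book>
"""

def slice_and_dice(xml_text, topic, new_title):
    """Copy every chapter tagged with `topic` into a new book element."""
    source = ET.fromstring(xml_text)
    new_book = ET.Element("book", {"title": new_title})
    for chapter in source.iter("chapter"):
        if chapter.get("topic") == topic:
            new_book.append(chapter)
    return new_book

bird_book = slice_and_dice(SOURCE, "birds", "Birds Only")
chapter_titles = [c.findtext("title") for c in bird_book.findall("chapter")]
print(chapter_titles)
```

The extraction is the easy part. The sketch works only because someone tagged every chapter consistently first, and whether anyone wants to buy the resulting “book” is a separate question.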

Third, I like this observation too:

Dumpsters are available in a variety of sizes and styles.  To be honest, though, these have always been available.  Demolition of old projects, waste, and disasters are common and frequent parts of computing.

The Web as well as social media are dumpsters. Let’s toss in TikTok type videos too. I think meta meta tags can burn in our cherry red garbage container. Why not?

What do these observations have to do with “semantics”?

  1. Move from SGML to XML. Much better. Allow XML to run some functions. Yes, great idea.
  2. Create a way to allow content objects to be anywhere. Just pull them together. Was this the precursor to micro services?
  3. One major consequence of tagging or the lack of it or just really lousy tagging, marking up, and relying on software allegedly doing the heavy lifting is an active demand for a way to “make sense” of content. The problem is that an increasing amount of content is non textual. Ooops.

What’s the fix? The semantic Web revivified? The use of pre-structured, by golly, correct mark up editors? A law that says students must learn how to mark up and tag? (Problem: Schools don’t teach math and logic anymore. Oh, well, there’s an online course for those who don’t understand consistency and rules.)

The write up makes clear there are numerous opportunities for innovation. And the non-textual information. Academics have some interesting ideas. Why not go SAILing or revisit the world of semantic search?

Stephen E Arnold, November 16, 2021

Facebook Targets Paginas Amarillas: Never Enough, Zuck?

October 14, 2021

Facebook is working to make one of its properties more profitable. The Next Web reports, “WhatsApp Reinvents the ‘Yellow Pages’ and Proves there Are No New Ideas.” The company will test out a new business directory feature in São Paulo, Brazil, where local users will be able to search for “businesses nearby” through the app. Writer Ivan Mehta reports:

“For years, Facebook and Instagram have been trying to connect you to businesses and make your shop through their platforms. While the WhatsApp Business app has been around, you couldn’t really search for businesses using the app, unless you’ve interacted with them previously. WhatsApp already offers payment services in Brazil. So it makes sense for it to provide discovery services for local businesses, so you can shop for goods in person, and pay through the platform. The chat app doesn’t have any ads, unlike Facebook and Instagram, so business interactions and transactions are one of the biggest ways for Facebook to earn some moolah out of it. In June, the company integrated its Shops feature in WhatsApp. So, we can expect more business-facing features in near future.”

India and Indonesia are likely next on the list for the project, according to Facebook’s Matt Idema. We are assured the company will track neither users’ locations nor the businesses they search for. Have we heard similar promises before?

Cynthia Murrell, October 14, 2021

Ex-Googlers Work On Biased NLP Solutions

October 6, 2021

Google is on top of the world when it comes to money and technology. Google is the world’s most used search engine, its Chrome Web browser is used by two-thirds of users, and about 29% of 2021 digital advertising was Google ads. Fast Company asks and investigates important questions about Google’s product quality in “It’s Not Just You. Google Search Really Is Getting Worse.”

Over 80% of the revenue of Alphabet Inc., Google’s parent company, comes from advertising, and about 85% of the world’s search engine traffic feeds through Google. Google controls a lot of users’ screen time. The quality of the search engine’s results has been studied, and researchers have learned that very few users scroll past the “fold” (all of the available content on a screen). Advertising space at the top of search results is incredibly valuable. It also means that users are forced to scroll further and further to reach non-paid results.

Alphabet Inc. has another revenue generating platform, YouTube. A huge portion of videos include multiple ads. Users can avoid ads by paying for a premium subscription, but very few do.

Google does want to improve its search quality. Currently a lot of information from queries is distributed across multiple Web sites. Google wants to condense everything:

“Google is working on bringing this information together. The search engine now uses sophisticated “natural language processing” software called BERT, developed in 2018, that tries to identify the intention behind a search, rather than simply searching strings of text. AskJeeves tried something similar in 1997, but the technology is now more advanced.

BERT will soon be succeeded by MUM (Multitask Unified Model), which tries to go a step further and understand the context of a search and provide more refined answers. Google claims MUM may be 1,000 times more powerful than BERT, and be able to provide the kind of advice a human expert might for questions without a direct answer.”

Google controls a huge portion of the Internet and how users utilize it. Alphabet Inc. is here to stay for a long time, but there are alternatives such as Bing, DuckDuckGo, Ecosia, and Tor browsers. Google, however, will one day fade. Sears Roebuck, Blockbuster, Kmart, cassettes, etc. were all household names, until they became obsolete.

Whitney Grace, October 6, 2021

Data Federation: Sure, Works Perfectly

June 1, 2021

How easy is it to snag a dozen sets of data, normalize them, parse them, and extract useful index terms, assign classifications, and other useful hooks? “Automated Data Wrangling” provides an answer sharply different from what marketers assert.

A former space explorer, now marooned on a beautiful dying world, explains that the marketing assurances of dozens upon dozens of companies are baloney. Here’s a passage I noted:

Most public data is a mess. The knowledge required to clean it up exists. Cloud based computational infrastructure is pretty easily available and cost effective. But currently there seems to be a gap in the open source tooling. We can keep hacking away at it with custom rule-based processes informed by our modest domain expertise, and we’ll make progress, but as the leading researchers in the field point out, this doesn’t scale very well. If these kinds of powerful automated data wrangling tools are only really available for commercial purposes, I’m afraid that the current gap in data accessibility will not only persist, but grow over time. More commercial data producers and consumers will learn how to make use of them, and dedicate financial resources to doing so, knowing that they’ll be reap financial rewards. While folks working in the public interest trying to create universal public goods with public data and open source software will be left behind struggling with messy data forever.

Marketing is just easier than telling the truth about what’s needed in order to generate information which can be processed by a downstream procedure.
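To make the “custom rule-based processes” in the quoted passage concrete, here is a minimal sketch of what hand-written wrangling looks like for just two sources. Everything in it is an assumption for illustration: the field names, the sample records, and the three-field target schema are invented, not taken from the cited write up.

```python
import re
from datetime import datetime

# Two hypothetical sources describing the same record with different
# field names, date formats, and number formatting (all invented).
SOURCE_A = [{"Plant Name": " Big  Creek ", "Report Date": "2021-06-01", "MWh": "1,250"}]
SOURCE_B = [{"plant": "BIG CREEK", "date": "06/01/2021", "mwh": "1250.0"}]

# Hand-written rule 1: map each source's field names onto one shared schema.
FIELD_MAP = {
    "Plant Name": "plant", "plant": "plant",
    "Report Date": "date", "date": "date",
    "MWh": "mwh", "mwh": "mwh",
}
# Hand-written rule 2: the date formats seen so far.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")

def parse_date(text):
    """Try each known date format in turn; fail loudly on an unknown one."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {text!r}")

def normalize(record):
    """Rename fields, then clean each value with a field-specific rule."""
    row = {FIELD_MAP[key]: value for key, value in record.items()}
    return {
        "plant": re.sub(r"\s+", " ", row["plant"]).strip().title(),
        "date": parse_date(row["date"]),
        "mwh": float(row["mwh"].replace(",", "")),
    }

federated = [normalize(r) for r in SOURCE_A + SOURCE_B]
print(federated)
```

Two sources needed a field map, a list of date formats, and three cleaning rules. Every additional source brings its own spellings and formats, so the rules have to be extended by hand each time, which is the scaling problem the passage describes.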

Stephen E Arnold, June 1, 2021

More about Bert: Will TikTok Videos Be Next?

May 28, 2021

Google asserts its new AI model will deliver significant improvements. SEO Hacker discusses “Google MUM: New Search Technology.” We are told MUM, or Multitask Unified Model, is like BERT but much more powerful. We learn:

“They are built on the same Transformer architecture, but MUM is 1000x more powerful than its predecessor. … Another difference between MUM and BERT is that MUM is trained across 75 languages – not just one language (usually English). This enables the search engine, through the use of MUM, to connect information from all around the world without going through language barriers. Additionally, Google mentioned that MUM is multimodal, so it understands and processes information from modalities such as text and images. They also brought up the possibility for MUM to expand to other modalities such as videos and audio files.”

For an example of how the new model will work, see either the SEO Hacker write-up or Google’s blog post on the subject. The illustration involves Mt. Fuji. Naturally, the Search Engine Optimization site ponders how the change might affect SEO. Writer Sean Si predicts MUM’s understanding of 75 languages means non-English content will find much wider audiences. The revised algorithm will also serve up more types of content, like podcasts and videos, alongside text-based resources. Both of those sound like positives, at least for searchers. Other ramifications on the field remain to be seen, but Si anticipates SEO pros will have to develop entirely new approaches. Of course, producing quality content relevant to one’s site should remain the top recommendation.

Cynthia Murrell, May 28, 2021

UCF Cracks Sarcasm: With a Crocodile Smile?

May 18, 2021

I read some big news from Big News. The story “Researchers Develop A.I. That Can Detect Sarcasm” explains that smart software can parse text so that the degree of non-smarty writing can be detected. The article states:

The team taught the computer model to find patterns that often indicate sarcasm and combined that with teaching the program to correctly pick out cue words in sequences that were more likely to indicate sarcasm. They taught the model to do this by feeding it large data sets and then checked its accuracy.

Presumably the hand-crafting of the training set is able to keep pace with the language of those seeking customer support. I have commented about the brilliance and responsiveness of the customer support available from major companies; for example, Microsoft and Verizon. Improving upon the clarity of information available from these organizations is difficult for me to envision. The excellent handling of SolarWinds by Microsoft and the management acumen demonstrated by Verizon with regard to Yahoo chisel a benchmark in marketing effectiveness.

The write up adds:

The multi-head self-attention module aids in identifying crucial sarcastic cue-words from the input, and the recurrent units learn long-range dependencies between these cue-words to better classify the input text.
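For those who want to see what “self-attention” means mechanically, here is a toy sketch. It is not the researchers' model. It shows a single attention head with scaled dot-product scoring, the unit that a multi-head module runs several times in parallel. The learned query, key, and value projections and the recurrent units are left out, and the two-dimensional embeddings are made up.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Single-head scaled dot-product self-attention with identity projections.

    A trained model would first multiply each vector by learned query, key,
    and value matrices; they are omitted here (an assumption for brevity).
    """
    dim = len(vectors[0])
    weights = []
    for query in vectors:
        # Score this token against every token, scaled by sqrt(dimension).
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in vectors]
        weights.append(softmax(scores))
    # Each output is the attention-weighted average of all input vectors.
    outputs = [
        [sum(w * value[i] for w, value in zip(row, vectors)) for i in range(dim)]
        for row in weights
    ]
    return weights, outputs

# Toy, made-up embeddings for a four-token input.
tokens = ["oh", "great", "another", "outage"]
embeddings = [[0.1, 0.0], [2.0, 0.5], [0.2, 0.1], [1.8, 0.7]]
weights, outputs = self_attention(embeddings)
for token, row in zip(tokens, weights):
    print(token, [round(w, 2) for w in row])
```

Each printed row shows how much one token attends to every token in the input. In a trained system, large weights on particular tokens are what the write up calls cue-words; here the weights simply reflect the invented numbers.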

Mix in sentiment analysis, and the simplicity of the method is evident.

I noted this statement:

Sarcasm detection in online communications from social networking platforms is much more challenging.

It seems that one of the final frontiers of human utterance has been crossed. Sarcasm has been cracked. As I write this I manifest a crocodile smile. The reason? The time and cost of maintaining the training set so it reflects what TikTok and Dread users “do” with language may be a sticking point. Then the rules must be updated in near real time, assuming that the data flows are related to crime, war fighting, or financial fraud.

A big crocodile? Yes, and a big smile. But research grants and graduate students are eager to contribute because… degree.

Stephen E Arnold, May 18, 2021

GitHub: Amusing Security Management

April 8, 2021

I got a kick out of “GitHub Investigating Crypto-Mining Campaign Abusing Its Server Infrastructure.” I am not sure if the write up is spot on, but it is entertaining to think about Microsoft’s security systems struggling to identify an unwanted service running in GitHub. The write up asserts:

Code-hosting service GitHub is actively investigating a series of attacks against its cloud infrastructure that allowed cybercriminals to implant and abuse the company’s servers for illicit crypto-mining operations…

In the wake of the SolarWinds’ and Exchange Server “missteps,” Microsoft has been making noises about the tough time it has dealing with bad actors. I think one MSFT big dog said there were 1,000 hackers attacking the company.

The main idea is that attackers allegedly mine cryptocurrency on GitHub’s own servers.

This is post SolarWinds and Exchange Server “missteps”, right?

What’s the problem with cyber security systems that monitor real time threats and uncertified processes?

Oh, I forgot. These aggressively marketed cyber systems still don’t work, it seems.

Stephen E Arnold, April 8, 2021

DarkCyber for January 12, 2021, Now Available

January 12, 2021

DarkCyber is a twice-a-month video news program about online, the Dark Web, and cyber crime. You can view the video on Beyond Search or at this YouTube link.

The program for January 12, 2021, includes a featured interview with Mark Massop, DataWalk’s vice president. DataWalk develops investigative software which leapfrogs such solutions as IBM’s i2 Analyst’s Notebook and Palantir Gotham. In the interview, Mr. Massop explains how DataWalk delivers analytic reports with two or three mouse clicks, federates or brings together information from multiple sources, and slashes training time from months to several days.

Other stories include DarkCyber’s report about the trickles of information about the SolarWinds’ “misstep.” US Federal agencies, large companies, and a wide range of other entities were compromised. DarkCyber points out that Microsoft’s revelation that bad actors were able to view the company’s source code underscores the ineffectiveness of existing cyber security solutions.

DarkCyber highlights remarkable advances in smart software’s ability to create highly accurate images from poor imagery. The focus of DarkCyber’s report is not on what AI can do to create faked images. DarkCyber provides information about how and where to determine if a fake image is indeed “real.”

The final story makes clear that flying drones can be an expensive hobby. One audacious drone pilot flew in restricted air zones in Philadelphia and posted the exploits on a social media platform. And the cost of this illegal activity? Not too much. Just $182,000. The good news is that the individual appears to have avoided one of the comfortable prisons available to authorities.

One quick point: DarkCyber accepts zero advertising and no sponsored content. Some have tried, but begging for dollars and getting involved in the questionable business of sponsored content is not for the DarkCyber team.

Finally, this program begins our third series of shows. We have removed DarkCyber from Vimeo because that company insisted that DarkCyber was a commercial enterprise. Stephen E Arnold retired in 2017, and he is now 77 years old and not too keen to rejoin the GenX and Millennials in endless Zoom meetings and what he calls “blatant MBA craziness.” (At least that’s what he told me.)

Kenny Toth, January 12, 2021

Ah, Chatbots. Unfortunately, Inevitable Because Who Wants to Support Customers?

December 2, 2020

Lest one think AI is here to make our lives easier, one should think again. Though the technology may bring new capabilities and insights, users must put in work and surmount frustration to get results. Bizcommunity.com discusses “The Unsuspected Stumbling Blocks of AI for Customer Experience.” Writer Mathew Conn specifically examines the use of chatbots here. He writes:

“While chatbots successfully enable one-to-one conversations with customers through automated interfaces and are a great way to deliver immediate responses, they are not right for any and all customer interactions. The first, and possibly most important failure of chatbots, is a direct result of the organization in question not identifying what customer interactions are right for enhancement with chatbots. … Because chatbots use open source libraries, most won’t be customized to the organization’s specific industry or customers. Pre-trained bots will be limited to their pre-programmed decision path and are limited by the designer or programmer’s understanding of customer behaviors and requests. While chatbots don’t reason, smarter bots can cope better with some language nuances; however, without human judgment, chatbot accuracy will always be limited. Pre-trained chatbots follow a structured conversation plan and can lose the flow fairly easily. With more access to customer history and data, smarter chatbots can ‘learn’ customer preferences. However, to keep context, chatbots need every possible response to every possible customer request.”

The more complex the interaction, the more likely customers will want to converse with a human. It can be useful to begin interactions with a chatbot then shift to a human worker, but a problem can occur when such a shift means changing platforms from a chat window to phone or email. If the company does not maintain consistency across all its channels, the customer must restart their explanation from the beginning. This does not make for a happy customer or, by extension, a good reputation for the business.

Chatbots are not the only AI function that is less of a panacea than vendors would like us to believe. Before investing in any AI solution, businesses should do their research and make certain they understand what they are getting, whether it will truly address their unique needs, and how to make the most of it.

Just cut costs and move on.

Cynthia Murrell, December 2, 2020
