System Glitches: A Glimpse of Our Future?

April 4, 2022

I read “Nearly All Businesses Hit by IT Downtime Last Year – Here’s What’s to Blame.” The write up reports:

More than three-quarters (75%) of businesses experienced downtime in 2021, up 25% compared to the previous year, new research has claimed. Cybersecurity firm Acronis polled more than 6,200 IT users and IT managers from small businesses and enterprises in 22 countries, finding that downtime stemmed from multiple sources, with system crashes (52%) being the most prevalent cause. Human error (42%) was also a major issue, followed by cyber attacks (36%) and insider attacks (20%).

Interesting. A cyber security company reports these data, and the cyber security industry sector should know. Many of the smart systems have demonstrated that they are somewhat slow when it comes to safeguarding licensees.

What’s the cause of the issue?

There are “crashes.” But what’s a crash? Human error. Humans make mistakes, and most of the software systems with which I am familiar are dumb: Blackmagic ATEM software “forgets” that users drag and drop. Users don’t intuitively know to put an image in one place and then put that image in another so that the original image is summarily replaced. Windows Defender lights up when we test software from an outfit named Chris. Excel happily exports to PowerPoint but loses the format of the table when it is pasted. There are USB keys and Secure Digital cards which just stop working. Go figure. There are enterprise search systems which cannot display a document saved by a colleague before lunch. Where is it? Yeah, good question. In the indexing queue maybe? Oh, well, perhaps tomorrow the colleague will get the requested feedback?

My takeaway from the write up is that the wild and crazy, helter skelter approach to software and some hardware has created weaknesses, flaws, and dependencies no one knows about. When something goes south, the Easter egg hunt begins. A dead Android device elicits button pushing and the hope that the gizmo shows some signs of life. Mostly not in my experience.

Let’s assume the research is correct. The increase noted in the write up means that software and systems will continue to degrade. What’s the fix? Like many things, from making a government bureaucracy more effective to having an airline depart on time, this one seems headed on a downward path.

My take is that we are getting a glimpse of the future. Reality is very different from the perfectly functioning demo and the slick assertions in a PowerPoint deck.

Stephen E Arnold, April 4, 2022

Another Example of the Corrosive Function of Digital Information

February 18, 2022

“In Praise of Search Tools” contains an interesting statement. Here it is:

the shaping-up of the book that Duncan describes as he charts the advent of modern search tools might also be seen as a pulling-apart of the book. The alphabetical table that is the index “breaks down a book into its constituents.” Its structure is entirely independent from the structure of the work, sacrificing the latter for the reader’s better convenience. The alphabetical order used by the indexer breaks texts up into so many word-sized bits, but the dismemberment at issue in the culture of indexing was sometimes literal, as when concordance-makers took scissors to the pages whose words they were regrouping. In a 1919 article on the making of a concordance to the poetry of William Wordsworth, a Cornell professor describes how the eight volumes of the Oxford edition were transmuted by his team into 210,944 paper slips: records of each appearance of each of the poet’s keywords.

Interesting and in line with my ASIS Eagleton Lecture given in the mid 1980s.

Stephen E Arnold, February 18, 2022

Bad Culture: Signals Are Tough to Ignore

January 18, 2022

I have worked in a number of interesting jobs and on fascinating projects for more than 50 years. I have some time perspective. My first “real” consulting job was an analysis of the content in a number of magazines for a large New York publishing outfit. The person who supervised my work was the individual who coined the phrase “Photomat. Where your photo matters.” My recollection is that he was interested in the results, and he was up front about what he wanted to learn from the investigation. As I recall, he said, “I want to know how to increase renewals.”

That made zero sense to me because text analysis in 1970 had little to do with the real world. Indexing Latin and cranking out concordances, yes. But subscription renewal? I was clueless, but I did the work and reported results which made this big wheel spin with joy. I liked the person and I liked the company. As it turned out, I did consulting work for one of the other senior managers for over 45 years.

This early experience is different from what is described in “Young People Are Leaving Tech Because of Bad Culture.” This article explains:

Talent and skills provider Mthree found 59% of people between the ages of 18 and 24 said the company culture in their tech-based role made them so uncomfortable they had quit or at least thought about quitting. When it came to those from under-represented groups in UK tech, these figures were higher, with 64% of female respondents, 67% of those from a mixed-race background and 68% of young people who are bisexual saying they had either left or considered leaving a role because of a company’s culture.

Quite a contrast. I had a positive first consulting experience.

I think the issue with work today is broader than the technology sector in which I have worked for many years. Earlier in my career, I worked for a blue chip consulting firm and I have done projects for other big time blue chip outfits. I have avoided, for the most part, the mid tier operations. These struck me (perhaps incorrectly) as lacking the rigor associated with the blue chip outfits.

“McKinsey’s Leader Wants to Change the Firm” includes this statement:

McKinsey’s reputation also faced challenges. In February, the company said it had reached a $573 million settlement over its previous work advising OxyContin maker Purdue Pharma LP and other drug manufacturers to aggressively market opioid painkillers. McKinsey admitted no wrongdoing. The firm also drew scrutiny in recent years for its work with some foreign governments, including Saudi Arabia. [Emphasis added]

The downstream consequences of casual algorithmic tuning at Facebook-type companies and of logic-driven business advice linked to drug addiction suggest a deeper issue. Is it the culture of the go go, Wolf of Wall Street thought processes, or are these specific, observable symptoms of an ethical cancer? Are the individuals running flash mob snatch-and-run robberies unhappy with job opportunities, or do these behaviors reflect the loss of a strong social fabric?

Either way, people are making decisions which appear to be harmful to others. The impact of these types of behaviors is likely to accelerate the loss of the technological and behavioral influence the US possessed when I began my work career half a century ago.

Change has been a long time coming, and I don’t think “speeding up decision making, rethinking performance evaluations, and avoiding future scandals” is going to retain tech talent or address certain behaviors that are genuinely harmful to others.

Stephen E Arnold, January 18, 2022

The Fast Descent to Mediocrity Revealed

January 12, 2022

I read “Google’s Director of Engineering Hiring Test.” I love these inside looks at what Google thinks is important to the company’s success. I emailed several questions from the decades-old GLAT to a Fancy Dan financial whiz. He was unable to make sense of any of the wonky questions. Since the whiz kid and Google are wallowing in financial oceans filled with molecules of money, I am not sure there is much value in certain types of smart filters.

Tucked into the questions and answers, however, is considerable insight into what the company thinks is funny (like the GLAT) and why the firm is accelerating its ski slope ride to meh. Here’s the statement that caught my attention:

Hiring people that know things that you don’t know helps more than hiring people who merely know what everybody knows.

My hunch is that the issues at Google (for instance, the new phone that doesn’t do phone stuff) are examples of making assumptions about what’s right. Apply this to super duper automated content indexing for machine learning training sets, and what happens? Perhaps you get a variation of the phone that doesn’t do phone stuff? Smart software may end up learning what it already knows. Great for cost reduction but not so great for finding one’s way through a snow storm near Tahoe.

Stephen E Arnold, January 12, 2022

Search Quality: 2022 Style

January 11, 2022

I read the interesting “Is Google Search Deteriorating? Measuring Google’s Search Quality in 2022.” The approach is different from the one used at the commercial database outfits for which I worked decades ago. We knew what our editorial policy was; that is, we could tell a person exactly what was indexed, how it was indexed, how classification codes were assigned, and what the field codes were for each item in our database. (A field code, for those who have never encountered the term, is an index term which disambiguates a computer terminal from an airport terminal.) When we tested a search engine — for example, a touch of the DataStar systems — we could determine the precision and recall of the result set. This was math, not an opinion. Yep, we had automatic indexing routines, but we relied primarily on human editors and subject matter experts with a consultant or two tossed in for good measure. (A tip of the Silent 700 paper feed to you, Betty Eddison.)
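
For readers who have not run one of these tests, precision and recall are simple set arithmetic once you have relevance judgments in hand. A minimal sketch (the document identifiers are hypothetical):

```python
def precision_recall(retrieved, relevant):
    """Classic set-based retrieval measures.

    precision = |retrieved AND relevant| / |retrieved|
    recall    = |retrieved AND relevant| / |relevant|
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical result set for one test query against a field-coded database.
retrieved = ["doc1", "doc2", "doc3", "doc4"]
relevant = ["doc2", "doc3", "doc5"]

p, r = precision_recall(retrieved, relevant)
print(p, r)
```

The math only works, of course, when someone has done the editorial labor of deciding which documents are actually relevant, which is exactly the work the old commercial database outfits paid for.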

The cited article takes a different approach. It is mostly subjective. The result of the analysis is that Google is better than Bing. Here’s a key passage:

So Google does outperform Bing (the difference is statistically significant)…

Okay, statistics.

Several observations:

First, I am not sure either Bing’s search team or Google’s search team knows what is in the indexes at any point in time. I assume someone could look, but I know from first hand experience that the young wizards are not interested in the scope of an index. The interest is in reducing the load or computational cost of indexing new content objects and updating certain content objects, discarding content domains which don’t pay for their computational costs, and similar MBA inspired engineering efficiencies. Nobody gets a bonus for knowing what’s indexed, when, why, and whether that index set is comprehensive. How deep does Google go on unloved Web sites like the Railway Retirement Board’s?

Second, without time benchmarks and hard data about precision and recall, the subjective approach to evaluating search results misses the point of Bing and Google. These are systems which must generate revenue. Bing has been late to the party, but the Redmond security champs are giving ad sales the old college drop out try. (A tip of the hat to MSFT’s eternal freshman, Bill Gates, too.) The results which are relevant are the ones that, by some algorithmic cartwheels, burn through the ad inventory. Money matters; understanding user queries, supporting Boolean logic, and including date and time information about the content object and when it was last indexed do not. In one meeting, I can honestly say no one knew what I was talking about when I mentioned “time” index points.

Third, there are useful search engines which should be used as yardsticks against which to measure the Google and the smaller pretender, Bing. Why not include Swisscows.ch or Yandex.ru or Baidu.com or any of the other seven or eight Web centric, no charge systems? I suppose one could toss in the Google killer Neeva and a handful of metasearch systems. Yep, that’s work. Set up standard queries. Capture results. Analyze those results. Calculate result overlap. Get subject matter experts to evaluate the results. Do the queries at different points in time for a period of three months or more, etc., etc. This is probably not going to happen.
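
The result-overlap step in that checklist is the easy part; a Jaccard-style measure over the top-N results from two engines takes a few lines. The engine names and URLs here are hypothetical placeholders:

```python
def result_overlap(results_a, results_b, n=10):
    """Jaccard overlap of the top-n results returned by two engines
    for the same standard query: |A AND B| / |A OR B|."""
    a, b = set(results_a[:n]), set(results_b[:n])
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Hypothetical top results for one standard query on two engines.
engine_a = ["url1", "url2", "url3", "url4"]
engine_b = ["url3", "url4", "url5", "url6"]
print(result_overlap(engine_a, engine_b))
```

The hard parts — building the standard query set, rerunning it monthly, and paying subject matter experts to judge the results — are exactly the parts that cost money, which is why this is probably not going to happen.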

Fourth, what has been filtered? Those stop word lists are fascinating, and they make it very difficult to find certain information. With traditional libraries struggling for survival, where is that verifiable research process going to lead? Yep, to ad centric, free search systems. It might be better to just guess at some answers.
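
To see why a stop word list makes certain information unfindable, consider what happens when query terms are stripped before they ever reach the index. The stop list below is a tiny illustrative one, not any engine’s actual list:

```python
# Illustrative stop list; real engines use larger, undisclosed lists.
STOP_WORDS = {"the", "who", "to", "be", "or", "not", "it"}

def strip_stop_words(query):
    """Drop stop-listed terms before the query reaches the index."""
    return [t for t in query.lower().split() if t not in STOP_WORDS]

# A famous phrase made of nothing but stop words matches nothing at all.
print(strip_stop_words("to be or not to be"))       # []
print(strip_stop_words("the railway retirement board"))
```

When every term in a query is on the list, the searcher gets either nothing or whatever the engine decides to serve instead, and the searcher has no way to know which.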

Net net: Web search is not very good. It never has been. For-fee databases are usually an afterthought, if thought of at all. It is remarkable how many people pass themselves off as open source intelligence experts, expert online researchers, or digital natives able to find “anything” using their mobile phones.

Folks, most people are living in a cloud of unknowing. Search results shape understanding. A failure of search just means that users have zero chance to figure out if a result from a free Web query is much more than Madison Avenue, propaganda, crooked card dealing, or some other content injection goal.

That’s what one gets when the lowest cost methods to generate the highest ad revenue are conflated with information retrieval. But, hey, you can order a pizza easily.

Stephen E Arnold, January 11, 2022

Datasets: An Analysis Which Tap Dances around Some Consequences

December 22, 2021

I read “3 Big Problems with Datasets in AI and Machine Learning.” The arguments presented support the SAIL, Snorkel, and Google type approach to building datasets. I have addressed some of my thoughts about configuring once and letting fancy math do the heavy lifting going forward. This is probably not the intended purpose of the Venture Beat write up. My hunch is that pointing out other people’s problems frames the SAIL, Snorkel, and Google type approaches. No one asks, “What happens if the SAIL, Snorkel, and Google type approaches don’t work or have some interesting downstream consequences?” Why bother?

Here are the problems as presented by the cited article:

  1. The Training Dilemma. The write up says: “History is filled with examples of the consequences of deploying models trained using flawed datasets.” That’s correct. The challenge in creating and validating a training set for a discipline, topic, or “space” is that new content arrives using new lingo and even metaphors instead of words like “rock.” People informed by the early days of Autonomy’s neuro-linguistic method know that no one wants to spend money, time, and computing resources on endless Sisyphean work. That rock keeps rolling back down the hill. This is a deal breaker, so considerable effort has been expended figuring out how to cut corners, use good enough data, set loose shoes thresholds, and rely on normalization to smooth out the acne scars. Thus, we are in an era of using what’s available. Make it work or become a content creator on TikTok.
  2. Issues with Labeling. I don’t like it when the word “indexing” is replaced with words like labels, metatags, hashtags, and semantic sign posts. Give me a break. Automatic indexing is more consistent than human indexers, who get tired and fall back on a quiver of terms because who wants to work too hard at a boring job? But the automatic systems are in the same “good enough” basket as smart training data set creation. The problem is words and humans. Software is clueless when it comes to snide remarks, cynicism, certain types of fake news, and bogus research reports in peer reviewed journals. Indexing with esoteric words means the Average Joe and Janet can’t find the content. Indexing with everyday words means that search results work great for pizza near me but not so well for beatles diet when I want the food insects eat, not what kept George thin. The write up says: “Still other methods aim to replace real-world data with partially or entirely synthetic data — although the jury’s out on whether models trained on synthetic data can match the accuracy of their real-world-data counterparts.” Yep, let’s make up stuff.
  3. A Benchmarking Problem. The write up asserts: “SOTA benchmarking [also] does not encourage scientists to develop a nuanced understanding of the concrete challenges presented by their task in the real world, and instead can encourage tunnel vision on increasing scores. The requirement to achieve SOTA constrains the creation of novel algorithms or algorithms which can solve real-world problems.” Got that? My view is that validating data is a bridge too far for anyone except a graduate student working for a professor with grant money. But why benchmark when one can go snorkeling? The reality is that datasets are in most cases flawed, but no one knows how flawed. Just use them and let the results light the path forward. Cheap, and it sounds good when couched in jargon.

What’s the fix? The fix is what I call the SAIL, Snorkel, and Google type solution. (Yep, Facebook digs in this sandbox too.)

My take is easily expressed just not popular. Too bad.

  1. Do the work to create and validate a training set. Rely on subject matter experts to check outputs and when the outputs drift, hit the brakes, and recalibrate and retrain.
  2. Admit that outputs are likely to be incomplete, misleading, or just plain wrong. Knock off the good enough approach to information.
  3. Return to methods which require thresholds to be validated by user feedback and output validity. Letting cheap and fast methods decide which secondary school teacher gets fired strikes me as not too helpful.
  4. Make sure analyses of solutions don’t function as advertisements for the world’s largest online ad outfit.
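
The first point in the list above — have subject matter experts check outputs and hit the brakes when they drift — can be approximated mechanically by tracking agreement between the model and expert spot checks. A minimal sketch; the 90 percent threshold and the label values are my own illustrative assumptions:

```python
def drift_check(model_labels, expert_labels, min_agreement=0.9):
    """Compare model output against subject-matter-expert labels on a
    spot-check batch; flag the model for retraining when agreement
    falls below the threshold."""
    assert len(model_labels) == len(expert_labels)
    matches = sum(m == e for m, e in zip(model_labels, expert_labels))
    agreement = matches / len(expert_labels)
    return agreement, agreement < min_agreement

# Hypothetical spot-check batch reviewed by an expert.
agreement, needs_retrain = drift_check(
    ["a", "b", "a", "c", "a"],
    ["a", "b", "b", "c", "a"],
)
print(agreement, needs_retrain)
```

The expensive part is not the arithmetic; it is paying the experts to keep producing the reference labels, which is precisely the cost the good enough crowd wants to avoid.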

Stephen E Arnold, December 22, 2021

What Is Better Than One Logic? Two Logics?

December 22, 2021

Search, database, intelligence, data management and analytics firm MarkLogic continues to evolve and grow. Business Wire reveals, “MarkLogic Acquires Leading Metadata Management Provider Smartlogic.” Good choice—we have found Smartlogic to be innovative, reliable, and responsive. We expect MarkLogic will be able to preserve these characteristics, considering Smartlogic’s top brass will be sticking around. The press release tells us:

“As part of the transaction, Smartlogic’s founder and Chief Executive Officer, Jeremy Bentley, as well as other members of the senior management team, will join the MarkLogic executive team. Financial terms of the transaction were not disclosed. Founded in 2006, Smartlogic has deciphered, filtered, and connected data for many of the world’s largest organizations to help solve their complex data problems. Global organizations in the energy, healthcare, life sciences, financial services, government and intelligence, media and publishing, and high-tech manufacturing industries rely on Smartlogic’s metadata and AI platform every day to enrich enterprise information with context and meaning, as well as extract critical facts, entities, and relationships to power their businesses. For the past four years, Smartlogic has been recognized as a leader by Gartner’s Magic Quadrant for Metadata Management Solutions and by Info-Tech as the preeminent leader of the Data Quadrant for Metadata Management (May 2021).”

Based in San Carlos, California, MarkLogic was founded in 2001 and gained steam in 2012 when it picked up former Oracle database division leader Gary Bloom. Smartlogic is headquartered in San Jose, less than 30 miles away. Perhaps MarkLogic’s XML with taxonomy management will triumph in more markets and bring the Oracle outfit to its knees? Perhaps index term management is the killer app?

Cynthia Murrell, December 22, 2021

SEO for 2022: Why Not Buy Google Ads and Skip the Baloney

December 17, 2021

There is one game in the US for search. Yeah, I know DuckDuckGo is wonderful. There’s even Bing. And you can still navigate to AOL.com and enter a search. Same for Dogpile.com. I am not going to repeat what I have been saying for decades. Primary search does the crawl, the indexing, the query processing, and the results serving. There are a few outfits in this business, but none is well known; for example, Swisscows.ch, Yandex.ru, Baidu.com, and a few others.

This article, “Why Your Website Must Have an SEO Strategy for 2022,” strikes me as pretty darned crazy. If someone repeats a process over and over again and fails, what does that say about the approach or the person? In my view, crazy seems close to the mark.

The write up says:

The aim of SEO is simple: high SEO ranking brings more traffic and more revenue.

More accurately, SEO produces work for search engine optimization experts. Many of the certified outfits are Google partners. When a temporary boost expires, these professionals will sell Google ads.

There you go.

Why not just buy Google ads and forget the futility of trying to outwit the Google? In case you haven’t noticed, the Google and Facebook are in a prime position to determine who and what gets eyeballs.

Buy ads. Simpler, faster, and cheaper. People with degrees in art history and business communications are no match for Googzilla’s decades of “refinement”.

Stephen E Arnold, December 17, 2021

Microsoft Search: Still Trying after All These Years

November 2, 2021

That was “FAST,” wasn’t it? You lived through LiveSearch, right? Jellyfish? Powerset? Outlook Search in its assorted flavors like Life Savers? I could go on, but I am quite certain no one cares.

Nevertheless, Bing’s new feature may possibly prompt some workers to switch to the search-engine underdog. TechRadar Pro reports the development in its brief write-up, “One of Microsoft’s Most-Hated Products Might Actually Be Getting a Useful Upgrade.” Writer Mike Moore reveals:

“The tech giant is boosting one of its less-celebrated products to give enterprise users an easier way to search online. The update means that enterprise users will now get their historical searches as suggestions in the autosuggest pane on Bing and Microsoft Search in Bing, according to the official Microsoft 365 roadmap entry. … The new update should mean that enterprise users looking to quickly find files that they’ve searched for or opened before will no longer need to manually trawl through endless files and folders in search of the elusive location. The update is still currently in development, but Microsoft will doubtless be keen to get it out soon and help boost Bing engagement. The feature is set to be available to Microsoft Search users across the globe via the company’s general availability route, meaning web, desktop and mobile users will all be able to utilize it upon release.”

Moore notes Microsoft’s tenacity in continuing to support Bing despite Google’s astounding market share lead. He wonders whether the company may have lost some enthusiasm recently, though, when it was revealed that the most searched-for term on Bing is “Google.” A tad embarrassing, perhaps. Does Microsoft suppose its file-finding feature will turn the tide? Unlikely, but some of our readers may find the tool useful, nonetheless.

What’s next for Microsoft search? Perhaps broader and deeper indexing of US government Web sites for a starter?

Cynthia Murrell, November 2, 2021

Digital Shadows Announces Social Monitor

October 19, 2021

Deep fakes? They are here and Digital Shadows has a service for those who live in fear of digital manipulation.

Bad actors often pose as corporations’ executives and other key personnel on social media. Sometimes the goal is to damage the target’s reputation, but more often it is to enact a phishing scheme. Either way, companies must put a stop to these efforts as soon as possible. We learn there is a new tool for that from “Digital Shadows Launches SocialMonitor—a Key Defense Against Executive Impersonation on Social Media,” posted at PR Newswire. The press release tells us:

“All social media platforms will take down fake accounts once alerted but keeping on top of the constant creation of fake profiles is a challenge. SocialMonitor overcomes these challenges by adding targeted human collection to SearchLight’s existing broad automated coverage. Digital Shadows customers simply need to register key staff members within the SearchLight portal. Thereafter, users will receive ‘Impersonating Employee Profile’ alerts which will be pre-vetted by its analyst team. This ensures that organizations only receive relevant notifications of concern. Russell Bentley at Digital Shadows comments: ‘Fake profiles on social media are rife and frequently used to spread disinformation or redirect users to scams or malware. Social media providers have taken steps such as providing a verified profile checkmark and removing fake accounts. However, there is often too long a window of opportunity before action can be taken. SocialMonitor provides organizations with a proactive defense so that offending profiles can be taken down quickly, protecting their customers and corporate reputation.’”

Note this is yet another consumer-facing app from Digital Shadows, the firm that appears to be leading the Dark Web indexing field. Curious readers can click here to learn more about SocialMonitor. Digital Shadows offers a suite of products to protect its clients from assorted cyber threats. Based in San Francisco, the company was founded in 2011.

Cynthia Murrell, October 19, 2021
