Search: Contentious and Increasingly Horrible

May 25, 2020

I dropped enterprise search, commercial search, and vertical search to the bottom of my “Favorite Topics” list years ago.

Why?

The individuals popping up and off at conferences were disconnected from the realities of looking for information under stressful circumstances.

image

Hey, big rocks, how did you move from that quarry kilometers away and get yourselves smoothed down? Just like modern online search systems, you won’t get an answer. Finding information relevant to a query is as difficult as getting megalithic stones to become Chatty Kathies.

The thumb typing crowd, some of whom are now in their mid forties, ASSUME that search has to think for the stupid user.

The techniques include smart software which skews results in ways that are, to an experienced researcher, stupid. For those search experts concerned with making their information or their name appear number one on a results list, good search was anything that produced a top spot in a result list even if that result was stupid, irrelevant, or shameless ego jockeying. Then there are the chipper, super confident experts who emerged from an educational system which awarded a blue ribbon to those who showed up and sort of behaved. Yep, everything that group does is just wonderful. Yeah, right.

You can see the consequences of two forces colliding when you read Science Magazine’s “They Redesigned PubMed, a Beloved Website. It Hasn’t Gone Over Well.”

You can work through the examples in the source article. The pain points range from appearance to search functionality.

Why did this happen?

The change is a result of people who do not have the experience of performing search under stressful conditions. No, I don’t mean locating the Cuba Libre restaurant in Washington, DC, on a Google Map. I mean looking up technical information to complete a lab test, perform a diagnosis, locate a procedure, or some similar action. There is a pandemic going on, isn’t there?

The complaints indicate that the “new” PubMed is not perceived as a home run.

Go read the original.

I want to offer several observations:

  1. Those who do research with intent need predictability; that is, when a Boolean query is entered, the results should reflect that logic. Modern systems think Boolean is stupid. There you go, a value judgment from those with “Also Participated” ribbons in high school.
  2. Interfaces should allow the user to select an approach. There are some users who like a blinking dot or a question mark. Enter the commands and get a text output. Others like the Endeca style training wheels, although I doubt if any of the modern “helper” interfaces know what Endeca offered. Others may want some other type of interface like a PhD approach; that is, push here, dummy. The point is: Why not allow the user to select the interface?
  3. Change is introduced for dark purposes. Catalina has many points of friction so that Apple can extend its span of control. Annoying? Sure is. Why doesn’t Apple tell the truth about these friction points? What? Tell the truth? Are you crazy? Apple, like Facebook and Google, is doing what it can to protect its hegemony, and the user is the victim. Tough. The same logic applies to PubMed. Dollars to donuts there is a “reason” for the change, and it may be due to whimsy, money, or the need to demonstrate the team is actually doing something instead of just having meetings with contractors.
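The first observation, predictability for Boolean queries, can be made concrete with a small sketch. What follows is a toy in Python, not a description of how PubMed or any other system works. The corpus, the document ids, and the function names (`build_index`, `boolean_and`, `boolean_or`, `boolean_not`) are all hypothetical. The point is only that strict set operations over an inverted index return the same, explainable result every time.

```python
from collections import defaultdict

# Hypothetical toy corpus; ids and text are invented for illustration.
DOCS = {
    1: "antitoxin dosage for pediatric botulism",
    2: "botulism outbreak report",
    3: "pediatric dosage guidelines",
}

def build_index(docs):
    """Map each lowercased term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def boolean_and(index, *terms):
    """Documents containing every term."""
    sets = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*sets) if sets else set()

def boolean_or(index, *terms):
    """Documents containing at least one term."""
    return set().union(*(index.get(t.lower(), set()) for t in terms))

def boolean_not(index, all_ids, term):
    """Documents that do not contain the term."""
    return set(all_ids) - index.get(term.lower(), set())

INDEX = build_index(DOCS)
print(sorted(boolean_and(INDEX, "botulism", "pediatric")))  # [1]
```

No ranking, no guessing at intent: the query logic alone determines which documents come back, which is what a researcher under pressure can rely on.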

Net net: As I wrote for Barbara Quint in the now departed magazine Searcher, search is dead. Each day the hope for a better, more appropriate way to locate online information becomes lost in the mists of time. Getting relevant information from PubMed or any modern system is like trying to get the stone of Ollantaytambo to explain how the rocks moved eons ago.

Finding information today is more difficult than at any other time in my professional career. That’s a big problem.

Stephen E Arnold, May 24, 2020

The Bulldozer and Bray: Amazon and Its People Policies in Action

May 4, 2020

I read “Bye, Amazon.” The author is Tim Bray. Some may remember him as one of the spark plugs of Open Text. He did some nifty visualization work. He did the Google thing until 2014. From 2014 until a couple of days ago he worked at Amazon, the Bezos bulldozer, the online bookstore, and all-around economic engine of Covid America.

The write up states:

I quit in dismay at Amazon firing whistleblowers who were making noise about warehouse employees frightened of Covid-19.

When Amazon terminated with prejudice the protesting Amazonians, Mr. Bray’s reaction was

Snap!

Mr. Bray was upset, went through Amazon channels, and resigned.

He states about the warehouse worker action:

It’s not just workers who are upset. Here are Attorneys-general from 14 states speaking out. Here’s the New York State Attorney-general with more detailed complaints. Here’s Amazon losing in French courts, twice.

On the other hand, he points out:

Amazon Web Services (the “Cloud Computing” arm of the company), where I worked, is a different story. It treats its workers humanely, strives for work/life balance, struggles to move the diversity needle (and mostly fails, but so does everyone else), and is by and large an ethical organization. I genuinely admire its leadership.

In his penultimate paragraph he offers:

At the end of the day, it’s all about power balances. The warehouse workers are weak and getting weaker, what with mass unemployment and (in the US) job-linked health insurance. So they’re gonna get treated like crap, because capitalism. Any plausible solution has to start with increasing their collective strength.

Several observations:

  • Mr. Bray has a moral compass. DarkCyber finds that of value.
  • Amazon’s “power” has been largely unchecked since the mid 1990s, and only now are actions building like storm clouds on the horizon.
  • Mr. Bray was able to continue working for the Google but he could not continue working at Amazon. That’s interesting in itself.

Net net: Will Amazon take steps to deal with what seems to be the Tim Bray situation? Do Prime customers get orders delivered on time? Not if warehouse employees put sand in the Bezos bulldozer’s differential.

Stephen E Arnold, May 4, 2020

Google: The Laser That Threatened James Bond Creeps Closer to the Private Parts of the GOOG

April 23, 2020

Update: I omitted the link to the actual Googler blog post. Too excited thinking about “integrity.” My bad.

Goldfinger was an interesting film. In 1965, lasers were advanced. Some thought they were death rays. The Hollywood people, sunning around the pool with Technicolor drinks, thought the laser was the ideal way to burn James Bond’s private parts. Goldfinger was the bad actor. Now Google’s integrity weapon may be threatening Alphabet’s private parts. Odd job indeed.

image

The laser posed a risk to the fictional James Bond’s private parts. The Google integrity verification is a similar risk with one difference: Googlers are steering the destructive beam of actual data toward Alphabet’s secret places.

Flash forward to 2020, “Google to Require All Advertisers to Pass Identity Verification Process.” The word “all” is probably not warranted, but it sounds good. Talking heads enjoy glittering generalities and categorical affirmatives.

Nevertheless, the news story, if accurate, reveals some interesting quasi-factoids. Here’s one example:

Google began requiring political advertisers wanting to run election ads on its platform to verify their identity back in 2018. Now, that program is being extended to all advertisers, the company wrote in a blog post this morning from John Canfield, its director of product management for ads integrity. The change will allow consumers to see who’s running an ad and which country they’re located in when they click “Why this ad?” on a placement.

First, advertisers have to “prove” something other than having a mechanism to put funds into a Google advertising account. Second, Google has a job description which includes these words: “Management” and “integrity.” Plus, the information will not help Google. Nope, the winners in knowing who allegedly buys ads are “consumers.”

Google’s integrity person allegedly said:

“This change will make it easier for people to understand who the advertiser is behind the ads they see from Google and help them make more informed decisions when using our advertising Controls,” John Canfield, Google’s director of product management for ads integrity, said in the post. “It will also help support the health of the digital advertising ecosystem by detecting bad actors and limiting their attempts to misrepresent themselves.”

How does one become verified by Google’s integrity people?

Organizations are required to submit personal legal information (like a W9 or IRS document showing the organization’s name, address and employer identification number). An individual from the organization also needs to provide legal identification on the organization’s behalf. Individuals have to show government-issued photo ID like a passport or ID card. Google said it previously had collected basic information about the advertiser but didn’t require documentation to verify.

How effective are Google’s efforts to filter, screen, and verify? We know that human traffickers and others in this line of business have infiltrated videos on YouTube. We know that one can run a query for “Photoshop crakz”:

image

Apparently Google’s system cannot block listings for stolen commercial software. In fact, the listing for this illegal offering was updated three days ago. DarkCyber knows that some legitimate sites’ content has not been updated for longer periods of time. Notice how Google’s smart autocorrect changed “crakz” into “cracked.” Helpful smart software. Why does Google display the result? Why doesn’t Adobe email Google’s search wizards to have these links with illegal intent filtered? One reason may be that Adobe has emailed Google customer support and is, like many others with questions for the Google, waiting for a response from an informed Googler.


Google, Ad Transparency, and Query Relaxation: Should Advertisers Care? Probably

April 20, 2020

You need information about Banjo, a low profile outfit in Utah. Navigate to Google and enter the query Banjo law enforcement. No quotes for this query. Banjo has a Web site, and the phrase law enforcement is reasonably common and specific. (It is what is known as a bound phrase like White House or stock market; that is, the two words go together in US English.)

Here’s what the system displayed to me on April 20, 2020, at 9:18 am US Eastern time:

image

The search results are okay. The ads do not match the query or the user’s intent: Law enforcement is not even close to a $1,000 musical instrument in a retail store.

Notice that the first result is to a Salt Lake Tribune article in March 2020 about Banjo’s allegedly “massive surveillance system.” The second result is from the same newspaper which reports a few days later that the Salt Lake City police won’t share data with Banjo. So far so good. Google is delivering timely, relevant results.

But look at the ads. The query Banjo law enforcement puts the following for fee, pay to be seen ads in front of a person who wants information about a policeware company named Banjo:

image

These advertisers are betting money that Google can get them relevant clicks when a person searches for a banjo. Maybe? But when someone searches for the policeware company Banjo, the advertiser is going to be “surprised.” Do advertisers like surprises?

Here are the advertisers whose for fee ads were displayed in front of a Google user interested in law enforcement software (policeware), a user with a vanishingly low probability of purchasing a stringed instrument whilst researching a specialist software vendor selling almost exclusively to police and screened quangos (quasi non governmental organizations):

  • Banjo Ben Clark
  • Deering Banjo Company
  • Banjo.com (note that our Banjo is Banjo.co)
  • Banjo Studio
  • Instrument Alley
  • Sweetwater
  • Guitar Center

These companies paid for ads as a result of query relaxation. Google’s system does not differentiate the Banjo policeware outfit from the music products.
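How Google's ad matching actually works is not public, so the following Python sketch is an assumption-laden toy, not a description of the real system. It contrasts a strict rule (every query term must be among an ad's keywords) with a relaxed rule (any one term is enough). The function names and the keyword list are hypothetical.

```python
def strict_match(query, ad_keywords):
    """Ad is eligible only if every query term is among its keywords."""
    terms = set(query.lower().split())
    return terms <= {k.lower() for k in ad_keywords}

def relaxed_match(query, ad_keywords):
    """Ad is eligible if any single query term is among its keywords."""
    terms = set(query.lower().split())
    return bool(terms & {k.lower() for k in ad_keywords})

# Hypothetical keyword list for a music retailer's ad.
ad = ["banjo", "strings", "bluegrass"]
query = "Banjo law enforcement"

print(strict_match(query, ad))   # False
print(relaxed_match(query, ad))  # True
```

Under the relaxed rule the single shared word is enough to trigger the ad, and the words law enforcement, which carry the user's intent, are ignored.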

image

Are there parallels with games in which a person can win money by guessing which cup hides the ball? These games of chance are often confidence operations. In this context confidence means trickery, not trust.

Why? There are URL distinctions; that is, Banjo.co versus Banjo.com; there are disambiguation clues in Banjo.co’s Web page; there is the metadata itself with the keyword surveillance a likely index term.


WFH WTF: A Reality Check for Newbies

March 30, 2020

On Sunday, my son who provides specialized services to the US government and I were talking about WFH or work from home. WFH is now the principal way many people earn money. My son asked me, “When did you start working from home?” He should have remembered, since he was a much younger version of his present technology consulting self.

The year was 1991 (nearly three decades, 29 years to be exact and I am now 76), and I had just avoided corporate RIFFing after an investment bank purchased the firm at which I served as a reasonably high ranking officer. I pitched a multi year consulting deal with the new owners (money people), and I decided that commuting among my home in Kentucky, the Big Apple, and Plastic Fantastic (Silicon Valley) was not for me.

I figured I had a few years of guaranteed income so I would avoid running out and leasing an office. No one who hires me cares whether they ever see me. I do special work; I don’t go to meetings; I don’t hang out at the squash club or golf course; and I don’t want people around me every day. In Plastic Fantastic, I requested an inside office. The company moved the fax machine, photocopier, and supply cabinet to my outside office with lots of windows. I took the dark, stuffy, and inhospitable inside office. Perfect it was.

image

The seven deadly sins of working from home: [1] Waiting for the phone to ring or email to arrive, [2] eating, [3] laziness, [4] anger, [5] envy, [6] philandering online or IRL, [7] greed. For the modern world I would add social media, online diversions, and fiddling with gizmos.

Why is this important for the WFH crowd?

The Internet is stuffed with articles like these:

The WFH articles I scanned — reading them was alternately amusing and painful — shared a common thread. None of them told the truth about WFH.

My son suggested, “Why not write up what’s really needed to make WFH pay off?” Okay, Erik, here’s the scoop. (By the way, he has implemented most of these behaviors as his technology consulting business has surged and his entrepreneurial ventures flourished. That’s what’s called “living proof” or it used to be before Plastic Fantastic speech took over discourse.)

Discipline. Discipline. Then Discipline Again

The idea is that one has to establish goals, work routines, and priorities. The effort is entirely mental. For nearly 30 years, I have followed a disciplined routine. I am at my desk (hidden in a dark, damp basement) working on tasks. Yep, seven days a week, 10 hours a day (unless I am sick, on a much loathed business trip, or in a meeting somewhere, not in my home office). Sound like fun? For me, it is, and discipline is not something to talk about in marketing oriented click bait articles. Discipline is what one manifests.


Semantic Sci-Fi: Search Is Great

March 23, 2020

I read “Keyword Search is DEAD; Semantic Search Is Smart.” I assume the folks at Medium consider each article, weigh its value, and then release only the highest value content.

comic

Semantic search is better than any other type of search in the galaxy.

Let’s assume that the write up is correct and keyword search is dead. Further, we shall ignore the syntax of SQL queries, the dependence of policeware and intelware systems on users’ looking for named entities, and overlook the interaction of people using an automobile’s navigation service by saying, “Home.” These are examples of keyword search, and I decided to give a few examples, skipping how keyword search functions in desktop search, chemical structure systems, medical research, and good old, bandwidth trimming YouTube.

Okay, what’s the write up say beyond “keyword search is dead”?

Here are some points I extracted as I worked my way through the write up. I required more than three minutes (the Medium estimate) because my blood pressure was spiking, and I was hyper ventilating.

Factoid 1 from the write up:

If you do semantic search, you can get all information as per your intent.

What’s with this “all”? Content domains, no matter what the clueless believe, are incomplete. There is no “all” when it comes to online information which is indexed.

Factoid 2 from the write up:

semantic search seeks to understand natural language the way a human would.

Yep, natural language queries are possible within certain types of content domains. However, the systems I have worked with and have had an opportunity to use in controlled situations exhibit a number of persistent problems. These include computational constraints. One system could support four simultaneous users on a corpus of fewer than 100,000 text documents. Others simply output “good enough” results. Not surprisingly when a physician needs an antitoxin to save a child’s life, keywords work better than “good enough” in my experience. NLP has been getting better, but the idea that systems can integrate widely different data which may be incomplete, incorrect, or stale and return a useful output is a big hurdle. So far no one has gotten over it on a consistent, affordable basis. Short cuts to reduce index look ups can be packaged as semantics and NLP but mostly these are clever ways to improve “efficiency.” Understanding sometimes. Precision and recall? Not yet.
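For readers who have not bumped into the terms, precision is the share of returned documents that are relevant, and recall is the share of relevant documents that were returned. A minimal Python sketch follows; the document ids are invented and the `precision_recall` helper is a hypothetical name, so treat it as an illustration of the two standard definitions and nothing more.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a result set against known relevant ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical judgment set: four documents are actually relevant.
relevant = {1, 2, 3, 4}
# A "good enough" system returns eight documents, only two of them relevant.
retrieved = {1, 2, 5, 6, 7, 8, 9, 10}

p, r = precision_recall(retrieved, relevant)
print(p, r)  # 0.25 0.5
```

In this made-up case three of every four results are noise, and half of what the searcher needed never showed up.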


Amazon Revealed by the BBC: Analysis and News about the Bezos Bulldozer

February 18, 2020

The BBC is a subsidized news outfit. As a person who lives in America, I don’t understand the approach taken to either obtaining money or to programming. I do miss the Lilliburlero tune. Also, wouldn’t it be helpful to be able to locate BBC audio programs? Well, maybe not.

DarkCyber noted “Why Amazon Knows So Much about You.” The write up is notable for several reasons. First, it uses one of those Web layouts that are popular: Sliding windows, white text on black backgrounds, and graphics like this one of Mr. Bezos, zeros and ones, and a headline designed to make the reader uncomfortable:

image

Second, the article is labeled as news, but it is more of a chatty essay about Amazon, its Great Leader, and the data the company gathers via the front scoop of the Bezos bulldozer. But news? Maybe one of those chatty podcasts which purport to reveal the secrets of some companies’ success.

Third, the write up seems long. There are plenty of snappy graphics, dialog which reads a bit like the script for the video program Silicon Valley, and embedded video; for example, Margrethe Vestager:

image

Note that this image is in close proximity to this image of Mr. Bezos and his friend. Happenstance? Sure.

image

The write up goes deep into Amazon history with details about a snowy, cold, and dark night. The stage setting is worthy of Edward Bulwer Lytton, the fellow who allegedly coined the phrase “the pen is mightier than the sword.” Is the BBC’s pen mightier than an Amazon sword, available in the US for $23.70 with free shipping for Prime members:

image

With that in mind, what is “Why Amazon Knows So Much about You”?

The most straightforward way to respond to this question is to look at what the write up covers. Here’s the general layout of the almost 5,000 word “semi news” story:

Introduction with the author’s personal take on Amazon

The early days (the meeting in the mountains) of “planning to suck data”

Amazon’s approach to business: Slippery, clever, and maybe some Google-style deflection

The Ring moment when the Shark Tank people proved they were not qualified to work for Mr. Bezos

Amazon is just like those other American monopolies and the sky is falling because staff are complaining about many things

Amazon’s big ideas for making even more money.

 


Google May Be Facing a Moon Shot Challenge

February 17, 2020

DarkCyber wants to reflect on a challenge, a difficult one.

image

DarkCyber read “Google Removes 500+ Malicious Chrome Extensions from the Web Store.” No, not a “the” store. The store is Google’s online store, which every Android phone longs to visit. Some mobile devices have no choice. Other Android phones have some restraints, but “home is home.”

According to the write up:

The removed extensions operated by injecting malicious ads (malvertising) inside users’ browsing sessions. The malicious code injected by the extensions activated under certain conditions and redirected users to specific sites. In some cases, the destination would be an affiliate link on legitimate sites like Macys, Dell, or BestBuy; but in other instances, the destination link would be something malicious, such as a malware download site or a phishing page.

You should read the ZDNet story mentioned above and follow its links. However, the notion that DarkCyber has been noodling involves Google’s large online advertising business. Here are some questions we drafted after our morning call:

  • If the Google Android store is disseminating software which generates clicks, how will those affected advertisers be compensated?
  • What other ad centric spoofs or manipulations exist within the ad system for YouTube?
  • What malware or manipulative techniques operate within the core AdWords’ system?
  • What role do click bots or click farms play in manipulating Google’s online advertising data?
  • What about human Googler manipulation of advertising systems; for example, as quarters draw to a close?

DarkCyber has only these and a number of other questions. The answers to these questions may call into question the reliability, accuracy, and honesty of the Google online advertising operation.

If the answers fail to reassure advertisers and others, the strength of Google might become its most serious challenge in the company’s rise from objective search system to global online ad giant.

Challenge? Maybe multiple challenges: Credibility, legal, technical, and managerial.

Stephen E Arnold, February

Data Are a Problem? And the Solution Is?

January 8, 2020

I attended a conference about managing data last year. I sat in six sessions and listened as enthusiastic people explained that in order to tap the value of data, one has to have a process. Okay? A process is good.

Then in each of the sessions, the speakers explained the problem and outlined that knowing about the data and then putting it in a system is the way to derive value.

Neither Pros Nor Cons: Just Consulting Talk

This morning I read an article called “The Pros and Cons of Data Integration Architectures.” The write up concludes with this statement:

Much of the data owned and stored by businesses and government departments alike is constrained by the silos it’s stuck in, many of which have been built over the years as organizations grow. When you consider the consolidation of both legacy and new IT systems, the number of these data silos only increases. What’s more, the impact of this is significant. It has been widely reported that up to 80 per cent of a data scientist’s time is spent on collecting, labeling, cleaning and organizing data in order to get it into a usable form for analysis.

Now this is mostly true. However, the 80 percent figure is not backed up. An IDG expert whipped up some percentages about data and time, and these, I suspect, have become part of the received wisdom of those struggling with silos for decades. Most of a data scientist’s time is frittered away in meetings, struggling with budgets and other resources, and figuring out what data are “good” and what to do with the data identified by person or machine as “bad.”

The source of this statement is MarkLogic, a privately held company founded in 2001 and a magnet for $173 million from funding sources. That works out to an 18 years young start up if DarkCyber adopts a Silicon Valley T shirt.

image

A modern silo is made of metal and impervious to some pests and most types of weather.

One question the write up begs is, “After 18 years, why hasn’t the methodology of MarkLogic swept the checker board?” But the same question can be asked of other providers’ solutions, open source solutions, and the home grown solutions creaking in some government agencies in Europe and elsewhere.

Several reasons:

  1. The technical solution offered by MarkLogic-type companies can “work”; however, proprietary considerations linked with the issues inherent in “silos” have caused data management solutions to become consultantized; that is, process becomes the task, not delivering on the promise of data, either dark or sunlit.
  2. Customers realize that the cost of dealing with the secrecy, legal, and technical problems of disparate, digital plastic trash bags of bits cannot be justified. Like odd duck knickknacks one of my failed publishers shoved into his lumber room, ignoring data is often a good solution.
  3. Individuals tasked with organizing data begin with gusto and quickly morph into bureaucrats who treasure meetings with consultants and companies pitching magic software and expensive wizards able to make the code mostly work.

DarkCyber recognizes that with boundaries like budgets, timetables, and measurable objectives, federation can deliver some zip.

Silos: A Moment of Reflection

The article uses the word “silo” five times. That’s the same frequency of its use in the presentations to which I listened in mid December 2019.

image

So you want to break down this missile silo which is hardened and protected by autonomous weapons? That’s what happens when a data scientist pokes around a pharma company’s lab notebook for a high potential new drug.

Let’s pause a moment to consider what a silo is. A silo is a tower or a pit used to store corn, wheat, or some other grain. Dust in silos can be exciting. Tip: Don’t light a match in a silo on a dry, hot day in a state where farms still operate. A silo can also be a structure used to house a ballistic missile, but one has to be a child of the Cold War to appreciate this connotation.

As applied to data, it seems that a silo is a storage device containing data. Unlike a silo used to house maize or a nuclear capable missile, the data silo contains information of value. How much value? No one knows. Are the data in a digital silo explosive? Who knows? Maybe some people should not know? Who wants to flick a Bic and poke around?


Informatica: A Play for Greater Relevance in an Amazon Chess Game?

January 3, 2020

Informatica was set up in 1993. The company was private, then public, and now private. Its new CEO is a former McKinsey professional, a background which some may find reassuring and others terrifying. (McKinsey had a racketeering lawsuit dismissed. How does a consulting firm ensnare itself in an allegation of racketeering? I will leave it to you to answer that question.)

The big news, however, is that Informatica is making an attempt to retain its relevance and increase its impact among Fortune 1000 firms, investment banks, financial services firms, insurance companies, and other blue chip customers.

The method, it seems to DarkCyber, involves Amazon. Keep in mind that Informatica’s previous attempts to add some zing to its quarter century of database-related work involved Microsoft and Salesforce, both next big things.

According to “Informatica Aims to Better Track Data Lineage with AI-Powered Data Catalog,”

its AI-powered data catalog, called Catalog of Catalogs is notable because it is trying to track data lineage across ecosystems. Catalog of Catalogs includes metadata scanners for business intelligence, data warehouses, big data and third party repositories.

The “new” Informatica is represented in this graphic, which has a remarkable resemblance to Amazon Web Services blockchain diagrams:

informatica-catlog-of-catalogs.png

Is this an Amazon diagram in recognizable AWS orange or an Informatica diagram?

There’s a hook to Amazon’s data marketplace technology, support for Amazon’s smart workflow, and the federation of metadata.

But what’s missing in this real news story?

