NLP with an SEO Spin

July 8, 2020

If you want to know how search engine optimization has kicked librarians and professional indexers in the knee and stomped on their writing hand, you will enjoy “Classifying 200,000 Articles in 7 Hours Using NLP.” The write up makes clear that human indexers are going to become the lamp lighters of the 21st century. Imagine. No libraries, no subject matter experts curating and indexing content, no human judgment. Nifty. Perfect for a post Quibi world.

The write up explains the indexing methods of one type of smart software. The passages below highlight the main features of the method:

Weak supervision: the human annotator explains their chosen label to the AI model by highlighting the key phrases in the example that helped them make the decision. These highlights are then used to automatically generate nuanced rules, which are combined and used to augment the training dataset and boost the model’s quality.

Uncertainty sampling: it finds those examples for which the model is most uncertain, and suggests them for human review.
Diversity sampling: it helps make sure that the dataset covers as diverse a set of data as possible. This ensures the model learns to handle all of the real-world cases.

Guided learning: it allows you to search through your dataset for key examples. This is particularly useful when the original dataset is very imbalanced (it contains very few examples of the category you care about).
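For readers who want to see what “uncertainty sampling” looks like in practice, here is a minimal sketch. It is not the code from the cited write up. The function names, the document ids, and the probability values are all made up for illustration. The sketch assumes a classifier has already produced class probabilities for each item and simply ranks the items by how unsure the model is.

```python
import math

def least_confidence(probs):
    """Uncertainty score: 1 minus the probability of the top class."""
    return 1.0 - max(probs)

def entropy(probs):
    """Alternative uncertainty score: Shannon entropy of the distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def pick_for_review(predictions, k=2, score=least_confidence):
    """Return the ids of the k items the model is least sure about.

    predictions is a list of (item_id, class_probabilities) pairs.
    """
    ranked = sorted(predictions, key=lambda item: score(item[1]), reverse=True)
    return [item_id for item_id, _ in ranked[:k]]

# Hypothetical model output for four articles and three categories.
predictions = [
    ("doc-a", [0.98, 0.01, 0.01]),
    ("doc-b", [0.40, 0.35, 0.25]),
    ("doc-c", [0.55, 0.44, 0.01]),
    ("doc-d", [0.90, 0.05, 0.05]),
]

print(pick_for_review(predictions))  # ['doc-b', 'doc-c']
```

The items sent to a human are the ones where no category clearly wins. Swapping in `score=entropy` changes the definition of “unsure” without changing the selection loop.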

These phrases may not be clear. May I elucidate:

  • Weak supervision. Subject matter experts riding herd. No way. Inefficient and not optimizable.
  • Uncertainty sampling means a “fudge factor” or “fuzzifying.” A metaphor might be “close enough for horse shoes.”
  • Guided learning. Yep, manual assembly of training data, recalibration, and more training until the horse shoe thing scores a point.

The write up undermines its good qualities with a reference to Google. Has anyone noticed that Google’s first page of results for most of my queries is advertisements?

NLP and horse shoes. Perfect match. Why are the index and classification codes those which an educated person would find understandable and at hand? Forget answering this question. Just remember good enough and close enough for horse shoes. Clang and kha-ching as another ad sucks in a bidder.

Stephen E Arnold, July 8, 2020

Techno-Grousing: A New Analytic Method?

July 3, 2020

Two items snagged my attention as my team and I were finishing the pre-recorded lecture about Amazon policeware for the upcoming National Cyber Crime Conference.

The first is a mostly context free item from a Silicon Valley type “real” news outfit. The article’s title is:

Hany Farid Says a Reckoning Is Coming for Toxic Social Media

The item comes from one of the technology emission centers in the San Francisco / Silicon Valley region: A professor at the University of California, Berkeley.

What’s interesting is that Hany Farid is activating a klaxon that hoots:

In five years, I expect us to have long since reached the boiling point that leads to reining in an almost entirely unregulated technology sector to contend with how technology has been weaponized against individuals, society, and democracy.

Insight? Prediction? Anticipatory avoidance?

After decades of supporting, advocating, and cheerleading technology, now, this moment, is the time to be aware that change is coming. Who is responsible? The candidates are the media, people who disseminate misinformation, and bad actors.

Sounds good. What about educators? Well, not mentioned.

The other item comes from the Jakarta Post. You can find the story at this link. I have learned that mentioning the entity the story discusses results in my blog post being skipped by certain indexing systems. Hey, that’s a surprise, right?

The point of the write up is that a certain social media site is now struggling with increased feistiness among otherwise PR influenced users.

What’s interesting is that suddenly, like the insight du jour from the Berkeley professor, nastiness is determined to be undesirable.

The fix for the social media outfit is simple: Get out of line and you will be blocked from the service. There’s nothing so comforting as hitting the big red cancel button.

Turning battleships quickly can have interesting consequences. The question is, “What if the battleship’s turn has unforeseen consequences?”

Stephen E Arnold, July 3, 2020

Complaints and Protest: But the GOOG Has Been Googling for 20 Years

June 23, 2020

My goodness, we live in the Era of Complaining. The print version of the “flagship podcast” published “Google Employees Demand the Company End Police Contracts.” Let’s put this Google tie up with the US government in context.

Google was poking around the US government as early as 1999 when the chatter about indexing US government content surfaced. The company bid on the FirstGov.gov project and lost. (The US government selected the really interesting solution proposed and provided by AT&T.) Google acquired Keyhole which the CIA investment unit In-Q-Tel supported with cash. In 2005, In-Q-Tel sold its shares in Google. In 2008, Google and In-Q-Tel jointly invested in Recorded Future. Along the way, Google has performed “work” for a number of US government agencies. Despite the low profile of some of these activities, Google has been in the DC game for more than 20 years. I know because I received a snotty email about why Google should have been selected instead of the AT&T Fast Search solution.

The point is that Google employees are dazzled by their perceptual baloney. The company today is similar to the wonky outfit it was after Backrub took a break, venture money arrived, and in a moment of adulting thrashed about for a way to make money. The solution, as you and some Googlers may not care to know, was to “be influenced” by Yahoo’s Overture/GoTo online advertising concept. Google settled the Yahoo legal complaint about this “influence” prior to the firm’s IPO and may have coughed up about $1 billion to grease the skids for the IPO. Yahoo took the deal, and the Google morphed into the online ad outfit it is today.

But employees at Google, based on my limited exposure to these fine individuals, are generally unaware of the company’s interest in US government work, the fascinating way systems and methods arrive at the company, and the old fashioned idea that when you accept money for work you shut up or quit.

Not today.

The online word version of the “flagship podcast” states:

Employees are specifically calling out Google’s ongoing Cloud contract with the Clarkstown Police Department in New York, which was sued for allegedly conducting illegal surveillance on Black Lives Matter protestors in 2015. They’re also highlighting the company’s indirect support of a sheriff’s department in Arizona tracking people who cross the US-Mexico border.

Okay, Google is not the center of the universe when it comes to management sophistication. The company employs what I call “the high school science club management method.” The inability to keep information private and the hiring procedures which seem to favor those who want to decide what a publicly traded commercial enterprise does to earn money illustrate the challenges Google faces.

Mr. Brin’s showing up in senior elected officials’ offices wearing a T shirt and gym shoes with sparklies on them is trivial compared to the larger strategic recent issues at Google.

Not only are employees at Google complaining despite the money, the ping pong tables, and the benefits of working at home; the employees also want Google to extricate itself from, and no longer pursue, revenue producing activities.

Several observations:

  1. Google does and will continue to do government work despite caving to employee demands over Project Maven. Hey, good news for Anduril, right?
  2. Employees don’t know much if anything about the history of Google, the type of decisions its founders made, and efforts the company has made to obtain government work. Candidate vetting and employee training are working well at the GOOG, don’t you think?
  3. Google management cannot contain confidential information. But the larger question is, “Why is the hiring process failing to recruit individuals who do work rather than make time to complain about Google’s government work?” The contracts don’t just drop from the sky. Effort, sometimes years of effort, is necessary to land these projects. So quit tomorrow? Sure, good for the attorneys, not for the government customers.

Complain, complain, complain. There’s nothing like employees grousing. Why not do something other than send email? Here’s a suggestion: Quit.

What’s Google going to do about this quite embarrassing state of affairs?

Many years ago (I can’t provide details because I signed a document wittingly) a Google senior wizard told me:

Some day it will end. Until then, rock and roll.

And to what does this Gnostic phrase refer?

Google has been putting the pedal to the metal for 20 years. Now the company is operating, like a few others, without meaningful constraints, adult leadership, and much of a purpose other than making money, reducing costs, and dealing with backlashes. The push back against Google is manifesting itself in the government investigations, the talk about monopoly behavior, and the dwindling likelihood that a trip to Brussels or Strasbourg will be a holiday. It is possible that some Google attorneys will find discussing the fines and legal restraints fun, but that’s a sign of changing times.

Net net: The employee grousing reflects a lack of meaningful regulation, a failure of Google leadership, and remediating hiring processes which allow the printed version of the “flagship podcast” to explain that lots of Googlers want to tear the house down. Take direct action. Resign. I am old fashioned. Employees accept job offers. Before hooking up with a publicly traded company as an employee (look up the definition, gentle Googlers with protest on your mind) — learn about the company. That’s your obligation. After accepting a job, like it or leave. Easy. I, however, think these complainers will follow the thought processes I characterize as “Casey Newtonesque.”

Wonderful. Flagship podcast. Real news, yeah!

Stephen E Arnold, June 23, 2020

Free Dissertation? Act Fast or You May Have to Pay Up and a Lot

June 20, 2020

DarkCyber spotted “Discovering Dennis Ritchie’s Lost Dissertation.” The main point of the write up is that a wizard failed to hand over a copy of his dissertation to the institution library. As a result, no PhD and no scanning, indexing, and selling of the good student’s work by University Microfilms. I have no clue what this outfit is called today, but in the 1960s, the outfit zoomed through Kodak film and helped animate environmental controls on photoprocessing chemicals. Silver and all that, of course.

The main point of the write up for me is the link to the aforementioned dissertation. Free and online as of June 20, 2020, at Ritchie_dissertation.pdf. Miss this chance and you may have to pony up some hard cash for a professional publishing/database company’s honest work of making money by converting students’ fear and perspiration into an online charge.

Oh, what did the student cook up? The C language.

Stephen E Arnold, June 20, 2020

Short Cut Debater Delight: URL to a Snippet

June 19, 2020

Let us journey back in time. I was a high school and college debate person. I think one of my “advisors” called us “debaters,” but I think he was saying, “De-daters.” Yeah, popular.

The year is 1964, and my debate partner was a silver tongued Greek American named Nick G. I was a fat, bespectacled trailer court person who hid in the library. My job was to read stuff and write summaries on 5×8 note cards. Remember those?

If I spotted a useful fact about the National Defense Education Act or similar burning topic for a 19 year old, I would cross reference the factoid, index it with a color tinted pencil, and organize the note cards in my really big wooden box. Cool, right?

Flash forward to a debate at some empty campus in January and a “debate tournament.” Sad affairs? You bet. Nick and I were listening to a couple of swifties from Dartmouth explain that Nick and I were stupid, losers from an intellectual nowheresville, and candidates for life in a tuna packing plant owned by one of the Dartmouth wizard’s family.

I spotted a note card, a snippet, and a cross reference. Coincidence, maybe. Cut to the punch line: The Dartmouth rebuttal person changed the factoids and quoted an edited version of the information I had recorded in my blissful hours of alone-ness in the library.

My turn to speak arrived, and I began by pointing out that snippets out of context were not the stuff from which an Ivy Leaguer was fabricated. The “fabrication” of misstatements, misquotes, and misrepresentations was proof that the arguments constructed by the shortcut artists from Hanover, New Hampshire (wherever that was) were fluff.

Bingo. I summed up our case and sat down.

We won the debate and the tournament. I think my father-in-law used the trophy as a tie rack.

I thought of Eleazar’s losers. Nick and I ate a pizza at some joint before the bus ride back to the frozen Midwest where our one-horse college pumped information into hungry Illinoisans.

Google is allegedly going to facilitate short cut thinking if the information in “Google’s New Chrome Extension Lets You Link Directly to Specific Text on a Page” is accurate, but today, who knows?

The idea is that a person creates or fabricates a factoid, creates a link, and leads the Dartmouth-type researcher to just what is needed to support a castle of clips.

The old fashioned approach mostly required finding information, reading something, copying or photocopying the pages, converting the information to a note card, and going through the indexing thing.

The process had the effect of imprinting the information on the mind. If one had a good memory as Nick or I did, we could pull information, find the source, and convert that item into a useful addition to our argument.

What happens if one takes a shortcut? You get the Dartmouth approach to information; that is, fix it up and skip the work.

The write up states:

The Google extension builds upon a new feature that was recently added to Chromium called Text Fragments, which works by appending extra linking information to a URL after a #. It’s the same technology that Google now sometimes uses to link to specific parts of a webpage in search results. However, these URLs can be long and difficult to manually create if you’re linking to longer sections of text, or complex web pages where the same words are repeated multiple times. This extension simplifies the creation process.
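To make the “extra linking information after a #” concrete, here is a small sketch of how such a link can be assembled by hand. It follows the Text Fragments form `#:~:text=start,end`. The function names and the example.com address are placeholders invented for illustration, and this is not the extension’s own code. The sketch covers only the start and optional end terms, not the prefix and suffix context terms.

```python
from urllib.parse import quote

def _encode(term):
    # Percent-encode everything outside letters, digits, and a few marks.
    # The hyphen is encoded as well because "-" is used in the directive
    # syntax to mark prefix and suffix terms.
    return quote(term, safe="").replace("-", "%2D")

def text_fragment_url(page_url, start, end=None):
    """Build a link that points at a text snippet on a page.

    start is the text where the highlight begins; the optional end is the
    text where it stops, which keeps the URL short for long passages.
    Any fragment already on page_url is dropped to keep the sketch simple.
    """
    base = page_url.split("#", 1)[0]
    directive = _encode(start)
    if end is not None:
        directive += "," + _encode(end)
    return f"{base}#:~:text={directive}"

print(text_fragment_url("https://example.com/post", "close enough", "horse shoes"))
# https://example.com/post#:~:text=close%20enough,horse%20shoes
```

Note what the link carries: a snippet and nothing else. The surrounding paragraphs, the source’s qualifications, and the note card discipline are not part of the URL.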

Right, who needs context? Also, what happens when Google “hides” URLs so one has to use Google search to locate a source?

Any wonder why some of the arguments presented by “real” lawyers and journalists are so stupid?

The intellectual rigor has not just relaxed; it has checked into Hotel California and is chilling out. Bump, bump, bump. Hanover arrives in La La Land.

Stephen E Arnold, June 19, 2020

Organic or Paid Search? Answer: Pay Up

June 16, 2020

There is a weird symbiosis. Unlike the sucker fish clamped on a shark, the predator’s fellow travelers operate in the dark digital ocean. “Organic Vs Paid Search: Explained” correctly points out that traffic costs money. This is not 1994, gentle reader. This is 2020 and the costs of running an ad supported search engine are difficult to control.

The write-up ignores a simple fact: Online advertising companies want anyone who wants clicks and traffic to pay. Like the IRS oriented phrase: Death, taxes, and the online traffic levy.

This means that “organic search” — the 1994 style of Web indexing — is dead like dinosaurs. The future is pay to play.

As output devices become smaller and voice creeps forward as a way to explain where to get a pizza, the free loading sucker fish are going to get scraped off the digital shark. The shark will then eat the sucker fish.

What’s this mean for search engine optimization? More baloney, more hand waving, and another lost cause.

Pay to play, the phrase of the future. There’s no cyber Mother Teresa to intervene.

Stephen E Arnold, June 16, 2020

Google Search: Clutching at Elephant Parts?

June 3, 2020

The DarkCyber research team finds Google search endlessly fascinating. The group is less interested in the relevance of the results and increasingly interested in the manipulations of data. The line between objective results and weaponized results is a thin one. Figuring out what is occurring, the intent of changes in data presentation, and the actions of stakeholders like SEO (search engine optimization) professionals is similar to the behaviors we documented in our Dark Web research. (We summarized some of our data in “Dark Web Notebook.” Information about that monograph is available at this link.) Our radar beeped when one of the team identified a certified SEO expert who identified himself as a “hustler.” This is street jargon for a person with behaviors which may be perceived as illegal or quasi illegal.

Consider this Reddit post from Antihero. The focus of Antihero’s attention was a search for mattress. The result returned about 761 million results. However, the first page of search results — that is the one that 95 percent of those using Google view — is entirely ads. To support the argument, Antihero includes a screen shot of the page which indeed is entirely “pay to play” content. Yep, ads, infomercials in text form, carnival barkers who get that prime real estate by paying off the entertainment company managing the event. To sum up, Google is not good.

Now consider this post from a company which depends on Google for indexing and pointing to its content. “Panda and the Death of SEO PR” explains that Google is doing an outstanding job of filtering certain content from its search results. The idea is that bogus news releases which can be output after registering for free news release services are filtered. Plus the changes in search since 2013 have made it more difficult for outputters to put certain content on Web pages which are then indexed by Google and made available to the world. To sum up, Google is good.

Let’s step back. Google is in the business of selling ads. The ad business is different from those halcyon days when Google was furiously litigating with Yahoo about certain similarities between Google’s fledgling ad service and Yahoo’s ad system and method. Google ended up inking a deal; Yahoo went back into its purple jack in the box; and the pay to play approach to “objective” search became the de facto standard in the US and then elsewhere.

When a Web site is not indexed, the webmaster or 23 year old political science major reinvented while living in mom and dad’s basement needs traffic. What are the choices?

  1. Create content and hope that tweets, Facebook posts, and links in LinkedIn generate hundreds of thousands of page views. Google’s algorithms and ad sales professionals monitor such traffic anomalies. A spike could mean a customer with money to spend. With more than 35 billion Web pages in the online indexes, generating a spike is possible, but it is difficult to achieve. That path is called “organic search.” The idea is that clicks flow from the video, the content, or the image posted. Organic search operates on the magnet principle. Good content pulls traffic. Yes, that happens.
  2. Buy ads. This approach does work. Amazon, Facebook, Google, and others operate search systems and match ads to user interests. For product traffic, Amazon is emerging as the big dog running in front of the Bezos bulldozer to chase small animals off the trail. Facebook, despite its somewhat unstable political and social position, can deliver person centric ads. Google is the champion of free Web search on the desktop and on mobile devices. If you want traffic, you buy an ad. The ad produces traffic. There is chatter that buying ads has other upsides as well, but those are a subject for a future post.

Now back to the Reddit post. Those who buy ads for content related to mattresses and pay the most appear on the results page in Antihero’s online article.

And what about the eRelease “Google is wonderful” post? It is valid, particularly for Google partners and organizations which have an opportunity to participate in the Google ecosystem.

Net net: When organic traffic doesn’t work, one can work with a Google partner who can provide content distribution and a glide path for ad sales. When one grabs part of an elephant, even when one has one’s eyes open and one is wearing rubber boots and a rubber apron, it is difficult to see what you are near.

Stephen E Arnold, June 3, 2020

Search Engine Optimization: Designed to Sell Google Advertising

May 26, 2020

Many years ago, I gave a talk at one of Search Engine Land’s conferences. I am not sure how I ended up on the program. At that time I was focused on enterprise search and some work for the US government. I showed up, gave a talk about enterprise search, and sat in on several round tables. The idea was, as I recall, that speakers sat at a table and people could sit down and talk about search. I was like a murder hornet at a five year old’s birthday party. I did not have any context for questions like “How do I get my department’s content to rank highly in our local search engine?” and “What ideas do you have for making content relevant?” That was the last time I accepted an invitation to give a talk at a search engine optimization conference. If you want to manipulate corporate content, just do it directly. What’s with the indexing thing?

The topics were designed to give a marketer who knew essentially zero about search of any kind information to game a relevance ranking system. The intent of the conference organizer (who eventually became a search evangelist or apologist for Google) and the attendees had zero, zilch, nada, to do with getting on point answers to a query.

I typically confine my annoyance at search engine optimization to comments I offer in my blog Beyond Search/Dark Cyber. If a scam artist sends me an email asking me to include a link to another blog, I respond and point out I will reproduce those emails about cyber crime. That usually causes the bot or whoever is sending me emails to go away.

I want to take this opportunity to state what was obvious to me when the SEO (the acronym for the relevance-killing discipline of search engine optimization) industry began taking bait dangled by Google.

Here’s how this multi-year, large-scale digital pipeline works. The diagram below shows a marketer or Web site owner eager to get the site into a search engine. Being indexed, of course, is not enough. The Web site must appear on the first page of a Web search system’s results pages. The person seeking traffic has two choices and only two choices: Get traffic with the content (text, audio, or video) providing the magnetism or pay to play. Buy ads. Get traffic. Period.

Put content on the page with index terms (now called tags) and make sure the Web page conforms to Google’s rules. Despite Google’s protestations, the company accounts for an astounding 95 percent of the search queries in the US and Western Europe. Google has competition in China which holds down Google’s share of market in the Middle Kingdom. For all practical purposes, embracing Google’s web master guidelines, conforming to AMP, and making modifications decreed by Google is helpful in getting indexed. The first path appears to be easy. When it fails, the search engine optimization experts are ready to assist.

The second path to traffic is to buy Google Advertising. Google has a desire to become the premier place for large-scale media campaigns. Google will sell ads to small outfits, but the money comes from having Fortune 1000 companies and their ilk buy Google advertising. The problem is that Google Advertising costs money. The interface is designed to be like a game, a gambling game at that. The results from Google ads can be difficult to connect to a specific sale. Nevertheless, ads are option two.

How does the pipeline work? What is the feedback mechanism that enriches some SEO experts? Why are the two options symbiotic? I want to provide brief answers to each of these questions.

How does the pipeline work? (Perhaps the word “grooming” might be appropriate here?)

This is an easy question. Not buying ads means that most Web sites will get almost zero traffic. Web search is a pay-to-play operation. Google has its own list of bluebirds, canaries, and sparrows. (A bluebird is a Web site that Google must index no matter what. Examples are whitehouse.gov, stanford.edu, and cnn.com. A sparrow is an uninteresting Web site which may get indexed on an irregular or relaxed cycle. The canary? That’s a Web site which may not be indexed comprehensively or if indexed, updated on a delayed basis.) With more than 35 billion Web sites wanting to be indexed by Google and the lesser online systems, the no-ads option seems attractive. Therefore, Google encourages SEO experts to pitch their services.

Now here’s the kicker. Web sites which do not buy ads struggle to get clicks. SEO experts make suggestions and may make changes in their customers’ Web pages. But nothing delivers traffic unless an anomaly or a particular item of information catches attention which delivers large numbers of clicks. Google dutifully indexes that which attracts clicks, thus creating more demand. More demand means that indexing those “magnetic” pages makes ad sales “obvious”. Traffic allows Google to chop through its ad inventory. Relaxing queries for words related to “magnetic” sites is an obvious technical play to sell more ads. Thus, SEO experts lucky enough to have a customer pulled into the maelstrom of a “magnetic” page are happy. If Google wants a change, that Web site operator will make the change. If an SEO expert is involved, the Google change is packaged with assurance that “traffic will arrive in an organic way.” Organic in the lingo of the SEO expert means “you don’t have to pay to get traffic.”

So what? Groomed or indoctrinated SEO experts set the stage to help Google get their requirements and methods adopted without telling a Web site operator “You must do this.” Second, the SEO experts make money pushing the fluff about organic traffic. Third, Web site operators who benefit from the effect of “magnetic” sites on their Web site become noisy advocates of SEO.

There is a but.

At any time, Google’s algorithms can decrement a Web site living by organic traffic. Google can also manually intervene and slow the flow of traffic to a Web site. The mechanism ranges from blacklists to adding a URL or entity to a list of sites with “negative” quality scores. I have explained the notion of “quality” as defined by Google in my The Google Legacy and Google Version 2.0 monographs, originally published by Infonortics but out of print due to the skill print publishers have in committing hara-kiri.

What happens when a Web site loses traffic? Some sue like Foundem; others go out of business. Many simply accept the loss of traffic as fate and either buy Google Advertising or run back into the La-La land of SEO assurances that traffic will again flow organically after we wave our magic wand.

Other companies bite the bullet and buy Google advertising. Examples include companies who pull advertising because their ads appear adjacent to objectionable content. These companies go back because Google is a de facto gatekeeper for high-volume online traffic. Other companies decide that they need to pay SEO experts AND buy Google Advertising.

This is a sweet operation because:

Google has evangelists who tell those with Web pages what specific changes are needed to make a Web page conform to a Google-defined standard. Conformance to Google standards reduces computational load. There are tens of thousands of Google’s “SEO helpers” creating what Google wants and needs.

When the SEO experts fail to deliver clicks, you know what happens? Google Advertising is the only life saver on the digital beach.

SEO is a game played for free or organic traffic. Google controls the information highway. Stay in your lane and do what we want. Make a tiny error. Well, Google Advertising, a friendly Google inside sales professional or certified SEO expert can get you out of the mud.

SEO experts are sure to object to my characterization of their efforts as Google pre-sales. But some SEO experts make money and one SEO expert became an honest-to-goodness Googler.

From my point of view, SEO is a complement to Google Advertising. Want traffic? Buy Google’s ads. The Google knows, and it gets the pay-to-play money, it gets the support and love of the SEO “experts”, and Google gets a third party pounding Web sites into the Google cookie cutter.

What happens if an outfit doesn’t play Foosball by Google’s rules? Just ask Foundem or the TradeComet executives.

If you are not on Google, you may not exist. That’s what makes the pipeline work and plugs in the Google money machine: Pay to play. It is a business model guaranteed to cement increasingly irrelevant results to users’ minds. And what happens when Google shapes results? You decide based on the information you “find” in Google, usually above the fold and more than 90 percent of the time without clicking to Page 2.

If you want more search engine optimization information, point your browser to this page of titles and hot links on Xenky.com. (Some of these articles identify SEO experts who are avowed hustlers. Is SEO a playground for digital Larry Flynts?)

Stephen E Arnold, May 26, 2020

Microsoft and Its Latest Search Innovation: Moving Past Fast? Nope

May 22, 2020

I read “Microsoft Search: Search Your Document Like You Search the Web.” Perhaps Microsoft did not get the reports about the demise of the Google Search Appliance. That “invention” made clear that searching a corporate content collection like you search the Web was not exactly the greatest thing since sliced bread. There were a number of reasons for the failure of the GSA. It was a black box. You know that mere mortals could not tune the relevance component. You know that it produced results that left employees wondering, “Where is the document I wrote yesterday?” You know that the corpus of Web content is different from the fruit cake of corporate content. Web search returns something because the system is rigged to find a way to display ads to the hapless searcher.

Contrast this with documents in the cloud, in different systems like that old AS/400 Ironsides application used by the warehouse supervisors, and content tucked away on employees’ USB drives, mobile phones, the oldest kid’s iPad, and on services a go-to sales professional uses to store PowerPoints for “special” customers. Then there are the documents in the corporate legal office. The consultants’ reports scanned and stored on the Marketing Department’s computer kept for interns.

Nevertheless, the article explains:

We’re utilizing well-established web search technologies, such as query and document understanding, and adding deep learning based natural language models. This allows us to handle a much broader set of search queries beyond “exact match.”

Okay, query expansion, synonym look up, and Fast Search’s concept feature. But there’s more:

With the recent breakthroughs in deep learning techniques, you can now go beyond the common search term-based queries. The result is answers to your questions based on the document content. This opens a whole new way of finding knowledge. When you’re looking at a water quality report, you can answer questions like “where does the city water originate from? How to reduce the amount of lead in water?”

May I suggest that Microsoft and dozens of other enterprise search vendors have promised magical retrieval?

May I point out that the following content types are usually outside the ken of the latest and greatest enterprise search confection; for example:

  • Quality control data on parts stored in an Autodesk engineering document
  • Real time data flowing into an organization from sensors
  • Video content, audio content, and rich media like photographs
  • Classified content or content restricted by certain constraints. (Access controls are often best implemented by specialized systems unknown to the greedy enterprise search indexing system.)
  • Documents obtained through an eDiscovery process for legal matters.

Has Microsoft solved these problems? Sure, if everything (note the logically impossible categorical affirmative) is in an Azure repository, it is conceivable that a user query could return a particular content object.

But that’s Microsoft fantasy land, and it is about as likely as Mr. Nadella arriving at work on the back of a unicorn.

Microsoft feels compelled to reinvent search every year or two. The longest journey begins with a single step. It is just that Microsoft took those steps decades ago and still has not reached the now rubblized Fred Harvey’s.

Stephen E Arnold, May 22, 2020

Semantic SEO: Solution or Runway for Google Ads, Formerly AdWords?

May 14, 2020

I participated in a conversation with Robert David Steele, a former CIA professional, and a former Google software engineer named Zack Vorhies. One of the topics touched upon was Google’s relaxing of its relevance thresholds. A video of extracts from the conversation contains some interesting information; for example, the location of a repository of Google company documents Mr. Vorhies publicly released.

My contribution to the discussion focused on how valuable “relaxed” relevance is. The approach allows Google to display more ads per query. The “relaxed” query means that an ad inventory can be worked through more quickly than it would be IF old fashioned Boolean search were the norm for users. Advertisers’ eyes cross when Boolean search and the “relaxing” of a semantic method have to be explained.
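For readers whose eyes have not yet crossed, here is a minimal sketch of the difference between strict Boolean matching and a “relaxed” match. The tiny corpus, the query, and the function names are all invented for illustration; this is a toy under stated assumptions, not Google’s method. The point is simply that relaxing the match puts more items on the results page.

```python
# Toy corpus; the documents and the query are invented for illustration.
DOCS = {
    1: "inconel alloy chemical properties and corrosion data",
    2: "pizza delivery near you open late",
    3: "chemical properties of common kitchen cleaners",
    4: "inconel welding tips for beginners",
}


def tokenize(text):
    """Lowercase and split on whitespace; real systems do far more."""
    return set(text.lower().split())


def boolean_and(query, docs):
    """Strict Boolean AND: a document must contain every query term."""
    terms = tokenize(query)
    return sorted(doc_id for doc_id, text in docs.items()
                  if terms <= tokenize(text))


def relaxed(query, docs):
    """Relaxed match: any shared term qualifies, ranked by overlap count."""
    terms = tokenize(query)
    scored = [(len(terms & tokenize(text)), doc_id)
              for doc_id, text in docs.items()]
    # Highest overlap first; ties broken by document id.
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for score, doc_id in ranked if score > 0]


strict_hits = boolean_and("inconel chemical properties", DOCS)
relaxed_hits = relaxed("inconel chemical properties", DOCS)
print(strict_hits)   # only the document containing all three terms
print(relaxed_hits)  # every document sharing at least one term
```

In this toy, the strict query returns one document and the relaxed query returns three. More results per query means more slots next to which something can be displayed.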

DarkCyber’s research team prefers Boolean. None of the researchers needs training wheels or Mother Google (which seems to emulate Rosa Klebb of James Bond fame). WFH Googlers bonding with their mobile phones like fuzzier, semantic, Tommy Bahama methods.

The team spotted “The Newbie’s Information to Semantic Search: Examples and Instruments.” Our interpretation of “newbies” is that the collective noun refers to desperate marketers who have to find a way to boost traffic to a Web site BEFORE going to their millennial leader and saying, “Um, err, you know, I think we have to start buying Google Ads.”

Yes, there is a link between the SEO rah rah and the Google online advertising system. The idea is simple. When SEO fails, the owner of the Web page has to buy Google Ads (formerly Google AdWords). In a future post, someone on the team will write about this interesting business process. Just not in this post, thank you.

The article triggering this essay includes what look like simplified semi-technical diagrams. Plus, there are screenshots featuring Yo Yo Ma. And SEOish jargon; for example:

  • Coding
  • Elements
  • Knowledge as in “knowledge of any Web page.” DarkCyber finds categorical affirmatives a crime against logicians living and semantically dead.
  • Mapping as in “semantic mapping”
  • Markup
  • Semantic

Plus, the write up seems to be an advertorial weaponized content object for a product called Optimizer. DarkCyber concluded that the system is a word look up tool, sort of a dumbed down thesaurus for hustlers, unemployed business administration junior college drop outs, and earnest art history majors working in the honorable discipline of SEO.

What does the semantic analysis convey to a reader unfamiliar with the concepts of “semantic,” “mark up,” and “knowledge”? The answer, in the view of the DarkCyber team, is less and less useful search results. Mr. Vorhies makes this point in the video cited above. In fact, he wants to go back to the “old Google.” Why? Today’s Google outputs frustratingly off point results.

The article’s main points, based on the DarkCyber interpretation of the article, are:

First, statements like this: “…don’t actually recognize how troublesome it’s to elucidate what’s being communicated with out the assistance of all “beyond-words” indicators.” Yeah, what? DarkCyber thinks the tortured words imply that smart software and data can light up the dark spaces of a user’s query. Stated another way: Search results should answer the user’s question with on point results. Yes, that sounds good. A tiny percentage of people using Google want to conduct an internal reference interview to identify what’s needed, select the online indexes to search, formulate the terms required for a query, and then run the query on multiple systems. Very few users of online search systems want to scan results, analyze the most useful content, dedupe and verify data, and then capture facts with appropriate bibliographic information. Many times, this type of process is little more than input for a more refined query. Who has time for a systematic, thorough informationizing process? Why bother? Saying the word “pizza” to a mobile phone is the way to go. If it works for pizza, the simple query will work for Inconel 235 chemical properties, right? This easy approach is called semantic. In reality it is a canned search with results shaped by advertisers who want clicks.

Second, a person desperately seeking traffic to a Web site must index content on a Web page. Today, “index” is a not-so-useful term. Today one “tags” a page with user assigned terms. Controlled vocabularies play almost no role in modern Web search systems. Just make up a term, then do a TikTok video and become a millionaire. Easy, right? To make tags more useful, one must use synonyms. If a page is about pizza, then a semantic tag is one that might offer the tag “vegetarian.” At least one of the DarkCyber team is old enough to remember being taught how to use a thesaurus and a dictionary. Today, one needs smart software to help the art major navigate the many words available in the English language.

Third, to make the best use of related words, the desperate marketer must embrace “semantic mapping.” The idea is to “visualize relationships between ideas and entities.” (The term “entity” is not defined, which the DarkCyber team thinks is perfectly okay for newbies who need help with indexing.) The idea of a semantic map is a Google generated search page — actually a report of allegedly related data — created by Google’s smart software. In grade school decades ago, students were taken to the library and taught about the “catalog.” Then students would gather information from “sources.” The discovered information was then winnowed and assembled into an essay or a report. If something looked or seemed funny, there was a reference librarian or a teacher to inform the student about the method for verifying facts. Now? Just trust Google. To make the idea vivid, the article provides another Google output. Instead of Yo Yo Ma, the topic is “pizza.” There you go.

The write up reminds the reader to use the third party application Text Optimizer for best results. And the bad news is that “semantic codes” must be attached to these semantically related index terms. One example is the command for deleted text. Indeed, helpful. Another tag is to indicate a direct quotation. No link to a source is suggested. Another useful method for the practicing hustler.
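The write up does not name the tags in this summary, so what follows is a guess: the “semantic codes” are presumably standard HTML elements such as del for deleted text and q or blockquote for quotations. Those elements do accept a cite attribute that can point to a source. A minimal sketch, assuming those tags and using an invented sample page and a hypothetical QuoteAudit class, shows how one might flag quotations that carry no source link:

```python
from html.parser import HTMLParser


class QuoteAudit(HTMLParser):
    """Record <del>, <q>, and <blockquote> tags and any cite attribute on them.

    The class name and the choice of watched tags are assumptions made
    for this illustration; they do not come from the article discussed.
    """

    WATCHED = {"del", "q", "blockquote"}

    def __init__(self):
        super().__init__()
        self.found = []  # list of (tag, cite value or None)

    def handle_starttag(self, tag, attrs):
        if tag in self.WATCHED:
            self.found.append((tag, dict(attrs).get("cite")))


# Invented sample markup; example.com is a placeholder domain.
SAMPLE = (
    '<p>Our pizza is <del>good</del> great. '
    '<q>Best slice in town</q> says a fan. '
    '<q cite="https://example.com/review">Worth the trip</q></p>'
)

audit = QuoteAudit()
audit.feed(SAMPLE)

# Quotations with no cite attribute are the ones a reader cannot verify.
unsourced = [tag for tag, cite in audit.found
             if tag != "del" and cite is None]
print(audit.found)
print(unsourced)
```

The markup can carry a source. Whether the practicing hustler bothers to supply one is another matter.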

Let’s step back.

The article is all too typical of search engine optimization expertise. The intent is wrapped in the wool of jargon. The main point is to sell third party software which provides training wheels to the thrashing, SEO hungry individual. Plus, the content is not designed to help the user who needs specific information.

The focus of SEO is to add fluff to content. When the SEO words don’t do the job, what does the SEO marketer do?

Buy Google Ads. This is “pay to play”, and it is the one thing that Google relies upon for revenue.

Stephen E Arnold, May 14, 2020
