Google Web Search Quality

April 20, 2022

The cat is out of the bag. The Reddit thread “Does Anyone Else Think Google Search Quality Has Gone Downhill Fast?” provides an interesting series of comments about “quality.”

The notion of “search quality” in the good old days involved gathering a corpus of text. The text was indexed using a system; for example, Smart or maybe Personal Bibliographic software. Test queries would be created in order to determine how the system displayed search results. The research minded person would then examine the corpus and determine if the result set returned the best matches. There are tricks those skilled in the art could use to make the test queries perform. One would calculate precision and recall. Bingo metrics. Now here’s the good part. Another search system would be used to index the content; for example, something interesting like the “old” Sagemaker, the mainframe fave IBM STAIRS III, or Excalibur. The performance of the second system would be compared to the first system. One would do this over time and generate precision and recall scores which could be compared. We used to use a corpus of Google patents, and I remember that Perfect Search (remember that one, gentle reader) outperformed a number of higher profile and allegedly more advanced systems.
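For readers who have not run this kind of test, here is a minimal sketch of the precision and recall arithmetic for one test query run against two systems. The document IDs, the relevance judgments, and the names `system_a` and `system_b` are hypothetical placeholders, not data from the tests described above.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single test query.

    precision = relevant documents retrieved / all documents retrieved
    recall    = relevant documents retrieved / all relevant documents
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = len(retrieved_set & relevant_set)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical relevance judgments made by a person who examined the corpus.
relevant_docs = {"pat-001", "pat-004", "pat-007", "pat-009"}

# Hypothetical result sets returned by two systems for the same test query.
system_a = ["pat-001", "pat-002", "pat-004", "pat-005"]
system_b = ["pat-001", "pat-004", "pat-007"]

for name, results in (("System A", system_a), ("System B", system_b)):
    p, r = precision_recall(results, relevant_docs)
    print(f"{name}: precision={p:.2f} recall={r:.2f}")
```

Repeating the calculation over many test queries and over time produces the scores one compares from system to system.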

I am not sure Reddit posters are into precision and recall, but the responses to the question about degradation of Google search quality are fascinating. Those posting are not too happy with what Google delivers and how the present day Googley search and retrieval system works. Thank you, Prabhakar Raghavan, former search wizard executive at Verity (wow, that was outstanding) and the individual who argued with a Bear Stearns managing director and me about how much better Yahoo’s semantic technology was than Google’s. (Raghavan was at Yahooooo then, and we know how wonderful Yahoo search was!)

Here’s a rundown of some of the issues identified in the Reddit thread:

  • From PizzaInteraction: “always laugh when I enter like 4 search terms and all the results focus on just one of the terms.”
  • Healthy-Contest-1605: “Every algorithm is being gamed to have their trash come out in top.”
  • Cl0udSurfer: “the usual tricks like adding quotes around required words, or putting a dash in front of words that should be excluded don’t work anymore.”

Net net: This is the Verity-Yahoo trajectory. Precision and recall? Ho ho ho. What about disclosing when a source was indexed and updated? What about Boolean operators? What about making as much money as possible so one can go to a high school reunion and explain the wonderfulness of one’s cleverness? What happened to Louis Monier, Sanjay Ghemawat, and the Backrub crowd?

Stephen E Arnold, April 20, 2022

Google Responds to Amazon Product Search Growth

April 20, 2022

Here is a new feature from Google, dubbed Lens, which we suspect was designed to win back product-search share from Amazon. TechCrunch reveals, “Google’s New ‘Multisearch’ Feature Lets You Search Using Text and Images at the Same Time.” The mobile-app feature, now running as a beta in the US, is available on Android and iOS. As one would expect, it allows one to ask questions or refine search results for a photo or other image. Writer Aisha Malik reports:

“Google told TechCrunch that the new feature currently has the best results for shopping searches, with more use cases to come in the future. With this initial beta launch, you can also do things beyond shopping, but it won’t be perfect for every search. In practice, this is how the new feature could work. Say you found a dress that you like but aren’t a fan of the color it’s available in. You could pull up a photo of the dress and then add the text ‘green’ in your search query to find it in your desired color. In another example, you’re looking for new furniture, but want to make sure it complements your current furniture. You can take a photo of your dining set and add the text ‘coffee table’ in your search query to find a matching table. Or, say you got a new plant and aren’t sure how to properly take care of it. You could take a picture of the plant and add the text ‘care instructions’ in your search to learn more about it.”

Malik notes this feature is great for times when neither an image nor words by themselves produce great Google results—a problem the platform has wrestled with. Lens employs the company’s latest ready-for-prime-time AI tech, but the developers hope to go further and incorporate their budding Multitask Unified Model (MUM). See the write up for more information, including a few screenshots of Lens at work.

Cynthia Murrell, April 20, 2022

DuckDuckGo and Filtering

April 18, 2022

I read “DuckDuckGo Removes Pirate Websites from Search Results: No More YouTube-dl?” The main thrust of the story is:

The private search engine, DuckDuckGo, has decided to remove pirate websites from its official search results.

DuckDuckGo is a metasearch engine. These are systems which may do some focused original spidering, but may send a user’s query to partner indexes. Then the results are presented to the user (which may be a human or a software robot). Some metasearch systems like Vivisimo invested some intellectual cycles in de-duplicating the results. (A helpful rule of thumb is to assume a 50 to 70 percent overlap in results from one Web search system to another.) IBM bought Vivisimo, and I have to admit that I have no idea what happened to the de-duplicating technology because … IBM.
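To make the de-duplication and overlap idea concrete, here is a rough sketch, not a description of how Vivisimo or DuckDuckGo actually works. The `normalize` rule is a crude assumption (it ignores the scheme, a leading `www.`, a trailing slash, and the query string), and the partner result lists are invented.

```python
from urllib.parse import urlsplit


def normalize(url):
    """Reduce a URL to a crude comparison key (an assumed rule, not a standard)."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def merge_results(*result_lists):
    """Merge partner result lists in order, keeping the first copy of each page."""
    seen = set()
    merged = []
    for results in result_lists:
        for url in results:
            key = normalize(url)
            if key not in seen:
                seen.add(key)
                merged.append(url)
    return merged


def overlap(a, b):
    """Fraction of the pages in list a that also appear in list b."""
    keys_a = {normalize(u) for u in a}
    keys_b = {normalize(u) for u in b}
    return len(keys_a & keys_b) / len(keys_a) if keys_a else 0.0


# Invented results from two hypothetical partner indexes for one query.
engine_a = ["https://www.example.com/pizza/", "https://example.org/menu", "http://example.net/a"]
engine_b = ["http://example.com/pizza", "https://example.org/menu", "https://example.edu/b"]

merged = merge_results(engine_a, engine_b)
print(len(merged), "unique results;", f"overlap={overlap(engine_a, engine_b):.0%}")
```

In this toy case two of the three results from the first index also appear in the second, which falls inside the 50 to 70 percent rule of thumb.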

There are more advanced metasearch systems. One example is Silobreaker, a system influenced by some Swedish wizards. The difference between a DuckDuckGo and an industrial strength system, in my opinion, is significant. Web search is an opaque service. Many behind-the-scenes actions take place, and some of the most important are not publicly disclosed in a way that makes sense to a person looking for pizza.

My question: “Is DuckDuckGo actively filtering?” And, “Why did this take so long?” And, “Is DuckDuckGo virtue signaling after its privacy misstep, or is the company snagged in a content marketing bramble?”

I don’t know. My thoughts are:

  1. The editorial policies of metasearch systems should be disclosed; that is, we do this and we do that.
  2. Metasearch systems should disclose that many results are recycled and that the provenance, age, and accuracy of the results are unknown to the metasearch provider.
  3. Metasearch systems should make clear exactly what the benefits of using the metasearch system are and why some search results are not as beneficial to the user; for example, which result is an ad (explicit or implicit), sponsored, etc.

Will metasearch systems embrace some of these thoughts? Nah. Those who use “free” Web search systems are in a cloud of unknowing.

Stephen E Arnold, April 18, 2022

DuckDuckGo Metasearch Service Causes Quacks

April 12, 2022

DuckDuckGo followed its technology brethren by rescinding most of its services in Russia due to the unfortunate invasion of Ukraine. The unbiased search engine’s CEO, Gabriel Weinberg, stated on March 9 that it would down-rank Russian Web sites that spread disinformation. Much to DuckDuckGo’s surprise (as well as many others’), the search engine was attacked by right-leaning, pro-free-speech supporters. The privacy search engine unintentionally attracted these supporters but did not discourage them.

Recode via Vox has the entire story: “The Free Speech Search Engine That Never Was.”

Weinberg tweeted his support for user privacy, but conservative supporters who used DuckDuckGo to search for content without Big Tech censorship were angry. They did not like that DuckDuckGo was demoting Russian propaganda Web sites. Oddly enough, these people also supported Putin’s invasion of Ukraine.

Right-wing supporters flocked to DuckDuckGo because it was supposedly free of the censorship that plagues other search engines like Google. These conservatives believe that information relating to their political and social beliefs was censored in all search engines except DuckDuckGo. These conservative supporters are more of the alt-right, conspiracy theorist type, e.g., anti-vaccination, DC Capitol insurrection. DuckDuckGo was okay with this:

“So DuckDuckGo surely knew what many of its new fans were coming to it for. They leaned into it a bit, too. Weinberg told Fox News and Quartz that Google’s search results were biased because Google collects data on users, which it then uses to target results to them. That, he said, created filter bubbles that further polarized society. Because DuckDuckGo didn’t collect data, its results were unbiased and searchers were free from Google’s echo chamber. This was a bit of a dodge; conservatives accused Google of intentionally keeping conservative sites and content off of its results, not just returning results influenced by a searchers’ interests. But it was an answer that seemed to satisfy users of all political persuasions.”

The alt-right wingers do not like that, but DuckDuckGo explained they are doing what search engines should be doing: “ensure that users were getting the best results for their searches.”

DuckDuckGo is one of many platforms that the alt-right adopted: Rumble, MeWe, Telegram, Substack. These platforms did not shy away from the users, because it meant more investments. We like the idea of a metasearch service protecting user privacy and, as a byproduct, demoting false propaganda. Now how about better results?

Whitney Grace, April 12, 2022

Agolo: Making Government Sales and Landing VC Money

April 4, 2022

Apple, Google, and Microsoft might be search experts, but they continue to get things wrong. These are big companies, so they cannot solely concentrate on the search function like a dedicated company. TechCrunch explains how Agolo specializes in search and improves upon what the tech giants do: “Agolo Summary-Powered Search Brings In Government Work And Fresh Funding.”

Agolo developed a powerful summary search engine that dissects articles and presents users with shorter versions that preserve key points. Agolo identified two types of search tools: dumb and smart. Dumb engines are not good at locating context or extracting data, while the smart functions are decent at finding relevance and ordering items. Both are limited in their capabilities. Agolo designed a smarter search function and described it as this:

“Agolo co-founder and CEO Sage Wohns gave the example of searching for ibuprofen. Any ordinary search engine only understands ibuprofen as a term people generally search for in order to learn more about the medicine, and that’s the way it’s reflected in the index. Even if you deploy that search tech on a domain-specific corpus, like research papers, it doesn’t magically gain better understanding. But a medical researcher searching through pandemic-related papers for ibuprofen already knows what it is — what they need is an ordered presentation of how ibuprofen appears in the literature, what other drugs and effects it is most tightly correlated with, what institutions and authors are associated with studying it.”

Agolo digests terabytes of data and summarizes them in usable knowledge graphs. The summary search tool is capable of handling long documents, and the US federal government uses it. Agolo does not sell an out of the box search solution; instead, it partners with enterprise system designers like Google and Microsoft. This is interesting because in a recent round of funding, Google and Microsoft invested in Agolo:

“The company’s A round was just closed, led by Lytical Ventures, plus returning investors Microsoft M12, Google’s Assistant Investment Group, Tensility Venture Partners, Ridgeline Partners and Thomson Reuters. The company has raised over $18 million in total to date.”

Are Agolo and Kyndi examples of a mini-renascence in enterprise search? Autonomy, Delphi, Entopia, and Fast Search & Transfer have faded from customers’, investors’, and innovators’ memories.

Whitney Grace, April 4, 2022

Google: Grade A Search Baloney

March 31, 2022

I have been involved in online information for more than 50 years. Yep, folks, that’s more than half a century. Those early days involved using big clunky computers to locate a word in a Latin corpus. Then there were the glory days of commercial online products like Business Dateline, the Health Reference Center, and others. The Internet was a source of online craziness that trumped the wackiness of Ev Brenner and his vision for petrochemical data. Against this richly colored tapestry of marketing fabrications, overpromising and under delivering, and the bizarre fantasies of the “old” Information Industry Association I read “Google Search Is Actually Getting Better at Giving You What You Need.”

The write up channels a marketing person at the Google and mixes the search wizard’s recycling of Google truisms with some pretty crazy assertions about finding information in 2022.

Let’s take a look at three points and then step back and put these online advertising charged assertions in a broader context; namely, the outcomes of a system which is a de facto information monopoly.

Here are the points I noted in the write up:

Big, baby, big.

The first idea is that Google processes a great deal of information. Plus, Google tests to tackle the challenge of “search quality.” By the way, what does “quality” mean? What happens when you combine big with quality? You get really good outputs from the Google system. Just try it. Do a search for pizza via Google on a mobile device. See what you get? Pizza information. Perfect. So big and quality means good. Do you buy that?

The second idea is that Google, like little beavers or little Googzillas, works to improve quality. The idea is that yesterday’s Google was not bad; it needs improvement. Many improvements mean that quality goes up. Okay, let’s try it. Say you want information about a loss of coolant accident. You know. Chernobyl, Fukushima, et al. Type in loca and you get Shakira’s video. Type in “nuclear loca” and you get links to a loss of coolant accident. Type in site:nrc.gov loca and you get results specific to a loss of coolant incident. Note what’s needed to get Google to produce something about loss of coolant accident. The user must specify a context; otherwise, Google delivers lowest common denominator results. One can use Google Dorks to work around the Shakira problem, but let’s face it, very few people are into Google Dorks. (I include them in my OSINT lecture at the National Cyber Crime Conference in April 2022, but I know from experience that not even trained investigators are into Google Dorks.)
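For those who have never assembled a Google Dork, the loca example can be sketched in a few lines. The helper names `build_query` and `search_url` are made up for illustration; the sketch only glues together the familiar `site:`, quoted phrase, and minus operators and encodes the string into a search URL. It says nothing about whether Google will honor the operators.

```python
from urllib.parse import urlencode


def build_query(terms, site=None, exact=(), exclude=()):
    """Assemble a query string that adds context to bare search terms."""
    parts = []
    if site:
        parts.append(f"site:{site}")  # restrict results to one domain
    parts.extend(f'"{phrase}"' for phrase in exact)  # required phrases
    parts.extend(terms)
    parts.extend(f"-{word}" for word in exclude)  # unwanted words
    return " ".join(parts)


def search_url(query):
    """Encode the query into a Google search URL."""
    return "https://www.google.com/search?" + urlencode({"q": query})


bare = build_query(["loca"])
scoped = build_query(["loca"], site="nrc.gov")
filtered = build_query(["loca"], exact=["loss of coolant"], exclude=["shakira"])

for query in (bare, scoped, filtered):
    print(query, "->", search_url(query))
```

The point stands: the bare query carries no context, so the user has to bolt it on by hand.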

The third idea is that Google is embracing artificial intelligence. That makes sense because there are not enough people to process today’s flows of information in the old fashioned subject matter expert way. One must reduce costs in order to deliver “quality.” Does that seem an unusual pairing of improvements and search results? Think about it, please.

Now let’s step back. Here are some observations I jotted on a 4×6 notecard:

  1. Google uses people looking for online information to generate revenue from ads. That which produces more ad revenue is valued. The “quality” is a repurposing of a useful concept to the need to generate revenue. Shakira is the correct result for the “loca” query. That’s quality.
  2. The notion of testing is interesting. What’s the objective? The answer is generating revenue. Thus, the notion of testing is little more than steering or tuning search results to generate more revenue. The adjustments operate on several levels: Shaping understanding via filtering and producing revenue from search results. Simple, just not exactly what a user of an ad supported system thinks about when running a query for pizza.
  3. Smart software is the number one way for Google to [a] reduce costs; [b] deflect legal challenges to its search result shaping with the statement “The algorithm does it, not a human”; and [c] create the illusion that Google search results are really smart. Use Google and you will be smarter too.

Believe these assertions? You’re the ideal Google user. Have doubts? You are not Googley. Don’t apply for a job at the Google and, for heaven’s sake, don’t expect the Google outputs to be objective. Just accept that some information is unfindable by design.

Google Dorks exist for a reason. Google has made finding relevant information more difficult than at any time in my professional career. And every year, the Google system becomes more detached from what most people believe fuels Google’s responses to what Google users need.

Yep, need. Sell ads. Reduce costs. Generate feedback into the system from users who have biases. Why are government agencies pushing back on outfits like Google? The quest for quality? Nope. The pushback reflects a growing awareness of disinformation, manipulation, and behavior that stifles options in my opinion.

Stephen E Arnold, March 31, 2022

Microsoft Search: Getting Better and Better

March 30, 2022

In early versions of Windows operating systems, the search function stank worse than rotting garbage in summer. Since the initial Windows deployment, Microsoft has improved the search function, and as technology advances there are still upgrades to be made, says Make Use Of in “Microsoft Is Making Windows 11’s Search Function Better Than Ever.” In a refreshing take on its past mistakes, Microsoft admits that its former search tools were not the best. When it comes to Windows 11, Microsoft revamped the search into a quality tool and does not plan to rest on its laurels.

One of the best upgrades with the newest Windows 11 patch is that search will be streamlined between work/business accounts. The search function will locate items on all accounts. Microsoft is also adding lifestyle widgets to make the OS more entertaining, such as a “word of the day” and alerting users to Microsoft Reward offerings. Search will also take the place of Facebook and inform users of important dates, such as birthdays, anniversaries, and holidays. Whenever Microsoft releases a new Windows version, they do their best to get users to adopt the new OS:

“When Microsoft releases a new operating system, it always faces the same challenge. Users and businesses are comfortable with their operating system of choice, and now the Redmond tech giant has to convince them to upgrade to the newer one. The best way to do that is to make an operating system that improves upon the old one’s formula. As such, Microsoft’s touch-ups to Windows 11’s Search tool may be an effort to convince people to leave Windows 10 behind and adopt the newer, shinier system.”

Microsoft has a poor track record when it comes to system upgrades. They have a pattern of every other OS being bad. Windows users might want to stick with Windows 10 a little longer and wait until Windows 12. It would be nice if Microsoft also added database search options like specific date, file name, Boolean, etc.

Whitney Grace, March 30, 2022

Google: A Redefinition of Relevance

March 21, 2022

It begins with the author’s search for a new toaster. That is the example The New Yorker’s Kyle Chayka cites as he discusses “What Google Search Isn’t Showing You.” Of course, we know that Google sells ads. It does not deliver objective search. If you want objective search, you have to do actual research, not query free services which have to make money selling user data, ads, and analyses. That is why Chayka’s initial toaster hunt produced a dissatisfying, “cluttered onslaught of homogenous e-commerce options,” as he put it.

When Google Search launched in 1998, it was free of advertising and dedicated to supplying users with the best results. At the time, co-founders Sergey Brin and Lawrence Page wrote that advertising would interfere with that goal. Even so, they introduced ads two years later; their original hypothesis was, as it turns out, correct. Then there is the entire SEO racket that has developed around gaming Google’s algorithm. And let us not forget Google’s growing willingness to push its own interests to the top of results. Chayka writes:

“Decades of search-engine optimization have resulted in content that is formulated not to inform readers but to rank prominently on Google pages. That might be one reason that my toaster results felt so redundant: each site is attempting to solve the same algorithmic equation. Gabriel Weinberg, the C.E.O. of the privacy-focused search-engine company DuckDuckGo, cited three other sources of dissatisfaction with Google Search. The first is the company’s practice of tracking user behavior, which drives the kind of creepy, chasing-you-around-the-Internet advertising that Google profits from. The second is Google prioritizing its own services in search results, by, for instance, answering a travel query with Quick Answers pulled from Google Places instead of from a richer, more social source such as Tripadvisor. Lastly, Weinberg argued, users are simply tired of Google’s dominance over their experience of the Internet. Google is reportedly paying Apple upward of fifteen billion dollars a year to remain the default search engine on iPhones. On Google’s own Android phone, changing one’s preferred search engine requires a cumbersome settings adjustment, and pop-up messages along the way urge the user to switch back to Google.”

To say the company is taking advantage of its near-monopoly is an understatement. (Google Search makes up about 85% of the search market.) Besides DuckDuckGo’s Weinberg, the article shares comments from two other alternative-search champions, Marginalia founder Viktor Lofgren and Are.na co-founder Daniel Pianetti. See the write-up for those perspectives. When applied to Chayka’s toaster queries, both these niche platforms returned unexpected results. The author found them interesting if not particularly helpful for the online shopper. We do not know where Chayka finally decided to purchase a new toaster, but his tangent reminds us how far Google has veered from its original philosophy.

Cynthia Murrell, March 21, 2022

Microsoft: Fun Search

March 17, 2022

We have censorship. We have discriminatory spidering. We have sites which are no longer indexed. And now if ZDNet’s “real” news team is on the money, we have search fun or fun search. You pick.

“Microsoft Is About to Add More Fun to Your Windows Search” reports:

… the Windows 10’s taskbar search box and search home pane will now feature content “including fun illustrations, which help you discover more, be connected, and stay productive. Search highlights will present notable and interesting moments of what’s special about each day – like holidays, anniversaries, and other educational moments in time both globally and in your region.

Great. How about that Windows search? Do you have a Drobo or similar storage device? I bet that Windows search will make that “fun.” What about a desire to locate an actual file on the C: or boot drive? I bet Microsoft will make that fun too. And I could go on. For example, don’t you love Microsoft search syntax? And let’s not forget “unfindable” files. Yeah, that’s a winner too!

How about search that just works, includes Boolean, and provides one click access to sample syntax? That would be fun too.

Stephen E Arnold, March 17, 2022

Dashworks Promises To Be The Best Enterprise Search System

March 16, 2022

Search remains a fundamental component not only of working environments but also of daily life. Quickly locating information is essential, but if a search engine returns low-quality results, it clogs up routines. TechCrunch dives into the background of a robust enterprise search system: “Dashworks Is A Search Engine For Your Company’s Sprawling Internal Knowledge.”

Dashworks promises to be a comprehensive search system that scours everything from Slack threads to Dropbox files. It wants to be an organization’s one stop search solution for internal knowledge through one centralized hub. While its homepage is helpful with FAQs and bookmarks, its cross-tool search is the real selling feature:

“More impressive, though, is its cross-tool search. With backgrounds in natural language processing at companies like Facebook and Cresta, co-founders Prasad Kawthekar and Praty Sharma are building a tool that allows you to ask Dashworks questions and have them answered from the knowledge it’s gathered across all of those aforementioned Slack threads, or Jira tickets, or Dropbox files. It’ll give you a search results page of relevant files across the services you’ve hooked in — but if it thinks it knows the answer to your question, it’ll just bubble that answer right to the top of the page, Google Snippets style.”

Dashworks is compatible with over thirty popular services and more are being added all the time. Dashworks does require access to all the services, devices, and applications within an organization, which might be alarming but necessary for cross-tool search.

Dashworks is an excellent idea, but if an employee uses their own device will it engage with platforms that should remain personal? But a promise? Hmmm.

Whitney Grace, March 16, 2022
