Google Search: An Intriguing Observation
August 9, 2021
I read “It’s Not SEO: Something Is Fundamentally Broken in Google Search.” I spotted this comment:
Many will remember how remarkably accurate searches were at initial release c. 2017; songs could be found by reciting lyrics, humming melodies, or vaguely describing the thematic or narrative thrust of the song. The picture is very different today. It’s almost impossible to get the system to return even slightly obscure tracks, even if one opens YouTube and reads the title verbatim.
The idea is that the issue resides within Google’s implementation of search and retrieval. I want to highlight this comment offered in the YCombinator Hacker News thread:
While the old guard in Google’s leadership had a genuine interest in developing a technically superior product, the current leaders are primarily concerned with making money. A well-functioning ranking algorithm is only one small part of the whole. As long as the search engine works well enough for the (money-making) main-stream searches, no one in Google’s leadership perceives a problem.
I have a different view of Google search. Let me offer a handful of observations from my shanty in rural Kentucky.
To begin, the original method for determining precision and recall is like a page of text photocopied with that copy then photocopied. After a couple of hundred photocopies, the image of the page has degraded. Photocopy for a couple of decades and the document copy is less than helpful. Degradation in search subsystems is inevitable, and it takes place as layers or wrappers are added around systems and methods.
Second, Google must generate revenue; otherwise, the machine will lose velocity, maybe suffer cash deprivation. The recent spectacular financial payoffs are not directed at what I call “precision and recall search.” What’s happening, in my opinion, is that accelerated relaxation of queries makes it easier to “match” an ad. More (not necessarily more relevant) matching provides quicker depletion of the ad inventory, more revenue, more opportunities for Google sales partners to pitch ads, and more users believing Google results are the cat’s pajamas. To “go back” to antiquated ideas like precision and recall, relevance, and old-school Boolean breaks the money flow, adds costs, and forces distasteful steps on those who want big paydays, bonuses, and the cash to solve death and other childish notions.
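For readers who have not bumped into the terms, here is a minimal sketch of how precision and recall are conventionally computed for a single query. The document identifiers and the function name are made up for illustration; nothing here reflects how Google or any other vendor measures its own results.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: documents the search system returned
    relevant:  documents a human judged to be on point
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = retrieved_set & relevant_set  # on-point documents that were returned

    # Precision: what share of the returned documents are on point?
    precision = len(hits) / len(retrieved_set) if retrieved_set else 0.0
    # Recall: what share of the on-point documents were returned?
    recall = len(hits) / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical result list and relevance judgments.
retrieved = ["doc1", "doc2", "doc3", "doc4"]
relevant = ["doc2", "doc4", "doc7"]
p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")
```

Relaxing a query usually pulls in more documents, which tends to raise recall while lowering precision. That trade-off is the backdrop for the next point.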
Third, this comment from Satellite2 is on the money:
Power users as a proportion of Internet’s total user count probably followed an inverted zipf distribution over time. At the beginning 100%, then 99, 90%, 9% and now less than one percent. Assuming power users formulate search in ways that are irreconcilable from those of the average user, and assuming Google adapted their models, metrics to the average user and retrained them at each step, then, we are simply no longer a target market of Google.
I interpret this as implying that Google is no longer interested in delivering on point results. I now run the same query across a number of Web search systems and hunt for results which warrant direct inspection. I use, for example, iseek.com, swisscows.ch, yandex.ru, and a handful of other systems.
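The multi-engine routine can be sketched in a few lines. This is an illustration only: the result lists below are invented placeholders, no search system is actually queried, and the function name and the two-engine threshold are my assumptions.

```python
from collections import Counter


def shared_results(results_by_engine, min_engines=2):
    """List URLs returned by at least `min_engines` different systems.

    results_by_engine maps an engine name to the URLs it returned
    for one query. Returns (url, engine_count) pairs, most shared first.
    """
    counts = Counter()
    for urls in results_by_engine.values():
        counts.update(set(urls))  # count each URL once per engine
    shared = [(url, n) for url, n in counts.items() if n >= min_engines]
    return sorted(shared, key=lambda item: (-item[1], item[0]))


# Placeholder result lists; the example.com URLs are hypothetical.
results = {
    "iseek.com": ["https://example.com/a", "https://example.com/b"],
    "swisscows.ch": ["https://example.com/a", "https://example.com/c"],
    "yandex.ru": [
        "https://example.com/a",
        "https://example.com/b",
        "https://example.com/d",
    ],
}
candidates = shared_results(results)
for url, n in candidates:
    print(f"{url} appeared in {n} systems")
```

A URL that several independent systems surface is a reasonable candidate for direct inspection. A URL only one system returns may be noise, or it may be the item the others missed, so the singletons deserve a glance too.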
Net net: The degradation of Google began around 2005 and 2006. In the last 15 years, Google has become a golden goose for some stakeholders. The company’s search systems — where is that universal search baloney, please? — are going to be increasingly difficult to refine so that a user’s query is answered in a user-useful way.
Messrs. Brin and Page bailed, leaving a consultant-like management team. Was there a link between increased legal scrutiny, friskiness in the Google legal department, antics involving hard drugs and death on a Googler’s yacht, and “efficiency oriented” applied technologies which have accelerated the cancer of relevance-free content? Facebook takes bullets for its high school management approach. Google, in my view, may be the pinnacle of the ethos of high school science club activities.
What’s the fix? Maybe a challenger from left field will displace the Google? Maybe a for-fee outfit like Infinity will make it to the big time? Maybe Chinese style censorship will put content squabbles in the basement? Maybe Google will simply become more frustrating to users?
The YouTube search case in the essay in Hacker News is spot on. But Google search — both basic and advanced search — is a service which poses risks to users. Where’s a date sort? A key word search? File type search? A federated search across blogs and news? What happened to file type search? Yada yada yada.
Like the long-dead dinosaurs, Googzilla is now watching the climate change. Snow is beginning to fall because the knowledge environment is changing. Hello, Darwin!
Stephen E Arnold, August 9, 2021
Strong Sinequa Helps Out Hapless Microsoft with Enterprise Search
August 9, 2021
Microsoft has enlisted aid or French entrepreneurs have jumped on the opportunity to enhance the already stellar software system available from the SolarWinds and Exchange Server misstep outfit.
Business Wire reveals in a hard hitting write up “Sinequa Brings Intelligent Search to Microsoft Teams” an exciting development. Wait, doesn’t Microsoft search work? Apparently Sinequa’s platform works better. We learn:
“Sinequa for Teams enables organizations to unleash the power of Sinequa’s Intelligent Search platform right within Microsoft Teams. … Sinequa continues to recognize the need to make knowledge discoverable so employees can make better decisions, regardless of where and how they work. The Sinequa platform offers a single access point to surface relevant insights both from within and outside the Microsoft ecosystem. Built for Azure and Microsoft 365 customers with Teams, Sinequa has extended its powerful search technology to Teams to help enterprises elevate productivity and enable better decision-making all in one place.”
The tailored Teams platform promises to improve data findability and analysis while bolstering collaboration and workflows. Sinequa is proud of its ability to provide enterprise search to large and complex organizations. Founded in 2002, the company is based in Paris, France.
Excellence knows no bounds.
Cynthia Murrell, August 9, 2021
Autonomy: An Interesting Legal Document
August 4, 2021
Years ago I did some work for Autonomy. I have followed the dispute between Hewlett Packard and Autonomy. Enterprise search has long been an interest of mine, and Autonomy had emerged as one of the most visible and widely known vendors of search and retrieval systems.
Today (August 3, 2021) I read “Hard Drives at Autonomy Offices Were Destroyed the Same Month CEO Lynch Quit, Extradition Trial Was Told.” The write up contains information with which I was not familiar.
In the write up is a link to “In the City of Westminster Magistrates’ Court The Government of the United States of America V Michael Richard Lynch Findings of Fact and Reasons.” That 62 page document contains a useful summary of the HP – Autonomy deal.
Several observations:
- Generating sustainable revenue for an enterprise search system and ancillary technology is difficult. This is an important fact for anyone engaged in search and retrieval.
- The actions summarized in the document provide a road map of what Autonomy did to maintain its story of success in what has been for decades a quite treacherous market niche. Search is particularly difficult, and vendors have found marketing a heck of a lot easier than delivering a system that meets users’ expectations.
- The information in the document suggests that the American judicial system may find this case a “bridge” between how corporate entities respond to the Wall Street demands for revenue and growth.
As at Fast Search & Transfer, executives found themselves making decisions which make search and retrieval a swamp. Flash forward to the present: Google search is shot through with adaptations to online advertising.
Perhaps the problem is that people expect software to deliver immediate, relevant results. Well, it is clear that most of the search and retrieval systems seeking sustainable revenues have learned that search can deliver good enough results. Good enough is not good enough, however.
Stephen E Arnold, August 4, 2021
Search Atlas Demonstrates Google Search Bias by Location
July 28, 2021
An article at Wired reminds us that Google Search is not the objective source of information it appears to many users. We learn that “A New Tool Shows How Google Results Vary Around the World.” Researchers and PhD students Rodrigo Ochigame of MIT and Katherine Ye of Carnegie Mellon University created Search Atlas, an experimental Google Search interface. The tool displays three different sets of results to the same query based on location and language, illustrating both cultural differences and government preferences. “Information borders,” they call it.
The first example involves image searches for “Tiananmen Square.” Users in the UK and Singapore are shown pictures of the government’s crackdown on student protests in 1989. Those in China, or elsewhere using the Chinese language setting, see pretty photos of a popular tourist destination. Google says the difference has nothing to do with censorship—they officially stopped cooperating with the Chinese government on that in 2010, after all. It is just a matter of localized results for those deemed likely to be planning a trip. Sure. Writer Tom Simonite describes more of the tool’s results:
“The Search Atlas collaborators also built maps and visualizations showing how search results can differ around the globe. One shows how searching for images of ‘God’ yields bearded Christian imagery in Europe and the Americas, images of Buddha in some Asian countries, and Arabic script for Allah in the Persian Gulf and northeast Africa. The Google spokesperson said the results reflect how its translation service converts the English term ‘God’ into words with more specific meanings for some languages, such as Allah in Arabic. Other information borders charted by the researchers don’t map straightforwardly onto national or language boundaries. Results for ‘how to combat climate change’ tend to divide island nations and countries on continents. In European countries such as Germany, the most common words in Google’s results related to policy measures such as energy conservation and international accords; for islands such as Mauritius and the Philippines, results were more likely to cite the enormity and immediacy of the threat of a changing climate, or harms such as sea level rise.”
Search Atlas is not yet widely available, but the researchers are examining ways to make it so. They presented it at last month’s Designing Interactive Systems conference and are testing a private beta. Of course, the tool cannot reveal the inner workings of Google’s closely held algorithms. It does, however, illustrate the outsized power the company has over who can access what information. As co-creator Ye observes:
“People ask search engines things they would never ask a person, and the things they happen to see in Google’s results can change their lives. It could be ‘How do I get an abortion?’ restaurants near you, or how you vote, or get a vaccine.”
The researchers point to Safiya Noble’s 2018 book “Algorithms of Oppression” as an inspiration for their work. They hope their project will bring the biased nature of search algorithms to the attention of a broader audience.
Cynthia Murrell, July 28, 2021
A Xoogler Wants to Do Search: Channeling Exalead and Smoking InfinitySearch?
July 16, 2021
Remember Exalead? This was a search engine created by a person who was asked to join the Google. The system was very good: 64 bit architecture, timely indexing of new and previously indexed sites, and novel features like searching via text for a specific point in an Exalead processed video. Now the system is part of Dassault Systèmes because senior management grew frustrated with one of the aggressively marketed “smart systems” available in the mid 2000s.
Now a Xoogler realizes that Google search is just an artifact of the Backrub search and retrieval system. What was “clever” in 1998 is now generally a version of MySpace.com. Maybe anigifs are in the fridge waiting to become the next big thing at the GOOG.
Now there’s a new Google called Neeva, a subscription-based, allegedly non-tracking, ad-free alternative to Google. Plus, Neeva is out of beta. Let the marketing begin! Fast Company explores the new search engine and its developers in depth in, “Inside Neeva, the Ad-Free, Privacy-First Search Engine from Ex-Googlers.” (Keep in mind that InfinitySearch.co is a new search engine with an almost identical subscription business model. Haven’t heard of InfinitySearch? Hmmm. What about Okeano? Oh, not that system either? Hmmm.)
Co-founders Sridhar Ramaswamy and Vivek Raghunathan, who both used to work at Google, had front-row seats to the dominant search engine’s evolution. They were unhappy to see advertising become more and more intrusive over the years. They are betting many users are ready to pay $4.95 a month to access what Google could have been if it were not in hot pursuit of the almighty ad dollar. Anyone who has been googling for years has watched ads migrate from a relatively unobtrusive position on the right of the page to the top of search results. For a while after that shift they were delineated by a shaded box, but now they suspiciously blend into the organic results. Google also started pushing links to its own services to the top, even when a competitor might better serve the searcher’s needs. The Fast Company write up states:
“Then there’s the fact that Google builds profiles of its users based on their online activity, the better to precisely target them with advertising not only at its own sites but all the other ones across the web whose ads are powered by Google. With no ads to serve up, Neeva shouldn’t leave privacy-conscious types feeling like they’re being monitored for ulterior purposes. (By default, Neeva does hold onto your searches for 90 days to improve the quality of features such as autosuggestions, but you can erase this log or tell the service you don’t want it to keep it in the first place.) In another break from search-engine tradition, Neeva says that it will turn at least 20 percent of its top-line revenue over to publishing partners, including the first two it’s announced, Quora and Medium. Though the details of where this could lead remain vague, it’s another attempt to set Neeva apart from Google, which has often been accused of benefiting from media outlets’ content without adequate compensation, a long-simmering dispute that has led to lawsuits and legislation.”
The founders hired on several other ex-Googlers. The team worked to create a platform that is close enough to their former employer’s to feel familiar while nixing all the advertising misery. To do this, Neeva blends its own indexing with results from Apple, Bing, Yelp, Intrinio, Weather.com, Xignite, and even Google Maps. McCracken reports the platform performs well for most tasks, falling short only on local searches. There is also the small inconvenience that, as of this writing, Chrome is the only browser that lets one set Neeva as the default search platform. Is this an acquisition-friendly move? See the Fast Company article for more on Neeva’s features as well as details on Ramaswamy’s and Raghunathan’s experiences that led them down the path to this adventure.
And you can check out Exalead search at this link. Yep, still online. May I suggest the Web, video, and forums search be expanded and enhanced. As I said, it was quite good.
Cynthia Murrell, July 16, 2021
Another New Web Search Engine: Zajjle
July 15, 2021
The DarkCyber research team tries to keep up with the Web search engines. Rarely does one come along with an index of content not generally included in the Bing, Google, Yandex systems or the new-kids-on-the-block like Metager or Neeva. According to “Arabic Search Engine plus Webmail and Data Analytics is Now Available at Zajjle”:
Zajjle is an Arabic search engine with many advantages. Besides offering the primary function as a search engine in the Arabic language, it also has additional features like webmail and website statistics & data analytics. People in Middle East countries can access the website in the Arabic language for searching the current news, website address, videos, and photos.
The write up states:
Ahmad A Najar, the Zajjle founder, is an entrepreneur who established CatchFood and Mazra3a.net. CatchFood is successful web-based restaurant management and food delivery platform. It has served countries like the United Arab Emirates, Jordan, Palestine, Iraq, Syria, and Lebanon, America, and Canada. Mazra3a.net is a popular Arabic agricultural platform connecting people interested in the agriculture industry to change ideas, increase knowledge, and communicate with professionals in the agriculture industry.
Will Zajjle index content deep in the US Department of Energy’s public facing Web sites? Will it snag content in Streamgun? What about content censored by mainstream systems?
You can explore the site at this link: http://www.zajjle.com/
Stephen E Arnold, July 15, 2021
Brave Privacy-Centric Search Arrives
July 5, 2021
Several online services that emphasize privacy have emerged in recent years, including the Brave browser. The San Francisco-based company is not stopping there. We learn from TechCrunch that “Brave’s Nontracking Search Engine is Now in Beta.” Earlier this year, Brave acquired Cliqz, a company which had developed an anti-tracking browser with its own search platform. That system’s Tailcat technology will underpin Brave Search. The company also offers Brave Ads, a way to make money while preserving users’ privacy. Brave is different from other non-tracking Google alternatives like DuckDuckGo because it is using its own, independent index. Reporter Natasha Lomas writes:
“Brave touts its eponymous search offering as having a number of differentiating features versus rivals (including smaller rivals), such as its own index, which it also says gives it independence from other search providers. Why is having an independent index important? We put that question to Josep M. Pujol, chief of search at Brave, who told us: ‘… More choices will entail more freedom and also get back to real competition, with checks and balances. Choice can only be achieved by being independent, as if we do not have our own index, then we are just a layer of paint on top of Google and Bing, unable to change much or anything in the results for users’ queries. Not having your own index, as with certain search engines, gives the impression of choice, but in reality such engine “skins” are the same players as the big-two. Only by building our own index, which is a costly proposition, will we be in a position to offer true choice to the users for the benefit of all, whether they are Brave Search users or not.’”
It should be noted that for certain functions, like image searches, Brave currently relies on other search providers to ensure relevant results. For now. Otherwise it turns to anonymized community contributions to refine its index’s results and will soon provide “community-curated open ranking models” in an effort to combat censorship and AI bias. The company plans to offer both a free option supported by ads and a paid, ad-free version we are told will be “affordable.”
We are running test queries, and the results are promising. There are other services becoming available too. I like Swisscows.ch. But I like cows.
Let us hope Brave’s efforts result in an index that is better than what has gone before. The increasing number of search options is a signal that Google search has failed in its basic mission. The problem is that millions don’t know what they are missing. Undisclosed omissions and obfuscated distortions are worse than guessing or asking friends.
Cynthia Murrell, July 5, 2021
Google and Unreliable Results: Like the Jack Benny One Liner, I Am Thinking, I Am Thinking
June 25, 2021
I read a “real” news story called “Google Is Starting to Warn Users When It Doesn’t Have a Reliable Answer.” (No, I will not ask, What’s reliable mean.)
Here’s the statement which snagged my attention in the write up:
“When anybody does a search on Google, we’re trying to show you the most relevant, reliable information we can,” said Danny Sullivan, a public liaison for Google Search. “But we get a lot of things that are entirely new.” Sullivan said the notice isn’t saying that what you’re seeing in search results is right or wrong, but that it’s a changing situation, and more information may come out later.
I think Mr. Sullivan, a former search engine optimization guru and conference organizer, is the “new” Matt Cutts, a Google professional helping to point the way to the digital future at the US government. Is key word packing the path to more patents than China?
I loved this statement which I know is pretty Tasmanian devil like: “Most relevant, reliable information we can.” I did a query for garage floor epoxy coating in Louisville. About 20 businesses displayed on the first two pages of Google search results. Two companies were in this business. Others were out of business. One “company” called me back and said, “My loser son has been gone for two years.”
I have other examples as well of search either being out of date, spoofed, or just weird.
Let’s look at some of the reasons why Google made a statement about “reliable answers.”
First, I think the difficulty of providing real-time indexing is beyond Google’s capabilities: Outfits with real time content won’t play ball with Google unless Google pays up and works out a mechanism to move the content to a Google indexing queue. (Yep, queue as in long line at the McDonald’s drive through.)
Second, Google is not set up to do real time. I think the notion of having a short list of “must ping frequently sites” may be a holdover from the distant past. The reason? As the cost of indexing, updating, and making the Google indexes “consistent” has climbed, some of the practices no longer fit the current iteration of “relevant” and “reliable.” Google is not Twitter, and it is not Facebook. Therefore, the pipelines for real time content simply don’t exist. Googlers tried but seemed to be better at selling ads than dealing with new content types.
Third, hot info appears in non text form on Instagram, TikTok, and even places like DailyMotion and Vimeo sometimes days before the content plops into YouTube. Ever try to locate a video using the creator assigned index terms? That’s an exercise in futility. Ads, gentle reader, not relevant and reliable information.
From my vantage point on the porch overlooking a mine drainage pond, I have some hypotheses:
- Google is under financial pressure, a competitive pressure from Amazon and Facebook, and a legal pressure. Almost any nation state with an appetite to drag the Google into court is in gear.
- Google is just not able to handle the real time flows of content, either textual or imagery. Too bad, but that’s the excitement of Hegel’s these, antithese, synthese which “real” Googlers learn along with search engine optimization marketing methods.
- Google’s propagandistic and jingoistic assurances that it returns relevant and reliable results are more and more widely seen as key word spam.
- Google’s management methods are not tuned for the current business environment. I may be alone in noticing that high school science club thinking and management from assumed superiority is out of favor. (If Sergey Brin were to ride a Russian rocket into space, would he attract more signatures than Jeff Bezos? The quasi referendum did not want Mr. Bezos to return to earth. Mr. Brin’s ride did not materialize, so I won’t know who “won” the most votes.)
Net net: Relevant and reliable. That’s a line worthy of Jack Benny when he is asked about Fred Allen. I give up: “What does ‘reliable’ mean, Googlers?” My suggestion is marketing hoo haa with metatags.
Stephen E Arnold, June 25, 2021
X1 Embraces Social Media and Collections Search for eDiscovery
June 22, 2021
It looks like eDiscovery firm X1 is moving beyond enterprise search into collection-centric search. They have sent around a memo announcing their “Defensible Social Media and Web Collections On-Demand.” The notice explains:
“Taking on the process of capturing evidence in a forensically sound manner can be challenging, time consuming and sometimes impossible with ever-increasing workloads. Why not outsource the collection portion of the process by letting our team of experts perform the job for you? With X1 Social Discovery On-Demand, X1’s forensic experts capture the data you need in a legally defensible manner, alleviating any headaches or worry about ESI collection.
- Social Media and Web Capture Collection – save critical time by leveraging the expertise of X1 for efficient and accurate collections
- Defensible Metadata Collection – unlike the ‘print screen’ approach, with X1 key metadata is included with all captures and deliverables with chain of custody preserved throughout
- Experienced Service Support – X1 works with you to understand your collection scope and deliverable requirements up front bringing timely, authenticated results
- Choose from Several Different Export Options – Concordance Load File, CSV, PDF, HTML for clear and accurate output
- Get Started Right Away – engage with an X1 Solutions Consultant and start the collection process same-day”
Information on the X1 Social Discovery and the X1E Remote Collection On-Demand can be found on the company’s products page. At the time of this writing, new customers can save 50% off their first collection for up to 10 accounts. We do not know how long the “limited time” offer will last. We also cannot speak for or against the solutions since we have not tried them ourselves. We find the development interesting, though. Founded in 2003, the evolving small business is based in Los Angeles.
Search is becoming policeware.
Cynthia Murrell, June 22, 2021
Ninfex Is a New Take on Internet Search
June 10, 2021
The creator of “experimental” search site Ninfex is trying a different approach that uses neither crawlers nor AI. The site’s Hello page explains:
“We rely on 2 proxies for search relevance. First: URL score (user votes). Second: Links to discussions on major forums. All submissions on Ninfex are votable by users. When you submit a link, you can also submit up to 5 supporting links (to discussions about that link) from external forums. Among the current submissions you are most likely to see forum links to reddit, hackernews, lobsters, stackexchange & tildes because those are the forums that I frequent most often.”
Yes, the young site still leans heavily toward material based on its maker’s interests, but that could change as its usership grows. The founder, who goes by the name traindreams, writes:
“I am the maker of Ninfex and right now I’m actively pushing to build the index around my personal wiki / research notes / bookmarks. That is why, the home feed mostly contains topics of my interest. The following is a list of those topics.”
See the page to explore that list of diverse and interesting topics, from Art to Psychology to Startups. Perhaps you will be inspired to vote or add a link. Traindreams has already made some changes based on user feedback, like cleaning up the UI, and promises more to come as the index grows. It looks like the idea is quality over quantity; we are curious to see if the enterprise takes off.
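The two proxies the Hello page describes, user votes and links to forum discussions, can be sketched as a toy ranking. The site does not say how it combines the two signals, so the additive formula, the link_weight value, and every name below are assumptions for illustration. Only the cap of five supporting links comes from the site's own description.

```python
from dataclasses import dataclass, field

MAX_SUPPORTING_LINKS = 5  # cap stated on the Ninfex Hello page


@dataclass
class Submission:
    url: str
    votes: int = 0
    discussion_links: list = field(default_factory=list)

    def __post_init__(self):
        # Enforce the five-link cap when a submission is created.
        if len(self.discussion_links) > MAX_SUPPORTING_LINKS:
            raise ValueError("at most 5 supporting discussion links")


def relevance(sub, link_weight=2.0):
    # Assumed combination: votes plus a bonus per linked discussion.
    return sub.votes + link_weight * len(sub.discussion_links)


def rank(submissions):
    # Highest assumed relevance first.
    return sorted(submissions, key=relevance, reverse=True)


# Hypothetical submissions and forum links.
a = Submission("https://example.org/essay", votes=3,
               discussion_links=["https://example.net/thread/1",
                                 "https://example.net/thread/2"])
b = Submission("https://example.org/tool", votes=5)
ordered = rank([a, b])
print([s.url for s in ordered])
```

With these made-up numbers the essay (3 votes plus two discussions) outranks the tool (5 votes, no discussions). A different weight would flip the order, which is why the weighting is the design choice that matters in a scheme like this.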
Cynthia Murrell, June 10, 2021