OpenText: Goodwill Search

August 26, 2022

I spotted a short item in the weird orange newspaper called “Micro Focus Shares Jump After Takeover Bid from Canadian Rival.” (This short news item resides behind a paywall. Can’t locate it? Yeah, that’s a problem for some folks.)

What? Micro Focus and Open Text are rivals? Interesting.

The key sentence is, in my opinion, “OpenText agreed to buy its UK rival in an all-cash deal that values the software developer at £5.1bn.”

Does Open Text have other search and retrieval properties? Yep.

Will Open Text become the big dog in enterprise search? Maybe. The persistent issue is the presence of Elasticsearch, which many developers of search based applications find to be better, faster, and cheaper than many commercial offerings. (“Is BRS search user friendly and cheaper?”, ask I. The answer from my viewshed is ho ho ho.)

I want to pay attention going forward to this acquisition. I am curious about the answers to these questions:

  • How will the math work out? It was a cash deal and there is the cost of sales and support to evaluate.
  • Will the Micro Focus customers become really happy campers? It is possible there are some issues with the Micro Focus software.
  • How will Open Text support what appear to be competing options? For example, many of Open Text’s software systems strike me as duplicative. Perhaps centralizing technical development and providing an upgraded customer service solution using the company’s own software will reduce costs.

Notice I did not once mention Autonomy, Recommind, Fulcrum, or Tuxedo. (Reuters mentioned that Micro Focus was haunted by Autonomy’s ghost. Not me. No, no, no.)

Stephen E Arnold, August 26, 2022

Google: Redefines Quality. And What about Ads?

August 23, 2022

When I was working on The Google Legacy (Infonortics, 2004), I gathered information about Google’s method for determining quality. Prior to 2006, Google defined “quality” in a way different from the approach taken at professional indexing and commercial database companies. Professional organizations relied on subject matter experts’ views. Some firms — for example, the Courier Journal & Louisville Times, Predicasts, Engineering Index, the American Petroleum Institute, among others — were old fashioned. Commercial database firms with positive cash flows would hire specialists to provide ideas and suggestions for improving content selection and indexing. At the Courier Journal, we relied on Betty Eddison and a number of other professionals. We also hired honest-to-goodness people with advanced degrees to work on the content we produced.

Google pops up with jibber jabber about voting, a concept floated by an IBM Almaden researcher, and the notion of links and their value. As Google evolved, I collected a list of what amounted to 140 or so factors which were used by Google to determine the quality of content. At one time, Dr. Liz Liddy used my compilation as illustrative material for her classes in information science.

By 2006, Google shifted quality from its mysterious and somewhat orthogonal factors to what I call “ad quality.” The concept gained steam when Google acquired Applied Semantics and worked hard to relax a user’s query, match the query to a stack of ads to which the query would relate, and display these as “personalized” and targeted messages. Quality, therefore, became an automated process for working through ad revenue.

Since 2006, Google has been focused on ad revenue. My personal view is that Google has one stream of revenue: Ad revenue. Its other ventures have not demonstrated to me that the company can match its first “me too” innovation. If you don’t remember what that was, think about the Yahoo settlement related to the “inspiration” Google obtained from the Overture “pay to play” system. The idea was that those with Web pages would pay to get their message in front of a service’s users.

Where is Google quality now? Is it anchored in editorial policies, old fashioned ideas like precision and recall? Is the Google using controlled vocabulary lists designed to allow precise queries? Is Google adding classification codes to disambiguate terms like terminal as in “computer terminal” or “airport terminal”?
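
For anyone who has not bumped into those old fashioned ideas, here is a minimal sketch of how precision and recall are computed for a single query. The document identifiers and the judgment about which ones are relevant are hypothetical, invented only for illustration; nothing here reflects how Google measures anything.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = share of retrieved documents that are relevant
    recall    = share of relevant documents that were retrieved
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: the system returns four documents, three of them
# relevant, while the collection holds six relevant documents in total.
retrieved = ["d1", "d2", "d3", "d9"]
relevant = ["d1", "d2", "d3", "d4", "d5", "d6"]

p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.75 recall=0.50
```

Both numbers depend on someone deciding what counts as relevant, which is the job subject matter experts and editorial policies did at the commercial database firms.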

“Google’s Planned Search Changes Could Upend the Internet” reveals:

Google is trying to improve the quality of search results and reduce the number of misleading sites, misinformation, and clickbait users are subjected to.

I want to point out that the lack of precision and recall in Google’s approach stems from the firm’s notion that new Web sites are more important than older Web sites, traffic is more important than factual accuracy, and ad revenue goals are the strong force in the Google datasphere.

Thus, after a certain outfit headed by a search engine optimization crazed individual advanced the SEO “revolution”, the Google is, according to the article, doing this:

As part of the change, the company will roll out its “helpful content update” to identify content that is primarily written to rank well in search engines and lower its rank. Sullivan says the update seems to especially benefit searches related to tech, online education, shopping, arts, and entertainment. The company is also working to improve access to high-quality reviews, ones that provide helpful, in-depth information.

Does this suggest that Google will focus on high-value content, explicit editorial policies, and professional indexing by subject matter experts?


It means quicker depletion of the ad inventory and an effort to cope with the fact that those in middle school and high school use TikTok for information.

Google is officially a dinobaby, just one not very good at anything other than selling ads and steering its coal fired steam boat away from the rapids in today’s data flows. For serious information research Google is too consumer oriented. Search based applications are what some researchers prefer. The content in these systems comes from specialized crawls and collections.

The quality list? Old fashioned and antiquated. How much of Google fits in that category? SAIL on, steam boat. Chug chug chug. PR PR PR. Toot toot.

But what about traffic to sites affected by Google’s content rigor?

Just buy ads, of course.

Stephen E Arnold, August 23, 2022

Can Ducks Crawfish? DuckDuckGo Gives Reverse a Go

August 19, 2022

I read “DuckDuckGo removes Carve Out for Microsoft Tracking Scripts after Securing Policy Change.” I learned:

A few months on from a tracking controversy hitting privacy-centric search veteran, DuckDuckGo, the company has announced it’s been able to amend terms with Microsoft, its search syndication partner, that had previously meant its mobile browsers and browser extensions were prevented from blocking advertising requests made by Microsoft scripts on third party sites.

The write up contains Silicon Valley-type talk about how its bold action and deep thinking sparked the backwards duck walk.

I am not sure if ducks can walk backward. In fact, after a security company assured some folks that privacy was number one and then was outed as a warm snuggler of tracking, will I trust the Duck metasearch thing?

The answer is the same for any online service with log files: Nope.

Oh, for the record, some ducks can waddle backwards for a couple of steps and then they try to walk, hop, or swim forward. The backwards thing is an anomaly. Perhaps you have seen a duck do a bit of nifty backwards walking? I have but it was laughable. Some of my test queries on the Duck have been almost as amusing.

Stephen E Arnold, August 19, 2022

YouTube: Some Proof about Unfindable Content

August 17, 2022

I read “5 Sites to Discover the Best YouTube Channels and Creators Recommended for You.” The write up presents five services which make YouTube content “findable.” What I learned from the article is that YouTube videos are, for the most part, unfindable. A YouTuber can stumble upon a particular video and rely on Google’s unusual recommendation system. In my experience, that system is hobbled by its assorted filters and ad-magnetic methods.

If I want to locate a video by eSysman (a fellow who reports about big money yachts loved by some money launderers and oligarchs), Google refers me to NautiStyles, YachtsForSale (quite a sales person is visible on that channel), or the flavor of the day like Bering Yachts. eSysman is the inspiration for one former CIA professional and her edging into the value of open source intelligence. Does Google’s algorithm “sense” this? Nah, not a clue.

What if I want some downhome cookin’ with Cowboy Kent, the chuck wagon totin’, trail hand feedin’ Oklahoma chef? Sorry, promoted Italian chefs are not what I was looking for. Cowboy cookin’ is not Italian restaurateurs showing that their skills are sharper than fry cooks in French restaurants. But what about YouTube search? Yes, isn’t it fantastic? Enough said.

What about the services identified in the article? Each offers different ways to find a video or channel on a specific or semi-specific topic. You can navigate to the source document and work your way through the list of curated “finder” sites.

The write up points out:

YouTube has over 50 million channels, but as you might have guessed, most of them aren’t worth subscribing to.

That’s the type of “oh, well, don’t worry” statement that drives me bonkers. Just let someone tell you what’s good. Go with it. Hey, no problemo. Who wants to consider the implications of hours of video uploaded every minute or the fact that there are 50 million channels from the Googlers’ service?

Several observations:

  1. No one knows what is on YouTube. I have some doubts that filters designed to eliminate certain types of content work particularly well. The idea that the Google screens each and every uploaded video with tools constantly updated to keep track of possibly improper videos is interesting to contemplate. Since no one knows what videos contain, how can one know what’s filtered, allowed in mistakenly, blocked inadvertently, or processed using methods not revealed to the public? (Lists of user “handles” can be quite useful for some purposes.)
  2. Are the channels no one can find actually worthless? I am not too sure. There are channels which present information about how to game the Google algorithm posted by alleged Google “partners.” I engaged in a dialogue with one such “professional” and found the exchange quite disturbing. I located the huckster by accident, and I can guarantee that keeping track of this individual is not an easy task. Is that a task a Googler will undertake? Yeah, sure.
  3. YouTube search is one of the many “flavors” of information location the company offers. In my experience, none of the Google search services works very well or delivers on point information without frustration. Does this comment apply to Google Patent search? Yep. What about Google News search? Yep yep. What about regular Google search for a company using a common word for its name? Yep yep yep. (Google doesn’t have a clue about a company field code, but it sure pushes ads unrelated to anything I search. I love mindless ads for the non-US content surveillance products that help me express myself clearly. Hey, no I won’t buy.)

Net net: YouTube’s utility is designed for Google ads. The murky methods used to filter content and the poor search and recommender systems illustrate why professional libraries and specific indexing guidelines were developed. Google, of course, thinks that type of dinobaby thinking is not hip.

Yes, it is. Unless Google tames the YouTube, the edifice could fall down. TikTok (which has zero effective search) may just knock a wall or trellis in the YouTube garden over. Google wants to be an avant garde non text giant. Even giants have vulnerable points. The article makes clear that third parties cannot do much to make information findable in YouTube. But in a TikTok world, who cares? Advertisers? Google stakeholders? Those who believe Google’s smart software is alive? I go for the software is alive crowd.

Stephen E Arnold, August 17, 2022

Google Innovates Again: Quick or Is That Semi-KWIC?

August 5, 2022

Innovation at the Googleplex never stops. Never. I read the online story “Google Updates Search Result Snippets for Queries with Quotes.” The write up reports that after more than two decades of defining Googling as searching:

Now Google will now show the quoted text in the snippet where that exact phrase appears on the page.

The idea upon which the quantumly supreme Googlers hit is that some context, not much, but some is helpful. No one has ever had this scintillating insight before. Amazing. Think about it. A person’s search for a quote returns some semi-context. I learned:

Google said they made this change based on searcher feedback, Google wrote “We’ve heard feedback that people doing quoted searches value seeing where the quoted material occurs on a page, rather than an overall description of the page. Our improvement is designed to help address this.”

A few observations? Sure, why not?

  1. Google ignores bound phrases and user defined phrases in quotes. Don’t you love the strike out for the key words in a query that do NOT appear in the results list? I do. Will this helpful feature be decremented or ignored?
  2. Key words in context has been a function for a long time. I am not motivated to dig through my 50 year archive of “search” ideas to locate the very first KWIC option. I think some of the long-forgotten online search systems offered this feature. Maybe Dialog circa 1980? Somewhere around there. I recall Carlos Cuadra talking about the function at an Information Industry Association meeting 45 years ago. Yo, SDC experts, any thoughts?
  3. Is this the magic learnings of a former Verity wizard transporting inspirations to the GOOG?
  4. Will the Google allow the user to specify the size of the KWIC window? Sure, when Google discovers the function. What’s next? Boolean logic?
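
For the curious, key word in context with a user-specified window is not rocket science. Below is a minimal sketch; the function name `kwic`, the character-based window, and the sample sentence are my own assumptions for illustration and say nothing about how Google builds its snippets.

```python
import re


def kwic(text, phrase, window=30):
    """Return each occurrence of `phrase` with up to `window`
    characters of context on either side (case-insensitive match)."""
    snippets = []
    for match in re.finditer(re.escape(phrase), text, flags=re.IGNORECASE):
        start = max(0, match.start() - window)
        end = min(len(text), match.end() + window)
        snippets.append(text[start:end])
    return snippets


sample = "The quick brown fox jumps over the lazy dog."
print(kwic(sample, "brown fox", window=6))  # ['quick brown fox jumps']
```

The `window` argument is the user-adjustable KWIC window mentioned in item 4: a larger value gives more context, a smaller one less.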

Wow. KWIC.

Stephen E Arnold, August 5, 2022

Google YouTube: Trying to Put Sand in Amazon and TikTok Product Search? Yep. Yep. Yep.

July 29, 2022

Most people don’t think too much about the impact of Amazon’s ecommerce search. It mostly works and the savvy shopper knows how to spot a third party reseller scam brand. (You do, don’t you?) Here’s a bit of anecdotal context. Amazon product search has chewed into Google search. In the post-Froogle years, Amazon sold online books. Then Amazon started adding products. With the products came reviews. Some reviews were Fiverr-type service generated but a few were not; the exact percentage, like the number of bogus Twitter accounts, is not known.

People around the world use Amazon ecommerce search to find products, get basic information, and pick up some useful information, and some misinformation, about a particular product.

The impact on the Google has been significant. The number kicked around among my slightly dull research team is a decrease of 30 percent in product search in 2021. How does one know that Amazon has done more to cause pain at the Google than many know? Easy. Google took a former Verity wizard (you remember Verity, right?) and used high school reunion type pressure to get that person to indicate that Google product search was going to get a couple of steroid injections, a tummy tuck, and a butt lift. These are digital enhancements, of course. Google is not a humanoid, despite Google management’s insistence on its sentience.

“YouTube and Shopify Just Started Livestream Selling and You Should Too” explains:

YouTube just announced a partnership with Shopify.

Yep, the company that media luminary and business wizard Scott Gallagher touted for several months on a popular podcast featuring insights and school yard humor. (Was Google won over by Guru Gallagher’s blend of insight and George Carlin thinking?)

The article points out:

Social selling is the shopping experience of the future.

The write up adds a bit of color to what seems like a “next big thing.” Spoiler: It’s not.

My reaction to the write up? The most important point should be that Google is racing (possibly out of control) to find a way to stop the loss of product search clicks. Hence, TikTok me too videos with product endorsements. Hence, a deal with a modern version of Yahoo stores. Hence, a tie up to use Shopify as a war horse.

My view: Too late. Amazon, TikTok, and a handful of other product centric ecommerce services are sitting behind their revenue ramparts. Google doesn’t have the weaponry it did before the erosion became noticeable in 2006. Froogle? Froogle? Long gone. But the spirit of Verity is here to claw back the product search traffic. Exciting.

Stephen E Arnold, July 29, 2022

Jargon Changes More Rapidly Than Search And Retrieval

July 22, 2022

Oh boy! There is a new term in the search and retrieval lexicon: neural search. While the term sounds like a search engine for telepaths or something a cyborg and/or android would use, Martech Series explained that it is something completely different: “Sinequa Adds Industry-Leading Neural Search Capabilities To Its Search Cloud Platform.”

Sinequa is an enterprise search leader and it recently announced the addition of advanced neural search capabilities to its Search Cloud Platform. The upgrade promises to provide unprecedented relevance, accuracy, etc. Sinequa is the first company to commercially offer neural search with four deep learning language models. The models are pre-trained with a combination of Sinequa’s trademark NLP and semantic search.

Search engines used neural search models for years, but they were not cost-effective for enterprise systems:

“Neural search models have been used in internet searches by Google and Bing since 2019, but computing requirements rendered them too costly and slow for most enterprises, especially at production scale. Sinequa optimized the models and collaborated with the Microsoft Azure and NVIDIA AI/ML teams to deliver a high performance, cost-efficient infrastructure to support intensive Neural Search workloads without a huge carbon footprint. Neural Search is optimized for Microsoft Azure and the latest NVIDIA A10 or A100 Tensor Core GPUs to efficiently process large amounts of unstructured data as well as user queries.”

Wonderful for Sinequa! Search and retrieval, especially in foreign languages, are some of the biggest time wasters in productivity. Hopefully, Sinequa actually delivers an industry changing product; otherwise, they simply added more jargon to the tech glossary.

Whitney Grace, July 22, 2022

Commercializing Cyber Crime with Search and Retrieval

July 14, 2022

I read “Ransomware Gangs Offer Ability to Search Stolen Data.” The write up reports:

Bleeping Computer reported today that the ALPHV/BlackCat ransomware gang was the first to offer the feature, announcing that they have created a searchable database with leaks from nonpaying victims. The hackers said that their stolen data had been fully indexed and that the search feature included support for finding information by filename or by content available in documents and images. The BlackCat ransomware gang claims it is offering the search service to make it easier for cybercriminals to find passwords or other confidential information.

Other alleged bad actors are offering a search function as well. These are Lockbit and Karakurt.

Several observations:

  1. Commercialization of cyber crime has been a characteristic of some of the more forward-leaning bad actors.
  2. The availability of open source search makes it easy to add functionality.
  3. More productization is inevitable; for example, subscriptions to Crime as a Service.

Net net: The focus of crime analysts and investigators may have to embrace enablers like Internet Service Providers, cloud services, and open source code repositories.

Stephen E Arnold, July 14, 2022

Another Plea for Web Search That Sort of Works: Andrew Carnegie, Where Are You?

July 11, 2022

I am not going to do any history. Oh, well. Not really. Does anyone on TikTok know about Andrew Carnegie? Okay, let’s try another angle. How about a semi-rapacious dude with roots in Scotland who wanted to do good. Please, ignore the Carnegie era Monongahela River. The cheerful Mr. Carnegie came up with the idea of a free public library. Looking up information was a useful thing for poor folks and monopolistic steel barons alike. One person sort of fixed the “problem” of information access.

Flash forward to Backrub. Two bright young sprouts realized that a person had a tough time finding relevant information on Lycos and the other search engines available at the “dawn” of the Internet. The fix? Take a little bit of Kleinberg, add a pinch of technology, use available computing resources whether others at Stanford University knew or cared, and mix in continuous feedback to a bundle of mostly automatic rules. More links in, good. Not many links in, meh. Then advertising. Yeah, that worked great for some. For others, ho ho ho.
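
The “more links in, good” rule can be sketched in a few lines. To be clear, this is plain inbound link counting, not PageRank or whatever Backrub actually ran; the page names and the decision to ignore self-links are assumptions made for illustration.

```python
from collections import Counter


def rank_by_inlinks(links):
    """Rank pages by the number of inbound links.

    `links` is an iterable of (source_page, target_page) pairs.
    Self-links are ignored. Returns (page, count) pairs, highest first.
    """
    counts = Counter(target for source, target in links if source != target)
    return counts.most_common()


# Hypothetical link graph: two pages point at "c", one page points at "d",
# and "d" also links to itself (ignored).
links = [("a", "c"), ("b", "c"), ("c", "d"), ("d", "d")]
print(rank_by_inlinks(links))  # [('c', 2), ('d', 1)]
```

A rule this simple is also simple to game, which is one reason continuous feedback and a growing bundle of rules followed.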

The result is the weaponized findability environment of good old 2022.

What’s the fix? “Why the World Needs a Non-Profit Search Engine” explains that donors contribute money, and an objective Web search system will return relevant results. The write up states:

Sometimes I forget why I’ve taken on this crazy, huge task. Why am I building a search engine? Will it really be better than Google one day? Will people support it? Will people even use it? And then I read something like The Bullshit Web and I remember, that, yes, there is a point. Even if I make the web better for one person, it’s worth it. Because the way things are is just wrong. Search engines are in a unique position to fix the situation. Not only do we create a view on the world’s knowledge, we influence it too. If we promote bullshit-free sites, then people will create more bullshit-free sites. More importantly, search engines are a filter on the world’s knowledge. Do you really want your filter to be “whatever makes $SEARCH_ENGINE more money”, particularly when that means, “show ads instead of search results, and prioritize search results that also make us more money”? We can and should do better.

I want to point out that what may be required is an Andrew Carnegie type who already has money and a guilty conscience. It is a modern perception that if one can get lots and lots of people to contribute money, one can fund anything.

Nice idea. My response? “Where’s the Andrew Carnegie?”


Traffic means monetization. Do-gooding is walking on the information highway. One has to speed, and speed is infinitely expensive. Ergo: Monetization lies over the horizon.

Stephen E Arnold, July 11, 2022

An Unfindable Search Utility: Wild Spelling and Naming Idea

July 7, 2022

I like to check out new Web search systems. Most are little more than recycled versions of one of the most Abe Lincoln metasearch systems. A metasearch system uses hits from other search systems, possibly adds a bit of Vivisimo-type special sauce, and outputs results and rather crazy marketing materials.
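
The metasearch idea fits in a short sketch: take ranked hit lists from several engines, interleave them, and drop duplicates. The round-robin merge, the function name `merge_results`, and the example URLs are hypothetical choices of mine; real systems add their own scoring and clustering sauce on top.

```python
def merge_results(*ranked_lists):
    """Interleave ranked URL lists round-robin, dropping duplicates.

    Each argument is one engine's hit list, best hit first.
    """
    merged, seen = [], set()
    longest = max((len(hits) for hits in ranked_lists), default=0)
    for rank in range(longest):
        for hits in ranked_lists:
            if rank < len(hits) and hits[rank] not in seen:
                seen.add(hits[rank])
                merged.append(hits[rank])
    return merged


# Hypothetical hit lists from two upstream engines.
engine_a = ["x.example", "y.example", "z.example"]
engine_b = ["y.example", "w.example"]
print(merge_results(engine_a, engine_b))
# ['x.example', 'y.example', 'w.example', 'z.example']
```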

The write up “This Badass Tool Makes Advanced YouTube Searches a Breeze” states:

This tool also allows you to perform advanced search on Google, DuckDuckGo, Twitter, and Reddit.

But the article is over the moon about the utility of the system when searching for content in Newton Minow’s nightmare, YouTube. I learned:

I [the author of the article] think this cool tool is better suited to YouTube.

Let’s try to find the system using its name, ä1. Try plugging the ä1 into Google, and what do you get? I received hits for services wildly unrelated to search and retrieval:


What about Bing, the Microsofties’ wonderful, but small, search system:


Yep, childhood disease.

What about Yandex? No joy.


Let’s search for the ä1 site on the ä1 site. What do we get? Google results and no ä1 search overlay or service.

Net net: Innovators, use names which can be searched. (Not everyone knows how to put the a with acne into a search box. Besides, most search systems discard such silliness as dots, checks, and circumflexes. Intellectual niceties are not part of the plan.) Pain in the a$$, not bad a$$ in my opinion.
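
Here is a minimal sketch of what discarding the dots and circumflexes can look like, using Unicode decomposition from Python’s standard library. Whether any particular search engine normalizes queries this way is my assumption, not something the engines document for outsiders.

```python
import unicodedata


def strip_diacritics(text):
    """Decompose characters (NFD) and drop the combining marks,
    so that an a with a diaeresis becomes a plain 'a'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_diacritics("ä1"))          # a1
print(strip_diacritics("naïve café"))  # naive cafe
```

After a step like this, the query for the utility’s name is just “a1”, which collides with every other a1 on the Web.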

If you want to try out the all-in-one “system” yourself, here’s the url: https://ä

Tip: How about a findable name?

Stephen E Arnold, July 7, 2022
