Microsoft: Fun Search
March 17, 2022
We have censorship. We have discriminatory spidering. We have sites which are no longer indexed. And now if ZDNet’s “real” news team is on the money, we have search fun or fun search. You pick.
“Microsoft Is About to Add More Fun to Your Windows Search” reports:
… the Windows 10’s taskbar search box and search home pane will now feature content “including fun illustrations, which help you discover more, be connected, and stay productive. Search highlights will present notable and interesting moments of what’s special about each day – like holidays, anniversaries, and other educational moments in time both globally and in your region.”
Great. How about that Windows search? Do you have a Drobo or similar storage device? I bet that Windows search will make that “fun.” What about a desire to locate an actual file on the C: or boot drive? I bet Microsoft will make that fun too. And I could go on. For example, don’t you love Microsoft search syntax? And let’s not forget “unfindable” files. Yeah, that’s a winner too!
How about search that just works, includes Boolean, and provides one-click access to sample syntax? That would be fun too.
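Boolean search of the sort wished for above boils down to set operations over an inverted index. Here is a minimal sketch in Python, assuming a tiny in-memory corpus with made-up file names; it illustrates the technique and is not a description of how Windows Search works.

```python
from collections import defaultdict

# Hypothetical toy corpus: file name -> text. Real desktop search indexes far more.
DOCS = {
    "budget.xlsx": "quarterly budget draft for energy project",
    "notes.txt": "meeting notes energy audit",
    "drobo_backup.log": "backup log for drobo storage device",
}


def build_index(docs):
    """Inverted index: each lowercase term -> set of file names containing it."""
    index = defaultdict(set)
    for name, text in docs.items():
        for term in text.lower().split():
            index[term].add(name)
    return index


def boolean_and(index, *terms):
    """Files containing every term."""
    sets = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*sets) if sets else set()


def boolean_or(index, *terms):
    """Files containing at least one term."""
    return set().union(*(index.get(t.lower(), set()) for t in terms))


def boolean_not(index, universe, term):
    """Files in the universe that do not contain the term."""
    return set(universe) - index.get(term.lower(), set())


if __name__ == "__main__":
    index = build_index(DOCS)
    # energy AND NOT budget
    print(boolean_and(index, "energy") & boolean_not(index, DOCS, "budget"))  # {'notes.txt'}
```

A query such as `drobo OR audit` maps to `boolean_or(index, "drobo", "audit")`; parsing a typed query string into these calls is left out of the sketch.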
Stephen E Arnold, March 17, 2022
Dashworks Promises To Be The Best Enterprise Search System
March 16, 2022
Search remains a fundamental component not only of working environments but also of daily life. Quickly locating information is essential, and a search engine that returns low-quality results clogs up routines. TechCrunch dives into the background of a robust enterprise search system: “Dashworks Is A Search Engine For Your Company’s Sprawling Internal Knowledge.”
Dashworks promises to be a comprehensive search system that scours everything from Slack threads to Dropbox files. It wants to be an organization’s one-stop search solution for internal knowledge through one centralized hub. While its homepage is helpful with FAQs and bookmarks, its cross-tool search is the real selling feature:
“More impressive, though, is its cross-tool search. With backgrounds in natural language processing at companies like Facebook and Cresta, co-founders Prasad Kawthekar and Praty Sharma are building a tool that allows you to ask Dashworks questions and have them answered from the knowledge it’s gathered across all of those aforementioned Slack threads, or Jira tickets, or Dropbox files. It’ll give you a search results page of relevant files across the services you’ve hooked in — but if it thinks it knows the answer to your question, it’ll just bubble that answer right to the top of the page, Google Snippets style.”
Dashworks is compatible with over thirty popular services, and more are being added all the time. Dashworks does require access to all the services, devices, and applications within an organization, which might be alarming but is necessary for cross-tool search.
Dashworks is an excellent idea, but if an employee uses their own device, will it engage with platforms that should remain personal? But a promise? Hmmm.
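The cross-tool search in the quoted passage can be pictured as fanning a query out to each connected service, merging the hits into one results page, and promoting a direct answer when the system is confident enough. A rough sketch follows; the connector functions, scores, `Answer` type, and 0.8 confidence threshold are all hypothetical, and nothing here reflects Dashworks' actual code.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Hit:
    source: str   # connected service that produced the hit (hypothetical)
    title: str
    score: float  # assumed to be comparable across services


@dataclass
class Answer:
    text: str
    confidence: float  # 0.0 to 1.0


def federated_search(query: str,
                     connectors: List[Callable[[str], List[Hit]]],
                     answer: Optional[Answer] = None,
                     threshold: float = 0.8) -> List[str]:
    """Fan the query out to every connector, merge hits by score,
    and put a sufficiently confident answer at the top of the page."""
    hits = [hit for connector in connectors for hit in connector(query)]
    hits.sort(key=lambda h: h.score, reverse=True)
    page = [f"[{h.source}] {h.title}" for h in hits]
    if answer is not None and answer.confidence >= threshold:
        page.insert(0, f"ANSWER: {answer.text}")
    return page


# Hypothetical stand-ins for Slack, Jira, and Dropbox connectors.
def slack(query):
    return [Hit("slack", "Thread: release date?", 0.72)]


def jira(query):
    return [Hit("jira", "REL-42 ship checklist", 0.91)]


def dropbox(query):
    return [Hit("dropbox", "roadmap.pdf", 0.55)]


if __name__ == "__main__":
    for line in federated_search("when do we ship?", [slack, jira, dropbox],
                                 Answer("Thursday", 0.86)):
        print(line)
```

The sketch assumes scores from different services can be compared directly, which is a big assumption; scores from separate systems usually need some normalizing before they can be merged.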
Whitney Grace, March 16, 2022
DuckDuckWent: Can a Search System Float in the Same Content Stream Again?
March 11, 2022
I read “DuckDuckGo Ends Neutrality, Will Down-Rank Sites Associated with Russian Disinformation.” Recognizing disinformation can be tricky. Using the word Russian may make the job easier.
I am not going to get into a philosophical discussion.
For me the important point of DuckDuckGo’s decision to have an editorial policy (often called censorship) is captured in this passage from the source document:
A change in direction.
I would like to see DuckDuckGo be upfront about:
- The source of its search index
- The number of content objects compared to the indexes of Swisscows, Google, and Brave Search
- How deduplication works
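DuckDuckGo does not say how it deduplicates. One common textbook approach, shown here only as an illustration, breaks each page into overlapping word sequences (shingles) and treats two pages as near-duplicates when the Jaccard similarity of their shingle sets reaches a threshold. The 3-word shingle size, the 0.8 threshold, and the sample pages are assumptions.

```python
def shingles(text: str, k: int = 3) -> set:
    """Return the set of k-word shingles (overlapping word sequences) in text."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def dedupe(pages: dict, threshold: float = 0.8) -> list:
    """Keep the first page of each near-duplicate group, in insertion order."""
    kept = []  # (url, shingle set) pairs already accepted
    for url, text in pages.items():
        sig = shingles(text)
        if all(jaccard(sig, other) < threshold for _, other in kept):
            kept.append((url, sig))
    return [url for url, _ in kept]


# Hypothetical pages: the second is the first with one word appended.
PAGES = {
    "https://a.example.com/story": "yandex says it might default on its bonds",
    "https://b.example.com/copy": "yandex says it might default on its bonds today",
    "https://c.example.com/other": "duckduckgo will down rank some sites",
}

if __name__ == "__main__":
    print(dedupe(PAGES))
```

Appending one word leaves all six of the first page's shingles shared out of seven in total (6/7, about 0.86), so the copy is dropped; the unrelated page shares none and is kept. The pairwise loop is fine for a handful of pages; at index scale, systems typically use hashing tricks such as MinHash instead.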
Responding to Russia is a waddle, but more steps are needed. Waddle along, DuckDuck, please.
Stephen E Arnold, March 11, 2022
Yandex: Is It Time to Say Hello, Goodbye?
March 9, 2022
For about 80 to 90 percent of the people in North America and Western Europe, “search” means Googzilla’s service. Is it useful? Legions will say, “Google’s search service is the bestest ever.” Others are more comfortable running queries on Exalead Search, Swisscows, and one of the new kids on the block like Kagi or Wecript, among others.
My personal plan of attack, as I shared with the founder of Kagi, is to run specific queries across a group of selected search engines. (Sorry, I don’t provide those in this unloved and mostly ignored free blog. However, if you attend my 2022 National Cyber Crime Conference lecture on finding information, you will get a list of about 500 useful search/content services.)
Why am I talking about “free” or ad-supported Web search? Three reasons:
- Today’s search “experts” don’t pay much attention to the lack of overlap in results. Hey, reading pages of results and cross-checking them is too annoying. Doing the TikTok thing is the way to go.
- Web search engines do not disclose what I call the “editorial policy.” How often does Googzilla update results eight links deep on the Department of Energy’s public-facing Web site? Or, where does DuckDuckGo get its results? Or, why doesn’t IxQuick/StartPage disclose which search systems generate its results? Or why are Gigablast results for images not really images? If one discloses an editorial policy, then the shallowness, freshness, and bias of the spidering mechanisms are disclosed. Who wants that? Certainly not the Web search outfits.
- Serious or professional Web search systems charge money and deliver high value results simply not obtainable via free Web search systems. Why don’t these outfits market to the users of free Web search systems? These outfits don’t want to end up in an RV at the Israel River Campground in the White Mountains. A low profile is a prudent profile.
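Cross-checking results does not have to mean eyeballing page after page. A small sketch, assuming the top results from each engine have already been collected as URL lists (the engine names and URLs below are placeholders), measures pairwise overlap and lists the URLs only one engine found.

```python
from itertools import combinations


def overlap(a, b):
    """Jaccard overlap of two result lists (rank order ignored)."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


def overlap_report(results):
    """Overlap for every pair of engines, keyed by (engine, engine)."""
    return {(x, y): overlap(results[x], results[y])
            for x, y in combinations(sorted(results), 2)}


def unique_to_each(results):
    """For each engine, the URLs no other engine returned."""
    out = {}
    for name, urls in results.items():
        others = set().union(*(set(u) for n, u in results.items() if n != name))
        out[name] = [u for u in urls if u not in others]
    return out


# Placeholder top-4 results for three unnamed engines.
RESULTS = {
    "engine_a": ["example.com/1", "example.com/2", "example.com/3", "example.com/4"],
    "engine_b": ["example.com/2", "example.com/5", "example.com/6", "example.com/7"],
    "engine_c": ["example.com/1", "example.com/2", "example.com/8", "example.com/9"],
}

if __name__ == "__main__":
    for pair, score in overlap_report(RESULTS).items():
        print(pair, round(score, 2))
    print(unique_to_each(RESULTS))
```

With this placeholder data, engine_a and engine_c share two of six distinct URLs (overlap 0.33), while the other pairs share one of seven (about 0.14). The URLs unique to each engine are the ones a single-engine searcher never sees.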
I noted this article “Russian Tech Giant Yandex Says Might Default” on Friday, March 3, 2022. I have no idea if the information in the write up is accurate, but it is suggestive. I learned that the Russian Web search engine, which is “free”, may be a goner. I noted this passage:
…the company, often called the “Russian Google” for its size and breadth of services, said that if it is suspended for more than five trading days, owners of certain bonds will legally be able to redeem their debt with interest. “The Yandex group as a whole does not currently have sufficient resources to redeem the notes in full,” the company said.
The words “suspended” and “sufficient resources” are, to my way of thinking, a flashing yellow light. Could that light go red?
Yandex might be hauled off to the Web search system graveyard. How will this affect Googzilla? Not at all. However, start-up Web search outfits may be in a position to hit up funding sources for more cash in order to provide Yandex users with a viable option.
That sounds like a slide deck phrase, doesn’t it?
Stephen E Arnold, March 9, 2022
As Privacy Concerns Grow, So Do Search Alternatives
February 23, 2022
Google is sure to remain king of the online search hill for the foreseeable future, but Spark takes a look at a couple of burgeoning alternatives in the post, “Search Engines Try to Rival Google by Offering Fewer Ads, More Privacy.” Writer Jonathan Ore begins with Neeva, founded by ex-Googler Sridhar Ramaswamy.
“[Ramaswamy] bills Neeva as an ad-free, private search engine. Results won’t include advertisements, and the company says any information it does collect from users isn’t shared with third parties. That ad-free experience does come with a cost, however: a subscription fee of $5 US per month, after a three-month trial period. Ramaswamy argues that no search engine is truly free, as users end up paying with all the advertisements and affiliate links clogging up search results, making it harder to find the things they actually want.”
That is one way to look at it. We would add that Neeva does have a free version, but naturally hopes users will be enticed to upgrade. Ore notes that, though Neeva emphasizes privacy, it does collect certain user data—like one’s email address, IP address, location data, browser, and OS. The platform uses this information to improve function and performance, states its privacy policy, but promises not to share any of it with third parties.
Next, the write-up takes a look at You.com, a platform that seems tailored to younger audiences. We learn:
“Rather than a mostly-linear list of results sorted in order of relevance or accuracy, You.com displays search results in a grid-like format. It also lets users ‘upvote’ and ‘downvote’ individual results, directly affecting their rankings in future searches. That added flexibility comes at the cost of simplicity, though; The Verge’s Adi Robertson said its layout can appear ‘overwhelming and sort of cluttered’ to anyone used to Google’s linear approach. Co-founder Richard Socher said, however, that he found younger users used to other social media platforms like Instagram or TikTok, which display content in tiles both vertically and horizontally, were able to quickly acclimate themselves to You.com’s unique layout.”
Like Neeva, You.com also emphasizes privacy and refuses to sell user data to advertisers. Can such search platforms really take out Google? Don’t be silly—of course not. But the write-up cites DuckDuckGo as an example of success. That privacy-centric service, launched in 2008, now processes tens of millions of searches daily and employs over 140 workers. Is ad-addicted Google bothered? Probably not. It can well afford to lose such small slices of the search pie and remain decisively in the lead.
Cynthia Murrell, February 23, 2022
Algolia: A New Approach to Relevance?
February 21, 2022
Algolia is a company providing search and retrieval services to a number of companies. A call for résumés provides some interesting assertions about the company, its philosophy, and its goal.
The goal interests me. The posting on the Algolia Web site says:
Our mission is to return relevant search results at all times and give companies the ability to tailor those results to their own specific needs. We are tasked with reimagining a core piece of Algolia’s technology: how search results are ranked. To that end, we are still early in our building, and we are looking for someone who can help us perform experiments and manage the technical aspects of our pilot program, including building clients for users to test our work and tools to evaluate the impact of our changes.
I like the idea of tailoring “search,” which is certainly okay if the individual knows that for which he or she is looking. I like the idea of ranking because relevance is, to some people, helpful. I like the idea that the company is “early in building.” The right person with the right stuff will make an impact. I like the idea of measuring results, which works reasonably well when the people doing the searching know that which they need to find.
There are several challenges in delivering or finding better ways to rank search results.
First, today the idea of knowing the corpus and using old-fashioned techniques like precision and recall is not as sexy as capsule network or caps net methods.
Second, users who want to formulate complex search queries like those required to extract semi-useful information from Google or a Dataminr feed of social media are rare birds. I heard at one big search outfit that fewer than three percent of queries are complex search statements; for example, ones using site: or filetype:. Serving experts, analysts, and intelligence professionals is different from serving the ingredients for a Sicilian pizza.
Third, the now threadbare truism of lots of data, changing rapidly, and incorporating different content types and a veritable fun house of metadata requires some innovation. So far the best efforts of some bright folks have led to outright failure (Autonomy, Fast Search & Transfer, et al.) or to recycling endlessly, with minor variations, the functionality of everyone’s favorite fighter of Amazon, Elastic.
I noted some interesting supportive information in the write up; for instance, the candidate with the right stuff must have grit (the sort of effort required to get an advanced degree from MINES ParisTech or Université Paris-Saclay, or the toughness required to deal with a wealthy family or a generational link to the Capetians). Other ingredients in the “right stuff” trois étoiles cannelés of Bordeaux:
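Precision and recall are simple to compute once someone knows the corpus well enough to say which documents are relevant, which is exactly the unglamorous work at issue here. Precision is the share of retrieved documents that are relevant; recall is the share of all relevant documents that were retrieved. A minimal sketch with made-up document IDs:

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = relevant documents retrieved / all documents retrieved
    recall    = relevant documents retrieved / all relevant documents in the corpus
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical judgments: the system returned d1-d4; an analyst marked five docs relevant.
    print(precision_recall(["d1", "d2", "d3", "d4"], ["d2", "d4", "d7", "d9", "d11"]))  # (0.5, 0.4)
```

Here the system returns four documents, two of them relevant (precision 0.5), and finds two of the five relevant documents in the corpus (recall 0.4). The catch is the second argument: without someone who knows the corpus, there is no list of relevant documents to score against.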
- Trust
- Care
- Candor
- Humility
I am eager to explore the new approach to relevance. But I harbor an abiding affection for a clear explanation of the content indexed and good old Boolean logic. Snorkels, caps nets, and a 21st century approach to relevance? Meh.
Stephen E Arnold, February 21, 2022
A Google Dork for Everyone
February 21, 2022
In my lectures about open source intelligence for law enforcement and other government professionals, I mention Google Dorks. I won’t go into detail, but “dork” is a fancy way of saying that an information professional with a knowledge of specialized commands can get semi-on-point results from the online ad outfit. See for example this link. Do Googlers wear T-shirts emblazoned with the phrase “Don’t be evil”? I saw such a shirt with the message “Don’t be Google,” but I may have misread.
What’s interesting is that Google Dorking is finding its way into the mainstream of the people who perceive themselves as “experts in online research.” Yep, the expertise is often similar to mastering an automatic teller machine, but that’s possibly a characteristic of our Covid era.
“Google Search Is Dying” has undergone a number of updates. The write up states:
Google still gives decent results for many other categories, especially when it comes to factual information. You might think that Google results are pretty good for you, and you have no idea what I’m talking about. What you don’t realize is that you’ve been self-censoring yourself from searching most of the things you would have wanted to search. You already know subconsciously that Google isn’t going to return a good result.
The punch line is “Google is dying.” Yeah, no kidding. When the wizard from Verity and Yahoo got involved, it was not dying. It was gifted a MOAB (that is, the mother of all bombs, or a disconnect between a query and stuff like precision and recall).
So what’s the fix?
A Google Dork.
Enter a query and stick “reddit” in the query. The idea is that some entity (bot or humanoid) will have posted more useful, authentic, relevant information on that service. One can be sporty and try wiki at the end of a query as well.
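The “reddit” trick is nothing more than string handling on the query. A small sketch, assuming Google's public results-page URL pattern (`https://www.google.com/search?q=`), builds dorked queries, including the `site:` and `filetype:` operators; the `dork` and `search_url` helpers are hypothetical names, and the code only builds URLs without fetching anything.

```python
from urllib.parse import urlencode

# Public results-page URL (assumed pattern); nothing is fetched by this sketch.
SEARCH_URL = "https://www.google.com/search"


def dork(query, suffix=None, site=None, filetype=None):
    """Build a query string with an optional suffix word and site:/filetype: operators."""
    parts = [query]
    if suffix:
        parts.append(suffix)                  # e.g. "reddit" or "wiki"
    if site:
        parts.append(f"site:{site}")          # restrict results to one domain
    if filetype:
        parts.append(f"filetype:{filetype}")  # restrict results to one file type
    return " ".join(parts)


def search_url(q):
    """URL-encode the query (spaces become '+') onto the results-page URL."""
    return f"{SEARCH_URL}?{urlencode({'q': q})}"


if __name__ == "__main__":
    print(search_url(dork("best budget laptop", suffix="reddit")))
    print(dork("energy audit", site="energy.gov", filetype="pdf"))
```

Appending `wiki` works the same way: `dork("best budget laptop", suffix="wiki")`.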
Google Dorking for everyone, even the self-proclaimed experts in online information search and retrieval! The challenge is that Google advertising is pumping cash, and that plus the bonuses for senior management is what makes Google search the outstanding service it is.
Stephen E Arnold, February 21, 2022
Google Observations: A Hoot and Maybe a Bit Frightening If Statements Are Accurate
February 4, 2022
I read an item on Hacker News which “tells” about an issue/observation. The comment points out that certain queries generate links on a search result page which point to questionable content. Interesting, but news? Not in Harrods Creek, the technology centroid of the world.
What is quite fascinating in the short article? The comments. Yep, the comments. There are quite a few gems scattered in the trollite outcrops.
Here are a few examples with the “names” of the entities generating the output. Remember. I am just sharing. These are not my observations, comments, or ideas. In fact, we think the current version of the Google is a heck of a lot better than Version 2.0, which I wrote a monograph about many years ago.
- “nobody gets promoted in Google for doing their job well. Only for inventing a new job to do.” – reaperducer
- “It was my mistake. I trusted Google.” — Silisili
- “I work for Google Search. We are looking into this.” – SullilvanDanny
- “My wife recently received, in her inbox, a spoofed email from her own email address on Gmail.” – andrewmcwatters
- “There is no end to The Greed.” – JayTaylor
Stephen E Arnold, February 4, 2022
Mike Lynch: Going to America?
January 29, 2022
I noted the Beeb’s article “Mike Lynch: Priti Patel Approves Extradition of Autonomy Founder.” The write up states:
Home Secretary Priti Patel has approved the extradition of a British tech tycoon to the US to face criminal fraud charges. The decision comes after Mike Lynch, the founder of Autonomy, lost a multibillion-dollar fraud action in London on Friday.
Welp.
A Home Office spokesperson said: “Under the Extradition Act 2003, the secretary of state must sign an extradition order if there are no grounds to prohibit the order being made. Extradition requests are only sent to the home secretary once a judge decides it can proceed after considering various aspects of the case. On 28 January, following consideration by the courts, the extradition of Dr Michael Lynch to the US was ordered.”
The Beeb’s write up includes some biographical information:
Cambridge graduate Mr Lynch, 56, built Autonomy up to be one of the top 100 UK public companies. In 2006, he was awarded an OBE for services to enterprise. A fellow of the Royal Society, Mr Lynch, who lives in Suffolk, previously advised the government and sat on the boards of the British Library and the BBC.
The brief summary omits some interesting information; for example, the Bayesian influence and the architecture of a system which would influence decades of content processing systems. More information is available on my Xenky.com site at this link: https://bit.ly/3IQTwgz
Stephen E Arnold, January 29, 2022
Hewlett Packard Autonomy: A Decision of Sorts
January 28, 2022
I read “HPE Has Substantially Succeeded in Its £3.3bn Fraud Trial against Autonomy’s Mike Lynch – Judge.” The write up reports that buyer beware is not a legal argument. It appears that more litigation awaits Mike Lynch in the US. I noted one interesting statement in the very good summary of the UK legal activities:
Autonomy, which told the market it was a “pure play” software company, accounted for its substantial hardware sales by burying them inside its sales and marketing revenue instead of breaking them out separately.
I am delighted I am not an attorney. I am a retired knowledge worker who has some familiarity with the general technology used by Autonomy and I did some work for the company years ago.
My uninformed view is that Hewlett Packard was looking for a home run when Léo Apotheker (formerly of SAP, the owner of the TREX search technology) ignored realities about the search and content processing revenue ceilings. Hewlett Packard, it seems to me, pushed forward, ignored inputs, and paid what might be called the Ford Bronco surcharge.
What happens when a used vehicle sales professional explains the sidewalk guarantee to the buyer? Nothing. Buyers often do not do their homework, are too excited about the deal, or just don’t care about the future until it arrives. Oh, oh. Are there lemon laws for content processing platforms? I suppose the question will be answered US style in the coming months.
Stephen E Arnold, January 28, 2022