How about That Subscription Web Search Model?

January 24, 2022

Former Googlers Sridhar Ramaswamy and Vivek Raghunathan are refining their paid, privacy-centric search platform Neeva. We have followed this development from the 2020 beta through the 2021 official launch. Now we learn from The Next Web’s piece, “How a Couple of Ex-Googlers Are Trying to Fix What’s Wrong with Search Engines,” that Neeva has added a free tier. It appears not enough users are (yet) willing to pay the low, low price of $4.95 per month for search, and the team is looking to upsell about 5% of those who sign on for free. It might be a good bet: Ramaswamy reports that a third of folks who sampled the free trial have subscribed. Even he was surprised users cited the peaceful, ad-free screen as their favorite feature. Reporter Ivan Mehta writes:

“[Neeva] will offer ad-free search with customizations, and integration to accounts such as Gmail, Microsoft Office, and Dropbox. People who’re paying for Neeva’s services will get all of this, a leading third-party VPN and a password manager service, and advanced features, like a monthly Q&A. As far as search engine features go, Neeva offers customizations, such as being able to see particular sites in results more or less. You can also ‘skip’ an ecommerce site in results, or get the whole recipe for a dish without having to visit a site. What’s more, the new search engine lets you look through your email right from the search bar. And if you install Neeva’s extension, it also blocks ad trackers that are collecting your browsing data. Last October, Neeva also launched a 1-click Fasttap search geared towards mobile where users just need to type a phrase to get accurate search results. It’s like Google auto-complete on steroids.”

The write-up includes a few screenshots of Neeva features in action. Regarding the how-to behind it all, Mehta tells us:

“On the technological side, while Neeva is aggregating some search results from Bing, the company is building its own crawler and looking at billions of pages every day. But as Raghunathan pointed out in his FastCompany interview earlier this month, crawling the web to create a new index while maintaining privacy standards is hard.”

Perhaps if anyone is up to the task, it is these two Xooglers. As of yet, Neeva is only available in the US, but the company hopes to become global. The plan is to expand into India and Western Europe “soon.” One tactic it is using to compete against the likes of privacy-focused DuckDuckGo and Brave is its partnership with news rating agency NewsGuard, which is helping it assess the accuracy of information. We wonder whether such features plus the free-tier offering will help Neeva reach its stated goal: to become the primary search engine for millions of privacy-centered users in the next two years.

Are there monetization options? The Point team is available to offer some ideas. Just write benkent2020 at yahoo dot com. We’ve been there and know the subscription method was a loser decades ago.

Cynthia Murrell, January 24, 2022

New Search Platform Focuses on Protecting Intellectual Property

January 21, 2022

Here is a startup offering a new search engine, now in beta. Huski uses AI to help companies big and small reveal anyone infringing on their intellectual property, be it text or images. It also promises solutions for title optimization and even legal counsel. The platform was developed by a team of startup engineers and intellectual property litigation pros who say they want to support innovative businesses from the planning stage through protection and monitoring. The Technology page describes how the platform works:

“* Image Recognition: Our deep learning-based image recognition algorithm scans millions of product listings online to quickly and accurately find potentially infringing listings with images containing the protected product.

* Natural Language Processing: Our machine learning algorithm detects infringements based on listing information such as price, product description, and customer reviews, while simultaneously improving its accuracy based on patterns it finds among confirmed infringements.

* Largest Knowledge Graph in the Field: Our knowledge graph connects entities such as products, trademarks, and lawsuits in an expansive network. Our AI systems gather data across the web 24/7 so that you can easily base decisions on the most up-to-date information.

* AI-Powered Smart Insights: What does it mean to your brands and listings when a new trademark pops out? How about when a new infringement case pops out? We’ll help you discover the related insights that you may never know otherwise.

* Big Data: All of the above intelligence is being derived from the data universe of the eCommerce, intellectual property, and trademark litigation. Our data engine is the biggest ‘black hole’ in that universe.”

Founder Guan Wang and his team promise a lot here, but only time will tell if they can back it up. Launched in the challenging year of 2020, Huski.ai is based in Silicon Valley but it looks like it does much of its work online. The niche is not without competition, however. Perhaps a Huski will cause the competition to run away?

Cynthia Murrell, January 21, 2022

Search Quality: 2022 Style

January 11, 2022

I read the interesting “Is Google Search Deteriorating? Measuring Google’s Search Quality in 2022?” The approach is different from the one used at the commercial database outfits for which I worked decades ago. We knew what our editorial policy was; that is, we could tell a person exactly what was indexed, how it was indexed, how classification codes were assigned, and what the field codes were for each item in our database. (A field code, for those who have never encountered the term, means an index term which disambiguates a computer terminal from an airport terminal.) When we tested a search engine (for example, a touch of the DataStar systems), we could determine the precision and recall of the result set. This was math, not an opinion. Yep, we had automatic indexing routines, but we relied primarily on human editors and subject matter experts with a consultant or two tossed in for good measure. (A tip of the Silent 700 paper feed to you, Betty Eddison.)
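For readers who have not run the numbers, here is a minimal sketch of the precision and recall math, assuming a known set of relevant records for a query. The function name and the record IDs are hypothetical and are not taken from any of the systems mentioned above.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query.

    retrieved: record IDs the search system returned
    relevant:  record IDs an editor judged relevant (the known answer set)
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = retrieved_set & relevant_set  # relevant records that were returned

    # Precision: what share of the returned records is relevant.
    precision = len(hits) / len(retrieved_set) if retrieved_set else 0.0
    # Recall: what share of the relevant records was returned.
    recall = len(hits) / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical record IDs, for illustration only.
retrieved = ["doc1", "doc2", "doc3", "doc4"]
relevant = ["doc2", "doc4", "doc7"]

precision, recall = precision_recall(retrieved, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

The catch is the second argument: the calculation works only when someone knows what is in the index and which records answer the query.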

The cited article takes a different approach. It is mostly subjective. The result of the analysis is that Google is better than Bing. Here’s a key passage:

So Google does outperform Bing (the difference is statistically significant)…

Okay, statistics.

Several observations:

First, I am not sure either Bing’s search team or Google’s search team knows what is in the indexes at any point in time. I assume someone could look, but I know from firsthand experience that the young wizards are not interested in the scope of an index. The interest is reducing the load or computational cost of indexing new content objects and updating certain content objects, discarding content domains which don’t pay for their computational costs, and similar MBA inspired engineering efficiencies. Nobody gets a bonus for knowing what’s indexed, when, why, and whether that index set is comprehensive. How deep does Google go into unloved Web sites like the Railway Retirement Board?

Second, without time benchmarks and hard data about precision and recall, the subjective approach to evaluating search results misses the point of Bing and Google. These are systems which must generate revenue. Bing has been late to the party, but the Redmond security champs are giving ad sales the old college drop out try. (A tip of the hat to MSFT’s eternal freshman, Bill Gates, too.) The results which are relevant are the ones that by some algorithmic cartwheels burn through the ad inventory. Money is what matters; understanding user queries, supporting Boolean logic, and including date and time information about the content object and when it was last indexed are irrelevant. In one meeting, I can honestly say no one knew what I was talking about when I mentioned “time” index points.

Third, there are useful search engines which should be used as yardsticks against which to measure the Google and the smaller pretender, Bing. Why not include Swisscows.ch or Yandex.ru or Baidu.com or any of the other seven or eight Web centric and no charge systems? I suppose one could toss in the Google killer Neeva and a handful of metasearch systems. Yep, that’s work. Set up standard queries. Capture results. Analyze those results. Calculate result overlap. Get subject matter experts to evaluate the results. Do the queries at different points in time for a period of three months or more, etc., etc. This is probably not going to happen.
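The “calculate result overlap” step can be sketched in a few lines, assuming the results for one standard query have already been captured as lists of URLs. The engine labels, URLs, and function names below are placeholders, and Jaccard overlap is one reasonable measure among several, not a method the cited article uses.

```python
from itertools import combinations


def jaccard_overlap(results_a, results_b):
    """Share of distinct URLs that two result lists have in common."""
    a, b = set(results_a), set(results_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_overlap(results_by_engine, k=10):
    """Compare the top-k results of every pair of engines for one query."""
    return {
        (e1, e2): jaccard_overlap(results_by_engine[e1][:k],
                                  results_by_engine[e2][:k])
        for e1, e2 in combinations(sorted(results_by_engine), 2)
    }


# Placeholder captures for a single standard query (not real results).
captured = {
    "engine_a": ["https://example.com/1", "https://example.com/2", "https://example.com/3"],
    "engine_b": ["https://example.com/2", "https://example.com/3", "https://example.com/4"],
    "engine_c": ["https://example.com/5"],
}

overlaps = pairwise_overlap(captured)
for pair, score in overlaps.items():
    print(pair, round(score, 2))
```

Run the same capture weekly for three months and the scores show whether the engines drift together or apart. Subject matter experts still have to judge whether the shared results are any good.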

Fourth, what has been filtered? Those stop word lists are fascinating and they make it very difficult to find certain information. With traditional libraries struggling for survival, where is that verifiable research process going to lead? Yep, ad centric, free search systems. It might be better to just guess at some answers.

Net net: Web search is not very good. It never has been. For fee databases are usually an afterthought if thought of at all. It is remarkable how many people pass themselves off as open source intelligence experts, expert online researchers, or digital natives able to find “anything” using their mobile phone.

Folks, most people are living in a cloud of unknowing. Search results shape understanding. A failure of search just means that users have zero chance to figure out if a result from a free Web query is much more than Madison Avenue, propaganda, crooked card dealing, or some other content injection goal.

That’s what one gets when the lowest cost methods to generate the highest ad revenue are conflated with information retrieval. But, hey, you can order a pizza easily.

Stephen E Arnold, January 11, 2022

Cherche: A Neural Search Pipeline

January 10, 2022

For fans of open source search, Cherche is available. The GitHub write up states:

Cherche is meant to be used with small to medium sized corpora. Cherche’s main strength is its ability to build diverse and end-to-end pipelines.

The “neural search” module includes ElasticSearch. The programming team for Cherche consists of Raphaël Sourty and François-Paul Servant. Beyond Search has not fired up the system and run it against our test corpus. We did have in our files a paper called “Knowledge Base Embedding by Cooperative Knowledge Distillation.” That paper states:

Given a set of KBs, our proposed approach KDMKB, learns KB embeddings by mutually and jointly distilling knowledge within a dynamic teacher-student setting. Experimental results on two standard datasets show that knowledge distillation between KBs through entity and relation inference is actually observed. We also show that cooperative learning significantly outperforms the two proposed baselines, namely traditional and sequential distillation.

The idea is that instead of retrieving strings, broader tags (concepts and classifications) appear to provide an advantage; pushing “beyond” old school search.

Stephen E Arnold, January 10, 2022

The Collision of Search Thinkers and the Wide World of Finding

January 4, 2022

To get some insight into the vibrations set off when search thinkers run into market behaviors, you will want to scan the Twitter thread about the need to create an alternative to Google. The focus is medical information. The idea is to return results for a health query without “clickbait sites riddled with crappy ads.” The criticism of the Google was not ignored. No less a luminary than Danny Sullivan replied with Google’s “we are always looking to keep improving our results.”

Digital Don Quixotes saddled up and asserted in this Tweet stream that Google can be beaten. The fix is to create a niche search engine tailored to provide results where Google is just thrilled to present “spam.” Assorted Tweeters added comments.

What do these two Tweeter threads suggest to me?

First, there are niche search engines (what I call vertical search services) that deliver on point results. These are probably not ones most people think about because users of free or ad-supported systems do not know much about finding high value information. Also, I know from my decades in the commercial database business that most “online experts” don’t want to pay for access to commercial online services. Academics get “free” access to content pools like Lexis Nexis, and the “old” Dialog type files because institutions pay the license fees. To the academic user, high value information is “free.” It is not.

Second, a number of Web centric search engines provide reasonably useful results. Examples range from iSeek.com to the Metager system. The mechanism for locating specific information is to frame a query, manually or automatically pass the query to numerous search engines, de-duplicate the result sets, and examine the links. Industrious searchers may enlist tools like Maltego or other open source software to identify potentially helpful items to examine initially. Who wants to do this? I suggest that fewer than three percent of online users pursue this approach. People want to have the mobile phone light up when a pizza joint is nearby or the Tesla’s electric gauge is creeping into the “hello, I need a flat bed truck, please” zone.
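The de-duplication step in that workflow can be sketched as follows, assuming each engine’s results are already in hand as lists of URLs. Actually passing the query to the engines is left out. The engine labels, URLs, and normalization rules are illustrative assumptions, not a description of how MetaGer or any other metasearch system works.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url):
    """Reduce trivial URL variants to one form (assumed rules:
    lowercase scheme and host, drop the fragment, drop a trailing slash)."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def merge_results(result_sets):
    """Merge per-engine result lists, keeping the first sighting of each URL."""
    merged = []
    seen = set()
    for engine, urls in result_sets.items():
        for url in urls:
            key = normalize_url(url)
            if key not in seen:
                seen.add(key)
                merged.append((key, engine))
    return merged


# Placeholder result sets for one query (not real captures).
result_sets = {
    "engine_a": ["https://Example.com/page/", "https://example.com/other"],
    "engine_b": ["https://example.com/page#top", "https://example.org/"],
}

merged = merge_results(result_sets)
for url, engine in merged:
    print(engine, url)
```

The merged list is what the industrious searcher then has to read, which is the part most people skip.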

Third, Google has operated without meaningful regulation, oversight, or competition for decades. The vaunted ad-revenue engine was not a Google invention. Google took advantage of a particular point in time when searching the Web was gaining traction and useful competitors like AltaVista, Exalead, and Fast Search’s AllTheWeb service were distracted. Google sucked up some AltaVista folks; Exalead was decidedly French; and Fast Search chased the enterprise. Other actions transpired, but the result was that the Google used free to get traffic and traffic made the Yahoo, Overture, GoTo revenue model work like a champ. Remember this was decades ago, not yesterday.

Here’s what I think is going on:

  1. Pundits don’t know or care much about Okeano, Swisscows or other “free” online search systems. How about searching for those Instagram snaps with Picuki?
  2. Niche search engines are thriving; for example, some of the Israeli specialized software and services firms provide quite helpful access to Facebook content. Who knows? Not too many pundits on the Tweeter and certainly not Google’s PR experts.
  3. Google is not a search engine. Google is a global content system, a fact I explored in my Google: The Digital Gutenberg, originally a long white paper for a government customer who found my view of the world interesting. BearStearns published a report in 2007 which featured my diagram of the Google “octopus” which identified the digital fabric that the company was weaving. Now Google owns the sheep, the dyes, the weaving machines, and the concept of digital fabrics. The overall quality of the Google outputs is “good enough,” and, believe me, it is tough to knock off a global outfit which satisfies the big hump in the standard distribution with something “better.” Whatever “better” means.

Net net: Search is a very, very fuzzy word. At one end of the spectrum are those who are searching well because they can locate an Uber-type service. At the other end of the spectrum are those who deal in extremely rarified content disciplines and have quite good services available; for example, Daylight chemical informatics.

In the middle? A long-standing, persistent and fundamental disconnect between search and what is actually going on in the datasphere.

Pizza? Google’s got that nailed. Need information to fabricate calandria (nuclear terminology)? Google can’t help too much because who searches for calandria, buys ads related to calandria, or knows anything about calandria?

Stephen E Arnold, January 4, 2022

Why Search Is Hard and Quick and Dirty Good Enough Methods Are Train Wrecks

December 15, 2021

I recommend to anyone interested in search and smart software the article “The Business of Extracting Knowledge from Academic Publications.” I am not going to summarize it, nor am I going to discuss why modern search systems are racing toward a collision with useful information retrieval. There was one omission from the essay, and I want to highlight it. I am not critical of this write up. I want to make clear that there is another dimension to scientific, technical, and medical publishing that is often overlooked. I learned this when we created the Pharmaceutical News Index decades ago.

Here’s the omission:

Wizards in technical fields work overtime to obfuscate some of their systems, methods, insights, and findings. The reason is that wizards want to remain wizards and have an ace up their sleeve if one is required to win a poker game for tenure, an over achieving graduate assistant, or some legal eagle involved in a patent dispute. Other reasons for withholding, distorting, and shaping information are related to insecurity. Yep, wizards are wizards in order to have a way to build a defense against those who don’t know what they don’t know and think that what they know defines knowledge.

When it comes to search and retrieval, key words are okay but not perfect. Index terms (what GenXers call tags) are helpful. But the substance of STM content does not yield insights, inventions, or any of the other “knowledge gems” that those pitching smart software believe will spill forth in a results list or a visualization.

What does the information in the article imply for smart software? My answer is, “Misleading or incorrect answers to certain types of inquiries.”

Don’t believe me? That’s okay. Just wait. STM content is “easier” to index than general business writing which is much easier to tag than the excrescences on TikTok, Twitch, or (heaven help me), Twitter.

Stephen E Arnold, December 15, 2021

The Coveo IPO: Making Some Headway

December 9, 2021

A number of Canadian tech companies have recently gone public on the Toronto Stock Exchange only to be met with muted responses. One was enterprise search firm Coveo, which went public in November in order to position itself globally, attract talent, and fund future acquisitions. CEO Louis Têtu appears unconcerned about the apparent indifference to his and other companies’ fledgling stock, The Globe and Mail reports in its piece, “Coveo CEO Dismisses Soft Trading Start on TSX as Quebec Software Company Closes $215-Million IPO.” Writer Sean Silcoff tells us:

“Coveo received more than $1-billion in orders for its IPO… . The stock hit $18 on its first day of trading last Thursday, but has since retreated, briefly trading below the issue price Tuesday. That makes it the fourth new tech issue this autumn – following D2L Corp., Q4 Inc. and E Automotive Inc. – to trade below its issue price. Coveo stock closed Wednesday at $15.30, up 1.7 per cent. Mr. Têtu dismissed Coveo’s ho-hum start as a public company, noting the share price of New York Stock Exchange-listed rival Elastic NV had dropped by 15 per cent over the previous four sessions. ‘There is a set of market dynamics we don’t control; the tide raises and lowers all boats,’ he said. ‘I think the jury is going to be out until the first earnings call [as a public company] and the subsequent earnings call. I think anybody who understands the stock market and IPOs … wouldn’t draw conclusions’ from the stock’s early performance. Coveo became the 20th Canadian tech IPO on the TSX to raise $50-million or more since July, 2020. By contrast, there were 12 such IPOs in the 11 years ended December, 2019.”

I suppose that is a good point—progress is progress, even if it is not at light speed. The write-up [paywalled] includes a few more details about Coveo’s growth and profits. Since its founding in 2005, the company has acquired two AI-powered e-commerce firms: Tooso in 2019 and Qubit in 2021. It sounds like Coveo may have some more companies already in its sights.

The good news is that the stock on December 8, 2021, was trending up. Search and retrieval is a tough business. Just ask the former CEOs of Autonomy and Fast Search & Transfer or take a look at the dust up between Amazon and Elastic. Worth monitoring. Maybe take a stake?

Cynthia Murrell, December 10, 2021

What Company Is the Leader in Search Powered by Artificial Intelligence? One Answer May Surprise You. It Did Me.

November 30, 2021

Give up? The answer is Lucidworks, “the leader in AI-powered search.” You can get the full story from Unite.ai and the article “Will Hayes, CEO of Lucidworks – Interview Series.” What’s “AI”? I don’t know, and the answer is not provided in @IAmWillHayes’ comments. What’s “search”? I don’t know because no specific definition is provided. (Search is a blanket word, covering everything from the open source Lucene in policeware solutions to whiz-bang, patented real time methods for time series data from Trendalyze. And we must not forget the generous offerings of “search” for eDiscovery, product supplier data, chemical structures, streaming video files, code libraries, and mysterious content like the interesting information in encrypted Signal and Telegram interactions.) Search at Lucidworks is different, it seems.

I noted this statement:

Lucidworks takes mission-critical business problems and solves them with search.

I assume that Lucidworks is disconnected from Dassault Systèmes’ search based applications approach. There is a 2011 book titled “Search Based Applications: At the Confluence of Search and Database Technologies.” The author is Dr. Gregory Grefenstette with assistance from Laura Wilber. The Lucidworks assertion struck me as one more example of marketing hoo hah disconnected from what came before. At least, the Dassault technology was original, not a recycling of open source software.

Here’s another statement offered as an original insight:

Lucidworks offers products and applications for commerce, customer service, and the workplace that use AI and machine learning to solve search. Fusion, our flagship product, uses AI extensively through every stage of enriching data—during ingest and at query time, for understanding user intent, and personalizing results that match that intent.

I want to point out that the Paris-based firm Polyspot used almost the exact same language (both French and English) to describe the company’s approach to information access. Here’s what Bloomberg says about the now repositioned company:

PolySpot SAS develops and publishes enterprise software. The Company’s products offer search and information access solutions designed to improve business and ensure that companies can access the data they need, regardless of their structure, format or origin. PolySpot markets its products internationally.

Did Yogi Berra or Yogi Bear say, “It’s déjà vu all over again”? I go with the cartoon bear. The aphorism applies to Lucidworks in my opinion.

Lucidworks also does chatbots, fits into the connected experience cloud (CXC), and compounds “value.” Okay. The company, according to @IAmWillHayes, is “leader in next-generation search solutions and we have an exciting roadmap of cloud products coming in the near future.”

I wonder what outfits like Algolia, Coveo, Sphinx Search, and even the heroic X1 think about this assertion. What will Google’s revolving door search experts make of Lucidworks’ bold assertion? What about the crafty laborers in AWS search vineyards who watch the competitors gun for the Bezos bulldozer? What about the innovators working on the somewhat frightening IBM search solution? Maybe Microsoft will just pull a “Fast Search” and buy Lucidworks to beef up its incredible array of finding systems?

My hunch is that Lucidworks has to deal with its backers who want their money back plus some upside. Mix in the harsh market realities of many options, some free or low cost, and others bundled with purpose built solutions like Voyager Labs’ software and what do you get?

I am not sure about your answer. My answer is, “Recycling marketing lingo, ideas, and assertions which are decades old?” Will AI, machine learning, and CXC pull a rabbit from the search magician’s hat?

Maybe. But the investors who have injected more than $200 million into the company may want more than a magic show. And what is “search” and “AI” anyway? Solr with a new outfit from Amazon?

Stephen E Arnold, November 30, 2021

Ask Jeeves Has a Younger Cousin, Ask Jarvis

November 25, 2021

Ask Jeeves.com was a “smart” online search engine. The name lives on in Ask.com. Who remembers? No one. No matter. The younger cousin is now available. Ask Jarvis is “an AI code assistant developed by Assistiv.ai.” The idea is that a hard working developer handling a full time job via Zoom and working on numerous side gigs needs help. Just ask Jarvis when you need a programming tip or a chunk of a manpage. You can find the Web page at https://askjarvis.io. Is it the rule based wonder of the original smart Ask Jeeves.com? Nope, this is an artificial intelligence / machine learning 2021 search system with natural language “powered by OpenAI codex, a descendant of GPT-3.” Years ago this would have been labeled a vertical search engine. Today? I am not sure.

Stephen E Arnold, November 25, 2021

Battle of the Experts? Snowden Versus Sullivan, Wowza

November 19, 2021

This is a hoot: “Edward Snowden Dunks on Search Gurus in Hilarious Twitter Clapback.” Mr. Snowden is an individual who signed a secrecy agreement and elected to ignore it. Mr. Sullivan is a search engine optimization journalist, who is now laboring in the vineyards of Google.

The write up makes clear that Mr. Snowden finds the Google Web search experience problematic. (I wanted to write lousy, but I wish to maintain some level of polite discourse.)

Mr. Sullivan points out that Mr. Snowden was talking about “site search.” For those not privy to Google Dorks, a site search requires the name of a site like doe.gov preceded by the Google operator site: and followed by the search terms. At least, that’s the theory.
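A small sketch of what a site search looks like when built by hand: the site: operator goes in front of the domain, the search terms follow, and the string is URL-encoded into the q parameter of a Google search URL. The helper names and the example search term are made up for illustration. Whether Google returns useful results for the query is a separate matter.

```python
from urllib.parse import urlencode


def site_query(domain, terms):
    """Build the query text: the site: operator, the domain, then the terms."""
    return f"site:{domain} {terms}".strip()


def google_search_url(query):
    """URL-encode the query into the q parameter of a Google search URL."""
    return "https://www.google.com/search?" + urlencode({"q": query})


# doe.gov comes from the example above; the search term is arbitrary.
query = site_query("doe.gov", "calandria")
url = google_search_url(query)
print(query)
print(url)
```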

The write up concludes with a reference to search engine optimization or SEO. That’s Mr. Sullivan’s core competency. Mr. Snowden’s response is not in the article, or it could be snagged in the services monitored by the Federal Service for Supervision of Communications, Information Technology and Mass Media (Roskomnadzor) in everyone’s favorite satellite destroying country.

Quite a battle. The Snowden Sullivan slugfest. No, I think this is emblematic of what has happened to those who ignore secrecy agreements and individuals who have worked hard to make relevance secondary to Google pay to play business processes.

Stephen E Arnold, November 19, 2021
