Google and Microsoft Are Fighting. But a Battle May Loom between Coveo and Service Now

March 18, 2021

The 2021 cage match lineups are interesting. The Google-Microsoft dust-up is a big deal. Google says Microsoft is using its posture on news as a way to blast rock-and-roll fog around the egregious security breaches involving SolarWinds and Exchange Server.

But that fog could obscure a bout between Coveo (a smart search company) and Service Now (a Swiss Army knife of middleware, including Attivio search). Both companies invoke the artificial intelligence moniker. Both covet enterprise customers. Both want to extend their software into large organizations.

Service Now makes its plans clear in “Service Now Adds New AI and Low-Code Development Features.” The write up states:

[A user conference in Quebec] … also introduces AI Search, underpinned by technology acquired in ServiceNow’s purchase of Attivio. AI Search delivers intelligent search results and actionable information, complementing Quebec’s Engagement Messenger that extends self-service to third-party portals to enable AI search, knowledge management, and case interactions. Also new in Quebec is the aforementioned virtual agent, which delivers AI-powered conversational experiences for IT incident resolution.

From my vantage point, the AI talk is hand waving. Search has quite a few moving parts, and human involvement is necessary whether smart software is involved or not.

What Service Now has, however, is a meta-play; that is, it offers numerous management services. If properly set up and resourced, these services could reduce the pain of some utility functions. Search is the mother of all utility services.

Coveo is a traditional enterprise search vendor. The company has targeted numerous business functions as likely customers; for example, customer support and marketing.

But niche vendors of utilities have to be like the “little engine that could.”

This may not be the main event like Google versus Microsoft, but it will be an event to watch.

Stephen E Arnold, March 18, 2021

Search Engines: Bias, Filters, and Selective Indexing

March 15, 2021

I read “It’s Not Just a Social Media Problem: How Search Engines Spread Misinformation.” The write up begins with a Venn diagram. My hunch is that quite a few people interested in search engines will struggle with the visual. Then there is the notion that typing in a search term returns results that are like loaded dice in a craps game in Manhattan’s Union Square.

The reasons, according to the write up, that search engines fall off the rails are:

  • Relevance feedback or the Google-borrowed CLEVER method from IBM Almaden’s patent
  • Fake stories which are picked up, indexed, and displayed as value infused.

The write up points out that people cannot differentiate between accurate, useful, or “factual” results and crazy information.

Okay, here’s my partial list of why Web search engines return flawed results:

  1. Stop words. Control the stop words and you control the info people can find.
  2. Stored queries. Type what you want, but get results already bundled and ready to display.
  3. Selective spidering. Any index is a partial representation of the possible content. Instruct spiders to skip Web sites with information about peanut butter, and, bingo, no peanut butter information.
  4. Spidering depth. Is the bad stuff deep in a Web site? Just limit the crawl to fewer links.
  5. Spider within a span. Is a marginal Web site linking to sites with info you want killed? Don’t follow links off a domain.
  6. Delete the past. Who looks at historical info? A better question: “What advertiser will pay to appear on old content?” Kill the backfile. Web indexes are not archives no matter what thumbtypers believe.

There are other methods available as well; for example, objectionable info can be placed in near-line storage so that results from questionable sources display with enough latency to cause the curious user to click away.
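Several of the index-shaping mechanisms listed above can be sketched as a toy crawler policy. This is an illustration only; the function and field names are hypothetical, not drawn from any real crawler's API.

```python
# Toy policy showing how the list's mechanisms could be wired into a crawler.
# All names here are hypothetical illustrations, not a real crawler's API.

BLOCKED_TOPICS = {"peanut butter"}   # selective spidering (item 3)
MAX_DEPTH = 2                        # spidering depth (item 4)
STAY_ON_DOMAIN = True                # spider within a span (item 5)
MAX_AGE_DAYS = 365                   # delete the past (item 6)

def should_index(page, seed_domain):
    """Return True if the crawler keeps this page in the index."""
    if any(topic in page["text"].lower() for topic in BLOCKED_TOPICS):
        return False                 # topic silently excluded
    if page["depth"] > MAX_DEPTH:
        return False                 # deep pages never crawled
    if STAY_ON_DOMAIN and page["domain"] != seed_domain:
        return False                 # off-site links not followed
    if page["age_days"] > MAX_AGE_DAYS:
        return False                 # backfile dropped
    return True

page = {"text": "A history of peanut butter", "depth": 1,
        "domain": "example.org", "age_days": 10}
print(should_index(page, "example.org"))  # False: excluded by topic filter
```

A user querying such an index sees no trace of what was filtered; the gap simply looks like an absence of content.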

To sum up, some discussions of Web search are not complete or accurate.

Stephen E Arnold, March 15, 2021

Search and Privacy: Those Log Files Are Tempting However

March 11, 2021

Search has been a basic Internet function since the network’s inception, but protecting users’ privacy was not a concern when search was first invented. Nowadays a simple search reveals users’ interests, locations, and much more information that can be sold or stolen. TechRadar explains why search needs to be redesigned with privacy as the top priority: “Why We Need To Rebuild Internet Search, Putting User Privacy First.”

Early Internet developers wanted to make money from their new invention in order to build new technology.  Investors and developers were happy, because there was a profit.  Early Internet advertising, however, transformed into a big privacy problem today:

“Problems later emerged because what started out as a quick fix to a short-term problem turned into a central part of the internet’s architecture. Like anything else in tech, engineers quickly went to work optimizing advertising to be as efficient as possible, stumbling into a situation where the world’s biggest and most powerful companies were suddenly incentivized to gather more and more personal data on users to sell advertising. This resulted in algorithms to maximize engagement on content sites that prioritized instinctive and emotional decisions – or “fast thinking” as the Nobel Prize winner in behavioral economics Daniel Kahneman calls it.”

The information superhighway has turned into a giant consumerism tool that spreads fake news and radicalization, pushes unneeded products and services, and feeds on people’s insecurities. Driving sales to stir the economy is one thing, but the spread of misinformation and radicalization leads to dangerous situations, including the recent attack on the U.S. Capitol in Washington, D.C. and the constant backlash against science.

User experience drives technology design and development, so any new search protocols must match today’s ease of use. Currently, multi-party computation (MPC) borrows blockchain-like techniques to protect users’ privacy: selected computers compute directly on encrypted data without learning anything about that data, an approach dubbed zero-knowledge computation.
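The core idea behind MPC can be shown with a minimal additive secret-sharing sketch: a value is split into random shares so no single party learns it, yet parties can still compute on the shares. This is only an illustration of the general principle, not the specific zero-knowledge scheme the article alludes to.

```python
# Minimal additive secret sharing: each share alone is random noise,
# but the shares sum (mod MODULUS) back to the secret. Illustrative only.
import random

MODULUS = 2**32

def split(secret, parties=3):
    """Split a secret into `parties` additive shares modulo MODULUS."""
    shares = [random.randrange(MODULUS) for _ in range(parties - 1)]
    shares.append((secret - sum(shares)) % MODULUS)
    return shares

def reconstruct(shares):
    return sum(shares) % MODULUS

print(reconstruct(split(123456)))  # 123456

# Computing on hidden data: parties add their shares of two secrets
# locally, and the reconstructed result is the sum of the secrets.
a_shares = split(10)
b_shares = split(32)
sum_shares = [(x + y) % MODULUS for x, y in zip(a_shares, b_shares)]
print(reconstruct(sum_shares))  # 42
```

The point for search: a query could, in principle, be processed across servers that each see only meaningless shares, never the query itself.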

Zero-knowledge computation is a good solution for protecting user privacy, but a big problem prevents more development: money. Advertisers and businesses love the current search system because it feeds their bottom line. Most users do not protect their data, but if they demanded more privacy protections, organizations would invest more money in that area.

Whitney Grace, March 11, 2021

Elastic and Its Approach to Its Search Business

February 16, 2021

This blog post is about Elastic, the Shay Banon information retrieval company, not Amazon AWS Elastic services. Confused yet? The confusion will only increase over time because the “name” Elastic is going to be difficult to keep intact due to Amazon’s ability to erode brand names.

But that’s just one challenge facing the Elastic search company founded by the developer behind Compass Search. An excellent analysis of Elastic’s challenges appears in “Elastic Has Stretched the Patience of Many in Open Source. But Is There Room for a Third Way?”

The write up quotes an open source expert as saying:

Let’s be really clear – it’s a move from open to proprietary as a consequence of a failed business model decision…. Elastic should have [thought] their revenue model through up front. By the time the team made the decision to open source their code, the platform economy existed and their decisions to open source ought to have been aligned to an appropriate business model.

I circled this statement in the article:

Sympathy for Elastic’s position comes from a perhaps unexpected source. Matt Asay, principal at Elastic’s bête noire AWS, believes it’s time to revisit the idea of “shared source”, a licensing scheme originally dreamed up by Microsoft two decades ago as an answer to the then-novel open source concept. In shared source, code is open – as in visible – but its uses are restricted… The heart of the problem is about who gets to profit from open source software. To help resolve that problem, we just might need new licensing.

Information retrieval is not about precision and recall, providing answers to users, or removing confusion about terms and product names; search is about money. Making big bucks from a utility service continues to lure some and smack down others. Now it is time to be squishy and bouncy, I suppose.

Stephen E Arnold, February 16, 2021

Google and Broad Match

February 11, 2021

I read “Google Is Moving on From Broad Match Modifier.” The essay’s angle is search engine optimization; that is, spoofing Google’s now diluted relevance methods. The write up says:

Google says it has been getting better at learning the intent behind a query, and is therefore more confident it can correctly map advertisements to queries. As that ability improves, the differences between Phrase Match and Broad Match Modified diminishes. Moving forward, there will be three match types, each with specific benefits:

  • Exact match: for precision
  • Broad match: for reach
  • Phrase match: in Google’s words, to combine the best of both.

Let’s assume that these are the reasons. Exact match delivers precision. Broad match casts a wide net. No thumbtyper wants a null set. Obviously there is zero information in a null set in the mind of the GenXers and Millennials, right? The phrase match is supposed to combine precision and recall. Oh, my goodness, precision and recall. What happened to cause the Google to reach into the deep history of STAIRS III and RECON for this notion?

Google hasn’t and won’t.
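The nominal semantics of the three match types can be sketched in a few lines. Real Google Ads matching relies on learned intent models, so these keyword rules are only a toy rendering of what the labels claim, not Google's actual implementation.

```python
# Toy versions of the three advertised match types. Illustrative only;
# Google's production matching uses intent models, not string rules.

def exact_match(keyword, query):
    """Precision: the query is the keyword, nothing more."""
    return query == keyword

def phrase_match(keyword, query):
    """The middle ground: the keyword appears as a contiguous phrase."""
    return keyword in query

def broad_match(keyword, query):
    """Reach: any single keyword term overlapping the query triggers it."""
    return any(word in query.split() for word in keyword.split())

kw = "running shoes"
print(exact_match(kw, "running shoes"))        # True
print(phrase_match(kw, "cheap running shoes")) # True
print(broad_match(kw, "shoes for walking"))    # True: one term overlaps
```

Note how quickly broad match fires on queries only loosely related to the keyword; that looseness is precisely what expands the inventory of ad-eligible queries.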

The missing factor in the write up’s analysis is answering the question, “When will each of the three approaches be used, under what conditions, and what happens if the bus drives to the wrong city?” (This bus analogy is my happy way of expressing the idea that Google search results often have little to do with either the words in the user’s query or the “intent” of the user, allegedly determined by Google’s knowledge of each user and the magic of more than 100 “factors” for determining what to present.)

The key is the word “reach.” Changes to Google’s methods are, from my point of view, designed to accomplish one thing: burn through ad inventory.

By killing off functioning Boolean, deprecating search operators, ignoring meaningful time indexing, and tossing disambiguation into the wind that blows Google volleyballs into Shoreline traffic, the company has shaped its core search methods to produce money.

SEO experts don’t like this viewpoint. Google doesn’t care as long as the money keeps flowing. With Google investing less in infrastructure and facing significant pressure from government investigators and outfits like Amazon and Facebook, re-explaining search boils down to showing content which transports ads.

Where does that leave the SEO experts? Answer: ad sales reps for the Google. Traffic comes to advertisers. But the big bucks are in the big advertisers’ campaigns, which expose a message to as many eyeballs as possible. That’s why “broad match” is the fox in the relevance hen house.

Stephen E Arnold, February 11, 2021

Algolia: Making Search Smarter But Is This Possible?

February 5, 2021

A retail search startup pins its AI hopes on a recent acquisition, we learn from the write-up at SiliconANGLE, “Algolia Acquires MorphL to Embed AI into its Enterprise Search Tech.” The company is using its new purchase to power Algolia AI. The platform predicts searchers’ intent in order to deliver tailored (aka targeted) search results, even on a user’s first interaction with the software. Writer Mike Wheatley tells us:

“Algolia sells a cloud-based search engine that companies can embed in their sites, cloud services and mobile apps via an application programming interface. Online retailers can use the platform to help shoppers browse their product catalogs, for example. Algolia’s technology is also used by websites such as the open publishing platform Medium and the online learning course provider Coursera. Algolia’s enterprise-focused search technology enables companies to create a customized search bar, with tools such as a sidebar so shoppers can quickly filter goods by price, for example. MorphL is a Romanian startup that has created an AI platform for e-commerce personalization that works by predicting how people are likely to interact with a user interface. Its technology will extend Algolia’s search APIs with recommendations and user behavior models that will make it possible for e-commerce websites and apps to deliver more ‘intent-based experiences.’”
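The quoted description (a search bar plus a sidebar that filters goods by price) reduces to a simple pattern. The sketch below is a hypothetical toy version of that behavior, not Algolia's actual API or ranking.

```python
# Toy commerce search with a price facet, illustrating the "search bar
# plus sidebar filter" pattern the quote describes. Product data and
# function names are invented for illustration; this is not Algolia's API.

PRODUCTS = [
    {"name": "trail runner", "price": 89.0},
    {"name": "road runner", "price": 129.0},
    {"name": "running socks", "price": 12.0},
]

def search(query, max_price=None):
    """Match the query against product names, then apply the price facet."""
    hits = [p for p in PRODUCTS if query in p["name"]]
    if max_price is not None:
        hits = [p for p in hits if p["price"] <= max_price]
    return sorted(hits, key=lambda p: p["price"])

print([p["name"] for p in search("runner", max_price=100)])  # ['trail runner']
```

What MorphL adds, per the write up, sits on top of this kind of retrieval: predicting which results a given shopper is likely to want before re-ranking them.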

The Google Digital News Initiative funded MorphL’s development. The startup began as an open-source project in 2018 and is based in Bucharest, Romania. Headquartered in San Francisco, Algolia was founded in 2012. MorphL is the company’s second acquisition; it plucked SeaUrchin.IO in 2018.

Will Algolia search be smarter, maybe even cognitive? Worth watching to see how many IQ points are added to Algolia’s results.

Cynthia Murrell, February 5, 2021

Old Book Illustrations: No Photoshop or Illustrator, Thank You

February 1, 2021

Here is a useful resource—Old Book Illustrations. The site began as a way for the creators to share pictures from their own collection of Victorian and French Romantic books and grew as they explored other collections online. All images are in the public domain. The site’s About page elaborates:

“Although it would have been possible to considerably broaden the time-frame of our pursuit, we chose to keep our focus on the original period in which we started for reasons pertaining to taste, consistency, and practicality: due to obvious legal restrictions, we had to stay within the limits of the public domain. This explains why there won’t be on this site illustrations first published prior to the 18th century or later than the first quarter of the 20th century. We are not the only image collection on the web, neither will we ever be the largest one. We hope however to be a destination of choice for visitors more particularly interested in Victorian and French Romantic illustrations—we understand French Romanticism in its broadest sense and draw its final line, at least in the realm of book illustration, at the death of Gustave Doré. We also focused our efforts on offering as many different paths and avenues as possible to help you find your way to an illustration, whether you are looking for something specific or browsing randomly. The many links organizing content by artist, language, publisher, date of birth, and more are designed to make searching easier and indecision rewarding.”

The site is well organized and easy to search or browse in several ways: by artist, publisher, subject, art technique, book title, and format (portrait, landscape, tondo, or square). There is even a “navigation how-to” if one wants a little help getting started. The site also posts articles such as biographies and descriptions of cultural contexts. We recommend checking it out and perhaps bookmarking it for future use.

Cynthia Murrell, February 1, 2021

News Flash: ECommerce Search Is Not Enterprise Search

January 8, 2021

Now here is some crazy stuff—e-commerce search masquerading as enterprise search. Business Wire shares, “Searchspring Named Leader in Enterprise Search Software and E-Merchandising in G2 Grid Reports for Winter 2021.” Now Searchspring may or may not be the best commerce platform, but enterprise search is an entirely different animal. The press release crows:

“The reports’ scores are based on verified reviews by customers and grounded on ease of use, ease of setup, ease of administration, and how well the software meets requirements. G2 is the world’s largest B2B tech marketplace for software and services, helping businesses make smarter buying decisions. Searchspring ranked No. 2 across all providers, earning its Winter 2021 ‘Leader’ position in Enterprise Search Software and E-Merchandising, in addition to being recognized for ‘Best Support’, ‘Easiest Admin’, and ‘Easiest Setup’. Rated by Searchspring customers as 4.9/5 stars, Searchspring was favorably reviewed for offering the ‘Gold Standard for Functionality’, ‘Brilliant Service’, and ‘Incredible Performance. Amazing People. Fantastic Results.’”

So G2’s qualifications for winning make no distinction between e-commerce and enterprise search. We suppose we cannot blame the company for taking the title it was handed and running with it. 2020 has been a big year for online retail, and Searchspring is happy to be recognized for being on top of the surge. Founded in 2007, the firm is located in San Antonio, Texas.

Cynthia Murrell, January 8, 2021

Stork Search for Static Sites

January 8, 2021

Just a short honk to let our dear readers in on this search resource: If you host a website with static content, Stork may be for you. At the platform’s landing page, Creator James Little tells us how it works:

“Stork is two things that work in tandem to put a beautiful, fast, and accurate search interface on your static site. First, it’s a program that indexes your content and writes that index to disk. Second, it’s a JavaScript library that downloads that index, hooks into a search input, and displays optimal search results immediately to your user, as they type. Stork is built with Rust, and the JavaScript library uses Web Assembly behind the scenes. It’s built with content creators in mind, in that it requires little-to-no code to get started and can be extended deeply. It’s perfect for JAMstack sites and personal blogs, but can be used wherever you need a search interface.”

The page offers a setup guide which, interestingly, uses the task of embedding The Federalist Papers as an example. Complete with snippets of code, the description walks users through setup, customization, and index building, so see the page for those details. One can see the project’s GitHub here.
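For a feel of the embedding pattern the setup guide describes, a page wires an input and output element to a prebuilt index and registers them with the JavaScript library. The snippet below is a sketch from memory of that pattern; the attribute names, script URL, and index filename are assumptions, so consult Stork's own guide for the exact forms.

```html
<!-- Hypothetical sketch of Stork's embedding pattern; attribute names,
     script URL, and index filename here are assumptions, not verified
     against the current Stork release. -->
<input data-stork="federalist" placeholder="Search the Federalist Papers" />
<div data-stork="federalist-output"></div>

<script src="https://files.stork-search.net/stork.js"></script>
<script>
  // Download the prebuilt index (written to disk by the Rust indexer)
  // and attach live search to the elements registered above.
  stork.register("federalist", "federalist.st");
</script>
```

The division of labor matches the quote: the Rust program builds the index offline, and the WebAssembly-backed JavaScript library does the querying in the browser as the user types.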

Cynthia Murrell, January 8, 2021

Factoids from Best Paper Awards in Computer Science

January 6, 2021

I noted “Best Paper Awards in Computer Science Since 1996.” The year caught my attention because that was the point in time at which software stagnation gained traction. See “The Great Software Stagnation” for the argument.

The tally represents the awards issued for each venue’s “best papers.” Hats off to the compiler, Jeff Huang, and to his sources and helpers.

I zipped through the listings which contained dozens upon dozens of papers I knew absolutely zero about. I will probably be pushing up daisies before I work through these write ups.

I pulled out several observations which answered questions of interest to me.

First, the data illustrate the long tail thing. Stated another way, the data reveal that if an expert wants to win a prestigious award, it matters which institution issues one’s paycheck.

Second, what are the most prestigious “names” to which one should apply for employment in computer science? Here’s the list of the top 25. The others are interesting but not the Broadway stars of the digital world:

  1. Microsoft: 56.4
  2. University of Washington: 50.5
  3. Carnegie Mellon University: 47.1
  4. Stanford University: 43.3
  5. Massachusetts Institute of Technology: 40.2
  6. University of California, Berkeley: 29.2
  7. University of Michigan: 20.6
  8. University of Illinois at Urbana–Champaign: 18.5
  9. Cornell University: 17.4
  10. Google: 16.8
  11. University of Toronto: 15.8
  12. University of Texas at Austin: 14.5
  13. IBM: 13.7
  14. University of British Columbia: 12.4
  15. University of Massachusetts Amherst: 11.2
  16. Georgia Institute of Technology: 10.3
  17. École Polytechnique Fédérale de Lausanne: 10.1
  18. University of Oxford: 9.6
  19. University of California, Irvine: 9.4
  20. Princeton University: 9.1
  21. University of Maryland: 8.9
  22. University of California, San Diego: 8.7
  23. University of Cambridge: 8.6
  24. University of Wisconsin–Madison: 8
  25. Yahoo: 7.9

Note that Microsoft, the once proud Death Star of the North, is number one. For comparison, the Google is number 10. But the delta in average “bests” is an intriguing 39.6 papers. The ever innovative IBM is number 13, and the estimable Yahoo Oath Verizon confection is number 25.

I did not spot a Chinese university. A quick scan of the authors reveals that quite a few Chinese wizards labor in the research vineyards at these research-oriented institutions. Short of manual counting and analysis of names, I decided not to calculate authors by nationality. I think that’s a good task for you, gentle reader.

What about search as a research topic in this pool? I used a couple of online text analysis tools: Writewords, a tool on my system, and the Madeintext service. The counts varied slightly, which is standard operating procedure for counting tools like these. The 10 most frequently used words in the titles of the award-winning papers are:

data 63 times
based 56 times
learning 53 times
using 49 times
design 45 times
analysis 38 times
software 36 times
time 36 times
search 35 times
Web 30 times

The surprise is that “search” was, based on my analysis of the counts, the ninth most popular word in the papers’ titles. Who knew? Almost as surprising was “social” ranking a miserable 46th. Search, it seems, remains an area of interest. Now if that interest can be transformed into sustainable revenue and sufficient profit to fund research, bug fixes, and enhancements, life would be better in my opinion.
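The title word count can be reproduced with a few lines of stock Python instead of online counting tools. The titles below are invented placeholders, since the actual award-paper titles are not reproduced here, so the counts differ from the article's.

```python
# Sketch of the title word count using stdlib only. The titles are
# made-up stand-ins, not the actual award-winning paper titles.
from collections import Counter

titles = [
    "Learning to Search Web Data",
    "Design and Analysis of Software Systems",
    "Data-Based Search Using Time Series",
]

STOPWORDS = {"to", "of", "and", "the", "a", "in", "for"}

words = Counter(
    w for title in titles
    for w in title.lower().replace("-", " ").split()
    if w not in STOPWORDS
)
print(words.most_common(3))  # top words in this toy corpus
```

The small discrepancies between tools that the article mentions usually come down to exactly these choices: stop word lists, hyphen handling, and case folding.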

Stephen E Arnold, January 6, 2021
