Glean: Another Enterprise Search Solution

October 12, 2021

Enterprise search features are interesting, but users accept problems like unfindable content and sluggish indexing as unavoidable. A former Google engineering director recognized the problem when he launched his own startup, and the Forbes article “Glean Emerges from Stealth With $55 Million To Bring Search To The Enterprise” tells the story.

Arvind Jain cofounded the cloud data management company Rubrik and always had problems locating information. Rubrik is now worth $3.7 billion, but Jain left and formed the new startup Glean with Google veterans Piyush Prahladka, Tony Gentilcore, and T.R. Vishwanath. The team has developed a robust enterprise search engine that works across multiple applications. Glean has raised $55 million in funding.

Other companies like Algolia and Elastic addressed the same enterprise search problem, but they focused on search boxes on consumer-facing Web sites instead of working for employees. With more enterprise systems shifting to the cloud and SaaS, Glean’s search product is an invaluable tool. Innovations with deep learning also make Glean’s search product more intuitive and customizable for each user:

“On the user side, Glean’s software analyzes the wording of a search query—for example, it understands that “quarterly goals” or “Q1 areas of focus” are asking the same thing—and shows all the results that correspond to it, whether they are located in Salesforce, Slack or another of the many applications that a company uses. The results are personalized based on the user’s job. Using deep learning, Glean can differentiate personas, such as a salesperson from an engineer, and tailor recommendations based on the colleagues that a user interacts with most frequently.”

Will Glean crack the enterprise search code? Interesting question to which the answer is not yet known.

Whitney Grace, October 12, 2021

Yext: Payoff Marketing

October 8, 2021

Years ago my team took a look at a search system called EasyAsk (originally Linguistic Technology Corporation, eventually a unit of Progress Software, and then a stand-alone company headed by Craig Bassin, founder of B2Systems).

Yes, EasyAsk is licensing a range of software, but the company seems to lead with search for eCommerce.

What’s interesting is that the firm used what I called “payoff marketing.” The idea is that use of a particular search-and-retrieval system with appropriate technical enhancements can deliver a big financial return.

Here’s a snip from the EasyAsk Web site. Note the tagline: “Cognitive eCommerce.”

[Image: snippet from the EasyAsk Web site]

The “payoff” angle is evident in “Watch revenues soar by at least 20% within 90 days.”

In some sales presentations from other vendors I have heard words that suggest increased return on investment, reduced cost of search, and increased sales. Not too many vendors have gone out on a limb and put a number in the customer’s mind.

However, Yext has taken a page from the EasyAsk marketing playbook. I read “People’s United Bank Sees 15x Annualized ROI from Site Search Integration between Yext, Virtusa, and Adobe.” Wouldn’t the word “among” be more accurate? Oh, well.

Here’s the snippet I circled:

The launch of Yext Answers assisted in about a 50% and as much as 70% reduction in unnecessary support call volume in the months following its launch compared to the months before. By integrating locations into the Yext search experience with Adobe AEM, People’s United saw an estimated 15x annualized return on investment (ROI) on the platform — a number that rose to 35x annualized ROI when including locations, FAQs, and products.

I think this is another example of payoff marketing.

I find the angle an interesting one. Search-and-retrieval systems have been seeking a model for sustainable revenue for more than 40 years. Subscriptions, license fees, and engineering support have worked. The winning method is to charge people to appear in search results and sell advertising.

What happens if the search system does not deliver a “15x annualized return”? My hunch is that companies confident enough to provide a numeric peg for search technology have the hard data to shoot down doubters.

Stephen E Arnold, October 8, 2021

Elastic: Differentiation and Wagon Circling

September 22, 2021

Elastic expects two recent acquisitions to beef up its security in the cloud. Betakit reports, “Cybersecurity Startup Cmd to Be Acquired by Enterprise Search Firm Elastic.” This deal is on the heels of the company’s announcement that it snapped up authorization policy management platform build.security. Writer Josh Scott tells us:

“Cmd was founded in 2016 by CSO Jake King, former security operations lead at Hootsuite, and Milun Tesovic, general partner at Expa. The startup offers a runtime security platform for cloud workloads and Linux assets, providing infrastructure detection and response capabilities to global brands, financial institutions, and software companies. Cmd’s offering observes real-time session activity and allows Linux administrators and developers to take immediate remediation action. … Following the close of the deal, Elastic plans to work with Cmd to integrate Cmd’s cloud native data collection capabilities directly into the company’s Elastic Agent product, and Cmd’s user experience and workflows into Kibana, Elastic’s data visualization offering.”

Citing an article from TechCrunch, Scott notes that Cmd’s employees will be moving to Elastic, with King and CEO Santosh Krishnan slipping into executive roles. Elastic says current customers of both firms will benefit from the integration and specifically promises its existing clients will soon receive Cmd’s cloud security capabilities. Built around open source software, Elastic began as Elasticsearch Inc. in 2012, simplified its name in 2015, and went public in 2018. The company is based in Mountain View, California, and maintains offices around the world.

Cynthia Murrell, September 22, 2021

Useless Search Results? Thank Advertising

September 17, 2021

We thought this was obvious. The Conversation declares, “Google’s ‘Pay-Per-Click’ Ad Model Makes it Harder to Find What You’re Looking For.” Writers Mohiuddin Ahmed and Paul Haskell-Dowland begin by pointing out “to google” has literally become synonymous with searching online via any online search platform. Indeed, Google has handily dominated the online search business, burying some competitors and leaving the rest in the dust. Not coincidentally, the company also rules the web browser and online advertising markets. As our dear readers know, Google is facing pushback from competition and antitrust regulators in assorted countries. However, this article addresses the impact on search results themselves. The authors report:

“More than 80% of Alphabet’s revenue comes from Google advertising. At the same time, around 85% of the world’s search engine activity goes through Google. Clearly there is significant commercial advantage in selling advertising while at the same time controlling the results of most web searches undertaken around the globe. This can be seen clearly in search results. Studies have shown internet users are less and less prepared to scroll down the page or spend less time on content below the ‘fold’ (the limit of content on your screen). This makes the space at the top of the search results more and more valuable. In the example below, you might have to scroll three screens down before you find actual search results rather than paid promotions. While Google (and indeed many users) might argue that the results are still helpful and save time, it’s clear the design of the page and the prominence given to paid adverts will influence behavior. All of this is reinforced by the use of a pay-per-click advertising model which is founded on enticing users to click on adverts.”

We are reminded Google-owned YouTube is another important source of information for billions of users, and it is perhaps the leading platform for online ads. In fact, these ads now intrude on videos at a truly annoying rate. Unless one pays for a Premium subscription, of course. Ahmed and Haskell-Dowland remind us alternatives to Google Search exist, with the usual emphasis on privacy-centric DuckDuckGo. They conclude by pointing out other influential areas in which Google plays a lead role: AI, healthcare, autonomous vehicles, cloud computing, computing devices, and the Internet of Things. Is Google poised to take over the world? Why not?

Cynthia Murrell, September 17, 2021

Mythic Search: Yext Introduces the Phoenix with Summer Updates

September 15, 2021

Enterprise search firm Yext is launching new features and a revamped algorithm, poetically named “Phoenix.” We learn about the updates from the press release, “New Yext Features and Algorithm Update Bring AI Search Optimizations to Businesses” at PR Newswire. The release states:

“In addition to features powered by Phoenix like dynamic reranking, the release introduces revamped test search and experience training, as well as a reimagining of Yext’s data connector and app frameworks — all to equip businesses with modern and powerful search solutions.”

The dynamic reranking feature sounds promising. Phoenix analyzes user behavior to push the most relevant results to the top. We are given an example:

“If customers consistently click on a blog post when searching for vaccine information on a healthcare organization’s website, dynamic reranking will push that content to the top of the search results page so it appears first any time someone searches about vaccines. The Phoenix update also introduces more relevant results for queries about locations that are ‘open now’ and rich text fields, like lists, in featured snippets.”
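For readers who want to see the idea in miniature, here is a rough sketch of click-based reranking. It is not Yext’s Phoenix algorithm, which the press release does not describe in detail. The function name `rerank`, the click log format, and the document ids are all assumptions made for illustration.

```python
from collections import Counter


def rerank(results, click_log, query):
    """Order results for a query by how often each was clicked for that query.

    results: list of document ids in their original ranked order.
    click_log: iterable of (query, doc_id) pairs (a hypothetical log format).
    """
    clicks = Counter(doc for q, doc in click_log if q == query)
    # sorted() is stable, so documents with equal click counts keep
    # their original relative order.
    return sorted(results, key=lambda doc: -clicks[doc])


# Hypothetical data: visitors searching "vaccine" mostly click the blog post.
results = ["faq-page", "locations", "vaccine-blog-post"]
click_log = [
    ("vaccine", "vaccine-blog-post"),
    ("vaccine", "vaccine-blog-post"),
    ("vaccine", "faq-page"),
    ("hours", "locations"),
]
print(rerank(results, click_log, "vaccine"))
```

A production system would presumably blend click counts with text relevance and guard against click spam. This sketch only shows the basic feedback loop.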

Another feature is the ability to build Yext platform configurations and package them into installable apps. The update also makes it easy to test search experiences from the customer’s point of view. But Yext may promise a bit much with its updates to data connectors:

“With the new update to Yext’s data connectors framework, businesses can use a low-code ‘extract, transform, load’ (ETL) tool that extracts all of their data and transforms it into the same format for easy integration into their knowledge graph (a unique brain-like database of facts).”

We do not want to be critical, but we are skeptical when a vendor of search and retrieval uses the word “all.” Certain types of data are notoriously difficult to access, like chemical structures, audio, video, images, and product-management quality assurance data, to name a few. Retrieving “all” data is unlikely at prices most organizations can afford. Still, it does sound like Phoenix is a step forward from the company that promises “Search made for today. Not 1999.” Today’s “search” dates back a half century, but who is interested in history?

Cynthia Murrell, September 15, 2021

Coveo: A Search Vendor Repositions, Pivots, and Spins

September 13, 2021

Coveo was a vendor of search and retrieval software. Then Coveo morphed into help desk and self-service software. Now the company appears to be spinning like a whirling dervish into a new positioning. “Coveo Adds More Developer Features to Its AI Powered Digital Experience Platform” explains:

Coveo Solutions Inc., a unicorn startup that helps companies such as Salesforce.com Inc. and Adobe Inc. improve their websites with artificial intelligence, today introduced new features to help developers more easily use its technology.

A couple of minor points. Coveo has ingested about $330 million since it was set up in 2005. I think that works out to 16 years, which in my experience makes Coveo something other than a start up. Your book may be different, of course.

I am not into enterprise search, but I find it interesting that this company is spinning into an AI powered digital experience platform. I don’t have a clue how to define “artificial intelligence.” I simply don’t know what a “digital experience platform” is.

That may not matter. The point is to keep moving, changing, and morphing in order to generate sufficient revenue to make long suffering investors happy campers and to differentiate the commodity of search technology from open source and proprietary options.

Oh, do dervishes get dizzy? I do.

Stephen E Arnold, September 13, 2021

Interesting Number: Apple Sells Access

September 3, 2021

I read “Google to Pay Apple $15 Billion to Remain Default Safari Search Engine in 2021.” The write up states:

It’s long been known that Google pays Apple a hefty sum every year to ensure that it remains the default search engine on iPhone, iPad, and Mac. Now, a new report from analysts at Bernstein suggests that the payment from Google to Apple may reach $15 billion in 2021, up from $10 billion in 2020. In the investor note, seen by Ped30, Bernstein analysts are estimating that Google’s payment to Apple will increase to $15 billion in 2021, and to between $18 billion and $20 billion in 2022.

Apple and Google care about their users and their “experience.” That’s a mellifluous thing to say, particularly in an anti-trust deposition.

Let’s put the allegedly accurate number in context:

Estimated 2020 revenues for the metasearch engine DuckDuckGo may be in the $70 million range. Google’s reported $15 billion payment is in the neighborhood of 200 times that figure.
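The back-of-the-envelope arithmetic, using only the two estimates mentioned in this post (neither is an audited figure, and the variable names are mine):

```python
google_payment_2021 = 15_000_000_000  # Bernstein estimate quoted above
duckduckgo_revenue_2020 = 70_000_000  # rough estimate from this post

ratio = google_payment_2021 / duckduckgo_revenue_2020
print(f"{ratio:.0f}x")  # prints "214x"
```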

Stephen E Arnold, September 3, 2021

Wiki People: One Cannot Find Online Information If It Is Censored

September 2, 2021

Women have borne the brunt of erasure from history, but thanks to web sites like Wikipedia, their stories are shared more than ever. There is a problem with Wikipedia though, says CBC in the article: “Canadian Nobel Scientist’s Deletion From Wikipedia Points To Wider Bias, Study Finds.” Wikipedia is the most comprehensive, collaborative, and largest encyclopedia in human history. It is maintained by thousands of volunteer editors, who curate the content, verify information, and delete entries.

There are different types of Wikipedia editors. One type is the “inclusionist,” an editor who takes a broad view of what to include in Wikipedia. The second type is the “deletionist,” who has high content standards. American sociologist Francesca Tripodi researched the pages editors deleted and discovered that women’s pages are deleted more than men’s. Tripodi learned that women’s pages account for 25% of all deletion recommendations even though they make up only 19% of the profiles.

Experts say the cause is either gender bias or a notability problem. Notability is a gauge Wiki editors use to determine if a topic deserves a page, and they weigh notability against reliable sources. What makes a topic notable, Tripodi explained, leads to gender bias, because there is less information on women. It also does not help that many editors are men, although there are attempts to add more women:

“Over the years, women have tried to fix the gender imbalance on Wikipedia, running edit-a-thons to change that ratio. Tripodi said these efforts to add notable women to the website have moved the needle — but have also run into roadblocks. ‘They’re welcoming new people who’ve never edited Wikipedia, and they’re editing at these events,’ she said. ‘But then after all of that’s done, after these pages are finally added, they have to double back and do even more work to make sure that the article doesn’t get deleted after being added.’”

Unfortunately, women editors complain they need to do more work to make sure their profiles are verifiable and stay published. The Wikimedia Foundation acknowledges the lack of women’s pages and says it reflects world gender biases. The foundation, however, is committed to increasing the number of women’s pages and editors. The number of women editors has increased over 30% in the past year.

That is the problem when there is a lack of verifiable data about women or anyone erased from history due to biases. If there is not any information on them, they cannot be searched even by trained research librarians like me. Slick method, right?

Whitney Grace, September 2, 2021

Amazon Search: Just Outstanding

September 2, 2021

Authors at Paste Magazine are dedicated to assembling lists of the best streaming content from Netflix, Hulu, Amazon Prime, and other services. They know almost as much about these content libraries as their developers. The title of Paste Magazine’s article, “Amazon Prime Video’s Library Is Now Genuinely Impossible To Browse,” says it all.

It is notoriously difficult to browse Amazon Prime’s content library, and the problem was noted in 2018. Amazon Prime’s library contains a lot of content, much of which is considered unwatchable. The only way to locate anything is to search for it by its proper name, but users who want to browse films as they did in physical libraries and video stores of yore are abandoned.

Amazon Prime has also hidden its search function. Instead, it wants users to work around this road block:

“It quickly becomes apparent that there is no obvious way to view that full list of sci-fi movies, suggesting that Amazon doesn’t want consumers to be able to easily find that kind of information—its user experience is built around you choosing one of the small handful of suggested films, or knowing in advance what you want to see and then specifically searching it out. However, it is possible to see the full list—in order for it to display, you just have to click on any specific sci-fi film, look at the movie’s genre tags, and click on the words “science fiction” once again.”

The search function is worse than that available in a medieval scriptorium. When users return to certain genre pages and browse the supposed complete list, the same twenty-one movies continuously reload.

Amazon Prime has thousands of titles and is designed by a high tech company, yet it cannot fix its search function? Why is Amazon, an important company that is shaking the film and television industry, not offering its users the best of the best when it comes to search? Amazon did A9, it sucked in Lucid Imagination “experts,” it intruded on Elastic search territory. And now search doesn’t work the way users expect. Has another high-tech outfit become customer hostile or just given up making search useful?

Whitney Grace, September 1, 2021

Semantic: Scholar and Search

September 1, 2021

The new three musketeers could be named Semantic, Scholar, and Search. What’s missing is a digital d’Artagnan. What are three valiant mousquetaires up to? Fixing search for scholarly information.

To learn why smart software goes off the rails, navigate to “Building a Better Search Engine for Semantic Scholar.” The essay documents how a group of guardsmen fixed up search which is sort of intelligent and sort of sensitive to language ambiguities like “cell”: A biological cell or “cell” in wireless call admission control. Yep, English and other languages require context to figure out what someone might be trying to say. Less tricky for bounded domains, but quite interesting for essay writing or tweets.

Please, read the article because it makes clear some of the manual interventions required to make search deliver objective, on point results. The essay is important because it talks about issues most search and retrieval “experts” prefer to keep under their kepis. Imagine what one can do with the knobs and dials in this system to generate non-objective and off point results. That would be exciting in certain scholarly fields I think.

Here are some quotes which bear on Fancy Dan algorithmic shortcuts like those enabled by Snorkel-type solutions. For example:

Quote A

The best-trained model still makes some bizarre mistakes, and posthoc correction is needed to fix them.

Meaning: Expensive human and maybe machine processes are needed to get the model outputs back into the realm of mostly accurate.

Quote B

Here’s another:

Machine learning wisdom 101 says that “the more data the better,” but this is an oversimplification. The data has to be relevant, and it’s helpful to remove irrelevant data. We ended up needing to remove about one-third of our data that didn’t satisfy a heuristic “does it make sense” filter.

Meaning: Rough sets may be cheaper to produce but may be more expensive in the long run. Why? The outputs are just wonky, at odds with what an expert in a field knows, or just plain wrong. Does this make you curious about black box smart software? If not, it should.
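What might a “does it make sense” filter look like? The essay’s actual rule is not reproduced here, so the following is a guess: keep a (query, clicked title) training pair only when the two share at least one word. The function name `makes_sense` and the sample pairs are invented for illustration.

```python
def makes_sense(query, clicked_title):
    """Hypothetical sanity check: the query shares a word with the clicked title."""
    query_terms = set(query.lower().split())
    title_terms = set(clicked_title.lower().split())
    return bool(query_terms & title_terms)


# Invented training pairs of (search query, title of the clicked paper).
training_pairs = [
    ("cell membrane transport", "Transport across the cell membrane"),
    ("call admission control", "Deep learning for image segmentation"),
    ("semantic search", "Semantic search for scholarly papers"),
]
kept = [pair for pair in training_pairs if makes_sense(*pair)]
print(f"kept {len(kept)} of {len(training_pairs)} pairs")
```

Even a crude rule like this throws out a share of the data, which is the essay’s point: more data is not automatically better data.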

Quote C

And what about this statement:

The model learned that recent papers are better than older papers, even though there was no monotonicity constraint on this feature (the only feature without such a constraint). Academic search users like recent papers, as one might expect!

Meaning: The three musketeers like their information new, fresh, and crunchy. From my point of view, this is a great reason to delete the backfiles. Even though “old” papers may contain high value information, the new breed wants recent papers. Give ‘em what they want and save money on storage and other computational processes.

Net Net

My hunch is that many people think that search is solved. What’s the big deal? Everything is available on the Web. Free Web search is great. But commercial search systems like LexisNexis and Compendex with for fee content are chugging along.

A free and open source approach is a good concept. The trajectory of innovation points to a need for continued research and innovation. The three musketeers might find themselves replaced with a more efficient and unmanageable force like smart software trained by the Légion étrangère drunk on digital pastis.

Stephen E Arnold, September 1, 2021
