Dexa: A New Podcast Search Engine

May 21, 2024

This essay is the work of a dinobaby. Unlike some folks, no smart software improved my native ineptness.

Google, Bing, and DuckDuckGo (a small percentage) dominate US search. Spotify, Apple Podcasts, and other platforms host and aggregate podcast shows. The problem is that never the twain shall meet when people are searching for video or audio content. Riley Tomasek was inspired by the problem and developed the Dexa app:

“Dexa is an innovative brand that brings the power of AI to your favorite podcasts. With Dexa’s AI-powered podcast assistants, you can now explore, search, and ask questions related to the knowledge shared by trusted creators. Whether you’re curious about sleep supplements, programming languages, growing an audience, or achieving financial freedom, Dexa has you covered. Dexa unlocks the wisdom of experts like Andrew Huberman, Lex Fridman, Rhonda Patrick, Shane Parrish, and many more.

With Dexa, you can explore the world of podcasts and tap into the knowledge of trusted creators in a whole new way.”

Andrew Huberman of Huberman Lab picked up the app and helped it go viral.

From there the Dexa team built an intuitive, complex AI-powered search engine that indexes, analyzes, and transcribes podcasts. Since Dexa launched nine months ago, it has gained 50,000 users, answered almost one million questions, and partnered with famous podcasters. A recent update included a chat-based interface, more search and discovery options, and the ability to watch referenced clips in a conversation.

Dexa has raised $6 million in seed money and secured an exclusive partnership with Huberman Lab.

Dexa is still a work in progress, but it responds like ChatGPT with a focus on conveying information and searching for content. It’s an intuitive platform that cites its sources directly in the search. It’s probably an interface that will be adopted by other search engines in the future.

Whitney Grace, May 21, 2024

Ho Hum: The Search Sky Is Falling

May 15, 2024

This essay is the work of a dinobaby. Unlike some folks, no smart software improved my native ineptness.

“Google’s Broken Link to the Web” is interesting for two reasons: [a] The sky is falling, again, and [b] search has been broken for a long time and suddenly I should worry.

The write up states:

When it comes to the company’s core search engine, however, the image of progress looks far muddier. Like its much-smaller rivals, Google’s idea for the future of search is to deliver ever more answers within its walled garden, collapsing projects that would once have required a host of visits to individual web pages into a single answer delivered within Google itself.

Nope. The walled garden has been in the game plan for a long, long time. People who lusted for Google mouse pads were not sufficiently clued in to notice. Google wants to be the digital Hotel California. Smarter software is just one more component available to the system which controls information flows globally. How many people in Denmark rely on Google search whether it is good, bad, or indifferent? The answer is, “99 percent.” What about people who let Google Gmail pass along their messages? How about 67 percent in the US. YouTube is video in many countries; even with the rise of TikTok, the Google is hanging in there. Maps? Ditto. Calendars? Ditto. Each of these ubiquitous services is “search.” They have been for years. Any click can be monetized one way or another.

image

Who will pay attention to this message? Regulators? Users of search on an iPhone? How about commuters and Waze? Thanks, MSFT Copilot. Good enough. Working on those security issues today?

Now the sky is falling? Give me a break. The write up adds:

where the company once limited itself to gathering low-hanging fruit along the lines of “what time is the super bowl,” on Tuesday executives showcased generative AI tools that will someday plan an entire anniversary dinner, or cross-country-move, or trip abroad. A quarter-century into its existence, a company that once proudly served as an entry point to a web that it nourished with traffic and advertising revenue has begun to abstract that all away into an input for its large language models.  This new approach is captured elegantly in a slogan that appeared several times during Tuesday’s keynote: let Google do the Googling for you.

Of course, if Google does it, those “search” abstractions can be monetized.

How about this statement?

But to everyone who depended even a little bit on web search to have their business discovered, or their blog post read, or their journalism funded, the arrival of AI search bodes ill for the future. Google will now do the Googling for you, and everyone who benefited from humans doing the Googling will very soon need to come up with a Plan B.

Okay, what’s the plan B? Kagi? Yandex? Something magical from one of the AI start ups?

People have been trying to out search Google for a quarter century. And what has been the result? Google’s technology has been baked into the findability fruit cakes.

If one wants to be found, buy Google advertising. The alternative is what exactly? Crazy SEO baloney? Hire a 15 year old and pray that person can become an influencer? Put ads on Tubi?

The sky is not falling. The clouds rolled in and obfuscated people’s ability to see how weaponized information has seized control of multiple channels of information. I don’t see a change in weather any time soon. If one wants to run around saying the sky is falling, be careful. One might run into a wall or trip over a fire plug.

Stephen E Arnold, May 15, 2024

Will Google Behave Like Telegram?

May 10, 2024

This essay is the work of a dinobaby. Unlike some folks, no smart software improved my native ineptness.

I posted a short item on LinkedIn about Telegram’s blocking of Ukraine’s information piped into Russia via Telegram. I pointed out that Pavel Durov, the founder of VK and Telegram, told Tucker Carlson that he was into “free speech.” A few weeks after the interview, Telegram blocked the data from Ukraine for Russia’s Telegram users. One reason given, as I recall, was that Apple was unhappy. Telegram rolled over and complied with a request that seems to benefit Russia more than Apple. But that’s just my opinion. One of my team verified the block with a Ukrainian interacting with senior professionals in Ukraine. Not surprisingly, Ukraine’s use of Telegram is under advisement. I think that means, “Find another method of sending encrypted messages and use that.” Compromised communications can translate to “Rest in Peace” in real time.

image

A Hong Kong rock band plays a cover of the popular hit Glory to Hong Kong. The bats in the sky are similar to those consumed in Shanghai during a bat festival. Thanks, MSFT Copilot. What are you working on today? Security or AI?

I read “Hong Kong Puts Google in Hot Seat With Ban on Protest Song.” That news story states:

The Court of Appeal on Wednesday approved the government’s application for an injunction order to prevent anyone from playing Glory to Hong Kong with seditious intent. While the city has a new security law to punish that crime, the judgment shifted responsibility onto the platforms, adding a new danger that just hosting the track could expose companies to legal risks. In granting the injunction, judges said prosecuting individual offenders wasn’t enough to tackle the “acute criminal problems.”

What’s Google got to do with that toe tapper Glory to Hong Kong?

The write up says:

The injunction “places Google, media platforms and other social media companies in a difficult position: Essentially pitting values such as free speech in direct conflict with legal obligations,” said Ryan Neelam, program director at the Lowy Institute and former Australian diplomat to Hong Kong and Macau. “It will further the broader chilling effect if foreign tech majors do comply.”

The question is, “Roll over as Telegram allegedly has, or fight Hong Kong and by extension everyone’s favorite streaming video influencer, China?” What will Google do? Scrub Glory to Hong Kong, number one with a bullet on someone’s hit parade I assume.

My guess is that Google will go to court, appeal, and then take appropriate action to preserve whatever revenue is at stake. I do know The Sundar & Prabhakar Comedy Show will not use Glory to Hong Kong as its theme for its 2024 review.

Stephen E Arnold, May 10, 2024

Google Search Is Broken

May 10, 2024

ChatGPT and other generative AI engines have screwed up search engines, including the all-powerful Google. The Blaze article “Why Google Search Is Broken” explains why Internet search is broken and what caused it. The Internet is full of information, and the best way to get noticed in search results is using SEO. A black hat technique (it will probably be considered old school in the near future) to manipulate search results is to litter a post with keywords, aka “keyword stuffing.”

ChatGPT users realized that it’s a fantastic tool for SEO, because they tell the AI algorithm to draft a post with a specific keyword and it generates a decent one. Google’s search algorithm then reads that post and pushes it to the top of search results. ChatGPT was designed to read and learn language the same way as Google: skim the Internet, scoop up information from Web sites, and then use it to teach the algorithm. This threatens Google’s search profit margins and Alphabet Inc. doesn’t like that:

“By and large, people don’t want to read AI-generated content, no matter how accurate it is. But the trouble for Google is that it can’t reliably detect and filter AI-generated content. I’ve used several AI detection apps, and they are 50% accurate at best. Google’s brain trust can probably do a much better job, but even then, it’s computationally expensive, and even the mighty Google can’t analyze every single page on the web, so the company must find workarounds.

This past fall, Google rolled out its Helpful Content Update, in which Google started to strongly emphasize sites based on user-generated content in search results, such as forums. The site that received the most notable boost in search rankings was Reddit. Meanwhile, many independent bloggers saw their traffic crash, whether or not they used AI.”

Google wants to save money by offloading AI detection/monitoring to forum moderators who usually aren’t paid. Unfortunately, SEO experts figured out Google’s new trick and are now spamming user-content driven Websites. Google recently signed a deal with Reddit to acquire its user data to train its AI project, Gemini.

Google hates AI generated SEO and people who game its search algorithms. Google doesn’t have the resources to detect all the SEO experts, but when they are found Google exacts vengeance with deindexing and making better tools. Google released a new update to its spam policies to remove low-quality, unoriginal content made to abuse its search algorithm. The overall goal is to remove AI-generated sites from search results.

If you read between the lines, Google doesn’t want to lose more revenue and is calling out bad actors.

Whitney Grace, May 10, 2024

Torrent Search Platform Tribler Works to Boost Decentralization with AI

May 7, 2024

Can AI be the key to a decentralized Internet? The group behind the BitTorrent-based search engine Tribler believes it can. TorrentFreak reports, “Researchers Showcase Decentralized AI-Powered Torrent Search Engine.” Even as the online world has mostly narrowed into commercially controlled platforms, researchers at the Netherlands’ Delft University of Technology have worked to decentralize and anonymize search. Their goal has always been to empower John Q. Public over governments and corporations. Now, the team has demonstrated the potential of AI to significantly boost those efforts. Writer Ernesto Van der Sar tells us:

“Tribler has just released a new paper and a proof of concept which they see as a turning point for decentralized AI implementations; one that has a direct BitTorrent link. The scientific paper proposes a new framework titled ‘De-DSI’, which stands for Decentralised Differentiable Search Index. Without going into technical details, this essentially combines decentralized large language models (LLMs), which can be stored by peers, with decentralized search. This means that people can use decentralized AI-powered search to find content in a pool of information that’s stored across peers. For example, one can ask ‘find a magnet link for the Pirate Bay documentary,’ which should return a magnet link for TPB-AFK, without mentioning it by name. This entire process relies on information shared by users. There are no central servers involved at all, making it impossible for outsiders to control.”

Van der Sar emphasizes De-DSI is still in its early stages: the demo was created with a limited dataset and starter AI capabilities. The write-up briefly summarizes the approach:

“In essence, De-DSI operates by sharing the workload of training large language models on lists of document identifiers. Every peer in the network specializes in a subset of data, which other peers in the network can retrieve to come up with the best search result.”
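
The division of labor in that quote can be pictured with a toy sketch. This is not the Tribler code and not the De-DSI model: each hypothetical `Peer` below holds a small lookup table of descriptions and document identifiers, and a simple word-overlap score stands in for the trained language model the researchers describe. The names `Peer`, `best_match`, and `search` and the identifier strings are invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


def tokens(text: str) -> set[str]:
    """Lowercase a string and split it into a set of words."""
    return set(text.lower().split())


@dataclass
class Peer:
    # Assumption: a plain dict (description -> document identifier) stands in
    # for the slice of the model a real peer would have been trained on.
    shard: dict[str, str] = field(default_factory=dict)

    def best_match(self, query: str) -> tuple[float, str | None]:
        """Return (score, document identifier) for this peer's best guess."""
        q = tokens(query)
        best: tuple[float, str | None] = (0.0, None)
        for description, doc_id in self.shard.items():
            d = tokens(description)
            union = q | d
            # Word-overlap (Jaccard) score, a stand-in for a model's confidence.
            score = len(q & d) / len(union) if union else 0.0
            if score > best[0]:
                best = (score, doc_id)
        return best


def search(peers: list[Peer], query: str) -> str | None:
    """Ask every peer, then keep the highest-scoring identifier. No central server."""
    results = [peer.best_match(query) for peer in peers]
    score, doc_id = max(results, key=lambda r: r[0], default=(0.0, None))
    return doc_id


# Two peers, each specializing in a different subset of documents.
peers = [
    Peer({"pirate bay documentary tpb afk": "doc-tpb-afk"}),
    Peer({"linux distribution iso image": "doc-linux-iso"}),
]
print(search(peers, "find a magnet link for the Pirate Bay documentary"))
```

The point of the sketch is only the shape: every peer answers from its own subset, and the asker picks the best reply from the pool.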

The team hopes to incorporate this tech into an experimental version of Tribler by the end of this year. Stay tuned.

Cynthia Murrell, May 7, 2024

Kagi Search Beat Down

April 17, 2024

This essay is the work of a dumb dinobaby. No smart software required.

People surprise me. It is difficult to craft a search engine. Sure, a recent compsci graduate will tell you, “Piece of cake.” It is not. Even with oodles of open source technology, easily gettable content, and a few valiant individuals who actually want relevant results — search and retrieval are tough to get right. The secret to good search, in my opinion, is to define a domain, preferably a technical field, identify the relevant content, obtain rights, if necessary, and then do the indexing and the other “stuff.”

In my experience, it is a good idea to have either a friend with deep pockets, a US government grant (hello, NSF, said Google decades ago), or a credit card with a hefty credit line. Failing these generally acceptable solutions, one can venture into the land of other people’s money. When that runs out or just does not work, one can become a pay-to-play outfit. We know what that business model delivers. But for a tiny percentage of online users, a subscription service makes perfect sense. The only problem is that selling subscriptions is expensive, and there is the problem of churn. Lose a customer and spend quite a bit of money replacing that individual. Lose a big customer and spend oodles and oodles of money replacing that big spender.

I read “Do Not Use Kagi.” This, in turn, directed me to “Why I Lost Faith in Kagi.” Okay, what’s up with the Kagi booing? The “Lost Faith” article runs about 4,000 words. The key passage for me is:

Between the absolute blasé attitude towards privacy, the 100% dedication to AI being the future of search, and the completely misguided use of the company’s limited funds, I honestly can’t see Kagi as something I could ever recommend to people.

I looked at Kagi when it first became available, and I wrote a short email to the “Vlad” persona. I am not sure if I followed up. I was curious about how the blend of artificial intelligence and metasearch was going to deal with such issues as:

  1. Deduplication of results
  2. Latency when a complex query in a metasearch system has to wait for a module to do its thing
  3. How the business model was going to work: Expensive subscription, venture funding, collateral sales of the interface to law enforcement, advertising, etc.
  4. Controlling the cost of the pings, pipes, and power for the plumbing
  5. Spam control.

I know from experience that those dabbling in the search game ignore some of my routine questions. The reasons range from “we are smarter than you” to “our approach just handles these issues.”

image

Thanks, MSFT Copilot. Recognize anyone in the image you created?

I still struggle with the business model of non-ad supported search and retrieval systems. Subscriptions work. Well, they worked out of the gate for ChatGPT, but how many smart search systems do I want to join? Answer: Zero.

Metasearch systems are simply sucker fish on the shark bodies of a Web search operator. Bing is in the metasearch game because it is a fraction of the Googzilla operation. It is doing what it can to boost its user base. Just look at the wonky Edge ads and the rumored minuscule gain the addition of smart search has delivered to Bing traffic. Poor Yandex is relocating and finds itself in a different world from the cheerful environment of Russia.

Web content indexing is expensive, difficult, and tricky.

But why pick on Kagi? Beats me. Why not write about dogpile.com, ask.com, the duck thing, or startpage.com (formerly ixquick.com)? Each embodies a certain subsonic vibe, right?

Maybe it is the AI flavor of Kagi? Maybe it is the amateur hour approach taken with some functions? Maybe it is just a disconnect between an informed user and an entrepreneurial outfit running a mile a minute with a sign that says, “Subscribe”?

I don’t know, but it is interesting when Web search is essentially a massive disappointment that some bright GenX’er has not figured out a solution.

Stephen E Arnold, April 17, 2024

HP and Autonomy: The Long Tail of Search and Retrieval

April 8, 2024

This essay is the work of a dumb dinobaby. No smart software required.

The US justice system is flawed, but when big money is at stake, it quickly works as it’s supposed to. A British tech entrepreneur is accused of making the tech industry lose a lot of greenbacks, and the BBC shares the details: “Mike Lynch: Autonomy Founder’s Fraud Trial Begins In The US.” Lynch, formerly called Britain’s equivalent of Bill Gates, was extradited to the US in 2023 after a British court ruled against him in a civil fraud case. He is accused of overinflating the value of his former company Autonomy. Autonomy was sold to Hewlett-Packard (HP) in 2011 for $11 billion.

Lynch is facing sixteen charges and a possible twenty-five years in prison if convicted. Reid Weingarten, Lynch’s attorney, stated his client is prepared to take the stand. He also said that Lynch focused on Autonomy’s technology side and left the finances to others. After buying Autonomy, HP valued it at $2.2 billion and claimed Lynch duped them.

Lynch founded Autonomy in 1996 and it became one of the top 100 public companies in the United Kingdom. Autonomy was known for software that extracted information from unstructured content: video, emails, and phone calls.

HP is not mincing claims in this case:

“US prosecutors in San Francisco say Mr Lynch backdated agreements to mislead about the company’s sales; concealed the firm’s loss-making business reselling hardware; and intimidated or paid off people who raised concerns, among other claims. In court filings, his attorneys have argued that the ‘real reason for the write-down’ was a failure by HP to manage the merger. ‘Then, with its stock price crumbling under the weight of its own mismanagement, circled the wagons to protect its new leaders and wantonly accused’ Mr Lynch of fraud, they wrote.”

London’s High Court ruled against Lynch and Autonomy’s former CFO Sushovan Hussain in a civil fraud case. Hussain was separately convicted in the US, imprisoned for five years, and fined millions of dollars. The pair claimed HP’s case against them was buyer’s remorse and management failings.

Lynch should be held accountable for false claims, pay the fines, and be jailed if declared guilty. If the court does convict him, it will be time for more legal gymnastics.

Whitney Grace, April 8, 2024

Finding Live Music Performances

April 5, 2024

This essay is the work of a dumb dinobaby. No smart software required.

Here is a niche search category some of our readers will appreciate. Lifehacker shares “The Best Ways to Find Live Gigs for Music You Love.” Writer David Nield describes how one can tap into a combination of sources to stay up to date on upcoming music events. He begins:

“More than once I’ve missed out on shows in my neighborhood put on by bands I like, just because I’ve been out of the loop. Whether you don’t want to miss gigs by artists you know, or you’re keen to get out and discover some new music, there are lots of ways to stay in touch with the live shows happening in your area—you need never miss a gig again. Pick the one(s) that work best for you from this list.”

First are websites dedicated to spreading the musical word, like Songkick and Bandsintown. One can sign up for notices or simply browse the site by artist or location. These sites can also use one’s listening data from streaming apps to inform their results. Or one can go straight to the source and follow artists on social media or their own websites (but that can get overwhelming if one enjoys many bands). Several music apps like Spotify and Deezer will notify you of upcoming concerts and events for artists you choose. Finally, YouTube lists tour details and ticket links beneath videos of currently touring bands, highlighting events near you. If, that is, you have chosen to share your location with the Google-owned site.

Cynthia Murrell, April 5, 2024

Is the AskJeeves Approach the Next Big Thing Again?

March 14, 2024

This essay is the work of a dumb dinobaby. No smart software required.

Way back when I worked in Silicon Valley, or Plastic Fantastic as one 1980s wag put it, AskJeeves burst upon the Web search scene. The idea was that a user would ask a question and the helpful digital butler would fetch the answer. This worked for questions like “What’s the temperature in San Mateo?” The system did not work for the types of queries with which my little group of full-time equivalents peppered assorted online services.

image

A young wizard confronts a knowledge problem. Thanks, MSFT Copilot. How’s that security today? Okay, I understand. Good enough.

The mechanism involved software and people. The software processed the query and matched up the answer with the outputs in a knowledge base. The humans wrote rules. If there was no rule and no knowledge, the butler fell on his nose. It was the digital equivalent of nifty marketing applied to a system about as aware as the man servant in Kazuo Ishiguro’s The Remains of the Day.
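
Here is a minimal sketch of that rules-plus-knowledge-base mechanism. It is my guess at the general shape, not the actual AskJeeves code, which I never saw. The rule pattern, the `RULES` and `KNOWLEDGE` tables, the `ask` function, and the stored answer string are all hypothetical placeholders.

```python
import re

# Hand-written rules: a pattern a human editor wrote, plus the topic it maps to.
RULES = [
    (re.compile(r"temperature in (?P<place>[\w ]+)", re.IGNORECASE), "temperature"),
]

# The knowledge base: answers plugged in ahead of time. The value is a placeholder.
KNOWLEDGE = {
    ("temperature", "san mateo"): "stored reading for San Mateo",
}


def ask(question: str):
    """Return a stored answer, or None when no rule or no knowledge applies."""
    for pattern, topic in RULES:
        match = pattern.search(question)
        if match is None:
            continue  # this rule does not cover the question
        key = (topic, match.group("place").strip().lower())
        answer = KNOWLEDGE.get(key)
        if answer is not None:
            return answer
    return None  # no rule and no knowledge: the butler falls on his nose


print(ask("What's the temperature in San Mateo?"))
print(ask("Which patents cite this chemical process?"))
```

The second question has no rule, so the function returns `None`. That is the failure mode described above.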

I thought about AskJeeves as a tangent notion as I worked through “LLMs Are Not Enough… Why Chatbots Need Knowledge Representation.” The essay is an exploration of options intended to reduce the computational cost, power sucking, and blind spots in large language models. Progress is being made and will be made. A good example is this passage from the essay which sparked my thinking about representing knowledge. This is a direct quote:

In theory, there’s a much better way to answer these kinds of questions.

  1. Use an LLM to extract knowledge about any topics we think a user might be interested in (food, geography, etc.) and store it in a database, knowledge graph, or some other kind of knowledge representation. This is still slow and expensive, but it only needs to be done once rather than every time someone wants to ask a question.
  2. When someone asks a question, convert it into a database SQL query (or in the case of a knowledge graph, a graph query). This doesn’t necessarily need a big expensive LLM, a smaller one should do fine.
  3. Run the user’s query against the database to get the results. There are already efficient algorithms for this, no LLM required.
  4. Optionally, have an LLM present the results to the user in a nice understandable way.
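
The four steps in that quote can be sketched in a few lines. This is an illustration under loud assumptions, not the essay author's implementation: the facts are typed in by hand where step 1 would use an LLM, a hypothetical `question_to_query` template stands in for the smaller model in step 2, and an f-string stands in for the optional LLM in step 4. Only step 3, the database lookup, is the real thing, using Python's built-in `sqlite3` module.

```python
import sqlite3


def build_knowledge_base() -> sqlite3.Connection:
    """Step 1 stand-in: facts stored once (hand-entered here, not LLM-extracted)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE facts (subject TEXT, relation TEXT, value TEXT)")
    conn.executemany(
        "INSERT INTO facts VALUES (?, ?, ?)",
        [("france", "capital", "Paris"), ("japan", "capital", "Tokyo")],
    )
    return conn


def question_to_query(question: str):
    """Step 2 stand-in: turn 'What is the capital of X?' into a SQL query."""
    words = question.lower().rstrip("?").split()
    if "capital" in words and "of" in words:
        subject = " ".join(words[words.index("of") + 1:])
        sql = "SELECT value FROM facts WHERE subject = ? AND relation = ?"
        return sql, (subject, "capital")
    return None  # the template does not understand the question


def answer(conn: sqlite3.Connection, question: str) -> str:
    query = question_to_query(question)
    if query is None:
        return "No rule for that question."
    sql, params = query
    row = conn.execute(sql, params).fetchone()  # Step 3: no LLM required
    if row is None:
        return "No stored knowledge for that question."
    return f"The answer is {row[0]}."  # Step 4 stand-in: present the result


conn = build_knowledge_base()
print(answer(conn, "What is the capital of France?"))
```

Note the two ways the sketch can come up empty: the question does not fit the template, or the template fits but the database has nothing stored.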

Like AskJeeves, the idea is a good one. Create a system to take a user’s query and match it to information answering the question. The AskJeeves approach embodied what I called rules. As long as one has the rules, the answers can be plugged in to a database. A query arrives, looks for the answer, and presents it. Bingo. Happy user with relevant information catapults AskJeeves to the top of a world filled with less elegant solutions.

The question becomes, “Can one represent knowledge in such a way that the information is current, usable, and ‘accurate’ (assuming one can define accurate)?” Knowledge, however, is a slippery fish. Small, well-defined domains chock full of specialist knowledge should be easier to represent. Well, IBM Watson and its adventure in Houston suggest that the method is okay, but it did not work. Larger scale systems like an old-fashioned Web search engine just used “signals” to produce lists which presumably presented answers. “Clever,” correct? (Sorry, that’s an IBM Almaden bit of humor. I apologize for the inside baseball moment.)

What’s interesting is that youthful laborers in the world of information retrieval are finding themselves arm wrestling with some tough but elusive problems. What is knowledge? The answer, “It depends,” does not provide much help. Where does knowledge originate? The answer, “No one knows for sure,” does not advance the ball downfield either.

Grabbing epistemology by the shoulders and shaking it until an answer comes forth is a tough job. What’s interesting is that those working with large language models are finding themselves caught in a room of mirrors intact and broken. Here’s what TheTemples.org has to say about this imaginary idea:

The myth represented in this Hall tells of the divinity that enters the world of forms fragmenting itself, like a mirror, into countless pieces. Each piece keeps its peculiarity of reflecting the absolute, although it cannot contain the whole any longer.

I have no doubt that a start up with venture funding will solve this problem even though a set cannot contain itself. Get coding now.

Stephen E Arnold, March 14, 2024

Kagi Hitches Up with Wolfram

March 6, 2024

This essay is the work of a dumb dinobaby. No smart software required.

“Kagi + Wolfram” reports that the for-fee Web search engine with AI has hooked up with one of the pre-eminent mathy people innovating today. The write up includes PR about the upsides of Kagi search and Wolfram’s computational services. The article states:

…we have partnered with Wolfram|Alpha, a well-respected computational knowledge engine. By integrating Wolfram Alpha’s extensive knowledge base and robust algorithms into Kagi’s search platform, we aim to deliver more precise, reliable, and comprehensive search results to our users. This partnership represents a significant step forward in our goal to provide a search engine that users can trust to find the dependable information they need quickly and easily. In addition, we are very pleased to welcome Stephen Wolfram to Kagi’s board of advisors.

image

The basic wagon gets a rethink with other animals given a chance to make progress. Thanks, MSFT Copilot. Good enough, but in truth I gave up trying to get a similar image with the dog replaced by a mathematician and the pig replaced with a perky entrepreneur.

The integration of mathiness with smart search is a step forward, certainly more impressive than other firms’ recycling of Web content into bubble gum cards presenting answers. Kagi is taking steps, small and methodical ones, toward what I have described as “search enabled applications” and my friend Dr. Greg Grefenstette described in his book with the snappy title “Search-Based Applications: At the Confluence of Search and Database Technologies (Synthesis Lectures on Information Concepts, Retrieval, and Services, 17).”

It may seem like a big step from putting mathiness in a Web search engine to creating a platform for search enabled applications. It may be, but I like to think that some bright young sprouts will figure out that linking a mostly brain dead legacy app with a Kagi-Wolfram service might be useful in a number of disciplines. Even some super confident really brilliantly wonderful Googlers might find the service useful.

Net net: I am gratified that Kagi’s for-fee Web search is evolving. Google’s apparent ineptitude might give Kagi the chance Neeva never had.

Stephen E Arnold, March 6, 2024
