Bookmark This: HathiTrust Digital Library
October 30, 2024
Concerned for the Internet Archive? So are we. (For multiple reasons.) But while that venerable site recovers from its recent cyberattacks, remember Hathi exists. Founded in 2008, the not-for-profit HathiTrust Digital Library is a collaborative of academic and research libraries. The site makes millions of digitized items available for study by humans as well as for data mining. The site shares the collection’s story:
“HathiTrust’s digital library came into being during the mid-2000s when companies such as Google began scanning print titles from the shelves of university and college campus libraries. When many of those same libraries created HathiTrust in 2008, they united library copies of those digitized books into a single, shared collection to make as much of the collection available for access as allowable by copyright law. Through HathiTrust, libraries collaborate on long-term management, preservation, and access of their collections. Book lovers and researchers like you can explore this huge collection of digitized materials! Today, HathiTrust Digital Library is the largest set of digitized books managed by academic and research libraries. The collection includes materials typically found on the shelves of North American university and college campuses with the benefit of being available online instead of scattered in buildings around the globe. Our enormous collection includes thousands of years of human knowledge and published materials from around the world, selected by librarians and preserved in the libraries of academic and research libraries. You can find all kinds of digitized books and primary source materials to suit a wide range of research needs.”
The collection contains books and “book-like” items—basically anything except audio/visual files. All Library of Congress subjects are represented, but the largest treasures lie in the Language & Literature, Philosophy, Religion, History, and Social Sciences chambers. All volumes not restricted by copyright are free for anyone to read. Just over half the works are in English, while the rest span over 400 languages, including some that are now extinct. Ninety-five percent were scanned from print by Google, but a few specialized collections were contributed by individuals or institutions. The Collection page offers several sample collections to get you started, or you can build your own. Have fun browsing their collections, and with luck the Internet Archive will be back up and running in no time.
Cynthia Murrell, October 30, 2024
FOGINT: Internet Service Providers in the Hot Box
October 9, 2024
The only smart software involved in producing this short FOGINT post was Microsoft Copilot’s estimable art generation tool. Why? It is offered at no cost.
For several years, I have used the term “ghost providers” to describe online service providers as enablers of online crime. The advent of virtual machines and virtual servers operated by customers who just pay a monthly fee and do everything themselves provides a great foggy ground cover. If an investigator speaks with one of these providers, the response includes variations of “We don’t know” and “No clue, bro.” The reason is that the service provider provides access to a system, includes no support, and leaves it up to the person paying the bill to be the cook, bottlewasher, and janitor. These outfits are in the service business with a range of offerings: full service to DIY.
“Oh, we cannot see what is on the virtual machines working as virtual servers,” says the bright ISP operator. Thanks, MSFT Copilot. That’s pretty lousy fog if I say so myself.
Italy wants to take action to prevent enablers who provide ghost services with bare metal and zero service other than pings, plumbing, and power. “ISPs ‘Betrayed’ Over Pirate Site-Blocking Threats, The Reckoning Will Be Invisible” reports that Italy’s
advanced legal weaponry is incapable of dealing with distant pirate IPTV services. Instead, it mainly targets communications infrastructure, much of it operated by rightsholders’ supposed allies – ISPs – who were given no say in the matter.
Torrent Freak’s view of the law is somewhat reserved, even skeptical. The cited article continues:
if pirate sites share an IP address with entirely innocent sites, and the innocent sites are outnumbered, ISPs, VPNs and DNS services will be legally required to block them all. Since nobody ever passes bad law and good laws hurt no one, blocking innocent sites can be conducted guilt-free from the moral high ground.
Among those with a strong view of the law is Giovanni Zorzoni, president of the Italian Internet Provider Association. No big surprise, FOGINT surmises. The article quotes him as saying:
“Irresponsible initiative that, in the sole interest of the football lobby, tramples on operators, [AGCOM] and the Internet ecosystem,” he said. “Thanks to the new law, they will be able to block sites that are no longer exclusively, but also ‘mainly’ used to distribute illegal content, substantially widening the scope of [rightsholders’] discretion. It may therefore happen, much more frequently, that even legitimate addresses that are only accidentally used for the transmission of pirated content are blocked,” Zorzoni added.
Google offered some input which Torrent Freak presented; to wit:
Diego Ciulli, Head of Government Affairs and Public Policy at Google in Italy, expressed concern over the likely effect on the justice system in Italy should Google be required to comply. Under the label of “fighting piracy”, Ciulli said that digital platforms will be required to notify the judicial authorities of ALL copyright infringements – present, past and future – when they become aware of them. That could be a problem. “Do you know how many there are in the case of Google? At the moment, 9,756,931,770. In short, the Senate is asking us to flood the judicial authorities with almost 10 billion URLs – and provides for prison if we miss a single notification. If the law is not amended, the risk is to do the opposite of the spirit of the law: clog up the judicial authorities, and take resources away from the fight against piracy,” he warned.
Yep, imagine if ISPs had to block packets containing information directly linked to illegal activities. That, it seems, would be a lot of work for the ISPs to do.
Several observations:
- Some service providers are known for their willingness to facilitate content which breaks laws
- The “virtualization” of “services” provides a 24×7 disco dance fog machine to hide certain activities from staff, other customers, and government authorities
- The money derived from the customers who exploit the willful obfuscation makes the service provider business tick.
Is the Italian law a remedy? No. Will other countries crank up regulation of ISPs? Yes. But after decades of a digital Wild West, fences will not be erected overnight. As a result, the black sheep will roam among wild ponies and make a range of online crimes possible and lucrative. That’s quite a marketing position for some firms.
Stephen E Arnold, October 9, 2024
Rapid Change: The Technological Meteor Causing Craziness
September 6, 2024
This essay is the work of a dumb dinobaby. No smart software required.
The mantra “Move fast and break things” creates opportunities for entrepreneurs and mental health professionals. “Eminent Scientist Richard Dawkins Reveals Fascinating Theory Behind West’s Mental Health Crisis” quotes Dr. Dawkins:
‘Certainly, the rate at which we are evolving genetically is miniscule compared to the rate at which we are evolving non-genetically, culturally,’ Dawkins told the hosts of the TRIGGERnometry podcast. ‘And much of the mental illness that afflicts people may be because we are in a constantly changing unpredictable environment,’ the biologist added, ‘in a way that our ancestors were not.’
Thanks, Microsoft Copilot. Is that a Windows Phone doing the flame out thing?
The write up reports:
Dawkins expressed more direct concerns with other aspects of human technology’s impact on evolution: climate change and basic self-reliance in the face of a new Dark Age. ‘The internet is a huge change, it’s gigantic change,’ he noted. ‘We’ve become adapted to it with astonishing rapidity.’ ‘If we lost electricity, if we suddenly lost the technology we’re used to,’ Dawkins worried, humanity might not be able to even ‘begin’ to adapt in time, without great social upheaval and death… ‘Man-made extinction,’ he said, ‘it’s just as bad as the others. I think it’s tragic.’
There you go, death.
I know that brilliant people often speak carefully. Experts take time to develop their knowledge base and put words together that make complex ideas easy to understand.
From my redoubt in rural Kentucky, I have watched the panoply of events parading across my computer monitor. Among the notable moments were:
- Images from US cities showing homeless people slumped over either scrolling on their mobile phones or from the impact of certain compounds on their body
- Young people looting stores and noting similar items offered for sale on Craigslist.com-type sites
- Graphs of US academic performance illustrating the winners and losers of educational achievement tests
- The number of people driving around at times I associated with being in an office at “work” when I was younger
- Advertisements for prescription drugs with peculiar names and high-resolution images of people with smiles and contented lives but for the unnamed disease plaguing the otherwise cheerful folk.
What are the links between these unrelated situations and online access? I think I have a reasonably good idea. Why have experts, parents, and others required decades to figure out that flows of information are similar to sand-blasting systems? Provide electronic information to an organization, and it begins to decompose. The “bonds” which hold the people, processes, and products together are weakened. Then some break. Pump electronic information into younger people. They begin to come apart too. Give college students a tool to write their essays. Like lemmings, many take the AI solution and watch TikToks.
I am pleased that Dr. Dawkins has identified a problem. Now what’s the fix? The digital meteor has collided with human civilization. Can the dinosaurs be revivified?
Stephen E Arnold, September 6, 2024
New Research about Telegram and Its Technology
August 29, 2024
This essay is the work of a dumb dinobaby. No smart software required.
Next week, my team and I will be presenting a couple of lectures to a group of US government cyber experts. Our topic is Telegram, which has been a focal point of my research team for most of 2024. Much of the information we have included in our talks will be new; that is, it presents a view of Telegram which is novel. However, we have available a public version of the material. Most of our work is delivered via video conferencing with PDFs of selected exhibits provided to those participating in a public version of our research.
For the Telegram project, the public lecture includes:
- A block diagram of the Telegram distributed system, including the crypto and social media components
- A timeline of Telegram innovations with important or high-impact innovations identified
- A flow diagram of the Open Network and its principal components
- Likely “next steps” for the distributed operation.
With the first stage of the French judiciary process involving the founder of Telegram completed, our research project has become one of the first operational analyses of a service that is unfamiliar to many people outside of the Russian Federation, Ukraine, and other countries. Although usage of Telegram in North America is increasing, the service is off the radar of many people.
In fact, knowledge of Telegram’s basic functions is sketchy. Our research revealed that users lack knowledge of:
- Telegram’s approach to encryption
- The role US companies play in keeping the service online and stable
- The automation features of the system
- The reach of certain Telegram dApps (distributed applications) and YouTube, to cite one example.
The public version of our presentation to the US government professionals will be available in mid-September 2024. If you are interested in this lecture, please write benkent2020 at yahoo dot com. One of the Beyond Search team will respond to your inquiry with dates and fees, if applicable.
Stephen E Arnold, August 29, 2024
Online Sports Gambling: Some Negatives Have Been Identified by Brilliant Researchers
August 29, 2024
This essay is the work of a dumb dinobaby. No smart software required.
People love gambling, especially when they’re betting on the results of sports. Online has made sports betting very easy and fun. Unfortunately some people who bet on sports are addicted to the activity. Business Insider reveals the underbelly of online gambling and paints a familiar picture of addiction: “It’s Official: Legalized Sports Betting Is Destroying Young Men’s Financial Futures.” The University of California, Los Angeles shared a working paper about the negative effects of legalized sports gambling:
“…takes a look at what’s happened to consumer financial health in the 38 states that have greenlighted sports betting since the Supreme Court in 2018 struck down a federal law prohibiting it. The findings are, well, rough. The researchers found that the average credit score in states that legalized any form of sports gambling decreased by 0.3% after about four years and that the negative impact was stronger where online sports gambling is allowed, with credit scores dipping in those areas by 1%. They also found an 8% increase in debt-collection amounts and a 28% increase in bankruptcies where online sports betting was given the go-ahead. By their estimation, that translates to about 100,000 extra bankruptcies each year in the states that have legalized sports betting. The number of people who fell dangerously behind on their car loans went up, too. Oddly enough, credit-card delinquencies fell, but the researchers believe that’s because banks wind up lowering credit limits to try to compensate for the rise in risky consumer behavior.”
The researchers discovered that legalized gambling leads to more gambling addictions. They also found that if a person lives near a casino or is from a poor region, that person is more prone to gambling. This isn’t anything new! The paper restates information people have known for centuries about gambling and other addictions: it hurts finances and leads to destroyed relationships, job loss, increases in illegal activities, etc.
A good idea is to teach people restraint. The sports betting Web sites can program limits and even assist their users to manage their money without going bankrupt. It’s better for people to be taught restraint so they can roll the dice one more time.
Stephen E Arnold, August 29, 2024
Yep, the Old Internet Is Gone. Learn to Love the New Internet
August 1, 2024
This essay is the work of a dumb humanoid. No smart software required.
The market has given the Google the green light to restrict information. The information highway has a new on ramp. If you want content created by people who were not compensated, you have to use Google search. Toss in the advertising system and that good old free market is going to deliver bumper revenue to stakeholders.
Online search is a problem. Here’s an old timer like me who broke his leg. The young wizard who works at a large online services firm explains that I should not worry. By the time my leg heals, I will be dead. Happy thoughts from one of those Gen somethings. Thanks, MSFT Copilot. How are your security systems today?
What about users? The reality is that with Google the default search system in Apple iPhones, the brand that has redefined search and retrieval to mean “pay to play,” what’s the big deal?
Years ago I explained in numerous speeches and articles in publications like Online Magazine that online fosters the creation of centralized monopolistic information services. Some information professionals dismissed my observation as stupid. The general response was that online would generate benefits. I agree. But there were a few downsides. I usually pointed to the duopoly in online for fee legal information. I referenced the American Chemical Society’s online service Chemical Abstracts. I even pointed out that outfits like Predicasts and the New York Times would have a very, very tough time creating profitable information centric standalone businesses. The centralization or magnetic pull of certain online services would make generating profits very expensive.
So where are we now? I read “Reddit, Google, and the Real Cost of the AI Data Rush.” The article is representative of “real” journalists’, pundits’, and some regulators’ understanding of online information. The write up says:
Google, like Reddit, owes its existence and success to the principles and practices of the open web, but exclusive arrangements like these mark the end of that long and incredibly fruitful era. They’re also a sign of things to come. The web was already in rough shape, reduced over the last 15 years by the rise of walled-off platforms, battered by advertising consolidation, and polluted by a glut of content from the AI products that used it for training. The rise of AI scraping threatens to finish the job, collapsing a flawed but enormously successful, decades-long experiment in open networking and human communication to a set of antagonistic contracts between warring tech firms.
I want to point out that Google bought rights to Reddit. If you want to search Reddit, you use Google. Because Reddit is a high traffic site, users have to use Google. Guess what? Most online users do not care. Search means Google. Information access means Google. Finding a restaurant means Google. Period.
Google has become a center of gravity in the online universe. One can argue that Google is the Internet. In my monograph Google Version 2.0: The Calculating Predator that is exactly what some Googlers envisioned for the firm. Once a user accesses Google, Google controls the information world. One can argue that Meta and TikTok are going to prevent that. Some folks suggest that one of the AI start ups will neutralize Google’s centralized gravitational force. Google is a distributed outfit. Think of it as like the background radiation in our universe. It is just there. Live with it.
Google has converted content created by people who were not compensated into zeros and ones that will enhance its magnetic pull on users.
Several observations:
- Users were so enamored of a service which could show useful results from the quite large and very disorganized pools of digital information that it sucked the life out of its competitors.
- Once a funding source got the message through to the Backrub boys that they had to monetize, the company obtained inspiration from the Yahoo pay to play model which Yahoo acquired from Overture.com, formerly GoTo.com. That pay to play thing produces lots of money when there is traffic. Google was getting traffic.
- Regulators ignored Google’s slow but steady march to information dominance. In fact, some regulatory professionals with whom I spoke thought Google was the cat’s pajamas and asked me if I could get them Google T shirts for their kids. Google was not evil; it was fun; it was success.
- Almost the entire world’s intelligence professionals rely on Google for OSINT. If you don’t know what that means, forget the term. Knowing the control Google can exert by filtering information on a topic will probably give you a tummy ache.
The future is going to look exactly like the world of online in the year 1980. Google and maybe a couple of smaller also rans will control access to digital information. To get advertising free and to have a shot at bias free answers to online queries, users will have to pay. The currency will be watching advertising or subscribing to a premium service. The business model of Dialog Information Services, SDC, DataStar, and Dialcom is coming back. The prices will inflate. Control of information will be easy. And shaping or weaponizing content flow from these next generation online services will be too profitable to resist. Learn to love Google. It is more powerful than a single country’s government. If a country gets too frisky for Google’s liking, the company has ways to evade issues that make it uncomfortable.
The cartoon in this blog post summarizes my view of the situation. A fix will take a long time. I will be pushing up petunias before the problems of online search and the Information Superhighway are remediated.
Stephen E Arnold, August 1, 2024
Prompt Tips and Query Refinements
July 29, 2024
Generative AI is paving the way for more automation, smarter decisions, and (possibly) an easier world. AI is still pretty stupid, however, and it needs to be hand fed information to make it work well. Dr. Lance B. Eliot is an AI expert and he contributed, “The Best Engineering Techniques For Getting The Most Out Of Generative AI” for Forbes.
Eliot explains that prompt engineering is the best way to make generative AI work well. He developed a list of how to write prompts and related skills. The list is designed to be a quick, easy tutorial that is also equipped with links for more information related to the prompt. Eliot’s first tip is to keep the prompt simple, direct, and obvious; otherwise the AI will misunderstand your intent.
He then rattles off a bunch of rhetoric that reads like it was written by generative AI. Maybe it was? In short, it’s good to learn how to write prompts to prepare for the future. He runs through the list alphabetically, then, as if that were not enough, Eliot lists the prompts numerically:
“I didn’t number them because I was worried that the numbering would imply a semblance of importance or priority. I wanted the above listing to seem that all the techniques are on an equal footing. None is more precious than any of the others.
Lamentably, not having numbers makes life harder when wanting to quickly refer to a particular prompt engineering technique. So, I am going to go ahead and show you the list again and this time include assigned numbers. The list will still be in alphabetical order. The numbering is purely for ease of reference and has no bearing on priority or importance.”
The list is a rundown of psychological and intercommunication methods used by humans. A lot of big words are used, but the explanations were written by a tech-savvy expert for his fellow tech people. In layman’s terms, the list explains that any technique will work. Here’s one from me: use generative AI to simplify the article. Here’s a paradox prompt: if you feed generative AI a prompt written by generative AI, will it explode?
Whitney Grace, July 29, 2024
Stop Indexing! And Pay Up!
July 17, 2024
This essay is the work of a dinobaby. Unlike some folks, no smart software improved my native ineptness.
I read “Apple, Nvidia, Anthropic Used Thousands of Swiped YouTube Videos to Train AI.” The write up appears in two online publications, presumably to make an already contentious subject more clicky. The assertion in the title is the equivalent of someone in Salem, Massachusetts, pointing at a widow and saying, “She’s a witch.” Those willing to take the statement at face value would take action: the “trials” held in colonial Massachusetts. My high school history teacher was a witchcraft trial buff. (I think his name was Elmer Skaggs.) I thought about his descriptions of the events. I recall his graphic depictions and analysis of what I recall as “dunking.” The idea was that if a person was a witch, then that person could be immersed one or more times. I think the idea had been popular in medieval Europe, so it was not a New World innovation. Me-too is a core way to create novelty. The witch could survive being immersed for a period of time. With proof, hanging or burning was the next step. The accused who died was obviously not a witch. That’s Boolean logic in a pure form in my opinion.
The Library in Alexandria burns in front of people who wanted to look up information, learn, and create more information. Tough. Once the cultural institution is gone, just figure out the square root of two yourself. Thanks, MSFT Copilot. Good enough.
The accusations and evidence in the article depict companies building large language models as candidates for a test to prove that they have engaged in an improper act. The crime is processing content available on a public network, indexing it, and using the data to create outputs. Since the late 1960s, digitizing information and making it more easily accessible was perceived as an important and necessary activity. The US government supported indexing and searching of technical information. Other fields of endeavor recognized that as the volume of information expanded, the traditional methods of sitting at a table, reading a book or journal article, making notes, analyzing the information, and then conducting additional research or writing a technical report was simply not fast enough. What worked in a medieval library was not a method suited to put a satellite in orbit or perform other knowledge-value tasks.
Thus, online became a thing. Remember, we are talking punched cards, mainframes, and clunky line printers. Then one day there was the Internet. The interest in broader access to online information grew, and by 1985, people recognized that online access was useful for many tasks, not just looking up information about nuclear power technologies, a project I worked on in the 1970s. Flash forward 50 years, and we are upon the moment one can read about the “fact” that Apple, Nvidia, Anthropic Used Thousands of Swiped YouTube Videos to Train AI.
The write up says:
AI companies are generally secretive about their sources of training data, but an investigation by Proof News found some of the wealthiest AI companies in the world have used material from thousands of YouTube videos to train AI. Companies did so despite YouTube’s rules against harvesting materials from the platform without permission. Our investigation found that subtitles from 173,536 YouTube videos, siphoned from more than 48,000 channels, were used by Silicon Valley heavyweights, including Anthropic, Nvidia, Apple, and Salesforce.
I understand the surprise some experience when they learn that a software script visits a Web site, processes its content, and generates an index. (A buzzy term today is large language model, but I prefer the simpler word index.)
I want to point out that for decades those engaged in making information findable and accessible online have processed content so that a user can enter a query and get a list of indexed items which match that user’s query. In the old days, one used Boolean logic which we met a few moments ago. Today a user’s query (the jazzy term is prompt now) is expanded, interpreted, matched to the user’s “preferences”, and a result generated. I like lists of items like the entries I used to make on a notecard when I was a high school debate team member. Others want little essays suitable for a class assignment on the Salem witchcraft trials in Mr. Skaggs’s class. Today another system can pass a query, get outputs, and then take another action. This is described by the in-crowd as workflow orchestration. Others call it, “taking a human’s job.”
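For readers who have never seen the old-school approach, here is a minimal sketch of indexing plus a Boolean query. It is an illustration only, not the method of any system named in this post. The function names (`build_index`, `boolean_and`, `boolean_or`, `boolean_not`) and the three sample documents are made up for the example, and the tokenizer is assumed to be nothing fancier than lowercasing and splitting on whitespace.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercase term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def boolean_and(index, *terms):
    """Ids of documents that contain every term."""
    postings = [index.get(term.lower(), set()) for term in terms]
    return set.intersection(*postings) if postings else set()


def boolean_or(index, *terms):
    """Ids of documents that contain at least one term."""
    result = set()
    for term in terms:
        result |= index.get(term.lower(), set())
    return result


def boolean_not(index, all_ids, term):
    """Ids of documents that do not contain the term."""
    return set(all_ids) - index.get(term.lower(), set())


# Hypothetical sample documents, invented for this sketch.
docs = {
    1: "nuclear power plant safety report",
    2: "salem witchcraft trials essay",
    3: "nuclear alloy research report",
}
index = build_index(docs)

# The user gets back a list of matching items, not an essay.
print(sorted(boolean_and(index, "nuclear", "report")))
print(sorted(boolean_or(index, "salem", "alloy")))
print(sorted(boolean_not(index, docs, "nuclear")))
```

The result is a list of matching document ids, the notecard-style output described above. Everything layered on top (query expansion, preference matching, generated essays) still starts from a lookup structure along these lines.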
My point is that for decades, the index and searching process has been without much innovation. Sure, software scripts can know when to enter a user name and password or capture information from Web pages that are transitory, disappearing in the blink of an eye. But it is still indexing over a network. The object remains to find information of utility to the user or another system.
The write up reports:
Proof News contributor Alex Reisner obtained a copy of Books3, another Pile dataset and last year published a piece in The Atlantic reporting his finding that more than 180,000 books, including those written by Margaret Atwood, Michael Pollan, and Zadie Smith, had been lifted. Many authors have since sued AI companies for the unauthorized use of their work and alleged copyright violations. Similar cases have since snowballed, and the platform hosting Books3 has taken it down. In response to the suits, defendants such as Meta, OpenAI, and Bloomberg have argued their actions constitute fair use. A case against EleutherAI, which originally scraped the books and made them public, was voluntarily dismissed by the plaintiffs. Litigation in remaining cases remains in the early stages, leaving the questions surrounding permission and payment unresolved. The Pile has since been removed from its official download site, but it’s still available on file sharing services.
The passage does a good job of making clear that most people are not aware of what indexing does, how it works, and why the process has become a fundamental component of many, many modern knowledge-centric systems. The idea is to find information of value to a person with a question, present relevant content, and enable the user to think new thoughts or write another essay about dead witches being innocent.
The challenge today is that anyone who has written anything wants money. The way online works is that for any single user’s query, the useful information constitutes a tiny, minuscule fraction of the information in the index. The cost of indexing and responding to the query is high, and those costs are difficult to control.
But everyone has to be paid for the information that individual “created.” I understand the idea, but the reality is that the reason indexing, search, and retrieval was invented, refined, and given numerous life extensions was to perform a core function: Answer a question or enable learning.
The write up makes it clear that “AI companies” are witches. The US legal system is going to determine who is a witch just like the process in colonial Salem. Several observations are warranted:
- What is a fundamental mechanism for information retrieval may be difficult to replace or re-invent in a quick, cost-efficient, and satisfactory manner. Digital information is loosey goosey; that is, it moves, slips, and slides either by an individual’s actions or a mindless system’s.
- Slapping fines and big price tags on what remains an access service will take time to have an impact. As the implications of the impact become more well known to those who are aggrieved, they may find that their own information is altered in a fundamental way. How many research papers are “original”? How many journalists recycle as a basic work task? How many children’s lives are lost when the medical reference system does not have the data needed to treat the kid’s problem?
- Accusing companies of behaving improperly is definitely easy to do. Many companies do ignore rules, regulations, and cultural norms. Engineering Index’s publisher learned that bootleg copies of printed Compendex indexes were available in China. What was Engineering Index going to do when I learned this almost 50 years ago? The answer was give speeches, complain to those who knew what the heck a Compendex was, and talk to lawyers. What happened to the Chinese content pirates? Not much.
I do understand the anger the essay expresses toward large companies doing indexing. To some, these outfits are witches. However, if the indexing of content is derailed, I would suggest there are downstream consequences. Some of those consequences will make zero difference to anyone. A government worker at a national lab won’t be able to find details of an alloy used in a nuclear device. Who cares? Make some phone calls? Ask around. Yeah, that will work until the information is needed immediately.
A student accustomed to looking up information on a mobile phone won’t be able to find something. The document is a 404 or the information returned is an ad for a Temu product. So what? The kid will have to go to the library, which one hopes will be funded, have printed material or commercial online databases, and a librarian on duty. (Good luck, traditional researchers.) A marketing team eager to get information about the number of Telegram users in Ukraine won’t be able to find it. The fix is to hire a consultant and hope those bright men and women have a way to get a number, a single number, good, bad, or indifferent.
My concern is that as the intensity of the objections about a standard procedure for building an index escalates, the entire knowledge environment is put at risk. I have worked in online since 1962. That’s a long time. It is amazing to me that the plumbing of an information economy has been ignored for a long time. What happens when the companies doing the indexing go away? What happens when those producing the government reports, the blog posts, or the “real” news cannot find the information needed to create information? And once some information is created, how is another person going to find it? Ask an eighth grader how to use an online catalog to find a fungible book. Let me know what you learn. Better yet, do you know how to use a Remac card retrieval system?
The present concern about information access troubles me. There are mechanisms to deal with online. But the reason content is digitized is to find it, to enable understanding, and to create new information. Digital information is like gerbils. Start with a couple of journal articles, and one ends up with more journal articles. Kill this access and you get what you wanted. You know exactly who is the Salem witch.
Stephen E Arnold, July 17, 2024
Does Google Have a Monopoly? Does AI Search Make a Difference?
July 9, 2024
I read “2024 Zero-Click Search Study: For Every 1,000 EU Google Searches, Only 374 Clicks Go to the Open Web. In the US, It’s 360.” The write up begins with caveats — many caveats. But I think I am not into the search engine optimization and online advertising mindset. As a dinobaby, I find the pursuit of clicks in a game controlled by one outfit of little interest.
Is it possible that what looks like a nice family vacation place is a digital roach motel? Of course not! Thanks, MSFT Copilot. Good enough.
Let’s answer the two questions the information in the report from the admirably named SparkToro presents. In my take on the article, the charts, and the buzzy jargon, the answer to the question, “Does Google Have a Monopoly?” is, “Wow, do they.”
For the second question I posed, “Does AI Search Make a Difference in Google Traffic?” the answer is, “A snowball’s chance in hell is better.”
The report and analysis take me to close enough for horseshoes factoids. But that’s okay because the lack of detailed, reliable data is part of the way online operates. No one really knows if the clicks from a mobile device are generated by a nepo baby with money to burn or a bank of 1,000 mobile devices mindlessly clicking on Web destinations. Factoids about online activity are, at best, fuzzy. I think SEO experts should wear T shirts and hats with this slogan, “Heisenberg rocks. I am uncertain.”
I urge you to read and study the SparkToro analysis. (I love that name. An electric bull!)
The article points out that Google gets a lot of clicks. Here’s a passage which knits together several facts from the study:
Google gets 1/3 of the clicks. Imagine a burger joint selling 33 percent of the burgers worldwide. Could they get more? Yep. How much more:
Equally concerning, especially for those worried about Google’s monopoly power to self-preference their own properties in the results, is that almost 30% of all clicks go to platforms Google owns. YouTube, Google Images, Google Maps, Google Flights, Google Hotels, the Google App Store, and dozens more means that Google gets even more monetization and sector-dominating power from their search engine. Most interesting to web publishers, entrepreneurs, creators, and (hopefully) regulators is the final number: for every 1,000 searches on Google in the United States, 360 clicks make it to a non-Google-owned, non-Google-ad-paying property. Nearly 2/3rds of all searches stay inside the Google ecosystem after making a query.
The write up also presents information which suggests that the European Union’s regulations don’t make much difference in the click flow. Sorry, EU. You need another approach, perhaps?
In the US, users of Google have a tough time escaping what might be colorfully named the “digital roach motel.”
Search behavior in both regions is quite similar with the exception of paid ads (EU mobile searchers are almost 50% more likely to click a Google paid search ad) and clicks to Google properties (where US searchers are considerably more likely to find themselves back in Google’s ecosystem after a query).
The write up presented by SparkToro (Is it like the energizer bunny?) answers a question many investors and venture firms with stakes in smart software are asking: “Is Google losing search traffic?” The answer is, “Nope. Not a chance.”
According to Datos’ panel, Google’s in no risk of losing market share, total searches, or searches per searcher. On all of these metrics they are, in fact, stronger than ever. In both the US and EU, searches per searcher are rising and, in the Spring of 2024, were at historic highs. That data doesn’t fit well with the narrative that Google’s cost themselves credibility or that Internet users are giving up on Google and seeking out alternatives. … Google continues to send less and less of its ever-growing search pie to the open web…. After a decline in 2022 and early 2023, Google’s back to referring a historically high amount of its search clicks to its own properties.
AI search has not been the game changer for which some hoped.
Net net: I find it interesting that data about what appears to be a monopoly is so darned sketchy after more than two decades of operation. For Web search start ups, it may be time to rethink some of those assertions in those PowerPoint decks.
Stephen E Arnold, July 9, 2024
Encryption Battles Continue
June 4, 2024
This essay is the work of a dinobaby. Unlike some folks, no smart software improved my native ineptness.
Privacy protections are great, unless you are law enforcement attempting to trace a bad actor. India has tried to make it easier to enforce its laws by forcing messaging apps to track each message back to its source. That is challenging for a platform with encryption baked in, as Rest of World reports in, “WhatsApp Gives India an Ultimatum on Encryption.” Writer Russell Brandom tells us:
“IT rules passed by India in 2021 require services like WhatsApp to maintain ‘traceability’ for all messages, allowing authorities to follow forwarded messages to the ‘first originator’ of the text. In a Delhi High Court proceeding last Thursday, WhatsApp said it would be forced to leave the country if the court required traceability, as doing so would mean breaking end-to-end encryption. It’s a common stance for encrypted chat services generally, and WhatsApp has made this threat before — most notably in a protracted legal fight in Brazil that resulted in intermittent bans. But as the Indian government expands its powers over online speech, the threat of a full-scale ban is closer than it’s been in years.”
And that could be a problem for a lot of people. We also learn:
“WhatsApp is used by more than half a billion people in India — not just as a chat app, but as a doctor’s office, a campaigning tool, and the backbone of countless small businesses and service jobs. There’s no clear competitor to fill its shoes, so if the app is shut down in India, much of the digital infrastructure of the nation would simply disappear. Being forced out of the country would be bad for WhatsApp, but it would be disastrous for everyday Indians.”
Yes, that sounds bad. For the Electronic Frontier Foundation, it gets worse: The civil liberties organization insists the regulation would violate privacy and free expression for all users, not just suspected criminals.
To be fair, WhatsApp has done a few things to limit harmful content. It has placed limits on message forwarding and has boosted its spam and disinformation reporting systems. Still, there is only so much it can do when enforcement relies on user reports. To do more would require violating the platform’s hallmark: its end-to-end encryption. Even if WhatsApp wins this round, Brandom notes, the issue is likely to come up again when and if the Bharatiya Janata Party does well in the current elections.
Cynthia Murrell, June 4, 2024