It Works for SEO and Narcotics… and Academics
February 14, 2024
This essay is the work of a dumb dinobaby. No smart software required.
Academic research papers that have been cited often are probably credible, right? These days, not so much. Science reports, “Citation Cartels Help Some Mathematicians – and their Universities – Climb the Rankings.” Referring to an analysis by University of Vigo’s Domingo Docampo, writer Michele Catanzaro tells us:
“Cliques of mathematicians at institutions in China, Saudi Arabia, and elsewhere have been artificially boosting their colleagues’ citation counts by churning out low-quality papers that repeatedly reference their work, according to an unpublished analysis seen by Science. As a result, their universities—some of which do not appear to have math departments—now produce a greater number of highly cited math papers each year than schools with a strong track record in the field, such as Stanford and Princeton universities. These so-called ‘citation cartels’ appear to be trying to improve their universities’ rankings, according to experts in publication practices. ‘The stakes are high—movements in the rankings can cost or make universities tens of millions of dollars,’ says Cameron Neylon, a professor of research communication at Curtin University. ‘It is inevitable that people will bend and break the rules to improve their standing.’ In response to such practices, the publishing analytics company Clarivate has excluded the entire field of math from the most recent edition of its influential list of authors of highly cited papers, released in November 2023.”
Thanks MSFT Copilot Bing thing. You are mostly working today. Actually well enough for good enough art.
Researchers say this manipulation occurs across disciplines, but the relatively low number of published math papers makes it more obvious in that field. When Docampo noticed the trend, the mathematician analyzed 15 years’ worth of Clarivate’s data to determine which universities were publishing highly cited math papers and who was citing them. Back in 2008 – 2010, legitimate heavy hitters like UCLA and Princeton were at the top of the cited list. But in the last few years those schools were surpassed by institutions not exactly known for their mathematics prowess. Many were based in China, Saudi Arabia, and Egypt. And, yes, those citations were coming from inside the writers’ own schools. Sneaky. But not sneaky enough.
There may again come a time when citations can be used as a metric for reliability. Docampo is working on a system to weigh citations according to the quality of the citing journals and institutions. Until then, everyone should take citation counts with a grain of salt.
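As a rough illustration of how such weighting might work (a hypothetical sketch, not Docampo's unpublished method), consider discounting citations that come from low-quality journals or from the cited author's own institution. The journal scores, discount factor, and function names below are all invented.

```python
# Hypothetical sketch of weighted citation counting. Not Docampo's actual method.
# Each citation counts less when it comes from a low-quality journal or from the
# cited author's own institution. All numbers are invented for illustration.

def weighted_citation_score(citations, journal_quality, self_cite_discount=0.1):
    """citations: list of dicts like {"journal": "...", "same_institution": True}
       journal_quality: dict mapping journal name -> weight in [0, 1]
       self_cite_discount: multiplier applied to same-institution citations"""
    score = 0.0
    for cite in citations:
        weight = journal_quality.get(cite["journal"], 0.2)  # unknown journals count little
        if cite["same_institution"]:
            weight *= self_cite_discount  # cartel-style citations barely move the needle
        score += weight
    return score

# Toy example: 3 citations from a reputable journal outweigh 50 in-house citations
# funneled through a low-quality outlet.
quality = {"Annals of Examples": 1.0, "Bulletin of Padding": 0.05}
organic = [{"journal": "Annals of Examples", "same_institution": False}] * 3
cartel = [{"journal": "Bulletin of Padding", "same_institution": True}] * 50

print(weighted_citation_score(organic, quality))  # 3.0
print(weighted_citation_score(cartel, quality))   # 0.25
```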
Cynthia Murrell, February 14, 2024
A Xoogler Explains AI, News, Inevitability, and Real Business Life
February 13, 2024
This essay is the work of a dumb dinobaby. No smart software required.
I read an essay providing a tiny bit of evidence that one can take the Googler out of the Google, but that Xoogler still retains some Googley DNA. The item appeared in the Bezos bulldozer’s estimable publication with the title “The Real Wolf Menacing the News Business? AI.” Absolutely. Obviously. Who does not understand that?
A high-technology sophist explains the facts of life to a group of listeners who are skeptical about artificial intelligence. The illustration was generated after three tries by Google’s own smart software. I love the miniature horse and the less-than-flattering representation of a sales professional. That individual looks like one who would be more comfortable eating the listeners than convincing them about AI’s value.
The essay contains a number of interesting points. I want to highlight three and then, as I quite enjoy doing, I will offer some observations.
The author is a Xoogler who served from 2017 to 2023 as the senior director of news ecosystem products. I quite like the idea of a “news ecosystem.” But ecosystems, as some who follow the impact of man on environments know, can be destroyed or pushed to the edge of catastrophe. In the aftermath of devastation coming from indifferent decision makers, greed-fueled entrepreneurs, or rhinoceros poachers, landscapes are often transformed.
First, the essay writer argues:
The news publishing industry has always reviled new technology, whether it was radio or television, the internet or, now, generative artificial intelligence.
I love the word “revile.” It suggests that ignorant individuals are unable to grasp the value of certain technologies. I also like the very clever use of the word “always.” Categorical affirmatives make the world of zeros and ones so delightfully absolute. We’re off to a good start, I think.
Second, we have a remarkable argument which invokes another zero and one type of thinking. Consider this passage:
The publishers’ complaints were premised on the idea that web platforms such as Google and Facebook were stealing from them by posting — or even allowing publishers to post — headlines and blurbs linking to their stories. This was always a silly complaint because of a universal truism of the internet: Everybody wants traffic!
I love those universal truisms. I think some at Google honestly believe that their insights, perceptions, and beliefs are the One True Path Forward. Confidence is good, but the implication that a universal truism exists strikes me as information about a psychological and intellectual aberration. Consider this truism offered by my uneducated great grandmother:
Always get a second opinion.
My great grandmother used the logically troublesome word “always.” The idea seems reasonable, but the action may not be possible. Does Google get second opinions when it decides to kill one of its services, modify algorithms in its ad brokering system, or reorganize its contentious smart software units? “Always” opens the door to many issues.
Publishers (I assume “all” publishers) want traffic. May I demonstrate the frailty of the Xoogler’s argument? I publish a blog called Beyond Search. I have done this since 2008. I do not care if I get traffic or not. My goal was and remains to present commentary about the antics of high-technology companies and related subjects. Why do I do this? First, I want to make sure that my views about such topics as Google search exist. Second, I have set up my estate so the content will remain online long after I am gone. I am a publisher, and I don’t want traffic, or at least the type of traffic that Google provides. One exception causes an argument like the Xoogler’s to be shown as false, even if it is self-serving.
Third, the essay points its self-righteous finger at “regulators.” The essay suggests that elected officials pursued “illegitimate complaints” from publishers. I noted this passage:
Prior to these laws, no one ever asked permission to link to a website or paid to do so. Quite the contrary, if anyone got paid, it was the party doing the linking. Why? Because everybody wants traffic! After all, this is why advertising businesses — publishers and platforms alike — can exist in the first place. They offer distribution to advertisers, and the advertisers pay them because distribution is valuable and seldom free.
Repetition is okay, but I am able to recall one of the key arguments in this Xoogler’s write up: “Everybody wants traffic.” Since it is false, I am not sure the essay’s argumentative trajectory is on the track of logic.
Now we come to the guts of the essay: Artificial intelligence. What’s interesting is that AI magnetically pulls regulators back to the casino. Smart software companies face techno-feudalists in a high-stakes game. I noted this passage about grounding (anchoring statements via verification) versus merely training algorithms:
The courts might or might not find this distinction between training and grounding compelling. If they don’t, Congress must step in. By legislating copyright protection for content used by AI for grounding purposes, Congress has an opportunity to create a copyright framework that achieves many competing social goals. It would permit continued innovation in artificial intelligence via the training and testing of LLMs; it would require licensing of content that AI applications use to verify their statements or look up new facts; and those licensing payments would financially sustain and incentivize the news media’s most important work — the discovery and verification of new information — rather than forcing the tech industry to make blanket payments for rewrites of what is already long known.
Who owns the casino? At this time, I would suggest that lobbyists and certain non-governmental entities exert considerable influence over some elected and appointed officials. Furthermore, some AI firms are moving as quickly as reasonably possible to convert interest in AI into revenue streams with moats. The idea is that if regulations curtail AI companies, consumers would not be well served. No 20-something wants to read a newspaper. That individual wants convenience and, of course, advertising.
Now several observations:
- The Xoogler author believes in AI going fast. The technology serves users / customers what they want. The downsides are bleats and shrieks from an outmoded sector; that is, those engaged in news.
- The logic of the technologist is not the logic of a person who prefers nuances. The broad statements are false to me, for example. But to the Xoogler, these are self-evident truths. Get with our program or get left to sleep on cardboard in the street.
- The schism smart software creates is palpable. On one hand, there are those who “get it.” On the other hand, there are those who fight a meaningless battle with the inevitable. There’s only one problem: Technology is not delivering better, faster, or cheaper social fabrics. Technology seems to have some downsides. Just ask a journalist trying to survive on YouTube earnings.
Net net: The attitude of the Xoogler suggests that one cannot shake the sense of being right, entitlement, and logic associated with a Googler even after leaving the firm. The essay makes me uncomfortable for two reasons: [1] I think the author means exactly what is expressed in the essay. News is going to be different. Get with the program or lose big time. And [2] the attitude is one which I find destructive because technology is assumed to “do good.” I am not too sure about that because the benefits of AI are not known and neither are AI’s downsides. Plus, there’s the “everybody wants traffic.” Monopolistic vendors of online ads want me to believe that obvious statement is ground truth. Sorry. I don’t.
Stephen E Arnold, February 13, 2024
AI: Big Ideas and Bigger Challenges for the Next Quarter Century. Maybe, Maybe Not
February 13, 2024
This essay is the work of a dumb dinobaby. No smart software required.
I read an interesting ArXiv.org paper with a good title: “Ten Hard Problems in Artificial Intelligence We Must Get Right.” The topic is one which will interest some policy makers, a number of AI researchers, and the “experts” in machine learning, artificial intelligence, and smart software.
The structure of the paper is, in my opinion, a three-legged stool analysis designed to support the weight of AI optimists. The first part of the paper is a compressed historical review of the AI journey. Diagrams, tables, and charts capture the direction in which AI “deep learning” has traveled. I am no expert in what has become the next big thing, but the surprising point in the historical review is that 2010 is pegged as the starting date, with 2016 marked as the beginning of “the large scale era.” That label is interesting for two reasons. First, I recall that some intelware vendors were in the AI game before 2010. And, second, the use of the phrase “large scale” defines a reality in which small outfits are unlikely to succeed without massive amounts of money.
The second leg of the stool is the identification of the “hard problems” and a discussion of each. Research data and illustrations bring each problem to the reader’s attention. I don’t want to get snagged in the plagiarism swamp which has captured many academics, wives of billionaires, and a few journalists. My approach will be to boil down the 10 problems to a short phrase and a reminder to you, gentle reader, that you should read the paper yourself. Here is my version of the 10 “hard problems” which the authors seem to suggest will be or must be solved in 25 years:
- Humans will have extended AI by 2050
- Humans will have solved problems associated with AI safety, capability, and output accuracy
- AI systems will be safe, controlled, and aligned by 2050
- AI will make contributions in many fields; for example, mathematics by 2050
- AI’s economic impact will be managed effectively by 2050
- Use of AI will be globalized by 2050
- AI will be used in a responsible way by 2050
- Risks associated with AI will be managed effectively by 2050
- Humans will have adapted their institutions to AI by 2050
- Humans will have addressed what it means to be “human” by 2050
Many years ago I worked for a blue-chip consulting firm. I participated in a number of big-idea projects. These ranged across technology, R&D investment, new product development, and the global economy. In our for-fee reports we did include a look at what we called the “horizon.” The firm had its own typographical signature for this portion of a report. I recall learning this approach in the firm’s “charm school” (a special training program to make sure new hires knew the style, approach, and ground rules for remaining employed at that blue-chip firm). We kept the horizon tight; that is, talking about the future was typically in the six to 12 month range. Nosing out 25 years was a walk into a mine field. My boss, as I recall, told me, “We don’t do science fiction.”
The smart robot is informing the philosopher that he is free to find his future elsewhere. The date of the image is 2025, right before the new year holiday. Thanks, MidJourney. Good enough.
The third leg of the stool is the academic impedimenta. To be specific, the paper is 90 pages in length, of which 30 present the argument. The remaining 60 pages present:
- Traditional footnotes, about 35 pages containing 607 citations
- An “Electronic Supplement” presenting eight pages of annexes with text, charts, and graphs
- Footnotes to the “Electronic Supplement” requiring another 10 pages for the additional 174 footnotes.
I want to offer several observations, and I do not want these to be less than constructive or in any way like the treatment one of my professors received in Letters to the Editor for an article he published about Chaucer. He described that fateful letter as “mean spirited.”
- The paper makes clear that mankind has some work to do in the next 25 years. The “problems” the paper presents are difficult ones because they touch upon the fabric of social existence. Consider the application of AI to war. I think this aspect of AI may be one to warrant a bullet on AI’s hit parade.
- Humans have to resolve issues of automated systems consuming verifiable information, synthetic data, and purpose-built disinformation so that smart software does not do things at speed and behind the scenes. Do those working to resolve the 10 challenges have an ethical compass, and if so, what does “ethics” mean in the context of at-scale AI?
- Social institutions are under stress. A number of organizations and nation-states operate as dictatorships. One Central American country has a rock star dictator, but what about the rock star dictators running techno-feudal companies in the US? What governance structures will be crafted by 2050 to shape today’s technology juggernaut?
To sum up, I think the authors have tackled a difficult problem. I commend their effort. My thought is that any message of optimism about AI will be hard pressed to point to one of the 10 challenges and say, “We have this covered.” I liked the write up. I think college students tasked with writing about the social implications of AI will find the paper useful. It provides much of the research a fresh young mind requires to write a paper, possibly a thesis. For me, the paper is a reminder of the disconnect between applied technology and the appallingly inefficient, convenience-embracing humans who are ensnared in the smart software.
I am a dinobaby, and let me tell you, “I am glad I am old.” With AI struggling with go-fast and regulators waffling about go-slow, humankind has quite a bit of social system tinkering to do by 2050 if the authors of the paper have analyzed AI correctly. Yep, I am delighted I am old, really old.
Stephen E Arnold, February 13, 2024
Google Gems: February 5 to 9, 2024
February 13, 2024
This essay is the work of a dumb dinobaby. No smart software required.
Google tallied another bumper week of innovations, news, and management marvels. Let’s take a look.
WE HAVE OUR ACT TOGETHER
The principal story concerns Google’s “answer” to the numerous competitors for smart software. The Gemini subscription service has arrived. Fourteen months after Microsoft caught Googzilla napping near the Foosball table, the quantum supremacy outfit has responded. Google PR received accolades in the Wired article explaining Google’s monumental achievement: A subscription service like OpenAI’s and Microsoft’s.
And in a twist of logic, Google has allegedly alerted users of Gemini (the answer to MSFT and ChatGPT) not to provide confidential or personal data to a Gemini service. With logging, Google’s learning of user behaviors, and users’ general indifference to privacy issues associated with any Web service, why is a special warning needed? “Google Warning: Do Not Divulge Confidential Info or Personal Data When Using Gemini” reports:
Users can also turn off Gemini Apps Activity to stop the collection of conversations but even when it is disabled, Gemini conversations continue to be saved for up to 72 hours to "maintain the safety and security of Gemini apps and improve Gemini apps."
Toss in Google human review and what do you get? A Googley service with a warning.
Google inspects its gems. Thanks MSFT Copilot. Good enough.
Second, Google has allegedly been taking some liberties with data captured from Danish schools. (Imagine that!) The students use Chromebooks, and these devices seem to be adept at capturing data no matter what the Danish IT administrators do. For reference, see the item about confidential and personal data above, please. “Denmark Orders Schools to Stop Sending Student Data to Google” reports:
Also, given that restricting sensitive data processing on Google’s end will be hard, if not impossible, for municipalities to assure, there may be no practical way to adhere to the new policies without blocking the use of Google Chromebooks and/or Google Workspace.
Yes, the act is indeed together. Words do not change data collection, it seems.
Third, Google published a spyware report. You can download the document from this link. In addition to naming the names of vendors with specialized tools, Google does little to explain how Android-based devices are protected from these firms’ software. My thought is that since Google knows what these companies are doing, Google presumably has been making its users and customers more secure. Perhaps Google’s management thinks that talking about spyware is the same as protecting users and customers. The identified vendors are probably delighted to receive free publicity. To Google’s credit, it did test a process for protecting users from financial fraud. The report arrives alongside news about more Chrome security problems.
Google management is the best.
PRODUCT GEMS
I don’t want to overlook Google’s ability to make meaningful innovations.
Out of the blocks, I want to mention Google’s announcement that it will create an app for Apple’s $3,500 smart goggles. Google Glass apparently provided some inspiration to the savvy iTunes people.
A second innovation is Google’s ability to deliver higher quality to YouTube streaming video. The service requires paying more money to the Google, but that’s part of the company’s plan to grow despite increasing competition and cost control challenges. Will Google’s method work if the streamer has lousy bandwidth? Sure, sure, Google has confidence in its capabilities despite issues solely within the control of its users and customers.
A third innovation is that Google may offer seven years of updates to Pixel phone users. OnePlus management thinks this is baloney. Seven years is a long time in a Googley world. A quick review of the fate of the Google cache and other products killed by Google reminds one of Google’s concept of commitment. (One rumor is that killing the Google cache extricated Google from paywall bypass services.) The question is, “Will Pinpoint be a Googley way to get information from paywalled content?” What is Pinpoint? The explanation is at a really popular site called Journalist Studio. Everyone knows that.
A fourth item repeats an ever more frequent refrain: Google search is meh. Some, however, are just calling the service broken.
Fifth, Google Maps is getting more features. Google Maps for Android mobiles can now display the weather. One may not be able to locate a destination, but one knows the weather.
Sixth, in a breakthrough of significant proportions, Google has announced a new Pixel variant which folds and sports a redesigned camera island. This is not a bump. It is an island obviously.
SERVICE PEARLS
Google continues its march to be the cable service for streaming.
First, Google suggested it had more than eight million “subscribers.” Expressed another way, YouTube is fourth among pay television services.
Also, Google has expressed a desire to get more viewer time than it has in the past.
For those who fancy Google-intermediated ads on Pinterest, that day has arrived.
COURT ACTIVITY
Google continues to be of interest to regulatory officials.
First, Google faces an antitrust trial in the US. The matter is related to the Google’s approach to digital advertising. After 25 years of trying to diversify its revenue, the firm still gets more than 60 percent of it from advertising.
Second, Google paid to settle a class action lawsuit. The matter was a security failure for a now-dead service called Google Plus. How much did the Google pay? Just $350 million or a month of coffee for thirsty Googlers (estimated, of course).
What will Google do this week? Alas, I cannot predict the future like some savvy bloggers.
Stephen E Arnold, February 13, 2024
Sam AI-Man Puts a Price on AI Domination
February 13, 2024
AI start ups may want to amp up their fund raising. Optimism and confidence are often perceived as positive attributes. As a dinobaby, I think in terms of finding a deal at the discount supermarket. Sam AI-Man (actually Sam Altman) thinks big. Forget the $5 million investment in a semi-plausible AI play. “Think a bit bigger” is the catchphrase for OpenAI.
Thinking billions? You silly goose. Think trillions. Thanks, MidJourney. Close enough, close enough.
How does seven followed by 12 zeros strike you? A reasonable figure. Well, Mr. AI-Man estimates that’s the cost of building world-dominating AI chips, content, and assorted impedimenta in a quest to win the AI dust-ups in assorted global markets. “OpenAI Chief Sam Altman Is Seeking Up to $7 TRILLION (sic) from Investors Including the UAE for Secretive Project to Reshape the Global Semiconductor Industry” reports:
Altman is reportedly looking to solve some of the biggest challenges faced by the rapidly-expanding AI sector — including a shortage of the expensive computer chips needed to power large-language models like OpenAI’s ChatGPT.
And where does one locate entities with this much money? The news report says:
Altman has met with several potential investors, including SoftBank Chairman Masayoshi Son and Sheikh Tahnoun bin Zayed al Nahyan, the UAE’s head of security.
To put the figure in context, the article says:
It would be a staggering and unprecedented sum in the history of venture capital, greater than the combined current market capitalizations of Apple and Microsoft, and more than the annual GDP of Japan or Germany.
Several observations:
- The ante for big time AI has gone up
- The argument has shifted from people and content to facilities that fabricate semiconductors
- The fund-me tour is a newsmaker.
Net net: How about those small search-and-retrieval oriented AI companies? Heck, what about outfits like Amazon, Facebook, and Google?
Stephen E Arnold, February 13, 2024
Hewlett Packard and Autonomy: Search and $4 Billion
February 12, 2024
This essay is the work of a dumb dinobaby. No smart software required.
More than a decade ago, Hewlett Packard acquired Autonomy plc. Autonomy was one of the first companies to deploy what I call “smart software.” The system used Bayesian methods, still quite new to many in the information retrieval game in the 1990s. Autonomy kept its method in a black box held by a company from which Autonomy licensed the functions for information processing. Some experts in smart software overlook BAE Systems’ activity in the smart software game. That effort began in the late 1990s if my memory is working this morning. Few “experts” today care, but the dates are relevant.
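Autonomy’s actual mathematics stayed in that black box, so what follows is only a generic, hypothetical illustration of the Bayesian idea behind probabilistic retrieval: a toy naive Bayes relevance scorer. The documents, terms, and probabilities are invented; this is not Autonomy’s engine or any vendor’s code.

```python
import math
from collections import Counter

# Toy illustration of Bayesian (naive Bayes) relevance scoring, the general family
# of techniques Autonomy popularized. Not Autonomy's actual algorithm.

relevant_docs = ["mergers acquisitions due diligence", "acquisitions valuation synergy"]
other_docs = ["office party planning", "holiday travel planning tips"]

def term_probs(docs, vocab, alpha=1.0):
    # Laplace-smoothed per-term probabilities for one class of documents.
    counts = Counter(word for d in docs for word in d.split())
    total = sum(counts.values()) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}

vocab = set(word for d in relevant_docs + other_docs for word in d.split())
p_rel = term_probs(relevant_docs, vocab)
p_other = term_probs(other_docs, vocab)

def log_odds_relevant(query):
    # Sum of log-likelihood ratios; positive means "relevant" is the more probable class.
    return sum(math.log(p_rel[w] / p_other[w]) for w in query.split() if w in vocab)

print(log_odds_relevant("acquisitions due diligence"))  # positive: looks relevant
print(log_odds_relevant("holiday party planning"))      # negative: looks irrelevant
```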
Between the date Autonomy opened for business in 1996 and HP’s decision to purchase the company for about $11 billion in 2011, there was ample evidence that companies engaged in enterprise search and allied businesses like legal work processes or augmented magazine advertising were selling for much less. Most of the companies engaged in enterprise search simply went out of business after burning through their funds; for example, Delphes and Entopia. Others sold at what I thought were inflated or generous prices; for example, Vivisimo to IBM for about $28 million and Exalead to Dassault for 135 million euros.
Then along comes HP and its announcement that it purchased Autonomy for a staggering $11 billion. I attended a search-related event when one of the presenters showed this PowerPoint slide:
The idea was that Autonomy’s systems generated multiple lines of revenue, including a cloud service. The key fact on the presentation was that the search-and-retrieval unit was not the revenue rocket ship. Autonomy had shored up its search revenue by acquisition; for example, Soundsoft, Virage, and Zantaz. The company also experimented with bundling software, services, and hardware. But the Qatalyst slide depicted a rosy future because of Autonomy management’s vision and business strategy.
Did I believe the analysis prepared by Frank Quattrone’s team? I accepted some of the comments about the future, and I was skeptical about others. In the period from 2006 to 2012, it was becoming increasingly difficult to overcome some notable failures in enterprise search. The poster child for the problems was Fast Search & Transfer. In a nutshell, Fast Search retreated from Web search, shutting down its Google competitor AllTheWeb.com. The company’s engaging founder John Lervik told me that the future was enterprise search. But some Fast Search customers were slow in paying their bills because of the complexity of tailoring the Fast Search system to a client’s particular requirements. I recall being asked to comment about how to get the Fast Search system to work because my team used it for the FirstGov.gov site (now USA.gov) when the Inktomi solution was no longer viable due to procurement rule changes. Fast Search worked, but it required the same type of manual effort that the Vivisimo system required. Search-and-retrieval for an organization is not a one-size-fits-all thing, a fact Google learned with its spectacular failure with its truly misguided Google Search Appliance product. Fast Search ended with an investigation related to financial missteps, and Microsoft stepped in in 2008 and bought the company for about $1.2 billion. I thought that was a wild and crazy number, but I was one of the lucky people who managed to get Fast Search to work and knew that most licensees would not have the resources or talent I had at my disposal. Working for the White House has some benefits, particularly when Fast Search for the US government was part of its tie up with AT&T. Thank goodness for my counterpart Ms. Coker. But $1.2 billion for Fast Search? That was absolutely bonkers from my point of view. There were better and cheaper options, but Microsoft did not ask my opinion until after the deal was closed.
Everyone in the HP Autonomy matter keeps saying the same thing like an old-fashioned 78 RPM record stuck in a groove. Thanks, MSFT Copilot. You produced the image really “fast.” Plus, it is good enough like most search systems.
What is the Reuters’ news story adding to this background? Nothing. The reason is that the news story focuses on one factoid: “HP Claims $4 Billion Losses in London Lawsuit over Autonomy Deal.” Keep in mind that HP paid $11 billion for Autonomy plc. Keep in mind that was 10 times what Microsoft paid for Fast Search. Now HP wants $4 billion. Stripping away everything but enterprise search, I could accept that HP could reasonably pay $1.2 billion for Autonomy. But $11 billion made Microsoft’s purchase of Fast Search less nutso. Because, despite technical differences, Autonomy and Fast Search were two peas in a pod. The similarities were significant. The differences were technical. Neither company was poised to grow as rapidly as their stakeholders envisioned.
When open source search options became available, these quickly became popular. Today, if one wants serviceable search-and-retrieval for an enterprise application, one can use a Lucene / Solr variant or pick one of a number of other viable open source systems.
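To make the point concrete, here is a minimal sketch of a serviceable open source setup using the pure-Python Whoosh library, a Lucene-inspired engine, as a stand-in. The index directory, field names, and documents are invented for illustration; a full Lucene / Solr deployment would play the same role with more operational overhead.

```python
# pip install whoosh  (a small, Lucene-style pure-Python search library)
import os
from whoosh.index import create_in
from whoosh.fields import Schema, TEXT, ID
from whoosh.qparser import QueryParser

# Hypothetical schema: a stored document path plus full-text content.
schema = Schema(path=ID(stored=True), content=TEXT)
os.makedirs("indexdir", exist_ok=True)
ix = create_in("indexdir", schema)

# Index a couple of made-up enterprise documents.
writer = ix.writer()
writer.add_document(path="/policies/travel.txt", content="employee travel reimbursement policy")
writer.add_document(path="/hr/handbook.txt", content="vacation and leave policy for staff")
writer.commit()

# Run a simple keyword query and print the matching paths.
with ix.searcher() as searcher:
    query = QueryParser("content", ix.schema).parse("travel policy")
    for hit in searcher.search(query):
        print(hit["path"])
```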
But HP bought Autonomy and overpaid. Furthermore, Autonomy had potential, but the vision of Mike Lynch and the resources of HP were needed to convert the promise of Autonomy into a diversified information processing company. Autonomy could have provided high value solutions to the health and medical market; it could have become a key player in the policeware market; it could have leveraged its legal software into a knowledge pipeline for eDiscovery vendors to license and build upon; and it could have expanded its opportunities to license Autonomy stubs into broader OpenText enterprise integration solutions.
But what did HP do? It muffed the bunny. Mr. Lynch exited and set up a promising cyber security company and spent the rest of his time in courts. The Reuters’ article states:
Following one of the longest civil trials in English legal history, HP in 2022 substantially won its case, though a High Court judge said any damages would be significantly less than the $5 billion HP had claimed. HP’s lawyers argued on Monday that its losses resulting from the fraud entitle it to about $4 billion.
If I were younger and had not written three volumes of the Enterprise Search Report and a half dozen books about enterprise search, I would write about the wild and crazy years for enterprise search, its hits, its misses, and its spectacular failures (Yes, Google, I remember the Google Search Appliance quite well.) But I am a dinobaby.
The net net is HP made a poor decision and now years later it wants Mike Lynch to pay for HP’s lousy analysis of the company, its management missteps within its own Board of Directors, and its decision to pay $11 billion for a company in a sector in which at the time simply being profitable was a Herculean achievement. So this dinobaby says, “Caveat emptor.”
Stephen E Arnold, February 12, 2024
The Next Big Thing in Search: A Directory of Web Sites
February 12, 2024
This essay is the work of a dumb dinobaby. No smart software required.
In the early 1990s, an entrepreneur with whom I had worked in the 1980s convinced me to work on a directory of Web sites. Yahoo was popular at the time, but my colleague had a better idea. The good news is that our idea worked and the online service we built became part of the CMGI empire. Our service was absorbed by one of the leading finding services at the time. Remember Lycos? My partner and I do. Now the Web directory is back decades after those original Yahooligans and our team provided a useful way to locate a Web site.
“Search Chatbots? Pah, This Startup’s Trying on Yahoo’s Old Outfit of Web Directories” presents information about the utility of a directory of Web sites and captures some interesting observations about the findability service El Toco.
The innovator driving the directory concept is Thomas Chopping, a “UK based economist.” He made several observations in a recent article published by the British outfit The Register; for example:
“During the decades since it launched, we’ve been watching Google steadily trying to make search more predictive, by adding things like autocomplete and eventually instant answers,” Chopping told The Register. “This has the byproduct of increasing the amount of time users spend on their site, at the expense of visiting the underlying sources of the data.”
The founder of El Toco also notes:
It’s impossible to browse with conversational-style search tools, which are entirely focused on answering questions. “Right now, this is playing into the hands of Meta and TikTok, because it takes so much effort to find good quality websites via search engines that people stopped bothering.”
El Toco wants to facilitate browsing, and the model is a directory listing. The user can browse and click. The system displays a Web site for the user to scan, read, or bookmark.
Another El Toco principle is:
“We don’t need the user’s personal data to work out which results to show, because the user can express this on their own. We don’t need AI to turn the search into a conversation, because this can be done with a few clicks of the user interface.”
The economist-turned-entrepreneur points out:
“Charging users for Web search is a model which clearly doesn’t work, thanks to Neeva for demonstrating that, so we allow adverts but if the users care they can go into a menu and simply switch them off.”
Will El Toco gain traction? My team and I have been involved in information retrieval for decades, from indexing information about nuclear facilities to providing some advice to an AI search start up a few months ago. I have learned that predicting what will become the next big thing in findability is quite difficult.
A number of interesting Web search solutions are available. Some are niche-focused like Biznar. Others are next-generation “phinding” services like Phind.com. Others are metasearch solutions like iSeek. Some are less crazy Google-style systems like Swisscows. And there are more coming every day.
Why? Let me share several observations or “learnings” from a half century of working in the information retrieval sector:
- People have different information needs and a one-size-fits-all search system is fraught with problems. One person wants to search for “pizza near me”. Another wants information about Dark Web secure chat services.
- Almost everyone considers themselves a good or great online searcher. Nothing could be further from the truth. Just ask the OSINT professionals at any intelligence conference.
- Search companies with some success often give in to budgeting for a minimally viable system, selling traffic or user data, and adopting dark patterns in pursuit of greater revenue.
- Finding information requires effort. Convenience, however, is the key feature of most finding systems. Microfilm is not convenient; therefore, it sucks. Looking at research data takes time and expertise; therefore, old-fashioned work sucks. Library work involving books is not for everyone; therefore, library research sucks. Only a tiny percentage of online users want to exert significant effort finding, validating, and making sense of information. Most people prefer to doom scroll or watch dance videos on a mobile device.
Net net: El Toco is worth a close look. I hope that editorial policies, human curation, and frequent updating become the new normal. I am just going to remain open minded. Information is an extremely potent tool. If I tell you human teeth can explode, do you ask for a citation? Do you dismiss the idea because of your lack of knowledge? Do you begin to investigate the effect of high voltage on the body of a person who works around a 133 kV transmission line? Do you dismiss my statement because I am obviously making up a fact because everyone knows that electricity is 115 to 125 volts?
Unfortunately only subject matter experts operating within an editorial policy and given adequate time can figure out if a scientific paper contains valid data or made-up stuff like that allegedly crafted by the former presidents of Harvard and Stanford University and probably faculty at the university closest to your home.
Our 1992 service had a simple premise. We selected Web sites which contained valid and useful information. We did not list porn sites, stolen software repositories, and similar potentially illegal or harmful purveyors of information. We provided the sites our editors selected with an image file that was our version of the old Good Housekeeping Seal of Approval.

The idea was that in the early days of the Internet and Web sites, a parent or teacher could use our service without too much worry about setting off a porn storm or a parent storm. It worked, we sold, and we made some money.
Will the formula work today? Sure, but excellence and selectivity have been key attributes for decades. Give El Toco a look.
Stephen E Arnold, February 12, 2024
Scattering Clouds: Price Surprises and Technical Labyrinths Have an Impact
February 12, 2024
This essay is the work of a dumb dinobaby. No smart software required.
Yep, the cloud. A third-party time-sharing service with some 21st-century add-ons. I am not too keen on the cloud even though I am forced to use it for certain specific tasks. Others, however, think nothing of using the cloud like an invisible and infinite USB stick. “2023 Could Be the Year of Public Cloud Repatriation” strikes me as a “real” news story reporting that others are taking a look at the sky, spotting threatening clouds, and heading to a long-abandoned computer room to rethink their expenditures.
The write up reports:
Many regard repatriating data and applications back to enterprise data centers from a public cloud provider as an admission that someone made a big mistake moving the workloads to the cloud in the first place. I don’t automatically consider this a failure as much as an adjustment of hosting platforms based on current economic realities. Many cite the high cost of cloud computing as the reason for moving back to more traditional platforms.
I agree. However, there are several other factors which may reflect more managerial analysis than technical acumen; specifically:
- The cloud computing solution was better, faster, and cheaper. Better than an in house staff? Well, not for everyone because cloud companies are not working overtime to address user / customer problems. The technical personnel have other fires, floods, and earthquakes. Users / customers have to wait unless the user / customer “buys” dedicated support staff.
- So the “cheaper” argument becomes an issue. In addition to paying for escalated support, one has to deal with Byzantine pricing mechanisms. If one considers any of the major cloud providers, one can spend hours reading how to manage certain costs. Data transfer is a popular subject. Activated but unused services are another. (A rough cost sketch appears after this list.) Why is pricing so intricate and complex? Answer: Revenue for the cloud providers. Many customers are confident the big clouds are their friend and have their best financial interests at heart. That’s true. It is just that the heart is in the cloud vendors’ account books, not the user / customer balance sheets.
- And better? For certain operations, a user / customer has limited options. The current AI craze means the cloud is the principal game in town. Payroll, sales management, and Webby stuff are also popular functions to move to the cloud.
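Here is the rough cost sketch promised in the second bullet above. It is back-of-the-envelope arithmetic only; the per-GB egress rate, idle-service charge, and instance rate are invented placeholders, not any cloud vendor’s actual price list.

```python
# Back-of-the-envelope cloud bill sketch. Every rate below is a made-up placeholder;
# real cloud price lists are tiered, regional, and change often.

EGRESS_PER_GB = 0.09           # hypothetical data-transfer-out rate, $/GB
IDLE_SERVICE_PER_MONTH = 25.0  # hypothetical charge for an activated but unused service
COMPUTE_PER_HOUR = 0.40        # hypothetical always-on instance rate, $/hour

def monthly_cloud_estimate(egress_gb, idle_services, instance_count, hours=730):
    # Break the bill into the three buckets that most often surprise customers.
    egress = egress_gb * EGRESS_PER_GB
    idle = idle_services * IDLE_SERVICE_PER_MONTH
    compute = instance_count * COMPUTE_PER_HOUR * hours
    return {"egress": egress, "idle": idle, "compute": compute,
            "total": egress + idle + compute}

# A modest workload: 5 TB out the door, 4 forgotten services, 10 small instances.
print(monthly_cloud_estimate(egress_gb=5000, idle_services=4, instance_count=10))
```

Even with these invented numbers, the exercise shows why the "cheaper" claim needs an audit: the data transfer and forgotten-service line items accumulate quietly while attention stays on the compute bill.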
The rationale for shifting to the cloud varies, but there are some themes which my team and I have noted in our work over the years:
First, the cloud allowed “experts” who cost a lot of money to be hired by the cloud vendor. Users / customers did not have to have these expensive people on their staff. Plus, there are not that many experts who are really expert. The cloud vendor has the smarts to hire the best and the resources to pay these people accordingly… in theory. But bean counters love to cut costs so IT professionals were downsized in many organizations. The mythical “power user” could do more and gig workers could pick up any slack. But the costs of cloud computing held a little box with some Tannerite inside. Costs for information technology were going up. Wouldn’t it be cheaper to do computing in house? For some, the answer is, “Yes.”
An ostrich company with its head in the clouds, not in the sand. Thanks, MidJourney, what a not-even-good-enough illustration.
Second, most organizations lacked the expertise to manage a multi-cloud set up. When an organization has two or more clouds, one cannot allow a cloud company to manage itself and one or more competitors. Therefore, organizations had to add to their headcount a new and expensive position: A cloud manager.
Third, the cloud solutions are not homogeneous. Different rules of the road, different technical set up, and different pricing schemes. The solution? Add another position: A technical manager to manage the cloud technologies.
I will stop with these three points. One can rationalize using the cloud easily; for example a government agency can push tasks to the cloud. Some work in government agencies consists entirely of attending meetings at which third-party contractors explain what they are doing and why an engineering change order is priority number one. Who wants to do this work as part of a nine to five job?
But now there is a threat to the clouds themselves. That is security. What’s more secure? Data in a user / customer server facility down the hall or in a disused building in Piscataway, New Jersey, or sitting in a cloud service scattered wherever? Security? Cloud vendors are great at security. Yeah, how about those AWS S3 buckets or the Microsoft email “issue”?
My view is that a “where should our computing be done and where should our data reside” audit should be considered by many organizations. People have had their heads in the clouds for a number of years. It is time to hold a meeting in that little-used computer room and do some thinking.
Stephen E Arnold, February 12, 2024
What Does Eroding Intelligence Create? Take-a-Chance Apps in Curated App Stores
February 9, 2024
This essay is the work of a dumb dinobaby. No smart software required.
I am a real and still-alive dinobaby. I read “Undergraduates’ Average IQ Has Fallen 17 Points Since 1939. Here’s Why.” The headline tells the story. At least, Dartmouth is planning to use testing to make sure its admitted students can read and write. But it appears that interesting people are empowering certain business tactics whether they have great test scores or not.
“Warning: Fraudulent App Impersonating LastPass Currently Available in Apple App Store” strikes me as a good example of how tactics take advantage of what one might call somewhat slow or unaware people. The write up states:
The app attempts to copy our branding and user interface, though close examination of the posted screenshots reveal misspellings and other indicators the app is fraudulent.
Are there similarly structured apps in the Google Play store? You bet. A couple of days ago, I downloaded and paid $1.95 for an app that allegedly would display the old-school per-core load graph which Microsoft removed from Task Manager. Guess what? It did not load.
Several observations:
- The “stores” are not preventing problematic apps from being made available to users
- The people running the store are either unable to screen apps or just don’t care
- The baloney about curation is exactly that.
I wonder if the people running these curated app stores are unaware of what these misfires do to a customer. On the other hand, perhaps the customers neither know nor care that curated apps are creeping into fraud territory.
Stephen E Arnold, February 9, 2024
School Technology: Making Up Performance Data for Years
February 9, 2024
This essay is the work of a dumb dinobaby. No smart software required.
What is the “make up data” trend? Why is it plaguing educational institutions? From Harvard to Stanford, those who are entrusted with shaping young-in-spirit minds are putting ethical behavior in the trash can. I think I know, but let’s look at allegations of another “synthetic” information event. For context: in the UK there is a government agency called the Office for Standards in Education, Children’s Services and Skills. The agency is called OFSTED. Now let’s go to the “real” news story.
A possible scene outside of a prestigious academic institution when regulations about data become enforceable… give it a decade or two. Thanks, MidJourney. Two tries and a good enough illustration.
“Ofsted Inspectors Make Up Evidence about a School’s Performance When IT Fails” reports:
Ofsted inspectors have been forced to “make up” evidence because the computer system they use to record inspections sometimes crashes, wiping all the data…
Quite a combo: Information technology and inventing data.
The article adds:
…inspectors have to replace those notes from memory without telling the school.
Will the method work for postal investigations? Sure. Can it be extended to other activities? What about data pertinent to the UK government’s initiatives for smart software?
Stephen E Arnold, February 9, 2024


