Prompt Tips and Query Refinements

July 29, 2024

Generative AI is paving the way for more automation, smarter decisions, and (possibly) an easier world. AI is still pretty stupid, however, and it needs to be hand-fed information to make it work well. Dr. Lance B. Eliot is an AI expert, and he contributed “The Best Engineering Techniques For Getting The Most Out Of Generative AI” to Forbes.

Eliot explains that prompt engineering is the best way to make generative AI work well. He developed a list of prompt-writing techniques and related skills. The list is designed to be a quick, easy tutorial that is also equipped with links to more information about each technique. Eliot’s first tip is to keep the prompt simple, direct, and obvious; otherwise the AI will misunderstand your intent.

He then rattles off a bunch of rhetoric that reads like it was written by generative AI. Maybe it was? In short, it’s good to learn how to write prompts to prepare for the future. He runs through the list alphabetically, then, as if that’s not enough, Eliot lists the techniques numerically:

“I didn’t number them because I was worried that the numbering would imply a semblance of importance or priority. I wanted the above listing to seem that all the techniques are on an equal footing. None is more precious than any of the others.

Lamentably, not having numbers makes life harder when wanting to quickly refer to a particular prompt engineering technique. So, I am going to go ahead and show you the list again and this time include assigned numbers. The list will still be in alphabetical order. The numbering is purely for ease of reference and has no bearing on priority or importance.”

The list is a rundown of psychological and intercommunication methods used by humans. A lot of big words are used, but the explanations were written by a tech-savvy expert for his fellow tech people. In layman’s terms, the list explains that any technique will work. Here’s one from me: use generative AI to simplify the article. Here’s a paradox prompt: if you feed generative AI a prompt written by generative AI, will it explode?

Whitney Grace, July 29, 2024

Why Is Anyone Surprised That AI Is Biased?

July 25, 2024

Let’s go over this one last time, all right? Algorithms are biased against specific groups.

Why are they biased? They’re biased because the testing data sets contain limited information about diversity.

What types of diversity? There’s a range, but it usually involves race, sex, and socioeconomic status.

How does this happen? It usually happens, not because the designers are racist or whatever, but from blind ignorance. They don’t see outside their technology boxes so their focus is limited.

But they can be racist, sexist, etc? Yes, they’re human and have their personal prejudices. Those can be consciously or inadvertently programmed into a data set.

How can this be fixed? Get larger, cleaner data sets that are more reflective of actual populations.

Did you miss any minority groups? Unfortunately yes, and it happens to be an oldie but a goodie: disabled folks. Stephen Downes writes that “ChatGPT Shows Hiring Bias Against People With Disabilities.” Downes commented on an article from Futurity that describes how a doctoral student at the University of Washington studied how ChatGPT ranks resumes of abled vs. disabled people.

The test discovered that when ChatGPT was asked to rank resumes, those that included references to a disability were ranked lower. This part is questionable because the article doesn’t state the prompt given to ChatGPT. When the generative text AI was told to be less “ableist,” some of the “disabled” resumes ranked higher. The article then goes into a valid yet overplayed argument about diversity and inclusion. No solutions were provided.

Downes asked questions that also beg for solutions:

“This is a problem, obviously. But in assessing issues of this type, two additional questions need to be asked: first, how does the AI performance compare with human performance? After all, it is very likely the AI is drawing on actual human discrimination when it learns how to assess applications. And second, how much easier is it to correct the AI behaviour as compared to the human behaviour? This article doesn’t really consider the comparison with humans. But it does show the AI can be corrected. How about the human counterparts?”

Solutions? Anyone?

Whitney Grace, July 25, 2024

The French AI Service Aims for the Ultimate: Cheese, Yes. AI? Maybe

July 24, 2024

AI developments are dominating technology news. Nothing makes tech news headlines jump up the newsfeed faster than mergers or partnerships. The Next Web delivered when it shared news that “Silo And Mistral Join Forces In Yet Another European AI Team-Up.” Europe is the home base for many AI players, including Silo and Mistral. These companies are from Finland and France, respectively, and they decided to partner to design sovereign AI solutions.

Silo is already known for partnering with other companies, and Mistral is another addition to its growing roster of teammates. The collaboration between the two focuses on the planning and deployment of AI into existing infrastructures:

“The past couple of years have seen businesses scramble to implement AI, often even before they know how they are actually going to use it, for fear of being left behind. Without proper implementation and the correct solutions and models, the promises of efficiency gains and added value that artificial intelligence can offer an organization risk falling flat.

Silo and Mistral say they will provide a joint offering for businesses, “merging the end-to-end AI capabilities of Silo AI with Mistral AI’s industry leading state-of-the-art AI models,” combining their expertise to meet an increasing demand for value-creating AI solutions.”

Silo focuses on digital sovereignty and has developed open source LLMs for “low resource European languages.” Mistral designs generative AI models that are open source for hobby designers, with fancier versions for commercial ventures.

The partnership between the two companies plans to speed up AI adoption across Europe and equalize it by including more regional languages.

Whitney Grace, July 24, 2024

Automating to Reduce Staff: Money Talks, Employees? Yeah, Well

July 24, 2024

This essay is the work of a dinobaby. Unlike some folks, no smart software improved my native ineptness.

Are you a developer who oversees a project? Are you one of those professionals who toiled to understand the true beauty of a PERT chart invented by a Type A blue-chip consulting firm I have heard? If so, you may sport these initials on your business card: PMP, PMI-RMP, PRINCE2, etc. I would suggest that Google is taking steps to eliminate your role. How do I know the death knell tolls for thee? Easy. I read “Google Brings AI Agent Platform Project Oscar Open Source.” The write up doesn’t come out and say, “Dev managers or project managers, find your future elsewhere,” but the intent bubbles beneath the surface of the Google speak.

A 35-year-old executive gets the good news. As a project manager, he can now seek another information-mediating job at an independent plumbing company, a local dry cleaner, or the outfit that repurposes basketball courts into pickleball courts. So many futures to find. Thanks, MSFT Copilot. That’s a pretty good Grim Reaper. The former PMP looks snappy too. Well, good enough.

The “Google Brings AI Agent Platform Project Oscar Open Source” real “news” story says:

Google has announced Project Oscar, a way for open-source development teams to use and build agents to manage software programs.

Say hi to Project Oscar. The smart software is new, so expect it to morph, be killed, resurrected, and live a long fruitful life.

The write up continues:

“I truly believe that AI has the potential to transform the entire software development lifecycle in many positive ways,” Karthik Padmanabhan, lead Developer Relations at Google India, said in a blog post. “[We’re] sharing a sneak peek into AI agents we’re working on as part of our quest to make AI even more helpful and accessible to all developers.” Through Project Oscar, developers can create AI agents that function throughout the software development lifecycle. These agents can range from a developer agent to a planning agent, runtime agent, or support agent. The agents can interact through natural language, so users can give instructions to them without needing to redo any code.

Helpful? Seems like it. Will smart software reduce costs and allow for more “efficiency methods” to be implemented? Yep.

The article includes a statement from a Googler; to wit:

“We wondered if AI agents could help, not by writing code which we truly enjoy, but by reducing disruptions and toil,” Balahan said in a video released by Google. Go uses an AI agent developed through Project Oscar that takes issue reports and “enriches issue reports by reviewing this data or invoking development tools to surface the information that matters most.” The agent also interacts with whoever reports an issue to clarify anything, even if human maintainers are not online.

Where is Google headed with this “manage software programs” idea? A partial answer may be deduced from this write up from Linklemon. Its commercial pitch: “We Automate Workflows for Small to Medium (sic) Businesses.” The image below explains the business case clearly:

image

Those purple numbers are generated by chopping staff and making an existing system cheaper to operate. Translation: “Find your future elsewhere, please.”

My hunch is that if the automation in Google India is “good enough,” the service will be tested in the US. Once that happens, Microsoft and other enterprise vendors will jump on the me-too express.

What’s that mean? Oh, heck, I don’t want to rerun that tired old “find your future elsewhere” line, but I will: Many professionals who intermediate information will hear, “Great news, you now have the opportunity to find your future elsewhere.” Lucky folks, right, Google?

Stephen E Arnold, July 24, 2024

Modern Life: Advertising Is the Future

July 23, 2024

This essay is the work of a dinobaby. Unlike some folks, no smart software improved my native ineptness.

What’s the future? I think most science fiction authors missed the memo from the future. Forget rocket ships, aliens, and light sabers. Think advertising. How do I know that ads will be the dominant feature of messaging? I read “French AI Startup Launches First LLM Built Exclusively for Advertising Copy.”

Advertising professionals consult the book about trust and ethical behavior. Both are baffled at the concepts. Thanks, MSFT Copilot. You are an expert in trust and ethical behavior, right?

Yep, advertising arrives with smart manipulation, psycho-metric manipulative content, and shaped data. The write up explains:

French startup AdCreative.ai has launched a new large language model built exclusively for advertising. Named AdLLM Spark, the system was built to craft ad text with high conversion rates on every major advertising platform. AdCreative.ai said the LLM combines two unique features: instant text generation and accurate performance prediction.

Let’s assume those French wizards have successfully applied probabilistic text generation to probabilistic behavior manipulation. Every message can be crafted by smart software to work. If an output does not work, just fiddle around until you hit the highest performing payload for the doom scrolling human.

The first part of the evolution of smart software pivoted on the training data. Forget that privacy hogging, copyright ignoring approach. Advertising copy is there to be used and recycled. The write up says:

The training data encompasses every text generated by AdCreative.ai for its 2,000,000 users. It includes information from eight leading advertising platforms: Facebook, Instagram, Google, YouTube, LinkedIn, Microsoft, Pinterest, and TikTok.

The second component involved tuning the large language model. I love the way “manipulation” and “move to action” become a dataset and metrics. If it works, that method will emerge from the analytic process. Do that, and clicks will result. Well, that’s the theory. But it is much easier to understand than making smart software ethical.

Does the system work? The write up offers this “proof”:

AdCreative.ai tested the impact on 10,000 real ad texts. According to the company, the system predicted their performance with over 90% accuracy. That’s 60% higher than ChatGPT and at least 70% higher than every other model on the market, the startup said.

Just for fun, let’s assume that the AdCreative system works and performs as “advertised.”

  1. No message can be accepted at face value. Every message from any source can be weaponized.
  2. Content about any topic — and I mean any — must be viewed as shaped and massaged to produce a result. Did you really want to buy that Chiquita banana?
  3. Automating this type of content production begs for a system to identify something hot on a TikTok-type service, extract the words and phrases, and match those words with a bit of semantic expansion to what someone wants to pitch, promote, make happen, and what not. The magic is that the volume of such messages is limited only by one’s machine resources.

Net net: The future of smart software is not solving problems for lawyers or finding a fix for Aunt Milli’s fatigue. The future is advertising, and AdCreative.ai is making the future more clear. Great work!

Stephen E Arnold, July 17, 2024

Bots Have Invaded The World…On The Internet

July 23, 2024

Robots…er…bots have taken over the world…at least the Internet…parts of it. The news from Techspot is shocking, but when you think about it, it really isn’t: “Almost Half Of All Web Traffic Is Bots, And They Are Mostly Malicious In Nature.” Akamai is one of the largest content delivery and cloud computing platforms in the world. It recently released a report stating that 42% of web traffic is from bots and 65% of those bots are malicious.

Akamai said that most of the bots are scraper bots designed to gather data. Scraper bots collect content from Web sites. Some of them are used to form AI data sets while others are designed to steal information to be used in hacks, scams, and other bad acts. Commerce Web sites are negatively affected the most, because scraper bots steal photos, prices, descriptions, and more. Bad actors then make fake Web sites imitating the real McCoy. They make money from ads by ranking on Google and stealing traffic.

Bots are nasty little buggers, even the most benign:

“Even non-malicious scraping bots can degrade a website’s performance, impact search engine metrics, and increase computing and hosting costs.

Companies now face increasingly sophisticated bots that use AI algorithms, headless browser technology, and other advanced solutions. These new threats require novel, more complex mitigation approaches beyond traditional methods. A robust firewall is now only the beginning of the numerous security measures needed by website owners today.”

Akamai should have dedicated part of its study to investigating the Dark Web. How many bots or law enforcement officials are visiting that shrinking part of the Net?

Whitney Grace, July 23, 2024

Thinking about AI Doom: Cheerful, Right?

July 22, 2024

This essay is the work of a dumb humanoid. No smart software required.

I am not much of a philosopher psychologist academic type. I am a dinobaby, and I have lived through a number of revolutions. I am not going to list the “next big things” that have roiled the world since I blundered into existence. I am changing my mind. I have memories of crouching in the hall at Oxon Hill Grade School in Maryland. We were practicing for the atomic bomb attack on Washington, DC. I think I was in the second grade. Exciting.

The AI powered robot wants the future experts in hermeneutics to be more accepting of the technology. Looks like the robot is failing big time. Thanks, MSFT Copilot. Got those fixes deployed to the airlines yet?

Now another “atomic bomb” is doing the James Bond countdown: 009, 008, and then James cuts the wire at 007. The world was saved for another James Bond sequel. Wow, that was close.

I just read “Not Yet Panicking about AI? You Should Be – There’s Little Time Left to Rein It In.” The essay seems to be a trifle dark. Here’s a snippet I circled:

With technological means, we have accomplished what hermeneutics has long dreamed of: we have made language itself speak.

Thanks to Dr. Francis Chivers, one of my teachers at Duquesne University, I actually know a little bit about hermeneutics. May I share?

Hermeneutics is the theory and methodology of interpretation of words and writings. One should consider content in its historical, cultural, and linguistic context. The idea is to figure out the underlying messages, intentions, and implications of texts by doing academic gymnastics.

Now the killer statement:

Jacques Lacan was right; language is dark and obscene in its depths.

I presume you know well the work of Jacques Lacan. But if you have forgotten, the canny psychologist got himself kicked out of the International Psychoanalytic Association (no mean feat as I recall) for his ideas about desire. Think Freud on steroids.

The write up uses these everyday references to make the point:

If our governments summon the collective will, they are very strong. Something can still be done to rein in AI’s powers and protect life as we know it. But probably not for much longer.

Okay. AI is going to screw up the world. I think I have heard that assertion when my father told me about the computer lecture he attended at an accounting refresher class. That fear he manifested because he thought he would lose his job to a machine attracted me to the dark unknown of zeros and ones.

How did that turn out? He kept his job. I think mankind has muddled through the computer revolution, the space revolution, the wonder drug revolution, the automation revolution, yada yada.

News flash: AI has been around since long before the whiz kids at Google disclosed Transformers. I think the author of this somewhat fearful write up is similar to my father, who projected onto computerized accounting his fear that he would be harmed by punched cards.

Take a deep breath. The sun will come up tomorrow morning. People who know about hermeneutics and Jacques Lacan will be able to ponder the nature of text and behavior. In short, worry less. Be less AI-phobic. The technology is here and it is not going away, getting under the thumb of any one government including China’s, or causing eternal darkness. Sorry to disappoint you.

Stephen E Arnold, July 22, 2024

Students, Rejoice. AI Text Is Tough to Detect

July 19, 2024

While the robot apocalypse is still a long way in the future, AI algorithms are already changing the dynamics of work, school, and the arts. It’s an unfortunate consequence of advancing technology, and a line in the sand needs to be drawn and upheld about appropriate uses of AI. A real world example was published in the journal PLOS ONE: “A Real-World Test Of Artificial Intelligence Infiltration Of A University Examinations System: A ‘Turing Test’ Case Study.”

Students are always searching for ways to cheat the education system. ChatGPT and other generative text AI algorithms are the ultimate cheating tool. Schools and universities don’t have systems in place to verify that student work isn’t artificially generated. Beyond students failing to learn essential knowledge and practice core skills, the way students are assessed is threatened.

The creators of the study researched a question we’ve all been asking: Can AI pass as a real human student? While the younger set aren’t the sharpest pencils, it’s still hard to replicate human behavior. Or is it?

“We report a rigorous, blind study in which we injected 100% AI written submissions into the examinations system in five undergraduate modules, across all years of study, for a BSc degree in Psychology at a reputable UK university. We found that 94% of our AI submissions were undetected. The grades awarded to our AI submissions were on average half a grade boundary higher than that achieved by real students. Across modules there was an 83.4% chance that the AI submissions on a module would outperform a random selection of the same number of real student submissions.”

The AI exams and assignments received better grades than those written by real humans. Computers have consistently outperformed humans in what they’re programmed to do: performing calculations, playing chess, and doing repetitive tasks. Student work, such as writing essays, taking exams, and unfortunate busy work, is repetitive and monotonous. It’s easily replicated by AI and it’s not surprising the algorithms perform better. It’s what they’re programmed to do.

The problem isn’t that AI exists. The problem is that there aren’t processes in place to verify student work and humans will cave to temptation via the easy route.

Whitney Grace, July 19, 2024

Looking for the Next Big Thing? The Truth Revealed

July 18, 2024

This essay is the work of a dinobaby. Unlike some folks, no smart software improved my native ineptness.

Big means money, big money. I read “Twenty Five Years of Warehouse-Scale Computing,” authored by Googlers who definitely are into “big.” The write up is history from the point of view of engineers who built a giant online advertising and surveillance system. In today’s world, when a data topic is raised, it is big data. Everything is Texas-sized. Big is good.

This write up is a quasi-scholarly, scientific-type of sales pitch for the wonders of the Google. That’s okay. It is a literary form comparable to an epic poem or a jazzy H.L. Mencken essay when people read magazines and newspapers. Let’s take a quick look at the main point of the article and then consider its implications.

I think this passage captures the zeitgeist of the Google on July 13, 2024:

From a team-culture point of view, over twenty five years of WSC design, we have learnt a few important lessons. One of them is that it is far more important to focus on “what does it mean to land” a new product or technology; after all, it was the Apollo 11 landing, not the launch, that mattered. Product launches are well understood by teams, and it’s easy to celebrate them. But a launch doesn’t by itself create success. However, landings aren’t always self-evident and require explicit definitions of success — happier users, delighted customers and partners, more efficient and robust systems – and may take longer to converge. While picking such landing metrics may not be easy, forcing that decision to be made early is essential to success; the landing is the “why” of the project.

A proud infrastructure plumber knows that his innovations allow the home owner to collect rent from AirBnB rentals. Thanks, MSFT Copilot. Interesting image because I did not specify gender or ethnicity. Does my plumber look like this? Nope.

The 13-page paper includes numerous statements which may resonate with different readers as more important. But I like this passage because it makes the point about Google’s failures. There is no reference to smart software, but for me it is tough to read any Google prose and not think in terms of Code Red, the crazy flops of Google’s AI implementations, and the protestations of Googlers about quantum supremacy or some other projection of inner insecurity the company’s geniuses concoct. Don’t you want to have an implant that makes Google’s knowledge of “facts” part of your being? America’s founding fathers were not diverse, but Google has different ideas about reality.

This passage directly addresses failure. A failure is a prelude to a soft landing or a perfect landing. The only problem with this mindset is that Google has managed one perfect landing: Its derivative online advertising business. The chatter about scale is a camouflage tarp pulled over the mad scramble to find a way to allow advertisers to pay Google money. The “invention” was forced upon those at Google who wanted those ad dollars. The engineers did many things to keep the money flowing. The “landing” is the fact that the regulators turned a blind eye to Google’s business practices and the wild and crazy engineering “fixes” worked well enough to allow more “fixes.” Somehow the mad scramble in the 25 years of “history” continues to work.

Until it doesn’t.

The case in point is Google’s response to the Microsoft OpenAI marketing play. Google’s ability to scale has not delivered. What delivers at Google is ad sales. The “scale” capabilities work quite well for advertising. How does the scale work for AI? Based on the results I have observed, the AI pullbacks suggest some issues exist.

What’s this mean? Scale and the cloud do not solve every problem or provide a slam dunk solution to a new challenge.

The write up offers a different view:

On one hand, computing demand is poised to explode, driven by growth in cloud computing and AI. On the other hand, technology scaling slowdown poses continued challenges to scale costs and energy-efficiency

Google sees running out of chip innovations, power, cooling, and other parts of the scale story as opportunities. Sure they are. Google’s future looks bright. Advertising has been and will be a good business. The scale thing? Plumbing. Let’s not forget what matters at Google. Selling ads and renting infrastructure to people who no longer have on-site computing resources. Google is hoping to be the AirBnB of computation. And sell ads on Tubi and other ad-supported streaming services.

Stephen E Arnold, July 18, 2024

Stop Indexing! And Pay Up!

July 17, 2024

This essay is the work of a dinobaby. Unlike some folks, no smart software improved my native ineptness.

I read “Apple, Nvidia, Anthropic Used Thousands of Swiped YouTube Videos to Train AI.” The write up appears in two online publications, presumably to make an already contentious subject more clicky. The assertion in the title is the equivalent of someone in Salem, Massachusetts, pointing at a widow and saying, “She’s a witch.” Those willing to take the statement at face value would take action. The “trials” were held in colonial Massachusetts. My high school history teacher was a witchcraft trial buff. (I think his name was Elmer Skaggs.) I thought about his descriptions of the events. I recall his graphic depictions and analysis of what I recall as “dunking.” The idea was that if a person was a witch, then that person could be immersed one or more times. I think the idea had been popular in medieval Europe, so it was not a New World innovation. Me-too is a core way to create novelty. The witch could survive being immersed for a period of time. With proof, hanging or burning was the next step. The accused who died was obviously not a witch. That’s Boolean logic in a pure form in my opinion.

The Library in Alexandria burns in front of people who wanted to look up information, learn, and create more information. Tough. Once the cultural institution is gone, just figure out the square root of two yourself. Thanks, MSFT Copilot. Good enough.

The accusations and evidence in the article depict companies building large language models as candidates for a test to prove that they have engaged in an improper act. The crime is processing content available on a public network, indexing it, and using the data to create outputs. Since the late 1960s, digitizing information and making it more easily accessible was perceived as an important and necessary activity. The US government supported indexing and searching of technical information. Other fields of endeavor recognized that as the volume of information expanded, the traditional methods of sitting at a table, reading a book or journal article, making notes, analyzing the information, and then conducting additional research or writing a technical report were simply not fast enough. What worked in a medieval library was not a method suited to put a satellite in orbit or perform other knowledge-value tasks.

Thus, online became a thing. Remember, we are talking punched cards, mainframes, and clunky line printers. Then one day there was the Internet. The interest in broader access to online information grew and by 1985, people recognized that online access was useful for many tasks, not just looking up information about nuclear power technologies, a project I worked on in the 1970s. Flash forward 50 years, and we arrive at the moment when one can read about the “fact” that Apple, Nvidia, Anthropic Used Thousands of Swiped YouTube Videos to Train AI.

The write up says:

AI companies are generally secretive about their sources of training data, but an investigation by Proof News found some of the wealthiest AI companies in the world have used material from thousands of YouTube videos to train AI. Companies did so despite YouTube’s rules against harvesting materials from the platform without permission. Our investigation found that subtitles from 173,536 YouTube videos, siphoned from more than 48,000 channels, were used by Silicon Valley heavyweights, including Anthropic, Nvidia, Apple, and Salesforce.

I understand the surprise some experience when they learn that a software script visits a Web site, processes its content, and generates an index. (A buzzy term today is large language model, but I prefer the simpler word index.)

I want to point out that for decades those engaged in making information findable and accessible online have processed content so that a user can enter a query and get a list of indexed items which match that user’s query. In the old days, one used Boolean logic which we met a few moments ago. Today a user’s query (the jazzy term is prompt now) is expanded, interpreted, matched to the user’s “preferences”, and a result generated. I like lists of items like the entries I used to make on a notecard when I was a high school debate team member. Others want little essays suitable for a class assignment on the Salem witchcraft trials in Mr. Skaggs’s class. Today another system can pass a query, get outputs, and then take another action. This is described by the in-crowd as workflow orchestration. Others call it, “taking a human’s job.”
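For readers who have not met the old-school approach, here is a minimal sketch of an inverted index with Boolean matching. It is an illustration only, not how any particular search system or large language model works. The function names (build_index, boolean_and, boolean_or, boolean_not) and the three toy documents are assumptions made up for this example.

```python
from collections import defaultdict

def build_index(docs):
    """Map each lowercase term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for raw in text.lower().split():
            term = raw.strip(".,;:!?\"'()")
            if term:
                index[term].add(doc_id)
    return index

def boolean_and(index, *terms):
    """Ids of documents that contain every term."""
    sets = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*sets) if sets else set()

def boolean_or(index, *terms):
    """Ids of documents that contain at least one term."""
    return set().union(*(index.get(t.lower(), set()) for t in terms))

def boolean_not(index, all_ids, term):
    """Ids of documents that do not contain the term."""
    return set(all_ids) - index.get(term.lower(), set())

# Toy collection (hypothetical content, invented for the example).
docs = {
    1: "Salem witchcraft trials in colonial Massachusetts",
    2: "Nuclear power technologies and reactor safety",
    3: "Witchcraft accusations in medieval Europe",
}
index = build_index(docs)

print(sorted(boolean_and(index, "witchcraft", "salem")))   # [1]
print(sorted(boolean_or(index, "salem", "nuclear")))       # [1, 2]
# witchcraft AND NOT salem
print(sorted(boolean_and(index, "witchcraft") & boolean_not(index, docs, "salem")))  # [3]
```

The user gets back a list of matching items and nothing more. Everything layered on top today (query expansion, “preferences,” generated essays) still starts from some form of processed content that can be looked up.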

My point is that for decades, the index and searching process has been without much innovation. Sure, software scripts can know when to enter a user name and password or capture information from Web pages that are transitory, disappearing in the blink of an eye. But it is still indexing over a network. The object remains to find information of utility to the user or another system.

The write up reports:

Proof News contributor Alex Reisner obtained a copy of Books3, another Pile dataset and last year published a piece in The Atlantic reporting his finding that more than 180,000 books, including those written by Margaret Atwood, Michael Pollan, and Zadie Smith, had been lifted. Many authors have since sued AI companies for the unauthorized use of their work and alleged copyright violations. Similar cases have since snowballed, and the platform hosting Books3 has taken it down. In response to the suits, defendants such as Meta, OpenAI, and Bloomberg have argued their actions constitute fair use. A case against EleutherAI, which originally scraped the books and made them public, was voluntarily dismissed by the plaintiffs. Litigation in remaining cases remains in the early stages, leaving the questions surrounding permission and payment unresolved. The Pile has since been removed from its official download site, but it’s still available on file sharing services.

The passage does a good job of making clear that most people are not aware of what indexing does, how it works, and why the process has become a fundamental component of many, many modern knowledge-centric systems. The idea is to find information of value to a person with a question, present relevant content, and enable the user to think new thoughts or write another essay about dead witches being innocent.

The challenge today is that anyone who has written anything wants money. The way online works is that for any single user’s query, the useful information constitutes a minuscule fraction of the information in the index. The cost of indexing and responding to the query is high, and those costs are difficult to control.

But everyone has to be paid for the information that individual “created.” I understand the idea, but the reality is that indexing, search, and retrieval were invented, refined, and given numerous life extensions to perform a core function: Answer a question or enable learning.

The write up makes it clear that “AI companies” are witches. The US legal system is going to determine who is a witch just like the process in colonial Salem. Several observations are warranted:

  1. A fundamental mechanism for information retrieval may be difficult to modify, replace, or re-invent in a quick, cost-efficient, and satisfactory manner. Digital information is loosey goosey; that is, it moves, slips, and slides either by an individual’s actions or a mindless system’s.
  2. Slapping fines and big price tags on what remains an access service will take time to have an impact. As the implications of the impact become better known to those who are aggrieved, they may find that their own information is altered in a fundamental way. How many research papers are “original”? How many journalists recycle as a basic work task? How many children’s lives are lost when the medical reference system does not have the data needed to treat the kid’s problem?
  3. Accusing companies of behaving improperly is definitely easy to do. Many companies do ignore rules, regulations, and cultural norms. Engineering Index’s publisher learned that bootleg copies of printed Compendex indexes were available in China. What was Engineering Index going to do when I learned this almost 50 years ago? The answer was to give speeches, complain to those who knew what the heck a Compendex was, and talk to lawyers. What happened to the Chinese content pirates? Not much.

I do understand the anger the essay expresses toward large companies doing indexing. To some, these outfits are witches. However, if the indexing of content is derailed, I would suggest there are downstream consequences. Some of those consequences will make zero difference to anyone. A government worker at a national lab won’t be able to find details of an alloy used in a nuclear device. Who cares? Make some phone calls? Ask around. Yeah, that will work until the information is needed immediately.

A student accustomed to looking up information on a mobile phone won’t be able to find something. The document is a 404 or the information returned is an ad for a Temu product. So what? The kid will have to go to the library, which one hopes will be funded, have printed material or commercial online databases, and a librarian on duty. (Good luck, traditional researchers.) A marketing team eager to get information about the number of Telegram users in Ukraine won’t be able to find it. The fix is to hire a consultant and hope those bright men and women have a way to get a number, a single number, good, bad, or indifferent.

My concern is that as the intensity of the objections about a standard procedure for building an index escalates, the entire knowledge environment is put at risk. I have worked in online since 1962. That’s a long time. It is amazing to me that the plumbing of an information economy has been ignored for a long time. What happens when the companies doing the indexing go away? What happens when those producing the government reports, the blog posts, or the “real” news cannot find the information needed to create information? And once some information is created, how is another person going to find it? Ask an eighth grader how to use an online catalog to find a fungible book. Let me know what you learn. Better yet, do you know how to use a Remac card retrieval system?

The present concern about information access troubles me. There are mechanisms to deal with online. But the reason content is digitized is to find it, to enable understanding, and to create new information. Digital information is like gerbils. Start with a couple of journal articles, and one ends up with more journal articles. Kill this access and you get what you wanted. You know exactly who is the Salem witch.

Stephen E Arnold, July 17, 2024
