What Smart Software Will Not Know and That May be a Problem

April 26, 2023

This blog post is the work of a real, live dinobaby. No smart software involved.

I read a short item called “Who Owns History? How Remarkable Historical Footage Is Hidden and Monetized.” The main point of the article was to promote a video which makes clear that big companies are locking “extraordinary footage… behind paywalls.” The focus is on images, a subject I know from conversations with people with whom I worked who managed image rights years ago. The companies are history; for example, BlackStar and Modern Talking Pictures. And there were others.

Images are now a volleyball, and the new spiker on the Big Dog Team is smart software-generated images. I have a hunch that individuals and companies will aggregate as many of these as possible. The images will then be subject to the classic “value adding” process and magically become for-fee. Image trolls will feast.

I don’t care too much about images. I do think more about textual and tabular content. The rights issue is a big one, but I came at smart software from a different angle. Smart software has to be trained, whether via a traditional human constructed corpus, a fake-o corpus courtesy of the synthetic data wizards, or some shotgun marriage of “self training” and a mash up of other methods.

But what if important information is not available to the smart software? Won’t that smart software be like a student who signs up for Differential Geometry without Algebraic Topology? Lots of effort but that insightful student may not be in gear to keep pace with other students in the class. Is not knowing the equivalent of being uninformed or just dumb?

One of the issues I have with smart software is that some content, which I think is essential to clear thinking, is not available to today’s systems. Let me give one example. In 1963, when I was a sophomore at a weird private university, a professor urged me to read the metaphysics text by a person named A. E. Taylor. The college I attended did not have too many of Dr. Taylor’s books. There was a copy of his Aristotle and nothing else. I did some hunting and located a copy of Elements of Metaphysics, a snappy thriller.

However, Dr. Taylor wrote a number of other books. I went looking for these because I assume that the folks training smart software want to make sure the “model” has information about the nature of information and related subjects. Guess what? Project Gutenberg, the Internet Archive, and the online gem Amazon have the Aristotle book and a couple of others. FYI: You can get a copy of A. E. Taylor’s Metaphysics for $3.88, a price illustrating the esteem in which Dr. Taylor’s work is held today.

My team and I ran some queries on the smart software systems to which we have access. We learned that information from Dr. Taylor is as scarce as hen’s teeth. We shifted gears and checked out information generated by the much loved brother of Henry James. More of William James’s books were available at bargain basement prices. A collection of essays was less than $2 on Amazon.

My point is that images are likely to be locked up behind a paywall. However, books which may be important to one’s understanding of useless subjects like ethics, perception, and information are not informing the outputs of the smart software we probed. (Yes, we mean you, gentle Bard, and you too, ChatGPT.)

Does the possible omission of these types of content make a difference?

Probably not. Embrace synthetic data. The “old” content is not digitally massaged. Who cares? We are in “good enough” land. It’s like a theme park with a broken rollercoaster and some dicey carnies.

Stephen E Arnold, April 26, 2023
