A Cultural Black Hole: Lost Data

May 22, 2024

This essay is the work of a dinobaby. Unlike some folks, no smart software improved my native ineptness.

A team in Egypt discovered something mysterious near the pyramids. I assume National Geographic will dispatch photographers. Archeologists will probe. Artifacts will be discovered. How much more is buried under the surface of Giza? People have been digging for centuries, and their efforts are rewarded. But what about the artifacts of the digital age?


Upon opening the secret chamber, the digital construct explains to the archeologist from the future that there is a little problem getting the digital information. Thanks, MSFT Copilot.

My answer is, “Yeah, good luck.” The ephemeral quality of online information means that finding something buried near the pyramid of Djoser is going to be more rewarding than looking for the once-findable information about MIC, RAC, and ZPIC on a US government Web site. The same void exists for quite a bit of human output captured in now-disappeared systems like The Point (Top 5% of the Internet) and millions of other digital constructs.

A report published by the Pew Research Center highlights link rot. The idea is simple: click on a link, and the indexed or pointed-to content cannot be found. “When Online Content Disappears” has a snappy subtitle:

38 percent of Web pages that existed in 2013 are no longer accessible a decade later.

Wait, aren’t national libraries like the Library of Congress supposed to keep “information”? What about the National Archives? What about the Internet Archive (an outfit busy in court)? What about the Google? (That’s the “all the world’s information” outfit, right?) What about the Bibliothèque nationale de France with its rich tradition of keeping French information?

News flash. Unlike those who unearth tangible objects in Egypt, data archeologists are going to have to buy old hard drives on eBay, dig through rubbish piles in “recycling” facilities, or scour yard sales for old machines. Then one has to figure out how to get the data. Presumably smart software can filter through the bits looking for useful data. My suggestion? Don’t count on this happening.

Here are several highlights from the Pew Report:

  • Some 38% of webpages that existed in 2013 are not available today, compared with 8% of pages that existed in 2023.
  • Nearly one-in-five tweets are no longer publicly visible on the site just months after being posted.
  • 21% of all the government webpages we examined contained at least one broken link… Across every level of government we looked at, there were broken links on at least 14% of pages; city government pages had the highest rates of broken links.

The report presents a picture of lost data. Trying to locate these missing data will be less fruitful than digging in the sands of Egypt.

The word “rot” is associated with decay. The concept of “link rot” complements the business practices of government agencies and organizations that once gathered, preserved, and organized data. Are libraries at fault? Are regulators the problem? Are the content creators the culprits?

Sure, but the issue is that as the euphoria and reality of digital information slosh like water in a swimming pool during an earthquake, no one knows what to do. Therefore, nothing is done until knee-jerk reflexes cause something to take place. In the end, no comprehensive collection plan is in place for the type of information examined by the Pew folks.

From my vantage point, online and digital information are significant features of life today. Like goldfish in a bowl, we are not able to capture the outputs of the digital age. We don’t understand the datasphere, my term for the environment in which much activity exists.

The report does not address the question, “So what?”

That’s part of the reason future data archeologists will struggle. The rush of zeros and ones has undermined information itself. If ignorance of these data creates bliss, one might say, “Hello, Happy.”

Stephen E Arnold, May 22, 2024

