Deleting Data: Are They Really Gone?

November 17, 2015

I read “Gawker Media’s Data Guru Presents the Case for Deleting Data.” The main idea is that hoarding permits a reality TV program. Hoarding data may not be good TV.

The write up points out that data cleaning is not cheap. Storage also costs money.

A Gawker wizard is quoted as saying:

We effectively are setting traps in our data sets for our future selves and our colleagues… Increasingly, I find that eliminating this data from our databases is the best solution. Gawker’s traffic data is maintained for just a few months. In our own logs and databases, we only have traffic data since February, and even that’s of limited use: We’ll toss some of it before the end of the year.

Seems reasonable. However, there may be instances when dumping or just carelessly overwriting log files might not be expedient or legal. For example, in one government agency, the secretary’s “bonus” depends on showing how Internet site usage relates to paperwork reduction. The idea is that when a “customer” of the government uses a Web site and does not show up in person at an office to fill out a request, the “customer” allegedly gets better service and costs, in theory, should drop. Also, some deals require that data be retained. You can use your imagination if you are an ISP in a country recently attacked by terrorists and your usage logs are “disappeared.” SEC and IRS retention guidelines? Worth noting in some cases.

The question is, “Are data really gone once deleted?” Automatic backups, services in the middle that routinely copy data, and other ways of creating unobserved copies may mean that deleted data can come back to life.

Pragmatism and legal constraints as well as the “men in the middle” issue can create zombie data, which, unlike the fictional zombies, can bite.

Stephen E Arnold, November 17, 2015
