The House Cleaning of Halevy Dataspace: A Web Curiosity

November 14, 2016

I am preparing three seven-minute videos. I will release one video each week starting on 20 December 2016. The subject is my Google Trilogy, published by an antique outfit which has drowned in River Avon. The first video is about the 2004 monograph, The Google Legacy. I coined the term “Googzilla” in that 230 page discussion of how Google became baby Google. The second video summarizes several of the takeaways from Google: The Calculating Predator, published in 2007. The key to the monograph is the bound phrase “calculating predator.” Yep, not the happy little search outfit most know and love. The third video hits the main points of Google: The Digital Gutenberg, published in 2009. The idea is that Google spits out more digital content than almost anyone. Few think of the GOOG as the content generator the company has become. Yep, a map is a digital artifact.

Now to the curiosity. I wanted to reference the work of Dr. Alon Halevy, a former University of Washington professor and founder of Nimble and Transformic. I had a stack of links I used when I was doing the research for my predator book. Just out of curiosity I started following the links. I do have PDF versions of most of the open source Halevy-centric content I located.

But guess what?

Dr. Alon Halevy has disappeared. I could not locate the open source version of his talk about dataspaces. I could not locate the Wayback Machine’s archived version of the Transformic.com Web site. The links returned these weird 404 errors. My assumption was that Wayback’s Web pages resided happily on the outfit’s servers. I was incorrect. Here’s what I saw:

image

I explored the bound phrase “Alon Halevy” with various other terms only to learn that the bulk of the information has disappeared. No PowerPoints, not much substantive information. There were a few “information objects” which have not yet disappeared; for example:

  • An ACM blog post which references “the structured data team” and Nimble and Transformic
  • A Google research paper which will not make those who buy into David Gelernter’s The Tides of Mind thesis happy
  • A YouTube video of a lecture given at Technion.

I found the gap between the research I gathered from 2005 to 2007 and what is available today interesting. I asked myself, “How did I end up with so many dead links about a technology I have described as one of the most important in database, data management, data analysis, and information retrieval?”

Here are the answers I formulated:

  1. The Web is a lousy source of information. Stuff just disappears, like the DARPA listing of open source Dark Web software, blogs, and Web sites.
  2. I did really terrible research and even worse librarian type behavior. Yep, mea culpa.
  3. Some filtering procedures became a bit too aggressive and the information has been swept from assorted indexes.
  4. The Wayback Machine ran off the rails and pointed to an actual 2005 Web site which its system failed to copy when the original spidering was completed.
  5. Gremlins. Hey, they really do exist. Just ask Grace Hopper. Yikes, she’s not available.

I wanted to mention this apparent or erroneous scrubbing. The story in this week’s HonkinNews video points out that 89 percent of journalists do their research via Google. Now if information is not in Google, what does that imply for a “real” journalist trying to do an objective, comprehensive story? I leave it up to you, gentle reader, to penetrate this curiosity.

Watch for the Google Trilogy seven-minute videos on December 20, 2016, December 27, 2016, and January 3, 2017. Free. No pay wall. No Patreon.com pleading. No registration form. Just honkin’ news seven days a week and some video shot on an old Bell+Howell camera in a log cabin in rural Kentucky.

Stephen E Arnold, November 14, 2016
