Library of Congress: Tweets Are for Real
June 16, 2011
Tweets are headed to a very surprising new home. According to the O’Reilly Radar article “How the Library of Congress is Building the Twitter Archive,” Twitter plans to hand over all public tweets since its inception in 2006 to the Library of Congress.
Researchers have been eagerly lining up for their opportunity to crack open the Twitter archive. Twitter fans create millions of tweets per day, and according to the article:
“Each tweet is a JSON [JavaScript Object Notation] file, containing an immense amount of metadata in addition to the contents of the tweet itself: date and time, number of followers, account creation date, geodata, and so on. This requires a significant technological undertaking on the part of the library in order to build the infrastructure necessary to handle inquiries, and specifically to handle the sorts of inquiries that researchers are clamoring for.”
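To make the quoted description concrete, here is a minimal Python sketch of what reading one such record might look like. The sample record and its field names (`text`, `created_at`, `user`, `followers_count`, `geo`) are assumptions made up for illustration, not the schema Twitter or the Library actually uses.

```python
import json
from datetime import datetime

# Hypothetical tweet record. The field names and values are illustrative
# assumptions, not a confirmed Twitter or Library of Congress schema.
SAMPLE_TWEET = """
{
  "text": "Tweets are headed to the Library of Congress.",
  "created_at": "2011-06-16T09:30:00",
  "user": {
    "screen_name": "example_user",
    "followers_count": 1250,
    "created_at": "2008-03-01T12:00:00"
  },
  "geo": {"lat": 38.8887, "lon": -77.0047}
}
"""


def summarize_tweet(raw):
    """Parse one JSON record and pull out the metadata the article lists."""
    record = json.loads(raw)
    user = record.get("user", {})
    return {
        "text": record.get("text", ""),
        # Date and time of the tweet itself.
        "posted": datetime.fromisoformat(record["created_at"]),
        # Number of followers attached to the posting account.
        "followers": user.get("followers_count"),
        # Account creation date.
        "account_created": datetime.fromisoformat(user["created_at"]),
        # Geodata is optional, so only record whether it is present.
        "has_geodata": record.get("geo") is not None,
    }


summary = summarize_tweet(SAMPLE_TWEET)
print(summary["posted"].year, summary["followers"], summary["has_geodata"])
```

Multiply that small parsing step by millions of tweets per day and the infrastructure problem the article describes comes into focus.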
It seems that the Library of Congress has validated tweets as real information, not merely fodder for text analytics. Hopefully the Library will tackle some of the other content it has in its possession. I am thinking about images, of which American Memory is a subset, and fair copies of certain documents.
Stephen E Arnold, June 16, 2011
Sponsored by ArnoldIT.com, the resource for enterprise search information and current news about data fusion