IBM: Database or Public Relations Wizardry?
March 4, 2010
I cannot figure out if IBM has revealed a breakthrough in technology or in public relations. You will have to make up your own mind. Navigate to “Putting the Web in a Spreadsheet”. The write up explains that IBM has used Hadoop and its own code, called BigSheets, to help make sense of Web information. According to the write up:
BigSheets uses Hadoop to crawl through Web pages, parsing them to extract key terms and other useful data. BigSheets organizes this information in a very large spreadsheet, where users can analyze it using the sort of tools and macros found in desktop spreadsheet software. Unlike ordinary spreadsheet software, however, there’s no limit to the size of a spreadsheet created through BigSheets.
The example in the article is the British Library’s use of the technology as part of an archive project. The article said:
The first test for BigSheets came at the British Library, which has been working since 2004 to create an archive of the roughly eight million UK websites. At regular intervals, the Library takes snapshots of Web pages, converts them to an archival file format, and stores them. But searching and analyzing this data is another challenge, and that’s where BigSheets came in.
IBM, according to the article, will use this technology in future products. I will reserve judgment. I did write about the British Library taking months to create an archive of Web sites, noting that the project seemed to be moving slowly. The disconnect in my mind remains because this Web in a Spreadsheet write up suggests that the British Library has an archive of eight million Web sites, not a few thousand. More information is needed.
I don’t know if this is technology or PR.
Stephen E Arnold, March 3, 2010
No one paid me to write this. Since I mention IBM, recipient of a large US government integration project, I will report the fact that I wrote for no dough to IBM Federal Systems, a unit which does work for dough.