Harvard and a Web Archive Tool
May 18, 2023
 Note: This essay is the work of a real and still-alive dinobaby. No smart software involved, just a dumb humanoid.
The Library of Congress has dropped the ball and the Internet Archive may soon be shut down. So it is Harvard to the rescue. At least until people sue the institution. The university’s Library Innovation Lab describes its efforts in, “Witnessing the Web is Hard: Why and How We Built the Scoop Web Archiving Capture Engine.”
“Our decade of experience running Perma.cc has given our team a vantage point to identify emerging challenges in witnessing the web that we believe extend well beyond our core mission of preserving citations in the legal record. In an effort to expand the utility of our own service and contribute to the wider array of core tools in the web archiving community, we’ve been working on a handful of Perma Tools. In this blog post, we’ll go over the driving principles and architectural decisions we’ve made while designing the first major release from this series: Scoop, a high-fidelity, browser-based, single-page web archiving capture engine for witnessing the web. As with many of these tools, Scoop is built for general use but represents our particular stance, cultivated while working with legal scholars, US courts, and journalists to preserve their citations. Namely, we prioritize their needs for specificity, accuracy, and security. These are qualities we believe are important to a wide range of people interested in standing up their own web archiving system. As such, Scoop is an open-source project which can be deployed as a standalone building block, hopefully lowering a barrier to entry for web archiving.”
At Scoop’s core is its “no-alteration principle” which, as the name implies, is a commitment to recording HTTP exchanges with no variations. The write-up gives some technical details on how the capture engine achieves that standard. Aside from that bedrock doctrine, though, Scoop allows users to customize it to meet their unique web-witnessing needs. Attachments are optional and users can configure each element of the capture process, like time or size limits. Another pair of important features is the built-in provenance summary, including preservation of SSL certificates, and authenticity assertion through support for the Web Archive Collection Zipped (WACZ) file format and the WACZ Signing and Verification specification. Interested readers should see the article for details on how to start using Scoop. You might want to hurry, before publishers jump in with their inevitable litigation push.
Cynthia Murrell, May 18, 2023
Libraries: A Target?
October 4, 2022
Reading is FUNdamental. I am not sure that’s an accurate slogan today. “Libraries Across The US Are Receiving Violent Threats” reports:
In the last two weeks, at least a dozen public libraries across the U.S. received threats that resulted in canceled events and system-wide closures. While bomb and active shooter threats to public library systems in Nashville, Fort Worth, Denver, Salt Lake City, and Boston and other cities across the country were ultimately deemed hoaxes, library workers and patrons say they are still reeling in the aftermath.
Nice.
I grew up with the following impressions of libraries:
- My mother took me to the library each week so she could return the books she read from the previous week. She checked out books. I am not sure how old I was when I became aware of this library routine. Didn’t everyone go to the library once a week? Not to protest or make threats, but to get books and introduce a child to the “routine”?
- My sixth grade teacher, Ms. Costello, awarded a paper “flag” for each book read by a student. On the wall was a list of her students. The flags were pinned after each student’s name. One book received one white flag. Five books were converted to a white flag with a blue border. Ten books received a white flag with a red border. Twenty books were represented by a white flag with a yellow border. Each school year ended with Ms. Costello recognizing the students who read the most books. (Guess who won?) I made many trips to the Prospect Branch Library because I nuked the grade school library of books which interested me quickly.
- In high school, wearing my worn out sneakers, my cool plaid shirt, and my blue jeans with cuffs no less, I went to the downtown library which I reached via the bus. In my high school, English teachers assigned essays which had to have footnotes. The reference desk librarians were helpful and showed me the ropes of microfilm newspapers (wow, that technology sucked. Wasn’t there a better way to search?), the Reader’s Guide to Periodical Literature (wow, that print index sucked. Wasn’t there a better way to search and get access to the full text of the article?), the mysteries of the books behind the reference desk. (Oh, Constance Winchell, I loved you!)
- In college, I made the library my home away from home between classes. I had favorite tables at which to work. I loved the Library of Congress cataloging system. I knew exactly where certain book topics were shelved. I worked in the library on and off for a couple of years until I landed a higher paying job, but I learned how to get first crack at books professors put on reserve. I also located the COBOL instruction manuals and used them to do my first computer-based indexing project for a professor named William Gillis. Believe it or not, that project was my ticket to the world of commercial database indexing and my first real job at Halliburton Nuclear in Washington, DC. I indexed nuclear information using good old PDP computers. Exciting? You bet.
Why have I isolated four library experiences?
None require terror threats, political actions, or any behavior other than respect for the professionals who assisted me. My wife has told me that I could have gone to work right after high school and skipped college. She’s wrong. I am not sure I learned too much in my college courses. The bulk of the information was repetitive or something with which I was familiar based on my reading.
What was valuable to me was the opportunity to spend significant time in the university library. Here’s a fun fact: I was thrilled when a college event took place on Friday nights. I knew I would be one of a very few students in the library when the event was underway. Silence, no delays at the photocopy machine, no waiting for a specific card catalog drawer, and no one clogging the space between the shelves.
What’s my view of libraries? Can’t figure it out? Perhaps you should consider what one can achieve by doing the library thing. Online is okay, but it sure isn’t the library thing. I should know because I was involved and maybe instrumental in a number of very successful and widely used commercial databases. I knew paper indexes sucked, and I did something about it.
But libraries. The prime mover for me. Why be afraid of learning, knowledge, information, and different ideas? My answer is that those without a library “backbone” are lost in a digital world in which TikTok information imparts wisdom. Ho ho ho.
Stephen E Arnold, October 4, 2022
Libraries and Google: Who Wins?
August 31, 2022
Google uses various ways to protect users’ accounts, such as authentication through a mobile phone or non-Gmail address. This is a problem for large portions of the American population who don’t have regular access to the Internet. These include ethnic minorities, people with low socioeconomic status, and the elderly. These groups usually rely on public libraries for Internet access. These groups also need welfare and other assistance programs for survival.
Shelly R., a librarian in the Free Library of Philadelphia System, wrote a letter to Google in 2021 about how its security authentication hurts these groups. Although the letter was meant to be private, it was picked up by Hacker News. Her description of the services her library system provides is typical of many places in the United States.
People say that libraries are obsolete, but the naysayers are not taking into account the people who need Internet access, help with technology literacy, help applying for benefits and jobs, and more. Librarians have one of the most stressful jobs in the country, because they are forced into more roles than helping people research: teacher, therapist, babysitter, and more. The number of roles librarians fill is ridiculous; however, helping people in their community get access to technology is one thing they excel at.
Shelly R. makes a valid point that many groups cannot afford expensive technology or do not know how to use it. They rely on community resources such as the public library for assistance, but security features like Google’s authentication system do not help them.
Online accounts must remain secure to protect users, but people without regular Internet access or technology literacy must be taken into account as well. The Internet is supposed to be a great equalizer, but it does not work when not everyone has equal access.
Shelly R. updated the letter in August 2022, said she spoke with Google’s security team, and things were better for her job. Is that true? We hope so. If only Google would do more to help equalize Internet access. Hey Google, maybe you could donate money or resources to public libraries? You have the power and ability to do so, plus it would be a tax write-off.
Whitney Grace, August 31, 2022
Libraries: Responding to the Pandemic
April 23, 2020
Library patrons are SOL, because the COVID-19 pandemic has closed their beloved knowledge repositories. What are book lovers, people in need of WiFi, and parents in need of story time supposed to do? Libraries have gone digital! Libraries have embraced digital services for decades, and during the pandemic they continue to serve their communities, only now entirely in a digital space. Fast Company reports how in “Closed Libraries Are Offering Parking Lot Wi-Fi, EBooks, And Zoom Story Time.”
It is commonly believed that libraries are an obsolete government service, but that is completely untrue. Libraries offer communities a plethora of free resources and services that are otherwise unavailable. They offer free Internet access, entertainment, help with job searches, and a diverse range of classes.
While libraries are physically closed, librarians have gotten creative. Parking lots, sidewalks, and even bookmobiles have been transformed into wifi hotspots for those lacking Internet access. Even though they might risk being Zoom bombed, libraries have also moved to Zoom for story time and other classes.
Libraries are also offering curbside pickup:
“Some libraries are offering curbside checkout or other ways to pick up books, though doing so in a safe and sanitary way can be a logistical challenge. The El Dorado County Library in California is planning to let patrons go online or call to request books—which are only available after they’ve sat in a holding area for seven days to help ensure they’re free from the virus. The books will be brought for pickup at area grocery stores, so people can retrieve them when they’re out buying food.”
We cannot forget ebook and free streaming services, which include Overdrive, Libby, Kanopy, and even Amazon. There is a learning curve for older patrons versus younger ones who are more tech savvy. Many librarians are acting as tech support during the shutdown.
Once the shutdown is over, patrons will slowly return while maintaining some social distance for a time. There are concerns over libraries’ budgets being cut during the impending economic downturn, but libraries will get through it as they always do.
Whitney Grace, April 23, 2020
Libraries Fight Publishers In Ebook Limitations
October 17, 2019
Public libraries are an equalizing tool for people who do not have access to technology, books, and other materials that come with higher incomes. Unlike academic and textbook publishers, popular book publishers have had working relationships with libraries for decades. One of the biggest publishing houses in the United States might bring that to an end if it imposes limitations on ebooks. The Stranger shares one library’s stand against the publisher in “Seattle Public Library ‘Denounces’ Publisher’s New E-Book Policy.”
Come November 1, 2019, Macmillan plans to sell libraries only one digital copy of each newly released ebook at half price. Libraries will also be forced to wait two months before they can buy more copies, and those will be at the full retail price. Ebooks sell for $60, but are $30 for many libraries due to their non-profit status.
Macmillan CEO John Sargent’s reasoning makes sense for a company trying to make a profit:
“The rationale behind this move, according to a draft of a memo to authors written by Macmillan CEO John Sargent, is ‘to balance the great importance of libraries with the value of [an author’s] work.’ Sargent argues that library lending is ‘cannibalizing sales’ of e-books. He thinks the embargo will help the e-books sell better online, and claims to have data proving that the publisher makes far less on ‘library reads’ than they do on ‘retail reads.’”
Librarians speak the truth about the issue, because they are in the trenches where the action takes place. Libraries act as free PR for publishers and assist them in selling books, with the profits going directly to the publishers, not libraries. Libraries also pay more for ebooks than for physical copies, despite ebooks being cheaper to release.
This is going to hurt people with lower incomes, because they use libraries to get books they otherwise would not be able to afford.
The libraries, as always, will bear the brunt of this decision, because the general public does not understand or know about lending agreements between libraries and publishers. Authors could get bad reputations as well.
The number of people using ebooks and audiobooks has dramatically increased not only for the Seattle Public Library, but for libraries across the nation. Libraries have collected data that proves their circulating collections, physical and digital, do increase sales and boost readership.
Libraries will also spend money, because of the products and services they offer people. If the price of ebooks goes up, they will be forced to limit their collection’s holdings, which will decrease circulation and the number of people who visit. It would also lead to a decrease in readership and even book sales.
With an ever-increasing cost of living, raising the price of luxury goods like books will do more to damage sales than to boost them. As public institutions, libraries have a good reputation and will give Macmillan a run for their pages.
Whitney Grace, MLS, October 17, 2019
De-Archiving: Where Is the Money to Deliver Digital Beef?
February 25, 2018
I read “De-Archiving: What Is It and Who’s Doing It?” I don’t want to dig into the logical weeds of the essay. Let’s look at one passage I highlighted.
As the cost of hot storage continues to drop, economics work in favor of taking more and more of their stored material and putting it online. Millions of physical documents, films, recordings, photographs, and historical data are being converted to online digital assets every year. Soon, anything that was worth saving will also be worth putting online. Tomorrow’s warehouse will be a data center filled with spinning disks that safely store any valuable data – even if it has to be converted to a digital format first. “De-archiving” will be a new vocab word for enterprises and individuals everywhere – and everyone will be doing it in the near future.
My hunch is that the thought leader who wrote the phrase “anything that was worth saving will be worth putting online” has not checked out the holdings of the Library of Congress. The American Memory project, on which I worked, represents a minuscule percentage of the non-text information the LoC has. Toss in text, boxes of manuscripts, and artifacts (3D imaging and indexing). The amount of money required to convert and index the content might stretch the US budget which seems to wobble around with continuing resolutions.
Big ideas are great. Reality may not be as great. Movies which can disintegrate during conversion? Yeah, right. Easy. Economical.
Stephen E Arnold, February 25, 2018
Millennials Want to Keep Libraries
September 22, 2017
Many people think that libraries are obsolete and are only for senior citizens who want to read old paperbacks. The Pew Research Center says otherwise in the article, “Most Americans-Especially Millennials-Say Libraries Can Help Them Find Reliable, Trustworthy Information.”
Sensationalism in the news is not new, but it has reached extraordinary new heights with the Internet and mass information consumption. In order to gain audiences, news outlets (if some of them can be called that) are doing anything they can and this has led to an outbreak of fake news.
The Pew Research Center conducted a survey to see if adults would like to be taught how to recognize fake information and discovered that 61% said they would. They also discovered that 78% of adults feel that libraries can help them find trustworthy information. An even more amazing fact is that Millennials are the biggest supporters of libraries.
A large majority of Millennials (87%) say the library helps them find information that is trustworthy and reliable, compared with 74% of Baby Boomers (ages 52 to 70) who say the same. More than eight-in-ten Millennials (85%) credit libraries with helping them learn new things, compared with 72% of Boomers. And just under two-thirds (63%) of Millennials say the library helps them get information that assists with decisions they have to make, compared with 55% of Boomers.
People also use the libraries to receive technology training and gain confidence in these skills. Other interesting facts are that women are more likely than men to say that libraries help them find reliable information. Hispanic people also love the library and see it as an essential tool to cope with the busy world. Also, those without a high school diploma say that libraries help them in more than one way.
Libraries are far from obsolete. Libraries are epicenters for technology training and finding reliable and trustworthy information in a world hooked on sensationalism.
Whitney Grace, September 22, 2017
How to Quantify Culture? Counting the Bookstores and Libraries Is a Start
February 7, 2017
The article titled “The Best Cities in the World for Book Lovers” on Quartz conveys the data collected by the World Cities Culture Forum. That organization works to facilitate research and promote cultural endeavors around the world. And what could be a better measure of a city’s culture than its books? The article explains how the data collection works:
Led by the London mayor’s office and organized by UK consulting company Bop, the forum asks its partner cities to self-report on cultural institutions and consumption, including where people can get books. Over the past two years, 18 cities have reported how many bookstores they have, and 20 have reported on their public libraries. Hong Kong leads the pack with 21 bookshops per 100,000 people, though last time Buenos Aires sent in its count, in 2013, it was the leader, with 25.
New York sits comfortably in sixth place, but London, surprisingly, is near the bottom of the ranking with roughly 360 bookstores. Another measure the WCCF uses is libraries per capita. Edinburgh of all places surges to the top without any competition. New York is the only US city to even make the cut with an embarrassing 2.5 libraries per 100K people. By contrast, Edinburgh has 60.5 per 100K people. What this analysis misses out on is the size and beauty of some of the bookstores and libraries of global cities. To bask in these images, visit Bookshelf Porn or this Mental Floss ranking of the top 7 gorgeous bookstores.
Chelsea Kerwin, February 7, 2017
Obey the Almighty Library Laws
January 23, 2017
Recently I was speaking with someone and the conversation turned to libraries. I complimented the library’s collection in his hometown and he asked, “You mean they still have a library?” This response told me a couple of things: one, that this person was not a reader, and two, that he did not know the value of a library. The Lucidea blog asks, “Do The Original 5 Laws Of Library Science Hold Up In A Digital World?” and apparently they still do.
S.R. Ranganathan wrote five principles of library science in 1931, before computers dominated information and research. The post examines how the laws are still relevant. The first law states that books are meant to be used, meaning that information is meant to be used and shared. The biggest point of this rule is accessibility, which is extremely relevant. The second law states, “Every reader his/her book,” meaning that libraries serve diverse groups and deliver non-biased services. That still fits considering the expansion of knowledge dissemination and how many people access it.
The third law is also still important:
Dr. Ranganathan believed that a library system must devise and offer many methods to “ensure that each item finds its appropriate reader”. The third law, “every book his/her reader,” can be interpreted to mean that every knowledge resource is useful to an individual or individuals, no matter how specialized and no matter how small the audience may be. Library science was, and arguably still is, at the forefront of using computers to make information accessible.
The fourth law is “save time for the reader” and it refers to being able to find and access information quickly and easily. Search engines anyone? Finally, the fifth law states that “the library is a growing organism.” It is easy to interpret this law. As technology and information access change, the library must constantly evolve to serve people and help them harness the information.
The wording is a little outdated, but the five laws are still important. However, we also need to consider how people have changed in regard to using the library.
Whitney Grace, January 23, 2017
The Robots Are Not Taking over Libraries
December 14, 2016
I once watched a Japanese anime that featured a robot working in a library. The robot shelved, straightened, and maintained order of the books by running on a track that circumnavigated all the shelves in the building. The anime took place in a near-future Japan, when all paper documents were rendered obsolete. While we are a long way off from having robots in public libraries (budget constraints and cuts), there is a common belief that libraries are obsolete as well.
Libraries are the furthest thing from being obsolete, but robots have apparently gained enough artificial intelligence to find lost books. Popsci shares the story in “Robo Librarian Tracks Down Misplaced Book.” It explains a situation that librarians hate to deal with: people misplacing books on shelves instead of letting the experts put them back. Libraries rely on books being in precise order and if they are in the wrong place, they are as good as lost. Fancy libraries, like a research library at the University of Chicago, have automated the process, but it is too expensive and unrealistic to deploy. There is another option:
A*STAR roboticists have created an autonomous shelf-scanning robot called AuRoSS that can tell which books are missing or out of place. Many libraries have already begun putting RFID tags on books, but these typically must be scanned with hand-held devices. AuRoSS uses a robotic arm and RFID scanner to catalogue book locations, and uses laser-guided navigation to wheel around unfamiliar bookshelves. AuRoSS can be programmed to scan the library shelves at night and instruct librarians how to get the books back in order when they arrive in the morning.
Manual labor is still needed to put the books in order after the robot does its work at night. But what happens when someone needs help with research, finding an obscure citation, evaluating information, and even using the Internet correctly? Yes, librarians are still needed. Who else is going to interpret data, guide research, and guard humanity’s knowledge?
Whitney Grace, December 14, 2016