The Collation

An Introduction to Web Archiving at the Folger

As a resident Digital Archivist at the Folger, I’ve been tasked with managing the Folger’s web archiving efforts.

The Folger Shakespeare Library web collecting mission.


Now, you might be asking: what exactly is web archiving? The International Internet Preservation Consortium (IIPC) defines web archiving as the process of “collecting portions of the World Wide Web, preserving the collections in an archival format [most often via the WARC file format], and then serving the archives for access and use.” There are a number of ways to achieve this, from home-grown technical processes that combine open-source tools to vendor services that package popular web collecting and organization methods into a single offering. No matter the route, the under-the-hood mechanics of the collection process remain virtually the same. Generally, “web crawlers” (such as the popular Heritrix tool) systematically access and gather content from designated URLs, a process referred to as crawling. The results of these crawls are captures of web content that can then be archived and curated into organized collections.
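For readers curious about what a crawl looks like in practice, here is a minimal sketch in Python of the general idea: start from a list of designated seed URLs, fetch each page, keep a capture of it, and follow the links it contains. This is an illustration only, not how Heritrix or any vendor service is actually implemented. The names `crawl`, `fetch_over_http`, and `LinkExtractor` are made up for this example, the captures are held in memory rather than written to WARC files, and real crawlers also handle things this sketch ignores, such as robots.txt rules, politeness delays, and non-HTML content.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_over_http(url):
    """Hypothetical default fetcher: download a page and return its text."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(seed_urls, fetch=fetch_over_http, max_pages=25):
    """Breadth-first crawl restricted to the hosts of the seed URLs.

    Returns a dict mapping each captured URL to the content retrieved.
    """
    allowed_hosts = {urlparse(url).netloc for url in seed_urls}
    queue = deque(seed_urls)
    seen = set(seed_urls)  # URLs already queued, so none is fetched twice
    captures = {}

    while queue and len(captures) < max_pages:
        url = queue.popleft()
        try:
            body = fetch(url)
        except Exception:
            continue  # skip pages that cannot be retrieved
        captures[url] = body

        # Find links on the page and queue those that stay in scope.
        parser = LinkExtractor()
        parser.feed(body)
        for href in parser.links:
            absolute, _fragment = urldefrag(urljoin(url, href))
            if urlparse(absolute).netloc in allowed_hosts and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)

    return captures
```

Calling `crawl(["https://example.org/"])` would fetch that page and any same-host pages it links to, up to `max_pages`. Passing a different `fetch` function makes it possible to try the logic against a small in-memory "site" without touching the network.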
