Director, Web Archiving Programs
Lead Information Technology Specialist, Web Archiving Team
Library of Congress
Associate Dean for Digital Libraries
University of North Texas
In the fall of 2016, a group of institutions organized to preserve a snapshot of the federal government web. This is the third time the End of Term (EOT) group has organized, with the goals of identifying, harvesting, preserving, and providing access to a snapshot of the federal government’s web presence. The snapshot serves two purposes: it documents the changes caused by the transition of elected officials in the executive branch of the government, and it provides a broad view of the federal domain once every four years that is replicated among a number of organizations for long-term preservation.

Presenters from the project’s three lead institutions will discuss four topics:
- methods for identifying and selecting in-scope content, including registries, indices, and crowdsourcing URL nominations [“seeds”] through a web application called the URL Nomination Tool;
- new strategies for capturing web content, including crawling, browser rendering, and social media tools;
- access models, including both an online portal and research datasets for use in computational analysis;
- replication of preservation data among partners using new export APIs and experimental tools developed as part of the IMLS-funded WASAPI project.

Presenters will also address how the project illuminates the challenges and opportunities of large-scale, distributed, multi-institutional efforts to collect and preserve born-digital content; how the project aligns with participating institutions’ collection mandates; the project’s importance for archiving historically valuable but highly ephemeral web content that lacks a clear steward; and how the breadth and size of the End of Term Web Archive inform both new methods of collaboration and new models for data-driven access and analysis by researchers.