Dean of Libraries
University of North Texas
Director, Web Archiving Programs
Old Dominion University
Associate Dean for Digital Libraries
University of North Texas
There are several upcoming large-scale endeavors that will address web archiving at the national level. This session will present information concerning these undertakings, including the following:
1) National Web Archiving Capacity Project: The Internet Archive, working with partner organizations University of North Texas, Rutgers University, and Stanford University Library, will undertake a two-year research project to explore techniques that can expand national web archiving capacity in several areas. The project aims to build a foundation for collaborative technology development, improved systems interoperability, and an Application Programming Interface (API) based model for enhanced access to, and research use of, web archives. The project will outline successful community models for cooperative technology development work; it will prototype and test API-based interoperability; and it will explore how interoperability can enable new access models, improve discoverability, and expand shared digital services. Because the work builds on the Archive-It platform, now used by more than 350 partner institutions, the results of this research will be directly applicable to libraries, archives, and museums around the country and the world.
2) Integrating Storytelling Web Archive Project: Old Dominion University and the Internet Archive will collaborate to develop tools and techniques for integrating “storytelling” social media and web archiving. Services such as Archive-It allow libraries, archives, and museums to develop, curate, and preserve collections of web resources. At the same time, storytelling is becoming a popular technique in social media for selecting representative tweets, videos, web pages, etc., and arranging them in chronological order to support a particular narrative or “story.” Tools such as Storify provide an easy interface for users to arrange web resources to create a story. The partners will use information retrieval techniques to (semi-)automatically generate stories summarizing a collection, and to mine existing public stories as a basis for librarians, archivists, and curators to create collections about breaking events.
3) End of Term (EOT) Web Archive: The EOT Web Archive program is a multi-institution effort that aims to comprehensively capture and save U.S. Government websites at the end of presidential administrations. Begun in 2008, the EOT has thus far preserved websites from the administration changes in 2008 and 2012. The collaborative program is currently preparing for the 2016 electoral season, a critically important election and moment for the nation. The EOT Web Archive contains federal government websites (.gov, .mil, etc.) in the legislative, executive, or judicial branches of the government. Websites that were at risk of changing (e.g., whitehouse.gov) or disappearing altogether during government transitions were captured. Local or state government websites, and any other sites not part of the federal government domain, were out of scope. The 2016 EOT crawl will be a particularly important collection of information given the contested nature of the presidential race.
4) Digital Preservation of Federal Information Summit: This will be a report-out from a just-concluded meeting on the topic of preservation of, and access to, at-risk digital government information. The summit brought together high-level national organizations concerned with this issue to plan an agenda for mobilizing efforts well in advance of the inauguration of the new president. The meeting examined which categories of government information are most at risk, and why; technologies for capturing, preserving, and making such collections accessible; and strategies for mobilizing collaborative efforts between institutions toward these ends.