Content & Metadata Manager
United States Government Printing Office
Mark Edward Phillips
Head, Digital Projects Unit
University of North Texas
Digital Media Projects Coordinator
Library of Congress
The Library of Congress, the California Digital Library (CDL), the University of North Texas (UNT) Libraries, the Internet Archive (IA), and the U.S. Government Printing Office are collaborating on a project to preserve public United States government Web sites at the end of the current presidential administration, which ends January 19, 2009. This harvest is intended to document federal agencies’ presence on the World Wide Web during the transition of presidential administrations and to enhance the existing collections of the five partner institutions.
In this collaboration, the partners will structure and execute a comprehensive harvest of the federal government .gov domain. The IA will crawl broadly across the entire .gov domain. UNT and CDL will supplement and extend this broad crawl with focused, in-depth crawls based on prioritized lists of URLs. This two-pronged approach seeks to capture a thorough snapshot of the federal government on the Web at the close of the current administration.
The project will call upon government information specialists (including librarians, political and social science researchers, and academics) to assist in selecting and prioritizing the Web sites to be included in the collection, and to determine how often and how deeply each site should be collected. To facilitate the collaborative work of these specialists, the project team designed a nomination tool, which UNT developed and which was made available to participants in August 2008. The briefing will report on the development of the nomination tool, the progress of the nomination and prioritization process, and the progress of the initial crawls.
Handout (MS Word)