Senior Crawl Engineer
Archive-It Program Director
This presentation will briefly discuss the importance of web archiving and give examples of how some institutions are archiving the web. It will then demonstrate Archive-It, with an overview of the application, a review of its architecture, a discussion of the technical challenges encountered so far, and a technical roadmap for future development of the system. RLG will also give an overview of its programmatic offerings in web archiving.
Libraries and archives have long collected information that helps scholars understand history, culture, and society. Thanks to the efforts of these memory institutions, many important documents that help us understand and interpret the past have been preserved. Much of today's information is found on the World Wide Web: web pages have replaced newsletters, blogs are today's diaries, and many government forms and documents are more readily accessible on the web than in paper form. To document and capture today's information for tomorrow's use, institutions must adopt a web archiving strategy. For many institutions, however, capturing and storing web sites, or entire web domains, is a daunting prospect.
Fortunately, Archive-It takes on much of the burden of web archiving. Archive-It is a web application designed specifically for university and government institutions that want to preserve web content. Through a web interface, it allows organizations with limited infrastructure and technical staff to collect, catalog, search, and manage archived web content. Archive-It is built on open source components developed by the Internet Archive and the International Internet Preservation Consortium, and it creates and stores ARC files, the standard format for web archiving.
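To make "ARC files" a little more concrete, the sketch below builds and parses a single record in the version 1 ARC layout as we understand it from the Internet Archive's published format description: a header line of five space-separated fields (URL, IP address, 14-digit archive date, content type, length) followed by the captured bytes. The function names `arc_v1_record` and `parse_arc_v1_header` are hypothetical, and this is not Archive-It's own code. It is a simplified sketch that omits the `filedesc://` version block at the start of every ARC file, as well as compression and error handling.

```python
from datetime import datetime, timezone


def arc_v1_record(url: str, ip: str, fetched: datetime,
                  content_type: str, payload: bytes) -> bytes:
    """Hypothetical helper: build one simplified ARC v1 URL-record."""
    # Archive date is 14 digits, YYYYMMDDhhmmss, expressed here in UTC.
    date = fetched.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")
    # Five space-separated fields; the length field counts the payload bytes.
    header = f"{url} {ip} {date} {content_type} {len(payload)}\n"
    # A trailing newline terminates the record before the next header.
    return header.encode("ascii") + payload + b"\n"


def parse_arc_v1_header(line: bytes) -> dict:
    """Hypothetical helper: split a v1 header line back into its fields."""
    url, ip, date, content_type, length = line.decode("ascii").rstrip("\n").split(" ")
    return {"url": url, "ip": ip, "date": date,
            "content_type": content_type, "length": int(length)}


if __name__ == "__main__":
    record = arc_v1_record("http://www.example.org/", "192.0.2.1",
                           datetime(2006, 1, 2, 3, 4, 5, tzinfo=timezone.utc),
                           "text/html", b"<html></html>")
    print(parse_arc_v1_header(record.split(b"\n", 1)[0]))
```

Because the length field tells a reader exactly how many bytes follow the header, a tool can step from record to record through a large ARC file without parsing the archived content itself.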
Even though Archive-It handles the mechanics of web archiving, the community still faces substantial issues, including metadata, shared collection development, intellectual property rights, and end-user issues. RLG will lead a number of working groups to explore these issues and best practices in web archiving, both for institutions participating in the Archive-It service and for those pursuing web archiving through other means.
Handout Page 1 (MS Word)
Handout Page 2 (PDF)
Archive-It Architecture Introduction (PowerPoint Presentation)