Web and Data Scientist
Academic Technology Specialist
The Internet Archive (IA) strives to archive more of the web and to archive it better. While it values the quality of web archival collections produced by a well-crafted set of scopes and a curated seed list (e.g., the End of Term Crawls), there are times when it cannot afford to run crawls sequentially, after rigorous planning and seed collection, because the target web resources have become extremely volatile and vulnerable. In such cases, the IA puts extraordinary effort into capturing the section of the web in question as quickly as possible by allocating a significant portion of its finite compute and human resources. As a recent example, when the Russia-Ukraine war took the world by surprise, the IA acted immediately to run various crawls on relevant domains in a reactive mode. Its video archiving pipeline also began collecting more videos on the topic.
The Saving Ukrainian Cultural Heritage Online (SUCHO) project was born independently of the IA. Volunteers of the project crowd-sourced efforts to identify relevant resources, archive them using various tools such as the Wayback Machine’s Save Page Now service and Webrecorder, and perform quality assurance. Many members of the Wayback Machine team were actively involved in supporting the project and addressing infrastructure issues.
This session will include a discussion of some of IA’s ongoing web archiving efforts related to crisis events, as well as an update on the SUCHO project. A presentation on SUCHO was included in CNI’s July 2022 Pre-Recorded Project Briefing Series: https://www.cni.org/topics/digital-humanities/saving-ukrainian-cultural-heritage-online-rapid-response-digital-humanities.