Kate Dohe
Director of Digital Programs and Initiatives
University of Maryland
Artificial intelligence (AI) bots aggressively harvest content from the internet to create vast training datasets for generative AI models. Less well known are the especially devastating effects this scraping has on cultural heritage institutions like libraries, archives, and museums. The sudden and overwhelming increase in bot traffic to these institutions’ open repositories, discovery systems, and web properties has left them grappling with technical strain on their systems and trying to determine how best to mitigate the downstream impacts on staff, patrons, and everyone else who interacts with those systems.
Aggressive harvesting can degrade system performance, skew usage metrics, violate usage terms, and strip cultural materials of context. These behaviors threaten both the availability and stability of core library online services and the sustainability of public distribution of digital resources. This lightning session will briefly explore the impact of unregulated AI scraping on cultural heritage institutions and share the work of an “unofficial” working group whose mission is to provide a place to collaborate, share experiences, and educate others on this ever-changing issue.