CNI: Coalition for Networked Information

Unlocking Web Archives Using Retrieval Augmented Generation

May 16, 2025

Corey Davis
Digital Preservation Librarian
University of Victoria

Large Language Models (LLMs) are reshaping how research libraries manage digital preservation and provide access to web archives. This presentation explores the potential and challenges of integrating LLMs with Retrieval-Augmented Generation (RAG) to improve the searchability and usability of Web ARChive (WARC) files. The session will showcase a RAG pipeline developed at the University of Victoria Libraries for processing and exploring web archives conversationally, and will discuss the challenges of managing AI infrastructure locally, along with data quality issues, embedding strategies, and computational requirements. Attendees will gain insights into the practical applications of AI for access to unique digital collections, the technical and ethical considerations involved, and strategies for optimizing AI-driven discovery tools in library contexts. The presentation will argue that while AI enhances access to web archives and digital collections more broadly, its successful deployment requires careful design, iterative refinement, and human oversight.

Full background information is available here: Davis, C. (2025). Unlocking web archives: LLMs, RAG, and the future of digital preservation. University of Victoria Libraries. https://hdl.handle.net/1828/21379

https://github.com/coreyleedavis/libguides-rag
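For readers unfamiliar with RAG, the retrieval step mentioned in the abstract can be illustrated with a small, generic sketch. This is not the University of Victoria pipeline or the code in the repository above. It assumes the text has already been extracted from WARC records into plain-text chunks, and it uses a simple term-frequency vector as a stand-in for a real embedding model. The function names and sample chunks are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text):
    """Stand-in for an embedding model: a sparse term-frequency vector."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query, passages):
    """Assemble the retrieved passages and the question into an LLM prompt."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


# Hypothetical chunks standing in for text extracted from archived web pages.
CHUNKS = [
    "The library opens at 8 am on weekdays during the fall term.",
    "Interlibrary loan requests are usually filled within five business days.",
    "Special collections holds archives of local newspapers and photographs.",
]

if __name__ == "__main__":
    question = "When does the library open?"
    top = retrieve(question, CHUNKS, k=1)
    print(build_prompt(question, top))
```

In a real deployment the `embed` function would call an embedding model and the assembled prompt would be sent to an LLM. Those choices drive the embedding strategy and computational requirements the abstract mentions.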

Filed Under: Artificial Intelligence, Digital Libraries, Digital Preservation, Information Access & Retrieval, Pre-Recorded Project Briefing Pages, Spring 2025 Pre-Recorded Project Briefing Series
Tagged With: Videos

Last updated:  Friday, May 16th, 2025

 
