CNI: Coalition for Networked Information

Library Collections at Scale: Building Research-Ready Collections

November 17, 2025

Hall-Hoag at Scale: Taming 800,000 Pages

Birkin Diana
Lead Developer, Digital Technologies, Library
Brown University

Justin Uhr
Library Web and Applications Developer, Digital Technologies, Library
Brown University

In 2023, the John Hay Library, Brown University’s special collections library, received a grant to scan some 800,000 pages of materials from some 35,000 organizations represented in “The Hall-Hoag Collection of Dissenting and Extremist Printed Propaganda” and ingest them into Brown’s Digital Repository. Because of the volume and timeline, the ingested items were linked to their organizations. The briefing will share (1) current “phase-1: ingestion” work involving optical character recognition research, rotation detection, organizational-metadata production, and a flexible ingestion pipeline; and (2) preliminary “phase-2: enhancement” investigations into using large language models for summarization to improve item-level MODS, using multimodal embedding models to group individual items into multi-page documents, and making the collection discoverable without its results dominating all repository searches.

  • Hall-Hoag collection brief overview: https://library.brown.edu/collatoz/info.php?id=62
  • Brown Digital Repository Hall-Hoag collection page: https://repository.library.brown.edu/studio/collections/bdr:wum3gm43/
  • Hall-Hoag collection finding-aid-database website: https://apps.library.brown.edu/hall-hoag/

Processing Library Collections at Scale for Broad Research and Artificial Intelligence Training

Catherine Brobston
Program Director, Institutional Data Initiative, Law Library
Harvard University

Greg Leppert
Executive Director, Institutional Data Initiative, Law Library
Harvard University

The Institutional Data Initiative at the Harvard Law School Library is working to build library capacity while improving the training data available for artificial intelligence (AI). In June, the Library released Institutional Books, a dataset of nearly one million public domain volumes from Harvard Library, scanned through its partnership with Google. Next year, the Library will release a dataset of newspapers in partnership with the Boston Public Library. In processing library datasets, the goal is to release structured and refined collections that improve usability for humans and machines alike. The briefing will include insights on working with library data at scale, applying AI tools to build reliable pipelines, learning from a community of academic and AI users, and working within institutional systems to create room for this work.

https://www.institutional.org/

 

Filed Under: CNI Fall 2025 Project Briefings, Digital Libraries, Information Access & Retrieval, Metadata, Project Briefing Pages
Tagged With: cni2025fall, Project Briefings & Plenary Sessions

Last updated:  Friday, November 21st, 2025

 
