CNI: Coalition for Networked Information


“Directory Pipeline”—A Tool for Turning Historical Digital Collections into Structured Data


May 6, 2026

Josh Hadro

Directory Pipeline is an LLM-assisted, IIIF-native proof-of-concept tool for turning digitized collection items into structured, browsable CSV interfaces, with snippet links back to the original source material. Given any IIIF manifest URL (from the Library of Congress, Internet Archive, NYPL Digital Collections, or any institution that publishes a public IIIF manifest), it uses a meta-prompting technique to select a few example entry pages, then evaluates those pages to generate item-specific OCR or HTR instructions as well as item-specific data extraction instructions. It is built for digitized historical directories (city directories, gazetteers, trade directories) but works on just about any historical document with a regular, entry-like structure, including handwritten log entries, manuscripts, and more.
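To make the first step more concrete, here is a minimal sketch of how a tool might read page images out of an IIIF manifest and choose a few sample pages. It is not Directory Pipeline's actual code: the function names `canvas_image_urls` and `pick_sample_pages` are hypothetical, and the evenly spaced sampling rule is an assumption standing in for the LLM-driven selection described above. It also assumes each canvas carries a single painting image rather than a choice of images.

```python
from typing import Any


def canvas_image_urls(manifest: dict[str, Any]) -> list[str]:
    """Return one image URL per canvas from an IIIF Presentation 2.x or 3.0 manifest."""
    urls: list[str] = []
    if "sequences" in manifest:
        # Presentation API 2.x: sequences -> canvases -> images -> resource
        for canvas in manifest["sequences"][0].get("canvases", []):
            for image in canvas.get("images", [])[:1]:
                urls.append(image["resource"]["@id"])
    else:
        # Presentation API 3.0: canvas -> annotation page -> annotation -> body
        for canvas in manifest.get("items", []):
            for page in canvas.get("items", [])[:1]:
                for annotation in page.get("items", [])[:1]:
                    # Assumption: body is a single image object, not a Choice.
                    urls.append(annotation["body"]["id"])
    return urls


def pick_sample_pages(urls: list[str], k: int = 3) -> list[str]:
    """Pick up to k evenly spaced pages (hypothetical stand-in for LLM selection)."""
    if k <= 0 or not urls:
        return []
    if len(urls) <= k:
        return list(urls)
    if k == 1:
        return [urls[len(urls) // 2]]
    last = len(urls) - 1
    # Integer arithmetic keeps the first and last page and spreads the rest.
    return [urls[i * last // (k - 1)] for i in range(k)]
```

The manifest dictionary would come from parsing the JSON served at the manifest URL, for example with `json.load(urllib.request.urlopen(url))`. The sampled image URLs could then be handed to whatever model evaluates the pages and drafts the transcription and extraction instructions.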

https://technology.berkeley.edu/research-storage

Presentation Slides


Filed Under: Artificial Intelligence, Digital Humanities, Digital Libraries, Emerging Technologies, Information Access & Retrieval, Special Collections, Spring 2026 Pre-Recorded Project Briefing Series

Last updated:  Wednesday, May 6th, 2026

 
