Josh Hadro
Directory Pipeline is an LLM-assisted, IIIF-native proof-of-concept tool for turning digitized collection items into structured, browsable CSV interfaces, with snippet links back to the original source material. Given any IIIF Manifest URL (from the Library of Congress, Internet Archive, NYPL Digital Collections, or any institution that publishes a public IIIF manifest), it uses a meta-prompting technique to select a few example entry pages, then evaluates those pages to generate item-specific OCR or HTR instructions, as well as item-specific data extraction instructions. It’s built for digitized historical directories (city directories, gazetteers, trade directories), but it works on almost any historical document with a regular, entry-like structure, including handwritten log entries, manuscripts, and more.