Marissa Friedman
Digital Project Archivist
University of California, Berkeley
Mary Elings
Interim Deputy Director, Assistant Director, and Head of Technical Services
University of California, Berkeley
Vijay Singh
Co-Founder and Chief Executive Officer
Doxie.AI
Tracey Tan
Co-Founder and Chief Product Officer
Doxie.AI
Cameron Ford
Co-Founder, Chief Sales Officer, and Chief Finance Officer
Doxie.AI
As part of a Japanese American Confinement Sites (JACS) grant project supported by the National Park Service, the Bancroft Library is digitizing nearly 210,000 pages of War Relocation Authority (WRA) Form 26 individual records of Japanese Americans incarcerated during World War II in 10 “relocation” centers. The WRA used this two-page census-type form to collect sociological, demographic, and biographical data about the incarcerated population. The data were coded onto computer punch cards from the original, primarily typewritten forms in the 1940s, converted into a data file at Berkeley in the 1960s, deposited at the National Archives and Records Administration (NARA) in the 1980s, and made available online by NARA as the Japanese American Internee Data File. This dataset has served for decades as the authoritative source of genealogical information for former inmates and their families and of statistical information for social scientists. The existing data file, however, contains gaps, errors, and inaccuracies, and does not adequately represent the breadth and depth of information found in the original forms. The Bancroft Library holds the complete set of over 110,000 original typewritten forms and has joined forces with Doxie.AI to use machine learning (ML) models to help automate the extraction of the original data into a new and expanded data file. Doxie.AI will describe how it built a custom optical character recognition pipeline to transcribe the WRA Form 26 records with a high degree of accuracy and developed a custom ML model that removes noise from images to produce better results. We will share lessons learned so far about the challenges and opportunities of using ML to enhance computational access to digitized archival records.