CNI: Coalition for Networked Information

Using Machine Learning to Extract WWII Japanese American Incarceree Data

December 1, 2021

Marissa Friedman
Digital Project Archivist
University of California, Berkeley

Mary Elings
Interim Deputy Director, Assistant Director, and Head of Technical Services
University of California, Berkeley

Vijay Singh
Co-Founder and Chief Executive Officer
Doxie.AI

Tracey Tan
Co-Founder and Chief Product Officer
Doxie.AI

Cameron Ford
Co-Founder, Chief Sales Officer, and Chief Finance Officer
Doxie.AI

As part of a Japanese American Confinement Sites (JACS) grant project supported by the National Park Service, the Bancroft Library is digitizing nearly 210,000 pages of War Relocation Authority (WRA) Form 26 individual records of Japanese Americans incarcerated in 10 “relocation” centers during World War II. The WRA used this two-page, census-type form to collect sociological, demographic, and biographical data about the incarcerated population. In the 1940s, data from the original, primarily typewritten forms were coded onto computer punch cards. In the 1960s, the cards were converted into a data file at Berkeley, which was deposited at the National Archives and Records Administration (NARA) in the 1980s and later made available online by NARA as the Japanese American Internee Data File. For decades, this dataset has served as the authoritative source of genealogical information for former inmates and their families and of statistical information for social scientists. The existing data file, however, contains gaps, errors, and inaccuracies, and does not adequately represent the breadth and depth of information found in the original forms.

The Bancroft Library holds the complete set of more than 110,000 original typewritten forms and has partnered with Doxie.AI to use machine learning (ML) models to help automate the extraction of the original data into a new, expanded data file. Doxie.AI will describe how they built a custom optical character recognition (OCR) pipeline that transcribes the WRA Form 26 records with a high degree of accuracy, and how they developed a custom ML model that removes noise from the page images to improve the transcription results. We will share lessons learned so far about the challenges and opportunities of using ML to enhance computational access to digitized archival records.

Filed Under: Access & Equity, CNI Fall 2021 Project Briefings, Digital Humanities, Emerging Technologies, Project Briefing Pages, Special Collections
Tagged With: cni2021fall, Project Briefings & Plenary Sessions, Videos

Last updated: Monday, July 25th, 2022
