CNI: Coalition for Networked Information


Beyond “This Image May Contain:” Using Vision Language Models to Improve Accessibility for Digital Image Collections


March 20, 2025

Peter Broadwell
Manager of AI Modeling and Inference in Research Data Services
Stanford University

Lindsay King
Head Librarian, Bowes Art & Architecture Library
Stanford University

Neural network artificial intelligence (AI) technologies capable of working with both images and text offer promising tools for improving access to library collections at scale. In particular, libraries increasingly must address the obligation to generate succinct “alt-text” descriptions of digital images, a remediation task that can span tens of thousands of items. AI approaches are appealing because they can automate complex natural-language tasks, but there are many reasons to look beyond simply pasting library materials into ChatGPT. Stanford University’s experiments have found that both fine-tuning locally hosted models and “conditioning” the captions by incorporating available metadata into the model’s instructions (“prompt engineering”) show promise for producing useful descriptive text for images. The experiments have also shown that tailoring approaches to specific collections and keeping human reviewers in the loop are key to making the alt-text as accurate as possible while gaining efficiency at scale. Beyond accessibility compliance, vision language models can also enable free-text “evocative” search in multiple languages, object detection, and other tools for improving discovery within image collections.
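The metadata “conditioning” described above amounts to folding known catalog fields into the model’s instruction before captioning. The sketch below shows only that prompt-construction step, with the vision language model call itself omitted; the field names are hypothetical, not Stanford’s actual schema.

```python
# Sketch of metadata-conditioned ("prompt engineered") alt-text generation.
# In practice the resulting prompt would be sent, along with the image,
# to a locally hosted vision language model; that call is omitted here.
# Field names ("title", "creator", etc.) are illustrative assumptions.

def build_alt_text_prompt(metadata: dict) -> str:
    """Fold available catalog metadata into the instruction so the
    model's caption is conditioned on what is already known about the item."""
    lines = [
        "Write a succinct alt-text description (one to two sentences) "
        "of the attached image for a screen-reader user.",
        "Do not repeat information already given in the metadata; "
        "describe only what is visible in the image.",
    ]
    # Skip empty fields so the prompt stays short and unambiguous.
    known = {k: v for k, v in metadata.items() if v}
    if known:
        lines.append("Known catalog metadata:")
        lines.extend(f"- {field}: {value}" for field, value in sorted(known.items()))
    return "\n".join(lines)


prompt = build_alt_text_prompt({"title": "View of the Quad", "creator": "", "date": "c. 1905"})
```

Keeping the generation instruction and the metadata in separate, clearly labeled sections makes it easier for a human reviewer to audit what the model was told versus what it inferred from pixels.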

Presentation: https://web.stanford.edu/~pleonard/cni2025/
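The free-text “evocative” search mentioned in the abstract is typically built on CLIP-style joint image–text embeddings: queries in any supported language and collection images are encoded into the same vector space, and results are ranked by cosine similarity. The sketch below shows only the ranking step; the encoder (e.g., a multilingual CLIP model) is stubbed out with toy vectors, and nothing here reflects Stanford’s actual implementation.

```python
# Sketch of embedding-based free-text image search: rank images by
# cosine similarity between a query embedding and precomputed image
# embeddings. A real system would obtain these vectors from a
# CLIP-style vision language model; here they are toy inputs.
import numpy as np


def rank_images(query_vec, image_vecs, image_ids, top_k=3):
    """Return (image_id, similarity) pairs ranked by cosine similarity."""
    q = query_vec / np.linalg.norm(query_vec)
    m = image_vecs / np.linalg.norm(image_vecs, axis=1, keepdims=True)
    sims = m @ q                     # cosine similarity per image
    order = np.argsort(-sims)[:top_k]  # highest similarity first
    return [(image_ids[i], float(sims[i])) for i in order]


# Toy 2-D embeddings standing in for real model output.
vecs = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
results = rank_images(np.array([1.0, 0.0]), vecs, ["a", "b", "c"], top_k=2)
```

Because the query and image encoders share one embedding space, the same index serves queries in multiple languages without per-language alt-text.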


Filed Under: Access & Equity, Artificial Intelligence, CNI Spring 2025 Project Briefings, Digital Libraries, Emerging Technologies, Project Briefing Pages, Scholarly Communication
Tagged With: cni2025spring, Project Briefings & Plenary Sessions

Last updated: Friday, May 2nd, 2025

 
