CNI: Coalition for Networked Information


FedHarv—A Python Workflow for Automating Open Access Article Harvesting


May 6, 2026

Pascal Calarco
Scholarly Communications and Systems Librarian

University of Windsor

FedHarv is a command-line utility that automates the discovery, retrieval, and packaging of open access (OA) scholarly outputs for batch ingestion into DSpace repositories, replacing manual harvesting with an automated pipeline. It is designed for OA and copyright compliance and for high-quality metadata, and it operates on a “Diamond/Gold/Hybrid Exclusive” policy: it systematically isolates “Green” (self-archived) and “Bronze” (free-to-read) content so that the repository hosts only files with clear re-use licenses or explicit open access status. To do so, it draws on several APIs, including OpenAlex, Crossref TDM, FundRef, DOAJ, Unpaywall, and DataCite, and packages metadata and bitstreams into customizable collection folders for an institution’s repository. FedHarv will be available under an MIT license for others to use and modify in Summer 2026.
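The “Diamond/Gold/Hybrid Exclusive” policy described above can be illustrated with a minimal Python sketch. This is not FedHarv's code, which has not yet been released. The `Work` record, the `partition_by_policy` function, and the example DOIs are hypothetical names chosen for illustration. The sketch assumes each harvested record already carries a lowercase OA status label in the style of the `oa_status` values reported by OpenAlex and Unpaywall.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

# Assumption: status labels follow the lowercase oa_status vocabulary
# (diamond, gold, hybrid, green, bronze, closed).
ACCEPTED_STATUSES = {"diamond", "gold", "hybrid"}  # eligible for the repository
ISOLATED_STATUSES = {"green", "bronze"}            # set aside, not ingested


@dataclass(frozen=True)
class Work:
    """Hypothetical minimal record for one harvested article."""
    doi: str
    oa_status: str
    license: Optional[str] = None


def partition_by_policy(
    works: Iterable[Work],
) -> Tuple[List[Work], List[Work], List[Work]]:
    """Split works into (accepted, isolated, skipped) lists.

    accepted: Diamond, Gold, or Hybrid works
    isolated: Green or Bronze works, kept apart from the ingest batch
    skipped:  anything else, such as closed or unrecognized statuses
    """
    accepted: List[Work] = []
    isolated: List[Work] = []
    skipped: List[Work] = []
    for work in works:
        status = work.oa_status.strip().lower()
        if status in ACCEPTED_STATUSES:
            accepted.append(work)
        elif status in ISOLATED_STATUSES:
            isolated.append(work)
        else:
            skipped.append(work)
    return accepted, isolated, skipped


if __name__ == "__main__":
    # Placeholder DOIs, not real articles.
    sample = [
        Work("10.1234/a", "gold", "cc-by"),
        Work("10.1234/b", "green"),
        Work("10.1234/c", "closed"),
    ]
    accepted, isolated, skipped = partition_by_policy(sample)
    print(len(accepted), len(isolated), len(skipped))
```

Keeping the isolated works in their own list, rather than discarding them, mirrors the abstract's wording that Green and Bronze content is isolated. How FedHarv itself handles those records is not stated in the source.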


Filed Under: Artificial Intelligence, Cyberinfrastructure, Digital Curation, Digital Libraries, Information Access & Retrieval, Metadata, Publishing, Repositories, Scholarly Communication, Spring 2026 Pre-Recorded Project Briefing Series

Last updated:  Wednesday, May 6th, 2026

 
