Assistant Dean for Research & Innovation Strategy
University of Colorado Boulder
University of California Curation Center (UC3) Director
California Digital Library
Associate Dean for Knowledge Management and Strategy and Director, Feinberg School of Medicine
Through the Make Data Count initiative, librarians and other stakeholders have worked to advance data citation adoption among researchers and publishers by leveraging a variety of incentives (carrots) and regulations (sticks). Although much progress has been made, DataCite and Crossref systems show that, across millions of scholarly outputs, structured ‘data citations’ are present in only tens of thousands of records. Often this is because researchers mention underlying data without creating a structured citation, or because publishers do not support structured citations for datasets in a paper’s references. The Make Data Count initiative therefore devised a new strategy that does not rely on researchers or publishers to assert the relationship between a paper and its underlying datasets. With funding from Wellcome Trust, DataCite has worked with the Chan Zuckerberg Initiative to develop a machine-learning algorithm that extracts references to underlying data from the full text of journal articles and preprints, even when they contain no structured data citations. This model has been applied to the full text of hundreds of millions of articles, resulting in the Open Global Data Citation Corpus, a trusted central aggregate of all references to research data across articles, preprints, government documents, and other outputs. This corpus will fundamentally change the way libraries, bibliometricians, research administrators, software systems, and funders measure the impact of scholarly research.