Associate University Librarian for Digital Library Systems
This presentation describes the decisions and strategy used to develop metadata and assign digital object identifiers (DOIs) for large, complex data sets that are continuously updated. The Rutgers University Libraries (RUL) are working with the Ocean Observatories Initiative (OOI), a multi-institution, National Science Foundation-funded initiative to monitor the status and health of the Atlantic and Pacific Oceans. Rutgers University developed the data infrastructure, and the libraries are developing metadata and DOI assignment for the management, discovery and long-term accessibility of the data. Data is streamed continuously from over 800 instruments. Ultimately, more than 45,000 individual data products will be available. The Libraries’ strategy includes a flexible data model for large data sets that identifies the who, what, when and where of each data stream. The model automatically generates metadata for the data streams created from the samples and observations that instruments on seven platform arrays across the Atlantic and Pacific Oceans produce continuously. This presentation will look at the complex questions that needed to be answered, including core questions such as what constitutes a single data resource and what context is critical to ensure meaningful reuse of the data, particularly over the long term. It will also discuss practical issues: creating relationships across platform arrays, equipment and data without requiring human intervention; identifying and creating metadata for new data versions; and generating metadata for different lifecycle stages of the data (from raw, to edited, to repurposed). The implementation is currently in test for raw data, using a Fedora repository platform communicating with the DOI data infrastructure. The strategy is intended to be applicable to any large data project and extensible across repository platforms through the use of core DOI metadata and repository APIs by the project’s data capture system.
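As a rough illustration of the kind of data model described above, the following Python sketch records the who, what, when and where of a single data stream and derives a small core metadata record from it without human input. It is not the project's actual schema: the class, function and field names are hypothetical, the core fields are only loosely modelled on what DOI registration commonly asks for (creator, title, year, resource type), and the array, instrument and operator values are placeholders.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timezone


@dataclass(frozen=True)
class DataStream:
    """Hypothetical who/what/when/where description of one data stream."""
    array: str               # where: platform array (placeholder name)
    instrument: str          # what produced it: instrument identifier (placeholder)
    product: str             # what: kind of data product
    operator: str            # who: responsible organization (placeholder)
    started: datetime        # when: first observation in the stream
    lifecycle: str = "raw"   # lifecycle stage: raw, edited, or repurposed
    version: int = 1


def core_metadata(stream: DataStream) -> dict:
    """Derive a minimal core record from the stream description alone."""
    return {
        "creators": [stream.operator],
        "title": (
            f"{stream.product} from {stream.instrument} on {stream.array} "
            f"({stream.lifecycle}, v{stream.version})"
        ),
        "publicationYear": stream.started.year,
        "resourceType": "Dataset",
        # Relationships across array, equipment and data, filled in automatically.
        "related": {"array": stream.array, "instrument": stream.instrument},
    }


def new_version(stream: DataStream, lifecycle: str) -> DataStream:
    """Return a record for a later lifecycle stage; the original is unchanged."""
    return replace(stream, lifecycle=lifecycle, version=stream.version + 1)


# Usage with placeholder values (assumptions, not real OOI identifiers).
raw = DataStream(
    array="example-array",
    instrument="example-ctd-01",
    product="seawater temperature",
    operator="Example Operator",
    started=datetime(2015, 1, 1, tzinfo=timezone.utc),
)
edited = new_version(raw, "edited")
print(core_metadata(edited)["title"])
```

Treating each lifecycle stage as a new immutable record, rather than updating the raw record in place, is one simple way to keep earlier versions citable; whether the project handles versions this way is not stated in the abstract.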
Grace Agnew will present on behalf of the RUL project team. Other team members include programmer Chad M. Mills and research data metadata developer Yu-Hung Lin.