Associate Director of Research
Director, Samuel J. Wood Library
Better access to the data, workflows, and analyses behind published results improves research reproducibility, is increasingly needed to meet funder mandates, and protects against allegations of research misconduct. Key elements of this capacity include the ability to identify, locate, and access the data used, and to produce immutable records of the analytical process. Research involving confidential data poses particular difficulties, because access to such data cannot be granted readily. Even so, it is important to be able to publish the rules under which access can be granted. To address these challenges, Weill Cornell Medicine (WCM) has created a unique data catalog that integrates data discoverability, data access governance, electronic lab notebooks (ELNs), and a filesystem management tool. ELNs capture research workflows and store small datasets. The Starfish file management system allows all non-ELN files associated with a project to be tagged and tracked within the institutional storage infrastructure, capturing file movement and changes and handling data archiving at the point of publication or project closure. A unique hash index associated with each project file collection and with each ELN is stored within the data catalog, allowing rapid retrieval of data even after the files have been relocated. By capturing data governance, the catalog ensures that access and data reuse conditions are immediately visible to the searcher. The data catalog is also an integrated part of the WCM Data Core, which provides curated access to confidential data in a secure analytical environment. The catalog thus aids the administration of access within the Data Core, and the Data Core in turn provides the means of secure access to the confidential data indexed in the catalog.
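The hash index described above can be illustrated with a minimal Python sketch. It shows one way a catalog might key a project file collection by a content-derived hash, so that a lookup still succeeds after the files are moved and the access conditions come back with the record. Everything here is an assumption made for illustration: the abstract does not say how the WCM hash is computed, and `collection_hash`, `register`, `update_location`, and the in-memory `catalog` are hypothetical names, not part of Starfish or the WCM data catalog.

```python
import hashlib
from pathlib import Path

# Hypothetical in-memory stand-in for the data catalog: hash -> record.
catalog = {}


def collection_hash(root: Path) -> str:
    """Hash relative paths and file contents under root.

    Assumed scheme: only paths relative to root are hashed, so moving
    the whole collection to a new location does not change the result.
    """
    files = sorted(
        (p.relative_to(root).as_posix(), p)
        for p in root.rglob("*")
        if p.is_file()
    )
    digest = hashlib.sha256()
    for rel, path in files:
        digest.update(rel.encode("utf-8"))
        digest.update(b"\0")  # separator between path and content hash
        digest.update(hashlib.sha256(path.read_bytes()).digest())
    return digest.hexdigest()


def register(root: Path, project: str, access_conditions: str) -> str:
    """Store a catalog record keyed by the collection hash."""
    key = collection_hash(root)
    catalog[key] = {
        "project": project,
        "location": str(root),
        "access": access_conditions,  # visible to anyone who finds the record
    }
    return key


def update_location(key: str, new_root: Path) -> None:
    """Record a relocation after checking the files are unchanged."""
    if collection_hash(new_root) != key:
        raise ValueError("collection contents differ from the registered hash")
    catalog[key]["location"] = str(new_root)
```

In this sketch the key depends only on relative paths and file contents, which is why the record survives relocation; any change to a file produces a different key, which is the behaviour one would want from an immutable record at the point of publication.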