CNI: Coalition for Networked Information

Data Management and Accessibility: Initiatives in the Biological Sciences

December 13, 2007

Zack E. Murrell
Director of the Southeast Regional Network of Expertise and Collections
Appalachian State University
Chris Hodge
Coordinator, SunSITE, Customer Technology Support
University of Tennessee
Robert K. Peet
Professor and Chair
University of North Carolina, Chapel Hill
Greg Riccardi
Professor of Information
University of Tennessee
Thomas Garnett
Assistant Director for Digital Libraries
Smithsonian Institution
Indra Neil Sarkar
Informatics Manager, MBLWHOI Library
Semantics Manager, EOL Biodiversity Informatics
Marine Biological Laboratory

Building a Distributed Digital Repository of Biological Data: Special Challenges (Murrell et al.)

Owing to the richness of the Earth’s biodiversity, the life sciences have historically focused primarily on innovation and discovery rather than standardization. The emerging Semantic Web and accompanying social networking applications are now enabling biodiversity scientists to develop and employ standards that allow unambiguous communication about types of organisms, their attributes, and their distributions. However, the enormous amount of non-standardized legacy data has become an impediment at a time when the life sciences are gathering and analyzing data at regional and continental scales to address critical issues such as biodiversity conservation, habitat fragmentation, and global warming.

The process of identifying types of organisms, whether in the wild or as species and museum specimens, is at the crux of the development of standards. The scientific community needs to reach a consensus on how to distribute information in a standards-compliant format. In addition to these specific challenges, this community faces the same challenges that other digital initiatives encounter: the conversion of legacy data, digitization of a wide variety of media, the interweaving of resources, long-term storage and preservation of large datasets, and shared management of distributed assets.

The Southeast Regional Network of Expertise and Collections (SERNEC) is a National Science Foundation (NSF)-funded Research Coordination Network (RCN) that organizes a community to pursue these goals at a realizable yet powerful scale. This regional consortium is a “virtual community” of herbarium curators that can serve as a model for other life science networks developing around the world. This grassroots network provides a federated electronic database that aims to disseminate all organism occurrence and attribute data in a standards-compliant format, and to provide critical tools for integrating large datasets in which organisms have been identified using divergent taxonomic standards. This goal will be reached by working in concert with the larger global arena, led by the international efforts of the Taxonomic Databases Working Group (TDWG), now known as Biodiversity Information Standards.
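The integration problem described above, in which different collections file the same organism under different names, can be illustrated with a minimal Python sketch. It is not SERNEC's software: the synonym table, the provider names, the record fields, and the `federate` function are all hypothetical, and the records are made up for illustration. Records whose names cannot be mapped are set aside for expert review rather than merged silently.

```python
from collections import defaultdict

# Hypothetical synonym table: name as used by a provider -> accepted name.
# A real network would draw on a curated taxonomic resource instead.
ACCEPTED_NAME = {
    "Aster novae-angliae": "Symphyotrichum novae-angliae",
    "Symphyotrichum novae-angliae": "Symphyotrichum novae-angliae",
}


def federate(providers):
    """Merge occurrence records from several providers under accepted names.

    Returns (merged, unresolved): merged maps each accepted name to its
    records; unresolved lists records whose name is not in the table.
    """
    merged = defaultdict(list)
    unresolved = []
    for source, records in providers.items():
        for record in records:
            tagged = {**record, "source": source}  # remember the provider
            accepted = ACCEPTED_NAME.get(record["name"])
            if accepted is None:
                unresolved.append(tagged)  # leave for curator review
            else:
                merged[accepted].append(tagged)
    return dict(merged), unresolved


# Made-up records from two hypothetical herbaria.
PROVIDERS = {
    "herbarium_a": [{"name": "Aster novae-angliae", "locality": "site 1"}],
    "herbarium_b": [
        {"name": "Symphyotrichum novae-angliae", "locality": "site 2"},
        {"name": "Unplaced name", "locality": "site 3"},
    ],
}

merged, unresolved = federate(PROVIDERS)
for name, records in merged.items():
    print(name, [r["source"] for r in records])
print("needs review:", [r["name"] for r in unresolved])
```

Each record keeps the name its provider used, so the original determination stays visible after the merge.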

The SERNEC virtual community is supported by the image and digital object repository of the NSF-funded Morphbank project. The Morphbank repository provides tools that allow users to collect and annotate objects in order to illustrate the complex relationships between them. Morphbank facilitates interactions among the virtual community members by providing a shared environment for capturing discussions about the underlying biological issues. As the SERNEC database is built, it will be reviewed by the collective taxonomic expertise of this virtual community, resulting in an increasingly accurate portrayal of the biogeography of the region.

The SERNEC virtual community is able to leverage the expertise of information scientists, social scientists, educators, and artists, as well as the region’s curators, to ensure that data are reliable and authoritative. As collaborations within this virtual community grow, and as innovations such as interactive keys and mapping of biotic and abiotic components of the landscape develop, we will provide complex information in an intuitively understandable fashion to various user groups, and thereby stimulate interest in plant systematics and biogeography. On a larger scale, this cooperative network will be capable of addressing the critical worldwide issues of habitat destruction, species loss, and global climate change.

http://www.sernec.org/
http://www.herbarium.unc.edu/seflora/firstviewer.htm

Handout (MS Word)

PowerPoint Presentation

The Biodiversity Heritage Library: A Knowledge Domain Enterprise (Garnett & Sarkar)

Ten major natural history museum libraries, botanical libraries, and research institutions are joined in a collaborative effort to digitize legacy biodiversity literature in an open access manner. From this partnership grew the Biodiversity Heritage Library (BHL) project. The partners envision that any research scientist or student who has access to the Internet, located anywhere in the world, will be able to search for specific information in all of the literature relevant to biodiversity and transparently link into relevant taxonomic, geographic, or other useful databases. Such a tool would erase much of the expensive, labor-intensive work of library research and speed the production of research results many times over.

Why digitize this literature? The ten partner libraries collectively hold a substantial part of the world’s published knowledge on biological diversity. Yet this wealth of knowledge is available only to those few who can gain direct access to these collections. This body of biodiversity knowledge is thus effectively sequestered from wide use for a broad range of applications, including research, education, taxonomic study, biodiversity conservation, protected area management, disease control, and maintenance of diverse ecosystem services. Much of this published literature is rare or has limited global distribution and is available in only a few select libraries. From a scholarly perspective, these collections are of exceptional value because the domain of systematic biology depends, more than any other science, upon historic literature. To positively identify a rare specimen, a working biologist may have to consult a 100-year-old text because that was the last time the organism was found, recorded, and described.

Building on existing tools and services developed at the Universal Biological Indexer and Organizer (uBio) to index organism names (NameBank, which contains over 10 million name strings) and their associated hierarchies (ClassificationBank, which contains over 80 classifications), “taxonomic intelligence” will be integrated into the documents as soon as they are digitized, using an established named entity recognition tool, TaxonFinder. Links from the name strings located within each generated text file will connect the documents to other relevant indexed content and to other web-accessible, name-based sources. The types of organisms associated with each digitized document will be characterized using the taxonomic groupings reflected in ClassificationBank. This will include the generation of descriptive statistics about the organisms along other comparative axes (e.g., temporal or geographic).

Ultimately, a complete list of name strings as they appear in each digitized document, the reconciled contemporary form of each name string, and any other relevant metadata will be incorporated into the index files and used to facilitate knowledge integration across a range of relevant biological databases.
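The indexing steps described above (find name strings in digitized text, record them as they appear, reconcile each to a contemporary form, and summarize by taxonomic grouping) can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not TaxonFinder or any uBio service: the two small dictionaries are hypothetical stand-ins for a name index and a classification, the regular expression only catches simple two-word names, and the sample sentence is invented.

```python
import re
from collections import Counter

# Hypothetical miniature name index: name string as printed -> contemporary form.
NAME_INDEX = {
    "Acer saccharum": "Acer saccharum",
    "Acer saccharinum": "Acer saccharinum",
    "Cimicifuga racemosa": "Actaea racemosa",  # older name, reconciled
    "Actaea racemosa": "Actaea racemosa",
}

# Hypothetical miniature classification: contemporary name -> family.
CLASSIFICATION = {
    "Acer saccharum": "Sapindaceae",
    "Acer saccharinum": "Sapindaceae",
    "Actaea racemosa": "Ranunculaceae",
}

# Candidate binomial: a capitalized word followed by a lowercase word.
CANDIDATE = re.compile(r"\b[A-Z][a-z]+ [a-z]+\b")


def index_document(text):
    """Return one index entry per recognized name string in the text."""
    entries = []
    for match in CANDIDATE.finditer(text):
        verbatim = match.group(0)
        current = NAME_INDEX.get(verbatim)
        if current is not None:  # ignore candidates that are not known names
            entries.append(
                {"verbatim": verbatim, "current": current, "offset": match.start()}
            )
    return entries


def group_counts(entries):
    """Count recognized names per taxonomic grouping (here, family)."""
    return Counter(CLASSIFICATION[entry["current"]] for entry in entries)


SAMPLE = (
    "Cimicifuga racemosa grows beneath Acer saccharum in rich coves. "
    "The author also notes Acer saccharinum along streams."
)

entries = index_document(SAMPLE)
counts = group_counts(entries)
for entry in entries:
    print(entry)
print(dict(counts))
```

Storing both the verbatim and the reconciled form lets a search on the contemporary name find passages that use the older one, while the page still shows what the author actually wrote.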

http://www.morphbank.net/
http://glaucon.sunsite.utk.edu/shc/
http://www.ubio.org

PowerPoint Presentation

 

Filed Under: CNI Fall 2007 Project Briefings
Tagged With: CNI2007fall, Project Briefings & Plenary Sessions

Last updated:  Monday, February 24th, 2020

 
