Collaborative Digital Collection Building: Lessons Learned

Mark E. Phillips
Head, Digital Projects Unit
University of North Texas
Robert M. Johnson
Vice President for Information Services & CIO
Rhodes College
Suzanne Bonefas
Director of Special Projects
Rhodes College
Stacy Pennington
Associate Database Analyst
Rhodes College

Creating a Culture of Collaboration: Collaborative Models for Digital Libraries in a Heterogeneous Institutional Landscape (Phillips)

In 2005, the Texas State Library and Archives Commission, the University of North Texas (UNT), and other partners across the state received an Institute of Museum and Library Services (IMLS) National Leadership grant to develop a multi-component federated search tool on behalf of the Texas Heritage Digitization Initiative. In order for this project to be successful, the project team had to negotiate not only technical challenges, such as interoperability protocols, but also cultural challenges that threatened the collaborative structure of the initiative. Ultimately, a variety of models were adopted, depending on the type, needs, and capabilities of individual institutions.

In this presentation, the broad landscape of digitization of cultural materials in Texas will be described, as well as three scenarios for collaboration that have been incorporated into a single point of access for end users: the Portal to Texas History, a multi-institutional repository hosted at UNT; an OAI harvester and content syndication service, also managed at UNT; and a network of content providers loosely linked through a normalized federated search application managed by the Texas State Library and Archives Commission. These scenarios provide sufficient flexibility and extensibility to accommodate a nearly unlimited number of additional institutions and content types. By providing options rather than mandating solutions, we have increased participation and fostered a collaborative culture among various types of institutions and institutional environments.

http://texashistory.unt.edu
http://www.texasheritageonline.org/
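
The OAI harvester scenario described above can be illustrated with a minimal sketch, assuming the harvester speaks OAI-PMH and requests unqualified Dublin Core (`oai_dc`). The abstract does not describe the actual implementation; the base URL, the helper names `list_records_url` and `parse_page`, and the sample response below are hypothetical and only show the shape of a paged ListRecords exchange.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace used by OAI-PMH 2.0 response documents.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build an OAI-PMH ListRecords request URL (hypothetical helper)."""
    if resumption_token:
        # In OAI-PMH, resumptionToken is an exclusive argument:
        # it is sent with the verb alone, without metadataPrefix.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_page(xml_text):
    """Return (record identifiers, next resumption token or None)."""
    root = ET.fromstring(xml_text)
    ids = [el.text for el in root.iter(OAI_NS + "identifier")]
    token_el = root.find(f".//{OAI_NS}resumptionToken")
    token = token_el.text if token_el is not None and token_el.text else None
    return ids, token


# Made-up response fragment standing in for one page from a content provider.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:1</identifier></header></record>
    <record><header><identifier>oai:example.org:2</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

ids, token = parse_page(SAMPLE)
next_url = list_records_url("https://example.org/oai", resumption_token=token)
```

A harvester built this way would keep requesting `next_url` until a page arrives without a resumption token, which is one reason a protocol of this kind suits institutions that cannot host content centrally.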

Crossroads to Freedom: Institutionalizing a Digital Collection (Bonefas, et al.)

Crossroads to Freedom is a digital archive of materials related to the civil rights era in Memphis, TN. Its primary goals are (1) to support conversation in Memphis about the impact of the civil rights era on our community today, and (2) to provide Rhodes College students with the opportunity to participate in this conversation by creating, maintaining and managing the archive. The archive is built on Fedora, and contains a variety of media types, including digital video (interviews), short text documents (letters, memos, etc.), book-length documents, and images.

A brief background on this project will be provided as a starting point for discussion of lessons learned from collaborative digital collection building. In particular, connecting a digital library project to an institutional mission in partnership with a local (and non-virtual) community will be discussed. Issues facing small colleges striving to adhere to accepted standards for creation and maintenance of large-scale digital library projects will also be raised, and strategies for overcoming resource limitations will be discussed. These include local partnerships, outsourcing, connection of the archive to the academic program, and the role of undergraduate students.

http://www.crossroadstofreedom.org/

 

Copyright and Large Scale Digitization: Implications for Access

Merrilee Proffitt
Program Officer, RLG Programs
OCLC Programs and Research
Constance Malpas
Program Officer, RLG Programs
OCLC Programs and Research

RLG Programs has conducted a series of short interviews with partner institutions in order to understand how and why copyright evidence is being gathered, how evidence is recorded, and what actions are taken based on the resulting information. Institutions were selected to include those that were participating in large-scale digitization efforts (Google Book, the Open Content Alliance, or Microsoft), and institutions that were known to be interested in copyright status for other reasons (in support of publications, etc.). A handful of partners discussed their practice in regard to mass digitization efforts. Preliminary findings suggest that most of these institutions are currently limiting attention to materials published in the United States before 1923, and that activities in this area are focused on post-digitization selection for access.

Library-based copyright assessment exercises focused on the “low-hanging fruit” (US imprints published before 1923), which will affect only a small portion of the system-wide book collection. Previous studies have indicated that as many as half of all books in the system-wide library collection were published after 1977, suggesting that comprehensive conversion efforts will produce a body of digitized text that is in large part subject to copyright protection. In order to maximize the impact of their collective investment in copyright assessment and local online access initiatives, libraries may choose to prioritize work around titles that are not widely held or that represent unique research content. This session will examine what is known about the scope and range of the 18% of system-wide library book collections that were published prior to 1923, based on a sample of titles held at North American research libraries. A comparison will be made between the characteristics of public domain and in-copyright titles in the sample. With this information in hand, it should be possible to foster a productive dialogue about how library-based access initiatives can enrich and complement the discovery and delivery services provided by digitization partners, publishers and other agencies.

 

Creating an Integrated Digital Library Based on the Fedora Platform

Susan Schreibman
Assistant Dean and Head of Digital Collections & Research
University of Maryland at College Park
David Kennedy
Applications Developer
University of Maryland at College Park

The University of Maryland Libraries launched its Digital Library Repository in July 2007 based on the Fedora platform. While two collections were mounted in Fedora about a year after implementation began, it took an additional 18 months to release an integrated repository that supports both federated searching across collections and the development of individual ‘boutique’ collections with their own interfaces, search parameters, and browse functionality. This session will focus on how these goals informed the development of an extensible framework that was flexible enough to accommodate multiple object types (images, full-text TEI and EAD, moving images, and audio) but which also supported cross-object retrieval.

The session will explore not only the technical issues faced, but also the organizational issues in mounting a relatively complex architecture with limited resources. It will also discuss the various metadata schemes used in the system, as well as the system architecture, including the development of an API, object and content classifications, web services, and the development of the administrative interface. Lastly, issues of migration, authentication and archival storage will be discussed.

http://www.lib.umd.edu/digital/

Handout (MS Word)

PowerPoint Presentation

 

Data Management and Accessibility: Initiatives in the Biological Sciences

Zack E. Murrell
Director of the Southeast Regional Network of Expertise and Collections
Appalachian State University
Chris Hodge
Coordinator, SunSITE, Customer Technology Support
University of Tennessee
Robert K. Peet
Professor and Chair
University of North Carolina, Chapel Hill
Greg Riccardi
Professor of Information
University of Tennessee
Thomas Garnett
Assistant Director for Digital Libraries
Smithsonian Institution
Indra Neil Sarkar
Informatics Manager, MBLWHOI Library
Semantics Manager, EOL Biodiversity Informatics
Marine Biological Laboratory

Building a Distributed Digital Repository of Biological Data: Special Challenges (Murrell, et al.)

Owing to the richness of the Earth’s biodiversity, the life sciences have historically been focused primarily on innovation and discovery rather than standardization. The emerging semantic Web and concomitant social networking applications are now enabling biodiversity scientists to develop and employ standards that allow unambiguous communication about types of organisms, their attributes, and distributions. However, the enormous amount of non-standardized legacy data has now become an impediment at a time when the life sciences are gathering and analyzing data at regional and continental scales to address such critical issues as biodiversity conservation, habitat fragmentation and global warming.

The process of identifying types of organisms, whether in the wild or as species and museum specimens, is at the crux of the development of standards. The scientific community needs to reach a consensus regarding the distribution of information in a standards-compliant format. In addition to these specific challenges, this community also faces the same challenges other digital initiatives encounter: the conversion of legacy data, digitization of a wide variety of media, the interweaving of resources, long-term storage and preservation of large datasets, and shared management of distributed assets. The Southeast Regional Network of Expertise and Collections (SERNEC) is a National Science Foundation (NSF)-funded Research Coordination Network (RCN) that organizes a community to facilitate achievement of these goals at a realizable yet powerful scale. This regional consortium is a “virtual community” of herbarium curators that can be a model for other life science networks developing around the world. This grassroots network provides an electronic federated database that ideally seeks to disseminate all organism occurrences and attribute data in a compliant format, and to provide critical tools for integrating large datasets with organisms identified using divergent taxonomic standards. This goal will be reached by working in concert with the larger global arena, led by the international efforts of the Taxonomic Databases Working Group (TDWG), also known as Biodiversity Information Standards.

The SERNEC virtual community is supported by the image and digital object repository of the NSF-funded Morphbank project. The Morphbank repository provides tools that allow users to collect and annotate objects in order to illustrate the complex relationships between them. Morphbank facilitates interactions among the virtual community members by providing a shared environment for capturing discussions about the underlying biological issues. As this database is built, it will be reviewed by the collective taxonomic expertise of this virtual community, resulting in an increasingly accurate portrayal of the biogeography of this region.

The SERNEC virtual community is able to leverage the expertise of information scientists, social scientists, educators, and artists, as well as the region’s curators, to ensure data is reliable and authoritative. As collaborations within this virtual community grow, and as innovations such as interactive keys and mapping of biotic and abiotic components of the landscape develop, we will provide complex information in intuitively understandable fashion to various user groups, and therefore stimulate interest in plant systematics and biogeography. On a larger scale, this cooperative network will be capable of addressing the critical worldwide issues of habitat destruction, species loss, and global climate change.

http://www.sernec.org/
http://www.herbarium.unc.edu/seflora/firstviewer.htm

Handout (MS Word)

PowerPoint Presentation

The Biodiversity Heritage Library: A Knowledge Domain Enterprise (Garnett & Sarkar)

Ten major natural history museum libraries, botanical libraries, and research institutions are joined in a collaborative effort to digitize legacy biodiversity literature in an open access manner. From this partnership grew the Biodiversity Heritage Library (BHL) project. The partners envision that any research scientist or student who has access to the Internet, located anywhere in the world, will be able to search for specific information in all of the literature relevant to biodiversity and transparently link into relevant taxonomic, geographic, or other useful databases. Such a tool would erase much of the expensive, labor-intensive work of library research and speed the production of research results many times over.

Why digitize this literature? The ten partner libraries collectively hold a substantial part of the world’s published knowledge on biological diversity. Yet this wealth of knowledge is available only to those few who can gain direct access to these collections. This body of biodiversity knowledge is thus effectively sequestered from wide use for a broad range of applications, including research, education, taxonomic study, biodiversity conservation, protected area management, disease control, and maintenance of diverse ecosystem services. Much of this published literature is rare or has limited global distribution and is available in only a few select libraries. From a scholarly perspective, these collections are of exceptional value because the domain of systematic biology depends, more than any other science, upon historic literature. To positively identify a rare specimen, a working biologist may have to consult a 100-year-old text because that was the last time the organism was found, recorded, and described.

Building on existing tools and services developed at the Universal Biological Indexer and Organizer (uBio) to index organism names (NameBank, which contains over 10 million name strings) and their associated hierarchies (ClassificationBank, which contains over 80 classifications), “taxonomic intelligence” will be integrated into the documents immediately as they are digitized using an established named entity recognition tool, TaxonFinder. The integration of taxonomic intelligence, via links from name strings located within each generated text file, will enable linkages to other relevant indexed content and other web-accessible name-based sources. The types of organisms that are associated with each digitized document will be characterized using the taxonomic groupings reflected in ClassificationBank. This will include the generation of descriptive statistics pertaining to organisms relative to other comparative axes (e.g., temporal or geographic). Ultimately, a complete list of name strings as they appear in each digitized document, the reconciled contemporary form of each name string, and any other relevant metadata will be incorporated into the index files and used to facilitate knowledge integration across a range of relevant biological databases.
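
The indexing idea in the preceding paragraph (name strings as they appear in each document, paired with a reconciled contemporary name) can be sketched in a few lines. This is a toy illustration under stated assumptions, not TaxonFinder's algorithm or the uBio API: the `NAME_INDEX` lookup table, the `BINOMIAL` pattern, the `index_names` function, and the sample documents are all hypothetical stand-ins.

```python
import re
from collections import defaultdict

# Hypothetical miniature name index: verbatim name string -> reconciled
# contemporary name. A real system would consult a service such as NameBank.
NAME_INDEX = {
    "Quercus alba": "Quercus alba",
    "Felis concolor": "Puma concolor",
    "Puma concolor": "Puma concolor",
}

# Candidate Latin binomials: a capitalized word followed by a lowercase word.
# This is a crude stand-in for real named entity recognition.
BINOMIAL = re.compile(r"\b([A-Z][a-z]+ [a-z]+)\b")


def index_names(documents):
    """Map each reconciled name to the (document id, verbatim string) pairs found."""
    index = defaultdict(list)
    for doc_id, text in documents.items():
        for candidate in BINOMIAL.findall(text):
            current = NAME_INDEX.get(candidate)
            if current is not None:  # keep only candidates known to the index
                index[current].append((doc_id, candidate))
    return dict(index)


# Made-up snippets standing in for OCR text of two digitized volumes.
DOCS = {
    "vol1": "The range of Felis concolor overlaps stands of Quercus alba.",
    "vol2": "Puma concolor was recorded here.",
}

name_index = index_names(DOCS)
```

Here a search for the contemporary name "Puma concolor" would surface both volumes, including the one that uses only the older name string, which is the kind of cross-document linkage the abstract describes.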

http://www.morphbank.net/
http://glaucon.sunsite.utk.edu/shc/
http://www.ubio.org

PowerPoint Presentation

 

Data-Cyberinfrastructure Collaboration at the University of California, San Diego

Brian Schottlaender
University Librarian
University of California, San Diego
Robert H. McDonald
Director, Strategic Data Alliances
University of California, San Diego

In recent years, both the US National Science Foundation (NSF) and the American Council of Learned Societies (ACLS) have released reports on the coming need for appropriate cyberinfrastructure for engineering, sciences, social sciences and humanities disciplines. An important question for any research library is where and how it will fit into the cyberinfrastructure model that is native to its campus, its university system, and its national and international partnerships.

At the University of California, San Diego (UCSD), the University Libraries are actively pursuing this agenda by working collaboratively with the San Diego Supercomputer Center to build an intersection of personnel, expertise, and services that provides long-term preservation of and access to research data, enabling domain scientists and researchers to carry out complex longitudinal data analysis in support of interdisciplinary research. This critical partnership is providing new opportunities to the UCSD community and, when linked with opportunities being developed for a University of California (UC) system-wide grid service platform, will truly transform the way discovery and access intersect at UCSD and within the UC system. Presenters will describe the collaborative model, identify benefits and challenges, give a brief outline of the projects already underway, expose the key components of the technical architecture, and discuss future plans.

http://dpi.sdsc.edu
http://chronopolis.sdsc.edu
http://libraries.ucsd.edu

 

dbGaP: Linking Clinical Information with Genetic Data

Jeffrey Beck
National Center for Biotechnology Information (NCBI)
National Library of Medicine

To support investigator access to data from genome-wide association study initiatives at the National Institutes of Health and elsewhere, the National Center for Biotechnology Information (NCBI) at the National Library of Medicine has created the database of Genotypes and Phenotypes (dbGaP), a public repository for individual-level phenotype, exposure, genotype, and sequence data, and the associations between them.

dbGaP provides unprecedented access to these large-scale genetic and phenotypic datasets, including public access to study documents linked to summary data on specific phenotype variables, statistical overviews of the genetic information, positions of published associations on the genome, and authorized access to individual-level data.

http://www.ncbi.nlm.nih.gov/sites/entrez?db=gap
http://www.ncbi.nlm.nih.gov/entrez/query/Gap/gap_tmpl/about.html


 

Digital Humanities Centers as Cyberinfrastructure

John Unsworth
Dean, Graduate School of Library and Information Science
University of Illinois at Urbana-Champaign
Mark Lawrence Kornbluh
Director of MATRIX
Michigan State University
Neil Fraistat
Director of the Maryland Institute for Technology in the Humanities (MITH) & Professor
University of Maryland at College Park
Katherine L. Walter
Professor and Chair of Digital Initiatives and Special Collections
Director, Center for Digital Research in the Humanities (CDRH)
University of Nebraska at Lincoln

The American Council of Learned Societies’ (ACLS) report on Cyberinfrastructure in the Humanities and Social Sciences is focused on the technologies needed to advance the study and interpretation of the “messy and idiosyncratic realm of human experience.” In doing so, the ACLS report outlines eight recommendations. In this session, moderator John Unsworth will consider each of the recommendations and will highlight the critical contributions of digital humanities centers in fulfilling them.

A roundtable discussion by digital humanities center directors will follow Unsworth’s presentation. The range of activities, services, and missions of these centers will be discussed.

http://www.acls.org/cyberinfrastructure/index.htm

 

The eCrystals Federation: Open Data Repositories Supporting Open Science

Liz Lyon
Director, UKOLN
University of Bath
Manjula Patel
Research Officer
University of Bath
Simon Coles
Manager, National Crystallography Service
University of Southampton

This presentation will describe the background of the eCrystals Federation, which is based on eBank foundations, and which, in partnership with OAI-ORE, will implement a federation of open data repositories for crystallography. An application profile, based on the Dublin Core standard, has been developed by the eBank-UK project for the initial construction of a federation of crystallography data repositories. Building on this initial deployment, the eCrystals Federation will then work closely with the OAI-ORE team, as part of a wider international consortium funded by Microsoft to support the whole lifecycle of chemistry research data, to develop a more effective protocol for interoperability between data repositories and the construction of tools and third-party services based on the data federation.

The groundwork to establish a repository network will be described together with the evolving advocacy program, links with third party services such as data centres, publishers, learned societies, and preservation/sustainability activity. Continuing developments in partnership with the Digital Curation Centre are investigating suitable curation strategies, the creation of preservation metadata, and the application of the DRAMBORA Toolkit as a means of self-assessment to the crystallography repositories. Emerging data policy issues for institutions will be explored and some challenges for the future will be presented.

http://www.ncs.chem.soton.ac.uk/projects/ecrystals/

Handout (MS Word)

Presentation (PPT)

 

Editorial Curation and Identity Management in Digital Libraries: A Case Study of the NSDL

Kate Wittenberg
Director, Electronic Publishing Initiative
Columbia University
David Millman
Director, Research and Development, Academic Information Systems
Columbia University

The National Science Digital Library (NSDL) was established by the National Science Foundation in 2000 as an online library that directs users to exemplary resources in science, technology, engineering, and mathematics education and research. The Columbia University NSDL Core Integration group is working in three areas that provide new models for digital library development within the NSDL: editorial enhancement of publishers’ content, virtual learning worlds in a digital library context, and Community Sign-On as an access management strategy.

The first area explores the potential value of integrating editorial commentary with published research to create teaching resources. The second explores embedding online games within an educational digital library environment. The third area examines new models for identity management through Community Sign-On, an implementation of the Internet2 Shibboleth system for federated identity management. While we anticipated challenges in technology coordination, we have found that our work has also tested deep institutional assumptions about process, business models and policy. It raises questions about privacy, scholarly participation, intellectual property and emerging social network conventions.

http://www.nsdl.org

PowerPoint Presentation

 

Explorative Search and the Library Catalog

Birte Christensen-Dalsgaard
Director of Development
State and University Library

Summa, the search system of the State and University Library in Aarhus, Denmark, addresses many of the issues raised in a number of recently published studies, including the observation that the traditional library catalog cannot compete with other services when it comes to explorative search, as well as the realization that the catalog is only suitable as a localization tool for known items. The challenge is to develop an application which will satisfy the users’ expectations for a modern search system.

Summa is an open source system implementing a modular, service-based architecture. It is based on the fundamental idea “free the content from the proprietary library systems,” where the discovery layer is separated from the business layer. With this separation, any Internet technology can be used without the limitations traditionally set by proprietary library systems, and there is the flexibility to integrate or to be integrated into other systems. A first version of a Fedora-Summa integration has been developed.

Summa is the search system behind a national service offering music to all citizens registered as patrons at a Danish library, and it also provides the search interface for library material at the State and University Library, Aarhus. Presently the search system indexes approximately eight million items, of which two million are books from the library catalog and the other six million records are provided by other sources. Many libraries in Europe have expressed an interest in the software, and as a result of this demand the code will be open sourced. The system is still being developed, but the release is intended to encourage the creation of a community to support the development of an open source approach to integrated search.

The talk will present the ideas behind Summa, the architecture, and the plans. Examples will be shown to demonstrate the potential for this kind of approach.

http://www.statsbiblioteket.dk/summa
http://www.bibliotekernesnetmusik.dk