
Managing Unstructured Data with Latent Semantic Indexing

Maciej Ceglowski
Lead Developer
National Institute for Technology and Liberal Education

Clara Yu
Director
National Institute for Technology and Liberal Education/CET

John L. Cuadrado
Consultant
National Institute for Technology and Liberal Education

Much of the digital content becoming available online lacks meaningful metadata descriptors, but metadata creation is both time-consuming and expensive. Using latent semantic indexing (LSI) techniques, the National Institute for Technology and Liberal Education (NITLE) has developed a search and archiving tool that is able to make inferences about document similarity from patterns of word use across a collection. These similarity values, in turn, allow the tool to assign the documents to categories based on their content. This procedure is language-neutral and fully automatic. While the tool is able to make use of existing metadata, it can also sort and organize raw documents with a high degree of accuracy, across databases, in centralized or distributed mode.
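
As a rough illustration of how similarity can be inferred from patterns of word use, the sketch below applies the standard LSI step, a truncated singular value decomposition of a term-document matrix, to a toy collection. It is not the NITLE tool: the four sample documents, the function names, the use of raw word counts, and the choice of two latent dimensions are all assumptions made for the example.

```python
import numpy as np

# Hypothetical four-document collection: two about pets, two about finance.
DOCS = [
    "cat dog pet",
    "dog pet vet",
    "stock market trade",
    "market trade bank",
]

def term_document_matrix(docs):
    """Count matrix with one row per term and one column per document."""
    vocab = sorted({word for doc in docs for word in doc.split()})
    row = {word: i for i, word in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for word in doc.split():
            counts[row[word], j] += 1.0
    return counts

def cosine_matrix(vectors):
    """Pairwise cosine similarity between the rows of `vectors`."""
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    return unit @ unit.T

def lsi_similarity(docs, k):
    """Document similarity after projecting onto the top-k singular directions."""
    counts = term_document_matrix(docs)
    _, singular_values, vt = np.linalg.svd(counts, full_matrices=False)
    # Each document's coordinates in the k-dimensional latent space.
    coords = (singular_values[:k, None] * vt[:k]).T
    return cosine_matrix(coords)

# Compare similarity on raw word counts with similarity in the latent space.
raw = cosine_matrix(term_document_matrix(DOCS).T)
latent = lsi_similarity(DOCS, k=2)
print(np.round(raw, 2))
print(np.round(latent, 2))
```

In this toy collection the two pet documents share only two of their three words, so their raw cosine similarity is 2/3. In the two-dimensional latent space they coincide, while remaining orthogonal to the finance documents, which is the kind of grouping a categorization step could then build on.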

Web Link:
http://www.nitle.org/lsi.php

Handout:
Managing Unstructured Data with Latent Semantic Indexing

METS: A Status Report

Jerome McDonough
Digital Library Development Team Leader
New York University

This briefing will provide an update on the progress of the Metadata Encoding & Transmission Standard (METS) initiative, an effort to define a common format for encoding digital library objects and their metadata. It will include an overview of current software development efforts, and a discussion of METS profiles, which allow organizations to specify restrictions on the METS format for local applications.

Web Link:
http://www.loc.gov/standards/mets/

PowerPoint Presentation:
METS: A Status Report

The National Digital Information Infrastructure and Preservation Program: Challenges and Solutions

Laura E. Campbell
Associate Librarian for Strategic Initiatives
Library of Congress

The Library of Congress is now poised to enter the next phase of the National Digital Information Infrastructure and Preservation Program (NDIIPP), which was created by federal legislation in December 2000 (PL 106-554). Preserving Our Digital Heritage: Plan for the National Digital Information Infrastructure and Preservation Program, A Collaborative Initiative of the Library of Congress (2002) resulted from almost two years of consultations with a broad range of stakeholder communities. The document was submitted to Congress in the fall of 2002 and accepted in early January 2003. In the coming year, we expect to build out the specifications of a proposed technical architecture, invest in pilot projects and experiments, and contribute to a basic research program together with other federal agencies in an effort led by the National Science Foundation’s program in Digital Government.

This presentation will review progress to date and discuss the shape of the program for the next phase of work.

PowerPoint Presentation:
The National Digital Information Infrastructure and Preservation Program (NDIIPP): Challenges and Solutions

The National STEM Education Digital Library: A Progress Report

Lee Zia
Lead Program Director, NSDL Program
National Science Foundation

This session will provide a progress report on the National Science Foundation’s program to support the development of the National Science, Technology, Engineering, and Mathematics Education Digital Library (NSDL). To date, three sets of grants have been made in three tracks: 1) Collections, 2) Services, and 3) Targeted Research. In addition, a Core Integration activity is developing the “technical and organizational glue” to bind distributed users with distributed collections and services. Members of the Core Integration team will also participate and provide an update on the ongoing work on access and authentication, and on the NSDL’s sustainability efforts.

New Initiatives for Resource Description and Preservation Metadata

Priscilla Caplan
Assistant Director for Digital Library Services
State University System of Florida

Sally H. McCallum
Chief, Network Development and MARC Standards Office
Library of Congress

Rebecca S. Guenther
Network Development and MARC Standards Office
Library of Congress

A new working group, composed of representatives from across the digital preservation community, is being organized by OCLC and RLG. It is a follow-on to the OCLC/RLG Preservation Metadata Working Group, which developed a metadata framework to support the long-term retention of digital materials. The new working group will address implementation strategies for preservation metadata. The group will use the metadata framework developed by the first working group as a starting point, and extend this work to consider issues such as the development of a core set of implementable preservation metadata elements with associated data dictionary; evaluation of alternative strategies for encoding, storage, management, and exchange of preservation metadata; and the development of pilot projects for testing the group’s recommendations.

MARC, built on a NISO/ISO standard for record structures, has been a sound basis for the development of a very large automated bibliographic infrastructure globally. But the newer XML record structure provides a flexible environment for use and manipulation of data and, especially, for linking data. Providing an evolutionary pathway from MARC “classic” to MARC in an XML structure, and then developing new approaches on the XML side, is the topic of this session. A MARC Toolkit is being developed by the Library of Congress (with community collaboration) that contains data transformation components and enables use of Dublin Core, ONIX, and other metadata in the MARC environment. It can help standardize the sometimes chaotic metadata landscape. The purpose and uses of a new simplified MARC companion on the XML side, MODS (Metadata Object Description Schema), will also be described.

 

Web Links:
http://www.oclc.org/research/pmwg/

http://www.loc.gov/marcxml

http://www.loc.gov/mods

OAI Metadata Harvesting and Institutional Repositories

Martin Halbert
Director, Library Systems
Emory University

Institutional planning is needed to prepare and coordinate policies associated with institutional repositories in a way that will facilitate discovery services based on metadata harvesting at the campus and national levels. This presentation will describe planning efforts being undertaken at Emory University as well as policy suggestions for consideration by decision makers involved in developing institutional repositories. The presentation will also outline the risks associated with not coordinating planning on repositories and metadata harvesting.

Online Publishing Use and Costs Evaluation Program, 2003

Christine Norman
Research Director, Online Use and Costs Evaluation Project
Columbia University

Kate Wittenberg
Director, Electronic Publishing Initiative at Columbia
Columbia University

David Millman
Columbia University

The Andrew W. Mellon Foundation has awarded the Electronic Publishing Initiative at Columbia (EPIC) a cost and usage evaluation grant aimed at gaining a better understanding of how electronic resources affect scholarly communication. In particular, we are interested in how electronic resources are affecting academic presses, information technology personnel, librarians, faculty, and students.

This session will provide an update on the progress of the evaluation program, with specific focus on our multi-institution study of faculty and their use of electronic resources in research and teaching.

Web Links:
http://www.epic.columbia.edu

RedLightGreen Accelerates Research: RLG’s Union Catalog on the Web

Merrilee Proffitt
Program Officer, Member Initiatives
RLG

With generous funding from the Andrew W. Mellon Foundation, RLG began work in late 2001 on ways to make the rich information held in the RLG Union Catalog available to a wider audience in a freely available Web environment. Since then, we have learned volumes about what undergraduate users want from online information resources; how data mining software can uncover valuable new information hidden in the RLG Union Catalog; how to provide access to a wealth of complex information through a simple, easy-to-use interface; what new opportunities exist for using bibliographic data to help end users find authoritative sources of research information; what is involved in the complicated process of transforming MARC records to XML; and how to incorporate concepts outlined in the Functional Requirements for Bibliographic Records (FRBR), an emerging standard for distinguishing between various editions.

This presentation will feature an explanation of the history, motivations, “lessons learned,” and future directions of the project, a live demo of the pilot system, and outtakes of use studies.

Web Link:
http://www.rlg.org/redlightgreen/index.html

PowerPoint Presentation:
RedLightGreen: RLG Accelerates Undergraduate Research

The Role of Incentives in Digital Archiving

Brian Lavoie
Research Scientist
OCLC, Inc.

Economics is fundamentally about incentives, so a study of the economics of digital preservation should begin with an examination of the incentives to preserve. Securing the long-term viability and accessibility of digital materials requires an appropriate allocation of incentives among key decision-makers in the digital preservation process. But the circumstances under which digital preservation takes place often lead to a misalignment of preservation objectives and incentives. Identifying the circumstances in which insufficient incentives to preserve are likely to prevail, and determining how they can be remedied, are necessary first steps in developing economically sustainable digital preservation activities.

Handout:
The Role of Incentives in Digital Archiving

Shibboleth and the Management of Content: Be Careful What You Ask For . . .

Ken Klingenstein
Project Manager, Internet2 Middleware Initiative, Chief Technologist
University of Colorado, Boulder

Shibboleth is an Internet2 initiative to develop and deploy a middleware infrastructure for interinstitutional authentication and authorization. It has produced working open-source code that is being widely adopted by campuses and content providers as a new tool for access control. Shibboleth also raises an interesting set of issues, from alignment of content licenses with institutional middleware to role-based access control opportunities.

This session will provide an update on Shibboleth and invite discussion on the policy issues being exposed in its deployments.

Web Links:
http://shibboleth.internet2.edu