CNI Spring 1995 Task Force Meeting Summary Report
May 8, 1995
Introduction
The Coalition for Networked Information Spring 1995 Task Force
Meeting was held in Washington, D.C. on April 10-11, 1995. The
theme of the meeting was "Digital Library Research and
Development." Paul Evan Peters, Coalition Executive Director,
opened the meeting with some comments on the "digital library," a
phrase that has replaced "virtual library" as the term of choice
for the ultimate result of the transition of scientific and
scholarly communication and publication from a system geared
primarily to producing, distributing, and using information in
print and other analog formats to a system geared to network and
other digital formats. Peters stated that while most early
digital libraries are being built to manage digitized versions of
things that were already available in analog formats, e.g. books,
periodicals, and sound and video recordings, he believes that over
time, an increasing number of digital libraries will be built to
manage "digital" rather than "digitized" information. The
"information objects" managed by this emergent class of digital
libraries will be much more like "experiences" than they will be
like "things," and each reader will have a unique experience with
each such object in an even more profound sense than is already
the case.
Peters commented that the meeting had been organized for attendees
to explore a number of questions related to digital libraries:
What will digital information objects and their libraries look
like? What will their libraries contain and how will things get
into them? How will clients find things in them? How will they
interoperate, assuming they will? Who will be responsible for
building them, and how will they be funded, managed, and governed?
What will be their scope: individual, departmental,
institutional, regional, national, global...or supernatural?
NSF / ARPA / NASA Digital Library Program
The opening session featured representatives from the three
federal agencies that are sponsoring a four-year, $24.4 million
joint initiative on digital libraries. The projects' focus is to
dramatically advance the means to collect, store, and organize
information in digital forms, and make it available for searching,
retrieval, and processing via communication networks, all in user-
friendly ways.
The six research projects funded through the joint initiative of
the National Science Foundation (NSF), the Department of Defense
Advanced Research Projects Agency (ARPA), and the National
Aeronautics and Space Administration (NASA) are centered at
Carnegie Mellon University; the University of California,
Berkeley; the University of Michigan; the University of Illinois;
the University of California, Santa Barbara; and, Stanford
University. Each effort brings together researchers and users from
the local university with those from other organizations including
other academic institutions, libraries, museums, publishers,
government laboratories, state agencies, secondary schools, and
computer and communications companies.
Stephen Griffin, Program Manager, National Science Foundation,
provided an overview of the projects, which are a mix of
experimental testbeds and prototypes. From the perspective of
NSF, the program goals are to:
- Advance fundamental research over a large set of interdisciplinary topics;
- Develop and demonstrate new digital library technologies through
experimental testbeds and prototyping;
- Build new applications and services; and,
- Establish community presence and influence by becoming the "premier"
effort in digital libraries and through broad participation by a diverse
set of client groups.
Griffin also identified five research areas that NSF feels are
fundamental to the development of digital libraries:
- Capturing data of all forms (text, images, video, etc.) and
descriptive information about that data (metadata);
- Categorizing, organizing, and combining large volumes of electronic
information in a variety of forms and formats;
- Developing software and algorithms for data exploration and
manipulation, and for combining large volumes of various types of
information;
- Developing tools, protocols, and procedures for advancing the
utilization of networked knowledge bases distributed around the Nation
and around the world; and,
- Studying the impact of these technologies on individuals, organizations,
sectors, and society at large.
Nand Lal, Manager of the Digital Library Technology Project, Goddard
Space Flight Center, noted that NASA has an interest in digital
library technologies as a developer of content and as a consumer
of information. Satellites will be sending down 1/4 terabyte of
information per day in the near future. This makes NASA
interested in new technologies that will enable them to manage
these data better. NASA's involvement in digital library research
and development will benefit the agency in performing its
engineering and science mission, and in its public access and
outreach functions. NASA also feels that substantial advances in
technology will be necessary to make the National Information
Infrastructure (NII) a reality. Lal stated that a digital library
includes the functionality of a traditional library, but is more
than simply a digitized version of the same. It is a collection
of information resources and services (accessible via the NII)
that allows a subscriber easy and timely access to useful
information and knowledge at a reasonable cost.
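To give a sense of scale for the 1/4 terabyte per day that Lal cited, the
figure can be converted to a sustained data rate. The short calculation
below is only an illustration: it assumes a terabyte of 10**12 bytes and
an even flow over 24 hours, neither of which was stated in the session.

```python
# Assumption: "terabyte" is taken as 10**12 bytes, averaged over 24 hours.
BYTES_PER_DAY = 0.25 * 10**12
SECONDS_PER_DAY = 24 * 60 * 60

bytes_per_second = BYTES_PER_DAY / SECONDS_PER_DAY
megabits_per_second = bytes_per_second * 8 / 10**6
terabytes_per_year = 0.25 * 365

print(f"{bytes_per_second / 10**6:.2f} MB/s sustained")    # 2.89 MB/s
print(f"{megabits_per_second:.1f} Mbit/s sustained")       # 23.1 Mbit/s
print(f"{terabytes_per_year:.2f} TB accumulated per year") # 91.25 TB
```

Under these assumptions the archive would grow by roughly 90 terabytes a
year, which suggests why data management was described as a pressing concern.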
Lal concluded with what he sees as the management challenges of
digital library development: the adoption of, and adherence to,
appropriate standards; the establishment of metrics for user
satisfaction; the demonstration of scalability; and, performance.
He stated that in a totally distributed environment with a large
spectrum of users consulting a large spectrum of information
content, these will be great challenges.
Glenn Ricart, Program Manager, Advanced Research Projects Agency
(ARPA), currently on leave from the University of Maryland,
College Park, described ARPA's working hand-in-hand with NSF and
NASA on digital library initiatives as an outgrowth of the NREN
legislation. ARPA's view is that in addition to having
information technology and applications, we need an information
technology enterprise for the emerging economy. The National
Information Enterprise (NIE) is the ARPA program focus that
combines ubiquitous networking with services that link to
applications, particularly in national priority areas. ARPA's
major emphases are in service areas, e.g. authenticating and
synchronizing large caches of information. They are interested in
specific projects that deal with the tough questions of copyright
and electronic commerce.
Ricart identified a number of key issues that need to be addressed
in the development of digital libraries: technologies for
locating documents; developing shared, distributed, long-lived
repositories; strategies for document translation and interchange;
scalable registration/recordation; and, rights management systems.
Digital Library Issues
Following the panel by the federal agency representatives, William
Arms, Corporation for National Research Initiatives (CNRI),
provided an overview of digital library technical issues and
terminology as an outgrowth of work being conducted by CNRI
through the Computer Science Technical Reports Project and the
Digital Library Forum. He identified eight key points that need
to be considered as digital libraries develop. They are:
- The technical framework exists in a legal framework (digital library
architectures must take into account such issues as intellectual property,
obscenity, and communications law);
- Architecture needs to separate aspects that depend upon content from
those that do not (identifiers and security are characteristics that are
independent of content; text and computer programs are dependent on content);
- Names and identifiers are basic to the digital library (and should
have such properties as location independence, global uniqueness, and
persistence across time);
- Digital library objects are more than collections of bits (they have
attachments to the content (bits) such as properties, transaction log,
and signature);
- Repositories must look after the information they hold (by supplying
handles, transaction records, and security);
- The digital library object that is used is different from the stored
object (users receive the result of executing a program such as SGML or
interact with a database);
- Users want intellectual works, not digital objects ("report" refers
to groups of objects in a digital library and the grouping depends on
the context); and,
- Understanding of digital library concepts is hampered by terminology
(terms such as "document" have such strong social, professional, legal,
or technical connotations that they obstruct discussion in this environment).
Arms' points were discussed informally by many meeting attendees
throughout the conference. For further information on this set of
issues, Arms referred attendees to
http://www.cnri.reston.va.us/home/cstr.html
and also stated that the ideas are being developed in a forthcoming
working paper by Robert Kahn and Robert Wilensky.
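Several of the points above (persistent names, objects that carry more
than their bits, and repositories that look after what they hold) can be
illustrated with a small data-structure sketch. This is not CNRI's design:
the class names, the handle format, and the use of a SHA-256 digest in
place of a real digital signature are all assumptions made for
illustration.

```python
import hashlib
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class DigitalObject:
    """Content bits plus attachments: properties, transaction log, signature."""
    handle: str        # location-independent, unique, persistent name
    content: bytes     # the stored bits
    properties: dict = field(default_factory=dict)
    transaction_log: list = field(default_factory=list)
    signature: str = ""


class Repository:
    """Hypothetical repository that supplies handles, records, and a check."""

    def __init__(self, naming_authority: str):
        self.naming_authority = naming_authority
        self._objects = {}

    def deposit(self, content: bytes, **properties) -> str:
        # The handle names the object, not the machine that stores it.
        handle = f"{self.naming_authority}/{uuid.uuid4()}"
        obj = DigitalObject(handle, content, dict(properties))
        # Assumption: a content digest stands in for a real signature.
        obj.signature = hashlib.sha256(content).hexdigest()
        self._record(obj, "deposit")
        self._objects[handle] = obj
        return handle

    def access(self, handle: str) -> bytes:
        obj = self._objects[handle]
        if hashlib.sha256(obj.content).hexdigest() != obj.signature:
            raise ValueError("stored content does not match its signature")
        self._record(obj, "access")
        return obj.content

    def history(self, handle: str) -> list:
        return [event for event, _ in self._objects[handle].transaction_log]

    @staticmethod
    def _record(obj: DigitalObject, event: str) -> None:
        obj.transaction_log.append((event, datetime.now(timezone.utc)))


repo = Repository("example.authority")
h = repo.deposit(b"<report>sample</report>", title="Technical Report")
data = repo.access(h)
print(h, repo.history(h))
```

The point of the sketch is that a client holds only the handle; where the
bits live, and what is logged or verified on each access, stays the
repository's responsibility.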
NSF / ARPA / NASA Projects
Each of the six NSF/ARPA/NASA projects was also the subject of a
Project Briefing. "The Stanford Digital Library Project" was
presented by Andreas Paepcke, Senior Research Fellow and DLI
Project Manager, Stanford University, Vicky Reich, Information
Access Analyst, Stanford University, and Rebecca Lasher, Head
Librarian and Bibliographer, Math and Computer Science Library,
Stanford University. The goal of the Stanford Digital Library
project is to develop the enabling technologies for a single,
integrated and "universal" library, composed from the large
numbers of emerging individual heterogeneous repositories of
publication-related services. Their definition of a constituent
repository includes everything from personal information
collections to the collections in conventional libraries and large
data collections shared by scientists.
"The University of Michigan Digital Library Program" was presented
by Randall Frank, Executive Director of Information Technology,
College of Engineering and School of Information and Library
Studies, University of Michigan, Wendy P. Lougee, Director,
Campus-Wide Digital Library Program, University of Michigan, and
Michael Wellman, Assistant Professor, Department of Electrical
Engineering and Computer Science, University of Michigan. Their
project represents a coordinated program of experimental research
and deployment of a digital library for earth and space science.
The multi-disciplinary research team is developing an agent
architecture which distributes information retrieval tasks for a
highly heterogeneous set of collections. Intellectual property
issues are being addressed, and a computational economy is being
developed.
"The Alexandria Digital Library" was presented by Michael F.
Goodchild, Associate Director of Alexandria Digital Library, and
Professor of Geography, University of California, Santa Barbara.
The project focuses on spatially indexed information, initially
maps and images, and on the problems that need to be solved to
make them accessible in the digital libraries of the future.
Besides maps and images, Alexandria will incorporate other types
of spatially indexed information such as text and photographs, and
accommodate a range of user and query types.
"Building a Digital Library for the Engineering Community" was
presented by William H. Mischo, Head, Grainger Engineering Library
Information Center, University of Illinois at Urbana-Champaign and
Ann P. Bishop, Assistant Professor, School of Library and
Information Science, University of Illinois at Urbana-Champaign.
The project will build a large-scale digital library testbed,
planned to grow to 100,000 users and 100,000 documents. The goal
of the project is to bring professional quality search and display
to Internet information services. The testbed collection consists
of articles from engineering and science journals and magazines,
obtained in SGML format directly from major partners in the
publishing industry. Extensive evaluation of the nature and
extent of testbed use will be based on ethnographic observation of
engineering work teams, interviews, usability testing, surveys,
and system instrumentation.
"The CMU Informedia Digital Video Library" was presented by
Howard D. Wactlar, Vice Provost for Research Computing and
Associate Dean, School of Computer Science, Carnegie Mellon
University and Scott M. Stevens, Software Engineering Institute,
Carnegie Mellon University. The project is developing new
technologies for creating full-content search and retrieval
digital video libraries. Working in collaboration with WQED
Pittsburgh, the project is creating a testbed that will enable
users to access, explore, search and retrieve educational, sports
and entertainment materials from the digital video library. One
of the most interesting research aspects of the project is the
development of automatic, intelligent mechanisms to populate the
library through integrated speech, image, and language
understanding.
"The UC Berkeley Digital Environmental Library" was presented by
Nancy Van House, Acting Dean, School of Library and Information
Studies, University of California, Berkeley. The testbed system
that they are constructing provides widespread online public
access to environmental information. The environmental data
included is of wide-ranging scientific, political, educational,
and economic interest and it involves a broad range of object
types such as texts, images, video, numeric data, and software.
Networked Information Resource Discovery and Retrieval (NIDR)
Avra Michelson, Digital Libraries Department, MITRE Corporation,
introduced the plenary panel on advances in networked information
resource discovery and retrieval (NIDR). She is part of a team,
which includes Clifford Lynch, Director, Library Automation,
University of California, Office of the President, Craig
Summerhill, Systems Coordinator and Program Officer, Coalition for
Networked Information, and Cecelia Preston, who are developing a
Coalition white paper on the topic of networked information
resource discovery and retrieval. Clifford Lynch set some context
for the panel by describing the Coalition's white paper
initiative, which began in the fall of 1994 with the objectives of
framing the major research problems in the NIDR area and
suggesting where standards work might be fruitful. The four
chapters of the paper will include: introductory material,
architectural issues, content issues (metadata), and a discussion
that looks beyond the current framework and discusses extensions
that will be needed as software becomes more autonomous.
Lynch stated that the NIDR "problem" has two components. The
first is discovery, which covers a large spectrum of activities,
e.g. searching, organizing, browsing, selecting among items, and
ranking items. The second component, retrieval, is sometimes
narrowly viewed as the act of downloading information to a
workstation but it should have the broader meaning of making use
of information resources.
At present, Lynch stated, NIDR is treated as a graft-on to the
existing uncontrolled, independent world of Internet resources.
He asked, "When will we see information spaces develop that
integrate NIDR as part of their basic architectural design?" The
CNI paper will examine the idea of tools defining information
spaces as, for example, Gopher defines Gopherspace. Lynch
identified several other issues that will be addressed in the CNI
paper. First, there will need to be an increased emphasis on
selection and ranking of information resources in the networked
environment. Discovery is not simply a process of inundating the
user with candidate resources. Second, the developing mix of free
and for-fee information resources on the network has implications
for the existing and future framework of NIDR tools. Information
retrieval protocols will have to become substantially richer to
accommodate the needs of pricing objects. He stated that simple
FTP models will become an increasing liability for the next
generation of NIDR. Third, a basic issue in the problem
definition of NIDR is the current conception that humans are
directly in command of the process, e.g. typing in search
commands. At the same time, we all have visions of worlds that go
way beyond this, worlds in which searching is facilitated by
various types of software agents and in which we can link
disparate information resources together. It may be that beyond
retrieval, the next goal of NIDR is interoperability: linking a
remote collection of information organizationally with a local
resource. The CNI NIDR team has been struck by the difference
between the immediate goals of many tools and the future world,
which is much more mediated by software.
A draft of the first chapter of the NIDR white paper is up on the
CNI server and the team hopes to produce a full draft by Fall.
The paper will be discussed with various communities and by
attendees at the Fall Task Force Meeting.
Harvest
Michael Schwartz, Associate Professor, Department of Computer
Science, University of Colorado, spoke about Harvest, an
efficient, community-tailored resource discovery tool. He began
his presentation with a critique of current navigational tools,
e.g. Archie, Veronica, Web robots, and WAIS. He noted that none
of those tools has a community or topical focus; they all have
poor scaling characteristics; they use unstructured, low-quality
data; and, they have "hard-wired" search algorithms. The tool
that Schwartz has developed, Harvest, uses an efficient,
distributed gathering architecture coupled with topic/community
focused "Brokers." Harvest addresses each of the problems
inherent in other resource discovery tools in various ways. Its
efficient Gatherer can run at a number of sites and an
administrator can configure the data that will be collected. A
sub-program can do selective text extraction, e.g. only titles,
abstracts, etc., which uses much less space than a tool like
WAIS but delivers high precision and recall. It includes a plug-
and-play index/engine in each Broker and its architecture does not
limit it to text. It uses network-aware caching and replication
for scalable access.
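The division of labor between a Gatherer and a Broker can be sketched in
a few lines. This is a toy illustration, not Harvest code: the function
and class names, the record layout, and the simple inverted index are
assumptions chosen to show selective extraction feeding a topic-focused
index.

```python
from collections import defaultdict


def gather(documents, fields=("title", "abstract")):
    """Gatherer: keep only the selected fields of each document as a summary."""
    return [
        {"url": doc["url"], **{f: doc.get(f, "") for f in fields}}
        for doc in documents
    ]


class Broker:
    """Topic-focused index built from gatherer summaries (toy inverted index)."""

    def __init__(self, topic):
        self.topic = topic
        self._index = defaultdict(set)

    def load(self, summaries):
        for summary in summaries:
            for name, value in summary.items():
                if name == "url":
                    continue
                for word in value.lower().split():
                    self._index[word].add(summary["url"])

    def query(self, *words):
        # Return URLs whose summaries contain every query word.
        hits = [self._index.get(w.lower(), set()) for w in words]
        return sorted(set.intersection(*hits)) if hits else []


# Hypothetical documents held at one site; the body is never indexed.
site_documents = [
    {"url": "http://example.edu/tr/1", "title": "Scalable Resource Discovery",
     "abstract": "Distributed gathering and caching", "body": "long full text"},
    {"url": "http://example.edu/tr/2", "title": "Video Indexing",
     "abstract": "Speech and image understanding", "body": "long full text"},
]

broker = Broker("computer science technical reports")
broker.load(gather(site_documents))
results = broker.query("resource", "discovery")
print(results)
```

Because only the extracted fields travel from the Gatherer to the Broker,
the index stays small and the site's full text is read once, locally,
rather than fetched repeatedly across the network.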
A key feature of Harvest is its network efficiency. It has the
potential to greatly alleviate the network bottlenecks which
develop when particular objects or particular servers become very
popular with network users.
Sample Brokers have been built with computer science technical
reports, the SEC EDGAR files, and Web Homepages. Schwartz is now
beginning to work on supporting more powerful environments than
the unstructured, anarchic content of much of the current Internet.
He is interested in integrating commercial search and retrieval
engines, billing and encryption systems, content markup tools,
Z39.50 and other query interfaces into Harvest. More information
is available at:
http://harvest.cs.colorado.edu/
Portfolio
Ann Mueller, Technical Manager, Stanford University, described
Portfolio, an enterprise-wide information management system
prototyped at Stanford in 1994 and developed jointly by librarians
and information technologists. The project provides an
infrastructure for the institution's distributed computing
architecture. It is an example of a multi-faceted information
system, including information on the institution's faculty,
computing resources, library (including links to the UC's MELVYL
catalog), information on the local community, and links to
Internet resources throughout the world. The developers seeded
the collection with 400 resources and now have 3,000 internal and
external resources. Decisions on what will be included in
Portfolio are made by information providers and subject
specialists, who provide initial information about objects which
is then augmented by library catalogers. Mueller noted that while
the full potential for the use of metadata in this framework has
not yet been realized, each item does have a metadata profile and
the system uses WAIS for indexing.
A key attribute of this initiative is that it takes disparate
resources and services and treats them as a single entity,
presenting them in a consistent and flexible manner.
The Portfolio developers are confident that they can adapt this
system to the next generation of information clients and adapt to
new information and delivery paradigms.
Daniel Keys Moran
Noted science fiction author Daniel Keys Moran gave a
thoughtful and entertaining after-lunch talk. Moran is the author
of the science fiction series The Tales of the Continuing Time, a
projected thirty novels spanning the birth and death of the
universe. The series to date consists of Emerald Eyes, The Long
Run, and The Last Dancer, with Players: The AI War due in 1995 or
early 1996.
Moran said that his life has been defined by his earliest
memory, the astronauts' landing on the Moon. He has also been
influenced by authors who have discussed the difference between
data and information, specifically that data does not have a
message but information does. He stated that we are drowning in
data and swimming in information, and we need a way to bridge the
chasm from data, beyond information, and into knowledge.
Moran read from his current novel in which humans become second
class citizens in the network they build, and artificial
intelligence (AI) agents are the first class citizens. In this
world, more events are taking place in the network than in the
"real" world.
Moran noted that as a science fiction writer, he is finding it
difficult to stay ahead of the curve of technological development.
He invited the audience to look into the future together, where
technology will be bigger, better, cheaper, and more colorful than
ever. He asked, "Where do we want to go as a culture?"
Project Briefings and Synergy Sessions
In addition to the six Project Briefings by the NSF/ARPA/NASA
projects, a number of sessions highlighted building blocks of the
distributed digital library: "The ISI Electronic Library Pilot
Project," presented by Jacqueline Trolley, Institute for
Scientific Information; "Electronic Dissemination of Physics
Journals and Technical Reports on Campus Networks," presented by
Laurie Stackpole and Roderick Atkinson of the Naval Research
Laboratory and Robert Kelly of the American Physical Society; "An
Update on the TULIP Project" presented by Clifford Lynch of the
University of California, Office of the President, and Jaco
Zijlstra, Elsevier Science Publishers; "Vatican Library Accessible
Worldwide," presented by Richard Cerreta, IBM Corporation;
"Partners in the Creation of a Worldwide Library," presented by
Sean Haggerty, Rob McKinney, and Scott Sutcliffe of SilverPlatter
Information; and, the "IBM Digital Library Initiative," presented
by Jon Prial, IBM Corporation.
Some briefings focused on current policy issues, including:
"Humanities from a National Perspective," presented by John
Hammer, National Humanities Alliance, George Farr, National
Endowment for the Humanities, Douglas Bennett, American Council of
Learned Societies, David Bearman, Archives & Museum Informatics,
and Charles Henry, Vassar College; and, "Long-Term Strategy for
the Development of Digital Libraries: Financial, Legal, and
Institutional Issues," presented by Brian Kahin, Harvard
University.
Some sessions focused on networked information projects for
particular communities of users, including: "The NSF Synthesis
Coalition's National Engineering Education Delivery System,"
presented by David Martin, Iowa State University; "Library of the
Future Project at LLNL," presented by Hilary Burton of Lawrence
Livermore National Laboratory; "Icarus, Pygmalion and Babbage:
New Technologies and Humanities Research," presented by Nancy Ide,
Association for Computers and the Humanities, and Charles Henry,
Vassar College; "The Humanities Scholar and the Digital Library,"
presented by Susan Hockey, Center for Electronic Texts in the
Humanities, David Chesnutt, University of South Carolina, and C.
M. Sperberg-McQueen, University of Illinois at Chicago;
"Stretching the Web: Early Experiences with Publishing Applied
Physics Letters Online," presented by Timothy Ingoldsby, American
Institute of Physics and W. Daviess Menefee, OCLC, Inc.; and
"OhioLINK - Statewide Cooperation - It's Not the Technology,
Stupid," presented by Judith Sessions, Miami University, Edward
Garten, University of Dayton, and Tom Sanville, OhioLINK.
New and ongoing projects were discussed in other sessions:
"HELIOS: The Heinz Electronic Archive," presented by David Evans
and Charles Lowry of Carnegie Mellon University; "NASA Public Use
of Earth and Space Science Data over the Internet," presented by
Nand Lal and Linda Hill of the Goddard Space Flight Center;
"Cataloging Internet Resources: OCLC Project Updates," presented
by Erik Jul, OCLC, Inc.; "The Morino Institute: Programs and
Strategies," presented by Kaye Gapen, Morino Institute; "Museum
Educational Site Licensing Project (MESL)," presented by Jennifer
Trant, Getty Art History Information Program, Steve Dietz,
National Museum of American Art, Sally Promey, University of
Maryland, and Clifford Lynch, University of California, Office of
the President; "Measuring the Impacts of Networking on the
Academic Environment," presented by Charles McClure and Cynthia
Lopata, Syracuse University; "Text Capture and Electronic
Conversion at the Library of Congress," presented by David
Williamson, Library of Congress; "Cost Centers and Measures in the
Networked Information Value-Chain," presented by Paul Evan Peters,
Coalition for Networked Information, and Mark Tesoriero and Robert
Ubell, Robert Ubell Associates; and, "Describing Image Files: An
Update," presented by Jennifer Trant, Getty Art History
Information Program, David Bearman, Archives and Museum
Informatics, Howard Besser, University of Michigan, and J. Dustin
Wees, Visual Resources Association.
Fall Meeting
The Fall 1995 Task Force Meeting will be held on Monday, October 30 and
Tuesday, October 31 in Portland, Oregon, immediately preceding the Educom
Annual
Conference. The theme for the meeting will be "Campus / Community
Networking Partnerships."
Additional Information
Many documents from the Spring 1995 Task Force Meeting are
available on the Coalition's Internet server. If you access the
Coalition's server by gopher, point your gopher client to
gopher.cni.org 70 and follow this series of menus:
Coalition FTP Archives (ftp.cni.org)
Coalition Task Force Meetings (/CNI/tf.meetings)
Spring, 1995 Meeting of the Coalition Task Force
If you choose to access the materials via NCSA Mosaic (or some
other browser) and WWW, you can use this URL to access an
HTML-formatted document:
URL: http://www.cni.org/tfms/1995a.spring/
If you choose to access the materials via FTP, browse the
directory /CNI/tf.meetings/1995a.spring on the host ftp.cni.org.
If you need additional information, contact:
Joan K. Lippincott, Assistant Executive Director
Coalition for Networked Information
21 Dupont Circle
Washington, D.C. 20036
Voice: 202-296-5098
Fax: 202-296-0884
Internet: joan@cni.org
Note on Redistribution
You are encouraged to use this Summary Report to provide
information to interested individuals in your organization or
institution by, in part or in full, posting it to institutional
and organizational electronic distribution lists or incorporating
it into relevant newsletters, reports, and the like. Publishers
of periodicals and other materials that cover networks and
networked information are also encouraged to use this Summary
Report in similar ways.