CNI Spring 1995 Task Force Meeting Summary Report
May 8, 1995
The Coalition for Networked Information Spring 1995 Task Force Meeting was held in Washington, D.C. on April 10-11, 1995. The theme of the meeting was “Digital Library Research and Development.” Paul Evan Peters, Coalition Executive Director, opened the meeting with some comments on the “digital library,” a phrase that has replaced “virtual library” as the term of choice for the ultimate result of the transition of scientific and scholarly communication and publication from a system geared primarily to producing, distributing, and using information in print and other analog formats to a system geared to network and other digital formats. Peters stated that while most early digital libraries are being built to manage digitized versions of things that were already available in analog formats, e.g. books, periodicals, and sound and video recordings, he believes that over time, an increasing number of digital libraries will be built to manage “digital” rather than “digitized” information. The “information objects” managed by this emergent class of digital libraries will be much more like “experiences” than they will be like “things,” and each reader will have a unique experience with each such object in an even more profound sense than is already the case.
Peters commented that the meeting had been organized for attendees to explore a number of questions related to digital libraries: What will digital information objects and their libraries look like? What will their libraries contain and how will things get into them? How will clients find things in them? How will they interoperate, assuming they will? Who will be responsible for building them, and how will they be funded, managed, and governed? What will be their scope: individual, departmental, institutional, regional, national, global…or supernatural?
NSF / ARPA / NASA Digital Library Program
The opening session featured representatives from the three federal agencies that are sponsoring a four-year, $24.4 million joint initiative on digital libraries. The projects’ focus is to dramatically advance the means to collect, store, and organize information in digital forms, and make it available for searching, retrieval, and processing via communication networks, all in user-friendly ways.
The six research projects funded through the joint initiative of the National Science Foundation (NSF), the Department of Defense Advanced Research Projects Agency (ARPA), and the National Aeronautics and Space Administration (NASA) are centered at Carnegie Mellon University; the University of California, Berkeley; the University of Michigan; the University of Illinois; the University of California, Santa Barbara; and, Stanford University. Each effort brings together researchers and users from the local university with those from other organizations including other academic institutions, libraries, museums, publishers, government laboratories, state agencies, secondary schools, and computer and communications companies.
Stephen Griffin, Program Manager, National Science Foundation, provided an overview of the projects, which are a mix of experimental testbeds and prototypes. From the perspective of NSF, the program goals are to:
- Advance fundamental research over a large set of interdisciplinary topics;
- Develop and demonstrate new digital library technologies through experimental testbeds and prototyping;
- Build new applications and services; and,
- Establish community presence and influence by becoming the “premier” effort in digital libraries and through broad participation by a diverse set of client groups.
Griffin also identified five research areas that NSF feels are fundamental to the development of digital libraries:
- Capturing data of all forms (text, images, video, etc.) and descriptive information about that data (metadata);
- Categorizing, organizing, and combining large volumes of electronic information in a variety of forms and formats;
- Developing software and algorithms for data exploration and manipulation, and for combining large volumes of various types of information;
- Developing tools, protocols, and procedures for advancing the utilization of networked knowledge bases distributed around the Nation and around the world; and,
- Studying the impact of these technologies on individuals, organizations, sectors, and society at large.
Nand Lal, Manager of Digital Library Technology Project, Goddard Space Flight Center, noted that NASA has an interest in digital library technologies as a developer of content and as a consumer of information. Satellites will be sending down 1/4 terabyte of information per day in the near future. This makes NASA interested in new technologies that will enable it to manage these data better. NASA’s involvement in digital library research and development will benefit the agency in performing its engineering and science mission, and in its public access and outreach functions. NASA also feels that substantial advances in technology will be necessary to make the National Information Infrastructure (NII) a reality. Lal stated that a digital library includes the functionality of a traditional library, but is more than simply a digitized version of the same. It is a collection of information resources and services (accessible via the NII) that allows a subscriber easy and timely access to useful information and knowledge at a reasonable cost.
Lal concluded with what he sees as the management challenges of digital library development: the adoption of, and adherence to, appropriate standards; the establishment of metrics for user satisfaction; the demonstration of scalability; and, performance. He stated that in a totally distributed environment with a large spectrum of users consulting a large spectrum of information content, these will be great challenges.
Glenn Ricart, Program Manager, Advanced Research Projects Agency (ARPA), currently on leave from the University of Maryland, College Park, described ARPA’s working hand-in-hand with NSF and NASA on digital library initiatives as an outgrowth of the NREN legislation. ARPA’s view is that in addition to having information technology and applications, we need an information technology enterprise for the emerging economy. The National Information Enterprise (NIE) is the ARPA program focus that combines ubiquitous networking with services that link to applications, particularly in national priority areas. ARPA’s major emphases are in service areas, e.g. authenticating and synchronizing large caches of information. They are interested in specific projects that deal with the tough questions of copyright and electronic commerce.
Ricart identified a number of key issues that need to be addressed in the development of digital libraries: technologies for locating documents; developing shared, distributed, long-lived repositories; strategies for document translation and interchange; scalable registration/recordation; and, rights management systems.
Digital Library Issues
Following the panel by the federal agency representatives, William Arms, Corporation for National Research Initiatives (CNRI), provided an overview of digital library technical issues and terminology as an outgrowth of work being conducted by CNRI through the Computer Science Technical Reports Project and the Digital Library Forum. He identified eight key points that need to be considered as digital libraries develop. They are:
- The technical framework exists in a legal framework (digital library architectures must take into account such issues as intellectual property, obscenity, and communications law);
- Architecture needs to separate aspects that depend upon content from those that do not (identifiers and security are characteristics that are independent of content; text and computer programs are dependent on content);
- Names and identifiers are basic to the digital library (and should include such properties as a location independent name, globally unique, persistent across time);
- Digital library objects are more than collections of bits (they have attachments to the content (bits) such as properties, transaction log, and signature);
- Repositories must look after the information they hold (by supplying handles, transaction records, and security);
- The digital library object that is used is different from the stored object (users receive the result of executing a program such as SGML or interact with a database);
- Users want intellectual works, not digital objects (“report” refers to groups of objects in a digital library and the grouping depends on the context); and,
- Understanding of digital library concepts is hampered by terminology (terms such as “document” have such strong social, professional, legal, or technical connotations that they obstruct discussion in this environment).
Arms’ points were discussed informally by many meeting attendees throughout the conference. For further information on this set of issues, Arms referred attendees to http://www.cnri.reston.va.us/home/cstr.html and also stated that the ideas are being developed in a forthcoming working paper by Robert Kahn and Robert Wilensky.
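As an illustration of the point about names and identifiers, the following minimal Python sketch shows a name that stays fixed while the stored object moves between repositories. It is not drawn from the CNRI work: the class HandleRegistry, its methods, and the example handle and host names are all hypothetical and chosen only for this example.

```python
class HandleRegistry:
    """Hypothetical registry mapping persistent names to current locations."""

    def __init__(self):
        self._locations = {}  # handle -> current location of the object

    def register(self, handle, location):
        # Global uniqueness: a handle may be assigned only once.
        if handle in self._locations:
            raise ValueError(f"handle already registered: {handle}")
        self._locations[handle] = location

    def relocate(self, handle, new_location):
        # Persistence: the object may move, but its name does not change.
        if handle not in self._locations:
            raise KeyError(handle)
        self._locations[handle] = new_location

    def resolve(self, handle):
        # Location independence: clients ask by name, never by address.
        return self._locations[handle]


# Hypothetical handle and repository addresses, for illustration only.
registry = HandleRegistry()
registry.register("example.tr/1995-001", "repo-a.example.org/objects/17")
first = registry.resolve("example.tr/1995-001")
registry.relocate("example.tr/1995-001", "repo-b.example.org/store/204")
second = registry.resolve("example.tr/1995-001")
```

A citation that records only the handle continues to work after the move, which is the property the list above asks of digital library identifiers.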
NSF / ARPA / NASA Projects
Each of the six NSF/ARPA/NASA projects was also the subject of a Project Briefing. “The Stanford Digital Library Project” was presented by Andreas Paepcke, Senior Research Fellow and DLI Project Manager, Stanford University, Vicky Reich, Information Access Analyst, Stanford University, and Rebecca Lasher, Head Librarian and Bibliographer, Math and Computer Science Library, Stanford University. The goal of the Stanford Digital Library project is to develop the enabling technologies for a single, integrated and “universal” library, composed from the large numbers of emerging individual heterogeneous repositories of publication-related services. Their definition of a constituent repository includes everything from personal information collections to the collections in conventional libraries and large data collections shared by scientists.
“The University of Michigan Digital Library Program” was presented by Randall Frank, Executive Director of Information Technology, College of Engineering and School of Information and Library Studies, University of Michigan, Wendy P. Lougee, Director, Campus-Wide Digital Library Program, University of Michigan, and Michael Wellman, Assistant Professor, Department of Electrical Engineering and Computer Science, University of Michigan. Their project represents a coordinated program of experimental research and deployment of a digital library for earth and space science. The multi-disciplinary research team is developing an agent architecture which distributes information retrieval tasks for a highly heterogeneous set of collections. The project is also addressing intellectual property issues and developing a computational economy.
“The Alexandria Digital Library” was presented by Michael F. Goodchild, Associate Director of Alexandria Digital Library, and Professor of Geography, University of California, Santa Barbara. The project focuses on spatially indexed information, initially maps and images, and on the problems that need to be solved to make them accessible in the digital libraries of the future. Besides maps and images, Alexandria will incorporate other types of spatially indexed information such as text and photographs, and accommodate a range of user and query types.
“Building a Digital Library for the Engineering Community” was presented by William H. Mischo, Head, Grainger Engineering Library Information Center, University of Illinois at Urbana-Champaign and Ann P. Bishop, Assistant Professor, School of Library and Information Science, University of Illinois at Urbana-Champaign. The project will build a large-scale digital library testbed, planned to grow to 100,000 users and 100,000 documents. The goal of the project is to bring professional quality search and display to Internet information services. The testbed collection consists of articles from engineering and science journals and magazines, obtained in SGML format directly from major partners in the publishing industry. Extensive evaluation of the nature and extent of testbed use will be based on ethnographic observation of engineering work teams, interviews, usability testing, surveys, and system instrumentation.
“The CMU Informedia Digital Video Library,” was presented by Howard D. Wactlar, Vice Provost for Research Computing and Associate Dean, School of Computer Science, Carnegie Mellon University and Scott M. Stevens, Software Engineering Institute, Carnegie Mellon University. The project is developing new technologies for creating full-content search and retrieval digital video libraries. Working in collaboration with WQED Pittsburgh, the project is creating a testbed that will enable users to access, explore, search and retrieve educational, sports and entertainment materials from the digital video library. One of the most interesting research aspects of the project is the development of automatic, intelligent mechanisms to populate the library through integrated speech, image, and language understanding.
“The UC Berkeley Digital Environmental Library” was presented by Nancy Van House, Acting Dean, School of Library and Information Studies, University of California, Berkeley. The testbed system that they are constructing provides widespread online public access to environmental information. The environmental data included is of wide-ranging scientific, political, educational, and economic interest and it involves a broad range of object types such as texts, images, video, numeric data, and software.
Networked Information Resource Discovery and Retrieval (NIDR)
Avra Michelson, Digital Libraries Department, MITRE Corporation, introduced the plenary panel on advances in networked information resource discovery and retrieval (NIDR). She is part of a team, which includes Clifford Lynch, Director, Library Automation, University of California, Office of the President, Craig Summerhill, Systems Coordinator and Program Officer, Coalition for Networked Information, and Cecelia Preston, who are developing a Coalition white paper on the topic of networked information resource discovery and retrieval. Clifford Lynch set some context for the panel by describing the Coalition’s white paper initiative, which began in the fall of 1994 with the objectives of framing the major research problems in the NIDR area and suggesting where standards work might be fruitful. The four chapters of the paper will include: introductory material, architectural issues, content issues (metadata), and a discussion that looks beyond the current framework and discusses extensions that will be needed as software becomes more autonomous.
Lynch stated that the NIDR “problem” has two components. The first is discovery, which covers a large spectrum of activities, e.g. searching, organizing, browsing, selecting among items, and ranking items. The second component, retrieval, is sometimes narrowly viewed as the act of downloading information to a workstation but it should have the broader meaning of making use of information resources.
At present, Lynch stated, NIDR is considered as a graft-on to the existing uncontrolled, independent world of Internet resources. He asked, “When will we see information spaces develop that integrate NIDR as part of their basic architectural design?” The CNI paper will examine the idea of tools defining information spaces as, for example, Gopher defines Gopherspace. Lynch identified several other issues that will be addressed in the CNI paper. First, there will need to be an increased emphasis on selection and ranking of information resources in the networked environment. Discovery is not simply a process of inundating the user with candidate resources. Second, the developing mix of free and for fee information resources on the network has implications for the existing and future framework of NIDR tools. Information retrieval protocols will have to become substantially richer to accommodate the needs of pricing objects. He stated that simple ftp models will become an increasing liability for the next generation of NIDR. Third, a basic issue in the problem definition of NIDR is the current conception that humans are directly in command of the process, e.g. typing in search commands. At the same time, we all have visions of worlds that go way beyond this, worlds in which searching is facilitated by various types of software agents, and a world in which we can link disparate information resources together. It may be that beyond retrieval, the next goal of NIDR is interoperability: linking a remote collection of information organizationally with a local resource. The CNI NIDR team has been struck by the difference between the immediate goals of many tools and the future world, which is much more mediated by software.
A draft of the first chapter of the NIDR white paper is up on the CNI server and the team hopes to produce a full draft by Fall. The paper will be discussed with various communities and by attendees at the Fall Task Force Meeting.
Michael Schwartz, Associate Professor, Department of Computer Science, University of Colorado, spoke about Harvest, an efficient, community-tailored resource discovery tool. He began his presentation with a critique of current navigational tools, e.g. Archie, Veronica, Web robots, and WAIS. He noted that none of those tools has a community or topical focus; they all have poor scaling characteristics; they use unstructured, low-quality data; and, they have “hard-wired” search algorithms. The tool that Schwartz has developed, Harvest, uses an efficient, distributed gathering architecture coupled with topic/community focused “Brokers.” Harvest addresses each of the problems inherent in other resource discovery tools in various ways. Its efficient Gatherer can run at a number of sites and an administrator can configure the data that will be collected. A sub-program can do selected text extraction, e.g. search only titles, abstracts, etc. and uses much less space than a tool like WAIS but delivers high precision and recall. It includes a plug-and-play index/engine in each Broker and its architecture does not limit it to text. It uses network-aware caching and replication for scalable access.
A key feature of Harvest is its network efficiency. It has the potential to greatly alleviate the network bottlenecks which develop when particular objects or particular servers become very popular with network users.
Sample Brokers have been built with computer science technical reports, the SEC EDGAR files, and Web home pages. Schwartz is now beginning to work on supporting more powerful environments than the unstructured, anarchic content of much of the current Internet. He is interested in integrating commercial search and retrieval engines, billing and encryption systems, content markup tools, Z39.50 and other query interfaces into Harvest. More information is available at: http://harvest.cs.colorado.edu/
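The division of labor between a Gatherer and a Broker described above can be pictured with a short Python sketch. This is not Harvest code and does not reproduce its interfaces: the classes Summary, Gatherer, and Broker, their methods, the keyword-overlap topic filter, and the example.org URLs are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Summary:
    """Compact record a Gatherer extracts from one document."""
    url: str
    title: str
    abstract: str


class Gatherer:
    """Runs at a provider site and extracts only selected fields."""

    def __init__(self, documents):
        # documents: dict mapping URL -> dict with title, abstract, body
        self.documents = documents

    def gather(self):
        # Emit title and abstract only; the full body stays at the site.
        return [
            Summary(url, doc["title"], doc["abstract"])
            for url, doc in self.documents.items()
        ]


class Broker:
    """Topic-focused index built from the output of Gatherers."""

    def __init__(self, topic_terms):
        self.topic_terms = {term.lower() for term in topic_terms}
        self.index = {}  # word -> set of URLs

    def collect(self, gatherer):
        for summary in gatherer.gather():
            text = summary.title + " " + summary.abstract
            words = set(text.lower().split())
            # Keep only summaries that mention one of the Broker's topics.
            if words & self.topic_terms:
                for word in words:
                    self.index.setdefault(word, set()).add(summary.url)

    def search(self, term):
        return sorted(self.index.get(term.lower(), set()))


# Hypothetical site contents, for illustration only.
site = Gatherer({
    "http://example.org/tr1": {
        "title": "Distributed caching",
        "abstract": "A caching study",
        "body": "full text is never sent to the Broker",
    },
    "http://example.org/recipe": {
        "title": "Bread",
        "abstract": "Baking notes",
        "body": "off-topic document",
    },
})
broker = Broker(["caching"])
broker.collect(site)
hits = broker.search("caching")
```

Because only summaries cross the network and each Broker keeps just the records matching its topic, the index stays small relative to the source collection. The simple word-set index here stands in for whatever index engine a real Broker would plug in.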
Ann Mueller, Technical Manager, Stanford University, described Portfolio, an enterprise-wide information management system prototyped at Stanford in 1994 and developed jointly by librarians and information technologists. The project provides an infrastructure for the institution’s distributed computing architecture. It is an example of a multi-faceted information system, including information on the institution’s faculty, computing resources, and library (including links to the UC’s MELVYL catalog); information on the local community; and links to Internet resources throughout the world. The developers seeded the collection with 400 resources and now have 3,000 internal and external resources. Decisions on what will be included in Portfolio are made by information providers and subject specialists, who provide initial information about objects which is then augmented by library catalogers. Mueller noted that while the full potential for the use of metadata in this framework has not yet been realized, each item does have a metadata profile and the system uses WAIS for indexing.
A key attribute of this initiative is that it takes disparate resources and services and treats them as a single entity, presenting them in a consistent and flexible manner. The Portfolio developers are confident that they can adapt this system to the next generation of information clients and to new information and delivery paradigms.
Daniel Keys Moran
Noted science fiction author, Daniel Keys Moran, gave a thoughtful and entertaining after-lunch talk. Moran is the author of the science fiction series The Tales of the Continuing Time, a projected thirty novels spanning the birth and death of the universe. The series to date consists of Emerald Eyes, The Long Run, and The Last Dancer, with Players: The AI War due in 1995 or early 1996.
Moran said that his life has been defined by his earliest memory, the astronauts’ landing on the Moon. He has also been influenced by authors who have discussed the difference between data and information, specifically that data does not have a message but information does. He stated that we are drowning in data and swimming in information, and that we need a way to bridge the chasm from data, beyond information, and into knowledge.
Moran read from his current novel in which humans become second class citizens in the network they build, and artificial intelligence (AI) agents are the first class citizens. In this world, more events are taking place in the network than in the “real” world.
Moran noted that as a science fiction writer, he is finding it difficult to stay ahead of the curve of technological development. He invited the audience to look into the future together, where technology will be bigger, better, cheaper, and more colorful than ever. He asked, “Where do we want to go as a culture?”
Project Briefings and Synergy Sessions
In addition to the six Project Briefings by the NSF/ARPA/NASA projects, a number of sessions highlighted building blocks of the distributed digital library: “The ISI Electronic Library Pilot Project,” presented by Jacqueline Trolley, Institute for Scientific Information; “Electronic Dissemination of Physics Journals and Technical Reports on Campus Networks,” presented by Laurie Stackpole and Roderick Atkinson of the Naval Research Laboratory and Robert Kelly of the American Physical Society; “An Update on the TULIP Project” presented by Clifford Lynch of the University of California, Office of the President, and Jaco Zilstra, Elsevier Science Publishers; “Vatican Library Accessible Worldwide,” presented by Richard Cerreta, IBM Corporation; “Partners in the Creation of a Worldwide Library,” presented by Sean Haggerty, Rob McKinney, and Scott Sutcliffe of SilverPlatter Information; and, the “IBM Digital Library Initiative,” presented by Jon Prial, IBM Corporation.
Some briefings focused on current policy issues, including: “Humanities from a National Perspective,” presented by John Hammer, National Humanities Alliance, George Farr, National Endowment for the Humanities, Douglas Bennett, American Council of Learned Societies, David Bearman, Archives & Museum Informatics, and Charles Henry, Vassar College; and, “Long-Term Strategy for the Development of Digital Libraries: Financial, Legal, and Institutional Issues,” presented by Brian Kahin, Harvard University.
Some sessions focused on networked information projects for particular communities of users, including: “The NSF Synthesis Coalition’s National Engineering Education Delivery System,” presented by David Martin, Iowa State University; “Library of the Future Project at LLNL,” presented by Hilary Burton of Lawrence Livermore National Laboratory; “Icarus, Pygmalion and Babbage: New Technologies and Humanities Research,” presented by Nancy Ide, Association for Computers and the Humanities, and Charles Henry, Vassar College; “The Humanities Scholar and the Digital Library,” presented by Susan Hockey, Center for Electronic Texts in the Humanities, David Chesnutt, University of South Carolina, and C. M. Sperberg-McQueen, University of Illinois at Chicago; “Stretching the Web: Early Experiences with Publishing Applied Physics Letters Online,” presented by Timothy Ingoldsby, American Institute of Physics and W. Daviess Menefee, OCLC, Inc.; and “OhioLINK – Statewide Cooperation – It’s Not the Technology, Stupid,” presented by Judith Sessions, Miami University, Edward Garten, University of Dayton, and Tom Sanville, OhioLINK.
New and ongoing projects were discussed in other sessions: “HELIOS: The Heinz Electronic Archive,” presented by David Evans and Charles Lowry of Carnegie Mellon University; “NASA Public Use of Earth and Space Science Data over the Internet,” presented by Nand Lal and Linda Hill of the Goddard Space Flight Center; “Cataloging Internet Resources: OCLC Project Updates,” presented by Erik Jul, OCLC, Inc.; “The Morino Institute: Programs and Strategies,” presented by Kaye Gapen, Morino Institute; “Museum Educational Site Licensing Project (MESL),” presented by Jennifer Trant, Getty Art History Information Program, Steve Dietz, National Museum of American Art, Sally Promey, University of Maryland, and Clifford Lynch, University of California, Office of the President; “Measuring the Impacts of Networking on the Academic Environment,” presented by Charles McClure and Cynthia Lopata, Syracuse University; “Text Capture and Electronic Conversion at the Library of Congress,” presented by David Williamson, Library of Congress; “Cost Centers and Measures in the Networked Information Value-Chain,” presented by Paul Evan Peters, Coalition for Networked Information, and Mark Tesoriero and Robert Ubell, Robert Ubell Associates; and, “Describing Image Files: An Update,” presented by Jennifer Trant, Getty Art History Information Program, David Bearman, Archives and Museum Informatics, Howard Besser, University of Michigan, and J. Dustin Wees, Visual Resources Association.
The Fall 1995 Task Force Meeting will be held on Monday, October 30 and Tuesday, October 31 in Portland, Oregon, immediately preceding the Educom Annual Conference. The theme for the meeting will be “Campus / Community Networking Partnerships.”
Many documents from the Spring 1995 Task Force Meeting are available on the Coalition’s Internet server. If you access the Coalition’s server by gopher, point your gopher client to gopher.cni.org 70 and follow this series of menus:
Coalition FTP Archives (ftp.cni.org)
Coalition Task Force Meetings (/CNI/tf.meetings)
Spring, 1995 Meeting of the Coalition Task Force
If you choose to access the materials via NCSA Mosaic (or some other browser) and WWW, you can use this URL to access an HTML-formatted document:
If you choose to access the materials via FTP, browse the directory /CNI/tf.meetings/1995a.spring on the host ftp.cni.org.
If you need additional information, contact:
Joan K. Lippincott, Assistant Executive Director
Coalition for Networked Information
21 Dupont Circle
Washington, D.C. 20036
Voice: 202-296-5098
Fax: 202-296-0884
Internet: firstname.lastname@example.org
Note on Redistribution
You are encouraged to use this Summary Report to provide information to interested individuals in your organization or institution by, in part or in full, posting it to institutional and organizational electronic distribution lists or incorporating it into relevant newsletters, reports, and the like. Publishers of periodicals and other materials that cover networks and networked information are also encouraged to use this Summary Report in similar ways.