Intellectual Preservation and Electronic Intellectual
Property
by Peter S. Graham
ABSTRACT
Preserving intellectual property means protecting it from easy change in
electronic form. Change can be accidental, well-intended or fraudulent;
protection must be for terms longer than human lifetimes. Three possible
solutions for authenticating electronic texts are described: encryption (least
useful), hashing, and (with the most potential) digital time-stamping, which
can fix document existence at a point in time using public techniques.
INTRODUCTION
This conference is concerned with means of protecting intellectual property in
the networked environment. This paper will focus on the authenticity of
electronic information content, that is, on intellectual preservation.[1]
The concern with authentication arises from the concerns of librarianship,
which has the imperatives of identifying information on behalf of users and of
providing it to them, intact, when they need it. The professional paradigm
librarians speak of is that they acquire information, organize it, make it
available and preserve it. The paradigm is appropriate for electronic
information just as for print over the last several centuries.
For printed texts preservation of the work has meant preservation of the
artifact that contains the work. Indeed, for most people there has been no
distinction between the book and the text, though the more sophisticated
analytical bibliographers and librarians have discussed that distinction for
some decades. But now, in the electronic environment, the work (which may be a
text or may be graphic, numeric or multimedia information) can migrate from
medium to medium and has no necessary residence on any one of them. The
preservation of the work independent of its medium takes on importance in its
own right.
Librarians have as their professional responsibility the serving up of the
information placed in their custody as true to its original intellectual
content as they can. This conference's concern is with protection of
intellectual property, a related concern. Such protection must extend not only
to intellectual rights over the property, but to the property itself: how can
we preserve information content from unauthorized, intentional or accidental
change? The exercise of property rights includes purchase and sale. Both the
buyer and seller have an interest in the property being what it is said to be,
that is, in authenticating the property or text. Authentication is an interest
of librarians as well.
Barry Neavill, a professor at the library school at the University of Alabama,
wrote presciently almost ten years ago that no one had "addressed the issue of
the long-term survival of information. . . . The survival of information in an
electronic environment becomes an intellectual and technological problem in its
own right."[2] If we want to assure permanence of the
intellectual record that is published electronically, he said, then it will be
necessary consciously to design and build the required mechanisms within
electronic systems. We are still in need of those mechanisms.
To address this need, this paper is in two parts. First, it will briefly
describe some of the issues associated with preservation of the objects
containing electronic information: medium preservation. Second, it will
discuss the challenge of intellectual preservation, or the protection
and authentication of information which exists in electronic form. Several
potential methods of electronic preservation will be described, and one will be
recommended for further attention.
THE MEDIUM--AND ITS PRESERVATION
In the electronic environment it is unlikely that a focus of critical study
will be upon the electronic medium itself. To begin with, there is nothing
in an electronic text that necessarily indicates how it was created; and the
ease with which electronic texts can be transferred from disk to disk, or
networked from computer to computer, means that there is no necessary
indication of the source medium or even if the information has been copied
at all. We are not likely to see sale catalog references in the future,
therefore, which remark on the fine quality of the floppy disk's exterior
label, or which remark on the electronic text's provenance ("Moby Dick
on the original Seagate drive; never reformatted, very fine").[3]
The preservation of the information will still require the preservation of
whatever medium it is contained on at any given time. This is mostly what has
been meant up to now when electronic preservation has been discussed. But there
is another kind of preservation required for information media: not only the
preservation of the physical medium on which the information resides, but the
preservation of the storage technology that makes use of that medium.[4]
The physical preservation of media does not need extensive discussion here, for at
any given time the physical characteristics of the medium in use are well
understood and the problems inherent in preserving it are simply financial and
managerial: Who should pay for the necessary equipment and for the properly
designed and acclimatized space, how often should backups be made, and who
keeps track of backups and sees that they happen? These issues cause expenses
for the electronic collection, but they raise only routine technological
questions.[5]
The storage obsolescence problem is quite another matter. A brief sequence of
storage media many of us have seen in our lifetimes would include:
- punched cards*, in at least three formats (80-column, 90-column, 96-column);
- 7-track half-inch tape* (at densities of 200, 556 and 800 bits per inch);
- 9-track half-inch tape* for mainframes, with various recording modes and
densities up to 3200 bpi and beyond;
- 9-track half-inch tape cassettes* for mainframes ("square tapes", as they are
known in contradistinction to the earlier "round tapes");
- RAMAC disk storage;
- magnetic drum storage;
- data cell drives*;
- removable disk packs*;
- Winchester (sealed removable) disk packs*;
- mass storage devices (honeycombs of high-density tape spindles);
- sealed disk drives;
- floppy disks* of 3 sizes and at least 3 storage densities so far;
- cartridge tapes* of very high density (e.g. Exabyte) for use in workstation
backups and data storage;
- removable disk storage media on PCs;
- laser-encoded disks* (CD-ROMs and laser disks);
- magneto-optical disks*, both WORM (write-once-read-many) and rewritable.
* = considered by some to have long-term storage potential
Some of the storage options appearing now and in the near future include new
floppy disk sizes and storage densities, and "flash cards" (PCMCIA), or memory
cards for use with very small computers. One sees discussion of storage
crystals, encoded by laser beams and having the advantage of great capacity
without moving parts, and probably even as stable as good paper.
Technologies are superseding each other at a rapid rate. We know that authors
and agencies are now storing long-term information on floppy disks of all
sizes, but we don't know for how long we are going to be able to read them. No
competent authorities yet express confidence in the long-term storage
capabilities or technological life of any present electronic storage medium.
CD-ROMs are an example. Their economical use in librarianship derives from
their mass market use for entertainment; that mass market may be threatened by
DVI (digital video interactive) technology, by DAT technology, or by others now
being actively promoted by entertainment vendors. If forms alternative to CDs
win out in entertainment, the production of equipment for CDs and therefore
CD-ROMs will be quickly curtailed.
There are perhaps three possible long-term solutions for preserving storage
media in the face of obsolescence (as opposed to physical decay), and they vary
in practicality: preserve the storage technology, migrate the information to
newer technologies, or migrate the information to paper or other long-term
eye-readable hard copy.
The prospect for the first option, preserving older technologies, is not
bright: equipment ages and breaks, documentation disappears, vendor support
vanishes, and the storage medium as well as the equipment deteriorates.
The second option is migration. Most character-based data could be preserved by
migrating it from one storage medium to another as each becomes decrepit or
obsolete. To do this requires a computer which can read in the old mode and
write in the new; with present network capabilities, this is usually not
difficult to arrange.
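As a rough illustration of a single migration step, the following Python sketch
copies one file from an "old" location to a "new" one and confirms that the
copy is byte-for-byte identical. The function name migrate_file, the file name
and the use of temporary directories as stand-ins for two storage media are
assumptions made for the example; real migration between device types involves
far more than this.

```python
import filecmp
import shutil
import tempfile
from pathlib import Path


def migrate_file(old_path: Path, new_dir: Path) -> Path:
    """Copy one file to the new medium and confirm the copy is identical."""
    new_dir.mkdir(parents=True, exist_ok=True)
    new_path = new_dir / old_path.name
    shutil.copyfile(old_path, new_path)
    # shallow=False forces a comparison of the file contents themselves,
    # not just of size and modification time.
    if not filecmp.cmp(old_path, new_path, shallow=False):
        raise IOError(f"copy of {old_path.name} does not match the original")
    return new_path


if __name__ == "__main__":
    # Two temporary directories stand in for the old and new storage media.
    with tempfile.TemporaryDirectory() as old_medium, \
            tempfile.TemporaryDirectory() as new_medium:
        source = Path(old_medium) / "moby_dick.txt"
        source.write_text("Call me Ishmael.\n", encoding="ascii")
        copy = migrate_file(source, Path(new_medium))
        print(copy.read_text(encoding="ascii"), end="")
```

The comparison step is the "checking for successful outcomes" that any
refreshing program has to budget for on every file, every time.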
Whether "refreshing" data is practical for large quantities of information over
long periods of time is another matter. The present view of the Commission on
Preservation and Access is expressed in a report entitled Preservation of
New Technology by Michael Lesk (see fn. 4). His view is that "refreshing"
is the necessary and essential means of preserving information as media
obsolesce; I do not believe it will be possible for more than a fraction of
recorded information. The investment necessary to migrate files of data will
involve skilled labor, complex record-keeping, physical piece management,
checking for successful outcomes, space and equipment. A comparable library
undertaking, of approximately the same order of cost and complexity, would be
the orderly photocopying of the books in the collection every five years.
This is not practical. In any case, this migration solution will only work
easily for ASCII text data. Migrating graphic, image, moving or sound data, or
even formatted text, will only work as long as the software application can
also be migrated to the next computing platform.
The third option -- practical but unexciting -- is to migrate information from
high-technology electronic form to stable hard copy, either paper or microform.
In the near term, for certain classes of high-value archival material, this is
likely to be the permanent medium of choice. It offers known long life, eye
readability and freedom from technological obsolescence. It also, of course,
discards the flexibility in use and transport of information in electronic
form. But until we have long-term stable electronic storage media, it offers
the medium preservation mode most likely to be used.
THE MESSAGE--AND ITS PRESERVATION
The Problem
The more challenging problem is intellectual preservation -- preserving not
just the medium on which information is stored, but the information itself.
Electronic information must be dealt with separately from its medium, much more
so than with books, as it is so easily transferable. The great asset of digital
information is also its great liability: the ease with which an identical copy
can be quickly and flawlessly made is paralleled by the ease with which a
flawed copy may be undetectably made. Barry Neavill wrote in 1984 of the
"malleability" of electronic information, that is, its ability to be easily
transformed and manipulated.[6] For an author or
information provider concerned with the integrity of their documents, there are
new problems in electronic forms that were not present in print.
The issue may be framed by asking several questions which confront the user of
an electronic document (which may be a text or may be graphic, numeric or
multimedia information, for the problems are similar). How can I be sure that
what I am reading is what I want? How do I know that the document I have found
is the same one that you read and made reference to in your bibliography? How
can I be sure that the document I am using has not been changed since you
produced it, or since the last time I read it? How can I be sure that the
information you sell me is that which I wanted to buy? To put it most
generally: How can a reader be sure that the document being used is the one
intended?
We properly take for granted the fixity of text in the print world: the printed
journal article I examine because of the footnote you gave is beyond question
the same text that you read, and it is the same one that the author proofread
and approved. Therefore we have confidence that our discussion is based upon a
common foundation. The present state of electronic texts is such that we no
longer can have that confidence.
Taxonomy of Changes
Let us examine three possibilities of change or damage which electronic texts
can undergo that confront us with the need for intellectual preservation:
- accidental change;
- intended change that is well-meant;
- intended change that is not well-meant; that is, fraud.
Accidental change
A document can sometimes be damaged accidentally, perhaps by data loss during
transfer or through inadvertent mistakes in manipulation. For example, data may
be corrupted in being sent over a network or between disks and memory on a
computer; this happens seldom, but it is possible.
More likely is the loss of sections of a document, or a whole version of a
document, due to accidents in updating. For example, if a document exists in
multiple versions, or drafts, the final version might be lost leaving only the
previous version; many of us have had this experience. It is easy for the
reader or author not to notice that text has been lost in this way.
Just as common in word-processing is the experience of incorrectly updating the
original version that was supposed to be retained in pristine form. In such a
case only an earlier draft (if it still exists) and the incorrectly updated
version remain. Again, a reader or author may not be aware of the corruption.
Note that in both cases backup mechanisms and the need for them are not the
issue, but rather how we know what we have or don't have.
Intended change -- well-meaning
There are at least three possibilities for well-meaning change. The change
might result in a specific new version; the change might be a structural update
that is normal and expected; or the change might be the normal outcome of
working with an interactive document.
New versions and drafts are familiar to those of us who create authorial texts,
for example, or to those working with legislative bills, or with revisions of
working papers. It is desirable to keep track bibliographically of the
distinction between one version and another.
In the past we have been accustomed to drafts being numbered and edition
statements being explicit. We are accustomed to visual cues to tell us when a
version is different; in addition to explicit numbering we observe the page
format, the typos, the producer's name, the binding, the paper itself. These
cues are no longer dependable for distinguishing electronic versions, for they
can vary for identical informational texts when produced in hard copies. It is
for this reason that the Text Encoding Initiative Guidelines Project has called
for indications of version change in electronic texts even if a single
character has been changed.[7]
It is important to know the difference between versions so that our discussion
is properly founded. Harvey Wheeler, a professor at the University of Southern
California, is enthusiastic about what he calls a "dynamic document,"
continually reflecting the development of an author's thinking.[8] But
scholars and readers need to know what the changes are
and when they are made. Authors have an interest in their intellectual
property. There is a sense in which the scholarly community has an interest in
this property as well, at least to the extent of being able properly to
identify it.
Structural updates, changes that are inherent in the document, also cause
changes in information content. A dynamic data base by its nature is frequently
updated: Books in Print, for example, or a university directory ("White
Pages"). Boilerplate such as a funding proposal might also be updated often by
various authors. In each of these cases it is appropriate and expected for the
information to change constantly.[9] Yet it is also
appropriate for the information to be shared and analyzed at a given point in
time. In print form, for example, BIP gives us a historical record of
printing in the United States; the directory tells us who was a member of the
university in a given year. In electronic form there is no historical record
unless a snapshot is taken at a given point in time. How do we identify that
snapshot and authenticate it at a later time?[10]
Another form of well-meaning change occurs in interactive documents. Consider
the note-taking capabilities of the Voyager Extended Books, and the interactive
HyperCard novels.[11] We can expect someone to want
snapshots of these documents, inadequate though they may be. We need an
authoritative way to distinguish one snapshot from another.
Intended change -- fraud
The third kind of change that can occur is intentional change for fraudulent
reasons. The change might be of one's own work, to cover one's tracks or change
evidence for a variety of reasons, or it might be to damage the work of
another. In an electronic future the opportunities for a Stalinist revision of
history will be multiplied. An unscrupulous researcher could change
experimental data without a trace. A financial dealer might wish to cover
tracks to hide improper business, or a political figure might wish to hide or
modify inconvenient earlier views.
Imagine that the only evidence of the Iran-Contra scandal was in electronic
mail, or that the only record of Bill Clinton's draft correspondence was in
e-mail. Consider the political benefit that might derive if each of the parties
could modify their own past correspondence without detection. Then consider the
case if each of them could modify the other's correspondence without
detection. We need a defense against both cases.
Solutions
The solution is to fix a text or document in some way so that a user can be
sure of the original text when it is needed. This solution is called
authentication. There are three important electronic techniques proposed for
authentication: encryption, hashing and digital time-stamping. While encryption
offers a form of data security, only hashing and digital time-stamping are
useful for long-term scholarly communication and for providing protection
against change of an intellectual creation.
Encryption
The two best-known forms of encryption are DES and RSA. DES is the Data
Encryption Standard, first established about 1975 and adopted by many business
and government agencies. RSA is an encryption process developed by three
mathematicians from MIT (Rivest, Shamir and Adleman) at about the same time,
and marketed privately. It is regarded by many as superior to the Data
Encryption Standard.[12]
Encryption depends upon mathematical transformation of a document. The
transformation uses an algorithm requiring a particular number as the basis of
the computation. This number, or key, is also required to decode the resulting
encrypted text; the key is typically many digits long, perhaps 100 or more.
Modern encryption depends upon the process being so complex that decoding by
chance or merely human effort is impossible. It also depends upon the great
difficulty of decoding by brute force. Computational trial-and-error methods
would take unreasonably long periods of time, perhaps hundreds or thousands of
years even using modern supercomputers.
Therefore the key is crucial to DES encryption. It is also the problem, for
passing the key to authorized persons turns out to be the Achilles heel of the
process. How is the key sent to someone -- on paper in the mail? By messenger?
These introduce the usual vulnerabilities dramatized in thriller literature. Do
you send the key electronically? Sending it as plain text doesn't seem like a
good idea, and sending it in encrypted form -- well, you see the problem. This
is a recognized flaw in the widely-used DES encryption method.
The RSA encryption technique is called public key encryption. The computational
algorithm depends upon a specific pair of numbers, a public key and a private
key; data encoded by one number cannot be decoded using the same number but can
only be decoded by the other number, and vice versa (see Fig. 1). A
correspondent B keeps one of the pair of numbers secret as a private key and
makes the other number available as a public key. The public key can be used by
anyone, for example her friend A, for coding messages which he sends to B; only
B can decode them, because only she has the other number of the pair. She sends
an encrypted message back to A using not her private key, but A's public key,
and only he can decode it, mutatis mutandis.

Alternatively, B can code a simple message using her private key; anyone can
decode it using her public key. This functions as a digital signature, allowing
her messages to be authenticated, since only she is able to create such
messages. The usefulness is evident in financial transfers, for example, or in
authenticating e-mail or electronic purchase orders.
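The key-pair behavior described above can be tried out with a toy calculation.
The Python sketch below uses deliberately tiny textbook primes (61 and 53) and
the public number 17; these values, and the names transform, PUBLIC_KEY and
PRIVATE_KEY, are assumptions chosen for illustration and offer no security at
all. Real keys are hundreds of digits long, and real systems add padding and
other safeguards not shown here.

```python
# Toy public-key arithmetic with deliberately tiny primes (NOT secure).
P, Q = 61, 53
N = P * Q                                # modulus used with both keys
PHI = (P - 1) * (Q - 1)
PUBLIC_KEY = 17                          # B makes this number public
PRIVATE_KEY = pow(PUBLIC_KEY, -1, PHI)   # B keeps this secret (Python 3.8+)


def transform(number: int, key: int) -> int:
    """Encode or decode a number smaller than N with one key of the pair."""
    return pow(number, key, N)


if __name__ == "__main__":
    message = 65

    # A codes a message with B's public key; only the private key decodes it.
    coded = transform(message, PUBLIC_KEY)
    assert transform(coded, PRIVATE_KEY) == message
    assert transform(coded, PUBLIC_KEY) != message   # same key does not undo it

    # B codes with her private key; anyone can check it with her public key.
    signature = transform(message, PRIVATE_KEY)
    assert transform(signature, PUBLIC_KEY) == message
    print("round trips succeeded")
```

The same function serves for both directions, which is the point of the pair:
whatever one number does, only the other can undo.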
Encryption is valuable for security. But neither the DES nor the RSA form is
useful as an authentication system. Encryption could perhaps be used to
authenticate a text if one considered it as an envelope with contents presumed
to be intact, but this would only work if the text had not been changed and
re-encrypted. Encryption also has several drawbacks as a long-term
authentication means. No matter which method is used, encryption requires keys
specific to the reader and writer. If the keys are generally available, as they
would need to be for wide document access, then authentication is not possible,
for the document could easily be modified and re-encrypted using the same keys.
In addition, one of our concerns in librarianship is authentication over
periods of time longer than a normal human lifetime. Secret keys may be lost
over such periods of time, making encrypted documents useless.
Hashing
Another technique is called hashing; it is a shorthand means by which the
uniqueness of a document may be established. Hashing depends upon the
assignment of arbitrary values to each portion of the document, and thence upon
the resulting computation of specific but contentless values called "hash
totals" or "hashes." They are "contentless" because the specific computed hash
totals have no value other than themselves. In particular, it is impossible or
infeasible to compute backward from the hash to the original document. The hash
may be a number of a hundred digits or so, but it is much shorter than the
document it was computed from. Thus a hash has several virtues: it is much
smaller than the original document; it preserves the privacy of the original
document; and it uniquely describes the original document.
Fig. 2 allows a simplified description of how a hash is created. If each letter
is assigned a value from 1 to 26, then a word will have a numeric total if its
letters are summed. In the first example, EAT has the value of 26. The problem
is, the word TEA (composed of the same letters) has the same value in this
scheme. The scheme can be made more complicated, as shown in the second pair of
examples, if the letter-values are also multiplied by a place value. In this
scheme, the two words composed of the same letters end up with different
totals. For the sake of illustration, the numbers at the right are shown as
summed to the value 52 at the bottom; in fact the total is 152, but the
leftmost digit can be discarded without materially affecting the fact that a
specific hash total has been found: contentless, private, and (in this simple
example) reasonably distinctive of the particular words in the "document."
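The arithmetic just described can be written out as a short Python sketch. The
figure itself is not reproduced here, so the place values are an assumption:
the sketch multiplies the first letter by 1, the second by 2 and the third by
3, a choice that happens to reproduce the total of 152 quoted above. The
function names are hypothetical.

```python
def letter_value(letter: str) -> int:
    """A=1, B=2, ... Z=26."""
    return ord(letter.upper()) - ord("A") + 1


def simple_sum(word: str) -> int:
    """First scheme: add up the letter values."""
    return sum(letter_value(c) for c in word)


def place_weighted_sum(word: str) -> int:
    """Second scheme: multiply each letter value by its position.

    Assumed place values: 1 for the first letter, 2 for the second, and so on.
    """
    return sum(pos * letter_value(c) for pos, c in enumerate(word, start=1))


def toy_hash(words) -> int:
    """Sum every value from both schemes and keep the two rightmost digits."""
    total = sum(simple_sum(w) + place_weighted_sum(w) for w in words)
    return total % 100


if __name__ == "__main__":
    for word in ("EAT", "TEA"):
        print(word, simple_sum(word), place_weighted_sum(word))
    print("hash total:", toy_hash(["EAT", "TEA"]))
```

Under these assumptions EAT and TEA both sum to 26 in the first scheme, but
come to 67 and 33 in the second; the four values total 152, and discarding the
leftmost digit leaves 52.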

This is a very simplistic description of a process that can be made excessively
complicated for human computation. Using cryptographic techniques, it is easy
for current computing technology to compute quite complex hashes for any kind
of document; paradoxically, these hashes are beyond the reach of computers to
forge or break in the foreseeable future. Hashing as a means of authentication
is a topic of interest to the business and governmental communities and there
have been several recent mathematical papers on it, including descriptions of
recent patents.
How might authors use hashing as an authentication technique? Above all it must
be easy to use. It is typical for a document to be mundane at the time of its
creation; it is only later that a document becomes important. Therefore an
authentication mechanism must be so cheap and easy that documents can be
authenticated as a matter of routine. First, there must be an agreement on a
hashing algorithm that is generally trusted. Second, the algorithm must be
widely distributable in a useful form, perhaps as a menu or hot-key command on
a microcomputer or even embedded as a routine operating system option. To be
useful, the selected algorithm must be commercially licensed and so cheap that
there is no barrier to hashing documents at will.
In such a scheme, each time a document or a draft is created or saved the hash
is created and saved with it and is separately retrievable. If the document is
electronically published, it is published with its hash; and if the document is
cited, the hash is part of the citation. If a reader using the document then
wishes to know if she has the unaltered form, she computes the hash easily on
her own computer using the standard algorithm and compares it with the
published hash. If they are the same, she has confidence she has the correct,
untampered version of the document before her.
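A minimal sketch of that reader-side check follows, in Python. No particular
hashing algorithm is named in this paper; SHA-256 from Python's standard
hashlib module is used here purely as a stand-in for whatever "generally
trusted" algorithm might be agreed on, and the function names are
hypothetical.

```python
import hashlib


def document_hash(text: str) -> str:
    """Hash of the document's full contents, as a hexadecimal string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def is_unaltered(text: str, published_hash: str) -> bool:
    """Recompute the hash locally and compare it with the published one."""
    return document_hash(text) == published_hash


if __name__ == "__main__":
    original = "Call me Ishmael. Some years ago..."
    published = document_hash(original)      # travels with the citation

    # The reader's copy differs from the original by a single character.
    tampered = original.replace("years", "yearz")

    print(is_unaltered(original, published))  # the genuine copy
    print(is_unaltered(tampered, published))  # the altered copy
```

A change of even one character yields a different hash, so the comparison
fails for the altered copy and succeeds for the genuine one.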
Time-stamping
Digital time-stamping takes the process a step further. Time-stamping is a
means of authenticating not only a document but its existence at a specific
time. It is analogous to the rubber-stamping of incoming mail with the date and
time it was received. An electronic technique has been developed by two
researchers at Bellcore in New Jersey, Stuart Haber and Scott Stornetta.[13]
Their efforts initially were prompted by charges of
intellectual fraud made against a biologist, and they became interested in the
problem of demonstrating whether or not electronic evidence had been tampered
with. In addition, they are aware that their technique is useful as a means for
determining priority of thought, for example in the patenting process, so that
electronic claims for intellectual priority could be unambiguously made.
Their technique depends on a mathematical procedure involving the entire
specific contents of the document, which means they have provided a tool for
determining change as well as for fixing the date of the document. A great
advantage of their procedure is that it is entirely public, except (if desired)
for the contents of the document itself. Thus it is very useful for the library
community, which wishes to keep documents available rather than hide them, and
which needs to do so over periods of time beyond those it can immediately
control. It is also likely to be useful for segments of the publishing
community which will want to provide a means for buyers to authenticate what
they have purchased.
The time-stamping process envisioned by Haber and Stornetta depends upon
hashing as the first step. Assume, in Fig. 3, that Author A creates
Document A and wishes to establish it as of a certain time. First he creates a
hash for Document A using a standard, publicly-available program. He then sends
this hash over the network to a time-stamping server. Note that he has thus
preserved the privacy of his document for as long as he wishes, as it is only
the hash that is sent to the server. The time-stamping server uses standard,
publicly-available software to combine this hash with two other numbers: a hash
from the just-previous document that it has authenticated, and a hash derived
from the current time and date. The resulting number is called a certificate,
and the server returns this certificate to Author A. The author now preserves
this certificate, a number, and transmits it with Document A and uses it when
referring to Document A (e.g. in a bibliography) in order to distinguish it
from other versions of the document.
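The certificate step can be sketched in Python as follows. This is a
simplified model of the description above, not the actual Haber and Stornetta
software: SHA-256 stands in for the unnamed hash function, the three numbers
are combined by hashing their concatenation, and the class name, method name
and seed value are all assumptions made for the example.

```python
import hashlib
from datetime import datetime, timezone


def sha256_hex(data: str) -> str:
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


class ToyTimeStampServer:
    """Issues certificates that depend on the previously stamped document."""

    def __init__(self) -> None:
        # Arbitrary starting value for the very first request (an assumption).
        self.previous_hash = sha256_hex("start of chain")

    def certify(self, doc_hash: str, when: datetime) -> str:
        time_hash = sha256_hex(when.isoformat())
        # Combine the three numbers: this document's hash, the hash of the
        # just-previous document, and a hash of the current date and time.
        certificate = sha256_hex(doc_hash + self.previous_hash + time_hash)
        # This document now becomes the "previous document" for the next one.
        self.previous_hash = doc_hash
        return certificate


if __name__ == "__main__":
    server = ToyTimeStampServer()
    now = datetime.now(timezone.utc)
    # Only the hash leaves the author's machine; the text itself stays private.
    hash_a = sha256_hex("Text of Document A")
    print("certificate for A:", server.certify(hash_a, now))
    hash_b = sha256_hex("Text of Document B")
    print("certificate for B:", server.certify(hash_b, now))
```

Because each certificate folds in the hash of whatever document happened to
come before it, the same document submitted at a different point in the
sequence would receive a different certificate.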

The time-stamping server has one other important function: It combines the
certificate hash with others for that week into a number which, once a week, is
now published in the personals column of The New York Times
("Commercial and Public Notices"), as in Fig. 4. The public nature of this
number (what Stornetta calls an example of a "widely-witnessed event") assures
that it cannot be tampered with.
The privacy of the document has been preserved for as long as Author A wishes;
there is also no other secrecy in this process. All steps are taken in public
using available programs and procedures. Note too that no other document will
result in the same certificate, for Document A's certificate is dependent not
only upon the algorithms and the document's hash total, but also upon the hash
of the particular and unpredictable document that was immediately previous.
Once Document A has been authenticated, it becomes itself the previous document
for the authentication of Document B.

Now let us consider Reader C, who wishes to determine the authenticity of the
electronic document before her. Perhaps it is an electronic press release from
a senatorial campaign, or an index purchased over the network from an
electronic publisher, or perhaps it is the year 2093 and the document is an
electronic text of Author A. Reader C has available the certificate for
Document A. If she can validate that number from the document she can be sure
she has the authenticated contents. Using the standard software, she recreates
the hash for the document and sends the hash over the network, with the
certificate, to the time-stamping server. The server reports back on the
validity of the certificate for that document.
But let us suppose that it is the year 2093 and the server is nowhere to be
found. Reader C then searches out the microfilm of The New York Times
for the putative date of the document in question and determines the published
hash number; using that number and the standard software she tests the
authenticity of her document just as the server would.
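How the weekly number is computed, and exactly what Reader C needs in hand to
test against it, is not spelled out in this paper. The Python sketch below
therefore rests on stated assumptions: the weekly number is taken to be a
SHA-256 hash over that week's certificates in order, and the reader is assumed
to hold the list of certificates along with the previous-document hash and
time hash that went into her document's certificate. All function names are
hypothetical.

```python
import hashlib


def sha256_hex(data: str) -> str:
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


def certificate_for(document: str, previous_hash: str, time_hash: str) -> str:
    """Recreate a certificate from the document and its two companion hashes."""
    return sha256_hex(sha256_hex(document) + previous_hash + time_hash)


def weekly_number(certificates: list) -> str:
    """Assumed combination rule: hash the week's certificates in order."""
    return sha256_hex("".join(certificates))


def check_against_newspaper(document: str, previous_hash: str, time_hash: str,
                            week_certificates: list, published: str) -> bool:
    """True if the document leads to a certificate covered by the published number."""
    certificate = certificate_for(document, previous_hash, time_hash)
    return (certificate in week_certificates
            and weekly_number(week_certificates) == published)


if __name__ == "__main__":
    prev = sha256_hex("some earlier document")
    stamp = sha256_hex("1993-04-02T12:00:00")
    cert_a = certificate_for("Text of Document A", prev, stamp)
    week = [sha256_hex("another certificate"), cert_a]
    printed = weekly_number(week)            # the number found on microfilm

    print(check_against_newspaper("Text of Document A", prev, stamp, week, printed))
    print(check_against_newspaper("Altered text", prev, stamp, week, printed))
```

The check needs no server: everything Reader C uses is either in her own hands
or printed in a widely held newspaper.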
What I have described are simplified forms of methods for identifying a unique
document, and for authenticating a document as created at a specific point in
time with a specific content. Whether the specific tools of hashing or
time-stamping are those we will use in future is open to question. It is
however the first time that authors, publishers, librarians and end-users have
been offered electronic authentication tools that provide generality,
flexibility, ease of use, openness, low cost, and functionality over long
periods of time on the human scale. Using such tools (or similar ones yet to be
developed), an author can have confidence that the document being read is the
one he or she published, and that it has not been altered without the reader
being aware of it. Such tools are essential for every player in the chain of
scholarly communication.
ROLE OF LIBRARIANS
It may be asked why librarians make such authentication issues their concern.
Why do they do this -- why do they bother? The short answer is that it is what
librarians do. As noted earlier, the basic professional paradigm for librarians
is to acquire information, organize it, preserve it and make it available.
It is the preservation imperative that is particularly important for this
audience of authors and publishers as well as for librarians. Authors and
publishers have an interest in seeing that their works are preserved and
provided in uncorrupted form, but neither has taken on the responsibility for
doing so; librarians have. Authors have a specific interest in the uncorrupted
longevity of their works, and both authors and research libraries have long
periods of time as their concern. Librarians have taken on the particular
responsibility to see that authors' works (and the graphic culture in general)
are preserved and organized for use, not only by our generation but by
succeeding generations of scholars and students. On behalf of future readers,
librarians have the general responsibility for preserving against moth, rust
and change. If librarians do not preserve works for the long haul, no one else
will; once again, it is what librarians do.
Speaking pessimistically for a moment, it is possible that the job cannot be
done. We may all -- librarians, authors and publishers -- be swimming against
the tide. Our society is obsessed with the present and is generally uncaring of
the past and of its records. Technologically refined tools are now available
which allow and encourage the quick and easy modification of text, of pictures,
and of sounds. It is becoming routine to produce ad hoc versions of
performances, and to produce technical reports in tailored versions on demand.
Post-modernist critical theory detaches authorial intention from works, and
demeans the importance of historical context. The technology that allows us to
interact with information itself inhibits us from preserving our interaction.
However, there is cause for optimism. In our house there are many mansions;
there will continue to be people who want history, who care what authors say,
and who wish the human record to last. They will support the efforts of
librarians to achieve these goals. We are fortunate that electronic
preservation is of some interest to other communities for mundane
commercial reasons. The financial, publishing and other business communities
have a stake in the authenticity of their electronic communications. The
business and computing communities wish to protect against the undesired loss
of data in the short term. The governmental and business communities profess an
interest in the security of systems.
The protection of intellectual property in the internetworked multimedia
environment is the concern of this conference. The preservation of the actual
information content is a prerequisite to the protection of property rights.
Recognizing the need for authenticating and preserving our intellectual
productivity is a common ground for authors, publishers and librarians.
NOTES
1. Parts of this paper are drawn from the author's
presentation at the 1992 annual preconference of the Rare Books and Manuscripts
Section of the Association of College and Research Libraries, and published in
Robert S. Martin, ed., Scholarly Communication in an Electronic Environment:
Issues for Research Libraries (Chicago: American Library Association,
1993), as "Preserving the Intellectual Record and the Electronic Environment"
(pp. 71-101).
2. Gordon B. Neavill, "Electronic Publishing, Libraries,
and the Survival of Information," Library Resources & Technical
Services 28:76-89 (Jan. 1984), p. 78.
3. However, see the recent work by Stuart Moulthrop, Victory Garden
(Cambridge, Mass.: Eastgate Systems, 1991 [800 KB disk (signed and numbered
226/250 by author) for Macintosh + 16 p. brochure with introduction by Michael
Joyce and explanatory matter, in plastic casing labeled "first edition"]).
4. There is a third kind, the obsolescence of software
designed to read a specific medium. For example, Kathleen Kluegel has pointed
out how CD-ROM software updates have left unreadable older disks of the same
published data base. She fears CD-ROM ending up "being the 8-track tape of the
information industry" in "CD-ROM Longevity," message on PACS-L
(listserv@uhupvm1.bitnet, April 29, 1992).
The best discussion of medium preservation, and the distinctions between the
various kinds of obsolescence, is in Michael Lesk, Preservation of New
Technology: A Report of the Technology Assessment Advisory Committee to the
Commission on Preservation and Access (Washington, DC: CPA, 1992).
5. See especially Lesk, but also Janice Mohlhenrich, ed.,
Preservation of Electronic Formats: Electronic Formats for Preservation
(Fort Atkinson, Wis.: Highsmith, 1993), the proceedings of the 1992
WISPPR preservation conference.
6. Neavill, 1984, p. 77.
7. TEI P1, Guidelines, Version 1.1: Chapter 4,
Bibliographic Control, Encoding Declarations and Version Control (Draft Version
1.1, October 1990); sec. 4.1.6, Revision History, p. 55: "...[I]f the file
changes at all, even if only by the correction of a single typographic error,
the change should be mentioned.... The principle here is that any researcher
using the file, including the person who made the changes, should be able to
find a record of the history of the file's contents."
8. Harvey Wheeler, keynote speech at the October, 1988
LITA conference (Boston, Mass.). The issue arises in a different context in the
ESTC note below.
9. A peculiar case is the transportation time-table;
theoretically it could be dynamically updated in electronic form, yet it is the
timetable's hard-copy publication that signals to the users that a change has
occurred.
10. An electronic catalog is a similar case. Librarians
never pretended that card catalogs were static, but the electronic catalogs
(particularly when on the network) are so accessible as to raise citation
problems. Robin Alston, in Searching the Eighteenth Century (London:
British Library, 1983), claimed superiority for the Eighteenth Century Short
Title Catalog (ESTC) on the grounds that "machine-readable data...can be always
provisional." Hugh Amory, a Harvard rare books cataloger, responded in a review
by noting: "The permanence of print has its own advantages, moreover: who will
wish to cite a catalogue that can change without notice?" Papers of the
Bibliographical Society of America (PBSA) Vol. 79 (1985), p. 130.
11. See the discussion of hypertext books in Robert
Coover, "The End of Books," The New York Times Book Review (June 21,
1992), pp. 1, 23-25. Examples of such works include Moulthrop (n. 3 above),
Michael Joyce, Afternoon: A Story (Cambridge, Mass.: Eastgate Systems,
1987), and Carolyn Guyer and Martha Petry, "Izme Pass," Writing on the
Edge Vol. 2, no. 2 (Spring, 1991), attached Macintosh disk.
12. DES is described in FIPS Publication 46-1: Data
Encryption Standard, National Bureau of Standards, January 1988. RSA Data
Security, from whom information is available about their product, is at 10 Twin
Dolphin Drive, Redwood City, California 94065; the original description of
RSA's method is in R. L. Rivest, A. Shamir, and L. Adleman, "A Method for
Obtaining Digital Signatures and Public-key Cryptosystems," Communications
of the ACM, Vol. 21, No. 2 (Feb. 1978), pp. 120-126.
A few readily available popular articles on the two schemes include John
Markoff, "A Public Battle Over Secret Codes," The New York Times (May 7,
1992), p. D1; Michael Alexander, "Encryption Pact in Works,"
Computerworld, Vol. 25, No. 15 (April 15, 1991); G. Pascal Zachary,
"U.S. Agency Stands in Way of Computer-security Tool," The Wall Street
Journal (Monday, July 9, 1990); D. James Bidzos and Burt S. Kaliski, Jr.,
"An Overview of Cryptography," LAN Times (February 1990). More technical
and with many references is W. Diffie, "The First Ten Years of Public-key
Cryptography," Proceedings of the IEEE, Vol. 76, No. 5 (May 1988), pp.
560-577.
13. Stuart Haber and W. Scott Stornetta, "How to
Time-stamp a Digital Document," Journal of Cryptology (1991) 3:99-111;
also, under the same title, as DIMACS Technical Report 90-80 ([Morristown,] New
Jersey: December, 1990). DIMACS is the Center for Discrete Mathematics and
Theoretical Computer Science, "a cooperative project of Rutgers University,
Princeton University, AT&T Bell Laboratories and Bellcore." The authors are
Bellcore employees.
D. Bayer, S. Haber, and W. S. Stornetta, "Improving the Efficiency and
Reliability of Digital Time-stamping," Sequences II: Methods in
Communication, Security, and Computer Science, ed. R. M. Capocelli et al.
(New York: Springer-Verlag, 1993), pp. 329-334.
A brief popular account of digital time-stamping is in John Markoff,
"Experimenting with an Unbreachable Electronic Cipher," The New York
Times (Jan. 12, 1992), p. F9. A better and more recent summary is by Barry
Cipra, "Electronic Time-Stamping: The Notary Public Goes Digital,"
Science Vol. 261 (July 9, 1993), pp. 162-163.
BIOGRAPHY
Peter S. Graham, Associate University Librarian for Technical and Networked
Information Services at Rutgers University, co-leads the Working Group on
Legislation, Codes, Policies and Practices of the Coalition for Networked
Information, and serves on the Council of the American Library Association.
Holding an M.L.S., he has been a senior administrator of university libraries
and computing centers.
Peter S. Graham
Associate University Librarian for Technical
and Networked Information Services
Rutgers University Libraries
169 College Ave.
New Brunswick, N.J. 08903
(908) 932-5908
fax (908) 932-5888
e-mail: psgraham@gandalf.rutgers.edu