A White Paper on Authentication and Access Management Issues in
Cross-organizational Use of Networked Information Resources
Coalition for Networked Information
Appendix: Notes on the State of the Art in Available Software
About This Paper
A first draft of this paper was released for review by members of the CNI Access Management list on March 28, 1998 and generated a great deal of electronic discussion within the closed CNI-AUTHENTICATE mailing list. This was followed by a meeting in Washington DC on April 5, 1998 to review and discuss the draft paper and comments generated on the list up to that date. The revision has also benefited from discussions at a Digital Library Federation/National Science Foundation Workshop held in Washington on April 6, 1998 on closely related issues. My thanks to all who contributed.
This version, which incorporates many of the ideas from this process, is being prepared for distribution at the Spring CNI Task Force meeting in Washington DC, April 14-15; it is also being placed on the CNI web site (www.cni.org) for wider dissemination. Note that in some places time did not permit me to fully incorporate earlier comments or to research questions that were identified, and I have tried to indicate where changes will be made prior to the preparation of the final version. The paper also still needs some considerable editorial work, and I ask readers to be forgiving of editorial problems. Comments are invited and should be sent to <firstname.lastname@example.org>. About 10 May, 1998, I will prepare a final version of the white paper which will be placed on the CNI web site.
As institutions implement networked information strategies which call for sharing and licensing access to information resources in the networked environment, authentication and access management have emerged as major issues which threaten to impede progress. While considerable work has been done over the last two decades on authentication within institutions and, more recently, in support of consumer-oriented electronic commerce on the Internet, a series of new technical and policy issues emerge in the cross-organizational authentication and access management context. This white paper, which is being prepared by the Coalition for Networked Information in conjunction with a large group of volunteer reviewers and contributors, is intended to serve several purposes:
- To identify and scope the new issues that emerge in the cross-organizational setting and to provide a framework for analyzing them.
- To map out the various best-practice approaches to solving these problems using existing and emerging technology so that institutions and information providers can make informed choices among the alternatives and consider how these choices relate to institutional authentication and access management strategies.
- To provide a common vocabulary and framework to assist in the development of licensing and resource-sharing agreements, and to highlight technical and policy considerations that need to be addressed as part of these business negotiations.
- To lay the foundation for possible follow-on formal or de facto community standards development in access management. If large scale use of networked information resources is to flourish, we need to move away from the specialized case-by-case access management systems in use today and towards a small number of general approaches which will let institutionally-based access management infrastructures interoperate with arbitrary resources.
The basic cross-organizational access management problem is exemplified by most licensing agreements for networked information resources today; it also arises in situations where institutions agree to share limited-access resources with other institutions as part of consortia or other resource sharing collaborations. In such an agreement, an institution — a university, a school, a public library, a corporation — defines a user community which has access to some network resource. This community is typically large, numbering perhaps in the tens of thousands of individuals, and membership may be volatile over time, reflecting for example the characteristics of a student body. The operator of the network resource (which may be a web site, or a resource reached by other protocols such as Telnet terminal emulation or the Z39.50 information retrieval protocol) needs to decide whether users seeking access to the resource are actually members of the user community that the licensee institution defined as part of the license agreement.
Note that the issue here is not how the licensee defines the user community — for example, how a university might define students, staff members and faculty (all of the problems about alumni, part-time and extension students, adjunct faculty, affiliated medical staff and the like); it is assumed that the institution and the resource operator have reached some satisfactory resolution on this question. Rather, the issue is one of testing or verifying that individuals really are members of this community according to pre-agreed criteria, of having the institution vouch for or credential the individuals in some way that the resource operator can understand. Such arrangements are often called “site” licenses, but this term is really inaccurate; while physical presence at a specific site may be one criterion for having access, a better term is “group” license or “community” license, emphasizing that the key consideration is membership in some community, and that physical location is often not the key membership criterion.
Progress in inter-organizational access management will benefit everyone. To the extent that resource operators and licensing institutions can agree on common methods for performing this authentication and access management function, it greatly facilitates both licensing and resource sharing by making it quick, easy and inexpensive to implement business arrangements. It benefits users by making their navigation through a network of resources provided by different operators more seamless and less cumbersome. The central challenge of cross-institutional access management is not to set up barriers to access; it is to facilitate access in a responsible fashion, recognizing the needs of all parties involved in the access arrangements.
While this white paper will give some particular emphasis to issues that arise in the higher education and library communities (particularly at the policy level) the problem under consideration here is very general, and in fact occurs in general corporate licensing of networked information services, or cooperation among business partners.
As we will see in the next section, not only are there questions about how best to accomplish this technically, there are also a series of intertwined policy and management considerations which need to be considered.
The focus here is on group licenses that may be subject to some additional constraints (for example concurrent user limits) rather than on transactional models where individual users may take actions to incur specific incremental costs back to the licensing institution over and above base community licensing costs. Any incremental cost transactional model will need to incorporate at least two additional features: a set of user constraints that become part of the attributes for each authenticated user and which are made available to the resource operator, and a means by which the resource operator can obtain permission for transactions by passing a query back to the licensing institution. This involves a much more complex trust, liability and business relationship between resource operator and licensing institution, as well as consideration of financial controls and a careful assessment of security threats. It will not be considered further here.
Note that there are several other cross-organizational authentication, authorization and access management issues which are beyond the scope of this paper, including the authentication of service providers and verifying the integrity and provenance of information retrieved from networked resources.
2.1 Terminology and Definitions
Throughout the rest of this paper we’ll use the general terms “resource operator” to cover publishers, web site operators, and other content providers (including libraries and universities in their roles as providers of content), and “licensee institution” to cover organizations such as universities or public libraries that arrange for access to resources on behalf of their user communities.
Authentication and authorization actually have very specific meanings, though the two processes are often confounded, and in practice are often not clearly distinguished. We will use the term “access management” to describe broader systems that may make use of both authentication and authorization services in order to control use of a networked resource.
Authentication is the process by which a network user establishes a right to an identity — in essence, the right to use a name. Many techniques may be used to authenticate a user: passwords, biometric techniques, smart cards, certificates. Note that names need not correspond to the usual names of individuals in the physical world. A user may have the right to use more than one name: we view this as a central philosophical assumption in the cross-organizational environment. There is a scope or authority problem associated with names; in essence, when a user establishes the right to use an identity, this is a statement that some organization has accepted the user’s right to that name. For authentication within an institution this issue often isn’t important, and in some schemes a user may have only a single identity; for cross-organizational applications such as those of interest here, this relativistic character of identity is of critical importance. A user may have rights to use identities established by multiple organizations (such as universities and scholarly societies), and more than one identity may figure in an access management decision. Users may have to decide what identity to present to a resource: they may have access because they are a member of a specific university’s community, or a member of a specific scholarly society, for example. Making these choices can be a considerable burden on users, much like trying to shop for the best discount rate on a service that offers varying discounts to different membership and affinity groups (corporate rate, senior citizen rate, weekly rate, government rate, etc.).
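The relativistic view of identity sketched above can be illustrated in code. This is only a minimal sketch, not any deployed system; all of the authority names, user names, and the `present` helper are hypothetical illustrations of a user holding names vouched for by different organizations and choosing which one to offer a resource.

```python
# Hypothetical sketch: a name is meaningful only relative to the
# organization (authority) that accepted the user's right to it,
# and one user may hold several such names.

class Identity:
    def __init__(self, authority, name):
        self.authority = authority  # who vouches for this name
        self.name = name

# One user's "wallet" of identities from multiple organizations.
wallet = [
    Identity("State University", "jdoe"),
    Identity("Scholarly Society", "member-8812"),
]

def present(wallet, accepted_authorities):
    """Choose which identity to present to a resource that honors
    only certain authorities -- the user's 'discount rate' choice."""
    for ident in wallet:
        if ident.authority in accepted_authorities:
            return ident
    return None
```

The burden the paper describes is visible even here: the user (or software acting on the user's behalf) must know which authorities a given resource honors before an identity can be chosen.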
A single, network-wide (not merely institution wide) access management authority would simplify many processes by allowing rights assigned to an individual by different organizations to become attributes of a master name rather than having them embodied in different names authorized by different organizations; yet such a centralized identity system probably represents an unacceptable concentration of power, as well as being technically impractical at the scale we will ultimately need. It should be noted that within the UK Athens project we can see a model of a rather centralized authorization system which has been scaled successfully to quite a large number of users, and which by virtue of its centralized nature has allowed rapid progress in wide access to networked information. The Athens experience and the factors — technical, social, cultural, and legal — that have enabled it to work in the UK call for very careful study as we consider approaches for other nations such as the US.
A name or identity has attributes associated with it. These may be demographic in nature — for example, this identity signifying a faculty member in engineering, or signifying a student enrolled in a specific course — or they may capture permissions to use resources. Attributes may be bound closely to a name (for example, in a certificate payload) or they may be stored in a directory or other demographic database under a key corresponding to the name. Attributes may change over time; for example, from semester to semester the set of courses that a given identity is associated with may well change. Just because some system on a network has knowledge of a name does not necessarily imply that it has access to attributes associated with that name. There is a fine line between rights to names (authentication) and attributes; for some purposes, simply knowing that a user has a right to a name from a given authorizing authority may itself represent sufficient information (an implicit attribute, if one wishes) that can support access management decisions.
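The separation between a name and the attributes associated with it can be made concrete with a small sketch. A plain dictionary stands in for a directory or demographic database here; the names and attribute fields are hypothetical, and the point is only that knowing a name does not imply having access to its attributes.

```python
# Hypothetical sketch: attributes stored in a directory under a key
# corresponding to the name, rather than bound into the name itself.

directory = {
    "jdoe@university.example": {
        "affiliation": "faculty",
        "department": "engineering",
        "courses": {"ENGR-101"},  # may change from semester to semester
    }
}

def attributes_for(name):
    """Look up the attributes bound to a name, if this system holds any.

    A system that knows a name but lacks the directory entry simply
    gets None back -- name knowledge and attribute knowledge differ.
    """
    return directory.get(name)
```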
Authorization is the process of determining whether an identity (plus a set of attributes associated with that identity) is permitted to perform some action, such as accessing a resource. Note that permission to perform an action does not guarantee that the action can be performed; for example, a common practice in cross-organizational licensing is to further limit access to a maximum number of concurrent users from among an authorized user community.
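The distinction drawn here can be sketched briefly: authorization answers whether an identity may act, while a separate concurrent-user cap determines whether the action can actually proceed at the moment. The membership list, session store, and limit below are hypothetical illustrations, not any particular license's terms.

```python
# Hypothetical sketch: permission to act vs. ability to act right now.

licensed_community = {"u-alpha", "u-beta", "u-gamma"}
MAX_CONCURRENT = 2
active_sessions = set()

def authorized(identity):
    """Authorization: is this identity in the licensed community?"""
    return identity in licensed_community

def open_session(identity):
    """Permission does not guarantee access: an authorized user may
    still be turned away when the concurrent-user limit is reached."""
    if not authorized(identity):
        return "denied"
    if len(active_sessions) >= MAX_CONCURRENT:
        return "full"
    active_sessions.add(identity)
    return "open"
```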
Note that authentication and authorization decisions can be made at different points, by different organizations.
Some libraries are establishing consortia which involve reciprocal borrowing and user-initiated interlibrary loan services; in a real sense these consortia are developing what amounts to a union or distributed shared patron file. One can view this as moving beyond just common authentication and access management to a system of shared access to a common directory structure for user attributes, and a common definition of user attributes among the consortium members. This is an example of a situation where very rich attributes are available to each participant in the consortium as they make authorization decisions; interlibrary loan and reciprocal borrowing represent a much richer and more nuanced set of actions than would be typical of a networked information resource.
A subsection on models for access management, discussing the locus of authorization decisions and the trust relationships between the resource operator and the licensing institution, will probably be added here in the next revision.
We will be examining a number of different proposed solutions to the access management problem. Before describing and analyzing these proposed solutions, this section considers the various requirements that a viable solution needs to address. Obviously, there are trade-offs which will need to be made among the conflicting goals in the context of each specific resource access arrangement, and institutions will have to make policy choices about the relative importance of the various requirements.
3.1 Feasibility and Deployability
First and foremost, the authentication and access management solution needs to work at a practical level. From the user’s perspective, it should facilitate access, minimizing redundant authentication interactions and providing a single sign-on, user-friendly view of the array of available networked information resources. It needs to scale; it must be feasible for institutions to deploy and manage for large and dynamic populations of community members. It needs to be sufficiently robust and simple so that user support issues are tractable; for example, a forgotten password should not be an intractable problem. It needs to be affordable.
From the resource operator viewpoint, a viable access management system should not require a vast amount of ongoing production and maintenance. Configuration to add a new licensing institution should be simple, and ongoing maintenance of that configuration should not call for large amounts of information to be interchanged between resource operator and licensing institution on an ongoing basis (such as file updates). Software parameter changes — not new software — should be necessary to add additional institutions. There should be a clean, simple, and well-defined (standard) interface between resource operator and licensing institution. A systems or network failure at one institution should not degrade a resource operator’s service to other licensing institutions.
Practical solutions are inextricably linked to the installed base of software. Ideally, all of the software needed to implement an authentication and access management solution should be available either commercially or as free software. Good solutions will leverage the installed technology base, as well as current investments in upgrading that base: at the mechanism level, they should not, if possible, be specific to libraries or even to higher education (though libraries or higher educational institutions may use these mechanisms in conjunction with policies that vary from those common in the corporate or consumer markets). Most importantly, the software support that end users require should be available in common packages — such as web browsers — that are already part of the installed base. Any solution that requires custom specialized software to be installed on every potential user’s desktop machine starts with a severe handicap. Similarly, any solution requiring specialized hardware, such as biometric systems or smart card readers, is certainly not going to be feasible on a cross-institutional basis; while it might conceivably be workable within an institution’s internal authentication system, some other technique would be needed to convey cross-organizational access management data. Few resource providers will be willing to limit access to users equipped with such specialized facilities.
Software isn’t enough; there is also the question of whether the user knows how to configure and employ it. For example, current web browsers contain considerable support for client-side certificates and proxies, but few users know how to use these features. Education about an existing software base is easier than first replacing or upgrading an installed software base and then teaching users how to employ the new software, but it’s still a substantial issue.
Kerberos is an interesting case study of the feasibility constraints. An institution could certainly make a successful decision to deploy Kerberos as a local authentication system by placing Kerberos support software on each user’s workstation (perhaps via a site license to a vendor); however, inter-realm Kerberos is probably too intimate a connection between resource operators and licensee institutions to be viable, and most resource operators would also reject Kerberos as an inter-organizational approach because of the requirements it places on end user systems at institutions that were not using Kerberos for local authentication. In the cases where Kerberos is being used for inter-organizational resource sharing, I believe that one could argue that the participating institutions (typically consortium members) have made commitments to link their administrative and other support systems at a much more sophisticated level than one would find in the typical resource operator – licensing institution relationship and are coming more to resemble a single “consortium institution” with an internal (local) authentication system.
Any solution also needs to reflect current realities; in particular, it must be able to recognize the need for a user community member to access a resource both independent of his or her physical location (for example, a user must be able to connect to the internet via a commercial ISP, a mobile IP link, or a cable television internet connection from home), and also the need for people to access resources by virtue of their location (for example, access may be granted to anyone who is physically present in a library, whether or not they are actually members of the licensee institutional community).
3.2 Authentication Strength
The solution needs to be reasonably secure. The resource operator needs confidence that an attacker cannot easily forge a credential. All parties need confidence that credentials cannot easily be stolen by eavesdroppers on the net (for example, through sniffer attacks), and that they cannot easily be stolen from a user who exercises reasonable precautions. Systemic compromise is also a concern: there is a very real difference between having an individual user’s credentials compromised (in which case they can be canceled and new ones issued) and having the system as a whole compromised, which might call for reissuing credentials to everybody in the user community.
Authentication strength is a somewhat subjective question. For many of the approaches that we will discuss, strength comes from the details of the cryptographic algorithms and key lengths used; but part lies also in overall system design and implementation and in the realities of user behavior, and these can often be the source of the largest number of vulnerabilities. A sense of proportion is called for here; most of the resources being access controlled, while certainly valuable assets, do not represent imminent dangers to public safety or national security if access control is breached. An access management system needs to be complemented by monitoring and other controls on the part of the resource operator to limit the impact of a breach. Further, there are after-the-fact legal remedies which can be applied to limit the damage caused by such a breach.
The cryptographic technology underlying many access management systems is legally sensitive on an international export and import basis, and may also be constrained by various national laws (though within the US, cryptographic technology can be employed freely, at least today). This is important for several reasons: resource access may cross national boundaries, and members of an institution’s user community may need to access networked information resources when traveling outside their home nation. We will see international resource sharing consortia, and also institutions in one nation licensing access to resources in other nations.
It should be noted that virtually any strong access management system that incorporates general purpose cryptographic services will be illegal for export, since all strong cryptographic implementations for general encryption/decryption are export controlled in the United States under current laws governing trafficking in arms. Note, however, that it may be possible for members of a user community traveling abroad to export cryptographic software for temporary personal use under some specific limitations; depending on where they are traveling, it may or may not be legal for them to use it under the laws of the country they are in at the time. Matters are more complex than they may seem, however, because US export control laws are mostly concerned with cryptography that can support encryption (for confidentiality or concealment); export licensing of systems specifically for authentication or digital signatures, which do not serve dual use as encryption systems, has been much less of a problem. The legality of developing, importing, exporting, and operating access management systems outside the US needs to be analyzed on a country-by-country basis; laws vary considerably.
3.3 Granularity and Extensibility
There is a need for fine-grained access control where institutions want to limit resource access to only individuals registered for a specific class; this arises in electronic reserves and distance education contexts, especially when a class may be offered to students at multiple institutions. Other variations are also possible: limiting access to law students, to faculty, to graduate students and faculty in physics. This sort of fine-grained access management is likely to be very complex, since there will be great variation from institution to institution in how groups of users are identified, named and specified. There is also some overlap between fine-grained authentication and demographic information that may be needed to generate management information (discussed below).
Granularity of access has been one of the most controversial issues in the discussions of the first draft of this paper and related issues. Without arguing against the need for fine-grained access control for some applications, I will summarize a few observations:
- At present, most access to network information resources is not controlled on a fine-grained basis. There is a very real danger that by building all of the needs for fine-grained access management into the basic access management mechanisms we will produce a system that is too complex and costly to see widespread implementation anytime soon.
- The information needed to support fine-grained access management probably needs to be kept within institutions for privacy reasons, and should be treated as attributes to an identity rather than expressed as additional identities (in other words, one should record that a user with a given identity happens to be enrolled as a member of course X, rather than issuing the user an identity as member-21-of-course-X). This also has implications for the locus of authorization decisions for fine-grained access management.
- In many — but certainly not all — cases, the resources (such as electronic course reserves) that are subject to fine-grained access management will be within an institution, or within one of the institutions in a consortium of institutions that are collaborating closely through shared courses or similar projects. The case where an external commercial networked resource will be access controlled to members of a small group like a class will be rare.
- In some cases, the presence of fine-grained access management mechanisms may encourage irrational license economics. For example: suppose there is an electronic journal that prices based on the number of people who have access, rather than on the number of people who actually use it. This would encourage an institution to define a fine-grained group of authorized users for this journal in order to save money. Such an arrangement is complex and sets up barriers to access for the rest of the university community. It would probably make more sense to initially price access for the entire university community based on the approximate number of people who will actually use the journal, and then, if it turns out a few more people are using it than were originally expected, negotiate a slightly higher fee at license renewal rather than defining a special access group. Revenues to the publisher will be roughly the same in either case, but additional use would be encouraged rather than discouraged. Note that of course this reasoning doesn’t apply in cases where there is wide demand for a resource, and the licensing institution is making a policy decision to deliberately and systematically limit access to the resource to a specific closed user community; but this is, reviewers believed, the exception rather than the common case.
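The attribute-based approach to fine-grained access argued for above (recording course membership as an attribute of an existing identity, rather than minting a separate identity like member-21-of-course-X) can be sketched briefly. The identity names and course codes here are hypothetical; the point is that the enrollment data stays in the institution's own attribute store.

```python
# Hypothetical sketch: course enrollment kept as attributes of one
# identity, consulted by the institution when a fine-grained
# authorization decision (e.g., electronic course reserves) is needed.

enrollments = {"jdoe": {"HIST-210", "PHYS-330"}}

def may_read_reserves(identity, course):
    """Fine-grained check made against the institution's own attribute
    store, so enrollment information need not leave the institution."""
    return course in enrollments.get(identity, set())
```

This also illustrates the point about the locus of authorization decisions: because the attributes stay home for privacy reasons, the fine-grained decision is naturally made at the institution rather than at the resource operator.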
3.4 Cross-Protocol Flexibility
Some approaches work for a wide range of applications protocols that might be used for accessing information. Others are designed to work only with specific protocols, or would require the development of special software extensions or modifications in order to support a full range of protocols. For our purposes, HTTP-based Web access is the critical application protocol; we will also consider Telnet terminal emulation and the Z39.50 information retrieval protocol, although these are far less critical. The main locus of concern here is the user’s desktop machine, which normally uses HTTP or Telnet to connect to machines that are part of the system of networked information resources; Z39.50 is seldom used at the desktop today and finds its main application in linking major networked information resources together.
Reviewers of the earlier draft of the paper felt that the X Window protocol was not an issue, as this is primarily a local access application. The ability to sign electronic mail messages is certainly an issue for email-enabled networked information applications, though probably not a major one. Secure email access — authenticated SMTP, POP, or IMAP, for example — is viewed as primarily an issue within an institution rather than a cross-organizational question; while it is certainly useful to have an authentication infrastructure which will support these applications, as well as local administrative applications, this is again not central to the cross-organizational problem. Directory access protocols such as LDAP are also potentially serious issues.
CORBA and DCOM are potential questions, though it is not clear to what extent these will be used from desktop machines in the future. There is also a set of issues involving authentication in conjunction with Java applets and systems like Authenticode or PICS which are not well understood at this point. Many of the authentication and authorization problems in this area deal with a user’s machine making decisions about what applets it is willing to accept and to execute, and what authorizations it is willing to assign them; these are similar to questions about document authenticity and integrity and are out of scope for this paper. The other set of problems centers around an applet making decisions about a user’s rights; while technology and standards in this area are still in flux, most of the current approaches seem to assume some kind of certificate infrastructure. This is an area where more work is clearly needed.
3.5 Privacy Considerations
The application scenarios here involve access to information resources. In many cases libraries will pay for these licenses to electronic resources as a replacement for physically acquiring information in paper form.
The licensee institution, in the print world, has a set of internal policies about record-keeping and use reporting (both who used it and how often it was used); generally these are very restrictive and stress user privacy. The institution then has a separate set of policies (which may in fact never have been explicitly codified) about sharing this usage information with the content supplier: in general this policy has been very simple — the supplier got no information about usage other than that which the institution chose to make public for other reasons.
In the electronic environment, the situation changes. Because information is often accessed at the publisher site, the publisher may know a great deal about who is accessing what material and how often. Aggregators and service bureaus may also complicate both the collection and flow of information. To some extent the collection, use, retention, and even potential resale of this information can be covered by license contract; and should be. Institutions will have to develop realistic policies about privacy of readers in the networked information environment which are acceptable to their user communities and well understood by readers. However, some authentication and access management approaches offer licensee institutions much greater flexibility than others to limit the amount of information that can technically be collected by the resource operator. In general, it is desirable to minimize the amount of privacy exposure that must be controlled by contractual provision rather than by technical means.
Clearly, one strategy for ensuring user privacy is to ensure that users remain anonymous in their use of information resources. We can distinguish several common situations:
- Repeat users cannot be identified; each session is completely anonymous. We will call this anonymous access.
- Repeat users can be identified, but the identity of a user cannot be determined. The resource operator knows only that some specific individual is accessing the resource repeatedly, not who that individual is. The user may be identified by some arbitrary identifier, such as USER123. We will call this pseudonymous access.
- Demographic characteristics of users can be determined, but not actual identities. We will call this pseudonymous access with demographic identification.
- Actual identities can be associated with sessions. We will call this identified access. It may be supplemented with demographics; just because the resource operator knows who someone is does not mean that they automatically know the user’s demographic characteristics as well as his or her name.
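The pseudonymous level of access above can be sketched concretely. The following illustrative Python fragment (anachronistic to this paper, and with made-up names and keys) shows one way a licensee institution could derive stable, opaque identifiers like USER123 from real identities, using a keyed hash so that only the institution, which holds the key, can connect a pseudonym back to a person:

```python
import hashlib
import hmac

def pseudonym(identity: str, secret_key: bytes) -> str:
    """Derive a stable, opaque identifier (e.g. 'USER1A2B3C') from a real
    identity. The resource operator sees only the pseudonym; the licensee
    institution, holding the key, can re-derive the mapping when needed."""
    digest = hmac.new(secret_key, identity.encode(), hashlib.sha256).hexdigest()
    return "USER" + digest[:6].upper()

key = b"institutional-secret"   # hypothetical institutional key
# The same person always maps to the same pseudonym...
assert pseudonym("jsmith@example.edu", key) == pseudonym("jsmith@example.edu", key)
# ...but different people map to different pseudonyms.
assert pseudonym("jsmith@example.edu", key) != pseudonym("adoe@example.edu", key)
```

This keyed-hash design choice is what makes the access pseudonymous rather than merely obscured: without the key, the resource operator cannot enumerate identities and test them against observed pseudonyms.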
Note that many users choose to identify themselves in order to obtain added value services, such as electronic mail notification of changes to a resource, or to preserve context from one session to the next, or to maintain a user profile at a resource. It’s important to distinguish voluntary user self-identification from automatic identification that is generated as a byproduct of an authentication and access management system. It is also worth considering, at least briefly, how an institution might provide services for its community that permit community members to enjoy these added value services without identifying themselves to resource operators, and whether it’s worth going to the trouble to make this possible.
Understanding the coupling between pseudonymous or identified access as provided by an access management system and the desire to implement such capabilities as part of an information access system is a crucial issue. A given information resource may rely on an authentication and access management system to provide identified or pseudonymous access automatically, or it may offer some weak or strong higher level functions (using a userid/password or cookie scheme, for example) that give the already authenticated and authorized user the option of identifying himself or herself (literally or pseudonymously) in order to obtain personalized services from the information resource. In the latter case, assuming that it’s a real choice and the level of service offered to the anonymous user is meaningful, this isn’t an authentication and access management system issue at all: it’s a choice that users of the information resource are free to make on an individual basis.
Privacy is not a purely political or moral issue. To the extent that researchers are pursuing patents, developing grant applications in a competitive environment, or seeking precedence for discoveries, confidential access to information resources is a critical issue with potentially significant economic consequences. Many higher education institutions are bound by laws about privacy of student records; some public libraries may face legislative constraints on patron privacy; and medical institutions (including university hospitals) may have to consider issues involving privacy of medical records. And, of course, beyond the United States — for example in Europe — the overall legal framework grants stronger privacy rights for all citizens.
Finally, in discussing privacy, we should recognize the overall need for a secure environment; this goes beyond authentication and access management. If user interactions with networked information resources are conducted in the clear, they are subject to eavesdropping by other machines on a local area network near the user (for example, by sniffer-based attacks within the campus network) or by attackers anywhere along the network path to the resource. Very few information resources today support searching and information retrieval (as opposed to ordering) via encrypted SSL-secured HTTP. If privacy is to be honored in the licensing of networked information resources, then contractual arrangements, resource sharing designs, and procurements must recognize the importance of providing such support.
In some situations privacy and confidentiality issues go beyond access management and session encryption. Some users may be concerned that even knowledge that they are using a resource (not necessarily what they are doing with it) becomes known through traffic analysis. Link level encryption helps with this to an extent, but is not widely deployed and is unlikely to be widely deployed anytime soon. Very large scale aggregating proxies and experimental systems such as Crowds, which build on work done with anonymizing emailer systems such as Mixmaster, also help to address these needs. Robust protection against traffic analysis in the public internet requires very large overheads. We will not consider this problem further here, other than to observe that credential-based approaches seem likely to be most flexible in these environments, and that if they are used it will be necessary to consider traffic analysis vulnerabilities created by the credentials verification process as well as the submission process. Similarly, there are situations where some users are unwilling to permit a resource operator to know what sort of information they are searching for (even beyond contractual restrictions on the collection and use of this information); in these cases it may be necessary for such users to locally replicate an entire resource or large subsets of it.
In negotiating a license agreement, all parties recognize that the resource being licensed is of value and that the rights of the licenser must be respected. Typically, a licensee institution will agree to educate members of the user community about the license terms and restrictions relevant to the information resource in question, and to work with the resource operator to identify, investigate and put a stop to improper use of the resource. Thus, both the resource operator and the licensee institution share a common interest in having some individual user accountability as part of an authentication and access management system, so that if inappropriate use is detected (for example, if a single user seems to be accessing the resource thousands of times a day from computers on three continents) the organizations know where to begin investigating.
Of course, there’s a tension between accountability and privacy; to the extent that privacy is achieved through anonymity, there is no accountability. Note that this balance may be managed by compartmentalizing information, for example: if a specific user is identified to the resource operator simply as USER2345, and the licensee institution knows who USER2345 actually is (but the resource operator does not) then the resource operator could call for an investigation of what USER2345 is doing, and the licensee institution might then follow its own due process in that investigation, which might result in internal disciplinary action but might never result in revealing the individual’s actual identity to the resource operator. In a real sense, the obligations of the members of the user community are to the licensee institution, and the licensee institution in turn has obligations to the resource operator to ensure that members of its user community behave responsibly; it is not at all clear that it’s appropriate for the resource operator to be dealing with individual members of the user community directly.
Accountability will also have some interactions with institutional policies about inappropriate use of network resources, particularly to the extent that interaction with these resources may go beyond simply retrieving information to participation in interactive communications. For example, policies that typically govern the use of electronic mail may come into play. But even if resources are used purely for information retrieval purposes some accountability (coupled with management data) may be desired in support of policies prohibiting use of university resources for personal commercial gain, for example; a useful analogy may be drawn to practices and policies in areas such as telephone logs.
3.7 Ability to Collect Management Data
The licensing institution has a legitimate need to gather management data in order to guide future decisions; if it is spending a great deal of money to license access to a resource, or to participate in a consortium resource sharing arrangement, it is only reasonable that it will want to know how much various resources are being used and what sectors of the user community are making use of them. For public institutions, in particular, collection of management data is an essential part of institutional accountability, and some collection of management data may even be considered part of public records responsibilities for these institutions.
There are many reasons to collect management data besides guiding licensing or resource sharing decisions. These include the allocation of costs within a licensing organization or even the development of enhanced services such as collaborative filtering systems.
It’s useful to define some terms. Management data can be faceted in two ways. The first is by user: this might include faceting by source IP address, by identity (name), or by user attributes that figure into a contractually based authorization decision (i.e. a resource is limited to faculty and graduate students; this user had the faculty attribute), or by demographic information that the licensee institution knows and wants to correlate with usage patterns (i.e. this is a first year graduate student in civil engineering, or even, in theory though likely not in practice, this is a male student). The second way to facet management data is by the objects being accessed or the services being used: which pages of which articles are being read, which one of several different databases on a server is being searched, how often searching is by author rather than by date, etc.
Collecting highly aggregated data is not particularly problematic; there’s no way to prevent the resource operator from having aggregated data (although its use can obviously be managed by contract). The only question is whether the licensee institution can collect its own aggregate data or whether it must take it as a return feed from the resource operator; in the latter case, there are a whole series of scaling issues related to standards, since it will be a significant burden for the licensee institution to receive use statistics feeds from potentially hundreds of resource operators in different formats, reflecting different conceptual models about what is being counted, and with different delivery schedules.
The larger problems arise when one wants demographically faceted use data, or even individual use data. In the case of demographically faceted data, either the licensee institution must use the authentication and access management system to pass demographic faceting to the resource operator so that it can become part of the usage data that the resource operator returns, or the licensee institution must be able to capture its own demographically faceted use data. Privacy considerations begin to emerge when demographic data must be passed to the resource operator.
In the case of individual use data the problems become even more sensitive. Clearly, if users are individually tracked by the resource operator (whether or not their identities are known — i.e. whether they are pseudonymous or identified) then the resource operator can collect individual level data and return it to the licensee institution. The resource operator may even get supplemental demographic data about the individuals from the licensee organization. There are also a series of institutional policy problems having to do with individual level data at the licensee institution: who can see this data — for example, can a faculty member look at the statistics for his or her students’ use of specific information resources? Under what procedures are usage records subject to audit to detect misuse? Again, we need to consider when these issues should be defined by policy and trust in implementation of policy as opposed to being managed by technical means.
While many scenarios are possible, I suspect that the most common practical situations today will be these:
- usage is tracked on an aggregated basis either by the institution or the resource operator; I suspect tracking by the resource operator will be more common since the resource operator will be able to count events that are more meaningful in measuring resource utilization (for example, by journal rather than just page accesses).
- usage is tracked on an individual (pseudonymous or identified) basis by the resource operator, who then passes use logs back to the institution, which processes them to factor in demographic data and obtain a demographically faceted usage report.
- institution and resource operator agree on some very simple demographic faceting and demographic data is passed to the resource operator by the access management system; these demographics are then factored into the usage reports developed by the resource operator.
Management data is a major problem in the current access framework. Part of the problem is the conflict between privacy and a desire for demographic or individual data. Most of this is going to have to be sorted out at the institutional policy level, and may involve making sacrifices in order to ensure privacy. Some institutions may be legally limited in their ability to collect certain management data. It would be very useful to have some real-world examples of how this trade-off has been settled.
A very insightful comment was made at the meeting to review the first draft of this paper. From the perspective of the licensing institution, particularly when facing difficult collection and resource allocation decisions, the observation was “there’s never enough management information — the issue here is to define what you absolutely have to have, not what you would ideally like”.
Return to Contents
Having summarized the many and sometimes conflicting requirements that an access management system must address, we now consider a number of actual schemes currently in use or under consideration and analyze how well they meet these requirements.
It’s important to recognize that in solving real-world problems more than one approach may be relevant at a single institution; one might use one scheme for one class of users and a different scheme for another class. For example, an institution might choose to manage access for kiosks and public workstations by IP source address, and to use a credential scheme for other users. Indeed, virtually all of the major institutional systems that are currently being deployed combine multiple approaches. Also, note that approaches can be cascaded in a hierarchy; for example, a resource might be set up to first check whether a user could be validated by an IP source filtering approach but if the IP source address isn’t valid for access, the resource might then apply a credential-based access management test.
At the most general level, there are three approaches — proxies, IP source filtering, and credential-based access management.
Basically, with IP filtering, the licensee institution guarantees to the resource operator that all traffic coming from a given set of IP addresses (perhaps all IP addresses on one or more networks) represent legitimate traffic on behalf of the licensee institution’s user community. The resource operator then simply checks the source IP address of each incoming request.
In the case of a proxy, the licensee institution has deployed some sort of local authentication system, and users employ specific proxy machines to send traffic to the resource and receive responses back from that resource; the local authentication system (which is invisible to the resource operator, except that the resource operator knows that it is in place in order to guarantee that traffic coming from the proxy machines is legitimate) is used to control who can have access to the proxy machine. As a business matter, the resource operator may want to know something about how the local authentication system works in order to have confidence in the proxy, but this does not enter into the actual authentication which is performed operationally by the resource operator. The resource operator will most commonly identify the proxy machines by their IP addresses (or some variation such as reverse DNS lookup), and for this reason from the resource operator’s point of view proxies are often just considered to be a special case of IP source address filtering — a resource operator who is set up to do IP source address filtering can accommodate a licensing institution employing proxies with essentially no additional work. However, proxies can actually be identified using either IP addresses or any credential-based cross-organizational authentication scheme (such as certificates). Because of this, and also because many of the policy and technical issues surrounding proxies at a higher level are quite distinct from those involved in IP source address filtering, we will treat proxies as a separate approach.
The third approach is credential-based. Here the user presents some form of credential — a user id and password, or a cryptographic certificate, for example — to the resource operator as evidence that he or she is a legitimate member of the user community. The resource operator then validates this credential with some trusted institutional server (or third party server operating under contract to the institution) before deciding whether to allow access. Note that there needs to be advance agreement (most likely as part of the license contract or resource sharing agreement) as to how the mutually trusted institutional servers or third parties (such as certificate authorities) are identified and authenticated themselves.
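The trust relationships in the credential-based approach can be illustrated with a small sketch. The following Python fragment is hypothetical (the class names and the in-memory password table are invented for illustration, and a real deployment would use a network protocol and cryptographic credentials rather than cleartext passwords); its point is the division of labor: the resource operator never holds the user database, only a list of institutional validators it has agreed in advance to trust.

```python
class InstitutionalValidator:
    """Stands in for a trusted server run by (or under contract to) the
    licensee institution; the resource operator never sees this database."""
    def __init__(self, members):
        self._members = members            # {userid: password}

    def validate(self, userid, password):
        return self._members.get(userid) == password

class ResourceOperator:
    def __init__(self, trusted_validators):
        # Which institutional servers to trust is agreed in advance,
        # most likely as part of the license contract.
        self._validators = trusted_validators

    def allow_access(self, institution, userid, password):
        validator = self._validators.get(institution)
        if validator is None:              # unknown institution: refuse
            return False
        return validator.validate(userid, password)

campus = InstitutionalValidator({"jsmith": "s3cret"})
operator = ResourceOperator({"example-univ": campus})
assert operator.allow_access("example-univ", "jsmith", "s3cret")
assert not operator.allow_access("example-univ", "jsmith", "wrong")
assert not operator.allow_access("unknown-univ", "jsmith", "s3cret")
```

Note that the sketch deliberately leaves out the last problem the paragraph raises: how the resource operator authenticates the validator itself (for example, via certificates), which is exactly what must be settled by advance agreement.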
For completeness, it is worth noting that there is one other possibility: the resource operator assigns credentials to individual members of the licensee community (perhaps in cooperation with the licensee institution). This is what was done historically when small numbers of users needed access to a few specialized information resources. The trouble is that it does not scale manageably to large numbers of users or large numbers of resources, and particularly not to both. While it’s reasonable for an institution to distribute one set of credentials to each member of its user community (for example, in conjunction with an internal authentication system) it’s not reasonable to distribute hundreds of different credentials for different resources to each user, or to expect the users to manage them or to keep straight which credentials are for use with which resource. Thus, we will not consider this model further, other than to recognize that it may have its place for specialized resources that serve only a handful of users.
4.1 IP source address filtering
Currently, IP source address filtering is the major mechanism used to implement authentication and access management for cross-institutional resource access. The way this works is that the licensee institution provides the resource operator with a list of IP addresses that are authorized for access; this can include some wildcarding to permit entire subnets or networks to have access, and also occasionally incorporates exclusion lists (all hosts on a given net or subnet EXCEPT for the following specific hosts). There is general agreement that it is unsatisfactory for a number of reasons, and it is instructive to evaluate it against our seven functional requirements both to see where it works and where it actually falls short.
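The check a resource operator performs can be sketched as follows. This modern Python fragment (using the standard ipaddress module, with invented documentation-range addresses) shows an access list with both network-level wildcarding and an exclusion list, and also illustrates why getting the network mask right matters, a problem raised below in connection with classless addressing:

```python
import ipaddress

# Hypothetical access list for one licensee institution: whole networks
# are authorized, minus an exclusion list of specific hosts.
AUTHORIZED = [ipaddress.ip_network("192.0.2.0/24"),
              ipaddress.ip_network("198.51.64.0/18")]
EXCLUDED = {ipaddress.ip_address("192.0.2.99")}   # e.g. a walk-up guest machine

def ip_authorized(source: str) -> bool:
    addr = ipaddress.ip_address(source)
    if addr in EXCLUDED:
        return False
    return any(addr in net for net in AUTHORIZED)

assert ip_authorized("192.0.2.17")
assert not ip_authorized("192.0.2.99")     # excluded host on an authorized net
assert ip_authorized("198.51.100.5")       # covered by the /18
assert not ip_authorized("203.0.113.5")    # outside the licensee's networks

# The mask matters: if the institution's network were really a /19 rather
# than a /18, the same host would fall outside the authorized range.
assert ipaddress.ip_address("198.51.100.5") not in ipaddress.ip_network("198.51.64.0/19")
```

The last assertion is the /18-versus-/19 problem in miniature: the resource operator has no independent way to know which mask is correct and must simply trust the figure the institution supplies.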
Feasibility and Deployment: This is relatively easy to deploy and manage from the perspective of both the institution and the resource operator. No special software is needed at the user side, and at the resource operator side the support is not difficult. There is some maintenance involved in keeping the tables at the resource providers up to date, but this is not unmanageable. It is necessary for the licensee institution to perform some analysis on access and use policies for the machines within the institution to make sure that machines that aren’t access-limited to the institutional community are excluded where necessary, and to educate members of the community that giving outsiders an account on a machine also gives them access to institutional resources that they may not be entitled to; there are some real dangers of access control breaches by the creation of proxies either through ignorance of the implications or deliberately.
The major problem, from a feasibility point of view, is that many legitimate users are not coming through the institutional network at all times; they may want access through commercial ISPs, at their workplaces outside of the institution, or from home. Some other solution is needed to handle these users.
One should not underestimate the management complexities of IP source address based access management, particularly from the point of view of a resource operator. Configuration changes are frequent, and configurations for a large licensee institution can be quite complex. Also, the move from the older class-based network addresses owned by institutions to classless IP network addressing with the address space managed by the ISP has introduced new problems; not only must the licensee institution get the network masks right, but there’s no easy way for the resource operator to independently verify this (for example, that an institution’s network is a /18 rather than a /19).
Authentication Strength: Source IP filtering is actually relatively strong. While it’s not difficult to introduce packets to the network with spoofed source addresses unless appropriate packet filters are in place (and this has become a major problem in the context of network denial of service attacks), getting responses back to a spoofed network address is much harder, and basically involves hijacking entire network addresses within the routing infrastructure. This is relatively unlikely; it’s a sophisticated and complex attack, and is very likely to be noticed quickly. Resolving the threat of IP spoofing needs to be addressed at the network routing infrastructure level, and considerable work is going on in this area (packet filters and authenticated BGP peering, for example).
A specific machine with an excluded source IP address that sits on a generally authorized network can circumvent that restriction more easily, if the machine isn’t under institutional administration (for example, its owner can just give it a new IP address on the same network.)
Source IP filtering isn’t subject to systemic compromise, and doesn’t come with export control restrictions.
Granularity and Extensibility: To the extent that membership in specific groups can be linked unambiguously to specific network addresses (for example, in an office, a dorm room, or a computer lab) fine grained access is feasible. Such direct linkage is often not the case, however; students in a class may share use of a computer lab, or need to use public workstations in a library.
Cross-Protocol Flexibility. Since all protocols of interest run on top of IP, source IP address based access control is quite universal.
Privacy Considerations: To the extent that source IP addresses can be linked to individuals (for example, personal workstations in offices) there are some privacy issues. And certainly source IP addresses are correlated to demographics, if the resource provider is willing to invest in understanding the campus network architecture. Access in a source IP filtering authentication environment is probably somewhere between anonymous and pseudonymous, with some ability to move from pseudonymous to identified access in individual cases if the resource provider is willing to go to the trouble to do so (this is the case of personal workstations used primarily by a single individual).
Accountability: There is limited accountability — at the level of machines rather than people — which mirrors the privacy situation. One has relatively good accountability for individually-owned personal workstations and relatively poor accountability for everything else; for a large, shared machine one gets accountability to the machine level, and then has to work with the administrator of that machine to identify a specific user or users. If dynamic IP address assignment is used (as is often the case for laptops in public areas, for example), then accountability is particularly weak.
Management Data: An institution can collect some usage data at a highly aggregated level that is not well correlated to application-level constructs through a border router, or get aggregated usage data from the resource operator. Demographic data can be obtained to the extent there is correlation between IP address blocks and demographics (for example, there might be a campus subnet for a medical school); this demographic data will be sketchy and imperfect at best, and some differentiations (such as students as opposed to faculty) will be very hard to extract. Individual level usage data will be possible only in the case where there are personal workstations, and all work by an individual is done on that workstation.
Summary: IP source address based access management tracks the activities of machines rather than people. To the extent that there’s a very close correlation between the two, it works reasonably well. Unfortunately, the correlation has never been that good and many trends (such as the move from institutional modem banks to purchase of commercial dial up access to the internet) continue to weaken this correlation. IP source address access management may work particularly well for fixed-location, institutionally managed public terminals, such as public workstations in libraries or computer labs.
There are several additional issues and variations on source IP filtering which deserve some additional comment.
Many organizations are moving to dynamic assignment of IP addresses, either for limited situations such as laptops that may be docked in classrooms, computer labs, or public areas such as library reading rooms, or in some cases, campus wide in order to simplify address management. This dynamic assignment weakens accountability, strengthens privacy, and complicates the collection of meaningful management data. However, since dynamic IP addresses are assigned within an organizational network number, use of dynamic IP addresses does not invalidate the use of IP source address based access management.
To mitigate the problems with access via dialup ISP connections, a few universities have negotiated special arrangements with specific ISPs so that members of their community are assigned addresses on a specific (private) net or subnet when connecting via the ISP (since the ISP does authentication on the users as part of the establishment of the dialup connection, this is feasible if the ISP can maintain this information as part of its user attribute database). While this makes it possible to extend IP source authentication to dialup users obtaining service through the ISP, it should be clear that this approach will not scale reasonably to offer users a wide range of choice in the ISP marketplace (including wireless and cable TV based ISPs); it is most practical in situations with large educational institutions who have the marketplace power to negotiate such arrangements and where members of the institution’s user community are willing to select from at most a small number of competing ISPs.
Approaches using IP tunneling and/or Mobile IP type support can be used to mitigate some of the limitations of traditional source IP based access management schemes, though they may have considerable performance and complexity drawbacks. The next revision of the paper will include a discussion of these approaches.
Some organizations have used reverse Domain Name System (DNS) lookups on source IP addresses and then checked the DNS name in order to perform access management. This changes matters very little except that it means that access management must also rely on the security of the DNS system itself (which can be a problem; secure DNS is not yet deployed widely) and requires that all hosts have DNS names tabled, which is often not the case. This approach also does not work well with DHCP (dynamic assignment of IP addresses) which is often used to support laptop machines.
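The reverse-DNS variant can be sketched briefly. In this hypothetical Python fragment the DNS lookup is stubbed out with a dictionary so the logic is self-contained; a real implementation would use socket.gethostbyaddr(), and, given the security weakness noted above, should also perform a forward lookup to confirm that the returned name resolves back to the same address:

```python
# Stand-in for real reverse DNS; hosts with no PTR record are simply absent,
# which is exactly the "hosts without DNS names tabled" failure mode above.
REVERSE_DNS = {"192.0.2.17": "ws17.lib.example.edu"}

ALLOWED_SUFFIXES = (".example.edu",)   # hypothetical licensee domain

def dns_authorized(source_ip: str) -> bool:
    name = REVERSE_DNS.get(source_ip)
    if name is None:          # no reverse entry tabled: access denied
        return False
    return name.endswith(ALLOWED_SUFFIXES)

assert dns_authorized("192.0.2.17")
assert not dns_authorized("192.0.2.99")   # legitimate host, but no PTR record
```

The second assertion shows why this approach penalizes hosts whose reverse entries were never tabled, even when they sit on an authorized network.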
In some sense, proxy based approaches simply shift the problem, since an institution will still have to deploy an internal authentication and access management system in order to control use of the proxy servers. However, it may be easier to implement an internal system than to implement a system that must be used by a wide range of resource providers; proxies modularize and compartmentalize the authentication problem.
Let us assume for the time being that an institution has implemented a viable internal authentication system and analyze various proxy schemes under that assumption. Our comments, then, will only cover the proxy scheme itself, not the institutional authentication system necessary to support the proxy.
We need to distinguish between two different kinds of services that are sometimes referred to as proxies. The first, which we will call mechanical proxies, are services which make use of facilities designed directly into implementations of protocols such as HTTP. To use a web proxy server, one configures a browser to pass all HTTP requests not directly to the destination host, but instead to a proxy server, which intercepts these requests and when necessary retransmits them to the true destination host. In this case, the operation of the proxy should be invisible to the end user.
The second type of proxy is what we will call an application-level proxy (historically, these have often been called “protocol translation systems” or “gateways”). An application level proxy functionally forwards requests where appropriate, but does not rely on protocol mechanisms. An example might be a Telnet proxy, where in order to reach an access-controlled Telnet based resource, one telnets to an institutional system; this might engage the user in an authentication and authorization dialog, and then manage a Telnet session to the remote resource, with some editing. In the web environment, a service such as the anonymizer (www.anonymizer.com) is a good example; here, one accesses the web page of the service and provides the URL of the remote resource one really wishes to access. The anonymizer service not only forwards requests on, but also dynamically re-writes each page coming back from the remote resource prior to presenting it to the end user, for example, replacing each URL in the retrieved page with a URL that accesses the anonymizer with a parameter of the actual remote page that is being requested. As the environment becomes more sophisticated, application-level proxies become increasingly problematic: for example, an application-level proxy generally will not handle pages that contain Java applets properly.
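The page-rewriting step described above can be sketched in a few lines. This hypothetical Python fragment (the proxy endpoint name is invented, and real rewriting must handle many more HTML constructs than a single href pattern) shows the core transformation: every link in the returned page is replaced with a link back through the proxy, carrying the true destination as a parameter:

```python
import re
from urllib.parse import quote

PROXY = "https://proxy.example.edu/fetch?url="   # hypothetical proxy endpoint

def rewrite_page(html: str) -> str:
    """Rewrite each href so that the user's next click also routes through
    the proxy, in the style of the anonymizer service described above."""
    def repl(match):
        return 'href="%s%s"' % (PROXY, quote(match.group(1), safe=""))
    return re.sub(r'href="([^"]+)"', repl, html)

page = '<a href="http://publisher.example.com/v1/p2">next page</a>'
rewritten = rewrite_page(page)
assert "proxy.example.edu" in rewritten
assert "publisher.example.com" not in re.findall(r'href="([^"]*?)\?', rewritten)[0]
```

The fragility of this strategy is also visible here: links generated dynamically at the client (by a Java applet, for instance) never pass through rewrite_page at all, which is exactly why such proxies break on applet-bearing pages.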
Feasibility and Deployment: This is not entirely straightforward. Proxies introduce a considerable amount of overhead, and the institution will need to invest in the installation and operation of proxy servers. Some overhead may be mitigated by having the proxy server perform caching operations as well as access management, although this introduces a range of other responsibilities and problems. Also, proxy servers become mission critical systems; they need to be available and reliable, and to be sized so that they do not represent a performance bottleneck.
Proxies — and in particular application level proxies — have scaling problems not only in terms of computational resources to support a large user community, but also in terms of configuration management and support as the number of resources available to the user community multiply. Each resource needs to be configured, and as resources change, configuration changes will be needed in the proxy.
In the case of mechanical proxies, user browsers have to be properly configured to make use of the proxy rather than communicating directly with resources on the network. This will be a particular problem when pre-configured browsers are supplied by sources other than the licensee institution; for example, cable-TV based internet service providers like @home make extensive use of proxies and caching within their own networks, and supply browsers that are configured to use the ISP’s network. In the case of applications level proxies, users will have to be taught to go through the application in order to reach remote information resources.
Integrating a local authentication system with a commercial (usually mechanical) proxy server may be non-trivial. Programming for an application level proxy can become quite complex. One useful distinction is the locus and complexity of decision making that the proxy must perform. At the simplest level, a proxy can just screen all potential users without regard to the resource that they want to access; essentially there’s a single authorization to use the proxy, and through it all of the resources that it permits access to. At a more complex level, the proxy might consider both the user and the resource in order to make an authorization decision; at the most complex level, it may track in detail the user’s interaction with various resources and make very specialized decisions about what requests it will and will not pass through to the resources.
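The three levels of proxy decision making described above might be sketched as follows; the function, the level names, and the entitlements structure are all hypothetical illustrations, not drawn from any commercial proxy product:

```python
# Illustrative authorization hook for a proxy. SIMPLE screens users
# without regard to the resource; PER_RESOURCE considers user and
# resource together; PER_REQUEST delegates to a per-request policy.
SIMPLE, PER_RESOURCE, PER_REQUEST = range(3)

def authorize(level, user, resource=None, request=None,
              entitlements=None, request_policy=None):
    """Return True if the proxy should pass the request through."""
    if user is None:                 # user failed local authentication
        return False
    if level == SIMPLE:              # any authenticated user, any resource
        return True
    if level == PER_RESOURCE:        # user and resource considered together
        return resource in (entitlements or {}).get(user, set())
    if level == PER_REQUEST:         # fine-grained, per-request decision
        return bool(request_policy and request_policy(user, resource, request))
    return False
```

As the text notes, the further down this list an institution goes, the more custom programming the proxy requires, and the more likely it is that security vulnerabilities creep in.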
Telnet application proxies are tricky to build (consider problems like the handling of break signals as they are propagated across the proxy), and as far as I know, standard commercial software to support construction of such proxies doesn’t exist. For Z39.50 applications, it’s certainly possible to construct custom proxies, although I am not aware of general-purpose software to do this. Architecturally, however, the proxy strategy is a very general one.
From the point of view of the resource operator, proxies are easy to work with; they usually just look like a particularly simple form of IP source address authentication. However, they may raise some user support problems; if an institutionally-provided proxy is out of service or overloaded, the resource operator can expect complaints about bad service for reasons that are outside of its control.
Authentication strength: obviously, this depends on the local authentication system. There is the danger of systemic compromise if the proxy server is successfully attacked (that is, the local authentication built into the proxy server is broken) or the proxy is misconfigured. A breach of the local authentication system is likely to be a very high visibility event which will receive rapid response from the licensing institution; a breach of the proxy may be more insidious and more difficult to detect. The communication between the proxy server and the resource can be very strongly secured and authenticated using certificates and session level encryption.
Granularity and extensibility: in theory, anything is possible if enough work is done on the proxy server. For fine-grained access control, however, it’s necessary for the proxy to consider who is trying to access what, rather than just having the proxy server authenticate members of the user community prior to any use of the proxy. It’s not clear how hospitable commercial proxy software is to this kind of application, or how complex the institution-specific programming will have to be; the more complex it gets, the more likely there are going to be security vulnerabilities.
Cross-Protocol Flexibility: Because the authentication mechanism used between proxy and user and between proxy and resource need not be the same, there’s a particularly high level of cross-protocol flexibility. In the worst case, the proxy can use a very general authentication approach like source IP filtering to support protocols between the proxy and the resource, and can use specialized methods (even embedded within application proxy code) to authenticate users to the proxy server.
Privacy: proxies can provide real anonymity of use if they are set up properly; the resource operator need not even get a source IP address for the end user. On the other hand, they provide a choke point for potential systematic institutional monitoring of what the user community is doing, which may be some cause for concern.
Accountability: in general, proxies provide poor accountability, since they offer anonymous access. At best, some level of accountability can be provided by correlating local logs at the proxy (which is tied into the local authentication system) and monitoring at the resource. In theory it would be possible for the proxy to pass some pseudonym or identity to the resource, but it’s not clear how this would be accomplished in a standard and interoperable fashion.
Management data: just as a proxy is a choke point for monitoring, it is also a choke point for collecting management data, including demographically faceted data or individual data since it authenticates users and then sees all of their requests to resources. Of course, correlating this to applications-level events and terminology is hard. It is not clear how a proxy could pass demographic data along with requests to a resource to permit faceted statistics collection at the resource side.
Summary: it’s hard to fully evaluate the proxy approach, for two reasons. First, to some extent it just moves the authentication problem, because it presupposes the existence of an institutional authentication system, and the problems of deploying such a system really need to be considered. Second, because a proxy (particularly an applications-level proxy) is a point at which custom programming can be inserted, almost anything is possible, at least in theory; but it’s hard to evaluate the implementation and maintenance cost of such a system, and the extent to which it demands custom interfaces to the resources themselves, as opposed to using completely standard interfaces.
4.3 Credential based approaches
In a credential based approach, the user interacts directly with resources on the net rather than working through an institutionally-provided proxy intermediary. The key problems here are:
- What are the credentials that the user presents to the resource?
- How are these credentials presented securely?
- How are the credentials validated with the issuing institution?
For a credential based approach to scale, all of these activities need to take place in a standardized fashion. The most commonly discussed credentials are X.509 certificates, which are attractive because browsers and servers already have some support for them (designed to enable electronic commerce) and because other software components needed for an X.509 public key infrastructure are already becoming available in the marketplace. However, many other forms of credentials are possible, including userids and passwords, one-time passwords, and the like. Indeed, it’s useful to differentiate between application-level credentials, where the collection of the credential and its validation is packaged into the application itself, such as obtaining and checking a userid and password, and credentials which are built into protocol mechanisms, such as the use of certificates with HTTP and SSL. The protocol-based mechanisms are more general and often require less work to implement on the part of the resource operator, but are less familiar to end users, calling for a larger investment in infrastructure and user education.
Credentials can be confusing to analyze because they can potentially carry both authentication and attribute information together, or they can be used purely (or almost purely) for authentication.
We will analyze two credential-based approaches: a userid/password scheme at the application level, and a certificate based approach.
4.3.1 Password based credentials
Assume that institutions simply maintain databases of (pseudonymous or identified) userids and passwords. Note carefully that the idea here is that a member of the institutional user community has a single userid and password for access to all licensed resources, not a separate userid and password for each licensed resource.
Using SSL-encrypted forms (which eliminates the problems of transmitting passwords in the clear), it would be fairly easy for a resource to ask for this userid and password securely; one could then have a special-purpose protocol so that a resource could securely check whether the userid and password were valid by querying an institutional userid/password database server. Note that SSL can set up an encrypted connection with a server certificate but no client-side certificate.
The special purpose userid/password checking protocol doesn’t exist today, but is not hard to design or implement, and since it only needs to be implemented by the resource operator and by an institutional server or two at each licensee institution, it might be much less problematic than making all licensee community users go through the complications of obtaining and installing certificates on their machines. Further, similar protocols for userid/password checking are already in use for validating users to terminal servers (i.e. TACACS, RADIUS); these might be used, or at least adapted.
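The institutional side of such a verification service might look roughly like the sketch below. The salted, iterated password hashing here is a design assumption on my part, not a feature of TACACS or RADIUS, and the wire protocol by which a resource operator would reach the `verify` check is left open:

```python
import hashlib
import hmac
import os

# Hypothetical core of an institutional userid/password verification
# service. How a resource operator reaches this check over the network
# (RADIUS, TACACS, or a new protocol) is deliberately unspecified.
_users = {}  # userid -> (salt, hashed password)

def enroll(userid: str, password: str) -> None:
    """Store a salted, iterated hash rather than the password itself."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    _users[userid] = (salt, digest)

def verify(userid: str, password: str) -> bool:
    """The check a resource operator would invoke over an encrypted link."""
    if userid not in _users:
        return False
    salt, digest = _users[userid]
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return hmac.compare_digest(candidate, digest)
```

Note that the institutional server never needs to release the password itself to a resource operator; it only answers valid/invalid, which is what limits the damage when a single resource operator is compromised.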
Users are already familiar with user ids and passwords, including the need to keep passwords secure, to change them, and to pick them well (or at least they are more familiar with these issues than, for example, certificate use). Userids and passwords can be carried in the minds of people rather than being installed on specific machines the way that certificates are; this helps with kiosks, computer labs, libraries and other shared machine settings — assuming that one can teach the user to log off when he or she is finished, rather than just leaving the machine signed on. Probably the biggest problem with this approach — which is not shared with certificates — is that the resource operator obtains a set of globally valid credentials for the user, and has to be trusted to keep them secure. There are also some secondary problems — Trojan horse resources that capture user ids and passwords under false pretenses, for example, are a much more serious threat than they are in a certificate exchange environment.
Let’s consider passwords and userids carried over SSL encryption from the perspective of our requirements definition. It’s clear that they are feasible and deployable. Assuming that a protocol for verifying userids and passwords with an institutional server is standardized and deployed, the amount of work faced either by a licensee institution or a resource operator is quite manageable. Special desktop software is not required for web access; for other protocols, such as Telnet, an SSL-capable Telnet is needed (my understanding is that some of these are under development). Z39.50 credentials are a particular problem because no Z39.50 interface to a service like SSL is currently defined. Userids and passwords are clearly linked to people rather than network addresses of machines. One problem with userids and passwords is that they don’t encourage seamless navigation among resources; each resource is going to explicitly annoy the user by asking for his or her userid and password on each visit.
While passwords represent relatively weak security, a system can be put in place to require them to be difficult to guess (by forcing the use of pass phrases rather than passwords, or avoiding use of words in a dictionary), and also by insisting that they be changed frequently. The use of an SSL-based transport removes the security problems of transmitting them in the clear. The protection provided by SSL will depend on whether US-only (long key) or international (short key) versions of SSL are supported by the user’s browser. Userids and passwords are subject to systemic compromise from two perspectives. First, if the institutional password verification server is compromised, new passwords would have to be issued to all members of the user community. Second, each resource operator now shares in the responsibility for keeping userids and passwords secure; if any resource operator’s site is retaining userids and passwords, and is compromised, this will compromise all other resource operators as well as the home institution (if the institution is using the same userid and password for internal and external authentication and authorization purposes).
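The kind of quality policy described here can be sketched as follows; the length threshold and the word list are arbitrary illustrations, not recommendations:

```python
def acceptable_password(candidate: str, dictionary: set) -> bool:
    """Illustrative policy check: favour pass phrases over short passwords
    and reject entries found verbatim in a word list. The threshold of 12
    characters is an arbitrary example."""
    if len(candidate) < 12:
        return False
    if candidate.lower() in dictionary:
        return False
    return True
```

A real policy would typically also enforce periodic changes and check against previously used passwords, as the text suggests.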
Granularity and extensibility: an institutional password server will just verify that a particular userid/password combination is valid (though it would also know what resource operator was asking). In situations where an access management decision needs to be made that goes beyond the validity of the userid/password pair, the key question is the locus of that decision. Either the resource operator will have to maintain a list of valid IDs (identities), or the password server will have to keep information about what resources a userid has access to, or the institution would have to offer resource operators access to a user attribute database keyed on userid.
Cross-protocol flexibility: because passwords operate at a higher level of abstraction than protocols they are general. Telnet and Z39.50 support should be straightforward, assuming that there is encryption on the link over which the passwords are transmitted, as discussed above.
Privacy and accountability. The use of userids and passwords transfers personal information directly to the resource operator. This information may be pseudonymous or identified; it will not be anonymous. To this extent, it undermines privacy but offers accountability. Management data faceted by demographic categories will be available from the resource operator only to the extent that the licensee institution provides demographic data as a byproduct of userid/password validation. There is no opportunity for the licensee institution to collect statistical information directly, other than a count of how often userid/password pairs are validated by the various resource operators.
Summary: to the extent that an institutional password verification server controls the export of individual and demographic information, passwords could work surprisingly well in an SSL-protected context. A primary benefit is that users are familiar with the model. There are important missing pieces here, particularly the protocol to permit resource operators to verify userid/password pairs with institutions that issued them. Probably the greatest weakness of this approach is the dependency on each resource operator to protect userid/password pairs, and the danger of systemic compromise due to a security failure on the part of a single resource operator.
Further comments. Clearly, by issuing different passwords and userids for different resources, it is possible to reduce the interdependence among resource operators and the dependence on each resource operator in maintaining security. However, large numbers of passwords and userids are extremely unfriendly and confusing for users, and probably impractical. For users who only use a single machine (or who are willing to store a cookie file in a network file system), and for resources that don’t require high security, it’s certainly possible to store userids and passwords as cookies on the user’s machine (though many users have become “cookie-phobic” due to the overly dire publicity surrounding cookies); once stored, the user doesn’t have to enter them at all, improving seamless cross-resource navigation. This is the approach that is taken by many low-security commercial services in the consumer marketplace today.
4.3.2 Certificate based Credentials
X.509 certificate based credentials are substantially more complex than passwords, but offer a number of advantages. In essence, an X.509 certificate (plus the private key that goes with the certificate) gives a machine credentials that support its right to make use of a name, and allows this assertion to be verified by checking with a certificate authority (which might be operated by the licensee institution, or operated by a third party under contract to the licensee institution). X.509 certificates include expiration dates, and certificate authorities can also provide revocation lists to invalidate certificates prior to their expiration date (though checking such lists can involve substantial overhead, and not all systems supporting certificates currently check revocation lists.)
Rather than making a complete analysis of certificate based credentials, we will simply highlight how they differ from the password based credential approach already discussed.
X.509 certificates and corresponding private keys are messy to distribute (much more so than, for example, a starter single-use password for a local authentication system), and complicated for users to install, particularly in cases where the certificate needs to be installed on multiple machines owned by a single user. Backup and recovery need to be considered carefully lest a user lose his or her certificates permanently as the result of, for example, a disk failure. Certificates are highly intractable in cases where users share machines, such as public workstations. X.509 certificates can contain demographic data (though there are standardization problems here about how to encode it in the certificate payload) which could be used for resource-operator-based statistics gathering or fine-grained authorization decisions.
In contrast to passwords, there is already a well defined protocol/process which can be used to validate an X.509 certificate-based credential that has been presented to a resource operator.
Note that an X.509 certificate based credential does not consist of simply the certificate itself, but rather a complex object that includes the certificate and is signed with the (secret) private key corresponding to the certificate; since this is computed anew each time a credential is needed, X.509 based certificates do not share the password-approach problem that security depends on each resource operator carefully protecting the user’s credentials.
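The property described here, that the credential is computed anew each time, can be illustrated with a simple challenge-signing sketch. The `cryptography` package used below is a modern Python library standing in for the X.509 toolkits of the period, and the flow is a simplification of the real certificate-exchange handshake:

```python
import os

from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa

# Client side: a key pair whose public half would be bound into an
# X.509 certificate by the institution's certificate authority.
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_key = private_key.public_key()

# The resource supplies a fresh challenge; the client signs it with the
# private key, which never leaves the client's machine.
challenge = os.urandom(32)
signature = private_key.sign(challenge, padding.PKCS1v15(), hashes.SHA256())

# Resource side: verify() raises InvalidSignature for a bad credential.
public_key.verify(signature, challenge, padding.PKCS1v15(), hashes.SHA256())
```

Because the signature covers a fresh challenge, a resource operator who records it gains nothing reusable, which is exactly the contrast with the password approach drawn in the text.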
Userids and passwords are application level constructs; they can be designed into an application using any protocol, assuming only that the connection can be encrypted. The exchange of X.509 certificates is a lower level, protocol-integrated operation and does not rely on encryption. Thus, there is work involved in extending the use of X.509 certificates to work with protocols other than HTTP, such as Telnet. (Z39.50 already contains facilities for certificate exchange). There is also still a need for an SSL-type service to encrypt the connection where confidentiality is desired; SSL can also handle many aspects of certificate exchange without the need for upper level protocol engineering, if it is available (though the application — if not the applications-level protocol — still needs to know something about certificates). One advantage of certificates is that they are more flexible than most other mechanisms; they can be used for signing electronic mail messages, for example (though generally a separate key is used for signing). And much of the current work on new protocols and services — for example in the Java environment — seems to be based on certificate models.
The issues involving privacy, accountability and management data change little from the password scenario already discussed. One point worth noting is that if the user has several certificates (for example, an identified one for use with an internal institutional authentication and authorization system and a pseudonymous one for use with external services), he or she must select the correct certificate for presentation in order to maintain privacy.
4.4 Proxy/Credential Hybrid Schemes
There are several interesting and initially confusing schemes which, after much discussion, the reviewers of the first draft of this paper recognized are really hybrids of the proxy and credential approaches. In these schemes, the user contacts an applications proxy in order to gain access to the resource. The proxy authenticates the user, checks his or her authorization, and then prepares and submits a set of credentials to the resource. After the user’s connection to the resource is established through these credentials, the proxy steps out of the way (via an HTTP redirect) and the user interacts directly with the resource. This has several useful results. It greatly reduces the overhead generated by use of a proxy, and minimizes the resource requirements for the proxy machines. It reduces some of the privacy concerns related to the proxy. And it means that short-lived rather than long-lived credentials (something perhaps more akin to a Kerberos ticket, philosophically, though it may be embodied in a certificate-based credential) can be sent to the resource operator; further, it may avoid the need to store these short-term credentials locally on the end user’s machine.
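The short-lived credential a hybrid proxy might mint before stepping out of the way can be sketched as follows; the shared key, the lifetime, and the token format are all illustrative assumptions, not drawn from any deployed system:

```python
import base64
import hashlib
import hmac
import time

# Hypothetical secret shared between the proxy and the resource operator.
SHARED_KEY = b"proxy-and-resource-shared-secret"
LIFETIME = 300  # seconds a minted credential remains valid

def mint_token(pseudonym, now=None):
    """Proxy side: bind a pseudonym and timestamp under an HMAC, roughly
    in the spirit of a short-lived Kerberos-style ticket."""
    ts = str(int(now if now is not None else time.time()))
    payload = f"{pseudonym}|{ts}".encode()
    mac = hmac.new(SHARED_KEY, payload, hashlib.sha256).hexdigest()
    return base64.urlsafe_b64encode(payload).decode() + "." + mac

def check_token(token, now=None):
    """Resource side: signature valid and credential not yet expired."""
    try:
        b64, mac = token.rsplit(".", 1)
        payload = base64.urlsafe_b64decode(b64.encode())
    except Exception:
        return False
    expected = hmac.new(SHARED_KEY, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, mac):
        return False
    ts = int(payload.decode().split("|")[1])
    current = now if now is not None else time.time()
    return current - ts <= LIFETIME
```

In the hybrid scheme, a token like this would travel to the resource as a parameter of the HTTP redirect, after which the proxy drops out of the conversation entirely.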
Return to Contents
Both proxies and credential-based authentication schemes seem to be viable approaches. Proxies have the advantage of compartmentalizing and modularizing authentication issues within an institution. But they also place heavy responsibilities upon the licensee institution to operate proxy servers professionally and responsibly. Proxy servers will become a focal point for policy debates about privacy, accountability and the collection of management information; successful operation of a proxy server implies that the user community is prepared to trust the licensee institution to behave responsibly and to respect privacy. Similarly, resource operators have to trust the licensee institution to competently implement and operate a local authentication system; anomaly monitoring of aggregated traffic from a proxy server by a resource operator is very difficult, and the resource operator will have to largely rely on the institution to carry out a program of anomalous access monitoring.
A cross-organizational authentication system based on a credential approach has the advantage of greater transparency. Resource operators can have a higher level of confidence in the access management mechanisms, and a much greater ability to monitor anomalous access patterns. The downside is much greater complexity; issues of privacy, accountability and the collection of management statistics become a matter for discussion among a larger group of parties. Further, it seems that a credential system means that there has to be cross organizational interdependency in order to avoid systemic compromise of the authentication system, as opposed to a simple relationship of trust — recognized in a contract — for the proxy approach.
One point that seems clear is that an institutional public key infrastructure may not extend directly to a cross-institutional one; it may be desirable to issue community members a set of pseudonymous certificates for presentation outside the institution, as well as individually identified ones used within the institution, in order to provide a privacy firewall while still maintaining some level of accountability.
IP source filtering does not seem to be a viable general solution, although it may be very useful for some niche applications, such as supporting public workstations or kiosks. It can be used more widely — indeed today it usually is the basic access management tool — but it definitely cannot support remote users flexibly in its basic form. Most real-world access management systems are going to have to employ multiple approaches, and IP source address filtering is likely to be one of them.
Reviewers of the first draft of the paper were very concerned with the costs of deploying access management systems and the supporting authentication infrastructure. There is relatively little good data on this, though some early adopter institutions are seeing rather high costs, particularly for public key (certificate) based approaches. There is an urgent need to develop a better basis for estimating the initial deployment and operating costs of the various approaches, and this need should be addressed in any follow-on work to the white paper.
A final issue: this white paper has focused on inter-institutional issues in authentication and access management. It should be clear that the role of the licensee institution as a mediator adds some very significant value for the members of the user community. There are many users of networked information resources who do not have a natural affiliation with a licensee organization, and who thus do not have a way to obtain these benefits. We can expect these users to seek affiliations — such as that of alumni — which allow them to obtain these benefits. The idea of being able to have a single ID that allows access to a vast array of networked information resources is a very powerful one, and it is one that today is available only in an institutional context.
Return to Contents
This appendix provides a snapshot of the currently available state of the art for key software components in terms of their support for authentication, authorization and access management. One of the key issues that the white paper has identified is the need for off the shelf software to provide the needed facilities, particularly at the user’s desktop.
Two web browsers, Netscape Navigator and Microsoft’s Internet Explorer, currently dominate the browser marketplace. Both support a wide range of platforms, including Microsoft Windows, the Mac OS, and several varieties of UNIX.
Both browsers support SSL for encrypting forms that include passwords. It is worth noting that while both browsers support 128-bit encryption in their US-only products, users must take special action to obtain these versions and the vast majority of users probably are still running the much less secure 40-bit export qualified versions that are available as the default distributions. Both browsers support proxy servers as a configuration option. Both browsers support the incorporation of X.509 certificates. The browsers do not yet support certificate revocation lists (verify this).
There are many problems with certificates. They are not simple for the average user to import. Certificate backup and recovery (for example, in the case of a disk crash) is a problem. Certificates may not be moved smoothly as part of an upgrade; they definitely won’t move if a user switches between Netscape and Internet Explorer (Netscape will import IE certificates via explicit action, but neither browser will simply make use of certificates installed in its competitor).
Both browsers include a built-in Telnet. This Telnet does not support SSL for protecting the transmission of userids and passwords. Both browsers can be configured to use independent Telnet helper applications rather than the built-in Telnet. I am aware of work going on in the Mac world to provide a stand-alone Telnet application which incorporates SSL encryption. Reconfiguration of either browser to substitute an external Telnet is non-trivial for the average user.
One issue that was identified during early reviews of this paper was the Lynx character-based web browser. Lynx is important for two reasons: because there is still a large installed base of trailing-edge character-based terminal technology, and, perhaps more compellingly, because Lynx, in conjunction with other specialized assistive software, is a key part of many institutional strategies for meeting the needs of disabled users and the requirements of the Americans with Disabilities Act (ADA). Lynx capabilities remain to be researched.
Commercial web servers from Netscape and Microsoft support SSL, as does Stronghold (commercial Apache); Apache proper supports SSL only on a limited basis with the addition of the shareware SSLeay module.
Need to review X509 support, including what Certificate Authorities are supported/will issue, support of Certificate Revocation Lists, etc.
Return to Contents