CNI White Paper on Networked Information Discovery and Retrieval
Outline: Chapter 1
The Nature of the NIDR Challenge
- initial brief definition of terms: networked information discovery and
retrieval; network resources (objects); metadata (including comments on
etymology).
- scope of the problem that is the focus of this paper: how to improve the
user's ability to discover and access resources in the current Internet-based
networked information resource environment. Extent of software mediation in the
NIDR process. How a mix of free and for-fee information will change the picture.
- characterization of key features of the networked information environment
relevant to NIDR problems:
  - very large scale, rapid growth; dynamic addition and relocation of
    information resources
  - extremely heterogeneous nature of resources
  - wide variation in granularity of resources; hierarchical resource
    organization
  - multiple generations of information resources and supporting access systems
  - distributed and autonomously managed resources
  - wide variation in quality of resource content and implementation
  - growth of "self-publishing" models of information distribution
  - combination of free & for-fee information resources
  - combination of public and private information spaces
  - no commitment by information providers to offer service registry within a
    central framework
  - very heterogeneous user base; varying expertise and needs, varying access
    capabilities
  - unrealistic (and poorly articulated) user expectations
  - poorly defined user selection requirements
  - information overload: too much overall information, and too much relevant
    information
- a closer look at the discovery process:
  - discovery as an iterative research activity; different kinds of discovery.
  - discovery as "catalog use"; performed by humans
  - components of discovery as a process: selection, collocation, duplicate
    elimination, ranking/differentiation, browsing, determining "fitness for use".
  - hierarchical searching and granularity; discovering systems/information
    spaces; knowing where to search
  - the continued need for surrogates for objects in discovery on the net;
    arguments based on the limited ability to selectively fetch headers that are
    object components, performance issues, and economic and intellectual
    property issues (i.e., separate creation, control, and distribution of
    surrogates and primary objects)
  - automated support for discovery as a continuing process: SDI, personal
    agents, filters
- a closer look at the retrieval/access process:
  - defined primarily by existing (simple) network retrieval protocols; these put
    an undue burden on discovery
  - parameters of access processes, e.g. costs and formats (static vs. dynamic
    issues); poor accommodation by current protocols
  - multistage, sequential nature of access/retrieval & subsequent use of
    network information objects.
  - low levels of interoperability targeted (moving bits, or application-specific
    file formats)
- key problems with achieving current NIDR objectives:
  - objects as viewed in the NIDR context are extremely simple
  - classic information retrieval issues; heavy use of natural language
  - lack of data sources (cataloging) upon which to base discovery
  - networked information retrieval issues (extended retrieval)
  - performance and architecture problems (technical issues) in large scale
    distributed systems
  - incorporation of nontextual objects and their description
  - nontechnical issues with major architectural implications: privacy, security,
    intellectual property, charging for information
nidrcall@cni.org
CNI
21 Dupont Circle Suite #800
Washington, DC 20036-1109
202.296.5098
<http://www.cni.org/>