Subject: Re: GILS Report available
Carl Hage (carl@chage.com)
Date: Fri, 7 Aug 1998 12:15:59 -0800
Message-Id: <199808071958.OAA15272@rgate2.ricochet.net>
To: gils@cni.org
In-Reply-To: <v04003a02b1effc3bd12d@[204.210.227.47]>
References: <Chameleon.902352942.patrice@patrice.rtknet.org>
It seems to me there is confusion as to what GILS is supposed to be --
is it a list of agencies and major infosystems, e.g. like the US
Government Manual, or is it supposed to be a card catalog like MOCAT
only much bigger? Counting GILS records or hits doesn't seem that
useful in evaluating the status of GILS.
Looking at the numbers, the number of GILS records is typically fewer
than the number of agencies and/or departments that exist within a
cabinet department or major agency, and virtually 0 compared to the
number of documents that must exist. The numbers are too small for
compliance to be the problem.
On Fri, 7 Aug 1998, David Landsbergen <landsbergen.1@osu.edu> wrote:
>
> I read the report but I'd still like to talk about the best way to
> get GILS implemented. I believe that we can talk all we want about
> statutory authority but the tools must be available to these agencies
> to allow them to implement.
I agree with that -- if good tools existed (and addressed various needs)
then people would use GILS. We often see messages posted here saying
something like, "OK I read the spec and created a GILS record -- now what
do I do with it?"
One of the problems, in my opinion, is that the tools are highly focused
on query searching, rather than on methodology for collecting,
maintaining, and exchanging data. GILS now essentially means a system
where you can tell it a few words and you get back a few entries within
the local system containing those words. Data exchange, browsing, and
multiple re-use of the metadata records are pretty much nonexistent.
Improved authoring tools are needed. The "flat" GILS record is very
complex with many fields common across documents within an
agency/department. Authoring tools need to include/reference predefined
common data like Distributor rather than require it to be re-entered.
(For that matter, things like Distributor contact info should be stored
and updated on the Distributor's WWW site, and referenced rather than
just copied into a GILS record. We need a live distributed database,
not a paper card-catalog analog.)
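As a rough illustration of referencing shared data instead of re-entering it, here is a minimal Python sketch. Everything in it is hypothetical: the `Distributor-Ref` field, the `dist-example` key, and the field names are invented for the example, and the in-memory table stands in for data that would really live on the Distributor's own WWW site.

```python
# Hypothetical shared table of distributor details. In a live system this
# would be fetched from the distributor's own site rather than hard-coded.
SHARED_DISTRIBUTORS = {
    "dist-example": {
        "Distributor-Name": "Example Agency Publications Office",
        "Distributor-Email": "pubs@example.gov",
    },
}


def expand(record, shared=SHARED_DISTRIBUTORS, ref_field="Distributor-Ref"):
    """Return a copy of record with the referenced distributor fields filled in.

    Fields already present in the record win over the shared values.
    """
    expanded = {k: v for k, v in record.items() if k != ref_field}
    ref = record.get(ref_field)
    if ref is not None:
        for key, value in shared[ref].items():
            expanded.setdefault(key, value)
    return expanded


# The author types only the title and a short reference.
short_record = {"Title": "Example Report", "Distributor-Ref": "dist-example"}
full_record = expand(short_record)
```

Updating the shared entry once would then change every record that references it, which is the point of referencing rather than copying.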
Tools besides search and retrieval are needed. GILS would be useful for
cataloging WWW sites, and tools processing GILS records could be used to
create a WWW Table of Contents or to produce WWW catalogs/directories of
documents. GILS could be tightly integrated with WWW site
implementation, but tools need to address this rather than just
simplistic word search.
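To make the Table of Contents idea concrete, here is a small Python sketch that turns a list of records into an HTML list of links. It assumes each record has already been loaded into a dictionary with `Title` and `URL` keys; those key names and the sample records are assumptions for illustration, not part of any GILS tool.

```python
from html import escape


def table_of_contents(records):
    """Build an HTML <ul> of links, sorted by title, from record dictionaries."""
    items = []
    for rec in sorted(records, key=lambda r: r["Title"].lower()):
        items.append(
            '<li><a href="%s">%s</a></li>'
            % (escape(rec["URL"], quote=True), escape(rec["Title"]))
        )
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


# Hypothetical records, as they might look after being read from a site.
records = [
    {"Title": "Budget & Finance", "URL": "http://www.example.gov/budget.html"},
    {"Title": "Annual Report", "URL": "http://www.example.gov/annual.html"},
]
toc_html = table_of_contents(records)
```

The same list of records could feed a subject directory or a per-office catalog just by changing the sort key and the grouping.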
The binary format of Z39.50 is inappropriate as an exchange format, and
severely limits how easily GILS records can be created and exchanged.
Instead, a simple ASCII format is needed which can be created and read
by a person with no special software, but is also machine readable with
software trivial to write. Z39.50 records can't be accessed or
exchanged without sophisticated tools, and software to process data can
be very complex and difficult to write. Rather than use a protocol
optimized for a search engine and database, the protocol should be
optimized for human use and 10-line Perl programs.
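To show how little software such a format would need, here is a sketch in Python (standing in for the short Perl program mentioned above). The layout is an assumption made up for the example: one `Field: value` pair per line, with indented lines continuing the previous value. It is not an existing GILS syntax.

```python
def parse_record(text):
    """Parse 'Field: value' lines into a dictionary.

    Lines starting with a space or tab continue the previous field.
    A repeated field name overwrites the earlier value in this sketch.
    """
    record = {}
    field = None
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line[0] in " \t" and field is not None:
            record[field] += " " + line.strip()
        elif ":" in line:
            field, value = line.split(":", 1)
            field = field.strip()
            record[field] = value.strip()
    return record


# A hypothetical record typed in a plain text editor.
sample = """\
Title: Example Agency Annual Report
Originator: Example Agency
Abstract: Yearly summary of agency programs,
  budgets, and publications.
"""
record = parse_record(sample)
```

A person can read or write the sample with no tools at all, and the parser is short enough to rewrite in any scripting language.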
Take the Linux Software Map (LSM), as an example, which is a voluntary
system cataloging about 4000 software packages written for Linux, with
multiple WWW sites serving as front-ends. This is a simple ASCII format
which can be prepared without any tools (i.e. just a text editor). LSM
entries are typically embedded within announcements posted to
newsgroups, email lists, and README files. LSM can be trivially
extracted, and new applications can be easily written to load the LSM
into a database for searching or HTML page generation.
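As a sketch of how simple that extraction can be, the Python below pulls entries out of a longer message. It assumes each entry sits between a `Begin3` line and an `End` line; treat those markers as an assumption about the LSM template and change the defaults if the format differs. The sample announcement is invented.

```python
def extract_entries(text, begin="Begin3", end="End"):
    """Return the body of every entry found between begin and end marker lines."""
    entries = []
    current = None  # lines of the entry being collected, or None if outside one
    for line in text.splitlines():
        stripped = line.strip()
        if stripped == begin:
            current = []
        elif stripped == end and current is not None:
            entries.append("\n".join(current))
            current = None
        elif current is not None:
            current.append(line)
    return entries


# A made-up announcement with an entry embedded in ordinary prose.
message = """\
Hello all, version 1.0 of foo is out.

Begin3
Title: foo
Version: 1.0
Description: An example package
End

Enjoy!
"""
entries = extract_entries(message)
```

Each extracted entry is still plain `Field: value` text, so it can be loaded into a database or turned into an HTML page with equally small programs.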
I think Elliot was on the right track mapping GILS into XML. I plan to
comment extensively on that, but haven't had time yet. XML will have
the ability to represent GILS information appropriately within a
distributed network-accessible database, though it still is suboptimal
for human use. However, a simple SUTRS-like format could be mapped to
and from SGML/XML with trivial software.
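A minimal sketch of such a two-way mapping, using Python's standard `xml.etree.ElementTree` module. It assumes the record is a flat set of field/value pairs whose field names are already valid XML element names; the `Rec` root element and the sample fields are placeholders chosen for the example, not taken from any GILS or XML proposal.

```python
import xml.etree.ElementTree as ET


def record_to_xml(record, root_tag="Rec"):
    """Serialize a flat field/value dictionary as one XML element per field."""
    root = ET.Element(root_tag)
    for field, value in record.items():
        child = ET.SubElement(root, field)
        child.text = value
    return ET.tostring(root, encoding="unicode")


def xml_to_record(xml_text):
    """Read the XML back into a flat field/value dictionary."""
    root = ET.fromstring(xml_text)
    return {child.tag: child.text or "" for child in root}


# Hypothetical flat record, round-tripped through XML.
record = {"Title": "Example Report", "Originator": "Example Agency"}
xml_text = record_to_xml(record)
round_trip = xml_to_record(xml_text)
```

Nested or repeated fields would need a slightly richer mapping, but the flat case shows that the conversion itself is not where the difficulty lies.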
The biggest problem with GILS is the lack of efforts and tools dealing
with controlled vocabularies (standardized sets of index terms). There
are few mechanisms to create, maintain, exchange, and access CVs.
Effective searching and browsing rely on use of CVs. Not only are there
few tools to deal with CVs, the CV content is often made inaccessible so
no one can even write tools. Take the LCSH for example, which is
probably the best general CV. The LOC provides free value-added
services for limited searching, but does not provide access to the data
in a reusable form. The data is copyrighted outside the US, and
effectively copyrighted within the US by restricted distribution
methods. CD-ROMs bundled with copyrighted proprietary software aren't
too expensive, but the same data without the software costs thousands of
dollars -- too expensive for small-scale tool writers.
Basic access and pseudo-copyright are major impediments. The OMB and
Congress don't need to enforce GILS compliance -- they need to redefine
concepts like "Public Notification" and "Archival" to require
net-accessible electronic archival of public data in source form.
Official documents only available via fee-based services should fail to
meet the legal standard for public notice. Agencies should have a
mandate to first make public data accessible in full source/reusable
form, then offer value-added services, distributions, and publications.
Agencies should be prohibited from bundling public data with proprietary
software, offering limited-access "value-added" services (like word
search), or engaging in licensed and/or fee-based distributions without
first storing data in a public archive. Protecting information sales is
precluding the ability to implement modern network database technology,
and in turn improve service and cut costs. How can we expect GILS to be
a success when the LCSH data needed to create GILS records costs
thousands of dollars and MOCAT (the closest thing to GILS) is mainly
distributed via expensive private services?
While we spend huge amounts of public money on super high speed
fiber-optic links, we've neglected the "Information" part of the
"National Information Infrastructure". Fundamental data needed by
almost all infosystems, such as the definition of a postal zip code, has
been purposely excluded from the NII, precluding many applications that
might benefit. (The USPS apparently escapes the federal copyright
exclusion, and sells copyrighted ZIP code data bundled with software
marketed to junk mailers. They offer free lookups of single zip codes,
but for anything more they only sell data via CD-ROM. Ironically, the
graphics on the WWW pages guiding you to the order form are more data
than what they sell. Numerous databases (e.g. Census) have been invalidated due to
zip code changes, and the USPS is not required to post these changes, so
there's no way for all these infosystems to relate current zip codes to
existing data.)
--------------------------------------------------------------------------
Carl Hage C. Hage Associates
<mailto:carl@chage.com> Voice/Fax: 1-408-244-8410 1180 Reed Ave #51
<http://www.chage.com/chage/> Sunnyvale, CA 94086