 

Taming the Data Shrew: The National Science Board’s Priorities and Recommendations on Scientific Data Management

José-Marie Griffiths
Vice President for Academic Affairs
Bryant University

The increasing ease of gathering large amounts of varied data (including digital data, research specimens, artifacts, etc.), together with the funding of large-scale collaborative projects, has made the broad policy issues surrounding the management of scientific and engineering research data critically important. The National Science Board (NSB) appointed a Data Policy Task Force to explore these issues and develop recommendations on how collected data should be shared and managed to ensure broad, timely, and long-term availability and accessibility to the entire research community.

This session will present the results of the Task Force’s work and the NSB priorities and recommendations on scientific data management based on the Task Force’s input. It will also include a discussion of the impacts, challenges, potential, and possible next steps for higher education, publishing, information technology, scholarly and professional organizations, foundations, and libraries and library organizations, especially as they relate to NSF-funded grants and projects.

 

Too Big to Know

David Weinberger
Co-Director, Harvard Library Innovation Lab
Harvard University

The Internet is probably making us both smarter and stupider, but it’s also changing the nature of knowledge itself. In this informal session, David Weinberger will lay out some of the themes of his new book, “Too Big to Know,” to initiate an open discussion about how the changing nature of knowledge affects us, and what we as librarians, technologists, parents, and teachers can do to seize the opportunity to enter a new Renaissance, instead of a new Dark Ages.

Transforming Research Support Services

Jennifer Rutner
Senior Analyst
Ithaka S+R
Roger Schonfeld
Director of Research
Ithaka S+R

Rapidly evolving research methods and practices across disciplines are changing the nature of scholars’ interactions with service providers such as libraries, computing support centers, humanities centers, scholarly societies, and publishers. As a result, many scholars have become less dependent on traditional library information services, and research support service providers would like to better understand the evolving research practices of their users, in order to transform their services in parallel. Ithaka S+R has launched the Research Support Services for Scholars program to engage scholars and research support professionals in building a deeper understanding of the needs of researchers, the support landscape, current and evolving practices, and the challenges both communities face in conducting and facilitating innovative research. The first disciplines to be covered in this new Ithaka S+R program are history and chemistry.

This session will provide an overview of the Ithaka S+R Research Support Services for Scholars History Project. Presenters will share preliminary research findings, including data gathered through interviews with twenty research support professionals and forty academic historians. In this session, presenters will facilitate discussion about the user research needs of the information services community, organizational challenges that research support organizations face in meeting evolving needs, and effective ways to consider research findings in the ongoing transformation of research support service environments.

http://www.researchsupportservices.net/

 
 

Under the Blacklight: Open Source Content Management for Collaborative Digitization Projects Involving Mass Digitization of Archival Materials

Eric C. Weig
Director, Digital Library Services
University of Kentucky

Institutions continue to digitize their unique holdings, and the current impetus is to get more and more of that archival content online quickly. Traditionally, enabling access has been a slow process of linking images to finding aids or digitizing a few selected items from a collection. Content management systems have often done a poor job of connecting the finding aid to its digitized objects in meaningful ways.

The University of Kentucky has built upon its extensive experience with mass digitization of newspapers and archival description to envision an approach to mass digitization for archival collections. In the past year, the University has developed an open source content management system built on the Blacklight discovery software. It has also developed a method for quickly digitizing and loading complete archival collections, described in Archivists’ Toolkit at the item or folder level, while seamlessly integrating the finding aid and digital objects with the help of generated METS files.

This session will present Kentucky’s new open source content management system and outline the approach developed to automate every step of the digitization process for archival collections beyond image capture and Encoded Archival Description (EAD) creation. The automatically generated METS objects will be displayed and discussed, and the session will also demonstrate additional content in the new content management system, including historic newspapers and oral histories.
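To make the role of the generated METS files more concrete, here is a minimal sketch of a METS document for one digitized folder: a dmdSec points back to the folder's component in the EAD finding aid, a fileSec lists the page images, and a structMap puts them in order. This is not Kentucky's actual generator; the function name folder_mets, the URLs, the component ID, and the use of TIFF masters are hypothetical, and a production file would carry more (a header, administrative metadata, derivative files).

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
XLINK = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS)
ET.register_namespace("xlink", XLINK)


def m(tag: str) -> str:
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS}}}{tag}"


def folder_mets(ead_url: str, component_id: str, label: str, image_urls: list) -> str:
    """Hypothetical sketch: build a minimal METS record for one archival folder."""
    root = ET.Element(m("mets"))

    # dmdSec: link back to the folder's component in the EAD finding aid.
    dmd = ET.SubElement(root, m("dmdSec"), ID="DMD1")
    ET.SubElement(dmd, m("mdRef"), {
        "LOCTYPE": "URL",
        "MDTYPE": "EAD",
        f"{{{XLINK}}}href": f"{ead_url}#{component_id}",
    })

    # fileSec: one file entry per page image.
    file_grp = ET.SubElement(ET.SubElement(root, m("fileSec")), m("fileGrp"), USE="master")

    # structMap: the folder as a div whose child divs are the ordered pages.
    folder = ET.SubElement(ET.SubElement(root, m("structMap")), m("div"),
                           TYPE="folder", LABEL=label, DMDID="DMD1")

    for n, url in enumerate(image_urls, start=1):
        fid = f"F{n}"
        f = ET.SubElement(file_grp, m("file"), ID=fid, MIMETYPE="image/tiff")
        ET.SubElement(f, m("FLocat"), {"LOCTYPE": "URL", f"{{{XLINK}}}href": url})
        page = ET.SubElement(folder, m("div"), TYPE="page", ORDER=str(n))
        ET.SubElement(page, m("fptr"), FILEID=fid)

    return ET.tostring(root, encoding="unicode")
```

Because the pointer to the finding aid lives in the METS record itself, a discovery layer can move from a folder's images to its place in the EAD and back without a separate mapping table.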

Project contributors include Eric Weig, Dr. Michael Slone, Deirdre Scaggs, Mary Molinaro, and Dr. Doug Boyd.

http://eris.uky.edu
http://eris.uky.edu/exploreuk

The United States End-of-Term Web Archive

 

Abbie Grotke
Web Archiving Team Lead
Library of Congress
Kathleen Murray
Post-Doctoral Research Fellow
University of North Texas

In the spring of 2008 an ad hoc collaboration was formed to build a comprehensive archive of the United States Federal Government Web domain before, during, and immediately after the transition to a new presidency. The Library of Congress, the Internet Archive, the California Digital Library, the University of North Texas, and the Government Printing Office collaborated to assemble a comprehensive list of sites, provide a nomination tool to engage federal documents experts in site selection, and distribute the work of harvesting content. This presentation will discuss various aspects of the ongoing collaboration, including recent work to give researchers access to the archive, which consists of over 3000 sites, and plans that are underway for collecting in 2012 and 2013. The archive will be demonstrated at this session. The speakers will also discuss a two-year grant from the Institute of Museum and Library Services (IMLS) that funds research comparing machine clustering of Web pages with classification by subject matter experts.

http://eotarchive.cdlib.org/index.html


Update on the Activities of the Board on Research Data and Information (BRDI)

 

Clifford Lynch
Executive Director
Coalition for Networked Information

The National Research Council (NRC) established the Board on Research Data and Information (BRDI) in 2008 with the mission “to improve the management, policy, and use of digital data and information for science and the broader society.” As BRDI begins its second term of membership, with Francine Berman (Rensselaer Polytechnic Institute) and Clifford Lynch (CNI) serving as co-chairs, the major activities in the near term will include: a consensus study on the future career opportunities and educational requirements for digital curation, an international data attribution and citation initiative, a sustainability study on publicly funded research databases, an international symposium on intellectual property rights in scientific databases, and of course the continuing meetings of the Board itself. This session will provide an update and overview of the Board’s recent and planned activities, including the February 2012 Forum on CODATA-World Data Systems Cooperation.

http://sites.nationalacademies.org/PGA/brdi/index.htm

Why Google Scholar Has Trouble Indexing Institutional Repositories


Kenning Arlitsch
Associate Dean for IT Services
University of Utah
Patrick O’Brien
Search Engine Optimization Manager
University of Utah

Google Scholar (GS) has difficulty indexing the contents of institutional repositories (IRs) because most IRs use Dublin Core metadata, which cannot adequately express bibliographic citation information for academic papers. GS’s Webmaster Inclusion Guidelines caution site owners to “use Dublin Core only as a last resort” and recommend other metadata schemas instead. The guidelines also give specific recommendations to assist crawlers, including writing metadata from the repository database into HTML headers. Surveys of institutional and disciplinary repositories across the United States revealed indexing ratios that support the hypothesis that IRs which do not follow these metadata and crawl guidelines suffer from low indexing ratios. The survey results also show that the low indexing ratio problem cuts across institutions and repository software. Three pilot projects transformed the metadata of a subset of papers in USpace, the University of Utah’s institutional repository, and examined the results of Google Scholar’s harvest. The pilots were successful, achieving a 90% indexing ratio.
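As an illustration of writing repository metadata into HTML headers, the sketch below renders one record as citation_* meta tags and computes an indexing ratio as the number of indexed items divided by the total. The abstract does not say which schema the Utah pilots used; the Highwire Press-style citation_* tags shown here are an assumption, and the record layout and the function names citation_meta_tags and indexing_ratio are hypothetical.

```python
from html import escape


def citation_meta_tags(record: dict) -> str:
    """Hypothetical sketch: render a repository record as citation_* <meta> tags."""
    tags = [("citation_title", record["title"])]
    # One citation_author tag per author, in byline order.
    tags += [("citation_author", a) for a in record.get("authors", [])]
    if "date" in record:
        tags.append(("citation_publication_date", record["date"]))
    if "pdf_url" in record:
        tags.append(("citation_pdf_url", record["pdf_url"]))
    # Escape values so characters like & and " cannot break the HTML attribute.
    return "\n".join(
        f'<meta name="{name}" content="{escape(value, quote=True)}">'
        for name, value in tags
    )


def indexing_ratio(indexed: int, total: int) -> float:
    """Share of repository items that the search engine has indexed."""
    return indexed / total if total else 0.0
```

For example, a repository with 900 of 1,000 papers found in GS would have indexing_ratio(900, 1000) of 0.9, the level the pilots report reaching.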

This presentation will cover the highlights of a paper that is being published in March in Library Hi Tech. The broader research initiative emphasizes search engine optimization for all digital repositories, including general digital library collections, and has recently been funded by a 3-year National Leadership Grant from the Institute of Museum and Library Services.

 

Roadmap for the Spring 2012 CNI Member Meeting, Baltimore, MD, April 2-3

A Guide to the Spring 2012
Coalition for Networked Information Membership Meeting

The Spring 2012 CNI Membership Meeting, to be held at the Sheraton Inner Harbor Hotel in Baltimore, Maryland, on April 2 and 3, offers a wide range of presentations that advance and report on CNI’s programs, showcase projects underway at CNI member institutions, and highlight important national and international developments.  Here is the customary “roadmap” to the sessions at the meeting, which includes both plenary events and an extensive series of breakout sessions focusing on current developments in networked information.

As usual, the CNI meeting proper is preceded by an optional orientation session for new attendees – both representatives of new member organizations and new representatives or alternate delegates from existing member organizations – at 11:30 AM; guests are also welcome.  Refreshments are available for all at 12:15 PM on Monday, April 2.  The opening plenary is at 1:15 PM and will be followed by three rounds of parallel breakout sessions.  Tuesday, April 3, includes additional rounds of parallel breakout sessions, lunch, and the closing keynote, concluding around 3:30 PM.  Along with plenary and breakout sessions, the meeting includes generous break time for informal networking with colleagues and a reception which will run until 7:15 PM on the evening of Monday, April 2, after which participants can enjoy a free evening in Baltimore.

The CNI meeting agenda is subject to last minute changes, particularly in the breakout sessions, and you can find the most current information on our website, www.cni.org, and on the announcements board near the registration desk at the meeting.  Information about wireless access in the meeting room areas will be available at the registration table.

The Plenary Sessions

I am delighted that James Duderstadt, President Emeritus of the University of Michigan and currently University Professor of Science and Technology there, will join us as our opening speaker.  He has been a prominent, articulate and thoughtful leader in higher education and research policy for decades, with an extensive record of public service (you can find details on our website).  Of particular relevance to the CNI community is his involvement with the evolution of ideas about research cyberinfrastructure, and his encouragement of new interdisciplinary collaborations involving computer and information scientists with researchers in many other disciplines through the creation of the School of Information at Michigan and the development of what was originally called the Media Union (now the Duderstadt Center).  He chaired the National Academies committee that published the key 2002 report Preparing for the Revolution:  Information Technology and the Future of the Research University, and he has been a member of the current Academies committee studying the future of the research university.  In his talk, Duderstadt will look broadly at the social and technological trends driving the restructuring of higher education, the future role of the research university, and the changing understandings of teaching and learning, scholarship, and engagement.

Our closing plenary session on Tuesday will feature Professor Phillip Long, Director of the Centre for Educational Innovation and Technology, University of Queensland.  Phil also maintains a connection to the Massachusetts Institute of Technology (MIT), where he worked to support change in learning, and is currently a visiting researcher there.  For many years, Phil has been at the forefront of the innovative use of technology in teaching and learning; he is both inspirational and pragmatic, and always deeply sensitive to the actual realities of teachers and students.  His recent work in Australia has given him a genuinely global perspective on these issues.  Phil’s current interests focus on emerging technologies, the cognitive interactions of learners with technology, and learning spaces, both physical and virtual.  His wide-ranging presentation will explore current trends in higher education, such as the emergence of massive open online courses, the rise of community-generated learning content, learning analytics, and mobility, and their potential to genuinely change the higher education landscape.

These two plenary sessions should complement each other to provide a thought-provoking view of the interactions between developments driving change from within the academy and the external forces reshaping the role of the academy within the broader society.

Highlighted Breakout Sessions

I will not attempt a comprehensive summary of breakout sessions here; we offer a great wealth and diversity of material.  However, I want to note particularly some sessions that have strong connections to the Coalition’s 2011-2012 Program Plan (www.cni.org/program/2011-2012/) and also other sessions of special interest, and to provide some additional context for a few sessions that may be helpful to attendees in making session choices.  I do realize that choosing among so many interesting concurrent sessions can be frustrating, and as always we will try to put material from the breakout sessions on our website following the meeting.

David Weinberger’s recently published book Too Big to Know was hailed by John Seely Brown as a “stunning and profound book on how our concept of knowledge is changing in the age of the Net.”  I am thrilled that David will lead a session in which he will describe some themes from the book and then encourage discussion of how changing knowledge affects all of us.  David will also be a co-presenter in a session on the Digital Public Library of America (DPLA), focusing on the service platform, particularly the metadata server, for that project, which will make some very interesting capabilities available to the library community and to the public.

The management of large-scale data sets in e-research has been a key theme for CNI’s program in recent years, and sessions at this meeting explore the progress that is being made in many areas.  We have several sessions that deal with aspects of the federal government policy on data management.  As the newly appointed co-chair of the National Academies’ Board on Research Data and Information, I will lead a session describing the priorities and activities of that group.  José-Marie Griffiths, who chaired the National Science Board’s Data Policy Task Force, will describe their findings and facilitate a discussion of the challenges related to data access and preservation for higher education, publishing, and other organizations.  Presenters from the University of North Texas and the Council on Library and Information Resources (CLIR) will provide an overview of federal policies on data management and will describe the role and education needs of information professionals who are involved with data management.  We will have an update on the multi-institution project to develop a data management planning tool that can be used with researchers as part of their grant proposals; this work has moved ahead substantially and should have wide application.

Several sessions will highlight collections and tools that are being developed for researchers.  They include a German project on climate data, the National Science Foundation’s (NSF) EarthCube program for Geoscience, and a University of Toronto portal for geospatial resources.  Johns Hopkins University will present findings of a feasibility study for a National Science Foundation open access repository.  Before enacting its open access mandate, the National Institutes of Health (NIH) already had PubMed Central available as a repository where researchers could deposit their papers; nothing similar fills that role in other disciplines.  This session – particularly timely in light of the proposed legislation on open access to journal articles produced as part of federal grants – will help us look at some options and think about what is necessary to extend funder-driven open access mandates beyond NIH.

For many of the meeting attendees, this will be a first opportunity to hear from James Hilton of the University of Virginia about a planned digital preservation network (DPN).  This is a significant and large-scale undertaking, which is, as far as I know, the first attempt to build a digital preservation system coming directly out of the university world rather than from the science agencies.  The session will describe system architecture and strategies.  We will have a presentation from the California Digital Library on their work to develop an economic model for long-term preservation; we have had several sessions on this key topic at past CNI meetings, including a plenary by David Rosenthal of Stanford University.  It is important for the community to make progress in understanding the economics of digital preservation.  Community projects also face sustainability issues, and we will have a discussion from DuraSpace about their organizational strategy.

It is also important that we address the preservation of a wide variety of content related to our cultural heritage.  Colleagues from New York University, George Mason University, and the Internet Archive will address the challenges of collecting materials from an evolving social movement, in this case the “Occupy” movement.  The University of California, Santa Cruz has the enviable charge to preserve the Grateful Dead Archive; they will describe the elements of the collection, the challenges they perceive, and the community involvement that they are fostering.  An important project on standards and practices for newspaper preservation will be represented and we will learn about their findings and challenges.

As institutions’ digital collections – digital libraries and repositories – mature, they are rethinking priorities, establishing new modes of operation, and experimenting with new models.  The California Digital Library and the Public Knowledge Project have recently formed a partnership to develop a fully integrated, open-source institutional repository and journal publication service.  The University of North Carolina at Chapel Hill is addressing their need to grow capacity and encourage innovation.  The Ontario Council of University Libraries has taken a very interesting approach to developing very large, locally-hosted collections of digital content, both for articles and e-books; their rationale and implementation should be of wide interest.  JSTOR is also experimenting with new models, opening up access to independent scholars; a session will describe the initial stages of that new program and what they are learning.  Taking a more technical approach, the University of Kentucky will describe their open-source content management system and their process for quickly digitizing and loading complete archival collections.

Large collections of digital materials need new perspectives and solutions for information access and retrieval, particularly as the ecology of discovery and access systems becomes ever more complex.  There is increasing discussion in the community about the value of linked data, and presenters from Stanford and the University of Rochester will address that topic from conceptual and campus perspectives.  The University of Utah will provide us with information on their studies of why Google Scholar has difficulties indexing institutional repositories; there has been recent discussion on a number of listservs about their study and potential remedies.  A presentation from EBSCO Publishing will describe their data driven approach to developing relevant search results for users.  ARTstor has organized a panel, including representatives from Harvard and the University of Virginia, to look at interoperability requirements for image management and preservation systems, such as ARTstor’s Shared Shelf, that are being designed to manage complex local image collections.  The panel will also consider how articulating these interoperability needs can inform strategies for supporting demanding image collection requirements at other institutions, as well as more effective cross-system coordination.  Ex Libris will discuss the challenges of providing the most relevant search results to users and the factors that go into producing the best results.

The meeting will present some projects related to changes in scholarly communication and the role of libraries and information technology in providing services and developing collaborations to support innovation.  Ithaka S+R will present results from the first in a series of studies on Research Support Services for Scholars; the focus will be on services for academic historians.  The University of Oregon has developed a collaboration between the librarians, faculty, and graduate students who are involved in research on gender, new media, and technology; they will describe their developing partnership.  A session by the University of Kansas will describe, from both library and IT perspectives, the use of a management consultant to improve the organization’s effectiveness.

Two sessions will address topics related to federal information.  One will provide an update on a multi-institution project to preserve federal government websites at the end of the Bush administration and describe plans for a similar exercise in 2012-2013.  The other session will focus on opportunities and challenges for the Federal Depository Library Program (FDLP) in the 21st century technology environment.

CNI has been focusing attention on new approaches to identity management, biography, and bibliography in academic institutions, including connections to areas as diverse as authority control, campus and federated identity management systems, and institutional repositories.  Ken Klingenstein of Internet2 and Renee Shuey of Pennsylvania State University will provide an update on their work in federated identity management.  Following the CNI membership meeting, we will be hosting a day-long invitational workshop on scholarly identity management and will issue a report; I’ll also be doing a CNI Conversations podcast summary of the meeting and related developments.

We will have some sessions focusing on innovative technologies and tools in library and information environments.  The University of Utah is doing some fascinating work to use technologies to recover content that has been lost or obscured due to human or natural causes, such as floods; their retroReveal process is open-source and they hope to build a community of users and contributors.  Herbert van de Sompel will describe a project for a web-based approach for resource synchronization.

It is important that we continue to find ways to leverage the increasing amount of scholarly information in digital form for research and teaching.  The Sakai Open Academic Environment project is addressing content authoring, sharing, and discovery as well as standard learning management system functionality; we will have an update on this important work.

Finally, we will have two sessions that feature innovations in teaching and learning.  Project SCARLET in the UK is using augmented reality technologies and mobile devices to enhance students’ experiences in interaction with library special collections materials.  Gardner Campbell, a well-known speaker in teaching and learning circles, will describe a course aimed at helping participants thrive and innovate within the framework of new technologies; he has given this course at a number of institutions and with participants ranging from undergraduates to faculty and staff members.  He will likely whet our appetites to participate in one of his future classes.  Our invitational executive roundtable at this meeting will cover multiple devices and platforms and will look at some of the emerging mobile platform issues in more depth; we’ll be issuing a report from this session following the meeting.  (I should also note that last December, we had an outstanding roundtable on risk management and disaster planning; I’ve included the report from that meeting in your registration packet in case you missed it.)

I invite you to browse the complete list of breakout sessions and their full abstracts at the CNI website (available shortly).  In many cases you will find these abstracts include pointers to reference material that you may find useful to explore prior to the session, and after the meeting, we will add material from the actual presentations when it is available to us.  We will also be videotaping a few selected sessions, including the plenary sessions, and making those available after the meeting.  You can follow the meeting Twitter stream by using the hashtag #cni12s.

I look forward to seeing you in Baltimore this April for what promises to be another extremely worthwhile meeting.  Please contact me (cliff@cni.org), or Joan Lippincott, CNI’s Associate Executive Director (joan@cni.org), if we can provide you with any additional information on the meeting.

Clifford Lynch

Preview CNI’s Spring Meeting in the latest Conversations

The latest CNI Conversations podcast (http://wp.me/p1LncT-277) offers a preview of the CNI Spring 2012 Membership Meeting, including brief discussions of general meeting themes, and descriptions of plenary sessions and selected project briefings.

CNI’s spring membership meeting will be held in Baltimore, MD on April 2-3, 2012.  Visit www.cni.org/mm/spring-2012/ for more information.

March 7, 2012: Preview of CNI Spring 2012 Membership Meeting

[20 min.]

This podcast offers a preview of the CNI Spring 2012 Membership Meeting, including brief discussions of general meeting themes, and descriptions of selected project briefings.

We hope you enjoy this program and we welcome your feedback.  For questions or comments related to CNI Conversations, please contact CNI Associate Executive Director Joan Lippincott at joan@cni.org.