Materials from October 2011 ARL-CNI Symposium

Audio recordings and PowerPoint slides from the sessions at the October 13-14, 2011 ARL-CNI Washington, DC Symposium on 21st Century Collections are now available at


There’s a wealth of great material here, though personally the two talks that stand out in my memory (and I have not gone back and listened to all the talks again) were those of Paul Duguid, my colleague at the UC Berkeley School of Information, and John V. Lombardi, the very blunt-spoken and provocative President of the Louisiana State University System.

European Science Foundation on “Research Infrastructures in the Digital Humanities”

The European Science Foundation’s Standing Committee on the Humanities has just issued a policy brief, “Research Infrastructures in the Digital Humanities,” which offers an excellent survey of many large-scale European digital humanities efforts. Further details and a link to the report can be found at


Roadmap for the Fall 2011 CNI Member Meeting, Arlington, VA, Dec 12-13

A Guide to the Fall 2011
Coalition for Networked Information Membership Meeting

The Fall 2011 CNI Membership Meeting, to be held at the Crystal Gateway Marriott Hotel in Arlington, Virginia on December 12 and 13, offers a wide range of presentations that advance and report on CNI’s programs, showcase projects underway at member institutions, and highlight important national and international developments. Here is the “roadmap” to the sessions at the meeting, which includes both plenary events and an extensive series of breakout sessions focusing on current developments in digital information. As always, we have strived to present sessions that reflect late-breaking developments and also take advantage of our venue in the Washington, DC area to provide opportunities to interact with policy makers and funders.

As usual, the CNI meeting proper is preceded by an optional orientation session for new attendees (both representatives of new member organizations and new representatives or alternate delegates from existing member organizations) at 11:30 AM; guests are also welcome. Refreshments are available for all at 12:15 PM on Monday, December 12. The opening plenary is at 1:15 PM and will be followed by three rounds of parallel breakout sessions. For this meeting, we added an extra round of breakout sessions on Monday in order to take advantage of an unprecedented number of proposals for sessions; it’s clear that the pace of developments is accelerating, and we need to be responsive to this. Tuesday, December 13, includes three additional rounds of parallel breakout sessions, lunch and the closing keynote, concluding around 3:30 PM. Along with plenary and breakout sessions, the meeting includes generous break time for informal networking with colleagues and a reception which will run until 7:15 PM on Monday evening, December 12, after which participants can enjoy a wide range of dining opportunities in the Crystal City and Washington areas. Downtown Washington, DC is a quick taxi ride away and is also accessible via the Metrorail (Metro) transit system, which is directly connected to the Crystal Gateway hotel.

The CNI meeting agenda is subject to last minute changes, particularly in the breakout sessions, and you can find the most current information on our Web site, www.cni.org, and on the announcements board near the registration desk at the meeting.


The Plenary Sessions

As is usual at our fall meetings, I have reserved the opening plenary session. I want to look at recent developments and the ways in which the landscape is changing, and to outline the developments I expect to see in the coming years. As part of this, I’ll discuss progress on the Coalition’s agenda, and highlight selected initiatives from the 2011-2012 Program Plan. The Program Plan will be distributed at the meeting (and will be available electronically on the Coalition’s Web site, www.cni.org around December 13). I look forward to sharing the Coalition’s continually evolving strategy with you, as well as discussing current issues. The opening plenary will include time for questions and discussion, and I am eager to hear your comments.

The closing plenary, scheduled to start at 2:15 PM on Tuesday, will be given by William Michener, Professor and Director of e-Science Initiatives for the Libraries of the University of New Mexico, and Principal Investigator of the National Science Foundation (NSF)-funded DataONE project.

Bill is a marine scientist by training, and has spent most of his professional career leading large, complex, multi-disciplinary and information-intensive ecological, biological and environmental science projects; you can find the details of his biography as part of the plenary session descriptions for the fall meeting. I’ve had an opportunity to get to know Bill Michener a bit over the past years as part of the privilege of serving on the DataONE External Advisory Board; he is a tremendously energetic and deeply thoughtful person who is committed to advancing scholarship, and very sensitive to the changing relationships among the scientific enterprise, higher education, libraries, science funders, and the public. He’s also carefully studied successes, failures, and sustainability problems in past efforts to use technology to transform scholarly practices and cultures at scale. Bill will share his views on the new landscape of science broadly – emphasizing the earth and life sciences – and give us insight into how the DataONE project can support and facilitate these transformations.

Highlighted Breakout Sessions

I will not attempt to comprehensively summarize the wealth of breakout sessions here. However, I want to note particularly some sessions that have strong connections to the Coalition’s 2011-2012 Program Plan, as well as a few other sessions of special interest or importance, and to provide some additional context that may be helpful to attendees in making choices. We have a packed agenda of breakout sessions, and, as always, will try to put material from these sessions on our Web site following the meeting for those who were unable to attend. We will also be capturing a few sessions on video for later re-distribution.

With the imposition of the US NSF requirements for data management plans in grant proposals early in 2011, CNI’s work in data curation has reached a new level of intensity; at this meeting we can see member efforts to build common tools or services for broad deployment, as well as many exemplar or prototype campus or consortial projects and services. We are delighted to have an update from the California Digital Library and University of Virginia on the DMPTool, an institutionally-configurable tool which provides online guidance and resources to help researchers develop data management plans as part of grant proposals. We will also have an introduction to the Sustainable Digital Data Preservation and Access Network Partnership SEAD Project, funded as part of the NSF DataNet initiative, as well as the plenary update on the DataONE DataNet work. Johns Hopkins University will describe their Data Management services group, which provides proposal development and planning support to their institution’s faculty using data management and archiving systems developed by the Data Conservancy. DuraSpace’s DuraCloud Direct to the Researcher provides enhanced cloud-based storage for research data, and we will hear a report on its current status. The University of Nebraska will feature a partnership between IT and libraries for seamless data archiving.

As we develop open collections of data, we must also address their accessibility. Tufts and Harvard, with additional partners, are collaboratively developing a Web application to discover, preview, and retrieve geospatial data from a range of sites. We also need to develop systems for capturing research data, and Rutgers will describe a MODS application profile that documents the content, major lifecycle events and rights associated with research data. Two other projects focus on assisting researchers with the data lifecycle: the Smithsonian Institution is developing a virtual research environment for this purpose and the University of Idaho and partners are implementing the Northwest Knowledge Network, a regional approach to research data lifecycle management. A German project also focuses on creating a virtual research environment, WissKI, that will incorporate a range of primary sources from universities and museums and will employ semantic technologies.

We have a particularly strong set of sessions on topics related to digital preservation. David Rosenthal of Stanford, who captivated attendees in a plenary session at a 2009 CNI meeting, will discuss a model for comparing the cost of local storage with cloud storage. He has developed a framework into which models including interest rates, technology evolution and other factors can be plugged and then used to explore a wide range of scenarios; this will be of great interest to campus IT units trying to design and price long-term reliable storage services.

Cornell and Columbia, in a 2CUL study, have produced some startling and extremely important information for the library community. They will discuss their finding that only a small part of the libraries’ e-journal titles are currently being preserved by PORTICO or LOCKSS; understanding the details and implications of this will, I believe, lead to major rethinking and reprioritizing of our community’s strategies for protecting our e-journals. We will learn about major national efforts in digital preservation as reported at a conference in Estonia in May 2011 along with a report on the first year of the National Digital Stewardship Alliance, coordinated by the Library of Congress. Several sessions will address aspects of preservation for specialized collections: game preservation at the University of Illinois, executable content at Carnegie Mellon, and older reel-to-reel and cassette audio recordings at Columbia. The Florida Center for Library Automation will describe its rewritten DAITSS, a preservation repository application.

In the largest of our digital collections, while technical challenges continue to be significant, often the issues needing the most attention have to do with policies, infrastructure, governance, sustainability, collaboration, and funding. HathiTrust, a growing multi-institutional collaboration, now has a repository of nearly 10 million volumes and is developing the next stage of its partnership model as it grapples with policy and legal issues as well as setting priorities. Europeana is the major aggregator of digital library collections in Europe, and incorporates a wide range of materials, including audiovisual materials, special collections, and theses. We’ll have sessions covering both of these initiatives.

Some specialized digital collections projects in the US are developing new collaborations, such as the Open Folklore collaboration between the Indiana University Library and the American Folklore Society, and some are building new tools to enhance specialized collections, such as the Mapping Atlanta project at Emory. Binghamton University is developing a repository that focuses on both born-digital artifacts and unique research products; by using the Rosetta tool, they are enhancing discovery while providing a digital preservation methodology for university digital assets.

Some projects are focusing on more technical issues related to large collections, repositories, and digitization. Hydra assists institutions in building a robust and durable digital repository system supporting multiple collections with tailored workflows; Stanford, Virginia, and Hull were the original partners and now a dozen institutions are using the system. We will also have an update on the Exhibit software, which assists institutions in organizing, publishing, and navigating linked data on the Web; it is especially important for managing and publishing very large collections of any type of data. George Washington University is developing a cost-forecasting model for new digitization projects that employ robotic arm technology. The International Organization for Standardization (ISO) has recently approved a new standard against which one can assess repositories of digital content and data; we will have a report by the Center for Research Libraries on the potential benefits of this work.

There are a number of interesting developments related to information discovery, access, and retrieval. These projects are experimenting with the need to provide more sophisticated access to the wealth of digital information, whether by individuals or by machines. One central idea is that of scholars (or librarians working on their behalf) taking control of their records of publication, and using measures related to this record as a way of demonstrating the impact of scholarly work. We’ll have a session on work that Microsoft’s Academic Search is doing as an exemplar to support such efforts (though I note this is only one example of what is happening in this area, albeit an important and unusually transparent one). In related developments, Elsevier is working on machine-readable interfaces to scholarly data that are useful for scholarly content and bibliometrics. An area that has been of long-standing interest to CNI is the name management problem in the attribution of scholarship and the management of the scholarly record; this combines some of the traditional library authority work with some components of persistent identifiers, authentication, biography, bibliography and other topics. If we put a uniform author identifier system in place, it will, over time, greatly increase discoverability of the body of an author’s work, and in the work on display in the Microsoft Academic Search presentation we can see the effects of not having such a system. The ORCID initiative is currently the leading proposal for addressing this problem, and they will have a session to report on progress and invite participation from CNI attendees.

The State and University Library of Denmark and Cornell University are both developing search mechanisms that can be customized for various audiences, whether a subject specialty such as engineering or a format such as television and radio programs. Two projects will focus on specialized software to access specific types of collections: the National Library of Medicine will feature its video search tool and the University of British Columbia will describe its CONTENTdm-based interface to historical newspapers. The International Image Interoperability Framework (IIIF) is being developed by Stanford, the British Library, and other partners with the goal of providing users an unprecedented level of uniform and rich access to image based resources that currently reside in many discrete silos; this will be an opportunity to catch up on developments that I don’t think are yet widely known.

Some projects are incorporating user community participation and contribution into their systems for discovery and access, including the “What’s on the Menu?” project at the New York Public Library, which has invited the public to transcribe items in a menu collection into a structured and reusable form; we will learn about their successes and challenges. A University of Maryland project also encouraged social tagging of images by non-experts; this briefing will focus on the computational linguistic processing of the user-created metadata and on bilingual and multilingual issues in social tagging patterns.

We face a continuing need for new and better tools to illuminate user behavior. JSTOR has been amassing 15 years of log data and has been working on normalizing and organizing these data in order to understand user behavior and trends; we will have a report on this work. The Association of Research Libraries (ARL) has been working with the Ontario Council of University Libraries to assess the impact of networked electronic resources with the implementation of MINES for Libraries.

A number of sessions will provide insights into new developments related to teaching and learning and the educational process. CNI’s Joan Lippincott, along with Malcolm Brown from the EDUCAUSE Learning Initiative (ELI) and Jeanne Narum from the Learning Spaces Collaboratory (LSC), will focus on current issues and projects that address the assessment of learning spaces.

CNI has had a particular interest in programs that foster new types of literacy for the digital age. At this meeting, we have a number of sessions that address aspects of this topic. At the University of Pennsylvania and Temple, they are working with undergraduates to produce videos as course projects and seeking to understand what kinds of support and instruction are needed. McMaster University is working with a large number of students to develop their geospatial literacy skills. The University of Toronto iSchool has a certificate program in conceptual curation and pedagogy for conceptual thinking that will develop new kinds of analytical skills in future information professionals; attendees at the session will be invited to join Toronto in some collaborative work. The University of Virginia has focused some resources on moving video and audio into the scholarly mainstream. Their work, using Sakai, Drupal, and Kaltura, has been achieving success in the teaching and learning arena, and they are also working on moving media into the mainstream of scholarly communication.

Another important area in teaching and learning is the re-examination of modes of making textbooks available to students. Indiana University has launched an eText program that employs software with many of the functionalities that students have requested for digital curricular resources; we will hear an update on its implementation.

Several sessions will explore how organizational and professional roles are changing or must evolve in our current technology and institutional environments. How do we create information organizations that encourage innovation? Cornell and the University of Illinois Urbana-Champaign will challenge us to think about ideas and organizations that drive technology innovation. Virginia Tech, the University of North Texas, and the MetaArchive Cooperative will discuss their thoughts and projects related to new roles for supporting and curating digital scholarship, with some case studies as examples. The Association of College & Research Libraries will highlight findings from their recent summits that extended the work of their Value of Academic Libraries initiative; the summits included representatives who contributed perspectives from chief academic officers, senior institutional researchers, representatives from accrediting associations and higher education organizations.

We will have a session in which presenters explore overall issues related to academic and research library publishing strategies; data from a survey of ARL libraries and liberal arts colleges from the Oberlin Group will be presented. The session will also feature a specific instance of a new type of publishing venture: the Long Civil Rights Movement collaboration is a joint program of the University of North Carolina (UNC) Press, the UNC Library, and several research centers. We will hear about the usage of the collection, user behavior (including user-contributed annotations), and lessons learned.

We know our members are always interested in understanding funding opportunities for digital projects, and we will have an update session from the Institute of Museum and Library Services on their programs for 2012.

CNI was an early leader in the movement to implement electronic theses and dissertations. As this initiative has gained broader adoption, there has been concern (particularly from faculty) about whether students who provide open access to their theses and dissertations, generally via institutional repositories, will damage their prospects for publishing this work in a scholarly journal. The Networked Digital Library of Theses and Dissertations (NDLTD) 2011 publishers’ survey provides real data to challenge the assertions that an open access ETD is generally considered to constitute prior publication. This will be of interest to the many CNI institutions planning or managing an ETD program.

There is much more, and I invite you to browse the complete list of breakout sessions and their full abstracts at the CNI Web site. In many cases you will find these abstracts include pointers to reference material that you may find useful to explore prior to the session, and after the meeting we will add material from the actual presentations, including selected video recordings, when they are available to us. You can also follow the meeting via Twitter, using the hashtag #cni11f.

I look forward to seeing you in Arlington, Virginia this December for what promises to be another extremely worthwhile meeting. Please contact me (cliff@cni.org), or Joan Lippincott, CNI’s Associate Director (joan@cni.org) if we can provide you with any additional information on the meeting.

NSF Report on Future Research in the Social, Behavioral and Economic Sciences

Last fall, I shared with the CNI-announce community a call for short papers issued by the Social, Behavioral and Economic Sciences Directorate of the US National Science Foundation as part of an unusual effort to identify new research opportunities in the ten-to-twenty-year time horizon so that they could be factored into NSF’s programs. Clearly, data and computationally intensive research methods are an important part of this environment, as are social media and the new collections of evidence created by extensive use of digital technology in society.

The submitted papers became available some time ago, and we did a breakout on this at the CNI member meeting last fall. NSF has now produced a final synthesizing report on the project, titled “Rebuilding the Mosaic,” which is available for downloading, and has set up a page that integrates all the material. This can be found at:


There is also a press release with pointers to some related video materials at


Science Magazine, Dec 2, 2011, on Reproducibility in Science

The December 2, 2011 issue of Science Magazine has a series of interesting articles about the reproducibility and replicability of results in scientific research, an important driver in many of the current conversations about data management and curation in science. See


Note that some of the material requires licensed access.

Open Data, Publishing Innovations, Assessment, in Latest ‘Conversations’

Updates on issues ranging from open data to innovations in publishing to assessment are some of the topics covered in the latest report from CNI, the Nov. 28, 2011 installment of CNI Conversations.

Director Clifford Lynch reports on the US Office of Science and Technology Policy calls for comment on open data and open publications, as well as a recent paper by the European Knowledge Exchange on an action plan for research data. CNI’s associate director Joan Lippincott summarizes the Berlin 9 meeting on open access, and she discusses assessment in higher education. Upcoming meetings of interest to the community are also highlighted.

For more information about the Nov. 28 CNI Conversations, and to listen, go to http://wp.me/p1LncT-1NK

CNI Fall Mtg Line-up – Plenaries & Project Briefings

CNI’s fall membership meeting will be held Dec. 12-13 in Arlington, VA. CNI Director Clifford Lynch will provide an overview of the 2011-12 CNI Program Plan during the meeting’s opening session, and William Michener, Director of e-Science Initiatives for University Libraries at the University of New Mexico, will present the closing plenary address, “Five New Paradigms for Science and Academia and an Introduction to DataONE.”

More information about the plenary sessions, as well as a preliminary list of project briefings to be presented at the meeting is now available:


Check back frequently, as we will be adding project briefing abstracts, handouts, and a finalized schedule shortly.

Looking forward to seeing you all in December!

Data Curation Research Summit Report Available

On December 9, 2010, in conjunction with the International Digital Curation Conference in Chicago, the Graduate School of Library and Information Science at the University of Illinois Urbana-Champaign and the Purdue University Libraries organized an IMLS-funded Data Curation Research Summit. I had an opportunity to participate in this event, which covered a good deal of ground, including some focus on the interplay between scholarly publishing and data curation. The final report from this meeting is now available at


4-nation action program for research data: “A Surfboard for riding the wave”

Some CNI readers will recall a 2010 European Union report titled “Riding the Wave” that looked at high-level research data management issues. This week, the Knowledge Exchange, a 4-nation collaboration (Denmark, Germany, the Netherlands and the United Kingdom), issued a response to this titled “A Surfboard for Riding the Wave – Towards a four country action programme on research data”. I attach full details from the Knowledge Exchange press release below. The report is at


15 November 2011

Action plan on making research data accessible
Knowledge Exchange publishes the report “A Surfboard for Riding the Wave – Towards a four country action programme on research data”

The report not only offers an overview of the present activities and challenges in the field of research data in Denmark, Germany, the Netherlands and the United Kingdom but also outlines an action programme for the four countries in realising a collaborative data infrastructure. This report is a response to the Riding the Wave report which was published by the High Level Expert Group on Scientific Data. It was commissioned by the Knowledge Exchange Primary Research Data Working group and was written by Leo Waaijers and Maurits van der Graaff.

In the report four key drivers are addressed: incentives for researchers, training in relation to researchers in their role as data producers and users of information infrastructure, organisational and technical infrastructure and, finally, the funding of the infrastructure. The report offers recommendations for actions in each of these fields for the partners and others, not only in the four partner countries, but also beyond these borders.

Based on the overview of the present situation in the four Knowledge Exchange partner countries, the report formulates three long-term strategic goals:

1. Data sharing will be part of the academic culture
2. Data logistics will be an integral component of academic professional life
3. Data infrastructure will be sound, both operationally and financially.

Robert Madelin, the director general of Information Society and Media at the European Commission, remarked on the report: “The report is a very timely input. The European Commission is working on the foundations for an Open Data Strategy for Europe which will be very soon communicated to EU Member States and the European Parliament. It is extremely useful to have with this report not only an overview of what is taking place in four European countries, but of possible actions for the future taking into account the importance of open data for the European economy and society.”

The report is presented today at the Knowledge Exchange workshop “Research Data Management – Activities and Challenges” in Bonn.

The report is available for download at:


Open Repositories Conference in Edinburgh, July 2012

Here’s the preliminary announcement for Open Repositories 2012, the next in this excellent series of conferences.

The University of Edinburgh Information Services, EDINA, and the Digital Curation Centre are delighted to announce that the University of Edinburgh has been selected to host the Seventh International Conference on Open Repositories (OR12), 9-13 July 2012.

The call for proposals will be available from the conference web site soon: or2012.ed.ac.uk

The University’s George Square Campus is located in the centre of Edinburgh, a short distance from the iconic Edinburgh Castle in the Old Town and numerous attractions, venues, restaurants and pubs.

Open Repositories is run by an international steering committee of experts, and has been the pre-eminent conference for repository managers, researchers and developers to share developments across national boundaries and technical platforms since 2006. OR 2011 was hosted at the University of Texas at Austin, USA; OR 2010 was hosted in Madrid.

The theme and title of the 2012 conference at Edinburgh – Open Services for Open Content: Local In for Global Out – reflects the current move towards open content, ‘augmented content’, distributed systems, microservices and data delivery infrastructures. Kevin Ashley, Director of the Digital Curation Centre (DCC) will chair the Programme Committee.

The conference will feature both general conference sessions and user group meetings for the three main open source repository platforms: DSpace, Fedora, and EPrints.  There will also be a strand for the popular ‘Repository Fringe’, an informal, creative gathering of repository managers and developers which has been hosted at the University of Edinburgh each year since 2008 – to coincide with the internationally well known Edinburgh Fringe Festival.

Whether integrated into external research, or teaching and learning workflows, repositories form a key component to ensure that digital output within academic institutions can be accessed more widely. They are changing the nature of scholarly communication across universities, research laboratories, libraries and publishers. Repositories are now being deployed across sectors (education, research, science, cultural heritage) and at all levels (national, regional, institutional, project, lab, personal). The aim of the Open Repositories Conference is to bring those responsible for the development, implementation and management of digital repositories together with stakeholders to address theoretical, practical, and strategic issues: across the entire lifecycle of information, from the creation and management of digital content, to enabling use, re-use, and interconnection of information, and ensuring long-term preservation and archiving. The current economic climate dictates that repositories operate across administrative and disciplinary boundaries and interact with distributed computational services and social communities.

The University of Edinburgh retains a unique position in the UK’s repository landscape, serving as home to:

* The Digital Curation Centre, the UK’s leading hub of expertise and national focus for research and development into digital curation. The DCC promotes good practice and training in the management of all research outputs in digital format. See http://www.dcc.ac.uk/ for more.

* EDINA, the JISC-funded national data centre at the University, supporting all universities and colleges across the UK. EDINA delivers a range of online data services including a number of repository initiatives: Open Access Repository Junction, OpenDepot.org, and ShareGeo Open. See http://edina.ac.uk/ for more.

* The Digital Library Section and Edinburgh University Data Library serve researchers and students at the University as part of its Information Services. See http://www.ed.ac.uk/schools-departments/information-services/about/organisation
o The Data Library provides research data support for university researchers and hosts the Edinburgh DataShare repository service for researchers to deposit and share research data.
o DLS supports repositories of research publications to support the University’s Open Access Publications Policy and is currently implementing a Current Research Information System (CRIS). DLS also provides technical and administrative support to the Scottish Digital Library Consortium (SDLC), which provides repository services to universities across Scotland.

* The University’s School of Informatics supports IDEALab, a virtual laboratory that facilitates prototyping of novel applications of state-of-the-art informatics technologies, forming part of the New Institute for eResearch. See http://idea.ed.ac.uk/IDEA/Welcome.html for more.

For further information visit or2012.ed.ac.uk or email or2012@ed.ac.uk; Google Groups: http://groups.google.com/group/open-repositories

