Absolute Relevance? Ranking in the Scholarly Domain

Tamar Sadeh
Director of Marketing
Ex Libris Group

The greatest challenge for discovery systems is how to provide users with the most relevant search results, given the immense landscape of available content. Just as two people in conversation adjust to each other in tone, language, and subject matter, discovery systems would ideally be sophisticated and flexible enough to adapt their algorithms to individual users and their information needs. When evaluating the relevance of an item to a specific user in a specific context, relevance-ranking algorithms must consider not only how well the item matches the query but also information that is not embodied in the item itself.

Such information, which includes the item’s scholarly value, the type of search that the user is conducting (e.g., an exploratory search or a known-item search), and other factors, enables a discovery system to fulfill user expectations that have been shaped by experience with Web search engines. This session will focus on the challenges of developing and evaluating relevance-ranking algorithms for the scholarly domain. Examples will be drawn mainly from the relevance-ranking technology deployed by the Ex Libris Primo discovery solution.

 Presentation (PDF)

Advances in Discovery: An EBSCO Service

Michael Gorrell
Executive Vice President, CIO
EBSCO Publishing

Discovery services have become a key element of libraries’ efforts to help their patrons meet their research needs. Harvesting and indexing millions of scholarly journal articles, books, biographies, reviews, and a vast array of other content types from thousands of sources, allowing users to find the best matches for their needs, and presenting this information in a clear and understandable way is a tall order. Challenges include determining the relevance of search results, giving users ways to understand the depth and breadth of the collection being searched, and ensuring overall site usability. EBSCO has taken a data-driven approach to these problems, testing various aspects of its Discovery Service and applying other data-mining techniques. This session will describe the methodologies used and the ways in which the service has evolved as a result of these efforts.

 Handout (PDF)

Archiving Large Swaths of User-Contributed Digital Content: Lessons from Archiving the Occupy Movement

Howard Besser
Director, Moving Image Archiving & Preservation MA Program
New York University

David Millman
Director, Digital Library Technology Services
New York University

Sharon M. Leon
Director of Public Projects, Center for History & New Media
George Mason University

Archiving born-digital content from the “Occupy” movement can serve as a prototype for archiving all kinds of user-contributed content. In this presentation, several organizations will discuss the tools and methods they have developed for ingesting, preserving, and offering discovery services for large numbers of digital works when contributors cannot be relied on to follow standards or assign metadata. Topics will range from automatic extraction of time-stamp and location metadata (and an empirical analysis of which upload services strip these out), to app development for uploading content along with permission forms, to maintaining lists of frequently changing URL nodes for web crawling, to issues in educating content creators in best practices. Speakers will also discuss the issues involved in trying to document a social movement while it is happening.



Presentation (Besser PPT)
Presentation (Millman PPT)
 (Leon PDF)
Presentation (Hanna PPTX)

Building the Grateful Dead Archive Online: The Golden Road to Unlimited Devotion

Virginia Steel
University Librarian
University of California, Santa Cruz
Robin Chandler
Project Manager
University of California, Santa Cruz

The University of California, Santa Cruz (UCSC) Libraries, recipient of a two-year Institute of Museum and Library Services (IMLS) grant in 2009, is building the socially constructed Grateful Dead Archive Online (GDAO) website using the open-source Omeka software. The Grateful Dead Archive (GDA) represents one of the most significant popular culture collections of the 20th century and documents the band’s activity and influence in contemporary music from 1965 to 1995.

Donated to the UCSC Library in 2008, the GDA contains over 600 linear feet of material, including business records, photographs, posters, fan envelopes, tickets, video, audio (oral histories and interviews), and three-dimensional objects such as stage props and band merchandise. With the release of GDAO in July 2012, the Archive will begin actively collecting artifacts from an enthusiastic community of Grateful Dead fans.

This presentation will discuss the donation of the collection to UCSC; the challenges of merging a traditional archive with a socially constructed one; rights-clearance issues and the intellectual property strategy; the crawling and harvesting strategies used to collect web resources; plugins and workflows supporting data exchange between CONTENTdm and Omeka; and integrating “the crowd” into the curation of user-submitted content preserved in the California Digital Library’s Merritt repository. Future directions will also be discussed, such as the integration or development of better curation tools and what the Libraries hope to learn from opening the archive to contributions from a large community of fans.



Handout (PDF)
Presentation (PDF)

The California Digital Library and the Public Knowledge Project Partnership: A New Model of Collaborative Institutional Repository Publishing Services Development

Lisa Schiff
Technical Lead, Access & Publishing Services
California Digital Library 
Brian Owen
Associate University Librarian
Simon Fraser University 
Catherine Mitchell
Director, Access & Publishing Services
California Digital Library

The California Digital Library (CDL) and the Public Knowledge Project (PKP) have recently joined forces, with the CDL signing on as a major PKP development partner. This relationship grew out of CDL’s recent work to incorporate a customized version of PKP’s Open Journal Systems (OJS) into the back-end submission and publishing system for eScholarship, the University of California’s open access institutional repository and publishing platform. This development work marks an important step toward fully integrated, open-source institutional repository and journal publication services, and the CDL and PKP have ambitious plans for extending this work to the larger PKP community. This panel will describe:

• How OJS was customized to meet the needs of eScholarship journals (including user interface modifications, the extension of a single OJS instance to support almost 50 independent journals, PDF generation, and more)
• Which of these and other features may be available in a future release of OJS as a result of this new partnership
• How the PKP development partnership program is shaping the direction of OJS and other PKP scholarly communication services
• How the relationship with PKP is likely to affect future development and service directions for eScholarship
• How this work fits into the larger effort of both of these organizations to refine their services in support of new practices and opportunities within the scholarly publishing environment



Presentation (PowerPoint)

Changing for Excellence: Libraries and IT Experience with Campus Consultants

Deborah Ludwig
Assistant Dean
University of Kansas
Paul Farran
Chief of Staff, Information Technology
University of Kansas

In 2011, the University of Kansas hired the consulting firm Huron to undertake an intensive review of how the University conducts business and to help the institution identify ways to increase its effectiveness. This effort, “Changing for Excellence” (CFE), encompassed the campuses in Lawrence, Kansas City, and Wichita and was divided into three phases, in all of which Information Technology (IT) and the Libraries have been involved: Assessment and Analysis, Business Case Development, and Implementation.

This presentation will describe the business cases developed in both IT and the Libraries, the work with the consultants and campus stakeholders, the movement forward into implementation, and the connections between CFE, the University’s strategic planning effort (Bold Aspirations), and possible organizational transformation.



Competing Priorities: Sustainability, Growth, and Innovation in Digital Collections

Jennifer Riley
Head, Carolina Digital Library and Archives
University of North Carolina at Chapel Hill

Academic libraries with digital collections programs face the difficult task of simultaneously growing capacity, promoting sustainability, leaving room for innovation, succeeding in an environment of limited and, in many cases, shrinking resources, and giving staff the tools they need to contribute effectively to technology initiatives. The University Library at the University of North Carolina at Chapel Hill is embarking on a number of efforts designed to support the transition of digital library work from project to program, a transition that will allow the Library to better address these competing needs.

This presentation will introduce two major aspects of the work in this area. First, in early 2012, a new process was implemented for allocating library technology resources and selecting projects that require technology support. Through careful design and ongoing assessment, this process should significantly strengthen the Library’s efforts to promote a culture of solid planning and accountability, reduce the uncertainty that has historically caused inaction and missed opportunities, clearly determine and communicate ongoing support models, and reinforce the Library’s confidence in its ability to live up to the commitments it makes. A key part of this new proposal, review, and approval process is the role of staff from the Library’s Carolina Digital Library and Archives (CDLA) department as project facilitators, who offer planning and coordination services where necessary to give technology proposals the best chance of success.

The second area of work to be presented in this session involves the early planning stages of a new technical infrastructure that will provide coordinated, shared support for digital collections, digital humanities, and institutional repository work. The plan is to prioritize digital library initiatives that help build out and implement this infrastructure, reduce ‘siloization’ and promote reuse of content in multiple environments, increase the efficiency and sustainability of curated digital collections, and provide tiered access and preservation services. This shift could reframe development work on the Carolina Digital Repository, giving it a more central role in the University’s digital library efforts.

Presentation (PowerPoint)

Creating the Digital Preservation Network

James Hilton
Vice President and CIO
University of Virginia

The Digital Preservation Network (DPN) will address risks to the very long-term preservation of the scholarly record by creating a federated approach to preserving academic content. The DPN ecosystem ensures reliable, long-term digital preservation through a federated network of diverse, non-overlapping preservation strategies sustained by committed institutions of higher education. This diversity mitigates the threat that a single point of failure (organizational, technical, physical, or political) could jeopardize centuries of scholarship. This presentation will discuss progress toward the launch of DPN and opportunities to engage in the effort.



Curation Practices for Born-Digital and Digitized Newspaper Collections

Martin Halbert
Dean of Libraries
University of North Texas


Katherine Skinner
Executive Director
Educopia Institute


Tyler Walters
Dean of University Libraries
Virginia Polytechnic Institute and State University

This briefing will highlight and discuss the early findings of a National Endowment for the Humanities-funded project hosted by the Educopia Institute that is documenting and modeling the use of data preparation techniques and distributed digital preservation frameworks to collaboratively preserve digitized and born-digital newspaper collections. US libraries and archives have been digitizing newspapers since the mid-1990s using a highly diverse and ever-evolving set of encoding practices, metadata schemas, formats, and file structures. Increasingly, they are also acquiring born-digital newspapers in an array of non-standardized formats, including websites, production masters, and e-prints. This content genre is of great value to scholars and researchers in the humanities, and it is in critical need of preservation attention.

This project is exploring how existing standards (including the National Digital Newspaper Program’s digitization standards) may be elaborated upon and applied to foster the preservation readiness of collections from the last two decades that were digitized according to evolving standards, as well as the born-digital content that institutions are steadily acquiring. This project is also documenting how curators can effectively exchange their preservation-ready content across repository systems, focusing on the use of distributed digital preservation (DDP), a collaborative approach in which content is exchanged and replicated across multiple sites, and actively monitored using various network-driven technologies (e.g., LOCKSS, iRODS, CODA).

This briefing will share initial project results, including the following:
1. A “state of the field” report (based on surveys conducted by the researchers) on the challenging collections with which academic libraries are contending, including legacy content from more than two decades of digitization and a wide range of born-digital content; and
2. Preliminary recommendations regarding what type and level of preservation preparation for these diverse newspaper collections might be considered essential, and what type and level might be considered optimal.