Beth Sandore Namachchivaya
University of Waterloo
Digital text corpora and text data mining (TDM) tools are enabling new discoveries through computational analytics. A high percentage of the texts, however, are protected by copyright or subject to license agreements that limit access and use. These restrictions can complicate a researcher’s efforts to access texts and perform computational analysis, as well as to communicate the output and related methods to a broader audience. Increasingly, libraries act as intermediaries between content providers and scholars to facilitate access to text datasets. Still, the process of interpreting or obtaining rights to perform computational analysis is arduous, and the results of the research often cannot be documented well enough to support reproducibility in a scholarly climate focused on evidence. The perceived high barrier to entry for TDM can lead to one of two outcomes: a scholar abandons the research project, or moves ahead using unsanctioned approaches, such as screen-scraping, to assemble and mine the corpus. This project briefing provides an update on a project, funded by the Institute of Museum and Library Services, that convened a national forum of key stakeholders to develop a research and implementation agenda for libraries that work with scholars and content providers to enable streamlined access to copyrighted and licensed texts for data mining research. In particular, we focus on the perspectives and the SWOT (Strengths, Weaknesses, Opportunities, and Threats) analyses provided by the National Forum attendees.
Bertram Ludäscher, PI, School of Information Sciences, University of Illinois at Urbana-Champaign
Beth Sandore Namachchivaya, co-PI, University Library, University of Waterloo
Megan Senseney, co-PI, School of Information Sciences, University of Illinois at Urbana-Champaign
Eleanor Dickson, University Library, University of Illinois at Urbana-Champaign