William Mischo
Head, Grainger Engineering Library Information Center; Berthold Family Head Emeritus in Information Access and Discovery
University of Illinois at Urbana-Champaign
Libraries have been exploring the application of artificial intelligence (AI) and machine learning (ML) technologies across a variety of library services, supported by several grant projects and institutional initiatives. Libraries have the tools to generate large bibliographic metadata datasets from analytics and insights APIs and repository APIs, and to apply ML techniques such as clustering, classification, regression, and dimension reduction to these datasets. The University of Illinois at Urbana-Champaign Library has been investigating the use of ML techniques in text mining, image analysis, and topic modeling in several functional areas. In particular, the Library has been developing an API-based literature retrieval service that generates a custom database for the user and includes a topic modeling component to identify key concepts in the user’s dataset. The topic modeling service would be part of our broad offering of bibliometric services. ML has been hyped and promoted for a number of years, and several famous failures have been documented. This project briefing will discuss: the usefulness of ML document clustering techniques in topic modeling; the issues surrounding the bag of words vs. text phrase approaches for vectorization and similarity measures; standard clustering algorithms; and the role of the library as a testbed for the development of responsible ML activities. Elisandro (Alex) Cabada, Interim Head, Mathematics Library; Medical and Bioengineering Librarian at the University of Illinois at Urbana-Champaign, is a collaborator on this project.
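The bag of words vs. text phrase question mentioned above can be illustrated with a minimal sketch. This is not the Library's implementation: the function names, the whitespace tokenizer, the choice of two-word phrases, and the two sample titles are all assumptions made for illustration, and cosine similarity stands in for whatever similarity measure the project actually uses.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive tokenizer (assumption): lowercase and split on whitespace.
    return text.lower().split()


def bag_of_words(text):
    # Count single words; word order is discarded.
    return Counter(tokenize(text))


def bag_of_phrases(text, n=2):
    # Count overlapping n-word phrases; local word order is preserved.
    tokens = tokenize(text)
    return Counter(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def cosine_similarity(a, b):
    # Cosine of the angle between two sparse count vectors.
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical sample titles: same words, different order.
doc_a = "machine learning for library services"
doc_b = "library services for machine learning"

word_sim = cosine_similarity(bag_of_words(doc_a), bag_of_words(doc_b))
phrase_sim = cosine_similarity(bag_of_phrases(doc_a), bag_of_phrases(doc_b))

print(f"bag of words similarity:   {word_sim:.2f}")
print(f"bag of phrases similarity: {phrase_sim:.2f}")
```

Because the two sample titles contain exactly the same words, the bag of words vectors are identical and the documents look like duplicates, while the phrase-based vectors share only two of their four phrases and the similarity drops to one half. A clustering algorithm fed these vectors would therefore group documents differently depending on which vectorization is chosen.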