CNI: Coalition for Networked Information

The Use of Machine Learning Techniques for Performing Topic Modeling and Topic Identification on Bibliographic Datasets

December 1, 2021

William Mischo
Head, Grainger Engineering Library Information Center; Berthold Family Head Emeritus in Information Access and Discovery
University of Illinois at Urbana-Champaign

Libraries have been exploring applications of artificial intelligence (AI) and machine learning (ML) across a variety of library services, including through several grant projects and institutional initiatives. Libraries have the tools to generate large bibliographic metadata datasets from analytics and insights APIs and repository APIs, and to apply ML techniques such as clustering, classification, regression, and dimensionality reduction to those datasets. The University of Illinois at Urbana-Champaign Library has been investigating the use of ML techniques for text mining, image analysis, and topic modeling in several functional areas. In particular, the Library has been developing an API-based literature retrieval service that generates a custom database for the user and includes a topic modeling component to identify key concepts in the user’s dataset. The topic modeling service would be part of the Library's broad offering of bibliometric services.

ML has been hyped and promoted for a number of years, and several famous failures have been documented. This project briefing will discuss the usefulness of ML document clustering techniques in topic modeling; the issues surrounding bag-of-words versus text-phrase approaches to vectorization and similarity measures; standard clustering algorithms; and the role of the library as a testbed for the development of responsible ML activities.

Elisandro (Alex) Cabada, Interim Head, Mathematics Library; Medical and Bioengineering Librarian at the University of Illinois at Urbana-Champaign, is a collaborator on this project.
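To make the vectorization and clustering ideas in the abstract concrete, here is a minimal sketch in plain Python. It is not the Library's implementation: the four sample titles, the function names (`tokenize`, `tfidf_vectors`, `cosine`, `kmeans`, `top_terms`), the smoothed IDF weighting, and the hand-picked seed documents are all assumptions made for illustration. It contrasts a bag-of-words representation (`ngram=1`) with a two-word phrase representation (`ngram=2`), compares documents by cosine similarity, and groups them with a simple k-means loop.

```python
import math
from collections import Counter

def tokenize(text, ngram=1):
    """Lowercase and split on whitespace; ngram=1 gives a bag of words,
    ngram=2 gives overlapping two-word phrases."""
    words = [w.strip(".,;:()").lower() for w in text.split()]
    words = [w for w in words if w]
    return [" ".join(words[i:i + ngram]) for i in range(len(words) - ngram + 1)]

def tfidf_vectors(docs, ngram=1):
    """Return one sparse {term: weight} dict per document (smoothed IDF)."""
    counts = [Counter(tokenize(d, ngram)) for d in docs]
    df = Counter(term for c in counts for term in c)  # document frequency
    n = len(docs)
    return [
        {t: tf * (math.log((1 + n) / (1 + df[t])) + 1) for t, tf in c.items()}
        for c in counts
    ]

def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def centroid(vectors):
    """Average a list of sparse vectors."""
    total = Counter()
    for v in vectors:
        for t, w in v.items():
            total[t] += w
    return {t: w / len(vectors) for t, w in total.items()}

def kmeans(vectors, seeds, iterations=10):
    """k-means using cosine similarity; `seeds` are indices of the documents
    used as initial centroids (chosen by hand here, an assumption)."""
    centroids = [dict(vectors[i]) for i in seeds]
    labels = []
    for _ in range(iterations):
        labels = [
            max(range(len(centroids)), key=lambda k: cosine(v, centroids[k]))
            for v in vectors
        ]
        for k in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == k]
            if members:
                centroids[k] = centroid(members)
    return labels, centroids

def top_terms(vec, n=3):
    """Highest-weighted terms of a centroid, as a rough topic label."""
    return [t for t, _ in sorted(vec.items(), key=lambda kv: (-kv[1], kv[0]))[:n]]

# Hypothetical bibliographic titles, invented for this example.
docs = [
    "Deep learning for image classification",
    "Convolutional networks for image recognition",
    "Topic modeling of library catalog records",
    "Clustering library metadata for topic discovery",
]

word_vecs = tfidf_vectors(docs, ngram=1)    # bag of words
phrase_vecs = tfidf_vectors(docs, ngram=2)  # two-word phrases

labels, centroids = kmeans(word_vecs, seeds=(0, 2))
for k, c in enumerate(centroids):
    print(k, top_terms(c))
print("labels:", labels)
print("phrase similarity, docs 2 and 3:", cosine(phrase_vecs[2], phrase_vecs[3]))
```

With these sample titles, the bag-of-words vectors place the two image-related titles in one cluster and the two library-related titles in the other. The phrase vectors are sparser: the two library titles share the words "topic" and "library" but no two-word phrase, so their phrase-based similarity is zero. That is one small illustration of the vectorization issues the briefing mentions. A production service would more likely rely on an established ML library and add stop-word removal.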

Filed Under: CNI Fall 2021 Project Briefings, Emerging Technologies, Project Briefing Pages, User Services
Tagged With: cni2021fall, Project Briefings & Plenary Sessions, Videos

Last updated:  Monday, July 25th, 2022

 
