CNI: Coalition for Networked Information

JSTOR’s Big Data Challenge: Mining Log Files to Improve Service to Users

December 7, 2011

Ronald Snyder
Director of Advanced Technologies
Ithaka – JSTOR

In the 15 years that JSTOR has been in existence, a wealth of logging data has been generated and archived, representing many billions of user actions. Until recently, these usage data were used mainly to generate summary-level institution and publisher reports. The sheer volume and complexity of the data made multidimensional, longitudinal analysis impractical. Over the last year, Ithaka has made a significant investment in normalizing and organizing these data in order to better understand user behaviors and trends in the consumption of academic materials.

This presentation will discuss the technological approach that Ithaka has taken to the data volume and complexity issues, including Big Data challenges such as storage, processing, and analysis. It will share some experiences from the original attempt to build this data warehouse using traditional relational database technologies, and the decision to abandon that approach in favor of a solution based on the open source Hadoop infrastructure. Hadoop provides a robust, scalable, and cost-effective solution for managing Ithaka’s big data. Ithaka has combined Hadoop with an open source indexing technology (Lucene/Solr) and some custom-built software to provide a Web-based tool for the interactive exploration of this rich data set. The presentation will also include some top-level observations on user behaviors and on content discovery and consumption trends that have been identified using these tools.

Handout (PDF)

Filed Under: CNI Fall 2011 Project Briefings, Digital Curation, E-Journals, Information Access & Retrieval
Tagged With: CNI2011fall, Project Briefings & Plenary Sessions

Last updated: Thursday, December 15th, 2011

 
