HathiTrust Research Center Extracted Features API and Visualization Workshop

The HathiTrust Research Center (HTRC) Extracted Features dataset is a derived dataset of volume metadata and word-level statistical data, aggregated at the page level, extracted from content in the HathiTrust Digital Library. It is distributed as JSON-LD files representing a snapshot of the HathiTrust corpus from February 2020. As part of an NEH-funded project, “Tools for Open Research and Computation with HathiTrust: Leveraging Intelligent Text Extraction (TORCHLITE),” the HTRC team is building a new API and interactive dashboard so that our user community can develop its own tools for working with data from the 17.5-million-volume HathiTrust Digital Library.

This workshop will introduce the Extracted Features dataset and the new TORCHLITE API, and set the stage for an NEH-funded hackathon in Fall 2023. The workshop includes a hands-on portion. Beginners with little to no coding experience can follow along as we use interactive code notebooks to create visualizations from Extracted Features. More advanced users can write their own code to interact with the data.

Location

1:30pm - 4:30pm
Eastern Time
Julis Romo Rabinowitz Hall - A01