Duplicates in the repository: remediation and reconciliation in three systems, including DataCite

Academic Commons provides long-term open access to digital scholarship produced by Columbia University affiliates. Content may be added by authors through a self-deposit form, by library staff through the cataloging backend (Hyacinth), and via SWORD deposit from sources such as library-hosted OJS journals and journal publishers. As one might expect, after fifteen years of additions through these various channels, duplication happens! When faced with a corpus of nearly 40,000 records that must be reviewed, with duplicates remediated in three separate systems, how does one even start? This poster illustrates our approach to defining and scoping the problem, as well as the project workflows and technical solutions we used to remediate approximately 300 duplicate item records and 600 associated asset records.
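As one illustration of how a review of this kind might begin, the sketch below groups records by a normalized title to flag possible duplicates for human review. It is not the project's actual method, which the abstract does not describe; the record fields (`id`, `title`), the sample data, and the function names are all hypothetical.

```python
import re
import unicodedata
from collections import defaultdict


def normalize_title(title):
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    text = unicodedata.normalize("NFKD", title)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9]+", " ", text.lower())
    return text.strip()


def find_duplicate_candidates(records):
    """Group record ids by normalized title; keep groups with more than one id."""
    groups = defaultdict(list)
    for record in records:
        groups[normalize_title(record["title"])].append(record["id"])
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


# Hypothetical sample records; a real export would come from the repository index.
records = [
    {"id": "rec-1", "title": "Open Access and the Library"},
    {"id": "rec-2", "title": "Open access and the library."},
    {"id": "rec-3", "title": "A Different Study"},
]

candidates = find_duplicate_candidates(records)
for title_key, ids in candidates.items():
    print(title_key, ids)
```

Matching on title alone will produce false positives (for example, distinct works that share a generic title), so a pass like this can only produce a shortlist for manual confirmation, not a list of records to merge.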

Technologies: Fedora, Solr, Rails, Python, DataCite

Presenter(s): Sunni Wong, Esther Jackson, Frederic Duby, Kathryn Pope, Pratt Institute School of Information (Columbia University Libraries, Ask a Librarian Internship), Columbia University Libraries


3:30 PM
Frist Lobby