How to Rebuild a Jumbo Jet at 30,000 Feet: Strategies for Digital Library Migration
In support of research, teaching and learning, the Stanford Digital Repository (SDR) is a network of systems and services that house the digital collections of Stanford University Libraries (SUL). Collections in SDR include Google-scanned books, student dissertations and theses, University Archives, Allen Ginsberg’s papers, Parker Library, and the Fugitive U.S. Agencies Web Archive, to name a handful. As of late 2022, SDR holds over 5 million digital objects composed of more than 530 million content files. SDR is highly heterogeneous along several facets, including content types (e.g., books, images, web archives, GIS datasets) and file types. Over the past four years, SUL worked to migrate SDR to a new datastore and data model, and we successfully completed the migration this year. In this presentation, we will describe the motivations for this work and the strategies used to accomplish the migration. These strategies may be reusable in other production system migrations: adopting a validatable data model, abstracting the datastore behind an API, separating concerns, testing metadata mappings against production, writing reports to understand complex data, templating unit tests, performing a rolling migration, and incorporating migration into ongoing project work.