Smithsonian Collections Blog


Tuesday, December 17, 2019

The Smithsonian’s Journey of Computerized Library and Archives (2010-2019)

Read Part I: The First Integrated Library System
Read Part II: Stepping Outside of the Box


Contributing to the start of Digital Public Library of America (DPLA)

The idea of a national digital library had been circulating among librarians, scholars, educators, and private industry representatives in the United States since the early 1990s. The DPLA planning process began in October 2010 at a meeting in Cambridge, Massachusetts. During this meeting, 40 leaders from libraries, foundations, academia, and technology projects agreed to work together to create an open and distributed network of comprehensive online resources drawn from U.S. libraries, archives, universities, and museums.

The planning team solicited ideas for how this open platform should work and received hundreds of responses from around the country. Martin Kalfatovic of Smithsonian Libraries approached me for possible ideas. We decided to work with the Library of Congress (LC) and the National Archives and Records Administration (NARA) to develop a joint proposal that suggested using the Smithsonian Collections Search Center for DPLA. We took about 100 MARC records from LC and about 20 MARC records from NARA and, using the Smithsonian Index Metadata model, imported them into the Smithsonian Collections Search Center test system.

The LC and NARA records worked well alongside the Smithsonian records in our Collections Search Center without great effort. This success affirmed the importance of developing a system built on stringent data standards. We were among the top six submissions selected by the DPLA planning committee for a final open presentation hosted at NARA.

In December 2010, the Berkman Klein Center for Internet & Society at Harvard University convened leading experts in libraries, technology, law, and education and began work on this ambitious project. I took part in this meeting and worked alongside those who contributed their knowledge to a brand new DPLA system. DPLA was launched in April 2013 with contributions from a small number of universities, libraries, and museums. The Smithsonian contributed tens of thousands of library and archives records to the initial launch and supported the Creative Commons CC0 (“No Rights Reserved”) designation for metadata records. To expand Smithsonian participation, Library Director Nancy Gwinn reached out to many Smithsonian museum directors for support. Today, the Smithsonian Institution contributes 3.6 million records to DPLA monthly. This number is expected to increase over time.

Increasing Access and Public Engagement through the Smithsonian Transcription Center

Two examples of Transcription Center pages

Even though the Smithsonian had millions of collections and historical documents online, accessibility remained a challenge. Many digitized materials were viewable but not easily searchable, many handwritten materials were difficult to read and understand, and many collections were not online at all. By creating a transcription center, we could address some of these limitations and support better discovery across disciplines Smithsonian-wide.

We began planning and developing the new software platform in 2012. We worked closely with several archival unit partners to launch the Transcription Center on June 15, 2013.

The Transcription Center was designed to support various object types and material formats, including those held not just by libraries and archives but by museums as well. Object types included diaries, field books, correspondence, currency bank notes, sound recordings, photo albums, botanical specimen labels, cataloging sheets, joke index cards, and more. The project crowdsources both transcription and review to public volunteers, while giving Smithsonian staff the option to conduct final approval before the record is posted online. We have discovered that because our digital volunteers produce such accurate transcriptions, some staff find no need to review the work! Transcribed contents are immediately searchable, displayed online in the Collections Search Center, and downloadable as PDF files. Public participation has been phenomenal, with 13,890 digital volunteers and 496,300 pages transcribed as of December 2019, including the creation of 130,755 catalog records that were previously unavailable to the public.

Transcribed text displayed automatically alongside the corresponding image

A full-time project coordinator is on staff to ensure timely communication between the Smithsonian and the public via email and social media platforms. Meghan Ferriter, Andres Almeida, and Caitlin Haynes have served successively as coordinators. It is very important that our volunteers feel connected to the Smithsonian Institution and that their contributions are recognized and greatly appreciated. We express our appreciation by crediting the volunteers in the transcribed records and the PDF files.

The Transcription Center is more than a website for transcribing historic documents. It is a platform for us to increase our public outreach and engagement. Being able to interact with our volunteers was one of the most rewarding aspects of this project. Many of our dedicated volunteers continue to make huge progress day after day. They not only transcribe thousands of pages but also go above and beyond by sharing knowledge and enhancing Smithsonian collections with additional information entered in the note fields on each page.

The Transcription Center has also become a useful tool for the social media managers of individual Smithsonian museums. Many of them have shared stories uncovered in the Transcription Center in their outreach campaigns, and they want to continue the relationship with the Transcription Center in the future. This digital platform demonstrates how transcription work can not only create and diffuse knowledge but also build a strong community between digital volunteers and the Smithsonian.

Smithsonian Online Virtual Archives (SOVA) and ArchivesSpace

Archives by their very nature are different from libraries: most libraries hold individual items such as books and journals, while archival collections contain multiple records that are unique, interrelated, and often arranged in a nested hierarchical structure. A library system cannot adequately support archival needs. Increasingly, archival staff began calling for a system that specifically addressed the unique needs of archives. To remedy this problem, a Mellon Foundation grant started software development of The Archivists’ Toolkit™ system in 2006. It was the first open source archival data management system to provide broad, integrated support for the management of archives.

The Smithsonian began to experiment with The Archivists’ Toolkit™ (AT) software system in 2011.  First, we migrated 7,000 MARC records into the new AT system. This allowed for series, box and folder level information to be managed hierarchically within an information management system.  The AT system was superseded in late 2013 by ArchivesSpace.  Both open source software systems allowed archivists to manage archives using a collection-centric approach.

The Internal Smithsonian ArchivesSpace Collections Management System

A difference between an item-centric management approach and a collection-centric approach is that archival materials need finding aids, which increase accessibility to the collection. At the time, the Smithsonian’s fourteen archival units had different understandings of and approaches to description and management. Barbara Aikens, Head of Collections Processing at the Archives of American Art, took the lead in eliminating these inconsistencies. She wrote five internal and external grants totaling $499,900 between 2010 and 2016 on behalf of the Smithsonian archival units. The grants allowed the Smithsonian to conduct a pan-institutional Encoded Archival Description (EAD) Gap Analysis Study and hire EAD Metadata Coordinators to focus on content creation and support. These coordinators, Mark Custer and Nancy Kennedy, proved essential to this project. Managed by OCIO LASSB, they developed and managed an EAD implementation plan for each Smithsonian archival unit. This work included helping units convert legacy finding aids to EAD standards, answering questions, and resolving problems as they arose throughout the system’s implementation. Meanwhile, we collaborated with all of the archival units and supported their backlog processing projects. The migration from the Horizon MARC system to ArchivesSpace was a very complex process that required folding 400,000 flat records into hierarchical EAD finding aids. In the end, archives across the institution created 16,800 new EAD finding aids in ArchivesSpace, and the quality of content description at the Smithsonian improved dramatically.
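
To give a rough sense of what folding flat records into a hierarchy involves, here is a minimal Python sketch. It is not the migration code we used; the field names (`collection`, `series`, `box`, `folder`, `title`) and the sample records are hypothetical, and a real migration would produce EAD rather than a nested dictionary.

```python
# Illustrative only: field names and sample data are hypothetical,
# not the Horizon MARC schema or the ArchivesSpace data model.

def fold_records(records):
    """Group flat item records into collection > series > box > folder."""
    tree = {}
    for rec in records:
        series = tree.setdefault(rec["collection"], {}).setdefault(rec["series"], {})
        folder = series.setdefault(rec["box"], {}).setdefault(rec["folder"], [])
        folder.append(rec["title"])
    return tree


def outline(tree):
    """Render the nested structure as indented lines, one per level."""
    lines = []
    for collection, series_map in tree.items():
        lines.append(collection)
        for series, boxes in series_map.items():
            lines.append(f"  {series}")
            for box, folders in boxes.items():
                lines.append(f"    Box {box}")
                for folder, titles in folders.items():
                    lines.append(f"      Folder {folder}: {'; '.join(titles)}")
    return lines


flat_records = [
    {"collection": "Example Family Papers", "series": "Series 1: Correspondence",
     "box": "1", "folder": "1", "title": "Letters, 1901"},
    {"collection": "Example Family Papers", "series": "Series 1: Correspondence",
     "box": "1", "folder": "1", "title": "Letters, 1902"},
    {"collection": "Example Family Papers", "series": "Series 2: Photographs",
     "box": "2", "folder": "1", "title": "Portraits"},
]

tree = fold_records(flat_records)
print("\n".join(outline(tree)))
```

Each flat record carries its own collection, series, box, and folder labels; folding simply groups records that share those labels so that the shared context is stated once at the appropriate level, which is the basic shape of a finding aid.
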
Record Display Supporting Hierarchy Levels of Collection, Series, Box and Folder in SOVA

While the quality and quantity of the archival descriptions grew exponentially, public searching and display remained a challenge. To remedy this, we focused on developing the Smithsonian Online Virtual Archives (SOVA), which launched in October 2015. For the first time, the new system allowed users to search archival materials at all levels (collections, series, and items), enabled the visual display of the hierarchy of the record, and dynamically generated EAD online finding aids for nearly 17,000 collections, many of which contain tens of thousands of individual items in multiple series, boxes, and folders.

Screenshots of the search result page with tabs to browse by collections, sub-series or digital items

The ArchivesSpace and SOVA systems enable a digital workflow that is organized, systematic, and scalable, and that incorporates the digitization process. In addition, the Smithsonian Digital Asset Management System (DAMS) supports seamless metadata synchronization and media file links among all systems. These systems guide archivists as they catalog, digitize, store, link, and share digital images, allowing information systems to keep track of large volumes of media files every step of the way. Today, there are over 6.7 million image, sound, and video electronic resources in SOVA.

In 2019, the Smithsonian received accolades for its SOVA and Collections Search Center (CSC).  An online survey, conducted during a NEA-funded workshop on developing in-house collections management systems and online discovery portals, asked professionals to name their favorite online aggregate search center. The Smithsonian’s CSC and SOVA systems were voted the best among all LAM (libraries, archives, and museums) institutions.

Final Thoughts

It has been a long and rewarding journey in the fields of information management and public service. Knowing the history of the Smithsonian’s transformative information management systems and efforts gives us valuable insight that we can use as we work toward new goals and solve new challenges. The Smithsonian Institution will continue to push forward to support research, education, and public service by increasing its mass digitization efforts. With tens of millions of collections online already, we look forward to making Open Access our next major milestone in 2020.


Ching-hsien Wang, Branch Manager
Library and Archives Systems Support Branch (LASSB)
Office of the Chief Information Officer
