In May 2011, John Palfrey, then Chairman of the Steering Committee of the Digital Public Library of America (DPLA), put out a call for a Beta Sprint of ideas for a new common platform called DPLA. By June 2011, the committee had received 60 enthusiastic responses. The Smithsonian was among the 60 submitters and took part in the Beta Sprint contest to show our idea of what a DPLA could be.
At the time, the Smithsonian was running a new Collections Search Center system that allowed cross-searching of millions of records from the Smithsonian’s libraries, archives and museums. Having gone through a project like this before, we thought we might submit a similar system architecture to the DPLA contest. The Smithsonian partnered with the Library of Congress and the National Archives to pitch the idea. The Smithsonian’s role was to provide a demo system that would host and search records from the three parties, and the roles of the Library of Congress and the National Archives were to provide selected records for use in this demo system. In the end, an external review committee selected our joint proposal as one of the top six ideas. On October 21, 2011, the “Big Tent Meeting” was hosted at the National Archives in Washington, DC. The six finalists were invited to give live presentations, and the event was also webcast live to the public.
My fellow team members from the Collections Systems & Digital Assets Division worked together to create a dedicated system which contained the existing Smithsonian data, and we planned on ingesting data from the Library of Congress and the National Archives into it. This is one of the presentation slides where we proposed the system architecture.
As part of this demonstration, we wanted to highlight the fact that records from different organizations could work well together. Both the Library of Congress and the National Archives sent records that they could produce in a short amount of time. Among them were catalog records of photographs, personal letters, and music manuscripts. These items told stories of Civil War veterans, the Union Pacific Railroad, and musical history, and gave insight into the lives of many famous people from American history. Because the National Archives had a proprietary system, it was not easy for them to produce records in MARC (Machine-Readable Cataloging) format. It took some hand coding to produce these archival records in MARC. Though the Smithsonian system did not require records to be stored in this format, using MARC enabled us to standardize our starting point. This also made the point that even though our data could come from different places, we needed a standard format to create the data consistency that a common system requires to work well.
We quickly mapped the two record sets from MARC to the Smithsonian EDAN (Enterprise Digital Asset Network) data format. After the initial data ingest into the Smithsonian system, we matched these records from the Library of Congress and the National Archives with the Smithsonian data. Even though the two record sets comprised fewer than 200 records, exciting results appeared immediately. For example, the Library of Congress’s photographs of “Civil War veterans” came up in searches alongside Smithsonian records of sculptures, paintings, and photographs on the same topic. The National Archives’ photographs of “railroad trains” matched with Smithsonian photographs, trade catalogs, postcards and posters. The National Archives’ letter written by “Rose Greenhow” matched with multiple Smithsonian photographs of Rose Greenhow and a book about her life. The Library of Congress’s letter by Johannes Brahms matched with the Smithsonian’s photographs of Johannes Brahms. The following are some of the examples we used in our presentation.
This experiment provided evidence that the concept of DPLA would work very well. Even though these records had never been on the same system before, this preliminary experiment worked immediately; the standard metadata and proper vocabulary control used in these records were the key to success. These records all used Library of Congress subject headings and Form and Genre terms, and all records contained properly formulated name headings. The system architecture proposed to the Beta Sprint proved to be robust and able to handle dynamic situations with very different records.
The other presentations at the Beta Sprint also showed strong ideas and made great technical points. We were honored to present alongside some truly great peers. The Smithsonian Libraries played a key role and were a great partner in the DPLA project. With their support, the Smithsonian became an early DPLA Content Hub contributing 1.25 million records monthly. Collections consist of staff publications and digitized books from the libraries; photographs, manuscripts, interview and diary records from the archives; and scientific specimens, historical objects and art collections from the museums. In return, DPLA sends about 230,000 visitors a year back to the Smithsonian Collections Search Center, broadening the audience for our collections.
This is a win-win project for all, and we encourage more libraries, archives and museums to join this great national project!
Ching-hsien Wang, Project Manager
Collections Systems & Digital Assets Division
Office of the Chief Information Officer