Smithsonian Collections Blog

Highlighting the hidden treasures from over 2 million collections

Collections Search Center

Tuesday, February 11, 2020

From Textile Mills to Seventeen Magazine: Exploring the History of American Girlhood through Transcription Center Projects

Far from sitting quietly on the sidelines, American girls have been on the frontlines of political, cultural, and social change. A new signature exhibit, Girlhood (It's complicated), opening at the National Museum of American History on June 12, 2020 as part of the Smithsonian's American Women's History Initiative, explores the diverse and complex stories of girlhood in our nation's history. The exhibit will tour the country through the Smithsonian Institution Traveling Exhibition Service from 2023 through 2025.

Skateboard designed by Cindy Whitehead, a member of the skate team Sims in the late 1970s and early 1980s, who continues to be a vocal advocate for women in the sport. Courtesy National Museum of American History, 2019.0092.01.

In collaboration with educators, archivists, and museum collections staff from across the Smithsonian, we're joining the effort to celebrate and highlight the stories of American girls. Through Transcription Center projects, both ongoing and completed, we're inviting volunteers of all ages to help us discover and share a diverse set of experiences and representations of girlhood throughout history, enriching the content and knowledge surrounding the Girlhood (It's complicated) exhibit.

Janice Lowry Diary, 1960-1961, Archives of American Art
Join in by transcribing and exploring diaries, letters, and other materials from the collections of the Smithsonian Institution Archives, the National Museum of African American History and Culture, the Archives of American Art, and more, created by and for girls from the 19th century to the present day. Completed transcriptions will be used to create educational resources for teachers and students in grades 8-12, as they investigate and learn from the lives and contributions of girls in the United States. As this work develops, we'll be posting updates, information, and additional resources here in the Transcription Center, on social media, and on the Smithsonian Learning Lab. Follow along with us to learn more, and share your own stories, discoveries, and knowledge on American girlhood using the hashtag #BecauseOfHerStory
Coordination of girlhood history projects in the Transcription Center (including selection, digitization, cataloging, outreach, and creation of educational resources) was funded by the Smithsonian American Women's History Initiative. Head to the Transcription Center to learn more and start transcribing!

School Letters from Grace Thorpe to her Mother, 1927-1929,
National Museum of the American Indian Archive Center. 
Bat Mitzvah Scrapbook of Sarah Leavitt, c. 1983, National Museum of American History    

Tuesday, February 4, 2020

"Discovering Yayoi Kusama's Watercolors"

by Anna Rimel, Archivist for the Joseph Cornell Study Center, Smithsonian American Art Museum

(Figure 1) Yayoi Kusama, (from left to right) "Autumn," 1953 (2019.32.1); "Deep Grief," 1954 (2019.32.2); "Fire," circa 1954 (2019.32.3); "Forlorn Spot," 1953 (2019.32.4), Smithsonian American Art Museum, Gift of Mr. and Mrs. John A. Benton and The Joseph and Robert Cornell Memorial Foundation

Hired as the Archivist for the Joseph Cornell Study Center in 2017, with generous funding from the Joseph and Robert Cornell Memorial Foundation, I have been working steadily through hundreds of linear feet of artist Joseph Cornell's two- and three-dimensional source material, family and estate papers, and collected artifacts and ephemera. The collection also includes over 150 record albums and a personal library of over 2,500 titles.

In 1978, the Joseph Cornell Study Center was founded with a donation from Joseph Cornell's sister and brother-in-law, Elizabeth Cornell Benton and John A. Benton, to the Smithsonian American Art Museum (SAAM). There were several subsequent donations from his estate, the Joseph and Robert Cornell Memorial Foundation, further donations from Elizabeth Cornell and John A. Benton, and transfers from other Smithsonian repositories, which make up the Joseph Cornell Study Center collection today.

Though the project to archivally process the collection is still in progress, and a partial finding aid is forthcoming, an exciting discovery has been making its way through the art world. During a preliminary survey of the collection's contents, four small watercolors[i] by Yayoi Kusama were found still in the original Manila envelope, alongside the receipt for their purchase by Joseph Cornell from Kusama for $200 on August 22, 1964. Once curatorial staff were notified, Melissa Ho, SAAM's curator of 20th-century art, was vocal in getting the delicate watercolors accessioned into SAAM's permanent collection, which previously held no works by Kusama. "Rendered in watercolor, ink, pastel, and tempera paint," Ho explained, these works, created in the mid-fifties, "represent a crucial body of work that bridged Kusama's transition from Japan to the United States."[ii] In a blog post for the museum on December 17, 2019, she continues: "They were among the roughly 2,000 works on paper Kusama brought with her when she left Japan in 1957, hoping to sell them to support herself."[iii]

(Figure 2)"Surrealisme" (1932) exhibition announcement.
Joseph Cornell Study Center, Smithsonian American Art Museum.

Joseph Cornell (1903-1972) was an artist known primarily for his assemblage box constructions, who also created two-dimensional collages and avant-garde films. He had two younger sisters, Helen and Elizabeth, who married and lived on Long Island. Joseph lived with his younger brother, Robert, and his mother, Helen, in Queens, New York, from 1921 until their deaths in 1965 and 1966, respectively. He would remain in the same home on Utopia Parkway until his death in 1972. Initially thought to be somewhat reclusive, the artist is now known to have had a wide circle of friends and acquaintances in the art world. His first exhibition was a group show at the Julien Levy Gallery in 1932, "Surréalisme," alongside artists Jean Cocteau, Salvador Dalí, Marcel Duchamp, Max Ernst, Pablo Picasso, Man Ray, and Pierre Roy, for which Cornell also designed the announcement.[iv]

Cornell met artist Yayoi Kusama in early 1964. Art dealer Gertrude Stein introduced them after Cornell expressed a desire to learn to draw and asked Stein to bring him models to sketch. A number of these sketches apparently survive among her papers.[v] After Cornell sketched Kusama, the two appear to have formed a bond, and they continued to meet and correspond.

Other Kusama-related materials, including letters with sentiments like, "You and Me – Birds of a Feather,"[vi] as well as numerous photographs of Kusama, still remain within the Joseph Cornell Study Center collection.

(Figure 3) Letter from Yayoi Kusama to Joseph Cornell, circa 1972.
Joseph Cornell Study Center, Smithsonian American Art Museum.

The collection remains open to researchers, and more information can be found on the Joseph Cornell Study Center website.


[Cross-posted in the Society of American Archivists' Museum Archivist: Newsletter of the Museum Archives Section (Winter 2020: Volume 30, Number 1)]

[i] See (Figure 1).
[ii] Melissa Ho, "The Lost Kusamas." Eye Level (blog), Smithsonian American Art Museum, December 17, 2019.
[iii] Ibid.
[iv] Deborah Solomon, Utopia Parkway: The Life and Work of Joseph Cornell (New York: Other Press, 1997), 87. See also (Figure 2).
[v] Deborah Solomon, Utopia Parkway: The Life and Work of Joseph Cornell (New York: Other Press, 1997), 380-381.
[vi] See (Figure 3).

Recent discovery of four of Yayoi Kusama's watercolors!

Check out this blog from the Smithsonian American Art Museum about Archivist Anna Rimel's exciting discovery of four watercolors by famed artist Yayoi Kusama!


Yayoi Kusama, Fire, ca. 1954, watercolor, pastel, ink, tempera on paper, Smithsonian American Art Museum, Gift of Mr. and Mrs. John A. Benton and The Joseph and Robert Cornell Memorial Foundation, 2019.32.3

Tuesday, December 17, 2019

The Smithsonian’s Journey of Computerized Library and Archives (2010-2019)

Read Part I : The First Integrated Library System
Read Part II:  Stepping Outside of the Box 


Contributing to the start of Digital Public Library of America (DPLA)

The idea of a national digital library had been circulating among librarians, scholars, educators, and private industry representatives in the United States since the early 1990s.  The DPLA planning process began in October 2010 at a meeting in Cambridge, MA. During this meeting, 40 leaders from libraries, foundations, academia, and technology projects agreed to work together to create an open and distributed network of comprehensive online resources provided by U.S. libraries, archives, universities, and museums.

The planning team solicited ideas for how this open platform should work and received hundreds of responses from around the country.  Martin Kalfatovic of Smithsonian Libraries approached me for possible ideas. We decided to work with the Library of Congress (LC) and the National Archives and Records Administration (NARA) to develop a joint proposal that suggested using the Smithsonian Collections Search Center for DPLA.  We took about 100 MARC records from LC and about 20 MARC records from NARA and, using the Smithsonian Index Metadata Model, imported them into the Smithsonian Collections Search Center test system.

The LC and NARA records worked well alongside the Smithsonian records in our Collections Search Center without great effort.  This success affirmed the importance of developing a system built on stringent data standards.  We were among the top six submissions selected by the DPLA planning committee for a final open presentation hosted at NARA.

In December 2010, the Berkman Klein Center for Internet & Society at Harvard University convened leading experts in libraries, technology, law, and education and began work on this ambitious project. I took part in this meeting and worked alongside those who contributed their knowledge to a brand-new DPLA system.  DPLA was launched in April 2013 with contributions from a small number of universities, libraries and museums.  The Smithsonian contributed tens of thousands of library and archives records in the initial launch and supported the Creative Commons CC0 (“No Rights Reserved”) designation for metadata records.  To expand Smithsonian participation, Library Director Nancy Gwinn reached out to many Smithsonian museum directors for support.  Today, the Smithsonian Institution contributes 3.6 million records to DPLA monthly.  This number is expected to increase over time.

Increasing Access and Public Engagement through the Smithsonian Transcription Center

Two examples of Transcription Page
Even though the Smithsonian had millions of collections and historical documents online, accessibility remained a challenge.  Many digitized materials were viewable but not easily searchable, many handwritten materials were difficult to read and understand, and many collections were not online at all.  By creating a transcription center, we could address some of these limitations and support better discovery across disciplines Smithsonian-wide.

We began planning and developing the new software platform in 2012.  We worked closely with several archival unit partners to launch the Transcription Center on June 15, 2013.

The Transcription Center was designed to support various object types and material formats, including those held not just by libraries and archives, but by museums as well.  Object types included diaries, field books, correspondence, currency bank notes, sound recordings, photo albums, botanical specimen labels, cataloging sheets, joke index cards, and more. The project crowdsources both transcription and review to public volunteers, giving Smithsonian staff the option to conduct final approval before posting the record online.  We have discovered that because our digital volunteers produce such accurate transcriptions, some staff find no need to review the work!  Transcribed contents are immediately searchable, displayed online in the Collections Search Center, and downloadable as PDF files. Public participation has been phenomenal, with 13,890 digital volunteers and 496,300 pages transcribed as of December 2019, including the creation of 130,755 catalog records that were previously unavailable to the public.
Transcribed Text on display automatically with the corresponding image

A full-time project coordinator is on staff to ensure timely communications between the Smithsonian and the public via emails and social media platforms.  Meghan Ferriter, Andres Almeida and Caitlin Haynes served as the coordinators consecutively. It is very important that our volunteers feel connected to the Smithsonian Institution and that their contribution is recognized and greatly appreciated.  We express our appreciation by crediting the volunteers in the transcribed records and the PDF files.

The Transcription Center is more than a website for transcribing historic documents. It is a platform for us to increase our public outreach and engagement.  Being able to interact with our volunteers has been one of the most rewarding aspects of this project.  Many of our dedicated volunteers continue to make huge progress day after day.  They not only transcribe thousands of pages, but also go above and beyond by sharing knowledge and enhancing Smithsonian collections with additional information entered in the note fields on each page.

The Transcription Center has also become a useful tool for the social media managers of individual Smithsonian museums. Many of them have shared stories uncovered in the Transcription Center in their outreach campaigns, and they want to continue the relationship with the Transcription Center in the future. This digital platform demonstrates how transcription work can not only create and diffuse knowledge, but also develop a strong community among digital volunteers and the Smithsonian.

Smithsonian Online Virtual Archives (SOVA) and ArchivesSpace

Archives by their very nature are different from libraries: most libraries hold individual items such as books and journals, while archival collections contain multiple records that are unique, interrelated, and often arranged in a nested hierarchical structure. A library system cannot adequately support archival needs.  Increasingly, archival staff began calling for a system that specifically addressed the unique needs of archives. To remedy this problem, a Mellon Foundation grant started software development of The Archivists’ Toolkit™ system in 2006. It was the first open source archival data management system to provide broad, integrated support for the management of archives.

The Smithsonian began to experiment with The Archivists’ Toolkit™ (AT) software system in 2011.  First, we migrated 7,000 MARC records into the new AT system. This allowed for series, box and folder level information to be managed hierarchically within an information management system.  The AT system was superseded in late 2013 by ArchivesSpace.  Both open source software systems allowed archivists to manage archives using a collection-centric approach.

The Internal Smithsonian ArchivesSpace Collections Management System

A difference between an item-centric management approach and a collection-centric approach is that archival materials need finding aids, which increase accessibility to the collection.  At the time, the Smithsonian’s fourteen archival units had different understandings and approaches to description and management.   Barbara Aikens, Head of Collections Processing at the Archives of American Art, took the lead to eliminate these inconsistencies.  She wrote five internal and external grants totaling $499,900 between 2010 and 2016 on behalf of the Smithsonian archival units. The grants allowed the Smithsonian to conduct a pan-institutional Encoded Archival Description (EAD) Gap Analysis Study and hire EAD Metadata Coordinators to focus on content creation and support.  These coordinators, Mark Custer and Nancy Kennedy, proved essential to this project.  Managed by OCIO LASSB, they developed and managed an EAD implementation plan for each Smithsonian archival unit.  This work included assisting units to convert legacy finding aids to EAD standards, answering all questions, and fixing dilemmas as they occurred throughout the system’s implementation.  Meanwhile, we collaborated among all of the archival units and supported their backlog processing projects. The migration from the Horizon MARC system to ArchivesSpace was a very complex process which required folding 400,000 flat records into hierarchical EAD finding aids.  In the end, archives across the institution created 16,800 new EAD finding aids in ArchivesSpace, and the quality of content description at the Smithsonian improved dramatically.
Record Display Supporting Hierarchy levels of Collection, Series, Box and Folder in SOVA
While the quality and quantity of the archival descriptions grew exponentially, public searching and display remained a challenge.  To remedy this, we developed the Smithsonian Online Virtual Archives (SOVA) and launched it in October 2015.  For the first time, the new system allowed users to search archival materials at all levels (collections, series and items), enabled the visual display of the hierarchy of the record, and dynamically generated EAD online finding aids for nearly 17,000 collections, many of which contain tens of thousands of individual items in multiple series, boxes and folders.
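To make the idea of folding flat records into a collection, series, box, and folder hierarchy more concrete, here is a minimal Python sketch. It is only an illustration: the record keys ("collection", "series", "box", "folder", "title") and the sample values are assumptions made for this example, and it does not reproduce the actual Horizon-to-ArchivesSpace migration programs or the EAD schema.

```python
# Illustrative only: field names and sample data are hypothetical.
UPPER_LEVELS = ("collection", "series", "box")

def fold_flat_records(records):
    """Group flat item records into a nested collection > series > box > folder tree."""
    tree = {}
    for rec in records:
        node = tree
        # Walk (or create) one nested dictionary per upper hierarchy level.
        for level in UPPER_LEVELS:
            node = node.setdefault(rec[level], {})
        # The folder is the lowest container; it holds a list of item titles.
        node.setdefault(rec["folder"], []).append(rec["title"])
    return tree

flat_records = [
    {"collection": "Example Family Papers", "series": "Series 1: Correspondence",
     "box": "Box 1", "folder": "Folder 1", "title": "Letter, 1927"},
    {"collection": "Example Family Papers", "series": "Series 1: Correspondence",
     "box": "Box 1", "folder": "Folder 1", "title": "Letter, 1928"},
    {"collection": "Example Family Papers", "series": "Series 2: Photographs",
     "box": "Box 3", "folder": "Folder 7", "title": "Portrait, undated"},
]

tree = fold_flat_records(flat_records)
print(tree["Example Family Papers"]["Series 1: Correspondence"]["Box 1"]["Folder 1"])
```

A nested structure like this is what lets a display show an item in the context of its series, box, and folder rather than as an isolated record.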

Screenshots of the search result page with tabs to browse by collections, sub-series or digital items

The ArchivesSpace and SOVA systems enable a digital workflow that is organized, systematic, scalable, and inclusive of the digitization process.  In addition, the Smithsonian Digital Asset Management System (DAMS) supports seamless metadata synchronization and media file links among all systems.  These systems guide archivists as they catalog, digitize, store, link and share digital images, allowing information systems to keep track of large numbers of media files every step of the way.  Today, there are over 6.7 million image, sound, and video resources in SOVA.

In 2019, the Smithsonian received accolades for its SOVA and Collections Search Center (CSC).  An online survey, conducted during a NEA-funded workshop on developing in-house collections management systems and online discovery portals, asked professionals to name their favorite online aggregate search center. The Smithsonian’s CSC and SOVA systems were voted the best among all LAM (libraries, archives, and museums) institutions.

Final Thoughts

It has been a long and rewarding journey in the fields of information management and public service.  Knowing the history of the Smithsonian’s transformative information management systems and efforts gives us valuable insight that we can use as we work towards new goals and solve new challenges.  The Smithsonian Institution will continue to push forward to support research, education and public service by increasing its mass digitization efforts.  With tens of millions of collections online already, we look forward to making Open Access our next major milestone in 2020.


Ching-hsien Wang,  Branch Manager
Library and Archives Systems Support Branch (LASSB)
Office of the Chief Information Officer

The Smithsonian’s Journey of Computerized Library and Archives (1994-2009)

Read Part I: The First Integrated Library System


Jump-starting and Supporting Digitization

In 1994, OIRM SIRIS began a new venture in the field of library and archives automation: the support of online media files.   At the time, the Smithsonian had several Collection Information Systems including the library’s system, but no catalog records were linked to images or video files, which prohibited public access.   

One of the NAA images digitized during early digitization
With a newly implemented internet, we modified a new WebPac application configuration to enable images to display with catalog records online, demonstrating the technical potential to library and archives staff. This new and exciting feature required Smithsonian staff to digitize images and then link the image files to catalog records by referencing the image URL in the MARC 856 field.  It was a challenge to get started because no one knew how this would work, so we had to lead by example.
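As a rough illustration of what linking an image to a catalog record through the MARC 856 field involves, here is a minimal Python sketch. In the MARC 21 bibliographic format, field 856 is Electronic Location and Access, subfield $u carries the URI, and subfield $z carries a public note. Everything else here is an assumption made for the example: the dictionary layout, the helper names, the sample title, and the example.org URL are hypothetical and are not the WebPac or Horizon data structures.

```python
# Illustrative only: a simplified, dictionary-based stand-in for a MARC record.
def add_image_link(record, url, note=None):
    """Append an 856 field that points at an online image file."""
    subfields = [("u", url)]            # $u: Uniform Resource Identifier
    if note:
        subfields.append(("z", note))   # $z: public note
    # First indicator "4" means the access method is HTTP;
    # the second indicator is left blank in this sketch.
    field = {"ind1": "4", "ind2": " ", "subfields": subfields}
    record.setdefault("856", []).append(field)
    return record

def image_urls(record):
    """Collect every $u value from the record's 856 fields."""
    return [value
            for field in record.get("856", [])
            for code, value in field["subfields"]
            if code == "u"]

record = {"245": [{"ind1": "0", "ind2": "0",
                   "subfields": [("a", "Example sculpture photograph")]}]}
add_image_link(record, "https://example.org/images/0001.jpg",
               note="Digitized photograph")
print(image_urls(record))
```

An online catalog that understands this field can then show the image next to the record, which is the behavior described above.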

By 1995, OIT (Office of Information Technology, successor of OIRM) purchased a couple of image scanners.   SIRIS helped the NMAA Art Inventory project digitize about 200 photographs of sculptures and linked them to their catalog records.  The first Smithsonian public online system that could display object records with images was born!   In 1996 at the Smithsonian Institution’s 150th Anniversary Event on the National Mall, we showcased the brand-new functionality to the public.  The online demonstration using the Netscape Navigator web browser even included a few cephalopod video clips from NMNH.   The excitement for the new functionality energized archives staff.  Although more archives professionals accelerated their image digitization efforts, most of them did not have the resources to host images online.  The digitized image files accumulated on hard drives, CD-ROMs, and laser disks.  Many of these storage devices sat on bookshelves or under desks; they were not accessible to the public online.

In 1996, OIT SIRIS created the first Smithsonian central “Multi-Media Server” that hosted online images for SIRIS members. This service included online storage and web server support, image maintenance support, digitization training programs, image linking training, and usage statistics reporting.  By the time it was retired in 2014, this multimedia server hosted over 900,000 image, video and sound files for 18 SI units.  Jim Felley (SIRIS senior system administrator) provided critical support and management of the service until its retirement, which followed the migration of all images to a new Digital Asset Management System (DAMS).

Leadership in Data Standard and Vocabulary Control

In 1999, the Smithsonian library system was upgraded to the Ameritech (now SirsiDynix) Horizon system.  This new system came with flexible system-configuration capability and a strong authority (vocabulary) control function.  Most importantly, it allowed the Smithsonian to establish many locally defined fields, supported record relationship linking, and supported specialized indexes that met the Smithsonian’s nontraditional needs.  SIRIS had grown to support eight databases:  Library, Archives, Art Inventories, SAAM Photo Archives, Art Exhibition, Research Bibliography, History of Smithsonian, and Directory of Airplanes.
Over the years, 14 archives, 20 library branches and several museum research departments depended on SIRIS to do a wide variety of collection management functions.  More and more data sets were added to the eight databases using custom programming and data importing.  By 2006, nearly 50% of the 955,000 non-library records were transferred from local databases such as DBASE, MS Access, Excel, C-Quest, FileMaker Pro, WordPerfect, Text, etc. 
Library of Congress Subject Headings Catalog

Mapping these different datasets into the MARC format was a big challenge, but dealing with data inconsistencies was an even bigger one!  Much of the data from these disparate databases lacked consistency from record to record and across datasets, and very few datasets followed national data standards.  So, our priority shifted to data cleanup of the records created by the staff at 14 Smithsonian archives.  Our goal was to follow national data standards and cataloging guidelines.  This approach proved to be a wise decision on multiple levels.  First, we avoided internal disagreements as to how to standardize the data among several archival units.  Second, we were able to hire professionals whose knowledge was applicable to our goal.  Finally, standardizing the data in different databases across the Smithsonian made building the Smithsonian-wide Collections Search Center platform much easier. We didn’t know the benefit of this final point at the time.

We used a few main approaches that were very productive:
  • Conducted extensive data analysis, created reports using thousands of programming scripts, looked for exceptions and patterns in data and listed them out for catalogers to review or make changes. This approach took advantage of both human intelligence and computer speed to handle complex data issues.
  • Conducted several thousand global data modifications based on catalogers’ requests.  This allowed us to make changes to thousands of records at a time, thus speeding up progress and efficiency.
  • Prioritized access points and authority records for Names, Subjects, Form & Genre, Geographical, and Culture terms which greatly improved searchability and discovery.
  • Sent authority records to professional vendors for authority heading matching, flipped incorrect terms to Library of Congress standards, and reloaded the records back into our system. While expensive, this provided high-quality data.
  • Conducted regular cataloging and metadata training and encouraged collaboration among cataloging units to maintain high-quality cataloging practices.  The regular face to face meetings reinforced the importance of data quality and improved interactions among staff across the Institution.
For more than ten years, we continued to transform and standardize metadata within the eight Horizon databases.  We established methodologies for handling chaotic situations and developed creative solutions to problems.  The result of our persistent efforts became the solid foundation for the next phase: creating a centralized searching system for the Smithsonian Institution and fulfilling the goal and wish from 1980.

Pushing Beyond the Norm and Changing Culture - First Large-Scale Library, Archives and Museum Online Search Center

By 2005, the Smithsonian’s libraries, archives and museum collection records had been growing rapidly across the Institution thanks to the advancement of and wide use of database technology.  Large numbers of computer records were created and maintained in highly specialized commercial and local database systems.  However, collection records were available on over 100 disparate websites, which made them difficult for the public to use. 

In 2006, OCIO LASSB (Library and Archives Systems Support Branch, successor of SIRIS) began to design a one-stop discovery platform that would include all Smithsonian collection data regardless of data format, professional discipline or data-owning organization.  We decided that this Cross Search Center should support simple keyword searching and be able to filter search results by data categories such as Name, Topic, Place, Culture, Date, and Media type.  Since no one had done this at a large scale before, we had to innovate and find the best solutions to problems as they arose.
We started with the eight SIRIS Horizon datasets. Our first challenge was to address the diverse data types and find ways to make the data consistent in the Cross Search Center.  We reviewed technology platforms, data standard options and data mapping possibilities.  We identified common data elements in records from across different disciplines including art, science, culture and history, and defined a new metadata format that supports a wide range of material and object types (e.g., books, journals, bibliographies, photographs, art objects, and archival materials).

Andrew Gunther (senior software developer) took the lead in selecting an open source technology platform (Solr) that supported easy searching, faceted filtering and fast indexing functions.  The platform also allowed searching with automatic stemming for word matching, configurable relevancy ranking of search results, positive and negative limit options, and scalability for large data sets.

Insisting on consistent metadata standards was the key to our success.  After evaluating several existing metadata standards (MARC, VRA, MODS, CDWA Lite, CCO), we identified the most common data elements and created the Smithsonian Index Metadata Model.  George Bowman (senior system administrator) took the lead in designing this flexible metadata model that accommodated many specific use cases.  The LASSB team consulted the OCLC FAST (Faceted Application of Subject Terminology) schema and used it to break up the LCSH subject headings in our MARC records by subfield, thus allowing faceted searching and filtering in the Cross Search Center.
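The idea of breaking a subject heading apart by subfield can be sketched in a few lines of Python. The subfield codes are the standard ones for a MARC 650 (topical subject) field: $a topical term, $x general subdivision, $y chronological subdivision, $z geographic subdivision, and $v form subdivision. The mapping of those codes onto facet names, the function name, and the sample heading are assumptions for illustration; the actual rules in the Smithsonian Index Metadata Model are not documented here.

```python
# Illustrative only: the facet names and the mapping are hypothetical.
FACET_BY_SUBFIELD = {
    "a": "topic",   # topical term
    "x": "topic",   # general subdivision
    "y": "date",    # chronological subdivision
    "z": "place",   # geographic subdivision
    "v": "form",    # form subdivision
}

def facets_from_heading(subfields):
    """Split one subject heading into facet values, one list per facet."""
    facets = {}
    for code, value in subfields:
        facet = FACET_BY_SUBFIELD.get(code)
        if facet is None:
            continue                      # ignore subfields outside the mapping
        cleaned = value.strip().rstrip(".")
        bucket = facets.setdefault(facet, [])
        if cleaned not in bucket:         # avoid duplicate facet values
            bucket.append(cleaned)
    return facets

heading = [("a", "Indians of North America"), ("z", "Arizona"),
           ("y", "19th century"), ("v", "Photographs.")]
print(facets_from_heading(heading))
```

Once each part of the heading lands in its own facet, a search interface can offer separate filters for topic, place, date, and form instead of one long pre-coordinated string.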

The system was designed to aggregate data from multiple databases into a central Solr index.  Jim Felley (senior system administrator) led our team in extracting data from the Horizon databases.  All data was mapped to follow the Smithsonian Index Metadata Model, and each dataset required custom extraction programs to support the necessary data mapping.  We carefully tracked the highly complex data mapping requirements in a spreadsheet, which allowed us to update the data and refresh it daily in the data repository for the Cross Search Center.  Randy Arnold (system administrator) ensured that all systems were integrated and monitored multiple servers and system operations.
Data Aggregated from different databases into EDAN for Collections Search Center
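The per-dataset mapping described above can be pictured as a small lookup table that renames each source field to a field in a shared model. The following Python sketch shows the general technique only. The dataset names, source field names, target field names, and sample records are all hypothetical, and the sketch stops short of sending anything to a Solr index.

```python
# Illustrative only: every dataset and field name here is hypothetical.
FIELD_MAPS = {
    "library": {"title": "title", "author": "name", "subject": "topic"},
    "art_inventory": {"object_title": "title", "artist": "name",
                      "medium": "object_type"},
}

def to_index_doc(dataset, record):
    """Map one source record onto the shared field names for a central index."""
    mapping = FIELD_MAPS[dataset]
    doc = {"data_source": dataset}
    for source_field, value in record.items():
        target = mapping.get(source_field)
        if target and value:
            # Shared fields are multi-valued, so several source fields
            # may feed the same target field.
            doc.setdefault(target, []).append(value)
    return doc

book = {"title": "Field Notes", "author": "Doe, Jane", "call_number": "QH1"}
sculpture = {"object_title": "Seated Figure", "artist": "Roe, Richard",
             "medium": "Bronze"}

docs = [to_index_doc("library", book), to_index_doc("art_inventory", sculpture)]
print(docs)
```

Keeping the mapping in data rather than in code mirrors the spreadsheet approach described above: when a source changes, only the table needs to be updated.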

In 2007, the Cross Search Center went live with nearly two million records from the Smithsonian libraries, 14 archival units, and several other research offices from two museums.  For the first time, the public was able to search all library and archival collection records in one platform at once.  These search capabilities were the result of Smithsonian staff’s diligence in working on metadata and authority control over the previous ten years. The public and the reference staff loved the new user-friendly system.  Anne Spire, Director of the Office of the Chief Information Officer, advised that the Cross Search Center (CSC) be expanded to support all Smithsonian museum collections. The Cross Search Center was renamed the Smithsonian Collections Search Center, and the back-end data indexing and data repository platform was named the Enterprise Digital Access Network (EDAN).

Getting more museums to contribute data to EDAN and the Collections Search Center required effort to build relationships between OCIO LASSB and the museums.  Even though the technology and system design were fully ready to take on the wide variety of data, changing institutional culture took a lot more time and work.  Smithsonian collection staff had not traditionally worked together across the institution, and letting go of carefully curated data that had been compiled over many years required a new way of thinking.

LASSB made sure that this collaborative work with the museum staff created mutual benefits. 
  • To make sure the museums could control their own data in the centralized system, we allowed the museums to decide which data elements to contribute and which display labels to use for their data elements.
  • In the Collections Search Center, museum names were prominently displayed and every record had a link to the hosting museum’s website, which greatly increased online traffic to the museums’ collection websites.
LASSB first approached smaller museums that were more willing and had more to gain from participating in this project.  Some of the early participating museums included the National Portrait Gallery, the National Postal Museum and the Freer Sackler Gallery.  Mike Trigonoplos (system administrator) extracted most museum data in this phase.  These museums’ holdings were seemingly unrelated, but in the Collections Search Center, search results produced surprising connections.  The positive feedback and testimonies from staff helped to propel the project forward.  The message was clear: collaboration among units produced powerful results.

Screen shot of Collections Search Center in 2009

By December 2009, the Collections Search Center had become the first large-scale LAM (libraries, archives and museums) system in the United States, with more than two million catalog records from several SI museums. The system added data from more museums over the next few years. Today, this system includes 15.5 million records and five million online images, audio and video files from all major Smithsonian libraries, archives, museums, blogs and YouTube websites. Once again, we had to tackle data consistency issues in the records submitted by the different museums. Capitalizing on our previous experience in vocabulary control, we quickly developed a systematic method to address these issues. George Bowman created a sophisticated data mapping database system that defines exception terms and enables their replacement by controlled vocabulary terms and data categories. This database contains about 50,000 specific use cases and instructions. The standardized terms significantly improved the performance of the Collections Search Center and the accuracy of search results.
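
To give a feel for the idea, here is a minimal Python sketch of exception-term mapping. It is only an illustration under assumptions: the real mapping database is not described in detail here, and the category names, sample terms, and the function names `normalize_term` and `normalize_record` are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping table: (data category, exception term as submitted)
# -> controlled vocabulary term. The entries below are invented examples.
TERM_MAP: Dict[Tuple[str, str], str] = {
    ("object_type", "photo"): "Photographs",
    ("object_type", "photographs"): "Photographs",
    ("object_type", "oil painting"): "Paintings",
    ("place", "u.s.a."): "United States",
}


def normalize_term(category: str, term: str) -> str:
    """Return the controlled term for an exception term, if one is defined.

    Lookup ignores case and surrounding whitespace. Terms with no mapping
    are returned unchanged apart from trimmed whitespace.
    """
    cleaned = term.strip()
    return TERM_MAP.get((category, cleaned.lower()), cleaned)


def normalize_record(record: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Apply normalize_term to every value in every category of a record."""
    return {
        category: [normalize_term(category, t) for t in terms]
        for category, terms in record.items()
    }


if __name__ == "__main__":
    sample = {"object_type": [" Photo ", "Sculpture"], "place": ["U.S.A."]}
    print(normalize_record(sample))
```

Keying the table on both category and term means the same submitted word can map differently depending on the field it appears in, which is one plausible reading of "controlled vocabulary and data categories."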

In 2012, a public tagging function was added to the Collections Search Center. It allowed the public to add keywords to catalog records online, with those tags searchable within ten minutes. During the trial period, 1.6 million records from nine Smithsonian units were released for tagging. In just six months, the public entered more than 1,000 tags. Public users filled in blanks for creator names, classified object types, identified historical events, individuals, ethnic groups, genders, aesthetic characteristics and styles, characterized film clips and pointed out mistakes.
A Tagging Screenshot from the Collections Search Center in 2009
The tagging function improved searchability and increased public participation. However, the Smithsonian staff did not have the resources to sift through all the tags and add them to catalog records, so the project ended after five years.

Ching-hsien Wang,  Branch Manager
Library and Archives Systems Support Branch (LASSB)
Office of the Chief Information Officer

The Smithsonian’s Journey of Computerized Library and Archives

The Smithsonian Institution, with its 19 museums, 20 libraries and 14 archival units, prioritizes sharing its resources and discovering knowledge with the public. Today, 15.5 million library, archives and museum objects and 10.4 million images are online to support research, education and public service. It has been a challenging but rewarding journey to transform a manual and paper-based Smithsonian into the digital Smithsonian of today. The evolution of automated library and archives systems at the Smithsonian, and the collaboration that made it all possible, is impressive, and I wrote these three blog posts to share this history with you.


Ahead of Its Time from the Beginning

At its inception, the Smithsonian Institution Libraries (SIL) depended on paper card catalogs. In 1965, the SIL began to slowly convert from the Dewey Decimal System to the Library of Congress’s classification system. Simultaneously, it began to transition from handwritten catalog cards to computer-printed ones. The Smithsonian invested in the latest technology: card punching machines to support data entry.

Starting in 1975, SIL began working towards in-house automation for library operations and joined OCLC (Ohio College Library Center) as a member in 1976. 

In 1980, under the leadership of Director Robert Maloy, SIL envisioned a unified electronic system that would link the ordering, accounting, receiving, indexing, circulation and inventory control functions into one data flow. Mr. Maloy and others also began advocating for making information in libraries, archives, and museums accessible in Smithsonian computers for public access. This vision proved to be very challenging to accomplish since SIL was still using an assortment of manual and semi-automated systems for its daily business. Even though this seemed to be an impossible goal at the time, it set the Smithsonian on the path to what it accomplished 30 years later.

In 1980, Stephen Toney, the first systems librarian at SIL, began to work closely with the Smithsonian central IT office, OIRM (Office of Information Resource Management), for stronger computer support. OIRM and SIL worked to purchase a dedicated computer system that was intended to support not only the needs of the library, but also those of the archives and museum research offices. An RFP (request for proposal) was sent out in February 1983 to library system vendors. The proposals were reviewed by staff from the SIL, OIRM, Smithsonian Institution Archives (SIA), National Museum of American Art (NMAA) and others. In September 1983, a GEAC system was selected and named SIBIS (Smithsonian Institution Bibliographic Information System).

Implementing the First Library System

The GEAC system contained multiple modules to support Acquisition, Cataloging, Circulation and Email functionality. It was built to support data in MARC (Machine-Readable Cataloging) format. The mainframe GEAC computer was installed in the basement of the National Museum of Natural History (NMNH), the museum in which most of the SIL staff worked. At the time, there was no Local Area Network (LAN) at the Smithsonian, so computer terminals could only be connected to the mainframe computer by long wires from within the building.

SIL was successful in importing the catalog card data from OCLC to the SIBIS system via computer tapes. The new automation system brought a change in the library’s work culture: many staff were surprised that automation didn’t reduce their amount of work; instead, it required different types of work. The automated system required more accurate data, identified mismatched inventory lists and shelving issues, identified missing or unreturned books, and produced lists of records for enhancement. The inconsistent data from pre-automation days caused inaccurate search results and display problems; therefore, the top priorities following the implementation were data cleanup, problem tracking, data standardization and enhancement work for many years to come.

The library also transformed its departments and workflow to integrate the automated system, which allowed copy cataloging from records in OCLC. The head of the newly formed SIL Systems Office, Tom Garnett, learned to program on GEAC to produce reports for new title lists, inventories, acquisition orders, etc. Marcia Adams, a systems librarian, focused initially on automating the circulation system that tracked book check-ins, check-outs, borrower records, and circulation reports. Even with much more work, everyone agreed that the automated system increased work efficiency and the quality of library management.

A GEAC Computer Room in the 1980s
OIRM provided critical operational support for this groundbreaking endeavor. The GEAC system required 24-hour coverage by computer operators and was composed of proprietary hardware and software, oversized mainframe CPU chassis, disk storage units and tape drives for 10.5-inch magnetic tape reels. Computer operation support included regular magnetic tape loads for OCLC records, daily batch jobs that helped to maintain databases, the generation of reports and printouts, and daily backup and restore during the midnight shift.

Adapting Existing Standards for Non-Bibliographic Content

Soon after installing the GEAC system, the Smithsonian began installing CO-LAN modems, which served as the primitive predecessor of the computer network. This allowed connections from the GEAC mainframe to computer terminals in different buildings. The American Art Museum Research and Scholar Center and the Archives of American Art were the first museum and archives to use a library management system for automation. In the early 1980s, there were no established data standards for non-library materials undergoing computer automation. Among existing standards, there were two that came closest to fitting the Smithsonian’s needs:
1. UNIMARC (Universal MARC) format: Although most of these existing standards relate more closely to library materials than to archival ones, the general approach and specific guidelines were still relevant.
2. AMC (Archival and Manuscripts Control) format: Developed by the Society of American Archivists in 1985, the instruction manual provided standards that were specifically for archives.

With the standard selected, the immediate challenge was to map the data into the MARC format and enter the data into the library system. The GEAC system was implemented in three separate databases: Library, Archives and Art Inventory. The Archives of American Art began creating descriptions (mostly collection level) for its collections in the Archives database. A couple of thousand descriptions were entered in just two years. However, the limitations of using a library system as an archival system soon became apparent: record size and field occurrence limitations caused major frustration among archivists for years.

I joined the Smithsonian OIRM in 1988 as the system administrator and a technical lead and became part of this exciting project. We worked hard to push the software vendors to fix these limitations, but the necessary technology was not available to address these issues at that time. Nevertheless, several more archives joined SIBIS and continued to add records with greater complexity. Early implementers included the Smithsonian Institution Archives, National Museum of American History Archives Center, National Anthropological Archives, and Human Studies Film Archives. The Smithsonian grew to become the institution with the largest collection of electronic archival records online.

NMAA’s Art Inventory project also joined SIBIS as an early museum adopter of a library system. The highly specialized Art Inventory Database, which compiled and cataloged artworks created by American artists, was one of the leading online reference resources. The dataset documented sculptures and paintings with many data elements outside of the traditional MARC format. Eleanor Fink, Chief of the Office of Research Support at NMAA, advocated adapting and testing the flexibility of the MARC Visual Material format for three-dimensional objects. OIRM SIBIS customized the GEAC system to accommodate the unusual data fields to support indexing, searching and display purposes. This strong collaboration between NMAA and OIRM created the first successful large-scale art project adaptation in a library system. This implementation had early success, with 16,000 sculpture records successfully imported in just a couple of years. It also pushed the GEAC system to its limit: the system was unable to support many customized data fields and special search indexes.

Raising Expectations and Improving Automation

Encouraged by the initial success of SIBIS in 1989, the Smithsonian Castle formed a SIBIS Management and Planning Committee with the purpose of elevating its performance, increasing funding to OIRM, and expanding its usage to more Smithsonian units. The funding structure was a “Cost Center” model where units would transfer funds annually to OIRM. Ross Simon, an assistant to the Smithsonian Secretary, became the first chairman of this management committee. In 1992, SIBIS was renamed SIRIS (Smithsonian Institution Research Information System) to match the broader goal of the committee. The SIRIS board decided to look for a new generation of library information system. In December 1993, the NOTIS system was purchased and records were migrated to it. This new system ran on IBM 4381 mainframe computers. The computer terminals were Zenith PCs, which were booted from floppy disks to emulate IBM 3270 terminals. Later, the PCs were upgraded to PS/2 computers, which had local disk drives that could hold the terminal emulation software. Floppy disks were retired.

Before the Smithsonian joined the World Wide Web (WWW), there were WAN, Gopher, and WAIS, which allowed internet access beyond the Smithsonian network. One of the first to do so, SIRIS successfully implemented remote Telnet connections. The NOTIS system supported the TCP protocol with a TAG machine (IBM RISC server) for internet searching capability. George Bowman, the main library system administrator, was the key technical staff member in taking advantage of the latest technology.

In 1994, the OIRM SIRIS team successfully implemented a PACLINK function, which for the first time allowed SIRIS terminals to access the online catalogs of several remote institutions, such as the Harvard Library, Yale Library, and WRLC Consortium (of George Washington University, Catholic University, American University and George Mason University). We also made the Smithsonian Catalogs (Library, Archives and Art Inventories) available to many other libraries around the US and Canada in 1994. The PACLINK function was based on the Z39.50 protocol for searching and retrieving information from a remote database over TCP. These services predated the WWW at the Smithsonian and the first desktop PC web browser; it was cutting edge!

Ching-hsien Wang,  Branch Manager
Library and Archives Systems Support Branch (LASSB)
Office of the Chief Information Officer