Smithsonian Collections Blog

Highlighting the hidden treasures from over 2 million collections


Monday, October 18, 2010

From Analog to Digital – Oral History Transcripts

Samples of Final Processed Oral History Transcriptions
How do you digitize tens of thousands of pages of paper without any extra staff or funding? It’s a daunting task, but we’re almost done with our project. In an earlier blog post, I talked about how we are digitizing analog magnetic tapes in the Smithsonian Institution Archives Oral History and Videohistory Collections. We are also digitizing our transcripts of these interviews, for both preservation and access/reference purposes. We want to ensure that existing word processing files are preserved in a format that archivists will be able to access for decades to come. We also plan to prepare electronic reference copies to deliver to researchers and make available on our website. We have well over 20,000 pages of transcript in our collection, so this is a challenging task.

We have three types of transcripts: 1) typewritten documents on acid-free paper created from the 1970s to the early 1980s; 2) word-processed documents in DOS file formats, with hard copies on acid-free paper, created from the late 1980s to the early 1990s; and 3) word-processed documents in Windows file formats created since the mid-1990s. All the transcripts include a finding aid to the interviews, copies of deed of gift forms, images, and an index, and some include appendices.

We first had to decide on preservation and access formats. For preservation, we create Rich Text Format (RTF) files, which keep basic text formatting but not the exact page layout, and PDF files, which do retain the layout. Both formats have publicly available specifications and are cross-platform. For access, we decided to use PDF files that would retain formatting, could include security features, and could incorporate supplementary materials such as images. PDF is a widely used consumer format that can be opened on Windows, Macintosh and Linux platforms.

Sandy George, SIA Volunteer, Creates PDFs
Since there are only two of us in the Institutional History Division (IHD), with many demands on our time, Courtney Esposito and I had to decide where to begin in this very large collection. We started in the middle, with the DOS files, since we wanted to preserve these currently inaccessible files electronically in an archival format. Stored on 3.5” and 5.25” diskettes, they were transferred to a network server using old disk drives. An Electronic Records Program intern, Courtney Brucato, tested file conversion programs and selected the one that worked best with our array of Samna, WordStar, WordPerfect and Word files. She then converted each file into an RTF file and a text PDF file. Out of hundreds of files, fewer than ten did not convert. She documented the problems she encountered, such as commands that had to be removed from a file so the conversion could proceed. We have stored these files on a preservation server and burned them to CDs and DVDs to store offsite at National Underground Storage.
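
For readers planning a similar conversion, the bookkeeping side of this step can be sketched in a few lines of Python. This is only an illustration, not the tool our intern used: the extension map, the `ConversionRecord` fields, and the assumption that each converted RTF and PDF sits next to its source file under the same base name are all hypothetical. Many DOS-era files carry no reliable extension, so the format guess is a rough first pass at best.

```python
import csv
from dataclasses import dataclass
from pathlib import Path

# Hypothetical extension-to-format map (an assumption for illustration).
FORMAT_BY_EXT = {
    ".sam": "Samna",
    ".ws": "WordStar",
    ".wp": "WordPerfect",
    ".doc": "Word",
}

# Converted outputs are skipped when building the inventory.
OUTPUT_EXTS = {".rtf", ".pdf"}


@dataclass
class ConversionRecord:
    source: str          # path relative to the folder being inventoried
    guessed_format: str  # best guess from the extension, or "unknown"
    rtf_done: bool       # True if a sibling .rtf file exists
    pdf_done: bool       # True if a sibling .pdf file exists
    note: str = ""       # free text, e.g. "removed a command so conversion could proceed"


def inventory(root):
    """List each source file under root and check for sibling .rtf/.pdf outputs."""
    root = Path(root)
    records = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix.lower() in OUTPUT_EXTS:
            continue
        records.append(ConversionRecord(
            source=str(path.relative_to(root)),
            guessed_format=FORMAT_BY_EXT.get(path.suffix.lower(), "unknown"),
            rtf_done=path.with_suffix(".rtf").exists(),
            pdf_done=path.with_suffix(".pdf").exists(),
        ))
    return records


def unconverted(records):
    """Return the records still missing an RTF or a PDF copy."""
    return [r for r in records if not (r.rtf_done and r.pdf_done)]


def write_log(records, log_path):
    """Write the inventory to a CSV file that can be kept with the collection."""
    with open(log_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["source", "guessed_format", "rtf_done", "pdf_done", "note"])
        for r in records:
            writer.writerow([r.source, r.guessed_format, r.rtf_done, r.pdf_done, r.note])
```

Calling `inventory()` on the transfer folder and passing the result to `unconverted()` gives a short list of files to revisit, and the `note` column is a place to record problems like the removed commands mentioned above. The log should be written outside the folder being inventoried, since otherwise the next run would list the log itself as a source file.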

We next created RTF files and text PDF files for the modern Windows-based word-processed files and also stored them on the preservation server. The PDF files include images and appendices.

We have a large legacy collection of typewritten paper transcripts, with no word-processed files. We also have paper transcripts for the few DOS-era files that did not convert successfully. At this time, we lack the resources either to key these transcripts in or to run them through OCR, since OCR requires a significant investment of time for proofreading, correcting, and formatting. Instead, we scan them into image PDF files, primarily for access, but also to have an electronic preservation file. These image PDF files cannot be searched, but they do retain the formatting of the paper transcripts and can contain images and appendices. Two interns, Katherine Egan and Caitlin Adams, spent many hours scanning these paper transcripts into image PDF files, double-checking to see that all pages were scanned correctly.
PDF of Walter Shropshire's Oral History Interview

While the scanning is time-consuming, it does not take all that much longer than producing a photocopy, and once done, we don’t ever have to run the paper transcript through a scanner/copier again.  We can now attach transcripts to an email for reference – something that takes a minute rather than an hour.  We see this as a “green” option, saving mountains of paper and toner.   If a researcher wishes to print out a full copy, he or she can.  We also plan to post PDF transcripts of unrestricted interviews on our website when it is redesigned within the coming year. 

As with our analog tapes, the benefits of digitization are many: new preservation formats, a limited number of formats to work with, the ability to share excerpts of the collection on websites and in public programs, and ease of reference when you can email an attached file. We’ve carried this project out over the course of several years and are pleased with the results in both preservation and accessibility of these collections. We’d be interested to hear how other oral history collections are dealing with the digital transition, so please comment and let us know your ideas. The October Archives Month Blogathon has been a wonderful way for all of us to learn about what colleagues are doing and how they are getting things done!
