Smithsonian Collections Blog

Highlighting the hidden treasures from over 2 million collections

Sunday, October 10, 2010

Is it Digital Yet?

Jim Wallace, Lorie Aceto and Roberta Diemer among the negative files in the Office of
Printing and Photographic Service's (OPPS) cold storage vault (1983). The OPPS collection is now
part of the Smithsonian Institution Archives and is estimated to contain over 3 million images.
Negative Number 2004-10338. Photograph by Richard K. Hofmeister.
Sometimes when I’m giving a tour of the archives and showing off an unusual image or document, the visitor will say, “That’s already digitized, right?” Well, no. Digitization is not too hard to do, and we have some great staff members doing it, but most of our collections are not digitized. Why is that? Take, for example, our wonderful cold storage vault holding about three million images. Let’s say that we digitize 5,000 images a year. Five thousand is a lot: about 100 images per week, or 20 images per day, at about 20 minutes per image. Those 20 minutes cover handling the original negative, adding catalog data (metadata), and post-processing the image from a negative to a positive with minimal sharpening. Even so, at that rate it would take about 600 years to digitize that collection. Let’s say instead that we digitize about 50,000 images a year; it would still take 60 years.
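
The arithmetic behind those figures can be checked with a short Python sketch. The function name `years_to_digitize` is made up for illustration, and the three million figure is the rough estimate given above, not an exact count.

```python
# Rough estimate from the post: about three million images in the cold vault.
COLLECTION_SIZE = 3_000_000


def years_to_digitize(images_per_year, collection_size=COLLECTION_SIZE):
    """Return how many years it takes to digitize the collection at a steady rate."""
    return collection_size / images_per_year


# The two yearly rates discussed in the post.
for rate in (5_000, 50_000):
    print(f"{rate:,} images/year -> {years_to_digitize(rate):,.0f} years")
```

This assumes a constant rate from year to year, which is a simplification: as noted later in the post, digitization numbers tend to go up as equipment improves.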

Some of the 250,000+ slide transparencies in the SIA
cold vaults (2007). Negative number 2007-17181.
Photograph by Ken Rahaim.
Another reason why everything’s not digitized is that each digital image takes up storage space. A good quality TIFF file (tagged image file format) at 600 dpi (dots per inch) is easily 30 megabytes (look at the Federal Digitization Guidelines Initiative for more information about the standards we follow and helped develop). The 5,000 images that we digitize per year equal around 150 gigabytes. At 50,000 images per year we can anticipate about 1.5 terabytes; next year another 1.5 terabytes, and so on. Eventually, for all three million images, this will add up to 90 terabytes. Digital imaging has only been with archives, libraries and museums since the late 1990s, and the technology is vastly better than it was a decade ago, so our digitization numbers go up every year. Equipment is able to capture digital images more quickly, and storage costs go down. The Smithsonian Archives approaches digitization sensibly (we look at research statistics and prioritize requests), and we are working quickly to make sure that researchers have the images they want or need, but, no, it’s not all digital. Not yet!
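
The storage figures follow from the same kind of arithmetic. Here is a minimal sketch, assuming 30 megabytes per image as quoted above and decimal units (1 terabyte = 1,000,000 megabytes); the name `storage_tb` is hypothetical.

```python
# Approximate size of one good-quality 600 dpi TIFF, per the post.
MB_PER_IMAGE = 30

# Decimal units are assumed here: 1 TB = 1,000,000 MB.
MB_PER_TB = 1_000_000


def storage_tb(n_images, mb_per_image=MB_PER_IMAGE):
    """Return the storage needed for n_images, in terabytes."""
    return n_images * mb_per_image / MB_PER_TB


for label, n in [("5,000 images", 5_000),
                 ("50,000 images", 50_000),
                 ("3 million images", 3_000_000)]:
    print(f"{label}: {storage_tb(n):g} TB")
```

The first line of output, 0.15 TB, is the 150 gigabytes mentioned above. Real files vary in size with the dimensions of the original and the bit depth, so these totals are only ballpark figures.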

Sarah Stauderman is the Collections Care Manager for the Smithsonian Institution Archives, where she oversees the preservation of more than 30,000 boxes of paper, photographs, audio, video and film materials.


  1. Would it be worth setting a goal of digitizing all 3 million images in a decade, and then working backwards to see what it would take to get there?

    That would at least give us a baseline with today's technology. Needless to say, 600 years would be an awfully long time given film's tendency to deteriorate, even in cold storage.

    I recently priced out a system with a petabyte of disk (1,000 terabytes) and it came in under $200,000, so let's not let storage costs be the obstacle.

  2. Thanks so much for the comment! I agree that your calculation is a better way to do the math: start with the goal and work backwards. My post truly is just a musing about the flippant way that "digitization" is perceived in the public mind (and even in policymakers' minds). And I also very much agree about the storage costs: at a place like the Smithsonian, storage costs will be the last obstacle to our efforts, if they are even an obstacle at all (thanks to a fantastic IT department that has really addressed these issues over the past several years). Another thing to consider is that it took only about 40 years to create this collection of 3+ million images, so why not be able to digitize it in that amount of time too? The technology to do the work of digitization ever more rapidly is growing daily (for instance, we are going to be moving away from scanner-based systems to camera-based digitization systems within the next 5 years, which will drastically reduce the time an object is "being" digitized). For me the bottom line is people: people to scan, add metadata, and place images into our digital asset management systems.
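
The "start with the goal and work backwards" idea from the comments can be sketched the same way. This is only an illustration: the 20 minutes per image comes from the post, while the 2,000 working hours per person per year and the name `staff_needed` are assumptions made for the example.

```python
def staff_needed(collection_size, target_years, minutes_per_image,
                 work_hours_per_year=2_000):
    """Work backwards from a deadline to a yearly rate and a staffing level.

    work_hours_per_year is an assumed figure for one full-time person.
    Returns (images per year, full-time staff required).
    """
    images_per_year = collection_size / target_years
    hours_per_year = images_per_year * minutes_per_image / 60
    return images_per_year, hours_per_year / work_hours_per_year


# Goal from the first comment: all 3 million images in a decade.
rate, staff = staff_needed(3_000_000, target_years=10, minutes_per_image=20)
print(f"{rate:,.0f} images/year, about {staff:.0f} full-time staff")
```

Under these assumptions, a ten-year goal means 300,000 images a year and roughly 50 people doing nothing else, which is one way to see why people, not storage, are the bottom line. Faster camera-based capture would lower the minutes per image and the staffing figure with it.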