Smithsonian Collections Blog

Highlighting the hidden treasures from over 2 million collections

Wednesday, July 20, 2016

SELGEM: The Data Structure

This is the first of a three-part series on SELGEM, a pioneering computer system used to manage museum collections in the United States.

The first publication about the Smithsonian’s SELGEM system, August 1971.
SELGEM was an information management computer system invented, developed, and distributed at the Smithsonian Institution and used for more than 30 years. The exact date of SELGEM's invention has been lost to antiquity (perhaps very late 1969); however, Reginald A. Creighton and James J. Crockett, two of the men who created SELGEM, state that it “has been in operation … since spring 1970”.* SELGEM has been called a “records management system” rather than a true database, at least in the modern sense of the word. In an era predating relational databases, when commercial software tools were almost nonexistent, SELGEM was used to organize and manage data in a wide variety of applications at the Smithsonian and at many other institutions. The name is an acronym derived from the system’s full name: SELf GEnerating Master. It was “self-generating” because of its general-purpose format and because no initial computer programming was required to create a new data file or application.

The SELGEM data structure was simple.
User documentation illustrating the relationship between SELGEM transaction records and SELGEM master records. Undated, ca. 1971-1972, MNH.
The physical record was a single fixed-length record of 77 characters. It had only four components (8 + 3 + 2 + 64 = 77 characters):
Serial Number, 8 characters, required, alphanumeric, no spaces. Frequently it was an arbitrary number, but it could also be a data element**, such as a museum catalog number, a sample number, or a photo image number.
Category Number, three-digit numeric, required. Any three-digit number from “001” to “999” could be used. The Category Number was a code number for a data element** or data field name.
Line Number, two-digit numeric, in the range “01” to “99”, required. The first Line Number for a Category Number should be “01”, and continuation lines should be numbered sequentially.
Data: the good stuff. Up to 64 characters of data could be stored in a single SELGEM line, that is, in one physical record.
A single physical record frequently represented a whole data element when the data were 64 characters or fewer, or one line of a multi-line textual data element such as remarks, a description, or an abstract. For example, suppose that the data element “Country” has been assigned Category Number “100” and the data fit on one line. The record would look like this:
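1234567810001United States
Reading left to right: Serial Number 12345678, Category Number 100, Line Number 01, and the data “United States”; the rest of the 64-character data field would be blank.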
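To make the fixed-length layout concrete, here is a minimal Python sketch of building and reading a 77-character physical record. It is an illustration only: the names `PhysicalRecord` and `split_element` are hypothetical and not part of SELGEM, blank padding of the data field is assumed from the fixed record length, and splitting long text at exactly 64 characters is an assumption (data preparers may well have broken lines at word boundaries instead).

```python
from dataclasses import dataclass

# Field widths taken from the physical record description above.
SERIAL_WIDTH, CATEGORY_WIDTH, LINE_WIDTH, DATA_WIDTH = 8, 3, 2, 64
RECORD_WIDTH = SERIAL_WIDTH + CATEGORY_WIDTH + LINE_WIDTH + DATA_WIDTH  # 77


@dataclass(frozen=True)
class PhysicalRecord:  # hypothetical name, for illustration only
    serial: str    # 8 alphanumeric characters, no spaces
    category: int  # 1..999, written as "001".."999"
    line: int      # 1..99, written as "01".."99"
    data: str      # up to 64 characters

    def to_line(self) -> str:
        """Render as a 77-character record, assuming blank padding of the data field."""
        if len(self.serial) != SERIAL_WIDTH or not self.serial.isalnum():
            raise ValueError("serial must be 8 alphanumeric characters")
        if not 1 <= self.category <= 999:
            raise ValueError("category must be 001-999")
        if not 1 <= self.line <= 99:
            raise ValueError("line must be 01-99")
        if len(self.data) > DATA_WIDTH:
            raise ValueError("data exceeds 64 characters")
        return f"{self.serial}{self.category:03d}{self.line:02d}{self.data:<{DATA_WIDTH}}"

    @classmethod
    def from_line(cls, text: str) -> "PhysicalRecord":
        """Parse a record; trailing blanks in the data field are treated as padding."""
        text = text.ljust(RECORD_WIDTH)  # tolerate unpadded input like the example above
        return cls(
            serial=text[:8],
            category=int(text[8:11]),
            line=int(text[11:13]),
            data=text[13:RECORD_WIDTH].rstrip(),
        )


def split_element(serial: str, category: int, text: str) -> list[PhysicalRecord]:
    """Break a long data element into 64-character lines numbered 01, 02, ... (assumed rule)."""
    chunks = [text[i:i + DATA_WIDTH] for i in range(0, len(text), DATA_WIDTH)] or [""]
    if len(chunks) > 99:
        raise ValueError("element needs more than 99 continuation lines")
    return [PhysicalRecord(serial, category, n, chunk) for n, chunk in enumerate(chunks, start=1)]


if __name__ == "__main__":
    record = PhysicalRecord.from_line("1234567810001United States")
    print(record)
    print(len(record.to_line()))  # 77
```

Because every field sits at a fixed offset, parsing is just slicing; no delimiters or escaping are needed, which suited card- and tape-based data entry.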

The master records in the computer file were created by the SELGEM update program (SELUPD) from a sorted file of SELGEM transaction records. SELGEM transaction records were 80 characters long (see flowchart at right). A transaction code controlled the action the update program performed, such as add, change, or delete.
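The post does not describe SELUPD's internals or the layout of the 80-character transaction record, so the following is only a sketch of the classic sorted sequential update that a program of this kind would plausibly perform: merging an old master file with a transaction file sorted on the same key to produce a new master. The one-letter codes `A`/`C`/`D`, the `update_master` function, keying each transaction to a single physical record, and the in-memory tuples standing in for tape or disk files are all hypothetical.

```python
from typing import Iterable, Iterator, NamedTuple, Optional

# Hypothetical transaction codes; the real SELUPD codes are not given in this post.
ADD, CHANGE, DELETE = "A", "C", "D"

Key = tuple[str, int, int]  # (serial number, category number, line number)
Record = tuple[Key, str]    # (key, data)


class Transaction(NamedTuple):
    code: str
    key: Key
    data: str = ""


def update_master(master: Iterable[Record],
                  transactions: Iterable[Transaction]) -> Iterator[Record]:
    """Merge a key-sorted old master with key-sorted transactions into a new master."""
    old = iter(master)
    trans = iter(transactions)
    held: list[Record] = []  # an old record set aside while a newly added one is current

    def next_master() -> Optional[Record]:
        return held.pop() if held else next(old, None)

    m = next_master()
    t = next(trans, None)
    while m is not None or t is not None:
        if t is None or (m is not None and m[0] < t.key):
            yield m                  # no transaction for this record: copy it forward
            m = next_master()
        elif m is None or t.key < m[0]:
            if t.code != ADD:
                raise ValueError(f"{t.code!r} for missing record {t.key}")
            if m is not None:
                held.append(m)       # the new record sorts before the current one
            m = (t.key, t.data)      # later transactions may still change it
            t = next(trans, None)
        else:                        # transaction key matches the current record
            if t.code == CHANGE:
                m = (m[0], t.data)
            elif t.code == DELETE:
                m = next_master()
            else:
                raise ValueError(f"{t.code!r} for existing record {t.key}")
            t = next(trans, None)
```

Requiring both files to be in the same sort order is what lets a merge of this kind make a single pass over each file, which is why a sorted transaction file matters on sequential media.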

SELGEM transaction records could be prepared using any data entry technology available.

Forms used to create data entry programs on the key-to-disk systems 

The 80-character IBM card format could be produced directly or indirectly. The following technologies were used for data entry at the Smithsonian at various times: paper tape typewriters, teletype machines, IBM keypunches, key-to-disk data entry systems (such as the ENTREX and NIXDORF systems), optical character recognition (OCR), optical mark sense (OMR) forms, and personal computers.

That was the SELGEM physical record description; in the next post the story will continue with the Logical Record structure. A Logical Record consisted of all the physical records sharing the same Serial Number (the first eight characters).
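The Logical Record definition above can be sketched in a few lines of Python. This is a minimal illustration, assuming the file is sorted by Serial Number (so `itertools.groupby` sees each logical record as one contiguous run) and that records are 77-character strings laid out as described earlier; `logical_records` is a hypothetical name. Continuation lines are returned as a list in Line Number order rather than joined, since how SELGEM users rejoined broken text is not covered here.

```python
from itertools import groupby


def logical_records(lines):
    """Yield (serial, {category: [line texts in Line Number order]}) per logical record."""
    # Assumes input sorted by the 8-character Serial Number.
    for serial, group in groupby(lines, key=lambda rec: rec[:8]):
        elements: dict[int, list[tuple[int, str]]] = {}
        for rec in group:
            category = int(rec[8:11])  # Category Number
            line_no = int(rec[11:13])  # Line Number
            elements.setdefault(category, []).append((line_no, rec[13:77].rstrip()))
        # Order each element's continuation lines by Line Number.
        yield serial, {cat: [text for _, text in sorted(parts)]
                       for cat, parts in elements.items()}
```

For a single-line element such as “Country”, the list simply holds one string.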

David Bridge, Volunteer
Smithsonian Institution Archives

*Creighton, Reginald A. and James J. Crockett.  1971.  SELGEM:  A system for collection management.  Smithsonian Institution Information Systems Innovations 2(3):1.

**The term data element is used as defined in “the term data element is an atomic unit of data that has precise meaning or precise semantics.” In SELGEM, only the 3-digit code number was stored in the computer file; the definition, data standards, rules, and attributes were defined externally to SELGEM.


  1. The malacology department at the Academy of Natural Sciences of Philadelphia adopted SELGEM around 1975 and used it for more than 15 years.

  2. Hi Gary:
    I have also been compiling a bibliography of reports and papers related to the development and adoption of computers in museums, and of what we generally call IT these days.
    The bibliography includes:

    Davis, George M. 1977. Academy of Natural Sciences of Philadelphia.
    Association of Systematic Collections Newsletter 5(5):49-53.

    The primary focus of this paper is a detailed description of the facilities and activities of the ANSP. Each department is described, with its outstanding features and major programs noted. In the report on Malacology he states: “Data processing using the SELGEM program has been used to catalogue all specimens entering the collection since 1 September 1976.” (p. 52).

    Dr. Davis’s mollusk collection data were initially processed by the NMNH ADP Program on the Smithsonian’s Honeywell computer in Washington, DC. Processing at the Smithsonian probably continued for only a year or two; then we must have sent the SELGEM programs and the ANSP collections data back to you all.

    It would be interesting to learn about your experiences with SELGEM.