Smithsonian Collections Blog

Highlighting the hidden treasures from over 2 million collections

Wednesday, July 27, 2016

SELGEM: The Logical Structure

This is the second of a three-part series on SELGEM, a pioneering computer system used to manage museum collections in the United States. Read the first post here.

A SELGEM (SELf GEnerating Master) record was what we would now recognize as a typical database record: all the information about one object, one specimen, one work of art, one publication, one person, and so on, including all the related fields or data elements. SELGEM distinguished between physical and logical records. A physical record was a single line of data, and a logical record was the set of all physical records with identical Serial Numbers, so one logical record was composed of one or more physical records.
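To make the idea concrete, here is a minimal Python sketch of how physical records sharing a Serial Number could be gathered into logical records. The `PhysicalRecord` type, the six-digit serial format, and the sample values are assumptions made for illustration; they are not SELGEM's actual file layout.

```python
from collections import defaultdict
from typing import NamedTuple


class PhysicalRecord(NamedTuple):
    """One line of data (hypothetical layout, for illustration only)."""
    serial: str  # Serial Number shared by every line of one logical record
    data: str    # character data carried on this line


def group_logical_records(lines):
    """Collect the physical records that share a Serial Number."""
    logical = defaultdict(list)
    for rec in lines:
        logical[rec.serial].append(rec)
    return dict(logical)


# Made-up sample lines: three physical records, two Serial Numbers.
lines = [
    PhysicalRecord("000101", "Quercus alba"),
    PhysicalRecord("000101", "Virginia"),
    PhysicalRecord("000102", "Acer rubrum"),
]
records = group_logical_records(lines)
```

Here `records` holds two logical records: one built from two physical records and one built from a single physical record.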

Because SELGEM was a general-purpose data management system, it was used for a wide range of applications at the Smithsonian Institution. Computer applications using SELGEM in the 1970s included: museum objects from the National Museum of Natural History (NMNH), the National Museum of American History (NMAH), the National Air and Space Museum (NASM), art museums, the Smithsonian Institution Archives, Volcanoes of the World, bibliographic applications, systematic checklists, type-specimen catalogs, Catalog of American Portraits, National Portrait Gallery (NPG), Inventories of American Painting and Sculpture, and a list of threatened and endangered plants in the United States. Many of these pioneering applications continue today, having evolved into modern databases and web-based applications.

SELGEM master file, directory record
Ayensu, Edward S. and Robert A. DeFilipps. 1978.  Endangered and threatened plants of the United States.  xv, 403 pages.  Smithsonian Institution, Washington, DC. 
Typically, a SELGEM logical record was composed of about 5 to 50 physical records (individual lines); an occasional application may have included as many as a hundred. A Category Number could be any number between “001” and “999”, and each Category Number represented a data element. The first Category Number in a logical record did not have to be “001”, and application owners generally left gaps when assigning Category Numbers to create a more flexible design. A Master List of records, however, always displayed the Category Numbers and their associated data in numerical sequence.
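The value of leaving gaps can be sketched in a few lines of Python. This is only an illustration: the dictionary representation, the helper names `is_valid_category` and `master_list_order`, and the particular Category Numbers and values are all hypothetical.

```python
def is_valid_category(cat):
    """True for a three-digit Category Number from "001" to "999"."""
    return len(cat) == 3 and cat.isascii() and cat.isdigit() and cat != "000"


def master_list_order(record):
    """Return (category, data) pairs in numerical sequence.

    Zero-padded three-digit strings sort the same way as the numbers
    they represent, so a plain sort is enough.
    """
    return sorted(record.items())


# Categories assigned with gaps of ten (hypothetical numbering).
record = {"010": "Quercus alba", "030": "Virginia", "020": "Fagaceae"}

# A data element added later fits into a gap without renumbering anything.
record["015"] = "white oak"

for category, data in master_list_order(record):
    print(category, data)
```

The loop prints the categories as 010, 015, 020, 030, regardless of the order in which they were entered.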

The first line of a category always began with Line Number “01”. If the amount of data exceeded 64 characters, continuation lines were created and numbered sequentially “02”, “03”, etc. The theoretical maximum amount of data for one data element (or Category) was therefore 99 lines × 64 characters, or 6,336 characters.
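The continuation-line rule can be sketched as follows. This assumes the data was simply cut every 64 characters; the source does not say whether SELGEM broke lines at fixed positions or at word boundaries, and the function name `split_into_lines` is made up for this example.

```python
LINE_WIDTH = 64  # characters of data per line
MAX_LINES = 99   # Line Numbers run from "01" to "99"


def split_into_lines(value):
    """Split one data element into numbered lines of up to 64 characters.

    Assumes fixed-width cuts; an empty value produces no lines at all.
    """
    if len(value) > LINE_WIDTH * MAX_LINES:
        raise ValueError("data element exceeds 99 lines of 64 characters")
    chunks = [value[i:i + LINE_WIDTH] for i in range(0, len(value), LINE_WIDTH)]
    return [(f"{n:02d}", chunk) for n, chunk in enumerate(chunks, start=1)]


# A 150-character value needs three lines: 64 + 64 + 22 characters.
numbered = split_into_lines("x" * 150)
```

For the 150-character value, `numbered` contains lines “01”, “02”, and “03”, and joining the three pieces of data gives back the original value.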
Sample SELGEM record, page III
Creighton, Reginald A., Penelope Packard, and Holley Linn.  1971.  SELGEM Retrieval:  a general description.  Smithsonian Institution Procedures in Computer Sciences, 1(1):(6 pages) + 1-38. Dated July 1972.
The theoretical maximum amount of data for one logical record was 999 Category Numbers × 6,336 characters, for a total of 6,329,664 characters (about 6.3 megabytes). No application reached this size, let alone an individual record. In addition, individual SELGEM programs had their own memory limits, which further restricted the maximum size of a record that could be processed. Remember, even mainframe computers had limitations.
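A quick calculation shows how far a typical record was from that ceiling. The sketch below takes the upper end of the typical range mentioned earlier (about 50 lines) and assumes, generously, that every line is filled to 64 characters; the variable names are chosen for this example only.

```python
CHARS_PER_LINE = 64
MAX_LINES_PER_CATEGORY = 99
MAX_CATEGORIES = 999

# Theoretical ceilings described in the text.
max_chars_per_category = CHARS_PER_LINE * MAX_LINES_PER_CATEGORY  # 6,336
max_chars_per_record = max_chars_per_category * MAX_CATEGORIES    # 6,329,664

# A large typical record: about 50 lines, each assumed to be full.
typical_upper_bound = 50 * CHARS_PER_LINE                         # 3,200

share = typical_upper_bound / max_chars_per_record
print(f"{max_chars_per_record:,} characters at most")
print(f"a full 50-line record uses {share:.2%} of that")
```

Even a completely full 50-line record comes to 3,200 characters, a small fraction of one percent of the theoretical maximum.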

Some advantages of this design:
  • Easy to add new data elements to any SELGEM record by creating new Category Numbers (needed either because of a lack of planning or because new kinds of data, such as DNA information, emerged)
  • Empty, missing, or blank data elements were not stored in the SELGEM record
  • Flexibility to respond to changing and evolving user requirements
  • The data structure was under the control and responsibility of the end user and less restricted by the application, and
  • With limited technical support staff, a general-purpose system supported more applications across the organization than would have been possible if custom-designed systems had been developed for each application.
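Two of these advantages, not storing empty data elements and adding new ones later, can be illustrated with a short Python sketch. The `store` helper, the Category Numbers, and the sample values are hypothetical and only stand in for the idea.

```python
def store(values):
    """Keep only the categories that actually carry data."""
    return {cat: text for cat, text in values.items() if text.strip()}


# Two specimens entered from the same form; blank fields are dropped.
specimen_a = store({"010": "Quercus alba", "020": "", "030": "Virginia"})
specimen_b = store({"010": "Acer rubrum", "020": "Sapindaceae", "030": "   "})

# A data element nobody planned for is added years later, and only
# on the record that has it (hypothetical Category Number "250").
specimen_b["250"] = "sequence on file"

print(sorted(specimen_a))  # categories present on specimen A
print(sorted(specimen_b))  # categories present on specimen B
```

Each record carries only the categories it needs, and adding category 250 to one record changes nothing about the other.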
A sample page of a master list showing six logical records
Wilson, Don E., Beth Ann Sabo, and Gregory Blair.  1987.  Automated Data Processing Procedures at the U.S. National Museum of Natural History, pages 111-119.  In:  Genoways, Hugh H., Clyde Jones, and Olga L. Rossolimo (editors).  Mammal Collection Management.  Texas Tech University Press
All the stored data was character data; there was no ability to store binary data, special numerical data types, image data, memo fields, currency, or “date formatted” information, as many current database programs typically support. Most application owners in NMNH maintained a detailed data standards document external to SELGEM, defining the data definitions, data formats, controlled vocabulary lists, rules for recording the data, and so on.
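Because everything was stored as characters, any checking of formats or vocabularies had to come from rules kept outside the system. As a rough modern analogy, the sketch below checks text values against an external table of rules. The `STANDARDS` table, the Category Numbers, the date pattern, and the vocabulary are invented for illustration and do not reproduce any actual NMNH data standard.

```python
import re

# Hypothetical externally maintained rules, keyed by Category Number.
STANDARDS = {
    "040": {"pattern": re.compile(r"\d{4}-\d{2}-\d{2}")},  # a date held as text
    "050": {"vocabulary": {"male", "female", "unknown"}},  # controlled terms
}


def conforms(category, text):
    """Check one character value against the external rules, if any exist."""
    rule = STANDARDS.get(category)
    if rule is None:
        return True  # no rule recorded for this category
    if "pattern" in rule and not rule["pattern"].fullmatch(text):
        return False
    if "vocabulary" in rule and text.lower() not in rule["vocabulary"]:
        return False
    return True


print(conforms("040", "1971-07-15"))  # matches the assumed date format
print(conforms("040", "July 1971"))   # does not match it
```

The stored value stays plain text either way; the rules only describe what well-formed text should look like.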

David Bridge, Volunteer
