Tuesday, June 8, 2010

The Soundex Code: Boon, Bane, or Bomb?

Those of us who had to do research the old-fashioned way on clunky microfilm machines learned that the later U.S. census records were indexed with a coding system called the Soundex. The 1880 census was the first to be coded by the Commerce Department with Soundex cards. Unfortunately, the 1880 indexing was limited, as only households with children 10 and under were cataloged by Soundex numbers. If you were searching for a household that had only older children or no children, you were out of luck with regard to the 1880 census Soundex.

Unlike the 1880 census, the censuses for 1900, 1910, 1920, and 1930 were completely Soundexed. These cards were later microfilmed by the National Archives, and a person could search the records first by state, then by the Soundex code, and finally by the first name and middle initial of the individual in question. If you were doing research on a specific surname, you could look at the Soundex microfilm and get a truncated view of the census records for everyone with that surname in that state. Additionally, the cards provided locality data and page numbers so a researcher could reference the actual census record for more complete information.

The Workings of the Soundex

The Soundex system was developed in the second decade of the 20th century as a phonetic algorithm that combined sounds with an index tool. A Soundex code is constructed from the name’s first letter followed by a three-digit numerical code based on the consonants that follow. Vowels and the consonants H, W, and Y are not coded. Double consonants, or two adjacent consonants that share the same Soundex code, are counted as one letter. For example, “TT” would be coded as one letter, and “CK,” in which both letters have the same code, would be counted once. If the name has more than three coded consonants, the remaining consonants are ignored. The numeric codes are based on the following:

Letters                   Number
B, F, P, V                1
C, G, J, K, Q, S, X, Z    2
D, T                      3
L                         4
M, N                      5
R                         6

For example, the following names are coded as shown:

Name         Soundex
Archibald    A621
Boston       B235
Carothers    C636
Dickenson    D252
Schmidt      S530
Milosovic    M421
Wineberg     W516

When a name does not have enough coded consonants to fill all three digits, each empty position is filled with a zero.

Name     Soundex
Bates    B320
Poe      P000
Goins    G520
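
The rules and letter table above can be sketched in a few lines of Python. This is only a minimal illustration, not the exact procedure the census indexers followed. The names CODES and soundex are hypothetical, and one detail is an assumption that this post does not state: H and W are skipped without breaking a run of same-coded consonants, which follows the commonly cited National Archives rule.

```python
# Letter groups from the table above, mapped to their digits.
CODES = {}
for letters, digit in [("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")]:
    for letter in letters:
        CODES[letter] = digit


def soundex(name):
    """Return a four-character code such as 'B235' (hypothetical helper)."""
    letters = [ch for ch in name.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""
    digits = []
    # The first letter is kept as a letter, but its code still counts
    # when deciding whether the next consonant is a repeat.
    previous = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # assumption: H and W do not separate same-coded letters
        code = CODES.get(ch, "")  # vowels and Y have no code
        if code and code != previous:
            digits.append(code)
        previous = code
    # Keep at most three digits and pad short codes with zeros.
    return letters[0] + "".join(digits)[:3].ljust(3, "0")


for name in ["Archibald", "Boston", "Dickenson", "Bates", "Poe", "Goins"]:
    print(name, soundex(name))
```

Running the loop prints each name beside its code, so the results can be compared against the tables above.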

A Major Genealogical Help in the Analog Age

When I immersed myself in the world of genealogy in 1978, only the census records through 1900 were available, and of those only 1880 and 1900 had a Soundex. The Soundex for these two censuses allowed me to gather data quickly on my family. When the 1910 census became available in 1982, I visited libraries and viewed these records. The same occurred with the 1920 census, which became available in 1992. With this census, I ordered the microfilm records directly from the National Archives through inter-library loan.

In 2002, when the 1930 census was released to the public, Ancestry.com and Genealogy.com both offered this particular census online. Since I was subscribed to both, it was no longer necessary for me to sit at a microfilm reader to use a census Soundex. Genealogy.com actually beat Ancestry to the punch and had a limited version of the 1930 census online first.

If I remember correctly, their images were better than Ancestry's. Ancestry returned this favor by buying this competitor in 2003. The sites were run as separate entities for about two years until the records of both were consolidated.

Is the Soundex Viable in the Digital Age?

Theoretically, with complete indexing of census records, the Soundex is almost obsolete, but not entirely. The Soundex system is used on all of the records on Ancestry.com and may be helpful if you are searching for a name that is often spelled differently but is phonetically the same. While the coding is not necessary for us to know these days, it is helpful to understand how it works and its limitations. 

For example, my surname Owston and its variant spelling Ouston are both coded in the Soundex as O235, but so are Oston, Oyston, Ogden, Oxton, Osteen, Ostani, etc. Some of these are close to my surname and others are not. Sometimes my name is misspelled as Auston, and the code A235 will return Austin, Austen, Auston, Academy, Acton, and many, many others.

The upside of a Soundex search is that it returns names that are actually similar to the name in question. The downside is that it also returns names that share the same code but are dissimilar to the name in question, such as Ogden, which has the same code of O235 as my surname of Owston.
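
To see how quickly unrelated names pile up under one code, here is a small Python sketch that groups a list of surnames by their Soundex code. It is an illustration under assumptions, not how Ancestry's search works. The helper names soundex and group_by_code are hypothetical, and H and W are assumed not to separate same-coded consonants.

```python
from collections import defaultdict

# Letter groups and digits from the standard Soundex table.
CODES = {letter: digit
         for letters, digit in [("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                                ("L", "4"), ("MN", "5"), ("R", "6")]
         for letter in letters}


def soundex(name):
    """Return a four-character Soundex code (hypothetical helper)."""
    letters = [ch for ch in name.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""
    digits = []
    previous = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # assumption: H and W do not separate same-coded letters
        code = CODES.get(ch, "")  # vowels and Y have no code
        if code and code != previous:
            digits.append(code)
        previous = code
    return letters[0] + "".join(digits)[:3].ljust(3, "0")


def group_by_code(names):
    """Collect names that share a Soundex code, keeping input order."""
    groups = defaultdict(list)
    for name in names:
        groups[soundex(name)].append(name)
    return dict(groups)


surnames = ["Owston", "Ouston", "Oyston", "Ogden", "Oxton", "Osteen",
            "Auston", "Austin", "Academy", "Acton"]
for code, members in group_by_code(surnames).items():
    print(code, ", ".join(members))
```

Every name in the list lands under either O235 or A235, which is why a Soundex search works best when combined with other details such as a first name, age, or place.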

Using the Soundex option in searches on Ancestry along with other specific information may help you find a long-lost relative whose name may have been misspelled in the census. Ancestry's Soundex option allows multiple types of records to be searched, not just census records. It also gives the researcher the opportunity to search census records across state lines in a single pass, something that was impossible on microfilm, as each state had to be searched individually. The key to using this feature successfully is to provide enough other information to narrow your search.

How to narrow that search and a few other secrets to sleuthing the census records are forthcoming in our next census installment.
