QUALITY OF DATA

Although vital statistics data are useful for a variety of administrative and scientific purposes, they cannot be correctly interpreted unless various qualifying factors and methods of classification are taken into account. The factors to be considered depend on the specific purposes for which the data are to be used. It is not feasible to discuss all the pertinent factors in the use of vital statistics tabulations, but some of the more important ones should be mentioned. Most of the factors limiting the use of data arise from imperfections in the original records or from the impracticability of tabulating these data in very detailed categories. These limitations should not be ignored, but their existence does not lessen the value of the data for most general purposes.

Completeness of registration

An estimated 99 percent of all births occurring in the United States in 1994 were registered; for white births registration was 99.4 percent complete and for all other births, 98.6 percent complete. These estimates are based on the results of the 1964-68 test of birth-registration completeness according to place of delivery (in or out of hospital) and race and on the 1989 proportions of births in these categories. The primary purpose of the test was to obtain current measures of registration completeness for births in and out of hospital by race on a national basis. Data for States were not available as they had been from the previous birth-registration tests in 1940 and 1950. A detailed discussion of the method and results of the 1964-68 birth-registration test is available (15).

The 1964-68 test has provided an opportunity to revise the estimates of birth-registration completeness for the years since the previous test in 1950 to reflect the improvement in registration. This has been done using registration completeness figures from the two tests by place of delivery and race.
Estimates of registration completeness for four groups (based on place of delivery and race) for 1951-65 were computed by interpolation between the test results. (It was assumed that the data from the more recent test are for 1966, the midpoint of the test period.) The results of the 1964-68 test are assumed to prevail for 1966 and later years. These estimates were used with the proportions of births registered in these categories to obtain revised numbers of births adjusted for underregistration for each year. The overall percent of birth-registration completeness by race was then computed.

The figures for 1951-68 shown in table 1-3 differ slightly from those shown in annual reports for years prior to 1969. Data adjusted for underregistration for 1951-59 shown in tables 1-1, 1-4, 1-5, 1-9, 1-10, and 1-11 have been revised to be consistent with the 1964-68 test results and differ slightly from data shown in annual reports for years before 1969. For these years the published number of births and birth rates for both racial groups have been revised slightly downward because the 1964-68 test indicated that previous adjustments to registered births were slightly inflated. Because registration completeness figures by age of mother and by live-birth order are not available from the 1964-68 test, it must be assumed that the relationships among these variables have not changed since 1950.

Discontinuation of adjustment for underregistration, 1960

Adjustment for underregistration of births was discontinued in 1960 when birth registration for the United States was estimated to be 99.1 percent complete. This removed a bias introduced into age-specific rates when adjusted births classified by age were used. Age-specific rates are calculated by dividing the number of births to an age group of mothers by the population of women in that age group.
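The rate definition above, and the bias that arises when births alone are adjusted, can be illustrated with a minimal sketch. The function name, the per-1,000 scaling, and all counts below are assumptions made for illustration; in particular, the 2 percent population undercount is hypothetical and is not an NCHS or census estimate. Only the 99.1 percent registration completeness figure comes from the text.

```python
def age_specific_birth_rate(births, women, per=1000):
    """Births to mothers in one age group per `per` women in that age group."""
    if women <= 0:
        raise ValueError("population of women must be positive")
    return births / women * per


# Hypothetical "true" figures for one age group (illustration only).
true_births = 1000
true_women = 10000

# Registered births at 99.1 percent completeness (figure from the text).
registered_births = true_births * 0.991
# Enumerated women under an ASSUMED 2 percent census undercount.
counted_women = true_women * 0.98

true_rate = age_specific_birth_rate(true_births, true_women)
# Uncorrected births over uncorrected population: the two errors partly offset.
unadjusted_rate = age_specific_birth_rate(registered_births, counted_women)
# Births corrected for underregistration, population left understated.
births_only_adjusted_rate = age_specific_birth_rate(true_births, counted_women)

print(f"true rate:                 {true_rate:.2f}")
print(f"unadjusted rate:           {unadjusted_rate:.2f}")
print(f"births-only adjusted rate: {births_only_adjusted_rate:.2f}")
```

Under these assumed figures the unadjusted rate lies closer to the true rate than the rate computed from adjusted births and an unadjusted population, which is the direction of bias the text describes.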
Tests have shown that population figures are likely to be understated through census undercounts; these errors compensate for underregistration of births. Adjustment for underregistration of births, therefore, removes the compensating effect of underenumeration, biasing the age-specific rates more than when uncorrected birth and population data are used. (For further details see page 4-11 in the Technical Appendix of volume I, Vital Statistics of the United States, 1963.)

The age-specific rates used in the cohort fertility tables (tables 1-15 through 1-22) are an exception to the above statement. These rates are computed from births corrected for underregistration and population estimates adjusted for underenumeration and misstatement of age. Adjusted birth and population estimates are used for the cohort rates because they are an integral part of a series of rates, estimated with a consistent methodology. It was considered desirable to maintain consistency with respect to the cohort rates, even though it means that they will not be precisely comparable with other rates shown for 5-year age groups.

Completeness of reporting

Interpretation of these data must include evaluation of item completeness. The percent "not stated" is one measure of the quality of the data. Completeness of reporting varies among items and States. See table A for the percent of birth records on which specified items were not stated.

Quality control procedures

States in the Vital Statistics Cooperative Program are required to have an error rate of less than 2.0 percent for each item for 3 consecutive data months during the initial qualifying period. Once a State is qualified, NCHS monitors the quality of data received. This is achieved through independent verification of a sample of records for some States and through comparison of the State data with data from previous years. In addition, there is verification at the State level before NCHS is sent the data.
After the coding is completed, counts of the taped records are balanced against control totals for each shipment of records from a registration area. Impossible codes are eliminated during the editing processes on the computer and corrected on the basis of reference to the source record or adjusted by arbitrary code assignment. All subsequent operations involved in tabulation and table preparation are verified during computer processing or by statistical clerks.

Small frequencies

The numbers of births reported for an area represent complete counts. As such, they are not subject to sampling error, although they are subject to errors in the registration process. However, when the figures are used for analytical purposes, such as the comparison of rates over a period of time or for different areas, the number of events that actually occurred may be considered as one of a large series of possible results that could have arisen under the same circumstances. The probable range of values may be estimated from the actual figures according to certain statistical assumptions.

In general, distributions of vital events may be assumed to follow the binomial distribution. Estimates of standard errors and tests of significance under this assumption are described in most standard statistics texts. When the number of events is large, the relative standard error, expressed as a percent of the number or rate, is usually small.

When the number of events is small (fewer than 100) and the probability of such an event is small, considerable caution must be observed in interpreting the conditions described by the figures. Events of rare nature may be assumed to follow a Poisson probability distribution. For this distribution, a simple approximation may be used to estimate the error as follows:

If N is the number of births and R is the corresponding rate, the chances are 19 in 20 that

1. The "true" number of events lies between N - 2√N and N + 2√N
2. The "true" rate lies between R - 2(R/√N) and R + 2(R/√N)

If the rate R1 corresponding to N1 events is compared with the rate R2 corresponding to N2 events, the difference between the two rates may be regarded as statistically significant if it exceeds

2 x √(R1²/N1 + R2²/N2)

For example, suppose that the observed birth rate for area A was 15.0 per 1,000 population and that this rate was based on 50 recorded births. Given prevailing conditions, the chances are 19 in 20 that the "true" or underlying birth rate for that area lies between 10.8 and 19.2 per 1,000 population. Let it be further supposed that the birth rate for area A of 15.0 per 1,000 population is being compared with a rate of 20.0 per 1,000 population for area B, which is based on 40 recorded births. Although the difference between the rates for the two areas is 5.0, this difference is less than twice the standard error of the difference of the two rates, 2 x √(15.0²/50 + 20.0²/40), which is computed to be 7.6. From this, it is concluded that the difference between the rates for the two areas is not statistically significant.
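The approximation above can be checked with a short sketch that reproduces the worked example for areas A and B. The function names are illustrative assumptions and are not part of any NCHS procedure; the formulas are the ones stated in the text.

```python
import math


def count_interval(n):
    """19-in-20 range for the 'true' number of events: N -/+ 2*sqrt(N)."""
    if n <= 0:
        raise ValueError("number of events must be positive")
    half_width = 2 * math.sqrt(n)
    return n - half_width, n + half_width


def rate_interval(rate, n):
    """19-in-20 range for the 'true' rate: R -/+ 2*(R / sqrt(N))."""
    if n <= 0:
        raise ValueError("number of events must be positive")
    half_width = 2 * rate / math.sqrt(n)
    return rate - half_width, rate + half_width


def compare_rates(r1, n1, r2, n2):
    """Return (is_significant, threshold) for the difference of two rates.

    The threshold is 2 * sqrt(R1^2/N1 + R2^2/N2), as given in the text.
    """
    threshold = 2 * math.sqrt(r1 ** 2 / n1 + r2 ** 2 / n2)
    return abs(r1 - r2) > threshold, threshold


# Area A: 15.0 per 1,000 population based on 50 births (example from the text).
low, high = rate_interval(15.0, 50)
print(f"Area A rate range: {low:.1f} to {high:.1f} per 1,000")

# Area A compared with area B: 20.0 per 1,000 based on 40 births.
significant, threshold = compare_rates(15.0, 50, 20.0, 40)
print(f"Threshold: {threshold:.1f}; difference significant: {significant}")
```

With the figures from the example, the sketch gives a range of 10.8 to 19.2 and a threshold of 7.6, so the difference of 5.0 is not judged significant.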