Appendix C. GES Technical Notes

Standard Errors

The national estimates produced from GES data may differ from the true values, because they are based on a probability sample of crashes and not a census of all crashes. The size of these differences may vary depending on which sample of crashes was selected. [For a complete description of the GES sampling design, see National Accident Sampling System General Estimates System Technical Note (DOT HS 807 796) available from NCSA.] The standard error of an estimate is a measure of the precision or reliability with which an estimate from this particular GES sample approximates the results of a census.

In a report of this size, it is impractical to provide standard errors for each estimate. Instead, generalized standard errors for estimates of totals are provided in the table below, calculated separately for crash, vehicle, and person characteristics. The table gives values of the GES estimates alongside an estimate of one standard error. Adding and subtracting two standard errors creates an approximate 95 percent confidence interval for a GES estimate in this report.

For example, the estimated number of injury crashes that occurred in the month of February is given in Table 23 as 144,000. To calculate one standard error for this crash estimate, use the crash columns of the table below. Since 144,000 does not appear in the Crash Estimate column, use linear interpolation between the standard error values for 100,000 (8,000) and 200,000 (14,500): 8,000 + 0.44 × (14,500 - 8,000) = 10,860, or approximately 10,900 when rounded to the nearest hundred. The 95 percent confidence interval for this estimate is 144,000 ± 2 × 10,900, or 122,200 to 165,800.
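The interpolation and confidence interval in the worked example can be reproduced with a few lines of Python. This is a minimal sketch: the function names `interpolate_se` and `confidence_interval_95` are illustrative choices, not part of any GES software, and rounding the standard error to the nearest hundred is an assumption made to match the precision of the table.

```python
def interpolate_se(estimate, lower, upper):
    """Linearly interpolate a standard error between two table rows.

    `lower` and `upper` are (estimate, standard_error) pairs taken from the
    table; they must bracket `estimate`.
    """
    (x0, se0), (x1, se1) = lower, upper
    if not x0 <= estimate <= x1:
        raise ValueError("estimate must lie between the two table rows")
    fraction = (estimate - x0) / (x1 - x0)
    return se0 + fraction * (se1 - se0)


def confidence_interval_95(estimate, se):
    """Approximate 95 percent interval: estimate plus or minus two standard errors."""
    return estimate - 2 * se, estimate + 2 * se


# February injury crashes (144,000), bracketed by the 100,000 and 200,000 rows
# of the crash columns.
se = interpolate_se(144_000, (100_000, 8_000), (200_000, 14_500))
se_rounded = round(se, -2)  # assumption: round to the nearest hundred, as in the table
low, high = confidence_interval_95(144_000, se_rounded)
print(f"SE = {se_rounded:,.0f}; 95% CI = {low:,.0f} to {high:,.0f}")
```

The same two functions apply to vehicle and person estimates, provided the bracketing rows are taken from the matching columns of the table.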

2003 GES Estimates and Standard Errors

| Crash Estimate (x) | Crash Standard Error (SE)* | Vehicle Estimate (x) | Vehicle Standard Error (SE)** | Person Estimate (x) | Person Standard Error (SE)*** |
|---:|---:|---:|---:|---:|---:|
| 1,000 | 400 | 1,000 | 400 | 1,000 | 400 |
| 5,000 | 900 | 5,000 | 900 | 5,000 | 900 |
| 6,000 | 1,000 | 10,000 | 1,500 | 10,000 | 1,400 |
| 7,000 | 1,100 | 20,000 | 2,300 | 20,000 | 2,200 |
| 8,000 | 1,200 | 30,000 | 3,100 | 30,000 | 2,900 |
| 9,000 | 1,300 | 40,000 | 3,900 | 40,000 | 3,500 |
| 10,000 | 1,400 | 50,000 | 4,600 | 50,000 | 4,200 |
| 20,000 | 2,300 | 60,000 | 5,300 | 60,000 | 4,800 |
| 30,000 | 3,100 | 70,000 | 6,000 | 70,000 | 5,400 |
| 40,000 | 3,900 | 80,000 | 6,600 | 80,000 | 5,900 |
| 50,000 | 4,600 | 90,000 | 7,300 | 90,000 | 6,500 |
| 60,000 | 5,300 | 100,000 | 8,000 | 100,000 | 7,100 |
| 70,000 | 6,000 | 200,000 | 14,300 | 200,000 | 12,300 |
| 80,000 | 6,700 | 300,000 | 20,400 | 300,000 | 17,400 |
| 90,000 | 7,400 | 400,000 | 26,500 | 400,000 | 22,300 |
| 100,000 | 8,000 | 500,000 | 32,600 | 500,000 | 27,200 |
| 200,000 | 14,500 | 600,000 | 38,600 | 600,000 | 32,000 |
| 300,000 | 20,900 | 700,000 | 44,700 | 700,000 | 36,800 |
| 400,000 | 27,200 | 800,000 | 50,900 | 800,000 | 41,600 |
| 500,000 | 33,500 | 900,000 | 57,000 | 900,000 | 46,500 |
| 600,000 | 39,900 | 1,000,000 | 63,200 | 1,000,000 | 51,300 |
| 700,000 | 46,300 | 2,000,000 | 126,900 | 2,000,000 | 99,900 |
| 800,000 | 52,700 | 3,000,000 | 194,000 | 3,000,000 | 149,900 |
| 900,000 | 59,200 | 4,000,000 | 263,900 | 4,000,000 | 201,200 |
| 1,000,000 | 65,700 | 5,000,000 | 336,400 | 5,000,000 | 253,800 |
| 2,000,000 | 133,500 | 6,000,000 | 411,300 | 6,000,000 | 307,600 |
| 3,000,000 | 205,200 | 7,000,000 | 488,400 | 7,000,000 | 362,600 |
| 4,000,000 | 280,500 | 8,000,000 | 567,500 | 8,000,000 | 418,600 |
| 5,000,000 | 359,000 | 9,000,000 | 648,600 | 9,000,000 | 475,700 |
| 6,000,000 | 440,200 | 10,000,000 | 731,500 | 10,000,000 | 533,700 |
| 6,500,000 | 481,900 | 11,000,000 | 816,100 | 11,000,000 | 592,600 |
| 7,000,000 | 524,100 | 12,000,000 | 902,400 | 12,000,000 | 652,400 |

*SE = $e^{a + b(\ln x)^2}$, where a = 4.208860 and b = 0.036070

**SE = $e^{a + b(\ln x)^2}$, where a = 4.272400 and b = 0.035530

***SE = $e^{a + b(\ln x)^2}$, where a = 4.357200 and b = 0.033990
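For estimates that fall between table rows, the footnote formulas can also be evaluated directly instead of interpolating. The sketch below reads each footnote as SE = exp(a + b(ln x)^2) with the natural logarithm, a reading that reproduces the tabulated values to the nearest hundred for the rows checked in the comments. The names `COEFFICIENTS` and `generalized_se` are illustrative, not taken from the GES documentation.

```python
import math

# (a, b) coefficients copied from the three table footnotes.
COEFFICIENTS = {
    "crash": (4.208860, 0.036070),
    "vehicle": (4.272400, 0.035530),
    "person": (4.357200, 0.033990),
}


def generalized_se(estimate, level):
    """Return exp(a + b * (ln x)^2) for an estimated total x.

    `level` selects the coefficient pair: "crash", "vehicle", or "person".
    """
    if estimate <= 0:
        raise ValueError("estimate must be positive")
    a, b = COEFFICIENTS[level]
    return math.exp(a + b * math.log(estimate) ** 2)


# Compare against table rows: the table lists 8,000 (crash), 8,000 (vehicle),
# and 7,100 (person) for an estimate of 100,000.
for level in ("crash", "vehicle", "person"):
    print(level, round(generalized_se(100_000, level), -2))
```

Evaluated at 144,000 crashes, the formula gives a standard error that rounds to the same 10,900 obtained by linear interpolation in the February example, so either route yields the same interval at the table's precision.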

 

Unknowns

GES data are obtained either directly from an item on the PAR or by interpreting the information provided in the report, through reviewing the crash diagram, the officer's written summary of the crash, or combinations of variables on the PAR. Because of this interpretation, and because the police officer may not have entered an item of information or may not have provided complete information, data can be missing. Two different statistical procedures are used on GES data to fill in values for unknown data. These procedures, univariate and hotdeck imputation, are described in a technical report available from NCSA, Imputation in the General Estimates System (DOT HS 807 985). The table below gives the proportion of unknown values prior to imputation for the imputed variables used in this report.

Percent of Unknowns for 2003 GES Data Elements

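| Level | Data Element | Percent Unknown |
|---|---|---:|
| Crash | Alcohol Involved in Crash | 7.7% |
| Crash | Atmospheric Condition | 1.9% |
| Crash | Crash Severity | 3.5% |
| Crash | Day of Week | 0.0% |
| Crash | First Harmful Event | 0.1% |
| Crash | Hour of Crash | 0.6% |
| Crash | Light Condition | 1.1% |
| Crash | Manner of Collision | 0.2% |
| Crash | Minute of Crash | 0.6% |
| Crash | Relation to Junction | 0.2% |
| Crash | Relation to Roadway | 0.3% |
| Crash | Roadway Surface Condition | 1.8% |
| Crash | Speed Limit | 15.9% |
| Crash | Traffic Control Device | 4.3% |
| Vehicle/Driver | Driver Drinking in Vehicle | 10.7% |
| Vehicle/Driver | Initial Point of Impact | 1.8% |
| Vehicle/Driver | Most Harmful Event | 0.1% |
| Vehicle/Driver | Vehicle Type | 1.6% |
| Person | Age | 8.3% |
| Person | Injury Severity | 4.4% |
| Person | Police-Reported Alcohol Involvement | 4.4% |
| Person | Seating Position | 0.9% |
| Person | Sex | 5.7% |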
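To illustrate what hotdeck imputation does in general, the sketch below fills each unknown value with an observed value borrowed from a randomly chosen "donor" record in the same class. It is a generic, unweighted textbook version, not the GES procedure, which is documented in DOT HS 807 985. The records, the field names (`seat`, `injury`, `age`), and the choice of class variables are all made up for illustration.

```python
import random


def percent_unknown(records, target):
    """Percentage of records whose `target` value is missing (None)."""
    missing = sum(1 for rec in records if rec[target] is None)
    return 100.0 * missing / len(records)


def hot_deck_impute(records, target, class_keys, seed=0):
    """Return copies of `records` with missing `target` values filled in.

    Each missing value is replaced by the observed value of a randomly chosen
    donor that matches the record on every field in `class_keys`.
    """
    rng = random.Random(seed)
    # Pool the observed values by imputation class.
    donors = {}
    for rec in records:
        if rec[target] is not None:
            key = tuple(rec[k] for k in class_keys)
            donors.setdefault(key, []).append(rec[target])

    completed = []
    for rec in records:
        filled = dict(rec)  # copy so the input records are not modified
        if filled[target] is None:
            pool = donors.get(tuple(rec[k] for k in class_keys))
            if pool:  # the value stays unknown when the class has no donor
                filled[target] = rng.choice(pool)
        completed.append(filled)
    return completed


# Hypothetical person-level records with two unknown ages.
people = [
    {"seat": "driver", "injury": "none", "age": 34},
    {"seat": "driver", "injury": "none", "age": None},
    {"seat": "driver", "injury": "none", "age": 51},
    {"seat": "passenger", "injury": "minor", "age": 12},
    {"seat": "passenger", "injury": "minor", "age": None},
]
before = percent_unknown(people, "age")
completed = hot_deck_impute(people, "age", ["seat", "injury"])
after = percent_unknown(completed, "age")
print(f"unknown before: {before:.1f}%, after: {after:.1f}%")
```

The `percent_unknown` helper computes the kind of figure reported in the table above: the share of records with no value before any imputation is applied.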