Appendix C. GES Technical Notes

Standard Errors

The national estimates produced from GES data may differ from the true values, because they are based on a probability sample of crashes and not a census of all crashes. The size of these differences may vary depending on which sample of crashes was selected. [For a complete description of the GES sampling design, see National Accident Sampling System General Estimates System Technical Note (DOT HS 807 796) available from NCSA.] The standard error of an estimate is a measure of the precision or reliability with which an estimate from this particular GES sample approximates the results of a census.

In a report of this size, it is impractical to provide standard errors for each estimate. Instead, generalized standard errors for estimates of totals are provided in Table C1 below, calculated separately for crash, vehicle, and person characteristics. For each GES estimate listed, the table gives an estimate of one standard error. Adding and subtracting two standard errors yields an approximate 95 percent confidence interval for a GES estimate in this report.

For example, the estimated number of injury crashes that occurred in the month of February is given in Table 23 as 144,000. Since 144,000 does not appear in the Crash Estimate column of Table C1, use linear interpolation between the standard errors for 100,000 (8,000) and 200,000 (14,600): 8,000 + 0.44 × (14,600 − 8,000) = 10,904, or approximately 10,900. The 95 percent confidence interval for this estimate is therefore 144,000 ± 2 × 10,900, or 122,200 to 165,800.
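The interpolation and interval arithmetic in the February example can be sketched in Python. This is a minimal illustration, not part of the GES documentation: the function names `interpolate_se` and `confidence_interval` are hypothetical, the table excerpt holds only the two crash rows the example needs, and rounding to the nearest hundred is an assumption based on how the Table C1 values appear to be rounded.

```python
from bisect import bisect_left

# Excerpt of the crash columns of Table C1: (estimate, one standard error).
# Only the two rows that bracket the worked example are included.
CRASH_SE_TABLE = [
    (100_000, 8_000),
    (200_000, 14_600),
]


def interpolate_se(estimate, table):
    """Linearly interpolate one standard error between the bracketing rows."""
    xs = [x for x, _ in table]
    if not xs[0] <= estimate <= xs[-1]:
        raise ValueError("estimate lies outside the tabulated range")
    i = bisect_left(xs, estimate)
    if xs[i] == estimate:
        # Exact match: no interpolation needed.
        return float(table[i][1])
    (x0, se0), (x1, se1) = table[i - 1], table[i]
    return se0 + (estimate - x0) / (x1 - x0) * (se1 - se0)


def confidence_interval(estimate, se, multiplier=2):
    """Estimate plus or minus `multiplier` standard errors (2 gives roughly 95 percent)."""
    return estimate - multiplier * se, estimate + multiplier * se


se = interpolate_se(144_000, CRASH_SE_TABLE)
se_rounded = round(se, -2)  # assumed rounding to the nearest hundred
low, high = confidence_interval(144_000, se_rounded)
print(se_rounded, low, high)
```

Extending `CRASH_SE_TABLE` with the remaining rows of Table C1 would let the same function handle any crash estimate within the tabulated range.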

Table C1. 2004 GES Estimates and Standard Errors

| Crash Estimate (x) | Crash Standard Error (SE)* | Vehicle Estimate (x) | Vehicle Standard Error (SE)** | Person Estimate (x) | Person Standard Error (SE)*** |
|---|---|---|---|---|---|
| 1,000 | 400 | 1,000 | 400 | 1,000 | 400 |
| 5,000 | 900 | 5,000 | 900 | 5,000 | 900 |
| 6,000 | 1,000 | 10,000 | 1,400 | 10,000 | 1,400 |
| 7,000 | 1,100 | 20,000 | 2,300 | 20,000 | 2,100 |
| 8,000 | 1,200 | 30,000 | 3,100 | 30,000 | 2,800 |
| 9,000 | 1,300 | 40,000 | 3,800 | 40,000 | 3,500 |
| 10,000 | 1,400 | 50,000 | 4,500 | 50,000 | 4,100 |
| 20,000 | 2,300 | 60,000 | 5,200 | 60,000 | 4,700 |
| 30,000 | 3,100 | 70,000 | 5,900 | 70,000 | 5,300 |
| 40,000 | 3,800 | 80,000 | 6,600 | 80,000 | 5,800 |
| 50,000 | 4,600 | 90,000 | 7,200 | 90,000 | 6,400 |
| 60,000 | 5,300 | 100,000 | 7,900 | 100,000 | 6,900 |
| 70,000 | 6,000 | 200,000 | 14,200 | 200,000 | 12,200 |
| 80,000 | 6,700 | 300,000 | 20,300 | 300,000 | 17,200 |
| 90,000 | 7,300 | 400,000 | 26,300 | 400,000 | 22,200 |
| 100,000 | 8,000 | 500,000 | 32,400 | 500,000 | 27,100 |
| 200,000 | 14,600 | 600,000 | 38,500 | 600,000 | 31,900 |
| 300,000 | 21,000 | 700,000 | 44,600 | 700,000 | 36,800 |
| 400,000 | 27,400 | 800,000 | 50,700 | 800,000 | 41,600 |
| 500,000 | 33,800 | 900,000 | 56,900 | 900,000 | 46,500 |
| 600,000 | 40,300 | 1,000,000 | 63,100 | 1,000,000 | 51,400 |
| 700,000 | 46,900 | 2,000,000 | 127,200 | 2,000,000 | 100,700 |
| 800,000 | 53,400 | 3,000,000 | 194,700 | 3,000,000 | 151,700 |
| 900,000 | 60,100 | 4,000,000 | 265,200 | 4,000,000 | 204,200 |
| 1,000,000 | 66,700 | 5,000,000 | 338,500 | 5,000,000 | 258,100 |
| 2,000,000 | 136,300 | 6,000,000 | 414,200 | 6,000,000 | 313,400 |
| 3,000,000 | 210,300 | 7,000,000 | 492,200 | 7,000,000 | 370,000 |
| 4,000,000 | 288,100 | 8,000,000 | 572,400 | 8,000,000 | 427,800 |
| 5,000,000 | 369,400 | 9,000,000 | 654,500 | 9,000,000 | 486,600 |
| 6,000,000 | 453,800 | 10,000,000 | 738,600 | 10,000,000 | 546,600 |
| 6,500,000 | 497,100 | 11,000,000 | 824,400 | 11,000,000 | 607,500 |
| 7,000,000 | 541,000 | 12,000,000 | 912,000 | 12,000,000 | 669,400 |

*Crash: $SE = e^{a + b(\ln x)^2}$, where a = 4.168580 and b = 0.036360

**Vehicle: $SE = e^{a + b(\ln x)^2}$, where a = 4.240450 and b = 0.035690

***Person: $SE = e^{a + b(\ln x)^2}$, where a = 4.297920 and b = 0.034310
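The footnote formula, read as SE = exp(a + b (ln x)^2), can be evaluated directly rather than interpolated from the table. The sketch below assumes that reading of the exponent and assumes the tabulated values are rounded to the nearest hundred. The names `COEFFICIENTS`, `generalized_se`, and `rounded_se` are illustrative and do not come from the GES documentation.

```python
import math

# Coefficients (a, b) taken from the Table C1 footnotes.
COEFFICIENTS = {
    "crash": (4.168580, 0.036360),
    "vehicle": (4.240450, 0.035690),
    "person": (4.297920, 0.034310),
}


def generalized_se(x, level):
    """Evaluate SE = exp(a + b * (ln x)^2) for an estimate x at the given level."""
    a, b = COEFFICIENTS[level]
    return math.exp(a + b * math.log(x) ** 2)


def rounded_se(x, level):
    """Round to the nearest hundred, which is how Table C1 appears to be rounded."""
    return int(round(generalized_se(x, level), -2))


# Spot checks against rows of Table C1.
for level, x in [("crash", 100_000), ("crash", 200_000),
                 ("vehicle", 1_000_000), ("person", 1_000_000)]:
    print(level, x, rounded_se(x, level))
```

Under these assumptions the spot checks reproduce the corresponding Table C1 entries, and evaluating the crash formula at 144,000 gives a value that rounds to 10,900, consistent with the interpolated standard error in the February example.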

 

Unknowns

GES data are obtained either directly from an item on the police accident report (PAR) or by interpreting the information provided in the report through reviewing the crash diagram, the officer's written summary of the crash, or combinations of variables on the PAR. Because of this interpretation, and because the police officer may not have entered an item or may have provided incomplete information, data can be missing. Two statistical procedures, univariate and hotdeck imputation, are used on GES data to fill in values for unknown data. They are described in a technical report available from NCSA, Imputation in the General Estimates System (DOT HS 807 985). Table C2 below gives the proportion of unknown values prior to imputation for variables with imputed values that were used in this report.

Table C2. Percent of Unknowns for 2004 GES Data Elements

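| Level | Data Element | Percent Unknown |
|---|---|---|
| Crash | Alcohol Involved in Crash | 5.8% |
| Crash | Atmospheric Condition | 1.7% |
| Crash | Crash Severity | 3.1% |
| Crash | Day of Week | 0.0% |
| Crash | First Harmful Event | 0.1% |
| Crash | Hour of Crash | 0.5% |
| Crash | Light Condition | 1.0% |
| Crash | Manner of Collision | 0.2% |
| Crash | Minute of Crash | 0.5% |
| Crash | Relation to Junction | 0.4% |
| Crash | Relation to Roadway | 0.2% |
| Crash | Roadway Surface Condition | 1.5% |
| Crash | Speed Limit | 15.7% |
| Crash | Traffic Control Device | 4.5% |
| Vehicle/Driver | Driver Drinking in Vehicle | 8.7% |
| Vehicle/Driver | Initial Point of Impact | 1.7% |
| Vehicle/Driver | Most Harmful Event | 0.1% |
| Vehicle/Driver | Rollover Type | 0.5% |
| Vehicle/Driver | Vehicle Type | 1.6% |
| Person | Age | 8.2% |
| Person | Injury Severity | 4.3% |
| Person | Police-Reported Alcohol Involvement | 4.2% |
| Person | Seating Position | 1.0% |
| Person | Sex | 5.8% |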