Precision is a measure of consistency or agreement
between scores and concerns the degree to which errors
of measurement affect test scores. Measurement errors
do not usually refer to inconsistencies in the aptitudes
or behaviors being assessed; rather, these errors are
related to factors that prevent an individual from achieving
a score identical to their true latent ability or score.
There are many ways in which precision can be measured.
In traditional test theory, precision is measured by
a reliability coefficient, which is the ratio of true
score variance ($\sigma_T^2$) to observed score variance
($\sigma_x^2$) (Lord & Novick, 1968):

$$\rho_{xx'} = \frac{\sigma_T^2}{\sigma_x^2} \tag{1}$$

Because true score variance can be computed as the
difference between observed score variance and error
variance ($\sigma_E^2$), classical reliability can be represented as:

$$\rho_{xx'} = \frac{\sigma_x^2 - \sigma_E^2}{\sigma_x^2} \tag{2}$$

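As a small numerical illustration of the difference form of classical reliability described above, the following sketch computes reliability from an observed score variance and an error variance. The function name and the example variances are hypothetical and are not taken from ASVAB data.

```python
def classical_reliability(observed_var, error_var):
    """Reliability = (observed variance - error variance) / observed variance."""
    if observed_var <= 0:
        raise ValueError("observed variance must be positive")
    if not 0 <= error_var <= observed_var:
        raise ValueError("error variance must lie between 0 and the observed variance")
    true_var = observed_var - error_var  # true score variance
    return true_var / observed_var


# Hypothetical values: observed variance 100, error variance 15
print(classical_reliability(100.0, 15.0))
```

With these made-up values, 85% of the observed score variance is attributed to true score variance.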
Item Response Theory (IRT) provides a means of estimating
reliability that operates on the item characteristics
and the individual pattern of responses given by examinees
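to items within a test. The IRT analogue to classical
reliability is called marginal reliability. It operates
on the variance of the theta scores ($\sigma_\theta^2$) and the average
of the expected error variance ($\overline{\sigma_e^2}$) (Sireci,
Thissen, & Wainer, 1991):

$$\bar{\rho} = \frac{\sigma_\theta^2 - \overline{\sigma_e^2}}{\sigma_\theta^2} \tag{3}$$
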
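If it can be safely assumed that theta is distributed
N(0,1), then the variance of theta is 1 and marginal reliability
reduces to one minus the average expected error variance:

$$\bar{\rho} = 1 - \overline{\sigma_e^2} \tag{4}$$
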
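The marginal reliability computation described above can be sketched in a few lines. This is only an illustration: the function name, the `theta_var` parameter, and the example error variances are assumptions introduced here, and the default `theta_var=1.0` corresponds to the N(0,1) case.

```python
def marginal_reliability(error_vars, theta_var=1.0):
    """(theta variance - mean expected error variance) / theta variance."""
    if not error_vars:
        raise ValueError("at least one error variance is required")
    if theta_var <= 0:
        raise ValueError("theta variance must be positive")
    mean_error_var = sum(error_vars) / len(error_vars)
    return (theta_var - mean_error_var) / theta_var


# Hypothetical per-examinee expected error variances
example_error_vars = [0.09, 0.16, 0.25]
print(marginal_reliability(example_error_vars))
```

When theta is N(0,1), the result is simply one minus the mean of the error variances.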
When sample sizes are large, the average of the expected
error variance can be computed by averaging the variance
of the estimated posterior distributions across individuals.
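In the reliabilities reported below, the posterior standard
deviation (PSD) for individual i was estimated
using the methodology given in Bock
and Mislevy (1982):

$$\mathrm{PSD}(\hat{\theta}_i) = \sqrt{\frac{\int (\theta - \hat{\theta}_i)^2 \, L(u_i \mid \theta) \, g(\theta) \, d\theta}{\int L(u_i \mid \theta) \, g(\theta) \, d\theta}} \tag{5}$$

where the expected a posteriori (EAP) ability estimate for individual i is the posterior mean:

$$\hat{\theta}_i = \frac{\int \theta \, L(u_i \mid \theta) \, g(\theta) \, d\theta}{\int L(u_i \mid \theta) \, g(\theta) \, d\theta} \tag{6}$$

Here $L(u_i \mid \theta)$ is the likelihood of individual i's response pattern $u_i$, and $g(\theta)$ is the population density of theta.
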
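To make the posterior-based estimates concrete, the sketch below approximates the posterior mean (the expected a posteriori, or EAP, estimate) and the posterior standard deviation on an evenly spaced grid of theta values under an N(0,1) prior. Several parts are assumptions made only for illustration and are not taken from the source: the three-parameter logistic item model with a 1.7 scaling constant, the grid limits and size, and the item parameters.

```python
import math


def p_3pl(theta, a, b, c):
    """Probability of a correct response under an assumed 3PL model (D = 1.7)."""
    return c + (1.0 - c) / (1.0 + math.exp(-1.7 * a * (theta - b)))


def eap_and_psd(responses, items, n_points=61, lo=-4.0, hi=4.0):
    """Return (EAP, PSD) for one examinee under a N(0,1) prior.

    responses: list of 0/1 item scores; items: list of (a, b, c) tuples.
    The integrals are approximated by sums over an evenly spaced grid, so
    constant factors (grid step, normal density constant) cancel.
    """
    step = (hi - lo) / (n_points - 1)
    grid = [lo + k * step for k in range(n_points)]
    weights = []
    for theta in grid:
        prior = math.exp(-0.5 * theta * theta)  # N(0,1) density up to a constant
        likelihood = 1.0
        for u, (a, b, c) in zip(responses, items):
            p = p_3pl(theta, a, b, c)
            likelihood *= p if u == 1 else (1.0 - p)
        weights.append(likelihood * prior)
    total = sum(weights)
    eap = sum(t * w for t, w in zip(grid, weights)) / total
    post_var = sum((t - eap) ** 2 * w for t, w in zip(grid, weights)) / total
    return eap, math.sqrt(post_var)


# Hypothetical item parameters (a, b, c) and one response pattern
hypothetical_items = [(1.2, -0.5, 0.20), (0.9, 0.0, 0.25), (1.5, 0.8, 0.20)]
eap, psd = eap_and_psd([1, 1, 0], hypothetical_items)
print(eap, psd)
```

Squaring the PSD for each examinee gives that examinee's expected error variance. With no item responses at all, the estimate falls back to the prior (EAP near 0, PSD near 1).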
ASVAB Reliabilities
For each ASVAB subtest, Equation 6 was used to compute
EAP ability estimates for applicants who completed
the test during the 2005 fiscal year (FY2005; October
1, 2004 through September 30, 2005). Equation 5 was
then used to compute PSDs (using the EAP ability estimates
and assuming an N(0,1) population distribution). The
squared PSDs were then averaged over applicants, and the
average was substituted into Equation 4 to compute subtest reliability.
For AFQT scores, reliability was computed using the
methodology for computing composite reliabilities reported
in Gulliksen (1987, pp. 346-347, Equation 74).
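For readers unfamiliar with composite reliability, the sketch below uses a widely cited formula for the reliability of a weighted sum of component scores, assuming uncorrelated measurement errors across components. Whether this matches Gulliksen's Equation 74 term for term is an assumption, and the weights, standard deviations, reliabilities, and correlation in the example are hypothetical rather than AFQT values.

```python
def composite_reliability(weights, sds, reliabilities, composite_var):
    """1 - (sum of weighted component error variances) / composite variance.

    Assumes measurement errors are uncorrelated across components.
    """
    if composite_var <= 0:
        raise ValueError("composite variance must be positive")
    error_var = sum(
        (w * s) ** 2 * (1.0 - r)
        for w, s, r in zip(weights, sds, reliabilities)
    )
    return 1.0 - error_var / composite_var


# Hypothetical two-component composite: unit weights, SDs of 10,
# reliabilities 0.90 and 0.80, and a correlation of 0.5 between components.
hyp_weights = [1.0, 1.0]
hyp_sds = [10.0, 10.0]
hyp_rels = [0.90, 0.80]
hyp_composite_var = 10.0**2 + 10.0**2 + 2 * 0.5 * 10.0 * 10.0  # = 300
print(composite_reliability(hyp_weights, hyp_sds, hyp_rels, hyp_composite_var))
```

In this made-up case the composite is about as reliable as its more reliable component, because positively correlated components add true score variance faster than error variance.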
Reliability estimates were computed over all FY2005
applicants, and by gender (Male, Female), ethnic group
(Hispanic, Non-Hispanic), and race (American-Indian/Alaska
Native, Asian, Black/African-American, Native Hawaiian/other
Pacific Islander, White/Caucasian).*
The sample sizes used
to compute the reliability estimates across subtests
and AFQT scores are given in the table below.
The estimated reliabilities
for AFQT scores and the subtests that comprise AFQT
scores are reported in the table below.
The estimated reliabilities
for the remaining ASVAB subtests are given in the table
below. Note that AI and SI are administered as separate
subtests in CAT-ASVAB but are combined into a single
score (labeled AS). In P&P-ASVAB, AI and SI are administered
as a single subtest (AS). Scores on the combined subtest (AS)
are therefore reported for both CAT-ASVAB and P&P-ASVAB.
ASVAB Standard Errors of Measurement
The standard error of measurement (SEM) provides an
alternate way of summarizing the amount of error or
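inconsistency in test scores. It is computed as:

$$\mathrm{SEM} = \sigma_x \sqrt{1 - \rho_{xx'}} \tag{7}$$

where $\sigma_x$ is the observed score standard deviation for test x
and $\rho_{xx'}$ is the reliability of test x.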
If the measurement error is normally distributed, then
the true scores for approximately 68% of the applicants
would fall in the interval created by adding and subtracting
one SEM from their reported score.
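The SEM calculation and the one-SEM interval described above can be illustrated as follows. The function names and the example standard deviation, reliability, and reported score are hypothetical and are not ASVAB values.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """SEM = observed score SD * sqrt(1 - reliability)."""
    if sd < 0:
        raise ValueError("standard deviation must be non-negative")
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


def one_sem_band(score, sd, reliability):
    """Interval formed by subtracting and adding one SEM to a reported score."""
    sem = standard_error_of_measurement(sd, reliability)
    return score - sem, score + sem


# Hypothetical test: SD of 10, reliability of 0.91, reported score of 50
print(standard_error_of_measurement(10.0, 0.91))
print(one_sem_band(50.0, 10.0, 0.91))
```

With these made-up values the SEM is approximately 3 score points, so the one-SEM band around a reported score of 50 runs from roughly 47 to 53.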
The SEM of each ASVAB subtest and AFQT score was computed
over all FY2005 applicants, and by gender (Male, Female),
ethnic group (Hispanic, Non-Hispanic), and race (American-Indian/Alaska
Native, Asian, Black/African-American, Native Hawaiian/other
Pacific Islander, White/Caucasian).* The sample sizes
are shown above.
The SEMs for AFQT
scores and the subtests that comprise AFQT scores are
reported in the table below.
The SEMs for the remaining
ASVAB subtests are given in the table below. Note that
the SEM computations for AI and SI are based on the
observed standard deviation of the AS score, since separate
scores are not reported for AI and SI.