ASVAB scoring is based on an Item Response Theory
(IRT) model. IRT is a theory that enables test questions
and examinee abilities to be placed on the same scale,
thereby allowing tests to be tailored to the specific
ability level of each examinee and scores to be expressed
on the same scale regardless of the combination of items
that are taken.
The IRT model underlying ASVAB scoring is the three-parameter
logistic (3PL) model. The 3PL model represents the probability
that an examinee at a given level of ability will respond
correctly to an individual item with given characteristics.
Specifically, the item characteristics represented in
the 3PL model are difficulty, discrimination (i.e.,
how well the item discriminates among examinees of differing
levels of ability), and guessing (i.e., the likelihood
that a very low ability examinee would respond correctly
simply by guessing).
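For reference, the 3PL item response function is commonly written as P(θ) = c + (1 − c) / (1 + exp(−D·a·(θ − b))), where θ is ability, a is discrimination, b is difficulty, c is the guessing parameter, and D is a scaling constant conventionally set to 1.7. The short Python sketch below evaluates that function. The value of D and the item parameters are illustrative assumptions, not operational ASVAB values.

```python
import math

D = 1.7  # conventional logistic scaling constant (assumption, not an ASVAB-published value)

def p_correct(theta: float, a: float, b: float, c: float) -> float:
    """3PL probability that an examinee of ability theta answers the item correctly.

    a: discrimination, b: difficulty, c: guessing (lower asymptote).
    """
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

# Hypothetical item: moderate discrimination, average difficulty, guessing floor of 0.2
item = dict(a=1.2, b=0.0, c=0.2)
for theta in (-3.0, 0.0, 3.0):
    print(f"theta={theta:+.1f}  P(correct)={p_correct(theta, **item):.3f}")
```

At θ = b the probability sits halfway between c and 1, and as θ decreases it approaches c, which is why c is read as the chance that a very low ability examinee answers correctly by guessing.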
For both the paper-and-pencil (P&P) and computerized
adaptive testing (CAT) versions of the ASVAB, the 3PL
model is used to compute final ability estimates for
examinees. For the CAT-ASVAB, the 3PL model is also used
to select items. When a CAT-ASVAB session is started,
every examinee is assigned an initial ability estimate
of θ = 0.0, which is the mean of the expected distribution
of examinee abilities. After each new item is administered,
the scored response is used to update the ability estimate.
A sequential Bayesian procedure is used for this purpose.
When the test is completed (or the time limit is exceeded),
a final ability estimate is computed as the mode of
the posterior distribution (Bayesian modal estimate).
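The following Python sketch illustrates the loop described above under simplifying assumptions: the posterior is approximated on a grid with a standard normal prior (the text fixes only its mean, 0.0; the unit SD is assumed), the interim estimate is the posterior mean, each next item is the unused one with maximum Fisher information at that estimate, and the final score is the posterior mode. The grid, the item pool, the selection rule, and all function names are hypothetical stand-ins; the operational CAT-ASVAB procedures are not reproduced here.

```python
import math

D = 1.7  # conventional 3PL scaling constant (assumption)
GRID = [i / 100.0 for i in range(-400, 401)]  # theta grid on [-4, 4] (assumption)

def p_correct(theta, a, b, c):
    """3PL probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

def item_information(theta, a, b, c):
    """Fisher information of a 3PL item at theta."""
    p = p_correct(theta, a, b, c)
    return (D * a) ** 2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2

def normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]

def update(posterior, item, correct):
    """Multiply the posterior by the likelihood of one scored response."""
    return normalize([
        w * (p_correct(t, **item) if correct else 1.0 - p_correct(t, **item))
        for t, w in zip(GRID, posterior)
    ])

def posterior_mean(posterior):
    """Mean of a normalized grid posterior (used here as the interim estimate)."""
    return sum(t * w for t, w in zip(GRID, posterior))

def posterior_mode(posterior):
    """Bayesian modal estimate: the grid point with the highest posterior weight."""
    return GRID[max(range(len(GRID)), key=lambda i: posterior[i])]

def run_session(pool, responses):
    """Administer len(responses) items; return (final modal estimate, items used)."""
    posterior = normalize([math.exp(-0.5 * t * t) for t in GRID])  # N(0, 1) prior
    theta_hat = 0.0  # every session starts at the prior mean
    used = []
    for correct in responses:
        # choose the unused item that is most informative at the current estimate
        candidates = [i for i in range(len(pool)) if i not in used]
        best = max(candidates, key=lambda i: item_information(theta_hat, **pool[i]))
        used.append(best)
        posterior = update(posterior, pool[best], correct)
        theta_hat = posterior_mean(posterior)
    return posterior_mode(posterior), used

# Hypothetical five-item pool (parameter values are illustrative only)
pool = [dict(a=1.0, b=b, c=0.2) for b in (-2.0, -1.0, 0.0, 1.0, 2.0)]
print(run_session(pool, [True, True, False]))
```

A grid keeps the posterior exact up to discretization, which makes the sequential update a simple pointwise multiplication; faster closed-form approximations are possible but harder to read.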
Incomplete tests are handled differently for the P&P
and CAT-ASVAB versions. For the P&P-ASVAB, any unanswered
items are treated as incorrect. For CAT-ASVAB examinees
who do not complete the test before the time limit is
exceeded, a penalty function is applied to their final
ability estimate. The penalty function has the following
properties:
- The size of the penalty is related to the number
of unfinished items.
- Examinees who answer the same number of items and
have the same ability estimate receive the same penalty.
- The penalty eliminates the possibility of using
“coachable” test-taking strategies to
artificially increase test scores.
The final ability estimate computed using the penalty
procedure is equivalent to the expected score that would be obtained
if the examinee guessed at random on the unfinished
items.
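The guessing equivalence can be illustrated with a Monte Carlo sketch: fill each unfinished item with a blind guess that is correct with probability 1/k, recompute the Bayesian modal estimate, and average over many simulated guess patterns. The number of options (k = 4), the parameters of the unfinished items, the standard normal prior, and the helper names are all assumptions for illustration. The penalty described in the text depends only on the number of unfinished items and the ability estimate, whereas this sketch also depends on the assumed parameters of the unfinished items.

```python
import math
import random

D = 1.7  # conventional 3PL scaling constant (assumption)
GRID = [i / 50.0 for i in range(-200, 201)]  # theta grid on [-4, 4], step 0.02 (assumption)

def p_correct(theta, a, b, c):
    """3PL probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))

def modal_estimate(items, responses):
    """Posterior mode on the grid under a standard normal prior."""
    def log_post(t):
        lp = -0.5 * t * t
        for item, correct in zip(items, responses):
            p = p_correct(t, **item)
            lp += math.log(p if correct else 1.0 - p)
        return lp
    return max(GRID, key=log_post)

def guessing_adjusted_estimate(answered, responses, unfinished,
                               n_options=4, n_sims=100, seed=0):
    """Average modal estimate when every unfinished item is answered by a random guess."""
    rng = random.Random(seed)
    items = answered + unfinished
    total = 0.0
    for _ in range(n_sims):
        # each blind guess is correct with probability 1 / n_options
        guesses = [rng.random() < 1.0 / n_options for _ in unfinished]
        total += modal_estimate(items, responses + guesses)
    return total / n_sims

# Hypothetical examinee: 10 items answered correctly, 5 items left unfinished
answered = [dict(a=1.0, b=b, c=0.2) for b in (-1.0, -0.5, 0.0, 0.5, 1.0)] * 2
responses = [True] * 10
unfinished = [dict(a=1.0, b=0.0, c=0.2)] * 5
raw = modal_estimate(answered, responses)
adjusted = guessing_adjusted_estimate(answered, responses, unfinished)
print(f"unpenalized {raw:.2f}, penalized {adjusted:.2f}, penalty {raw - adjusted:.2f}")
```

With these hypothetical values the averaged estimate falls below the unpenalized one, because most blind guesses on 4-option items are wrong.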
After the final ability estimate is computed, it is
converted to a standard score on the ASVAB score scale
that has been statistically linked to the ability estimate
through a process called equating. Equating studies
are conducted for every CAT-ASVAB item pool (and for
every P&P-ASVAB form) to ensure that scores
have the same meaning regardless of which item pool
or test form the examinee receives.
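The text does not say which equating method is used. One widely used approach is equipercentile equating, which maps a score on a new item pool or form to the reference-scale score with the same percentile rank in comparable groups of examinees. The minimal Python sketch below implements an unsmoothed, discrete version; the function name and sample values are hypothetical, and practical equating work generally relies on much larger samples and often smooths the distributions.

```python
from bisect import bisect_right

def equipercentile_equate(x, new_scores, ref_scores):
    """Map x from the new scale to the reference scale by matching percentile ranks.

    new_scores: scores of an equating sample on the new pool/form (e.g., ability estimates)
    ref_scores: scores of a comparable sample on the reference scale
    """
    new_sorted = sorted(new_scores)
    ref_sorted = sorted(ref_scores)
    n, m = len(new_sorted), len(ref_sorted)
    at_or_below = bisect_right(new_sorted, x)  # count of new-sample scores <= x
    # smallest reference score whose cumulative proportion reaches at_or_below / n
    idx = -(-at_or_below * m // n) - 1  # integer ceiling, then 0-based index
    return ref_sorted[min(max(idx, 0), m - 1)]

# Hypothetical equating samples
thetas = [-1.5, -0.8, -0.2, 0.0, 0.3, 0.9, 1.6]
standard = [32, 41, 47, 50, 53, 59, 66]
print(equipercentile_equate(0.3, thetas, standard))
```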
Standard Scores are scores that have a fixed mean and
standard deviation in the population of examinees. A
Standard Score indicates how many standard deviations
a particular score lies above or below the mean.
In the case of the ASVAB subtests, the mean is set to
50 and the standard deviation is set to 10. Thus, a
Standard Score of 40 indicates that the examinee scored
1 standard deviation below the mean. A Standard Score
of 70 indicates that the examinee scored 2 standard
deviations above the mean.
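The definition above amounts to a linear rescaling: express a score as a z-score (standard deviations from the population mean), then place it on a scale with mean 50 and standard deviation 10. This Python sketch, with hypothetical function names, reproduces the two worked examples in the text. In ASVAB scoring the mapping from ability estimate to Standard Score is established through the equating described earlier, so pop_mean and pop_sd here are purely illustrative inputs.

```python
def standard_score(x, pop_mean, pop_sd, scale_mean=50.0, scale_sd=10.0):
    """Rescale x so that the population has mean 50 and SD 10."""
    z = (x - pop_mean) / pop_sd  # SDs above (+) or below (-) the population mean
    return scale_mean + scale_sd * z

def sds_from_mean(score, scale_mean=50.0, scale_sd=10.0):
    """Invert the scale: how many SDs a Standard Score lies from the mean."""
    return (score - scale_mean) / scale_sd

print(sds_from_mean(40), sds_from_mean(70))  # -1.0 2.0
```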
Examinees also receive an Armed Forces Qualification
Test (AFQT) score. AFQT scores
are computed using the Standard Scores from four ASVAB
subtests: Arithmetic Reasoning (AR), Mathematics Knowledge
(MK), Paragraph Comprehension (PC), and Word Knowledge
(WK). AFQT scores are reported as percentiles ranging
from 1 to 99. An AFQT percentile score indicates the percentage
of examinees in a reference group that scored at or
below that particular score. For current AFQT scores,
the reference group is a sample of 18- to 23-year-old
youth who took the ASVAB as part of a national norming
study conducted in 1997. Thus, an AFQT score of 90 indicates
that the examinee scored as well as or better than 90%
of the nationally representative sample of 18- to
23-year-old youth. An AFQT score of 50 indicates that the
examinee scored as well as or better than 50% of the
nationally representative sample.
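An at-or-below percentile rank of this kind can be sketched in Python as follows. The reference_composites list is a hypothetical stand-in for the 1997 norming sample, and the function takes an already computed AFQT composite because the text does not spell out how the AR, MK, PC, and WK Standard Scores are combined. Truncating to a whole number and clamping to the reported 1 to 99 range are assumptions about rounding.

```python
import math
from bisect import bisect_right

def afqt_percentile(composite, reference_composites):
    """Percent of the reference group scoring at or below `composite`, reported on 1-99."""
    ref = sorted(reference_composites)
    pct = 100.0 * bisect_right(ref, composite) / len(ref)
    return min(99, max(1, math.floor(pct)))

# Hypothetical reference group of 100 composites: 0, 1, ..., 99
reference = list(range(100))
print(afqt_percentile(89, reference))  # 90
```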
ASVAB scores are used primarily to determine enlistment
eligibility, assign applicants to military
jobs, and aid students in career
exploration.