Standards and Criteria Redux
I am posting only the beginning of this paper because I can't reproduce the figures. It is definitely worth your while to go to the URL below and read the entire paper. This observation comes near the end of the paper:
I have read the writings of those who claim the ability to make the determination of mastery or competence in statistical or psychological ways. They can't. At least, they cannot determine "criterion levels" or standards other than arbitrarily. The consequences of the arbitrary decisions are so varied that it is necessary either to reduce the arbitrariness, and hence the unpredictability of the consequences of applying the standards, or to abandon the search for criterion levels altogether in favor of ways of using test data that are less arbitrary and, hence, safer.
This monograph has grown out of a series of discussions and a six-month period of reading and reflecting on the literature which were initiated by Fritz Mosher's suggestions to the National Assessment of Educational Progress (NAEP) to examine the "standards" question. Conversations with Mosher himself and the staff of NAEP have been most influential. The Analysis Advisory Committee of NAEP, under Fred Mosteller's chairmanship, proved a rigorous testing ground for many of the ideas.
In the following pages, I shall (a) examine the ordinary usage of the words "standards" and "criteria" in the measurement literature; (b) trace the evolution of the notion of performance standards in the criterion-referenced testing movement; (c) analyze and critique six methods of setting performance standards on criterion-referenced tests; and (d) reflect briefly on the political forces which have become focused on the standards issue.
"Standards" In Common Parlance
Setting standards or mastery levels is frequently written about as though it were a well-established and routine phase of instructional development. In conversations with measurement specialists and instructional development experts over the past few years, I have been literally dumbfounded by the nonchalance with which they handle the standards problem. One will report that he always sets a standard of two-thirds of the items correct for mastery because he's a sort of "liberal guy." Another expert will report that he holds learners to 70% mastery, and a third advances his 90% standard with an air of tough-mindedness and respect for excellence. None of them bothers with such apparently extraneous considerations as how the test items are to be composed and whether they will be abstruse or obvious. In one of the sacred writings of the instructional objectives movement, Robert F. Mager (1962) identified standard setting as an integral part of stating an objective properly:
Mager went on to illustrate what he meant by a behavioral objective and its associate standard:
This language of performance standards is pseudoquantification, a meaningless application of numbers to a question not prepared for quantitative analysis. A teacher, or psychologist, or linguist simply cannot set meaningful standards of performance for activities as imprecisely defined as "spelling correctly words called out during an examination period." And little headway is made toward a solution to the problem by specifying greater detail about how the questions, tasks, or exercises will be constructed.
Can a more meaningful performance standard be stated for an objective as molecular as "the pupil will be able to discriminate the grapheme combination 'vowel + r' spelled 'ir' from other graphemes"? Can it be asserted confidently about this narrow objective that a pupil should be able to make 9 out of 10 correct discriminations? In point of fact, this objective appears on the Stanford Reading Test where it is assessed by two different items:
a) Mark the word "firm" (Read by proctor)
b) Mark the word "girl" (Read by proctor)
The percentages of second-grade pupils in the norm population answering items a) and b) correctly were 56% and 88%, respectively. Any performance standards (e.g., "8 out of 10 correct") for a group of items like item a would be quite inappropriate for a group of items like item b, since they are so different in difficulty. Results from a grade seven assessment by the Department of Education in New Jersey illustrate the same point. Pupils averaged 86% on vertical addition, but only 46% on horizontal addition. The vagaries of teaching and measurement are so poorly understood that the a priori statement of performance standards is foolhardy.
Benjamin S. Bloom (1968), whose name has become closely associated with the notion of "mastery learning," has written of instructional psychology in ways that depend fundamentally on notions of performance standards:
Popham (1973), writing on instructional objectives for teachers in training, reaffirmed the centrality of performance standards:
The notion of performance standards is repeatedly illustrated in Popham's teachers' manual:
Wiersma and Jurs (1976), in outlining the instructional evaluation component of Individually Guided Education (the University of Wisconsin R & D Center instructional plan), gave the following description of criterion-referenced testing:
In detailing the role of testing in assessment programs, Ralph W. Tyler (1973) illustrated a performance standard for determining mastery:
The staff of the National Assessment of Educational Progress have grappled with the performance standards problem for years to almost no one's satisfaction. Though they have never adopted an official position on the matter, they did cooperate with the National Council for the Social Studies in an effort to apply performance standards to the assessment results in citizenship and social studies (Fair, 1975). A fully representative panel of nine judges (3 minorities, 5 women, 3 under the age of 30) was formed. Each judge was shown an assessment item and then asked, "Realistically what level of performance nationally for the age level being considered would satisfy you for this exercise? (1) less than 20% correct, (2) 20-40%, (3) 41-60%, (4) 61-80%, or (5) more than 80%?" The panel rendered over 5,000 judgments in a three-day sitting, and it has been reported that "...panel members agreed more often than not, but at times spread their responses across all the available categories" (Fair, 1975, p. 45). About half of the exercises were given a "satisfactory performance level" of "more than 80%." About 35% of the exercises would satisfy the panel if between 60% and 80% of the examinees answered correctly. The desired performance levels were generally above the actual rates of correct response. What is to be made of the gap? Ought it to be read as evidence of the deficiency of the educational system, or is it testament to the panel's aspirations, American hustle, and the indomitable human spirit ("Man's reach should exceed his grasp," etc.)?
The reader can justifiably ask, "What manner of discourse is being engaged in by these experts?" How is one to regard such statements as "the student must be able to correctly solve at least seven simple linear equations in thirty minutes" or "90 percent of all students can master what we have to teach them"? If such statements are to be challenged, should they be challenged as claims emanating from psychology, statistics, or philosophy? Do they maintain something about learning or something about measurement? Are they disconfirmable empirical claims or are they merely educational rhetoric spoken more for effect than for substance? . . .
Please go to the hot link below to read the rest of the paper.
Gene V. Glass