Limits of High Stakes Testing
Susan Notes: Walt Haney provides a clear, readable summary of the harm done by high stakes testing, that is, testing that does not meet professional standards and is unregulated.
Thank you for the opportunity to speak with you.
Walt Haney: Professor of Education, Lynch School of Education, Boston College.
Senior researcher in the Center for the Study of Testing, Evaluation and Educational Policy for the last two decades.
Director of the Educational Pipeline project, funded by the Ford Foundation.
By way of introduction, let me summarize the general points I would like to make:
1.1 High stakes use of test results is contrary to professional standards.
1.2 The standardized testing industry is totally unregulated and its products are of increasingly dubious quality.
1.3 Low-tech tests are shortchanging high-tech students.
1.4 High stakes misuse of test results is distorting education in the U.S.
II. High stakes use of test results is contrary to professional standards
To be clear, I should start by explaining that by high stakes testing I mean the use of test results mechanically and in isolation to make important decisions about students and/or schools. One example is using standardized test results in isolation to determine whether or not students may graduate from high school. Yet basing high school graduation decisions on standardized test results in isolation, irrespective of other evidence about student performance in high school, is contrary to recognized professional standards regarding appropriate use of test results. (See, for example, the statement of the American Educational Research Association.)
A simple way to illustrate why it is unwise to make important decisions about students based on test scores alone is to note how college admissions test results are used. There is not a single college anywhere in the nation that accepts all applicants who score above a particular point on the SAT (say, a combined score of 1000) and rejects all applicants who score below that point. Instead, colleges make admissions decisions flexibly, using test scores, grades and other information, rather than making decisions mechanically based on test scores alone. Decades of research on college admissions testing show that it is far more sound (more valid, and with smaller adverse impact on minorities and females) to use test scores in this way, in what might be called a sliding scale approach: students with high grades may be considered with lower test scores, but students with low grades need higher test scores to be considered for admission.
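To make the contrast between a mechanical cutoff and a sliding scale concrete, here is a minimal Python sketch. The cutoff of 1000 comes from the example above; everything else (the base GPA of 3.0, the 200 SAT points traded per GPA point, and the function names) is a hypothetical assumption chosen only for illustration, not a rule any college actually uses.

```python
# Hypothetical parameters for illustration only; no real college uses these numbers.
FIXED_CUTOFF = 1000  # combined SAT score, as in the example in the text


def fixed_cutoff_rule(sat: int) -> bool:
    """Mechanical rule: the test score alone decides."""
    return sat >= FIXED_CUTOFF


def sliding_scale_rule(sat: int, gpa: float,
                       base_sat: int = 1000,
                       base_gpa: float = 3.0,
                       points_per_gpa_unit: int = 200) -> bool:
    """Hypothetical sliding scale: the SAT score needed for consideration
    falls as grades rise above base_gpa and rises as grades fall below it."""
    required_sat = base_sat - points_per_gpa_unit * (gpa - base_gpa)
    return sat >= required_sat


if __name__ == "__main__":
    applicants = [
        ("high grades, lower score", 950, 3.6),
        ("low grades, higher score", 1050, 2.4),
    ]
    for label, sat, gpa in applicants:
        print(f"{label}: cutoff={fixed_cutoff_rule(sat)}, "
              f"sliding={sliding_scale_rule(sat, gpa)}")
```

The two hypothetical applicants get opposite decisions under the two rules. Real admissions offices weigh such evidence with judgment rather than a formula; the sketch only shows the shape of the idea, namely that the score needed depends on other evidence about the student.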
III. Standardized testing industry is totally unregulated and its products are of increasingly dubious quality
A second general point is that the standardized testing industry is totally unregulated and its products are of increasingly dubious quality. The standardized testing industry is a very odd product of entrepreneurial academics and the publishing industry (Haney, Madaus & Lyons, 1993). But despite the increasing public policy prominence of standardized test results, there is virtually no governmental or other oversight of the standardized testing industry. As my colleague George Madaus observed some years ago, our nation provides more quality assurance for pet food than for standardized tests, which can now drastically affect the lives of children. Given this situation, it is not surprising that there have been numerous recent accounts of widespread errors in scaling, scoring and reporting in the testing industry (Henriques & Steinberg, May 20, 2001; Steinberg & Henriques, May 21, 2001; Rhoades & Madaus, 2003). One of the most egregious examples in recent years occurred in Minnesota, where a testing company used the wrong scoring key and hundreds of students were wrongly prevented from graduating from high school. Even more recently, in New York, blunders in test content and scoring became so obvious that state officials had to throw out graduation test results altogether (see http://www.timeoutfromtesting.org/). Such experiences should make clear how unwise it is to make important decisions mechanically based on test scores in isolation.
IV. Low-tech tests are shortchanging high-tech students
A third general point is that low-tech tests (that is, paper-and-pencil tests in which students have to write longhand) seriously underestimate the skills of students used to writing with computers (Russell & Haney, 2000; Russell & Plati, 2001). This point is important because the "high stakes" tests being implemented in almost all the states are of the old-fashioned variety in which students have to pencil in "bubble sheets" (more formally known as mark-readable answer sheets) and/or write longhand using pencil and paper. (As far as I know, by the way, only a few states, such as Kansas, Utah and Virginia, have been working seriously to bring their state testing programs into the 21st century by using telecommunications via the World Wide Web.) The almost universal reliance on low-tech testing technology causes a number of problems beyond underestimating the skills of some students. Here, let me mention just two. First, it means that student work, whether bubble sheet records or handwritten answers, must be physically sent away to be scored. In consequence, results are delivered back to students only very slowly and in very crude form, e.g., as pass/fail or proficient/non-proficient ratings or some sort of numerical score. Yet learning theory and practical experience both tell us that slow and crude feedback is not conducive to learning. This clearly shows that the term "educational testing," as sometimes applied to the sorts of high stakes testing going on nowadays, is mainly a malapropism. There is hardly anything educational about it at all.
A second problem is that high stakes testing is distorting, some would say corrupting, education in the United States. Regarding the low-tech nature of high stakes testing, let me start with a simple example. Massachusetts started implementing its own high stakes test in the late 1990s. This test, called the Massachusetts Comprehensive Assessment System or MCAS (another malapropism, by the way), is used to rate both students and schools (Haney, 2002a, 2002b). Schools have been under such pressure to show score improvements on the low-tech MCAS that some have had students stop using computers in school and go back to longhand composition. More broadly, a recent national survey of teachers found that some 20-33% of teachers nationwide reported that the handwritten format of their state-mandated test limited their use of computers in teaching writing (Pedulla et al., 2003, p. 81).
V. High stakes misuse of test results is distorting education in the U.S.
The previous example, of high stakes testing leading some schools to turn back the clock in terms of pedagogy, is a relatively benign case. Most students are fairly resilient, and in any case research suggests that young people skilled in using computers tend to gain these skills outside rather than inside school.
A much more deplorable example of how high stakes testing is distorting education in the U.S. is that it is causing some students to be driven out of school altogether. Indirect evidence of this comes from the fact that high school graduation rates have been declining in most states over the last decade or so. Table 1, for example, shows graduation rates for the 50 states from 1988-89 to 2000-01. Without taking time to discuss the many different ways of calculating graduation rates, I note that the rates shown here are calculated simply as the number of graduates in a particular year divided by the number of students enrolled in grade 8 four years earlier.[1]
If you examine these data closely, you will see that more than thirty states showed declines in high school graduation rates over this interval. By 2000-01, thirty-five states had graduation rates of less than 80%.
[Table 1 omitted from text version of this paper]
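The rate definition used for Table 1 is simple enough to compute directly. The following Python sketch shows the calculation and the two kinds of summary quoted above (how many states declined, and how many fell below 80%). The two "states" and all their counts are hypothetical, made up for illustration; they are not the data behind Table 1.

```python
"""Cohort graduation rate as described in the text: graduates in a given
year divided by grade 8 enrollment four years earlier."""

from typing import Dict, Tuple


def graduation_rate(graduates: int, grade8_enrollment: int) -> float:
    """Return graduates / grade 8 enrollment four years earlier."""
    if grade8_enrollment <= 0:
        raise ValueError("grade 8 enrollment must be positive")
    return graduates / grade8_enrollment


# HYPOTHETICAL counts for illustration only (not the figures behind Table 1).
# Each entry: (graduates 1988-89, grade 8 in 1984-85,
#              graduates 2000-01, grade 8 in 1996-97)
counts: Dict[str, Tuple[int, int, int, int]] = {
    "State A": (9_000, 10_000, 7_800, 10_000),
    "State B": (7_000, 10_000, 8_200, 10_000),
}


def summarize(data: Dict[str, Tuple[int, int, int, int]],
              threshold: float = 0.80) -> Tuple[int, int]:
    """Count states whose rate declined, and states ending below threshold."""
    declines = 0
    below_threshold = 0
    for g_early, e_early, g_late, e_late in data.values():
        early = graduation_rate(g_early, e_early)
        late = graduation_rate(g_late, e_late)
        if late < early:
            declines += 1
        if late < threshold:
            below_threshold += 1
    return declines, below_threshold


if __name__ == "__main__":
    for state, (g_early, e_early, g_late, e_late) in counts.items():
        print(f"{state}: {graduation_rate(g_early, e_early):.1%} -> "
              f"{graduation_rate(g_late, e_late):.1%}")
    print(summarize(counts))
```

This simple ratio does not adjust for students who move between states or repeat a grade, which is one reason analysts have proposed the alternative measures discussed below.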
Reasonable people may well question whether falling graduation rates over the last decade are directly attributable to high stakes testing. So let me cite direct evidence from three different states (Texas, Alabama and New York) that high stakes testing, coupled with ill-conceived school "accountability" schemes, has led some school officials to push students out of school. In Texas, some school officials have been actively pushing students out of school while using contrivances so that such students are not officially counted as dropouts:
Interviews with school administrators who are also students in graduate courses in school leadership in Texas suggested the severity of the dropout and undercounting problems there. They spoke of the tremendous pressure they were under to disguise dropouts in novel and clever ways. They talked of fudging records so that dropouts would appear to be students who transferred to other schools. They said they were told not to count students who were transferred to "alternative education," even
if they knew these children would never show up. (Bainbridge, April 16, 2003).
Also from Texas, an even more recent news account tells the story of Crystal Gonzalez, a ninth grader who was encouraged to drop out by
school officials: "They told me it would be better for me to get my GED. They probably had other students that needed to be in school more"
(Associated Press, September 10, 2003).
From Alabama comes the troubling story of Steve Orel and the "Birmingham 500." In the spring of 2000, Orel was a teacher in an adult education program in Birmingham, Alabama, when he discovered that many "low-achieving" students (that is, students who had scored low on the Stanford Achievement Test, 9th Edition, or SAT-9) had been administratively dismissed from the Birmingham Public Schools, ostensibly because of "lack of interest." Upon investigation, Orel learned that a total of 522 students had been pushed out of school in this manner in an effort to make school test results look better. Six Birmingham high schools had been placed on "academic alert status" and were thereby threatened with takeover by the state if their SAT-9 score averages did not improve (Orel, 2003). Because of his efforts to bring the "push-out" policy to public attention and to end it, Orel was fired from his public school teaching job, but, with support from a local charity, he went on to help organize an adult education program for the students pushed out of Birmingham public high schools.
And from New York, a recent report from Advocates for Children (Gotbaum, 2002) documents that tens of thousands of students are being "discharged" from New York City high schools in such a way that they are not counted as dropouts. According to the report, "school officials are encouraging students to leave regular high school programs even though they are of school age or have a right to receive appropriate literacy, support, and educational services through the public school system" (Gotbaum, 2002, p. 2). In 2001, according to Gotbaum, thirty-one New York City high schools "discharged" more students than they graduated. The number of students discharged was more than triple the number officially counted as dropouts. According to the report, the real number of dropouts may be masked by recording discharged students as having transferred to GED preparation programs, in which case they were not counted as dropouts.
Also, it seems no accident that the states with the lowest graduation rates at the end of the century (Florida, Georgia, Louisiana, Indiana, South Carolina, Mississippi, North Carolina, Tennessee, Alabama, Arizona, Texas, and New York, all with graduation rates of 75% or less) all implemented high school graduation tests. And to be clear, these states had the lowest graduation rates not just according to the results shown in Table 1 above, but also according to alternative graduation rate measures reported by Warren (2003) and Greene & Forster (2003). Florida perhaps presents the most cautionary example. Florida has one of the oldest high school graduation testing programs in the country, going back to the 1970s.[2] Yet according to the results in Table 1, as of 2000-01 Florida had one of the very worst graduation rates among the states (ranking 47th out of 50). And according to other analysts using alternative measures of graduation rate, it ranked 47th among the states in 2000 (Warren, 2003, Table 8) and dead last in 2001, with an appalling graduation rate of 56% (Greene & Forster, 2003, p. 18).
In these brief comments I have had to speak generally, so in conclusion let me add four points of elaboration. First, although I have had to comment briefly and generally here, I have previously studied testing programs in a number of states in some detail. Second, within the next few weeks my colleagues and I at Boston College will be issuing a report on the education pipeline in the U.S. and in the fifty states over the last three decades, showing, among other things, that not everything in this history is bad news and that the U.S. education pipeline has other problems besides falling graduation rates. Third, the problems I describe are not the result of standardized testing per se. Standardized tests have long proven useful, for example, in college admissions and in program evaluation. Rather, it seems to me that the problems I describe result from the ill-conceived misuse of test results in isolation, for example in trying to use the same test results to hold both schools and students accountable. In this regard, it must be remembered that when the same fallible technology is used to try to hold both institutions and students accountable, the institutions are always in a better position to protect their interests than are the young people.
Finally, it seems to me that regardless of what has been causing the decline in the national high school graduation rate (to less than 75% in 2000-01), be it high stakes testing or myriad other factors, this situation should be a cause for alarm. Recall that the Goals 2000 law passed in 1994 called for a high school graduation rate of 90% by the year 2000. Clearly we have failed miserably to meet that goal. And even if we had met it, our young people would still be decimated, in the literal sense of losing one in ten, before they finish high school. A current graduation rate of less than 80% means that we are more than doubly decimating our young.
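The arithmetic behind "more than doubly decimating" can be written out, reading "decimate" literally as losing one in ten. Here r stands for the graduation rate; the symbol is notation introduced for this illustration, not the author's.

```latex
\begin{aligned}
r_{\text{goal}} = 0.90 \;&\Rightarrow\; 1 - r_{\text{goal}} = 0.10 = \tfrac{1}{10} && \text{(decimation)}\\
r < 0.80 \;&\Rightarrow\; 1 - r > 0.20 = 2 \times \tfrac{1}{10} && \text{(more than double decimation)}
\end{aligned}
```

At the national rate of less than 75% cited above, the loss exceeds 0.25, or more than two and a half times one in ten.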
Thank you again for the opportunity of speaking with you.
Bainbridge, W. L. (2003, April 16). Texas model for school achievement doesn't hold up. Columbus Dispatch.
Gotbaum, B. (2002, November 21). Pushing out at-risk students: An analysis of high school discharge figures (Report by the Public Advocate for the City of New York and Advocates for Children). New York: Advocates for Children. (http://www.advocatesforchildren.org/, accessed April 5, 2003).
Greene, J. P., & Forster, G. (2003, September 17). Public high school graduation and college readiness rates in the United States (Working Paper No. 3). New York: The Manhattan Institute. (http://www.manhattaninstitute.org/ewp_03_embargoed.pdf, accessed September 15, 2003).
Haney, W. (2001, January 13). Revisiting the myth of the Texas miracle in education: Lessons about dropout research and dropout prevention. Paper prepared for the "Dropout Research: Accurate Counts and Positive Interventions" conference sponsored by Achieve and the Harvard Civil Rights Project, Cambridge, MA. (Available at http://www.civilrightsproject.harvard.edu/research/dropouts/haney.pdf)
Haney, W. (2002a, May 3). Lake Woebeguaranteed: Misuse of test scores in Massachusetts, Part 1. Education Policy Analysis Archives, 10(24). (http://epaa.asu.edu/epaa/v10n24/, accessed September 9, 2002).
Haney, W. (2002b, July 19). Ensuring failure: How a state's achievement test may be designed to do just that. Education Week, 21(42), pp. 56, 58.
Haney, W., Madaus, G., & Lyons, R. (1993). The fractured marketplace for standardized testing. Boston: Kluwer Academic Publishers.
Henriques, D., & Steinberg, J. (2001, May 20). Right answer, wrong score: Test flaws take toll. New York Times, p. 1. Available at http://www.nytimes.com/2001/05/20/business/20EXAM.html
Orel, S. (2003). Left behind in Birmingham: 522 pushed out students. In R. C. Lent & G. Pipkin (Eds.), Silent no more: Voices of courage in American schools (pp. 1-14). Portsmouth, NH: Heinemann.
Pedulla, J., et al. (2003). Perceived effects of state-mandated testing programs on teaching and learning: Findings from a national survey. Chestnut Hill, MA: National Board on Educational Testing and Public Policy. (Available at http://www.bc.edu/nbetpp)
Rhoades, K., & Madaus, G. (2003, May). Errors in standardized tests: A systemic problem (Report of the National Board on Educational Testing and Public Policy). Chestnut Hill, MA: Boston College, Center for the Study of Testing, Evaluation and Educational Policy.
Russell, M., & Haney, W. (2000, March 28). Bridging the gap between technology and testing. Education Policy Analysis Archives, 8(41). Available at http://epaa.asu.edu/epaa/v8n41/
Russell, M., & Plati, T. (2001). Effects of computer versus paper administration of a state-mandated writing assessment. Teachers College Record Online. Available at http://www.tcrecord.org/Content.asp?ContentID=10709
Steinberg, J., & Henriques, D. (2001, May 21). When a test fails the schools, careers and reputations suffer. New York Times, p. 1. Available at http://www.nytimes.com/2001/05/21/business/21EXAM.html
Warren, J. R. (2003, August). State-level high school graduation rates in the 1990s: Concepts, measures and trends. Paper prepared for presentation at the annual meetings of the American Sociological Association, Atlanta.
[1] The sources of data from which these rates were calculated are the Digest of Education Statistics (DES), a report issued by the National Center for Education Statistics since 1962, and the Common Core of Data (CCD), a federal repository of education statistics available online.
[2] The New York Regents exams are much older than the high school graduation tests that began to be required of all students in the 1970s. But until the 1990s, the NY Regents graduation tests were optional. Since the Regents graduation tests became required of all students, the NY graduation rate has fallen below those of several southern states with historically poor graduation rates.
Dr. Walter M. Haney
2 Pencil Required: The Effectiveness of High Stakes Testing
The National Academies