Test-Based Accountability and International Comparisons: Lessons Ignored

Susan Notes:

Grade 3 Teacher Responds (to someone who accused Thomas of cherry-picking for ideological purposes):

My 25+ years as an educator of children confirm EXACTLY what Paul Thomas reports

I suspect the real "truth" is that you do not teach children, let alone those whose lives are less than perfect. When you have spent years with 8 year olds from broken homes, whose parents are in prison and those at home struggle with substance abuse problems, who have been abused and neglected, who do not have even routine medical and dental care, who eat their only calorie-dense meal (note I do not refer to it as nutritious) at school, who are responsible for the care of younger siblings... I could go on. When you have poured blood, sweat, and tears into teaching your own groups of children and have watched them blossom into little guys who love to read and can't wait to write, but who cannot make 2.5 years worth of progress in one year and are deemed "failures" and reduced to a 3-digit number by their performance on a test...then you are free to cry "ideology" to people who attempt to point out the folly in such practices.

However, I suspect you prefer to troll. Get out there and teach some real kids whose lives are a mess. I challenge you.

by Paul Thomas

The historical and current focus on test-based accountability to drive education evaluation and reform is often situated within another historical and current approach to judging U.S. public education--international comparisons. Just as we tend to misuse test data, specifically the SAT, to rank and label the quality of schools and state education systems, we do the same with international comparisons.

"A century ago, the United States was among the most eager benchmarkers in the world," opens Tucker (2011), leading to the focus of his report, Standing on the Shoulders of Giants: An American Agenda for Education Reform: "In this paper, we stand on the shoulders of giants, asking what education policy might look like in the United States if it was based on the experience of our most successful competitors" (p. 1). Tucker notes that the report’s goal also stands on the claims by Secretary Duncan that U.S. education is lagging behind other countries.

Tucker (2011) also establishes early the evidence supporting the U.S.'s slipping status among the world in terms of education when placed against GDP per capita. According to the four charts offered by Tucker (pp. 3-4), the U.S. sits far ahead of other countries in GDP (roughly quadruple Shanghai's figure and over $11,000 ahead of Finland), but sits well behind other nations in reading, math, and science scores on the Programme for International Student Assessment (PISA), specifically Shanghai, Finland, Singapore, Canada, and Japan. The charts and the message are compelling because political leaders and the public strongly hold several assumptions about education: (1) the primary purpose of education is to prepare students for the workforce, either directly after high school or through college-readiness that also leads to the workforce, (2) there is a powerful and direct relationship between the quality of any country’s education system (as measured by tests) and that country's economic well-being, and (3) U.S. education has always been weak, especially when compared internationally (even though the same people who claim this also simultaneously claim that the U.S. once was at the top).

To understand fully, then, the purposes for education--what education can and cannot accomplish as a foundational institution of a free society--we must unpack the assumptions and claims we commonly make about education and compare those to the evidence. Since political and public discourse about education seamlessly and haphazardly intertwines claims about education, economics, international comparisons, and testing, these claims are complex and, thus, difficult to separate and address accurately, but let's look here at several of the claims placed against the evidence:

• Is there a powerful and direct relationship between test-based assessments of educational quality and strength of economies internationally? Bracey (2004, 2008) offers a careful refutation of this robust but flawed claim. In short, Bracey (2008) explains:

"First, comparing nations on average scores is a pretty silly idea. It’s like ranking runners based on average shoe size or evaluating the high school football team on the basis of how fast the average senior can run the 40-yard dash. Not much link to reality. . . . Second, test scores, at least average test scores, don’t seem to be related to anything important to a national economy. Japan's kids have always done well, but the economy sank into the Pacific in 1990 and has never recovered. The two Swiss-based organizations that rank nations on global competitiveness, the Institute for Management Development and the World Economic Forum, both rank the U. S. #1 and have for a number of years."

Thus, the faith the U.S. has in education as a central institution for driving an internationally competitive economy is "[n]ot. . .link[ed] to reality." Likely, political and corporate leaders need the public to believe their claim in order to keep public schools focused on producing a compliant workforce, instead of allowing public education to fulfill its role in supporting human agency and democratic ideals.

• Even if the relationship between education quality and economic strength is not supported by the evidence, isn’t the U.S. education system, as measured by PISA, lagging behind other nations, notably nations with much lower GDP? The basic charts offered by Tucker (2011) appear damning, but as Bracey (2008) warns, simple ranking of average test scores from single data points cannot offer a fair or accurate picture of much of anything of value concerning the quality of education in an entire country--particularly if we decontextualize that data from one important factor, poverty. Riddile (2010) presents a more nuanced analysis of PISA that compares apples to apples internationally by considering childhood poverty rates along with PISA data. The result shows that the U.S. sits at the top of the rankings when poverty is considered:

Country/ Poverty Rate/ PISA Score

United States/ <10%/ 551

Finland/ 3.4%/ 536

Netherlands/ 9.0%/ 508

Belgium/ 6.7%/ 506


United States/ 10%-24.9%/ 527

Canada/ 13.6%/ 524

New Zealand/ 16.3%/ 521

Japan/ 14.3%/ 520

Australia/ 11.6%/ 515

Poland/ 14.5%/ 500

Germany/ 10.9%/ 497

Riddile also notes more problems with simplistic international comparisons by addressing Shanghai (Zhao, 2010):

"Shanghai, China topped the list with 556 but is not included in this analysis because Shanghai is a city not a country and because only 35% of Chinese students ever enter high school and because 'when you spend all your time preparing for tests, and when students are selected based on their test-taking abilities, you get outstanding test scores.'"

The two most repeated and compelling claims about education in the U.S., then, are factually inaccurate; thus, we have to be skeptical at best about Tucker (2011) pursuing an extended discussion of the U.S. adopting practices from other countries in order to reform education--even if we maintain the tenuous and distorting assumption that the primary purpose of education is to prepare students as future workers. But Tucker's central goal for his report, to suggest how the U.S. can and should model education reform on successful international comparisons, does provide further evidence that we have misguided assumptions about the purpose of education, and thus are prone to continue pursuing flawed policies for reform.

The first focus offered by Tucker (2011) is addressing quality, primarily teacher quality. While this has been the central argument for the "no excuses" segment of the new reformers led by Secretary Duncan and Gates, the claim has some serious problems. First, discussions of teacher quality suffer the same fate as international comparisons: oversimplification. Teacher quality is difficult to measure, just as student learning is, but assuming that the best students make the best teachers is at least debatable (Sears, Marshall, & Otis-Wilborn, 1994). Further, the argument that teacher quality must be increased to reform education rests on another flawed assumption--the impact of teacher quality on student outcomes. Sawchuk (2011) details the current understanding of teacher quality and student outcomes, concluding:

"Research has shown that the variation in student achievement is predominantly a product of individual and family background characteristics. Of the school factors that have been isolated for study, teachers are probably the most important determinants of how students will perform on standardized tests [original in italics]."

The in-school influence of teachers on student outcomes is quite small--about 14% (Hirsch, 2007) or 13%-17% (Hanushek, 2010). Thus, even if we accept that elite students make elite teachers, and thus that we need to recruit high-achieving students into the teaching profession, we are tinkering with a very small measurable influence on the exact data we are using to evaluate schools.

What Tucker (2011) approaches but never fully addresses within the larger and misleading call to focus on teacher quality is how many countries he labels superior to the U.S. do treat, educate, and pay teachers. For one example, let’s consider Finland--where teachers are required to complete a publicly-funded master's degree, where teachers are universally unionized, where teachers are not held accountable for standards or test scores (Horn, 2010, November 22). While Tucker's echoing of Duncan and Gates on the need for greater teacher quality is refuted by evidence, it seems likely that the U.S. could benefit from reconsidering how we treat, educate, and pay teachers, but the details of that consideration work against many of the commitments of corporate reformers. [emphasis added]

Equity, the second focus presented by Tucker (2011), appears justified by, ironically, the flaws inherent in his initial premise based on raw comparisons of PISA data. While the argument for equity focuses on in-school equity without considering the need to address social inequity--the primary reform needed in the U.S.--Tucker raises important questions about stratified course offerings (tracking), teacher assignments (students coming from poverty tend to have the least experienced and un- or under-qualified teachers), and the potential for schools to address high-poverty students more effectively than we currently do in the U.S.

While the equity section from Tucker (2011) has the most potential for being valuable in the education reform debate, we should note, again, that out-of-school equity is essentially ignored and that this section is dwarfed by the other two sections, the teacher quality section being seven times its length and the final section being just a bit more than double its length. Yes, equity matters, but Tucker's discussion helps highlight that most new reformers persist in undervaluing the impact of social inequity as well as the greatest flaw in public education--that school perpetuates the inequity children experience in their lives outside formal education.

The final section, productivity, reveals the corporate commitments driving the report. Tucker (2011) endorses the business model for reforming and running schools, accountability, and merit-based incentives for teachers and schools to perform. Again, even if we remain within the assumptions about international comparisons, test scores, and the relationship between education and economies, Tucker's third focus is not based on evidence. Hout and Elliott (2011), using test scores and international comparisons, conclude that the current and prolonged accountability era driven by standards, testing, and high accountability "have not increased student achievement enough to bring the United States close to the levels of the highest achieving countries" (p. S-3). What we should insert here is that nearly three decades of focusing on high accountability in order to raise U.S. test scores to compete internationally has produced two conclusions the new reformers will not acknowledge: (1) Test scores remain most strongly connected to out-of-school factors, and (2) accountability paradigms do not work.

Accountability has never worked, but neither have merit-based incentive programs. Kohn (2003) has discredited merit pay at both the corporate and education levels, concluding about merit pay for teachers:

"So how should we reward teachers? We shouldn't. They're not pets. Rather, teachers should be paid well, freed from misguided mandates, treated with respect, and provided with the support they need to help their students become increasingly proficient and enthusiastic learners."

Amen. [Ohanian comment]

As well, the merit pay argument fails for the same reason international comparisons based on test scores fail--test scores and linking those scores to teacher quality are unstable, misleading, and corrosive to the teaching/learning dynamic (Baker et al., 2010).

The historical and current education reform movement, then, is doomed to fail (yet again) as long as leaders persist in placing tests at the center of determining education quality, international comparisons, and recruiting, preparing, and paying teachers. The reform movement is also futile as long as schools exist primarily to train a compliant workforce, as long as schools are managed like businesses, and as long as accountability drives that reform regardless of decades of evidence that standards, testing, and accountability do not work.

Finally, and most importantly, education reform will always fail students and our society if we fail to learn the lessons taught by international comparisons and testing--the weight and impact of a child's life is the central issue of equity our culture must address as we also commit fully to equitable schools. To maintain tunnel vision on schools and simplistic use of data serves only the privileged at the expense of everyone else.


— Paul Thomas
