


Rationing Education In an Era of Accountability

Ohanian Comment: Here, writing with the moral imperative that must be applied to the research that documents the lives of students, Jennifer Booher-Jennings details what data-driven really means. You can read Below the Bubble: "Educational Triage" and the Texas Accountability System (PDF file) in American Educational Research Journal, Summer 2005, Vol. 42, No. 2, pp. 231-268.

The push for accountability was originally cast as a way to ensure that schools would leave no child behind. Ironically, as Ms. Booher-Jennings points out, the NCLB system of requiring schools to demonstrate adequate yearly progress through test scores has created incentives to neglect the very students who need the most help.


by Jennifer Booher-Jennings

MEET Mrs. Dewey, 46 years old and a veteran fourth-grade teacher at Marshall Elementary School. Mrs. Dewey entered the teaching profession in the wake of A Nation at Risk and has weathered the storm ever since. For the last 20 years, she has survived the continuous succession of faddish programs that has characterized American education reform. Year after year, administrators have asked Marshall teachers to alter their practice to conform to the latest theory. Mrs. Dewey's colleagues, frustrated by the implementation of such silver-bullet approaches, have often flouted the administrative directives and chosen instead to serve as the sole arbiters of their classroom practice.

But it is the newest of the new solutions that worries Mrs. Dewey most. The language of accountability is swift and uncompromising: hold educators responsible for results. Identify those teachers who, as President Bush says, "won't teach." Fair enough, Mrs. Dewey thinks. The consummate professional, Mrs. Dewey always looks for the silver lining.

Like other reforms, accountability requires teachers to embrace a new strategy. Data-driven decision making, a consultant told the faculty at a professional development session, is the philosophy Marshall teachers must adopt. The theory is simple. Give students regular benchmark assessments; use the data to identify individual students' weaknesses; provide targeted instruction and support that addresses those areas. Mrs. Dewey remembers nodding approvingly. After all, this approach -- gathering textured information on each student to guide instructional activities -- was one she had been using for 22 years.

The consultant moved on. "Using the data, you can identify and focus on the kids who are close to passing. The bubble kids. And focus on the kids that count -- the ones that show up at Marshall after October won't count toward the school's test scores this year. Because you don't have enough special education students to disaggregate scores for that group, don't worry about them either." To make this concept tangible for teachers, the consultant passed out markers in three colors: green, yellow, and red. Mrs. Dewey heard someone mutter, "What is this? The traffic light theory of education?"

"Take out your classes' latest benchmark scores," the consultant told them, "and divide your students into three groups. Color the 'safe cases,' or kids who will definitely pass, green. Now, here's the most important part: identify the kids who are 'suitable cases for treatment.' Those are the ones who can pass with a little extra help. Color them yellow. Then, color the kids who have no chance of passing this year and the kids that don't count -- the 'hopeless cases' -- red. You should focus your attention on the yellow kids, the bubble kids. They'll give you the biggest return on your investment."

As the bell tolls a final warning to the boisterous 9-year-olds bringing up the rear of her class line, Mrs. Dewey stares blankly into the hallway. Never did she believe that the advice offered by that consultant would become Marshall's educational mantra. Focus on the bubble kids. Tutor only these students. Pay more attention to them in class. Why? It's data-driven. Yet this is what her colleagues have been doing, and Marshall's scores are up. The community is proud, and the principal has been anointed one of the most promising educational leaders in the state. At every faculty meeting, the principal presents a "league table," ranking teachers by the percentage of their students passing the latest benchmark test. And the teachers talk, as they always do. The table makes perfect fodder for faculty room gossip: "Did you see who was at the bottom of the table this month?"

Mrs. Dewey has made compromises, both large and small, throughout her career. Every educator who's in it for the long haul must. But this institutionalized policy of educational triage weighs heavily and hurts more.1 Should she focus only on Brittney, Julian, Shennell, Tiffany, George, and Marlena -- the so-called bubble kids -- to the exclusion of the other 17 students in her class? Should Mrs. Dewey refuse to tutor Anthony, a persistent and eager little boy with no chance of passing the state test this year, so that she can spend time with students who have a better shot at passing? What should she tell Celine, a precocious student, whose mother wants Mrs. Dewey to review her entry for an essay contest? Celine will certainly pass the state test, so can Mrs. Dewey afford the time? What about the five students who moved into the school in the middle of the year? Since they don't count toward Marshall's scores, should Mrs. Dewey worry about their performance at all?

In her angrier moments, Mrs. Dewey pledges to ignore Marshall's approach and to teach as she always has, the best way she knows how. Yet, if she does, Mrs. Dewey risks being denounced as a traitor to the school's effort to increase scores -- in short, a bad teacher. Given 22 years of sacrifices for her profession, it is this reality that stings the most.

Mulling over her choices, Mrs. Dewey shuts her classroom door and begins her class.2

Unintended Consequences of Accountability Systems: Educational Triage

Test-based accountability systems aim to direct the behavior of educators toward the improvement of student achievement. The No Child Left Behind (NCLB) Act codified accountability as our national educational blueprint, requiring schools to increase test scores incrementally so that all students are proficient in reading and math by 2014. Yet, despite the stated intent of NCLB to improve outcomes for all students, particularly those who have been historically neglected, educators and others may adopt a series of "gaming" practices in order to artificially inflate schools' passing rates. Such practices include giving students a special education classification to exclude them from high-stakes tests,3 retaining students in grade to delay test-taking,4 diverting attention away from subjects not evaluated on high-stakes tests,5 teaching to the test,6 and cheating.7

In what follows, I discuss two of the dilemmas presented by a less-well-known gaming practice: educational triage. The insights offered here derive from an ethnographic study of an urban elementary school in Texas, to which I have assigned the pseudonym "Beck Elementary." Educational triage has become an increasingly widespread response to accountability systems and has been documented in Texas, California, Chicago, Philadelphia, New York, and even England.8 By educational triage, I mean the process through which teachers divide students into safe cases, cases suitable for treatment, and hopeless cases and ration resources to focus on those students most likely to improve a school's test scores. The idea of triage, a practice usually restricted to the direst of circumstances, like the battlefield or the emergency room, poignantly captures the dynamics of many schools' responses to NCLB. In the name of improving schools' scores, some students must inevitably be sacrificed. And the stakes are high -- for schools, which face serious sanctions for failing to meet adequate yearly progress targets; for students, who increasingly face retention if they do not pass state tests; and for teachers, who are judged by the number of students they "save."

Dilemma 1. Data can be used to improve student achievement, but they can also be used to target some students at the expense of others. Data-driven decision making has become something of a sacrosanct term in education policy circles. Who could be against it? The public face of data-driven decision making -- identifying the needs of each individual child and introducing interventions to remediate any learning difficulties -- is sensible and beyond question.

But the Achilles' heel of education policy has always been implementation. When I listened closely to the conversations that educators at Beck Elementary School had about "being data-driven," the slippage between evaluating the individual needs of every student and deciding which students to target to maximize school performance quickly became evident. As I moved closer and closer to the classroom, the administrators' ideal version dissipated and gave way to a triage-based understanding of data-driven decision making. Teachers were most attuned to the chasm between administrators' theoretical proclamations and how the same administrators expected them to operate: teachers understood that the bottom line in this numbers game was the percentage of students who passed. Because of the unrelenting pressure to increase test scores, one mode of using data became dominant at Beck: the diversion of resources (e.g., additional time in class; enrichment sessions with the literacy teacher; and after-school, Saturday, and summer tutoring) to students on the threshold of passing the test, the "bubble kids."

All my questions about which students received extra help were met with the deferent maxim, "It's data-driven." When I asked one teacher how the school allocated additional services to students -- for example, the reading specialist or after-school and Saturday tutoring -- she provided the following response:
It's all data-driven. . . . We do projections -- how many of them do you think will pass, how many of them do you think will need more instruction, how many teachers do we have to work with, what time limit do we have. Based on that, who are we going to work with? It comes down to that. . . . We really worked with the bubble kids . . . that's the most realistic and time-efficient thing we can think of.

In this conception of data-driven practice, the choice to privilege one group of students over another is viewed as neutral and objective. The decision to distribute resources to those most advantageous to the school's pass rates is not understood as a moral or ethical decision. Instead, it is seen as a sterile management imperative. Protected by its scientific underpinnings, the data-driven focus on the bubble kids is difficult for teachers to attack. In sum, at Beck Elementary, the invocation of the phrase "data-driven" obscures, neutralizes, and legitimates a system of resource distribution that is designed to increase passing rates rather than to meet the needs of individual students.

The blunt vocabulary of triage infiltrated every corner of Beck. The tenor of the phrases used to describe students -- "the ones who could make it" and "hopeless cases" -- speaks not only to the perceived urgency to improve test scores but also to the destructive labeling of those children who find themselves below the bubble. Driven by the pressure to increase the passing rate, teachers turned their attention away from these students. As one teacher related in an interview:


I guess there's supposed to be remediation for anything below 55%, but you have to figure out who to focus on in class, and I definitely focus more attention on the bubble kids. If you look at her score [pointing to a student's score on her class test-score summary sheet], she's got a 25%. What's the point in trying to get her to grade level? It would take two years to get her to pass the test, so there's really no hope for her. . . . I feel like we might as well focus on the ones that there's hope for.

To say that hope is absent for a 10-year-old child is a particularly telling comment on how dramatically the accountability system has altered the realm of imagined possibility in the classroom. Now, with an unforgiving bottom line for which to strive, teachers can retain hope only for those perceived as potential passers. To assert that students below the bubble are just too low-performing to help establishes that the only worthwhile improvement in this brave new world is one that converts a nonpasser to a passer.

The problem is that those students who arrive at school as the most disadvantaged are often the lowest scoring. And since the focus on the bubble kids at Beck Elementary begins not in the third grade -- the first year that students take state tests -- but the moment students enter kindergarten, they are branded as "hopeless cases" from the very first days of their schooling.

An important shift occurs in a system focused on the percentage of students above a particular threshold. When a low-performing student enters a teacher's classroom, he or she is seen as a liability rather than as an opportunity to promote individual student growth. As Michael Apple trenchantly wrote, the emphasis changes "from student needs to student performance, and from what the school does for the student to what the student does for the school."9

Certainly one can imagine uses of data that could turn attention to the individual needs of each and every student. However, the current monolithic discourse on data-driven decision making begs for a discussion of unintended consequences. Data can be used to target some students at the expense of others, and it is happening today.

When we blindly defer to "the data," we abdicate responsibility for tough decisions, all the while claiming neutrality. But data are not actors and cannot do anything by themselves. Data do not make decisions; people make decisions that can be informed by data. Decisions about resource allocation are ethical decisions with which educators and communities must grapple and for which they must ultimately take responsibility.

What we need above all is a sustained discussion among educators and the broader polity about the very real tradeoffs involved in schools' responses to accountability systems. If schools adopt the practices of educational triage in response to NCLB, the consequence may be suboptimal outcomes for students "below the bubble," as well as for their peers who are mid-level and high-achieving students. And all of these unintended consequences can happen while official pass rates increase.

Dilemma 2. It is unfair to hold schools accountable for new students or for subgroups that are too small to yield statistically reliable estimates of a school's effectiveness; however, the consequence of excluding some students may be to deny them access to scarce educational resources. Educational triage does not end with the diversion of resources to the "bubble kids." Because of the fine print in NCLB, not all students are equally valuable to a school's test scores. Subgroups are not disaggregated if the number of test-takers does not meet a minimum size requirement, and students are not counted at all in a school's scores if they are not enrolled in a school for a full academic year. For example, in Texas, the scores of students who arrive at the school after the end of October do not count toward schools' scores. Such a definition is logical, for it attempts to isolate the impact of schools on students. Including students who have not attended the school for a reasonable period of time might bias estimates of the school's quality and unfairly penalize schools serving more mobile students.

However, if resources flow only toward those students who affect a school's outcomes, students who do not "count" may be denied access to scarce educational resources. I found that another pithy term, "the accountables" -- those students who count toward a school's scores -- was incorporated into the lexicon of Beck educators. Teachers engaged in a second kind of educational triage by focusing resources on the "accountables," to the virtual exclusion of students who "did not count." In accountability's ultimate contradiction, the protean word "accountable" retained only a semblance of its intended meaning -- taking responsibility for each and every student.

How many students are affected by the mobility provisions of NCLB? Take the Houston Independent School District as an illustrative example. Serving 211,157 students, this district is the largest in Texas and the seventh largest in the nation. The average Houston school excludes 8% of its students from its "accountables."10 Almost one-third of Houston schools (31%) exclude more than 10% of their students from scores used for accountability. By any measure, this is not an insignificant number of students. Moreover, because mobility is not uniformly distributed across the population, some demographic groups have much higher numbers of mobile -- and thus unaccountable -- students. In Houston, an average of 16% of special education students and 11% of African American students are not counted in schools' scores because they have not been enrolled in a school for a full academic year. Ironically, the very students NCLB was designed to target are often those least likely to be counted.

A second way that students may "not count" stems from states' definitions of the subgroup size required for disaggregation. If states define subgroup size expediently, the scores of various subgroups will continue to be buried in schoolwide averages. Again, Texas is a good example of artful definition of subgroup size. Under the Texas state accountability system, subgroups must include at least 30 students and account for at least 10% of all students -- or include 50 or more students -- to be evaluated. Under Texas' NCLB implementation plan, subgroups must include at least 50 students and make up at least 10% of all students -- or include 200 or more students -- to be evaluated. Under the state system, 82% of Houston schools with African American test-takers disaggregate scores for African American students, while for the purposes of NCLB, only 66% do.
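The two Texas definitions described above can be compared side by side with a small sketch. This is only an illustration of the rule as stated in the text, not the state's or the federal government's actual procedure: the function name, the parameter names, and the 400-student school are hypothetical, and the sketch assumes that "10% of all students" is measured against the same count of students used for the subgroup.

```python
def is_evaluated(subgroup_n, total_n, min_n, min_pct, override_n):
    """Return True if a subgroup is large enough to be reported separately.

    Rule shape taken from the text: (at least min_n students AND at least
    min_pct percent of all students) OR at least override_n students.
    """
    if total_n <= 0:
        return False
    # Integer arithmetic avoids floating-point surprises at the 10% boundary.
    meets_joint_rule = subgroup_n >= min_n and subgroup_n * 100 >= min_pct * total_n
    return meets_joint_rule or subgroup_n >= override_n


# Thresholds as described in the text.
TEXAS_STATE = dict(min_n=30, min_pct=10, override_n=50)
TEXAS_NCLB = dict(min_n=50, min_pct=10, override_n=200)

# Hypothetical school: 400 students, 45 of them in the subgroup (11.25%).
state_result = is_evaluated(45, 400, **TEXAS_STATE)
nclb_result = is_evaluated(45, 400, **TEXAS_NCLB)
print(state_result, nclb_result)  # True False
```

In this hypothetical school, the same 45 students are evaluated as a subgroup under the state definition but disappear into the schoolwide average under the NCLB definition, because 45 falls below both the 50-student floor and the 200-student override.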

Though Texas does not include a special education subgroup in its state system, the impact of using the NCLB definition (at least 50 students and 10% of all students, or 200 or more students) rather than the lower state threshold is significant. Shifting the definition upward reduces the percentage of Houston schools that disaggregate scores for special education from 55% to 24%. Other states have similarly gamed the subgroup-size provision of the law. In 2005, the U.S. Department of Education allowed Florida to change its minimum subgroup size to 30 students who also make up 15% of test-takers. Because special education students rarely account for more than 15% of a school's population, very few schools in Florida will be required to disaggregate scores for these students.
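A short sketch shows why the percentage requirement, rather than the head count, does most of the work in the Florida definition. The function name and the example schools are hypothetical, and the sketch assumes the rule means "at least 30 students and at least 15% of test-takers."

```python
def florida_disaggregates(subgroup_n, test_takers, min_n=30, min_pct=15):
    """Apply the Florida rule as described in the text (an assumption:
    both conditions must hold, each read as 'at least')."""
    if test_takers <= 0:
        return False
    return subgroup_n >= min_n and subgroup_n * 100 >= min_pct * test_takers


# Hypothetical schools, each with 60 special education test-takers.
print(florida_disaggregates(60, 500))  # 12% of test-takers -> False
print(florida_disaggregates(60, 400))  # 15% of test-takers -> True
```

A subgroup twice the 30-student minimum still goes unreported in the first school because it makes up only 12% of test-takers.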

There is an irreconcilable tension between accurately measuring school effects and forestalling the potential negative consequences of excluding some students from accountability calculations. If accuracy of measurement is privileged, some students will necessarily be excluded from accountability calculations. In order to best estimate school effects, a school should not be responsible for students who attend it for a short period of time. Similarly, small subgroups may yield statistically unreliable estimates of the school's efficacy with a particular group of students. Moreover, mainstream state tests may be inappropriate measures for some English-language learners or special education students. In other words, there are valid reasons, from a measurement perspective, for excluding students from schools' scores. On the other hand, the consequence of excluding these students may be to deny them access to scarce educational resources.

Better Choices?

So Mrs. Dewey can choose to teach all of her students, regardless of their potential contribution to her school's bottom line, or she can participate in educational triage. If she refuses to focus her time and attention on those students most likely to raise the school's scores, she risks not only the school's survival but her professional reputation as a good teacher and, potentially, her job.

Mrs. Dewey should not be asked to make such choices, and it is unconscionable to question her ethics when she does what she has little choice but to do. Systems of public policy cannot be designed solely for those with the moral certitude to qualify them for sainthood.

Educators will respond to systemic incentives, and NCLB's current incentives structurally induce behaviors that are inimical to broader notions of equity and fairness. In many cases, these perverse incentives turn educators' attention away from NCLB's intended beneficiaries. Until these issues are addressed, we can expect to see educational triage practices flourish across the country.

1. My use of the phrase "educational triage," as well as the title of this article, draws on the work of David Gillborn and Deborah Youdell, Rationing Education: Policy, Practice, Reform, and Equity (Buckingham, U.K.: Open University Press, 2000).

2. Like Ted Sizer's Horace Smith, Mrs. Dewey is not one informant whom I encountered during an ethnographic study of an urban elementary school in Texas. Instead, she is a representative amalgam of the school's teachers. My study included 71 interviews -- 34 with teachers and administrators and 37 with students -- in addition to 180 hours of participant-observation. Some of the findings discussed here were initially reported in Jennifer Booher-Jennings, "Below the Bubble: 'Educational Triage' and the Texas Accountability System," American Educational Research Journal, vol. 42, 2005, pp. 231-68.

3. Julie B. Cullen and Randall Reback, "Tinkering Towards Accolades: School Gaming Under a Performance Accountability System," Working Paper, University of California, San Diego, 2006; David N. Figlio and Lawrence S. Getzler, "Accountability, Ability, and Disability: Gaming the System," Working Paper 9307, National Bureau of Economic Research, 2002, www.nber.org/papers/w9307; and Brian A. Jacob, "Accountability, Incentives, and Behavior: The Impact of High-Stakes Testing in the Chicago Public Schools," Working Paper 8968, National Bureau of Economic Research, 2002, www.nber.org/papers/w8968.

4. Walt Haney, "The Myth of the Texas Miracle in Education," Education Policy Analysis Archives, 2000, epaa.asu.edu/epaa/v8n41; Linda M. McNeil, "Faking Equity: High-Stakes Testing and the Education of Latino Youth," in Angela Valenzuela, ed., Leaving Children Behind: How "Texas-Style" Accountability Fails Latino Youth (Albany, N.Y.: SUNY Press, 2005), pp. 57-112.

5. Linda M. McNeil and Angela Valenzuela, "The Harmful Impact of TAAS Testing in Texas: Beneath the Accountability Rhetoric," in Gary Orfield and Mindy L. Kornhaber, eds., Raising Standards or Raising Barriers? Inequality and High-Stakes Testing in Public Education (New York: Century Foundation, 2001), pp. 127-50.

6. Linda M. McNeil, Contradictions of School Reform: The Educational Costs of Standardized Testing (London: Routledge, 2000).

7. Brian A. Jacob and Steven Levitt, "Rotten Apples: An Investigation of the Prevalence and Predictors of Teacher Cheating," Quarterly Journal of Economics, vol. 118, 2003, pp. 843-77.

8. Booher-Jennings, op. cit.; Gillborn and Youdell, op. cit.; "Making AYP: Cause to Celebrate?," Philadelphia Public School Notebook, Winter 2004, www.thenotebook.org/editions/2004/winter/editorial.htm; Joel Rubin, "Are Schools Cheating Poor Learners?," Los Angeles Times, 28 November 2004, p. B-1; Daniel White, Dara Wexler, and Juliette Heinz, "How Practitioners Interpret and Link Data to Instruction: Research Findings on New York City Schools' Implementation of the Grow Network," paper presented at the annual meeting of the American Educational Research Association, San Diego, 2004; and Katie Weitz White and James Rosenbaum, "Inside the Black Box: Sociological Mechanisms Affecting Professional Deviance, Student Classification, and School Culture," in Alan R. Sadovnik et al., eds., No Child Left Behind and the Reduction of the Achievement Gap: Sociological Perspectives on Federal Education Policy (New York: Routledge, forthcoming).

9. Michael W. Apple, Educating the "Right" Way: Markets, Standards, God, and Inequality (London: Routledge, 2001), p. 71.

10. Jennifer Booher-Jennings and Andrew A. Beveridge, "Who Counts for Accountability? High-Stakes Test Exemption in a Large Urban School District," in Sadovnik et al., op. cit. All analyses of Houston data mentioned in this article derive from this paper.

JENNIFER BOOHER-JENNINGS is a doctoral candidate in the Department of Sociology at Columbia University, New York, N.Y. She would like to thank Andy Beveridge, Jason Booher-Jennings, Herb Gans, Toni Molnar, and Uri Shwed for their helpful comments and suggestions.

— Jennifer Booher-Jennings
Phi Delta Kappan
2006-06-01



