
New Rothstein Book on Accountability: Review and Discussion

Publication Date: 2008-10-29

Monty Neill reviews an important new book, and then he and the author discuss specific points in the review. We all benefit from such a discussion.

Richard Rothstein, Rebecca Jacobsen and Tamara Wilder's new book, Grading Education: Getting Accountability Right, is a very valuable addition to the nation's thinking on accountability, though I think it contains one major flaw I will discuss below. (EPI and Teachers College Press, $19.95; available at http://www.epi.org/ or http://www.store.tcpress.com/0807749397.shtml; the introduction can also be read online.) Below are first a synopsis and then some comments on the ideas in the book.

The book argues that NCLB (and similar state test-based accountability systems) is "an utter failure" that "gave accountability a bad name." The book examines historical and current goals the public holds for education (the latter via surveys, including one by Rothstein and colleagues), sifting these to construct a list of 8 main areas (see below) of youth development. Over the next two chapters it analyzes "goal distortion," primarily the undermining of the curriculum, caused by NCLB's "perverse accountability." In these chapters, I think the authors give too little attention to the damage caused to reading and math by standardized testing, which perhaps contributes to what I think is an over-reliance on testing in their recommendations (see below; this is the flaw I noted above). The next chapter discusses ways in which "accountability by the numbers" has had counter-productive consequences in a variety of other, non-educational, arenas.

The authors then describe components of what they think will be a reasonable accountability system for all government-funded youth-development areas (school, pre-school, etc.; the main focus, however, is schools). They propose first an overhauled National Assessment of Educational Progress (NAEP) that looks more like its original incarnation in the 1960s, and then an inspectorate modeled on the British system that could evolve from current US regional accreditation systems. They conclude with an outline of an accountability system in which the federal government would ensure adequate school funding and conduct an expanded NAEP, while states would engage in a variety of accountability activities, including additional testing as well as the inspection process.

The book does not propose particular actions states should take in response to accountability data, but says that states would be expected to intervene when localities that are not doing well fail to improve despite assistance. It says this expanded accountability would cost perhaps one percent of what the US now spends on K-12 education, and would be well worth it, as accountability can induce better programs and save money, but more importantly because the public deserves to know how well schools and other programs are carrying out their social mission. While it proposes accountability for "schools and other institutions of youth development," most of the discussion addresses schools. Framing the accountability question more broadly than just schools is important, but much more work will need to be done to construct an integrated accountability system, a point the authors at least implicitly make themselves, saying they intend this book to spur thinking and discussion.

The book has several appendices, notably "teacher accounts of goal distortions," in which some 14 teachers talk about the mostly harmful consequences of test-based accountability; and another, "schools as scapegoats," which rebuts those who treat schools as the cause of economic problems or as the main solution to those problems. Each of the 8 chapters is well-written, deeply researched, thoughtful and helpful. [Note: I am one of the people who read and commented on a pre-publication draft.]

Looking in more detail at the accountability proposals:

The book's 8 goals for public education are: basic academic skills, critical thinking, arts and literature, preparation for skilled work, social skills and work ethic, citizenship, physical health, and emotional health. The authors surveyed the general public, school board members, and state legislators for how they would weight these, then they provide their own proposed weighting formula. In general, these are certainly important areas for the nation's youth. In their proposal, 'basic academic skills' (including most academic subjects, not just math and reading) carries only modestly more weighting than do the other factors (about 1/5 in the surveys and RR's own proposals, versus 8-16% for the various other factors, some of which clearly overlap academics, such as critical thinking at 16% and citizenship at 13%). We can predict charges that RR would let schools off the hook if kids do not learn, so long as they do art, are healthy and happy, and have good social skills. But after the destructive reductionism of NCLB, the nation needs a debate on how important various aspects of learning are and how to ensure our children receive a balanced opportunity for human development. The concluding chapter does not offer specific recommendations on how the weighting would play out in actions taken in response to the data, though it provides some possible examples. Plausibly, publicizing the range of data would preclude the narrowly focused actions that produce goal distortion.

The book proposes that data be gathered on all 8 goal areas, at the state and national levels largely through an expanded NAEP that would remain a sampling assessment, assess more academic as well as other areas, include more extensive surveys, and produce state-level data once every three years. The chapters on NAEP and the overall accountability plan provide some thoughtful details on how this would work, including more use of performance assessments, testing by age not grade, and assessing out-of-school 17-year-olds. In any case, NAEP and substantially increased educational funding would be the major components of the federal role. The feds, they say, should get out of school- and student-level data and testing, leaving that as a state responsibility. The authors also have a good argument on why national standards will not improve education. (They note that the positive role of the federal government in support of racial equality was largely confined to the years following the Brown decision and has largely been reversed, especially in the Bush II years. I would add that Reconstruction also was positive, but the general point that the federal government, including the courts, is not necessarily progressive, is correct.)

States would then use the federal data and add on what pieces they think useful. The authors envision additional testing and other "standardized assessment instruments" that would cover all 8 areas and include performance tasks, but they don't provide much detail on these. They don't specify the frequency of such testing. It would be one thing if such assessing includes only a few subjects per grade, or only a few grades are assessed each year, or many components are assessed only once every 2-3 years. However, if the system heads toward collecting significant amounts of data annually in each area in most grades, not only would costs be very high, but the assessing could become even more burdensome for schools than the current system. While a reasonably lean system can be constructed, the quantity of assessing will be a critical issue if states move toward the sort of accountability system the authors outline.

The major flaw I find in the book is that it ignores the need, value and feasibility of using classroom- and school-based evidence in an accountability system, as the Forum on Educational Accountability proposes (for details, see report of the Expert Panel on Assessment, at www.fairtest.org or www.edaccountability.org). (To be more precise, in Grading Schools, the use of such evidence appears only in the inspectors' examination of such artifacts.) Using classroom-based evidence is technically feasible, and some other nations with strong education systems do rely primarily on local assessments or a mix of local and national assessments. (A forthcoming article in Phi Delta Kappan from Linda Darling-Hammond will provide substantial detail on this via discussions of Finland, Sweden, Queensland and Victoria (Australia), England, and Hong Kong; one might also consider New Zealand, which as Rothstein points out has a NAEP-like exam and otherwise relies on local information; see www.fairtest.org for more on NZ and Queensland.)

The major reason to use classroom-based evidence for making evaluations of academic attainments is that it is the only feasible way to use a significant number of extended performance tasks and projects, which are necessary for assessing many significant areas of the curriculum (as well as a valuable instructional mode). RR and colleagues think that standardized tests with some performance items in NAEP, plus the same for the state's assessments, plus inspectors who look at student work, will do the job. I don't think the job will be adequately done that way. I expect the relative emphasis on standardized tests (even better ones) will lead to narrowing the modes of instruction, the range and kinds of knowledge students have an opportunity to learn, and the ways in which students can demonstrate their learning. That is, it could still be too much of a one-size-fits-all approach.

Moreover, the authors do not consider costs for state assessments, but these costs are likely to be very large if exams include a large share of performance assessment components that every student would take and that would be scored centrally. Such costs can be kept manageable when performance assessments and portfolios are part of teachers' regular work and ongoing professional learning.

NAEP and/or state sampling exams can be key parts of an accountability system, though some states have many schools so small that sampling is infeasible at that level. These exams should have performance components. However, if each school has a portfolio, learning record or work-sampling system, then teams of outside teachers and others can re-score samples and provide valuable feedback in a process that would steadily improve the quality of the system. As Linda D-H explains, other nations do this. In the end, a portfolio-based system in which samples are re-scored would be cheaper than having a large set of performance items in an on-demand state exam and would provide numerous other benefits, including variety in methods for students to demonstrate their learning, and useful professional learning that would strengthen instructional skills.

In sum, I would re-work the authors' system so that it would include (as they do) an expanded NAEP that looks much more like early NAEP than the current NAEP, employ a modest amount of state testing, rely primarily on classroom- and school-based evidence for evaluating schools and making decisions about students, and strengthen the accreditation process to provide periodic school inspections. The Massachusetts Coalition for Authentic Reform in Education (CARE; see http://www.fairtest.org), in its similar recommendations for a combination of school-based evidence, limited exams, and inspections, suggests inspections every 5 years, compared to the 3 years the book proposes. Both agree on more frequent inspections for schools having difficulties.

Though not elaborated in this book, Rothstein has elsewhere argued forcefully for paying more attention to non-school factors. That is, data about health, housing, employment, etc., should be gathered and factored into any accountability and improvement system. I concur. Gathering data at schools on student health, for example, can be part of this. More generally, opportunity-to-learn factors, from in and out of school, should be gathered, publicized and used.

What Richard Rothstein and his colleagues seek is a rich, comprehensive accountability system that can be used to actually improve education and youth policy and practice. It must be complex enough to prevent the harmful effects caused not only in NCLB and current state testing programs, but also in other fields by narrowly-conceived accountability programs. It also must be simple and efficient enough so that the benefits outweigh the burdens. I think this book largely provides a possible and feasible approach, with the important exception of how he views gathering data beyond NAEP on student learning outcomes, and with some concern about the level of burden that could result.

One further issue: programs of extensive data gathering can be misused. There are legitimate concerns about how data about students should be gathered and used. In this calculation, sampling is very important: few students would have much of the range of data accumulated about them (each would, as now, have her/his individual school record, available for parental inspection). However, the development of a far more extensive data system based on sampling could be a prelude to a system of detailed individualized data. Recall that NAEP started with sampling at the national level, moved to the state level, and now also reports on large cities, albeit all by sampling; yet there are serious proposals for a national exam of all students built on NAEP. This area will require serious attention, not only for issues of privacy but also for potential systemic abuses.

Lastly, I think Rothstein and colleagues could have paid more attention to other proposals for overhauling NCLB that are far more than just tinkering (as they say other proposals generally are). No doubt that is partly because I chair FEA, which has produced what I think are some positive conceptions of the federal role that are fundamentally different from federal law. I also think that by approaching NCLB reauthorization with a rich, wide set of overlapping and complementary proposals, in which proponents acknowledge one another's positive components, we strengthen the likelihood of a more comprehensive and successful overhaul of federal law.

Richard Rothstein responded to Monty Neill's review. Here it is. Richard is in blue; Monty's original text is in black, and Monty's comments on Richard's comments are in red. Thanks to both for giving permission to share this.

Dear Monty,

Thanks very much for doing such a careful and thoughtful review of our book on your website, and on the various listservs to which you distributed it. I think your criticisms are mostly fair, although I think that in some respects you are criticizing what you infer, rather than what we say. In that respect, it is, of course, our fault for not making some things more explicit.
[I appreciate your quick and clear reply; I may have at points inferred more than I should have and appreciate your efforts at making things more explicit.]

In particular, you are concerned that we are opening the door to too much standardized testing, and specifically that we are adding more required subjects for annual testing. This was not our intention. However, we do say that how standardized testing should be used is a state, not federal matter, and states should be free to design their own accountability systems. I think the book's implication is that standardized testing should not be increased, and that testing more subjects should entail testing each less frequently, and we do recommend that inspections be conducted once every three years. It was our intention that standardized testing in most subjects should also be once every three years, and this would be consistent with other recommendations we made, but I agree that this perhaps should have been made explicit, even at the risk of undermining our recommendation that this be left to state determination. [While you mainly construct the book as a proposal to change the federal law, it would not be amiss to make recommendations as to what states should do with what would be once again their freedom. That testing be every three years, as you recommend for inspections, would be consistent - and reasonable. I'd probably opt for one-two subjects per year, each subject once every 3 years, to ease the burden on any given class.]

I think that your claim that we don't sufficiently rely on examination of student work is also based more on inference than on what we say. Your review says "To be more precise, in Grading Schools, the use of such evidence appears only in the inspectors' examination of such artifacts." I don't know why you insert the word "only" here. I don't think there is any implication in what we wrote that this inspection should be cursory or superficial. Although we do not go into detail, I think that our recommendation clearly entails inspectors examining student work to whatever extent is necessary to satisfy themselves that instruction is of high quality and that learning is taking place. You recommend that each school maintain for each student "a portfolio, learning record or work-sampling system," [and] "then teams of outside teachers and others can re-score samples and provide valuable feedback in a process that would steadily improve the quality of the system." I think that our recommendation for school accreditation (inspection) does precisely this. We do not have an explicit recommendation that schools be required by federal law to maintain such a work-sampling system, although it is clear that if inspectors had no portfolios or similar student work available for examination, a school should not fare well in the accreditation process. Again, our recommendation regarding the maintenance of such a system was withheld because, although implicitly necessary for accreditation purposes, we wanted to be careful not to prescribe the particular details of state systems, which could vary. [I did not mean to imply that the inspections would be cursory - of course, they might be, and inspectors under-prepared, etc. Indeed, if tests are every three years and if inspection results together with tests comprise much/most of the state accountability data (at least as pertains to academic learning), then a careful study of student work is feasible and important. 
FairTest, and the FEA panel on assessment, have made recommendations regarding both federal and state assessment (as have others). As you describe it, it is closer to what the FEA and FairTest have in mind. Still, I think it would be useful to have annual processes of teachers (with some others) reviewing and providing feedback on samples of student work. Thinking further, it might be that for accountability purposes, the inspections ought to be what counts, with the annual reviewing being conceived primarily as a tool for improvement - of teaching, of assessing, of developing tasks, refining portfolio/performance systems, etc. That starts to provide alternatives, esp. in the face of what undoubtedly will be lots of pressure to continue with a lot of testing.]

You note in your review that, "Though not elaborated in this book, Rothstein has elsewhere argued forcefully for paying more attention to non-school factors. That is, data about health, housing, employment, etc., should be gathered and factored into any accountability and improvement system. I concur. Gathering data at schools on student health, for example, can be part of this." You are correct about this, but I think the book elaborates on this more than you acknowledge. First, the book calls for much more detailed background data to be collected by which accountability judgments should be adjusted. For example (and these are the chief examples given in the book), permanent student records should include not only lunch eligibility and race, but also mother's educational attainment and country of birth. As you know, mother's educational attainment is the single most powerful socioeconomic predictor of student achievement.

Our discussion of early NAEP provides considerable illustration of both survey and attitudinal questions that should be asked about student health, for example, as well as standardized tests of physical education (we mention the Fitnessgram specifically).

[You are right, it does elaborate more than I suggested. I think we agree that a range of kinds of info is needed to use in assessing how well schools (or other programs) are doing, and that info should be considered when any improvement steps are undertaken. You provide some cogent examples of what can be collected and how. I was thinking more on such issues as the availability of health care and housing, as well as opportunity to learn provided by the schools. On the former, especially in , you have done excellent work.]

None of the above should be taken as a criticism of your review. It is a thoughtful, careful, and insightful review, and your thoughts about how our recommendations should be improved illustrate precisely the kind of conversation that we hoped the book would stimulate. If any other reviewer takes even a small fraction of the care that you took, we will be very grateful.

Thanks very much.


Thanks again for your reply, and I hope your book makes a major policy impact.

