NY superintendent Bill Cala and WA superintendent Mike Riley offer a spirited discussion on big issues in curriculum and testing.
Superintendent Debate: Do We Need Big Tests?
By Jay Mathews
Washington Post Staff Writer
June 1, 2004
School superintendents are careful people. They don't shoot off their mouths often. Their press aides are often nearby to make sure what they say in public does not offend school board members, teachers, parents, editorial writers or any of the other people they depend on for survival.
Even one superintendent willing to stick his or her neck out with personally written and provocative speeches and e-mails is very rare. So I am very lucky to have found two of them -- among the most articulate educators in the country -- to participate in one of my e-mail debates.
I think many readers are like me. It is hard for us to understand complicated issues until we hear them argued out by people who know the territory. Mike Riley, superintendent of the Bellevue, Wash., public schools, and Bill Cala, superintendent of the Fairport, N.Y., public schools, are going to help us decide what to think about the tests that have become so important in the measuring of schools.
Riley seems to be pro-test and Cala anti-test, but I urge you to look for the less predictable portions of their arguments. They are both unusually good writers and advocates. They have both taught in the classroom. They know from painful personal experience what works with kids and what doesn't. We can learn a lot from them.
The debate begins with a message Riley sent to his staff recently that inspired me to turn this into a two-way conversation:
MIKE RILEY: We are definitely in the season of the Big Tests. Big Tests are national or state-wide measures that take on a great importance in the minds of students, teachers, parents, administrators, the press, politicians and the public at large. It is an importance that is largely unwarranted, in my opinion, since no Big Test, no matter how well crafted it may be, comes close to measuring the scope or depth of the things we spend a year -- let alone many years -- teaching kids.
I am a fan of AP [Advanced Placement] tests primarily because they connect in a fairly straightforward way to what is taught in AP courses -- a rather sensible idea that, unfortunately, seems to escape the creators of many other Big Tests. But even an AP test cannot measure many of the important things that occur in an AP class. I recently observed an AP World History teacher conduct a review of important events that occurred along a lengthy timeline throughout many regions of the world. Student teams were responsible for describing the events they had selected for their timeline and region, and I was struck by how much they knew, how articulate and poised they were in making their presentations, how much their teacher exercised their critical thinking skills by pressing them to explain why the events they chose were significant, and how successfully they responded to his challenges.
I also found myself thinking these kids appear to be very ready for their AP test, but what a shame only a small part of what they know will be captured and reflected in their scores.
Luck plays a part too. While these students have learned a ton, there is still a chance the test will ask them about something they didn't study or don't remember. Also a factor is the emotions kids experience when they take a Big Test. When anxiety shoots too high it has a negative impact on performance.
One day last week, I received a note from an elementary principal about how the WASL [Washington Assessment of Student Learning] had affected her students and teachers. Here is what she said. "My fourth grade teachers are feeling very down-trodden. The math performance of our kids this year is not as strong as last year. The teachers noticed kids just giving up and not even trying. I have been working with the kids who got accommodations, and I have never spent so many days in a row witnessing kids in tears. One child I worked with asked me to write, 'I don't know' for every reading response. And today (our last day), a kid asked me to write, 'I'm too frustrated. I can't figure it out,' for one of the math problems."
AP, IB, SAT, PSAT, ACT, PLAN, WASL, ITBS, ITED -- Big Tests all -- will, in the aggregate, have been given this year alone to almost every student in the district. Many of these tests, like it or not, will have an important impact on students. SAT scores, for example, will keep even hardworking, talented kids out of the colleges of their choice. WASL scores, now to be posted on transcripts, will give employers and college admissions officers impressions about the ability and skill of kids several years after the tests are taken.
Big Tests have an impact on us too. When scores for a school go down, parents ask questions, teachers feel uncomfortable, the media is likely to label the school "underperforming," and until the next set of scores comes out, the numbers hang over the school like a permanent dark cloud.
I have felt the impact of Big Tests too. Despite how wonderful our scores have been over the years, I have been challenged about our math curriculum when our computation scores are lower than our problem solving scores, about our English curriculum when our writing scores are lower than our reading scores, about our entire middle school program when middle school scores are lower than elementary school scores, and so on.
After what I have said up to this point, you may find this a non sequitur, but I must admit I like Big Tests. I do not like how they are often interpreted and used, and I certainly don't like the negative effects they can have on kids, educators, and communities. But those negative outcomes are not the fault of the tests themselves or of the information they provide. They are instead the result of the misunderstanding and abuse of their purposes and results.
Big Tests serve a very positive purpose when we keep them within their proper context. Here is how I think about them.
A) They provide valuable information but only a slice of information. They are a three-hour to five-day event in a student's life. They do not tell the whole story -- about kids, teachers, or programs -- not even close.
B) They are imperfect measurements because they are created by people, sometimes by committees. Too often they are the result of compromises, even politics.
C) They are not typically connected in a direct and specific way to a course, grade level, or curriculum. That math test that brought tears to the eyes of fourth graders, for example, is not simply and precisely a test of our fourth grade math curriculum or the instruction kids receive in fourth grade alone. Instead it measures something larger and much more ambiguous than that. Here's another example: Where, exactly, do we teach SAT?
D) Year-to-year changes in scores should not be given too much importance because too many variables can impact results in a single year. Instead, we should watch for trends over several years.
E) Speaking of variables, there are quite a few -- curriculum, instruction, what's going on in the student's life at the time of the test, what has happened to a student in both his school and personal life over time, and so on. Our job is to concentrate on the variables we can control and do our best to make these as supportive as possible of positive student performance.
I do not feel the need to downplay the importance or play up the imperfections of Big Tests because we are not doing well. In fact, our scores are great, our trends are all positive, and our good reputation hinges in part on the results of all the Big Tests our kids take. Instead, I want us all to remember, in this season of the Big Tests, what purposes they serve, and to appreciate their value but not exaggerate their importance.
Most importantly, I want us to help kids put Big Tests in their proper place so that our students do not make the mistake of thinking they measure their ability to learn, their potential to be successful, or their value as people. They are just tests. Nothing more.
BILL CALA: I must admit that I was forced to read your staff memo several times. At first, I was relieved that you appeared to understand and articulate the facts about the proper use of tests. Then, I was disturbed by your acquiescence to all of the "warts" of the BIG TESTS. Subsequent readings led me to wonder just what message you were trying to send.
What's Wrong with BIG TESTS?
You started the ball rolling on this topic with your memo to staff; however, there is much more to the story. Certainly, the note from your elementary principal concerning the "down-trodden" fourth grade teachers preparing for the Washington Assessment of Student Learning (WASL) is not a unique experience. It is pervasive across the country. In New York, we have similar tests in the fourth grade, with staff expressing the same feelings of frustration and near despair. As you accurately stated, BIG TESTS are given to almost every student in the district with very serious impact on these students. And yes, these BIG TESTS will keep bright kids out of, not only the college of their choice, but out of college -- period.
BIG TESTS have also led many to drop out of school and to be pushed into GED programs. This is not conjecture. It is the harsh reality of what these BIG TESTS are doing to our future. In the May 15th edition of The New York Times, Duncan Chaplin from the Urban Institute in Washington reported a 16 percent increase in the number of teenage GED students over the past decade. The Manhattan Institute and researcher Walter Haney reported similar statistics in their exhaustive studies this past year. Not only has the GED rate escalated, but the graduation rates also have reached abysmal levels. Last year, in a major study by Advocates for Children, it was reported that 160,000 children were pushed out of New York City Schools to hide the dropout rate.
You are also correct in stating that the tests are imperfect measurements and that they truly do not measure anything close to what kids actually learn in school. They are not connected to the curriculum with any degree of accuracy, and the year-to-year changes in scores should be looked at with a high degree of skepticism.
Then you lost me. You state that our job as school officials is to concentrate on the variables that we can control and that the fault lies not with the tests, but rather with "the misunderstanding and abuse of their purposes and results." BIG TESTS, by their nature, are never kept in a proper context. The American Educational Research Association (AERA) and the American Psychological Association (APA) categorically state that no single test should be used to deny graduation or grade promotion. Yet, the WASL and many other high-stakes tests across the country are limiting opportunities for children and are in direct conflict with the ethics of AERA, APA, and a host of other national and international organizations. The news media feast on the scores from the BIG TESTS and are all too ready to label schools of poverty as failures.
Good tests, locally-made teacher assessments, serve a very valuable purpose. Any college admission officer will tell you that the most valid measure on a student's transcript comes from teachers' grades, not the results of the BIG TESTS. As a teacher, I quizzed my kids nearly every day and prepared my own comprehensive finals. These tests were valuable because they were aligned with the curriculum and accurately reflected what students actually learned. You stated that "BIG TESTS serve a very positive purpose when we keep them within their proper context." When these tests became high-stakes, all significant value was lost. Any test that turns kids away from school is not only valueless, but also egregiously harmful.
The only variables left for educators to control are those that support the testing system itself, ignoring by necessity what is best for children. Stephen Kramer, elementary teacher in Brush Prairie, Wash., has a poster in his classroom that he displays right next to the state standards to remind him of what is more important than the state tests:
1. Learning should be rooted in joy.
2. The most important thing to learn about reading is to love it.
3. We all need help with our writing.
4. For some of us, art and music are as important as breathing.
5. No lesson on math, reading, or writing is so important it can't be interrupted for a lesson on honesty, generosity, or compassion.
It's Not Just an Issue of High-Stakes
While you state in your memo to staff that the tests are imperfect measurements and are likely to be misused, you also speak of your affection for the BIG TESTS. As you cradle the Advanced Placement exams in your arms, I am frustrated and disappointed by your own analysis that the AP test will take only a small snapshot of what the students are actually learning and that the test may ask them about something they didn't study, cover, or remember.
For just these reasons, I find the International Baccalaureate a much more viable, realistic, and productive alternative to the AP. The test is just a slice of the IB picture. Multiple assessments and the very important teacher evaluation are the essential components of a student's IB grade. My interviews with our International Baccalaureate diploma recipients have been enlightening. Our IB students have also taken many AP courses and tests. Every student who has taken both IB and AP courses clearly describes AP in a negative light when compared to IB. They have characterized AP tests as "a mile wide and an inch deep." What a striking parallel to high-stakes testing as a whole!
Across the nation, states have often recited the rhetoric that the BIG TESTS are solid, accurate measurements of what kids must know to be successful in our competitive world. Nothing could be further from the truth. In New York, for example, there are no validity studies on our high-stakes tests. There are many reliability studies, but reliability simply means that, under similar conditions, scores on the test should be similar in different test administrations. Unfortunately, a bad test given under like circumstances will yield the same awful results time and again. It is validity that determines whether or not the test measures what it sets out to measure. New York's tests do not. Nor are there validity studies for the WASL. In fact, not only does the test lack the proper validity imprimatur, but there is a 28.9 percent chance that the student taking the WASL will have his or her test scored incorrectly.
The misuse and abuse of tests in this country is epidemic. If we consider the dropouts that have been provoked since the onset of this movement in the early 1990s, we should be horrified. Even if the dropouts do not emanate from our own well-to-do suburban schools, we have a moral responsibility to the education of the greater community. Much like yours, my district does very well on the BIG TESTS. Unfortunately, there is a handful of students who are unnecessarily and improperly damaged as a result. They need advocates. Last year, three of our high school students out of 2,100 failed the BIG TEST in my district. The result is that they will not graduate from high school. They are immigrants, and none will ever speak English well enough to pass the BIG TEST. Imagine the brilliant immigrant scientists and statesmen throughout the history of this country who would not have been eligible to receive a diploma.
I do feel the need to take a position; I cannot be ambivalent. Throughout your memo, you clearly downplay the importance of the tests and point out their imperfections, but in your closing, you state that you do not feel the need to do either. But that is exactly what you have done. Several times you asked that your reader appreciate the purposes the BIG TESTS serve and understand their value. Yet, you have cited no specific value of the Big Test.
The number of kids whose lives have been negatively, permanently altered in New York State and Washington (not counting the states between us) is now calculated in the millions. It is just too easy to say that BIG TESTS should be "kept in their proper place," and that "they are just tests, nothing more," when one's own test scores are high. Imagine a school embedded in poverty, crime, social morass, and English language learners. I can only wonder what the Washington State superintendents in Everett and Edmonds are telling their staffs about the BIG TEST. What "place" do the tests have for them? Are they tests and nothing more? Or are they, instead, an unfair scapegoat for society's ills?
MIKE RILEY: I think two characteristics of our profession that have damaged it significantly are (a) the tendency to give black and white responses to complex issues and (b) our tradition of saying each teacher knows best when it comes to determining what kids should learn and to what extent they learned it. The first gives rise to the likes of the phonics/whole language wars, powers the pendulum swings that so frequently impede real progress, and prevents us from developing a substantial body of professional knowledge. The second leaves us susceptible to politicians and ideologues. When every teacher is an independent standard setter, we have chaos, and into the confusion step those with simple and typically wrong ideas about what kids should learn. Let me say it another way: politicians and ideologues set standards for educators because educators haven't done so themselves.
It seems you're saying standardized tests shouldn't be used because they have flaws and because people may misuse them. You're right: your position is not ambivalent. However, while just saying no may be attractive in its simplicity, it falls far short of being constructive or helpful. I think it is possible to garner good information from flawed instruments, but I think it's wrong to allow flawed instruments to determine a kid's fate, whether it be graduation from high school or acceptance to college. I think we should demonstrate how to use comprehensive analyses of student performance -- a truer, albeit more complicated and difficult to implement approach. My point of view may be more difficult to explain than yours, but I think in the long run we will gain ground if we help the public understand that teaching and learning are complex and that our thinking needs to be more sophisticated in order to deal with that complexity.
As to your comment that "any college admission officer will tell you that the most valid measure on a student's transcript comes from teachers' grades, not the results of the BIG TEST," I think you're flat out wrong. College admission officers are members of a growing chorus that includes college professors who wail about high school grade inflation. They are right when they assert that when nearly everyone earns between a 4.0 and a 3.5, the grades become almost entirely meaningless. One of the reasons so many colleges focus on the number of AP/IB courses and the scores kids get on the tests is that they provide the best shot we have as a nation to achieve some consistency in both what is learned and how it is measured.
In the best of circumstances, teacher grades align to their curriculum. Point well taken. But be honest. There are a lot of teachers out there who are not very skillful at creating good assessments. We both agree that people paid as testing experts have a tough time capturing the right stuff. Why would we assume teachers are going to be more successful? Secondly, and more importantly, teachers teach their own curriculum, not necessarily a curriculum that is common to what other teachers teach. What's a better measure of whether kids are ready for college level writing, the grades thousands of teachers give kids for something they call "Senior English," or the score on an IB English exam?
To summarize, I am trying to avoid simple answers that don't do justice to the complexity of our work and to establish standards that are both meaningful and consistent across the nation.
BILL CALA: I am not suggesting "black and white responses." Far from it. Nor do I subscribe to the notion that every single teacher knows what's best for kids. I am asking, however, that you stop ignoring the elephant standing in the room. The Big Tests are causing enormous damage.
There are viable alternatives. Ron Wolk's Multiple Measure Model or the International Baccalaureate are both comprehensive means of measuring real student progress, not the contrived, biased devices of an overloaded, under-performing test industry. These models are not black and white and they are not simplistic; they are holistic, fair, and realistic.
Politicians and big business have driven the agenda for public schools, not because educators haven't wanted to do so themselves, but because educators have become subservient to the authority and the omnipresence of the political system. Educators since Dewey have proposed a better, constructivist agenda -- to deaf ears. There are many reasons why educators do not and cannot set educational standards. It's akin to the Stanley Milgram experiments where the "teacher" shocks his "students" repeatedly at the behest of the authority figure. Much more can be found on why educators fail to act and fight for what is right in schools in a chapter I wrote in Defending Public Education (Praeger Press, 2004).
I am not opposed to standardized tests when used appropriately -- that is, to give ONE snapshot out of an entire album of student learning. But you ignore the realities of the Big Test. They are the entire photo album when used as high-stakes weapons, and for all intents and purposes, they are ONLY used as such. Waxing about the "good information" from "flawed tests" is merely a theoretical construct, not one based in practice or reality.
Whether you think I'm wrong about college admission officers and the value of teacher grades doesn't change the reality based in research. The preponderance of research on the subject is clear (see Richard Ryan's research on Motivation and Learning, University of Rochester). Your statement about AP/IB leads me to believe that you are unfamiliar with IB. I clearly stated that IB is very different in focus and practice from AP. AP is a mile wide and an inch deep; IB can be a mile deep and not nearly as wide. We proponents of IB choose depth over coverage of the standardized AP material. Furthermore, IB uses tests as only one part of the total evaluation package.
I wonder what message a teacher in your district is to extract from your memo on the Big Tests? Do they provide valuable information? Are they better than tests prepared school-wide and/or district-wide? Do they meet the criteria on Stephen Kramer's classroom poster? Does the good outweigh the damage?
Your last response to me certainly clarifies that memo for your teachers. You stated that teachers have "significantly damaged" the profession with their tradition of daring to think that they know their students best and measuring their students' progress. You have placed teachers in the same barrel with "testing experts" who can't capture the right stuff. As a whole, the people making the Big Tests are far from experts; they are far-removed from the classroom and further removed from understanding student learning in any current concept. Teachers, for the most part, do know their kids and have a very good concept of what their kids know.
Your thoughts on teacher-driven curriculum are troublesome. One of our roles as leaders is to help bring common strands of curriculum to our school districts. Basic commonalities are important, but never should teachers' insight, local knowledge, and keen sense of the skills of their students be scuttled in the name of the E.D. Hirsch-type rhetoric and arrogance of "what's best for all kids."
Too often, we devalue and discount the work of the majority (our good teachers) because of a few whom we as leaders fail to hold accountable. How much easier it is to make everyone follow a rote script than to do the hard work of dealing with individuals. This is deficit thinking, assuming the worst in people instead of expecting the best.
You said that you are trying to avoid simple answers; yet, Big Tests are the paragon of attempts at simple solutions to the complex problems of student failure, ignoring the root causes of social morass and decay. The Big Tests continue, apparently, with your support. As the Big Tests do their damage, no answers are provided relative to their value other than the questionable value of a "standard" curriculum. With studies across the country showing extremely poor alignment between state standards and standardized tests, I do not hold much faith in that answer, even if standardization were worth sacrificing our children.
Making good people, making good citizens, and nurturing the talents of the individual have been the goals of public education since the turn of the 20th century. Big Tests, high-stakes Big Tests under the umbrella of standardization, have all but buried these noble dreams.
MIKE RILEY: You assume I don't know much about IB, but it turns out I'm intimately familiar with the program. My daughter was an IB diploma candidate, our school district proudly offers an IB program, and I've spent a considerable amount of time studying the program and its demands. I support it vigorously. At the same time, I'm an AP fan and find your sweeping condemnation of AP Programs as "a mile wide and an inch deep" to be, once again, a one dimensional view. Some courses, like AP Literature, are wonderful and not by any stretch an inch deep. Others, AP Biology, for example, try to cover too much content and thus feel rushed and, to some, shallow. It is possible, however, to make even these content-laden courses rich and deep by careful multi-year curriculum coordination, which can free teachers from covering all that content in a single 10-month period.
Another assumption you make concerns my view of teachers and their talents. You attribute to me a belief that "teachers have significantly damaged the profession with their tradition of daring to think that they know their students best and measuring their students' progress." I never suggested anything of the kind. I believe our profession has a blind faith in local control and teacher autonomy, and the result of this "every man an island" approach is indeed very damaging to many kids but especially detrimental to our least advantaged students.
I believe teachers are smart, well educated, caring, committed people. The joy I find in this profession is connected in large part to being surrounded by a group of scholars and missionaries. Nevertheless, if what teachers teach doesn't connect from one grade to the next, one teacher to the other -- no matter how spectacular the individual courses and teachers may be -- kids are left with an education marked by both gaps and unnecessary repetitions, which is, alas, a rather typical situation in our country. IB is a common curriculum, and it works well because it is founded on international standards, all those smart teachers out there can contribute to it, and the professional collegiality it inspires leads to meaningful, substantive staff development. And it even uses standardized tests as a means of improving curriculum, teaching, and the assessments themselves.
I would think as an IB supporter you would be among the first to acknowledge that a common curriculum -- and common assessments -- can produce wonderful benefits for students and their teachers. It's simplistic, I believe, to assume one is forced to choose between "teachers' keen sense of the skills of their students" and something founded on "rhetoric and arrogance." Narrow minded thinking would posit "rote scripts" as the sole alternative to total reliance on teachers' "insight and local knowledge." Once again, a more comprehensive approach serves us better.
Now to the elephant. Yes, Big Tests are causing damage. I've stated here and elsewhere that I don't support meting out heavy consequences to kids based on the results of Big Tests. In Washington, I will continue to advocate against the use of our state assessments as a graduation requirement. But I believe it is a more intelligent strategy to accept the tests and their results while fighting hard to put them in their proper place -- as part of a comprehensive school evaluation system. Rejecting them altogether is a position hard to defend because these tests do indeed provide valuable information. Further, it is a position that will be easily savaged intellectually by standardized-test zealots, and, much worse, will disappoint even reasonable people, people we need to convince with the sophistication of our arguments.
BILL CALA: Long ago, I learned to assume very little.
I am surprised that you offer International Baccalaureate and that your daughter was a diploma candidate, yet you do not seem to really grasp the program. By design, IB not only permits exploration well beyond a "common curriculum," but that exploration is essential to student success in the program. Successful students must dig deep into research and fully understand the depth of what they are studying. The Theory of Knowledge course is a requirement that exemplifies what this is all about.
I would hardly call my objective critique of the APs a "sweeping condemnation." However, by comparison, AP exams are shallow (I taught AP courses for seven years). By your own observation, you note the wonderful work you see in AP classes yet bemoan the fact that "only a small part of what they know will be captured and reflected in their scores." Your words!
What I like about IB is that it is a multi-dimensional system of learning and evaluation and not based on just one test like AP. The standardized tests in IB are but a small piece of the whole pie -- exactly what AP and state assessments (the Big Tests) should be like. Too often, administrators become enamored with AP and IB for the wrong reasons. Yes, having large numbers of kids in these programs will earn a school a high ranking in Jay Mathews' Challenge Index, but I'd much rather see a love of learning evolve from meaningful classroom engagement. That is something in short supply in this standardized movement.
Now let me address your comments about teachers again. I quoted your comment about teachers being one of two factors that have "damaged the profession significantly." You continued by labeling teachers as incapable of evaluating their own children -- the children they intimately know, the ones they see learning every day, and the ones for whom they provide individually crafted lessons. You appear to want teachers to do as they are told, do the same thing that everyone else is doing regardless of who is in the class, all in the name of a common curriculum. If they follow that lead, then they are "smart, well educated, caring and committed people."
Labeling anything you don't seem to understand as "simplistic," and equating "comprehensive" with taking every side of an issue when convenient adds no clarity to the discussion. Clearly, I have laid out my plan -- multiple measures that include standardized tests and valuable teacher evaluations; in-depth teaching and learning rather than simple coverage of the material for high-stakes tests; valuing and nurturing the skills of the individual; making good people and making good citizens. All beyond the scope of the Big Tests that you hold so dear.
It is far from simple. It's a lot of work. Perhaps that's part of the problem. It goes beyond "accepting the tests and their results," as you have decided to do. Your acceptance of the tests leaves those students currently in school high and dry. They are on a bus headed for a cliff. Perhaps through the strident advocacy of others the bus will not reach the precipice causing catastrophic results for our students. But, what about the untold number of students who are jumping off the bus on this road to perdition while you are ruminating about putting tests in their "proper place" sometime in the distant future?