

Constraining Elementary Teachers' Work: Dilemmas and Paradoxes Created by State Mandated Testing

Susan Notes: Although this study took place in New York, teachers across America will see themselves in the words of these 4th grade teachers, who describe how state tests undermine their ability to do their jobs with integrity. That phrase, "with integrity," is important. Integrity is a word teachers must not abandon.


Abstract
There are frequent reports of the challenges to teacher professionalism associated with high stakes and mandated testing (McNeil, 2000). So, we were not surprised in this year-long study of two elementary schools in upstate New York to hear teachers talk about the many ways the 4th grade tests in English Language Arts, Mathematics and Science undermine their ability to do their jobs with integrity. We came to understand in more nuanced ways the ongoing tension created by teachers' desires to be professionals, to act with integrity, and at the same time to give every child a chance to succeed. What we found in these schools is that the high stakes tests continually forced teachers to act in ways they did not think were professional and often resulted in creating instructional environments that teachers did not think were conducive to student success.

The teachers at these elementary schools are not radicals. They do not seek complete autonomy; they do not eschew the need for accountability (even bureaucratic accountability); they find some virtue in state mandated tests; and they are content within centralized systems that prescribe some aspects of their work. But they also perceive themselves as professionals with both the responsibility and the capability to do their jobs well and in the best interests of their students. New York State's outcomes-based bureaucratic accountability system tests their resolve, makes them angry or frustrated, and requires unnecessary compromises in their work.


Most of our time in fourth grade is spent test-prepping
There is very little of the extra projects
The extra fun kinds of activities
That we used to be able to do
That goes by the wayside
Because we need to test prep

Being in fourth grade is almost an advantage
If I need materials I say
Oh it's test related
Then I can get them
If I have a child that I need to have looked at
Oh it's fourth grade

There's more of an emphasis on something
Whether that's good or bad
I'm most uncomfortable at the mid-year
When it's time for us to decide
Is this child going to meet the criteria
to move on to the next grade?

You take a child to a retention committee
This child might not necessarily be ready for the next grade
But professionally I know retention is not the answer
That is no longer weighted very heavily
When you as a professional say
I know the solution for this child is not retention

What this test is testing is good
Kids should be able to read a passage
And respond to it in writing
There's nothing wrong with that
What's wrong is the way the adults in the world
Take the scores and report them

It's a benchmark
If a child can't do it in fourth grade
And they can get it in fifth grade
Why should we penalize them?
And if it takes them an extra year to master something
That's okay

We are not financial planners, where we are judged
In how many millions of dollars we brought in
We are not Wal-Mart
In how many sales we made
We are a service industry
So stop comparing success
With scores, growth, end products

What if you have a kid who got a two on the ELA
But was a knucklehead
An emotional disaster
Disruptive
But during the course of the year
In behavior
In courtesy and respect
Improved tremendously
Are you not a success then?
Did that kid not improve?

Are they measuring that?

[poetic transcription of a group interview of Willow Valley teachers]



Introduction
The current accountability strategies of school reform rely heavily on measuring outcomes, especially student achievement, and attaching consequences, either positive or negative, to various levels of performance. These accountability strategies affect everyone and every aspect of schools and schooling at local, regional, national and international levels. This article examines the ways state mandated testing, the primary vehicle of accountability, affects teachers' work and, in particular, how it seriously challenges their professionalism.

There are frequent reports of the challenges to teacher professionalism (Note 1) associated with high stakes and mandated testing (McNeil, 2000). So, we were not surprised in this year-long study of two elementary schools in upstate New York to hear teachers talk about the many ways the 4th grade tests in English Language Arts, Mathematics and Science undermine their ability to do their jobs with integrity. What we came to understand in more nuanced ways is the ongoing tension created by teachers' desire to be professionals, to act with integrity, and at the same time to give every child a chance to succeed. What we found in these schools is that state mandated tests continually forced teachers to act in ways they did not think were professional, and that, in fact, this was often necessary in order to give every child an opportunity to succeed.

Context and Methodology
This year-long ethnographic field study of two schools was conducted during the 2001-02 school year in two school districts in upstate New York, and is part of a larger study of the relationships among teaching, learning and state mandated testing in four upstate New York school districts. These school districts are different from many, at least at the moment, since each is participating in a National Science Foundation funded teacher enhancement project. This project is aimed specifically at providing professional development in science to elementary and middle school teachers, with a pointed emphasis on helping teachers better prepare their students for the New York State 4th and 8th grade science tests. The 2001-02 school year was the third year of this professional development project.

Our research postulates that teachers in these districts might be better able to cope with the demands of state mandated testing, certainly in science but perhaps in other subjects as well, as a result of their potentially greater access to professional development. This paper does not address this issue directly, but at this stage of our research project we are doubtful that this relationship holds. This is true in part because the science tests are significantly less important to teachers, school administrators, and the New York State Education Department than are the English Language Arts and Mathematics tests at the elementary and middle school levels. Having considered this possibility, we focus our research holistically on the interactions between teaching and learning across all subject matter. Indeed, as we will discuss here, the relative importance of the tests and the timing of their administration are key factors in decisions about curricular emphasis across the school year. Our long-term goal is to understand the complex interactions at the classroom, building and system levels among the many demands the state accountability system places on the educational enterprise.

In New York State, "outcome-based bureaucratic accountability" prevails (O'Day, 2002). This is a form of accountability that holds teachers and schools accountable to state education authorities for producing "specific levels or improvements in student learning outcomes" (p. 8). These student learning outcomes are manifest in performance on state mandated tests, from 4th grade through the Regents Examinations now required of all students in New York's high schools. Such an outcome-based bureaucratic accountability strategy focuses teachers (and students) on specific, limited forms of knowledge and skills and, in so doing, channels pedagogical and curricular decision-making.

The fieldwork for this study involved at least one day per week in each school--observing classrooms, talking with teachers and administrators, and attending school meetings and events. (Note 2) A great deal of our fieldwork focused on 4th grade classrooms (since this is where the testing burden primarily lies), but we observed classrooms and talked with teachers at every grade level. Additionally, a focus group interview with teachers and a focus group interview with parents were conducted, as were individual interviews with building and district administrators. Throughout the data analysis, we engaged a number of teachers and the principal at each school as peer debriefers, who continually checked our understandings and read our case studies.

Table 1 summarizes descriptive information about the schools and districts, and Table 2 indicates the schools' pass rates on the ELA, mathematics, and science tests for the past three years. Table 3 illustrates the range of state mandated tests given in New York State elementary schools. Included in this table are the dates the tests are administered and the format of each. Both are critical elements in teachers' decisions about what to teach, how, and when. Additionally, though not part of this study, New York has adopted, under the leadership of Commissioner Richard Mills, the "Regents for All" Plan, which will require all students to pass a minimum number of courses and Regents Examinations in five subjects to receive a State-recognized high school diploma.

Table 1 Description of Schools and Districts

Hemlock Elementary*
  District: 17 buildings; urban; 69% of students overall on free/reduced lunch; dropout rate 7%; 9,000 students
  Number of students: 395
  Number of teachers: 30
  Free/reduced lunch: 90%
  Race/ethnicity: 52% white, 35% Black, 12% Hispanic, 1% other
  English language learners: 0%
  Grade levels: PreK-5

Willow Valley Elementary*
  District: 2 buildings (elementary; middle/high school); working class, predominately white; 1,500 students
  Number of students: 818
  Number of teachers: 52
  Free/reduced lunch: 46%
  Race/ethnicity: 93% white, 5% Black, 1% Hispanic, 1% other
  English language learners: 2%
  Grade levels: K-6

Source: 2002 New York State School Report Cards

* pseudonyms are used for schools

Table 2 Test Scores (% of students "passing" 4th Grade State Tests)

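Hemlock*
  1998-99: ELA 15%, Math 48%, Science (not reported)
  1999-00: ELA 40%, Math 53%, Science 62%
  2000-01: ELA 50%, Math 63%, Science 63%

Willow Valley**
  1998-99: ELA 44%, Math 71%, Science (not reported)
  1999-00: ELA 48%, Math 72%, Science 56%
  2000-01: ELA 55%, Math 77%, Science 77%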

*This school did not meet the state standard in ELA, but made adequate yearly progress (AYP) in 2000-01.

**This school met the state standard and made adequate yearly progress (AYP) in 2000-01.

Table 3 New York State Mandated Elementary Tests (2001-02)

4th grade
  English Language Arts (spring: early February)
    Reading and 28 multiple-choice questions
    Listening and written responses
    Reading and written responses
    Independent writing prompt
  Mathematics (spring: early May)
    30 multiple-choice questions
    Short and extended responses
  Science (spring: May)
    45 multiple-choice questions
    Performance test: 5 stations, 4 questions per station

5th grade
  Social Studies (fall: November)
    45 multiple-choice questions
    3-4 constructed responses
    1 document-based question

Contexts for Teachers' Dilemmas
Teachers may never have had much autonomy, and the professional status of teaching cannot be taken for granted. Teaching has historically been poorly paid, perceived as relatively low in status, and carried out within authoritarian and often petty school cultures (Katz, 1971): "Education has not suffered from any freedom granted teachers to run schools as they see fit; it has suffered from the suffocating atmosphere in which teachers have had to work" (p.131). Still, much educational research demonstrates the centrality of teachers in educational reform (Elmore, 1996); they are "curricular-instructional gatekeepers" (Thornton, 1991). Schools have also been the locus of almost every social change effort, placing ever more demands on teachers (e.g., drug education, sex education, values education, environmentalism, bus duty, data management) with no reprieve from prior demands. The current standards-based reform movement, with its clear specification of content, pedagogy, and assessments, adds to these demands, increases authoritarianism, and further erodes teachers' sense of professionalism (Madaus, 1998; Mathison, 1991; Noble & Smith, 1994; Ross, 2000; Vinson, Gibson & Ross, 2001). In a study of Kentucky teachers after the implementation of the Kentucky Education Reform Act, Kannapel, Coe, Aagaard, Moore & Reeves (2000) conclude that "the educators we spoke with resented the accountability measures as an insult to their professionalism."

There is ample research describing how state mandated tests, particularly high stakes tests, challenge and compromise the professionalism of teachers. McNeil's (2000) research in Texas illustrates a range of constraints on teachers' work, constraints that lead them to "exclude their richest knowledge from their lessons" (p.192). These constraints spring from the increased standardization and specification of important knowledge as that which is on the test. As a result, teachers adopt generic forms of content and presentation; develop a "test based curriculum"; separate content "for the test" from "real content"; further fragment knowledge; and even retire. Testing leaves little time for "real instruction" (Hoffman, Assaf & Paris, 2001). In some cases, when a mandated test demands something that has not previously been a routine part of the curriculum, such as writing or problem solving, teachers do refocus their instruction, although in ways pointedly driven by the test (Hillocks, 2002; Kannapel et al., 2000).

Teachers do not feel good about the constraints that testing places on their work. McNeil (2000) describes teachers moving away from particularized, child-centered teaching to generic, teacher-centered teaching, because the latter reflected state mandated curriculum and assessments. Dramatically, she concludes: "The reforms required that they choose between their personal survival in the system or their students' education" (p.192).

The schools in this study reflect the findings of other researchers. Teachers at Hemlock and Willow Valley Elementary Schools perceived their professionalism to be diminished. Through outcomes-based bureaucratic accountability, teachers' work has come to be defined by the state mandated tests, especially in English Language Arts, as well as by district directives geared to improve state test scores. But for these teachers it is not an either/or choice between personal survival and the students' education. These teachers confront the dilemma of being a good teacher, a professional, while also helping kids to succeed, where success is marked by performance on state tests. What we saw repeatedly was that this dilemma is almost always resolved in favor of the students: teachers sacrifice their professional integrity in order to help every child be as successful as s/he can be on the tests, even when they lack faith in the indicator. This resolution plays itself out in the classroom as well as around the administration and scoring of the state tests. The following sections elaborate how teachers experience, and come to uneasy resolutions of, the dilemmas they face.

Faith in Children
The popular media and politicians often portray teachers as contributing to the low achievement of children, especially children of color, by having low expectations and lacking faith that all children can learn. The political slogan "No Child Left Behind," which gives its name to the current Elementary and Secondary Education Act, is a manifestation of this belief. However, the teachers in these schools, both in word and deed, challenge this representation, although, like teachers everywhere, they talked of the overwhelming social forces on children's lives outside the school building. And they did not always feel able to compensate for a lack of experiences (such as rich early literacy experiences) or for difficult life circumstances (poverty, violence, homelessness).

This was especially true at Hemlock Elementary, a school where most children are on free or reduced lunch and many are African American. "These are not children that don't learn. These are children that do learn--slowly." "We are being judged on something that is largely out of our control," Hemlock teachers explain as they relate stories of student absenteeism, high mobility, and academic need. "And what does it do to the individual kid? If we have a child who's a slow learner, that is a huge concern that is being left out of this testing thing by the media and politicians and the Regents. They don't want to know that there is such a thing as a slow learner. And to tell a child, who gets to this higher level in a school year that they are a failure because they didn't reach this goal is horribly wrong, horribly wrong for that child."

At Willow Valley, teachers have less confidence in the children and a more limited sense of their own efficacy. Willow Valley Elementary is a huge school, a consolidation of three elementary buildings into one, occupying an office building complex the school district acquired from a downsized business. Students here are white, working class and poor, and live in a neighborhood enclave cordoned off by industry and freeways. Teachers here frequently characterize the school's students as a high-needs population: "It's hard and with the special ed kids… we need consistency and structure. As soon as the tiniest, tiniest thing changes, they're very needy in that sense. That so terrifies me about them going into fourth grade because their independence to be able to, even on a very simple task, read the directions and complete it… As long as everything is being modeled step-by-step or very guided or very structured, they're fine. But as soon as you look for that independence, they struggle." Parents are aware of the characterizations of their children: "Labels [that they give our kids]--you go to the school board meeting and you hear this, go to a PTA meeting, go to a committee meeting, and it's the socio-economic background, it's the transient populations. So, because of this we can't expect a good education for our children?" The principal is aware of the strong tendency to view high-needs students as somehow less able than others and feels it is his role to continuously stress that teachers need to learn to work with what the students bring with them, not what they aren't bringing.

Willow Valley teachers don't give up on children, but they often say they have reached the limits of their capabilities. "We are doing what good 4th grade teachers are supposed to do, we're teaching the students the curriculum. You can't ask us to make up for the fact that this child is deficient in this skill and has been since kindergarten. There are just, I don't know how to describe it, there are just certain things that are beyond the 4th grade classroom teacher's control and yet we are being asked what are we going to do about this child? I can't do anything more. I've done everything I can do. You have to pass it off to somebody else now." But still, teachers worry about what will happen to children ("it still eats us"), and repeatedly we saw teachers making school instructive and enjoyable for their students.

In the Classroom
Teaching to the Test
The many meanings of "teaching to the test" and questions about the validity of the test itself conspire to create anxiety about the right thing to do. The basic tenet seems to be: if a test measures what is important then teaching to the test is okay, but if the test is misdirected or poorly constructed or only a partial picture of what is important then teaching to the test is not okay (Heubert & Hauser, 1999; Smith, 1991). The difficulty for teachers is that they often hold both views simultaneously. The 4th grade ELA test encourages them to teach more writing than they have before, and the 4th grade Math test encourages them to teach more problem solving, so teaching to the test (in the sense of taking curricular cues from the test content) is good. But the reading and writing on the 4th grade ELA test is formulaic, focuses on syntax, and discourages creativity, exploration of language, and discussion, so teaching to the test is bad. Add to this a context that defines these tests, especially the ELA, as high stakes, with serious consequences for schools (threats of state intervention), for teachers (shame and rewards), and for students (possible retention in grade, labeling), and teachers are left with little choice. They teach to the test. At Hemlock this is a highly structured, orchestrated effort, while at Willow Valley it is a more haphazard, individual response.

Content
"I'm finding that I used to read stories for enjoyment. And now when I'm reading a story I'm trying to think, 'Alright, now how am I going to use this?' And I'm trying to get the contrast and compare. And trying to do author studies. And I almost find that I'm not enjoying it. I'm enjoying it, but it's not like it used to be when we could read a story put it aside and maybe do a tracing and cutting activity to go with that story. I'm not doing so much cutting anymore. I'm doing a lot more, I'm trying to do critical thinking and we're writing in journals. It's not a fun thing anymore. I'm trying to always get two jobs done as one. How can I use this twice? How can I really push this?" [Willow Valley teacher]

Teachers value what they perceive to be positive changes the tests have instigated in their teaching, resent what they have had to give up to make these changes, and sometimes defiantly teach what they think is important even though it may not help the students do well on the test.

At both Hemlock and Willow Valley, teachers believe the state mandated tests have changed what they teach, and often for the better. Teachers believe the ELA "is a good test. It tests listening, reading and writing at an appropriately high level." The Hemlock reading teacher describes the test as focusing on "higher order thinking skills, therefore in our program the emphasis has been changed from the lower level thinking skills such as recall and detail to the higher level skills. That's a benefit. Another benefit is that we focus on writing much earlier… due to the nature of the test we've gone from filling in the missing word, which is a former emphasis, to understanding main idea, inference, conclusions, predicting and those are all higher level skills. So the result for the students is that they are getting really a much higher level instruction now than they used to." And a 3rd grade Willow Valley teacher now does "a lot more note taking, lots of graphic organizers, and I don't think if it wasn't for the test that I would use them in such detail." In math she teaches the concepts and skills the 4th grade teachers say the kids need, but, she goes on, "I feel like I'm very much rushing to say 'we've covered it and they've at least seen it' but not giving them the practice they need." Teachers identify positive changes in their curriculum because of the ELA and math tests, but seldom mention the social studies or science tests.

Recognizing that the ELA test required more of their students than had been expected in the past, the Hemlock teachers used Title I money to develop a curricular strategy to prepare their students to do as well as possible on the ELA test. "We spent a lot of time analyzing tests. At the same time we were making a huge effort to integrate. We were choosing materials and making selections do double duty with science or social studies, working around the themes so there was a whole integrated package." Teachers used children's literature trade magazines (Ladybug in 3rd grade and Spider in 4th grade) as the texts and developed multiple-choice and short-answer questions (like those on the ELA) for each story. As the teachers were developing this trade-magazine-based curriculum, the district curriculum committee adopted a basal reading series (Scott-Foresman's Reading) that Hemlock teachers are required to use. Language arts instruction now consists of the regular classroom teachers teaching the basal reading series while the reading teachers travel from classroom to classroom, armed with magazines and packets of ELA test-like questions, providing fast-paced, no-nonsense instruction with material that resembles that on the test.

Teachers at Hemlock think the adoption of the basal readers is an insult and a distraction. Along with the textbook adoption (in both language arts and math) come messages from the district office that all teachers should be on the same page at the same time. "A lot of teacher hours went into the curriculum that they produced and then when we got our new reading series it was imposed on us… it is a mandate that you're on a certain page in a certain week across the district [and this] is unrealistic depending on the kids' abilities. So the teachers here just feel like all we're doing is frustrating our children. We are not teaching them the way that we as professionals should be allowed to help all of our children learn." The teachers have more confidence in their own trade-magazine-based curriculum to prepare the students.

Even though teachers feel the tests, especially the ELA, have challenged them to teach more and better, they stick very closely to the forms of knowledge on the test. And so there is a question about whether students are engaged in higher order thinking or merely the appearance of it. The ELA and math tests are scored as 1 (serious academic deficiencies), 2 (needs extra help), 3 (meets the standards), and 4 (exceeds the standards), and these levels have become an organizing structure for teaching. In fact, some form of this scoring rubric is posted in every classroom in both of these schools. This excerpt from a 4th grade classroom observation illustrates how the higher expectations the test pushes teachers toward are simultaneously dulled by the test.

This class is reading The Velveteen Rabbit. The teacher passes out a worksheet and tells the students she is going to give a response that is a 4, or a 3, or a 2. She directs them to put a 4 on the back of their worksheet and an arrow next to it.

T: If I'm going to write an answer that is going to score a 4, what does it need?
S: Answer complete.
S: Neat.
T: I agree, but I wouldn't worry about neatness first.
S: Topic sentence.
T: Yes, you need to have some sort of topic sentence. You need to remember to restate the question. What else?
S: Details.
T: YES, details, details, details. Where do you get the details?
S: In the book.
T: Ok, it's complete and it has a topic sentence. What else will people scoring be looking for?
She reminds them about the 'Daily Language Activity' hints she gave them in the morning--punctuation, spelling, capital letters, and correct grammar.
T: Leave a space and put a 3. What is going to be the difference between a 4 and a 3?
S: One of those things is not included.
T: Everything needs to be there. It will be mostly complete. Will it be perfect?
S: No.
This lesson continues until they have gone through 4, 3, 2, 1 and then the teacher shares some examples of responses to the question, "Why does the Velveteen Rabbit feel plain and ordinary?"
T writes: "He feels plain." The students give it a 1 because it is too short. The teacher comments that we don't know who 'he' is and comments on the need for more details.
T writes: "The Velveteen Rabbit feels plain and ordinary." The students give it a 3. The teacher disagrees and gives it a 2. She says it is missing details from the story--have you proven it from the story?
T writes: "The Velveteen Rabbit feels plain and ordinary because all of the toys make fun of him. For example, the expensive toys snub him and make him feel commonplace." The teacher tells them this response is a 4. One girl copies the answer but pauses to say she disagrees, that not all the toys make fun of him because one doesn't. The teacher agrees and changes the word all to most.


In spite of the pressures of the tests, teachers do exercise their professional judgment, almost with an air of defiance, and do what they think is right by the children, even though it isn't consistent with the district's curricular mandates and may not be directly tied to the test. These acts of defiance are frequently tied to helping children feel successful, encouraging them, and giving them an opportunity to have fun. One district uses Everyday Math, and one teacher there describes "absolutely breaking the rules of Everyday Math." "All of the [students] failed the multiplication test. They didn't know how to do the partial products algorithm. They felt stupid, they felt incompetent, and they failed it miserably because their brains couldn't process all those steps at one time. So I've gone back now, I've spent two class days teaching them, doing a task analysis first, which comes pretty naturally after you've taught the multiplication algorithms. I carefully added each step, if you skip one of those steps kids like this will not be able to make that mental jump, they can't do it, you have to go in a methodical way, they have to master each step, and then they feel good about themselves. They were begging me for harder problems. They get turned on by that. They love it. Now they're going to go home, they're going to do this homework they made up and they are all going to know how to do partial products algorithms, which I guarantee will be on the test." But she adds weakly, "Not partial products, but multiplication problems."

These teachers struggle with the fear of falling behind in a system that frowns on those who do. Instead of comfortably working on what they perceive their students need to better understand the material, they push ahead until it is obvious that pushing ahead is causing their students to fall further behind. The curricular calendar and the testing schedule do not stop for make-up time and so the pressure is to catch up by covering material superficially.

Textbook Adoption
District textbook adoption occurred in both of these districts as a result of the state standards and tests. Textbooks are chosen to match the tests, not a difficult thing to do given that the textbook and test publishers are often one and the same. While these new textbook adoptions filled a void where there previously had been few resources, they also created chaos and conflict. In the case of Hemlock, the adoption of a basal reader diverted teachers from a curriculum they had created. At Willow Valley, some teachers did find the time to do "double entry teaching": "What I end up doing is double teaching because I'm teaching the series and I'm also teaching using the strategies and the plans that I had when I taught novels. I'm basically double dipping for them, but you have to in order for them to get all of the skills. And I can't teach skills in isolation. What good is teaching them the 'short a' sound in ten words if they are not going to use it within a story and be able to read it? You look for stories like Little Bear that would have that 'short a' sound within it, so now they can apply the skill they learned."

The other consequence of districtwide textbook adoptions is a perceived added difficulty in integrating the curriculum. Because time is a scarce commodity and teachers understand the priority of language arts, they would like a curriculum that teaches language arts skills through math, science, and social studies content. The Hemlock teachers had selected trade magazine stories with science and social studies content for precisely this reason. Their new textbooks are a giant step backward in terms of integration. "As happy as I am to have a standardized curriculum across the district, this new reading program has no fourth grade social studies content and no fourth grade science content. None."

In many ways, these teachers are faced with a richness of resources but lack the time, guidance, and support to create an integrated curricular whole out of the textbooks, trade materials, math series, science kits, newspapers, and test preparation materials. One teacher summed up this frustration: "You have to wonder, do you do the math in the reading series or the reading in the math series?"

Pedagogy
The Hemlock Elementary plan to better prepare its students also addressed how language arts would be taught, and it incorporated more ELA-focused instruction by reading teachers in all 3rd and 4th grade classrooms. The ELA curriculum included blocking off specific times each week at each grade, breaking students into four homogeneous groups, and having four teachers each work with one group in a different spot in the building. Groups were based on Terra Nova test scores, teacher judgments of reading ability, and students' potential performance on the ELA tests: solid 3s, 3s but potential 4s, 2s but potential 3s, and 1s and 2s. Teachers are confident that small homogeneous groups working closely with a teacher are the best way to meet the students' individual needs and capitalize on their strengths. "[Teachers] who had the higher groups could do a lot more of the advanced higher order thinking skills, whereas my kids would be doing a lot more of the decoding, word recognition and basic lower level comprehension skills."

This plan was thwarted by the superintendent, who decreed that children could no longer be pulled out or grouped in preparation for the ELA test, and this decree left teachers feeling betrayed and undermined. The district is attempting to promote inclusion and to disrupt a tracking system that takes root in the early years of schooling. The district response was totally unexpected and seems illogical to the teachers, since they are still permitted to group and use pull-out strategies in math. The school's ELA scores had gone up dramatically under the teachers' plan, and they have profound confidence in the power of grouping and pull-out strategies. Because they had expected recognition, the blow is huge. "Now I wouldn't dare pull a student out to help them improve. We were told in no uncertain terms that we had to follow policy. The removal of the principal [because she permitted teachers to use this strategy] was a message to staff. First, we got the news of how well we had done. We were shocked and ecstatic, and then totally demoralized. We were stunned." Whether grouping and pull-out programs are a good or bad idea, the dynamics here suggest an undermining of teacher professionalism, even though all parties are driven by an effort to help the kids do well on the indicator that matters most, the ELA test.

The Hemlock strategy of assigning the reading teacher to teach the "ELA curriculum" and the classroom teacher to teach the basal reader created additional challenges to teachers' sense of being good teachers. New teachers are especially frustrated: "We don't decide what is taught during that time. It's all reading teacher." This test score improvement strategy compromises teachers' professionalism in two ways. First, classroom teachers are left standing around watching while reading teachers use direct instruction techniques (which some do not agree with), thus wasting valuable resources that could be used to help children. Second, this strategy leaves teachers in a bind if the reading teacher is absent or late. Sometimes they find themselves singing songs or having students read quietly, not wanting to start something new until they know what is going on. And if the reading teacher does not show up, they do not have the ELA materials and have to substitute other content. On one such occasion a teacher remarked that he had been promising the kids they would do social studies, and the absence of the reading teacher is what made that possible.

District textbook adoption, common curricula, and standardization weigh heavily on teachers, challenging the fundamental notions of individualized, child-centered teaching. Teachers acknowledge they need to measure students' reading, comprehension, and so on, but feel caught on the horns of a dilemma between standardization and individualization. They are forced to ignore individual strengths and needs in an attempt to get all children ready to tackle the same test at the same time. "There are deep contradictions in the messages we are getting. Every kid is supposed to have and indeed we are supposed to encourage them to build on their individualized learning styles. The district actively supports individualized educational programs for children and then we are supposed to cram them through the test using the same approach for all children. Give me a break!"

Splitting the Curriculum
McNeil (2000) describes teachers' use of "double entry lessons" that split the curriculum into the real content and the official (tested) content. Such a strategy would be seen as a luxury by the teachers at Hemlock and Willow Valley, where time is a scarce commodity and teaching the official (tested) content takes all the time there is, and more. The strategy that has evolved in these schools is a splitting of the curriculum according to the relative importance of each test and the time of year it is administered. Although there are 4 tests given at the elementary level in New York, everyone implicitly understands that the ELA is what matters. Reading and language arts are seen as the basis for all other subjects (and, in fact, a common criticism of all the other tests is that they test reading as much as science or math or social studies) and so take precedence. It is the ELA scores that have been used for decisions about remediation, retention in grade, and teacher quality. Table 3 indicates when each test is administered in 4th grade: ELA in early February, followed by math and then science in the spring. So, in the primary grades and especially 4th grade, the school curriculum is language arts intensive until February, followed by a couple of months of concentration on math, and then a much more limited emphasis on science. And 5th grade teachers should not expect that students will have been prepared during 4th grade for the social studies test, which a 4th grade cohort takes in November of the following school year; there simply is no time.

"We structure our whole day in 4th grade right up through January, our whole day is structured towards the ELA, and then after that, after the ELA, there will be a shift in focus and then we will be structuring our entire day to focus on math and science." For about 4 hours each day from September to January, the teachers prepare students specifically for the three days of ELA testing, for the moment in time when teaching and learning stop, when Hemlock stands still for the test. The same rhythm repeats itself at Willow Valley Elementary. "So I find that I often put social studies and science on the back burner to get through the reading and the writing. And I find that I'm spending a good 2 1/2 to 3 hours a day on language arts and I'd rather not. I'd rather be able to teach every subject every day and that doesn't often happen in my class. I wish it did, but it doesn't. Right now we are under the gun, we are under pressure. You hear it from the administration, you hear it from colleagues, 'Do you think they are ready?' and they don't do it to nag you, it's a concern." Another teacher anthropomorphizes science: "Poor science--it's really been pushed aside. How am I going to get [the students] ready for the science test in two weeks?"

Two days after the ELA test at Hemlock, the teachers are smiling; the pace is more relaxed, the discipline looser. In a 4th grade class, students are tackling a deductive reasoning problem. They are given clues and use them to deduce the correct answer. The lesson is interactive. There is talking among the students, and questioning and sharing between students and teacher. The students are engaged and interested. This is a welcome respite before serious preparation for the state math test begins.

These classrooms are unlike our traditional images of elementary school classrooms that focus on language arts, especially reading, in the morning while children are fresh and attentive, and then move to mathematics and finally science and social studies in the afternoon, with special subjects interspersed throughout the week. Because of the testing, the curriculum has been split across the school year, not across the school day or week. And, although language arts has always consumed most of the time in elementary classrooms, it does so even more in these schools.

The Test, Itself
During testing
When the tests arrive at schools, the tension rises. Teachers must watch their students take these tests and adhere to New York State Education Department instructions about test administration. Sorting through how to administer the test, which questions the teacher can answer, and how the accommodations for special education students should be implemented is a dance the teachers do throughout the testing. And while teachers are mindful of following the rules, they interpret the directions differently. Some teachers are adamant about not answering any questions and watch in silence as some students struggle, others simply sit, and many work diligently on the test. Others encourage students to ask questions, hoping they will be ones teachers can answer: "Today when you are doing your questions, get your hand up and ask. Most of the time we could answer your question."

During the days of a test, teachers do quick checks on student scores, analyze the test questions, check up on students, talk with them about their perceptions, give them moral support and reprimands, and teach cram sessions based on the teacher's preview of the test. In one 4th grade class after the first session of the math test, the teacher asks two boys, "How was it?" The students respond, "easy," "fun," "boring." Then the two boys ask the teacher if 50 × 50 = 250. She has them figure it out, and they find the answer is 2500. She shows them another way to solve the problem. The teacher laughs, grateful that the boys, oblivious to the fact that "they have no clue," thought the test was easy. And she goes on, "Are they trying to use something I taught, then that's important to me, not so much that they got it right." In another classroom, just before the second day of the math test, the teacher is more focused. She hands pencils to students that say "4th graders are #1" and tells them, "These are special pencils that only work on this portion of the test." But before they begin the test she gives the students a quick refresher on parallel lines, perpendicular lines, trapezoids, parallelograms, and hexagons. And she makes a last-minute plea that they remember what they have learned about probability and fractions. As the students take a bathroom break, this teacher looks over the test and her mood sinks noticeably (too many fractions and decimals), but then comes a sigh of relief: a graphing problem. "We've done at least 5 of these in our graphing unit."

In another 4th grade class after the first day of ELA testing, students color, play board games, and play on the computers while the teachers gather the tests and make charts and record student scores. Teachers compare notes on how hard they felt the test was and how well their students did. Question by question, teachers analyze the test. One teacher does an item difficulty analysis. With this information they hope they will be better prepared next year.

In another class, after weeks of intense preparation for the ELA, a teacher watches silently as her students finish the second day of the test. Once the test booklets are collected she tells the students to sit down and listen because she is going to yell at them. And she does. "I know that was a long test. But I cannot believe--I was ready to scream when I saw you sitting there staring into space. Don't tell me you couldn't have found one run-on sentence, a spelling mistake, or checking bullets against your answers to make sure you covered everything. Two half-hour sessions is not too much to ask of a 4th grader. We've worked all year on this. You can put 10 minutes more effort. Please tomorrow, don't just sit there. Find something to fix. I saw someone spell first, f-r-s-t. If I go to read them and I find that I was wrong, I'll take it all back. But if I find that I am right, I'll be even madder than I am now. Tomorrow you have another writing session. Only tomorrow you will use all your time."

Teachers know these testing moments cannot judge the quality of their work, but they find themselves acting as if they could, and sometimes acting in ways that may not make them proud of themselves as teachers. "I have to come to balance in my own head, about how to keep the kids just as short of being over the line with stress themselves. They are children, they have to play and have fun. They are nine."

Scoring the tests
Schools in New York are responsible for scoring their own state tests. A number of teachers indicated that scoring the tests is a critical experience for understanding the content of the test and what constitutes a 4, 3, 2, and 1 response. (This experience had been important because all elementary and intermediate tests were secure, but beginning in 2002 schools may keep the tests and use them to prepare for the upcoming year's test.) It takes a small group of teachers a full day's work to score any given test, a hidden cost of the state's accountability system. The New York State Education Department provides training videos and materials to be used in every scoring situation, and each scoring session begins with a review of the rubrics, followed by the scoring of a sample of responses. Once they begin scoring, teachers discuss disagreements or questions.

During the math test scoring, the questions and concerns teachers raise stem from an interest in being fair to the student. In this session the first issue to arise concerns responses that give an answer but do not show any work. The rubric clearly indicates that students should receive NO points if they answer the question correctly but do not show their work when it is required. One teacher sees that a student did the work but erased it. If you can still see it, does it count as shown work? The teachers agree that it does. The discussion among the teachers and the facilitators departs from the rubric to settle the meaning of "shown" work, and the resolution favors students on both counts.

T: Answer correct, but no work?
F: Give partial credit.
T: But what if the work is there, only erased?
F: Full credit, as long as you can see it.

The next issue arises in scoring a graphing problem, for which students can get a 3, 2, or 1. A teacher asks about the meaning of a 3, which the rubric says is given for a complete and correct answer, and a 2, which is given if some information is missing. The answer that sparks the discussion is a student's graph that is complete except for one unlabeled axis. There is a title, one axis is labeled and numbered, and the names for each bar are given (e.g., horses), but the label for that axis (e.g., animals) is missing. "Obviously he knows how to make a graph, why should he be penalized? Does he have a complete understanding of what goes into a bar graph? Yes." Another teacher sympathizes, "We had that problem in the past and we had to give them a 2." But the teacher is not mollified, and his fellow scorer says, "If you feel so strongly about it, do what you want to and give it a 3. If you go by what [the State's rubric] says, give it a 2." The facilitator intervenes, trying to calm the outraged teacher, and eventually he gives the student a 2 and turns to the next student response, which also earns a 2, but this time because the student had all the correct labels even though the bars in the graph were incorrect. "This child obviously did not understand the concept of making a graph but because she was able to follow the directions and knew enough to label, she gets the same points as the other who obviously understands how to make a graph but forgets one label. That's not right." Much as teachers redirect students to focus on preparing for the test, this teacher is redirected to get on with the scoring.

F: That's why you can't compare answer to answer. You have to go by the rubric.
T: OK, then you can't compare scores. You can compare scores between schools, yet we can't compare one answer to another? You're telling me that that child has the same comprehension as the other one? Right now I could fight with the state!
F: Stay on task, we have only an hour.
Another teacher interjects with a new question:
T2: If answers are completely wrong, but the process is correct?
F: It's a partial--1.

Scoring the tests leads teachers to question their judgment, the judgment of others, and especially the possibility that they may have scored too harshly. These moments of uncertainty arise especially when scoring items that require students to show their work or write an explanation. Teachers agonize over finding something salvageable even in the most incomplete answers. Again, while scoring the math test, teachers have to work through what it means for a student to show 'at least the beginning of a process.' The New York State Education Department help line provides them with no guidance, and they conclude:

F: If we can defend our score and our interpretations then let's do it. We can give credit for the start of a correct process if it ultimately leads to the correct answer.
T: When in doubt err on the side of the student.

This exchange made clear how to resolve many uncertainties: when in doubt, err on the side of the student. That is what the teachers did, and the scenario repeated itself when they scored the science test, although always with much discussion. Interestingly, this issue is specifically addressed in an informational Q and A memo from the New York State Education Department that says:

Q: On borderline calls, when deciding between adjacent score points, should the scorer always give the "benefit of the doubt" to the student and award the higher score?


A: No. Such a practice can result in scoring "drift." After scoring a number of responses, a scorer may gradually, even unconsciously, begin to accept less (or demand more) than is appropriate in awarding a particular score point. Scoring "drift" can create an unfair situation where a student response could receive a different score from the same scorer depending on when the response was scored. To prevent "drift" and maintain the consistency and accuracy of all scores, it is helpful to refer occasionally to the student responses used in the training materials as examples of the various score points. These responses are often called "anchor papers" because they help to fix the acceptable range within a score point and prevent the scorer from "drifting" higher or lower in their expectations for awarding a score point. Scorers should also be encouraged to consult their Table Facilitators and Scoring Leaders with responses that seem on the line between two score points.

Even at this last moment, when teachers can help students be as successful as possible on the state tests, they do so. They follow the rubrics as well as they can because they believe a great deal of effort has gone into creating them, but they are willing to "give the student the benefit of the doubt."

Conclusions
The teachers of Hemlock and Willow Valley are forced into untenable situations, fraught with dilemmas that are difficult to resolve while maintaining teacher professionalism and helping all children succeed to the best of their ability. Repeatedly we saw teachers put in lose-lose situations. They act in ways that are inconsistent with what they believe to be best teaching practice in order to increase the likelihood that students will succeed as measured by the state tests, which, for many teachers at least, are a poor indicator of the achievement and success of children. Teachers must often do the wrong thing in order to do the right thing, sort of.

It is essentially a utilitarian ethic that underlies test-driven curricular reform, one based on means-ends arguments (Mathison, 1991). The New York State Education Department adopts the view that the ends justify the means, and teachers too are drawn into this logic. The means are approaches to teaching and content that teachers might not choose and that do not represent good professional practice, and the state's desired ends (high test scores) are a poor but powerful proxy for the teachers' desired ends (the contextually appropriate success of every child).

The experiences of these two schools tell us a great deal about the impact of state mandated, high-stakes testing, and this paper has specifically focused on how these tests challenge teachers' professionalism, especially with regard to how they treat children. Of course, this is an interesting argument only if these things matter. These teachers wonder if policy makers and politicians have any sense of children's individual differences and the centrality of that concept to teaching and learning. Current state standards-based reform and assessment policies and practices would suggest that policy makers and practitioners either have no sense of this, or don't care, or are trying to redefine these ideas. Through the currently proffered solutions to problems of education, policymakers, politicians, and corporate CEOs eschew what teachers know about human learning and cognition, and much of what teachers know about what is helpful and harmful to children's achievement.

Are policy makers and politicians unaware that outcome-based bureaucratic accountability driven by state mandated tests will reduce teacher professionalism and autonomy? That some research (see O'Day, 2002) suggests lower-performing schools will actually lose ground? And that these accountability strategies do relatively little to alter the fundamental injustices in schools and society, such as racism and classism? We don't know for sure, but we think probably not. There is a fundamental disagreement about what kind of work teachers and students should be doing in schools: work that requires real critical thinking that may contribute to the evolution of a just and equitable society, or work that has the appearance of critical thinking and will contribute to oppression. (These are not simple political disagreements; they are disagreements connected with power and money. For a more detailed discussion of this argument, see Mathison, Ross & Vinson, 2001; Vinson, 1999.)

"By insisting that legitimate learning necessarily presents itself in and on the basis of test scores, such testing refuses to admit and accept differences (individual as well as cultural) in knowledges, values, experiences, learning styles, economic resources, and access to those dominant academic artifacts that ultimately contribute to both the appearance of achievement and the status of cultural hegemony upon which standards-based reforms depend. In effect, standardized testing encourages a singular and homogeneous public schooling--one antithetical to such contemporary ideals as diversity, multiculturalism, difference, and liberation--vis-à-vis an underlying and insidious mechanism or technology of oppression, one in which the interests of society's most powerful (the minority) are privileged at the expense of those of the less powerful (the majority)" (Vinson, Gibson & Ross, 2001).

The teachers at Hemlock and Willow Valley Elementary Schools are not radicals. They do not seek complete autonomy, they do not challenge the need for accountability (even bureaucratic accountability), they find some virtue in state mandated tests, and they are content within centralized systems that prescribe many aspects of their work. But they also perceive themselves as professionals with both the responsibility and capability of doing their jobs well and in the best interests of their students. New York State's outcomes-based bureaucratic accountability system tests their resolve, makes them angry, and requires unnecessary compromises in their work. These teachers are left angrier or more frustrated rather than better off, with little indication that student achievement is advancing in genuine ways or that schools are being reformed.

References
Darling-Hammond, L. (1990). Teacher professionalism: Why and how? In A. Lieberman (Ed.), Schools as collaborative cultures: Creating the future now (pp. 25-50). Bristol, PA: Falmer Press.

Elmore, R. F. (1996). Getting to scale with successful educational practices. In S. Fuhrman & J. A. O'Day (Eds.), Rewards and reform: Creating educational incentives that work (pp. 294-329). San Francisco: Jossey-Bass.

Heubert, J. P. & Hauser, R. M. (1999). High stakes: Testing for tracking, promotion, and graduation. Washington, D.C.: National Academy Press.

Hillocks, G., Jr. (2002). The testing trap: How state writing assessments control learning. New York: Teachers College Press.

Hoffman, J. V., Assaf, L. C. & Paris, S. G. (2001). High stakes testing in reading: Today in Texas, tomorrow? The Reading Teacher, 54(5), 482-492.

Kannapel, P. J., Coe, P., Aagard, L., Moore, B. D. & Reeves, C. A. (2000). Teacher responses to rewards and sanctions: Effects of and reactions to Kentucky's high-stakes accountability program. In B. Whitford & K. Jones (Eds.), Accountability, assessment, and teacher commitment: Lessons from Kentucky's reform efforts. Albany, NY: SUNY Press.

Katz, M. B. (1971). Class, bureaucracy, and schools: The illusion of educational change in America. New York: Praeger.

Little, J. W. (1990). The persistence of privacy: Autonomy and initiative in teachers' professional relations. Teachers College Record, 91, 509-536.

Madaus, G. (1988). The distortion of teaching and testing: High-stakes testing and instruction. Peabody Journal of Education, 65(3), 29-46.

Mathison, S. (1991). Implementing curricular change through state-mandated testing: Ethical issues. Journal of Curriculum and Supervision, 6, 201-212.

Mathison, S., Ross, E. W. & Vinson, K. D. (2001). Defining the social studies curriculum: The influence of and resistance to curriculum standards and testing in social studies. In E. W. Ross (Ed.), The social studies curriculum: Purposes, problems, and possibilities. Albany, NY: SUNY Press.

McLaughlin, M. W. & Talbert, J. E. (2001). Professional communities and the work of high school teaching. Chicago: University of Chicago Press.

McNeil, L. M. (2000). Contradictions of school reform: Educational costs of standardized testing. New York: Routledge.

Noble, A. J., & Smith, M. L. (1994). Old and new beliefs about measurement-driven reform: "Build it and they will come." Educational Policy, 8(2), 111-136.

O'Day, J. A. (2002). Complexity, accountability, and school improvement. Harvard Educational Review, 72(3).

Ross, E. W. (2001). Diverting democracy: The curriculum standards movement and social studies education. In D. W. Hursh & E. W. Ross (Eds.), Democratic social education: Social studies for social change. New York: Falmer Press.

Shulman, L. (1987). Knowledge and teaching: Foundations of the new reform. Harvard Educational Review, 57(1), 1-22.

Smith, M. L. (1991). Meanings of test preparation. American Educational Research Journal, 28(3), 521-542.

Strike, K. A. (1993). Professionalism, democracy, and discursive communities: Normative reflections on restructuring. American Educational Research Journal, 30(2), 255-275.

Thornton, S. (1991). Teacher as curricular-instructional gatekeeper in social studies. In J. P. Shaver (Ed.), Handbook of research on social studies teaching and learning. New York: Macmillan.

Vinson, K. D. (1999). National curriculum standards and social studies education: Dewey, Freire, Foucault, and the construction of a radical critique. Theory and Research in Social Education, 27(3), 296-328.

Vinson, K. D., Gibson, R., & Ross, E. W. (2001). High-stakes testing and standardization: The threat to authenticity. Monographs of the John Dewey Project on Progressive Education, 3(2).

About the Authors
Sandra Mathison
College of Education and Human Development
University of Louisville
Louisville KY 40292
smathison@louisville.edu

Sandra Mathison is Professor of Education at the University of Louisville. She teaches evaluation and qualitative research methods and her research focuses on democratic and fair evaluation practices in schools.

Melissa Freeman
School of Education
University at Albany, SUNY
1400 Washington Avenue
Albany NY 12222
freeman@sover.net

Melissa Freeman is project manager of an interpretive study of the impact of high stakes testing in upstate New York. Her interests include theoretical and methodological issues in interpretive research and evaluation, democratic practices in schools, and critical social theories.

Acknowledgment
This publication is based on research supported by the National Science Foundation (Grant # ESI-9911868). The findings and opinions expressed herein do not necessarily reflect the position or priorities of the sponsoring agency.

Notes
1. While there is ample debate about whether teaching is a profession or not, and whether it ought to be considered a profession (see Strike, 1993), there are strong arguments for labeling teaching a profession (Darling-Hammond, 1990; Little, 1990; McLaughlin & Talbert, 2001). We adopt the view that teaching is a profession because it requires specialized knowledge and skills, especially as manifest in Shulman's notion of pedagogical content knowledge (1987) and contemporary theories of child development. In addition, teachers, like all other professionals, are concerned simultaneously with both means and ends.

2. We wish to thank Kate Abbott and Kristen Campbell-Wilcox, our research collaborators on this project.

— Sandra Mathison and Melissa Freeman
Education Policy Analysis Archives

http://epaa.asu.edu/epaa/v11n34



