Needs Improvement: Where Teacher Report Cards Fall Short
Reader Comment: Standardized tests measure consistency, which was a valued attribute in the days of the assembly line but is a real handicap in an information age. Measuring teachers on the improvement in their ability to churn out cookie cutter students seems to be making the problem worse. Yes, we need kids who can read and write and add and subtract, but we also need kids who can think independently, experiment, take risks, and learn from experience. Standardized tests cannot possibly measure whether a child's experience with a particular teacher improved those abilities. The whole system seems designed to produce more people who are only employable at fast food restaurants, which is about the only form of employment we haven't shipped offshore.
Reader Comment: Examining systems for evaluating teachers and the surrounding politics is interesting, even provocative, but we are barking up the wrong tree -- maybe we are in the wrong forest altogether. Learning is a complex undertaking (teaching is easier to dissect). Learning is not a direct result of teaching -- learning occurs within the learner when conditions are present for learning. These conditions are created when a rich array of external and internal variables comes together -- most of which can be manipulated by teachers, other professionals, community and parents, acting as a team. We need to be looking at (1) active student voice in creating the learning environment [completely absent and discounted in the current system], (2) teacher collaboration across subjects [mostly absent or severely limited at present], and (3) active involvement of parents and community in reinforcing what goes on in schools [a few instructive models exist out there]. Tightening the screws on individual teachers, which may weed out a few bad eggs, will produce minimal change overall. Pervasive employment of technology is part of the answer in getting all parties connected in this creative endeavor. We need to unleash the potential of students and teachers, not constrain them through fear. [If education were not compulsory, would most students even show up?] I don't see politicians, teachers, parents, or administrators having a clue how to really impact the system. Students alone have the perspective to recognize the irrelevance of the current system -- so why don't they speak up? Too busy texting, I guess.
By Carl Bialik
Local school districts have started to grade teachers based on student test scores, but the early results suggest the effort deserves an incomplete.
This new type of teacher evaluation makes use of the standardized tests that have become an annual rite for American public-school students. The tests have mainly been used to measure the progress of students and schools, but with some statistical finesse they can be transformed into a lens for identifying which teachers are producing the best test results.
At least, that's the hope among some education experts. But the performance numbers that have emerged from these studies rely on a flawed statistical approach.
One perplexing finding: A large proportion of teachers who rate highly one year fall to the bottom of the charts the next year. For example, in a group of elementary-school math teachers who ranked in the top 20% in five Florida counties early last decade, more than three in five didn't stay in the top quintile the following year, according to a study published last year in the journal Education Finance and Policy.
"Because education tends to have this moral-crusade element--we tend to rush to use things before they are refined or really fully baked," says Frederick Hess, director of education policy studies at the American Enterprise Institute, a conservative think tank.
But even skeptics of test-score-based evaluations acknowledge that a uniform, data-based approach for ranking teachers could be superior to subjective methods--such as principals' observations--that still predominate in schools. "Damn near anything is going to be an improvement on the status quo," says Daniel Willingham, a cognitive psychologist at the University of Virginia.
The U.S. Department of Education has pushed states to loosen restrictions on evaluating teachers through student test scores. To be eligible for a piece of the $4.35 billion in competitive grants in the Race to the Top federal program, states can't have laws barring a link between student scores and teacher evaluation. And states are scored in part based on whether they evaluate teachers using test results.
Meanwhile, the District of Columbia began evaluating teachers based on test scores last school year, and fired more than 150 teachers after the school year because of poor performance. Test scores count for 50% of teacher ratings in subjects that are tested.
These measures don't simply ding teachers for their students' low scores, because not all incoming classes start the year equally. Instead, teachers are evaluated based on how much students' scores improve by the end of their year.
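As a toy illustration of the gain-score idea (the scores below are made-up numbers, not data from any district), a teacher whose class starts low but improves a lot can outrank one whose class starts high but barely moves:

```python
# Hypothetical pre- and post-test scores for two small classes (made-up numbers).
class_a_pre, class_a_post = [40, 45, 50], [55, 60, 66]   # starts low, gains a lot
class_b_pre, class_b_post = [80, 85, 90], [82, 88, 93]   # starts high, gains little

def avg_gain(pre, post):
    """Average per-student improvement from the pre-test to the post-test."""
    return sum(after - before for before, after in zip(pre, post)) / len(pre)

print(f"Teacher A average gain: {avg_gain(class_a_pre, class_a_post):.1f}")  # 15.3
print(f"Teacher B average gain: {avg_gain(class_b_pre, class_b_post):.1f}")  # 2.7
```

Judged on raw scores, Teacher B's class looks far stronger; judged on gains, Teacher A does.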
But good teachers aren't easy to identify this way. For one thing, students aren't always assigned to teachers randomly. A teacher who, because of a knack for helping slow learners, gets more than his share of students who learn slowly might be penalized at the end of the year.
There are other problems with the data. Elementary-school teachers might have just 15 or 20 students in their classes, which is a small sample on which to evaluate a teacher's achievements. "If you're using just one year of information, it's going to be pretty unstable," says Tim R. Sass, an economist at Florida State University.
Research suggests that using multiple years of data helps matters, though only so much. A report from the Department of Education released last month shows that even with three years of data, one in four teachers is likely to be misclassified because unrelated variables creep in.
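The instability Sass describes can be sketched with a small Monte Carlo simulation. Every number here -- the spread of true teacher effects, the noise level from one small class, the number of teachers -- is an illustrative assumption, not an estimate from the Florida or federal studies:

```python
import random
import statistics

random.seed(0)
N_TEACHERS = 500

# Illustrative assumption: real differences between teachers (signal) are
# smaller than the year-to-year sampling noise from one small class.
TRUE_SD, NOISE_SD = 1.0, 2.0
true_effect = [random.gauss(0, TRUE_SD) for _ in range(N_TEACHERS)]

def estimates(n_years):
    """Each teacher's measured score: true effect plus noise averaged over years."""
    return [t + statistics.mean(random.gauss(0, NOISE_SD) for _ in range(n_years))
            for t in true_effect]

def top_quintile(scores):
    cutoff = sorted(scores, reverse=True)[N_TEACHERS // 5 - 1]
    return {i for i, s in enumerate(scores) if s >= cutoff}

# How many "top 20%" teachers stay on top when the measurement is repeated?
one_a, one_b = top_quintile(estimates(1)), top_quintile(estimates(1))
three_a, three_b = top_quintile(estimates(3)), top_quintile(estimates(3))
print(f"1-year scores: {len(one_a & one_b) / len(one_a):.0%} stay in top quintile")
print(f"3-year scores: {len(three_a & three_b) / len(three_a):.0%} stay in top quintile")
```

Under these assumed signal-to-noise levels, much of a single year's ranking is noise; averaging more years should raise persistence on average, though some churn remains.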
Even with these questions, relying on student test scores to create a quantitative assessment of teachers might be better than the current standard practice. At many schools, principals grade teachers based on a few minutes of classroom observation (and then give most of them high scores). Rating teachers in this way doesn't do all that well in predicting how much their students' test scores will change, according to several studies.
Advocates of the student test-score measure say it can be improved with a carefully constructed model that takes into account such factors as students' family income and schools' support for teachers. Dan Goldhaber, director of the Center on Reinventing Public Education at the University of Washington, points out that some instability in teacher rankings from year to year is to be expected, even desired, as some teachers make more progress than others in any given year.
The Los Angeles Times has stirred the debate by commissioning its own analysis of Los Angeles elementary-school teachers. The newspaper published an article about the findings last week and plans to release a database of thousands of teachers' quintile rankings, after giving teachers time to request their rankings and respond for publication.
The Times coverage is helping to raise awareness about the lack of standards for teachers, says Steve Cantrell, senior program officer for the Bill & Melinda Gates Foundation. The foundation is funding a study in seven school districts of teacher evaluation, combining test score-based analysis with other factors, such as teacher tests of subject knowledge and independent ratings of in-class video recordings.
Dr. Cantrell says the research will help determine whether it is possible to create a "persistent and stable measure" of teacher performance that predicts student learning.
Write to Carl Bialik at email@example.com
The Numbers Guy Blog
Putting Teachers to the Test
My print column this week examines the debate over so-called value-added measures for teachers, which evaluate their performance based on how much they improve their students' standardized test scores.
Douglas Harris, associate professor of educational policy and public affairs at the University of Wisconsin, is a cautious advocate of these measures, but points out that concerns about teaching to the test could be heightened if teachers, as well as principals and school districts, are evaluated based on test results. "Teachers can generate high value-added measures by drilling the test over and over," Harris said.
If these measures catch on, they could also encourage more teachers to cheat. "If we start to place a lot of weight on these things, [you] have to expect some degree of malfeasance," said Frederick Hess, director of education policy studies at the American Enterprise Institute. "You want the benefits to outweigh the costs, and you want to police it in a smart way."
Will the benefits outweigh the costs? "That's the big unknown," Michael Hansen, a researcher in the Urban Institute's Education Policy Center in Washington, D.C., wrote in an email. "What is known is that the way most districts currently hire, evaluate, and pay teachers is misaligned with the public goal of increasing overall student learning."
Hess added, "There's a growing consensus about the smart way to approach it. But there's still a lot of sensible disagreement about the right recipe to use."
There are also other concerns about the measures. Low-income students have been shown to lose more ground than their wealthier counterparts during the summer, so a teacher in a less wealthy part of town may see her scores deflated by the time spent getting students back up to speed.
In a paper last year, University of California, Berkeley economist Jesse Rothstein showed a strong correlation between a teacher's value-added score in one year and how much that teacher's students gained the prior year -- before they were in her class. Since a teacher cannot have caused gains that predate her, the correlation implies that teachers who do well in these systems are benefiting from favorable classroom assignments.
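Rothstein's falsification idea can be sketched with simulated data in which teachers have no effect at all, yet students are tracked into classrooms by prior performance. All parameters (track spread, persistence of student trajectories, class size) are illustrative assumptions:

```python
import random

random.seed(1)
N_TEACHERS, CLASS_SIZE = 200, 20

prior_avg, value_added = [], []
for _ in range(N_TEACHERS):
    # Non-random assignment: each class is drawn from a "track" whose
    # students share a common component in their prior-year gains.
    track = random.gauss(0, 1)
    prior = [track + random.gauss(0, 1) for _ in range(CLASS_SIZE)]
    # Teachers contribute NOTHING here; gains just partly persist (0.5)
    # from each student's own trajectory, plus fresh noise.
    gain = [0.5 * p + random.gauss(0, 1) for p in prior]
    prior_avg.append(sum(prior) / CLASS_SIZE)
    value_added.append(sum(gain) / CLASS_SIZE)   # the naive "value-added" score

def corr(xs, ys):
    """Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

r = corr(prior_avg, value_added)
print(f"correlation of 'value-added' with students' prior-year gains: {r:.2f}")
```

Even though every teacher's true effect is zero by construction, the naive score correlates strongly with the students' prior gains: a teacher assigned an improving track would look effective. A correlation like this in real data is the red flag Rothstein's test looks for.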
Another concern is that rigorous evaluation and the threat of loss of tenure or job security might make teaching a less desirable profession, thereby forcing school districts to spend more to attract the same pool of candidates, as Hansen cautioned in a recent Urban Institute paper.
Cory Koedel, an economist at the University of Missouri, wrote in an email that the extent of this risk depends on whether potential teachers trust the evaluation system -- and would be able to know if they will be rated highly by it. "If teachers think the system is completely unfair, then it would be reasonable to argue that it will be harder to attract candidates in general into teaching," Koedel wrote.
"But if teachers think the system is at least somewhat rewarding performance, it may make it easier to attract good candidates because they will benefit from the performance-based reviews, and the job will become less appealing to bad candidates because they will be penalized by the reviews."
Teachers have reason to fear they may be misidentified -- roughly one in four would be even after three years of data have been collected, according to a report last month commissioned by the Department of Education. Hanley Chiang, a Mathematica Policy Research researcher who co-authored the report, said, "How tolerable these error rates are, is really a policy judgment."
"Even with multiple years of data, there are a whole lot of false positives and negatives," said Barnett Berry, president and chief executive of the Center for Teaching Quality. "If we start using these value-added metrics inappropriately, what could have been a very powerful tool could become the baby that gets thrown out with the bathwater."
Douglas Staiger, co-author of a recent paper that shows students of teachers with high value-added scores see their gains fade over the next two years -- though not disappear completely -- pointed out that there are many "headaches" when it comes to sorting the data: "What do we do with a kid who comes in midyear, leaves midyear, or comes in from other districts? From a big-picture point of view, these details don't matter. But for individual teachers, they do. So from a management perspective, details become more important."
Also, teachers may have trouble understanding all the details, because of the comparatively complex nature of these scores. "It would be nice if we could come up with a metric that is easier for teachers to understand," said Richard Buddin, a RAND economist. "You're stuck with something that's a little bit complicated."
Buddin was commissioned by the Los Angeles Times to study Los Angeles public-school data and rank teachers by their performance. The newspaper ran an article about the results and plans to publish a database of teacher scores.
Dennis Van Roekel, president of the National Education Association teachers' union, called the Times's decision to publish names "reckless."
But Times assistant managing editor David Lauter said the paper "wanted to err on the side of caution." So, for instance, because of the concerns about statistical noise, the Times is only publishing scores for teachers who have had at least 60 students -- generally, teachers with at least three years of experience. While Lauter conceded the test-based measure is imperfect, he said, "What it tells you is there are a lot of extremely effective teachers in the public-school system, which the district has never recognized and never acknowledged, let alone tried to figure out, what are they doing right, and to emulate them."
"It can start the conversation -- 'Why is it the other teacher is doing better than me?'" Buddin said.
About 1,500 teachers had asked for their rankings as of Thursday morning, and about 150 of them had submitted comments, which the paper plans to publish, Lauter said.
John Dacey, deputy superintendent for the Los Angeles Unified School District, said the district plans to broach value-added measures when the teachers' contract is up next June. "We will work with the teachers union to construct a new teacher evaluation system," Dacey said. "And we will propose that part of that will be a value-added score." Of the L.A. Times coverage, Dacey added, "I think it has changed the debate. I have very serious concerns about the release of individual scores."
Jason Kamras, director of teacher human capital for District of Columbia Public Schools, which uses value-added to evaluate teachers, called the L.A. Times's decision to publish teacher rankings information "intriguing," saying, "More information is typically helpful for parents, and kids, and everyone else involved." Kamras added, "The most important thing a teacher does is advance the academic growth of students. Value-added is a fair way to assess teachers' impact on that growth."
Many researchers think that a composite score of multiple measures, including value-added, will prove to be superior at predicting student performance. "We think you should use multiple measures and that data should play an important part in those evaluations," said Justin Hamilton, a spokesman for the Department of Education.
The Bill & Melinda Gates Foundation is studying that premise in seven school districts, using several different kinds of scores for teachers and seeing which ones best predict the performance of their students in randomly assigned classes in subsequent years.
Though the research will examine independent teacher evaluations and other measures, value-added has been at the center of education debate. "People think about value-added as being more objective," said Steve Cantrell, senior program officer for the Gates Foundation. "That's what's captured the imagination."
Citing the problems with value-added, Van Roekel said, "It'd be like trying to decide who is a good baseball player, and you only use their batting average." He conjured up a pair of players, one of whom hits .250, and the other, .350. "You'd say, the one hitting .350 is better. But maybe the one with a .250 average is your star pitcher."
But unlike baseball, which has a wealth of commonly accepted statistics better than batting average, teaching has no such measures -- at least not until efforts such as the Gates Foundation's yield results.
"These estimates ought to be used," said Dan Goldhaber, director of the Center for Education Data & Research at the University of Washington. "The relevant alternative looks like it is no rigorous evaluation of teachers at all."
Wall Street Journal