Formula to Grade Teachers' Skill Gains in Use, and Critics
The reader comments are better than the article.
Reader Comment: Karen Caruso, a national board-certified teacher who was deemed "very effective" by her principal and other professionals, has been described as "ineffective" by two major newspapers. If this doesn't constitute "libel" what does?
Reader Comment: How did the standardized tests we are giving move into the position of being so trusted, being almost sacred, that we now base life decisions like teacher compensation and ratings, student promotions and placements upon them? The tests that I see (released versions can be seen on the Internet) frequently contain questions about obscure facts which I would not consider important for a child's future or his or her education. Why do we now assume without question that these tests (many produced by for-profit companies) are valid?
I am reminded of an NYT article in which the reporter asked a test maker, "Why so much emphasis on factoring of polynomials?" The test maker replied, "Because I can test them on their ability to factor polynomials."
Reader Comment: William L. Sanders, a senior research manager for a North Carolina company, SAS, that does value-added estimates
Wow, someone who directly profits from this thinks it's a good idea. RTTT = Rush to the trough.
Reader Comment: Though you link to it, you give short shrift to the devastating critique of VAM in the report "Problems with the Use of Student Test Scores to Evaluate Teachers". You attribute it to "several education researchers" without mentioning that the authors are the most distinguished people in the field.
The Duncan-Bill Gates-Obama "reforms" are nothing less than an attempt to wreck the teaching profession, to get rid of fair dismissal procedures and to fill the teaching ranks with a few high paid stars who know how to teach-to-the-test and masses of low paid terrified staff members who flee the schools after a few years. The elite will have their kids in private schools using tried and true methods and the rest will languish in low quality public and charter schools taking test after test.
That our schools have been re-segregated and that families are overwhelmed by unemployment and poverty doesn't seem to register with the "reformers". Instead of witch hunts of teachers why not try a holistic approach to poorly performing schools, with full employment for families, integrated student bodies and staff, free medical care and top notch facilities, textbooks and electronic media.
Reader Comment: I am a scientist and I work with statistical analyses of data on a regular basis. Reading this article sends up all sorts of red flags for me about the validity of the conclusions drawn from the numbers.
What they've got sounds like a fairly simple statistical tool, one that is subject to all sorts of uncertainties. You need data over multiple years; I'd suggest looking at a given set of statistics for a teacher over a period of five years at least, to smooth out year to year variations, both in the students, and intrinsic variations in test scores - give a similar test to the same student several times in a row, and they will get a variety of marks. Then you need to throw out data points for students who were in the class for only a partial year. As mentioned in the article, you also need to adjust for high performing students, because once a student reaches a certain grade, they are going to stop improving significantly, so you should probably throw out the scores of students who score above about 90%. A more subtle effect will actually make it likely for teachers with high performing students to do badly rather than neutrally - for example, if a student scores 100% one year, the next year they will either stay the same or get worse, giving a neutral or negative result for a student who is doing very well.
Then, of course, you run into the trap of small number statistics. Even the most robust and widely used statistical techniques, like the use of the normal distribution, break down when you have small samples (seen in the classic mistake of trying to bell-curve marks in small classes). Given the typical size of school classes this is definitely in the regime where small number statistics are an issue.
Then, as in all statistical analysis, you have to carefully analyze the validity of the assumptions that go into the model, and subject the results to a rigorous analysis of the significance of the result - you can have the numbers go up but if they aren't statistically significant, you can't draw any conclusions from them.
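The commenter's point about small samples can be illustrated with a toy simulation (all numbers here are assumptions for illustration, not real test data): give each of 100 hypothetical teachers a small fixed "true" effect, add student-level noise across a class of 25, and check how consistently the bottom quartile gets flagged across two simulated years.

```python
import random
import statistics

random.seed(1)

N_TEACHERS = 100
CLASS_SIZE = 25          # typical class size; small-sample noise dominates
STUDENT_SD = 15.0        # assumed spread of individual student score gains
TRUE_EFFECT_SD = 3.0     # assumed spread of real teacher effects

# Each teacher has a fixed "true" effect on test-score gains.
true_effect = [random.gauss(0, TRUE_EFFECT_SD) for _ in range(N_TEACHERS)]

def observed_class_mean(effect):
    """One year's class-average gain: the true effect plus student noise."""
    gains = [effect + random.gauss(0, STUDENT_SD) for _ in range(CLASS_SIZE)]
    return statistics.mean(gains)

year1 = [observed_class_mean(e) for e in true_effect]
year2 = [observed_class_mean(e) for e in true_effect]

def bottom_quartile(scores):
    """Indices of teachers ranked in the bottom quarter for that year."""
    cutoff = sorted(scores)[N_TEACHERS // 4]
    return {i for i, s in enumerate(scores) if s < cutoff}

flagged1 = bottom_quartile(year1)
flagged2 = bottom_quartile(year2)
overlap = len(flagged1 & flagged2) / len(flagged1)
print(f"Teachers flagged 'ineffective' in both simulated years: {overlap:.0%}")
```

With a class of 25, the noise in the class average is comparable to the assumed spread of true teacher effects, so a large fraction of the "ineffective" list turns over from one year to the next, which is exactly the instability the commenter and the report's authors describe.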
Reader Comment: I have a question that embodies my comment: if value-added evaluations can fairly measure a teacher's impact on scores that students earn on standardized examinations, tests that are themselves poor measures of student achievement and that perchance help undermine a thoughtful and creative education, can a high-value added rating for a teacher in fact represent a net subtraction from what we hope our kids will get in schools?
Reader Comment: The banks now have grabbed almost all the wealth of this nation.
But even gorged on the people's money, the banks are ironically walking dead, like Paul Krugman used to say, they're zombies. More like vampires to my mind. Because there are a few pockets of wealth and working class influence they simply must suck up if they are to survive awhile longer. There's the nation's public school system, there's public and private pension funds, there's the Social Security system, Medicare and Medicaid, there's all the public services delivered by state and local governments.
Then there's the largest unionized work force left in this country, the teachers. What, you thought this sudden flood of disparagement and denigration of teachers, including this value added assessment nonsense, was "for the kids"? What started with Reagan and PATCO is supposed to end with Obama and NEA-AFT. Can't pay teachers those exorbitant $50,000 a year salaries and a pension in their old age. We've got multi-million dollar bonuses for Lloyd Blankfein and Jamie Dimon to think of after all.
Reader Comment: Inaccuracy of value-added testing is only one problem. The other problem is - how is this supposed to improve teaching? If you fire poor teachers, who is going to replace them? Are there lots of better-qualified teachers - currently not employed as teachers - who can be hired to replace the fired teachers? Or will there be retraining? If so, we haven't heard anything about it. No money for that, for enough textbooks, smaller class sizes, or for paying teachers a wage that would entice better-qualified people to enter the teaching field.
Here in Tennessee we've already seen the downside of value-added testing. Actually, I heard that it was originally designed as a way to give schools with under-performing students a way to compete with the better schools. Schools with a high number of disadvantaged students would have low standardized-testing scores, but with value-added they compare favorably with the high-performing (wealthy suburban) schools because value-added testing measures progress, not overall scores. It is easy to show progress when you are at a low level, but hard when you are in the 90th percentile to start with. Our best elementary school was reported in the newspaper as getting a D in various subjects because they didn't show enough "improvement". Value-added testing is a very questionable measure of successful teaching. And our government is spending $4.35 billion on the 2nd round of "Race to the Top" awards that reward exactly this kind of faddish testing.
We've been here before, in the 1980's when the testing craze was applied to students instead of teachers. I remember someone commenting at the time, "You can't test students into learning." You can't test teachers into good teaching either.
Reader Comment: What amazes me is that these are the people -- Arne Duncan and Pres. Obama -- that the NEA so vigorously campaigned for in 2008. I am a teacher and former NEA member, until I realized how we teachers were all being manipulated by the union and Obama. We were promised that Obama would overhaul NCLB, not turbo-charge it to bash teachers. It's time for the NEA to get out of bed with the Democrats, and start representing the best interests of students and educators.
By Sam Dillon
How good is one teacher compared with another?
A growing number of school districts have adopted a system called value-added modeling to answer that question, provoking battles from Washington to Los Angeles -- with some saying it is an effective method for increasing teacher accountability, and others arguing that it can give an inaccurate picture of teachers' work.
The system calculates the value teachers add to their students' achievement, based on changes in test scores from year to year and how the students perform compared with others in their grade.
People who analyze the data, making a few statistical assumptions, can produce a list ranking teachers from best to worst.
Use of value-added modeling is exploding nationwide. Hundreds of school systems, including those in Chicago, New York and Washington, are already using it to measure the performance of schools or teachers. Many more are expected to join them, partly because the Obama administration has prodded states and districts to develop more effective teacher-evaluation systems than traditional classroom observation by administrators.
Though the value-added method is often used to help educators improve their classroom teaching, it has also been a factor in deciding who receives bonuses, how much they are and even who gets fired.
Michelle A. Rhee, the schools chancellor in Washington, fired about 25 teachers this summer after they rated poorly in evaluations based in part on a value-added analysis of scores.
And 6,000 elementary school teachers in Los Angeles have found themselves under scrutiny this summer after The Los Angeles Times published a series of articles about their performance, including a searchable database on its Web site that rates them from least effective to most effective. The teachers' union has protested, urging a boycott of the paper.
Education Secretary Arne Duncan weighed in to support the newspaper's work, calling it an exercise in healthy transparency. In a speech last week, though, he qualified that support, noting that he had never released to news media similar information on teachers when he was the Chicago schools superintendent.
"There are real issues and competing priorities and values that we must work through together -- balancing transparency, privacy, fairness and respect for teachers," Mr. Duncan said. On The Los Angeles Times's publication of the teacher data, he added, "I don't advocate that approach for other districts."
A report released this month by several education researchers warned that the value-added methodology can be unreliable.
"If these teachers were measured in a different year, or a different model were used, the rankings might bounce around quite a bit," said Edward Haertel, a Stanford professor who was a co-author of the report. "People are going to treat these scores as if they were reflections on the effectiveness of the teachers without any appreciation of how unstable they are."
Other experts disagree.
William L. Sanders, a senior research manager for a North Carolina company, SAS, that does value-added estimates for districts in North Carolina, Tennessee and other states, said that "if you use rigorous, robust methods and surround them with safeguards, you can reliably distinguish highly effective teachers from average teachers and from ineffective teachers."
Dr. Sanders helped develop value-added methods to evaluate teachers in Tennessee in the 1990s. Their use spread after the 2002 No Child Left Behind law required states to test in third to eighth grades every year, giving school districts mountains of test data that are the raw material for value-added analysis.
In value-added modeling, researchers use students' scores on state tests administered at the end of third grade, for instance, to predict how they are likely to score on state tests at the end of fourth grade.
A student whose third-grade scores were higher than 60 percent of peers statewide is predicted to score higher than 60 percent of fourth graders a year later.
If, when actually taking the state tests at the end of fourth grade, the student scores higher than 70 percent of fourth graders, the leap in achievement represents the value the fourth-grade teacher added.
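The percentile arithmetic described above can be sketched in a few lines (illustrative numbers only; real value-added models layer statistical controls on top of this basic comparison):

```python
# Hypothetical sketch of the value-added arithmetic from the article.
# Percentile ranks are made-up examples, not real test data.

def value_added(prior_percentile, actual_percentile):
    """Value added = actual percentile rank minus the predicted rank,
    where the naive prediction is that last year's rank persists."""
    predicted = prior_percentile
    return actual_percentile - predicted

# The student from the example: 60th percentile at the end of third
# grade, 70th percentile at the end of fourth grade.
gain = value_added(60, 70)
print(gain)  # 10 percentile points attributed to the fourth-grade teacher
```

A teacher's rating is then built by aggregating these gains across all of his or her students, which is where the sample-size and attribution problems discussed below come in.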
Even critics acknowledge that the method can be more accurate for rating schools than the system now required by federal law, which compares test scores of succeeding classes, for instance this year's fifth graders with last year's fifth graders.
But when the method is used to evaluate individual teachers, many factors can lead to inaccuracies. Different people crunching the numbers can get different results, said Douglas N. Harris, an education professor at the University of Wisconsin, Madison. For example, two analysts might rank teachers in a district differently if one analyst took into account certain student characteristics, like which students were eligible for free lunch, and the other did not.
Millions of students change classes or schools each year, so a teacher can end up being evaluated on the performance of students taught only briefly, because students' records are linked to teachers in the fall.
In many schools, students receive instruction from multiple teachers, or from after-school tutors, making it difficult to attribute learning gains to a specific instructor. Another problem is known as the ceiling effect. Advanced students can score so highly one year that standardized state tests are not sensitive enough to measure their learning gains a year later.
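The ceiling effect can be shown with a minimal sketch (the scores and the test maximum are assumptions for illustration): once a test caps out, a high-achieving student's real growth simply cannot register.

```python
# Illustrative only: a test with a maximum scaled score of 100 cannot
# register gains for students already at or near the top.

TEST_MAX = 100

def observed(true_ability):
    """Observed score is the student's true ability clipped at the ceiling."""
    return min(true_ability, TEST_MAX)

# A student who truly improves from 98 to 105 shows a measured gain of 2;
# one who improves from 100 to 110 shows no measured gain at all.
print(observed(105) - observed(98))   # 2
print(observed(110) - observed(100))  # 0
```

Under a value-added rating, those missing gains look like a teacher who added nothing, which is the complaint Ms. Krieger makes below.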
In Houston, a district that uses value-added methods to allocate teacher bonuses, Darilyn Krieger said she had seen the ceiling effect as a physics teacher at Carnegie Vanguard High School.
"My kids come in at a very high level of competence," Ms. Krieger said.
After she teaches them for a year, most score highly on a state science test but show little gain, so her bonus is often small compared with those of other teachers, she said.
The Houston Chronicle reports teacher bonuses each year in a database, and readers view the size of the bonus as an indicator of teacher effectiveness, Ms. Krieger said.
"I have students in class ask me why I didn't earn a higher bonus," Ms. Krieger said. "I say: 'Because the system decided I wasn't doing a good enough job. But the system is flawed.'"
This year, the federal Department of Education's own research arm warned in a study that value-added estimates "are subject to a considerable degree of random error."
And last October, the Board on Testing and Assessments of the National Academies, a panel of 13 researchers led by Dr. Haertel, wrote to Mr. Duncan warning of "significant concerns" that the Race to the Top grant competition was placing "too much emphasis on measures of growth in student achievement that have not yet been adequately studied for the purposes of evaluating teachers and principals."
"Value-added methodologies should be used only after careful consideration of their appropriateness for the data that are available, and if used, should be subjected to rigorous evaluation," the panel wrote. "At present, the best use of VAM techniques is in closely studied pilot projects."
Despite those warnings, the Department of Education made states with laws prohibiting linkages between student data and teachers ineligible to compete in Race to the Top, and it designed its scoring system to reward states that use value-added calculations in teacher evaluations.
"I'm uncomfortable with how fast a number of states are moving to develop teacher-evaluation systems that will make important decisions about teachers based on value-added results," said Robert L. Linn, a testing expert who is an emeritus professor at the University of Colorado, Boulder.
"They haven't taken caution into account as much as they need to," Professor Linn said.
New York Times