There’s less than meets the eye to rising Metro test scores
by Bruce Barry
“Test scores leap in Metro schools” was the breathless Tennessean headline back in June when 2005 TCAP results came out. “Most dramatic single-year increase since 1990,” thundered a Metro schools press release. You’d think that kind of news would put some momentum behind public education, but instead autumn brought a stiff headwind: first, city voters overwhelmingly rejected a sales tax hike linked to school funding, and then the school board voted 5-4 not to extend system director Pedro Garcia’s contract beyond its 2007 expiration. A recent Tennessean editorial sided with Garcia against the school board majority that wants to send him packing: “School achievement has to be the bottom line. The board members can’t quarrel with results.”
It turns out they can. When you take off the rose-colored glasses, turn off the local press boosterism, tune out the Bransford Avenue spin, and look with care and skepticism at the numbers, a different picture emerges. It’s not a contrary picture—higher TCAP scores aren’t imaginary—but there’s a good deal less than meets the eye to the “dramatic” gains that many credit Garcia for achieving.
Let’s toss three wrenches into the numbers that feed the now-popular story line of remarkable gains in Metro performance. A cautionary note up front: the test score numbers are vast and can be sliced and diced many different eye-glazing ways.
Wrench #1: if the gains are so dramatic, why isn’t Metro moving closer to state-level average performance? When the 2005 scores for Metro came out in June, the state averages hadn’t yet been released, so press accounts compared them with 2004 state numbers—the most recent available. This created a misleading picture of Metro performance relative to the rest of Tennessee, which served the Garcia spin machine nicely, but the public poorly. Upon the release of Metro results, The Tennessean reported that the jump in elementary and middle school math and reading scores brought them to a level “just shy” of state averages. This was accurate—until we learned a couple of months later (without the help of an MNPS press release) that there were parallel gains statewide.
The numbers involved here are percentages of students who achieve basic proficiency in a given subject and grade. In elementary- and middle-school math and reading, as well as on the high-school math and English gateway exams, Metro proficiencies now trail state averages by more than they did two years ago. So the “dramatic” results for Metro really amount to a continuation of the trend whereby Metro scores move in tandem with the state numbers. It’s also worth mentioning that Metro’s dismal high-school graduation rate of 60.4 percent is lower than it was two years ago, while the state’s has crept upward.
We see similar patterns in the latest “value added” results—numbers released just last week that use year-to-year changes in individual test scores to estimate gains in performance at the school, system and state levels. In isolation, the numbers for Metro looked good, and press accounts were exultant, but again the statewide trends put things in perspective. In lower grades, value-added scores are computed in four subjects (reading, math, science and social studies) for each of five grades (four through eight)—a total of 20 score categories. Did Metro improve more than the rest of the state in value-added gains from 2004 to 2005? Yes, but in just four of the 20 categories. Improvement trailed state progress in 16 out of 20.
Some see rising scores in Nashville that continue to lag state averages as evidence that schools all over Tennessee are going gangbusters. “The accomplishments of students outside of Metro should take nothing away from the accomplishments...made by Metro students,” Metro’s director of assessment, Paul Changas, put it recently in an email to school board members and administrators. But the upbeat test score news comes on the heels of changes in the test mandated by the federal No Child Left Behind (NCLB) law, which raises questions about whether sudden, widespread gains in proficiency might be more about how we measure than what kids learn.
Dan Long, the state’s executive director of assessment and e-learning, says the new tests, which are linked to specific learning criteria for each grade and subject, make it easier for teachers to teach to a standard, and provide a larger “target” for students to hit. “Instruction is patterned to deal with the test itself,” says Long, which a cynic might read to mean we’ve gotten better at teaching to the test. Metro’s Changas sees the revamped tests as potentially more consistent for measuring year-over-year achievement, but cautions that we need a few years of stable results before making reliable comparisons.
Wrench #2: Metro actually loses ground compared to the state as students move through the system. Although rising Metro test scores seem to be keeping pace with (but not moving up faster than) scores at the state level, there is evidence that year by year, a given “cohort” of Metro students (meaning: a set of students who move together from grade to grade) loses ground to the state. Look, for example, at students who were Metro third-graders back in 2000, and reached eighth grade in 2005. A third-grader in 2000 who scored in the 50th percentile for math among Metro students scored in the 44th percentile for all of Tennessee (a lower percentile because average scores statewide are higher than Metro’s). Now fast-forward by five years to the 2005 results and we find that an eighth-grader scoring in math at the 50th percentile in Metro comes in at just the 38th percentile statewide.
This pattern—declining state equivalents between grades three and eight—shows up for just about all grades, subjects and performance levels in Metro. Metro’s Changas thinks it may reflect the departure of those who go private in the early years of middle school. He points to good value-added numbers, implying that kids who do stay are individually progressing, even as their (now shrinking) grade cohort as a whole does worse compared to kids statewide.
Changas has a point—an exodus of good performers could account for the decline—but a problem for many parents and teachers is that Metro administrators don’t seem particularly interested in attrition. If Metro cohorts do progressively worse because parents don’t have the confidence to remain in the system, then how “good” is the news that the kids who stay make individual progress but lag collectively? How does attrition that worsens city results when benchmarked against one of the nation’s academically underperforming states make for noteworthy public school performance? Metro does almost nothing to examine who leaves the system and why, so where is the accountability for attrition in the Garcia view of school performance?
Wrench #3: the key national measure of state performance in education shows Tennessee scoring poorly and not improving, so what’s up with big jumps in state and local test scores? The national measure is the National Assessment of Educational Progress, or NAEP, which tests kids in grades four and eight in reading and math every two years in all 50 states. Not all kids take it, but it relies on a method called stratified random sampling to create representative samples that allow comparisons of whole states with one another. New NAEP results came out in October, and they weren’t pretty. The Tennessean opted again for sunny optimist mode, telling readers that Tennessee kids “gained ground in reading and math…echoing other, even stronger gains that students made on state-mandated exams.”
It’s true that raw scores on the NAEP crept upward, but the important point is that the percentage of Tennessee students reaching the “basic” level of achievement on the NAEP didn’t, and for the most part hasn’t moved in years. Fourth grade reading: no statistically significant gain in proficiency since 2003, or since 1992 for that matter. Fourth grade math: no gain since 2003, a slight gain since 2000. Eighth grade reading: no proficiency gain since 1998. Eighth grade math: no gain since 2003, a modest gain since 2000.
It adds up to a paradox: hefty local score jumps and similar state gains on a test developed expressly by the state to meet NCLB proficiency requirements, while the one independent national assessment using a consistent, reliable measure shows the state stagnating. What to make of it? Dan Long of the state education department calls that “a really good question.” Jan Lineberger, Tennessee’s NAEP coordinator, sees the discrepancy as “inevitable” because of the greater local standard-setting that goes into creating state tests.
Dan Long and Paul Changas both point to a motivational explanation: the TCAP is high-stakes and involves accountability because teachers and students see consequences of poor performance. Changas says that because students, teachers and schools get no specific feedback from the NAEP, “the incentives are not there.” An analysis of the paradox, which shows up in several states, by the testing watchdog group FairTest yields a different conclusion: “Score gains on one test mean little if there are not parallel improvements on tests that are not taught to, such as NAEP.” Apparently state education officials see the NAEP as less diagnostic because we don’t teach to it, while outside analysts see the NAEP as more diagnostic because we don’t teach to it. Go figure.
A simpler explanation would be that Tennessee and the other states where this paradox shows up have dumbed down their tests and basic proficiency levels to make NCLB progress look good. This is easy to assert but hard to prove. You can’t judge rigor solely by how many questions kids need to get right, but it’s disconcerting that a kid in Tennessee is “proficient” in elementary- and middle-school reading by answering just one-third of the questions correctly. In a multiple-choice exam where questions have four possible options, choosing answers randomly will get you to roughly 25 percent performance, so 33 percent hardly seems like a high bar for basic competence. As former NAEP governing board member Diane Ravitch put it in a New York Times op-ed earlier this week, “states have embraced low standards and grade inflation.”
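For readers who want to check the guessing arithmetic, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a model of the actual TCAP: the 60-question test length and the function names are hypothetical, every question is assumed to have four options, and each guess is treated as independent. It computes the number of correct answers a pure guesser should expect, and the chance that pure guessing alone reaches a one-third cutoff.

```python
from math import comb


def expected_correct_by_guessing(num_questions: int, options: int = 4) -> float:
    """Average number of correct answers from pure random guessing."""
    return num_questions / options


def chance_of_reaching_cutoff(num_questions: int, correct_needed: int,
                              options: int = 4) -> float:
    """Probability that random guessing yields at least `correct_needed`
    right answers, using the binomial distribution."""
    p = 1 / options  # chance of guessing any one question correctly
    return sum(
        comb(num_questions, k) * p**k * (1 - p) ** (num_questions - k)
        for k in range(correct_needed, num_questions + 1)
    )


if __name__ == "__main__":
    # Hypothetical test length; the real number of TCAP items is not given here.
    questions = 60
    cutoff = questions // 3  # one-third correct, the bar described above

    print("Expected correct by guessing:", expected_correct_by_guessing(questions))
    print("Correct answers needed:", cutoff)
    print("Chance a pure guesser clears the bar:",
          round(chance_of_reaching_cutoff(questions, cutoff), 3))
```

The gap between the guesser's expected score and the cutoff is only a handful of questions on a test of this assumed length, which is the worry raised above. A different test length or number of answer options would change the figures.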
Let’s bottom-line it this way (even if it means using “bottom-line” as a verb): Metro test scores are up, but so are statewide averages. The rise is genuine in the sense that the score jumps are sizeable, but the one-year gains in Nashville mask multiyear trends that are far less impressive and in some cases alarmingly weak. State and local scores are rising while reliable national measures are flat, which seriously calls into question the persuasiveness of state gains. It all makes heaps of test-performance praise for Pedro Garcia and his staff seem like an impulsive rush to judgment.
Tests are a legitimate window on educational performance, but the danger is deluding ourselves into seeing test results as the very definition of performance. In our current test-obsessed educational climate, happy test score results need tough love, not unconditional love. As school board member Kathleen Harkey puts it, “While we’re measuring how tall they are, are we measuring how tall they need to be to compete?”
Bruce Barry is Professor of Management and Sociology, Owen Graduate School of Management,