Test Expert: State Exam Problem Is Worse than Reported
This piece on Pearson's corrupt testing practices in New York has wings. . . flying around the Internet.
Jeff Nichols Comment: Of course, if the tests weren't high-stakes to begin with, but rather were used as the limited tools they were originally intended to be (to inform educators and policy makers with a kind of snapshot of student work across schools), the series of errors and careless planning that Fred Smith has so eloquently described here would merely be a matter of wasted public money, rather than wreaking havoc on students, teachers and schools as promotions are denied, careers ended and schools shuttered on the basis of the unreliable data Pearson provides to the state.
Holding Pearson accountable for its shoddy work is fine and good, but won't alleviate the destructive effects of basing decisions on test scores when those decisions are too important to be made in that way, no matter how carefully the tests are designed and executed.
by Fred Smith
When State Education Commissioner John B. King Jr. made a presentation to legislators in Albany earlier this month, he outlined a few expectations for Pearson, the test maker, to meet following its turbulent statewide debut in April.
Dr. King listed a handful of steps to improve item writing and test production. They included two actions aimed at accountability: one amending the test contract to include penalties for poor questions and errors in translations and the other (oddly) requiring the company to pay for an independent review of its own test development processes.
Since the State Education Department and test publisher have traditionally been self-protective partners in the testing program, the disclosure of these moves is unusual. The $32 million contract with Pearson runs through 2015.
Dr. King's actions offer the smallest hint of the depth of the latest testing mess the state finds itself in -- and at a time when the test results are more important than ever, influencing everything from a child's promotion to a teacher's evaluation to a school's possible reorganization or closure.
Perhaps it took public ridicule over a silly story with puzzling multiple-choice questions to upset the pineapple cart. Though Pearson defended the material following the revelation of what some are calling "pineapplegate," Dr. King swiftly threw the story out.
As a specialist in testing during a 33-year career spent working for New York City, I believe Pearson is to blame for the current mess we find ourselves in regarding the state exams, which are given to 1.2 million students each year in grades three through eight.
But the state is even more culpable, making bad decisions about the design of the program, particularly the contractual requirements related to field testing.
Now the partners are stuck, and neither can admit the situation is beyond repair.
Here are my concerns, based on what I know about the Pearson experience and my many years in test research and development:
Last December it became known that the April exams would contain items that were being tried out for possible inclusion on 2013's English Language Arts and math exams. These "embedded" items would not count in scoring the test -- they were only being field tested for future use.
Test makers usually create numerous versions of a test -- known as "forms" -- so they can field test a large number of items, while also having enough items that count for gauging student proficiency.
For this spring's exams, just four test forms were devised for each grade in English and math. Each form contained the operational items (that counted) and one of the four sets of tryout items.
Why did the State Education Department work with only four forms per grade when states like Massachusetts try out 15 to 20 forms? The explanation, finally revealed by Dr. King during his briefing of state legislators, is budget. He said the Education Department prints its own tests but doesn't have the capacity to produce more than four forms.
In other words, the purpose of field testing -- to develop better exams for the future, as state education officials and Pearson assured us -- is being thwarted to save money.
All the profound decisions for which the state exams are used will be based on flimsy field testing because no one could make the case for more funds.
Pearson had to have known from its experience in other states that the right way to go was to embed items on a sufficient number of test versions so it would have a large crop of items to choose from without packing too many field test questions into one form. It had to know New York's field testing scheme was problematic.
But other factors played into the decision to limit the number of questions being field tested, including parents' justified concerns that the exams had already expanded to enormous lengths to cram in more field test questions.
When parents learned last fall about the state's approach, the state was forced to retreat. (The proposal to lengthen the state exams by such a large amount led to the ouster of the state's top testing administrator, David Abrams.) Nevertheless, the state went forward with exams in April that doubled the testing time from 2011.
What was the state's solution? It increased the number of stand-alone field tests it planned to administer in June, out of obvious concern that it would not have enough questions tried out for the 2013 exams without this additional volley of tests.
So earlier this month, 12 different test versions containing reading passages with accompanying multiple-choice items were given to students in grades three to eight.
It was an ad hoc alteration of the state's contract with Pearson to compensate for April's limited yield of field-tested questions.
But the desperate effort to make up in June for what wasn't done in April is a futile one. Test experts know: Field tests can only be relied upon as a method to develop future tests to the extent that students take them seriously. Field testing in June when students are unmotivated? Not even close.
Parents get it. They staged a protest against the field tests outside of Pearson's headquarters on June 7. SchoolBook's report on the rally captured the spirit of the bright day -- the children in costumes, they and their parents carrying signs and chanting "a Field Trip Against Field Tests."
Pearson hastily issued an unsigned statement in response, invoking platitudes about raising standards, being steadfast in its commitment to work on behalf of students, and offering an unqualified statement about the benefits of field testing items "to make sure that the questions provide an accurate, fair and valid representation of what students know and can do."
That assumes they're administered under conditions similar to the circumstances that apply when the high-stakes operational tests are given.
Given that sequence of events this year, it is highly unlikely that the State Education Department and Pearson are left with enough strong, tried-and-tested exam questions for next year.
But listen to state officials and Pearson, and the message is "stay the course." Somehow they will cobble together 2013's operational exams -- the ones that will be used to justify decisions about students, teachers and schools. And no one should be shocked when Pearson finds its own testing processes valid.
The contract calls for the same approach in 2013 and 2014: embedded field test questions incorporated in the operational exams, followed by stand-alone field testing late in the school year when test-battered kids are least likely to take them seriously.
Shouldn't Pearson be firing a flare to bring attention to this looming debacle that can bring it only further embarrassment?
Some state officials criticize Pearson yet say the results are valid. Parents, taxpayers and others can't really know that until we get some more questions answered. So here are a few issues to test Pearson's pledge of transparency:
1. How many items on April's operational tests actually counted?
This will tell us the number of items that will be used to reach the high-stakes decisions the English Language Arts and math test results are being called upon to support this year. At the same time, we will learn how much testing in April was devoted to developing future exams.
2. Because April's embedded field test questions were inadequate in number, the Education Department and Pearson added 12 stand-alone reading tests to the June tryouts. The contract with Pearson did not anticipate the need for these extra forms.
How much more is it costing to assemble, administer and score these sudden, self-contained tests? What other overruns can be expected?
3. How will the listening component of the ELA be developed? According to the contract, listening passages and accompanying items were supposed to be tried out via stand-alone test forms. This did not happen in June, evidently because the 12 reading forms were deemed a priority and displaced listening. On what basis will next year's listening component be put together?
To a wider point: Is the contract a real agreement or an ever-evolving, open-tab arrangement?
4. In the interest of transparency, will all operational items be made public after the exams have been given, as the State Education Department did for years prior to 2011? This does not include the embedded items that may be used again.
If nothing else, this allays certain doubts about the makeup and quality of the exams. It's better to operate in the sunshine.
5. Will statistics be provided for the questions that were tried out in 2011 and included on the April 2012 exams? This was done in the past, and I was able to obtain the data through a Freedom of Information request. It provides a way to check the efficacy of the field testing procedures.
The two sets of data should match -- otherwise the field test sample provides a faulty basis for constructing its operational counterpart. This is especially important when stand-alone field tests are involved and students' desire to do well may be lacking.
6. Is the Education Department willing to give an independent panel of parents, educators and measurement experts input in designing, executing and reviewing statewide exams and in reporting the results? This is a truer measure of how Albany intends to move forward with the testing program than Commissioner King's requiring Pearson to investigate itself.
This test of transparency and openness doesn't require Pearson or the state to divulge the content of any items that may appear on future exams.
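For readers who want to see what the check in question 5 could look like in practice, here is a minimal sketch. It assumes the released statistics would allow computing, for each item, the share of students who answered correctly in the field test and again on the operational exam. The item names, the made-up responses, the function names and the 0.10 cutoff are all hypothetical illustrations, not the state's or Pearson's actual data or method.

```python
from statistics import mean

def proportion_correct(responses):
    """Share of students who answered an item correctly (scores are 0 or 1)."""
    return mean(responses)

def flag_drift(field, operational, threshold=0.10):
    """List items whose proportion correct moved by more than `threshold`
    between the field test and the operational exam.
    The 0.10 default is an arbitrary illustrative cutoff."""
    flagged = []
    for item_id in sorted(field.keys() & operational.keys()):
        shift = proportion_correct(operational[item_id]) - proportion_correct(field[item_id])
        if abs(shift) > threshold:
            flagged.append((item_id, round(shift, 2)))
    return flagged

# Hypothetical 0/1 scores for two items, ten students each.
field = {
    "item_a": [1, 0, 0, 1, 0, 0, 0, 1, 0, 0],  # 30 percent correct when tried out
    "item_b": [1, 1, 0, 1, 1, 0, 1, 1, 0, 1],  # 70 percent correct when tried out
}
operational = {
    "item_a": [1, 1, 0, 1, 1, 0, 1, 1, 0, 1],  # 70 percent correct when it counted
    "item_b": [1, 1, 0, 1, 1, 0, 1, 1, 1, 0],  # 70 percent correct when it counted
}

print(flag_drift(field, operational))
```

In this invented example, item_a looks far harder in the field test than it does when the exam counts, which is the kind of mismatch one might expect if students did not take the tryout seriously. A real review would use far larger samples and more than one statistic.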
Emboldened by the courage of a few parents, resistance to test-blind education is growing. Parental concerns must be respected. The ball is in the State Education Department's court -- but the ball game is changing.
Fred Smith retired from the New York City public school system as a senior analyst. He remains a consultant on testing, educational research and other statistics related to city government.
New York Times School Book