Orwell Award Announcement | SusanOhanian.Org

NY1 Exclusive: Memo Shows Company Aware Of State Test Errors

If the public were allowed to look at these tests, they'd find a lot more errors.

by Staff

Officials with the company responsible for this year's state English and math tests have not commented publicly on recently discovered errors in the exams, but a memo obtained exclusively by NY1 and displayed below shows the company has admitted mistakes internally. NY1's Lindsey Christ filed the following report.

by Lindsey Christ

The state's highest education official is no longer downplaying the errors in the high stakes English and math exams.

"The mistakes that have been revealed are very disturbing," said New York State Schools Chancellor Merryl Tisch.

Almost 30 different test questions have now been declared invalid because they're confusing or have outright errors. And now Pearson Publishing is scrambling to explain what went wrong and how it's going to fix things.

NY1 obtained a memo that an executive vice president at the company sent to the head of the state's testing program. The executive wrote, "We are committed to eliminating any gaps identified by the New York State Education Department between expectation and our performance."

The day before the memo was sent, unhappy state officials had called Pearson. It was a Sunday, just after students finished taking the exams. They had already pulled six questions from an English exam related to a bizarre passage about a talking pineapple. Then they'd yanked three math questions that didn't add up and made teachers re-score a writing section where the grading guide was off.

But most of the errors were discovered in translations of the math tests into five foreign languages and Braille. Twenty questions either had no correct answer or more than one.

"These inexcusable errors from typographical to translation to a nonsense question," Tisch said.

The Pearson executive wrote that an investigation is underway but said many errors seemed to result from a lack of proofreading rather than from the translation itself. He mentioned a math question where a negative sign somehow became a positive sign in a translated version. In another case, the translators seem to have confused common middle school math terminology, replacing the term "mean" with a translation of the term "median."

The memo lays out steps Pearson might take to prevent similar errors in the future and is peppered with sheepish yet eager phrases, like: "Pearson agrees that we need to work diligently to improve" and "we strive for continuous improvement and pledge to continue to learn and improve as we work together."

The executive also promises to present the state with "a more comprehensive plan with timelines, tasks, responsibilities and outcomes clearly articulated and documented."

Chancellor Tisch said she will give the company one more year. However, some parents and teachers want the state to cancel the company's five-year, $32 million contract. They say students don't get a second chance with high stakes tests, so why should the test company?

Memo From Pearson

NCS Pearson, Inc.
2510 North Dodge
Iowa City, Iowa 52241 USA

April 30, 2012

Mr. Ken Slentz
Deputy Commissioner,
Office of P-12 Education
New York State Education Department
89 Washington Avenue
Albany, New York 12234

Dear Ken,

Pursuant to our discussion of Sunday, April 29, 2012, Pearson stands behind our work in New York, as we do the work provided by our subcontractors. As such, we are committed to eliminating any gaps identified by the New York State Education Department between expectation and our performance in the spring of 2012. In this regard, you identified two global issues I want to address: performance of our translation subcontractor (Eriksen Translations Inc.) and quality of the work supporting the New York State Testing Program scoring guides.

1. Translation

As you may recall, Eriksen Translations Inc. is our identified subcontractor performing translations in New York in partial fulfillment of our Grades 3-8 assessment contract. Eriksen Translations Inc. is also an identified Minority and Women Business and, as such, helps both Pearson fulfill contractual requirements and the state meet its goals. This spring, several translation issues were identified, ranging from the lack of a correct response option for some multiple choice items (and/or different response options than in the source English version of the assessments) to omitted words or phrases, typesetting/formatting errors, and errors in vocabulary or the translation itself.

All of these issues were introduced during translation and the subsequent typesetting of the translated versions of the tests. Pearson and Eriksen are already documenting these issues and we are taking further actions to better understand why and how they occurred. For example, Eriksen is performing a "root cause" analysis working in coordination with Pearson Organizational Quality to identify required corrective actions. It is premature at this juncture to determine changes in procedures and processes until the investigation into cause is completed. However, we have identified several options for consideration regarding process improvement:

* Enterprise scheduling. One area of concern for translations is the amount of lead time required to accommodate the translation process. As you may recall, Eriksen must translate the tests into five different languages (traditional Chinese, Haitian Creole, Korean, Russian, and Spanish). Furthermore, Eriksen uses both a forward and backward translation process, where the base English language test is forward translated into the various target languages, which are then back translated to English and compared against the English language source. Such an iterative process allows for the correction of various aspects of the translation. This process, however, requires that Eriksen start with the final English language test forms. That did not occur this year. This year, because of the compressed schedule, Eriksen started the translation process during one of the review stages of the English language assessments. Many changes were introduced into the English language assessments after Eriksen started, causing unanticipated rework and versioning control issues. Going forward, we plan to include Eriksen's translation needs explicitly in the enterprise schedule such that we can quantify the risks that schedule changes will have on Eriksen's ability to follow their necessary work flow.

* Production process. While the root cause analysis is not yet complete, many of the types of issues and errors seen to date involved typesetting and/or proofreading errors rather than actual translation errors. For example, incorrectly changing "(a+6) and (a-3)" to "(a+6) and (a+6)" is a typesetting and/or proofreading error rather than a translation error. We can address such issues by providing support to Eriksen for desktop publishing, proofreading, or typesetting, or by providing an additional round of independent proofreading. Furthermore, making simple changes in how EPS files (Encapsulated PostScript files, which are typically self-contained files for the transfer and display of graphics and art) are exchanged between Pearson and Eriksen can minimize the chance that additional errors will be introduced into the process.

* Independent verification. Since Eriksen uses a forward and backward translation process, it would be advisable to add a third-party independent translator to verify and document the decisions made to resolve inconsistencies between the two versions (i.e., the English source and the English version resulting from translating back to English from the target language). It is at this stage where varied judgment regarding correct vocabulary use could affect the quality of the translation. For example, while it will not be known for certain until the root-cause analysis is performed, the incorrect use of the word "median" instead of the correct use of the word "mean" might have resulted from decisions made during this stage. Regardless of the specific actions taken (as guided by the root-cause analysis and in consultation with the NYSED), Pearson is ready to improve the procedures and ultimate quality of the translation process and outcomes; these suggestions represent our earliest thoughts.

2. Scoring Guides

The complete review of the scoring guides, while not documenting any significant errors or immediate action items, did reveal that the scoring guides in general need improvement to become the exemplar documents expected by the NYSED. Pearson agrees that we need to work diligently to improve these guides used by teachers to score constructed response items in New York. While we need more time to pull together a comprehensive plan (working with our own scoring experts and process engineers), some of our ideas for immediate action include:

* Mining information from field testing, prompt and rubric development. Typically, during the development of a constructed-response item, the logic for a fully correct score and each partially correct score is documented and translated into rules for scoring. While this is standard practice, there are additional processes that can be undertaken. For example, during field testing, a host of unanticipated, but potentially correct (as well as incorrect) answers will be obtained from students. These answers are typically reviewed to verify that the anticipated correct answers are indeed discovered in student responses. In addition, we plan to review additional student responses for novel solution sets and document the various ways in which students obtain partial credit responses. Ultimately, this might require a larger sample of student responses, but such data will allow us to document actual student performance across a variety of scenarios leading to potentially correct responses.

* Expert Review. Similar to the recent post-hoc review performed by the Regents Fellows and independent Pearson experts for the current scoring guides, we should also incorporate an independent expert review into our process for the development of the scoring guides as a routine process going forward. We are also considering hiring a dedicated scoring resource to work in and with the content development team to help align content and performance scoring activities.

* "Test Hacker" Review. One idea we have discussed that would be particularly applicable to constructed-response questions and their associated scoring rules would be to ask a team of savvy subject matter experts who have not been associated with the item development to take the test, with direction to find flaws, errors, or otherwise defeat the assessment. We could then review the range of responses and/or interview these hackers to understand better what they tried and how robustly the items withstood various attacks. These same "hacker responses" can be scored using the developed scoring guides as another test of the ability of the scoring guides to assign correct partial and full credit.

* Expanded use of prototype items. Currently, during field testing, the prototype or exemplar items are chosen to represent the wider range of items for the development of scoring rules and guides. These items receive the full complement of anchor, practice, and qualification sets. We could expand this such that student responses and complete descriptions of the non-prototype items also receive the full anchor, practice, and qualification sets. Currently the non-prototype items have only anchor and practice sets. Pearson could develop these complete training sets on all of the items generated and reviewed, in anticipation of developing a scoring guide for each and every item (even if we do not choose a particular item at that particular time to be included in an operational test).

Again, these are our immediate suggestions on how to improve the overall quality of the scoring guides. We would like to take additional time to develop and vet a more comprehensive plan with timelines, tasks, responsibilities and outcomes clearly articulated and documented. Pearson is here to support you as we transition to cutting-edge assessments measuring the Common Core State Standards and college and career readiness. As such, we strive for continuous improvement and pledge to continue to learn and improve as we work together.

As always, if you have any questions or need clarification or additional information, please drop me a note at jon.s.twing@pearson.com or call me at 319.331.6547.

Jon S. Twing, Ph.D.
Executive Vice President & Chief Measurement Officer

— Staff





This site contains copyrighted material the use of which has not always been specifically authorized by the copyright owner. We are making such material available in our efforts to advance understanding of education issues vital to a democracy. We believe this constitutes a 'fair use' of any such copyrighted material as provided for in section 107 of the US Copyright Law. In accordance with Title 17 U.S.C. Section 107, the material on this site is distributed without profit to those who have expressed a prior interest in receiving the included information for research and educational purposes. If you wish to use copyrighted material from this site for purposes of your own that go beyond 'fair use', you must obtain permission from the copyright owner.