

Test maker Pearson Defends Pineapplegate


Some Background: Here is a pdf file of the actual Pineapple passage and the questions.

Taken from the Pearson bio of Jon S. Twing, Ph.D., Executive Vice President & Chief Measurement Officer, Pearson:


Before joining Pearson in 1996, Twing was senior project director in Psychometrics and Technological Applications at The Psychological Corporation/Harcourt Brace Educational Measurement. He has also served as an instructor at The University of Iowa, Trinity University, and Coe College.

Jon has a B.S. in Psychology and Business from Central Michigan University and M.A. and Ph.D. degrees in Educational Measurement and Statistics from The University of Iowa.

There's no evidence of his ever having talked to an 8th grader . . . and in this exchange on Twitter he dismisses the importance of teachers.


26 Jan Sir Michael Barber [Chief Academic Advisor, Pearson]
Future assessement equals the best games plus the best psychometricians plus real time. I wonder whether this is right?

28 Jan Jon S. Twing
@MichaelBarber9 Do we need paychometricians?


28 Jan Sir Michael Barber
@JonSTwing Most definitely combing their talent and insight with the wider education community!

28 Jan Jon S. Twing
@MichaelBarber9 Richard Feynman we needed, but my high school physics master--not so much. We need the right psychometricians.


You can see pictures of Twing's dogs, listed as family, on Facebook.


by Staff

TIME Ideas obtained this Pearson memo from an employee in New York state government who is frustrated with the lack of transparency surrounding the recent firestorm over standardized testing. The full text is below, with the exception of a phone number in the last paragraph that has been redacted. To read Andrew J. Rotherham's anatomy of the scandal, click here.


April 22, 2012

Mr. Ken Slentz
Deputy Commissioner, Office of P-12 Education
New York State Education Department
89 Washington Avenue

Albany, New York 12234

Dear Ken,

Pearson is confident that the NYS Grades 3-8 English Language Arts (ELA) and Mathematics assessments have been developed to support valid and reliable interpretations of scores for their intended uses. The "Hare and the Pineapple" passage and associated items were placed on the Grade 8 ELA test after the NYS field test data associated with the multiple choice items and the feedback from the "final eyes" committee determined that this was an appropriate passage and set of items to include on the test. Detailed background information about the passage and items are provided below.

Background on SAT 10 Items and Use in New York State

When the contract was awarded to Pearson in March 2011, part of the scope of work was to include norm-referenced items that would be administered each spring in the New York State Grades 3-8 English Language Arts (ELA) and Mathematics assessments. These items would serve two purposes -- to provide national normative data and to contribute to the student's operational score. Form B of SAT 10 was planned to be used intact to meet both requirements. Likewise, due to the planned inclusion of these normed items, Pearson planned to meet the item development target numbers with a combination of both normed and custom developed items.

In fall of 2011, the New York State Education Department (NYSED) made a determination that the SAT 10 Form B would not be used in total on the 2012 operational assessment. This decision was made due to the fact that not all SAT 10 items are aligned to 2005 New York State standards and having such items contribute to an operational score was not ideal. With this decision, two shifts resulted. The first is that if any SAT 10 Form B items were used on the operational assessment, they would not yield normative data (as the complete SAT 10 Form B is needed to establish this). Secondly, it was determined that custom developed passages and items should be placed on the operational test forms first, and if there weren't enough eligible custom items, to use the field tested SAT 10 Form B items.

Why the "Hare and Pineapple" Passage was Chosen

During test construction it was determined that with the exclusion of the SAT 10 items on the operational form there were not enough custom items developed to assess Strand 2, therefore "The Hare and the Pineapple" passage and associated items were chosen for the operational form. This was a sound decision in that "The Hare and the Pineapple" and associated items had been field tested in New York State, yielded appropriate statistics for inclusion, and it was aligned to the appropriate NYS Standard.

"The Hare and the Pineapple" passage is intended to measure NYS Standard "interpretation of character traits, motivations, and behavior" and "eliciting supporting detail". The associated six multiple choice items are aligned to the NYS Reading Standards, specifically to Strand 2. The NYS performance indicator assigned to the items is "Interpret characters, plot, setting, theme, and dialogue, using evidence from the text".

It is important to note that the use of SAT 10 items as operational items will not occur going forward as Pearson is developing an adequate number of custom items aligned to the Common Core Standards.

Concerns with Items Associated with "Hare and Pineapple"

There have been two items of the set of six that have been challenged by NY teachers and students as the test was under way April 17-19, 2012 -- Item 7 and Item 8. The correct answers and rationales to Item 7 and Item 8 are as follows:

  • Item 7: The correct answer is C. The question regarding the animals' possible motivation for eating the pineapple requires a reader to infer the correct answer from clues conveyed in the text. While all of the options are plausible motivations, the most likely answer is that the animals were annoyed. Paragraph 13 indicates that the animals support the pineapple to win the race because they assume the pineapple has a clever plan. However, the pineapple never moves during the race. From these clues and events, a reader can infer that the animals are annoyed. The text does not support the inference that the animals are motivated by hunger, excitement, or amusement.


  • Item 8: The correct answer is D. The question regarding the wisest animal requires the reader to apply close analytic reading skills to determine which of the choices represents the wisest animal based on clues given in the text. The moose and the crow are the two animals that present the incorrect idea that the pineapple has a clever plan to win the race. This idea is proven false when the hare wins the race. The hare is presented as incredulous that a pineapple would challenge him to a race, but overconfidently [sic] agrees to race a pineapple.

    Finally, the owl declares that "Pineapples don't have sleeves," which is a factually accurate statement. This statement is also presented as the moral of the story, allowing a careful reader to infer that the owl is the wisest animal.



    Previous Use of "Hare and Pineapple" Passage and Items

    The Stanford 10 Form B, which contains the passage and the six multiple choice items, is used exclusively as a secure form. This means that this form is available only for state-wide or large district customers who agree to maintain security of the documents at all times. Between 2004 and 2012 the form was previously used in six other states and three large districts. In 2012, the only state-wide use of this form was in NY State. Until the events of this past week, we did not have any prior knowledge that the passage entitled "The Hare and the Pineapple" had any controversy associated with it from any prior use.

    State administrations include:

    • Alabama 2004-2011

    • Arkansas 2008-2010

    • Delaware 2005-2010

    • Illinois 2006-2007

    • New Mexico 2005-2007

    • Florida 2006

    Large District Administrations:

    • Chicago 2006-2007

    • Fort Worth

    • Houston

    Item Performance

    Item statistics are provided for the six items related to the Hare and the Pineapple, both based on New York state field test in 2011, and a representative sample at the national level (2002). As can be observed from the statistics on the following page, the items performed reasonably well. Based on the New York State students' performance, item p values range from 0.32 to 0.86, indicating a good selection of easy and challenging items related the passage. The discrimination powers (based on point biserial values) of the items are also high, ranging from 0.27 to 0.47. The industry standard requires point biserial values to be higher than 0.20.
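[Illustration, not part of the Pearson memo.] For readers unfamiliar with the two statistics cited above, the sketch below shows how an item p value and a point biserial are conventionally computed. The response data, the function names `item_p_value` and `point_biserial`, and the `MEMO_THRESHOLD` constant are all hypothetical; the 0.20 cutoff is simply the figure the memo cites, not something verified here.

```python
from math import sqrt
from statistics import mean, pstdev

# The cutoff the memo describes as the industry standard (the memo's claim).
MEMO_THRESHOLD = 0.20


def item_p_value(item_scores):
    """Proportion of students answering the item correctly (scores are 0 or 1)."""
    return mean(item_scores)


def point_biserial(item_scores, total_scores):
    """Pearson correlation between a 0/1 item score and the total test score."""
    p = mean(item_scores)
    sd_total = pstdev(total_scores)  # population standard deviation
    if p in (0, 1) or sd_total == 0:
        raise ValueError("undefined when every student has the same score")
    # Mean total score of students who got the item right vs. wrong.
    right = mean(t for s, t in zip(item_scores, total_scores) if s == 1)
    wrong = mean(t for s, t in zip(item_scores, total_scores) if s == 0)
    return (right - wrong) / sd_total * sqrt(p * (1 - p))


# Hypothetical data: eight students, one item, and their total test scores.
item = [1, 1, 1, 0, 0, 1, 0, 1]
total = [50, 45, 40, 20, 25, 38, 30, 42]

print(f"p value: {item_p_value(item):.3f}")
print(f"point biserial: {point_biserial(item, total):.3f}")
print("above the memo's cutoff:", point_biserial(item, total) > MEMO_THRESHOLD)
```

A p value is the share of students who answered correctly, so 0.32 marks a hard item and 0.86 an easy one. The point biserial measures how strongly getting the item right goes along with a higher total score. Whether the total should include the item itself is a design choice this sketch leaves to the caller.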

    Background to SAT 10 Development

    The National Research Program for the standardization of Stanford 10 took place during the spring and fall of 2002. The purpose of the National Research Programs were to provide the data used to equate the levels and forms of the test series, establish the statistical reliability and validity of the tests, and develop normative information descriptive of achievement in schools nationwide. Testing for the Spring Standardization Program of all levels and forms of Stanford 10 took place from April 1, 2002, to April 26, 2002. Testing for the Equating of Levels Program, Equating of Forms Program, and Equating of Stanford 10 to Stanford 9 took place from April 1, 2002, to May 24, 2002. Approximately 250,000 students from 650 school districts across the nation participated in the Spring Standardization Program, with another 85,000 students from 385 school districts participating in the spring equating programs. Some students participated in more than one program.

    Testing for the Fall Standardization Program took place from September 9, 2002, to October 18, 2002. Testing for the Equating of Levels Program, Equating of Forms Program, and Equating of Stanford 10 to Stanford 9 took place from September 9, 2002, to November 1, 2002. Approximately 110,000 students participated in the Fall Standardization and Equating Programs. Some students participated in more than one program.

    The majority of individuals who wrote test items for Stanford 10 were practicing teachers from across the country with extensive experience in various content areas. Test item writers were thoroughly trained on the principles of test item development and review procedures. They received detailed specifications for the content area for which they were writing, as well as lists of instructional standards and examples of both properly and improperly constructed test items.

    As test items were written, and received, each test item was submitted to rigorous internal screening processes that included examinations by:

    • content experts, who reviewed each test item for alignment to specified instructional standards, cognitive levels, and processes;

    • measurement experts, who reviewed each test item for adequate measurement properties; and,

    • editorial specialists, who screened each test item for grammatical and typographical errors.

    The items were then administered in a National Item Tryout Program which provided information about the pool of items from which the final forms of the test were constructed. The information provided by the Stanford 10 National Item Tryout Program included:

    • The appropriateness of the item format: How well does the item measure the particular instructional standard for which it was written?

    • The difficulty of the question: How many students in the tryout group responded correctly to the item?

    • The sensitivity of the item: How well does the item discriminate between students who score high on the test and those who score low?

    • The grade-to-grade progression in difficulty: For items trie~ out [sic] in different grades, did more students answer the question correctly at successively higher grades?

    • The functioning of the item options: How many students selected each option?

    • The suitability of test length: Are the number of items per subtest and recommended administration times satisfactory?

    In addition to statistical information about individual items, information was collected from teachers and students concerning the appropriateness of the questions, the clarity of the directions, quality of the artwork, and other relevant information.
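[Illustration, not part of the Pearson memo.] Two of the tryout questions listed above, option functioning and grade-to-grade progression, reduce to simple tallies. The sketch below uses invented data, not figures from the memo or from Stanford 10, and `option_counts` and `progresses_by_grade` are hypothetical helper names.

```python
from collections import Counter


def option_counts(responses, options="ABCD"):
    """Count how many students chose each option, including unchosen ones."""
    counts = Counter(responses)
    return {opt: counts.get(opt, 0) for opt in options}


def progresses_by_grade(p_by_grade):
    """True if the proportion correct never drops as the grade rises."""
    grades = sorted(p_by_grade)
    return all(p_by_grade[a] <= p_by_grade[b] for a, b in zip(grades, grades[1:]))


# Hypothetical responses from eight students to one four-option item.
responses = list("CCACDCCB")
print(option_counts(responses))

# Hypothetical proportion correct for the same item tried out in three grades.
print(progresses_by_grade({6: 0.41, 7: 0.52, 8: 0.60}))
```

An option that almost nobody selects, or a wrong option that draws more students than the keyed answer, is the kind of pattern such a tally is meant to surface.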

    We trust this information is helpful to you. Please know that Pearson is ready to assist you and answer any additional questions you may have. As such, don't hesitate to contact me at

    Most Sincerely,

    Jon S. Twing, Ph.D.
    Executive Vice President & Chief Measurement Officer
    Pearson


    Diane Ravitch blog

    Earlier today, we saw a leaked memo in which Pearson defended the tale of the Hare and the Pineapple. It was field-tested, the spokesman said. It was psychometrically sound. It was just a splendid test item, and the corporation couldn't understand what all the fuss was about.

    There is an adage: When you are in a hole, stop digging.

    Well, Pearson, keep digging. It gets better and better.

    Fred Smith, who is a testing expert who worked at the old New York City Board of Education, often comments on testing issues in the press and on the New York City parent blog. Today, he wrote the following at the New York City parent listserv:

    Not only that, folks:

    This justification from Pearson comes two days after Commissioner John King canned the pineapple. It shows that the Pineapple and the Hare was nationally-normed ten years ago -- when the Stanford 10 was standardized.

    SED contracted for Pearson to supply 20-25 nationally-normed items per grade per subject (120 to 150 items for ELA and for math). How many of these were developed in 2002? In the testing industry, norms grow stale over time and tests are re-normed to stay up-to-date with achievement levels in the current test population. In short, old norms (based on performance exhibited by 3rd through 8th grade reference groups from ten years ago) are unacceptable. It appears that Pearson resorted to old data in its item bank in order to cut costs.

    We also learn from the memo that the Pineapple item was field tested by Pearson in New York State in 2011. [Under the contract, Pearson did stand-alone field testing in 2011 in order to develop the operational 2012 exams given three weeks ago.] This was done despite warnings that stand-alone field testing is prone to being unreliable, because students are not always well-motivated to take such field tests. That was the very reason given for embedding field test items on the April exams.

    The fact that the Pineapple item stats gathered from the 2011 field test match up nicely with the stats from data generated by a decade-old national standardization sample has little relevance to the case that Pearson is trying to advance here -- that items such as this "have been developed to support valid and reliable interpretations of scores for their intended uses."

    Plain and simple -- this is a CYA memo from the publisher who apparently acted to increase its profit margin.

    ~Fred Smith

    And Lisa Donlan, a Manhattan parent activist, wrote as follows:

    A perfect object lesson in why psychometric pseudo science (and justifying babble) should not replace real live human qualified and trained TEACHERS and teacher-generated assessments.

    Why trust this flawed model with evaluating the teaching and learning of our kids, teachers, schools and districts?

    And why cut our school budgets to the bone so we can afford these outrageous for-profit vendors, when we (under) pay teachers and administrators to assess effective teaching and learning every day?

    This is a sham, a scam and all about the ADULTS, not the kids!

    Lisa

    Pineapplegate is the gift that keeps on giving, and Pearson just won't let go. Keep digging.

    — Time staff, Diane Ravitch, Fred Smith, & Lisa Donlan
    Time Ideas

    2012-05-04

    http://ideas.time.com/2012/05/04/pineapplegate-exclusive-memo-detailing-the-hare-and-the-pineapple-passage/


