Can You Measure Temperature With a Spoon?

Susan Notes:

I give my comment over to Bob Valiant, adding only that I feel the same way about EDDRA-2, and about Mike Martin, as Bob does.

Bob Valiant Comment, Kennewick School District Citizens

For the past several years we have been fortunate to be part of a group started by the late Jerry Bracey, the Education Disinformation Detection and Reporting Agency (EDDRA), and, following his demise, EDDRA-2. The group follows education studies and reports and performs the task so ably described by Ernest Hemingway as "crap detection." Once a problem is detected, the group discusses the quality of the study and points out misleading and incorrect information. These discussions are rarely reported outside the group, but Mike Martin's discussion of achievement testing deserves a much wider audience. It was posted on EDDRA-2 in response to an earlier posting, "The Three Most Important Words in Education -- Assessment, Assessment, Assessment." If you would like to learn more about what achievement tests are really worth, read on. RJV

By Michael T. Martin

There are really two issues here. One is subjectivity and the other is validity. Subjectivity IS a sinister problem. In science experiments, even in very objective experiments by disinterested scientists, the "gold standard" is the double-blind protocol. Double blind means that the participant does not know whether they are in the experimental group or the control group, AND the administrator of the experiment does not know either. These protocols arose out of necessity, as it was found that bias had very subtle ways of influencing experiments. One has only to read the history of phrenology to see this. Empiricism arose to ensure that decisions were based on objective evidence rather than subjective or even dogmatic beliefs.

The real-world example that clearly illustrates this is American racism. Bias against Black Americans resulted in many unfortunate consequences. I am currently writing a summary of a recent study by Tina Wildhagen, assistant professor of sociology at Smith College, titled "How Teachers and Schools Contribute to Racial Differences in the Realization of Academic Potential." In the study, mathematics and reading tests were used as a measure of ability to predict student grades, and this prediction was compared to actual grades using a database of student and teacher characteristics. The study found that "African-American 12th graders are more likely than White 12th graders to earn grades that are lower than their 10th grade scores on math and reading tests predicted." The report stated, "The biggest reason for this unrealized stock of academic potential at the student level is that, on average, teachers perceive White students as exerting more effort and conforming more to classroom expectations as compared with African-American students." The report suggested that this bias in teachers' perceptions of African-American students' effort may stem from "the use of a seemingly neutral lens that is actually calibrated by the expectations of White middle-class culture," and that "teachers may interpret classroom behaviors that do not comply with White cultural norms as misbehavior."

What I found particularly intriguing was that the data indicated this may not have been due to racism! The study suggested, "It could be that social class mismatches between students and teachers, rather than racial incongruity alone, would help explain why teachers' perceptions of students contribute such a great deal to the racial gap in the realization of academic potential." The bias may have been less a matter of race than of middle-class teachers judging students who come from generational poverty.

Here, tests were used as objective evidence to compare against decisions that might contain bias, and, using complex mathematical controls, the bias was confirmed. However, the tests were used in a manner that corresponded with their intended usage, for which they had been carefully validated. Validity is the crucial foundation of testing that too often goes unappreciated by the public. I like to use the example of professional surgeons who use sharp scalpels to perform medical miracles, and compare that to people who go down the street slashing people while claiming they are surgeons doing good. That is not valid.

In my previous post I referred to my summary of research on achievement testing. That summary quotes Dr. Robert Linn, Dr. James Popham, Dr. Eva Baker, Dr. Daniel Koretz, and Dr. Steve Dunbar, all professors at prestigious universities and the world's foremost experts in achievement testing, who criticize high-stakes testing. They are not condemning testing per se; their entire lives revolve around promoting valid testing. Valid testing provides a crucial objective measurement of what it is designed to measure. But it has to be used by trained people, equivalent to professional surgeons, and cannot just be used to slash people willy-nilly.

The point is that high-stakes testing is a fraud. The experts in testing, the equivalent of professional surgeons, are unanimous in pointing out this fraud. As I stated in my previous post: The issue isn't even testing, the issue is fundamental reality. You can use testing for legitimate purposes, but as Dr. Popham has stated, you can't use a spoon to measure temperature. We use spoons to measure things all the time; they are perfectly suitable measuring instruments for what they are intended, but they are not intended to measure temperature. What Dr. Popham was insisting is that high-stakes testing is like using a spoon to measure temperature, and I would go even further and state that the real problem is then using that temperature to declare whether a student is sick or not. Even if you had a very exact thermometer to measure temperature, a student's temperature cannot be used to declare that he or she is sick or not. It may be evidence, but it is a misconception that it is conclusive.

And that is what we have with high-stakes testing. Even if we did have an accurate measure of temperature, it could not be used to determine illness; but we don't even have a valid means of measuring temperature. We are using spoons! Or, in my example, we are letting people slash students with scalpels and claiming this makes them healthy. My original draft summary of testing compared it to phrenology. But in reality, what high-stakes testing is most comparable to is the tests used to identify witches in Salem.

But having said that, don't discredit surgeons because people are slashed in the street. Don't discredit spoons just because they are misused. And don't discredit tests just because political idiots slash people with spoons. Subjectivity is a real concern, and there is a long continuing history of racial subjectivity that dominates the news every day. There are two famous movies that I recommend that young people view: Rashomon and Twelve Angry Men. In both movies the actual objective evidence is entirely obscured by subjective bias.

This has been the condition of education for too long. Objective evidence is very badly missing in education. My favorite example is that of phonics, which uses phony "objective" evidence but has been shown by actual objective evidence to reduce reading comprehension in children. Tests of reading comprehension show that the more phonics instruction children receive, the lower their reading comprehension scores. Some major corporations are making money hand over fist selling phonics instruction programs. If you measure success by how much money you make (i.e., privatization), then phonics is successful, but if you care about whether children can read in order to improve their lives, you need an objective measure of whether their reading instruction actually improves or destroys their reading comprehension.

— Michael T. Martin


