How Feasible is Adequate Yearly Progress (AYP)? Simulations of School AYP “Uniform Averaging” and “Safe Harbor” under the No Child Left Behind Act

Note: This is not the full report, which contains charts and mathematics that won't appear correctly here. For the full report, go to url below.

The No Child Left Behind Act of 2001 (NCLB) requires that schools make “adequate yearly progress” (AYP) towards the goal of having 100 percent of their students become proficient by the year 2013-14. Through simulation analyses of Maine and Kentucky school performance data collected during the 1990s, this study investigates how feasible it would have been for schools to meet the AYP targets if the mandate had been applied in the past with the “uniform averaging” (rolling averages) and “safe harbor” options, which have the potential to reduce the number of schools identified as needing improvement or corrective action. Contrary to some expectations, applying both options would do little to reduce the risk of massive school failure that results from unreasonably high AYP targets for all student groups. Implications of the results for the NCLB school accountability system and possible ways to make the current AYP more feasible and fair are discussed.

Policy implications of this study need to be discussed carefully, because the findings are based on a simulation analysis of past school performance data in a single grade and a single subject area from two selected states. It should be noted that the study rests on some untested assumptions about school AYP measures and targets within the parameters of the NCLB, and that the actual results could be quite different if the two states make different choices (e.g., using an index measure of AYP, or increasing the AYP target in a nonlinear, stepwise fashion). Whatever estimation methods are used, this study might underestimate or overestimate the progress schools will make under this new legislation. The results might have been different if schools had faced in the past the stronger incentives embodied in the current AYP rules. Moreover, the results might differ if the performance standard used in the past is significantly higher or lower than the current performance standard adopted under the new testing systems in both states. However, the comparison of Kentucky and Maine (high-stakes vs. low-stakes testing environments, both with challenging state assessments and high performance standards) can give us insight into the possible consequences of the NCLB AYP policy for schools across the nation.

With these caveats in mind, the results of this simulation analysis provide very gloomy projections of schools’ chances of meeting the AYP target, warning federal and state education policymakers of massive school failure under the NCLB. It does not appear feasible for many schools across the nation to meet the current AYP target within its 12-year timeline. It is not realistic to expect schools to make achievement gains far larger than those they made in the past. Many schools are doomed to fail unless drastic action is taken to modify the course of the NCLB AYP policy or slow its pace. Contrary to some expectations, using both the rolling average and safe harbor options does little to reduce the risk of massive school failure. Although the rolling average can help produce more stable estimates of school performance, it hardly reduces the risk of school failure. The safe harbor option also fails to provide a strong safety net for at-risk schools, despite what its name implies.
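
To make the two options concrete, the following is a minimal sketch of how a single school's AYP status might be checked. It is not the study's simulation code. It assumes a target that rises linearly to 100 percent over 12 years, a rolling average over the most recent three years with the school credited with the better of its latest score and that average, and a safe harbor rule under which the percentage of non-proficient students must fall by at least 10 percent from the prior year. All function names and the example school are hypothetical.

```python
from statistics import mean

def linear_target(baseline, year, total_years=12):
    """Assumed target: percent proficient rising linearly from baseline to 100."""
    return baseline + (100.0 - baseline) * year / total_years

def rolling_average(scores, window=3):
    """Mean of the most recent `window` yearly values (fewer if not available)."""
    return mean(scores[-window:])

def meets_safe_harbor(prev_pct, curr_pct, reduction=0.10):
    """True if the percent NOT proficient fell by at least `reduction` (relative)."""
    prev_non = 100.0 - prev_pct
    curr_non = 100.0 - curr_pct
    return curr_non <= prev_non * (1.0 - reduction)

def makes_ayp(scores, target, window=3):
    """Pass on the better of latest score and rolling average, else try safe harbor."""
    if max(scores[-1], rolling_average(scores, window)) >= target:
        return True
    return len(scores) >= 2 and meets_safe_harbor(scores[-2], scores[-1])

# Hypothetical school: 30 percent proficient at baseline, gaining 2 points a year.
scores = [30.0, 32.0, 34.0]
target = linear_target(baseline=30.0, year=3)
print("target:", target)
print("rolling average:", rolling_average(scores))
print("makes AYP:", makes_ayp(scores, target))
```

In this hypothetical case the school improves steadily yet meets neither the target nor the safe harbor threshold, which is the pattern the simulation results describe.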

When a majority of schools fail, there will be neither enough model sites for benchmarking nor enough resources for capacity building and interventions. This situation poses a challenging question for policymakers: is it the school or the policy that is really failing? The validity of the NCLB school accountability policy is ultimately threatened if such widespread school failure occurs as an artifact of policy mandates whose unrealistically high expectations were not based on scientific research and empirical evidence.

One approach that policymakers can consider to make the AYP targets more realistic and fair is to use an effect size measure for guidance. For example, one might reasonably expect schools to make progress every year by, say, 20% of the standard deviation of the school-level percent proficient measure; this amounts to about 2.5-3.0 percent in Kentucky and 1.5-2.0 percent in Maine. This amount of progress may be regarded as small by conventional statistical standards (Cohen, 1977), but it is exactly what an average school in both states managed to accomplish in the past. In a similar vein, one can consider setting the safe harbor threshold for a subgroup at a certain percentage of the standard deviation (e.g., reduce the percentage of non-proficient low-income students by 10% of the standard deviation). A similar suggestion, along with the use of scale scores rather than percent proficient, was made by other analysts (Linn, Baker, & Betebenner, 2002).
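
The arithmetic behind this suggestion can be sketched as follows. The standard deviations used here are not reported in this summary; they are back-calculated from the figures above (a yearly gain of 2.5-3.0 at 20% of the standard deviation implies a standard deviation of roughly 12.5-15 in Kentucky, and 1.5-2.0 implies roughly 7.5-10 in Maine). The 30 percent baseline and the function names are hypothetical, chosen only for illustration.

```python
def annual_gain_target(school_sd, effect_size=0.20):
    """Yearly gain in percent proficient equal to effect_size times the school-level SD."""
    return effect_size * school_sd

def years_to_full_proficiency(baseline, annual_gain):
    """Years needed to climb from baseline to 100 percent at a constant yearly gain."""
    return (100.0 - baseline) / annual_gain

def required_annual_gain(baseline, years=12):
    """Constant yearly gain needed to reach 100 percent within `years`."""
    return (100.0 - baseline) / years

# Implied school-level SDs (assumption: back-calculated from the text's figures).
implied_sd = {"Kentucky": (12.5, 15.0), "Maine": (7.5, 10.0)}
baseline = 30.0  # hypothetical starting percent proficient

for state, (low_sd, high_sd) in implied_sd.items():
    low_gain = annual_gain_target(low_sd)
    high_gain = annual_gain_target(high_sd)
    print(state, "effect-size gain per year:", round(low_gain, 2), "to", round(high_gain, 2))
    print(state, "years to 100 percent at the higher gain:",
          round(years_to_full_proficiency(baseline, high_gain), 1))

print("gain per year required by a 12-year timeline:",
      round(required_annual_gain(baseline), 2))
```

Under these assumptions, a school starting at 30 percent proficient and gaining 3 points a year would need about 23 years to reach 100 percent, whereas a 12-year timeline requires nearly 6 points a year. The gap between the two figures is what an effect-size-based target is meant to close.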

While using an effect size metric with scale scores may help set more realistic performance targets and better recognize schools’ academic progress, it is not permissible under the current law. This idea also raises the questions of whether to use the standard deviation of student-level test scores or of school-level average test scores, and whether to derive the standard deviation from the original test score variance or from residual variance adjusted for demographic differences among students and their schools. In Maine and Kentucky, the school-level standard deviation was only 40 percent of the student-level standard deviation of mathematics achievement scores. Once the differences among schools in their students’ racial and socioeconomic background characteristics are taken into account, the adjusted school-level variance of residuals is reduced further, to half of the original school-level variance (see Lee & Coladarci, 2002, for the analysis of within-school vs. between-school math achievement distributions in Maine and Kentucky).

Using different methods with different measures would produce different results and, consequently, different conclusions. Whether one prefers a criterion-referenced or a norm-referenced approach to setting AYP targets and evaluating school progress, the ultimate concern is not simply improving the feasibility of schools’ meeting their AYP targets in the short term but rather enhancing the schools’ capacity for sustained academic improvement over the long haul. Given the limited resources available from the federal government and the limited capacity of state agencies, reducing the number of schools identified as in need of improvement would help states provide more targeted assistance to a smaller number of disadvantaged schools that have a large number of at-risk students. Nevertheless, the application of AYP options such as rolling averages and safe harbor should not be compromised by the prospect of limited support or by short-term interests in reducing school identifications. The long-term success of a school accountability system does not depend on the number of passing schools but on the results of student achievement.

— Jaekyung Lee, SUNY Buffalo
