Testing Takes Teachers To Task (Jay Greene and Marcus Winters)
What would you think if you opened the Wall Street Journal to find an op-ed arguing that money managers should not be measured against performance benchmarks like the S&P? Further, the author argues, managers should not have to report performance figures to clients at all, because the requirement drives otherwise hardworking people away from a profession that, they believe, cannot be distilled into a quantitative measure.
It is difficult to imagine that such an article would appear in the Journal, which has championed measurement of standards in nearly all economic and public-policy endeavors. But change “money managers” to “public-school teachers” in the above hypothetical and you have the very real op-ed lambasting the No Child Left Behind Act (NCLB) by American Enterprise Institute fellow Charles Murray that recently appeared in the Journal.
Murray, a conservative most renowned for his book “Losing Ground,” which was a highly influential criticism of the modern welfare state, joins the chorus of NCLB discontents in arguing that high-stakes testing narrows student learning to include only test-taking skills, and that it discourages teachers whose autonomy is threatened. These are popular mantras of the teacher unions and others opposed to reforming the nation’s public-school system.
But these criticisms would only be valid if “teaching to the test” meant that students weren’t also learning how to read and add. Reducing teacher autonomy by requiring students to learn tested material is only worrisome if it doesn’t also produce real learning.
In a study for the Manhattan Institute, we empirically examined whether Murray’s criticism is valid. If the accusation that high-stakes testing leads only to drilling and not real learning is correct, then the results of high-stakes tests should differ dramatically from the results of other measures of student achievement where no stakes are attached. After all, no one has an incentive to teach to or otherwise manipulate low-stakes tests. A number of states, as well as several school districts, administered nationally respected standardized tests to which no stakes were attached, in addition to the mandated high-stakes test. To see if the learning measured by the high-stakes test would be confirmed by the results of the low-stakes test, we compared their results.
We found that these different tests produced remarkably similar results. In Florida, for example, we found that scores on these two tests correlated at an astounding 0.96 (a perfect correlation would be 1.00). Thus, the results of our study indicate that we can believe the scores on high-stakes tests. If the scores on high-stakes tests were manipulated or if students only learned skills that would help them to “beat” that particular standardized test without gaining real knowledge, then their results would not correlate with those of other respected tests on which there is no incentive to “teach-to” or manipulate.
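For readers who want to see what a correlation of this kind measures, here is a minimal sketch of the Pearson correlation coefficient in Python. The score lists are invented purely for illustration and are not data from our study; the function name `pearson_r` and the choice of a handful of school-level averages are likewise assumptions made for the example.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalized variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical average scores for six schools (illustrative only).
high_stakes = [285, 301, 310, 322, 335, 348]
low_stakes = [612, 630, 634, 651, 660, 679]

r = pearson_r(high_stakes, low_stakes)
print(f"correlation between the two tests: {r:.2f}")
```

A value near 1.00 means schools that rank high on one test also rank high on the other, even though the two tests use different scales. If the high-stakes scores were inflated by drilling or manipulation in some schools but not others, the two sets of scores would line up less closely and the coefficient would fall.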
And test scores have gone up in response to accountability programs. Though there is little research on the effects of NCLB in particular, there is actually significant evidence that accountability systems in general have improved student performance. Separate projects by researchers at Stanford University have each found that high-stakes testing has improved student proficiency, and we and other researchers have found that low-performing schools in Florida have improved in direct response to the incentives they faced under the state’s accountability system.
Murray provides us with a colorful anecdote about a dedicated schoolteacher who left the profession because the high-stakes tests stifled his professionalism. Truly, losing quality people from the teaching profession is a shame. But the goal of our education system is student learning, not teacher autonomy. And qualified teachers have little to fear from tests that accurately measure effective teaching.
It is worth noting that Murray’s larger point is bang on: focusing on the percent of students reaching an arbitrarily chosen benchmark we call “proficient,” instead of on raw scores, is imprecise and can lead to misleading results. Murray describes expertly how reporting test results as the percent who read at certain levels throws away very useful information and is prone to unreasonable spinning of the results. However, rather than using these criticisms to improve NCLB and other high-stakes testing policies, Murray would have us throw the baby out with the bathwater. The answer is not less accountability, but rather a system that uses test scores efficiently.
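The information loss is easy to see with a toy example. The sketch below uses invented scores for two hypothetical schools and an assumed cutoff of 300; none of these numbers comes from NCLB or any real test.

```python
def percent_proficient(scores, cutoff):
    """Share of students scoring at or above the cutoff, as a percentage."""
    return 100 * sum(1 for s in scores if s >= cutoff) / len(scores)


def mean_score(scores):
    """Plain average of the raw scores."""
    return sum(scores) / len(scores)


CUTOFF = 300  # assumed "proficient" benchmark, chosen only for illustration

# Hypothetical raw scores for four students at each of two schools.
school_a = [250, 295, 305, 310]
school_b = [150, 200, 300, 400]

for name, scores in (("School A", school_a), ("School B", school_b)):
    print(name, percent_proficient(scores, CUTOFF), mean_score(scores))
```

Both schools report the same share of proficient students, yet their average scores and the spread of their students differ considerably. A student who improves a great deal but stays below the cutoff also leaves the proficiency rate unchanged, which is the kind of useful information the percent-proficient figure discards.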
Though the money-manager hypothetical may seem outlandish, respected people make such arguments about public schools every day. But in education, as in anything else that matters, we have no hope of managing what we do not measure in some way. Without testing we have no way of knowing how well (or poorly) our schools are performing, and we are left to trust schools when they tell us that they are doing their best. That public schools insist that they are performing up to par should provide no more comfort than if your money manager insisted that you need not see your portfolio because he was working as hard as he could to invest your money properly.
Research suggests that high-stakes testing can improve real student proficiency. We should not go back to the days when we had no tools for measuring and holding schools accountable for teaching students even the most basic skills.
Jay P. Greene is Endowed Professor in the Department of Education Reform at the University of Arkansas and a senior fellow at the Manhattan Institute. Marcus A. Winters is a senior research associate at the Manhattan Institute and a Doctoral Academy Fellow at the University of Arkansas. This article previously appeared in National Review Online.