The standardized education tests that federal law now requires public schools to administer to students are capable of adequately measuring the performance of both students and schools, according to a new study from the Manhattan Institute for Policy Research.
But at a forum on the report at a Washington think tank Wednesday, some education policy analysts quickly pointed out that the study's data are not conclusive and that more research is needed.
"It is clear that if we are going to be using high-stakes tests in the future we have to worry a little more about how it is we might produce more reliable results," Jay P. Greene, a senior fellow at the Manhattan Institute and lead author of the report, said at the event at the American Enterprise Institute.
A centerpiece of the Bush administration's domestic policy, the "No Child Left Behind Act" became law in January 2002. The measure provides the most sweeping reform of American education regulations since the enactment of the Elementary and Secondary Education Act in 1965.
The law requires school districts around the country to measure the performance of individual schools and students using standardized tests developed by the states -- so-called "high-stakes tests" -- in reading, math and science. The idea is that these performance measures will help states evaluate the ways they address school problems and the reasons for their successes.
However, there is an ongoing debate in the education community about the ability of such tests to accurately measure the performance of students and educators. Critics of the law say that performance-based testing encourages educators to "teach to the test" -- to figure out ways to improve test scores without actually improving students' learning.
But because there is little empirical evidence either way, the debate on this question has long featured arguments from both sides that are supported mostly by anecdotes.
In the Manhattan Institute report, "Testing High-Stakes Tests: Can We Believe the Results of Accountability Tests?" Greene and his fellow researchers Marcus Winters and Greg Forster attempt to address this problem by comparing the results of high-stakes tests with those of other standardized tests that are not tied to accountability policies.
These so-called low-stakes tests were given to students at around the same time and cover the same subject areas as their high-stakes counterparts. Greene said that since educators do not have the same incentive to manipulate scores on low-stakes tests, they are an effective control.
He says that if the two tests produce similar results, it demonstrates that high-stakes tests do not produce the distorted outcomes that critics contend.
Greene said that his research shows that tests can be produced that provide accurate information about student and institutional performance.
"We basically found across the board a pretty high correlation between the level of student achievement on high-stakes and low-stakes tests," he said.
Greene's research team found that on average, in the two states and seven school districts examined -- Florida; Virginia; Blue Valley, Kan.; Boston; Chicago; Columbia, Mo.; Fairfield, Ohio; Fountain Fort Carson, Colo.; and Toledo, Ohio -- there was a strong correlation rating of 0.88 between high-stakes and low-stakes test scores.
An identical correlation would be 1.00. Florida was found to have the highest correlation between the two types of tests at 0.96.
But since other factors beyond the quality of education -- such as student demographics -- also influence test scores, the No Child Left Behind Act views year-to-year increases and decreases in scores as a measure of school performance.
The study found that in terms of year-to-year score changes, the high-stakes tests were generally less effective in measuring a school's ability to teach students basic skills. A moderate average correlation of 0.45 was found between the year-to-year gains on high- and low-stakes tests. Florida also topped the list in this measure with a year-to-year correlation of 0.71.
"Outside the state of Florida we did not find a very high correlation between year-to-year gains on high-stakes tests and low-stakes tests," said Greene. "I think it tells us that we have some reason to worry about high-stakes tests for measuring value added, year-to-year gains."
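The difference between the two measures can be illustrated with a small calculation. The sketch below is not taken from the report: the school scores are invented for illustration, the function names are hypothetical, and it assumes the correlations cited are ordinary Pearson coefficients, which the article does not specify. It shows how two tests can agree closely on score levels while agreeing much less on year-to-year gains.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / sqrt(var_x * var_y)


def gains(previous_year, current_year):
    """Year-to-year change in each school's average score."""
    return [curr - prev for prev, curr in zip(previous_year, current_year)]


# Hypothetical average scores for five schools (NOT data from the study).
high_stakes_y1 = [52, 61, 70, 78, 85]
high_stakes_y2 = [55, 62, 74, 79, 90]
low_stakes_y1 = [50, 60, 72, 75, 88]
low_stakes_y2 = [52, 61, 73, 78, 92]

# Correlation of score levels in the second year.
level_r = pearson(high_stakes_y2, low_stakes_y2)

# Correlation of year-to-year gains on the two tests.
gain_r = pearson(
    gains(high_stakes_y1, high_stakes_y2),
    gains(low_stakes_y1, low_stakes_y2),
)

print(f"level correlation: {level_r:.2f}")
print(f"gain correlation:  {gain_r:.2f}")
```

In this made-up sample the level correlation is close to 1.00 while the gain correlation is far lower. Gains are small differences between two scores, so they may be more sensitive to noise than the levels themselves.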
Nevertheless, Greene said the strong year-to-year correlation between Florida's aggressive high-stakes testing program and the state's low-stakes tests shows that high-stakes tests can be an effective measure of school performance.
He added that because the low-stakes test given in Florida is widely respected and similar in design and content to the state's high-stakes test, one can have a high level of confidence that a high-stakes test could be designed to accurately measure both a student's and a school's performance.
During the forum's question and answer session, William Galston, director of the Institute for Philosophy and Public Policy at the University of Maryland, said that although the Manhattan Institute study was "a terrific piece of research," its conclusions about the year-to-year performance are uncomfortably weak. He said that anything that challenges the validity of measuring gains on a year-to-year basis gets to the heart of the test-based reforms central to the No Child Left Behind Act.
"This may call into question the larger framework of standards and accountability (in the law)," said Galston, who was also a domestic policy adviser to President Clinton and supports the intent of the new reforms.
Tom Loveless, director of the Brown Center on Education Policy at the Brookings Institution, told United Press International that although the study's findings on year-to-year measurements diminished its conclusions, the study's findings on the validity of testing are reassuring.
"If the results would have been more consistent across the different states I think it would have been a more powerful study," said Loveless. "Nevertheless, I think it is good evidence that high-stakes tests do not influence bad behaviors."
Andrew Rotherham, director of education policy at the Progressive Policy Institute, a think tank affiliated with the Democratic Leadership Council, also said the study's conclusions were somewhat unbalanced but generally on the mark.
"I would say that because Florida is heavy in the sample, and that some of the evidence from other jurisdictions was more mixed, this is not a slam dunk," said Rotherham.
He added that further research is needed to determine whether policy changes made around the same time as the testing -- such as increased teacher development efforts or increased investment in schools -- may have had an influence.
Laura Hamilton, a behavioral scientist at the Rand Corp. and an expert in test-based accountability, was more critical of the study.
"I don't think they have conclusively proven what they have claimed to have proven," Hamilton told UPI. "The fact that there is a high correlation (between the high-stakes and low-stakes tests) doesn't necessarily mean the tests are telling you the same thing."
She said that without examining the similarities in the design of each test, you do not know how comparable they are from district to district. She also said that the effectiveness of any test is ultimately based on its design.
"I would like to see more attention being paid (in the policy sector) to testing design and accountability system design," she said.
Although some analysts argue that the impact of research like this can be limited by the unwillingness of those on opposing sides of the debate to look at the testing issue from the middle ground, Loveless said it is also important to keep in mind the limited body of data on the subject.
"We are just getting started in terms of solid research on standards, testing and accountability," said Loveless. "So it makes sense that people would be in disagreement because there is not a lot out there yet."