The Problem with Test-Based Accountability
In 1983, A Nation at Risk warned that “the educational foundations of [America] are presently being eroded by a rising tide of mediocrity that threatens our very future as a Nation and a people,” contending that public education had “lost sight of the basic purpose of schooling” (National Commission on Excellence in Education, 1983, p. 5). The report’s alarmist rhetoric laid the foundation for a metric-based system of accountability at both the state and federal levels. That system sought to track the quality of education through quantitative measures, rewarding schools that demonstrated improvement and punishing those that did not. However, the rising tide of test-based accountability has not only failed to demonstrate empirical success but has also promoted principles and practices hostile to democracy.
The fundamental flaw of test-based accountability is that it treats test scores as an objective measure of school quality, when student performance is shaped by circumstances far beyond the classroom. Accountability places the burden of economic and social progress squarely on the shoulders of public education. This overlooks the obvious: economic inequality predicts academic achievement (Bowles & Gintis, 1976). Diane Ravitch (2010) calls this oversight A Nation at Risk’s “single greatest flaw” (p. 32). Indeed, socioeconomic status (SES) is among the strongest predictors of academic performance. For instance, low SES in childhood is “related to poor cognitive development, language, memory, socioemotional processing, and consequently poor income and health in adulthood” (American Psychological Association, 2017, para. 3). Standardized tests cannot account for these disparities, particularly when the schools serving low-SES communities are often under-resourced (American Psychological Association, 2017).
The effect of socioeconomic status on academic achievement has not gone unnoticed by advocates of test-based accountability, who have proposed value-added measurement (VAM) as a solution. VAM treats teachers as the lens through which student achievement is understood. Rather than focusing on individual test scores, it measures progress over time, aiming to look past individual circumstances to the bigger picture and ostensibly accounting for differences in socioeconomic background (Rand Corporation, n.d.). For example, to calculate a value-added measure for a given teacher, administrators take that teacher’s students’ scores from the prior year, adjust them to account for socioeconomic disparities, and use them to predict how the same students should score in the teacher’s class. The predictions are then compared against the students’ actual scores: teachers whose students score above the prediction are labeled high-value, and those whose students score below it, low-value. However, the appeal of value-added assessment is largely superficial. By relying on data, “technical experts could evaluate teachers and schools without regard to the curriculum or actual lived experience of the student,” depersonalizing both student and teacher (Ravitch, 2010, p. 189). In effect, VAM intensifies the already hostile climate of test-based accountability for teachers and students while failing to address the obstacles to achievement posed by socioeconomic status. Defining schools in terms of test scores thus risks overlooking the many other variables that influence academic performance. Despite these shortcomings, A Nation at Risk became a period-defining indictment of public education, and its legacy has carried into the present through No Child Left Behind (NCLB) and Race to the Top (RTTT).
Yet while A Nation at Risk was merely a set of recommendations, No Child Left Behind and Race to the Top were federal policies with real consequences.
No Child Left Behind was once hailed as a “bipartisan breakthrough,” an engine for academic progress that would uplift America’s underserved students (Darling-Hammond, 2007, p. 1). However, in the twenty-four years since the bill was signed into law, NCLB’s legacy has become increasingly contested. In practice, NCLB radically altered the purpose of testing, transforming it from an informational and diagnostic practice into one with tangible consequences for the futures and livelihoods of students, teachers, and administrators.
For example, to encourage higher test scores, many states introduced merit pay systems that tied teachers’ and administrators’ wages and bonuses to their students’ performance on standardized tests (Spring, 2023). In New York City, then-Schools Chancellor Joel Klein announced bonuses of up to $40,000 for superintendents whose districts’ test scores improved (Ravitch, 2010). However, merit pay overlooks two considerations that reveal the damaging nature of test-based accountability.
First, NCLB’s accountability measures focused exclusively on math and reading, both important subjects but by no means a sufficient education. Once teacher and administrator pay was tied to students’ test performance, pressure mounted to devote more class time to preparing students for the math and reading questions they could expect to see on tests. As a result, the curriculum narrowed. In effect, merit pay came at the expense of a holistic education, and schools that continued to offer a comprehensive curriculum risked penalties in the form of lower wages and school closure.
Second, the pressure to keep schools open and qualify for bonuses encouraged a dramatic lowering of proficiency standards. In New York, from 2006 to 2009, students appeared to make marked gains: the share reaching proficiency on the state-administered math test jumped from 28.6 percent to 63.3 percent. This seemingly incredible rise, however, reflected a drop in the benchmark for proficiency, which fell from 59.6 percent in 2006 to 44 percent in 2009 (Ravitch, 2010). New York schools could therefore boast of higher test scores without achieving real gains in learning. The problem is compounded by an increased emphasis on test preparation in the lead-up to annual tests, when teachers focus on the questions students can expect to see and strategies for answering them rather than on cultivating a real understanding of the material, all at the expense of subjects not being tested. This practice, known as teaching to the test, has been widely criticized as ineffective (Ravitch, 2010).
When the Obama administration took office in 2009, there was hope that many of NCLB’s flawed practices would be reevaluated. However, Race to the Top, a competitive grant program funded through the American Recovery and Reinvestment Act of 2009, entrenched these practices while placing additional pressure on schools through federal funding incentives. Introduced by Secretary of Education Arne Duncan, RTTT was a national competition in which states vied for federal grants by committing to a slate of reforms. States were encouraged to continue administering tests, evaluate teachers based on test scores, and adopt the Common Core State Standards (CCSS) (Schneider & Berkshire, 2022). The CCSS, common standards in mathematics and English language arts developed by the National Governors Association and the Council of Chief State School Officers, were still under development when RTTT was announced. Not only is there little research evidence that such standards raise achievement, but the CCSS also assume that students progress at the same rate when exposed to the same material, a naive assumption that overlooks the complexities of students’ socioeconomic, cultural, and ethnic backgrounds, which simply cannot be accounted for on a test (Mathis, 2010). Nonetheless, by leveraging federal funds, the RTTT agenda succeeded in promoting adoption of the CCSS while intensifying a culture of accountability already running rampant. In all, 48 states participated in the competition (The White House, n.d.).
An overview of the accountability era is not complete without one final consideration: testing does not work. Time and again, evaluating students against a standardized curriculum has presented a facade of progress without uplifting the students themselves. Instead, accountability has narrowed the curriculum at the expense of a holistic education while placing a tremendous burden on teachers and administrators to improve scores or lose their jobs and schools. Most significantly, test-based accountability fundamentally distorts students’ understanding of education’s purpose. Test scores carry little weight beyond defining the performance of a given teacher or school; gains achieved under test-based accountability have shown no “statistically significant effect on college attendance or the likelihood of receiving a Bachelor’s degree” (McElroy, 2023, p. 1). At the same time, high-stakes tests constrain students’ sense of what education is for by defining their identities in terms of test scores, framing peers as obstacles, and perpetuating a climate of anxiety that erodes students’ morale and love of learning (Terada, 2022). By pitting students against each other, testing breeds competition and individualism at the expense of participatory and collective exercises (Kohn, 2007).
This reality is fundamentally antidemocratic. Time committed to testing is time that could have been spent practicing the virtues of collectivism through group projects and of critical dialogue through class discussions. Moreover, the individualism promoted by testing perpetuates insular, zero-sum values that inhibit students’ ability to participate effectively in democratic society (Kohn, 1987).
Test-based accountability poses a serious threat to preparing students for civic life. Teachers cannot meet the responsibilities of their position when their jobs are under threat, and students cannot develop a shared commitment to the common good when class days are spent fostering competition through high-stakes testing. The emphasis on test-based accountability has fundamentally reframed the purpose of education, overlooking the role of the school as an engine for social progress and fostering principles and practices incompatible with democratic society.
Bibliography
Aikens, N. L., & Barbarin, O. (2008). Socioeconomic differences in reading trajectories: The contribution of family, neighborhood, and school contexts. Journal of Educational Psychology, 100(2), 235–251. https://doi.org/10.1037/0022-0663.100.2.235
American Psychological Association. (2017). Education and socioeconomic status. https://www.apa.org/pi/ses/resources/publications/education
Bowles, S., & Gintis, H. (1976). Schooling in capitalist America: Education reform and the contradictions of economic life. Basic Books.
Bradley, R. H., Corwyn, R. F., McAdoo, H. P., & García Coll, C. (2001). The home environments of children in the United States. Child Development, 72, 1844–1867. https://doi.org/10.1111/1467-8624.t01-1-00382
Darling-Hammond, L. (2007). Evaluating “No Child Left Behind.” Stanford University. https://web.stanford.edu/~hakuta/Courses/Ed205X%20Website/Resources/LDH_%20Evaluating%20'No%20Child...pdf
Kohn, A. (1987, September). The case against competition. Working Mother. https://www.alfiekohn.org/article/case-competition/
Kohn, A. (2004). NCLB and the effort to privatize public education. In D. Meier & G. Wood (Eds.), Many children left behind (pp. 79–97). Beacon Press.
Kohn, A. (2007, September 18). Against ‘competitiveness’. Education Week. https://www.edweek.org/teaching-learning/opinion-against-competitiveness/2007/09
Mathis, W. J. (2010). The “Common Core” Standards Initiative: An effective reform tool? Education and the Public Interest Center & Education Policy Research Unit. https://nepc.colorado.edu/publication/common-core-standards
McElroy, K. (2023). Does test-based accountability improve more than just test scores? Economics of Education Review, 94, 102381. https://doi.org/10.1016/j.econedurev.2023.102381
National Commission on Excellence in Education. (1983). A nation at risk. U.S. Department of Education.
Rand Corporation. (n.d.). Value-Added Modeling 101: Using student test scores to help measure teaching effectiveness. RAND Education, Employment, and Infrastructure. https://www.rand.org/education-employment-infrastructure/projects/measuring-teacher-effectiveness/value-added-modeling.html
Ravitch, D. (2010). The death and life of the great American school system: How testing and choice are undermining education. Basic Books.
Reardon, S. F., Valentino, R. A., Kalogrides, D., Shores, K. A., & Greenberg, E. H. (2013). Patterns and trends in racial academic achievement gaps among states, 1999–2011. https://cepa.stanford.edu/content/patterns-and-trends-racial-academic-achievement-gaps-among-states-1999-2011
Schneider, J., & Berkshire, J. (2022). A wolf at the schoolhouse door. The New Press.
Spring, J. (2023). American education (21st ed.). Routledge.
Terada, Y. (2022, October 14). The psychological toll of high-stakes testing. Edutopia. https://www.edutopia.org/article/psychological-toll-high-stakes-testing/
The White House. (n.d.). Race to the Top. https://obamawhitehouse.archives.gov/issues/education/k-12/race-to-the-top