This report contains a correction.
Introduction and summary
Beginning in 2001, the federal education law known as No Child Left Behind, or NCLB, required every state to operate a system of school performance management based on annual student outcomes. Classifying school performance is one part of this broader system of accountability, which also includes data collection and reporting, delivery of supports for school improvement, and distribution of resources to districts and schools. Yet classification systems have often drawn negative attention because they have been associated more with high-stakes shame and punishment practices than with continuous school improvement.1
This is due in large part to federal school classification requirements, which were designed specifically to label schools and differentiate their treatment based on whether they met annual reading and math proficiency targets.2 This often led to narrow or simple pass/fail categorization systems based on whether schools met incrementally increasing state targets for test scores and graduation rates. Schools that made progress but failed to meet these targets went unrecognized.
Federal law did allow states to classify schools using performance measures beyond test scores. But any additional measures simply meant more ways to fail, as they too were subject to the pass/fail yardstick. As a result, states stuck to the limited measures required by NCLB for their federal accountability systems.
In response to this limitation, several states created their own accountability systems—which were used within the state and not for federal accountability purposes—to measure other factors that were critical to their visions for school success and student learning.3 Nonetheless, states still based these systems primarily on academic proficiency.
Fast-forward to 2017. Measuring only how well students read, write, and do math falls woefully short of assessing the range of skills students need to succeed today. Of the slightly more than 11 million jobs created since the Great Recession, all but 100,000 of them have gone to workers with at least some college education.4 We live in a global, technology-dependent, rapidly changing economy in which reading and math skills are not enough to compete for today’s jobs. As a result, to succeed in the current workforce, students need to learn to adapt to technology and to work independently and with one another.
The Every Student Succeeds Act, or ESSA, which reauthorized NCLB in 2015, gives states the chance to respond to this demand. Under ESSA, states have an opportunity to develop dynamic school classification systems that measure a wider range of student outcomes assessing readiness for college and careers.
Toward this end, the Center for American Progress has designed three school classification system models that capture a broader range of student performance than systems of the past. This report provides an overview of these designs—including performance indices, matrices, and decision rules—in addition to their benefits and drawbacks. The report also includes recommendations for states to keep in mind so that they can meaningfully measure and compare school performance, thereby identifying the schools most in need of support.
Overview of ESSA
The Every Student Succeeds Act ushers in a broader view of student success that recognizes the realities of the current workforce and aligns with its trajectory.5 It also acknowledges that students today need a more holistic and well-rounded education to succeed, requiring states to use additional measures of school quality or student success alongside more traditional academic measures to classify schools.6 For a more detailed analysis of these additional measures, see CAP’s “Innovation in Accountability” report.7
In addition to this broader view, ESSA drives states to diversify their accountability systems by requiring overall, or summative, school classifications based on objective student outcome data. ESSA also requires states to collect and report more nuanced data about school performance and school context, such as chronic absenteeism rates and per-pupil funding amounts. As a result, states are now required both to identify schools needing the most support and to produce annual report cards that include more holistic data, allowing for strategic deployment of state- and district-level resources to improve student performance.
Under the existing ESSA regulations, states have two years to design and launch their school classification systems, which are complex and take time to develop.8 The measures and formulas that states use must meet specific technical standards set by the law, including validity, reliability, and meaningful differentiation. To be valid, each indicator in the system must be an accurate measure of what it intends to measure. Reliable indicators produce measurement results consistently, and when combined, the measures must “meaningfully differentiate” schools along each of the school performance measures.9
Once submitted, these systems will undergo technical review and approval by the U.S. Department of Education. The technical reviews will also examine the extent to which states’ school classification systems meet the law’s requirements to annually differentiate school performance using all of the measures in those systems. States must also, with limited exceptions, identify low-performing schools for either comprehensive support and improvement or targeted support and improvement every three years. By default, then, there will be a third group of schools not identified for support and improvement.10
The existing ESSA regulations clarify that states may choose to classify schools using only these three categories as they design their systems. Or they might opt to create additional categories that further distinguish school performance, such as an A through F or five-star system, while also identifying schools for support and improvement as required by law.
To create these summative classifications, indices—meaning systems that sum to 100 percent, as an A through F system would—are often the first that come to state policymakers’ minds. However, there are other approaches that states can use to combine school performance results into a summative rating, including matrices and decision rules.
A deeper look at ESSA’s specific school identification requirements
Under ESSA, all schools must receive performance information annually, and states must identify two groups of low-performing schools—comprehensive support and improvement schools and targeted support and improvement schools—at least once every three years.
Comprehensive support and improvement schools include the bottom 5 percent of Title I schools statewide, high schools with graduation rates below 67 percent, and Title I schools with chronically low-performing subgroups of students that have not improved after receiving additional targeted support.11
Targeted support and improvement schools have subgroups of students that are performing as low as all students in the bottom 5 percent of Title I schools. In addition, states must annually identify schools with consistently underperforming subgroups, as defined by the state.12
To identify these schools, ESSA requires states to use the following indicators:
- Academic achievement, which measures grade-level proficiency in reading/language arts and mathematics in the third through eighth grades and once in high school
- Graduation rate, which measures the four-year adjusted cohort high school graduation rate and, at the state’s discretion, an extended-year adjusted cohort graduation rate
- For elementary and middle schools, growth based on the required annual assessments, or another academic measure that the state chooses
- Progress in achieving English language proficiency based on English learner, or EL, performance on the state English language proficiency assessment
- One or more measures of school quality and student success, which may vary by grade span
States must assign “substantial weight” to each of the first four indicators in their school classification systems, and together, these indicators must be afforded “much greater weight” than the fifth indicator.13 States also have some flexibility in how to define these indicators, but they must remain within the law’s requirements. For example, the existing ESSA regulations clarify that states may measure multiple performance levels of academic and English language proficiency, allowing states to move away from the reliance on a single cut score.
For this report, CAP developed the following definitions of indicators to illustrate the requirements and flexibility in how states may define the indicators in their systems. Each of the examples takes advantage of this flexibility by measuring a dynamic range of performance rather than relying on a simple cutoff score or yes/no format. Items 5 and 6 serve as possible options, as states could use either of them or others of their own design.
1. Academic achievement:
   - Performance on state assessments in English language arts, mathematics, science, and social studies, for all students and for each subgroup
   - Calculated based on whether all students and each subgroup are meeting or making progress toward their state-set targets for the percentage of students achieving at grade level
   - Additional credit if the performance of low-income students, students with disabilities, or ELs is in the top 25 percent of the state
2. Growth or another academic indicator:
   - Percentage of students making meaningful growth in English language arts and mathematics based on state assessments, for all students and for each subgroup
   - Meaningful growth means at least one year’s worth of growth for students who are at or above grade level and more than one year’s worth of growth for students who are below grade level
   - Also includes the percentage of ELs who reach the proficient level on the state’s English language proficiency assessment within one year of enrollment in the school
3. High school graduation:
   - The four-year cohort rate, or the percentage of students who graduate in four years or less with a regular high school diploma, calculated by taking the number of students who enter 9th grade; adding any students who transfer into the cohort during the 9th grade and the next three years; and subtracting any students who transfer out, emigrate to another country, or die14
   - The extended-year adjusted cohort rate, for five, six, or seven years, as applicable to the state15
4. English language proficiency:
   - Required for ELs only
   - Performance on state assessments in English language proficiency
   - Calculated based on whether all students in the EL subgroup are meeting or making progress toward state-set targets for the percentage of students reaching English language proficiency
   - Additional credit if ELs attain English language proficiency in three years or less
5. Culture and climate as a measure of school quality and student success:
   - Student, parent, and teacher engagement, as measured by surveys; chronic absenteeism; suspension and expulsion rates
   - Measured for all students and for each subgroup
6. College and career readiness as a measure of school quality and student success:
   - Participation rates, calculated as the share of students enrolled, in advanced coursework or exams and career and technical education courses
   - Performance in advanced coursework or exams, calculated based on students meeting specific benchmarks for courses or exams; attainment of industry-recognized certificates
   - Participation of middle school students in high school-level courses
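The four-year cohort rate described in the list above reduces to simple arithmetic. The sketch below is illustrative only: the function and parameter names are hypothetical, the counts are invented, and dividing graduates by the adjusted cohort is the usual reading of an adjusted cohort rate rather than a formula spelled out in this report.

```python
def adjusted_cohort(first_time_ninth_graders, transfers_in,
                    transfers_out, emigrated, deceased):
    """Size of the cohort after the additions and subtractions in the text."""
    return (first_time_ninth_graders + transfers_in
            - transfers_out - emigrated - deceased)


def four_year_graduation_rate(graduates, cohort_size):
    """Share of the adjusted cohort earning a regular diploma within four years."""
    if cohort_size <= 0:
        raise ValueError("cohort size must be positive")
    return graduates / cohort_size


# Invented counts for a single hypothetical high school.
cohort = adjusted_cohort(first_time_ninth_graders=200, transfers_in=15,
                         transfers_out=10, emigrated=2, deceased=1)
rate = four_year_graduation_rate(graduates=171, cohort_size=cohort)
print(cohort, round(rate * 100, 1))
```

An extended-year rate could reuse the same functions with a cohort and graduate count tracked over five, six, or seven years.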
Overview of school classification systems
School classification systems provide specific kinds of value to policymakers, educators, and parents. First, school classifications help state policymakers prioritize which schools need support to ensure the progress of all students toward the state’s learning goals. They also help align the state’s K-12 educational program with related programs administered by postsecondary and workforce systems to meet college and career readiness goals. Second, school classifications help educators target resources to the needs of the whole school and within individual classrooms to meet student learning targets. Third, classifications help parents compare school quality based on which schools are meeting learning goals and for which students.
States can ensure that their school classification systems accomplish these goals by measuring a broader range of student learning, including postsecondary and workforce outcomes. Some of these measures include industry-recognized certification program enrollment, college attainment rates, and college remediation rates, which signify that students were not ready for the academic demands of credit-bearing coursework. College dropout rates are also higher for students of color and low-income students, so persistence rates for all student groups are important data to collect.16 Additional indicators of readiness for college and careers are detailed in CAP’s “Making the Grade” report.17
States have an opportunity to link these measures with how they have defined college and career readiness, as most states have articulated a formal definition of this term. Having a broad definition of college and career readiness will also help the state prioritize what it measures toward that goal.
The importance and challenge of including performance of student subgroups
To be meaningful, the goal of college and career readiness must be attainable for all students. To achieve this vision, combined state, district, and school efforts must close significant and persistent achievement gaps, which occur when one student group statistically outperforms another.18 However, data from international, national, and state-level sources all confirm that nonwhite, disabled, poor, and non-English-speaking students perform more poorly than their peers outside of these groups.19
NCLB first exposed these achievement gaps by requiring states to report disaggregated annual achievement data. While the law aimed to close these gaps, they persist despite incremental progress.20 Even after making statistical adjustments to proficiency rates under NCLB, by 2005—four years after the law passed—the rates of schools making “adequate yearly progress” started to decline.21 Any school missing a single target for any subgroup for two years in a row initiated particular actions, such as offering free tutoring or the option for students to transfer to a higher-performing school. By 2011, more than half of schools in all states were labeled as failing due to missing performance targets for subgroups.22
NCLB’s lockstep yearly targets also failed to consider actual rates of progress of student groups, and the law punished schools for missing targets regardless of any improvement. With so many schools failing, it was difficult to target limited resources where they were needed most.
A civil rights bill at heart, ESSA plays a critical role in exposing and closing achievement gaps to ensure that schools are serving all students well. And under this law, states will likely wish to avoid labeling a school as failing if it misses a single target for a single subgroup while also ensuring that schools make progress for all students.
Accordingly, as states consider the three school classification designs detailed in the next section, they may want to identify where and how they can strike a balance between disproportionately high and low weighting of subgroup performance. For example, states can add safeguards for subgroup accountability to any school classification system. Specifically, if a subgroup falls below a certain threshold on any indicator over a certain number of years, this information could be publicly reported and the school could be notified, flagged as needing additional support but not designated as a low-performing school, or identified as a low-performing school. Additionally, such schools could drop one level on the classification system—for example, go from a B rating to a C rating.
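One of the safeguards described above, dropping a school one level when a subgroup stays below a threshold for a set number of years, could be sketched as follows. The grade scale, the threshold of 40, the three-year window, and all names are hypothetical placeholders that a state would define for itself.

```python
GRADES = ["A", "B", "C", "D", "F"]  # hypothetical scale, best to worst


def apply_subgroup_safeguard(grade, subgroup_history, threshold, years):
    """Drop the rating one level if any subgroup scored below `threshold`
    in each of the most recent `years` years.

    subgroup_history maps a subgroup name to its yearly indicator scores,
    ordered oldest to newest.
    """
    for scores in subgroup_history.values():
        recent = scores[-years:]
        if len(recent) == years and all(s < threshold for s in recent):
            lowered = min(GRADES.index(grade) + 1, len(GRADES) - 1)
            return GRADES[lowered]
    return grade


history = {"all students": [70, 72, 75], "English learners": [35, 38, 39]}
print(apply_subgroup_safeguard("B", history, threshold=40, years=3))
```

The same check could instead trigger public reporting or a flag for additional support, which are the lighter-touch options the paragraph above lists.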
States may also wish to set learning targets that account for where students start, as some did under the NCLB waiver initiative.23 Under this initiative, most states set targets that cut the achievement gap in half over six years. Under ESSA, states have complete discretion on setting their targets, so long as they do so for each measure of learning required by the law, apply the targets to every subgroup, and set the same timeline for all students. Accounting for where students start is a powerful signal that states value progress and can act as positive reinforcement for schools.
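As a rough illustration of targets that account for where students start, the sketch below generates yearly targets that close half of a gap over six years. It rests on assumptions not stated in the report: the gap is measured between a group's baseline proficiency rate and 100 percent, and progress is spread evenly across the years. The function name and numbers are hypothetical.

```python
def annual_targets(baseline, years=6, goal=100.0):
    """Yearly targets that close half the gap between baseline and goal.

    Assumes equal yearly increments; a state could choose another path.
    """
    step = (goal - baseline) / 2 / years
    return [round(baseline + step * year, 2) for year in range(1, years + 1)]


# A subgroup starting at 40 percent proficient and one starting at 70 percent
# share the same six-year timeline but have different yearly targets.
print(annual_targets(40))
print(annual_targets(70))
```

Because each group's targets are anchored to its own baseline, a group that starts further behind is asked for larger yearly gains while every group keeps the same timeline.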
As states discuss the design of their school classification systems, one critical question to answer will be how great an impact they want subgroup performance to have on how schools are classified and treated as a result of this performance.
CAP used the following principles in developing each of the school classification system designs.
Offer clarity, transparency, and rich information to parents
School ratings, as well as the indicators that lead to those ratings, should be transparent and clear to parents and should reflect meaningful differences between schools. Parents care about school performance, as it helps inform school choice—when available—as well as any additional supports parents may need to obtain for their children. Therefore, information about school performance ought to clearly convey to parents how their children perform along each of the school classification system’s measures, signify in what areas their children might need additional support, and allow parents to easily compare school performance.
Reward high levels of growth for all students, including those above and below grade-level expectations
School classification systems signal whether students are on track to meet state-determined visions for education. However, since students enter school at widely different levels of learning, systems should hold schools accountable for showing high levels of growth and getting students on a trajectory that will lead them to success. Students below grade level should make more than a year’s worth of growth, and students at or above grade level should make at least a year’s worth of growth.
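The growth expectation in this principle can be written as a small rule. This is a minimal sketch, assuming growth is already expressed in years of learning gained per school year; the names and sample data are hypothetical.

```python
def made_meaningful_growth(at_or_above_grade_level, years_of_growth):
    """Apply the two-part growth expectation described in the text."""
    if at_or_above_grade_level:
        return years_of_growth >= 1.0  # at least one year's worth of growth
    return years_of_growth > 1.0       # more than one year's worth of growth


def percent_meaningful_growth(students):
    """students: list of (at_or_above_grade_level, years_of_growth) pairs."""
    if not students:
        raise ValueError("no students to evaluate")
    met = sum(made_meaningful_growth(level, growth) for level, growth in students)
    return 100 * met / len(students)


sample = [(True, 1.0), (True, 0.8), (False, 1.0), (False, 1.3)]
print(percent_meaningful_growth(sample))
```

Note that a student below grade level who gains exactly one year does not count as meeting the expectation, since that pace would never close the gap.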
Meaningfully differentiate between school quality and performance
Meaningful differentiation refers to the extent to which performance on an indicator adequately sorts school performance along a spectrum. For example, if schools cluster around a value or range of values on a particular indicator, this indicator may not distinguish school performance as well as indicators with a range of values at the bottom, middle, and top of the performance spectrum. States should test for meaningful differentiation through a trial data run of each indicator, using past student performance data when available. However, even if an indicator does not meaningfully differentiate schools, states may still wish to include it in their school classification systems because it signals what the state values. For a more detailed description of meaningful differentiation, see CAP’s “A New Vision for School Accountability” report.24
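One possible form of such a trial data run is to check how widely school results spread on an indicator. The sketch below uses the interquartile range as a stand-in for spread; that choice, the minimum spread of 10 points, and the sample data are all assumptions for illustration, not a test prescribed by ESSA or by this report.

```python
from statistics import quantiles


def interquartile_range(school_scores):
    """Distance between the 25th and 75th percentile of school scores."""
    q1, _median, q3 = quantiles(school_scores, n=4)
    return q3 - q1


def differentiates(school_scores, min_spread=10):
    """Flag whether schools spread out enough to be told apart.

    min_spread is a hypothetical cutoff in the indicator's own units.
    """
    return interquartile_range(school_scores) >= min_spread


clustered = [70, 71, 72, 71, 70, 72, 71, 70]  # schools bunch together
spread = [20, 35, 50, 65, 80, 95, 40, 70]     # schools span the range
print(differentiates(clustered), differentiates(spread))
```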
Three school identification system designs
This section presents the pros and cons of three school identification system designs for states to consider: the performance index design, the matrix design, and the decision rules design. Each of these models takes a different technical approach to creating a summative determination. For example, a state using an index would assign a weighting, or percentage, to each indicator to calculate a single score or letter grade. Matrices, on the other hand, combine two or more dimensions of performance, such as status and growth, for each indicator. States would then assign school classifications based on how schools perform on each dimension. Finally, in a rules-based system, a state would set a threshold for performance on each indicator; a “yes” or “no” response would lead to a subsequent question; and ultimately, the combination of the responses would result in a school classification.
Performance index design
ESSA requires that each of the academic indicators (academic proficiency in reading/language arts and mathematics, academic growth, English language proficiency, and graduation rate for high schools) be afforded substantial weight and that, when combined, they carry “much greater weight” than the measure of school quality or student success.25
A school performance index is a school classification system that weights each indicator to sum to 100 percent. For example, a state that weights an indicator as 25 percent of a school’s overall rating would multiply that indicator’s raw score, such as 75 out of 100 possible points, by 25 percent. The state would then sum the subtotals for each indicator to determine a school’s total score, which can be translated into a letter grade; color; symbol, such as star ratings; or kept as a number score. Using this approach, each indicator’s percentage weight is the relative weight of that indicator compared with the whole. As a result, indicators with a greater weight will have a larger impact on the total.
Figures 1 and 2 demonstrate using the index approach with possible weightings of individual indicators for a total of 100 percent. The figures are merely an illustrative example of weightings that are in compliance with ESSA requirements; states can use different weightings than are in this example.
The indicators in Figure 1 measure performance in elementary and middle school for all students and for each subgroup, with the exception of English language proficiency, or ELP, which applies only to the English learner subgroup. Additionally, the percentages are rates of students who meet or exceed the specific performance targets on each indicator for each subgroup.
This system has three academic indicators (proficiency, growth, and ELP) and one nonacademic indicator (culture and climate). The system gives an equal weight of 30 percent to academic proficiency and growth, indicating that both static, point-in-time achievement and progress are important when generating a more complete measurement of student learning. The remaining two indicators are each weighted at 20 percent, which is consistent with national trends.26
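The index calculation described above can be sketched in a few lines. The weights follow the elementary and middle school example in the text; the indicator scores, the letter-grade cutoffs, and all names are hypothetical and chosen only to show the arithmetic.

```python
# Illustrative weights in percentage points, following the example above;
# states may choose different weights.
WEIGHTS = {"proficiency": 30, "growth": 30, "elp": 20, "culture_climate": 20}

# Hypothetical cutoffs for translating a total score into a letter grade.
GRADE_CUTOFFS = [(90, "A"), (80, "B"), (70, "C"), (60, "D")]


def index_score(indicator_scores, weights=WEIGHTS):
    """Weighted sum of indicator scores, each on a 0 to 100 scale."""
    if sum(weights.values()) != 100:
        raise ValueError("weights must sum to 100 percent")
    return sum(weights[name] * indicator_scores[name] for name in weights) / 100


def letter_grade(score):
    """Return the first grade whose cutoff the score meets, else F."""
    for cutoff, grade in GRADE_CUTOFFS:
        if score >= cutoff:
            return grade
    return "F"


school = {"proficiency": 75, "growth": 60, "elp": 50, "culture_climate": 80}
total = index_score(school)  # 22.5 + 18 + 10 + 16 = 66.5
print(total, letter_grade(total))
```

Changing a single weight shifts how much that indicator moves the total, which is how a state would express its priorities in this design.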
In this example, states could include subgroup performance by allocating each subgroup a percentage weighting of each indicator. To do so, states could divide the indicator’s percentage by the number of subgroups so that the percentages subtotal to 100 percent of that indicator—that is, designate each subgroup as the same percentage of a percentage. This method provides the performance of each subgroup an equal weighting.
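The equal subgroup weighting described above, a percentage of a percentage, might look like this in practice. The subgroup names and scores are invented, and the helper names are hypothetical.

```python
def subgroup_weights(indicator_weight, subgroups):
    """Split an indicator's weight evenly across its subgroups."""
    share = indicator_weight / len(subgroups)
    return {group: share for group in subgroups}


def indicator_contribution(indicator_weight, subgroup_scores):
    """Points the indicator adds to a 100-point index, given each
    subgroup's score on a 0 to 100 scale."""
    weights = subgroup_weights(indicator_weight, list(subgroup_scores))
    return sum(weights[g] * score for g, score in subgroup_scores.items()) / 100


# A 30 percent indicator split across five subgroups gives each 6 percent.
scores = {
    "all students": 80,
    "low-income students": 70,
    "students with disabilities": 60,
    "English learners": 50,
    "students of color": 40,
}
print(subgroup_weights(30, list(scores))["English learners"])
print(indicator_contribution(30, scores))
```

Under this approach a small subgroup counts as much as a large one, which is the trade-off a state accepts when it weights subgroups equally.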
The indicators in Figure 2 include example weightings for high schools. As in Figure 1, these weightings follow the national trends described in CAP’s “Making the Grade” report.27
As in the elementary and middle school example index, states could include subgroup performance by allocating each subgroup a percentage weighting of each indicator.
Example school classification categories for school performance indices
States may translate the results from a performance index into school classification categories, such as a letter grade; symbol, such as stars or flags; a color; or a term, such as “highest performing school.” Table 1 below shows a range of possible school classification categories.
Pros and cons of an index
One important benefit of a school performance index is that it allows states to place greater emphasis on indicators that they value. For example, if a state hopes to use growth as an indicator to identify and reduce significant achievement gaps across certain schools, it could assign academic growth a greater weighting than academic proficiency. Greater weightings of growth could also incentivize schools to pay additional attention to students whose growth has stalled. As a result, indicator weightings should reflect a state’s goals for student learning. This flexibility, though, is limited by the existing ESSA regulations, as the weighting of nonacademic indicators cannot be used to remove a school from a low performance designation.28
In addition, school performance indices typically create summative classifications that are simple to understand, such as A through F letter grades. Most parents are already familiar with this grading system, making it easy for them to compare schools and make a more informed choice. It also provides a clear picture of whether a school is one that parents likely want their child to attend.
However, the summative ratings of a school performance index are compensatory, meaning higher performance on one indicator offsets low performance on another. As a result, summative ratings may mask low achievement: For example, a school with an A letter grade may have struggling subgroups. Without reviewing the performance of each indicator, parents may not have a complete understanding of how a school will serve their child.
School performance indices also translate the performance of individual indicators to a uniform performance scale, which can require several, at times complicated, steps. For example, to combine academic proficiency—usually expressed as a rate or percentage—with a measure of school culture and climate—which may be qualitative responses from a survey—states must first normalize the indicators so that the scores are on the same scale.
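Normalization can take many forms. The sketch below shows one simple option, linear rescaling to a common 0 to 100 scale, assuming survey results arrive as an average on a 1 to 5 scale. The scales and names are hypothetical, and states may well use other methods.

```python
def rescale(value, low, high):
    """Linearly map a value from the range [low, high] onto 0 to 100."""
    if high <= low:
        raise ValueError("high must be greater than low")
    if not low <= value <= high:
        raise ValueError("value falls outside the stated range")
    return (value - low) * 100 / (high - low)


proficiency_rate = rescale(82, 0, 100)  # already a percentage, unchanged
climate_survey = rescale(4, 1, 5)       # average response on a 1 to 5 scale
print(proficiency_rate, climate_survey)
```

Once both indicators sit on the same scale, they can be weighted and summed in an index.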
Finally, rolling up performance into a single score can omit critical context that provides essential information as to why a school is performing the way it is. For example, a school’s performance likely relates to conditions within the district, such as how the district allocates resources to each school. Resource allocation may not be captured in a performance index.
Matrix design
A matrix design uses multiple, intersecting dimensions of performance on an indicator to determine an overall classification. In this example, each dimension represents a scale of performance, such as low, medium, and high. Matrices usually have two axes, an x-axis and a y-axis, that states can apply to each indicator—that is, one matrix for each indicator—or combine for all indicators—that is, the school receives an average x-axis calculation for all indicators and an average y-axis calculation for all indicators, resulting in one matrix.
For example, the sample matrix design in Figure 3 below has two dimensions: growth and achievement. The dimensions are placed along the x- and y-axes, forming four quadrants that reflect different levels of achievement and growth. Low achievement and low growth are in the bottom left; high achievement and low growth are in the bottom right; low achievement and high growth are in the top left; and high achievement and high growth are in the top right.
In theory, this design could create four groups of school performance—one in each of the quadrants. If states wish, they could further differentiate each quadrant by adding, for example, quartiles of performance and growth. Figure 3 includes the bottom 25 percent, the middle 50 percent, and the top 25 percent of performance and growth to create three color categories. Using this approach, states could create up to nine groups of school performance.
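The nine-group version of the matrix can be expressed as a lookup on two percentile ranks. This is a minimal sketch, assuming each school already has a statewide percentile rank for achievement and for growth; the band labels and function names are hypothetical.

```python
def band(percentile):
    """Bottom 25 percent, middle 50 percent, or top 25 percent."""
    if not 0 <= percentile <= 100:
        raise ValueError("percentile must be between 0 and 100")
    if percentile < 25:
        return "low"
    if percentile < 75:
        return "middle"
    return "high"


def matrix_cell(achievement_percentile, growth_percentile):
    """Return the (achievement band, growth band) cell for a school."""
    return band(achievement_percentile), band(growth_percentile)


# A school with low achievement but strong growth lands in a different
# cell than one with low achievement and low growth.
print(matrix_cell(20, 80))
print(matrix_cell(20, 10))
```

A state would then decide which of the nine cells map to which classification, for example treating low achievement with low growth as the highest priority for support.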
Pros and cons of matrix designs
Matrices allow states to determine a school’s rating using a more robust consideration of performance on a single indicator. As in the example above, the matrix has more frequent cut points—the performance quartiles—and allows for further differentiation of school performance based on the amount of growth students exhibit. Thus, the important question this type of design answers is not merely whether students grew but by how much. This design also allows states and districts to concentrate their efforts on schools with students that have the lowest growth rates. From the school’s perspective, this dissection of growth creates disincentives for focusing on a small subset of students whose performance hovers just below a single threshold. Parents can also select schools that have the highest growth rates.
Matrices, however, are not as clear-cut as letter grades, so it may not be as easy for parents or the public to understand how the school is performing. Since the indicators do not culminate in a single score, parents may need to review more dimensions of performance and fit the pieces together themselves to gain an overall understanding of how well a school is doing. This drawback is an important consideration as states weigh trade-offs between simplicity and complexity.
Decision rules design
Decision rules models classify schools based on state-determined thresholds of performance for multiple indicators. Typically, this takes the form of binary if/then, yes/no, or pass/fail statements.
Table 2 below illustrates a simple decision rules system using this approach. A series of “yes” or “pass” statements for each indicator yields a summative classification of high performance. A combination of yes/no or pass/fail statements yields a school classification that reflects average or slightly above average school performance. A series of “no” or “fail” statements identifies a school for improvement. States can include any number of rules for each indicator.
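A simple decision rules system of the kind described above could be sketched as follows. The indicator names and classification labels are hypothetical and are not taken from Table 2.

```python
def classify(results):
    """results maps each indicator name to True (pass) or False (fail)."""
    if not results:
        raise ValueError("at least one indicator is required")
    if all(results.values()):
        return "high performance"
    if not any(results.values()):
        return "identified for improvement"
    return "average performance"


school = {"proficiency": True, "growth": True,
          "elp": False, "culture_climate": True}
print(classify(school))
```

A state could add further rules for each indicator, such as a follow-up question about subgroup performance whenever a school fails a threshold.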
Pros and cons of decision rules
Decision rules systems do not normalize or mathematically combine indicators, an attribute that may improve transparency and make it easier for parents to understand how a school is performing on each indicator. In addition, high performance on one indicator does not artificially raise the average or mask low performance on another indicator. Another significant benefit of this design is that states can create specific questions about subgroup performance for each indicator when schools fail to meet specific performance thresholds.
However, the series of decisions in more complicated systems can be difficult to follow, and it can be hard to understand how they result in a school classification. This is because decision rules designs can require a lengthy series of questions to derive the final classification, since a school’s classification does not follow a narrow or straight path.
States should consider these benefits and drawbacks of the decision rules design when weighing this option against the performance index and matrix designs.
Recommendations
While each of these school classification models has unique challenges and advantages, careful development of any of them can offer meaningful information about school performance for school staff, policymakers, and families. As states choose among them and design final models, there are additional considerations that they should keep in mind to improve data quality and the ability of educators and parents to use these systems.
The following recommendations lay out key design principles that apply to each type of school classification system, in no particular order. Each of these has the potential to mitigate some of the cons discussed in each system design or to heighten the benefits.
Provide useful, actionable information to educators
School classification systems should do more than just rate, label, and sort schools. They should signal what is important and drive positive action by local leaders, parents, and teachers. When considering indicators for the system, the primary criterion should be whether low performance on the indicator will incentivize positive change that will benefit students. This positive change might include continuously reviewing resources to meet student needs, enabling educators to provide every student with high-quality instruction, and ensuring that schools can create a safe and positive climate.
Provide districts and schools rich sets of additional data
School classification systems provide a limited snapshot of school quality and student success. Schools will always need additional information, including information about factors outside of a school’s influence, to inform systems of support, whether for continuous improvement or to turn around low-performing schools. For example, stakeholders also need information about school context, including the amount or quality of resources a school receives.
While the Every Student Succeeds Act requires states to provide additional information to districts and schools, such as chronic absenteeism and discipline rates, additional information may still be needed at the local level. States should engage with their local stakeholders to identify what information educators need to support students.
Use multiple years of data
School classification systems should use multiple years of data to calculate performance on each indicator for the whole school and for individual subgroups. When indicators are measured consistently year over year, combining multiple years of data can smooth the effects of outlier performances in a single year. However, states should use caution when combining multiple years of data when the instrument used to measure the indicator has changed. For example, if states change their standards or the assessments used to measure the standards, results on those assessments may not be comparable.
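Combining years can be as simple as averaging the most recent results. The sketch below assumes a three-year window and a plain average, both of which are illustrative choices, and it assumes every score passed in comes from the same, comparable instrument.

```python
from statistics import mean


def multi_year_score(yearly_scores, years=3):
    """Average the most recent `years` results for one indicator.

    yearly_scores is ordered oldest to newest. Only pass in years measured
    with a comparable instrument, per the caution in the text.
    """
    recent = yearly_scores[-years:]
    if not recent:
        raise ValueError("no scores to combine")
    return mean(recent)


# An unusually strong final year is smoothed by the two years before it.
print(multi_year_score([55, 60, 70, 80]))
```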
Consider fluidity of design
States can create a hybrid system by combining components of each model system that fit their needs. For example, states could measure status and growth for each indicator in a school performance index. Or, states could assign a letter grade to each indicator and use decision rules to determine how a combination of letter grades identifies the lowest-performing schools. If states like some aspects of one design and some of another, they should be creative and use what they like and eliminate what they do not like from each design.
Conclusion
ESSA provides an exciting opportunity for states to experiment with measuring student and school performance and to provide valuable information to schools and parents. As part of the broader systems of accountability that states will develop, school classification systems are one way for states to communicate their values and signal to schools which measures should hold their attention.
This report is designed to provoke states’ thinking as they create their systems. In doing so, states should not aim to just comply with ESSA. Rather, they should take advantage of the flexibility afforded by the law in order to develop classification systems that reflect their state vision for education and that meaningfully distinguish school performance in attaining that objective. With this approach, states can design new systems that ultimately capture their definitions and goals for student success.
About the authors
Laura Jimenez is the Director of Standards and Accountability at the Center for American Progress.
Scott Sargrad is the Managing Director of K-12 Education Policy at the Center.
Samantha Batel is a Policy Analyst with the K-12 Education team at the Center.
Catherine Brown is the Vice President of Education Policy at the Center.
* Correction, March 10, 2017: Table 1 of this report has been updated to reflect an accurate school classification label.