Read the full report (pdf)
Our goal in this project was to measure academic achievement relative to a district’s educational spending, while controlling for factors outside its control, such as cost of living and degree of student poverty. Our work builds on a 2007 report, “Leaders and Laggards,” published by the Center in partnership with the U.S. Chamber of Commerce and Frederick Hess of the American Enterprise Institute. In that study, we evaluated state-level returns on investment, comparing scores on the National Assessment of Educational Progress with a state’s education spending after controlling for differences in student poverty, special education enrollments, and cost of living.
Our measures build on the excellent work of many other researchers. Standard & Poor’s School Evaluation Services produced in recent years a district return-on-spending index. It looked at the percentage of students achieving proficiency in reading and math for every $1,000 spent per student on core operations. Florida conducts an annual productivity examination for each of its schools, and the state uses a methodology similar to our Basic Return on Investment Index, described below, which compares school-level reading and math gains against adjusted expenditures.
Our approach was aided by an advisory group that included Bruce Baker, an associate professor at the Graduate School of Education at Rutgers University; Gary Bass, founder and executive director of OMB Watch; Jack Buckley, an associate professor of applied statistics at New York University (and now the commissioner of the National Center for Education Statistics); William Duncombe, a professor of public administration and associate director of the Education Finance and Accountability Program at Syracuse University; Daria Hall, director of K-12 policy development at the Education Trust; Craig Jerald, president of Break the Curve Consulting; Raegen Miller, associate director for education research at the Center for American Progress; and Marguerite Roza, research associate professor at the University of Washington’s College of Education (and now a senior data and economic adviser at the Bill & Melinda Gates Foundation).
We also solicited the advice of practitioners including Dr. Bonita Coleman-Potter, deputy superintendent of Prince George’s County (Maryland) Public Schools; productivity experts such as Eric Hanushek, the Paul and Jean Hanna Senior Fellow at the Hoover Institution of Stanford University; and education reform advocates including Van Schoales, executive director of Education Reform Now, a national educational policy advocacy group. Finally, we hired an independent researcher to examine our work and ensure that our results were broadly replicable. Nevertheless, we take full responsibility for the methodology and evaluations.
We produced productivity evaluations for more than 9,000 districts that enroll more than 85 percent of all U.S. students. We were unable to produce results for Alaska, the District of Columbia, Hawaii, Montana, and Vermont. Hawaii and D.C. are single-district jurisdictions, so within-state comparisons were not possible. Montana and Vermont likewise did not have enough comparable districts. We excluded Alaska because we could not sufficiently adjust for cost-of-living differences within the state.
Spending data came from the Local Education Agency Finance Survey, also known as the F-33, produced by the federal government’s National Center for Education Statistics, or NCES, the primary federal entity for collecting and analyzing data related to education. These data are from the 2007-08 school year, the most recent year for which complete data are available. Since that time, districts may have taken steps that might have significantly changed their efficiency ratings.
We used the “current expenditures” category, which includes salaries, services, and supplies. It does not include capital expenses, which tend to have dramatic increases from year to year, and thus are unreliable for comparisons. The expenditure data include money from all revenue sources, federal, state, and local. We subtracted from this sum any payments to private schools and charter schools in other districts to come up with per-pupil expenditures, as is NCES practice. The data were downloaded from the NCES website on October 18, 2010.
We restricted our study to districts with at least 250 students that offered schooling from kindergarten to 12th grade. We also excluded districts classified as a charter school agency, state-operated institution, regional education services agency, supervisory union, or federal agency. Data from New York City Public Schools were also aggregated into a single district. And to ensure that we had a sufficient number of comparable districts in each state, we included states only if more than 50 percent of their students were covered by our analysis.
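The inclusion rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names (`enrollment`, `offers_k12`, `agency_type`) and the agency-type labels are hypothetical, not the actual NCES column names or codes, and state coverage is read here as the share of a state's enrollment that sits in eligible districts.

```python
# District types the text says were excluded (labels are illustrative strings,
# not the actual NCES agency-type codes).
EXCLUDED_TYPES = {
    "charter school agency",
    "state-operated institution",
    "regional education services agency",
    "supervisory union",
    "federal agency",
}


def is_eligible(district):
    """Apply the three district-level filters described in the text."""
    return (
        district["enrollment"] >= 250
        and district["offers_k12"]
        and district["agency_type"] not in EXCLUDED_TYPES
    )


def state_is_included(districts):
    """Keep a state only if eligible districts enroll more than half its students.

    Assumption: coverage is measured by enrollment share.
    """
    total = sum(d["enrollment"] for d in districts)
    covered = sum(d["enrollment"] for d in districts if is_eligible(d))
    return total > 0 and covered / total > 0.5
```

A state whose eligible districts enroll 1,000 of its 1,200 students would pass this check; one where they enroll 1,000 of 6,000 would not.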
We also relied on NCES for district-level demographic data for the 2007-08 school year: the number of students receiving free and reduced-price lunch, the number designated as English-language learners, and the number participating in special education. We downloaded these data from the NCES Common Core of Data website on October 18, 2010.
Many districts did not report demographic data for the 2007-08 school year, necessitating the use of proxies. If a school district was missing a demographic indicator, we substituted data from either the 2008-09, 2006-07, or 2005-06 school year. Because demographic data can vary over time, we did not use data from more than three different school years for any demographic indicator. The Common Core of Data did not report the number of students eligible for free and reduced-price lunch for a number of large North Carolina districts, and so we obtained the data for seven districts in the state—Bertie, Johnston, Robeson, Sampson, Union, Vance, and Wake—from 2008 compliance reports. In no instance did we use proxies for achievement or expenditure data.
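The proxy rule for missing demographic indicators can be illustrated with a short Python sketch. The order in which the substitute years are tried is an assumption (here, the order in which the text lists them), and the function name and data layout are hypothetical.

```python
# Preferred year first, then the substitute years in the order the text lists
# them. The fallback order itself is an assumption, not stated in the source.
FALLBACK_YEARS = ["2007-08", "2008-09", "2006-07", "2005-06"]


def pick_value(values_by_year):
    """Return (year, value) for the first year with a reported value.

    values_by_year maps a school-year label to a count, or to None when the
    district did not report that indicator. Returns (None, None) if no year
    in the fallback list has data.
    """
    for year in FALLBACK_YEARS:
        value = values_by_year.get(year)
        if value is not None:  # a reported count of 0 is still valid data
            return year, value
    return None, None
```

This substitution applies only to demographic indicators; as the text notes, no proxies were used for achievement or expenditure data.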
Achievement data came from the New America Foundation’s Federal Education Budget Project, which collects data from the states on district-level student outcomes. We used these data to create an achievement index, developing a score for each district by averaging together the percent of students designated proficient or above on the state assessment in reading and math in fourth grade, eighth grade, and high school for the 2007-08 school year. Because we did not have the total number of students who scored proficient or above, we simply averaged together the percent proficient for each subject and grade level.
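As a rough illustration of the achievement index, the Python sketch below takes a simple unweighted mean of the percent-proficient figures across subjects and grade levels. The dictionary layout and key names are hypothetical, and averaging over whichever cells are present is an assumption; the text does not say how a district with a missing subject or grade was handled.

```python
from statistics import mean


def achievement_index(pct_proficient):
    """Unweighted mean of percent proficient or above.

    pct_proficient maps (subject, grade) to a percentage, for example
    ("reading", "grade 4") -> 80.0. Student counts are not used, which
    mirrors the simple averaging the text describes.
    """
    return mean(pct_proficient.values())


# Made-up figures for one illustrative district.
example = {
    ("reading", "grade 4"): 80.0,
    ("math", "grade 4"): 70.0,
    ("reading", "grade 8"): 60.0,
    ("math", "grade 8"): 90.0,
    ("reading", "high school"): 75.0,
    ("math", "high school"): 75.0,
}
index = achievement_index(example)
```

Because each subject and grade counts equally, a small tested cohort moves the index as much as a large one, which is the trade-off of not having student counts.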
The Federal Education Budget Project excludes districts characterized by NCES as a charter school, state-operated institution, regional education services agency, supervisory union, or federal agency. It also does not include any charter districts. The budget project also includes only districts created before 2006. We downloaded the data from the project’s website on October 18, 2010.
Our three productivity measures
To emphasize the complexity of measuring a district’s productivity, we offer three different approaches to measuring productivity rather than a single ranking. The companion website to this report allows the public to compare districts in a state using each of our metrics, as well as to easily compare school systems with similar demographics and size. The site also details each district’s achievement and spending data. We used shades of colors when ranking the districts to emphasize the fact that we did not evaluate districts against an external benchmark but rather on their relative performance.
Basic Return on Investment index rating
This measure rates school districts on how much academic achievement they get for each dollar spent, relative to other districts in their state.
Because it costs more to educate certain populations than their peers, we adjusted the expenditure data for students in special programs, such as students who receive subsidized lunches or are in special education. This is a common practice in school finance research, and we derived the weights by calculating the average weight used in a half-dozen research studies and policy papers. Based on those calculations, we used a weight of 1.4 for free and reduced-price lunch, 1.4 for English-language learners, and 2.1 for special education.
To understand how this works, consider an example. The research indicates that each student who qualifies for a subsidized lunch costs about 40 percent more to educate. So, for each additional student in the free and reduced-priced lunch program, we subtracted 40 percent from the district’s per-student spending.
To adjust for cost-of-living differences, we used the Comparable Wage Index, a measure of regional variations in the salaries of college graduates who are not educators. Lori Taylor at Texas A&M University and William Fowler at George Mason University developed the CWI to help researchers fine-tune education finance data to make better comparisons across geographic areas. We used adjustments from 2005, the most recent available.
To calculate the adjusted costs for each district, we created a needs index designed to measure how much additional funding a school district should have received based on its students in special programs, including the percentage of students in the subsidized school lunch program, special education students, and English-language learners. We created the index by multiplying the number of students in these special programs by their respective weight. We then divided that weighted total by the enrollment to get the average additional amount of funding that a given school district should have received. To avoid penalizing districts with greater needs, we then divided the raw per-pupil expenditure by the weighted index to produce the amount of money a district would have spent if it had no students in special programs. Finally, we adjusted this measure by the CWI to make it comparable across different geographic localities.
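One plausible reading of the needs-index calculation is sketched below in Python. It is an interpretation, not the report's code: it assumes the index equals 1 plus the enrollment-averaged extra cost (weight minus 1) of students in special programs, so that a district with no such students has an index of exactly 1, and it assumes the CWI is expressed as a ratio to the average (1.0 = average labor cost) and applied by division. The function and key names are hypothetical.

```python
# Weights quoted in the text. The "extra cost" of a student in a program is
# taken to be the weight minus 1 (an assumption about how weights are applied).
WEIGHTS = {"frl": 1.4, "ell": 1.4, "sped": 2.1}


def needs_index(enrollment, counts):
    """Return 1 + average additional cost per enrolled student.

    counts maps a program key ("frl", "ell", "sped") to the number of
    students in that program; missing keys are treated as zero.
    """
    extra = sum((WEIGHTS[k] - 1.0) * counts.get(k, 0) for k in WEIGHTS)
    return 1.0 + extra / enrollment


def adjusted_ppe(raw_ppe, enrollment, counts, cwi):
    """Needs-adjusted, cost-of-living-adjusted per-pupil expenditure."""
    return raw_ppe / needs_index(enrollment, counts) / cwi
```

For a made-up district of 1,000 students with 300 in subsidized lunch, 50 English-language learners, and 100 in special education, the index under these assumptions is 1.25, so raw spending of $10,000 per pupil becomes $8,000 before the cost-of-living adjustment.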
We then distributed districts in each state into three equal tiers based on their position on the achievement index, with the highest achievers in the top tier and the lowest achievers in the bottom tier. We also divided the districts into three equal tiers based on their adjusted expenditures, with the highest adjusted spenders in the top tier and the lowest adjusted spenders in the bottom tier. Then we used an evaluation matrix to assign colors to each district based on their achievement tier relative to their spending tier, with green being the most productive and red being the least productive.
The matrix rewards districts that had low spending and high achievement relative to other districts in their state. So if a district was in the top third of achievement and the bottom third in spending, it would receive a rating of green.
To understand better how our Basic ROI Index works in practice, consider Maryland. We first ranked the state’s 24 districts along our achievement index. That put districts with relatively high achievement, such as Queen Anne’s County Public Schools, in the top achievement tier. (Queen Anne’s County has an achievement index of 87 and ranks sixth in the state on that measure). Districts with relatively low achievement, such as Dorchester County Public Schools, went into the bottom achievement tier. (Dorchester County has an achievement index of 72 and ranks third from the bottom on this measure.)
Then we looked at each district’s adjusted spending. Dorchester County had high adjusted spending, and so it went into the highest adjusted spending tier. (Dorchester’s adjusted per-student spending is $10,462 and ranks 17th out of 24 districts on this measure.) Queen Anne’s County had relatively low adjusted spending, and so it went into the lowest adjusted spending tier. (Queen Anne’s adjusted per-student spending is $8,648 and ranks seventh in the state on this measure.)
Then we used the evaluation matrix (see box, page 19) to assign colors to each district based on its achievement tier relative to its spending tier. Queen Anne’s County had high achievement and low adjusted spending, and so it received a green rating. Dorchester had low achievement and high adjusted spending, and so it received a red rating.
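The tiering and matrix lookup can be sketched in Python as follows. The full evaluation matrix is not reproduced in this text, so the lookup table below contains only the four cells that the Maryland examples in this document state explicitly; every other cell is left unspecified rather than guessed. Splitting districts into thirds by rank, and the handling of ties, are also assumptions.

```python
def assign_tiers(values):
    """Return a tier per value: 0 = lowest third, 1 = middle, 2 = highest.

    Districts are ranked within one state and split into three groups of
    (near-)equal size. Tie-breaking by input order is an assumption.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    tiers = [0] * n
    for rank, i in enumerate(order):
        tiers[i] = rank * 3 // n
    return tiers


# (achievement tier, spending tier) -> color. Only cells stated in the text.
KNOWN_CELLS = {
    (2, 0): "green",        # high achievement, low spending
    (2, 1): "light green",  # high achievement, middle spending
    (0, 1): "dark orange",  # low achievement, middle spending
    (0, 2): "red",          # low achievement, high spending
}


def rate(achievement_tier, spending_tier):
    """Look up a color, or report that the text does not give that cell."""
    return KNOWN_CELLS.get((achievement_tier, spending_tier), "not stated in the text")
```

With 24 districts, as in Maryland, `assign_tiers` places eight districts in each tier.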
Adjusted Return on Investment index rating
This measure uses the same approach as the Basic ROI rating but applies a different statistical method, called a regression analysis, to account for factors outside a district’s control, such as the added costs of educating low-income, non-English-speaking, and special education students. The adjustments, or weights, used in the Basic ROI are not always sensitive enough to account for spending differences within states. For example, states might provide districts with additional funding for students in special education, and thus a weight of 2.1 for a student in special education might be too high.
In this approach we predicted what a district would spend relative to other districts in the state. We ran the regression models separately for each state to account for variation within each state’s educational financing system. Here’s the process depicted as an equation:
ln(CWI adjusted ppe) = β0 + β1 % free lunch + β2 % ELL + β3 % Special Ed + ε
We predicted each district’s spending based on the percentage of students in special programs, including the percentage of students receiving free or reduced-price lunch, the percentage designated as English-language learners, and the percentage who participate in special education. We then calculated how much more or less the school district spent than what we predicted it should have spent, a difference known as a residual, and we used this as our measure of spending.
We then divided the districts into three tiers based on how much more or less the district spent than what we predicted it should have spent. Districts with lower-than-predicted scores went into the lowest tier, and those with higher-than-predicted scores into the highest tier.
We then used the achievement index to separate the districts into three tiers, as in the Basic ROI rating. Finally, we assigned each district a color on the evaluation matrix based on its placement on the achievement and predicted-spending tiers.
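A minimal Python sketch of the residual calculation follows, using ordinary least squares on one state's districts. It is not the report's code: the estimation method is assumed to be plain OLS with an intercept, the solver is a small hand-written one so the example needs no external libraries, and the dictionary keys (`ppe_cwi`, `pct_frl`, `pct_ell`, `pct_sped`) are hypothetical names.

```python
import math


def ols_fit(X, y):
    """Ordinary least squares via the normal equations (X'X)b = X'y.

    X is a list of predictor rows; a column of 1s is prepended for the
    intercept. Returns [b0, b1, ...]. Assumes the predictors are not
    perfectly collinear.
    """
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda i: abs(A[i][c]))
        A[c], A[p] = A[p], A[c]
        for i in range(k):
            if i != c:
                f = A[i][c] / A[c][c]
                A[i] = [a - f * b for a, b in zip(A[i], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


def spending_residuals(districts):
    """Residuals of ln(CWI-adjusted ppe) for the districts of ONE state.

    A positive residual means the district spent more than predicted.
    """
    X = [[d["pct_frl"], d["pct_ell"], d["pct_sped"]] for d in districts]
    y = [math.log(d["ppe_cwi"]) for d in districts]
    beta = ols_fit(X, y)
    predicted = [beta[0] + sum(b * x for b, x in zip(beta[1:], row)) for row in X]
    return [yi - pi for yi, pi in zip(y, predicted)]
```

The residuals would then be split into three tiers per state, in the same way as adjusted expenditures in the Basic ROI rating. Running the function once per state matches the text's statement that the models were fitted separately for each state.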
To get a sense of how this worked in practice, consider again the Maryland example. First, we ranked all the districts in the state based on their achievement indexes. Again, districts with high achievement, such as Queen Anne’s County, went into the top tier while relatively low-achieving districts, such as Dorchester County, went into the bottom tier.
Next, we looked at each district’s predicted spending score, or the difference between the predicted value and the actual value. Dorchester County had an average predicted spending score, or residual, and so it went into the middle tier for predicted spending. Queen Anne’s County had an average predicted spending score, or residual, and so it went into the middle tier for predicted spending.
Then we compared the districts against our evaluation matrix. Dorchester County had low achievement and a middling predicted spending score, or residual, and so it received a rating of dark orange. Queen Anne’s County had high achievement and a middling predicted spending score, or residual, and so it received a rating of light green.
Predicted Efficiency index rating
This measure differs significantly from the first two measures.
The first two measures rate districts based on the achievement that school systems produce compared to their expenditures after controlling for factors outside the district’s control. In contrast, the predicted efficiency measure doesn’t compare achievement to spending. Instead, the approach rates districts on the results of their predicted achievement after controlling for factors outside their control. This distinction is important. The first two approaches attempt to measure how much “bang for the buck” a school district gets. This third approach attempts to eliminate the effects of spending and other factors such as students with additional needs and then evaluates districts by how much more or less achievement the district produced than would be expected.
Technically, then, this approach does not evaluate districts against an evaluation matrix, nor does it weight or predict the amount that a school district spends on education. Instead, we used a regression analysis to predict what achievement a district should have relative to other districts in the state given its spending and percentage of students in special programs.
To calculate this estimate, we used a production function, a type of regression analysis that examines the relationship of inputs to an output, and we predicted the achievement index as a function of the district’s cost-of-living-adjusted per-pupil expenditure, the percentage of students participating in the free and reduced-price lunch program, the percentage of students who are English-language learners, and the percentage of special education students.
This approach is shown in equation form below:
achievement = β0 + β1 ln(CWI adjusted ppe) + β2 % free lunch + β3 % ELL + β4 % Special Ed + ε
To control for differences in state finance systems, we calculated individualized production functions for each state. Then, after predicting each district’s achievement, we divided the results into six bands and awarded colors to districts that produced higher or lower levels of achievement than would be expected, with green being the most productive and red being the least productive. Districts with negative scores—or those that produced a lower level of achievement than would be expected—were given the least desirable rankings.
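The banding step can be illustrated with a short Python sketch. It takes each district's actual and predicted achievement as given (the predictions would come from the state-level production function above) and assigns six bands by rank. The text does not say how the band cutoffs were chosen, so equal-sized bands by rank are an assumption, and the function name is hypothetical.

```python
def predicted_efficiency(actual, predicted, n_bands=6):
    """Return (residuals, bands) for the districts of one state.

    residual = actual achievement minus predicted achievement, so a negative
    value means the district achieved less than expected. Bands run from 0
    (least productive) to n_bands - 1 (most productive). Equal-count bands
    by rank are an assumption, not the documented cutoffs.
    """
    residuals = [a - p for a, p in zip(actual, predicted)]
    n = len(residuals)
    order = sorted(range(n), key=lambda i: residuals[i])
    bands = [0] * n
    for rank, i in enumerate(order):
        bands[i] = rank * n_bands // n
    return residuals, bands
```

Under this reading, a district can land in a low band despite high absolute achievement if comparable districts in its state achieved even more, which is the limitation discussed next.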
One of the limitations of the Predicted Efficiency index is that districts with high overall achievement can receive low productivity scores. That is not the case with the first two productivity approaches. The measure also adjusts academic expectations for students from disadvantaged backgrounds. While this is an accepted research practice in the education policy community, the Center for American Progress opposes the lowering of academic expectations as a matter of policy. The reasons are both philosophical and practical. Philosophical because we do not believe that a country that promises that everyone is created equal should have lower educational standards for students who are from low-income families or speak English as a second language. Practical because we believe that unless schools have high academic expectations, we will not ensure that all students, regardless of family background, will succeed. But as we researched various productivity measures, we found that this approach provided important insight into a district’s productivity and helped provide a more well-rounded understanding of its overall efficiency.
Consider the districts in Maryland again as an example. On the Predicted Efficiency index, Queen Anne’s County received just below-average marks, and it earned a rating of orange. That means that it did less well relative to other districts than would be expected, given its spending and percentage of students in special programs. To help understand that result, consider that Queen Anne’s has 15 percent of its students in the subsidized school lunch program, and in Maryland, the percentage of students who participate in the program has a large and negative impact on achievement, one of the largest of any of the variables included in our regression.
The regression model predicted Queen Anne’s achievement relative to other districts, and there are districts in the state that have similar demographics that spent less and achieved more, which helps to explain why Queen Anne’s received just below-average marks. Calvert County, for instance, has largely similar percentages of students in special programs as Queen Anne’s, but it has an achievement index score of 90, three points higher than Queen Anne’s, and Calvert’s adjusted per-student spending was $8,091, about $500 less per student.
Dorchester County also received just below-average marks on this metric, and it received a rating of orange. This indicates that the district did less well than was estimated, given its spending and percentage of students in special programs. To help explain the evaluation, consider that Dorchester County enrolls a large percentage of low-income students, with about half of the students in the district participating in the subsidized school lunch program. Again, the percentage of students who receive subsidized lunches in Maryland has a large and negative impact on predicted achievement, one of the largest of any of the variables included in our study.
Our regression model compared Dorchester County’s achievement to that of other districts, taking into account the low performance of students who participate in the subsidized lunch program. Dorchester’s results were just below average in part because there are districts in the state with similar rates of poverty, such as Allegany County Schools, that have significantly higher achievement.
One of the aims of our study is to draw attention to the large variance in productivity within states, and while we believe that our district-level evaluations rely on the best available methods—and show important and meaningful results—we caution against making firm conclusions about the ratings of an individual district.
The literature on productivity is limited, and there’s a lot we don’t know about the relationship between spending and achievement. It appears, for instance, that the link between outcomes and money is not always linear. In other words, even in an efficient school system, the first few dollars spent on a program or school might not have the same effect as subsequent expenditures, with additional dollars not boosting outcomes as much as initial investments. We also know that additional resources are often provided to districts that already have high achievement and that this can potentially mask inefficiencies in spending.
Because of the limitations of the research, we could not evaluate the efficiency of a district against an external benchmark. We therefore rated districts based on their relative performances. That means a few things. First, we slotted districts into different evaluation levels even though in some cases the numerical value that separated the districts may not have been significant. It also means that states with a smaller number of districts had different cutoff points between rating categories than did states with larger numbers of districts.
Our measures also cannot account for all of the variables outside the control of a district, in large part because the field of education suffers from a lack of high-quality data. School-by-school spending data, for instance, are not available in most states. That’s why we were able to produce only district-level productivity results, which likely mask significant variation within a district. And apart from excluding any district serving fewer than 250 students, we did not adjust for economies of scale. There are issues with the data as well as debate within the research community about what economies of scale say about the quality of a district’s management. But given the potential impact that size can have on spending, we made it easy to sort by both enrollment and geography on our interactive website so that users can compare similar districts.
The available data are also problematic. State and district data often suffer from weak definitions and questionable reliability. For instance, the federal government requires that every school report the number of students who participate in the free and reduced-price lunch program. But schools rely on parental self-reporting to determine eligibility, and so schools that are more aggressive about recruiting families into the program often have higher participation rates, even though they might not necessarily have larger percentages of low-income students.
Other data released by NCES appear to be simply flawed. Take, for instance, Connecticut’s New Canaan Public Schools, which is located about an hour north of New York City. In 2008, NCES reported that close to 100 percent of New Canaan’s students received free and reduced-priced lunch. That would make New Canaan one of the poorest districts in the country. But only 2.2 percent of students in the New Canaan area are poor, according to the Census Bureau. That’s well below the state average of 10 percent. When we informed NCES of the contradiction, an official said the state had reported the data to them and that there was no way for them to verify if the figure was too large or small. (We used subsidized lunch as the measure of poverty for our evaluations because it’s the only poverty indicator available at the district level. The rest of the poverty data are available only at the county or municipal level.)
There are problems with achievement data, too. Many state assessments don’t rigorously assess what students know and are able to do. Some of the exams use only multiple-choice questions to test student mastery of a subject, thus providing limited perspective on student skills. Other exams are not properly aligned with state curriculum standards and may be too easy. Moreover, our study looks only at reading and math test scores, an admittedly narrow slice of what students need to know to succeed in college and the workplace.
Despite these caveats, we believe our evaluations are useful, and the best available, given existing traditions and knowledge. We designed our color-rating system to empower the public to engage the issue of educational productivity, and we’ve produced an interactive website that allows users to compare the productivity of similar districts. We hope this project promotes not just further talk and deeper research—but also thoughtful action to maximize school spending.
Update, January 27, 2011: After Hurricane Katrina, New Orleans Public Schools was split into multiple entities: Orleans Parish Schools, the Recovery School District of Louisiana, and several charter school agencies. Although the map colors the entire New Orleans area, the data reported are only for Orleans Parish Schools. The Recovery School District of Louisiana and charter school agencies are omitted because this study only focuses on local school districts, not state-operated agencies or charter school agencies, and thus we ask that users consider the individual district evaluations in that area with particular caution.