Let’s try a super-simple thought experiment with data. Suppose we have an inner-city middle school serving grades 6-8. Students in all three grades take the state exam annually (in this case, we’ll say that it’s at the very beginning of the year). Now, for the sake of this illustration, let’s avail ourselves of the magic of hypotheticals and assume away many of the sources of error that make year-to-year changes in public testing data unreliable.
First, we’ll say that this school reports test scores instead of proficiency rates, and that the scores are comparable between grades. Second, every year, our school welcomes a new cohort of sixth graders that is the exact same size and has the exact same average score as preceding cohorts – 30 out of 100, well below the state average of 65. Third and finally, there is no mobility at this school. Every student who enters sixth grade stays there for three years, and goes to high school upon completion of eighth grade. No new students are admitted mid-year.
Okay, here’s where it gets interesting: Suppose this school is phenomenally effective in boosting its students’ scores. In fact, each year, every single student gains 20 points. It is the highest growth rate in the state. Believe it or not, using the metrics we commonly use to judge schoolwide “growth” or “gains,” this school would still look completely ineffective. Take a look at the figure below.
It is very simple, just showing the progression of scores at our hypothetical school. As we already know, every incoming cohort of sixth graders (next to the green arrows) has an average score of 30 on the test. As indicated by the yellow arrows, the next year, in seventh grade, they all pick up 20 points, for an average of 50. And, finally, they pick up another 20 points by eighth grade, and are promoted to high school (red arrows) with an average score of 70. In other words, thanks to the wonderful effectiveness of their superstar school, every cohort enters at 30, well below the state average, and leaves at 70 (slightly above the average of 65).
But here’s the big problem: The school’s average score of 50 doesn’t change. It is completely flat. Anyone looking at this school’s score each year might say it’s doing poorly. After all, the average score of 50 is below the state average, and there has been no “growth.” How can such a wonderful school not show improvement?
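For readers who want to check the arithmetic, here is a minimal Python sketch of the hypothetical school. The variable and function names are mine, invented for illustration, and the only inputs are the numbers assumed above: every cohort enters at 30, every student gains 20 points a year, and cohorts are the same size. It compares what a single cohort gains from one year to the next with how the schoolwide average changes.

```python
ENTRY_SCORE = 30   # average score of every incoming sixth-grade cohort
ANNUAL_GAIN = 20   # points every student gains per year
N_GRADES = 3       # grades 6, 7 and 8

def cohort_score(years_enrolled):
    """Average score of a cohort that has spent `years_enrolled` years at the school."""
    return ENTRY_SCORE + ANNUAL_GAIN * years_enrolled

def schoolwide_average():
    # Cohorts are the same size, so the school mean is the mean of the grade means.
    grade_means = [cohort_score(k) for k in range(N_GRADES)]
    return sum(grade_means) / N_GRADES

# What one cohort does over time (actual growth).
cohort_path = [cohort_score(k) for k in range(N_GRADES)]
cohort_gains = [later - earlier for earlier, later in zip(cohort_path, cohort_path[1:])]

# What the schoolwide average does over five hypothetical test years.
averages = [schoolwide_average() for _ in range(5)]
school_changes = [later - earlier for earlier, later in zip(averages, averages[1:])]

print(cohort_path)     # [30, 50, 70]
print(cohort_gains)    # [20, 20]
print(averages)        # [50.0, 50.0, 50.0, 50.0, 50.0]
print(school_changes)  # [0.0, 0.0, 0.0, 0.0]
```

Every cohort gains 20 points a year, yet the year-to-year change in the schoolwide average is zero, because each year's average is computed over a different mix of students.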
The answer, as you’ve probably already realized, is that this approach – subtracting schoolwide cross-sectional scores/rates in one year from those in the previous year – isn’t really a measure of “growth” at all. It’s comparing two different groups of students.
Every year, our school graduates its relatively high-scoring eighth graders, and accepts an incoming class of sixth graders who score poorly. Under these assumptions, no matter how effective this school is, or how many grades it serves, its average score will never change unless its growth rate changes between years.*
In addition, it is very important to remember that this little thought experiment assumes away much of the transitory variation in cross-sectional rates/scores, most notably performance differences between cohorts (as well as the conversion of scores into proficiency rates). If, for instance, a single incoming sixth-grade cohort, for whatever reason, happened to have a score lower than 30, this would pull the schoolwide average down in the year that cohort arrives and keep it lower for the three years the cohort is enrolled, even holding the school’s effectiveness constant. Our wonderful school would actually appear to be getting worse, for reasons completely out of its control.
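To make that concrete, here is a rough Python sketch in which one cohort enters with a lower score. The entry score of 24 and the year labels are hypothetical values I picked for illustration; they are not from any real school. Everything else matches the example: other cohorts enter at 30, every student gains 20 points a year, and cohorts are the same size.

```python
DEFAULT_ENTRY = 30      # entry score of a typical sixth-grade cohort
ANNUAL_GAIN = 20        # points every student gains per year (held constant)
N_GRADES = 3            # grades 6, 7 and 8
WEAK_COHORT_YEAR = 3    # hypothetical: year the lower-scoring cohort enters
WEAK_COHORT_ENTRY = 24  # hypothetical: that cohort's sixth-grade average

def entry_score(cohort_year):
    """Sixth-grade average of the cohort that entered in `cohort_year`."""
    return WEAK_COHORT_ENTRY if cohort_year == WEAK_COHORT_YEAR else DEFAULT_ENTRY

def schoolwide_average(test_year):
    # The cohort that entered k years ago has gained ANNUAL_GAIN * k points.
    grade_means = [entry_score(test_year - k) + ANNUAL_GAIN * k for k in range(N_GRADES)]
    return sum(grade_means) / N_GRADES

years = list(range(2, 8))
averages = [schoolwide_average(year) for year in years]
changes = [later - earlier for earlier, later in zip(averages, averages[1:])]

print(averages)  # [50.0, 48.0, 48.0, 48.0, 50.0, 50.0]
print(changes)   # [-2.0, 0.0, 0.0, 2.0, 0.0]
```

The school's growth rate never changes in this sketch. Even so, the average drops when the lower-scoring cohort arrives, stays down while that cohort is enrolled, and then "improves" when the cohort leaves for high school.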
In reality, student improvement over time tends to be modest compared to differences between students’ “starting points.” Thus, even minor changes in cohort composition can have substantial, lasting effects on aggregate performance for reasons that have nothing to do with actual school performance. Moreover, even when incoming cohort performance is consistent, as in our example above (all sixth graders enter at 30), the simple fact that the sample changes every year shapes the average scores across the whole school.
There’s something to be said for measuring performance over time and incentivizing improvement using available data, and changes in cross-sectional rates/scores are, for the moment, all that is available in many places (though that is changing). Nevertheless, we should always bear in mind that these changes are not “growth” or “gains,” and they often cannot identify the schools generating the most significant progress among their students (or lack thereof).
- Matt Di Carlo
* This is certainly a form of improvement, but it’s not a measure of school effectiveness (e.g., a relatively ineffective school might appear effective if it exhibits “growth in its growth rate”). More importantly, the only reason it would definitely show up in our hypothetical progression is that we imposed the assumption of no mobility – that is, we assumed that no students come or go, and so each individual cohort’s change between years is growth per se. This almost never happens in practice, and so any change in the “growth rate” could be “cancelled out” by sample variation (differences between cohorts).