Differences In DC Teacher Evaluation Ratings By School Poverty

In a previous post, I discussed simple data from the District of Columbia Public Schools (DCPS) on teacher turnover in high- versus lower-poverty schools. That same report, which was issued by the D.C. Auditor and includes, among other things, descriptive analyses by the excellent researchers from Mathematica, contains another very interesting table showing the evaluation ratings of DC teachers in 2010-11 by school poverty. (DC officials deserve credit for making these kinds of data available to the public, as this is not the case in many states.)

DCPS’ well-known evaluation system (called IMPACT) varies between teachers in tested versus non-tested grades, but the final ratings are a weighted average of several components, including: the teaching and learning framework (classroom observations); commitment to the school community (attendance at meetings, mentoring, PD, etc.); schoolwide value-added; teacher-assessed student achievement data (local assessments); core professionalism (absences, etc.); and individual value-added (tested teachers only).
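To make the "weighted average" idea concrete, here is a minimal sketch in Python. The component names, the weights and the 1-to-4 scoring scale are all hypothetical, chosen only for illustration; they are not the actual IMPACT weights, which differ by teacher group. Dividing by the total weight of the components a teacher actually has is likewise a choice made for this sketch, not a description of how DCPS handles teachers without individual value-added.

```python
def weighted_score(component_scores, weights):
    """Weighted average over the components a teacher actually has.

    Both arguments are dicts keyed by component name. All names and
    weights used below are hypothetical, not the real IMPACT values.
    """
    total_weight = sum(weights[name] for name in component_scores)
    if total_weight <= 0:
        raise ValueError("weights for the scored components must sum to a positive number")
    weighted_sum = sum(score * weights[name] for name, score in component_scores.items())
    return weighted_sum / total_weight


# Hypothetical weights (assumption: not the actual IMPACT weights).
HYPOTHETICAL_WEIGHTS = {
    "observations": 0.35,
    "individual_value_added": 0.50,
    "commitment_to_school": 0.10,
    "schoolwide_value_added": 0.05,
}

# A made-up teacher in a tested grade, with components scored on an assumed 1-4 scale.
tested_teacher = {
    "observations": 3.0,
    "individual_value_added": 2.0,
    "commitment_to_school": 4.0,
    "schoolwide_value_added": 3.0,
}

# A made-up teacher in a non-tested grade (no individual value-added score).
non_tested_teacher = {
    "observations": 3.0,
    "commitment_to_school": 4.0,
    "schoolwide_value_added": 3.0,
}

print(round(weighted_score(tested_teacher, HYPOTHETICAL_WEIGHTS), 2))
print(round(weighted_score(non_tested_teacher, HYPOTHETICAL_WEIGHTS), 2))
```

The point of the sketch is simply that a gap in any heavily weighted component carries through to the final rating, which is why the component-by-component breakdown in the Auditor's table matters.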

The table I want to discuss is on page 43 of the Auditor’s report, and it shows average IMPACT scores for each component and overall for teachers in high-poverty schools (80-100 percent free/reduced-price lunch), medium-poverty schools (60-80 percent) and low-poverty schools (less than 60 percent). It is pasted below.

In short, this table suggests that there is a rather substantial discrepancy in IMPACT results between teachers in low-poverty schools versus their counterparts in medium- and high-poverty schools.

Specifically, you’ll notice that almost 30 percent of teachers in low-poverty schools receive the highest rating (“highly effective”), compared with just 7-10 percent in the other categories. In addition, just over seven percent of teachers in low-poverty schools receive one of the two lowest ratings (“minimally effective” or “ineffective,” both of which may result in dismissal), versus 18-21 percent in the medium- and high-poverty schools.

So, the relationship between school poverty and IMPACT ratings may not be linear, as the distributions for medium- and high-poverty schools are quite similar. Nevertheless, it seems very clear that IMPACT results are generally better among teachers in schools serving lower proportions of poor students (i.e., students eligible for subsidized lunch), and that the discrepancies are quite large.

One question is why these differences, which have been found elsewhere in both test-based and non-test-based measures, arise.

Broadly speaking, there are two possible explanations (and they are not mutually exclusive). The first, put crudely, is that IMPACT is “biased” – that is, teachers in lower-poverty schools have an easier time getting high ratings, due to the measures themselves. The second possibility is that performance, at least as measured by IMPACT, is actually higher among teachers in lower-poverty schools.

It is, needless to say, very difficult to make this determination, especially with these descriptive statistics, but it is certainly a question that gets to the heart of the validity of these new teacher evaluation systems. If, for example, the differences by school poverty are mostly a result of systematic error, this would call into question the degree to which the ratings, by which teachers’ very employment might be in jeopardy, are actually capturing “true teacher performance.”

On the other hand, there is good reason to believe that teacher “quality” may be higher in lower-poverty schools, due, for example, to differences in resources, lower turnover and other factors. For instance, if teachers in low-poverty schools are more experienced and/or less likely to leave, this would create a self-reinforcing cycle of higher ratings in these schools, as these groups tend to get higher IMPACT scores.

Put differently, what seems like bias may in fact be “real.”

Another feature that stands out in the table, which is relevant to the discussion above in that it illustrates the uncertain origins of the aggregate differences, is the breakdown of average scores for each IMPACT component (the lower rows in the table).

Specifically, there are poverty-based discrepancies in scores in every single sub-category. In other words, in the case of measures based on state tests (individual and schoolwide value-added), local tests, classroom observations and principals’ assessments (of core professionalism), teachers in lower-poverty schools score better. (Though it bears mentioning that the discrepancies appear to be largest for the three test-based measures, perhaps due to their varying more in the first place.)

The fact that the differences in overall IMPACT scores by school poverty are not being driven by a single sub-component or small group of sub-components, but rather are exhibited in all the sub-components, may speak to the question of whether the overall differences by poverty are “real” or due to bias. That is, while it’s very crude evidence, it may support the argument that the overall differences are “real,” since they show up in every one of the individual indicators. Or, perhaps, all the measures are to some degree “biased.” Again, it’s just tough to say.

Another interesting dynamic here, one that may influence the results in the table above, is the fact that teachers are dismissed every year based on their IMPACT ratings (and also that higher-rated teachers seem to be retained at somewhat higher rates).

This might cut both ways as far as its potential impact on the differences in IMPACT scores by school poverty. On the one hand, insofar as teachers in medium- and higher-poverty schools are more likely to be fired than their counterparts in lower-poverty schools (because the former groups tend to get lower IMPACT scores), this might actually serve to attenuate the unequal ratings by school poverty (since the lowest-rated teachers are exited, thus “raising” the average scores for these groups in the following year). But this would only be true if the replacement teachers score higher than the teachers they replace.
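The arithmetic behind that last caveat can be sketched in a few lines of Python. Everything here is made up for illustration: the scores, the cutoff and the single replacement score are hypothetical, and the dismissal rule (everyone below one cutoff is dismissed and replaced one-for-one) is a deliberate simplification rather than the actual DCPS policy.

```python
from statistics import mean


def mean_after_dismissals(scores, cutoff, replacement_score):
    """Group mean after every teacher scoring below `cutoff` is dismissed
    and replaced one-for-one by a teacher scoring `replacement_score`.

    A simplified, hypothetical rule; not the actual DCPS dismissal policy.
    """
    kept = [s for s in scores if s >= cutoff]
    n_dismissed = len(scores) - len(kept)
    return mean(kept + [replacement_score] * n_dismissed)


# Made-up scores for a small group of teachers (assumed 1-4 scale).
scores = [1.5, 2.0, 3.0, 3.5]
cutoff = 1.75  # hypothetical dismissal threshold

print("before:", mean(scores))
for replacement in (2.5, 1.5, 1.0):
    after = mean_after_dismissals(scores, cutoff, replacement)
    print(f"replacement scoring {replacement}: mean becomes {after}")
```

In this toy example the group average rises only when the replacement scores higher than the dismissed teacher, stays put when the two score the same, and falls when the replacement scores lower.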

On the other hand, given that the vast majority of teachers in low-poverty schools receive “effective” or “highly effective” ratings, and that (as shown elsewhere in the Auditor’s report) retention is quite high among these teachers overall (83 and 89 percent, respectively), this could actually serve to reinforce the discrepancies. In other words, teachers in lower-poverty schools tend to do better and stay longer, and thus have more expertise, which helps raise the average scores for this group of schools and creates more distance between them and medium- and higher-poverty schools. In short, the effect of DCPS’ dismissal policy on IMPACT ratings by school poverty is complicated.

Overall, then, the fact that DC teachers in the lowest-poverty schools receive higher IMPACT ratings than teachers in schools with higher poverty rates is difficult to unpack and cannot be attributed to clear, black-and-white causes. But none of the possible explanations is particularly comforting. If the differences are “real” (and if IMPACT is actually telling us something about teacher performance), this would suggest that the students most in need of excellent instruction aren’t receiving as much of it. If, on the other hand, the differences are due to systematic bias in the IMPACT system’s measures, this means that DC teachers are being rated by a flawed instrument, one which, for example, might serve to discourage them from going to or staying in the highest-poverty schools.

To reiterate, it is likely that both factors are at play here, but no matter their origins, the discrepancies are not something that should be ignored, as they matter for the overall impact of these systems, and can be addressed directly (e.g., with statistical corrections).

One final note: It is very interesting, at least to me, that there is widespread concern when teacher evaluation systems exhibit differences by school poverty (and rightfully so), but far less uproar when such differences are found with school rating systems. In fact, if you break down school ratings by poverty in virtually any state, you’ll find unequal distributions that dwarf those of DC IMPACT. It’s unclear to me why we pay attention to one and not the other.
- Matt Di Carlo