Most of the controversy surrounding value-added and other test-based models of teacher productivity centers on the high-stakes use of these estimates. This is unfortunate – no matter what you think about these methods in the high-stakes context, they have a great deal of potential to improve instruction.
When supporters of value-added and other growth models talk about low-stakes applications, they tend to assert that the data will inspire and motivate teachers who are completely unaware that they’re not raising test scores. In other words, confronted with value-added evidence that their performance is subpar (at least insofar as tests are an indication), teachers will rethink their approach. I don’t find this very compelling. Value-added data will not help teachers, even those who believe in its utility, unless they know why their students’ performance appears to be comparatively low. It’s rather like telling a baseball player that he isn’t getting hits, or telling a chef that the food is bad, without saying why: it’s not constructive.
Granted, a big problem is that value-added models are not actually designed to tell us why teachers get different results – i.e., whether certain instructional practices are associated with better student performance. But the data can be made useful in this context; the key is to present the information to teachers in the right way, and rely on their expertise to use it effectively.
One of the most promising approaches for translating less-than-informative teacher effect estimates into actionable information is disaggregation, i.e., presenting the estimates by student subgroup. For instance, if a teacher is told that her English language learners tend to make less rapid progress than her native speakers, this is potentially useful: she might rethink how she approaches those students and what additional supports they may need from the school system. Similarly, if there are strong gains among students who started out at a lower level (as measured by their previous year’s scores) and stagnation among those who started out at a higher level, this suggests the need for more effective differentiation, particularly for the higher achievers.
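To make disaggregation a little more concrete, here is a minimal Python sketch. It assumes each student already has a predicted score from some growth model (the model itself is not shown) and simply averages the gap between actual and predicted scores within each subgroup. The record layout, the subgroup labels, the `disaggregate` function and all of the numbers are hypothetical; real value-added systems use considerably more elaborate models and controls.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical student records: (subgroup, actual score, predicted score).
# "predicted" stands in for whatever expected score a real growth model would
# produce from prior achievement and other controls; here it is simply given.
records = [
    ("ELL", 48.0, 52.0),
    ("ELL", 55.0, 57.0),
    ("ELL", 61.0, 60.0),
    ("non-ELL", 70.0, 66.0),
    ("non-ELL", 64.0, 63.0),
    ("non-ELL", 58.0, 55.0),
]


def disaggregate(records):
    """Average residual (actual minus predicted score) for each subgroup."""
    residuals = defaultdict(list)
    for group, actual, predicted in records:
        residuals[group].append(actual - predicted)
    return {group: mean(values) for group, values in residuals.items()}


# A negative average means that subgroup scored below what the model predicted.
for group, gap in disaggregate(records).items():
    print(f"{group}: {gap:+.2f}")
```

The same grouping could be keyed on prior-year score bands (lower versus higher starting points) instead of language status, to look for the differentiation pattern described above.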
Needless to say, such information still requires strong professional judgment. In the end, teachers and administrators will have to figure out the specifics of any plan for improvement. In addition, because subgroups (e.g., non-native English speakers) contain fewer students, their estimates are less precise, and most teachers would need a few years of data in order to discern these patterns. Finally, and most obviously, teachers who do not understand or trust the estimates are unlikely to respond to the data, so explaining the methods and results, including their strengths and weaknesses, is a necessary first step.
All of this suggests the critical importance of an issue that is not often discussed or researched: how to present value-added data to teachers in the most useful manner. Although a full discussion of this issue is beyond the scope of this post, a few quick suggestions, based on the discussion above, might include: a clear description of the methods and how to interpret the results, along with ongoing outreach and training and a means (e.g., a “hotline”) for teachers to ask questions; presenting error margins for each estimate (so teachers know when results are still too imprecise); and disaggregation of estimates by student subgroup.*
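Error margins matter most for small subgroups. The Python sketch below attaches a rough 95 percent margin of error to a subgroup’s average residual, using a simple normal approximation (the mean plus or minus 1.96 standard errors). The `estimate_with_margin` helper and the residuals are hypothetical; this is not how any particular state or district computes its error margins, and a t-based interval would be somewhat wider for groups this small.

```python
import math
from statistics import mean, stdev


def estimate_with_margin(residuals, z=1.96):
    """Return (mean residual, margin of error) using a normal approximation.

    The margin is z standard errors of the mean; z=1.96 gives a rough
    95 percent interval.
    """
    n = len(residuals)
    if n < 2:
        raise ValueError("need at least two students to estimate a margin of error")
    standard_error = stdev(residuals) / math.sqrt(n)
    return mean(residuals), z * standard_error


# Hypothetical residuals (actual minus predicted score) for one small subgroup
# in a single teacher's classroom over three years.
year_1 = [-4.0, -2.0, 1.0, 3.0]
year_2 = [-3.0, -1.0, 0.0, 2.0]
year_3 = [-5.0, -2.0, 2.0, 3.0]

one_year = estimate_with_margin(year_1)
pooled = estimate_with_margin(year_1 + year_2 + year_3)

for label, (estimate, margin) in [("one year", one_year), ("three years", pooled)]:
    print(f"{label}: {estimate:+.2f} ± {margin:.2f}")
```

With only four students, the margin is several times larger than the estimate itself, so a single year says very little; pooling three years of (equally hypothetical) data roughly halves the margin. This is one way to see why teachers would need a few years of data before subgroup patterns become discernible.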
The majority of teachers, even those who are strongly skeptical about value-added, have long used testing data productively. Tests have always been used to diagnose student strengths and weaknesses, and skilled teachers have always used these data to help improve instruction. Value-added estimates could add another useful source of information to that arsenal.
Hopefully, these productive low/no-stakes uses for value-added have not been drowned out by all the controversy over its high-stakes use. Research and policy should start focusing on the former as well.
- Matt Di Carlo
* Many states and districts using growth model estimates produce some form of “teacher report.” My (admittedly limited) review of a few of these suggests that they vary quite widely in how they present the data, as well as in their descriptions and guidelines for interpretation. I might explore this more thoroughly in a future post.