A Big Open Question: Do Value-Added Estimates Match Up With Teachers’ Opinions Of Their Colleagues?

Posted by Matt Di Carlo on February 9, 2012

A recent article about the implementation of new teacher evaluations in Tennessee details some of the complicated issues with which state officials, teachers and administrators are dealing in adapting to the new system. One of these issues is somewhat technical: whether the various components of evaluations – most notably principal observations and test-based productivity measures (e.g., value-added) – tend to “match up.” That is, whether teachers who score high on one measure tend to do similarly well on the other (see here for more on this issue).

In discussing this type of validation exercise, the article notes:

If they don’t match up, the system’s usefulness and reliability could come into question, and it could lose credibility among educators.

Value-added and other test-based measures of teacher productivity may have a credibility problem among many (but definitely not all) teachers, but I don’t think it’s due to – or can be helped much by – whether or not these estimates match up with observations or other measures being incorporated into states’ new systems. I’m all for this type of research (see here and here), but I’ve never seen what I think would be an extremely useful study for addressing the credibility issue among teachers: One that looked at the relationship between value-added estimates and teachers’ opinions of each other.

If a teacher has a high opinion of one of her colleagues’ effectiveness in the classroom, then it’s unlikely that a negative assessment from any external source – whether value-added models, the principal or even a fellow teacher who doesn’t work at that school – will override that judgment. This is perfectly normal – people tend to trust their own judgment above all else, especially when it’s professionals assessing on-the-job performance.*

If, on the other hand, a teacher found that all the colleagues he or she respected as educators received strong value-added scores, this is the kind of validation that might cause even the most ardent skeptic to rethink their position.

That’s one reason why a systematic analysis of the relationship between teacher value-added estimates and teachers’ assessments of their colleagues could be so powerful. Obviously, such an examination would not allow individual teachers to view their colleagues’ assessments of them, or of other teachers. That would not only completely taint the results (many teachers would not be candid if they knew the individual-level results would be shared), but it would also be unethical, and would likely cause serious problems within a school. The data would have to be completely private and the results reported overall, not school-by-school, and certainly not individually. For the same general reasons, I don’t think this type of measure could be incorporated into actual evaluations.

If, however, several of these analyses, or a big one that was conducted in a diverse set of schools and districts, showed that, on the whole, teachers who are highly regarded by their colleagues also tend to receive high value-added scores, this might not only boost the credibility of value-added estimates among some teachers, but, for the rest of us, it would represent a fairly powerful partial validation of these estimates’ ability to gauge teacher effectiveness (though, as always, the analysis would probably only include math and reading teachers).

And, of course, the converse is true: If a group of studies found only a weak relationship, this might erode the estimates’ credibility among some people and compel less favorable policy conclusions (the association would be a matter of degree, and different people could interpret it differently, especially if it turned out to be moderate).

It’s a little strange that such a study has not, to my knowledge, been conducted. After all, if the correlation between value-added and students’ opinions has policy relevance (see here), then so does the estimates’ relationship with teachers’ opinions.

One reason why this type of analysis hasn’t been conducted might be the fact that there are significant hurdles. For example, among other problems, teachers vary in their familiarity with each other’s abilities, and they also maintain personal relationships that can color judgments. In addition, teachers may hesitate to bash their colleagues, even if they’re assured that their responses are completely confidential. But these issues are not uncommon in survey research, and I believe there are means of dealing with them.

Still, any rigorous project of this sort would require full cooperation from everyone involved. It would have to be carefully designed, and would probably require some financial investment. But I think it would be well worth it.

- Matt Di Carlo

*****

* Examining the relationship between value-added and peer observations is clearly a similar approach, one that could be done immediately, anywhere such a system exists. This is a good idea, but it’s not the same thing as what I’m proposing. Peer observation is a one-shot deal, and is typically (and correctly) carried out by an observer who does not have a day-to-day working relationship with the observed teacher.


5 Comments posted so far

  • When value added and other measures, like peer or principal observation, are very highly correlated it raises the question of whether the value-added aspect of teacher evaluation is even necessary.
    From the research I’ve seen on the topic, both aspects, observation and value-added, suffer from the same issue: they can identify a school’s very best and worst teachers, but they struggle to make fine distinctions among the mass in the middle.

    Comment by John Doe
    February 9, 2012 at 11:25 AM
  • Agreed. But how often do teachers really observe each other in the classroom? If teachers don’t regularly observe each other’s classrooms, their opinions of each other’s teaching ability may not be all that valid.

    My guess would be you’d have to do such a study in one of the relatively few districts that are trying to encourage the Japanese practice of “lesson study.” At least in Japan, teachers from an entire school will regularly observe one of their colleagues teaching a class and then engage in discussion and criticism of how it went. Teachers who do that kind of thing will have a much stronger basis for assessing their colleagues’ performance.

    Comment by Stuart Buck
    February 9, 2012 at 11:52 AM
  • On the other hand, teachers are more in tune with the teaching context and will provide feedback based on the correct context. Tests change and value-added parameters will change; the possibility of their being wrong changes more often than teachers do.

    Spend the money on teachers not tests.

    Comment by Michael Endicott
    February 10, 2012 at 1:45 PM
  • @Stuart – I think part of the point is that while teachers’ opinions of each other’s teaching ability, if not based on classroom observations, may not be valid, they are likely to nevertheless be psychologically important to teachers in the way Matt describes. (For instance, affecting their views of VAM, if VAM assessments align with or contradict their own.)

    And my experience is certainly that teachers form opinions of each other’s teaching ability even without much or any direct classroom observation. This probably undermines the validity of those opinions some, but there are lots of other indirect sources of information available to teachers and I’d be curious to see how well those opinions match up with VAM even when they’re not informed by classroom observation.

    Comment by Paul Bruno
    February 11, 2012 at 8:49 PM
  • I hear value added but when is there going to be talk about subtracting invaluable practices?

    When it comes to teacher assessment using a Value-Added system there is a small percentage of teachers that the assessment tool doesn’t really measure well at all, due to their position, Special Education teachers of the alternative assessment students. Administration often doesn’t even know what current practices are and should look like in classrooms with students with multiple and complex needs. 99% of society doesn’t.

    Trying to use one assessment tool to assess 100% of a population doesn’t work for students and isn’t legal. Why does anyone think it’s a good thing for teachers?

    Comment by Houston TX Teacher
    February 16, 2012 at 7:47 PM


Disclaimer

This web site and the information contained herein are provided as a service to those who are interested in the work of the Albert Shanker Institute (ASI). ASI makes no warranties, either express or implied, concerning the information contained on or linked from shankerblog.org. The visitor uses the information provided herein at his/her own risk. ASI, its officers, board members, agents, and employees specifically disclaim any and all liability from damages which may result from the utilization of the information provided herein. The content in the shankerblog.org may not necessarily reflect the views or official policy positions of ASI or any related entity or organization.
