When The Legend Becomes Fact, Print The Fact Sheet

Posted by Matt Di Carlo on November 11, 2011

The New Teacher Project (TNTP) just released a “fact sheet” on value-added (VA) analysis. I’m all for efforts to clarify complex topics such as VA, and, without question, there is a great deal of misinformation floating around on this subject, both “pro-” and “anti-.”

The fact sheet presents five sets of “myths and facts.” Three of the “myths” seem somewhat unnecessary: that there’s no research behind VA; that teachers will be evaluated based solely on test scores; and that VA is useless because it’s not perfect. Almost nobody believes or makes these arguments (at least in my experience). But I guess it never hurts to clarify.

In contrast, the other two are very common arguments, but they are not myths. They are serious issues with concrete policy implications. If there are any myths, they’re in the “facts” column.

The first objection – that the models aren’t “fair to teachers who work in high-needs schools, where students tend to lag far behind academically” – is a little confusing. In one sense, it’s correct to point out that value-added models focus on growth, not absolute scores, and teachers aren’t necessarily penalized just because their students “start out” low.

But most of the response to this “myth” addresses a rather different question – whether or not the models can fully account for the many factors out of teachers’ hands. TNTP’s take is that VA models “control for students’ past academic performance and demographic factors,” which, they say, means that teachers “aren’t penalized for the effects of factors beyond their control.” Even under ideal circumstances, that’s just not accurate.

The evidence they cite is a frequently misinterpreted paper by researchers at Vanderbilt University and the SAS Institute, published in 2004. What the analysis finds is that the results of a specific type of VA model (TVAAS) – one with very extensive data requirements, spanning multiple (in this analysis, five) years and subjects, in one specific location (Tennessee) – are not substantially different when variables measuring student characteristics (i.e., free/reduced-price lunch eligibility and race) are added to the models.
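
To make the comparison concrete, here is a minimal simulation sketch in the spirit of that analysis – emphatically not the TVAAS model itself, and every quantity (effect sizes, sample sizes, the SES variable) is invented for illustration. It fits a bare-bones regression-based VA model with and without a demographic control and compares the resulting teacher estimates:

```python
import numpy as np

rng = np.random.default_rng(0)
n_teachers, class_size = 200, 25

# Students nested in classrooms: scores depend on prior achievement, a
# family-background (SES) factor, and a persistent "true" teacher effect.
teacher_effect = rng.normal(0, 0.15, n_teachers)
ses = rng.normal(0, 1, (n_teachers, class_size))
prior = 0.6 * ses + rng.normal(0, 1, (n_teachers, class_size))
score = (0.7 * prior + 0.2 * ses
         + teacher_effect[:, None]
         + rng.normal(0, 0.5, (n_teachers, class_size)))

def va_estimates(controls):
    """Classroom-mean residuals from a pooled OLS regression --
    a bare-bones stand-in for a VA model, not TVAAS itself."""
    X = np.column_stack([np.ones(score.size)] + [c.ravel() for c in controls])
    beta, *_ = np.linalg.lstsq(X, score.ravel(), rcond=None)
    resid = (score.ravel() - X @ beta).reshape(score.shape)
    return resid.mean(axis=1)

va_prior_only = va_estimates([prior])
va_with_demo = va_estimates([prior, ses])

# High agreement between the two specifications (the paper's finding)...
print(np.corrcoef(va_prior_only, va_with_demo)[0, 1])   # close to 1
# ...yet neither recovers the true effects exactly.
print(np.corrcoef(va_prior_only, teacher_effect)[0, 1])
```

The paper's point is the first number; the critique's point is the second: high agreement between two specifications is not the same thing as accounting for everything outside the teacher's control.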

This does not, however, mean that the TVAAS model – or any other – can account for all the factors that teachers can’t control. For one thing, the free/reduced-price lunch variable is not a very good income proxy. Eligible students vary widely in family circumstances, which is a particular problem in high-poverty areas where virtually all the students qualify.

That paper aside, it’s true that students’ prior achievement scores account for much of the income-based variation in achievement gains (ironically, prior test scores are probably better at this than free/reduced-price lunch). But not all of poverty’s impacts are measurable or observed, and, perhaps more importantly, there are several other potential sources of bias, including the fact that students are not randomly assigned to classrooms (also here). VA scores are also affected by the choice of model, data quality and the test used. And, of course, even if there is no bias at all, many teachers will be “treated unfairly” by simple random error.
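
A back-of-the-envelope calculation suggests how large that random-error problem can be. Assume, purely for illustration, that true teacher effects and one-year estimation error are both normal with equal variance – a reliability of about 0.5, in the range single-year estimates tend to show. Then an exactly average teacher has a nontrivial chance of landing in the bottom quintile on noise alone:

```python
from statistics import NormalDist

# Hypothetical decomposition: true effects and estimation error each
# contribute half the variance of a one-year estimate (reliability 0.5).
var_true, var_error = 1.0, 1.0
sd_estimate = (var_true + var_error) ** 0.5

# An exactly average teacher (true effect = 0): how often does noise
# alone push her estimate into the bottom quintile of all estimates?
cutoff = NormalDist(0, sd_estimate).inv_cdf(0.20)   # bottom-quintile boundary
p = NormalDist(0, var_error ** 0.5).cdf(cutoff)
print(f"{p:.0%}")  # about 12% under these assumptions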

These are the important issues, the ones that need discussion. If we’re going to use these VA estimates in education policy, we need to at least do it correctly and minimize mistakes. In many places around the nation, this isn’t happening (also see Bruce Baker’s discussion of growth models). As a result, the number of teachers “penalized” unfairly – whether because they have high-needs students or for other reasons beyond their control – may actually be destructively high. TNTP calls this a “myth.” It’s not.

The second “myth” they look at is the very common argument that VA scores are too volatile between years to be useful. This too is not a “myth,” but it is indeed an issue that could use some clarifying discussion.

TNTP points out that all performance measures fluctuate between years, and that they all entail uncertainty. These are valid points. However, their strongest rebuttal is that “teachers who earn very high value-added scores early in their career rarely go on to earn low scores later, and vice-versa.”

Their “evidence” is an influential paper, published in 2009, by researchers from Florida State University and the RAND Corporation. The analysis focuses on the stability of VA estimates over time. While everyone might have a different definition of “rarely,” it’s safe to say that the word doesn’t quite apply in this case. Across all teachers, for instance, only about 25-40 percent of the top-quintile (top 20 percent) teachers in one year were in the top quintile the next year, while 20-30 percent of them ended up in the bottom 40 percent. Some of this volatility appears to have been a result of “true” improvement or degradation (within-teacher variation), but a very large proportion was due to nothing more than random error.
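
It is easy to see how noise alone generates churn of this magnitude. The simulation sketch below is purely illustrative – the variance split and sample size are assumptions, not the paper’s data. It gives every teacher a fixed true effect plus independent yearly noise of equal variance, then tabulates quintile transitions:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

# Assumed decomposition: a persistent teacher effect plus independent
# yearly noise of equal variance (single-year reliability of about 0.5).
true = rng.normal(0, 1, n)
year1 = true + rng.normal(0, 1, n)
year2 = true + rng.normal(0, 1, n)

def quintile(x):
    """Map each value to its quintile, 0 (bottom) through 4 (top)."""
    return np.searchsorted(np.quantile(x, [0.2, 0.4, 0.6, 0.8]), x)

q1, q2 = quintile(year1), quintile(year2)
top = q1 == 4
print("top quintile again:", (q2[top] == 4).mean())  # roughly 0.4
print("fell to bottom 40%:", (q2[top] <= 1).mean())  # roughly 0.17
```

Transition rates of roughly this size fall out of random error alone, before any “true” change in teacher performance.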

The accurate interpretation of this paper is that value-added estimates are, on average, moderately stable from year to year, but that stability improves with multiple years of data and better models (also see here and here for papers reaching similar conclusions). This does not mean that teachers’ scores “rarely” change over time, nor does it disprove TNTP’s “myth.” In fact, the papers’ results show that VA estimates from poorly specified models with smaller samples are indeed very unstable, probably to the point of being useless. And, again, since many states and districts are making these poor choices, the instability “myth” is to some degree very much a reality.
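
The multiple-years point has a simple mechanical basis: averaging n noisy yearly estimates cuts the error variance by a factor of n, which raises reliability. A self-contained variation on the sketch above (same assumed variance split) shows quintile rankings becoming noticeably stickier with three-year averages:

```python
import numpy as np

rng = np.random.default_rng(2)
n, years = 10_000, 3

# Same assumed split as before: persistent effect and yearly noise each
# with variance 1. Averaging `years` estimates cuts error variance to
# 1/years, so reliability rises from 0.5 to 1/(1 + 1/years) = 0.75.
true = rng.normal(0, 1, n)
avg_a = true + rng.normal(0, 1, (years, n)).mean(axis=0)
avg_b = true + rng.normal(0, 1, (years, n)).mean(axis=0)

def quintile(x):
    return np.searchsorted(np.quantile(x, [0.2, 0.4, 0.6, 0.8]), x)

top = quintile(avg_a) == 4
print("top quintile again:", (quintile(avg_b)[top] == 4).mean())
# climbs to roughly 0.55-0.6, up from roughly 0.4 with single years
```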

Value-added models are sophisticated and have a lot of potential, but we have no idea how they are best used or whether they will work. It is, however, likely that poor models implemented in the wrong way would “penalize” critically large numbers of teachers for reasons beyond their control, as well as generate estimates that are too unstable to be useful for any purpose, even low-stakes decisions. These are not myths, they are serious risks. Given that TNTP is actively involved in redesigning teacher quality policies in dozens of states and large districts, it is somewhat disturbing that they don’t seem to know the difference.

- Matt Di Carlo


18 Comments posted so far

  • “that there’s no research behind VA; that teachers will be evaluated based solely on test scores; and that VA is useless because it’s not perfect”

    Um, Diane Ravitch and her followers make those three arguments all the time.

    Comment by Stuart Buck
    November 11, 2011 at 9:35 AM
  • The Los Angeles CMOs, with funding from Gates, are attempting to use a VA model which rewards/penalizes teachers for their students’ percentile (as opposed to raw) change each year. I tried to point out that this implicitly assumes a zero-sum game and would absolutely ensure that every year a large number of teachers would be labelled negatively, no matter how much absolute improvement was made. Let’s just say that they didn’t want to hear it. At meetings they essentially just trotted out a version of the VA fact sheet from TNTP that you have written about.

    Comment by Stephen Silvius
    November 11, 2011 at 11:05 AM
  • “Um, Diane Ravitch and her followers make those three arguments all the time.”

    The arguments made are that VA research is not as robust or generalizable as it is represented to be; that teachers will be rewarded/punished and labelled by methods that rely too heavily on test scores, and poor tests at that; and that VA is not important enough to serve as the basis of such major education policies/decisions, especially without a longer, more open discussion. In fact, I think you will find Ravitch arguing that data like VA are more useful when they are used as just that, data, instead of as carrots and sticks for an “accountability” culture.

    Comment by Stephen Silvius
    November 11, 2011 at 11:11 AM
  • In regard to the “myth” that VA models may be unfair to teachers in high-needs schools, see also this recent study by Newton, Darling-Hammond, Haertel, and Thomas — http://epaa.asu.edu/ojs/article/view/810.

    The authors find that “judgments of teacher effectiveness for a given teacher can vary substantially across statistical models, classes taught, and years. Furthermore, student characteristics can impact teacher rankings, sometimes dramatically, even when such characteristics have been previously controlled statistically in the value-added model. A teacher who teaches less advantaged students in a given course or year typically receives lower effectiveness ratings than the same teacher teaching more advantaged students in a different course or year.”

    Comment by Burnie Bond
    November 11, 2011 at 11:52 AM
  • [...] Debating the facts and fiction of a new white sheet on value-added analysis. (Shanker Blog) [...]

    November 11, 2011 at 6:19 PM
  • No, Diane Ravitch “and her followers” never argued that there is no research behind VAM. We argue that the research doesn’t support the validity or usefulness of the models, and Diane has cited the research and discussed it in detail. Matt was correct in his assessment that “nobody” is saying there’s no research.

    It’s his third dismissal that needs to be looked at: yes, some people really do have the courage to stand up and say VAM is useless. We don’t say “that VA is useless because it’s not perfect.” We say it’s a useless and defective product because it doesn’t churn its bogus “data” into useful information in any sense whatsoever; not for policy decisions, not for 100% of teacher evaluations, not for 40% of teacher evaluations, and not to grade schools.

    Matt says, “Value-added models are sophisticated and have a lot of potential.” What if they don’t, and you’re afraid to say so out loud? What if it’s really a data-industry hoax, like real estate derivatives? Why can’t a columnist for the Shanker Institute itself even admit of the possibility?

    The emperor is naked, right now, parading down the streets of Tennessee and Florida, but his tailors still have their heels on all our throats.

    http://www.miamiherald.com/2011/11/05/2488961/complex-new-teacher-evaluations.html

    http://www.nytimes.com/2011/11/07/education/tennessees-rules-on-teacher-evaluations-bring-frustration.html

    People who just haven’t been paying attention saw these stories five days ago, and they’re looking into it right now. They’re finding evasive, apologist columns like this one, still trying to do damage control. This very morning, people are judging the enablers of this long, ugly drive to hijack public education for private profit.

    Comment by Mary Porter
    November 12, 2011 at 8:13 AM
  • Ravitch has repeated an argument she found elsewhere, that VAM is supposedly like a car that explodes 2 times out of 5. OK, but what if the current evaluation system explodes 2 times out of 5 too? Or even more? Unless someone has the ability to compare the two systems, denigrating VAM for its level of inaccuracy is really nothing more than saying that she’s happy with whatever level of inaccuracy exists in the current system (without even caring what that level is) but won’t accept a different system that has any inaccuracies in it.

    Comment by Stuart Buck
    November 12, 2011 at 2:26 PM
  • Mary: I understand your frustration, but I’ve tried to be very clear that using VA in high-stakes decisions might easily be destructive. Actually, I said so in this very piece (paragraphs 9, 12 and 13), as well as in several previous posts (e.g., http://shankerblog.org/?p=3165). My overall view is that there is a potentially useful role for these methods in education policy, but that they are being misused in most places.

    Stuart: The question is whether anyone says that new evaluations will be 100 percent test scores. You know nobody says this, nor do they say that there is no research behind VA or that systems must be perfect. Like I said in the post, it’s fine for TNTP to clarify nonetheless, but let’s not accuse people of making arguments they’re not making.

    Thank you both for your comments,
    MD

    Comment by Matthew Di Carlo
    November 12, 2011 at 2:52 PM
  • So, you’ve gotten to the point where “it is somewhat disturbing” to you that this thing is being done to the nation’s schools, in defiance of all reason and decency, but you won’t stand all the way up and oppose it. In claiming that the problem with VAM is only “poor models implemented in the wrong way,” you place yourself as the data-masters’ last line of defense as their credibility crumbles.

    You have to try to be clearer, Matt. You say it “might easily be destructive.” You’ve played an enabling role, so far, in allowing this random-number-driven decimation machine to be installed in thousands of actual schools, by force of law! The AFT betrayed children, as well as their teachers, in embracing Gates in exchange for a place at his table. Teachers are moving to take our unions back from leader/collaborators like Weingarten (and yourself, so far).

    I am, again, inviting you to come on over to the actual opposition. Reread your own last paragraph, and realize how disgusted you really are. You can’t straddle this question.

    Comment by Mary Porter
    November 12, 2011 at 9:33 PM
  • What Mary Porter said.

    Comment by TFT
    November 12, 2011 at 9:53 PM
  • “You know nobody says this, nor do they say that there is no research behind VA or that systems must be perfect.”

    To the contrary, I’m pretty sure that Diane Ravitch has made all three points. I don’t have time to dig through her nearly 22,000 tweets, but she exaggerates and simplifies all the time on issues like that. But you’re technically right about one thing: rather than saying there’s no research behind VAM, she’s more likely to say that all the research is against it, which is a closely related myth.

    Moreover, if you do a Google search that uses the phrase “only on test scores,” you’ll find plenty of people arguing that teachers should not be evaluated “only on test scores.” So there’s definitely a popular perception that evaluation only on test scores is a possibility, even if well-informed people know that no such thing has been contemplated.

    By the way, TNTP isn’t “accusing” anyone by name, so you’re basically trying to prove a universal negative (come on, do you really think that no one has ever made the points that they’re refuting?).

    Comment by Stuart Buck
    November 12, 2011 at 10:27 PM
  • [...] When The Legend Becomes Fact, Print The Fact Sheet is from The Shanker Blog. [...]

    November 13, 2011 at 12:22 AM
  • Stuart: When I said “let’s not accuse people,” I was talking about you, not TNTP. Maybe “accuse” is a strong word. As for the question itself, let’s just put it this way: Among people involved in education debates, the three arguments, as stated in the document, are too infrequent to be “myths” per se, but among the general public, they’re probably more common. Either way, once again, I don’t have a problem with TNTP pointing them out.

    Mary: I appreciate your candor. If I could say I’m relatively certain that any use, no matter how it’s done, of test-based teacher productivity measures in evaluations will destroy public education, I would say so without hesitation. Based on my appraisal of the relevant research, as well as the fact that these things have never really been tried before, I can’t say that. And so I don’t.

    I do think, on the other hand, that the specific manner in which this is being done – heavily-weighted estimates, ignoring error margins, rushed implementation – is wrong, that it won’t work and that it might cause harm. This I’ve said many times.

    In general, all I can do is interpret the evidence as best I can, and point out when I think others aren’t doing likewise. If that represents betrayal in your eyes, then I’ll have to accept that, but I hope you keep reading and commenting nonetheless.

    Thanks again,
    MD

    Comment by Matthew Di Carlo
    November 13, 2011 at 1:57 PM
  • I think it’s quite reasonable to point out that the Ravitch-ite crowd seems to dislike VAM simply because it’s not perfect. They say that it has an error rate and it sometimes misclassifies teachers, but without ever specifying what error rate they would accept other than zero.

    Comment by Stuart Buck
    November 13, 2011 at 2:47 PM
  • Wait a minute, Matt. How did you slip over to this extravagantly mealy-mouthed action threshold?

    “…relatively certain that any use, no matter how it’s done, of test-based teacher productivity measures in evaluations will destroy public education, I would say so without hesitation. Based on my appraisal of the relevant research, as well as the fact that these things have never really been tried before, I can’t say that.”

    My degrees are in the natural sciences. I don’t have a doctorate of my own, but I’ve spent hundreds of hours drafting and editing peer-reviewed research papers for my husband’s medical biochemistry research (and now my son’s computational biology). You don’t really need me to tell you how unsound, unreliable, and invalid these methods are. The very research you discuss shows it, and you do a fair job of summarizing the results. No method this weak would ever (ever, ever!) be accepted or applied in the natural sciences, and a finding based on it wouldn’t be publishable, let alone mandated by law for massive implementation. All I’m asking is that people admit that, so a real public discussion can begin of what just got rammed through our legislatures.

    But very powerful forces demand that all employed commentators pay it lip service. So, your standard for actually opposing its implementation has slipped all the way to being “relatively certain” it “will destroy public education”. And you can’t be certain, of course, because it has never been tried!

    You (and Weingarten and the soon-to-be-former AFT delegates who caved in to Gates) are all off the hook until public education is actually destroyed.

    Meanwhile, according to the sanitized article in Wikipedia,
    “TNTP is a revenue-generating nonprofit. The majority of its revenue comes from contracts with districts and states to supply services; additional funding for new program development and research is provided by donors such as the Bill and Melinda Gates Foundation.”

    And here’s the bullet-list for their “Smart Spending for Better Teacher Evaluations” publication:
    *Tools and Systems to guide and support the evaluation process.
    *Training for evaluations and key school district staff.
    *Communications to key audiences, especially teachers and school leaders.
    *Monitoring to ensure consistent implementation across schools and districts.
    *Sustainability of the new system over time, fiscally and substantively.

    http://tntp.org/publications/issue-analysis/view/smart-spending-for-better-teacher-evaluations/

    http://tntp.org/files/TNTP_Smart_Spending_2011_1.pdf

    Comment by Mary Porter
    November 14, 2011 at 3:24 AM
  • Matt,

    I wish you would take a less sanguine view of the myths around multiple measures. The issue should be stated with more nuance. Value-added (whether it’s a good idea or not) is valid enough to COMPLEMENT or SUPPLEMENT human judgments, but it should never DRIVE evaluations. Real world, being indicted as ineffective by a VAM would often be no different than being convicted. In many (most) systems that are under the gun (and that means most poor systems), it would be a rare evaluator who dared trust his or her lying eyes and not convict a person with a low value-added. Gotham Schools has reported on NYC principals who already have delayed granting tenure to teachers who they see as worthy because the district has made its preference clear.

    And in Tennessee, D.C., Florida, and elsewhere, multiple measures just means multiple hoops to jump through. When you have an evaluation rubric that doesn’t account for difficult-to-educate populations, being used by evaluators who have been trained to believe like Rhee, Huffman, Daly, Klein, et al., then the second measure is not a check or a balance, but another gotcha.

    The validity measure, by the way, should consider George Soros’s example of a row of glasses of water: even if only one is poisoned, all are worthless. The issue is not the principals with extraordinary moral character who won’t give in to pressure from above. The issue is a) principals who love control who are being given a loaded gun, without a trigger lock, and b) the pressure on the majority of principals to go along and get along with the accountability hawks who believe we can chop up knowledge into measurable pieces.

    There will be two metrics that matter more than any other, and we won’t know where the tipping point will be. Firstly, at what point does the fear generated by VAMs create enough pressure to mandate Cover Your Ass teacher-proof scripted rote instruction? The Gates Foundation says, correctly, that that strategy would not be rational. But the world is not rational. CYA is the rational and predictable response by powerless institutions under siege.

    Secondly, at what point do VAMs prompt an exodus of teachers from schools where it is harder to raise test scores? And once that starts, the CYA tactic becomes even more rational, which further incentivizes bureaucrats and principals to surround themselves with “yes men.”

    It looks like we’re already seeing all of the above in turnaround schools. Speaking from my experience, I’m seeing school leaders going along with the top-down aligned and paced curriculum, taught by scared 23-year-olds who are being socialized into just following orders. Since these school leaders know that the tactics they were forced to embrace are doomed, the new metric is “exiting” Baby Boomers. And when VAMs get applied to principals’ evaluations, this fear and loathing could metastasize.

    Finally, when the TNTP apologizes for its false and misleading statements about unions in general, and peer review in particular, I’ll embrace your tone. But for now, I see them as an enemy to be defeated. After all, who drafted the VAM language in RttTs? I’ve hardly seen an RttT application that did not cite the TNTP as a contributor, and as an organization that would be a consultant.

    Comment by john thompson
    November 19, 2011 at 3:25 PM
  • [...] When The Legend Becomes Fact, Print The Fact Sheet is from The Shanker Blog. I’m adding it to The Best Resources For Learning About The “Value-Added” Approach Towards Teacher Evaluation. [...]

    December 7, 2011 at 8:18 AM
