The recently released study of IMPACT, the teacher evaluation system in the District of Columbia Public Schools (DCPS), has garnered a great deal of attention over the past couple of months (see our post here).
Much of the commentary from the system’s opponents was predictably (and unfairly) dismissive, but I’d like to quickly discuss the reaction from supporters. Some took the opportunity to make grand proclamations about how “IMPACT is working,” and there was a lot of back and forth about the need to ensure that various states’ evaluations are as “rigorous” as IMPACT (as well as skepticism as to whether this is the case).
The claim that this study shows that “IMPACT is working” is somewhat misleading, and the idea that states should now rush to replicate IMPACT is misguided. Both reactions also miss the important points about the study and what we can learn from its results.
Linda Darling-Hammond’s new book, Getting Teacher Evaluation Right, is a detailed, practical guide about how to improve the teaching profession. It leverages the best research and best practices, offering actionable, illustrated steps to getting teacher evaluation right, with rich examples from the U.S. and abroad.
Here I offer a summary of the book’s main arguments and conclude with a couple of broad questions prompted by the book. But, before I delve into the details, here’s my quick take on Darling-Hammond’s overall stance.
We are at a crossroads in education; two paths lie before us. The first seems shorter, easier and more straightforward. The second seems long, winding and difficult. The big problem is that the first path does not really lead to where we need to go; in fact, it is taking us in the opposite direction. So, despite appearances, more steady progress will be made if we take the more difficult route. This book is a guide on how to get teacher evaluation right, not how to do it quickly or with minimal effort. So, in a way, the big message or takeaway is: There are no shortcuts.
A new working paper, published by the National Bureau of Economic Research, is the first high-quality assessment of one of the new teacher evaluation systems sweeping across the nation. The study, by Thomas Dee and James Wyckoff, both highly respected economists, focuses on the first three years of IMPACT, the evaluation system put into place in the District of Columbia Public Schools in 2009.
Under IMPACT, each teacher receives a point total based on a combination of test-based and non-test-based measures (the formula varies between teachers who are and are not in tested grades/subjects). These point totals are then sorted into one of four categories: highly effective, effective, minimally effective and ineffective. Teachers who receive a highly effective (HE) rating are eligible for salary increases, whereas teachers rated ineffective are dismissed immediately, and those rated minimally effective (ME) for two consecutive years can also be terminated. The design of this study exploits that incentive structure by, put very simply, comparing the teachers who scored just above the ME and HE thresholds with those who scored just below them, to see whether the two groups differed in terms of retention and performance. The basic idea is that these teachers are all very similar in terms of their measured performance, so any differences in outcomes can be (cautiously) attributed to the system’s incentives.
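For readers who want the mechanics, here is a deliberately crude Python sketch of this kind of threshold comparison. The cut scores (on an assumed 100-400 point scale), the function names and the toy data are all hypothetical, and a simple difference in means within a narrow window is only the most basic version of the idea, not a reproduction of the methods Dee and Wyckoff actually use.

```python
from statistics import mean

# Hypothetical cut scores on an assumed 100-400 point scale; illustrative only.
INEFFECTIVE_BELOW = 175   # below this: "ineffective"
ME_BELOW = 250            # 175 up to (not including) 250: "minimally effective"
HE_AT_OR_ABOVE = 350      # 350 and up: "highly effective"


def rating(score: float) -> str:
    """Sort a point total into one of four IMPACT-style categories."""
    if score >= HE_AT_OR_ABOVE:
        return "highly effective"
    if score >= ME_BELOW:
        return "effective"
    if score >= INEFFECTIVE_BELOW:
        return "minimally effective"
    return "ineffective"


def threshold_gap(scores, outcomes, cutoff, bandwidth):
    """Mean outcome just above `cutoff` minus mean outcome just below it.

    Only teachers within `bandwidth` points of the cutoff are compared, since
    they are similar in measured performance but face different incentives.
    """
    above = [y for x, y in zip(scores, outcomes) if cutoff <= x < cutoff + bandwidth]
    below = [y for x, y in zip(scores, outcomes) if cutoff - bandwidth <= x < cutoff]
    if not above or not below:
        raise ValueError("need teachers on both sides of the cutoff")
    return mean(above) - mean(below)


# Invented toy data: point totals and whether each teacher stayed (1) or left (0).
scores = [240, 245, 248, 251, 255, 259, 300]
stayed = [0, 1, 0, 1, 1, 1, 1]
print(threshold_gap(scores, stayed, cutoff=ME_BELOW, bandwidth=10))
```

Narrowing `bandwidth` makes the two groups more alike in measured performance but leaves fewer teachers to compare, which is the basic trade-off in comparisons of this kind. The toy numbers say nothing about the study’s actual findings.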
The short answer is that there were meaningful differences.
The District of Columbia Public Schools (DCPS) has recently released the first round of results from its new principal evaluation system. Like the system used for teachers, the principal ratings are based on a combination of test and non-test measures. And the two systems use the same final rating categories (highly effective, effective, minimally effective and ineffective).
It was perhaps inevitable that there would be comparisons of their results. In short, principal ratings were substantially lower, on average. Roughly half of them received one of the two lowest ratings (minimally effective or ineffective), compared with around 10 percent of teachers.
Some wondered whether this discrepancy by itself means that DC teachers perform better than principals. Of course not. It is difficult to compare the performance of teachers versus that of principals, but it’s unsupportable to imply that we can get a sense of this by comparing the final rating distributions from two evaluation systems.
Our guest author today is Dan Goldhaber, Director of the Center for Education Data & Research and a Research Professor in Interdisciplinary Arts and Sciences at the University of Washington Bothell.
Let me begin with a disclosure: I am an advocate of experimenting with using value added, where possible, as part of a more comprehensive system of teacher evaluation. The reasons are pretty simple (though articulated in more detail in a brief, which you can read here). The most important reason is that value-added information about teachers appears to be a better predictor of future success in the classroom than other measures we currently use. This is perhaps not surprising when it comes to test scores, certainly an important measure of what students are getting out of schools, but research also shows that value added predicts very long run outcomes, such as college going and labor market earnings. Shouldn’t we be using valuable information about likely future performance when making high-stakes personnel decisions?
It almost goes without saying, but it’s still worth emphasizing, that it is impossible to avoid making high-stakes decisions. Policies that explicitly link evaluations to outcomes such as compensation and tenure are new, but even where no such policies raise the stakes for teachers, the stakes are high for students, because some of them are stuck with ineffective teachers when evaluation systems suggest, as they do today, that nearly all teachers are effective.
In a new NBER working paper, economist Derek Neal makes an important point, one of which many people in education are aware but which is infrequently reflected in actual policy. The point is that using the same assessment to measure both student and teacher performance often contaminates the results for both purposes.
In fact, as Neal notes, some of the very features required to measure student performance are the ones that make this contamination possible when the tests are used in high-stakes accountability systems. Consider, for example, a situation in which a state or district wants to compare the test scores of a cohort of fourth graders in one year with those of fourth graders the next year. One common means of facilitating this comparability is administering some of the same questions to both groups (or to a “pilot” sample of students before the tested cohort takes the exam). Otherwise, any difference in scores between the two cohorts might simply be due to differences in the difficulty of the questions. If you cannot check for that, it’s tough to make meaningful comparisons.
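To see why repeated questions matter, here is a toy Python sketch of the logic. It is an assumption-laden simplification, not how any particular state links its tests (operational equating relies on considerably more careful statistical methods), and the function name and numbers are hypothetical: performance on the shared questions is used to estimate how much the two cohorts differ, and whatever remains of the overall change is attributed to differences in the difficulty of the two test forms.

```python
from statistics import mean


def split_cohort_gap(old_common, new_common, old_total, new_total):
    """Crudely split a year-to-year change in scores into two parts.

    old_common / new_common: each student's percent correct on the questions
    both cohorts answered. old_total / new_total: percent correct on each
    cohort's full test form. Assumes (hypothetically) that a gap on the shared
    questions carries over one-for-one to the full forms.
    """
    cohort_gap = mean(new_common) - mean(old_common)  # same questions, so this reflects the students
    observed_gap = mean(new_total) - mean(old_total)  # mixes student differences and form difficulty
    form_gap = observed_gap - cohort_gap              # remainder attributed to the forms themselves
    return cohort_gap, form_gap


# Invented example: both cohorts do equally well on the shared questions,
# so the 10-point rise on the full form is attributed to an easier form.
print(split_cohort_gap([60, 70, 80], [60, 70, 80], [55, 65, 75], [65, 75, 85]))
```

Without the shared questions, there would be no way to separate these two parts of the change.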
But it’s precisely this need to repeat questions that enables one form of so-called “teaching to the test,” in which administrators and educators use questions from prior assessments to guide their instruction for the current year.
U.S. Secretary of Education Arne Duncan recently announced that states will be given the option to postpone using the results of their new teacher evaluations for high-stakes decisions during the phase-in of the new Common Core-aligned assessments. The reaction from some advocates was swift condemnation – calling the decision little more than a “delay” and a “victory for the status quo.”
We hear these kinds of arguments frequently in education. The idea is that change must be as rapid as possible, because “kids can’t wait.” I can understand and appreciate the urgency underlying these sentiments. Policy change in education (as in other arenas) can sometimes be painfully slow, and what seem like small roadblocks can turn out to be massive, permanent obstacles.
I will not repeat my views regarding the substance of Secretary Duncan’s decision – see this op-ed by Morgan Polikoff and myself. I would, however, like to make one very quick point about these “we need change right now because students can’t wait” arguments: Sometimes, what is called “delay” is actually better described as good policy making, and kids can wait for good policy making.
In a previous post, I discussed the initial results from new teacher evaluations in several states, and the fact that states with implausibly large proportions of teachers in the higher categories face a difficult situation – achieving greater differentiation while improving the quality and legitimacy of their systems.
I also expressed concern that pre-existing beliefs about the “proper” distribution of teacher ratings — in particular, how many teachers should receive the lowest ratings — might inappropriately influence the process of adjusting the systems based on the first round of results. In other words, there is a risk that states and districts will change their systems in a crude manner that lowers ratings simply for the sake of lowering ratings.
Such concerns of course imply a more general question: How should we assess the results of new evaluation systems? That’s a complicated issue, and these are largely uncharted waters. Nevertheless, I’d like to offer a few thoughts as states and districts move forward.
A correlation between two variables measures the strength of the linear relationship between them. Put simply, two variables are positively correlated to the extent that individuals with relatively high values on one measure tend to have relatively high values on the other (and low values tend to go with low values), and negatively correlated to the extent that high values on one measure are associated with low values on the other.
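The definition above corresponds to the standard Pearson correlation coefficient. Here is a minimal Python sketch of how it is computed; the function name and the example numbers are illustrative only (Python 3.10 and later also provide `statistics.correlation`, which computes the same quantity).

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation: how strongly two measures move together linearly.

    Returns a value between -1 and 1. Positive values mean high scores on one
    measure tend to go with high scores on the other; negative values mean
    high scores on one tend to go with low scores on the other.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    dx = [x - mx for x in xs]  # deviations from each measure's mean
    dy = [y - my for y in ys]
    sxy = sum(a * b for a, b in zip(dx, dy))
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    if sxx == 0 or syy == 0:
        raise ValueError("undefined when one measure does not vary")
    return sxy / sqrt(sxx * syy)


# Toy numbers, not real teacher data.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))          # perfectly positive
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))          # perfectly negative
print(pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]))  # strong but non-linear pattern
```

The last example is a reminder that a correlation near zero does not mean two measures are unrelated, only that they are not linearly related.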
Correlations are used frequently in the debate about teacher evaluations. For example, researchers might assess the relationship between classroom observations and value-added measures, which is one of the simpler ways to gather information about the “validity” of one or the other – i.e., whether it is telling us what we want to know. In this case, if teachers with higher observation scores also tend to get higher value-added scores, this might be interpreted as a sign that both are capturing, at least to some extent, “true” teacher performance.
Yet there seems to be a tendency among some advocates and policy makers to get a little overeager when interpreting correlations.
** Reprinted here in the Washington Post
The following is written by Morgan S. Polikoff and Matthew Di Carlo. Morgan is Assistant Professor in the Rossier School of Education at the University of Southern California.
One of the primary policy levers now being employed in states and districts nationwide is teacher evaluation reform. Well-designed evaluations, which should include measures that capture both teacher practice and student learning, have great potential to inform and improve the performance of teachers and, thus, students. Furthermore, most everyone agrees that the previous systems were largely pro forma, failed to provide useful feedback, and needed replacement.
The attitude among many policymakers and advocates is that we must implement these systems and begin using them rapidly for decisions about teachers, while design flaws can be fixed later. Such urgency is undoubtedly influenced by the history of slow, incremental progress in education policy. However, we believe this attitude to be imprudent.
One can often hear opponents of value-added referring to these methods as “junk science.” The term is meant to express the argument that value-added is unreliable and/or invalid, and that its scientific “façade” is without merit.
Now, I personally am not opposed to using these estimates in evaluations and other personnel policies, but I certainly understand opponents’ skepticism. For one thing, there are some states and districts in which design and implementation have been somewhat careless, and, in these situations, I very much share the skepticism. Moreover, the common argument that evaluations, in order to be “meaningful,” must consist of value-added measures in a heavily weighted role (e.g., 45-50 percent) is, in my view, unsupportable.
All that said, calling value-added “junk science” completely obscures the important issues. The real questions here are less about the merits of the models per se than how they’re being used.
Controversial proposals for new teacher evaluation systems have generated a tremendous amount of misinformation. It has come from both “sides,” ranging from minor misunderstandings to gross inaccuracies. Ostensibly to address some of these misconceptions, the advocacy group Students First (SF) recently released a “myth/fact sheet” on evaluations.
Despite the need for oversimplification inherent in “myth/fact” sheets, the genre can be useful, especially about topics such as evaluation, about which there is much confusion. When advocacy groups produce them, however, the myths and facts sometimes take the form of “arguments we don’t like versus arguments we do like.”
This SF document falls into that trap. In fact, several of its claims are a little shocking. I would still like to discuss the sheet, not because I enjoy picking apart the work of others (I don’t), but rather because I think elements of both the “myths” and “facts” in this sheet could be recast as “dual myths” in a new sheet. That is, this document helps to illustrate how, in many of our most heated education debates, the polar opposite viewpoints that receive the most attention are often both incorrect, or at least severely overstated, and usually serve to preclude more productive, nuanced discussions.
Let’s take all four of SF’s “myth/fact” combinations in turn.
In a story for Education Week, the always-reliable Stephen Sawchuk reports on what may be a trend in states’ first results from their new teacher evaluation systems: The ratings are skewed toward the top.
For example, the article notes that, in Michigan, Florida and Georgia, a high proportion of teachers (more than 90 percent) received one of the two top ratings (out of four or five). This has led to some grumbling among advocates and others, citing similarities between these results and those of the old systems, in which the vast majority of teachers were rated “satisfactory,” and very few were found to be “unsatisfactory.”
Differentiation is very important in teacher evaluations – it’s kind of the whole point. Thus, it’s a problem when ratings are too heavily concentrated toward one end of the distribution. However, as Aaron Pallas points out, these important conversations about evaluation results sometimes seem less focused on good measurement or even the spread of teachers across categories than on the narrower question of how many teachers end up with the lowest rating – i.e., how many teachers will be fired.
Some Florida officials are still having trouble understanding why they’re finding no relationship between the grades schools receive and the evaluation ratings of teachers in those schools. For his part, new Florida education Commissioner Tony Bennett is also concerned. According to the article linked above, he acknowledges (to his credit) that the two measures are different, but is also considering “revis[ing] the models to get some fidelity between the two rankings.”
This could turn into a risky situation. As discussed in a recent post, it is important to examine the results of the new teacher evaluations, but there is no reason one would expect to find a strong relationship between these ratings and the school grades, as they are in large part measuring different things (and imprecisely at that). The school grades are mostly (but not entirely) driven by how highly students score, whereas teacher evaluations are, to the degree possible, designed to be independent of these absolute performance levels. Florida cannot validate one system using the other.
However, as also mentioned in that post, this is not to say that there should be no relationship at all. For example, both systems include growth-oriented measures (albeit using very different approaches). In addition, schools with lower average performance levels sometimes have trouble recruiting and retaining good teachers. Due to these and other factors, the reasonable expectation is to find some association overall, just not one that’s extremely strong. And that’s basically what one finds, even using the same set of results upon which the claims that there is no relationship are based.
Our guest author today is David B. Cohen, a National Board Certified high school English teacher in Palo Alto, CA, and the associate director of Accomplished California Teachers (ACT). His blog is at InterACT.
As we settle into 2013, I find myself increasingly optimistic about the future of the teaching profession. There are battles ahead, debates to be had and elections to be contested, but, as Sam Cooke sang, “A change is gonna come.”
The change that I’m most excited about is the potential for a shift towards teacher leadership in schools and school systems. I’m not naive enough to believe it will be a linear or rapid shift, but I’m confident in the long-term growth of teacher leadership because it provides a common ground for stakeholders to achieve their goals, because it’s replicable and scalable, and because it’s working already.
Much of my understanding of school improvement comes from my teaching career, now approaching two decades in the classroom, mostly in public high schools. However, until six years ago, I hadn’t seen teachers putting forth a compelling argument about how we might begin to transform our profession. A key transition for me was reading a Teacher Solutions report from the Center for Teaching Quality (CTQ). That 2007 report, Performance-Pay for Teachers: Designing a System that Students Deserve, showed how the concept of performance pay could be modified and improved upon with better definitions of the various forms performance can take, and with differentiated pay based on differentiated professional practice rather than arbitrary test score targets. I ended up joining the CTQ Teacher Leaders Network the same year, and have had the opportunity ever since to learn from exceptional teachers from around the country.