The recently released study of IMPACT, the teacher evaluation system in the District of Columbia Public Schools (DCPS), has garnered a great deal of attention over the past couple of months (see our post here).
Much of the commentary from the system’s opponents was predictably (and unfairly) dismissive, but I’d like to quickly discuss the reaction from supporters. Some took the opportunity to make grand proclamations about how “IMPACT is working,” and there was a lot of back and forth about the need to ensure that various states’ evaluations are as “rigorous” as IMPACT (as well as skepticism as to whether this is the case).
The claim that this study shows that “IMPACT is working” is somewhat misleading, and the idea that states should now rush to replicate IMPACT is misguided. It also misses the important points about the study and what we can learn from its results. Read More »
As discussed in a prior post, the research on applying value-added to teacher prep programs is pretty much still in its infancy. Even just a couple of years would go a long way toward at least partially addressing the many open questions in this area (including, by the way, the evidence suggesting that differences between programs may not be meaningfully large).
Nevertheless, a few states have decided to plow ahead and begin publishing value-added estimates for their teacher preparation programs. Tennessee, which seems to enjoy being first — their Race to the Top program is, a little ridiculously, called “First to the Top” — was ahead of the pack. They have once again published ratings for the few dozen teacher preparation programs that operate within the state. As mentioned in my post, if states are going to do this (and, as I said, my personal opinion is that it would be best to wait), it is absolutely essential that the data be presented along with thorough explanations of how to interpret and use them.
Tennessee fails to meet this standard. Read More »
Linda Darling-Hammond’s new book, Getting Teacher Evaluation Right, is a detailed, practical guide about how to improve the teaching profession. It leverages the best research and best practices, offering actionable, illustrated steps to getting teacher evaluation right, with rich examples from the U.S. and abroad.
Here I offer a summary of the book’s main arguments and conclude with a couple of broad questions prompted by the book. But, before I delve into the details, here’s my quick take on Darling-Hammond’s overall stance.
We are at a crossroads in education; two paths lie before us. The first seems shorter, easier and more straightforward. The second seems long, winding and difficult. The big problem is that the first path does not really lead to where we need to go; in fact, it is taking us in the opposite direction. So, despite appearances, more steady progress will be made if we take the more difficult route. This book is a guide on how to get teacher evaluation right, not how to do it quickly or with minimal effort. So, in a way, the big message or takeaway is: There are no shortcuts. Read More »
A new working paper, published by the National Bureau of Economic Research, is the first high-quality assessment of one of the new teacher evaluation systems sweeping across the nation. The study, by Thomas Dee and James Wyckoff, both highly respected economists, focuses on the first three years of IMPACT, the evaluation system put into place in the District of Columbia Public Schools in 2009.
Under IMPACT, each teacher receives a point total based on a combination of test-based and non-test-based measures (the formula varies between teachers who are and are not in tested grades/subjects). These point totals are then sorted into one of four categories – highly effective, effective, minimally effective and ineffective. Teachers who receive a highly effective (HE) rating are eligible for salary increases, whereas teachers rated ineffective are dismissed immediately and those receiving minimally effective (ME) for two consecutive years can also be terminated. The design of this study exploits that incentive structure by, put very simply, comparing the teachers who were directly above the ME and HE thresholds to those who were directly below them, to see whether the two groups differed in terms of retention and performance. The basic idea is that these teachers are all very similar in terms of their measured performance, so any differences in outcomes can be (cautiously) attributed to the system’s incentives.
The short answer is that there were meaningful differences. Read More »
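For readers who want to see the logic of that threshold comparison spelled out, here is a minimal sketch. It is an illustration only, not the authors' method or data: the scores, the cutoff of 250, the bandwidth of 5 points and the function name threshold_gap are all hypothetical, and an actual regression discontinuity analysis would typically fit regressions on each side of the cutoff rather than simply compare means.

```python
from statistics import mean


def threshold_gap(records, cutoff, bandwidth):
    """Mean outcome just above a cutoff minus mean outcome just below it.

    records: iterable of (score, outcome) pairs, one per teacher.
    Only teachers whose score is within `bandwidth` points of the
    cutoff are compared; everyone else is ignored.
    """
    below = [y for s, y in records if cutoff - bandwidth <= s < cutoff]
    above = [y for s, y in records if cutoff <= s <= cutoff + bandwidth]
    if not below or not above:
        raise ValueError("no teachers within the bandwidth on one side of the cutoff")
    return mean(above) - mean(below)


# Hypothetical data: (evaluation score, 1 if the teacher returned the next year, else 0).
records = [
    (200, 0), (244, 0), (246, 1), (248, 0), (249, 1),
    (250, 1), (251, 1), (253, 1), (255, 0), (300, 1),
]

# Compare retention among teachers within 5 points of the (hypothetical) cutoff.
gap = threshold_gap(records, cutoff=250, bandwidth=5)
print(f"Retention gap at the threshold: {gap:.3f}")
```

The narrow bandwidth is the point of the design: teachers far from the cutoff (here, the scores of 200 and 300) are excluded, because they are no longer plausibly similar to those on the other side.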
Our guest author today is Dan Goldhaber, Director of the Center for Education Data & Research and a Research Professor in Interdisciplinary Arts and Sciences at the University of Washington Bothell.
Let me begin with a disclosure: I am an advocate of experimenting with using value added, where possible, as part of a more comprehensive system of teacher evaluation. The reasons are pretty simple (though articulated in more detail in a brief, which you can read here). The most important reason is that value-added information about teachers appears to be a better predictor of future success in the classroom than other measures we currently use. This is perhaps not surprising when it comes to test scores, certainly an important measure of what students are getting out of schools, but research also shows that value added predicts very long run outcomes, such as college going and labor market earnings. Shouldn’t we be using valuable information about likely future performance when making high-stakes personnel decisions?
It almost goes without saying, but it’s still worth emphasizing, that it is impossible to avoid making high-stakes decisions. Policies that explicitly link evaluations to outcomes such as compensation and tenure are new, but even in the absence of such policies that are high-stakes for teachers, the stakes are high for students, because some of them are stuck with ineffective teachers when evaluation systems suggest, as is the case today, that nearly all teachers are effective. Read More »
There is currently a push to evaluate teacher preparation programs based in part on the value-added of their graduates. Predictably, this is a highly controversial issue, and the research supporting it is, to be charitable, still underdeveloped. At present, the evidence suggests that the differences in effectiveness between teachers trained by different prep programs may not be particularly large (see here, here, and here), though there may be exceptions (see this paper).
In the meantime, there’s an interesting little conflict underlying the debate about measuring preparation programs’ effectiveness, one that’s worth pointing out. For the purposes of this discussion, let’s put aside the very important issue of whether the models are able to account fully for where teaching candidates end up working (i.e., bias in the estimates based on school assignments/preferences), as well as (valid) concerns about judging teachers and preparation programs based solely on testing outcomes. All that aside, any assessment of preparation programs using the test-based effectiveness of their graduates is picking up on two separate factors: How well they prepare their candidates; and who applies to their programs in the first place.
In other words, programs that attract and enroll highly talented candidates might look good even if they don’t do a particularly good job preparing teachers for their eventual assignments. But does that really matter? Read More »
In a previous post, I discussed the initial results from new teacher evaluations in several states, and the fact that states with implausibly large proportions of teachers in the higher categories face a difficult situation – achieving greater differentiation while improving the quality and legitimacy of their systems.
I also expressed concern that pre-existing beliefs about the “proper” distribution of teacher ratings — in particular, how many teachers should receive the lowest ratings — might inappropriately influence the process of adjusting the systems based on the first round of results. In other words, there is a risk that states and districts will change their systems in a crude manner that lowers ratings simply for the sake of lowering ratings.
Such concerns of course imply a more general question: How should we assess the results of new evaluation systems? That’s a complicated issue, and these are largely uncharted waters. Nevertheless, I’d like to offer a few thoughts as states and districts move forward. Read More »
Education researchers have paid a lot of attention to the sorting of teachers across schools. For example, it is well known that schools serving more low-income students tend to employ teachers who are, on average, less qualified (in terms of experience, degree, certification, etc.; also see here).
Far less well-researched, however, is the issue of sorting within schools – for example, whether teachers with certain characteristics are assigned to classes with different students than their colleagues in the same school. In addition to the obvious fact that which teachers are in front of which students every day is important, this question bears on a few major issues in education policy today. For example, there is evidence that teacher turnover is influenced by the characteristics of the students teachers teach, which means that classroom assignments might either exacerbate or mitigate mobility and attrition. In addition, teacher productivity measures such as value-added may be affected by the sorting of students into classes based on characteristics for which the models do not account, and a better understanding of the teacher/student matching process could help inform this issue.
A recent article, which was published in the journal Sociology of Education, sheds light on these topics with a very interesting look at the distribution of students across teachers’ classrooms in Miami-Dade between 2003-04 and 2010-11. The authors’ primary question is: Are certain characteristics, most notably race/ethnicity, gender, experience, or pre-service qualifications (e.g., SAT scores), associated with assignment to higher or lower-scoring students among teachers in the same school, grade, and year? Read More »
** Reprinted here in the Washington Post
The following is written by Morgan S. Polikoff and Matthew Di Carlo. Morgan is Assistant Professor in the Rossier School of Education at the University of Southern California.
One of the primary policy levers now being employed in states and districts nationwide is teacher evaluation reform. Well-designed evaluations, which should include measures that capture both teacher practice and student learning, have great potential to inform and improve the performance of teachers and, thus, students. Furthermore, most everyone agrees that the previous systems were largely pro forma, failed to provide useful feedback, and needed replacement.
The attitude among many policymakers and advocates is that we must implement these systems and begin using them rapidly for decisions about teachers, while design flaws can be fixed later. Such urgency is undoubtedly influenced by the history of slow, incremental progress in education policy. However, we believe this attitude to be imprudent. Read More »
In a story for Education Week, the always reliable Stephen Sawchuk reports on what may be a trend in states’ first results from their new teacher evaluation systems: The ratings are skewed toward the top.
For example, the article notes that, in Michigan, Florida and Georgia, a high proportion of teachers (more than 90 percent) received one of the two top ratings (out of four or five). This has led to some grumbling among advocates and others, citing similarities between these results and those of the old systems, in which the vast majority of teachers were rated “satisfactory,” and very few were found to be “unsatisfactory.”
Differentiation is very important in teacher evaluations – it’s kind of the whole point. Thus, it’s a problem when ratings are too heavily concentrated toward one end of the distribution. However, as Aaron Pallas points out, these important conversations about evaluation results sometimes seem less focused on good measurement or even the spread of teachers across categories than on the narrower question of how many teachers end up with the lowest rating – i.e., how many teachers will be fired. Read More »
Our guest author today is David B. Cohen, a National Board Certified high school English teacher in Palo Alto, CA, and the associate director of Accomplished California Teachers (ACT). His blog is at InterACT.
As we settle into 2013, I find myself increasingly optimistic about the future of the teaching profession. There are battles ahead, debates to be had and elections to be contested, but, as Sam Cooke sang, “A change is gonna come.”
The change that I’m most excited about is the potential for a shift towards teacher leadership in schools and school systems. I’m not naive enough to believe it will be a linear or rapid shift, but I’m confident in the long-term growth of teacher leadership because it provides a common ground for stakeholders to achieve their goals, because it’s replicable and scalable, and because it’s working already.
Much of my understanding of school improvement comes from my teaching career – now approaching two decades in the classroom, mostly in public high schools. However, until six years ago, I hadn’t seen teachers putting forth a compelling argument about how we might begin to transform our profession. A key transition for me was reading a Teacher Solutions report from the Center for Teaching Quality (CTQ). That 2007 report, Performance-Pay for Teachers: Designing a System that Students Deserve, showed how the concept of performance pay could be modified and improved upon with better definitions of a variety of performance, and differentiated pay based on differentiated professional practice, rather than arbitrary test score targets. I ended up joining the CTQ Teacher Leaders Network the same year, and have had the opportunity ever since to learn from exceptional teachers from around the country. Read More »
A few weeks ago, Students First NY (SFNY) released a report, in which they presented a very simple analysis of the distribution of “unsatisfactory” teacher evaluation ratings (“U-ratings”) across New York City schools in the 2011-12 school year.
The report finds that U-ratings are distributed unequally. In particular, they are more common in schools with higher poverty, more minorities, and lower proficiency rates. Thus, the authors conclude, the students who are most in need of help are getting the worst teachers.
There is good reason to believe that schools serving larger proportions of disadvantaged students have a tougher time attracting, developing and retaining good teachers, and there is evidence of this, even based on value-added estimates, which adjust for these characteristics (also see here). However, the assumptions upon which this Students First analysis is based are better seen as empirical questions, and, perhaps more importantly, the recommendations they offer are a rather crude, narrow manifestation of market-based reform principles. Read More »
Our guest author today is Douglas N. Harris, associate professor of economics and University Endowed Chair in Public Education at Tulane University in New Orleans. His latest book, Value-Added Measures in Education, provides an accessible review of the technical and practical issues surrounding these models.
This past November, I wrote a post for this blog about shifting course in the teacher evaluation movement and using value-added as a “screening device.” This means that the measures would be used: (1) to help identify teachers who might be struggling and for whom additional classroom observations (and perhaps other information) should be gathered; and (2) to identify classroom observers who might not be doing an effective job.
Screening takes advantage of the low cost of value-added and the fact that the estimates are more accurate in making general assessments of performance patterns across teachers, while avoiding the weaknesses of value-added—especially that the measures are often inaccurate for individual teachers, as well as confusing and not very credible among teachers when used for high-stakes decisions.
I want to thank the many people who responded to the first post. There were three main camps. Read More »
Charter schools, though they comprise a remarkably diverse sector, are quite often subject to broad generalizations. Opponents, for example, promote the characterization of charters as test prep factories, though this is a sweeping claim without empirical support. Another common stereotype is that charter schools exclude students with special needs. It is often (but not always) true that charters serve disproportionately fewer students with disabilities, but the reasons for this are complicated and vary a great deal, and there is certainly no evidence for asserting a widespread campaign of exclusion.
Of course, these types of characterizations, which are also leveled frequently at regular public schools, don’t always take the form of criticism. For instance, it is an article of faith among many charter supporters that these schools, thanks to the fact that relatively few are unionized, are better able to aggressively identify and fire low-performing teachers (and, perhaps, retain high performers). Unlike many of the generalizations from both “sides,” this one is a bit more amenable to empirical testing.
A recent paper by Joshua Cowen and Marcus Winters, published in the journal Education Finance and Policy, is among the first to take a look, and some of the results might be surprising. Read More »
One of the most frequent criticisms of value-added and other growth models is that they are “unstable” (or, more accurately, modestly stable). For instance, a teacher who is rated highly in one year might very well score toward the middle of the distribution – or even lower – in the next year (see here, here and here, or this accessible review).
Some of this year-to-year variation is “real.” A teacher might get better over the course of a year, or might have a personal problem that impedes their job performance. In addition, there could be changes in educational circumstances that are not captured by the models – e.g., a change in school leadership, new instructional policies, etc. However, a great deal of the recorded variation is actually due to sampling error, or idiosyncrasies in student testing performance. In other words, there is a lot of “purely statistical” imprecision in any given year, and so the scores don’t always “match up” so well between years. As a result, value-added critics, including many teachers, argue that it’s not only unfair to use such error-prone measures for any decisions, but that it’s also bad policy, since we might reward or punish teachers based on estimates that could be completely different the next year.
The concerns underlying these arguments are well-founded (and, often, casually dismissed by supporters and policymakers). At the same time, however, there are a few points about the stability of value-added (or lack thereof) that are frequently ignored or downplayed in our public discourse. All of them are pretty basic and have been noted many times elsewhere, but it might be useful to discuss them very briefly. Three in particular stand out. Read More »
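To make the role of sampling error in year-to-year instability concrete, here is a small simulation sketch. It rests on loudly simplified assumptions that are mine, not taken from any of the studies linked above: each teacher has a fixed "true" effect (so none of the variation is "real"), each year's estimate is that effect plus independent normally distributed noise, and the function names and noise levels are hypothetical. Under these assumptions, the expected correlation between two years of estimates is 1 / (1 + noise_sd²).

```python
import random


def correlation(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_stability(n_teachers, noise_sd, seed=0):
    """Year-to-year correlation of noisy estimates of a fixed true effect.

    Assumption: true effects have standard deviation 1 and do not change
    between years, so all instability comes from the added noise.
    """
    rng = random.Random(seed)
    true_effect = [rng.gauss(0, 1) for _ in range(n_teachers)]
    year1 = [t + rng.gauss(0, noise_sd) for t in true_effect]
    year2 = [t + rng.gauss(0, noise_sd) for t in true_effect]
    return correlation(year1, year2)


# Hypothetical noise levels: more sampling error means less stable estimates.
low_noise = simulate_stability(5000, noise_sd=0.5)
high_noise = simulate_stability(5000, noise_sd=1.5)
print(f"Correlation with less noise: {low_noise:.2f}")
print(f"Correlation with more noise: {high_noise:.2f}")
```

Even though no teacher's underlying effectiveness changes in this setup, the noisier estimates line up much less well between years, which is the "purely statistical" imprecision at issue.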