In a story for Education Week, always reliable Stephen Sawchuk reports on what may be a trend in states’ first results from their new teacher evaluation systems: The ratings are skewed toward the top.
For example, the article notes that, in Michigan, Florida and Georgia, a high proportion of teachers (more than 90 percent) received one of the two top ratings (out of four or five). This has led to some grumbling among advocates and others, citing similarities between these results and those of the old systems, in which the vast majority of teachers were rated “satisfactory,” and very few were found to be “unsatisfactory.”
Differentiation is very important in teacher evaluations – it’s kind of the whole point. Thus, it’s a problem when ratings are too heavily concentrated toward one end of the distribution. However, as Aaron Pallas points out, these important conversations about evaluation results sometimes seem less focused on good measurement or even the spread of teachers across categories than on the narrower question of how many teachers end up with the lowest rating – i.e., how many teachers will be fired.
Read More »
Our guest author today is David B. Cohen, a National Board Certified high school English teacher in Palo Alto, CA, and the associate director of Accomplished California Teachers (ACT). His blog is at InterACT.
As we settle into 2013, I find myself increasingly optimistic about the future of the teaching profession. There are battles ahead, debates to be had and elections to be contested, but, as Sam Cooke sang, “A change is gonna come.”
The change that I’m most excited about is the potential for a shift towards teacher leadership in schools and school systems. I’m not naive enough to believe it will be a linear or rapid shift, but I’m confident in the long-term growth of teacher leadership because it provides a common ground for stakeholders to achieve their goals, because it’s replicable and scalable, and because it’s working already.
Much of my understanding of school improvement comes from my teaching career – now approaching two decades in the classroom, mostly in public high schools. However, until six years ago, I hadn’t seen teachers putting forth a compelling argument about how we might begin to transform our profession. A key transition for me was reading a Teacher Solutions report from the Center for Teaching Quality (CTQ). That 2007 report, Performance-Pay for Teachers: Designing a System that Students Deserve, showed how the concept of performance pay could be modified and improved upon with better definitions of a variety of performance, and differentiated pay based on differentiated professional practice, rather than arbitrary test score targets. I ended up joining the CTQ Teacher Leaders Network the same year, and have had the opportunity ever since to learn from exceptional teachers from around the country. Read More »
A few weeks ago, Students First NY (SFNY) released a report, in which they presented a very simple analysis of the distribution of “unsatisfactory” teacher evaluation ratings (“U-ratings”) across New York City schools in the 2011-12 school year.
The report finds that U-ratings are distributed unequally. In particular, they are more common in schools with higher poverty, more minorities, and lower proficiency rates. Thus, the authors conclude, the students who are most in need of help are getting the worst teachers.
There is good reason to believe that schools serving larger proportions of disadvantaged students have a tougher time attracting, developing and retaining good teachers, and there is evidence of this, even based on value-added estimates, which adjust for these characteristics (also see here). However, the assumptions upon which this Students First analysis is based are better seen as empirical questions, and, perhaps more importantly, the recommendations they offer are a rather crude, narrow manifestation of market-based reform principles. Read More »
Our guest author today is Douglas N. Harris, associate professor of economics and University Endowed Chair in Public Education at Tulane University in New Orleans. His latest book, Value-Added Measures in Education, provides an accessible review of the technical and practical issues surrounding these models.
This past November, I wrote a post for this blog about shifting course in the teacher evaluation movement and using value-added as a “screening device.” This means that the measures would be used: (1) to help identify teachers who might be struggling and for whom additional classroom observations (and perhaps other information) should be gathered; and (2) to identify classroom observers who might not be doing an effective job.
Screening takes advantage of the low cost of value-added and the fact that the estimates are more accurate in making general assessments of performance patterns across teachers, while avoiding the weaknesses of value-added—especially that the measures are often inaccurate for individual teachers, as well as confusing and not very credible among teachers when used for high-stakes decisions.
I want to thank the many people who responded to the first post. There were three main camps. Read More »
Charter schools, though they comprise a remarkably diverse sector, are quite often subject to broad generalizations. Opponents, for example, promote the characterization of charters as test prep factories, though this is a sweeping claim without empirical support. Another common stereotype is that charter schools exclude students with special needs. It is often (but not always) true that charters serve disproportionately fewer students with disabilities, but the reasons for this are complicated and vary a great deal, and there is certainly no evidence for asserting a widespread campaign of exclusion.
Of course, these types of characterizations, which are also leveled frequently at regular public schools, don’t always take the form of criticism. For instance, it is an article of faith among many charter supporters that these schools, thanks to the fact that relatively few are unionized, are better able to aggressively identify and fire low-performing teachers (and, perhaps, retain high performers). Unlike many of the generalizations from both “sides,” this one is a bit more amenable to empirical testing.
A recent paper by Joshua Cowen and Marcus Winters, published in the journal Education Finance and Policy, is among the first to take a look, and some of the results might be surprising. Read More »
One of the most frequent criticisms of value-added and other growth models is that they are “unstable” (or, more accurately, modestly stable). For instance, a teacher who is rated highly in one year might very well score toward the middle of the distribution – or even lower – in the next year (see here, here and here, or this accessible review).
Some of this year-to-year variation is “real.” A teacher might get better over the course of a year, or might have a personal problem that impedes their job performance. In addition, there could be changes in educational circumstances that are not captured by the models – e.g., a change in school leadership, new instructional policies, etc. However, a great deal of the the recorded variation is actually due to sampling error, or idiosyncrasies in student testing performance. In other words, there is a lot of “purely statistical” imprecision in any given year, and so the scores don’t always “match up” so well between years. As a result, value-added critics, including many teachers, argue that it’s not only unfair to use such error-prone measures for any decisions, but that it’s also bad policy, since we might reward or punish teachers based on estimates that could be completely different the next year.
The concerns underlying these arguments are well-founded (and, often, casually dismissed by supporters and policymakers). At the same time, however, there are a few points about the stability of value-added (or lack thereof) that are frequently ignored or downplayed in our public discourse. All of them are pretty basic and have been noted many times elsewhere, but it might be useful to discuss them very briefly. Three in particular stand out. Read More »
The New Teacher Project’s (TNTP) recent report on teacher retention, called “The Irreplaceables,” garnered quite a bit of media attention. In a discussion of this report, I argued, among other things, that the label “irreplaceable” is a highly exaggerated way of describing their definitions, which, by the way, varied between the five districts included in the analysis. In general, TNTP’s definitions are better-described as “probably above average in at least one subject” (and this distinction matters for how one interprets the results).
I’d like to elaborate a bit on this issue – that is, how to categorize teachers’ growth model estimates, which one might do, for example, when incorporating them into a final evaluation score. This choice, which receives virtually no discussion in TNTP’s report, is always a judgment call to some degree, but it’s an important one for accountability policies. Many states and districts are drawing those very lines between teachers (and schools), and attaching consequences and rewards to the outcomes.
Let’s take a very quick look, using the publicly-released 2010 “teacher data reports” from New York City (there are details about the data in the first footnote*). Keep in mind that these are just value-added estimates, and are thus, at best, incomplete measures of the performance of teachers (however, importantly, the discussion below is not specific to growth models; it can apply to many different types of performance measures). Read More »
** Reprinted here in the Washington Post
In a recent Washington Post article called “Teachers leaning in favor of reforms,” veteran reporter Jay Mathews puts forth an argument that one hears rather frequently – that teachers are “changing their minds,” in a favorable direction, about the current wave of education reform. Among other things, Mr. Mathews cites two teacher surveys. One of them, which we discussed here, is a single-year survey that doesn’t actually look at trends, and therefore cannot tell us much about shifts in teachers’ attitudes over time (it was also a voluntary online survey).
His second source, on the other hand, is in fact a useful means of (cautiously) assessing such trends (though the article doesn’t actually look at them). That is the Education Sector survey of a nationally-representative sample of U.S. teachers, which they conducted in 2003, 2007 and, most recently, in 2011.
This is a valuable resource. Like other teacher surveys, it shows that educators’ attitudes toward education policy are diverse. Opinions vary by teacher characteristics, context and, of course, by the policy being queried. Moreover, views among teachers can (and do) change over time, though, when looking at cross-sectional surveys, one must always keep in mind that observed changes (or lack thereof) might be due in part to shifts in the characteristics of the teacher workforce. There’s an important distinction between changing minds and changing workers (which Jay Mathews, to his great credit, discusses in this article).*
That said, when it comes to the many of the more controversial reforms happening in the U.S., those about which teachers might be “changing their minds,” the results of this particular survey suggest, if anything, that teachers’ attitudes are actually quite stable. Read More »
** Reprinted here in the Washington Post
Our guest author today is Douglas N. Harris, associate professor of economics and University Endowed Chair in Public Education at Tulane University in New Orleans. His latest book, Value-Added Measures in Education, provides an excellent, accessible review of the technical and practical issues surrounding these models.
Now that the election is over, the Obama Administration and policymakers nationally can return to governing. Of all the education-related decisions that have to be made, the future of teacher evaluation has to be front and center.
In particular, how should “value-added” measures be used in teacher evaluation? President Obama’s Race to the Top initiative expanded the use of these measures, which attempt to identify how much each teacher contributes to student test scores. In doing so, the initiative embraced and expanded the controversial reliance on standardized tests that started under President Bush’s No Child Left Behind.
In many respects, The Race was well designed. It addresses an important problem – the vast majority of teachers report receiving limited quality feedback on instruction. As a competitive grants program, it was voluntary for states to participate (though involuntary for many districts within those states). The Administration also smartly embraced the idea of multiple measures of teacher performance.
But they also made one decision that I think was a mistake. They encouraged—or required, depending on your vantage point—states to lump value-added or other growth model estimates together with other measures. The raging debate since then has been over what percentage of teachers’ final ratings should be given to value-added versus the other measures. I believe there is a better way to approach this issue, one that focuses on teacher evaluations not as a measure, but rather as a process. Read More »
People often ask me for my “bottom line” on using value-added (or other growth model) estimates in teacher evaluations. I’ve written on this topic many times, and while I have in fact given my overall opinion a couple of times, I have avoided expressing it in a strong “yes or no” format. There’s a reason for this, and I thought maybe I would write a short piece and explain myself.
My first reaction to the queries about where I stand on value-added is a shot of appreciation that people are interested in my views, followed quickly by an acute rush of humility and reticence. I know think tank people aren’t supposed to say things like this, but when it comes to sweeping, big picture conclusions about the design of new evaluations, I’m not sure my personal opinion is particularly important.
Frankly, given the importance of how people on the ground respond to these types of policies, as well as, of course, their knowledge of how schools operate, I would be more interested in the views of experienced, well-informed teachers and administrators than my own. And I am frequently taken aback by the unadulterated certainty I hear coming from advocates and others about this completely untested policy. That’s why I tend to focus on aspects such as design details and explaining the research – these are things I feel qualified to discuss. (I also, by the way, acknowledge that it’s very easy for me to play armchair policy general when it’s not my job or working conditions that might be on the line.)
That said, here’s my general viewpoint, in two parts. First, my sense, based on the available evidence, is that value-added should be given a try in new teacher evaluations. Read More »
The New Teacher Project (TNTP) has released a new report on teacher retention in D.C. Public Schools (DCPS). It is a spinoff of their “The Irreplaceables” report, which was released a few months ago, and which is discussed in this post. The four (unnamed) districts from that report are also used in this one, and their results are compared with those from DCPS.
I want to look quickly at this new supplemental analysis, not to rehash the issues I raised about“The Irreplaceables,” but rather because of DCPS’s potential importance as a field test site for a host of policy reform ideas – indeed, the majority of core market-based reform policies have been in place in D.C. for several years, including teacher evaluations in which test-based measures are the dominant component, automatic dismissals based on those ratings, large performance bonuses, mutual consent for excessed teachers and a huge charter sector. There are many people itching to render a sweeping verdict, positive or negative, on these reforms, most often based on pre-existing beliefs, rather than solid evidence.
Although I will take issue with a couple of the conclusions offered in this report, I’m not going to review it systematically. I think research on retention is important, and it’s difficult to produce reports with original analysis, while very easy to pick them apart. Instead, I’m going to list a couple of findings in the report that I think are worth examining, mostly because they speak to larger issues. Read More »
One of the most important things in education policy to keep an eye on is the first round of changes to new teacher evaluation systems. Given all the moving parts and the lack of evidence on how these systems should be designed and their impact, course adjustments along the way are not just inevitable, but absolutely essential.
Changes might be guided by different types of evidence, such as feedback from teachers and administrators or analysis of ratings data. And, of course, human judgment will play a big role. One thing that states and districts should not be doing, however, is assessing their new systems – or making changes to them – based whether or not raw overall test scores go up or down within the first few years.
Here’s a little reality check: Even the best-designed, best-implemented new evaluations are unlikely to have an immediate measurable impact on aggregate student performance. Evaluations are an investment, not a quick fix. And they are not risk-free. Their effects will depend on the quality of systems, how current teachers and administrators react to them and how all of this shapes and plays out in the teacher labor market. As I’ve said before, the realistic expectation for overall performance – and this is no guarantee – is that there will be some very small, gradual improvements, unfolding over a period of years and decades.
States and districts that expect anything more risk making poor decisions during these crucial, early phases. Read More »
I’ve been reading Albert Shanker’s “The Power of Ideas: Al In His Own Words,” the American Educator’s compendium of Al’s speeches and columns, published posthumously in 1997. What an enjoyable, witty and informative collection of essays.
Two columns especially caught my attention: “That’s Very Unprofessional Mr. Shanker!” and “Does Pavarotti Need to File an Aria Plan” – where Al discusses expectations for (and treatment of) teachers. They made me reflect, yet again, on whether perceptions of teacher professionalism might be gendered. In other words, when society thinks of the attributes of a professional teacher, might we unconsciously be thinking of women teachers? And, if so, why might this be important?
In “That’s Very Unprofessional, Mr. Shanker!” Al writes: Read More »
One claim that gets tossed around a lot in education circles is that “the most effective teachers produce a year and a half of learning per year, while the least effective produce a half of a year of learning.”
This talking point is used all the time in advocacy materials and news articles. Its implications are pretty clear: Effective teachers can make all the difference, while ineffective teachers can do permanent damage.
As with most prepackaged talking points circulated in education debates, the “year and a half of learning” argument, when used without qualification, is both somewhat valid and somewhat misleading. So, seeing as it comes up so often, let’s very quickly identify its origins and what it means. Read More »
D.C. Public Schools (DCPS) recently announced a few significant changes to its teacher evaluation system (called IMPACT), including the alteration of its test-based components, the creation of a new performance category (“developing”), and a few tweaks to the observational component (discussed below). These changes will be effective starting this year.
As with any new evaluation system, a period of adjustment and revision should be expected and encouraged (though it might be preferable if the first round of changes occurs during a phase-in period, prior to stakes becoming attached). Yet, despite all the attention given to the IMPACT system over the past few years, these new changes have not been discussed much beyond a few quick news articles.
I think that’s unfortunate: DCPS is an early adopter of the “new breed” of teacher evaluation policies being rolled out across the nation, and any adjustments to IMPACT’s design – presumably based on results and feedback – could provide valuable lessons for states and districts in earlier phases of the process.
Accordingly, I thought I would take a quick look at three of these changes. Read More »