A couple of months ago, Bill Gates said something that received a lot of attention. With regard to his foundation’s education reform efforts, which focus most prominently on teacher evaluations, but encompass many other areas, he noted, “we don’t know if it will work.” In fact, according to Mr. Gates, “we won’t know for probably a decade.”
He’s absolutely correct. Most education policies, including (but not limited to) those geared toward shifting the distribution of teacher quality, take a long time to work (if they do work), and the research assessing these policies requires a great deal of patience. Yet so many of the most prominent figures in education policy routinely espouse the opposite viewpoint: Policies are expected to have an immediate, measurable impact (and their effects are assessed in the crudest manner imaginable).
A perfect example was the reaction to the recent release of results of the National Assessment of Educational Progress (NAEP). Read More »
Advocates of the so-called “Florida Formula,” a package of market-based reforms enacted throughout the 1990s and 2000s, some of which are now spreading rapidly in other states, traveled to Michigan this week to make their case to the state’s lawmakers, with particular emphasis on Florida’s school grading system. In addition to arguments about accessibility and parental involvement, their empirical (i.e., test-based) evidence consisted largely of the standard, invalid claim that cross-sectional NAEP increases prove the reforms’ effectiveness, along with a bonus appearance of the argument that, since Florida started grading schools, the grades have improved, even though this is largely (and demonstrably) a result of changes in the formula.
As mentioned in a previous post, I continue to be perplexed at advocates’ insistence on using this “evidence,” even though there is a decent amount of actual rigorous policy research available, much of it positive.
So, I thought it would be fun, though slightly strange, for me to try on my market-based reformer cap, and see what it would look like if this kind of testimony about the Florida reforms was actually research-based (at least the test-based evidence). Here’s a very rough outline of what I came up with: Read More »
U.S. Secretary of Education Arne Duncan recently announced that states will be given the option to postpone using the results of their new teacher evaluations for high-stakes decisions during the phase-in of the new Common Core-aligned assessments. The reaction from some advocates was swift condemnation – calling the decision little more than a “delay” and a “victory for the status quo.”
We hear these kinds of arguments frequently in education. The idea is that change must be as rapid as possible, because “kids can’t wait.” I can understand and appreciate the urgency underlying these sentiments. Policy change in education (as in other arenas) can sometimes be painfully slow, and what seem like small roadblocks can turn out to be massive, permanent obstacles.
I will not repeat my views regarding the substance of Secretary Duncan’s decision – see this op-ed by Morgan Polikoff and me. I would, however, like to make one very quick point about these “we need change right now because students can’t wait” arguments: Sometimes, what is called “delay” is actually better described as good policy making, and kids can wait for good policy making. Read More »
** Reprinted here in the Washington Post
The following is written by Morgan S. Polikoff and Matthew Di Carlo. Morgan is Assistant Professor in the Rossier School of Education at the University of Southern California.
One of the primary policy levers now being employed in states and districts nationwide is teacher evaluation reform. Well-designed evaluations, which should include measures that capture both teacher practice and student learning, have great potential to inform and improve the performance of teachers and, thus, students. Furthermore, most everyone agrees that the previous systems were largely pro forma, failed to provide useful feedback, and needed replacement.
The attitude among many policymakers and advocates is that we must implement these systems and begin using them rapidly for decisions about teachers, while design flaws can be fixed later. Such urgency is undoubtedly influenced by the history of slow, incremental progress in education policy. However, we believe this attitude to be imprudent. Read More »
** Reprinted here in the Washington Post
A big part of successful policy making is unyielding attention to detail (an argument that regular readers of this blog hear often). Choices about design and implementation that may seem unimportant can play a substantial role in determining how policies play out in practice.
A new paper, co-authored by Elizabeth Davidson, Randall Reback, Jonah Rockoff and Heather Schwartz, and presented at last month’s annual conference of The Association for Education Finance and Policy, illustrates this principle vividly, and on a grand scale: With an analysis of outcomes in all 50 states during the early years of NCLB.
After a terrific summary of the law’s rules and implementation challenges, as well as some quick descriptive statistics, the paper’s main analysis is a straightforward examination of why the proportion of schools meeting AYP varied quite a bit between states. For instance, in 2003, the first year of results, 32 percent of U.S. schools failed to make AYP, but the proportion ranged from one percent in Iowa to over 80 percent in Florida.
Surprisingly, the results suggest that the primary reasons for this variation seem to have had little to do with differences in student performance. Rather, the big factors are subtle differences in rather arcane rules that each state chose during the implementation process. These decisions received little attention, yet they had a dramatic impact on the outcomes of NCLB during this time period. Read More »
Earlier this week, New Jersey Governor Chris Christie announced that the state will assume control over Camden City School District. Camden will be the fourth NJ district to undergo takeover, though this is the first time that the state will be removing control from an elected local school board, which will now serve in an advisory role (and have three additional members appointed by the Governor). Over the next few weeks, NJ officials will choose a new superintendent, and begin to revamp evaluations, curricula and other core policies.
Accompanying the announcement, the Governor’s office released a two-page “fact sheet,” much of which is devoted to justifying this move to the public.
Before discussing it, let’s be clear about something: it may indeed be the case that Camden schools are so critically low-performing and/or dysfunctional as to warrant drastic intervention. Moreover, it’s at least possible that state takeover is the appropriate type of intervention to help these schools improve (though the research on this latter score is, to be charitable, undeveloped).
That said, the “fact sheet” presents relatively little valid evidence regarding the academic performance of Camden schools. Given the sheer magnitude of any takeover decision, it is crucial for the state to demonstrate publicly that they have left no stone unturned by presenting a case that is as comprehensive and compelling as possible. However, the discrepancy between that high bar and NJ’s evidence, at least that pertaining to academic outcomes, is more than a little disconcerting.
Read More »
In a story for Education Week, the always reliable Stephen Sawchuk reports on what may be a trend in states’ first results from their new teacher evaluation systems: The ratings are skewed toward the top.
For example, the article notes that, in Michigan, Florida and Georgia, a high proportion of teachers (more than 90 percent) received one of the two top ratings (out of four or five). This has led to some grumbling among advocates and others, citing similarities between these results and those of the old systems, in which the vast majority of teachers were rated “satisfactory,” and very few were found to be “unsatisfactory.”
Differentiation is very important in teacher evaluations – it’s kind of the whole point. Thus, it’s a problem when ratings are too heavily concentrated toward one end of the distribution. However, as Aaron Pallas points out, these important conversations about evaluation results sometimes seem less focused on good measurement or even the spread of teachers across categories than on the narrower question of how many teachers end up with the lowest rating – i.e., how many teachers will be fired.
Read More »
In his State of the City address last month, New York City Mayor Michael Bloomberg made some brief comments about the upcoming adoption of new assessments aligned with the Common Core State Standards (CCSS), including the following statement:
But no matter where the definition of proficiency is arbitrarily set on the new tests, I expect that our students’ progress will continue outpacing the rest of the State’s[,] the only meaningful measurement of progress we have.
On the surface, this may seem like just a little bit of healthy bravado. But there are a few things about this single sentence that struck me, and it also helps to illustrate an important point about the relationship between standards and testing results. Read More »
Some Florida officials are still having trouble understanding why they’re finding no relationship between the grades schools receive and the evaluation ratings of teachers in those schools. For his part, new Florida education Commissioner Tony Bennett is also concerned. According to the article linked above, he acknowledges (to his credit) that the two measures are different, but is also considering “revis[ing] the models to get some fidelity between the two rankings.”
This may be turning into a risky situation. As discussed in a recent post, it is important to examine the results of the new teacher evaluations, but there is no reason one would expect to find a strong relationship between these ratings and the school grades, as they are in large part measuring different things (and imprecisely at that). The school grades are mostly (but not entirely) driven by how highly students score, whereas teacher evaluations are, to the degree possible, designed to be independent of these absolute performance levels. Florida cannot validate one system using the other.
However, as also mentioned in that post, this is not to say that there should be no relationship at all. For example, both systems include growth-oriented measures (albeit using very different approaches). In addition, schools with lower average performance levels sometimes have trouble recruiting and retaining good teachers. Due to these and other factors, the reasonable expectation is to find some association overall, just not one that’s extremely strong. And that’s basically what one finds, even using the same set of results upon which the claims that there is no relationship are based.
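The intuition above can be illustrated with a quick simulation. In this sketch (purely hypothetical numbers, not Florida’s actual data), school grades load heavily on absolute performance levels, teacher ratings are designed to be independent of those levels, and both pick up a modest shared “growth” component. Even though the two measures are each meaningful, their correlation comes out weak:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000  # hypothetical schools

# A modest component both systems pick up (e.g., real school-level growth)
shared = rng.normal(size=n)

# School grade: driven mostly by absolute performance levels,
# with only a small contribution from the shared growth component.
levels = rng.normal(size=n)
school_grade = 0.9 * levels + 0.3 * shared + 0.3 * rng.normal(size=n)

# Teacher rating (school average): designed to be independent of levels,
# so it loads only on the shared component plus its own noise.
teacher_rating = 0.3 * shared + 0.9 * rng.normal(size=n)

r = np.corrcoef(school_grade, teacher_rating)[0, 1]
print(round(r, 2))  # a weak positive correlation, roughly 0.1
```

The point is not the specific coefficients, which are invented for illustration; it is that two valid measures capturing mostly different things should correlate only modestly, which is exactly the pattern described above.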
Read More »
A few weeks ago, Students First NY (SFNY) released a report in which they presented a very simple analysis of the distribution of “unsatisfactory” teacher evaluation ratings (“U-ratings”) across New York City schools in the 2011-12 school year.
The report finds that U-ratings are distributed unequally. In particular, they are more common in schools with higher poverty, more minorities, and lower proficiency rates. Thus, the authors conclude, the students who are most in need of help are getting the worst teachers.
There is good reason to believe that schools serving larger proportions of disadvantaged students have a tougher time attracting, developing and retaining good teachers, and there is evidence of this, even based on value-added estimates, which adjust for these characteristics (also see here). However, the assumptions upon which this Students First analysis is based are better seen as empirical questions, and, perhaps more importantly, the recommendations they offer are a rather crude, narrow manifestation of market-based reform principles. Read More »
Our guest author today is Douglas N. Harris, associate professor of economics and University Endowed Chair in Public Education at Tulane University in New Orleans. His latest book, Value-Added Measures in Education, provides an accessible review of the technical and practical issues surrounding these models.
This past November, I wrote a post for this blog about shifting course in the teacher evaluation movement and using value-added as a “screening device.” This means that the measures would be used: (1) to help identify teachers who might be struggling and for whom additional classroom observations (and perhaps other information) should be gathered; and (2) to identify classroom observers who might not be doing an effective job.
Screening takes advantage of the low cost of value-added and the fact that the estimates are more accurate in making general assessments of performance patterns across teachers, while avoiding the weaknesses of value-added—especially that the measures are often inaccurate for individual teachers, as well as confusing and not very credible among teachers when used for high-stakes decisions.
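To make the screening idea concrete, here is a minimal sketch of the two uses described above. The thresholds, the two-standard-error rule, and the function names are my assumptions for illustration; they are not Professor Harris’s specification:

```python
# Illustrative sketch of value-added used as a screening device, not for
# high-stakes decisions. All thresholds below are assumed, not prescribed.

def flag_teacher(va_estimate: float, std_error: float,
                 threshold: float = -1.0) -> bool:
    """Flag a teacher for additional classroom observation only when even
    the upper bound of the estimate's confidence interval falls below the
    threshold -- i.e., when imprecision alone cannot explain the result."""
    return va_estimate + 2 * std_error < threshold

def flag_observer(mean_observation_gap: float, n_teachers: int,
                  min_teachers: int = 20, gap_cutoff: float = 1.5) -> bool:
    """Flag a classroom observer whose ratings diverge consistently from
    the value-added of the teachers they rate, averaged over enough
    teachers for the comparison to be meaningful."""
    return n_teachers >= min_teachers and abs(mean_observation_gap) > gap_cutoff

# A teacher well below average with a precise estimate is flagged for a
# closer look; one with a noisy estimate is not.
print(flag_teacher(-2.0, 0.3), flag_teacher(-1.2, 0.8))  # True False
```

The design choice here mirrors the argument in the text: the noisy measure triggers further information-gathering rather than a consequence, so individual-level imprecision does far less damage.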
I want to thank the many people who responded to the first post. There were three main camps. Read More »
Last week, Florida State Senate President Don Gaetz (R – Niceville) expressed his skepticism about the recently-released results of the state’s new teacher evaluation system. The senator was particularly concerned about his comparison of the ratings with schools’ “A-F” grades. He noted, “If you have a C school, 90 percent of the teachers in a C school can’t be highly effective. That doesn’t make sense.”
There’s an important discussion to be had about the results of both the school and teacher evaluation systems, and the distributions of the ratings can definitely be part of that discussion (even if this issue is sometimes approached in a superficial manner). However, arguing that we can validate Florida’s teacher evaluations using its school grades, or vice-versa, suggests little understanding of either. Actually, given the design of both systems, finding a modest or even weak association between them would make pretty good sense.
In order to understand why, there are two facts to consider. Read More »
** Reprinted here in the Washington Post
Former Florida Governor Jeb Bush has become one of the more influential education advocates in the country. He travels the nation armed with a set of core policy prescriptions, sometimes called the “Florida formula,” as well as “proof” that they work. The evidence that he and his supporters present consists largely of changes in average statewide test scores – NAEP and the state exam (FCAT) – since the reforms started going into place. The basic idea is that increases in testing results are the direct result of these policies.
Governor Bush is no doubt sincere in his effort to improve U.S. education, and, as we’ll see, a few of the policies comprising the “Florida formula” have some test-based track record. However, his primary empirical argument on their behalf – the coincidence of these policies’ implementation with changes in scores and proficiency rates – though common among both “sides” of the education debate, is simply not valid. We’ve discussed why this is the case many times (see here, here and here), as have countless others, in the Florida context as well as more generally.*
There is no need to repeat those points, except to say that they embody the most basic principles of data interpretation and causal inference. It would be wonderful if the evaluation of education policies – or of school systems’ performance more generally – was as easy as looking at raw, cross-sectional testing data. But it is not.
Luckily, one need not rely on these crude methods. We can instead take a look at some of the rigorous research that has specifically evaluated the core reforms comprising the “Florida formula.” As usual, it is a far more nuanced picture than supporters (and critics) would have you believe. Read More »
** Reprinted here in the Washington Post
2012 was another busy year for market-based education reform. The rapid proliferation of charter schools continued, while states and districts went about the hard work of designing and implementing new teacher evaluations that incorporate student testing data, and, in many cases, performance pay programs to go along with them.
As in previous years (see our 2010 and 2011 reviews), much of the research on these three “core areas” – merit pay, charter schools, and the use of value-added and other growth models in teacher evaluations – appeared rather responsive to the direction of policy making, but could not always keep up with its breakneck pace.*
Some lag time is inevitable, not only because good research takes time, but also because there’s a degree to which you have to try things before you can see how they work. Nevertheless, what we don’t know about these policies far exceeds what we know, and, given the sheer scope and rapid pace of reforms over the past few years, one cannot help but get the occasional “flying blind” feeling. Moreover, as is often the case, the only unsupportable position is certainty. Read More »
The New Teacher Project’s (TNTP) recent report on teacher retention, called “The Irreplaceables,” garnered quite a bit of media attention. In a discussion of this report, I argued, among other things, that the label “irreplaceable” is a highly exaggerated way of describing their definitions, which, by the way, varied between the five districts included in the analysis. In general, TNTP’s definitions are better described as “probably above average in at least one subject” (and this distinction matters for how one interprets the results).
I’d like to elaborate a bit on this issue – that is, how to categorize teachers’ growth model estimates, which one might do, for example, when incorporating them into a final evaluation score. This choice, which receives virtually no discussion in TNTP’s report, is always a judgment call to some degree, but it’s an important one for accountability policies. Many states and districts are drawing those very lines between teachers (and schools), and attaching consequences and rewards to the outcomes.
Let’s take a very quick look, using the publicly-released 2010 “teacher data reports” from New York City (there are details about the data in the first footnote*). Keep in mind that these are just value-added estimates, and are thus, at best, incomplete measures of the performance of teachers (however, importantly, the discussion below is not specific to growth models; it can apply to many different types of performance measures). Read More »
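A small simulation shows why the line-drawing choice matters. The numbers below are invented for illustration (they are not the NYC teacher data reports): with value-added estimates in two subjects that are only moderately correlated, “top 20 percent in at least one subject” captures roughly four times as many teachers as “top 20 percent in both subjects”:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10000  # hypothetical teachers

# Simulated value-added in two subjects, moderately correlated (r = 0.4),
# roughly how math and reading estimates for the same teacher tend to relate.
cov = [[1.0, 0.4], [0.4, 1.0]]
scores = rng.multivariate_normal([0.0, 0.0], cov, size=n)

# Convert to within-subject percentiles (0-100).
pct = scores.argsort(axis=0).argsort(axis=0) / n * 100

# Definition A: top 20 percent in *both* subjects.
top_both = np.mean((pct[:, 0] >= 80) & (pct[:, 1] >= 80))

# Definition B: top 20 percent in *at least one* subject.
top_either = np.mean((pct[:, 0] >= 80) | (pct[:, 1] >= 80))

print(f"both: {top_both:.0%}, either: {top_either:.0%}")
```

Nothing about the data changes between the two definitions; only the cutoff rule does. That is precisely why treating these categorization choices as an afterthought is a problem when consequences and rewards are attached to the resulting labels.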