** Reprinted here in the Washington Post
A big part of successful policy making is unyielding attention to detail (an argument that regular readers of this blog hear often). Choices about design and implementation that may seem unimportant can play a substantial role in determining how policies play out in practice.
A new paper, co-authored by Elizabeth Davidson, Randall Reback, Jonah Rockoff and Heather Schwartz, and presented at last month’s annual conference of the Association for Education Finance and Policy, illustrates this principle vividly, and on a grand scale, with an analysis of outcomes in all 50 states during the early years of NCLB.
After a terrific summary of the law’s rules and implementation challenges, as well as some quick descriptive statistics, the paper’s main analysis is a straightforward examination of why the proportion of schools meeting AYP varied quite a bit between states. For instance, in 2003, the first year of results, 32 percent of U.S. schools failed to make AYP, but the proportion ranged from one percent in Iowa to over 80 percent in Florida.
Surprisingly, the results suggest that the primary reasons for this variation seem to have had little to do with differences in student performance. Rather, the big factors are subtle differences in rather arcane rules that each state chose during the implementation process. These decisions received little attention, yet they had a dramatic impact on the outcomes of NCLB during this time period. Read More »
** Reprinted here in the Washington Post
Last week, I attended a Center for American Progress (CAP) discussion, where UC Berkeley professor David Kirp spoke about his research on Union City’s school system, and offered some ideas from his new book, Improbable Scholars: The Rebirth of a Great American School System and a Strategy for America’s Schools.
Kirp’s work and Union City have received a lot of attention in the last month or so, and while most find the story heartening, a few commentators have had more skeptical reactions. True, this is the story of one district in one state finding success through collaboration and hard work, but research from other disciplines – sociology, business, management, organizational studies – suggests that similar human dynamics can be observed in settings other than schools and school districts. I would like to situate Kirp’s work in this broader framework; that is, among a myriad of studies – case studies, if you will – pointing to the same fundamental phenomena.
Union City is a community with an unemployment rate 60 percent higher than the national average, where three-quarters of public school students live in homes where only Spanish is spoken. About 25 years ago, the school district was in so much trouble that state officials threatened a state takeover. Since then, Union City’s measured performance has improved considerably. In 2011, almost 90 percent of the district’s students graduated from high school, and 60 percent went on to college. The change is large enough to suggest some degree of “real” improvement, and it’s plausible to believe that better school quality had at least something to do with that. So, what was Union City’s school improvement strategy? Read More »
A recent Mathematica report on the performance of KIPP charter schools expands and elaborates on their prior analyses of these schools’ (estimated) effects on average test scores and other outcomes (also here). These findings are important and interesting, and were covered extensively elsewhere.
As is usually the case with KIPP, the results stirred the full spectrum of reactions. To over-generalize a bit, critics sometimes seem unwilling to acknowledge that KIPP’s results are real no matter how well-documented they might be, whereas some proponents are quick to use KIPP to proclaim a triumph for the charter movement, one that can justify the expansion of charter sectors nationwide.
Despite all this controversy, there may be more opportunity for agreement here than meets the eye. So, let’s try to lay out a few reasonable conclusions and see if we might find some of that common ground. Read More »
In a Slate article published last October, Daniel Engber bemoans the frequently shallow use of the classic warning that “correlation does not imply causation.” Mr. Engber argues that the correlation/causation distinction has become so overused in online comments sections and other public fora as to hinder real debate. He also posits that correlation does not mean causation, but “it sure as hell provides a hint,” and can “set us down the path toward thinking through the workings of reality.”
Correlations are extremely useful, in fact essential, for guiding all kinds of inquiry. And Engber is no doubt correct that the argument is overused in public debates, often in lieu of more substantive comments. But let’s also be clear about something – careless causal inferences likely do more damage to the quality and substance of policy debates on any given day than the misuse of the correlation/causation argument does over the course of months or even years.
We see this in education constantly. For example, mayors and superintendents often claim credit for marginal increases in testing results that coincide with their holding office. The causal leaps here are pretty stunning. Read More »
Among the more persistent arguments one hears in the debate over charter schools is that the “best evidence” shows charters are more effective. I have discussed this issue before (as have others), but it seems to come up from time to time, even in mainstream media coverage.
The basic point is that we should essentially dismiss – or at least regard with extreme skepticism – the two dozen or so high-quality “non-experimental” studies, which, on the whole, show modest or no differences in test-based effectiveness between charters and comparable regular public schools. In contrast, “randomized controlled trials” (RCTs), which exploit the random assignment of admission lotteries to control for differences between students, tend to yield positive results. Since, so the story goes, the “gold standard” research shows that charters are superior, we should go with that conclusion.
RCTs, though not without their own limitations, are without question powerful, and there is plenty of subpar charter research out there. That said, however, the “best evidence” argument is not particularly compelling (and it’s also a distraction from the positive shift away from obsessing about whether charters do or don’t work toward an examination of why). A full discussion of the methodological issues in the charter school literature would be long and burdensome, but it might be helpful to lay out three very basic points to bear in mind when you hear this argument. Read More »
** Reprinted here in the Washington Post
In a recent post, Kevin Drum of Mother Jones discusses his growing skepticism about the research behind market-based education reform, and about the claims that supporters of these policies make. He cites a recent Los Angeles Times article, which discusses how, in 2000, the San Jose Unified School District in California instituted a so-called “high expectations” policy requiring all students to pass the courses necessary to attend state universities. The reported percentage of students passing these courses increased quickly, causing the district and many others to declare the policy a success. In 2005, Los Angeles Unified, the nation’s second largest district, adopted similar requirements.
For its part, the Times performed its own analysis, and found that the San Jose pass rate was no higher in 2011 than in 2000 (in fact, slightly lower for some subgroups), and that the district had overstated its early results by classifying students in a misleading manner. Mr. Drum, reviewing these results, concludes: “It turns out it was all a crock.”
In one sense, that’s true – the district seems to have reported misleading data. On the other hand, neither San Jose Unified’s original evidence (with or without the misclassification) nor the Times analysis is anywhere near sufficient for drawing conclusions – “crock”-based or otherwise – about the effects of this policy. This illustrates the deeper problem here, which is less about one “side” or the other misleading with research than about something much more difficult to address: common misconceptions that make it hard to distinguish good evidence from bad.
Charter schools, though they comprise a remarkably diverse sector, are quite often subject to broad generalizations. Opponents, for example, promote the characterization of charters as test prep factories, though this is a sweeping claim without empirical support. Another common stereotype is that charter schools exclude students with special needs. It is often (but not always) true that charters serve disproportionately fewer students with disabilities, but the reasons for this are complicated and vary a great deal, and there is certainly no evidence for asserting a widespread campaign of exclusion.
Of course, these types of characterizations, which are also leveled frequently at regular public schools, don’t always take the form of criticism. For instance, it is an article of faith among many charter supporters that these schools, thanks to the fact that relatively few are unionized, are better able to aggressively identify and fire low-performing teachers (and, perhaps, retain high performers). Unlike many of the generalizations from both “sides,” this one is a bit more amenable to empirical testing.
A recent paper by Joshua Cowen and Marcus Winters, published in the journal Education Finance and Policy, is among the first to take a look, and some of the results might be surprising. Read More »
One of the most frequent criticisms of value-added and other growth models is that they are “unstable” (or, more accurately, modestly stable). For instance, a teacher who is rated highly in one year might very well score toward the middle of the distribution – or even lower – in the next year (see here, here and here, or this accessible review).
Some of this year-to-year variation is “real.” A teacher might get better over the course of a year, or might have a personal problem that impedes their job performance. In addition, there could be changes in educational circumstances that are not captured by the models – e.g., a change in school leadership, new instructional policies, etc. However, a great deal of the recorded variation is actually due to sampling error, or idiosyncrasies in student testing performance. In other words, there is a lot of “purely statistical” imprecision in any given year, and so the scores don’t always “match up” so well between years. As a result, value-added critics, including many teachers, argue that it’s not only unfair to use such error-prone measures for any decisions, but that it’s also bad policy, since we might reward or punish teachers based on estimates that could be completely different the next year.
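To make the “purely statistical” part of this concrete, here is a minimal simulation sketch. It is not drawn from any of the studies linked above; the function names, the number of teachers and the noise level are all hypothetical choices for illustration. It assumes each teacher has a fixed “true” effect that does not change at all between years (so it deliberately ignores the “real” variation mentioned above), and that each year’s estimate is that true effect plus independent random error.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def year_to_year_correlation(noise_sd, n_teachers=5000, seed=0):
    """Correlate two years of noisy estimates of the same fixed effects.

    Assumption (illustrative only): true effects have standard deviation 1
    and are identical in both years; only the random error differs.
    """
    rng = random.Random(seed)
    true_effects = [rng.gauss(0, 1) for _ in range(n_teachers)]
    year1 = [t + rng.gauss(0, noise_sd) for t in true_effects]
    year2 = [t + rng.gauss(0, noise_sd) for t in true_effects]
    return pearson(year1, year2)


if __name__ == "__main__":
    for sd in (0.0, 0.5, 1.0, 1.5):
        print(f"noise sd = {sd}: correlation = {year_to_year_correlation(sd):.2f}")
```

Under these assumptions the expected correlation is 1 / (1 + noise_sd²), so scores can “match up” poorly between years even when no teacher’s actual performance has changed at all.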
The concerns underlying these arguments are well-founded (and, often, casually dismissed by supporters and policymakers). At the same time, however, there are a few points about the stability of value-added (or lack thereof) that are frequently ignored or downplayed in our public discourse. All of them are pretty basic and have been noted many times elsewhere, but it might be useful to discuss them very briefly. Three in particular stand out. Read More »
** Reprinted here in the Washington Post
Former Florida Governor Jeb Bush has become one of the more influential education advocates in the country. He travels the nation armed with a set of core policy prescriptions, sometimes called the “Florida formula,” as well as “proof” that they work. The evidence that he and his supporters present consists largely of changes in average statewide test scores – NAEP and the state exam (FCAT) – since the reforms started going into place. The basic idea is that increases in testing results are the direct result of these policies.
Governor Bush is no doubt sincere in his effort to improve U.S. education, and, as we’ll see, a few of the policies comprising the “Florida formula” have some test-based track record. However, his primary empirical argument on their behalf – the coincidence of these policies’ implementation with changes in scores and proficiency rates – though common among both “sides” of the education debate, is simply not valid. We’ve discussed why this is the case many times (see here, here and here), as have countless others, in the Florida context as well as more generally.*
There is no need to repeat those points, except to say that they embody the most basic principles of data interpretation and causal inference. It would be wonderful if the evaluation of education policies – or of school systems’ performance more generally – were as easy as looking at raw, cross-sectional testing data. But it is not.
Luckily, one need not rely on these crude methods. We can instead take a look at some of the rigorous research that has specifically evaluated the core reforms comprising the “Florida formula.” As usual, it is a far more nuanced picture than supporters (and critics) would have you believe. Read More »
** Reprinted here in the Washington Post
2012 was another busy year for market-based education reform. The rapid proliferation of charter schools continued, while states and districts went about the hard work of designing and implementing new teacher evaluations that incorporate student testing data, and, in many cases, performance pay programs to go along with them.
As in previous years (see our 2010 and 2011 reviews), much of the research on these three “core areas” – merit pay, charter schools, and the use of value-added and other growth models in teacher evaluations – appeared rather responsive to the direction of policy making, but could not always keep up with its breakneck pace.*
Some lag time is inevitable, not only because good research takes time, but also because there’s a degree to which you have to try things before you can see how they work. Nevertheless, what we don’t know about these policies far exceeds what we know, and, given the sheer scope and rapid pace of reforms over the past few years, one cannot help but get the occasional “flying blind” feeling. Moreover, as is often the case, the only unsupportable position is certainty. Read More »
When I point out that raw changes in state proficiency rates or NAEP scores are not valid evidence that a policy or set of policies is “working,” I often get the following response: “Oh Matt, we can’t have a randomized trial or peer-reviewed article for everything. We have to make decisions and conclusions based on imperfect information sometimes.”
This statement is obviously true. In this case, however, it’s also a straw man. There’s a huge middle ground between the highest-quality research and the kind of speculation that often drives our education debate. I’m not saying we always need experiments or highly complex analyses to guide policy decisions (though, in general, these are preferred and sometimes required). The point, rather, is that we shouldn’t draw conclusions based on evidence that doesn’t support those conclusions.
This, unfortunately, happens all the time. In fact, many of the more prominent advocates in education today make their cases based largely on raw changes in outcomes immediately after (or sometimes even before) their preferred policies were implemented (also see here, here, here, here, here, and here). In order to illustrate the monumental assumptions upon which these and similar claims ride, I thought it might be fun to break them down quickly, in a highly simplified fashion. So, here are the four “requirements” that must be met in order to attribute raw test score changes to a specific policy (note that most of this can be applied not only to claims that policies are working, but also to claims that they’re not working because scores or rates are flat):
Read More »
The New Teacher Project (TNTP) has released a new report on teacher retention in D.C. Public Schools (DCPS). It is a spinoff of their “The Irreplaceables” report, which was released a few months ago, and which is discussed in this post. The four (unnamed) districts from that report are also used in this one, and their results are compared with those from DCPS.
I want to look quickly at this new supplemental analysis, not to rehash the issues I raised about “The Irreplaceables,” but rather because of DCPS’s potential importance as a field test site for a host of policy reform ideas – indeed, the majority of core market-based reform policies have been in place in D.C. for several years, including teacher evaluations in which test-based measures are the dominant component, automatic dismissals based on those ratings, large performance bonuses, mutual consent for excessed teachers and a huge charter sector. There are many people itching to render a sweeping verdict, positive or negative, on these reforms, most often based on pre-existing beliefs, rather than solid evidence.
Although I will take issue with a couple of the conclusions offered in this report, I’m not going to review it systematically. I think research on retention is important, and it’s difficult to produce reports with original analysis, while very easy to pick them apart. Instead, I’m going to list a couple of findings in the report that I think are worth examining, mostly because they speak to larger issues. Read More »
You don’t have to look very far to find very strong opinions about Race to the Top (RTTT), the U.S. Department of Education’s (USED) stimulus-funded state-level grant program (which has recently been joined by a district-level spinoff). There are those who think it is a smashing success, while others assert that it is a dismal failure. The truth, of course, is that these claims, particularly the extreme views on either side, are little more than speculation.*
To win the grants, states were strongly encouraged to make several different types of changes, such as adoption of new standards, the lifting/raising of charter school caps, the installation of new data systems and the implementation of brand new teacher evaluations. This means that any real evaluation of the program’s impact will take some years and will have to be multifaceted – that is, it is certain that the implementation/effects will vary not only by each of these components, but also between states.
In other words, the success or failure of RTTT is an empirical question, one that is still almost entirely open. But there is a silver lining here: USED is at least asking that question, in the form of a five-year, $19 million evaluation program, administered through the National Center for Education Evaluation and Regional Assistance, designed to assess the impact and implementation of various RTTT-fueled policy changes, as well as those of the controversial School Improvement Grants (SIGs). Read More »
One claim that gets tossed around a lot in education circles is that “the most effective teachers produce a year and a half of learning per year, while the least effective produce a half of a year of learning.”
This talking point is used all the time in advocacy materials and news articles. Its implications are pretty clear: Effective teachers can make all the difference, while ineffective teachers can do permanent damage.
As with most prepackaged talking points circulated in education debates, the “year and a half of learning” argument, when used without qualification, is both somewhat valid and somewhat misleading. So, seeing as it comes up so often, let’s very quickly identify its origins and what it means. Read More »
Our guest authors today are Morgan Polikoff and Andrew McEachin. Morgan is Assistant Professor in the Rossier School of Education at the University of Southern California. Andrew is an Institute of Education Sciences postdoctoral fellow at the University of Virginia.
By now, it is painfully clear that Congress will not be revising the Elementary and Secondary Education Act (ESEA) before the November elections. And with the new ESEA waivers, who knows when the revision will happen? Congress, however, seems to have some ideas about what next-generation accountability should look like, so we thought it might be useful to examine one leading proposal and see what the likely results would be.
The proposal we refer to is the Harkin-Enzi plan, available here for review. Briefly, the plan identifies 15 percent of schools as targets of intervention, classified in three groups. First are the persistently low-achieving schools (PLAS); these are the 5 percent of schools that are the lowest performers, based on achievement level or a combination of level and growth. Next are the achievement gap schools (AGS); these are the 5 percent of schools with the largest achievement gaps between any two subgroups. Last are the lowest subgroup achievement schools (LSAS); these are the 5 percent of schools with the lowest achievement for any significant subgroup.
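The three-group rule described above can be sketched in a few lines of code. This is only a rough illustration, not the plan’s actual procedure or our analysis: the data layout, the function name `classify_schools`, the use of achievement level alone for PLAS, and especially the way overlaps are handled (groups are filled in order, and a school already placed is skipped) are assumptions made for the example. It also ignores any minimum size needed for a subgroup to count as “significant.”

```python
def classify_schools(schools, share=0.05):
    """Label roughly `share` of schools in each of three groups.

    `schools` maps a school name to a dict with (hypothetical) keys:
      "overall":   overall achievement score
      "subgroups": dict of subgroup name -> achievement score
    Groups are filled in order (PLAS, then AGS, then LSAS); a school
    already labeled is not considered again. That ordering is an
    assumption for this sketch, not something stated in the plan.
    """
    per_group = max(1, round(len(schools) * share))
    remaining = set(schools)
    labels = {}

    def take(label, key, reverse=False):
        # Sort names first so that ties are broken deterministically.
        ranked = sorted(sorted(remaining),
                        key=lambda name: key(schools[name]),
                        reverse=reverse)
        for name in ranked[:per_group]:
            labels[name] = label
            remaining.discard(name)

    # Persistently low-achieving schools: lowest overall achievement.
    take("PLAS", key=lambda s: s["overall"])
    # Achievement gap schools: largest gap between any two subgroups.
    take("AGS",
         key=lambda s: max(s["subgroups"].values()) - min(s["subgroups"].values()),
         reverse=True)
    # Lowest subgroup achievement schools: lowest score for any subgroup.
    take("LSAS", key=lambda s: min(s["subgroups"].values()))
    return labels
```

Running a function like this on two consecutive years of (real) data and comparing the labels is one simple way to look at how stable the classifications are over time.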
The goal of this proposal is both to reduce the number of schools that are identified as low-performing and to create a new operational definition of consistently low-performing schools. To that end, we wanted to know what kinds of schools these groups would target and how stable the classifications would be over time. Read More »