One claim that gets tossed around a lot in education circles is that “the most effective teachers produce a year and a half of learning per year, while the least effective produce a half of a year of learning.”
This talking point is used all the time in advocacy materials and news articles. Its implications are pretty clear: Effective teachers can make all the difference, while ineffective teachers can do permanent damage.
As with most prepackaged talking points circulated in education debates, the “year and a half of learning” argument, when used without qualification, is both somewhat valid and somewhat misleading. So, seeing as it comes up so often, let’s very quickly identify its origins and what it means. Read More »
Our guest authors today are Morgan Polikoff and Andrew McEachin. Morgan is Assistant Professor in the Rossier School of Education at the University of Southern California. Andrew is an Institute of Education Sciences postdoctoral fellow at the University of Virginia.
By now, it is painfully clear that Congress will not be revising the Elementary and Secondary Education Act (ESEA) before the November elections. And with the new ESEA waivers, who knows when the revision will happen? Congress, however, seems to have some ideas about what next-generation accountability should look like, so we thought it might be useful to examine one leading proposal and see what the likely results would be.
The proposal we refer to is the Harkin-Enzi plan, available here for review. Briefly, the plan identifies 15 percent of schools as targets of intervention, classified in three groups. First are the persistently low-achieving schools (PLAS); these are the 5 percent of schools that are the lowest performers, based on achievement level or a combination of level and growth. Next are the achievement gap schools (AGS); these are the 5 percent of schools with the largest achievement gaps between any two subgroups. Last are the lowest subgroup achievement schools (LSAS); these are the 5 percent of schools with the lowest achievement for any significant subgroup.
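The three-way classification described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the plan's actual rules: the school data are made up, PLAS is ranked on achievement level alone (the plan also allows a combination of level and growth), each school's gap and lowest-subgroup score are taken as already computed, and a school is allowed to fall into more than one group. The function and variable names are hypothetical.

```python
def lowest_share(scores, share=0.05):
    """Return the ids of the schools in the lowest `share` of `scores`."""
    k = max(1, round(len(scores) * share))
    ranked = sorted(scores, key=scores.get)  # ascending by score
    return set(ranked[:k])


def highest_share(scores, share=0.05):
    """Return the ids of the schools in the highest `share` of `scores`."""
    k = max(1, round(len(scores) * share))
    ranked = sorted(scores, key=scores.get, reverse=True)  # descending
    return set(ranked[:k])


# Hypothetical data for 100 schools (deterministic, purely illustrative).
schools = range(100)
achievement = {s: (s * 37) % 100 for s in schools}      # overall achievement level
largest_gap = {s: (s * 53) % 100 for s in schools}      # largest gap between two subgroups
lowest_subgroup = {s: (s * 71) % 100 for s in schools}  # lowest-scoring subgroup

plas = lowest_share(achievement)      # persistently low-achieving schools
ags = highest_share(largest_gap)      # achievement gap schools
lsas = lowest_share(lowest_subgroup)  # lowest subgroup achievement schools

identified = plas | ags | lsas
print(len(plas), len(ags), len(lsas), len(identified))
```

If the groups overlap, the union covers fewer than 15 percent of schools. How the actual plan treats a school that qualifies for more than one group is not specified here.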
The goal of this proposal is both to reduce the number of schools that are identified as low-performing and to create a new operational definition of consistently low-performing schools. To that end, we wanted to know what kinds of schools these groups would target and how stable the classifications would be over time. Read More »
Economist Jesse Rothstein recently released a working paper about which I am compelled to write, as it speaks directly to so many of the issues that we have raised here over the past year or two. The purpose of Rothstein’s analysis is to move beyond the talking points about teaching quality in order to see if strategies that have been proposed for improving it might yield benefits. In particular, he examines two labor market-oriented policies: performance pay and dismissing teachers.
Both strategies are, at their cores, focused on selection (and deselection) – in other words, attracting and retaining higher-performing candidates and exiting, directly or indirectly, lower-performing incumbents. Both also take time to work and have yet to be experimented with systematically in most places; thus, there is relatively little evidence on the long-term effects of either.
Rothstein’s approach is to model this complex dynamic, specifically the labor market behavior of teachers under these policies (i.e., choosing, leaving and staying in teaching), which is often ignored or assumed away, despite the fact that it is so fundamental to the policies themselves. He then calculates what would happen under this model as a result of performance pay and dismissal policies – that is, how they would affect the teacher labor market and, ultimately, student performance.*
Of course, this is just a simulation, and must be (carefully) interpreted as such, but I think the approach and findings help shed light on three fundamental points about education reform in the U.S. Read More »
Our guest author today is Ed Fuller, Associate Professor in the Education Leadership Department at Penn State University. He is also the Director of the Center for Evaluation and Education Policy Analysis as well as the Associate Director for Policy of the University Council for Educational Administration.
“No one knows who I am,” exclaimed a senior in a high-poverty, predominantly minority and low-performing high school in the Austin area. She explained, “I have been at this school four years and had four principals and six algebra I teachers.”
Elsewhere in Texas, the first school to be closed by the state for low performance was Johnston High School, which was led by 13 principals in the 11 years preceding closure. The school also had a teacher turnover rate greater than 25 percent for almost all of the years and greater than 30 percent for 7 of the years.
While the above examples are rather extreme cases, they do underscore two interconnected issues – teacher and principal turnover – that often plague low-performing schools and, in the case of principal turnover, afflict a wide range of schools regardless of performance or school demographics. Read More »
A recent Economist article on charter schools, though slightly more nuanced than most mainstream media treatments of the charter evidence, contains a very common, somewhat misleading argument that I’d like to address quickly. It’s about the findings of the so-called “CREDO study,” the important (albeit over-cited) 2009 national comparison of student achievement in charter and regular public schools in 16 states.
Specifically, the article asserts that the CREDO analysis, which finds a statistically discernible but very small negative impact of charters overall (with wide underlying variation), also finds a significant positive effect among low-income students. This leads the Economist to conclude that the entire CREDO study “has been misinterpreted,” because its real value is in showing that “the children who most need charters have been served well.”
Whether or not an intervention affects outcomes among subgroups of students is obviously important (though one has hardly “misinterpreted” a study by focusing on its overall results). And CREDO does indeed find a statistically significant, positive test-based impact of charters on low-income students, vis-à-vis their counterparts in regular public schools. However, as discussed here (and in countless textbooks and methods courses), statistical significance only means we can be confident that the difference is non-zero (it cannot be chalked up to random fluctuation). Significant differences are often not large enough to be practically meaningful.
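A small sketch may help separate the two ideas. The numbers below are hypothetical and are not taken from CREDO; the sketch assumes two equal-sized groups, test scores standardized to unit variance, and a simple two-sample z-test. It holds the same tiny difference (0.01 standard deviations) fixed and varies only the sample size.

```python
import math


def two_sample_z(effect_sd, n_per_group):
    """Return (z, two-sided p) for a difference in means of `effect_sd`
    standard deviations, with `n_per_group` students in each group."""
    se = math.sqrt(2.0 / n_per_group)        # standard error of the difference
    z = effect_sd / se
    p = math.erfc(abs(z) / math.sqrt(2.0))   # two-sided normal p-value
    return z, p


effect = 0.01  # hypothetical difference, in standard deviation units

for n in (1_000, 200_000):
    z, p = two_sample_z(effect, n)
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"n per group = {n:>7}: z = {z:.2f}, p = {p:.4f} ({verdict} at 0.05)")
```

The effect is identical in both runs; only the sample size changes the verdict. With a large enough dataset, a difference this small clears the conventional threshold while remaining too small to matter much in practice.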
And this is certainly the case with CREDO and low-income students. Read More »
Our guest author today is David Dunning, professor of psychology at Cornell University, and a fellow of both the American Psychological Society and the American Psychological Association.
When I was a younger academic, I often taught a class on research methods in the behavioral sciences. On the first day of that class, I took as my mission to teach students only one thing—that conducting research in the behavioral sciences ages a person. I meant that in two ways. First, conducting research is humbling and frustrating. I cannot count the number of pet ideas I have had through the years, all of them beloved, that have gone to die in the laboratory at the hands of data unwilling to verify them.
But, second, there is another, more positive way in which research ages a person. At times, data come back and verify a cherished idea, or even reveal a more provocative or valuable one that no one has ever expected. It is a heady experience in those moments for the researcher to know something that perhaps no one else knows, to be wiser, more aged if you will, in a small corner of the human experience that he or she cares about deeply. Read More »
There is currently an ongoing rhetorical war of sorts over the gender wage gap. One “side” makes the common argument that women earn around 75 cents on the male dollar (see here, for example).
Others assert that the gender gap is a myth, or that it is so small as to be unimportant.
Often, these types of exchanges are enough to exasperate the casual observer, and inspire claims such as “statistics can be made to say anything.” In truth, however, the controversy over the gender gap is a good example of how descriptive statistics, by themselves, say nothing. What matters is how they’re interpreted.
Moreover, the manner in which one must interpret various statistics on the gender gap applies almost perfectly to the achievement gaps that are so often mentioned in education debates. Read More »
There’s a fairly large body of research showing that charter schools vary widely in test-based performance relative to regular public schools, by both location and subgroup. Yet, you’ll often hear people point out that the highest-quality evidence suggests otherwise (see here, here and here) – i.e., that there are a handful of studies using experimental methods (randomized controlled trials, or RCTs) and these analyses generally find stronger, more uniform positive charter impacts.
Sometimes, this argument is used to imply that the evidence, as a whole, clearly favors charters, and, perhaps by extension, that many of the rigorous non-experimental charter studies – those using sophisticated techniques to control for differences between students – would lead to different conclusions were they RCTs.*
Though these latter assertions are based on a valid point about the power of experimental studies (the few that we have are often ignored in the debate over charters), they are overstated for a couple of reasons, discussed below. But a new report from the (indispensable) organization Mathematica addresses the issue head on, by directly comparing estimates of charter school effects that come from an experimental analysis with those from non-experimental analyses of the same group of schools.
The researchers find that there are differences in the results, but many are not statistically significant and those that are don’t usually alter the conclusions. This is an important (and somewhat rare) study, one that does not, of course, settle the issue, but does provide some additional tentative support for the use of strong non-experimental charter research in policy decisions.
Read More »
Charter schools in New Orleans (NOLA) now serve over four out of five students in the city – the largest market share of any big city in the nation. As of the 2011-12 school year, most of the city’s schools (around 80 percent), charter and regular public, are overseen by the Recovery School District (RSD), a statewide agency created in 2003 to take over low-performing schools, which assumed control of most NOLA schools in Katrina’s aftermath.
Around three-quarters of these RSD schools (50 out of 66) are charters. The remainder of NOLA’s schools are overseen either by the Orleans Parish School Board (which is responsible for 11 charters and six regular public schools, and is the taxing authority for all parish schools) or by the Louisiana Board of Elementary and Secondary Education (which is directly responsible for three charters, and also supervises the RSD).
New Orleans is often held up as a model for the rapid expansion of charter schools in other urban districts, based on the argument that charter proliferation since 2005-06 has generated rapid improvements in student outcomes. There are two separate claims potentially embedded in this argument. The first is that the city’s schools perform better than they did pre-Katrina. The second is that NOLA’s charters have outperformed the city’s dwindling supply of traditional public schools since the hurricane.
Although I tend strongly toward the viewpoint that whether charter schools “work” is far less important than why – e.g., specific policies and practices – it might nevertheless be useful to quickly address both of the claims above, given all the attention paid to charters in New Orleans. Read More »
Those following education know that policy focused on “teacher quality” has been by far the dominant paradigm for improving schools over the past few years. Some (but not nearly all) components of this all-hands-on-deck effort are perplexing to many teachers, and have generated quite a bit of pushback. No matter one’s opinion of this approach, however, what drives it is the tantalizing allure of variation in teacher quality.
Fueled by the ever-increasing availability of detailed test score datasets linking teachers to students, the research literature on teachers’ test-based effectiveness has grown rapidly, in both size and sophistication. Analysis after analysis finds that, all else being equal, the variation in teachers’ estimated effects on students’ test growth – the difference between the “top” and “bottom” teachers – is very large. In any given year, some teachers’ students make huge progress, others’ very little. Even if part of this estimated variation is attributable to confounding factors, the discrepancies are still larger than most any other measured “input” within the jurisdiction of education policy. The underlying assumption here is that “true” teacher quality varies to a degree that is at least somewhat comparable in magnitude to the spread of the test-based estimates.
Perhaps that’s the case, but it does not, by itself, help much. The key question is whether and how we can measure teacher performance at the individual level and, more importantly, influence the distribution – that is, raise the ceiling, the middle and/or the floor. The variation hangs out there like a drug to which we’re addicted, but haven’t really figured out how to administer. If there were some way to harness it efficiently, the potential benefits could be considerable. The focus of current education policy is in large part an effort to do anything and everything to try to figure this out. And, as might be expected given the enormity of the task, progress has been slow. Read More »
In a previous post, I compared value-added (VA) and classroom observations in terms of reliability – the degree to which they are free of error and stable over repeated measurements. But even the most reliable measures aren’t useful unless they are valid – that is, unless they’re measuring what we want them to measure.
Arguments over the validity of teacher performance measures, especially value-added, dominate our discourse on evaluations. There are, in my view, three interrelated issues to keep in mind when discussing the validity of VA and observations. The first is definitional – in a research context, validity is less about a measure itself than the inferences one draws from it. The second point might follow from the first: The validity of VA and observations should be assessed in the context of how they’re being used.
Third and finally, given the difficulties in determining whether either measure is valid in and of itself, as well as the fact that so many states and districts are already moving ahead with new systems, the best approach at this point may be to judge validity in terms of whether the evaluations are improving outcomes. And, unfortunately, there is little indication that this is happening in most places. Read More »
Although most new teacher evaluations are still in various phases of pre-implementation, it’s safe to say that classroom observations and/or value-added (VA) scores will be the most heavily weighted components of teachers’ final scores, depending on whether teachers are in tested grades and subjects. One gets the general sense that many – perhaps most – teachers strongly prefer the former (observations, especially peer observations) over the latter (VA).
One of the most common arguments against VA is that the scores are error-prone and unstable over time – i.e., that they are unreliable. And it’s true that the scores fluctuate between years (also see here), with much of this instability due to measurement error, rather than “real” performance changes. On a related note, different model specifications and different tests can yield very different results for the same teacher/class.
These findings are very important, and often too casually dismissed by VA supporters, but the issue of reliability is, to varying degrees, endemic to all performance measurement. Actually, many of the standard reliability-based criticisms of value-added could also be leveled against observations. Since we cannot observe “true” teacher performance, it’s tough to say which is “better” or “worse,” despite the certainty with which both “sides” often present their respective cases. And, the fact that both entail some level of measurement error doesn’t by itself speak to whether they should be part of evaluations.*
Nevertheless, many states and districts have already made the choice to use both measures, and in these places, the existence of imprecision is less important than how to deal with it. Viewed from this perspective, VA and observations are in many respects more alike than different. Read More »
Education policymaking and debates are under constant threat from an improbable assailant: Short-term changes in cross-sectional proficiency rates.
The use of rate changes is still proliferating rapidly at all levels of our education system. These measures, which play an important role in the provisions of No Child Left Behind, are already prominent components of many states’ core accountability systems (e.g., California), while several others will be using some version of them in their new, high-stakes school/district “grading systems.” New York State is awarding millions in competitive grants, with almost half the criteria based on rate changes. District consultants issue reports recommending widespread school closures and reconstitutions based on these measures. And, most recently, U.S. Secretary of Education Arne Duncan used proficiency rate increases as “preliminary evidence” supporting the School Improvement Grants program.
Meanwhile, on the public discourse front, district officials and other national leaders use rate changes to “prove” that their preferred reforms are working (or are needed), while their critics argue the opposite. Similarly, entire charter school sectors are judged, up or down, by whether their raw, unadjusted rates increase or decrease.
So, what’s the problem? In short, it’s that year-to-year changes in proficiency rates are not valid evidence of school or policy effects. These measures cannot do the job we’re having them do, even on a limited basis. This really has to stop. Read More »
Knewton, a technology firm founded in 2008, has developed an “adaptive learning platform” that received significant media attention (also here, here, here and here), as well as funding and recognition early last fall and, again, in February this year (here and here). Although the firm is not alone in the adaptive learning game – e.g., Dreambox, Carnegie Learning – Knewton’s partnership with Pearson puts the company in a whole different league.
Adaptive learning takes advantage of student-generated information; thus, important questions about data use and ownership need to be brought to the forefront of the technology debate.
Adaptive learning software adjusts the presentation of educational content to students’ needs, based on students’ prior responses to such content. In the world of research, such ‘prior responses’ would count and be treated as data. To the extent that adaptive learning is a mechanism for collecting information about learners, questions about privacy, confidentiality and ownership should be addressed. Read More »
There is a small but growing body of evidence about the (usually test-based) effectiveness of teachers from Teach for America (TFA), an extremely selective program that trains and places new teachers in mostly higher-needs schools and districts. Rather than review this literature paper-by-paper, which has already been done by others (see here and here), I’ll just give you the super-short summary of the higher-quality analyses, and quickly discuss what I think it means.*
The evidence on TFA teachers focuses mostly on comparing their effect on test score growth vis-à-vis other groups of teachers who entered the profession via traditional certification (or through other alternative routes). This is no easy task, and the findings do vary quite a bit by study, as well as by the group to which TFA corps members are compared (e.g., new or more experienced teachers). One can quibble endlessly over the methodological details (and I’m all for that), and this area is still underdeveloped, but a fair summary of these papers is that TFA teachers are no more or less effective than comparable peers in terms of reading tests, and sometimes but not always more effective in math (the differences, whether positive or negative, tend to be small and/or only surface after 2-3 years). Overall, the evidence thus far suggests that TFA teachers perform comparably, at least in terms of test-based outcomes.
Somewhat in contrast with these findings, TFA has been the subject of both intensive criticism and fawning praise. I don’t want to engage this debate directly, except to say that there has to be some middle ground on which a program that brings talented young people into the field of education is not such a divisive issue. I do, however, want to make a wider point specifically about the evidence on TFA teachers – what it might suggest about the current focus to “attract the best people” to the profession.
Read More »