Anyone who follows education policy debates might hear the term “standard deviation” fairly often. Most people have at least some idea of what it means, but I thought it might be useful to lay out a quick, (hopefully) clear explanation, since it’s useful for the proper interpretation of education data and research (as well as that in other fields).
Many outcomes or measures, such as height or blood pressure, assume what’s called a “normal distribution.” Simply put, this means that such measures tend to cluster around the mean (or average), and taper off in both directions the further one moves away from the mean (due to its shape, this is often called a “bell curve”). In practice, and especially when samples are small, distributions are imperfect — e.g., the bell is messy or a bit skewed to one side — but in general, with many measures, there is clustering around the average.
Let’s use test scores as our example. Suppose we have a group of 1,000 students who take a test (scored 0-20). A simulated score distribution is presented in the figure below (called a “histogram”). Read More »
A couple of weeks ago, Michelle Rhee published an op-ed in the Washington Post speaking out against the so-called “opt out movement,” which encourages parents to refuse to let their children take standardized tests.
Personally, I oppose the “opt-out” phenomenon, but I also think it would be a mistake not to pay attention to its proponents’ fundamental issue – that standardized tests are potentially being misused and/or overused. This concern is legitimate and important. My sense is that “opting out” reflects a rather extreme version of this mindset, a belief that we cannot right the ship – i.e., we have gone so far and moved so carelessly with test-based accountability that there is no real hope that it can or will be fixed. This strikes me as a severe overreaction, but I understand the sentiment.
That said, while most of Ms. Rhee’s op-ed is the standard, reasonable fare, some of it is also laced with precisely the kind of misconceptions that contribute to the apprehensions not only of anti-testing advocates, but also among those of us who occupy a middle ground – i.e., favor some test-based accountability, but are worried about getting it right. Read More »
Last year, we published a post that included a very simple graphical illustration of what changes in cross-sectional proficiency rates or scores actually tell us about schools’ test-based effectiveness (basically nothing).
In reality, year-to-year changes in cross-sectional average rates or scores may reflect “real” improvement, at least to some degree, but, especially when measured at the school- or grade-level, they tend to be mostly error/imprecision (e.g., changes in the composition of the samples taking the test, measurement error and serious issues with converting scores to rates using cutpoints). This is why changes in scores often conflict with more rigorous indicators that employ longitudinal data.
In the aforementioned post, however, I wanted to show what the changes meant even if most of these issues disappeared magically. In this one, I would like to extend this very simple illustration, as doing so will hopefully help shed a bit more light on the common (though mistaken) assumption that effective schools or policies should generate perpetual rate/score increases.
Read More »
One of the purely presentational aspects that separates the new “generation” of CREDO charter school analyses from the old is that the more recent reports convert estimated effect sizes from standard deviations into a “days of learning” metric. You can find similar approaches in other reports and papers as well.
I am very supportive of efforts to make interpretation easier for those who aren’t accustomed to thinking in terms of standard deviations, so I like the basic motivation behind this. I do have concerns about this particular conversion — specifically, that it overstates things a bit — but I don’t want to get into that issue. If we just take CREDO’s “days of learning” conversion at face value, my primary, far more simple reaction to hearing that a given charter school sector’s impact is equivalent to a given number of additional “days of learning” is to wonder: Does this charter sector actually offer additional “days of learning,” in the form of longer school days and/or years?
This matters to me because I (and many others) have long advocated moving past the charter versus regular public school “horserace” and trying to figure out why some charters seem to do very well and others do not. Additional time is one of the more compelling observable possibilities, and while they’re not perfectly comparable, it fits nicely with the “days of learning” expression of effect sizes. Take New York City charter schools, for example. Read More »
A few months ago, the U.S. Department of Education (USED) released the latest data from schools that received grants via the School Improvement (SIG) program. These data — consisting solely of changes in proficiency rates — were widely reported as an indication of “disappointing” or “mixed” results. Some even went as far as proclaiming the program a complete failure.
Once again, I have to point out that this breaks almost every rule of testing data interpretation and policy analysis. I’m not going to repeat the arguments about why changes in cross-sectional proficiency rates are not policy evidence (see our posts here, here and here, or examples from the research literature here, here and here). Suffice it to say that the changes themselves are not even particularly good indicators of whether students’ test-based performance in these schools actually improved, to say nothing of whether it was the SIG grants that were responsible for the changes. There’s more to policy analysis than subtraction.
So, in some respects, I would like to come to the defense of Secretary Arne Duncan and USED right now – not because I’m a big fan of the SIG program (I’m ambivalent at best), but rather because I believe in strong, patient policy evaluation, and these proficiency rate changes are virtually meaningless. Unfortunately, however, USED was the first to portray, albeit very cautiously, rate changes as evidence of SIG’s impact. In doing so, they provided a very effective example of why relying on bad evidence is a bad idea even if it supports your desired conclusions. Read More »
A recent report from the U.S. Department of Education presented a summary of three recent studies of the differences in the effectiveness of teaching provided advantaged and disadvantaged students (with the former defined in terms of value-added scores, and the latter in terms of subsidized lunch eligibility). The brief characterizes the results of these reports in an accessible manner – that the difference in estimated teaching effectiveness between advantaged and disadvantaged students varied quite widely between districts, but overall is about four percent of the achievement gap in reading and 2-3 percent in math.
Some observers were not impressed. They wondered why so-called reformers are alienating teachers and hurting students in order to address a mere 2-4 percent improvement in the achievement gap.
Just to be clear, the 2-4 percent figures describe the gap (and remember that it varies). Whether it can be narrowed or closed – e.g., by improving working conditions or offering incentives or some other means – is a separate issue. Nevertheless, let’s put aside all the substantive aspects surrounding these studies, and the issue of the distribution of teacher quality, and discuss this 2-4 percent thing, as it illustrates what I believe is the among the most important tensions underlying education policy today: Our collective failure to have a reasonable debate about expectations and the power of education policy. Read More »
In 2009, The New Teacher Project (TNTP) released a report called “The Widget Effect.” You would be hard-pressed to find too many more recent publications from an advocacy group that had a larger influence on education policy and the debate surrounding it. To this day, the report is mentioned regularly by advocates and policy makers.
The primary argument of the report was that teacher performance “is not measured, recorded, or used to inform decision making in any meaningful way.” More specifically, the report shows that most teachers received “satisfactory” or equivalent ratings, and that evaluations were not tied to most personnel decisions (e.g., compensation, layoffs, etc.). From these findings and arguments comes the catchy title – a “widget” is a fictional product commonly used in situations (e.g., economics classes) where the product doesn’t matter. Thus, treating teachers like widgets means that we treat them all as if they’re the same.
Given the influence of “The Widget Effect,” as well as how different the teacher evaluation landscape is now compared to when it was released, I decided to read it closely. Having done so, I think it’s worth discussing a few points about the report. Read More »
When looking at changes in testing results between years, many people are (justifiably) interested in comparing those changes for different student subgroups, such as those defined by race/ethnicity or income (subsidized lunch eligibility). The basic idea is to see whether increases are shared between traditionally advantaged and disadvantaged groups (and, often, to monitor achievement gaps).
Sometimes, people take this a step further by using the subgroup breakdowns as a crude check on whether cross-sectional score changes are due to changes in the sample of students taking the test. The logic is as follows: If the increases are found when comparing advantaged and more disadvantaged cohorts, then an overall increase cannot be attributed to a change in the backgrounds of students taking the test, as the subgroups exhibited the same pattern. (For reasons discussed here many times before, this is a severely limited approach.)
Whether testing data are cross-sectional or longitudinal, these subgroup breakdowns are certainly important and necessary, but it’s wise to keep in mind that standard variables, such as eligibility for free and reduced-price lunches (FRL), are imperfect proxies for student background (actually, FRL rates aren’t even such a great proxy for income). In fact, one might reach different conclusions depending on which variables are chosen. To illustrate this, let’s take a look at results from the Trial Urban District Assessment (TUDA) for the District of Columbia Public Schools between 2011 and 2013, in which there was a large overall score change that received a great deal of media attention, and break the changes down by different characteristics.
Read More »
The recent release of the National Assessment of Educational Progress (NAEP) and the companion Trial Urban District Assessment (TUDA) was predictably exploited by advocates to argue for their policy preferences. This is a blatant misuse of the data for many reasons that I have discussed here many times before, and I will not repeat them.
I do, however, want to very quickly illustrate the emptiness of this pseudo-empirical approach – finding cross-sectional cohort increases in states/districts that have recently acted policies you support, and then using the increases as evidence that the policies “work.” For example, the recent TUDA results for the District of Columbia Public Schools (DCPS), where scores increased in all four grade/subject combinations, were immediately seized upon supporters of the reforms that have been enacted by DCPS as clear-cut evidence of the policy triumph. The celebrators included the usual advocates, but also DCPS Chancellor Kaya Henderson and the U.S. Secretary of Education Arne Duncan (there was even a brief mention by President Obama in his State of The Union speech).
My immediate reaction to this bad evidence was simple (though perhaps slightly juvenile) – find a district that had similar results under a different policy environment. It was, as usual, pretty easy: Los Angeles Unified School District (LAUSD). Read More »
The U.S. Department of Education has released a very short, readable report on the comparability of value-added estimates using two different tests in Indiana – one of them norm-referenced (the Measures of Academic Progress test, or MAP), and the other criterion-referenced (the Indiana Statewide Testing for Educational Progress Plus, or ISTEP+, which is also the state’s official test for NCLB purposes).
The research design here is straightforward – fourth and fifth grade students in 46 schools across 10 districts in Indiana took both tests, their teachers’ value-added scores were calculated, and the scores were compared. Since both sets of scores were based on the same students and teachers, this is allows a direct comparison of how teachers’ value-added estimates compare between these two tests. The results are not surprising, and they square with similar prior studies (see here, here, here, for example): The estimates based on the two tests are moderately correlated. Depending on the grade/subject, they are between 0.4 and 0.7. If you’re not used to interpreting correlation coefficients, consider that only around one-third of teachers were in the same quintile (fifth) on both tests, and another 40 or so percent were one quintile higher or lower. So, most teachers were within a quartile, about a quarter of teachers moved two or more quintiles, and a small percentage moved from top to bottom or vice-versa.
Although, as mentioned above, these findings are in line with prior research, it is worth remembering why this “instability” occurs (and what can be done about it). Read More »
The Center for American Progress (CAP) recently released a short report on whether teachers were leaving the profession due to reforms implemented during the Obama Administration, as some commentators predicted.
The authors use data from the Schools and Staffing Survey (SASS), a wonderful national survey of U.S. teachers, and they report that 70 percent of first-year teachers in 2007-08 were still teaching in 2011-12. They claim that this high retention of beginning teachers, along with the fact that most teachers in 2011-12 had five or more years of experience, show that “the teacher retention concerns were unfounded.”
This report raises a couple of important points about the debate over teacher retention during this time of sweeping reform.
Read More »
Virtually all discussions of teacher turnover focuses on teachers leaving schools and/or the profession. However, a recent working paper by Allison Atteberry, Susanna Loeb and James Wyckoff, which was presented at this month’s CALDER conference, reaches a very interesting conclusion using data from New York City: There is actually more movement within NYC schools than between them.*
Specifically, the authors show that, during the years for which they had data (1997-2002 and 2004-2010), over 50 percent of teachers in any given year exhibited some form of movement (including leaving the profession or switching schools), but two-thirds of these moves were within schools – i.e., teachers changing grades or subjects. Moreover, they find that these within-school moves, like those between-schools/professions, appear to have a negative impact on testing outcomes, one which is very modest but statistically discernible in both math and reading.
There are a couple of interesting points related to these main findings. Read More »
The Washington Post reports that parents and alumni of D.C.’s Dunbar High School have quietly been putting together a proposal to revitalize what the article calls “one of the District’s worst performing schools.”
Those behind the proposal are not ready to speak about it publicly, and details are still very thin, but the Post article reports that it calls for greater flexibility in hiring, spending and other core policies. Moreover, the core of the plan – or at least its most drastic element – is to make Dunbar a selective high school, to which students must apply and be accepted, presumably based on testing results and other performance indicators (the story characterizes the proposal as a whole with the term “autonomy”). I will offer no opinion as to whether this conversion, if it is indeed submitted to the District for consideration, is a good idea. That will be up to administrators, teachers, parents, and other stakeholders.
I am, however, a bit struck by two interrelated aspects of this story. The first is the unquestioned characterization of Dunbar as a “low performing” or “struggling” school. This fateful label appears to be based mostly on the school’s proficiency rates, which are indeed dismally low – 20 percent in math and 29 percent in reading. Read More »
One of the (many) education reform proposals that has received national attention over the past few years is “extended learning time” – that is, expanding the day and/or year to give students more time in school.
Although how schools use the time they have with students, of course, is not necessarily more or less important than how much time they have with those students, the proposal to expand the school day/year may have merit, particularly for schools and districts serving larger proportions of students who need to catch up. I have noticed that one of the motivations for the extended time push is the (correct) observation that the charter school models that have proven effective (at least by the standard of test score gains) utilize extended time.
On the one hand, this is a good example of what many (including myself) have long advocated – that the handful of successful charter school models can potentially provide a great deal of guidance for all schools, regardless of their governance structure. On the other hand, it is also important to bear in mind that many of the high-profile charter chains that receive national attention don’t just expand their school years by a few days or even a few weeks, as has been proposed in several states. In many cases, they extend it by months. Read More »
Teacher turnover – the rates at which teachers leave the profession and switch schools – is obviously a very important outcome in education. Although not all turnover is necessarily a “bad thing” – some teachers simply aren’t cut out for the job and leave voluntarily (or are fired) – unusually high turnover means that schools must replace large proportions of their workforces on an annual basis. This can have serious implications not only for the characteristics (e.g., experience) of schools’ teachers, but also for schools’ costs, cohesion and professional cultures.
According to the most recent national data (which are a few years old), annual public school teacher turnover is around 16 percent, of which roughly half leave the profession (“leavers”), and half switch schools (“movers”). Both categories are equally important from the perspective of individual schools, since they must replace teachers regardless of where they go. In some subsets of schools and among certain groups of teachers, however, turnover is considerably higher. For instance, among teachers with between 1-3 years of experience, turnover is almost 23 percent. Contrary to popular opinion, though, the relationship between school poverty (i.e., free/reduced-price lunch rates) and turnover isn’t straightforward, at least at the national level. Although schools serving larger proportions of lower-income students have a larger percentage of “movers” every year, they have a considerably lower proportion of “leavers” (in part due to retirement).
This national trend, of course, masks considerable inter-district variation. One example is the District of Columbia Public Schools (DCPS). Read More »