** Reprinted in the Washington Post
The recent release of the latest New York State testing results created a little public relations coup for the controversial Success Academies charter chain, which operates over 20 schools in New York City and is seeking to expand.
Shortly after the release of the data, the New York Post published a laudatory article noting that seven of the Success Academies had overall proficiency rates that were among the highest in the state, and arguing that the schools “live up to their name.” The Daily News followed up by publishing an op-ed that compares the Success Academies’ combined 94 percent math proficiency rate to the overall city rate of 35 percent, and uses that to argue that the chain should be allowed to expand because its students “aced the test” (this is not really what high proficiency rates mean, but fair enough).
On the one hand, this is great news, and a wonderfully impressive showing by these students. On the other, decidedly less sensational hand, it’s also another example of the use of absolute performance indicators (e.g., proficiency rates) as measures of school rather than student performance, despite the fact that they are not particularly useful for the former purpose since, among other reasons, they do not account for where students start out upon entry to the school. I personally don’t care whether Success Academy gets good or bad press. I do, however, believe that how one gauges effectiveness, test-based or otherwise, is important, even if one reaches the same conclusion using different measures. Read More »
A recent story in the Chicago Tribune notes that Illinois’ NCLB waiver plan sets lower targets for certain student subgroups, including minority and low-income students. This, according to the article, means that “Illinois students of different backgrounds no longer will be held to the same standards,” and goes on to quote advocates who are concerned that this amounts to lower expectations for traditionally lower-scoring groups of children.
The argument that expectations should not vary by student characteristics is, of course, valid and important. Nevertheless, as Chad Aldeman notes, the policy of setting different targets for different groups of students has been legally required since the enactment of NCLB, under which states must “give credit to lower-performing groups that demonstrate progress.” This was supposed to ensure, albeit with exceedingly crude measures, that schools weren’t punished for the students they serve, or for how far behind those students were when they entered the schools.
I would take that a step further by adding two points. The first is quite obvious, and is mentioned briefly in the Tribune article, but too often is obscured in these kinds of conversations: Neither NCLB nor the waivers actually hold students to different standards. The cut scores above which students are deemed “proficient,” somewhat arbitrary though they may be, do not vary by student subgroup, or by any other factor within a given state. All students are held to the exact same standard. Read More »
At the end of February, the District of Columbia Council’s Education Committee held its annual hearing on the performance of the District’s Public Schools (DCPS). The hearing (full video is available here) lasted over four hours and included discussion of a variety of topics, but there was, inevitably, a block of time devoted to the discussion of DCPS testing results (and these questions were the focus of the news coverage).
These exchanges between Council members and DCPS Chancellor Kaya Henderson focused particularly on the low-stakes Trial Urban District Assessment (TUDA).* Though it was all very constructive and not even remotely hostile, it’s fair to say that Ms. Henderson was grilled quite a bit (as is often the case at these kinds of hearings). Unfortunately, the arguments from both sides of the dais were fraught with the typical misinterpretations of TUDA, and I could not get past how tragic it is to see legislators question the superintendent of a large urban school district based on a misinterpretation of what the data mean – and to hear that superintendent respond based on the same flawed premises.
But what I really kept thinking — as I have before in similar contexts — was how effective Chancellor Henderson could have been in answering the Council’s questions had she chosen to interpret the data properly (and I still hold out hope that this will become the norm some day). So, let’s take a quick look at a few major arguments that were raised during the hearing, and how they might have been answered. Read More »
A couple of weeks ago, Michelle Rhee published an op-ed in the Washington Post speaking out against the so-called “opt out movement,” which encourages parents to refuse to let their children take standardized tests.
Personally, I oppose the “opt-out” phenomenon, but I also think it would be a mistake not to pay attention to its proponents’ fundamental issue – that standardized tests are potentially being misused and/or overused. This concern is legitimate and important. My sense is that “opting out” reflects a rather extreme version of this mindset, a belief that we cannot right the ship – i.e., we have gone so far and moved so carelessly with test-based accountability that there is no real hope that it can or will be fixed. This strikes me as a severe overreaction, but I understand the sentiment.
That said, while most of Ms. Rhee’s op-ed is the standard, reasonable fare, some of it is also laced with precisely the kind of misconceptions that feed the apprehensions not only of anti-testing advocates, but also of those of us who occupy a middle ground – i.e., who favor some test-based accountability but are worried about getting it right. Read More »
Last year, we published a post that included a very simple graphical illustration of what changes in cross-sectional proficiency rates or scores actually tell us about schools’ test-based effectiveness (basically nothing).
In reality, year-to-year changes in cross-sectional average rates or scores may reflect “real” improvement, at least to some degree, but, especially when measured at the school or grade level, they tend to be mostly error/imprecision (e.g., changes in the composition of the samples taking the test, measurement error and serious issues with converting scores to rates using cutpoints). This is why changes in scores often conflict with more rigorous indicators that employ longitudinal data.
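To make the point about sample composition concrete, here is a minimal Python sketch under entirely hypothetical assumptions: the cut score, the spread of incoming achievement, the measurement noise and the cohort size are all invented for illustration. The school's effect is held fixed in both years, so any change in its proficiency rate comes only from who happened to take the test and from measurement error.

```python
import random

CUT_SCORE = 300   # hypothetical proficiency cut score
INCOMING_SD = 30  # hypothetical spread of students' incoming achievement
NOISE_SD = 15     # hypothetical test measurement error

def simulate_cohort(n_students, mean_incoming, school_effect, rng):
    """Score = incoming achievement + a fixed school effect + measurement noise."""
    return [
        rng.gauss(mean_incoming, INCOMING_SD) + school_effect + rng.gauss(0, NOISE_SD)
        for _ in range(n_students)
    ]

def proficiency_rate(scores, cut=CUT_SCORE):
    """Share of scores at or above the cut score."""
    return sum(score >= cut for score in scores) / len(scores)

rng = random.Random(2012)
# The school's effect is identical in both years; only the incoming cohort differs.
last_year = simulate_cohort(60, mean_incoming=290, school_effect=5, rng=rng)
this_year = simulate_cohort(60, mean_incoming=300, school_effect=5, rng=rng)
change = proficiency_rate(this_year) - proficiency_rate(last_year)
print(f"Change in proficiency rate: {change:+.1%}")
```

Because the simulated school is exactly as effective in both years, the resulting rate change says nothing about its performance. With cohorts of 60 students, rerunning the sketch with different seeds also shows swings of several percentage points even when the two incoming distributions are identical.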
In the aforementioned post, however, I wanted to show what the changes meant even if most of these issues disappeared magically. In this one, I would like to extend this very simple illustration, as doing so will hopefully help shed a bit more light on the common (though mistaken) assumption that effective schools or policies should generate perpetual rate/score increases.
Read More »
One of the purely presentational aspects that separates the new “generation” of CREDO charter school analyses from the old is that the more recent reports convert estimated effect sizes from standard deviations into a “days of learning” metric. You can find similar approaches in other reports and papers as well.
I am very supportive of efforts to make interpretation easier for those who aren’t accustomed to thinking in terms of standard deviations, so I like the basic motivation behind this. I do have concerns about this particular conversion (specifically, that it overstates things a bit), but I don’t want to get into that issue. If we just take CREDO’s “days of learning” conversion at face value, my primary, far simpler reaction to hearing that a given charter school sector’s impact is equivalent to a given number of additional “days of learning” is to wonder: Does this charter sector actually offer additional “days of learning,” in the form of longer school days and/or years?
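One rough way to pose that question is to put both quantities in the same units. The Python sketch below converts an effect size into “days of learning” using an assumed conversion factor (DAYS_PER_SD is an illustrative value, not taken from any particular report; substitute the factor published in the report you are reading) and expresses a longer school day and year as additional regular-length school days. The calendar values are also hypothetical.

```python
DAYS_PER_SD = 720.0      # assumed conversion factor (days of learning per 1 SD); illustrative only
REGULAR_DAYS = 180       # hypothetical regular-school calendar, in days
REGULAR_DAY_HOURS = 6.5  # hypothetical regular-school day length, in hours

def effect_in_days(effect_sd, days_per_sd=DAYS_PER_SD):
    """Express an effect size (in standard deviations) as 'days of learning'."""
    return effect_sd * days_per_sd

def extra_instructional_days(extra_minutes_per_day, extra_days_per_year,
                             regular_days=REGULAR_DAYS, day_hours=REGULAR_DAY_HOURS):
    """Additional time a school offers, measured in regular-school-day equivalents."""
    school_hours = (regular_days + extra_days_per_year) * (day_hours + extra_minutes_per_day / 60)
    regular_hours = regular_days * day_hours
    return (school_hours - regular_hours) / day_hours

effect_days = effect_in_days(0.05)
time_days = extra_instructional_days(extra_minutes_per_day=60, extra_days_per_year=10)
print(f"Estimated impact: {effect_days:.0f} 'days of learning'")
print(f"Extra time offered: {time_days:.0f} regular school days")
```

The two numbers are not strictly comparable, since an extra hour of instruction need not yield an hour's worth of measured test-score growth, but setting them side by side is a simple first check.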
This matters to me because I (and many others) have long advocated moving past the charter versus regular public school “horserace” and trying to figure out why some charters seem to do very well and others do not. Additional time is one of the more compelling observable possibilities, and, while not perfectly comparable, it fits nicely with the “days of learning” expression of effect sizes. Take New York City charter schools, for example. Read More »
A few months ago, the U.S. Department of Education (USED) released the latest data from schools that received grants via the School Improvement Grant (SIG) program. These data, which consist solely of changes in proficiency rates, were widely reported as an indication of “disappointing” or “mixed” results. Some even went as far as proclaiming the program a complete failure.
Once again, I have to point out that this breaks almost every rule of testing data interpretation and policy analysis. I’m not going to repeat the arguments about why changes in cross-sectional proficiency rates are not policy evidence (see our posts here, here and here, or examples from the research literature here, here and here). Suffice it to say that the changes themselves are not even particularly good indicators of whether students’ test-based performance in these schools actually improved, to say nothing of whether it was the SIG grants that were responsible for the changes. There’s more to policy analysis than subtraction.
So, in some respects, I would like to come to the defense of Secretary Arne Duncan and USED right now – not because I’m a big fan of the SIG program (I’m ambivalent at best), but rather because I believe in strong, patient policy evaluation, and these proficiency rate changes are virtually meaningless. Unfortunately, USED was the first to portray, albeit very cautiously, rate changes as evidence of SIG’s impact. In doing so, they provided a very effective example of why relying on bad evidence is a bad idea even if it supports your desired conclusions. Read More »
The recent release of the National Assessment of Educational Progress (NAEP) and the companion Trial Urban District Assessment (TUDA) was predictably exploited by advocates to argue for their policy preferences. This is a blatant misuse of the data for many reasons that I have discussed here many times before, and I will not repeat them.
I do, however, want to very quickly illustrate the emptiness of this pseudo-empirical approach – finding cross-sectional cohort increases in states/districts that have recently enacted policies you support, and then using the increases as evidence that the policies “work.” For example, the recent TUDA results for the District of Columbia Public Schools (DCPS), where scores increased in all four grade/subject combinations, were immediately seized upon by supporters of the reforms that have been enacted by DCPS as clear-cut evidence of a policy triumph. The celebrators included the usual advocates, but also DCPS Chancellor Kaya Henderson and U.S. Secretary of Education Arne Duncan (there was even a brief mention by President Obama in his State of the Union speech).
My immediate reaction to this bad evidence was simple (though perhaps slightly juvenile) – find a district that had similar results under a different policy environment. It was, as usual, pretty easy: Los Angeles Unified School District (LAUSD). Read More »
The Washington Post reports that parents and alumni of D.C.’s Dunbar High School have quietly been putting together a proposal to revitalize what the article calls “one of the District’s worst performing schools.”
Those behind the proposal are not ready to speak about it publicly, and details are still very thin, but the Post article reports that it calls for greater flexibility in hiring, spending and other core policies. Moreover, the core of the plan – or at least its most drastic element – is to make Dunbar a selective high school, to which students must apply and be accepted, presumably based on testing results and other performance indicators (the story characterizes the proposal as a whole with the term “autonomy”). I will offer no opinion as to whether this conversion, if it is indeed submitted to the District for consideration, is a good idea. That will be up to administrators, teachers, parents, and other stakeholders.
I am, however, a bit struck by two interrelated aspects of this story. The first is the unquestioned characterization of Dunbar as a “low performing” or “struggling” school. This fateful label appears to be based mostly on the school’s proficiency rates, which are indeed dismally low – 20 percent in math and 29 percent in reading. Read More »
Having taken a look at several states’ school rating systems (see our posts on the systems in IN, OH, FL and CO), I thought it might be interesting to examine a system used by a group of charter schools – starting with the system used by charters in the District of Columbia. This is the third year the DC charter school board has released the ratings.
For elementary and middle schools (upon which I will focus in this post*), the DC Performance Management Framework (PMF) is a weighted index composed of: 40 percent absolute performance; 40 percent growth; and 20 percent what they call “leading indicators” (a more detailed description of this formula can be found in the second footnote).** The index scores are then sorted into one of three tiers, with Tier 1 being the highest, and Tier 3 the lowest.
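As a minimal sketch of how such an index works, the Python below combines three component scores using the 40/40/20 weights and sorts the result into tiers. It assumes each component is already on a 0-100 scale, and the tier cutoffs (65 and 35) are assumptions for illustration; the actual PMF scoring rules are more detailed than this.

```python
# Weights from the post: 40% absolute performance, 40% growth, 20% leading indicators.
WEIGHTS = {"absolute": 0.40, "growth": 0.40, "leading": 0.20}
# Hypothetical tier cutoffs on a 0-100 index, checked from highest to lowest.
TIER_CUTOFFS = [(65.0, 1), (35.0, 2)]

def pmf_index(components):
    """Weighted index from component scores assumed to be on a 0-100 scale."""
    return sum(weight * components[name] for name, weight in WEIGHTS.items())

def tier(index):
    """Tier 1 is highest; anything below the lowest cutoff falls into Tier 3."""
    for cutoff, tier_number in TIER_CUTOFFS:
        if index >= cutoff:
            return tier_number
    return 3

school = {"absolute": 80, "growth": 50, "leading": 60}
score = pmf_index(school)
print(f"Index {score:.1f}, Tier {tier(score)}")
```

Because absolute performance and growth carry equal 40 percent weights here, swapping a school's absolute and growth scores leaves its index unchanged; in a system that weights absolute performance more heavily, the same swap would move the index.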
So, these particular ratings weight absolute performance – i.e., how highly students score on tests – a bit less heavily than do most states that have devised their own systems, and they grant slightly more importance to growth and alternative measures. We might therefore expect to find a somewhat weaker relationship between PMF scores and student characteristics such as free/reduced price lunch eligibility (FRL), as these charters are judged less predominantly on the students they serve. Let’s take a quick look. Read More »
I write often (probably too often) about the difference between measures of school performance and student performance, usually in the context of school rating systems. The basic idea is that schools cannot control the students they serve, and so absolute performance measures, such as proficiency rates, are telling you more about the students a school or district serves than how effective it is in improving outcomes (which is better captured by growth-oriented indicators).
Recently, I was asked a simple question: Can a school with very high absolute performance levels ever actually be considered a “bad school?”
This is a good question. Read More »
Recent events in Indiana and Florida have resulted in a great deal of attention to the new school rating systems that over 25 states are using to evaluate the performance of schools, often attaching high-stakes consequences and rewards to the results. We have published reviews of several states’ systems here over the past couple of years (see our posts on the systems in Florida, Indiana, Colorado, New York City and Ohio, for example).
Virtually all of these systems rely heavily, if not entirely, on standardized test results, most commonly by combining two general types of test-based measures: absolute performance (or status) measures, or how highly students score on tests (e.g., proficiency rates); and growth measures, or how quickly students make progress (e.g., value-added scores). As discussed in previous posts, absolute performance measures are best seen as gauges of student performance, since they can’t account for the fact that students enter the schooling system at vastly different levels, whereas growth-oriented indicators can be viewed as more appropriate in attempts to gauge school performance per se, as they seek (albeit imperfectly) to control for students’ starting points (and other characteristics that are known to influence achievement levels) in order to isolate the impact of schools on testing performance.*
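The distinction can be illustrated with two hypothetical schools, each with four students' prior-year and current-year scores (all numbers and the cut score are invented). The mean gain used here is only a crude stand-in for growth measures; real growth models compare students with similar peers and adjust for other characteristics.

```python
from statistics import mean

CUT = 300  # hypothetical proficiency cut score

# Hypothetical (prior-year score, current-year score) pairs for each school's students.
schools = {
    "School A": [(320, 322), (310, 311), (305, 304), (298, 301)],  # high-scoring entrants, little growth
    "School B": [(250, 275), (260, 287), (270, 296), (280, 302)],  # low-scoring entrants, strong growth
}

def proficiency_rate(pairs, cut=CUT):
    """Absolute (status) measure: share of students at or above the cut this year."""
    return sum(current >= cut for _, current in pairs) / len(pairs)

def mean_gain(pairs):
    """Crude growth measure: average score change per student (not a value-added model)."""
    return mean(current - prior for prior, current in pairs)

for name, pairs in schools.items():
    print(f"{name}: proficiency {proficiency_rate(pairs):.0%}, mean gain {mean_gain(pairs):+.2f}")
```

School A posts a 100 percent proficiency rate with almost no growth, while School B shows large gains but only a 25 percent rate, so the two types of measures rank these schools in opposite order.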
One interesting aspect of this distinction, which we have not discussed thoroughly here, is the idea/possibility that these two measures are “in conflict.” Let me explain what I mean by that. Read More »
A few years ago, the U.S. Department of Education (USED) launched the School Improvement Grant (SIG) program, which is designed to award grants to “persistently low-achieving schools” to carry out one of four different intervention models.
States vary in how SIG-eligible schools are selected, but USED guidelines require the use of three basic types of indicators: absolute performance level (e.g., proficiency rates); whether schools were “making progress” (e.g., rate changes); and, for high schools, graduation rates (specifically, whether the rate is under 60 percent). Two of these measures – absolute performance and graduation rates – tell you relatively little about the actual performance of schools, as they depend heavily on the characteristics (e.g., income) of students/families in the neighborhood served by a given school. It was therefore pretty much baked into the rules that the schools awarded SIGs have tended to exhibit certain characteristics, such as higher poverty rates.
Over 800 schools were awarded “Tier 1” or “Tier 2” grants for the 2010-11 school year (“SIG Cohort One”). Let’s take a quick look at a couple of key characteristics of these schools, using data from USED and the National Center for Education Statistics. Read More »
One of the (many) factors that might help explain — or at least be associated with — the wide variation in charter schools’ test-based impacts is market share. That is, the proportion of students that charters serve in a given state or district. There are a few reasons why market share might matter.
For example, charter schools compete for limited resources, including private donations and labor (teachers), and fewer competitors mean more resources to go around. In addition, there are a handful of models that seem to get fairly consistent results no matter where they operate, and authorizers who are selective and only allow “proven” operators to open up shop might increase quality (at the expense of quantity). There may be a benefit to very slow, selective expansion (and smaller market share is a symptom of that deliberate approach).
One way to get a sense of whether market share might matter is simply to check the association between measured charter performance and coverage. It might therefore be interesting, albeit exceedingly simple, to use the recently-released CREDO analysis, which provides state-level estimates based on a common analytical approach (though different tests, etc.), for this purpose. Read More »
As noted in a nice little post over at Greater Greater Washington’s education blog, the District of Columbia Office of the State Superintendent of Education (OSSE) recently started releasing growth model scores for DC’s charter and regular public schools. These models, in a nutshell, assess schools by following their students over time and gauging their testing progress relative to similar students (they can also be used for individual teachers, but DCPS uses a different model in its teacher evaluations).
In my opinion, producing these estimates and making them available publicly is a good idea, and definitely preferable to the district’s previous reliance on changes in proficiency, which are truly awful measures (see here for more on this). It’s also, however, important to note that the model chosen by OSSE, a “median growth percentile” (MGP) model, produces estimates that have been shown to be at least somewhat more heavily associated with student characteristics than other types of models, such as value-added models proper. This does not necessarily mean the growth percentile models are “inaccurate” – there are good reasons, such as resource constraints and greater difficulty with teacher recruitment/retention, to believe that schools serving poorer students might be less effective, on average, and it’s tough to separate “real” effects from bias in the models.
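For readers unfamiliar with growth percentiles, the Python below is a deliberately simplified sketch of the idea. It treats students whose prior scores fall in the same band as “similar” (BAND_WIDTH is a hypothetical parameter), ranks each student's current score within that band to get a student growth percentile, and takes the median across each school's students. Operational growth percentile models typically use quantile regression on one or more prior scores rather than simple bands, so this illustrates the concept and is not OSSE's method.

```python
from collections import defaultdict
from statistics import median

BAND_WIDTH = 10  # hypothetical: prior scores in the same 10-point band count as "similar"

def growth_percentiles(students):
    """students: list of (student_id, school, prior_score, current_score).

    Returns {student_id: growth percentile}, ranking each student's current score
    among students in the same prior-score band.
    """
    bands = defaultdict(list)
    for student_id, _school, prior, current in students:
        bands[prior // BAND_WIDTH].append((student_id, current))

    percentiles = {}
    for members in bands.values():
        n = len(members)
        for student_id, current in members:
            below = sum(other < current for _, other in members)
            ties = sum(other == current for _, other in members)  # includes the student
            # Mid-rank percentile: peers scoring lower, plus half of the tied scores.
            percentiles[student_id] = 100 * (below + 0.5 * ties) / n
    return percentiles

def median_growth_percentiles(students):
    """Median of each school's student growth percentiles."""
    sgp = growth_percentiles(students)
    by_school = defaultdict(list)
    for student_id, school, _prior, _current in students:
        by_school[school].append(sgp[student_id])
    return {school: median(values) for school, values in by_school.items()}

roster = [
    ("s1", "A", 300, 320), ("s2", "B", 302, 310),
    ("s3", "A", 305, 330), ("s4", "B", 308, 305),
]
print(median_growth_percentiles(roster))  # {'A': 75.0, 'B': 25.0}
```

The relationship with poverty discussed here could then be checked by correlating these school-level medians with a poverty measure such as each school's free/reduced-price lunch share.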
That said, let’s take a quick look at this relationship using the DC MGP scores from 2011, with poverty data from the National Center for Education Statistics. Read More »