One of the more visible manifestations of what I have called “informal test-based accountability” — that is, how testing results play out in the media and public discourse — is the phenomenon of superintendents, particularly big city superintendents, making their reputations based on the results during their administrations.
In general, big city superintendents are expected to promise large testing increases, and their success or failure is to no small extent judged on whether those promises are fulfilled. Several superintendents almost seem to have built entire careers on a few (misinterpreted) points in proficiency rates or NAEP scale scores. This particular phenomenon, in my view, is rather curious. For one thing, any district leader will tell you that many of their core duties, such as improving administrative efficiency, communicating with parents and the community, strengthening districts’ financial situation, etc., might have little or no impact on short-term testing gains. In addition, even those policies that do have such an impact often take many years to show up in aggregate results.
In short, judging superintendents based largely on the testing results during their tenures seems misguided. A recent report issued by the Brown Center at Brookings, and written by Matt Chingos, Grover Whitehurst and Katharine Lindquist, adds a little bit of empirical insight to this viewpoint.
** Reprinted here in the Washington Post
The recent release of the latest New York State testing results created a little public relations coup for the controversial Success Academies charter chain, which operates over 20 schools in New York City, and is seeking to expand.
Shortly after the release of the data, the New York Post published a laudatory article noting that seven of the Success Academies had overall proficiency rates that were among the highest in the state, and arguing that the schools “live up to their name.” The Daily News followed up by publishing an op-ed that compares the Success Academies’ combined 94 percent math proficiency rate to the overall city rate of 35 percent, and uses that to argue that the chain should be allowed to expand because its students “aced the test” (this is not really what high proficiency rates mean, but fair enough).
On the one hand, this is great news, and a wonderfully impressive showing by these students. On the other, decidedly less sensational hand, it’s also another example of the use of absolute performance indicators (e.g., proficiency rates) as measures of school rather than student performance, despite the fact that they are not particularly useful for the former purpose since, among other reasons, they do not account for where students start out upon entry to the school. I personally don’t care whether Success Academy gets good or bad press. I do, however, believe that how one gauges effectiveness, test-based or otherwise, is important, even if one reaches the same conclusion using different measures.
Our guest author today is Jennifer Borgioli, a Senior Consultant with Learner-Centered Initiatives, Ltd., where she supports schools with designing performance based assessments, data analysis, and curriculum design.
The chart below was taken from the 2014 report on student performance on the Grades 3-8 tests administered by the New York State Department of Education.
Based on this chart, which of the following statements is the most accurate?
A. “64 percent of 8th grade students failed the ELA test”
B. “36 percent of 8th graders are at grade level in reading and writing”
C. “36 percent of students meet or exceed the proficiency standard (Level 3 or 4) on the Grade 8 CCLS-aligned math test”
The Washington Post reports on an issue that we have discussed here on many occasions: The incompleteness of the testing results released annually by the District of Columbia Public Schools (DCPS), or, more accurately, the Office of the State Superintendent of Education (OSSE), which is responsible for testing in DC schools.
Here’s the quick backstory: For the past 7-8 years or so, DCPS/OSSE have not released a single test score for the state assessment (the DC-CAS). Instead, they have released only the percentage of students whose scores meet the designated cutoff points for the NCLB-style categories of below basic, basic, proficient and advanced. I will not reiterate all of the problems with these cutpoint-based rates and how they serve to distort the underlying data, except to say that they are by themselves among the worst ways to present these data, and there is absolutely no reason why states and districts should not release both rates and average scale scores.
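To make the point about cutpoints concrete, here is a minimal sketch using invented numbers. The scores, the cut score of 50, and the function name are all assumptions for illustration; none of them come from DC-CAS data. The sketch shows a case in which the proficiency rate rises while the average scale score falls, simply because a few students near the cut score moved across it.

```python
from statistics import mean

CUT_SCORE = 50  # hypothetical proficiency cut score


def proficiency_rate(scores, cut=CUT_SCORE):
    """Share of scores at or above the cut score."""
    return sum(s >= cut for s in scores) / len(scores)


# Invented scale scores for two years (not real data).
# Two students just below the cut move just above it,
# while the two lowest scorers fall further behind.
year1 = [30, 35, 48, 49, 60, 70]
year2 = [20, 25, 50, 51, 60, 70]

rate1, rate2 = proficiency_rate(year1), proficiency_rate(year2)
avg1, avg2 = mean(year1), mean(year2)

print(f"Proficiency rate: {rate1:.0%} -> {rate2:.0%}")
print(f"Average scale score: {avg1:.1f} -> {avg2:.1f}")
```

A reader shown only the rates would see a large gain; a reader shown only the averages would see a decline. Releasing both lets people see what actually happened.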
The Post reports, however, that one organization (the Broader, Bolder Approach to Education) was able to obtain the actual scale score data (by subgroup and grade) for 2010-2013, and that this group published a memo-style report alleging that DCPS’ public presentation of their testing results over the past few years has been misleading. I had a mixed reaction to this report and the accompanying story.
There is an ongoing debate about widespread administration of standardized tests to kindergartners. This is of course a serious decision. My personal opinion about whether this is a good idea depends on several factors, such as how good the tests will be and, most importantly, how the results will be used (and I cannot say that I am optimistic about the latter).
Although the policy itself must be considered seriously on its merits, there is one side aspect of testing kindergartners that fascinates me: It would demonstrate how absurd it is to judge school performance, as does NCLB, using absolute performance levels – i.e., how highly students score on tests, rather than their progress over time.
Basically, the kindergarten tests would inevitably shake out the same way as those administered in later grades. Schools and districts serving more disadvantaged students would score substantially lower than their counterparts in more affluent areas. If the scores were converted to proficiency rates or similar cut-score measures, they would show extremely low pass rates in urban districts such as Detroit.
A recent story in the Chicago Tribune notes that Illinois’ NCLB waiver plan sets lower targets for certain student subgroups, including minority and low-income students. This, according to the article, means that “Illinois students of different backgrounds no longer will be held to the same standards,” and goes on to quote advocates who are concerned that this amounts to lower expectations for traditionally lower-scoring groups of children.
The argument that expectations should not vary by student characteristics is, of course, valid and important. Nevertheless, as Chad Aldeman notes, the policy of setting different targets for different groups of students has been legally required since the enactment of NCLB, under which states must “give credit to lower-performing groups that demonstrate progress.” This was supposed to ensure, albeit with exceedingly crude measures, that schools weren’t punished due to the students they serve, and how far behind those students were upon entry into the schools.
I would take that a step further by adding two additional points. The first is quite obvious, and is mentioned briefly in the Tribune article, but too often is obscured in these kinds of conversations: Neither NCLB nor the waivers actually hold students to different standards. The cut scores above which students are deemed “proficient,” somewhat arbitrary though they may be, do not vary by student subgroup, or by any other factor within a given state. All students are held to the same exact standard.
A recent story in the New York Times reports that, according to an Obama Administration-commissioned panel, the measures being used to evaluate the performance of healthcare providers are unfairly penalizing those that serve larger proportions of disadvantaged patients (thanks to Mike Petrilli for sending me the article). For example, if you’re grading hospitals based on simple, unadjusted readmission rates, it might appear as if hospitals serving high poverty populations are doing worse, even if the quality of their service is excellent, since readmissions are more likely for patients who can’t afford medication, or aren’t able to take off from work, or don’t have home support systems.
The panel recommended adjusting the performance measures, which, for instance, are used for Medicare reimbursement, using variables such as patient income and education, as this would provide a fairer accountability system – one that does not penalize healthcare institutions and their personnel for factors that are out of their control.
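As a rough illustration of what such an adjustment can look like, the sketch below compares raw readmission rates with an observed-to-expected ratio, in which each hospital is compared with the number of readmissions one would expect given its patient mix. This is only one simple form of adjustment, chosen here for illustration; it is not necessarily the method the panel had in mind, and every number, name and benchmark rate below is invented.

```python
# Hypothetical population-wide readmission rates by patient group.
# These benchmark figures are invented for illustration only.
BENCHMARK_RATE = {"low_income": 0.20, "other": 0.10}


def raw_rate(patients, readmitted):
    """Unadjusted readmission rate: readmissions / all patients."""
    return readmitted / sum(patients.values())


def observed_to_expected(patients, readmitted):
    """Observed readmissions divided by the number expected
    if the hospital matched the benchmark rate for each group."""
    expected = sum(n * BENCHMARK_RATE[group] for group, n in patients.items())
    return readmitted / expected


# Two invented hospitals with very different patient populations.
hospital_a = {"low_income": 80, "other": 20}  # mostly low-income patients
hospital_b = {"low_income": 20, "other": 80}  # mostly higher-income patients
readmitted_a, readmitted_b = 17, 13

raw_a, raw_b = raw_rate(hospital_a, readmitted_a), raw_rate(hospital_b, readmitted_b)
oe_a = observed_to_expected(hospital_a, readmitted_a)
oe_b = observed_to_expected(hospital_b, readmitted_b)

print(f"Raw rates:  A = {raw_a:.0%}, B = {raw_b:.0%}")
print(f"O/E ratios: A = {oe_a:.2f}, B = {oe_b:.2f}")
```

On the raw rates, hospital A looks worse. On the adjusted ratio, A does slightly better than expected for the patients it serves (a ratio below 1), while B does slightly worse (a ratio above 1).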
There are of course very strong, very obvious parallels here to education accountability policy, in which schools are judged in part based on raw proficiency rates that make no attempt to account for differences in the populations of students in different schools. The comparison also reveals an important feature of formal accountability systems in other policy fields.
At the end of February, the District of Columbia Council’s Education Committee held its annual hearing on the performance of the District’s Public Schools (DCPS). The hearing (full video is available here) lasted over four hours, and included discussion on a variety of topics, but there was, inevitably, a block of time devoted to the discussion of DCPS testing results (and these questions were the focus of the news coverage).
These exchanges between Council members and DCPS Chancellor Kaya Henderson focused particularly on the low-stakes Trial Urban District Assessment (TUDA).* Though it was all very constructive and not even remotely hostile, it’s fair to say that Ms. Henderson was grilled quite a bit (as is often the case at these kinds of hearings). Unfortunately, the arguments from both sides of the dais were fraught with the typical misinterpretations of TUDA, and I could not get past how tragic it is to see legislators question the superintendent of a large urban school district based on a misinterpretation of what the data mean – and to hear that superintendent respond based on the same flawed premises.
But what I really kept thinking, as I have before in similar contexts, was how effective Chancellor Henderson could have been in answering the Council’s questions had she chosen to interpret the data properly (and I still hold out hope that this will become the norm some day). So, let’s take a quick look at a few major arguments that were raised during the hearing, and how they might have been answered.
Our guest author today is John McCrann, a Math teacher and experiential educator at Harvest Collegiate High School in New York City. John is a member of the America Achieves Fellowship, Youth Opportunities Program, and Teacher Leader Study Group. He tweets at @JohnTroutMcCran.
New York City’s third through eighth graders are in the middle of state tests, and many of our city’s citizens have taken strong positions on the value (or lack thereof) of these assessments. The protests, arguments and activism surrounding these tests remind me of a day when I was a substitute civics teacher during summer school. “I need help,” Charlotte said as she approached my desk, “what is democracy?”
On that day, my mind flashed to a scene I witnessed outside the White House in the spring of 2003. On one side of the fence, protestors shouted: “Show me what democracy looks like! This is what democracy looks like!” On the other side worked an administration that had invaded another country in an effort to “expand democracy.” Passionate, bright people on both sides of that fence believed in the idea that Charlotte was asking about, but came to very different conclusions about how to enact the concept.
A couple of weeks ago, Michelle Rhee published an op-ed in the Washington Post speaking out against the so-called “opt out movement,” which encourages parents to refuse to let their children take standardized tests.
Personally, I oppose the “opt-out” phenomenon, but I also think it would be a mistake not to pay attention to its proponents’ fundamental issue – that standardized tests are potentially being misused and/or overused. This concern is legitimate and important. My sense is that “opting out” reflects a rather extreme version of this mindset, a belief that we cannot right the ship – i.e., we have gone so far and moved so carelessly with test-based accountability that there is no real hope that it can or will be fixed. This strikes me as a severe overreaction, but I understand the sentiment.
That said, while most of Ms. Rhee’s op-ed is the standard, reasonable fare, some of it is also laced with precisely the kind of misconceptions that contribute to the apprehensions not only of anti-testing advocates, but also of those of us who occupy a middle ground – i.e., favor some test-based accountability, but are worried about getting it right.
Last year, we published a post that included a very simple graphical illustration of what changes in cross-sectional proficiency rates or scores actually tell us about schools’ test-based effectiveness (basically nothing).
In reality, year-to-year changes in cross-sectional average rates or scores may reflect “real” improvement, at least to some degree, but, especially when measured at the school- or grade-level, they tend to be mostly error/imprecision (e.g., changes in the composition of the samples taking the test, measurement error and serious issues with converting scores to rates using cutpoints). This is why changes in scores often conflict with more rigorous indicators that employ longitudinal data.
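One of these sources of error, changes in the composition of the tested sample, is easy to see with a small sketch. The two student groups, their average scores and the enrollment counts below are all invented for illustration. Neither group's average changes between years, yet the overall average rises because the mix of students taking the test shifts.

```python
def overall_average(groups):
    """Weighted average score; groups is a list of (n_students, mean_score)."""
    total_students = sum(n for n, _ in groups)
    return sum(n * avg for n, avg in groups) / total_students


# Invented data: each tuple is (number tested, group average score).
# Group averages are identical in both years (60 and 40);
# only the share of students in each group changes.
year1 = [(50, 60.0), (50, 40.0)]
year2 = [(60, 60.0), (40, 40.0)]

avg_year1 = overall_average(year1)  # 50.0
avg_year2 = overall_average(year2)  # 52.0
change = avg_year2 - avg_year1      # 2.0

print(f"Overall average: {avg_year1} -> {avg_year2} (change: {change:+})")
```

Here a two-point "gain" appears even though no group of students is scoring any higher than before. Longitudinal data, which follow the same students over time, do not share this particular problem.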
In the aforementioned post, however, I wanted to show what the changes meant even if most of these issues disappeared magically. In this one, I would like to extend this very simple illustration, as doing so will hopefully help shed a bit more light on the common (though mistaken) assumption that effective schools or policies should generate perpetual rate/score increases.
A few months ago, the U.S. Department of Education (USED) released the latest data from schools that received grants via the School Improvement (SIG) program. These data — consisting solely of changes in proficiency rates — were widely reported as an indication of “disappointing” or “mixed” results. Some even went as far as proclaiming the program a complete failure.
Once again, I have to point out that this breaks almost every rule of testing data interpretation and policy analysis. I’m not going to repeat the arguments about why changes in cross-sectional proficiency rates are not policy evidence (see our posts here, here and here, or examples from the research literature here, here and here). Suffice it to say that the changes themselves are not even particularly good indicators of whether students’ test-based performance in these schools actually improved, to say nothing of whether it was the SIG grants that were responsible for the changes. There’s more to policy analysis than subtraction.
So, in some respects, I would like to come to the defense of Secretary Arne Duncan and USED right now – not because I’m a big fan of the SIG program (I’m ambivalent at best), but rather because I believe in strong, patient policy evaluation, and these proficiency rate changes are virtually meaningless. Unfortunately, however, USED was the first to portray, albeit very cautiously, rate changes as evidence of SIG’s impact. In doing so, they provided a very effective example of why relying on bad evidence is a bad idea even if it supports your desired conclusions.
When looking at changes in testing results between years, many people are (justifiably) interested in comparing those changes for different student subgroups, such as those defined by race/ethnicity or income (subsidized lunch eligibility). The basic idea is to see whether increases are shared between traditionally advantaged and disadvantaged groups (and, often, to monitor achievement gaps).
Sometimes, people take this a step further by using the subgroup breakdowns as a crude check on whether cross-sectional score changes are due to changes in the sample of students taking the test. The logic is as follows: If the increases are found when comparing advantaged and more disadvantaged cohorts, then an overall increase cannot be attributed to a change in the backgrounds of students taking the test, as the subgroups exhibited the same pattern. (For reasons discussed here many times before, this is a severely limited approach.)
Whether testing data are cross-sectional or longitudinal, these subgroup breakdowns are certainly important and necessary, but it’s wise to keep in mind that standard variables, such as eligibility for free and reduced-price lunches (FRL), are imperfect proxies for student background (actually, FRL rates aren’t even such a great proxy for income). In fact, one might reach different conclusions depending on which variables are chosen. To illustrate this, let’s take a look at results from the Trial Urban District Assessment (TUDA) for the District of Columbia Public Schools between 2011 and 2013, in which there was a large overall score change that received a great deal of media attention, and break the changes down by different characteristics.
The recent release of the National Assessment of Educational Progress (NAEP) and the companion Trial Urban District Assessment (TUDA) was predictably exploited by advocates to argue for their policy preferences. This is a blatant misuse of the data for many reasons that I have discussed here many times before, and I will not repeat them.
I do, however, want to very quickly illustrate the emptiness of this pseudo-empirical approach – finding cross-sectional cohort increases in states/districts that have recently enacted policies you support, and then using the increases as evidence that the policies “work.” For example, the recent TUDA results for the District of Columbia Public Schools (DCPS), where scores increased in all four grade/subject combinations, were immediately seized upon by supporters of the reforms that have been enacted by DCPS as clear-cut evidence of the policy triumph. The celebrators included the usual advocates, but also DCPS Chancellor Kaya Henderson and the U.S. Secretary of Education Arne Duncan (there was even a brief mention by President Obama in his State of the Union speech).
My immediate reaction to this bad evidence was simple (though perhaps slightly juvenile) – find a district that had similar results under a different policy environment. It was, as usual, pretty easy: Los Angeles Unified School District (LAUSD).
The Washington Post reports that parents and alumni of D.C.’s Dunbar High School have quietly been putting together a proposal to revitalize what the article calls “one of the District’s worst performing schools.”
Those behind the proposal are not ready to speak about it publicly, and details are still very thin, but the Post article reports that it calls for greater flexibility in hiring, spending and other core policies. Moreover, the core of the plan – or at least its most drastic element – is to make Dunbar a selective high school, to which students must apply and be accepted, presumably based on testing results and other performance indicators (the story characterizes the proposal as a whole with the term “autonomy”). I will offer no opinion as to whether this conversion, if it is indeed submitted to the District for consideration, is a good idea. That will be up to administrators, teachers, parents, and other stakeholders.
I am, however, a bit struck by two interrelated aspects of this story. The first is the unquestioned characterization of Dunbar as a “low performing” or “struggling” school. This fateful label appears to be based mostly on the school’s proficiency rates, which are indeed dismally low – 20 percent in math and 29 percent in reading.