A 'Summary Opinion' Of The Hoxby NYC Charter School Study

Almost two years ago, a report on New York City charter schools rocked the education policy world. It was written by Hoover Institution scholar Caroline Hoxby with co-authors Sonali Murarka and Jenny Kang. Their primary finding was that:

On average, a student who attended a charter school for all of grades kindergarten through eight would close about 86 percent of the “Scarsdale-Harlem achievement gap” [the difference in scores between students in Harlem and those in the affluent NYC suburb] in math, and 66 percent of the achievement gap in English.

The headline-grabbing conclusion was uncritically repeated by most major news outlets, including the New York Post, which called the charter effects “off the charts,” and the NY Daily News, which announced that, from that day forward, anyone who opposed charter schools was “fighting to block thousands of children from getting superior educations.” A week or two later, Mayor Michael Bloomberg specifically cited the study in announcing that he was moving to expand the number of NYC charter schools. Even today, the report is often mentioned as primary evidence favoring the efficacy of charter schools.

I would like to revisit this study, but not as a means to relitigate the “do charters work?” debate. Indeed, I have argued previously that we spend too much time debating whether charter schools “work,” and too little time asking why some few are successful. Instead, my purpose is to illustrate an important research maxim: Even well-designed, sophisticated analyses with important conclusions can be compromised by a misleading presentation of results.

Quick background on the study: Since most NYC charters are “oversubscribed,” they hold blind admission lotteries. Put simply, Hoxby and co-authors are able to exploit this random selection process to compare “lotteried-in” students who attended charter schools with those who didn’t. In doing so, they can account for many of the differences among students (especially selection effects) that may influence achievement outcomes.

In a NEPC review of the study, economist Sean Reardon noted a couple of serious methodological issues with Hoxby’s research design. I won’t get into the details, but the biggest one boils down to the fact that students are only “lotteried-in” to charter schools once, rather than every year. But, since Hoxby’s model “followed” charter students for all the years after they were assigned to their schools (more specifically, the model used students’ prior achievement scores as a covariate), this acted to “dilute” the advantage of the initial random assignment – and probably means that Hoxby’s model overstated charter schools’ effects in grades four through eight. Reardon also notes that Hoxby’s failure to account for measurement error in students’ prior test scores may have biased charter effects upward as well.

Despite these limitations (Hoxby subsequently addressed some of them), the NYC analysis still provides very strong causal evidence that, on average, students in oversubscribed NYC charter schools outperformed their regular public school peers in math and reading. It is an important finding, and it was also the conclusion of a subsequent analysis of the city’s charters by Stanford University researchers, who found significant, though lower, relative gains.

With that said, this report, specifically its executive summary, is also one of the more misleading I have read in a while, especially from somebody like Caroline Hoxby, who is a capable researcher with sophisticated skills. There are several problems with the way the results are presented, and I won’t get into all of them. Instead, this post will cover a few blatant examples, with a focus on the introductory summary – the only section that most people read (unfortunately), and the part from which virtually all the media coverage was drawn.

Engineering the home run conclusion. Let’s start with the study’s major conclusion, quoted above, that nine years in a NYC charter school (K through grade eight) would, on average, eliminate most of the “Scarsdale-Harlem achievement gap.” Virtually all media coverage used this finding; on occasion, you can still hear it today.

Now, it bears mentioning that Hoxby doesn’t actually follow any student or group of students from kindergarten through grade eight (nine years). Actually, since her data are only for 2000-01 to 2007-08, we know for a fact that she does not have data for a single student who attended a NYC charter for nine straight years (K-8). She doesn’t report how many students in her dataset attended for eight straight years, but does note, in the technical report (released months later – see below), that only 25 percent of her sample has 6-8 years of “charter treatment.” The majority of her sample is students with 3-5 years in a charter school (or less).

So, how did Hoxby come up with the “Scarsdale-Harlem” finding? Well, her models estimate an average single-year gain for charter students (most of whom have only a few years of “treatment”). Those one-year estimates are her primary results. She ignores them completely in the executive summary (and I mean that literally – she does not report the single-year gains until page 43 of the 85-page report).

Instead, she multiplies the single year gain (for math and reading separately) by nine years to produce a sensational talking point. It’s kind of like testing a new diet pill on a group of subjects, who take the pill for anywhere between one and 9-10 months, finding that they lose an average of ten pounds per month, and then launching an advertising campaign proclaiming that the pill will make people lose 120 pounds in a year.

In fairness, months after the report’s release, Hoxby and her co-authors replicated their analysis on students with different durations of charter treatment, and found that there are still large, cumulative effects among those students who have attended charters for 6-8 years. In other words, the annual effect of attending a charter school does not necessarily depend on how long the student has been there.

Even so, the “Scarsdale-Harlem” extrapolation is a massive stretch. It ignores the aforementioned measurement error in test scores, as well as the fact that student achievement gains “fade out” between school years (which must be accounted for when adding up multiple-year progress).

It also fails to acknowledge that the long-term charter students (those in charters for 7-8 years) entered lotteries between 2002-03 and 2003-04, when there were only about a dozen charter schools in the city. It is inappropriate to use the results seen by this small group of students attending this tiny group of schools to draw conclusions about the nine-year cumulative gains produced by the entire population of NYC charter schools (to say nothing of charter schools throughout the nation). Yet that’s exactly what the report – and most of its press coverage – did.

There’s nothing wrong with giving non-technical readers a frame of reference with which to interpret results (for example, by expressing effect sizes in terms of average gaps between student subgroups). That's a good thing. But the nine-year “Scarsdale-Harlem” calculation is the absolute most extreme way to do this, especially when it is presented to a general audience in an introduction section without any disclaimer or caveat.

The proper move would have been to present the one-year gains in the executive summary (though Hoxby had to estimate separate models for K-3 and grades 4-8, the average annual gain across all grades was 0.09 standard deviations in math, 0.06 in reading), accompanied by an expression of these gains in terms of an achievement gap, and perhaps, with caution, over multiple years. But offering the nine-year extrapolation by itself – in the section of the report that most people don’t read past – is irresponsible and somewhat deceptive.
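To make the arithmetic concrete, here is a minimal sketch in Python that puts the straight multiplication next to a version with fade-out. The annual gains (0.09 and 0.06 standard deviations) are the averages cited above; the persistence rate of 0.8 is purely an assumption for illustration, not an estimate from the report, and the function name is hypothetical. Because Hoxby estimated separate models for K-3 and grades 4-8, this should not be expected to reproduce her exact figures.

```python
def cumulative_gain(annual_gain, years, persistence=1.0):
    """Add up yearly gains, letting earlier gains decay between years.

    persistence=1.0 means no fade-out (equivalent to annual_gain * years).
    persistence<1.0 means only that fraction of the accumulated gain
    carries over into each following year.
    """
    total = 0.0
    for _ in range(years):
        total = total * persistence + annual_gain
    return total


# Average annual gains in standard deviations, as cited in the post.
annual_gains = {"math": 0.09, "reading": 0.06}
YEARS = 9                  # kindergarten through grade eight
ASSUMED_PERSISTENCE = 0.8  # hypothetical fade-out rate, for illustration only

for subject, gain in annual_gains.items():
    linear = cumulative_gain(gain, YEARS)
    faded = cumulative_gain(gain, YEARS, ASSUMED_PERSISTENCE)
    print(f"{subject}: linear {linear:.2f} SD, with assumed fade-out {faded:.2f} SD")
```

Under that assumed rate, the nine-year total comes out at less than half of what straight multiplication gives. The specific number means little, since the rate is invented; the point is only that the headline figure is quite sensitive to an assumption the executive summary never states.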

That the single-year estimates were the appropriate result is partially evident in the above-mentioned technical paper that accompanies the report (the version that might be submitted to an academic journal). The technical paper’s abstract and front-end present the single-year estimates, and make no mention of the “Scarsdale-Harlem” gap, which is presented much later in the paper, flanked by a bunch of caveats and robustness tests. That is, the authors offer the appropriate results in the introduction to the scholarly version of their paper, but not in the version intended for public consumption.

Misleading with simple descriptive statistics. One of the executive summary findings (which is not particularly relevant to achievement results) is that charter students are more likely to be black and less likely to be Asian or white than students in regular public schools. This is, I think, supposed to imply that charters serve more minority students than other schools in their neighborhoods. If so, the comparison is not quite appropriate.

Let me explain. The authors compare the racial composition of charter students to that of students throughout the whole city – not to that of students in the neighborhoods where the charters are located, which is the appropriate comparison (one that is made in neither the summary nor the body of the report). For example, NYC charter schools are largely concentrated in Harlem, central Brooklyn and the South Bronx, where regular public schools are predominantly non-white and non-Asian (just like the charters).

Comparing the racial distribution of charter students to that of students in the entire city (including all the neighborhoods with a lower concentration of minorities) doesn’t really tell us what we want to know: Whether charters are serving more minority students than the regular public schools from which they are drawing most of their applicants, and to which they are being compared in this analysis. This comparison is, of course, a bit more difficult to make, but it’s absolutely necessary, especially in a city as large and diverse as New York.
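A rough sketch of what the more appropriate comparison could look like, in Python. Every number below is made up for illustration (none comes from the report), and the neighborhood labels and function name are hypothetical. The idea is simply to weight each neighborhood’s regular public school composition by the share of charter students enrolled there, and to use that as the benchmark instead of the citywide figure.

```python
# All figures are invented for illustration; none come from the report.
charter_enrollment = {          # charter students by neighborhood
    "Neighborhood A": 600,
    "Neighborhood B": 300,
    "Neighborhood C": 100,
}
public_pct_black = {            # percent black in nearby regular public schools
    "Neighborhood A": 70.0,
    "Neighborhood B": 60.0,
    "Neighborhood C": 20.0,
}
CITYWIDE_PCT_BLACK = 30.0       # hypothetical citywide figure
CHARTER_PCT_BLACK = 64.0        # hypothetical figure for charter students


def enrollment_weighted_benchmark(enrollment, local_pct):
    """Average the local percentages, weighted by charter enrollment."""
    total = sum(enrollment.values())
    return sum(enrollment[n] * local_pct[n] for n in enrollment) / total


benchmark = enrollment_weighted_benchmark(charter_enrollment, public_pct_black)
gap_vs_city = CHARTER_PCT_BLACK - CITYWIDE_PCT_BLACK
gap_vs_local = CHARTER_PCT_BLACK - benchmark

print(f"Gap vs. citywide figure: {gap_vs_city:+.1f} points")
print(f"Gap vs. local benchmark: {gap_vs_local:+.1f} points")
```

With these invented inputs, a gap that looks large against the citywide figure nearly disappears against the neighborhood benchmark. Whether that would hold for the actual NYC data is exactly the question the report leaves unanswered.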

Selective presentation of results. In contrast to the “Scarsdale-Harlem” stretch, one of the report’s important results is not even mentioned in its executive summary. The authors find that charter schools have no statistically discernible effect on social studies and science scores. These findings were simply ignored in the summary, and therefore received almost no coverage in newspapers and other information outlets. Given all the discussion (fair or unfair) about some charters’ over-focusing on math and reading tests, the science and social studies results are definitely relevant to our public debate.

Now, in fairness, you might argue that the results of the state’s math and reading tests were the only ones that “belonged” in the key findings section, since these tests, by their design, are more appropriate for use in this type of analysis (one might also point out that the samples for the social studies and science tests are smaller, which means that the estimates are less precise). The problem is that the authors do present the results of other tests (with smaller samples) in their executive summary (e.g., the NY Regents exams), as well as other outcomes, such as graduation rates. The only results that they leave out are for social studies and science, the two subject areas in which charters did not show an advantage.

Ignoring statistical significance of graduation results. Omitting the social studies and science results is questionable enough, but to make matters worse, the authors present (again, in their executive summary) the finding that, for each year students attend a charter high school, they are seven percent more likely to graduate than are comparable students in regular public schools. Much of the coverage of the report brought up this “finding,” along with the “Scarsdale-Harlem” talking point, implying that charter benefits extend far beyond math and reading scores in grades 3-8. What the authors fail to mention in their summary is that this result is not statistically significant at any conventional level.

In the technical paper, Hoxby notes that “since the estimated effect [of charter schools on graduation] is very imprecise, we do not place much reliance on this magnitude.” In the report released for public consumption, however, she reports the graduation results in the executive summary without any caveat.

The more proper interpretation would have been to say that the results show that charter school students are no more or less likely to graduate than their regular public school counterparts (or, perhaps, to report the seven percent result while strongly cautioning readers that it is not statistically significant, and that more data are needed to confirm the effect size). Even in an executive summary, it is flat-out disingenuous to present the result without any disclaimer or qualification, as it fostered the incorrect impression that the analysis demonstrated conclusively that NYC charters boost graduation rates. By the conventions of social science, it did not.

Overstating the “coverage” of the analysis. The authors begin the executive summary by claiming that 94 percent of New York’s charter school students participated in lotteries, which implies that their results reflect the performance of almost all of the city’s charter students. This is again somewhat misleading. Going through the various exclusions (such as the fact that NYC charters that are not “oversubscribed” and those that opened recently are omitted completely, and that the researchers were not able to match all students in their database), I estimate that as many as 20-30 percent of the city’s charter school students were excluded from this study.

Hoxby et al. do present results of tests designed to see whether some of their excluded students are different from those included (e.g., whether the students they were able to match performed differently from those they were not able to match). Nevertheless, many (if not most) press stories about the report repeated the 94 percent coverage figure, which suggests (incorrectly) that virtually every NYC charter student was included in the analysis.

Late documentation. This is not really a methodological issue, but it’s worth noting. Caroline Hoxby’s team issued two reports. One of them was the glossy, non-technical report that I have been discussing. It was specifically intended for public consumption. The other report, mentioned briefly above, was the “technical paper” that described in more detail the methods and data that were used. That there were “dual reports” is, of course, very common. What was strange, and distressingly so, was that the technical paper was released several months after the glossy report.

Instead, at the time of the glossy report’s release, the authors referred readers to a previous version of the technical paper, which was written two years earlier, in 2007 (an update of a previous analysis, the new report included new data for 2006-07 and 2007-08). According to Hoxby, the methods for both technical papers were the same, which I suppose was the justification for not reissuing the new technical paper at the same time.

The problem was that the old technical paper omitted several key tabulations and tables, which would have allowed independent researchers to properly check and interpret the findings. One can only speculate as to why Hoxby’s team rushed out the glossy report without having finished their updates to the technical paper, but it is a poor practice that might have exacerbated the misinterpretations in the press coverage of the report. In the world of education research, this is certainly not the first time that data have been released after an early publicity campaign, but it is nevertheless disappointing for a researcher of Hoxby’s caliber.

***

In addition to the above, Hoxby’s team somewhat overstates the power of random assignment in canceling out all the difficult issues involved in charter school studies, including: peer effects from charter attrition without replacement; the “roll-up” admissions process of many NYC charters, in which they accept students only at one entrance grade; and whether the effects of “charter treatment” on applicants (those who participated in lotteries) would be different for non-applicants (those students who didn’t apply at all). There are also a bunch of (very common and often unavoidable) remaining questions, such as how the measurement of student background variables might have influenced results somewhat (e.g., there is no distinction made between free and reduced price lunch, or between different classifications of special education students), or the fact that many NYC charters get extra funding from private donations (and that they have, on average, 30 percent more school time than the city’s regular public schools).

Nevertheless, once again, it is absolutely fair to say that their results do represent powerful evidence that the small group of about 75 NYC oversubscribed charter schools included in the analysis produced larger math and reading gains than did the city’s regular public schools, and that this effect was causal. By the way, the comparison students in regular public schools also made statistically significant gains, on average; they were just much smaller than those among charter students.

(Side note: On the question of why some charters do better than others, this study provides a rare and useful glimpse into the associations between different policies/practices and performance, which offered support for one that I have discussed previously: school time.)

But the presentation of some of the results in the “glossy” report was so misleading as to be almost untrue – especially if one only read the executive summary (as most reporters seemed to do). The casual reader – of both the press accounts and the report’s executive summary – would be excused for believing that all or even most NYC charter schools are educating thousands of poor Harlem students up to the level of their wealthy Scarsdale peers. Sadly, this is simply not the case (or, at least, it has not been demonstrated to be so by this analysis). This reader might also conclude that charter school students are significantly more likely to be minority and much more likely to graduate than are their counterparts in neighborhood schools. Again, neither conclusion is warranted by these results. Finally, our reader would remain completely unaware that NYC charters seem to produce no statistically significant advantage in science and social studies, which is an important finding by any standard.

As a whole, this is a somewhat more nuanced picture than, for example, that painted by the NY Daily News’ proclamation that opposing charter schools is tantamount to stealing a quality education from poor, minority children.

Don’t get me wrong – this is not to excuse the fact that many active participants in our education debates (including journalists) don’t bother to read beyond summaries and abstracts, and there is a clear case to be made that all of us have the responsibility to do so. After all, by definition, summaries and abstracts are a superficial treatment of the subject matter (Hoxby et al. do discuss, in the body of the report, some of the issues mentioned above). But, knowing that this is not what the average reader will do, researchers need to take special care in writing these synopses – especially in situations like this one, where reports deal with controversial subjects that are basically guaranteed a lot of attention, and are specifically written to be “accessible” to policymakers and the general public.

Instead, most of the “key findings” presented in the summary were at least somewhat misleading, even the most simplistic descriptive results (e.g., the racial comparisons). As a consequence, this important study, which was potentially very useful in informing policy, was – and still is – often oversimplified and/or misinterpreted.

At some point in the near future, Hoxby’s team will release the next round of results for NYC charter schools (which will, presumably, again add new data and replicate all of their analyses). The findings in that report, especially in the introductory section, should not be taken at face value. Every conclusion should be examined critically by trained eyes, and journalists and policymakers especially should seek out expert help in interpreting the results. Well-done analyses with causal conclusions are rare in education research, so we should be sure to interpret their findings correctly and completely.

- Matt Di Carlo

Was the study ever peer-reviewed? If not, why not? And how can we extract from Hoxby's findings how much of the impact may have been due to peer effects?

Hi Leonie,

I'm not certain, but I would strongly suspect that the paper is either undergoing peer review right now, or has already been through it. I'm curious myself how it turned out (or will turn out).

As for peer effects, they are, as I note in the post (seventh to last paragraph), always difficult to isolate, even when treatment is randomly assigned. I would speculate that some (but not nearly all) of the positive charter testing gains are attributable to peer effects, but it's tough to say.

Thanks for the comment,
MD

Hi MDC,
Interestingly, this is not the first time that Hoxby's results have been questioned. In 2000, she released a paper asserting that increasing school choice improves educational outcomes for all students, to which Jesse Rothstein (currently Associate Professor of Public Policy and Economics at UC Berkeley) released a comment detailing her flawed methodology and falsification of data sets. Unable to replicate her results using her own code, he argued that her results were largely invalid. (View the paper here: http://www.ers.princeton.edu/workingpapers/10ers.pdf)
Regardless of Hoxby's prestigious record, it does make one doubtful of her findings. I think your rebuttal of this important and influential report is extremely valuable to this discourse.
Anna YW

Generally a good review of some of these issues here.

One note, however. You should be clear that you are reporting on gains in scores on particular math and reading tests. That might or might not be the same as performance as measured by other standardized tests or by individualized professional judgement by a knowledgeable expert.

The goal should not be simply higher scores on tests. The goal should be true mastery of the curricular content.

I have no reason to believe that Hoxby would ever make note of that potential difference. But I'll bet that Di Carlo will.