In his State of the City address last month, New York City Mayor Michael Bloomberg made some brief comments about the upcoming adoption of new assessments aligned with the Common Core State Standards (CCSS), including the following statement:
But no matter where the definition of proficiency is arbitrarily set on the new tests, I expect that our students’ progress will continue outpacing the rest of the State’s[,] the only meaningful measurement of progress we have.
On the surface, this may seem like just a little bit of healthy bravado. But there are a few things about this single sentence that struck me, and it also helps to illustrate an important point about the relationship between standards and testing results.
New York City has just released the new round of results from its school rating system (they’re called “progress reports”). It relies considerably more on student growth (60 out of 100 points) than absolute performance (25 points), and there are efforts to partially adjust most of the measures via peer group comparisons.*
All of this indicates that the city’s system is more focused on school rather than student test-based performance, compared with many other systems around the U.S.
The ratings are high-stakes. Schools receiving low grades – a D or F in any given year, or a C for three consecutive years – enter a review process by which they might be closed. The number of schools meeting these criteria jumped considerably this year.
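The review trigger described above is simple enough to write down. The following is only a sketch of the rule as stated here (a D or F in the most recent year, or a C in each of the last three years); the function name and the list-of-letters input format are my own assumptions, not anything published by the city.

```python
def enters_review(grades):
    """Return True if a school's grade history meets the review criteria.

    `grades` is a list of letter grades, oldest first, one per year
    (a hypothetical input format chosen for illustration).
    """
    if not grades:
        return False
    # A D or F in the most recent year triggers review on its own.
    if grades[-1] in {"D", "F"}:
        return True
    # Otherwise, a C in each of the last three consecutive years does.
    return len(grades) >= 3 and all(g == "C" for g in grades[-3:])


print(enters_review(["A", "B", "F"]))  # low grade this year
print(enters_review(["C", "C", "C"]))  # three straight Cs
print(enters_review(["B", "C", "C"]))  # only two straight Cs
```

Note that the rule looks only at the letter grade, so anything that moves a school across a grade boundary, including noise in the underlying measures, can change the outcome.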
There is plenty of controversy to go around about the NYC ratings, much of it pertaining to two important features of the system. They’re worth discussing briefly, as they are also applicable to systems in other states.
Earlier this summer, the New York City Independent Budget Office (IBO) presented findings from a longitudinal analysis of NYC student performance. That is, they followed a cohort of over 45,000 students from third grade in 2005-06 through 2009-10 (though most results are 2005-06 to 2008-09, since the state changed its definition of proficiency in 2009-10).
The IBO then simply calculated the proportion of these students who improved, declined or stayed the same in terms of the state’s cutpoint-based categories (e.g., Level 1 ["below basic" in NCLB parlance], Level 2 [basic], Level 3 [proficient], Level 4 [advanced]), with additional breakdowns by subgroup and other variables.
The short version of the results is that almost two-thirds of these students remained constant in their performance level over this time period – for instance, students who scored at Level 2 (basic) in third grade in 2006 tended to stay at that level through 2009; students at the “proficient” level remained there, and so on. About 30 percent increased a category over that time (e.g., going from Level 1 to Level 2).
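To make the calculation concrete, here is a rough sketch of the kind of tabulation described above: follow the same students over time and count how many moved up, moved down or stayed in the same performance level. The ten students below are invented purely for illustration and are not the IBO's data; the function name is also my own.

```python
from collections import Counter


def level_change_shares(start_levels, end_levels):
    """Share of students whose performance level improved, stayed the same or declined.

    Both lists hold levels (1-4) for the same students in the same order.
    """
    if not start_levels or len(start_levels) != len(end_levels):
        raise ValueError("need two non-empty lists of equal length")
    counts = Counter()
    for start, end in zip(start_levels, end_levels):
        if end > start:
            counts["improved"] += 1
        elif end < start:
            counts["declined"] += 1
        else:
            counts["same"] += 1
    n = len(start_levels)
    return {key: counts[key] / n for key in ("improved", "same", "declined")}


# Made-up levels for ten students in the first and last year of the panel.
first_year = [1, 2, 2, 3, 3, 4, 2, 3, 1, 2]
last_year = [2, 2, 2, 3, 3, 4, 3, 3, 1, 1]
print(level_change_shares(first_year, last_year))
```

Because the categories are defined by cutpoints, a student can make real progress within a level and still be counted as "same" in a tabulation like this.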
The response from the NYC Department of Education (NYCDOE) was somewhat remarkable. It takes a minute to explain why, so bear with me.
There have now been several stories in the New York news media about New York City’s charter schools’ “gains” on this year’s state tests (see here, here, here, here and here). All of them trumpeted the 3-7 percentage point increase in proficiency among the city’s charter students, compared with the 2-3 point increase among their counterparts in regular public schools. The consensus: Charters performed fantastically well this year.
In fact, the NY Daily News asserted that the “clear lesson” from the data is that “public school administrators must gain the flexibility enjoyed by charter leaders,” and “adopt [their] single-minded focus on achievement.” For his part, Mayor Michael Bloomberg claimed that the scores are evidence that the city should expand its charter sector.
All of this reflects a fundamental misunderstanding of how to interpret testing data, one that is frankly a little frightening to find among experienced reporters and elected officials.
In a New York Times article a couple of weeks ago, reporter Michael Winerip discusses New York City’s school report card grades, with a focus on an issue that I have raised many times – the role of absolute performance measures (i.e., how highly students score) in these systems, versus that of growth measures (i.e., whether students are making progress).
Winerip uses the example of two schools – P.S. 30 and P.S. 179 – one of which (P.S. 30) received an A on this year’s report card, while the other (P.S. 179) received an F. These two schools have somewhat similar student populations, at least so far as can be determined using standard education variables, and their students are very roughly comparable in terms of absolute performance (e.g., proficiency rates). The basic reason why one received an A and the other an F is that P.S. 179 received a very low growth score, and growth is heavily weighted in the NYC grade system (representing 60 out of 100 points for elementary and middle schools).
I have argued previously that unadjusted absolute performance measures such as proficiency rates are inappropriate for test-based assessments of schools’ effectiveness, given that they tell you almost nothing about the quality of instruction schools provide, and that growth measures are the better option, albeit one that also has its own issues (e.g., they are more unstable), and must be used responsibly. In this sense, the weighting of the NYC grading system is much more defensible than most of its counterparts across the nation, at least in my view.
But the system is also an example of how details matter – each school’s growth portion is calculated using an unconventional, somewhat questionable approach, one that is, as yet, difficult to treat with a whole lot of confidence.
In a recent story, the New York Daily News uses the recently-released teacher data reports (TDRs) to “prove” that the city’s charter school teachers are better than their counterparts in regular public schools. The headline announces boldly: New York City charter schools have a higher percentage of better teachers than public schools (it has since been changed to: “Charters outshine public schools”).
Taking things even further, within the article itself, the reporters note, “The newly released records indicate charters have higher performing teachers than regular public schools.”
So, not only are they equating words like “better” with value-added scores, but they’re obviously comfortable drawing conclusions about these traits based on the TDR data.
The article is a pretty remarkable display of both poor journalism and poor research. The reporters not only attempted to do something they couldn’t do, but they did it badly to boot. It’s unfortunate to have to waste one’s time addressing this kind of thing, but, no matter your opinion on charter schools, it’s a good example of how not to use the data that the Daily News and other newspapers released to the public.
Late last week and over the weekend, New York City newspapers, including the New York Times and Wall Street Journal, published the value-added scores (teacher data reports) for thousands of the city’s teachers. Prior to this release, I and others argued that the newspapers should present margins of error along with the estimates. To their credit, both papers did so.
In the Times’ version, for example, each individual teacher’s value-added score (converted to a percentile rank) is presented graphically, for math and reading, in both 2010 and over a teacher’s “career” (averaged across previous years), along with the margins of error. In addition, both papers provided descriptions and warnings about the imprecision in the results. So, while the decision to publish was still, in my personal view, a terrible mistake, the papers at least made a good-faith attempt to highlight the imprecision.
That said, they also published data from the city that use teachers’ value-added scores to label them as one of five categories: low, below average, average, above average or high. The Times did this only at the school level (i.e., the percent of each school’s teachers that are “above average” or “high”), while the Journal actually labeled each individual teacher. Presumably, most people who view the databases, particularly the Journal’s, will rely heavily on these categorical ratings, as they are easier to understand than percentile ranks surrounded by error margins. The inherent problems with these ratings are what I’d like to discuss, as they illustrate important concepts about estimation error and what can be done about it.
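As a rough illustration of why a single label can hide a lot, the sketch below assigns a category to a percentile rank and then lists every category that the error margin touches. The percentile cutoffs are hypothetical placeholders chosen for the example, not taken from the published reports, and the teacher's numbers are invented.

```python
import bisect

# Hypothetical percentile cutoffs between the five labels (assumed, for illustration only).
CUTOFFS = [5, 25, 75, 95]
LABELS = ["low", "below average", "average", "above average", "high"]


def category(percentile):
    """Label for a single percentile rank under the assumed cutoffs."""
    return LABELS[bisect.bisect_right(CUTOFFS, percentile)]


def plausible_categories(lower, upper):
    """Every label covered by the interval [lower, upper] around an estimate."""
    first = bisect.bisect_right(CUTOFFS, lower)
    last = bisect.bisect_right(CUTOFFS, upper)
    return LABELS[first:last + 1]


# An invented teacher: estimated at the 80th percentile, with a margin
# of error running from the 50th to the 92nd percentile.
print(category(80))
print(plausible_categories(50, 92))
```

Under these assumed cutoffs, the single label would be "above average," while the interval also covers "average." A reader who sees only the label has no way of knowing that.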
It seems as though New York City newspapers are going to receive the value-added scores of the city’s public school teachers, and publish them in an online database, as was the case in Los Angeles.*
In my opinion, the publication will not only serve no useful purpose educationally, but it is also a grossly unfair infringement on the privacy of teachers. I have also argued previously that putting the estimates online may serve to bias future results by exacerbating the non-random assignment of students to teachers (parents requesting [or not requesting] specific teachers based on published ratings), though it’s worth noting that the city is now using a different model.
That said, I don’t think there’s any way to avoid publication, given that about a dozen newspapers will receive the data, and it’s unlikely that every one of them will decline to do so. So, in addition to expressing my firm opposition, I would offer what I consider to be an absolutely necessary suggestion: If newspapers are going to publish the estimates, they need to publish the error margins too.
Gotham Schools reports that the New York City Department of Education rolled out this year’s school report card grades by highlighting the grades’ stability between this year and last. That is, they argued that schools’ grades were roughly the same between years, which is supposed to serve as evidence of the system’s quality.
The city’s logic here is generally sound. As I’ve noted before, most schools don’t undergo drastic changes in their operations over the course of a year, and so fluctuations in grades among a large number of schools might serve as a warning sign that there’s something wrong with the measures being used. Conversely, it’s not unreasonable to expect from a high-quality rating system that, over a two-year period, some schools would get higher grades and some lower, but that most would stay put. That was the city’s argument this year.
The only problem is that this wasn’t really the case.
Every year, around this time, states and districts throughout the nation release their official testing results. Schools are closed and reputations are made or broken by these data. But this annual tradition is, in some places, becoming a charade.
Most states and districts release two types of assessment data every year (by student subgroup, school and grade): Average scores (“scale scores”); and the percent of students who meet the standards to be labeled proficient, advanced, basic and below basic. The latter type – the rates – are of course derived from the scores – that is, they tell us the proportion of students whose scale score was above the minimum necessary to be considered proficient, advanced, etc.
Both types of data are cross-sectional. They don’t follow individual students over time, but rather give a “snapshot” of aggregate performance among two different groups of students (for example, third graders in 2010 compared with third graders in 2011). Calling the change in these results “progress” or “gains” is inaccurate; they are cohort changes, and might just as well be chalked up to differences in the characteristics of the students (especially when changes are small). Even averaged across an entire school or district, there can be huge differences in the groups compared between years – not only is there often considerable student mobility in and out of schools/districts, but every year, a new cohort enters at the lowest tested grade, while a whole other cohort exits at the highest tested grade (except for those retained).
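A small numerical sketch may help show how the two types of data relate. The rate is just the share of scale scores at or above a cut score, so it depends heavily on where scores sit relative to that line. The cut score and both sets of scores below are invented for illustration; they are not actual state or city results.

```python
def proficiency_rate(scale_scores, cut_score):
    """Share of students whose scale score is at or above the cut score."""
    return sum(score >= cut_score for score in scale_scores) / len(scale_scores)


def average(scale_scores):
    """Plain arithmetic mean of the scale scores."""
    return sum(scale_scores) / len(scale_scores)


CUT_SCORE = 650  # hypothetical proficiency cut score

# Two different (invented) groups of third graders, one per year.
third_graders_2010 = [600, 640, 648, 652, 700]
third_graders_2011 = [610, 650, 650, 651, 679]

for year, scores in (("2010", third_graders_2010), ("2011", third_graders_2011)):
    print(year, average(scores), proficiency_rate(scores, CUT_SCORE))
```

In this made-up case the average score is identical in both years, yet the proficiency rate doubles, simply because more of the second group's scores happen to fall just above the cut. And since these are two different groups of students, neither number describes the "progress" of any individual student.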
For these reasons, any comparisons between years must be done with extreme caution, but the most common way – simply comparing proficiency rates between years – is in many respects the worst. A closer look at this year’s New York City results illustrates this perfectly.
At a press conference earlier this week, New York City Mayor Michael Bloomberg announced the city’s 2011 test results. Wall Street Journal reporter Lisa Fleisher, who was on the scene, tweeted Mayor Bloomberg’s remarks. According to Fleisher, the mayor claimed that there was a “dramatic difference” between his city’s testing progress between 2010 and 2011, as compared with the rest of the state.
Putting aside the fact that the results do not measure “progress” per se, but rather cohort changes – a comparison of cross-sectional data that measures the aggregate performance of two different groups of students – I must say that I was a little astounded by this claim. Fleisher was also kind enough to tweet a photograph that the mayor put on the screen in order to illustrate the “dramatic difference” between the gains of NYC students relative to their non-NYC counterparts across the state. Here it is:
Almost two years ago, a report on New York City charter schools rocked the education policy world. It was written by Hoover Institution scholar Caroline Hoxby with co-authors Sonali Murarka and Jenny Kang. Their primary finding was that:
On average, a student who attended a charter school for all of grades kindergarten through eight would close about 86 percent of the “Scarsdale-Harlem achievement gap” [the difference in scores between students in Harlem and those in the affluent NYC suburb] in math, and 66 percent of the achievement gap in English.
The headline-grabbing conclusion was uncritically repeated by most major news outlets, including the New York Post, which called the charter effects “off the charts,” and the NY Daily News, which announced that, from that day forward, anyone who opposed charter schools was “fighting to block thousands of children from getting superior educations.” A week or two later, Mayor Michael Bloomberg specifically cited the study in announcing that he was moving to expand the number of NYC charter schools. Even today, the report is often mentioned as primary evidence favoring the efficacy of charter schools.
I would like to revisit this study, but not as a means to relitigate the “do charters work?” debate. Indeed, I have argued previously that we spend too much time debating whether charter schools “work,” and too little time asking why some few are successful. Instead, my purpose is to illustrate an important research maxim: Even well-designed, sophisticated analyses with important conclusions can be compromised by a misleading presentation of results.
As first reported by the New York Times, the New York City Department of Education released a dataset this past Sunday, which lists the number of potential teacher layoffs that would occur in each school absent a budget infusion.
Layoffs are a terrible thing for schools and students, and this list is sobering. But the primary impetus for releasing this dataset appears to be the city’s ongoing push to end so-called seniority-based layoffs, and its support for seniority-ending legislation that is now making its way through the state legislature. One of the big talking points on this issue has always been that layoffs that take experience into account would hurt high-poverty schools the most, because these schools tend to have the least experienced teachers. As I discussed in a prior post, Michelle Rhee is making this argument everywhere she goes, and it was one of the primary themes in a new report by the New Teacher Project (released last week). Although I have not heard city officials use the argument since the database was released over the weekend, similar assertions have very recently been made by Mayor Bloomberg, former Chancellor Joel Klein, and current Chancellor Cathie Black.
I find all this a bit curious, given that the best research on the topic finds that the argument is untrue (including a study of New York City, and a statewide analysis of Washington [also here]). Now, it is at least possible that, if layoffs were conducted strictly on the basis of seniority, higher-poverty schools could end up bearing the brunt of dismissals. This is almost never the case, however – layoffs in almost every district proceed based on a variety of criteria, among which seniority is only one (albeit often the most important).
It is fortuitous, then, that the city’s dataset provides an opportunity to test the claim that the “worst-case scenario” – over 4,500 layoffs using current New York City procedures – would hurt high-poverty schools the most. Let’s take a look.
Outgoing New York City Chancellor Klein loves to try to wrap himself in the mantle of Al Shanker. He is especially fond of pulling clipped Shanker quotes out of his hat—and out of context—when speaking about his favorite education “reforms.” At first this may seem puzzling, because the ex-Chancellor is disinclined to give either the United Federation of Teachers or its parent organization, the American Federation of Teachers, credit for much of anything except intransigence. It must be an inconvenient truth for Klein that Shanker devoted his life to making both organizations into the strong and aggressive advocates for teachers and teaching that they continue to be.
In “What I Learned at the Barricades,” a December 6 Wall Street Journal column, Klein leads up to his latest Shanker references with a characteristic litany of inaccurate claims – ones that Al would be quick to correct:
“First, it is wrong to assert that students’ poverty and family circumstances severely limit their educational potential.” And “Second, traditional proposals for improving education – more money, better curriculum, smaller classes, etc. – aren’t going to get the job done.”
Really? It’s hard to imagine which barricades Klein learned at. There is plenty of evidence to support the impact of all of these.
But, for those of us who knew and worked closely with Al (I did from 1967-1984 and from 1989 until his death in 1997), what’s truly galling is Klein’s distorted use of Al’s thinking to shore up a simplistic, narrowly punitive agenda that Shanker would have discredited.
On the heels of the Los Angeles Times’ August decision to publish a database of teachers’ value-added scores, New York City newspapers are poised to do the same, with the hearing scheduled for late November.
Here’s a proposition: Those who support the use of value-added models (VAM) for any purpose should be lobbying against the release of teachers’ names and value-added scores.
The reason? Publishing the names directly compromises the accuracy of an already-compromised measure. Those who blindly advocate for publication – often saying things like “what’s the harm?” – betray their lack of knowledge about the importance of the models’ core assumptions, and the implications they carry for the accuracy of results. Indeed, the widespread publication of these databases may even threaten VAM’s future utility in public education.