** Reprinted here in the Washington Post
Every year, a few major media outlets publish high school rankings. Most recently, Newsweek (in partnership with The Daily Beast) issued its annual list of the “nation’s best high schools.” Its general approach to this task seems quite defensible: to find the high schools that “best prepare students for college.”
The rankings are calculated using six measures: graduation rate (25 percent); college acceptance rate (25); AP/IB/AICE tests taken per student (25); average SAT/ACT score (10); average AP/IB/AICE score (10); and the percentage of students enrolled in at least one AP/IB/AICE course (5).
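Mechanically, a ranking like this boils down to a weighted sum. Here’s a minimal sketch – the weights follow the published percentages, but the school’s values are invented, and I’m assuming each measure has first been normalized to a common 0-100 scale (Newsweek’s actual normalization procedure is not spelled out here):

```python
# Sketch of a weighted composite like Newsweek's. Weights follow the
# published percentages; the school's values are invented and assumed
# to be pre-normalized to a 0-100 scale (an assumption, not their method).

WEIGHTS = {
    "graduation_rate": 0.25,
    "college_acceptance_rate": 0.25,
    "ap_ib_aice_tests_per_student": 0.25,
    "avg_sat_act_score": 0.10,
    "avg_ap_ib_aice_score": 0.10,
    "pct_enrolled_in_ap_ib_aice": 0.05,
}

def composite_score(measures):
    """Weighted sum of measures, each already scaled 0-100."""
    return sum(WEIGHTS[name] * value for name, value in measures.items())

hypothetical_school = {
    "graduation_rate": 92.0,
    "college_acceptance_rate": 85.0,
    "ap_ib_aice_tests_per_student": 60.0,
    "avg_sat_act_score": 70.0,
    "avg_ap_ib_aice_score": 55.0,
    "pct_enrolled_in_ap_ib_aice": 40.0,
}

print(composite_score(hypothetical_school))  # ~73.75
```

Schools would then be ranked on this single number – which is why how each input is measured and normalized matters so much.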
Needless to say, even the most rigorous, sophisticated measures of school performance will be imperfect at best, and the methods behind these lists have been subject to endless scrutiny. However, let’s take a quick look at three potentially problematic issues with the Newsweek rankings, how the results might be interpreted, and how the system compares with that published by U.S. News and World Report. Read More »
Earlier this week, New Jersey Governor Chris Christie announced that the state will assume control over the Camden City School District. Camden will be the fourth NJ district to undergo takeover, though this is the first time that the state will be removing control from an elected local school board, which will now serve in an advisory role (and have three additional members appointed by the Governor). Over the next few weeks, NJ officials will choose a new superintendent and begin to revamp evaluations, curricula, and other core policies.
Accompanying the announcement, the Governor’s office released a two-page “fact sheet,” much of which is devoted to justifying this move to the public.
Before discussing it, let’s be clear about something – it may indeed be the case that Camden schools are so critically low-performing and/or dysfunctional as to warrant drastic intervention. Moreover, it’s at least possible that state takeover is the appropriate type of intervention to help these schools improve (though the research on this latter score is, to be charitable, undeveloped).
That said, the “fact sheet” presents relatively little valid evidence regarding the academic performance of Camden schools. Given the sheer magnitude of any takeover decision, it is crucial for the state to demonstrate publicly that it has left no stone unturned by presenting a case that is as comprehensive and compelling as possible. However, the discrepancy between that high bar and NJ’s evidence, at least that pertaining to academic outcomes, is more than a little disconcerting.
Read More »
In a Slate article published last October, Daniel Engber bemoans the frequently shallow use of the classic warning that “correlation does not imply causation.” Mr. Engber argues that the correlation/causation distinction has become so overused in online comments sections and other public fora as to hinder real debate. He also posits that correlation does not mean causation, but “it sure as hell provides a hint,” and can “set us down the path toward thinking through the workings of reality.”
Correlations are extremely useful, in fact essential, for guiding all kinds of inquiry. And Engber is no doubt correct that the argument is overused in public debates, often in lieu of more substantive comments. But let’s also be clear about something – careless causal inferences likely do more damage to the quality and substance of policy debates on any given day than the misuse of the correlation/causation argument does over the course of months or even years.
We see this in education constantly. For example, mayors and superintendents often claim credit for marginal increases in testing results that coincide with their holding office. The causal leaps here are pretty stunning. Read More »
In his State of the City address last month, New York City Mayor Michael Bloomberg made some brief comments about the upcoming adoption of new assessments aligned with the Common Core State Standards (CCSS), including the following statement:
But no matter where the definition of proficiency is arbitrarily set on the new tests, I expect that our students’ progress will continue outpacing the rest of the State’s[,] the only meaningful measurement of progress we have.
On the surface, this may seem like just a little bit of healthy bravado. But there are a few things about this single sentence that struck me, and it also helps to illustrate an important point about the relationship between standards and testing results. Read More »
As a strong believer in paying attention to what teachers think about policy, I always review the results of MetLife’s annual teacher survey. The big theme of this year’s survey, as pushed by the press release and reiterated in most of the media coverage, was that job satisfaction among teachers is at “its lowest level in 25 years.”
It turns out that changes in question wording over the years complicate straightforward comparisons of teachers’ job satisfaction responses over time. Even slight changes in wording can affect results, though it seems implausible that this one had a dramatic effect. In any case, it is instructive to take a look at the reactions to this finding. If I may generalize a bit here, one “camp” argued that the decline in teacher satisfaction is due to recent policy changes, such as eroding job protections, new evaluations, and the upcoming implementation of the Common Core. Another “camp” urged caution – they pointed out not only that job satisfaction is still rather high, but also that the decline among teachers can be found among many other groups of workers too, likely a result of the ongoing recession.
Although it is more than plausible that recent reforms are taking a toll on teacher morale, and this possibility merits attention, those urging caution, in my view, are correct. It’s simply not appropriate to draw strong conclusions as to what is causing this (or any other) trend in aggregate teacher attitudes, and it’s even more questionable to chalk it up to a reaction against specific policies, particularly during a time of economic hardship. Read More »
** Reprinted here in the Washington Post
Former Florida Governor Jeb Bush was in Virginia last week, helping push for a new law that would install an “A-F” grading system for all public schools in the commonwealth, similar to a system that has existed in Florida for well over a decade.
In making his case, Governor Bush put forth an argument about the Florida system that he and his supporters use frequently. He said that, right after the grades went into place in his state, there was a drop in the proportion of D and F schools, along with a huge concurrent increase in the proportion of A schools. For example, as Governor Bush notes, in 1999, only 12 percent of schools got A’s. In 2005, when he left office, the figure was 53 percent. The clear implication: It was the grading of schools (and the incentives attached to the grades) that caused the improvements.
There is some pretty good evidence (also here) that the accountability pressure of Florida’s grading system generated modest increases in testing performance among students in schools receiving F’s (i.e., an outcome to which consequences were attached), and perhaps higher-rated schools as well. However, putting aside the serious confusion about what Florida’s grades actually measure, as well as the incorrect premise that we can evaluate a grading policy’s effect by looking at the simple distribution of those grades over time, there’s a much deeper problem here: The grades changed in part because the criteria changed. Read More »
Our guest authors today are Morgan Polikoff and Andrew McEachin. Morgan is Assistant Professor in the Rossier School of Education at the University of Southern California. Andrew is an Institute of Education Science postdoctoral fellow at the University of Virginia.
In a previous post, we described some of the problems with the Senate’s Harkin-Enzi plan for reauthorizing the No Child Left Behind Act, based on our own analyses, which yielded three main findings. First, selecting the bottom 5% of schools for intervention based on changes in California’s composite achievement index resulted in remarkably unstable rankings. Second, identifying the bottom 5% based on schools’ lowest performing subgroup overwhelmingly targeted those serving larger numbers of special education students. Third and finally, we found evidence that middle and high schools were more likely to be identified than elementary schools, and smaller schools more likely than larger schools.
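The first finding, in particular, is easy to reproduce in miniature. Here’s a stylized simulation – not our actual analysis, and with invented noise parameters – in which every school’s true performance is perfectly stable, yet a “bottom 5 percent by change in the index” designation churns almost completely from one period to the next:

```python
import random

random.seed(0)
N_SCHOOLS = 1000  # hypothetical schools

# Each school has a stable true performance level; the index we
# observe each year is that level plus noise (cohort differences,
# sampling error, etc.). The noise size is an invented assumption.
true_performance = [random.gauss(0, 1) for _ in range(N_SCHOOLS)]

def observed_index(noise_sd=0.5):
    return [t + random.gauss(0, noise_sd) for t in true_performance]

def bottom_5pct_by_change(earlier, later):
    changes = [b - a for a, b in zip(earlier, later)]
    cutoff = sorted(changes)[int(0.05 * N_SCHOOLS)]
    return {i for i, c in enumerate(changes) if c <= cutoff}

y1, y2, y3 = observed_index(), observed_index(), observed_index()
flagged_first = bottom_5pct_by_change(y1, y2)
flagged_second = bottom_5pct_by_change(y2, y3)

overlap = len(flagged_first & flagged_second) / len(flagged_first)
print(f"Flagged in both periods: {overlap:.0%}")
# Since true performance never changes here, year-to-year changes
# are pure noise, and the flagged set churns almost entirely.
```

The same mechanics bear on the third finding: smaller schools have noisier indexes, so they are more likely to land in the extremes.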
None of these findings was especially surprising (see here and here, for instance), and could easily have been anticipated. Thus, we argued that policymakers need to pay more attention to the vast (and rapidly expanding) literature on accountability system design. Read More »
In October of last year, the education advocacy group ConnCAN published a report called “The Roadmap to Closing the Gap” in Connecticut. The report says that the state must close its large achievement gaps by 2020 – that is, within eight years – and it uses data to argue that this goal is “both possible and achievable.”
There is value in compiling data and disaggregating them by district and school. And ConnCAN, to its credit, doesn’t use this analysis as a blatant vehicle to showcase its entire policy agenda, as advocacy organizations often do. But I am compelled to comment on this report, mostly as a springboard to a larger point about expectations.
However, first things first – a couple of very quick points about the analysis. There are 60-70 pages of district-by-district data in this report, all of it portrayed as a “roadmap” to closing Connecticut’s achievement gap. But it doesn’t measure gaps and won’t close them. Read More »
Let’s try a super-simple thought experiment with data. Suppose we have an inner-city middle school serving grades 6-8. Students in all three grades take the state exam annually (in this case, we’ll say that it’s at the very beginning of the year). Now, for the sake of this illustration, let’s avail ourselves of the magic of hypotheticals and assume away many of the sources of error that make year-to-year changes in public testing data unreliable.
First, we’ll say that this school reports test scores instead of proficiency rates, and that the scores are comparable between grades. Second, every year, our school welcomes a new cohort of sixth graders that is the exact same size and has the exact same average score as preceding cohorts – 30 out of 100, well below the state average of 65. Third and finally, there is no mobility at this school. Every student who enters sixth grade stays there for three years, and goes to high school upon completion of eighth grade. No new students are admitted mid-year.
Okay, here’s where it gets interesting: Suppose this school is phenomenally effective in boosting its students’ scores. In fact, each year, every single student gains 20 points. It is the highest growth rate in the state. Believe it or not, using the metrics we commonly use to judge schoolwide “growth” or “gains,” this school would still look completely ineffective. Take a look at the figure below. Read More »
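For those who want the arithmetic behind that figure, here’s a minimal sketch of the hypothetical (the numbers are all from the setup above):

```python
# The thought experiment above: new sixth graders always enter at 30,
# every student gains 20 points per year, and nobody moves in or out.

ENTRY_SCORE = 30
ANNUAL_GAIN = 20

for year in range(1, 6):
    # Scores at test time (start of the year): 6th graders are new,
    # 7th graders have one year of growth, 8th graders have two.
    grade_averages = [
        ENTRY_SCORE,                    # grade 6
        ENTRY_SCORE + ANNUAL_GAIN,      # grade 7
        ENTRY_SCORE + 2 * ANNUAL_GAIN,  # grade 8
    ]
    schoolwide = sum(grade_averages) / len(grade_averages)
    print(f"Year {year}: grade averages {grade_averages}, "
          f"schoolwide average {schoolwide:.0f}")

# Output is identical every year: (30 + 50 + 70) / 3 = 50. The
# schoolwide average never budges, because each departing cohort of
# 8th graders (at 70) is replaced by a new cohort of 6th graders (at 30).
```

Any metric based on changes in the schoolwide average (or, likewise, in the proficiency rate) would call this school stagnant, despite its having the highest growth rate in the state.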
In 1998, the National Institutes of Health (NIH) lowered the threshold at which people are classified as “overweight.” Literally overnight, about 25 million Americans previously considered as having a healthy weight were now overweight. If, the next day, you saw a newspaper headline that said “number of overweight Americans increases,” you would probably find that a little misleading. America’s “overweight” population didn’t really increase; the definition changed.
Fast forward to November 2012, during which Kentucky became the first state to release results from new assessments that were aligned with the Common Core Standards (CCS). This led to headlines such as, “Scores Drop on Kentucky’s Common Core-Aligned Tests” and “Challenges Seen as Kentucky’s Test Scores Drop As Expected.” Yet, these descriptions unintentionally misrepresent what happened. It’s not quite accurate – or at least highly imprecise – to say that test scores “dropped,” just as it would have been wrong to say that the number of overweight Americans increased overnight in 1998 (actually, they’re not even scores, they’re proficiency rates). Rather, the state adopted different tests, with different content, a different design, and different standards by which students are deemed “proficient.”
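The real change in Kentucky involved entirely new tests and standards, not just a new cut score, but moving the bar alone is enough to make the point. A toy sketch with an invented score distribution and invented cut scores:

```python
import random

random.seed(1)

# Invented illustration: the same students with the same scores;
# only the bar defining "proficient" moves.
scores = [random.gauss(60, 15) for _ in range(10_000)]

def proficiency_rate(cut_score):
    return sum(s >= cut_score for s in scores) / len(scores)

OLD_CUT, NEW_CUT = 50, 70  # hypothetical old and new cut scores
print(f"Rate under the old bar: {proficiency_rate(OLD_CUT):.0%}")
print(f"Rate under the new bar: {proficiency_rate(NEW_CUT):.0%}")
# The rate plunges, yet no student's performance "dropped"; the
# definition changed, just like the 1998 overweight threshold.
```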
Over the next 2-3 years, a large group of states will also release results from their new CCS-aligned tests. It is important for parents, teachers, administrators, and other stakeholders to understand what the results mean. Most of them will rely on newspapers and blogs, and so one exceedingly simple step that might help out is some polite, constructive language-policing.
Read More »
The new breed of school rating systems, some of which are still getting off the ground, will co-exist with federal proficiency targets in many states, and they are (or will be) used for a variety of purposes, including closure, resource allocation and informing parents and the public (see our posts on the systems in IN, FL, OH, CO, NYC).*
The approach that most states are using, in part due to the “ESEA flexibility” guidelines set by the U.S. Department of Education, is to combine different types of measures, often very crudely, into a single grade or categorical rating for each school. Administrators and media coverage usually characterize these ratings as measures of school performance – low-rated schools are called “low performing,” while those receiving top ratings are characterized as “high performing.” That’s not accurate – or, at best, it’s only partially true.
Some of the indicators that comprise the ratings, such as proficiency rates, are best interpreted as (imperfectly) describing student performance on tests, whereas other measures, such as growth model estimates, make some attempt to isolate schools’ contribution to that performance. Both might have a role to play in accountability systems, but they’re more or less appropriate depending on how you’re trying to use them.
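To see why lumping them together is problematic, consider a toy composite – invented weights, invented schools – in which a status measure and a growth measure each count for half:

```python
# Toy example (invented weights and schools): a composite mixing a
# status measure (proficiency rate) with a growth estimate can hand
# identical ratings to schools in opposite situations.

STATUS_WEIGHT = 0.5
GROWTH_WEIGHT = 0.5

def composite(proficiency_rate, growth_percentile):
    return STATUS_WEIGHT * proficiency_rate + GROWTH_WEIGHT * growth_percentile

# School A: high scores, but students gain very little.
# School B: low scores, but students gain a great deal.
school_a = composite(proficiency_rate=90, growth_percentile=30)
school_b = composite(proficiency_rate=30, growth_percentile=90)

print(school_a, school_b)  # 60.0 60.0: the same rating, opposite stories
```

School A’s students are performing well; School B is, by this measure, doing a better job teaching. A single rating can’t tell you which is which.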
So, here’s my question: Why do we insist on throwing them all together into a single rating for each school? To illustrate why I think this question needs to be addressed, let’s take a quick look at four highly-simplified situations in which one might use ratings. Read More »
When I point out that raw changes in state proficiency rates or NAEP scores are not valid evidence that a policy or set of policies is “working,” I often get the following response: “Oh Matt, we can’t have a randomized trial or peer-reviewed article for everything. We have to make decisions and conclusions based on imperfect information sometimes.”
This statement is obviously true. In this case, however, it’s also a straw man. There’s a huge middle ground between the highest-quality research and the kind of speculation that often drives our education debate. I’m not saying we always need experiments or highly complex analyses to guide policy decisions (though such analyses are generally preferred, and sometimes required). The point, rather, is that we shouldn’t draw conclusions based on evidence that doesn’t support those conclusions.
This, unfortunately, happens all the time. In fact, many of the more prominent advocates in education today make their cases based largely on raw changes in outcomes immediately after (or sometimes even before) their preferred policies were implemented (also see here, here, here, here, here, and here). In order to illustrate the monumental assumptions upon which these and similar claims ride, I thought it might be fun to break them down quickly, in a highly simplified fashion. So, here are the four “requirements” that must be met in order to attribute raw test score changes to a specific policy (note that most of this can be applied not only to claims that policies are working, but also to claims that they’re not working because scores or rates are flat):
Read More »
As states continue to finalize their applications for ESEA/NCLB “flexibility” (or “waivers”), controversy has arisen in some places over how these plans set proficiency goals, both overall and for demographic subgroups (see our previous post about the situation in Virginia).
One of the underlying rationales for allowing states to establish new targets (called “annual measurable objectives,” or AMOs) is that the “100 percent” proficiency goals of NCLB were unrealistic. Accordingly, some (but not all) of the new plans have set 2017-18 absolute proficiency goals that are considerably below 100 percent, and/or lower for some subgroups relative to others. This shift has generated pushback from advocates, most recently in Florida, who believe that lowering state targets is tantamount to encouraging or accepting failure.
I acknowledge the central role of goals in any accountability system, but I would like to humbly suggest that this controversy, over where and how states set proficiency targets for 2017-18, may be misguided. There are four reasons why I think this is the case (and one silver lining if it is). Read More »
The State of Indiana has received a great deal of attention for its education reform efforts, and it recently announced the details, as well as the first round of results, of its new “A-F” school grading system. As in many other states, for elementary and middle schools, the grades are based entirely on math and reading test scores.
It is probably the most rudimentary scoring system I’ve seen yet – almost painfully so. Such simplicity carries both potential advantages (easier for stakeholders to understand) and disadvantages (school performance is complex and not always amenable to rudimentary calculation).
In addition, unlike the other systems that I have reviewed here, this one does not rely on explicit “weights” (i.e., specific percentages are not assigned to each component). Rather, there’s a rubric that combines absolute performance (passage rates) with proportions drawn from growth models (a few other states use similar schemes, but I haven’t reviewed any of them).
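To make the contrast with weighted systems concrete, here’s a stylized sketch of how a rubric-style scheme might assign grades – the bands and adjustments are invented for illustration, not Indiana’s actual rules:

```python
# Stylized rubric (invented, for illustration only): start from the
# passage-rate band, then shift the grade based on how many students
# showed high growth, rather than weighting the two measures.

GRADE_ORDER = ["D", "C", "B", "A"]

def rubric_grade(passage_rate, pct_high_growth):
    if passage_rate >= 90:
        grade = "A"
    elif passage_rate >= 75:
        grade = "B"
    elif passage_rate >= 60:
        grade = "C"
    else:
        grade = "D"
    # Growth moves the school up or down one band: a lookup-and-shift,
    # not a weighted sum.
    i = GRADE_ORDER.index(grade)
    if pct_high_growth >= 40:
        i = min(i + 1, len(GRADE_ORDER) - 1)
    elif pct_high_growth < 15:
        i = max(i - 1, 0)
    return GRADE_ORDER[i]

print(rubric_grade(passage_rate=80, pct_high_growth=45))  # A (bumped up)
print(rubric_grade(passage_rate=80, pct_high_growth=10))  # C (dropped)
```

The appeal is simplicity; the cost is that band edges create cliffs, where a one-point difference in a passage rate can move a school a full letter grade.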
On the whole, though, it’s a somewhat simplistic variation on the general approach most other states are taking — but with a few twists. Read More »
** Also reprinted here in the Washington Post
In the education community, many proclaim themselves to be “completely data-driven.” Data Driven Decision Making (DDDM) has been a buzz phrase for a while now, and continues to be a badge many wear with pride. And yet, every time I hear it, I cringe.
Let me explain. During my first year in graduate school, I was taught that excessive attention to quantitative data impedes – rather than aids – in-depth understanding of social phenomena. In other words, explanations cannot simply be cranked out of statistical analyses without a precursor theory of some kind – the belief that they can is sometimes called “variable sociology” – and the attempt to do so constitutes a major obstacle to the advancement of knowledge.
I am no longer in graduate school, so part of me says: Okay, I know what data-driven means in education. But then, at times, I still think: No, really, what does “data-driven” mean even in this context? Read More »