The Washington Post reports on an issue that we have discussed here on many occasions: The incompleteness of the testing results released annually by the District of Columbia Public Schools (DCPS), or, more accurately, the Office of the State Superintendent of Education (OSSE), which is responsible for testing in DC schools.
Here’s the quick backstory: For the past 7-8 years or so, DCPS/OSSE have not released any actual scale scores for the state assessment (the DC-CAS). Instead, they have released only the percentages of students whose scores meet the designated cutoff points for the NCLB-style categories of below basic, basic, proficient and advanced. I will not reiterate all of the problems with these cutpoint-based rates and how they distort the underlying data, except to say that, taken alone, they are among the worst ways to present these data, and there is absolutely no reason why states and districts should not release both rates and average scale scores.
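To see one flavor of the distortion concretely, here is a minimal simulated sketch (the cut score, scale, and distributions are all invented for illustration; this is not DC-CAS data). Two cohorts with identical average scores can post proficiency rates several points apart, because the rate responds only to movement around the cutpoint:

```python
import numpy as np

rng = np.random.default_rng(0)
CUT = 40  # hypothetical "proficient" cut score on an invented scale

# Two cohorts with the SAME average scale score but different spreads
year1 = rng.normal(loc=38, scale=10, size=100_000)
year2 = rng.normal(loc=38, scale=5, size=100_000)

for label, scores in (("Year 1", year1), ("Year 2", year2)):
    print(f"{label}: mean = {scores.mean():.1f}, "
          f"proficient = {(scores >= CUT).mean():.1%}")

# Both means are ~38, yet "percent proficient" differs by roughly
# eight points: the rate reflects how scores pile up around the
# cutpoint, not average performance.
```

Releasing both rates and average scale scores costs nothing and lets readers see when the two measures diverge.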
The Post reports, however, that one organization — the Broader, Bolder Approach to Education — was able to obtain the actual scale score data (by subgroup and grade) for 2010-2013, and that this group published a memo-style report alleging that DCPS’ public presentation of their testing results over the past few years has been misleading. I had a mixed reaction to this report and the accompanying story. Read More »
There is a tendency in education circles these days, one that I’m sure has been discussed by others, and of which I myself have been “guilty” on countless occasions. The tendency is to use terms such as “effective/ineffective teacher” or “teacher performance” interchangeably with estimates from value-added and other growth models.
Now, to be clear, I personally am not opposed to the use of value-added estimates in teacher evaluations and other policies, so long as it is done cautiously and appropriately (which, in my view, is not happening in very many places). Moreover, based on my reading of the research, I believe that these estimates can provide useful information about teachers’ performance in the classroom. In short, then, I am not disputing that value-added scores can serve as one useful proxy measure for teacher performance and effectiveness (and be described as such), both formally and informally.
Regardless of one’s views on value-added and its policy deployment, however, there is a point at which our failure to define terms can go too far, and perhaps cause confusion. Read More »
A couple of weeks ago, the website Vox.com published an article entitled, “11 facts about U.S. teachers and schools that put the education reform debate in context.” The article, in the wake of the Vergara decision, is supposed to provide readers with the “basic facts” about the current education reform environment, with a particular emphasis on teachers. Most of the 11 facts are based on descriptive statistics.
Vox advertises itself as a source of accessible, essential, summary information — what you “need to know” — for people interested in a topic but not necessarily well-versed in it. Right off the bat, let me say that this is an extraordinarily difficult task, and in constructing lists such as this one, there’s no way to please everyone (I’ve read a couple of Vox’s education articles and they were okay).
That said, someone sent me this particular list, and it’s pretty good overall, especially since it does not reflect overt advocacy for given policy positions, as so many of these types of lists do. But I was compelled to comment on it. I would like to say that I did this to make some lofty point about the strengths and weaknesses of data and statistics packaged for consumption by the general public. It would, however, be more accurate to say that I started doing it and just couldn’t stop. In any case, here’s a little supplemental discussion of each of the 11 items: Read More »
There is an ongoing debate about widespread administration of standardized tests to kindergartners. This is of course a serious decision. My personal opinion about whether this is a good idea depends on several factors, such as how good the tests will be and, most importantly, how the results will be used (and I cannot say that I am optimistic about the latter).
Although the policy itself must be considered seriously on its merits, there is one side effect of testing kindergartners that fascinates me: It would demonstrate how absurd it is to judge school performance, as NCLB does, using absolute performance levels – i.e., how highly students score on tests, rather than their progress over time.
Basically, the kindergarten tests would inevitably shake out the same way as those administered in later grades. Schools and districts serving more disadvantaged students would score substantially lower than their counterparts in more affluent areas. If the scores were converted to proficiency rates or similar cut-score measures, they would show extremely low pass rates in urban districts such as Detroit. Read More »
Unlike many of my colleagues, I don’t have a negative view of the Gates Foundation’s education programs. Although I will admit that part of me is uneasy with the sheer amount of resources (and influence) they wield, and there are a few areas where I don’t see eye-to-eye with their ideas (or grantees), I agree with them on a great many things, and I think that some of their efforts, such as the Measures of Effective Teaching (MET) project, are important and beneficial (even if I found their packaging of the MET results a bit overblown).
But I feel obliged to say that I am particularly impressed with their recent announcement of support for a two-year delay on attaching stakes to the results of new assessments aligned with the Common Core. Granted, much of this is due to the fact that I think this is the correct policy decision (see my opinion piece with Morgan Polikoff). Independent of that, however, I think it took intellectual and political courage for them to take this stance, given their efforts toward new teacher evaluations that include test-based productivity measures.
The announcement was guaranteed to please almost nobody. Read More »
Over the past few years, there has been a regular flow of writing attempting to explain the increase in teacher attrition. Usually, these explanations come in the form of advocacy – that is, people who don’t like a given policy or policies assert that they are the reasons for the rise in teachers leaving. Putting aside that these arguments are usually little more than speculation, as well as the fact that they often rely on highly limited approaches to measuring attrition (e.g., teacher experience distributions), there is a prior question that must be addressed here: Is teacher attrition really increasing?
The short answer, at least at the national level and over the longer term, is yes, but, as usual, it’s more complicated than a simple yes/no answer.
Obviously, not all attrition is “bad,” as it depends on who’s leaving, but any attempt to examine levels of or trends in teacher attrition (leaving the profession) or mobility (switching schools) requires good data. When looking at individual districts, one often must rely on administrative datasets that make it very difficult to determine whether teachers left the profession entirely or simply moved to another district (though remember that whether teachers leave the profession or simply switch schools doesn’t really matter to individual schools, since they must replace the teachers regardless). In addition, the phenomenon of teachers leaving for a temporary period and then returning (e.g., after childbirth) is more common than many people realize. Read More »
A few weeks ago, I wrote a post that made a fairly simple point about the practice of expressing estimated charter effects on test scores as “days of additional learning”: Among the handful of states, districts, and multi-site operators that consistently have been shown to have a positive effect on testing outcomes, might not those “days of learning” be explained, at least in part, by the fact that they actually do offer additional days of learning, in the form of much longer school days and years?
That is, there is a small group of charter models/chains that seem to get good results. There are many intangible factors that make a school effective, but to the degree we can chalk this up to concrete practices or policies, additional time may be the most compelling possibility. Although it’s true that school time must be used wisely, it’s difficult to believe that the sheer amount of extra time that the flagship chains offer would not improve testing performance substantially.
To their credit, many charter advocates do acknowledge the potentially crucial role of extended time in explaining their success stories. And the research, tentative though it still is, is rather promising. Nevertheless, there are a few important points that bear repeating when it comes to the idea of massive amounts of additional time, particularly given the fact that there is a push to get regular public schools to adopt the practice. Read More »
A recent story in the Chicago Tribune notes that Illinois’ NCLB waiver plan sets lower targets for certain student subgroups, including minority and low-income students. This, according to the article, means that “Illinois students of different backgrounds no longer will be held to the same standards,” and goes on to quote advocates who are concerned that this amounts to lower expectations for traditionally lower-scoring groups of children.
The argument that expectations should not vary by student characteristics is, of course, valid and important. Nevertheless, as Chad Aldeman notes, the policy of setting different targets for different groups of students has been legally required since the enactment of NCLB, under which states must “give credit to lower-performing groups that demonstrate progress.” This was supposed to ensure, albeit with exceedingly crude measures, that schools weren’t punished due to the students they serve, and how far behind those students were upon entry into the schools.
I would take that a step further by adding two additional points. The first is quite obvious, and is mentioned briefly in the Tribune article, but too often is obscured in these kinds of conversations: Neither NCLB nor the waivers actually hold students to different standards. The cut scores above which students are deemed “proficient,” somewhat arbitrary though they may be, do not vary by student subgroup, or by any other factor within a given state. All students are held to the same exact standard. Read More »
A recent story in the New York Times reports that, according to an Obama Administration-commissioned panel, the measures being used to evaluate the performance of healthcare providers are unfairly penalizing those that serve larger proportions of disadvantaged patients (thanks to Mike Petrilli for sending me the article). For example, if you’re grading hospitals based on simple, unadjusted readmission rates, it might appear as if hospitals serving high-poverty populations are doing worse — even if the quality of their service is excellent — since readmissions are more likely for patients who can’t afford medication, or aren’t able to take off from work, or don’t have home support systems.
The panel recommended adjusting the performance measures, which, for instance, are used for Medicare reimbursement, using variables such as patient income and education, as this would provide a fairer accountability system – one that does not penalize healthcare institutions and their personnel for factors that are out of their control.
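As a minimal sketch of what such an adjustment looks like in practice (all numbers, and the single income covariate, are invented for illustration; the panel’s actual methodology is more involved): two hospitals providing identical care, one serving a poorer population, post different raw readmission rates, while a simple regression adjustment recovers the fact that quality is the same.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 5000

# Hypothetical data: hospital B serves a poorer patient population,
# which raises readmission risk independently of care quality.
hospital = rng.choice(["A", "B"], size=n)
income = rng.normal(60, 15, size=n) - 20 * (hospital == "B")
logit_p = -1.0 - 0.03 * income  # lower income -> higher readmission risk
readmit = (rng.random(n) < 1 / (1 + np.exp(-logit_p))).astype(int)

df = pd.DataFrame({"hospital": hospital, "income": income, "readmit": readmit})

# Raw rates make hospital B look worse...
print(df.groupby("hospital")["readmit"].mean())

# ...but adjusting for patient income shrinks the hospital "effect"
# toward zero, since (by construction) care quality is identical.
model = smf.logit("readmit ~ C(hospital) + income", data=df).fit(disp=0)
print(model.params)
```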
There are of course very strong, very obvious parallels here to education accountability policy, in which schools are judged in part based on raw proficiency rates that make no attempt to account for differences in the populations of students in different schools. The comparison also reveals an important feature of formal accountability systems in other policy fields. Read More »
At the end of February, the District of Columbia Council’s Education Committee held its annual hearing on the performance of the District’s Public Schools (DCPS). The hearing (full video is available here) lasted over four hours, and included discussion on a variety of topics, but there was, inevitably, a block of time devoted to the discussion of DCPS testing results (and these questions were the focus of the news coverage).
These exchanges between Council members and DCPS Chancellor Kaya Henderson focused particularly on the low-stakes Trial Urban District Assessment (TUDA).* Though it was all very constructive and not even remotely hostile, it’s fair to say that Ms. Henderson was grilled quite a bit (as is often the case at these kinds of hearings). Unfortunately, the arguments from both sides of the dais were fraught with the typical misinterpretations of TUDA, and I could not get past how tragic it is to see legislators question the superintendent of a large urban school district based on a misinterpretation of what the data mean – and to hear that superintendent respond based on the same flawed premises.
But what I really kept thinking — as I have before in similar contexts — was how effective Chancellor Henderson could have been in answering the Council’s questions had she chosen to interpret the data properly (and I still hold out hope that this will become the norm some day). So, let’s take a quick look at a few major arguments that were raised during the hearing, and how they might have been answered. Read More »
Anyone who follows education policy debates might hear the term “standard deviation” fairly often. Most people have at least some idea of what it means, but I thought it might be useful to lay out a quick, (hopefully) clear explanation, since it’s useful for the proper interpretation of education data and research (as well as research in other fields).
Many outcomes or measures, such as height or blood pressure, follow what’s called a “normal distribution.” Simply put, this means that such measures tend to cluster around the mean (or average), and taper off in both directions the further one moves away from the mean (due to its shape, this is often called a “bell curve”). In practice, and especially when samples are small, distributions are imperfect — e.g., the bell is messy or a bit skewed to one side — but in general, with many measures, there is clustering around the average.
Let’s use test scores as our example. Suppose we have a group of 1,000 students who take a test (scored 0-20). A simulated score distribution is presented in the figure below (called a “histogram”). Read More »
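The figure itself sits behind the link, but a minimal sketch of how such a simulation might be generated (the mean of 10 and spread of 3 are arbitrary choices for illustration, not the post’s actual parameters):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)

# Simulate 1,000 scores on a 0-20 test; the mean of 10 and standard
# deviation of 3 are invented for illustration
scores = np.clip(np.round(rng.normal(loc=10, scale=3, size=1000)), 0, 20)

print(f"mean = {scores.mean():.1f}, standard deviation = {scores.std():.1f}")

# With a roughly normal distribution, about two-thirds of students
# score within one standard deviation of the mean (here, 7 to 13)
plt.hist(scores, bins=np.arange(-0.5, 21.5, 1), edgecolor="black")
plt.xlabel("Test score (0-20)")
plt.ylabel("Number of students")
plt.title("Simulated score distribution")
plt.show()
```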
A couple of weeks ago, Michelle Rhee published an op-ed in the Washington Post speaking out against the so-called “opt out movement,” which encourages parents to refuse to let their children take standardized tests.
Personally, I oppose the “opt-out” phenomenon, but I also think it would be a mistake not to pay attention to its proponents’ fundamental issue – that standardized tests are potentially being misused and/or overused. This concern is legitimate and important. My sense is that “opting out” reflects a rather extreme version of this mindset, a belief that we cannot right the ship – i.e., we have gone so far and moved so carelessly with test-based accountability that there is no real hope that it can or will be fixed. This strikes me as a severe overreaction, but I understand the sentiment.
That said, while most of Ms. Rhee’s op-ed is the standard, reasonable fare, some of it is also laced with precisely the kind of misconceptions that contribute to the apprehensions not only of anti-testing advocates, but also among those of us who occupy a middle ground – i.e., favor some test-based accountability, but are worried about getting it right. Read More »
Last year, we published a post that included a very simple graphical illustration of what changes in cross-sectional proficiency rates or scores actually tell us about schools’ test-based effectiveness (basically nothing).
In reality, year-to-year changes in cross-sectional average rates or scores may reflect “real” improvement, at least to some degree, but, especially when measured at the school- or grade-level, they tend to be mostly error/imprecision (e.g., changes in the composition of the samples taking the test, measurement error and serious issues with converting scores to rates using cutpoints). This is why changes in scores often conflict with more rigorous indicators that employ longitudinal data.
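A minimal sketch of the composition issue alone (everything here is invented; no real school data): draw a new cohort each year from a population that never changes, and the school’s proficiency rate still bounces around.

```python
import numpy as np

rng = np.random.default_rng(7)
CUT = 0.0  # hypothetical proficiency cutoff, with scores in SD units

# One grade in a small school: 60 tested students per year, each cohort
# a fresh draw from the SAME unchanging population. There is no "real"
# improvement or decline anywhere in this simulation.
for year in range(1, 6):
    cohort = rng.normal(loc=0.0, scale=1.0, size=60)
    print(f"Year {year}: percent proficient = {(cohort >= CUT).mean():.0%}")

# The rate still swings by several points from year to year, purely
# from cohort-to-cohort sampling variation.
```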
In the aforementioned post, however, I wanted to show what the changes meant even if most of these issues disappeared magically. In this one, I would like to extend this very simple illustration, as doing so will hopefully help shed a bit more light on the common (though mistaken) assumption that effective schools or policies should generate perpetual rate/score increases. Read More »
One of the purely presentational aspects that separates the new “generation” of CREDO charter school analyses from the old is that the more recent reports convert estimated effect sizes from standard deviations into a “days of learning” metric. You can find similar approaches in other reports and papers as well.
I am very supportive of efforts to make interpretation easier for those who aren’t accustomed to thinking in terms of standard deviations, so I like the basic motivation behind this. I do have concerns about this particular conversion — specifically, that it overstates things a bit — but I don’t want to get into that issue here. If we just take CREDO’s “days of learning” conversion at face value, my primary, far simpler reaction to hearing that a given charter school sector’s impact is equivalent to a given number of additional “days of learning” is to wonder: Does this charter sector actually offer additional “days of learning,” in the form of longer school days and/or years?
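For readers who haven’t seen the metric, here is a back-of-the-envelope sketch of how such a conversion works. The benchmark of roughly 0.25 standard deviations of test score growth per 180-day school year is an assumption for illustration, not CREDO’s exact figure (their reports derive and document their own conversion factors):

```python
# Back-of-the-envelope "days of learning" conversion. The benchmark
# below -- roughly 0.25 SD of growth per 180-day school year -- is an
# assumption for illustration; CREDO's reports use their own factors.
SD_PER_YEAR = 0.25
DAYS_PER_YEAR = 180

def days_of_learning(effect_size_sd: float) -> float:
    """Convert an effect size in standard deviations to 'days of learning'."""
    return effect_size_sd / SD_PER_YEAR * DAYS_PER_YEAR

# For example, an estimated charter effect of 0.05 SD:
print(days_of_learning(0.05))  # 36.0 "days of learning"
```

The question above, then: if a sector’s estimated effect works out to a few dozen “days,” and its schools actually provide a few dozen days’ worth of extra instructional time each year, the result looks rather less mysterious.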
This matters to me because I (and many others) have long advocated moving past the charter versus regular public school “horserace” and trying to figure out why some charters seem to do very well and others do not. Additional time is one of the more compelling observable possibilities, and while the two are not perfectly comparable, extra instructional time fits nicely with the “days of learning” expression of effect sizes. Take New York City charter schools, for example. Read More »
A few months ago, the U.S. Department of Education (USED) released the latest data from schools that received grants via the School Improvement Grants (SIG) program. These data — consisting solely of changes in proficiency rates — were widely reported as an indication of “disappointing” or “mixed” results. Some even went so far as proclaiming the program a complete failure.
Once again, I have to point out that this breaks almost every rule of testing data interpretation and policy analysis. I’m not going to repeat the arguments about why changes in cross-sectional proficiency rates are not policy evidence (see our posts here, here and here, or examples from the research literature here, here and here). Suffice it to say that the changes themselves are not even particularly good indicators of whether students’ test-based performance in these schools actually improved, to say nothing of whether it was the SIG grants that were responsible for the changes. There’s more to policy analysis than subtraction.
So, in some respects, I would like to come to the defense of Secretary Arne Duncan and USED right now – not because I’m a big fan of the SIG program (I’m ambivalent at best), but rather because I believe in strong, patient policy evaluation, and these proficiency rate changes are virtually meaningless. Unfortunately, however, USED was the first to portray, albeit very cautiously, rate changes as evidence of SIG’s impact. In doing so, they provided a very effective example of why relying on bad evidence is a bad idea even if it supports your desired conclusions. Read More »