In the Washington Post, Emma Brown reports on a behind-the-scenes decision about how to score last year’s new, more difficult tests in the District of Columbia Public Schools (DCPS) and the District’s charter schools.
To make a long story short, the choice faced by the Office of the State Superintendent of Education, or OSSE, which oversees testing in the District, was about how to convert test scores into proficiency rates. The first option, put simply, was to convert them such that the proficiency bar was more “aligned” with the Common Core, thus resulting in lower aggregate proficiency rates in math, compared with last year’s (in other states, such as Kentucky and New York, rates declined markedly). The second option was to score the tests while “holding constant” the difficulty of the questions, in order to facilitate comparisons of aggregate rates with those from previous years.
OSSE chose the latter option (according to some, in a manner that was insufficiently transparent). The end result was a modest increase in proficiency rates (which DC officials absurdly called “historic”).
I don’t have a particularly strong opinion about how OSSE should have proceeded. I can see it going either way. Setting cut scores is as much a political and human judgment call as anything else.
The controversy surrounding this decision, however, is both ironic and instructive. My understanding is that the actual scale scores are comparable between years (albeit imperfectly), even with the changed tests. The issue, again, is where to set the minimum score above which students are called advanced, proficient, etc., which in turn determines the rates for each of these designations (with proficiency rates getting the most attention). Unfortunately, OSSE doesn’t report anything except the rates. This severely limits the value of their annual testing results as reported to the public.
In other words, much of this controversy is about choosing between two cut score configurations, both of which present the data in an incomplete manner. Changes in the actual average scores, though still cross-sectional and therefore not “progress” measures, arguably provide a better sense of the change in performance of the typical student than either of the cut score proposals that are on the table. Now, to be clear, OSSE would have had to make the cut score decision regardless — i.e., for the purposes of their accountability systems — and the rates can be useful too (particularly for looking at performance in a given year).
Nevertheless, had the scale scores been available, the public would not have been forced to rely solely on the rates, which depend heavily on the specification of cut scores. This whole situation is a perfect illustration of why OSSE should release scores and rates every year.
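To make the point concrete, here is a minimal sketch of how a single set of scale scores can yield very different proficiency rates depending on where the bar is set. Everything in it is hypothetical: the scores, the two cut scores (`OLD_CUT`, `NEW_CUT`), and the `proficiency_rate` helper are invented for illustration and are not OSSE's data or method.

```python
from statistics import mean

# Hypothetical scale scores for two different cohorts (NOT real DC data).
scores_last_year = [38, 42, 45, 47, 49, 51, 53, 56, 60, 64]
scores_this_year = [40, 44, 46, 49, 50, 52, 55, 58, 61, 66]

OLD_CUT = 50  # hypothetical bar, "held constant" across years
NEW_CUT = 56  # hypothetical higher bar


def proficiency_rate(scores, cut_score):
    """Share of students scoring at or above the cut score."""
    return sum(s >= cut_score for s in scores) / len(scores)


rate_last = proficiency_rate(scores_last_year, OLD_CUT)
rate_this_old_cut = proficiency_rate(scores_this_year, OLD_CUT)
rate_this_new_cut = proficiency_rate(scores_this_year, NEW_CUT)

# The change in the average score does not depend on either cut score.
mean_change = mean(scores_this_year) - mean(scores_last_year)

print(f"Last year, old cut: {rate_last:.0%}")
print(f"This year, old cut: {rate_this_old_cut:.0%}")
print(f"This year, new cut: {rate_this_new_cut:.0%}")
print(f"Change in average scale score: {mean_change:+.1f}")
```

In this made-up example, the same set of this-year scores looks like an increase under one cut score and a steep decline under the other, while the change in the average score is the same either way. That is the sense in which reporting only rates leaves the public dependent on the cut score choice.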
But the other justification for the decision was described in the story as follows:
“The OSSE made an unwritten commitment years ago to maintain that trend line as a way to judge progress and the effectiveness of reform efforts, said Jeffrey Noel, who oversees testing at the agency.”
These types of statements are truly disturbing. In short, to whatever degree the decision was based on this “commitment,” it was based on false premises.
As we’ve discussed countless times, changes in cross-sectional proficiency rates are not measures of “progress” (they don’t follow the same group of students over time), nor are they valid as evidence that any policy or policies are “effective.” And the situation is even shakier for DCPS, where the student population is highly mobile, both for “natural reasons” (e.g., moving) and due to charter school transfers.
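A toy illustration of why mobility matters, assuming entirely hypothetical student records and a hypothetical cut score (`CUT`): the year-to-year change in the proficiency rate compares whoever happened to be tested in each year, so it can rise even when the students who stayed did not improve.

```python
CUT = 50  # hypothetical proficiency cut score

# Hypothetical records: student id -> scale score (NOT real data).
year1 = {"a": 44, "b": 48, "c": 52, "d": 55, "e": 41}
# Students d and e left; students f and g arrived.
year2 = {"a": 45, "b": 47, "c": 51, "f": 58, "g": 60}


def rate(scores):
    """Share of scores at or above the cut score."""
    return sum(s >= CUT for s in scores) / len(scores)


# Cross-sectional: compares two different groups of students.
cross_sectional_change = rate(year2.values()) - rate(year1.values())

# Longitudinal: follows only the students tested in both years.
stayers = sorted(year1.keys() & year2.keys())
avg_gain_stayers = sum(year2[s] - year1[s] for s in stayers) / len(stayers)

print(f"Change in proficiency rate: {cross_sectional_change:+.0%}")
print(f"Average score gain among stayers: {avg_gain_stayers:+.2f}")
```

Here the rate goes up by 20 percentage points solely because of who entered and who left, while the three students present in both years lost a third of a point on average. Even the longitudinal figure, of course, says nothing by itself about whether any policy caused the change.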
Every year, DC officials, as well as their counterparts in other states and districts, make these kinds of claims. And every year they are wrong in doing so. The reforms may be working, or they may not be. This is not evidence either way.*
And the same would have been true had OSSE decided to use the alternative cut scores, which, again, would have required them to report lower proficiency rates in math. No doubt this would have resulted in more public outcry that the DC reforms are a failure.
Such an outcry would have been both unfair and inaccurate even if the tests hadn’t changed, but this year, it would have been especially ridiculous. If the tests are scored with higher standards, then aggregate results are going to look worse than they would had the standards remained constant. And, in either case, it is not valid policy evidence.
In other words, DC officials seem to be trying to evaluate their reforms using inappropriate evidence, and, perhaps, avoid criticism that is no less inappropriate.
Until the general public understands that cross-sectional proficiency rate changes are truly poor measures of student progress, and that raw changes, whether cross-sectional or longitudinal, cannot tell us much about whether policies are “working,” officials and advocates on both “sides” will keep firing empirical blanks at each other, in a battle that neither can actually win.
- Matt Di Carlo
* It bears mentioning that the rates have been largely flat for three years, and increased only modestly this year. Thus, by OSSE’s own standard of “evidence,” the conclusion would have to be that the reforms are failing. Fortunately for them, this is not a valid approach.