The Education Reporter's Dilemma

I’ve written so many posts about the misinterpretation of testing data in news stories that I’m starting to annoy myself. For example, I’ve shown that year-to-year changes in testing results might be attributable to the fact that, each year, a different set of students takes the test. I’ve discussed the fact that proficiency rates are not test scores – they only tell you the proportion of students above a given line – and that the rates and actual scores can move in opposite directions. And I’ve pleaded with journalists, most of whom I like and respect, to write with care about these issues (and, I should note, many of them do so).
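The point about rates and scores moving in opposite directions is easy to see with a toy example. The Python sketch below uses five made-up scores for each of two years and an arbitrary cutoff of 300. None of these numbers come from any real test, and counting a score "at or above" the cutoff as proficient is an assumption. The second year's group has more students clustered just above the line but fewer high scorers, so the proficiency rate rises while the average falls.

```python
from statistics import mean

# Hypothetical cutoff and scores, invented purely for illustration.
CUTOFF = 300

year1 = [250, 290, 295, 310, 400]
year2 = [240, 280, 301, 302, 330]


def proficiency_rate(scores, cutoff=CUTOFF):
    """Share of students scoring at or above the cutoff (assumed convention)."""
    return sum(s >= cutoff for s in scores) / len(scores)


for label, scores in [("Year 1", year1), ("Year 2", year2)]:
    # Report both the pass rate and the average score for each year.
    print(f"{label}: rate = {proficiency_rate(scores):.0%}, mean = {mean(scores):.1f}")
# Year 1: rate = 40%, mean = 309.0
# Year 2: rate = 60%, mean = 290.6
```

A story reporting only the rate would say performance improved; one reporting only the average would say it declined.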

Yet here I am, back on my soapbox again. This time the culprit is the recent release of SAT testing data, which generated dozens of error-plagued stories from newspapers and organizations. Like virtually all public testing data, the SAT results are cross-sectional – each year, the test is taken by a different group of students. This means that demographic changes in the sample of test takers influence the results. This problem is even more acute in the case of the SAT, since taking it is voluntary. Despite the best efforts of the College Board in its press release, a slew of stories improperly equated the decline in average SAT scores since the previous year with an overall decline in student performance – a confirmation of educational malaise (in fairness, there were many exceptions).
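The composition problem can be illustrated the same way. The sketch below assumes two hypothetical groups of test takers ("Group A" and "Group B") with invented counts and average scores; nothing here reflects actual SAT data. Each group's average rises by 10 points, but because the lower-scoring group makes up a much larger share of test takers in the second year, the overall average drops.

```python
# Hypothetical groups of test takers: (number of test takers, average score).
# All figures are invented for illustration only.
year1 = {"Group A": (800, 1050), "Group B": (200, 900)}
year2 = {"Group A": (700, 1060), "Group B": (500, 910)}


def overall_average(groups):
    """Average score across all test takers, weighting each group by its size."""
    total_takers = sum(n for n, _ in groups.values())
    total_points = sum(n * avg for n, avg in groups.values())
    return total_points / total_takers


for name in year1:
    # Every group's own average improves between the two years.
    print(f"{name}: {year1[name][1]} -> {year2[name][1]}")
print(f"Overall: {overall_average(year1):.1f} -> {overall_average(year2):.1f}")
# Group A: 1050 -> 1060
# Group B: 900 -> 910
# Overall: 1020.0 -> 997.5
```

Reading the overall decline as a drop in student performance would get this case exactly backwards: the change is driven entirely by who took the test.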

I’ve come to think that there’s a fundamental problem here: When you interpret testing data properly, you don’t have much of a story.

For instance, over the summer, DCPS released its annual testing results, which showed that proficiency rates were basically flat compared with the previous year. Of course, district officials spun the results in a positive light by concentrating on the change in rates for seventh and eighth graders, which increased slightly, as well as on the longer-term trend in rates since 2007 (New York City officials made a similarly narrow, misleading argument about their results). Some mainstream journalists noted that the “scores” were largely flat, while others presented the longer-term trend that the district was pushing. And there was, of course, a blizzard of commentary from all corners about what the results meant for Michelle Rhee, the new DC teacher evaluation system, the city’s cheating scandal and other factors.

Here’s the basic gist of how I would have reported the story:

A slightly larger proportion of DCPS elementary school students scored above the proficiency cutoff this year compared with last year, while the rates for secondary school students declined slightly. The degree to which these changes reflect “real” performance shifts – positive or negative – remains unclear. For one thing, proficiency rates can increase while actual average test scores stay flat or even go down, and vice-versa. DCPS does not release its actual scores, so it’s impossible to tell how the performance of the “typical student” changed between years. In addition, even if there were score increases, they wouldn’t necessarily reflect real improvement, since these data do not account for whether the group of students taking the test this year is significantly different from those taking it last year in terms of key characteristics, such as income or parental involvement. This is especially true in DCPS, where student mobility is unusually high. In short, this year’s test results provide a snapshot of how many students are above different cutoff points, and the percent scoring above the district’s proficiency standards remains quite low. But the results are, at best, inconclusive evidence regarding whether student performance improved since last year, and they certainly cannot be used to demonstrate the effects of any particular policy or individual.

I think this demonstrates two things. First, I would make a terrible journalist. Second, when you properly interpret most public testing data, you have a story that is unlikely to strike the average reader as interesting. I’m sure a better writer could spice this up (especially in a district that provides scale scores and not just pass rates), but in the end, it’s more of a statement about what we don’t know than what we do.

It’s far more compelling to write definitive stories about how scores “went up” or “went down”; offer various interpretations as to why this might have been the case; speculate on the credit or blame due to the various interested parties; and describe what it means in the context of the scandals and personalities du jour. But it’s just not correct.

Put simply, when it comes to the public discussion of most test results, we have two choices: Boring and accurate or exciting and misinterpreted. I vote for the former.

- Matt Di Carlo
