When Checking Under The Hood Of Overall Test Score Increases, Use Multiple Tools

Posted by on February 24, 2014

When looking at changes in testing results between years, many people are (justifiably) interested in comparing those changes for different student subgroups, such as those defined by race/ethnicity or income (subsidized lunch eligibility). The basic idea is to see whether increases are shared between traditionally advantaged and disadvantaged groups (and, often, to monitor achievement gaps).

Sometimes, people take this a step further by using the subgroup breakdowns as a crude check on whether cross-sectional score changes are due to changes in the sample of students taking the test. The logic is as follows: If increases are found among both more advantaged and more disadvantaged cohorts, then an overall increase cannot be attributed to a change in the backgrounds of students taking the test, as the subgroups exhibited the same pattern. (For reasons discussed here many times before, this is a severely limited approach.)
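To make the arithmetic behind this check concrete, here is a minimal sketch in Python. Every number in it is invented for illustration (none are TUDA or NAEP results), and the function and variable names are hypothetical. It splits an overall change in average scores into a "within-group" part (subgroup averages changing) and a "composition" part (the mix of test takers changing), and shows a case in which the overall average rises even though neither subgroup's average moves.

```python
def overall_mean(shares, means):
    """Weighted average score, given subgroup shares (summing to 1) and subgroup means."""
    return sum(shares[g] * means[g] for g in shares)


def decompose_change(shares_0, means_0, shares_1, means_1):
    """Split the overall change between two years into two parts.

    within:      change in subgroup means, holding shares at their first-year values.
    composition: change in subgroup shares, evaluated at second-year subgroup means.
    The two parts sum to the overall change.
    """
    within = sum(shares_0[g] * (means_1[g] - means_0[g]) for g in shares_0)
    composition = sum((shares_1[g] - shares_0[g]) * means_1[g] for g in shares_0)
    return within, composition


# Hypothetical district: subgroup averages are flat, but the share of
# lunch-eligible test takers falls from 80 to 70 percent.
shares_0 = {"frl": 0.8, "non_frl": 0.2}
shares_1 = {"frl": 0.7, "non_frl": 0.3}
means_0 = {"frl": 200, "non_frl": 240}
means_1 = {"frl": 200, "non_frl": 240}

change = overall_mean(shares_1, means_1) - overall_mean(shares_0, means_0)
within, composition = decompose_change(shares_0, means_0, shares_1, means_1)
print(round(change, 6), round(within, 6), round(composition, 6))
```

In this made-up case the overall average rises by about four points, all of it from the shift in who is tested. The subgroup check would catch that. What it cannot catch is the same kind of shift happening inside a coarse category such as FRL eligibility, which is one reason the approach is so limited.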

Whether testing data are cross-sectional or longitudinal, these subgroup breakdowns are certainly important and necessary, but it’s wise to keep in mind that standard variables, such as eligibility for free and reduced-price lunches (FRL), are imperfect proxies for student background (actually, FRL rates aren’t even such a great proxy for income). In fact, one might reach different conclusions depending on which variables are chosen. To illustrate this, let’s take a look at results from the Trial Urban District Assessment (TUDA) for the District of Columbia Public Schools between 2011 and 2013, in which there was a large overall score change that received a great deal of media attention, and break the changes down by different characteristics.

Read More »

Select Your Conclusions, Apply Data

Posted by on February 19, 2014

The recent release of the National Assessment of Educational Progress (NAEP) and the companion Trial Urban District Assessment (TUDA) was predictably exploited by advocates to argue for their policy preferences. This is a blatant misuse of the data for many reasons that I have discussed here many times before, and I will not repeat them.

I do, however, want to very quickly illustrate the emptiness of this pseudo-empirical approach – finding cross-sectional cohort increases in states/districts that have recently enacted policies you support, and then using the increases as evidence that the policies “work.” For example, the recent TUDA results for the District of Columbia Public Schools (DCPS), where scores increased in all four grade/subject combinations, were immediately seized upon by supporters of the reforms that have been enacted by DCPS as clear-cut evidence of the policy triumph. The celebrators included the usual advocates, but also DCPS Chancellor Kaya Henderson and the U.S. Secretary of Education Arne Duncan (there was even a brief mention by President Obama in his State of the Union speech).

My immediate reaction to this bad evidence was simple (though perhaps slightly juvenile) – find a district that had similar results under a different policy environment. It was, as usual, pretty easy: Los Angeles Unified School District (LAUSD). Read More »

Being Kevin Huffman

Posted by on December 11, 2013

In a post earlier this week, I noted how several state and local education leaders, advocates and especially the editorial boards of major newspapers used the recently released NAEP results inappropriately – i.e., to argue that recent reforms in states such as Tennessee and D.C. are “working.” I also discussed how this illustrates a larger phenomenon in which many people seem to expect education policies to generate immediate, measurable results in terms of aggregate student test scores, which I argued is both unrealistic and dangerous.

Mike G. from Boston, a friend whose comments I always appreciate, agrees with me, but asks a question that I think gets to the pragmatic heart of the matter. He wonders whether individuals in high-level education positions have any alternative. For instance, Mike asks, what would I suggest to Kevin Huffman, who is the head of Tennessee’s education department? Insofar as Huffman’s opponents “would use any data…to bash him if it’s trending down,” would I advise him to forgo using the data in his favor when they show improvement?*

I have never held any important high-level leadership positions. My political experience and skills are (and I’m being charitable here) underdeveloped, and I have no doubt many more seasoned folks in education would disagree with me. But my answer is: Yes, I would advise him to forgo using the data in this manner. Here’s why. Read More »

NAEP And Public Investment In Knowledge

Posted by on August 8, 2013

As reported over at Education Week, the so-called “sequester” has claimed yet another victim: The National Assessment of Educational Progress, or NAEP. As most people who follow education know, this highly respected test, which is often called the “nation’s report card,” is a very useful means of assessing student performance, both in any given year and over time.

Two of the “main assessments” – i.e., those administered in math and reading every two years to fourth and eighth graders – get most of the attention in our public debate, and these remain largely untouched by the cuts. But, last May, the National Assessment Governing Board, which oversees NAEP, decided to eliminate the 2014 NAEP exams in civics, history and geography for all but 8th graders (the exams were previously administered in grades 4, 8 and 12). Now, in its most recent announcement, the Board has decided to cancel its plans to expand the sample for 12th graders (in math, reading, and science) to make it large enough to allow state-level results. In addition, the 4th and 8th grade science samples will be cut back, making subgroup breakdowns very difficult, and the science exam will no longer be administered to individual districts. Finally, the “long-term trend NAEP,” which has tracked student performance for 40 years, has been suspended for 2016. These are substantial cutbacks.

Although its results are frequently misinterpreted, NAEP is actually among the few standardized tests in the U.S. that receive rather wide support from all “sides” of the testing debate. And one cannot help but notice the fact that federal and state governments are currently making significant investments in new tests that are used for high-stakes purposes, whereas NAEP, the primary low-stakes assessment, is being scaled back. Read More »

The Ever-Changing NAEP Sample

Posted by on July 3, 2013

The results of the latest National Assessment of Educational Progress long-term trend tests (NAEP-LTT) were released last week. The data compare the reading and math scores of 9-, 13- and 17-year-olds at various points since the early 1970s. This is an important way to monitor how these age cohorts’ performance changes over the long term.

Overall, there is ongoing improvement in scores among 9- and 13-year-olds, in reading and especially math, though the trend is inconsistent and increases have been somewhat slow in recent years. The scores for 17-year-olds, in contrast, are relatively flat.

These data, of course, are cross-sectional – i.e., they don’t follow students over time, but rather compare children in the three age groups with their predecessors from previous years. This means that changes in average scores might be driven by differences, observable or unobservable, between cohorts. One of the simple graphs in this report, which doesn’t present a single test score, illustrates that rather vividly. Read More »

Which State Has “The Best Schools?”

Posted by on October 17, 2012

** Reprinted here in the Washington Post

I’ve written many times about how absolute performance levels – how highly students score – are not by themselves valid indicators of school quality, since, most basically, they don’t account for the fact that students enter the schooling system at different levels. One of the most blatant (and common) manifestations of this mistake is when people use NAEP results to determine the quality of a state’s schools.

For instance, you’ll often hear that Massachusetts has the “best” schools in the U.S. and Mississippi the “worst,” with both claims based solely on average scores on the NAEP (though, technically, Massachusetts public school students’ scores are statistically tied with at least one other state on two of the four main NAEP exams, while Mississippi’s rankings vary a bit by grade/subject, and its scores are also not statistically different from several other states’).

But we all know that these two states are very different in terms of basic characteristics such as income, parental education, etc. Any assessment of educational quality, whether at the state or local level, is necessarily complicated, and ignoring differences between students precludes any meaningful comparisons of school effectiveness. Schooling quality is important, but it cannot be assessed by sorting and ranking raw test scores in a spreadsheet.

Read More »

Guessing About NAEP Results

Posted by on February 15, 2012

Every two years, the release of data from the National Assessment of Educational Progress (NAEP) generates a wave of research and commentary trying to explain short- and long-term trends. For instance, there have been a bunch of recent attempts to “explain” an increase in aggregate NAEP scores during the late 1990s and 2000s. Some analyses postulate that the accountability provisions of NCLB were responsible, while more recent arguments have focused on the “effect” (or lack thereof) of newer market-based reforms – for example, looking to NAEP data to “prove” or “disprove” the idea that changes in teacher personnel and other policies have (or have not) generated “gains” in student test scores.

The basic idea here is that, for every increase or decrease in cross-sectional NAEP scores over a given period of time (both for all students and especially for subgroups such as minority and low-income students), there must be “something” in our education system that explains it. In many (but not all) cases, these discussions consist of little more than speculation. Discernible trends in NAEP test score data are almost certainly due to a combination of factors, and it’s unlikely that one policy or set of policies is dominant enough to be identified as “the one.” Now, there’s nothing necessarily wrong with speculation, so long as it is clearly identified as such, and conclusions presented accordingly. But I find it curious that some people involved with these speculative arguments seem a bit too willing to assume that schooling factors – rather than changes in cohorts’ circumstances outside of school – are the primary driver of NAEP trends.

So, let me try a little bit of illustrative speculation of my own: I might argue that changes in the economic conditions of American schoolchildren and their families are the most compelling explanation for changes in NAEP. Read More »

NAEP Shifting

Posted by on October 31, 2011

** Also posted here on “Valerie Strauss’ Answer Sheet” in the Washington Post

Tomorrow, the education world will get the results of the 2011 National Assessment of Educational Progress (NAEP), often referred to as the “nation’s report card.” The findings – reading and math scores among a representative sample of fourth and eighth graders – will drive at least part of the debate for the next two years, until the next round comes out.

I’m going to make a prediction, one that is definitely a generalization, but is hardly uncommon in policy debates: People on all “sides” will interpret the results favorably no matter how they turn out.

If NAEP scores are positive – i.e., overall scores rise by a statistically significant margin, and/or there are encouraging increases among key subgroups such as low performers or low-income students – supporters of market-based reform will say that their preferred policies are working. They’ll claim that the era of test-based accountability, which began with the enactment of No Child Left Behind ten years ago, has produced real results. Market reform skeptics, on the other hand, will say that virtually none of the policies for which reformers are pushing, such as test-based teacher evaluations and merit pay, were in force in more than a handful of locations between 2009 and 2011. Therefore, they’ll claim, the NAEP progress shows that the system is working without these changes.

If the NAEP results are not encouraging – i.e., overall progress is flat (or negative), and there are no strong gains among key subgroups – the market-based crowd will use the occasion to argue that the “status quo” isn’t producing results, and they will strengthen their call for policies like new evaluations and merit pay. Skeptics, in contrast, will claim that NCLB and standardized test-based accountability were failures from the get-go. Some will even use the NAEP results to advocate for the wholesale elimination of standardized testing. Read More »

The Legend Of Last Fall

Posted by on February 15, 2011

The subject of Michelle Rhee’s teaching record has recently received a lot of attention. While the controversy has been interesting, it could also be argued that it’s relatively unimportant. The evidence that she exaggerated her teaching prowess is, after all, inconclusive (though highly suggestive). A little resume inflation from a job 20 years ago might be overlooked, so long as Rhee’s current claims about her more recent record are accurate. But are they?

On Rhee’s new website, her official bio – in effect, her resume today (or at least her cover letter) – contains a few sentences about her record as chancellor of D.C. Public Schools (DCPS), under the header “Driving Unprecedented Growth in the D.C. Public Schools.” There, her test-based accomplishments are characterized as follows:

Under her leadership, the worst performing school district in the country became the only major city system to see double-digit growth in both their state reading and state math scores in seventh, eighth and tenth grades over three years.

This time, we can presume that the statement has been vetted thoroughly, using all the tools of data collection and analysis available to Rhee during her tenure at the helm of DCPS.

But the statement is false. Read More »

Michelle Rhee’s Testing Legacy: An Open Question

Posted by on November 1, 2010

** Also posted here on “Valerie Strauss’ Answer Sheet” in the Washington Post.

Michelle Rhee’s resignation and departure have, predictably, provoked a flurry of conflicting reactions. Yet virtually all of them, from opponents and supporters alike, seem to assume that her tenure at the helm of the D.C. Public Schools (DCPS) helped to boost student test scores dramatically. She and D.C. Mayor Adrian Fenty made similar claims themselves in the Wall Street Journal (WSJ) just last week.

Hardly anybody, regardless of their opinion about Michelle Rhee, thinks that test scores alone are an adequate indicator of student success. But, in no small part because of her own emphasis on them, that is how this debate has unfolded. Her aim was to raise scores and, with few exceptions (also here and here), even those who objected to her “abrasive” style and controversial policies seem to believe that she succeeded wildly in the testing area.

This conclusion is premature. A review of the record shows that Michelle Rhee’s test score “legacy” is an open question. 

There are three main points to consider: Read More »

Teacher Contracts: The Phantom Menace

Posted by on October 13, 2010

In a previous post, I presented a simple tabulation of NAEP scores by whether or not states had binding teacher contracts.  The averages indicate that states without such contracts (which are therefore free of many of the “ill effects” of teachers’ unions) are among the lowest performers in the nation on all four NAEP exams. 

The post was largely a response to the constant comparisons of U.S. test scores with those of other nations (usually in the form of rankings), which make absolutely no reference to critical cross-national differences, most notably in terms of poverty/inequality (nor to the methodological issues surrounding test score comparisons). Using the same standard by which these comparisons show poor U.S. performance versus other nations, I “proved” that teacher contracts have a positive effect on states’ NAEP scores.

As I indicated at the end of that post, however, the picture is of course far more complicated. Dozens of factors – many of them unmeasurable – influence test scores, and simple averages mask them all. Still, given the fact that NAEP is arguably the best exam in the U.S. – and is the only one administered to a representative sample of all students across all states (without the selection bias of the SAT/ACT/AP) – it is worth revisiting this issue briefly, using tools that are a bit more sophisticated. If teachers’ contracts are to blame for low performance in the U.S., then when we control for core student characteristics, we should find that the contracts’ presence is associated with lower performance.  Let’s take a quick look. Read More »
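For readers who want to see what "controlling for" a student characteristic means mechanically, here is a minimal sketch in Python. It is not the analysis from the post: the six "states," their poverty rates, and their scores are all invented, and the helper `ols` is a hypothetical, bare-bones least-squares routine written only for this illustration. The data are built so that scores depend on poverty alone, and so that contract states happen to have lower poverty.

```python
def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    X is a list of rows, each starting with 1.0 for the intercept.
    y is a list of outcomes. Returns the list of coefficients b.
    """
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Invented data: (has_contract, poverty_rate, average_score).
# Scores are constructed as 300 - 2 * poverty_rate, with no contract effect.
rows = [(1, 10, 280), (1, 15, 270), (1, 20, 260),
        (0, 20, 260), (0, 25, 250), (0, 30, 240)]

with_contract = [s for c, _, s in rows if c == 1]
without_contract = [s for c, _, s in rows if c == 0]
raw_gap = (sum(with_contract) / len(with_contract)
           - sum(without_contract) / len(without_contract))

X = [[1.0, float(c), float(p)] for c, p, _ in rows]
y = [float(s) for _, _, s in rows]
coef = ols(X, y)  # [intercept, contract coefficient, poverty coefficient]

print("raw gap:", raw_gap)
print("contract coefficient, controlling for poverty:", round(coef[1], 6))
```

In this made-up example the simple averages show contract states 20 points ahead, while the contract coefficient is approximately zero once poverty enters the model. That outcome is baked into the invented numbers and says nothing about what the real NAEP data show. The point is only that a raw difference in averages and an adjusted difference can diverge sharply.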

Performance-Enhancing Teacher Contracts?

Posted by on October 1, 2010

** Also posted here on “Valerie Strauss’ Answer Sheet” in the Washington Post.

Please check out the “sequel” to this post, which includes a multivariate analysis, and, more importantly, two other posts (here and here), which present summaries and discussions of the actual evidence on the relationship between unions and test scores.

For years, some people have been determined to blame teachers’ unions for all that ails public education in America. This issue has been around a long time (see here and here), but, given the tenor of the current debate, it seems to bear rehashing. According to this view, teachers’ unions negatively affect student achievement primarily through the mechanism of the collective bargaining agreement, or contract. These contracts are thought to include “harmful” provisions, such as seniority-based layoffs and unified salary schedules that give raises based on experience and education rather than performance.

But a fairly large proportion of public school teachers are not covered under legally binding contracts. In fact, there are ten states in which there are no legally binding K-12 teacher contracts at all (AL, AZ, AR, GA, LA, MS, NC, SC, TX, and VA). Districts in a few of these states have entered into what are called “meet and confer” agreements about salary, benefits, and other working conditions, but administrators have the right to break these agreements at will. For all intents and purposes, these states are free of many of the alleged “negative union effects.”

Here’s a simple proposition: If teacher union contracts are the problem, then we should expect to see higher achievement outcomes in the ten states where there are no binding teacher contracts.

So, let’s take a quick look at how states with no contracts compare with the states that have them. Read More »


This web site and the information contained herein are provided as a service to those who are interested in the work of the Albert Shanker Institute (ASI). ASI makes no warranties, either express or implied, concerning the information contained on or linked from shankerblog.org. The visitor uses the information provided herein at his/her own risk. ASI, its officers, board members, agents, and employees specifically disclaim any and all liability for damages which may result from the utilization of the information provided herein. The content on shankerblog.org may not necessarily reflect the views or official policy positions of ASI or any related entity or organization.

Banner image adapted from 1975 photograph by Jennie Shanker, daughter of Albert Shanker.