Some of the best research out there is a product not of sophisticated statistical methods or complex research designs, but rather of painstaking manual data collection. A good example is a recent paper by Morgan Polikoff, Andrew McEachin, Stephani Wrabel and Matthew Duque, which was published in the latest issue of the journal Educational Researcher.
Polikoff and his colleagues performed a task that makes most of the rest of us cringe: They read and coded every one of the over 40 state applications for ESEA flexibility, or “waivers.” The end product is a simple but highly useful presentation of the measures states are using to identify “priority” (low-performing) and “focus” (schools “contributing to achievement gaps”) schools. The results are disturbing to anyone who believes that strong measurement should guide educational decisions.
There’s plenty of great data and discussion in the paper, but consider just one central finding: How states are identifying priority (i.e., lowest-performing) schools at the elementary level (the measures are of course a bit different for secondary schools). Read More »
I write often (probably too often) about the difference between measures of school performance and student performance, usually in the context of school rating systems. The basic idea is that schools cannot control the students they serve, and so absolute performance measures, such as proficiency rates, tell you more about the students a school or district serves than about how effective it is in improving outcomes (which is better captured by growth-oriented indicators).
Recently, I was asked a simple question: Can a school with very high absolute performance levels ever actually be considered a “bad school”?
This is a good question. Read More »
In the Washington Post, Emma Brown reports on a behind-the-scenes decision about how to score last year’s new, more difficult tests in the District of Columbia Public Schools (DCPS) and the District’s charter schools.
To make a long story short, the choice faced by the Office of the State Superintendent of Education, or OSSE, which oversees testing in the District, was about how to convert test scores into proficiency rates. The first option, put simply, was to convert them such that the proficiency bar was more “aligned” with the Common Core, thus resulting in lower aggregate proficiency rates in math, compared with last year’s (in other states, such as Kentucky and New York, rates declined markedly). The second option was to score the tests while “holding constant” the difficulty of the questions, in order to facilitate comparisons of aggregate rates with those from previous years.
OSSE chose the latter option (according to some, in a manner that was insufficiently transparent). The end result was a modest increase in proficiency rates (which DC officials absurdly called “historic”). Read More »
A couple of weeks ago, Mike Petrilli of the Fordham Institute made the case that absolute proficiency rates should not be used as measures of school effectiveness, as they are heavily dependent on where students “start out” upon entry to the school. A few days later, Fordham president Checker Finn offered a defense of proficiency rates, noting that how much students know is substantively important, and associated with meaningful outcomes later in life.
They’re both correct. This is not a debate about whether proficiency rates are at all useful (by the way, I don’t read Petrilli as saying that). It’s about how they should be used and how they should not.
Let’s keep this simple. Here is a quick, highly simplified list of how I would recommend interpreting and using absolute proficiency rates, and how I would avoid using them. Read More »
The change in New York State tests, as well as their results, has inevitably resulted in a lot of discussion of how achievement gaps have changed over the past decade or so (and what they look like using the new tests). In many cases, the gaps, and trends in the gaps, are being presented in terms of proficiency rates.
I’d like to make one quick point, which is applicable both in New York and beyond: In general, it is not a good idea to present average student performance trends in terms of proficiency rates, rather than average scores, but it is an even worse idea to use proficiency rates to measure changes in achievement gaps.
Put simply, proficiency rates have a legitimate role to play in summarizing testing data, but the rates are very sensitive to the selection of cut score, and they provide a very limited, often distorted portrayal of student performance, particularly when viewed over time. There are many ways to illustrate this distortion, but among the more vivid is the fact, which we’ve shown in previous posts, that average scores and proficiency rates often move in different directions. In other words, at the school-level, it is frequently the case that the performance of the typical student — i.e., the average score — increases while the proficiency rate decreases, or vice-versa.
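The average-up, rate-down pattern is easy to reproduce with a handful of invented numbers. A minimal sketch (hypothetical scores and an assumed cut score of 70; nothing here comes from real testing data):

```python
# Hypothetical scores for the same school in two years (numbers invented),
# with an assumed proficiency cut score of 70.
year1 = [60, 65, 71, 72, 95]
year2 = [68, 69, 69, 85, 90]
CUT = 70

def avg(scores):
    """Average score: the performance of the typical student."""
    return sum(scores) / len(scores)

def proficiency_rate(scores, cut):
    """Share of students scoring at or above the cut."""
    return sum(s >= cut for s in scores) / len(scores)

# The average rises (72.6 -> 76.2) while the proficiency rate falls (0.6 -> 0.4),
# because gains among already-proficient and far-below-cut students don't
# register in the rate, while students slipping just below the cut do.
print(avg(year1), proficiency_rate(year1, CUT))
print(avg(year2), proficiency_rate(year2, CUT))
```

The rate only “sees” movement across the cut score, which is why it can move against the average.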
Unfortunately, the situation is even worse when looking at achievement gaps. To illustrate this in a simple manner, let’s take a very quick look at NAEP data (4th grade math), broken down by state, between 2009 and 2011. Read More »
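The cut-score sensitivity of rate-based gaps can also be sketched with invented numbers: two groups of students separated by the same 10-point gap in average scores, whose rate-based gap looks large, small, or nonexistent depending on where the proficiency bar happens to sit.

```python
# Two hypothetical groups of students (all numbers invented) separated by
# the same 10-point gap in average scores.
group_a = [60, 72, 78, 84, 96]  # mean 78
group_b = [50, 62, 68, 74, 86]  # mean 68

def rate(scores, cut):
    """Share of students at or above the cut score."""
    return sum(s >= cut for s in scores) / len(scores)

# The same underlying 10-point mean gap translates into a rate gap of
# 0.2, 0.4, or 0.0, depending entirely on the choice of cut score.
for cut in (65, 75, 85):
    print(cut, rate(group_a, cut) - rate(group_b, cut))
```

A gap “trend” measured in rates can therefore change simply because a state moved its cut score, with no change in the underlying score distributions.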
Last week, the results of New York’s new Common Core-aligned assessments were national news. For months, officials throughout the state, including New York City, have been preparing the public for the release of these data.
Their basic message was that the standards, and thus the tests based upon them, are more difficult, and they represent an attempt to truly gauge whether students are prepared for college and the labor market. The inevitable consequence of raising standards, officials have been explaining, is that fewer students will be “proficient” than in previous years (which was, of course, the case) – this does not mean that students are performing worse, only that they are being held to higher expectations, and that the skills and knowledge being assessed require a new, more expansive curriculum. Therefore, interpretation of the new results versus those from previous years must be extremely cautious, and educators, parents and the public should not jump to conclusions about what they mean.
For the most part, the main points of this public information campaign are correct. It would, however, be wonderful if similar caution were evident in the roll-out of testing results in past (and, more importantly, future) years. Read More »
As reported over at Education Week, the so-called “sequester” has claimed yet another victim: The National Assessment of Educational Progress, or NAEP. As most people who follow education know, this highly respected test, which is often called the “nation’s report card,” is a very useful means of assessing student performance, both in any given year and over time.
Two of the “main assessments” – i.e., those administered in math and reading every two years to fourth and eighth graders – get most of the attention in our public debate, and these remain largely untouched by the cuts. But, last May, the National Assessment Governing Board, which oversees NAEP, decided to eliminate the 2014 NAEP exams in civics, history and geography for all but 8th graders (the exams were previously administered in grades 4, 8 and 12). Now, in its most recent announcement, the Board has decided to cancel its plans to expand the sample for 12th graders (in math, reading, and science) to make it large enough to allow state-level results. In addition, the 4th and 8th grade science samples will be cut back, making subgroup breakdowns very difficult, and the science exam will no longer be administered to individual districts. Finally, the “long-term trend NAEP,” which has tracked student performance for 40 years, has been suspended for 2016. These are substantial cutbacks.
Although its results are frequently misinterpreted, NAEP is actually among the few standardized tests in the U.S. that receives rather wide support from all “sides” of the testing debate. And one cannot help but notice the fact that federal and state governments are currently making significant investments in new tests that are used for high-stakes purposes, whereas NAEP, the primary low-stakes assessment, is being scaled back. Read More »
Recent events in Indiana and Florida have resulted in a great deal of attention to the new school rating systems that over 25 states are using to evaluate the performance of schools, often attaching high-stakes consequences and rewards to the results. We have published reviews of several states’ systems here over the past couple of years (see our posts on the systems in Florida, Indiana, Colorado, New York City and Ohio, for example).
Virtually all of these systems rely heavily, if not entirely, on standardized test results, most commonly by combining two general types of test-based measures: absolute performance (or status) measures, or how highly students score on tests (e.g., proficiency rates); and growth measures, or how quickly students make progress (e.g., value-added scores). As discussed in previous posts, absolute performance measures are best seen as gauges of student performance, since they can’t account for the fact that students enter the schooling system at vastly different levels. Growth-oriented indicators, by contrast, are more appropriate in attempts to gauge school performance per se, as they seek (albeit imperfectly) to control for students’ starting points (and other characteristics that are known to influence achievement levels) in order to isolate the impact of schools on testing performance.*
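A two-school comparison with invented numbers makes the status/growth distinction concrete (the simple start-to-end difference below is only a crude stand-in for a real value-added model):

```python
# Invented example of the status/growth distinction: average scale scores
# for two hypothetical schools at the start and end of a year.
schools = {
    "School A": {"start": 280, "end": 284},  # high-scoring intake, small gains
    "School B": {"start": 220, "end": 238},  # low-scoring intake, large gains
}

for name, s in schools.items():
    status = s["end"]                # absolute performance (student performance)
    growth = s["end"] - s["start"]   # crude growth proxy (closer to school effects)
    print(name, status, growth)
```

School A “wins” on status (284 vs. 238) while School B wins on growth (18 points vs. 4), which is exactly the kind of divergence a rating system must decide how to weigh.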
One interesting aspect of this distinction, which we have not discussed thoroughly here, is the idea/possibility that these two measures are “in conflict.” Let me explain what I mean by that. Read More »
In a new NBER working paper, economist Derek Neal makes an important point, one of which many people in education are aware but which is infrequently reflected in actual policy. The point is that using the same assessment to measure both student and teacher performance often contaminates the results for both purposes.
In fact, as Neal notes, some of the very features required to measure student performance are the ones that make this contamination possible when the tests are used in high-stakes accountability systems. Consider, for example, a situation in which a state or district wants to compare the test scores of a cohort of fourth graders in one year with those of fourth graders the next year. One common means of facilitating this comparability is administering some of the questions to both groups (or to a “pilot” sample of students before either cohort is tested). Otherwise, any difference in scores between the two cohorts might simply be due to differences in the difficulty of the questions, and it would be tough to make meaningful comparisons.
But it’s precisely this need to repeat questions that enables one form of so-called “teaching to the test,” in which administrators and educators use questions from prior assessments to guide their instruction for the current year. Read More »
In education today, data, particularly testing data, are everywhere. One of many potentially valuable uses of these data is helping teachers improve instruction – e.g., identifying students’ strengths and weaknesses. Of course, this positive impact depends on the quality of the data and how they are presented to educators, among other factors. But there’s an even more basic requirement: teachers actually have to use them.
In an article published in the latest issue of the journal Education Finance and Policy, economist John Tyler takes a thorough look at teachers’ use of an online data system in a mid-sized urban district between 2008 and 2010. A few years prior, this district invested heavily in benchmark formative assessments (four per year) for students in grades 3-8, and an online “dashboard” system to go along with them. The assessments’ results are fed into the system in a timely manner. The basic idea is to give these teachers a continual stream of information, past and present, about their students’ performance.
Tyler uses weblogs from the district, as well as focus groups with teachers, to examine the extent and nature of teachers’ data usage (as well as a few other things, such as the relationship between usage and value-added). What he finds is not particularly heartening. In short, teachers didn’t really use the data. Read More »
** Reprinted here in the Washington Post
We’ve entered the time of year during which states and districts release their testing results. It’s fair to say that the two districts that get the most attention for their results are New York City and the District of Columbia Public Schools (DCPS), due in no small part to the fact that both enacted significant, high-profile policy changes over the past 5-10 years.
The manner in which both districts present annual test results is often misleading. Many of the issues, such as misinterpreting changes in proficiency rates as “test score growth” and chalking up all “gains” to recent policy changes, are quite common across the nation. These two districts are just among the more aggressive in doing so. That said, there’s one big difference between the test results they put out every year, and although I’ve noted it a few times before, I’d like to point it out once more: Unlike New York City/State, DCPS does not actually release test scores.
That’s right – despite the massive national attention to their “test scores,” DCPS – or, specifically, the Office of the State Superintendent of Education (OSSE) – hasn’t released a single test score in many years. Not one. Read More »
The results of the latest National Assessment of Educational Progress long-term trend tests (NAEP-LTT) were released last week. The data compare the reading and math scores of 9-, 13- and 17-year-olds at various points since the early 1970s. This is an important way to monitor how these age cohorts’ performance changes over the long term.
Overall, scores among 9- and 13-year-olds continue to improve, in reading and especially math, though the trend is inconsistent and recent increases have been somewhat slow. The scores for 17-year-olds, in contrast, are relatively flat.
These data, of course, are cross-sectional – i.e., they don’t follow students over time, but rather compare children in the three age groups with their predecessors from previous years. This means that changes in average scores might be driven by differences, observable or unobservable, between cohorts. One of the simple graphs in this report, which doesn’t present a single test score, illustrates that rather vividly. Read More »
The annual release of state testing data makes the news in every state, but Florida is one of those places where it is to some degree a national story.*
Well, it’s getting to be that time of year again. Last week, the state released its writing exam (FCAT 2.0 Writing) results for 2013 (as well as the math and reading results for third graders only). The Florida Department of Education (FLDOE) press release noted: “With significant gains in writing scores, Florida’s teachers and students continue to show that higher expectations and support at home and in the classroom enable every child to succeed.” This interpretation of the data was generally repeated without scrutiny in the press coverage of the results.
Putting aside the fact that the press release incorrectly calls the year-to-year changes “gains” (they are actually comparisons of two different groups of students; see here), the FLDOE’s presentation of the FCAT Writing results, though common, is, at best, incomplete and, at worst, misleading. Moreover, the important issues in this case are applicable in all states, and unusually easy to illustrate using the simple data released to the public.
Read More »
** Reprinted here in the Washington Post
Every year, a few major media outlets publish high school rankings. Most recently, Newsweek (in partnership with The Daily Beast) issued its annual list of the “nation’s best high schools.” Their general approach to this task seems quite defensible: To find the high schools that “best prepare students for college.”
The rankings are calculated using six measures: graduation rate (25 percent); college acceptance rate (25); AP/IB/AICE tests taken per student (25); average SAT/ACT score (10); average AP/IB/AICE score (10); and the percentage of students enrolled in at least one AP/IB/AICE course (5).
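Mechanically, the list above amounts to a weighted composite. A sketch using the stated weights and an invented, pre-normalized school record (the excerpt doesn’t specify Newsweek’s actual normalization, so the 0–1 rescaling here is an assumption):

```python
# The ranking reduces to a weighted sum of six measures, using the weights
# quoted above. The school's values are invented, and each measure is
# assumed to have already been rescaled to 0-1 (the actual normalization
# Newsweek uses is not specified in the excerpt).
weights = {
    "grad_rate": 0.25,
    "college_acceptance_rate": 0.25,
    "ap_ib_tests_per_student": 0.25,
    "avg_sat_act": 0.10,
    "avg_ap_ib_score": 0.10,
    "pct_enrolled_in_ap_ib": 0.05,
}

school = {  # hypothetical school, each measure normalized to 0-1
    "grad_rate": 0.93,
    "college_acceptance_rate": 0.88,
    "ap_ib_tests_per_student": 0.40,
    "avg_sat_act": 0.61,
    "avg_ap_ib_score": 0.55,
    "pct_enrolled_in_ap_ib": 0.70,
}

composite = sum(weights[k] * school[k] for k in weights)
print(round(composite, 4))
```

Note that 75 percent of the weight rides on graduation, acceptance, and AP/IB test-taking rates, so the composite is dominated by participation-style measures rather than performance ones.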
Needless to say, even the most rigorous, sophisticated measures of school performance will be imperfect at best, and the methods behind these lists have been subject to endless scrutiny. However, let’s take a quick look at three potentially problematic issues with the Newsweek rankings, how the results might be interpreted, and how the system compares with that published by U.S. News and World Report. Read More »
Earlier this week, New Jersey Governor Chris Christie announced that the state will assume control over Camden City School District. Camden will be the fourth NJ district to undergo takeover, though this is the first time that the state will be removing control from an elected local school board, which will now serve in an advisory role (and have three additional members appointed by the Governor). Over the next few weeks, NJ officials will choose a new superintendent, and begin to revamp evaluations, curricula and other core policies.
Accompanying the announcement, the Governor’s office released a two-page “fact sheet,” much of which is devoted to justifying this move to the public.
Before discussing it, let’s be clear about something: it may indeed be the case that Camden schools are so critically low-performing and/or dysfunctional as to warrant drastic intervention. Moreover, it’s at least possible that state takeover is the appropriate type of intervention to help these schools improve (though the research on this latter score is, to be charitable, undeveloped).
That said, the “fact sheet” presents relatively little valid evidence regarding the academic performance of Camden schools. Given the sheer magnitude of any takeover decision, it is crucial for the state to demonstrate publicly that it has left no stone unturned by presenting a case that is as comprehensive and compelling as possible. However, the discrepancy between that high bar and NJ’s evidence, at least that pertaining to academic outcomes, is more than a little disconcerting.
Read More »