As reported over at Education Week, the so-called “sequester” has claimed yet another victim: The National Assessment of Educational Progress, or NAEP. As most people who follow education know, this highly respected test, which is often called the “nation’s report card,” is a very useful means of assessing student performance, both in any given year and over time.
Two of the “main assessments” – i.e., those administered in math and reading every two years to fourth and eighth graders – get most of the attention in our public debate, and these remain largely untouched by the cuts. But, last May, the National Assessment Governing Board, which oversees NAEP, decided to eliminate the 2014 NAEP exams in civics, history and geography for all but 8th graders (the exams were previously administered in grades 4, 8 and 12). Now, in its most recent announcement, the Board has decided to cancel its plans to expand the sample for 12th graders (in math, reading, and science) to make it large enough to allow state-level results. In addition, the 4th and 8th grade science samples will be cut back, making subgroup breakdowns very difficult, and the science exam will no longer be administered to individual districts. Finally, the “long-term trend NAEP,” which has tracked student performance for 40 years, has been suspended for 2016. These are substantial cutbacks.
Although its results are frequently misinterpreted, NAEP is actually among the few standardized tests in the U.S. that receive rather wide support from all “sides” of the testing debate. And one cannot help but notice that federal and state governments are currently making significant investments in new tests that are used for high-stakes purposes, whereas NAEP, the primary low-stakes assessment, is being scaled back.
Recent events in Indiana and Florida have resulted in a great deal of attention to the new school rating systems that over 25 states are using to evaluate the performance of schools, often attaching high-stakes consequences and rewards to the results. We have published reviews of several states’ systems here over the past couple of years (see our posts on the systems in Florida, Indiana, Colorado, New York City and Ohio, for example).
Virtually all of these systems rely heavily, if not entirely, on standardized test results, most commonly by combining two general types of test-based measures: absolute performance (or status) measures, or how highly students score on tests (e.g., proficiency rates); and growth measures, or how quickly students make progress (e.g., value-added scores). As discussed in previous posts, absolute performance measures are best seen as gauges of student performance, since they can’t account for the fact that students enter the schooling system at vastly different levels. Growth-oriented indicators, by contrast, can be viewed as more appropriate for gauging school performance per se, as they seek (albeit imperfectly) to control for students’ starting points (and other characteristics that are known to influence achievement levels) in order to isolate the impact of schools on testing performance.*
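To make the distinction concrete, here is a minimal sketch (in Python, using entirely hypothetical data and a deliberately crude stand-in for a growth model – not any state’s actual formula) that computes both types of measures for two imaginary schools:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 300

# Two hypothetical schools: A serves students who start low but make large
# gains; B serves students who start high but make smaller gains.
prior_a = rng.normal(35, 8, n)
curr_a = prior_a + rng.normal(10, 6, n)
prior_b = rng.normal(55, 8, n)
curr_b = prior_b + rng.normal(3, 6, n)

prior = np.concatenate([prior_a, prior_b])
curr = np.concatenate([curr_a, curr_b])
school = np.array(["A"] * n + ["B"] * n)

# Status measure: the percent of students at or above a proficiency cutoff.
CUT = 50
for s in ("A", "B"):
    rate = np.mean(curr[school == s] >= CUT) * 100
    print(f"School {s} proficiency rate (status): {rate:.0f}%")

# Crude growth measure: predict current scores from prior scores across all
# students, then average each school's residuals (a rough stand-in for the
# prior-score controls in growth percentile and value-added models).
slope, intercept = np.polyfit(prior, curr, 1)
resid = curr - (intercept + slope * prior)
for s in ("A", "B"):
    print(f"School {s} mean growth residual: {resid[school == s].mean():+.1f}")
```

The point of the sketch is only the mechanics: the status measure summarizes where students end up, while the growth measure summarizes how far they moved relative to a prediction based on where they started.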
One interesting aspect of this distinction, which we have not discussed thoroughly here, is the idea/possibility that these two measures are “in conflict.” Let me explain what I mean by that.
In a new NBER working paper, economist Derek Neal makes an important point – one of which many people in education are aware, but which is infrequently reflected in actual policy. The point is that using the same assessment to measure both student and teacher performance often contaminates the results for both purposes.
In fact, as Neal notes, some of the very features required to measure student performance are the ones that make this contamination possible when the tests are used in high-stakes accountability systems. Consider, for example, a situation in which a state or district wants to compare the test scores of a cohort of fourth graders in one year with those of fourth graders the next year. One common means of facilitating this comparability is administering some of the same questions to both groups (or to a “pilot” sample of students prior to those being tested). Otherwise, any difference in scores between the two cohorts might simply be due to differences in the difficulty of the questions. Without that overlap, it’s tough to make meaningful comparisons.
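To see why that overlap matters, consider a toy illustration (hypothetical numbers and a crude mean adjustment – real testing programs rely on item response theory models, not this) of how common questions make two cohorts’ scores comparable:

```python
# Toy illustration of comparing two cohorts when the test form changes.
# All numbers are hypothetical; real linking is far more sophisticated.

# Percent correct on the anchor questions administered to BOTH cohorts.
# Because these items are identical, this gap reflects the cohorts themselves.
anchor_2013, anchor_2014 = 62.0, 58.0

# Percent correct on each cohort's full form (anchor plus unique items).
# This gap mixes cohort differences with differences in form difficulty.
overall_2013, overall_2014 = 65.0, 60.0

cohort_gap = anchor_2014 - anchor_2013       # -4: the 2014 cohort scored a bit lower
raw_gap = overall_2014 - overall_2013        # -5: raw scores on non-comparable forms
form_difficulty_gap = raw_gap - cohort_gap   # -1: the 2014 form ran slightly harder

# Express the 2014 cohort's overall score on the 2013 form's scale.
adjusted_2014 = overall_2014 - form_difficulty_gap

print(f"Raw gap between cohorts:        {raw_gap:+.1f} points")
print(f"Gap after the crude adjustment: {adjusted_2014 - overall_2013:+.1f} points")
```

Without the repeated questions, there would be no way to split the raw five-point difference into the part attributable to the cohorts and the part attributable to the harder form.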
But it’s precisely this need to repeat questions that enables one form of so-called “teaching to the test,” in which administrators and educators use questions from prior assessments to guide their instruction for the current year.
A few years ago, the U.S. Department of Education (USED) launched the School Improvement Grant (SIG) program, which is designed to award grants to “persistently low-achieving schools” to carry out one of four different intervention models.
States vary in how SIG-eligible schools are selected, but USED guidelines require the use of three basic types of indicators: absolute performance level (e.g., proficiency rates); whether schools were “making progress” (e.g., rate changes); and, for high schools, graduation rates (specifically, whether the rate is under 60 percent). Two of these measures – absolute performance and graduation rates – tell you relatively little about the actual performance of schools, as they depend heavily on the characteristics (e.g., income) of students/families in the neighborhood served by a given school. It was therefore pretty much baked into the rules that the schools awarded SIGs would tend to exhibit certain characteristics, such as higher poverty rates.
Over 800 schools were awarded “Tier 1” or “Tier 2” grants for the 2010-11 school year (“SIG Cohort One”). Let’s take a quick look at a couple of key characteristics of these schools, using data from USED and the National Center for Education Statistics.
One of the (many) factors that might help explain — or at least be associated with — the wide variation in charter schools’ test-based impacts is market share. That is, the proportion of students that charters serve in a given state or district. There are a few reasons why market share might matter.
For example, charter schools compete for limited resources, including private donations and labor (teachers), and fewer competitors means more resources. In addition, there are a handful of models that seem to get fairly consistent results no matter where they operate, and authorizers who are selective and only allow “proven” operators to open up shop might increase quality (at the expense of quantity). There may be a benefit to very slow, selective expansion (and smaller market share is a symptom of that deliberate approach).
One way to get a sense of whether market share might matter is simply to check the association between measured charter performance and coverage. It might therefore be interesting, albeit exceedingly simple, to use the recently released CREDO analysis – which provides state-level estimates based on a common analytical approach (though different tests, etc.) – for this purpose.
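A back-of-the-envelope version of that check might look something like the sketch below (in Python, with made-up state figures standing in for the actual CREDO estimates and charter enrollment shares):

```python
import numpy as np

# Hypothetical state-level data: the charter share of enrollment (percent)
# and the estimated charter effect (in standard deviations of test scores).
# These are placeholders for the real CREDO and enrollment figures.
market_share = np.array([2.1, 3.5, 4.8, 6.0, 7.2, 9.5, 11.0, 14.3, 18.0, 25.0])
charter_effect = np.array([0.05, 0.02, 0.03, -0.01, 0.01, -0.02, 0.00, -0.03, -0.04, -0.02])

r = np.corrcoef(market_share, charter_effect)[0, 1]
print(f"Correlation between market share and estimated charter effect: {r:.2f}")
```

With only a couple dozen states, estimates of varying precision, and different tests underlying them, any such correlation would be suggestive at best.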
As noted in a nice little post over at Greater Greater Washington’s education blog, the District of Columbia Office of the State Superintendent of Education (OSSE) recently started releasing growth model scores for DC’s charter and regular public schools. These models, in a nutshell, assess schools by following their students over time and gauging their testing progress relative to similar students (they can also be used for individual teachers, but DCPS uses a different model in its teacher evaluations).
In my opinion, producing these estimates and making them available publicly is a good idea, and definitely preferable to the district’s previous reliance on changes in proficiency, which are truly awful measures (see here for more on this). It’s also, however, important to note that the model chosen by OSSE – a “median growth percentile,” or MGP, model – produces estimates that have been shown to be at least somewhat more heavily associated with student characteristics than other types of models, such as value-added models proper. This does not necessarily mean the growth percentile models are “inaccurate” – there are good reasons, such as resources and more difficulty with teacher recruitment/retention, to believe that schools serving poorer students might be less effective, on average, and it’s tough to separate “real” effects from bias in the models.
That said, let’s take a quick look at this relationship using the DC MGP scores from 2011, with poverty data from the National Center for Education Statistics.
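For readers who want to try this kind of check themselves, a bare-bones version might look like the following (the school-level numbers here are fabricated placeholders, not the actual OSSE or NCES data):

```python
import numpy as np

# Fabricated school-level data standing in for the 2011 OSSE median growth
# percentiles (MGP) and NCES free/reduced-price lunch (FRL) rates.
frl_rate = np.array([20, 35, 45, 55, 60, 70, 75, 82, 90, 95])   # percent FRL
mgp = np.array([66, 63, 59, 52, 55, 47, 50, 43, 45, 41])        # school MGP

# Simple bivariate regression: how much does the typical MGP change with
# a 10-point increase in the FRL rate?
slope, intercept = np.polyfit(frl_rate, mgp, 1)
print(f"Predicted MGP change per 10-point FRL increase: {slope * 10:+.1f}")
print(f"Correlation between FRL rate and MGP: {np.corrcoef(frl_rate, mgp)[0, 1]:.2f}")
```

The quantity of interest is the slope – how much lower the typical MGP is in higher-poverty schools – bearing in mind, per the caveats above, that any such gradient could reflect real differences in effectiveness, bias in the measure, or some of both.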
In education today, data, particularly testing data, are everywhere. One of many potentially valuable uses of these data is helping teachers improve instruction – e.g., identifying students’ strengths and weaknesses, etc. Of course, this positive impact depends on the quality of the data and how they are presented to educators, among other factors. But there’s an even more basic requirement – teachers actually have to use them.
In an article published in the latest issue of the journal Education Finance and Policy, economist John Tyler takes a thorough look at teachers’ use of an online data system in a mid-sized urban district between 2008 and 2010. A few years prior, this district invested heavily in benchmark formative assessments (four per year) for students in grades 3-8, and an online “dashboard” system to go along with them. The assessments’ results are fed into the system in a timely manner. The basic idea is to give these teachers a continual stream of information, past and present, about their students’ performance.
Tyler uses web usage logs from the district’s system, as well as focus groups with teachers, to examine the extent and nature of teachers’ data usage (as well as a few other things, such as the relationship between usage and value-added). What he finds is not particularly heartening. In short, teachers didn’t really use the data.
** Reprinted here in the Washington Post
We’ve entered the time of year during which states and districts release their testing results. It’s fair to say that the two districts that get the most attention for their results are New York City and the District of Columbia Public Schools (DCPS), due in no small part to the fact that both enacted significant, high-profile policy changes over the past 5-10 years.
The manner in which both districts present annual test results is often misleading. Many of the issues, such as misinterpreting changes in proficiency rates as “test score growth” and chalking up all “gains” to recent policy changes, are quite common across the nation. These two districts are just among the more aggressive in doing so. There is, however, one big difference between the test results they put out every year, and although I’ve noted it a few times before, I’d like to point it out once more: Unlike New York City/State, DCPS does not actually release test scores.
That’s right – despite the massive national attention to their “test scores,” DCPS – or, specifically, the Office of the State Superintendent of Education (OSSE) – hasn’t released a single test score in many years. Not one.
The results of the latest National Assessment of Educational Progress long-term trend tests (NAEP-LTT) were released last week. The data compare the reading and math scores of 9-, 13- and 17-year-olds at various points since the early 1970s. This is an important way to monitor how these age cohorts’ performance changes over the long term.
Overall, there is ongoing improvement in scores among 9- and 13-year-olds, in reading and especially math, though the trend is inconsistent and increases have been somewhat slow in recent years. The scores for 17-year-olds, in contrast, are relatively flat.
These data, of course, are cross-sectional – i.e., they don’t follow students over time, but rather compare children in the three age groups with their predecessors from previous years. This means that changes in average scores might be driven by differences, observable or unobservable, between cohorts. One of the simple graphs in this report, which doesn’t present a single test score, illustrates that rather vividly.
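A stylized example (hypothetical numbers, not the actual NAEP figures) shows how this can happen: each group’s average score stays exactly the same across cohorts, yet the overall average moves simply because the groups’ shares of the tested population shift.

```python
# Stylized composition effect; all numbers are hypothetical.
# Each group's average score is identical in both cohorts; only the
# groups' shares of the tested population change.
groups = {
    # group: (average score, share in earlier cohort, share in later cohort)
    "Group 1": (290, 0.70, 0.50),
    "Group 2": (260, 0.30, 0.50),
}

earlier = sum(score * share for score, share, _ in groups.values())
later = sum(score * share for score, _, share in groups.values())

print(f"Earlier cohort average: {earlier:.0f}")  # 281
print(f"Later cohort average:   {later:.0f}")    # 275
# The overall average drops six points even though neither group's
# performance changed at all.
```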
A new report from CREDO on charter schools’ test-based performance received a great deal of attention, and rightfully so – it includes 27 states, which together serve 95 percent of the nation’s charter students.
The analysis as a whole, like its predecessor, is a great contribution. Both its sheer scope and a few specific components (such as the examination of trends) are new and important. And most of the findings serve to reaffirm the core conclusions of the existing research on charters’ estimated test-based effects. Such an interpretation may not be particularly satisfying to charter supporters and opponents looking for new ammunition, but the fact that this national analysis will not settle anything in the contentious debate about charter schools once again suggests the need to start asking a different set of questions.
Along these and other lines, there are a few points worth discussing quickly.
I tend to comment on newly-released teacher surveys, primarily because I think the surveys are important and interesting, but also because teachers’ opinions are sometimes misrepresented in our debate about education reform. So, last year, I wrote about a report by the advocacy organization Teach Plus, in which they presented results from a survey focused on identifying differences in attitudes by teacher experience (an important topic). One of my major comments was that the survey was “non-scientific” – it was voluntary, and distributed via social media, e-mail, etc. This means that the results cannot be used to draw strong conclusions about the population of teachers as a whole, since those who responded might be different from those that did not.
I also noted that, even if the sample was not representative, this did not preclude finding useful information in the results. Rather, my primary criticism was that the authors did not even mention the issue, or make an effort to compare the characteristics of their survey respondents with those of teachers in general (which can give a sense of the differences between the sample and the population).
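The comparison I had in mind is not elaborate – something like the sketch below (with invented percentages, purely for illustration), putting respondents’ observable characteristics side by side with those of the teacher workforce:

```python
# Invented figures, purely illustrative: compare survey respondents with
# the full teacher population on a few observable characteristics.
characteristics = {
    # characteristic: (percent of respondents, percent of all teachers)
    "0-3 years of experience": (35, 22),
    "Elementary school": (48, 45),
    "Advanced degree": (40, 52),
}

print(f"{'Characteristic':<26}{'Sample %':>10}{'Population %':>14}{'Gap':>6}")
for name, (sample, pop) in characteristics.items():
    print(f"{name:<26}{sample:>10}{pop:>14}{sample - pop:>+6}")
```

Even this simple a table gives readers a sense of which kinds of teachers are over- or under-represented among respondents.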
Well, they have just issued a new report, which also presents the results of a teacher survey, this time focused on teachers’ attitudes toward the evaluation system used in Memphis, Tennessee (called the “Teacher Effectiveness Measure,” or TEM). In this case, not only do they raise the issue of representativeness, but they also present a little bit of data comparing their respondents to the population (i.e., all Memphis teachers who were evaluated under TEM).
U.S. Secretary of Education Arne Duncan recently announced that states will be given the option to postpone using the results of their new teacher evaluations for high-stakes decisions during the phase-in of the new Common Core-aligned assessments. The reaction from some advocates was swift condemnation – calling the decision little more than a “delay” and a “victory for the status quo.”
We hear these kinds of arguments frequently in education. The idea is that change must be as rapid as possible, because “kids can’t wait.” I can understand and appreciate the urgency underlying these sentiments. Policy change in education (as in other arenas) can sometimes be painfully slow, and what seem like small roadblocks can turn out to be massive, permanent obstacles.
I will not repeat my views regarding the substance of Secretary Duncan’s decision – see this op-ed by Morgan Polikoff and me. I would, however, like to make one very quick point about these “we need change right now because students can’t wait” arguments: Sometimes, what is called “delay” is actually better described as good policy making, and kids can wait for good policy making.
In a previous post, I discussed the initial results from new teacher evaluations in several states, and the fact that states with implausibly large proportions of teachers in the higher categories face a difficult situation – achieving greater differentiation while improving the quality and legitimacy of their systems.
I also expressed concern that pre-existing beliefs about the “proper” distribution of teacher ratings — in particular, how many teachers should receive the lowest ratings — might inappropriately influence the process of adjusting the systems based on the first round of results. In other words, there is a risk that states and districts will change their systems in a crude manner that lowers ratings simply for the sake of lowering ratings.
Such concerns of course imply a more general question: How should we assess the results of new evaluation systems? That’s a complicated issue, and these are largely uncharted waters. Nevertheless, I’d like to offer a few thoughts as states and districts move forward.
If you ask a charter school supporter why charter schools tend to exhibit inconsistency in their measured test-based impact, there’s a good chance they’ll talk about authorizing. That is, they will tell you that the quality of authorization laws and practices — the guidelines by which charters are granted, renewed and revoked — drives much and perhaps even most of the variation in the performance of charters relative to comparable district schools, and that strengthening these laws is the key to improving performance.
Accordingly, a recently-announced campaign by the National Association of Charter School Authorizers aims to step up the rate at which charter authorizers close “low-performing schools” and to make authorizers more selective in allowing new schools to open. In addition, a recent CREDO study found (among other things) that charter middle and high schools’ performance during their first few years is more predictive of future performance than many people may have thought, thus lending support to the idea of opening and closing schools as an improvement strategy.
Below are a few quick points about the authorization issue, which lead up to a question about the relationship between selectivity and charter sector growth.
A correlation between two variables measures the strength of the linear relationship between them. Put simply, two variables are positively correlated to the extent that individuals with relatively high values on one measure tend to have relatively high values on the other (and low values with low values), and negatively correlated to the extent that high values on one measure are associated with low values on the other.
Correlations are used frequently in the debate about teacher evaluations. For example, researchers might assess the relationship between classroom observations and value-added measures, which is one of the simpler ways to gather information about the “validity” of one or the other – i.e., whether it is telling us what we want to know. In this case, if teachers with higher observation scores also tend to get higher value-added scores, this might be interpreted as a sign that both are capturing, at least to some extent, “true” teacher performance.
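As a purely hypothetical illustration of what such a calculation involves – and of why even two reasonable measures of the same underlying thing may correlate only modestly – consider this sketch, which generates two noisy versions of a “true” performance measure and correlates them:

```python
import numpy as np

rng = np.random.default_rng(1)
n_teachers = 500

# Hypothetical setup: both measures reflect the same underlying "true"
# performance, plus independent measurement error.
true_performance = rng.normal(0, 1, n_teachers)
observation_score = true_performance + rng.normal(0, 1.5, n_teachers)
value_added = true_performance + rng.normal(0, 1.5, n_teachers)

r = np.corrcoef(observation_score, value_added)[0, 1]
print(f"Correlation between the two measures: {r:.2f}")
# With this much measurement error, the correlation comes out around 0.3 --
# a reminder that a modest correlation is not, by itself, evidence that
# either measure is "wrong."
```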
Yet there seems to be a tendency among some advocates and policy makers to get a little overeager when interpreting correlations.