Last week, the results of New York’s new Common Core-aligned assessments were national news. For months, officials throughout the state, including in New York City, had been preparing the public for the release of these data.
Their basic message was that the standards, and thus the tests based upon them, are more difficult, and that they represent an attempt to truly gauge whether students are prepared for college and the labor market. The inevitable consequence of raising standards, officials have been explaining, is that fewer students will be “proficient” than in previous years (which was, of course, the case) – this does not mean that students are performing worse, only that they are being held to higher expectations, and that the skills and knowledge being assessed require a new, more expansive curriculum. Therefore, interpretation of the new results versus those in previous years must be extremely cautious, and educators, parents and the public should not jump to conclusions about what they mean.
For the most part, the main points of this public information campaign are correct. It would, however, be wonderful if similar caution were evident in the roll-out of testing results in past (and, more importantly, future) years. Read More »
As reported over at Education Week, the so-called “sequester” has claimed yet another victim: The National Assessment of Educational Progress, or NAEP. As most people who follow education know, this highly respected test, which is often called the “nation’s report card,” is a very useful means of assessing student performance, both in any given year and over time.
Two of the “main assessments” – i.e., those administered in math and reading every two years to fourth and eighth graders – get most of the attention in our public debate, and these remain largely untouched by the cuts. But, last May, the National Assessment Governing Board, which oversees NAEP, decided to eliminate the 2014 NAEP exams in civics, history and geography for all but 8th graders (the exams were previously administered in grades 4, 8 and 12). Now, in its most recent announcement, the Board has decided to cancel its plans to expand the sample for 12th graders (in math, reading, and science) to make it large enough to allow state-level results. In addition, the 4th and 8th grade science samples will be cut back, making subgroup breakdowns very difficult, and the science exam will no longer be administered to individual districts. Finally, the “long-term trend NAEP,” which has tracked student performance for 40 years, has been suspended for 2016. These are substantial cutbacks.
Although its results are frequently misinterpreted, NAEP is actually among the few standardized tests in the U.S. that receive rather wide support from all “sides” of the testing debate. And one cannot help but notice that federal and state governments are currently making significant investments in new tests that are used for high-stakes purposes, whereas NAEP, the primary low-stakes assessment, is being scaled back. Read More »
Recent events in Indiana and Florida have drawn a great deal of attention to the new school rating systems that over 25 states are using to evaluate the performance of schools, often attaching high-stakes consequences and rewards to the results. We have published reviews of several states’ systems here over the past couple of years (see our posts on the systems in Florida, Indiana, Colorado, New York City and Ohio, for example).
Virtually all of these systems rely heavily, if not entirely, on standardized test results, most commonly by combining two general types of test-based measures: absolute performance (or status) measures, or how highly students score on tests (e.g., proficiency rates); and growth measures, or how quickly students make progress (e.g., value-added scores). As discussed in previous posts, absolute performance measures are best seen as gauges of student performance, since they can’t account for the fact that students enter the schooling system at vastly different levels, whereas growth-oriented indicators can be viewed as more appropriate in attempts to gauge school performance per se, as they seek (albeit imperfectly) to control for students’ starting points (and other characteristics that are known to influence achievement levels) in order to isolate the impact of schools on testing performance.*
One interesting aspect of this distinction, which we have not discussed thoroughly here, is the possibility that these two measures are “in conflict.” Let me explain what I mean by that. Read More »
In a new NBER working paper, economist Derek Neal makes an important point, one that many people in education are aware of but that is infrequently reflected in actual policy. The point is that using the same assessment to measure both student and teacher performance often contaminates the results for both purposes.
In fact, as Neal notes, some of the very features required to measure student performance are the same ones that make this contamination possible when the tests are used in high-stakes accountability systems. Consider, for example, a situation in which a state or district wants to compare the test scores of a cohort of fourth graders in one year with those of fourth graders the next year. One common means of facilitating this comparability is administering some of the same questions to both groups (or to a “pilot” sample of students ahead of the actual administration). Otherwise, any difference in scores between the two cohorts might simply be due to differences in the difficulty of the questions. Without a way to check for that, it’s tough to make meaningful comparisons.
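A toy illustration of the distinction between status and growth measures may help. The sketch below uses two hypothetical schools, an arbitrary proficiency cutoff of 300, and invented scores. The “growth” measure here is a simple average gain, a crude stand-in for the statistical growth models (e.g., value-added) that states actually use, which also try to control for students’ starting points and characteristics.

```python
# Hypothetical data: each student's (prior-year score, current-year score).
# The proficiency cutoff of 300 is an arbitrary illustrative value.
PROFICIENT_CUTOFF = 300


def proficiency_rate(pairs, cutoff=PROFICIENT_CUTOFF):
    """Status measure: share of students at or above the cutoff this year."""
    return sum(1 for _, current in pairs if current >= cutoff) / len(pairs)


def mean_gain(pairs):
    """Crude growth measure: average change from prior to current score."""
    return sum(current - prior for prior, current in pairs) / len(pairs)


# Hypothetical School A: students start high but gain little.
school_a = [(320, 325), (340, 342), (310, 312), (290, 301)]
# Hypothetical School B: students start low but gain a lot.
school_b = [(220, 260), (240, 275), (250, 290), (270, 305)]

for name, pairs in [("A", school_a), ("B", school_b)]:
    print(name, proficiency_rate(pairs), mean_gain(pairs))
```

Under these made-up numbers, School A looks far better on the status measure (every student proficient) while School B looks far better on growth (an average gain of 37.5 points versus 5), which is one way the two types of measures can point in different directions.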
But it’s precisely this need to repeat questions that enables one form of so-called “teaching to the test,” in which administrators and educators use questions from prior assessments to guide their instruction for the current year. Read More »
A few years ago, the U.S. Department of Education (USED) launched the School Improvement Grant (SIG) program, which is designed to award grants to “persistently low-achieving schools” to carry out one of four different intervention models.
States vary in how SIG-eligible schools are selected, but USED guidelines require the use of three basic types of indicators: absolute performance level (e.g., proficiency rates); whether schools were “making progress” (e.g., rate changes); and, for high schools, graduation rates (specifically, whether the rate is under 60 percent). Two of these measures – absolute performance and graduation rates – tell you relatively little about the actual performance of schools, as they depend heavily on the characteristics (e.g., income) of students/families in the neighborhood served by a given school. It was therefore pretty much baked into the rules that the schools awarded SIGs would tend to exhibit certain characteristics, such as higher poverty rates.
Over 800 schools were awarded “Tier 1” or “Tier 2” grants for the 2010-11 school year (“SIG Cohort One”). Let’s take a quick look at a couple of key characteristics of these schools, using data from USED and the National Center for Education Statistics. Read More »
One of the (many) factors that might help explain (or at least be associated with) the wide variation in charter schools’ test-based impacts is market share, that is, the proportion of students that charters serve in a given state or district. There are a few reasons why market share might matter.
For example, charter schools compete for limited resources, including private donations and labor (teachers), and fewer competitors mean more resources to go around. In addition, there are a handful of models that seem to get fairly consistent results no matter where they operate, and authorizers who are selective and only allow “proven” operators to open up shop might increase quality (at the expense of quantity). There may be a benefit to very slow, selective expansion (of which a smaller market share would be a symptom).
One way to get a sense of whether market share might matter is simply to check the association between measured charter performance and coverage. It might therefore be interesting, albeit exceedingly simple, to use the recently-released CREDO analysis, which provides state-level estimates based on a common analytical approach (though different tests, etc.), for this purpose. Read More »
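For readers curious about what such a simple check involves, here is a minimal sketch of a Pearson correlation between state-level charter market share and estimated charter effects. The numbers are entirely made up for illustration (they are not CREDO’s estimates), and the function name is hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical, made-up values (NOT CREDO data), one pair per state:
# charter market share (percent of public school students) and the
# estimated charter effect (in standard deviations).
market_share = [1.0, 2.5, 4.0, 6.0, 9.0]
charter_effect = [0.04, 0.02, 0.01, -0.01, -0.02]

print(round(pearson_r(market_share, charter_effect), 2))
```

Even with real data, a coefficient computed from a small number of states, each using a different test, would be descriptive at best; it says nothing about whether market share causes differences in charter performance.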
Our guest author today is William P. Jones, history professor at the University of Wisconsin, Madison, and author of The March on Washington: Jobs, Freedom and the Forgotten History of Civil Rights (W.W. Norton & Co., 2013).
If Richard Parrish had his way, the March on Washington for Jobs and Freedom would have occurred in 1941 rather than 1963. As President of the Federation of Colored College Students in New York City, the 25-year-old student was a key organizer of the mass demonstration that union leader A. Philip Randolph called to protest discrimination in the armed forces and the defense industries during the Second World War. He was furious, therefore, when Randolph cancelled the march in exchange for an executive order, issued by President Franklin D. Roosevelt, prohibiting defense contractors from discriminating against workers on the basis of their race, color, religion, or national origin. Parrish agreed that this was a major victory, but pointed out that it would expire when the war ended and do nothing to address discrimination in the armed forces. Accusing Randolph of acting without consulting the students and other groups that supported the mobilization, he insisted that the March on Washington be rescheduled immediately.
Randolph refused—accusing Parrish and other young militants of being “more interested in the drama and pyrotechnics of the march than the basic and main issues of putting Negroes to work”—but the disagreement did not prevent the two black radicals from working closely together to build a powerful alliance between the civil rights and labor movements in the postwar decades. After completing his bachelor’s degree in 1947, Parrish worked as a teacher and union leader until his retirement in 1976. He also worked closely with Randolph to open jobs and leadership positions for black workers in organized labor. When Randolph decided to reorganize the March on Washington in 1963, Dick Parrish was one of the first people he turned to for support. Read More »
As noted in a nice little post over at Greater Greater Washington’s education blog, the District of Columbia Office of the State Superintendent of Education (OSSE) recently started releasing growth model scores for DC’s charter and regular public schools. These models, in a nutshell, assess schools by following their students over time and gauging their testing progress relative to similar students (they can also be used for individual teachers, but DCPS uses a different model in its teacher evaluations).
In my opinion, producing these estimates and making them publicly available is a good idea, and definitely preferable to the district’s previous reliance on changes in proficiency, which are truly awful measures (see here for more on this). It’s also important to note, however, that the model chosen by OSSE – a “median growth percentile,” or MGP, model – produces estimates that have been shown to be at least somewhat more strongly associated with student characteristics than those from other types of models, such as value-added models proper. This does not necessarily mean the growth percentile models are “inaccurate” – there are good reasons, such as fewer resources and greater difficulty with teacher recruitment and retention, to believe that schools serving poorer students might be less effective, on average, and it’s tough to separate “real” effects from bias in the models.
That said, let’s take a quick look at this relationship using the DC MGP scores from 2011, with poverty data from the National Center for Education Statistics. Read More »
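As a rough sketch of how a median growth percentile is built, the code below assigns each student a percentile rank among peers with the same prior-year score, then takes the median of those percentiles for each school. This is a deliberate simplification: student growth percentile models typically estimate conditional percentiles with quantile regression on prior scores rather than exact-match peer groups, and all of the data and function names here are hypothetical.

```python
from collections import defaultdict
from statistics import median


def growth_percentiles(students):
    """Simplified student growth percentiles (illustrative only).

    students: list of (student_id, school, prior_score, current_score).
    Each student's percentile is the share of peers with the SAME prior
    score who scored lower this year, plus half of those tied (the student
    included), times 100. Real models use quantile regression instead.
    """
    by_prior = defaultdict(list)
    for _, _, prior, current in students:
        by_prior[prior].append(current)
    result = {}
    for sid, _, prior, current in students:
        peers = by_prior[prior]
        below = sum(1 for c in peers if c < current)
        tied = sum(1 for c in peers if c == current)
        result[sid] = 100 * (below + 0.5 * tied) / len(peers)
    return result


def median_growth_percentiles(students):
    """School-level MGP: the median of its students' growth percentiles."""
    sgp = growth_percentiles(students)
    by_school = defaultdict(list)
    for sid, school, _, _ in students:
        by_school[school].append(sgp[sid])
    return {school: median(vals) for school, vals in by_school.items()}


# Hypothetical students in two hypothetical schools, X and Y.
students = [
    ("s1", "X", 200, 210), ("s2", "Y", 200, 230),
    ("s3", "X", 250, 255), ("s4", "Y", 250, 270),
    ("s5", "X", 300, 310), ("s6", "Y", 300, 305),
]
print(median_growth_percentiles(students))
```

The resulting school-level MGPs could then be correlated with school poverty rates, which is the relationship the post goes on to examine.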
In education today, data, particularly testing data, are everywhere. One of many potentially valuable uses of these data is helping teachers improve instruction – e.g., identifying students’ strengths and weaknesses. Of course, this positive impact depends on the quality of the data and how they are presented to educators, among other factors. But there’s an even more basic requirement – teachers actually have to use them.
In an article published in the latest issue of the journal Education Finance and Policy, economist John Tyler takes a thorough look at teachers’ use of an online data system in a mid-sized urban district between 2008 and 2010. A few years earlier, this district had invested heavily in benchmark formative assessments (four per year) for students in grades 3-8, and an online “dashboard” system to go along with them. The assessments’ results are fed into the system in a timely manner. The basic idea is to give these teachers a continual stream of information, past and present, about their students’ performance.
Tyler uses weblogs from the district, as well as focus groups with teachers, to examine the extent and nature of teachers’ data usage (as well as a few other things, such as the relationship between usage and value-added). What he finds is not particularly heartening. In short, teachers didn’t really use the data. Read More »
** Reprinted in the Washington Post
We’ve entered the time of year during which states and districts release their testing results. It’s fair to say that the two districts that get the most attention for their results are New York City and the District of Columbia Public Schools (DCPS), due in no small part to the fact that both enacted significant, high-profile policy changes over the past 5-10 years.
The manner in which both districts present annual test results is often misleading. Many of the issues, such as misinterpreting changes in proficiency rates as “test score growth” and chalking up all “gains” to recent policy changes, are quite common across the nation. These two districts are just among the more aggressive in doing so. That said, there’s one big difference between the test results they put out every year, and although I’ve noted it a few times before, I’d like to point it out once more: Unlike New York City/State, DCPS does not actually release test scores.
That’s right – despite the massive national attention to their “test scores,” DCPS – or, specifically, the Office of the State Superintendent of Education (OSSE) – hasn’t released a single test score in many years. Not one. Read More »
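A small hypothetical example shows why a change in proficiency rates is not the same thing as test score growth. The scores and the cutoff of 300 are invented; the point is only that a rate counts students on either side of a single line and ignores how far above or below it they are.

```python
from statistics import mean

CUTOFF = 300  # hypothetical proficiency cutoff


def proficiency_rate(scores, cutoff=CUTOFF):
    """Share of students scoring at or above the cutoff."""
    return sum(1 for s in scores if s >= cutoff) / len(scores)


# Hypothetical scores for two successive (different) cohorts in one grade.
year_1 = [295, 298, 299, 340, 360]
year_2 = [300, 301, 302, 250, 260]

for label, scores in [("year 1", year_1), ("year 2", year_2)]:
    print(label, proficiency_rate(scores), mean(scores))
```

Here the proficiency rate rises from 40 to 60 percent even though the average score falls by more than 35 points, because a few students just below the line moved just above it while others fell far below it.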
The results of the latest National Assessment of Educational Progress long-term trend tests (NAEP-LTT) were released last week. The data compare the reading and math scores of 9-, 13- and 17-year-olds at various points since the early 1970s. This is an important way to monitor how these age cohorts’ performance changes over the long term.
Overall, there is ongoing improvement in scores among 9- and 13-year-olds, in reading and especially math, though the trend is inconsistent and increases have been somewhat slow in recent years. The scores for 17-year-olds, in contrast, are relatively flat.
These data, of course, are cross-sectional – i.e., they don’t follow students over time, but rather compare children in the three age groups with their predecessors from previous years. This means that changes in average scores might be driven by differences, observable or unobservable, between cohorts. One of the simple graphs in this report, which doesn’t present a single test score, illustrates that rather vividly. Read More »
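The point about cohort differences can be illustrated with a simple weighted average. In the hypothetical below, two groups of students post exactly the same average scores in both years; only their shares of the tested population change. The group shares and scores are invented, and the example is not meant to describe what the NAEP-LTT graph actually shows.

```python
def weighted_mean(groups):
    """Overall average from (share_of_students, group_average) pairs."""
    return sum(share * avg for share, avg in groups)


# Hypothetical: the two groups' averages are IDENTICAL in both years;
# only each group's share of the tested population changes.
earlier = [(0.8, 260.0), (0.2, 230.0)]
later = [(0.6, 260.0), (0.4, 230.0)]

print(weighted_mean(earlier), weighted_mean(later))
```

The overall average drops from 254 to 248 points even though neither group’s performance changed, which is why changes in cross-sectional averages cannot automatically be read as changes in how much students are learning.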
A new report from CREDO on charter schools’ test-based performance received a great deal of attention, and rightfully so – it includes 27 states, which together serve 95 percent of the nation’s charter students.
The analysis as a whole, like its predecessor, is a great contribution. Its sheer scope, as well as a few specific parts (e.g., the examination of trends), are new and important. And most of the findings serve to reaffirm the core conclusions of the existing research on charters’ estimated test-based effects. Such an interpretation may not be particularly satisfying to charter supporters and opponents looking for new ammunition, but the fact that this national analysis will not settle anything in the contentious debate about charter schools once again suggests the need to start asking a different set of questions.
Along these lines, as well as others, there are a few points worth discussing quickly. Read More »
I tend to comment on newly released teacher surveys, primarily because I think the surveys are important and interesting, but also because teachers’ opinions are sometimes misrepresented in our debate about education reform. So, last year, I wrote about a report by the advocacy organization Teach Plus, in which they presented results from a survey focused on identifying differences in attitudes by teacher experience (an important topic). One of my major comments was that the survey was “non-scientific” – it was voluntary, and distributed via social media, e-mail, etc. This means that the results cannot be used to draw strong conclusions about the population of teachers as a whole, since those who responded might be different from those who did not.
I also noted that, even if the sample was not representative, this did not preclude finding useful information in the results. That is, my primary criticism was that the authors did not even mention the issue, or make an effort to compare the characteristics of their survey respondents with those of teachers in general (which can give a sense of the differences between the sample and the population).
Well, they have just issued a new report, which also presents the results of a teacher survey, this time focused on teachers’ attitudes toward the evaluation system used in Memphis, Tennessee (called the “Teacher Effectiveness Measure,” or TEM). In this case, not only do they raise the issue of representativeness, but they also present a little bit of data comparing their respondents to the population (i.e., all Memphis teachers who were evaluated under TEM). Read More »
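Comparing respondents with the population, as the new report does, can be as simple as lining up the two distributions category by category. The sketch below uses hypothetical counts by years of experience (not Teach Plus or Memphis data) and a hypothetical helper function; large gaps in percentage points signal that the sample may not reflect the population on that characteristic.

```python
def compare_shares(sample_counts, population_counts):
    """Return {category: (sample %, population %, difference in points)}."""
    n_sample = sum(sample_counts.values())
    n_pop = sum(population_counts.values())
    out = {}
    for cat in population_counts:
        s = 100 * sample_counts.get(cat, 0) / n_sample
        p = 100 * population_counts[cat] / n_pop
        out[cat] = (round(s, 1), round(p, 1), round(s - p, 1))
    return out


# Hypothetical counts by years of experience (not real survey data).
respondents = {"0-3 years": 60, "4-10 years": 30, "11+ years": 10}
all_teachers = {"0-3 years": 1200, "4-10 years": 1800, "11+ years": 3000}

for cat, row in compare_shares(respondents, all_teachers).items():
    print(cat, row)
```

Close agreement on a few observable characteristics is reassuring but not conclusive, since respondents could still differ from non-respondents in ways that are not measured.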
U.S. Secretary of Education Arne Duncan recently announced that states will be given the option to postpone using the results of their new teacher evaluations for high-stakes decisions during the phase-in of the new Common Core-aligned assessments. The reaction from some advocates was swift condemnation – calling the decision little more than a “delay” and a “victory for the status quo.”
We hear these kinds of arguments frequently in education. The idea is that change must be as rapid as possible, because “kids can’t wait.” I can understand and appreciate the urgency underlying these sentiments. Policy change in education (as in other arenas) can sometimes be painfully slow, and what seem like small roadblocks can turn out to be massive, permanent obstacles.
I will not repeat my views regarding the substance of Secretary Duncan’s decision – see this op-ed by Morgan Polikoff and myself. I would, however, like to make one very quick point about these “we need change right now because students can’t wait” arguments: Sometimes, what is called “delay” is actually better described as good policy making, and kids can wait for good policy making. Read More »
In a previous post, I discussed the initial results from new teacher evaluations in several states, and the fact that states with implausibly large proportions of teachers in the higher categories face a difficult situation – achieving greater differentiation while improving the quality and legitimacy of their systems.
I also expressed concern that pre-existing beliefs about the “proper” distribution of teacher ratings — in particular, how many teachers should receive the lowest ratings — might inappropriately influence the process of adjusting the systems based on the first round of results. In other words, there is a risk that states and districts will change their systems in a crude manner that lowers ratings simply for the sake of lowering ratings.
Such concerns of course imply a more general question: How should we assess the results of new evaluation systems? That’s a complicated issue, and these are largely uncharted waters. Nevertheless, I’d like to offer a few thoughts as states and districts move forward. Read More »