In a previous post, I discussed simple data from the District of Columbia Public Schools (DCPS) on teacher turnover in high- versus lower-poverty schools. That same report, issued by the D.C. Auditor and including, among other things, descriptive analyses by the excellent researchers at Mathematica, contains another very interesting table showing the evaluation ratings of DC teachers in 2010-11 by school poverty. (DC officials deserve credit for making these kinds of data available to the public; this is not the case in many other states.)
DCPS’ well-known evaluation system (called IMPACT) varies between teachers in tested versus non-tested grades, but the final ratings are a weighted average of several components, including: the teaching and learning framework (classroom observations); commitment to the school community (attendance at meetings, mentoring, PD, etc.); schoolwide value-added; teacher-assessed student achievement data (local assessments); core professionalism (absences, etc.); and individual value-added (tested teachers only).
The table I want to discuss is on page 43 of the Auditor’s report, and it shows average IMPACT scores, for each component and overall, for teachers in high-poverty schools (80-100 percent free/reduced-price lunch), medium-poverty schools (60-80 percent) and low-poverty schools (less than 60 percent). It is pasted below.
The Washington Post reports on an issue that we have discussed here on many occasions: The incompleteness of the testing results released annually by the District of Columbia Public Schools (DCPS), or, more accurately, the Office of the State Superintendent of Education (OSSE), which is responsible for testing in DC schools.
Here’s the quick backstory: For the past 7-8 years or so, DCPS/OSSE have not released a single test score for the state assessment (the DC-CAS). Instead, they have released only the percentage of students whose scores meet the designated cutoff points for the NCLB-style categories of below basic, basic, proficient and advanced. I will not reiterate all of the problems with these cutpoint-based rates and how they serve to distort the underlying data, except to say that rates, by themselves, are among the worst ways to present these data, and there is absolutely no reason why states and districts should not release both rates and average scale scores.
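To see how cutpoint-based rates can mask what is happening in the underlying scores, here is a quick sketch with invented numbers (the cutoff and the scale are hypothetical, not the DC-CAS’s): a proficiency rate can rise even while the average scale score falls, because the rate registers only movement across the cutoff.

```python
# Hypothetical illustration: proficiency rates vs. average scale scores.
# All scores and the cutoff below are invented for demonstration purposes.
PROFICIENT_CUT = 40

year1 = [30, 38, 39, 39, 60, 70]   # two students score just below the cutoff
year2 = [20, 28, 41, 41, 55, 65]   # they cross it, but other scores drop

def proficiency_rate(scores):
    """Share of students at or above the proficiency cutoff."""
    return sum(s >= PROFICIENT_CUT for s in scores) / len(scores)

def mean(scores):
    """Average scale score."""
    return sum(scores) / len(scores)

print(proficiency_rate(year1), mean(year1))   # 1/3 proficient, mean 46.0
print(proficiency_rate(year2), mean(year2))   # 2/3 proficient, mean ~41.7
```

Here the rate doubles while the average score declines — two opposite stories from the same data, which is exactly why releasing both measures matters.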
The Post reports, however, that one organization — the Broader, Bolder Approach to Education — was able to obtain the actual scale score data (by subgroup and grade) for 2010-2013, and that this group published a memo-style report alleging that DCPS’ public presentation of their testing results over the past few years has been misleading. I had a mixed reaction to this report and the accompanying story.
At the end of February, the District of Columbia Council’s Education Committee held its annual hearing on the performance of the District’s Public Schools (DCPS). The hearing (full video is available here) lasted over four hours, and included discussion on a variety of topics, but there was, inevitably, a block of time devoted to the discussion of DCPS testing results (and these questions were the focus of the news coverage).
These exchanges between Council members and DCPS Chancellor Kaya Henderson focused particularly on the low-stakes Trial Urban District Assessment (TUDA).* Though it was all very constructive and not even remotely hostile, it’s fair to say that Ms. Henderson was grilled quite a bit (as is often the case at these kinds of hearings). Unfortunately, the arguments from both sides of the dais were fraught with the typical misinterpretations of TUDA, and I could not get past how tragic it is to see legislators question the superintendent of a large urban school district based on a misinterpretation of what the data mean – and to hear that superintendent respond based on the same flawed premises.
But what I really kept thinking — as I have before in similar contexts — was how effective Chancellor Henderson could have been in answering the Council’s questions had she chosen to interpret the data properly (and I still hold out hope that this will become the norm some day). So, let’s take a quick look at a few major arguments that were raised during the hearing, and how they might have been answered.
When looking at changes in testing results between years, many people are (justifiably) interested in comparing those changes for different student subgroups, such as those defined by race/ethnicity or income (subsidized lunch eligibility). The basic idea is to see whether increases are shared between traditionally advantaged and disadvantaged groups (and, often, to monitor achievement gaps).
Sometimes, people take this a step further by using the subgroup breakdowns as a crude check on whether cross-sectional score changes are due to changes in the sample of students taking the test. The logic is as follows: If the increases show up among both advantaged and more disadvantaged subgroups, then an overall increase cannot be attributed to a change in the backgrounds of students taking the test, since the subgroups exhibited the same pattern. (For reasons discussed here many times before, this is a severely limited approach.)
Whether testing data are cross-sectional or longitudinal, these subgroup breakdowns are certainly important and necessary, but it’s wise to keep in mind that standard variables, such as eligibility for free and reduced-price lunches (FRL), are imperfect proxies for student background (actually, FRL rates aren’t even such a great proxy for income). In fact, one might reach different conclusions depending on which variables are chosen. To illustrate this, let’s take a look at results from the Trial Urban District Assessment (TUDA) for the District of Columbia Public Schools between 2011 and 2013, in which there was a large overall score change that received a great deal of media attention, and break the changes down by different characteristics.
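To make the compositional issue concrete, here is a bit of hypothetical arithmetic (the subgroup shares and means below are invented, not DCPS figures): the overall average can rise between two cohorts even when every subgroup’s average is completely flat, purely because the mix of students shifts. The same shifting can also occur *within* a crude category such as FRL eligibility, where subgroup breakdowns cannot detect it.

```python
# Invented numbers: overall average rises while every subgroup mean is flat,
# driven entirely by a change in subgroup shares between cohorts.
def overall_mean(groups):
    """groups: list of (n_students, group_mean) pairs."""
    total_n = sum(n for n, _ in groups)
    return sum(n * m for n, m in groups) / total_n

# Year 1: 70% FRL-eligible (mean 200), 30% non-eligible (mean 230)
year1 = [(700, 200.0), (300, 230.0)]
# Year 2: identical subgroup means, but only 60% FRL-eligible
year2 = [(600, 200.0), (400, 230.0)]

print(overall_mean(year1))   # 209.0
print(overall_mean(year2))   # 212.0 -- a 3-point "gain" with no real change
```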
The recent release of the National Assessment of Educational Progress (NAEP) and the companion Trial Urban District Assessment (TUDA) was predictably exploited by advocates to argue for their policy preferences. This is a blatant misuse of the data for many reasons that I have discussed here many times before, and I will not repeat them.
I do, however, want to very quickly illustrate the emptiness of this pseudo-empirical approach – finding cross-sectional cohort increases in states/districts that have recently enacted policies you support, and then using the increases as evidence that the policies “work.” For example, the recent TUDA results for the District of Columbia Public Schools (DCPS), where scores increased in all four grade/subject combinations, were immediately seized upon by supporters of the reforms that have been enacted by DCPS as clear-cut evidence of policy triumph. The celebrators included the usual advocates, but also DCPS Chancellor Kaya Henderson and U.S. Secretary of Education Arne Duncan (there was even a brief mention by President Obama in his State of the Union speech).
My immediate reaction to this bad evidence was simple (though perhaps slightly juvenile) – find a district that had similar results under a different policy environment. It was, as usual, pretty easy: Los Angeles Unified School District (LAUSD).
The Washington Post reports that parents and alumni of D.C.’s Dunbar High School have quietly been putting together a proposal to revitalize what the article calls “one of the District’s worst performing schools.”
Those behind the proposal are not ready to speak about it publicly, and details are still very thin, but the Post article reports that it calls for greater flexibility in hiring, spending and other core policies. Moreover, the core of the plan – or at least its most drastic element – is to make Dunbar a selective high school, to which students must apply and be accepted, presumably based on testing results and other performance indicators (the story characterizes the proposal as a whole with the term “autonomy”). I will offer no opinion as to whether this conversion, if it is indeed submitted to the District for consideration, is a good idea. That will be up to administrators, teachers, parents, and other stakeholders.
I am, however, a bit struck by two interrelated aspects of this story. The first is the unquestioned characterization of Dunbar as a “low performing” or “struggling” school. This fateful label appears to be based mostly on the school’s proficiency rates, which are indeed dismally low – 20 percent in math and 29 percent in reading.
Teacher turnover – the rates at which teachers leave the profession and switch schools – is obviously a very important outcome in education. Although not all turnover is necessarily a “bad thing” – some teachers simply aren’t cut out for the job and leave voluntarily (or are fired) – unusually high turnover means that schools must replace large proportions of their workforces on an annual basis. This can have serious implications not only for the characteristics (e.g., experience) of schools’ teachers, but also for schools’ costs, cohesion and professional cultures.
According to the most recent national data (which are a few years old), annual public school teacher turnover is around 16 percent, of which roughly half leave the profession (“leavers”), and half switch schools (“movers”). Both categories are equally important from the perspective of individual schools, since they must replace teachers regardless of where they go. In some subsets of schools and among certain groups of teachers, however, turnover is considerably higher. For instance, among teachers with 1-3 years of experience, turnover is almost 23 percent. Contrary to popular opinion, though, the relationship between school poverty (i.e., free/reduced-price lunch rates) and turnover isn’t straightforward, at least at the national level. Although schools serving larger proportions of lower-income students have a larger percentage of “movers” every year, they have a considerably lower proportion of “leavers” (in part due to retirement).
This national trend, of course, masks considerable inter-district variation. One example is the District of Columbia Public Schools (DCPS).
The recently released study of IMPACT, the teacher evaluation system in the District of Columbia Public Schools (DCPS), has garnered a great deal of attention over the past couple of months (see our post here).
Much of the commentary from the system’s opponents was predictably (and unfairly) dismissive, but I’d like to quickly discuss the reaction from supporters. Some took the opportunity to make grand proclamations about how “IMPACT is working,” and there was a lot of back and forth about the need to ensure that various states’ evaluations are as “rigorous” as IMPACT (as well as skepticism as to whether this is the case).
The claim that this study shows that “IMPACT is working” is somewhat misleading, and the idea that states should now rush to replicate IMPACT is misguided. It also misses the important points about the study and what we can learn from its results.
Having taken a look at several states’ school rating systems (see our posts on the systems in IN, OH, FL and CO), I thought it might be interesting to examine a system used by a group of charter schools – starting with the system used by charters in the District of Columbia. This is the third year the DC charter school board has released the ratings.
For elementary and middle schools (upon which I will focus in this post*), the DC Performance Management Framework (PMF) is a weighted index composed of: 40 percent absolute performance; 40 percent growth; and 20 percent what they call “leading indicators” (a more detailed description of this formula can be found in the second footnote).** The index scores are then sorted into one of three tiers, with Tier 1 being the highest, and Tier 3 the lowest.
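As a concrete illustration of how a weighted index like this works, here is a minimal sketch. The component scores, the 0-100 scale, and the tier cutoffs below are all hypothetical; the actual PMF formula and thresholds are more detailed than this.

```python
# Sketch of a PMF-style weighted index. Weights follow the 40/40/20 split
# described above; component scores and tier cutoffs are invented.
WEIGHTS = {"absolute": 0.40, "growth": 0.40, "leading": 0.20}

def pmf_index(components):
    """Weighted average of component scores (each assumed on a 0-100 scale)."""
    return sum(WEIGHTS[k] * components[k] for k in WEIGHTS)

def tier(index):
    # Illustrative cutoffs only -- the board's actual tier thresholds differ.
    if index >= 65:
        return "Tier 1"
    if index >= 35:
        return "Tier 2"
    return "Tier 3"

school = {"absolute": 55.0, "growth": 70.0, "leading": 60.0}
score = pmf_index(school)        # 0.4*55 + 0.4*70 + 0.2*60 = 62.0
print(score, tier(score))        # 62.0 Tier 2
```

Note how the 40 percent growth weight lets a school with middling absolute scores still land in a middle or upper tier — the property discussed below.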
So, these particular ratings weight absolute performance – i.e., how highly students score on tests – a bit less heavily than do most states that have devised their own systems, and they grant slightly more importance to growth and alternative measures. We might therefore expect to find a somewhat weaker relationship between PMF scores and student characteristics such as free/reduced price lunch eligibility (FRL), as these charters are judged less predominantly on the students they serve. Let’s take a quick look.
A new working paper, published by the National Bureau of Economic Research, is the first high quality assessment of one of the new teacher evaluation systems sweeping across the nation. The study, by Thomas Dee and James Wyckoff, both highly respected economists, focuses on the first three years of IMPACT, the evaluation system put into place in the District of Columbia Public Schools in 2009.
Under IMPACT, each teacher receives a point total based on a combination of test-based and non-test-based measures (the formula varies between teachers who are and are not in tested grades/subjects). These point totals are then sorted into one of four categories – highly effective, effective, minimally effective and ineffective. Teachers who receive a highly effective (HE) rating are eligible for salary increases, whereas teachers rated ineffective are dismissed immediately and those receiving minimally effective (ME) for two consecutive years can also be terminated. The design of this study exploits that incentive structure by, put very simply, comparing teachers whose scores fell directly above the ME and HE thresholds with those whose scores fell directly below, to see whether the two groups differed in terms of retention and subsequent performance. The basic idea is that these teachers are all very similar in terms of their measured performance, so any differences in outcomes can be (cautiously) attributed to the system’s incentives.
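A heavily simplified sketch of this kind of threshold comparison is below, using invented scores and outcomes. The actual study uses regression discontinuity methods, not a raw difference in means, and the threshold and bandwidth here are hypothetical.

```python
# Toy threshold comparison: among teachers in a narrow band around a rating
# cutoff, compare retention just below vs. just above. All data invented.
THRESHOLD = 250   # hypothetical category boundary on the IMPACT point scale
BANDWIDTH = 10    # only compare teachers within 10 points of the cutoff

# (impact_score, stayed_next_year) pairs -- invented for illustration
teachers = [(243, False), (246, False), (248, True), (249, False),
            (251, True), (253, True), (255, True), (258, True)]

def retention_rate(group):
    """Share of teachers in the group who stayed the following year."""
    return sum(stayed for _, stayed in group) / len(group)

near = [(s, r) for s, r in teachers if abs(s - THRESHOLD) <= BANDWIDTH]
below = [t for t in near if t[0] < THRESHOLD]
above = [t for t in near if t[0] >= THRESHOLD]

print(retention_rate(below), retention_rate(above))   # 0.25 1.0
```

Because everyone in the band has nearly identical measured performance, a gap like this one is the kind of difference the study (cautiously) attributes to the incentives themselves.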
The short answer is that there were meaningful differences.
In the Washington Post, Emma Brown reports on a behind-the-scenes decision about how to score last year’s new, more difficult tests in the District of Columbia Public Schools (DCPS) and the District’s charter schools.
To make a long story short, the choice faced by the Office of the State Superintendent of Education, or OSSE, which oversees testing in the District, was about how to convert test scores into proficiency rates. The first option, put simply, was to convert them such that the proficiency bar was more “aligned” with the Common Core, thus resulting in lower aggregate proficiency rates in math, compared with last year’s (in other states, such as Kentucky and New York, rates declined markedly). The second option was to score the tests while “holding constant” the difficulty of the questions, in order to facilitate comparisons of aggregate rates with those from previous years.
OSSE chose the latter option (according to some, in a manner that was insufficiently transparent). The end result was a modest increase in proficiency rates (which DC officials absurdly called “historic”).
The New Teacher Project (TNTP) has released a new report on teacher retention in D.C. Public Schools (DCPS). It is a spinoff of their “The Irreplaceables” report, which was released a few months ago, and which is discussed in this post. The four (unnamed) districts from that report are also used in this one, and their results are compared with those from DCPS.
I want to look quickly at this new supplemental analysis, not to rehash the issues I raised about “The Irreplaceables,” but rather because of DCPS’s potential importance as a field test site for a host of policy reform ideas – indeed, the majority of core market-based reform policies have been in place in D.C. for several years, including teacher evaluations in which test-based measures are the dominant component, automatic dismissals based on those ratings, large performance bonuses, mutual consent for excessed teachers and a huge charter sector. There are many people itching to render a sweeping verdict, positive or negative, on these reforms, most often based on pre-existing beliefs, rather than solid evidence.
Although I will take issue with a couple of the conclusions offered in this report, I’m not going to review it systematically. I think research on retention is important, and it’s difficult to produce reports with original analysis, while very easy to pick them apart. Instead, I’m going to list a couple of findings in the report that I think are worth examining, mostly because they speak to larger issues.
One of the most important things in education policy to keep an eye on is the first round of changes to new teacher evaluation systems. Given all the moving parts and the lack of evidence on how these systems should be designed and their impact, course adjustments along the way are not just inevitable, but absolutely essential.
Changes might be guided by different types of evidence, such as feedback from teachers and administrators or analysis of ratings data. And, of course, human judgment will play a big role. One thing that states and districts should not be doing, however, is assessing their new systems – or making changes to them – based on whether raw overall test scores go up or down within the first few years.
Here’s a little reality check: Even the best-designed, best-implemented new evaluations are unlikely to have an immediate measurable impact on aggregate student performance. Evaluations are an investment, not a quick fix. And they are not risk-free. Their effects will depend on the quality of the systems, how current teachers and administrators react to them, and how all of this plays out in the teacher labor market. As I’ve said before, the realistic expectation for overall performance – and this is no guarantee – is that there will be some very small, gradual improvements, unfolding over years, even decades.
States and districts that expect anything more risk making poor decisions during these crucial, early phases.
D.C. Public Schools (DCPS) recently announced a few significant changes to its teacher evaluation system (called IMPACT), including the alteration of its test-based components, the creation of a new performance category (“developing”), and a few tweaks to the observational component (discussed below). These changes will be effective starting this year.
As with any new evaluation system, a period of adjustment and revision should be expected and encouraged (though it might be preferable if the first round of changes occurs during a phase-in period, prior to stakes becoming attached). Yet, despite all the attention given to the IMPACT system over the past few years, these new changes have not been discussed much beyond a few quick news articles.
I think that’s unfortunate: DCPS is an early adopter of the “new breed” of teacher evaluation policies being rolled out across the nation, and any adjustments to IMPACT’s design – presumably based on results and feedback – could provide valuable lessons for states and districts in earlier phases of the process.
Accordingly, I thought I would take a quick look at three of these changes.
One of the more telling episodes in education I’ve seen over the past couple of years was a little dispute over Michelle Rhee’s testing record that flared up last year. Alan Ginsburg, a retired U.S. Department of Education official, released an informal report in which he presented the NAEP cohort changes that occurred during the first two years of Michelle Rhee’s tenure (2007-2009), and compared them with those during the superintendencies of her two predecessors.
Ginsburg concluded that the increases under Chancellor Rhee, though positive, were less rapid than in previous years (2000 to 2007 in math, 2003 to 2007 in reading). Soon thereafter, Paul Peterson, director of Harvard’s Program on Educational Leadership and Governance, published an article in Education Next that disputed Ginsburg’s findings. Peterson found that increases under Rhee amounted to roughly three scale score points per year, compared with around 1-1.5 points annually between 2000 and 2007 (the actual amounts varied by subject and grade).
Both articles were generally cautious in tone and in their conclusions about the actual causes of the testing trends. The technical details of the two reports – who’s “wrong” or “right” – are not important for this post (especially since more recent NAEP results have since been released). More interesting was how people reacted – and didn’t react – to the dueling analyses.