A couple of weeks ago, Michelle Rhee published an op-ed in the Washington Post speaking out against the so-called “opt-out movement,” which encourages parents to refuse to let their children take standardized tests.
Personally, I oppose the “opt-out” phenomenon, but I also think it would be a mistake not to pay attention to its proponents’ fundamental issue – that standardized tests are potentially being misused and/or overused. This concern is legitimate and important. My sense is that “opting out” reflects a rather extreme version of this mindset, a belief that we cannot right the ship – i.e., we have gone so far and moved so carelessly with test-based accountability that there is no real hope that it can or will be fixed. This strikes me as a severe overreaction, but I understand the sentiment.
That said, while most of Ms. Rhee’s op-ed is the standard, reasonable fare, some of it is also laced with precisely the kind of misconceptions that contribute to the apprehensions not only of anti-testing advocates, but also of those of us who occupy a middle ground – i.e., those who favor some test-based accountability, but are worried about getting it right. Read More »
A few months ago, the U.S. Department of Education (USED) released the latest data from schools that received grants via the School Improvement (SIG) program. These data — consisting solely of changes in proficiency rates — were widely reported as an indication of “disappointing” or “mixed” results. Some even went as far as proclaiming the program a complete failure.
Once again, I have to point out that this breaks almost every rule of testing data interpretation and policy analysis. I’m not going to repeat the arguments about why changes in cross-sectional proficiency rates are not policy evidence (see our posts here, here and here, or examples from the research literature here, here and here). Suffice it to say that the changes themselves are not even particularly good indicators of whether students’ test-based performance in these schools actually improved, to say nothing of whether it was the SIG grants that were responsible for the changes. There’s more to policy analysis than subtraction.
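To make the point about subtraction concrete, here is a minimal sketch in Python. All scores, student labels and the cutoff are hypothetical, invented purely for illustration. It shows two ways a change in a cross-sectional proficiency rate can mislead: every student can improve while the rate stays flat, and the rate can rise when no returning student improves at all, simply because the tested group changed.

```python
# Hypothetical cutoff and scores, chosen only for illustration.
PROFICIENCY_CUTOFF = 300


def proficiency_rate(scores, cutoff=PROFICIENCY_CUTOFF):
    """Share of tested students scoring at or above the cutoff."""
    scores = list(scores)
    return sum(score >= cutoff for score in scores) / len(scores)


# Case 1: the same five students gain 10 points each, but nobody crosses
# the cutoff, so the proficiency rate does not move.
same_students_y1 = [250, 260, 270, 310, 320]
same_students_y2 = [score + 10 for score in same_students_y1]
flat_change = proficiency_rate(same_students_y2) - proficiency_rate(same_students_y1)

# Case 2: no returning student's score changes, but one low scorer ("a")
# leaves and one high scorer ("f") enters, so the rate rises anyway.
cohort_y1 = {"a": 250, "b": 260, "c": 270, "d": 310, "e": 320}
cohort_y2 = {"b": 260, "c": 270, "d": 310, "e": 320, "f": 330}
cohort_change = proficiency_rate(cohort_y2.values()) - proficiency_rate(cohort_y1.values())

# Individual gains for students tested in both years (all zero here).
returning_gains = {
    student: cohort_y2[student] - cohort_y1[student]
    for student in cohort_y1
    if student in cohort_y2
}

print(f"Case 1 rate change: {flat_change:+.2f}")
print(f"Case 2 rate change: {cohort_change:+.2f}")
print(f"Case 2 gains for returning students: {returning_gains}")
```

In neither case does the difference between the two rates describe what happened to actual students, and neither says anything about what caused the change.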
So, in some respects, I would like to come to the defense of Secretary Arne Duncan and USED right now – not because I’m a big fan of the SIG program (I’m ambivalent at best), but rather because I believe in strong, patient policy evaluation, and these proficiency rate changes are virtually meaningless. Unfortunately, however, USED was the first to portray, albeit very cautiously, rate changes as evidence of SIG’s impact. In doing so, they provided a very effective example of why relying on bad evidence is a bad idea even if it supports your desired conclusions. Read More »
A recent report from the U.S. Department of Education presented a summary of three recent studies of the differences in the effectiveness of teaching provided to advantaged and disadvantaged students (with effectiveness defined in terms of value-added scores, and disadvantage in terms of subsidized lunch eligibility). The brief characterizes the results of these reports in an accessible manner – that the difference in estimated teaching effectiveness between advantaged and disadvantaged students varied quite widely between districts, but overall is about four percent of the achievement gap in reading and 2-3 percent in math.
Some observers were not impressed. They wondered why so-called reformers are alienating teachers and hurting students in order to address a mere 2-4 percent improvement in the achievement gap.
Just to be clear, the 2-4 percent figures describe the gap (and remember that it varies). Whether it can be narrowed or closed – e.g., by improving working conditions or offering incentives or some other means – is a separate issue. Nevertheless, let’s put aside all the substantive aspects surrounding these studies, and the issue of the distribution of teacher quality, and discuss this 2-4 percent thing, as it illustrates what I believe is among the most important tensions underlying education policy today: Our collective failure to have a reasonable debate about expectations and the power of education policy. Read More »
In 2009, The New Teacher Project (TNTP) released a report called “The Widget Effect.” You would be hard-pressed to find many recent publications from an advocacy group that have had a larger influence on education policy and the debate surrounding it. To this day, the report is mentioned regularly by advocates and policy makers.
The primary argument of the report was that teacher performance “is not measured, recorded, or used to inform decision making in any meaningful way.” More specifically, the report shows that most teachers received “satisfactory” or equivalent ratings, and that evaluations were not tied to most personnel decisions (e.g., compensation, layoffs). From these findings and arguments comes the catchy title – a “widget” is a fictional product commonly used in situations (e.g., economics classes) where the product doesn’t matter. Thus, treating teachers like widgets means that we treat them all as if they’re the same.
Given the influence of “The Widget Effect,” as well as how different the teacher evaluation landscape is now compared to when it was released, I decided to read it closely. Having done so, I think it’s worth discussing a few points about the report. Read More »
The Center for American Progress (CAP) recently released a short report on whether teachers were leaving the profession due to reforms implemented during the Obama Administration, as some commentators predicted.
The authors use data from the Schools and Staffing Survey (SASS), a wonderful national survey of U.S. teachers, and they report that 70 percent of first-year teachers in 2007-08 were still teaching in 2011-12. They claim that this high retention of beginning teachers, along with the fact that most teachers in 2011-12 had five or more years of experience, shows that “the teacher retention concerns were unfounded.”
This report raises a couple of important points about the debate over teacher retention during this time of sweeping reform.
Read More »
In the three most discussed and controversial areas of market-based education reform – performance pay, charter schools and the use of value-added estimates in teacher evaluations – 2013 saw the release of a couple of truly landmark reports, in addition to the normal flow of strong work coming from the education research community (see our reviews from 2010, 2011 and 2012).*
In one sense, this building body of evidence is critical and even comforting, given not only the rapid expansion of charter schools, but also and especially the ongoing design and implementation of new teacher evaluations (which, in many cases, include performance-based pay incentives). In another sense, however, there is good cause for anxiety. Although one must try policies before knowing how they work, the sheer speed of policy change in the U.S. right now means that policymakers are making important decisions on the fly, and there is a great deal of uncertainty as to how this will all turn out.
Moreover, while 2013 was without question an important year for research in these three areas, it also illustrated an obvious point: Proper interpretation and application of findings is perhaps just as important as the work itself. Read More »
A couple of months ago, Bill Gates said something that received a lot of attention. With regard to his foundation’s education reform efforts, which focus most prominently on teacher evaluations, but encompass many other areas, he noted, “we don’t know if it will work.” In fact, according to Mr. Gates, “we won’t know for probably a decade.”
He’s absolutely correct. Most education policies, including (but not limited to) those geared toward shifting the distribution of teacher quality, take a long time to work (if they do work), and the research assessing these policies requires a great deal of patience. Yet so many of the most prominent figures in education policy routinely espouse the opposite viewpoint: Policies are expected to have an immediate, measurable impact (and their effects are assessed in the crudest manner imaginable).
A perfect example was the reaction to the recent release of results of the National Assessment of Educational Progress (NAEP). Read More »
Advocates of the so-called “Florida Formula,” a package of market-based reforms enacted throughout the 1990s and 2000s, some of which are now spreading rapidly in other states, traveled to Michigan this week to make their case to the state’s lawmakers, with particular emphasis on Florida’s school grading system. In addition to arguments about accessibility and parental involvement, their empirical (i.e., test-based) evidence consisted largely of the standard, invalid claims that cross-sectional NAEP increases prove the reforms’ effectiveness, along with a bonus appearance of the argument that since Florida started grading schools, the grades have improved, even though this is largely (and demonstrably) a result of changes in the formula.
As mentioned in a previous post, I continue to be perplexed at advocates’ insistence on using this “evidence,” even though there is a decent amount of actual rigorous policy research available, much of it positive.
So, I thought it would be fun, though slightly strange, for me to try on my market-based reformer cap, and see what it would look like if this kind of testimony about the Florida reforms was actually research-based (at least the test-based evidence). Here’s a very rough outline of what I came up with: Read More »
U.S. Secretary of Education Arne Duncan recently announced that states will be given the option to postpone using the results of their new teacher evaluations for high-stakes decisions during the phase-in of the new Common Core-aligned assessments. The reaction from some advocates was swift condemnation – calling the decision little more than a “delay” and a “victory for the status quo.”
We hear these kinds of arguments frequently in education. The idea is that change must be as rapid as possible, because “kids can’t wait.” I can understand and appreciate the urgency underlying these sentiments. Policy change in education (as in other arenas) can sometimes be painfully slow, and what seem like small roadblocks can turn out to be massive, permanent obstacles.
I will not repeat my views regarding the substance of Secretary Duncan’s decision – see this op-ed by Morgan Polikoff and myself. I would, however, like to make one very quick point about these “we need change right now because students can’t wait” arguments: Sometimes, what is called “delay” is actually better described as good policy making, and kids can wait for good policy making. Read More »
** Reprinted here in the Washington Post
The following is written by Morgan S. Polikoff and Matthew Di Carlo. Morgan is Assistant Professor in the Rossier School of Education at the University of Southern California.
One of the primary policy levers now being employed in states and districts nationwide is teacher evaluation reform. Well-designed evaluations, which should include measures that capture both teacher practice and student learning, have great potential to inform and improve the performance of teachers and, thus, students. Furthermore, most everyone agrees that the previous systems were largely pro forma, failed to provide useful feedback, and needed replacement.
The attitude among many policymakers and advocates is that we must implement these systems and begin using them rapidly for decisions about teachers, while design flaws can be fixed later. Such urgency is undoubtedly influenced by the history of slow, incremental progress in education policy. However, we believe this attitude to be imprudent. Read More »
** Reprinted here in the Washington Post
A big part of successful policy making is unyielding attention to detail (an argument that regular readers of this blog hear often). Choices about design and implementation that may seem unimportant can play a substantial role in determining how policies play out in practice.
A new paper, co-authored by Elizabeth Davidson, Randall Reback, Jonah Rockoff and Heather Schwartz, and presented at last month’s annual conference of The Association for Education Finance and Policy, illustrates this principle vividly, and on a grand scale: With an analysis of outcomes in all 50 states during the early years of NCLB.
After a terrific summary of the law’s rules and implementation challenges, as well as some quick descriptive statistics, the paper’s main analysis is a straightforward examination of why the proportion of schools meeting AYP varied quite a bit between states. For instance, in 2003, the first year of results, 32 percent of U.S. schools failed to make AYP, but the proportion ranged from one percent in Iowa to over 80 percent in Florida.
Surprisingly, the results suggest that the primary reasons for this variation seem to have had little to do with differences in student performance. Rather, the big factors are subtle differences in rather arcane rules that each state chose during the implementation process. These decisions received little attention, yet they had a dramatic impact on the outcomes of NCLB during this time period. Read More »
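A small sketch in Python can illustrate how a rule choice alone might drive failure rates. The excerpt does not name the specific rules the paper examines, so the rule used here, a minimum subgroup size below which a subgroup's results do not count, is an assumed example, and the schools, target and thresholds are all hypothetical. The same four schools, with identical student performance, are evaluated under three different thresholds.

```python
# Hypothetical schools: overall proficiency rate, plus the size and
# proficiency rate of a single student subgroup.
SCHOOLS = [
    {"overall": 0.70, "subgroup_n": 12, "subgroup_rate": 0.40},
    {"overall": 0.65, "subgroup_n": 25, "subgroup_rate": 0.45},
    {"overall": 0.80, "subgroup_n": 45, "subgroup_rate": 0.50},
    {"overall": 0.55, "subgroup_n": 60, "subgroup_rate": 0.65},
]

TARGET = 0.60  # hypothetical proficiency target, identical in every "state"


def fails(school, min_subgroup_n, target=TARGET):
    """A school fails if its overall rate, or any counted subgroup, misses the target."""
    if school["overall"] < target:
        return True
    # The subgroup only counts if it meets the state's minimum size.
    subgroup_counts = school["subgroup_n"] >= min_subgroup_n
    return subgroup_counts and school["subgroup_rate"] < target


def failure_rate(schools, min_subgroup_n):
    """Share of schools failing under a given minimum subgroup size."""
    return sum(fails(school, min_subgroup_n) for school in schools) / len(schools)


# Same schools, same scores, three hypothetical state rules.
for min_n in (10, 30, 50):
    print(f"minimum subgroup size {min_n}: {failure_rate(SCHOOLS, min_n):.0%} of schools fail")
```

Nothing about student performance differs across the three runs; only the assumed rule does, which is the kind of mechanism the paragraph above describes.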
Earlier this week, New Jersey Governor Chris Christie announced that the state will assume control over Camden City School District. Camden will be the fourth NJ district to undergo takeover, though this is the first time that the state will be removing control from an elected local school board, which will now serve in an advisory role (and have three additional members appointed by the Governor). Over the next few weeks, NJ officials will choose a new superintendent, and begin to revamp evaluations, curricula and other core policies.
Accompanying the announcement, the Governor’s office released a two-page “fact sheet,” much of which is devoted to justifying this move to the public.
Before discussing it, let’s be clear about something – it may indeed be the case that Camden schools are so critically low-performing and/or dysfunctional as to warrant drastic intervention. Moreover, it’s at least possible that state takeover is the appropriate type of intervention to help these schools improve (though the research on this latter score is, to be charitable, undeveloped).
That said, the “fact sheet” presents relatively little valid evidence regarding the academic performance of Camden schools. Given the sheer magnitude of any takeover decision, it is crucial for the state to demonstrate publicly that they have left no stone unturned by presenting a case that is as comprehensive and compelling as possible. However, the discrepancy between that high bar and NJ’s evidence, at least that pertaining to academic outcomes, is more than a little disconcerting.
Read More »
In a story for Education Week, the always reliable Stephen Sawchuk reports on what may be a trend in states’ first results from their new teacher evaluation systems: The ratings are skewed toward the top.
For example, the article notes that, in Michigan, Florida and Georgia, a high proportion of teachers (more than 90 percent) received one of the two top ratings (out of four or five). This has led to some grumbling among advocates and others, citing similarities between these results and those of the old systems, in which the vast majority of teachers were rated “satisfactory,” and very few were found to be “unsatisfactory.”
Differentiation is very important in teacher evaluations – it’s kind of the whole point. Thus, it’s a problem when ratings are too heavily concentrated toward one end of the distribution. However, as Aaron Pallas points out, these important conversations about evaluation results sometimes seem less focused on good measurement or even the spread of teachers across categories than on the narrower question of how many teachers end up with the lowest rating – i.e., how many teachers will be fired.
Read More »
In his State of the City address last month, New York City Mayor Michael Bloomberg made some brief comments about the upcoming adoption of new assessments aligned with the Common Core State Standards (CCSS), including the following statement:
But no matter where the definition of proficiency is arbitrarily set on the new tests, I expect that our students’ progress will continue outpacing the rest of the State’s[,] the only meaningful measurement of progress we have.
On the surface, this may seem like just a little bit of healthy bravado. But there are a few things about this single sentence that struck me, and it also helps to illustrate an important point about the relationship between standards and testing results. Read More »
Some Florida officials are still having trouble understanding why they’re finding no relationship between the grades schools receive and the evaluation ratings of teachers in those schools. For his part, Florida’s new Education Commissioner, Tony Bennett, is also concerned. According to the article linked above, he acknowledges (to his credit) that the two measures are different, but is also considering “revis[ing] the models to get some fidelity between the two rankings.”
This may be turning into a potentially risky situation. As discussed in a recent post, it is important to examine the results of the new teacher evaluations, but there is no reason one would expect to find a strong relationship between these ratings and the school grades, as they are in large part measuring different things (and imprecisely at that). The school grades are mostly (but not entirely) driven by how highly students score, whereas teacher evaluations are, to the degree possible, designed to be independent of these absolute performance levels. Florida cannot validate one system using the other.
However, as also mentioned in that post, this is not to say that there should be no relationship at all. For example, both systems include growth-oriented measures (albeit using very different approaches). In addition, schools with lower average performance levels sometimes have trouble recruiting and retaining good teachers. Due to these and other factors, the reasonable expectation is to find some association overall, just not one that’s extremely strong. And that’s basically what one finds, even using the same set of results upon which the claims that there is no relationship are based.
Read More »