The College Board recently released the latest SAT results, for the first time combining this release with that of data from the PSAT and AP exams. The release of these data generated the usual stream of news coverage, much of which misinterpreted the year-to-year changes in SAT scores as a lack of improvement, even though the data are cross-sectional and the test-taking sample has been changing, and/or misinterpreted the percent of test takers who scored above the “college ready” line as a national measure of college readiness, even though the tests are not administered to a representative sample of students.
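To make the point about a changing test-taking sample concrete, here is a minimal sketch of the arithmetic. The group sizes and scores are hypothetical, invented purely for illustration, and are not actual SAT data. It shows how an overall average can fall from one year to the next even when every group of test takers scores higher, simply because the mix of test takers has shifted.

```python
# Hypothetical numbers for illustration only; these are NOT actual SAT data.

def overall_mean(groups):
    """Weighted mean score across groups given as (n_test_takers, mean_score)."""
    total = sum(n for n, _ in groups)
    return sum(n * score for n, score in groups) / total

# Year 1: two hypothetical groups of test takers.
year1 = [(800, 1100), (200, 900)]

# Year 2: each group scores 10 points higher, but the lower-scoring
# group has doubled in size (i.e., more, different students take the test).
year2 = [(800, 1110), (400, 910)]

mean1 = overall_mean(year1)
mean2 = overall_mean(year2)

print(f"Year 1 overall mean: {mean1:.1f}")
print(f"Year 2 overall mean: {mean2:.1f}")
print(f"Change in overall mean: {mean2 - mean1:+.1f}")
```

In this made-up case, both groups improve, yet the overall mean declines. A headline reading "no progress" would be describing the change in who took the test, not a change in performance.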
It is disheartening to watch this annual exercise, in which the most common “take home” headlines (e.g., “no progress in SAT scores” and “more, different students take SAT”) are in many important respects contradictory. In past years, much of the blame had to be placed on the College Board’s presentation of the data. This year, to their credit, the roll-out is substantially better (hopefully, this will continue).
But I don’t want to focus on this aspect of the organization’s activities (see this post for more); instead, I would like to discuss briefly the College Board’s recent change in mission. Read More »
The Foundation for Excellence in Education, an organization that advocates for education reform in Florida, in particular the set of policies sometimes called the “Florida Formula,” recently announced a competition to redesign the “appearance, presentation and usability” of the state’s school report cards. Winners of the competition will share prize money totaling $35,000.
The contest seems like a great idea. Improving the manner in which education data are presented is, of course, a laudable goal, and an open competition could potentially attract a diverse group of talented people. As regular readers of this blog know, however, while I am not opposed to sensibly-designed test-based accountability policies, my primary concern about school rating systems is the quality and interpretation of the measures used therein. So, while I support the idea of a competition for improving the design of the report cards, I am hoping that the end result won’t just be a very attractive, clever instrument devoted to the misinterpretation of testing data.
In this spirit, I would like to submit four simple graphs that illustrate, as clearly as possible and using the latest data from 2014, what Florida’s school grades are actually telling us. Since the scoring and measures vary a bit between different types of schools, let’s focus on elementary schools. Read More »
There’s no reason why insisting on proper causal inference can’t be fun.
A few weeks ago, ASCD published a policy brief (thanks to Chad Aldeman for flagging it), the purpose of which is to argue that it is “grossly misleading” to make a “direct connection” between nations’ test scores and their economic strength.
On the one hand, it’s implausible to assert that better educated nations aren’t stronger economically. On the other hand, I can certainly respect the argument that test scores are an imperfect, incomplete measure, and the doomsday rhetoric can sometimes get out of control.
In any case, though, the primary piece of evidence put forth in the brief was the eye-catching graph below, which presented trends in NAEP versus those in U.S. GDP and productivity. Read More »
In observing all the recent controversy surrounding the Common Core State Standards (CCSS), I have noticed that one of the frequent criticisms from one of the anti-CCSS camps, particularly since the first rounds of results from CCSS-aligned tests have started to be released, is that the standards are going to be used to label more schools as “failing,” and thus ramp up the test-based accountability regime in U.S. public education.
As someone who is very receptive to a sensible, well-designed dose of test-based accountability, but sees so little of it in current policy, I am more than sympathetic to concerns about the proliferation and misuse of high-stakes testing. On the other hand, anti-CCSS arguments that focus on testing or testing results are not really arguments against the standards per se. They also strike me as ironic, as they are based on the same flawed assumptions that critics of high-stakes testing should be opposing.
Standards themselves are about students. They dictate what students should know at different points in their progression through the K-12 system. Testing whether students meet those standards makes sense, but how we use those test results is not dictated by the standards. Nor do standards require us to set bars for “proficient,” “advanced,” etc., using the tests. Read More »
One of the more visible manifestations of what I have called “informal test-based accountability” — that is, how testing results play out in the media and public discourse — is the phenomenon of superintendents, particularly big city superintendents, making their reputations based on the results during their administrations.
In general, big city superintendents are expected to promise large testing increases, and their success or failure is to no small extent judged on whether those promises are fulfilled. Several superintendents almost seem to have built entire careers on a few (misinterpreted) points in proficiency rates or NAEP scale scores. This particular phenomenon, in my view, is rather curious. For one thing, any district leader will tell you that many of their core duties, such as improving administrative efficiency, communicating with parents and the community, strengthening districts’ financial situation, etc., might have little or no impact on short-term testing gains. In addition, even those policies that do have such an impact often take many years to show up in aggregate results.
In short, judging superintendents based largely on the testing results during their tenures seems misguided. A recent report issued by the Brown Center at Brookings, and written by Matt Chingos, Grover Whitehurst and Katharine Lindquist, adds a little bit of empirical insight to this viewpoint. Read More »
In the most simplistic portrayal of the education policy landscape, one of the “sides” is a group of people who are referred to as “reformers.” Though far from monolithic, these people tend to advocate for test-based accountability, charters/choice, overhauling teacher personnel rules, and other related policies, with a particular focus on high expectations, competition and measurement. They also frequently see themselves as in opposition to teachers’ unions.
Most of the “reformers” I have met and spoken with are not quite so easy to categorize. They are also thoughtful and open to dialogue, even when we disagree. And, at least in my experience, there is far more common ground than one might expect.
Nevertheless, I believe that this “movement” (to whatever degree you can characterize it in those terms) may be doomed to stall out in the long run, not because their ideas are all bad, and certainly not because they lack the political skills and resources to get their policies enacted. Rather, they risk failure for a simple reason: They too often make promises that they cannot keep. Read More »
A couple of weeks ago, the New York State Education Department (NYSED) released data from the first year of the state’s new teacher and principal evaluation system (called the “Annual Professional Performance Review,” or APPR). In what has become a familiar pattern, this prompted a wave of criticism from advocates, much of it focused on the proportion of teachers in the state to receive the lowest ratings.
To be clear, evaluation systems that produce non-credible results should be examined and improved, and that includes those that put implausible proportions of teachers in the highest and lowest categories. Much of the commentary surrounding this and other issues has been thoughtful and measured. As usual, though, there have been some oversimplified reactions, as exemplified by this piece on the APPR results from Students First NY (SFNY).
SFNY notes what it considers to be the low proportion of teachers rated “ineffective,” and points out that there was more differentiation across rating categories for the state growth measure (worth 20 percent of teachers’ final scores), compared with the local “student learning” measure (20 percent) and the classroom observation components (60 percent). Based on this, they conclude that New York’s “state test is the only reliable measure of teacher performance” (they are actually talking about validity, not reliability, but we’ll let that go). Again, this argument is not representative of the commentary surrounding the APPR results, but let’s use it as a springboard for making a few points, most of which are not particularly original. (UPDATE: After publication of this post, SFNY changed the headline of their piece from “the only reliable measure of teacher performance” to “the most reliable measure of teacher performance.”) Read More »
There are three general factors that determine most public school teachers’ base salaries (which are usually laid out in a table called a salary schedule). The first is where they teach; districts vary widely in how much they pay. The second factor is experience. Salary schedules normally grant teachers “step raises” or “increments” each year they remain in the district, though these raises end at some point (when teachers reach the “top step”).
The third typical factor that determines teacher salary is their level of education. Usually, teachers receive a permanent raise for acquiring additional education beyond their bachelor’s degree. Most commonly, this means a master’s degree, which roughly half of teachers have earned (though most districts award raises for accumulating a certain number of credits towards a master’s and/or a Ph.D., and for getting a Ph.D.). The raise for receiving a master’s degree varies, but just to give an idea, it is, on average, about 10 percent over the base salary of bachelor’s-only teachers.
This practice of awarding raises for teachers who earn master’s degrees has come under tremendous fire in recent years. The basic argument is that these raises are expensive, but that having a master’s degree is not associated with test-based effectiveness (i.e., is not correlated with scores from value-added models of teachers’ estimated impact on their students’ testing performance). Many advocates argue that states and districts should simply cease giving teachers raises for advanced degrees, since, they say, it makes no sense to pay teachers for a credential that is not associated with higher performance. North Carolina, in fact, passed a law last year ending these raises, and there is talk of doing the same elsewhere. Read More »
Several months ago, the American Statistical Association (ASA) released a statement on the use of value-added models in education policy. I’m a little late getting to this (and might be repeating points that others made at the time), but I wanted to comment on the statement, not only because I think it’s useful to have ASA add their perspective to the debate on this issue, but also because their statement seems to have become one of the staple citations for those who oppose the use of these models in teacher evaluations and other policies.
Some of these folks claimed that the ASA supported their viewpoint – i.e., that value-added models should play no role in accountability policy. I don’t agree with this interpretation. To be sure, the ASA authors described the limitations of these estimates, and urged caution, but I think that the statement rather explicitly reaches a more nuanced conclusion: That value-added estimates might play a useful role in education policy, as one among several measures used in formal accountability systems, but this must be done carefully and appropriately.*
Much of the statement puts forth the standard, albeit important, points about value-added (e.g., moderate stability between years/models, potential for bias, etc.). But there are, from my reading, three important takeaways that bear on the public debate about the use of these measures, which are not always so widely acknowledged. Read More »
** Reprinted here in the Washington Post
The recent release of the latest New York State testing results created a little public relations coup for the controversial Success Academies charter chain, which operates over 20 schools in New York City, and is seeking to expand.
Shortly after the release of the data, the New York Post published a laudatory article noting that seven of the Success Academies had overall proficiency rates that were among the highest in the state, and arguing that the schools “live up to their name.” The Daily News followed up by publishing an op-ed that compares the Success Academies’ combined 94 percent math proficiency rate to the overall city rate of 35 percent, and uses that to argue that the chain should be allowed to expand because its students “aced the test” (this is not really what high proficiency rates mean, but fair enough).
On the one hand, this is great news, and a wonderfully impressive showing by these students. On the other, decidedly less sensational hand, it’s also another example of the use of absolute performance indicators (e.g., proficiency rates) as measures of school rather than student performance, despite the fact that they are not particularly useful for the former purpose since, among other reasons, they do not account for where students start out upon entry to the school. I personally don’t care whether Success Academy gets good or bad press. I do, however, believe that how one gauges effectiveness, test-based or otherwise, is important, even if one reaches the same conclusion using different measures. Read More »
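As a rough illustration of why absolute indicators and growth can point in different directions, here is a small sketch using two hypothetical schools with invented entry and exit scores. The simple gain computed here is a crude stand-in, not a value-added model or any measure actually used in New York; the only point is that ranking by where students end up and ranking by how far they moved need not agree.

```python
# Hypothetical schools and scores, invented for illustration only.
schools = {
    "School A": {"entry": 80, "exit": 82},  # students start high, move little
    "School B": {"entry": 40, "exit": 55},  # students start low, move more
}

def gain(record):
    """Crude growth measure: average exit score minus average entry score."""
    return record["exit"] - record["entry"]

# Rank by absolute performance (where students end up).
best_by_level = max(schools, key=lambda name: schools[name]["exit"])

# Rank by growth (how far students moved while at the school).
best_by_gain = max(schools, key=lambda name: gain(schools[name]))

print("Highest exit score:", best_by_level)
print("Largest gain:", best_by_gain)
```

Under these assumed numbers, the school with the higher exit score is not the one with the larger gain, which is why the choice of measure matters even when both happen to favor the same school.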
In a previous post, I discussed simple data from the District of Columbia Public Schools (DCPS) on teacher turnover in high- versus lower-poverty schools. In that same report, which was issued by the D.C. Auditor and included, among other things, descriptive analyses by the excellent researchers from Mathematica, there is another very interesting table showing the evaluation ratings of DC teachers in 2010-11 by school poverty (and, indeed, DC officials deserve credit for making these kinds of data available to the public, as this is not the case in many other states).
DCPS’ well-known evaluation system (called IMPACT) varies between teachers in tested versus non-tested grades, but the final ratings are a weighted average of several components, including: the teaching and learning framework (classroom observations); commitment to the school community (attendance at meetings, mentoring, PD, etc.); schoolwide value-added; teacher-assessed student achievement data (local assessments); core professionalism (absences, etc.); and individual value-added (tested teachers only).
The table I want to discuss is on page 43 of the Auditor’s report, and it shows average IMPACT scores for each component and overall for teachers in high-poverty schools (80-100 percent free/reduced-price lunch), medium poverty schools (60-80 percent) and low-poverty schools (less than 60 percent). It is pasted below. Read More »
The so-called Vergara trial in California, in which the state’s tenure and layoff statutes were deemed unconstitutional, already has its first “spin-off,” this time in New York, where a newly-formed organization, the Partnership for Educational Justice (PEJ), is among the organizations and entities spearheading the effort.
Upon first visiting PEJ’s new website, I was immediately (and predictably) drawn to the “Research” tab. It contains five statements (which, I guess, PEJ would characterize as “facts”). Each argument is presented in the most accessible form possible, typically accompanied by one citation (or two at most). I assume that the presentation of evidence in the actual trial will be a lot more thorough than that offered on this webpage, which seems geared toward the public rather than the more extensive evidentiary requirements of the courtroom (also see Bruce Baker’s comments on many of these same issues surrounding the New York situation).
That said, I thought it might be useful to review the basic arguments and evidence PEJ presents, not really in the context of whether they will “work” in the lawsuit (a judgment I am unqualified to make), but rather because they’re very common, and also because it’s been my observation that advocates, on both “sides” of the education debate, tend to be fairly good at using data and research to describe problems and/or situations, yet sometimes fall a bit short when it comes to evidence-based discussions of what to do about them (including the essential task of acknowledging when the evidence is still undeveloped). PEJ’s five bullet points, discussed below, are pretty good examples of what I mean. Read More »
The Washington Post reports on an issue that we have discussed here on many occasions: The incompleteness of the testing results released annually by the District of Columbia Public Schools (DCPS), or, more accurately, the Office of the State Superintendent of Education (OSSE), which is responsible for testing in DC schools.
Here’s the quick backstory: For the past 7-8 years or so, DCPS/OSSE have not released a single test score for the state assessment (the DC-CAS). Instead, they have released only the percentage of students whose scores meet the designated cutoff points for the NCLB-style categories of below basic, basic, proficient and advanced. I will not reiterate all of the problems with these cutpoint-based rates and how they serve to distort the underlying data, except to say that they are by themselves among the worst ways to present these data, and there is absolutely no reason why states and districts should not release both rates and average scale scores.
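For readers who want to see the distortion in miniature, here is a short sketch with hypothetical scores and a hypothetical cutoff (neither comes from the DC-CAS). It shows that a proficiency rate can rise sharply while the average score falls, because the rate only registers students crossing the line and ignores movement everywhere else in the distribution.

```python
from statistics import mean

# Hypothetical cut score and scores, invented for illustration only.
PROFICIENT_CUTOFF = 50

def proficiency_rate(scores, cutoff=PROFICIENT_CUTOFF):
    """Percent of students scoring at or above the cutoff."""
    return 100 * sum(s >= cutoff for s in scores) / len(scores)

# Year 1: two students sit just below the cutoff.
year1 = [30, 40, 48, 49, 60, 70]

# Year 2: those two students edge over the cutoff, while the two
# lowest-scoring students each drop by 10 points.
year2 = [20, 30, 50, 51, 60, 70]

for label, scores in (("Year 1", year1), ("Year 2", year2)):
    print(f"{label}: proficiency rate = {proficiency_rate(scores):.1f}%, "
          f"mean score = {mean(scores):.2f}")
```

In this invented example the rate doubles even though the average declines, which is one reason to release rates and average scale scores together.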
The Post reports, however, that one organization — the Broader, Bolder Approach to Education — was able to obtain the actual scale score data (by subgroup and grade) for 2010-2013, and that this group published a memo-style report alleging that DCPS’ public presentation of their testing results over the past few years has been misleading. I had a mixed reaction to this report and the accompanying story. Read More »
There is a tendency in education circles these days, one that I’m sure has been discussed by others, and of which I myself have been “guilty” on countless occasions. The tendency is to use terms such as “effective/ineffective teacher” or “teacher performance” interchangeably with estimates from value-added and other growth models.
Now, to be clear, I personally am not opposed to the use of value-added estimates in teacher evaluations and other policies, so long as it is done cautiously and appropriately (which, in my view, is not happening in very many places). Moreover, based on my reading of the research, I believe that these estimates can provide useful information about teachers’ performance in the classroom. In short, then, I am not disputing whether value-added scores should be considered to be one useful proxy measure for teacher performance and effectiveness (and described as such), both formally and informally.
Regardless of one’s views on value-added and its policy deployment, however, there is a point at which our failure to define terms can go too far, and perhaps cause confusion. Read More »
A couple of weeks ago, the website Vox.com published an article entitled, “11 facts about U.S. teachers and schools that put the education reform debate in context.” The article, in the wake of the Vergara decision, is supposed to provide readers with the “basic facts” about the current education reform environment, with a particular emphasis on teachers. Most of the 11 facts are based on descriptive statistics.
Vox advertises itself as a source of accessible, essential, summary information — what you “need to know” — for people interested in a topic but not necessarily well-versed in it. Right off the bat, let me say that this is an extraordinarily difficult task, and in constructing lists such as this one, there’s no way to please everyone (I’ve read a couple of Vox’s education articles and they were okay).
That said, someone sent me this particular list, and it’s pretty good overall, especially since it does not reflect overt advocacy for given policy positions, as so many of these types of lists do. But I was compelled to comment on it. I want to say that I did this to make some lofty point about the strengths and weaknesses of data and statistics packaged for consumption by the general public. It would, however, be more accurate to say that I started doing it and just couldn’t stop. In any case, here’s a little supplemental discussion of each of the 11 items: Read More »