Our guest author today is William Schmidt, a University Distinguished Professor and co-director of the Education Policy Center at Michigan State University. He is also a member of the Shanker Institute board of directors.
Every year or two, the mass media is full of stories on the latest iterations of one of the two major international large scale assessments, the Trends in International Mathematics and Science Study (TIMSS) and the Program for International Student Assessment (PISA). What perplexes many is that the results of these two tests — both well-established and run by respectable, experienced organizations — suggest different conclusions about the state of U.S. mathematics education. Generally speaking, U.S. students do better on the TIMSS and poorly on the PISA, relative to their peers in other nations. Depending on their personal preferences, policy advocates can simply choose whichever test result is convenient to press their argument, leaving the general public without clear guidance.
Now, in one sense, the differences between the tests are more apparent than real. One reason why the U.S. ranks better on the TIMSS than the PISA is that the two tests sample students from different sets of countries. The PISA has many more wealthy countries, whose students tend to do better – hence, the U.S.’s lower ranking. It turns out that when looking at only the countries that participated in both the TIMSS and the PISA we find similar country rankings. There are also some differences in statistical sampling, but these are fairly minor. Read More »
The District of Columbia Public Charter School Board (PCSB) recently released the 2014 results of its “Performance Management Framework” (PMF), which is the rating system that the PCSB uses for its schools.
Very quick background: This system sorts schools into one of three “tiers,” with Tier 1 being the highest-performing, as measured by the system, and Tier 3 being the lowest. The ratings are based on a weighted combination of four types of factors — progress, achievement, gateway, and leading — which are described in detail in the first footnote.* As discussed in a previous post, the PCSB system, in my opinion, is better than many others out there, since growth measures play a fairly prominent role in the ratings, and, as a result, the final scores are only moderately correlated with key student characteristics such as subsidized lunch eligibility.** In addition, the PCSB is quite diligent about making the PMF results accessible to parents and other stakeholders, and, for the record, I have found the staff very open to sharing data and answering questions.
That said, PCSB’s big message this year was that schools’ ratings are improving over time, and that, as a result, a substantially larger proportion of DC charter students are attending top-rated schools. This was reported uncritically by several media outlets, including this story in the Washington Post. It is also based on a somewhat questionable use of the data. Let’s take a very simple look at the PMF dataset, first to examine this claim and then, more importantly, to see what we can learn about the PMF and DC charter schools in 2013 and 2014. Read More »
So-called achievement gaps – the differences in average test performance among student subgroups, usually defined in terms of ethnicity or income – are important measures. They demonstrate persistent inequality of educational outcomes and economic opportunities between different members of our society.
So long as these gaps remain, it means that historically lower-performing subgroups (e.g., low-income students or ethnic minorities) are less likely to gain access to higher education, good jobs, and political voice. We should monitor these gaps; try to identify all the factors that affect them, for good and for ill; and endeavor to narrow them using every appropriate policy lever – both inside and outside of the educational system.
Achievement gaps have also, however, taken on a very different role over the past 10 or so years. The sizes of gaps, and extent of “gap closing,” are routinely used by reporters and advocates to judge the performance of schools, school districts, and states. In addition, gaps and gap trends are employed directly in formal accountability systems (e.g., states’ school grading systems), in which they are conceptualized as performance measures.
Although simple measures of the magnitude of or changes in achievement gaps are potentially very useful in several different contexts, they are poor gauges of school performance, and shouldn’t be the basis for high-stakes rewards and punishments in any accountability system. Read More »
The State of Florida is currently engaged in a policy tussle of sorts with the U.S. Department of Education (USED) over Florida’s accountability system. To make a long story short, last spring, Florida passed a law saying that the test scores of English language learners (ELLs) would only count toward schools’ accountability grades (and teacher evaluations) once the ELL students had been in the system for at least two years. This runs up against federal law, which requires that ELLs’ scores be counted after only one year, and USED has indicated that it’s not willing to budge on this requirement. In response, Florida is considering legal action.
This conflict might seem incredibly inane (unless you’re in one of the affected schools, of course). Beneath the surface, though, this is actually kind of an amazing story.
Put simply, Florida’s argument against USED’s policy of counting ELL scores after just one year is a perfect example of the reason why most of the state’s core accountability measures (not to mention those of NCLB as a whole) are so inappropriate: Because they judge schools’ performance based largely on where their students’ scores end up without paying any attention to where they start out. Read More »
The College Board recently released the latest SAT results, for the first time combining this release with that of data from the PSAT and AP exams. The release of these data generated the usual stream of news coverage, much of which misinterpreted the year-to-year changes in SAT scores as a lack of improvement, even though the data are cross-sectional and the test-taking sample has been changing, and/or misinterpreted the percent of test takers who scored above the “college ready” line as a national measure of college readiness, even though the tests are not administered to a representative sample of students.
It is disheartening to watch this annual exercise, in which the most common “take home” headlines (e.g., “no progress in SAT scores” and “more, different students take SAT”) are in many important respects contradictory. In past years, much of the blame had to be placed on the College Board’s presentation of the data. This year, to their credit, the roll-out is substantially better (hopefully, this will continue).
But I don’t want to focus on this aspect of the organization’s activities (see this post for more); instead, I would like to discuss briefly the College Board’s recent change in mission. Read More »
The Foundation for Excellence in Education, an organization that advocates for education reform in Florida, in particular the set of policies sometimes called the “Florida Formula,” recently announced a competition to redesign the “appearance, presentation and usability” of the state’s school report cards. Winners of the competition will share prize money totaling $35,000.
The contest seems like a great idea. Improving the manner in which education data are presented is, of course, a laudable goal, and an open competition could potentially attract a diverse group of talented people. As regular readers of this blog know, however, although I am not opposed to sensibly designed test-based accountability policies, my primary concern about school rating systems is the quality and interpretation of the measures used therein. So, while I support the idea of a competition for improving the design of the report cards, I am hoping that the end result won’t just be a very attractive, clever instrument devoted to the misinterpretation of testing data.
In this spirit, I would like to submit four simple graphs that illustrate, as clearly as possible and using the latest data from 2014, what Florida’s school grades are actually telling us. Since the scoring and measures vary a bit between different types of schools, let’s focus on elementary schools. Read More »
There’s no reason why insisting on proper causal inference can’t be fun.
A few weeks ago, ASCD published a policy brief (thanks to Chad Aldeman for flagging it), the purpose of which is to argue that it is “grossly misleading” to make a “direct connection” between nations’ test scores and their economic strength.
On the one hand, it’s implausible to assert that better educated nations aren’t stronger economically. On the other hand, I can certainly respect the argument that test scores are an imperfect, incomplete measure, and the doomsday rhetoric can sometimes get out of control.
In any case, though, the primary piece of evidence put forth in the brief was the eye-catching graph below, which presented trends in NAEP versus those in U.S. GDP and productivity. Read More »
In observing all the recent controversy surrounding the Common Core State Standards (CCSS), I have noticed that one of the frequent criticisms from one of the anti-CCSS camps, particularly since the first rounds of results from CCSS-aligned tests have started to be released, is that the standards are going to be used to label more schools as “failing,” and thus ramp up the test-based accountability regime in U.S. public education.
As someone who is very receptive to a sensible, well-designed dose of test-based accountability, but sees so little of it in current policy, I am more than sympathetic to concerns about the proliferation and misuse of high-stakes testing. On the other hand, anti-CCSS arguments that focus on testing or testing results are not really arguments against the standards per se. They also strike me as ironic, as they are based on the same flawed assumptions that critics of high-stakes testing should be opposing.
Standards themselves are about students. They dictate what students should know at different points in their progression through the K-12 system. Testing whether students meet those standards makes sense, but how we use those test results is not dictated by the standards. Nor do standards require us to set bars for “proficient,” “advanced,” etc., using the tests. Read More »
One of the more visible manifestations of what I have called “informal test-based accountability” — that is, how testing results play out in the media and public discourse — is the phenomenon of superintendents, particularly big city superintendents, making their reputations based on the results during their administrations.
In general, big city superintendents are expected to promise large testing increases, and their success or failure is to no small extent judged on whether those promises are fulfilled. Several superintendents almost seem to have built entire careers on a few (misinterpreted) points in proficiency rates or NAEP scale scores. This particular phenomenon, in my view, is rather curious. For one thing, any district leader will tell you that many of their core duties, such as improving administrative efficiency, communicating with parents and the community, strengthening districts’ financial situation, etc., might have little or no impact on short-term testing gains. In addition, even those policies that do have such an impact often take many years to show up in aggregate results.
In short, judging superintendents based largely on the testing results during their tenures seems misguided. A recent report issued by the Brown Center at Brookings, and written by Matt Chingos, Grover Whitehurst and Katharine Lindquist, adds a little bit of empirical insight to this viewpoint. Read More »
** Reprinted here in the Washington Post
The recent release of the latest New York State testing results created a little public relations coup for the controversial Success Academies charter chain, which operates over 20 schools in New York City, and is seeking to expand.
Shortly after the release of the data, the New York Post published a laudatory article noting that seven of the Success Academies had overall proficiency rates that were among the highest in the state, and arguing that the schools “live up to their name.” The Daily News followed up by publishing an op-ed that compares the Success Academies’ combined 94 percent math proficiency rate to the overall city rate of 35 percent, and uses that to argue that the chain should be allowed to expand because its students “aced the test” (this is not really what high proficiency rates mean, but fair enough).
On the one hand, this is great news, and a wonderfully impressive showing by these students. On the other, decidedly less sensational hand, it’s also another example of the use of absolute performance indicators (e.g., proficiency rates) as measures of school rather than student performance, despite the fact that they are not particularly useful for the former purpose since, among other reasons, they do not account for where students start out upon entry to the school. I personally don’t care whether Success Academy gets good or bad press. I do, however, believe that how one gauges effectiveness, test-based or otherwise, is important, even if one reaches the same conclusion using different measures. Read More »
Our guest author today is Jennifer Borgioli, a Senior Consultant with Learner-Centered Initiatives, Ltd., where she supports schools with designing performance based assessments, data analysis, and curriculum design.
The chart below was taken from the 2014 report on student performance on the Grades 3-8 tests administered by the New York State Department of Education.
Based on this chart, which of the following statements is the most accurate?
A. “64 percent of 8th grade students failed the ELA test”
B. “36 percent of 8th graders are at grade level in reading and writing”
C. “36 percent of students meet or exceed the proficiency standard (Level 3 or 4) on the Grade 8 CCLS-aligned math test”
Read More »
The Washington Post reports on an issue that we have discussed here on many occasions: The incompleteness of the testing results released annually by the District of Columbia Public Schools (DCPS), or, more accurately, the Office of the State Superintendent of Education (OSSE), which is responsible for testing in DC schools.
Here’s the quick backstory: For the past 7-8 years or so, DCPS/OSSE have not released a single test score for the state assessment (the DC-CAS). Instead, they have released only the percentage of students whose scores meet the designated cutoff points for the NCLB-style categories of below basic, basic, proficient and advanced. I will not reiterate all of the problems with these cutpoint-based rates and how they serve to distort the underlying data, except to say that they are by themselves among the worst ways to present these data, and there is absolutely no reason why states and districts should not release both rates and average scale scores.
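To make the point about cutpoint-based rates concrete, here is a minimal sketch using entirely made-up numbers (the scores and the cutoff of 50 are hypothetical, and are not DC-CAS data). It shows how a proficiency rate can stay flat even when most students' scores rise, because the rate only registers movement across the cutoff.

```python
from statistics import mean

def proficiency_rate(scores, cutoff):
    """Share of scores at or above the cutoff."""
    return sum(s >= cutoff for s in scores) / len(scores)

# Hypothetical scale scores for the same five students in two years.
# The cutoff is also invented for illustration.
CUTOFF = 50
year1 = [30, 35, 40, 55, 60]
year2 = [40, 45, 49, 55, 60]  # the three lowest scorers all improved

rate1 = proficiency_rate(year1, CUTOFF)
rate2 = proficiency_rate(year2, CUTOFF)
avg1, avg2 = mean(year1), mean(year2)

print(f"proficiency rate: {rate1:.0%} -> {rate2:.0%}")
print(f"average score:    {avg1:.1f} -> {avg2:.1f}")
```

In this toy example the rate is identical in both years while the average score goes up by almost six points, which is the kind of change that is invisible when only rates are released.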
The Post reports, however, that one organization — the Broader, Bolder Approach to Education — was able to obtain the actual scale score data (by subgroup and grade) for 2010-2013, and that this group published a memo-style report alleging that DCPS’ public presentation of their testing results over the past few years has been misleading. I had a mixed reaction to this report and the accompanying story. Read More »
There is an ongoing debate about widespread administration of standardized tests to kindergartners. This is of course a serious decision. My personal opinion about whether this is a good idea depends on several factors, such as how good the tests will be and, most importantly, how the results will be used (and I cannot say that I am optimistic about the latter).
Although the policy itself must be considered seriously on its merits, there is one side aspect of testing kindergartners that fascinates me: It would demonstrate how absurd it is to judge school performance, as does NCLB, using absolute performance levels – i.e., how highly students score on tests, rather than their progress over time.
Basically, the kindergarten tests would inevitably shake out the same way as those administered in later grades. Schools and districts serving more disadvantaged students would score substantially lower than their counterparts in more affluent areas. If the scores were converted to proficiency rates or similar cut-score measures, they would show extremely low pass rates in urban districts such as Detroit. Read More »
A recent story in the Chicago Tribune notes that Illinois’ NCLB waiver plan sets lower targets for certain student subgroups, including minority and low-income students. This, according to the article, means that “Illinois students of different backgrounds no longer will be held to the same standards,” and goes on to quote advocates who are concerned that this amounts to lower expectations for traditionally lower-scoring groups of children.
The argument that expectations should not vary by student characteristics is, of course, valid and important. Nevertheless, as Chad Aldeman notes, the policy of setting different targets for different groups of students has been legally required since the enactment of NCLB, under which states must “give credit to lower-performing groups that demonstrate progress.” This was supposed to ensure, albeit with exceedingly crude measures, that schools weren’t punished due to the students they serve, and how far behind those students were upon entry into the schools.
I would take that a step further by adding two points. The first is quite obvious, and is mentioned briefly in the Tribune article, but too often is obscured in these kinds of conversations: Neither NCLB nor the waivers actually hold students to different standards. The cut scores above which students are deemed “proficient,” somewhat arbitrary though they may be, do not vary by student subgroup, or by any other factor within a given state. All students are held to the exact same standard. Read More »
A recent story in the New York Times reports that, according to an Obama Administration-commissioned panel, the measures being used to evaluate the performance of healthcare providers are unfairly penalizing those that serve larger proportions of disadvantaged patients (thanks to Mike Petrilli for sending me the article). For example, if you’re grading hospitals based on simple, unadjusted readmission rates, it might appear as if hospitals serving high-poverty populations are doing worse, even if the quality of their service is excellent, since readmissions are more likely for patients who can’t afford medication, or aren’t able to take time off from work, or don’t have home support systems.
The panel recommended adjusting the performance measures, which, for instance, are used for Medicare reimbursement, using variables such as patient income and education, as this would provide a fairer accountability system – one that does not penalize healthcare institutions and their personnel for factors that are out of their control.
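As a rough illustration of what this kind of adjustment can look like, here is a minimal sketch of one simple approach, comparing each hospital's observed rate to the rate its patient mix would predict. This is my own simplification, not the panel's method, and every number, group label, and function name below is hypothetical.

```python
# Hypothetical system-wide readmission rates by patient income group.
BENCHMARK_RATE = {"low_income": 0.25, "higher_income": 0.10}

def expected_rate(patient_mix):
    """Readmission rate a hospital would post if it performed at the
    benchmark for each group, given its own mix of patients."""
    total = sum(patient_mix.values())
    return sum(BENCHMARK_RATE[g] * n for g, n in patient_mix.items()) / total

def observed_to_expected(observed_rate, patient_mix):
    """A ratio below 1 means fewer readmissions than the patient mix predicts."""
    return observed_rate / expected_rate(patient_mix)

# Two made-up hospitals: A serves mostly low-income patients,
# B serves mostly higher-income patients.
hospital_a = {"low_income": 800, "higher_income": 200}
hospital_b = {"low_income": 100, "higher_income": 900}

ratio_a = observed_to_expected(0.20, hospital_a)  # raw rate: 20 percent
ratio_b = observed_to_expected(0.12, hospital_b)  # raw rate: 12 percent

print(f"Hospital A: observed/expected = {ratio_a:.2f}")
print(f"Hospital B: observed/expected = {ratio_b:.2f}")
```

On the raw rates, Hospital A looks worse (20 percent versus 12 percent). Once the patient mix is taken into account, A is doing better than expected and B slightly worse, so the ordering flips.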
There are of course very strong, very obvious parallels here to education accountability policy, in which schools are judged in part based on raw proficiency rates that make no attempt to account for differences in the populations of students in different schools. The comparison also reveals an important feature of formal accountability systems in other policy fields. Read More »