One of the more visible manifestations of what I have called “informal test-based accountability” — that is, how testing results play out in the media and public discourse — is the phenomenon of superintendents, particularly big city superintendents, making their reputations based on the results during their administrations.
In general, big city superintendents are expected to promise large testing increases, and their success or failure is to no small extent judged on whether those promises are fulfilled. Several superintendents almost seem to have built entire careers on a few (misinterpreted) points in proficiency rates or NAEP scale scores. This particular phenomenon, in my view, is rather curious. For one thing, any district leader will tell you that many of their core duties, such as improving administrative efficiency, communicating with parents and the community, strengthening districts’ financial situation, etc., might have little or no impact on short-term testing gains. In addition, even those policies that do have such an impact often take many years to show up in aggregate results.
In short, judging superintendents based largely on the testing results during their tenures seems misguided. A recent report issued by the Brown Center at Brookings, and written by Matt Chingos, Grover Whitehurst and Katharine Lindquist, adds a little bit of empirical insight to this viewpoint. Read More »
We know oral language is young children’s door into the world of knowledge and ideas, the foundation for reading, and the bedrock of all academic learning. But, can language also protect young kids against behavioral problems?
A number of studies have identified a co-occurrence of language delays and behavioral maladjustment, an association that remains after controlling for socio-demographic characteristics and academic achievement (here and here). However, most research on the issue has been cross-sectional and correlational, making it hard to establish whether behavioral issues cause language delays, language delays cause behavioral issues, or another factor is responsible for both.
A recent paper by Marc Bornstein, Chun-Shin Hahn, and Joan Suwalsky (2013) was able to shed some light on these questions, concluding that “language competencies in early childhood keep behavioral adjustment problems at bay.” This is important given the fact that minority children raised in poverty tend to have smaller than average vocabularies and are also overrepresented in pre-K expulsions and suspensions. Read More »
The U.S. Department of Education has released a very short, readable report on the comparability of value-added estimates using two different tests in Indiana – one of them norm-referenced (the Measures of Academic Progress test, or MAP), and the other criterion-referenced (the Indiana Statewide Testing for Educational Progress Plus, or ISTEP+, which is also the state’s official test for NCLB purposes).
The research design here is straightforward – fourth and fifth grade students in 46 schools across 10 districts in Indiana took both tests, their teachers’ value-added scores were calculated, and the scores were compared. Since both sets of scores were based on the same students and teachers, this allows a direct comparison of teachers’ value-added estimates between the two tests. The results are not surprising, and they square with similar prior studies (see here, here, here, for example): The estimates based on the two tests are moderately correlated. Depending on the grade/subject, the correlations are between 0.4 and 0.7. If you’re not used to interpreting correlation coefficients, consider that only around one-third of teachers were in the same quintile (fifth) on both tests, and another 40 or so percent were one quintile higher or lower. So, most teachers were within a quintile, about a quarter of teachers moved two or more quintiles, and a small percentage moved from top to bottom or vice-versa.
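To get a feel for what a correlation in this range means for quintile rankings, here is a minimal simulation sketch. It is not the report's method or data: it simply assumes each teacher has one underlying "true" effect and that each test measures it with independent, normally distributed noise. The function names, the sample size and the target correlation of 0.5 are all illustrative assumptions.

```python
import random


def pearson(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def quintiles(values):
    """Assign each value a quintile from 0 (lowest) to 4 (highest) by rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order):
        result[i] = rank * 5 // len(values)
    return result


def simulate(n_teachers=10000, target_corr=0.5, seed=0):
    """Simulate two noisy estimates of the same (hypothetical) teacher effect."""
    rng = random.Random(seed)
    # With a true-effect variance of 1, independent noise of this size
    # gives an expected correlation of target_corr between the two estimates.
    noise_sd = (1 / target_corr - 1) ** 0.5
    true = [rng.gauss(0, 1) for _ in range(n_teachers)]
    test_a = [t + rng.gauss(0, noise_sd) for t in true]
    test_b = [t + rng.gauss(0, noise_sd) for t in true]
    gaps = [abs(a - b) for a, b in zip(quintiles(test_a), quintiles(test_b))]
    return {
        "correlation": pearson(test_a, test_b),
        "same_quintile": sum(g == 0 for g in gaps) / n_teachers,
        "one_apart": sum(g == 1 for g in gaps) / n_teachers,
        "two_or_more_apart": sum(g >= 2 for g in gaps) / n_teachers,
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name}: {value:.3f}")
```

Changing `target_corr` to values between 0.4 and 0.7 shows how the share of teachers landing in the same quintile on both measures shifts with the correlation. Real value-added estimates need not follow these distributional assumptions, so the output is only a rough intuition aid.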
Although, as mentioned above, these findings are in line with prior research, it is worth remembering why this “instability” occurs (and what can be done about it). Read More »
Virtually all discussions of teacher turnover focus on teachers leaving schools and/or the profession. However, a recent working paper by Allison Atteberry, Susanna Loeb and James Wyckoff, which was presented at this month’s CALDER conference, reaches a very interesting conclusion using data from New York City: There is actually more movement within NYC schools than between them.*
Specifically, the authors show that, during the years for which they had data (1997-2002 and 2004-2010), over 50 percent of teachers in any given year exhibited some form of movement (including leaving the profession or switching schools), but two-thirds of these moves were within schools – i.e., teachers changing grades or subjects. Moreover, they find that these within-school moves, like those between-schools/professions, appear to have a negative impact on testing outcomes, one which is very modest but statistically discernible in both math and reading.
There are a couple of interesting points related to these main findings. Read More »
Our guest author today is Ian Robinson, Lecturer in the Department of Sociology and in the Residential College’s interdisciplinary Social Theory and Practice program at the University of Michigan.
Poverty is (by definition) a function of inadequate income relative to family or household size. Low income has two possible proximate causes: insufficient hours of employment and/or insufficient hourly wages. In 2001, there were four times more poor U.S. households in which someone had a job than poor households in which no one did. The same is still true today. In other words, despite levels of unemployment far above post-World War Two norms, low wage jobs are by far the most important proximate cause of poverty in America today.
Perversely, despite this reality, the academic literature on U.S. poverty pays less attention to such jobs than it does to unemployment. A recent article, published in the journal American Sociological Review, both identifies and makes up for that shortcoming. In the process, its authors arrive at some striking conclusions. In particular, they find that unions are a major force for reducing poverty rates among households with at least one employed person. Read More »
Some of the best research out there is a product not of sophisticated statistical methods or complex research designs, but rather of painstaking manual data collection. A good example is a recent paper by Morgan Polikoff, Andrew McEachin, Stephani Wrabel and Matthew Duque, which was published in the latest issue of the journal Educational Researcher.
Polikoff and his colleagues performed a task that makes most of the rest of us cringe: They read and coded every one of the over 40 state applications for ESEA flexibility, or “waivers.” The end product is a simple but highly useful presentation of the measures states are using to identify “priority” (low-performing) and “focus” (schools “contributing to achievement gaps”) schools. The results are disturbing to anyone who believes that strong measurement should guide educational decisions.
There’s plenty of great data and discussion in the paper, but consider just one central finding: How states are identifying priority (i.e., lowest-performing) schools at the elementary level (the measures are of course a bit different for secondary schools). Read More »
** Reprinted here in the Core Knowledge Blog
How much do preschoolers from disadvantaged and more affluent backgrounds know about the world and why does that matter? One recent study by Tanya Kaefer (Lakehead University), Susan B. Neuman (New York University), and Ashley M. Pinkham (University of Michigan) provides some answers.
The researchers randomly selected children from preschool classrooms in two sites, one serving kids from disadvantaged backgrounds, the other serving middle-class kids. They then set about to answer three questions: Read More »
A new working paper, published by the National Bureau of Economic Research, is the first high quality assessment of one of the new teacher evaluation systems sweeping across the nation. The study, by Thomas Dee and James Wyckoff, both highly respected economists, focuses on the first three years of IMPACT, the evaluation system put into place in the District of Columbia Public Schools in 2009.
Under IMPACT, each teacher receives a point total based on a combination of test-based and non-test-based measures (the formula varies between teachers who are and are not in tested grades/subjects). These point totals are then sorted into one of four categories – highly effective, effective, minimally effective and ineffective. Teachers who receive a highly effective (HE) rating are eligible for salary increases, whereas teachers rated ineffective are dismissed immediately and those receiving minimally effective (ME) for two consecutive years can also be terminated. The design of this study exploits that incentive structure by, put very simply, comparing the teachers who were directly above the ME and HE thresholds to those who were directly below them, to see whether the two groups differed in terms of retention and performance. The basic idea is that these teachers are all very similar in terms of their measured performance, so any differences in outcomes can be (cautiously) attributed to the system’s incentives.
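The threshold comparison described above can be illustrated with a small simulation. This is only a sketch of the general idea on made-up data, not the authors' analysis: the cutoff of 250 points, the size of the built-in effect, the bandwidth and all function names are hypothetical, and a simple difference in means stands in for the more careful regression-based estimates such a study would normally use.

```python
import random


def simulate_teachers(n=50000, threshold=250.0, true_jump=0.10, seed=1):
    """Return (score, later_outcome) pairs for made-up teachers.

    Outcomes rise smoothly with the score, and teachers who fall below the
    hypothetical threshold get an extra bump of true_jump, standing in for
    a response to the incentive.
    """
    rng = random.Random(seed)
    teachers = []
    for _ in range(n):
        score = rng.uniform(100, 400)
        bump = true_jump if score < threshold else 0.0
        outcome = 0.002 * (score - threshold) + bump + rng.gauss(0, 0.3)
        teachers.append((score, outcome))
    return teachers


def threshold_gap(teachers, threshold, bandwidth=10.0):
    """Mean outcome just below the threshold minus mean outcome just above it."""
    below = [y for s, y in teachers if threshold - bandwidth <= s < threshold]
    above = [y for s, y in teachers if threshold <= s <= threshold + bandwidth]
    return sum(below) / len(below) - sum(above) / len(above)


if __name__ == "__main__":
    data = simulate_teachers()
    # At the real (simulated) cutoff, the gap should be close to true_jump.
    print("gap at threshold:", round(threshold_gap(data, 250.0), 3))
    # At a placebo cutoff where nothing changes, the gap should be near zero.
    print("gap at placebo cutoff:", round(threshold_gap(data, 300.0), 3))
```

The narrow bandwidth is what makes the comparison plausible: teachers within a few points of the cutoff are nearly identical in measured performance, so a gap that appears at the cutoff but not at a placebo cutoff points to the incentive rather than to pre-existing differences. A wider bandwidth uses more teachers but lets the underlying trend leak into the estimate.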
The short answer is that there were meaningful differences. Read More »
In a new NBER working paper, economist Derek Neal makes an important point, one of which many people in education are aware, but which is infrequently reflected in actual policy. The point is that using the same assessment to measure both student and teacher performance often contaminates the results for both purposes.
In fact, as Neal notes, some of the very features required to measure student performance are the ones that make possible the contamination when the tests are used in high-stakes accountability systems. Consider, for example, a situation in which a state or district wants to compare the test scores of a cohort of fourth graders in one year with those of fourth graders the next year. One common means of facilitating this comparability is administering some of the questions to both groups (or to some “pilot” sample of students prior to those being tested). Otherwise, any difference in scores between the two cohorts might simply be due to differences in the difficulty of the questions. If you cannot check that out, it’s tough to make meaningful comparisons.
But it’s precisely this need to repeat questions that enables one form of so-called “teaching to the test,” in which administrators and educators use questions from prior assessments to guide their instruction for the current year. Read More »
In education today, data, particularly testing data, are everywhere. One of many potentially valuable uses of these data is helping teachers improve instruction – e.g., identifying students’ strengths and weaknesses, etc. Of course, this positive impact depends on the quality of the data and how they are presented to educators, among other factors. But there’s an even more basic requirement – teachers actually have to use them.
In an article published in the latest issue of the journal Education Finance and Policy, economist John Tyler takes a thorough look at teachers’ use of an online data system in a mid-sized urban district between 2008 and 2010. A few years prior, this district invested heavily in benchmark formative assessments (four per year) for students in grades 3-8, and an online “dashboard” system to go along with them. The assessments’ results are fed into the system in a timely manner. The basic idea is to give these teachers a continual stream of information, past and present, about their students’ performance.
Tyler uses web logs from the district, as well as focus groups with teachers, to examine the extent and nature of teachers’ data usage (as well as a few other things, such as the relationship between usage and value-added). What he finds is not particularly heartening. In short, teachers didn’t really use the data. Read More »
Education researchers have paid a lot of attention to the sorting of teachers across schools. For example, it is well known that schools serving more low-income students tend to employ teachers who are, on average, less qualified (in terms of experience, degree, certification, etc.; also see here).
Far less well-researched, however, is the issue of sorting within schools – for example, whether teachers with certain characteristics are assigned to classes with different students than their colleagues in the same school. In addition to the obvious fact that which teachers are in front of which students every day is important, this question bears on a few major issues in education policy today. For example, there is evidence that teacher turnover is influenced by the characteristics of the students teachers teach, which means that classroom assignments might either exacerbate or mitigate mobility and attrition. In addition, teacher productivity measures such as value-added may be affected by the sorting of students into classes based on characteristics for which the models do not account, and a better understanding of the teacher/student matching process could help inform this issue.
A recent article, which was published in the journal Sociology of Education, sheds light on these topics with a very interesting look at the distribution of students across teachers’ classrooms in Miami-Dade between 2003-04 and 2010-11. The authors’ primary question is: Are certain characteristics, most notably race/ethnicity, gender, experience, or pre-service qualifications (e.g., SAT scores), associated with assignment to higher or lower-scoring students among teachers in the same school, grade, and year? Read More »
** Reprinted here in the Washington Post
A big part of successful policy making is unyielding attention to detail (an argument that regular readers of this blog hear often). Choices about design and implementation that may seem unimportant can play a substantial role in determining how policies play out in practice.
A new paper, co-authored by Elizabeth Davidson, Randall Reback, Jonah Rockoff and Heather Schwartz, and presented at last month’s annual conference of The Association for Education Finance and Policy, illustrates this principle vividly, and on a grand scale, with an analysis of outcomes in all 50 states during the early years of NCLB.
After a terrific summary of the law’s rules and implementation challenges, as well as some quick descriptive statistics, the paper’s main analysis is a straightforward examination of why the proportion of schools meeting AYP varied quite a bit between states. For instance, in 2003, the first year of results, 32 percent of U.S. schools failed to make AYP, but the proportion ranged from one percent in Iowa to over 80 percent in Florida.
Surprisingly, the results suggest that the primary reasons for this variation seem to have had little to do with differences in student performance. Rather, the big factors are subtle differences in rather arcane rules that each state chose during the implementation process. These decisions received little attention, yet they had a dramatic impact on the outcomes of NCLB during this time period. Read More »
Charter schools, though they comprise a remarkably diverse sector, are quite often subject to broad generalizations. Opponents, for example, promote the characterization of charters as test prep factories, though this is a sweeping claim without empirical support. Another common stereotype is that charter schools exclude students with special needs. It is often (but not always) true that charters serve disproportionately fewer students with disabilities, but the reasons for this are complicated and vary a great deal, and there is certainly no evidence for asserting a widespread campaign of exclusion.
Of course, these types of characterizations, which are also leveled frequently at regular public schools, don’t always take the form of criticism. For instance, it is an article of faith among many charter supporters that these schools, thanks to the fact that relatively few are unionized, are better able to aggressively identify and fire low-performing teachers (and, perhaps, retain high performers). Unlike many of the generalizations from both “sides,” this one is a bit more amenable to empirical testing.
A recent paper by Joshua Cowen and Marcus Winters, published in the journal Education Finance and Policy, is among the first to take a look, and some of the results might be surprising. Read More »
Drawing on a half century of empirical evidence, as well as new data and analysis, a team of scholars has challenged the substance of many of the attacks on public employees and their unions – urging political leaders and the research community to take this “transformational” moment in the divisive and ideologically driven debate over the role of government and the value of public services to deepen their commitment to evidence-based policy ideas.
These arguments were outlined in “The Great New Debate about Unionism and Collective Bargaining in U.S. State and Local Governments,” published by Cornell University’s ILR Review. The authors – David Lewin (UCLA), Jeffrey Keefe (Rutgers), and Thomas Kochan (MIT) – point out that, with half a century of experience, there is now a wealth of data by which to evaluate public sector unionism and its effects.
In that context, the authors spell out the history, arguments and empirical findings on three key issues: 1) Are public employees overpaid?; 2) Do labor-management dispute resolution procedures, which are part of many state and local government collective bargaining laws, enhance or hinder effective governance?; 3) Have unions and managers in the public sector demonstrated the ability to respond constructively to fiscal crises? Read More »
In education policy debates, we like the “big picture.” We love to say things like “hold schools accountable” and “set high expectations.” Much less frequent are substantive discussions about the details of accountability systems, but it’s these details that make or break policy. The technical specs just aren’t that sexy. But even the best ideas with the sexiest catchphrases won’t improve things a bit unless they’re designed and executed well.
In this vein, I want to recommend a very interesting CALDER working paper by Mark Ehlert, Cory Koedel, Eric Parsons and Michael Podgursky. The paper takes a quick look at one of these extremely important, yet frequently under-discussed details in school (and teacher) accountability systems: The choice of growth model.
When value-added or other growth models come up in our debates, they’re usually discussed en masse, as if they’re all the same. They’re not. It’s well-known (though perhaps overstated) that different models can, in many cases, lead to different conclusions for the same school or teacher. This paper, which focuses on school-level models but might easily be extended to teacher evaluations as well, helps illustrate this point in a policy-relevant manner.
Read More »