Randomized Trials and Quasi-Experiments in Education Research
The 2001 No Child Left Behind (NCLB) Act promises a series of significant reforms. The hope is that these reforms will jump-start under-performing American schools. Most public discussion of the Act has focused on the mandate for test-based school accountability and the federal endorsements of charter schools and other forms of school choice. Other important provisions include changes in funding rules for states and a new emphasis on reading instruction. The NCLB Act also repeatedly calls for education policy to rely on a foundation of scientifically based research. Although this appears to be a bland technical statement, it strikes me as potentially at least as significant as other components of the Act.
What is Scientifically Based Research?
NCLB defines scientifically based research as research using rigorous methodological designs and techniques, including control groups and random assignment. In a presentation made shortly after President Bush signed NCLB into law, the deputy director of the Office of Research in the Department of Education put studies involving randomized trials and quasi-experiments at the top of the methodological hierarchy.
Randomized trials are experiments in which the division into treatment and control groups is determined at random (for example, by tossing a coin). Quasi-experimental research designs are based on naturally occurring circumstances or institutions that (perhaps unintentionally) divide people into treatment and control groups in a manner akin to purposeful random assignment.
A reliance on control groups and random assignment indeed would mark a new direction for education research. For example, an important question on the education research agenda is the role of technology in schools. Most previous research on the use of technology in the classroom (computer-aided instruction or CAI) relies on uncontrolled measurements, such as the level of satisfaction experienced by technology users. Not surprisingly, teachers and students typically report that they enjoy using new computer equipment (as shown in a recent study of laptops in Maine's public schools). But this does not establish that students who use the laptops are learning more, or that the expenditure on computers meets a cost-benefit standard (after all, computer hardware and software are expensive).
Randomized trials provide the best scientific evidence on the effects of policies like educational technology, changes in class size, or school vouchers because differences between the treatment and control groups can be attributed confidently to the treatment. A good quasi- or natural experiment is the next best thing to a real experiment. In some cases, quasi-experiments also involve random assignment, such as in the lotteries sometimes used to distribute school vouchers. In addition to comparing apples to apples, randomized trials and natural experiments also rely on assessments by disinterested non-participants and on clearly defined outcomes that other researchers can reproduce and interpret. This is what science is all about. In contrast, U.S. education policy has often relied on evidence that is fragmentary or anecdotal, uses subjective outcomes, and, most importantly, fails to make rigorous comparisons of treatment and control groups.
If successful, a shift to scientifically based research will move the study of education much closer to medicine, which has been experiencing a similar transition to scientifically based research over the last half-century. NBER researchers have been in the vanguard of this transition to scientifically based research on education. We have used natural experiments -- and in some cases, actual randomized trials -- to provide powerful evidence on issues ranging from the effects of compulsory attendance laws to changing class size. I describe some of this work below, focusing on my own efforts. I have used quasi-experiments -- and in recent and ongoing projects, randomized trials -- to make scientifically grounded inferences regarding the effects of achievement incentives and school choice, school resources, and macro education policy.
School Incentives and School Choice
The desire to help disadvantaged teens get through high school is a recurring theme of school reform proposals. Most anti-dropout efforts involve the provision of support services to low-achieving students. But the results from recent demonstration projects assessing services for at-risk high-school students have been disappointing.2 Motivated by the economic view of education, which sees student effort in school as determined partly by a comparison of the costs and benefits of effort devoted to schooling, Victor Lavy and I developed a unique program that rewards Israeli high school students who pass their high school matriculation exams (something like the New York Regents exam or British A Levels) with cash payments. Although this project was controversial in Israel (and eventually was cancelled as a political liability), it is in the spirit of a 1998 proposal by former Labor Secretary Reich, who suggested that students from low-income families in the United States be offered a $25,000 cash bonus for graduating from high school. It is also similar to the merit-based stipends common in higher education.
Perhaps most unusually, Lavy and I implemented the Achievement Awards program as a school-based randomized trial.3 That is, we identified 40 of the lowest-achieving schools in Israel and randomly selected 20 of them for participation in the program. Any student from the 20 treatment schools who passed their exams was eligible for a $1,500 payment, quite a large sum in Israel, although small relative to the private and social costs of dropping out. Only about 18 percent of students in the control group completed their matriculation exams. Students in the treatment group were about 7 percentage points more likely to complete their matriculation exams, a statistically significant difference with an economic benefit that easily outweighs the cost of bonus payments.
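The mechanics of a school-based randomized trial of this kind can be sketched in a few lines of Python. This is only an illustration, not the procedure used in the study: the function names `assign_schools` and `completion_rate_gap`, the seed, and the outcome data are all hypothetical, and a real analysis would also have to account for the fact that students within a school are not independent observations.

```python
import random

def assign_schools(school_ids, n_treated, seed=0):
    """Randomly split schools into treatment and control groups.

    Hypothetical helper: draws `n_treated` schools without replacement,
    so every school has the same chance of being treated.
    """
    ids = list(school_ids)
    rng = random.Random(seed)
    treated = set(rng.sample(ids, n_treated))
    control = set(ids) - treated
    return treated, control

def completion_rate_gap(treated_outcomes, control_outcomes):
    """Treated-minus-control difference in completion rates,
    in percentage points. Outcomes are 1 (completed) or 0 (did not)."""
    def rate(outcomes):
        return 100.0 * sum(outcomes) / len(outcomes)
    return rate(treated_outcomes) - rate(control_outcomes)

# 40 schools, 20 selected at random for the program.
treated, control = assign_schools(range(40), 20, seed=1)

# Made-up student outcomes, chosen only to show the calculation.
hypothetical_treated = [1] * 25 + [0] * 75
hypothetical_control = [1] * 18 + [0] * 82
print(completion_rate_gap(hypothetical_treated, hypothetical_control))
```

Because assignment is random, the control schools indicate what would have happened in the treatment schools without the program, which is why a simple difference in rates has a causal interpretation here.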
The Achievement Awards demonstration is the first of what we hope will be a series of randomized trials designed to test education incentive plans. In research in progress, Lavy and I are evaluating a package of incentives that provides awards for teachers as well as for students. A unique feature of our ongoing work is that the new demonstration project includes a component specifically designed to explore the interaction of student and teacher incentives.4 We also plan a long-run follow-up study of the Achievement Awards program.
One of the most controversial innovations highlighted by NCLB is school choice. In a recently published paper,5 my collaborators and I studied what appears to be the largest school voucher program to date. This program provided over 125,000 pupils from poor neighborhoods in the country of Colombia with vouchers that covered approximately half the cost of private secondary school. Colombia is an especially interesting setting for testing the voucher concept because private secondary schooling in Colombia is a widely available and often inexpensive alternative to crowded public schools. (In Bogota, over half of secondary school students are in private schools.) Moreover, governments in many poor countries are increasingly likely to experiment with demand-side education finance programs, including vouchers.
Although our Colombia study was not designed as a randomized trial, a key feature is its exploitation of voucher lotteries as the basis for a quasi-experimental research design. Because demand for vouchers exceeded supply, the available vouchers were allocated by lottery in large cities. Our study compares voucher applicants who won a voucher in the lottery to those who lost. Since the lotteries used random assignment, losers provide a good control group for winners. A comparison of voucher winners and losers shows that three years after the lotteries were held, winners were 15 percentage points more likely to have attended private school and were about 10 percentage points more likely to have finished eighth grade, primarily because they were less likely to repeat grades. Lottery winners also scored 0.2 standard deviations higher on standardized tests. A follow-up study in progress shows that voucher winners also were more likely to apply to college. On balance, our study provides some of the strongest evidence to date for the possible benefits of demand-side financing of secondary schooling, at least in a developing country setting.6
Research on vouchers naturally focuses on the question of whether voucher recipients benefit from the opportunity to use vouchers. A related question that gets less attention arises from the fact that voucher recipients and other school choice beneficiaries are typically low-income. For example, NCLB singles out the students in the worst schools as being eligible for choice. In particular, NCLB requires districts to allow students in schools judged to be "failing" the opportunity to change schools. Policymakers and parents in the schools that accept these students have wondered what the consequences will be for high-achieving children when low achievers from poor areas choose to attend their schools. Economists refer to research on questions of this sort as the measurement of peer effects.
Boston's long-running Metco program provides a unique opportunity to estimate peer effects in the classroom using a quasi-experimental research design. Metco gives mostly black students in the Boston public school district the opportunity to attend schools in more affluent suburban districts. Kevin Lang and I focus on the impact of Metco on the students in one of the largest Metco-receiving districts.7 Because Metco students have substantially lower test scores than local students, this inflow generates a significant decline in average scores. Our research shows that the overall decline in scores is attributable to a composition effect, though, because we find no impact on average scores in a sample limited to non-Metco students. This weighs against the hypothesis of significant negative peer effects as a result of school choice (although we do find a short-lived negative effect on the scores of minority third graders in reading and language). Our research on Metco exploits idiosyncratic features of the process used to allocate Metco students to different schools through what is known as the "regression-discontinuity" method for analysis of quasi-experiments.
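The distinction between a composition effect and a peer effect is a matter of simple arithmetic, sketched below in Python. The numbers are invented for illustration and the function `decompose_change` is hypothetical. It is a plain before-and-after accounting exercise, not the regression-discontinuity analysis used in the Metco study.

```python
def mean(values):
    return sum(values) / len(values)

def decompose_change(resident_before, resident_after, transfer_after):
    """Split the change in a district's average score into two parts.

    peer_effect: change in the average score of resident students only.
    composition: the remainder, due purely to adding lower-scoring
                 transfer students to the average.
    """
    overall_before = mean(resident_before)
    overall_after = mean(resident_after + transfer_after)
    total_change = overall_after - overall_before
    peer_effect = mean(resident_after) - mean(resident_before)
    composition = total_change - peer_effect
    return total_change, peer_effect, composition

# Hypothetical district: 90 residents scoring 70 before and after,
# joined by 10 transfer students scoring 50.
total, peer, comp = decompose_change([70.0] * 90, [70.0] * 90, [50.0] * 10)
print(total, peer, comp)  # -2.0 0.0 -2.0
```

In this made-up case the district average falls by two points even though no resident student scores any lower, which is the pattern the text describes as a composition effect.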
School Resources
Another strand of my work uses quasi-experiments to look at what economists call the education production function. This research links school resources, including computers and class size, with outcomes such as student achievement on standardized tests. The principal challenge in research of this type, as in most empirical research in economics, is in isolating cause and effect. Many factors make the observed correlation between school resources and student achievement hard to interpret. Rich and poor districts differ on many dimensions, teachers sort students into classes of different size, and students and parents make systematic choices that are reflected in the resources/achievement relationship.
The question of how technology affects learning has been at the center of recent debates over educational inputs. My most recent research on school resources, joint with Lavy,8 exploits a natural experiment arising from the fact that the Israeli State lottery, which uses lottery profits to sponsor various social programs, funded a large-scale computerization effort in many elementary and middle schools. Although lottery officials did not use random assignment to allocate the computers across towns and schools, they used an idiosyncratic priority scheme that appears to have an essentially random component. We used this to estimate the impact of computerization on both the instructional use of computers and pupil achievement. Results from a survey of Israeli schoolteachers show that the influx of new computers increased teachers' use of CAI in the fourth grade, with a smaller effect on CAI in eighth grade. Perhaps surprisingly, CAI does not appear to have had educational benefits that translated into higher test scores. In fact, estimates for fourth graders show lower math scores in the group that was awarded computers, with smaller (insignificant) negative effects on language scores. These results call into question the widely held view that additional resources should be devoted to CAI.9
Another central question in the school resources debate is the importance of class size. Although recent years have seen renewed interest in the class-size question, academic interest in this topic is not simply a modern phenomenon: the choice of class size has been of concern to scholars and teachers for hundreds of years. One of the earliest references on this topic is the Babylonian Talmud, completed around the beginning of the 6th century, which discusses rules for the determination of class size and pupil-teacher ratios in bible study. The great 12th century Rabbinic scholar, Maimonides, interprets the Talmud's discussion of class size as follows: "Twenty-five children may be put in charge of one teacher. If the number in the class exceeds twenty-five but is not more than forty, he should have an assistant to help with the instruction. If there are more than forty, two teachers must be appointed."
In my first study of school resources, also joint with Lavy,10 we use Maimonides' rule capping class size at 40 to construct a natural experiment to estimate the effects of class size on the scholastic achievement of Israeli pupils. To see how this experiment works, note that according to Maimonides' rule, class size increases one-for-one with enrollment until 40 pupils are enrolled, but when 41 students are enrolled, there will be a sharp drop in class size, to an average of 20.5 pupils. Similarly, when 80 pupils are enrolled, the average class size will again be 40, but when 81 pupils are enrolled the average class size drops to 27. Our use of this variation is an application of the quasi-experimental regression-discontinuity method.
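The class-size arithmetic implied by the rule can be written down directly. The Python sketch below assumes that a cohort is split into the smallest number of equal-sized classes that keeps every class at or below the cap of 40; the function name `predicted_class_size` is hypothetical and is not taken from the study.

```python
def predicted_class_size(enrollment: int, cap: int = 40) -> float:
    """Average class size when a cohort is split into the fewest
    equal-sized classes that keep every class at or below `cap`."""
    if enrollment < 1:
        raise ValueError("enrollment must be a positive integer")
    # One class for 1..cap pupils, two for cap+1..2*cap, and so on.
    n_classes = (enrollment - 1) // cap + 1
    return enrollment / n_classes

for enrollment in (40, 41, 80, 81):
    print(enrollment, predicted_class_size(enrollment))
# 40 -> 40.0, 41 -> 20.5, 80 -> 40.0, 81 -> 27.0
```

The sharp drops at 41 and 81 pupils are the discontinuities that the research design exploits: schools with enrollments just below and just above a cutoff are likely to be similar in other respects, yet their class sizes differ substantially.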
Interestingly, the observed association between class size and student achievement in our data is always perverse (that is, students in larger classes tend to do better). This illustrates the importance of research using a good experiment. Estimates of class size effects using Maimonides' Rule suggest that reductions in class size induce a significant and substantial increase in math and reading achievement for fifth graders, and a modest increase in reading achievement for fourth graders. We gain confidence in this result (as opposed to the simple correlation between class size and test scores) because a randomized trial manipulating class size in Tennessee generated similar estimates.11
The Effects of Macro-Education Policy
The work discussed above focuses on a micro-level analysis of students and schools. I have also used natural experiments to study legislative and other macro-level education policies. Here, experiments are harder to come by and research may have to rely on simple policy shifts that affect some states and not others. Nevertheless, this work follows the natural-experiments model in that there is always a well-defined control group. For example, Jon Guryan and I recently looked at state changes in teacher certification requirements.12 We find that states that introduced teacher tests (such as the national teachers examination) ended up paying higher teacher salaries with no measurable increase in teacher quality. This suggests that tests are more of a barrier to entry than an effective quality screen.
In earlier work, Alan Krueger and I looked at the effects of compulsory attendance laws.13 This research exploits the interaction between individuals' quarter of birth and state laws (children born earlier in the year are allowed to drop out of school after having completed less schooling than those born later). More recently, Daron Acemoglu and I have used state compulsory attendance laws to estimate the social returns to education (that is, an economic benefit beyond that accruing to the more educated individuals themselves).14 I also have looked at natural experiments increasing the education infrastructure, for example a large-scale expansion of higher education in the West Bank and Gaza Strip.15 Finally, Lavy and I studied the economic consequences of the change in language of instruction in Morocco's secondary schools.16
Conclusion
In addition to providing evidence on specific questions, I believe that an important overall contribution of my work on education has been to document the feasibility and promise of both quasi-experimental methods and randomized trials in education research. Many other NBER researchers are also involved in this work and I expect that education research along these lines will be a growth area for economists in the years to come. I am certainly looking forward to doing more of it.
Endnotes
M. Dynarski and P. Gleason, How Can We Help? What Have We Learned from Evaluations of Federal Dropout-Prevention Programs, Mathematica Policy Research report 8014-140, Princeton, NJ, June 1998.
J. D. Angrist and V. Lavy, "The Effect of High School Matriculation Awards: Evidence from Randomized Trials," NBER Working Paper 9389, December 2002.
See P. Glewwe, N. Ilias, and M. Kremer, "Teacher Incentives," NBER Working Paper 9671, May 2003, for a recent randomized trial of teacher incentives in Kenya.
J. D. Angrist, E. P. Bettinger, E. Bloom, E. King, and M. Kremer, "Vouchers for Private Schooling in Colombia: Evidence from a Randomized Natural Experiment," NBER Working Paper 8343, June 2001, and in American Economic Review, 92 (5) (December 2002), pp. 1535-58.
Evidence on voucher effects for the United States has been more mixed. Two studies involving randomization are A. B. Krueger and P. Zhu, "Another Look at the New York City Voucher Experiment," NBER Working Paper 9418, January 2003, and C. E. Rouse, "Private School Vouchers and Student Achievement: An Evaluation of the Milwaukee Parental Choice Program," Quarterly Journal of Economics, 113 (2) (May 1998). C. M. Hoxby, "Does Competition Among Public Schools Benefit Students and Taxpayers?" American Economic Review, 90 (5) (December 2000), pp. 1209-38, is a quasi-experimental study of school choice. In work in progress, J. B. Cullen, S. D. Levitt, and B. A. Jacob are using lotteries to study school choice in the Chicago public schools in "The Impact of School Choice on Enrollment and Achievement: Evidence from over 1000 Lotteries," manuscript, 2003 (forthcoming as an NBER Working Paper).
J. D. Angrist and K. Lang, "How Important are Classroom Peer Effects? Evidence from Boston's METCO Program," NBER Working Paper 9263, October 2002.
J. D. Angrist and V. Lavy, "Using Maimonides' Rule to Estimate the Effect of Class Size on Student Achievement," Quarterly Journal of Economics, 114 (2) (May 1999), pp. 533-75.
A. B. Krueger, "Experimental Estimates of Education Production Functions," Quarterly Journal of Economics, 114 (2) (May 1999), pp. 497-532. But see also C. M. Hoxby, "The Effects of Class Size on Student Achievement: New Evidence from Population Variation," Quarterly Journal of Economics, 115 (4) (November 2000), pp. 1239-85, which finds little evidence of a class size effect using quasi-experimental methods to analyze data from Connecticut.
J. D. Angrist and J. Guryan, "Does Teacher Testing Raise Teacher Quality? Evidence from State Certification Requirements," NBER Working Paper 9545, March 2003.
J. D. Angrist and A. B. Krueger, "Does Compulsory School Attendance Affect Schooling and Earnings?" Quarterly Journal of Economics, 106 (November 1991), and "The Effect of Age at School Entry on Educational Attainment: An Application of Instrumental Variables with Moments from Two Samples," Journal of the American Statistical Association (June 1992).
J. D. Angrist and D. Acemoglu, "How Large are the Social Returns to Education? Evidence from Compulsory Attendance Laws," NBER Macro Annual, 15, Cambridge, MA: MIT Press, 2000.
J. D. Angrist, "The Economic Returns to Schooling in the West Bank and Gaza Strip," American Economic Review, 85 (5) (December 1995), pp. 1065-87. A related and more recent study in this spirit is E. Duflo, "The Medium Run Effects of Educational Expansion: Evidence from a Large School Construction Program in Indonesia," NBER Working Paper 8710, January 2002.