During the war in Vietnam, the United States increased the size of its armed forces through a draft. In order to decide which citizens would be called into service, the US government held a lottery. Every birthdate was represented on a piece of paper, and these papers were placed in a large glass jar. As shown in the picture at the top of this post, these slips of paper were drawn from the jar one at a time to determine the order in which young men would be called to serve (young women were not subject to the draft). Based on the results of the 1970 lottery, men born on September 14 were called first, men born on April 24 were called second, and so on. Ultimately, in this lottery, men born on 195 different days were called to service, while men born on 171 days were not. (If you want to determine whether you would have been called, consult this table provided by the Selective Service. If you are male and the number for your birthdate is 195 or lower, then you would have been called.)
Although it might not be immediately apparent, a draft lottery has a critical similarity to a randomized controlled experiment: in both situations, participants are randomly assigned to receive a treatment. Thus, by comparing outcomes for those who received the treatment and those who did not, we can estimate the effect of that treatment. In the case of the draft lottery, if we are interested in the effects of draft eligibility and military service on subsequent labor market earnings, we can compare outcomes for men whose birthdates drew lottery numbers at or below the cutoff (e.g., September 14, April 24) with outcomes for men whose birthdates drew numbers above the cutoff (e.g., February 20, December 2). Using a version of this analytic strategy, the economist Joshua Angrist concluded that the earnings of white veterans were about 15% lower than the earnings of comparable non-veterans. In other words, fighting in Vietnam had a negative effect on post-war earnings.
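At its core, this comparison is just a difference in mean outcomes between the draft-eligible and draft-ineligible groups. Here is a minimal sketch with simulated data; all the numbers below are invented for illustration and are not Angrist's actual estimates:

```python
import random

random.seed(0)

# Simulated data: each person gets a lottery number (1-366) and annual earnings.
# In the 1970 lottery, numbers 1-195 were draft-eligible.
CUTOFF = 195

people = []
for _ in range(10_000):
    lottery_number = random.randint(1, 366)
    eligible = lottery_number <= CUTOFF
    # Hypothetical earnings: eligibility lowers earnings by $2,000 on average.
    earnings = random.gauss(30_000, 5_000) - (2_000 if eligible else 0)
    people.append((eligible, earnings))

eligible_earnings = [e for flag, e in people if flag]
ineligible_earnings = [e for flag, e in people if not flag]

# The effect of draft eligibility: difference in mean earnings by eligibility.
itt = (sum(eligible_earnings) / len(eligible_earnings)
       - sum(ineligible_earnings) / len(ineligible_earnings))
print(round(itt))  # roughly -2000 in this simulation
```

Because eligibility was assigned by lottery, this simple difference in means is an unbiased estimate of the effect of being draft-eligible (what trialists call the "intention-to-treat" effect).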
As this example illustrates, sometimes social, political, or natural forces create experiments or near-experiments that can be leveraged by researchers. These “accidental” experiments are called natural experiments. That is, from the perspective of the researcher, natural experiments are fortunate accidents, even if these situations are in no way fortunate for their participants. Often natural experiments are the best way to estimate cause-and-effect relationships in settings where it is not ethical or practical to run randomized controlled experiments.
However, the analysis of natural experiments can be quite tricky. For example, in the case of the Vietnam draft, not everyone who was draft-eligible ended up serving (there were a variety of exemptions), and, at the same time, some people who were not draft-eligible volunteered for service. It is as if, in a clinical trial of a new drug, some people in the treatment group did not take their medicine and some people in the control group somehow received the drug. This problem, called two-sided noncompliance, is described in greater detail, along with many other complications, in Thad Dunning’s excellent book, Natural Experiments in the Social Sciences: A Design-Based Approach.
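With two-sided noncompliance, comparing earnings by eligibility measures the effect of *eligibility*, not of military service itself. One standard remedy, which captures the rough logic of Angrist's instrumental-variables approach, is the Wald estimator: divide the effect of eligibility on earnings by the effect of eligibility on the probability of serving. A toy sketch, with made-up numbers chosen only to show the arithmetic:

```python
# Toy illustration of the Wald estimator under two-sided noncompliance.
# All numbers are hypothetical, not estimates from Angrist's data.

# Mean earnings by draft-eligibility:
mean_earnings_eligible = 28_500.0
mean_earnings_ineligible = 30_000.0

# Share who actually served, by eligibility: some eligible men were
# exempted, and some ineligible men volunteered.
served_if_eligible = 0.40
served_if_ineligible = 0.20

# Effect of eligibility on earnings (intention-to-treat):
itt = mean_earnings_eligible - mean_earnings_ineligible   # -1500.0

# Effect of eligibility on the probability of serving ("first stage"):
first_stage = served_if_eligible - served_if_ineligible   # 0.2

# Wald estimate of the effect of service on earnings:
wald = itt / first_stage
print(wald)  # -7500.0
```

Intuitively, eligibility moved only 20% of men into service, so the $1,500 earnings gap must be rescaled by that 20% to recover the effect of service on those whose behavior the lottery actually changed.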
More generally, one can think of natural experiments as sitting between observational studies, where the researcher does not intervene in the world at all, and randomized controlled experiments, where the researcher intervenes in a carefully controlled manner. Dunning suggests that we distinguish these three research designs (observational studies, natural experiments, and randomized controlled experiments) by three criteria:
| | observational study | natural experiment | randomized controlled experiment |
|---|---|---|---|
| responses of people in the treatment group are compared to the responses of people in the control group | yes | yes | yes |
| the assignment of people to the treatment and control group is at random (or nearly random) | no | yes | yes |
| the treatment and assignment process are under the control of the researcher | no | no | yes |
The strategy of searching for natural experiments predates the digital age, going back at least as far as John Snow’s amazing detective work to understand the causes of the 1854 cholera outbreak in London. However, the automatically collected “digital exhaust” of big data greatly facilitates researchers’ ability to leverage natural experiments when they happen. That is, once you discover that a natural experiment has occurred, big data can provide the outcome data that you need in order to compare the results for people in the treatment and control conditions. For example, in his study of the effects of the draft and military service, Angrist made use of earnings records from the Social Security Administration; without these data, his study would not have been possible.
Once you know what to look for, you too can start discovering natural experiments. In order to build your intuition about natural experiments, I’ll describe some neat examples of researchers discovering natural experiments and then leveraging automatically-created digital data. After going through this series of examples, I’ll write a blog post with an overall framework for evaluating natural experiments. As we will see, identifying natural experiments is an important strategy for finding scientific value in big data.
For more on using the Vietnam draft lottery as a natural experiment, see:
- Angrist (1990). Lifetime Earnings and the Vietnam Era Draft Lottery: Evidence from Social Security Administrative Records. The American Economic Review. (see this errata to understand the figures)
- Hearst et al. (1986). Delayed Effects of the Military Draft on Mortality. New England Journal of Medicine.
- Erikson and Stoker (2011). Caught in the Draft: The Effects of Vietnam Draft Lottery Status on Political Attitudes. American Political Science Review.
For more on natural experiments in general, see:
- Dunning (2012). Natural Experiments in the Social Sciences: A Design-Based Approach. Cambridge University Press.