meetup about teaching computational social science at ASA

Please join me for an informal meetup about teaching computational social science Monday, August 14 at 3pm.  We will meet at the Princeton University Press booth in the exhibit hall at ASA.  The purpose of the meetup is for people teaching computational social science—or thinking about teaching it—to share experiences and troubleshoot common problems.  The number and variety of courses on computational social science are growing rapidly, and I think that we can all benefit from hearing about the exciting things that people are doing.  I look forward to seeing you in Montreal.

natural experiments created by online and offline processes

In an earlier post, I described how the “discovery” of natural experiments in the world can enable researchers to leverage automatically collected big data to learn about cause-and-effect relationships. When combining natural experiments with big data, a useful conceptual distinction is whether the source of randomization (or near randomization) comes from an online or offline source. To clarify this distinction, in this post I’ll write about two different natural experiments: one that studies the effect of a new tax on smoking behavior and one that studies the effect of auction information on revenue.

Natural experiment created by offline processes

Smoking is a leading cause of premature death around the world, so there is great interest in understanding how social policies, such as tax rates, affect smoking behavior. On April 1, 2009 the US government nearly tripled the federal tax on cigarettes from $0.39 to $1.01 per pack. In order to measure the effect of this policy change on rates of smoking cessation, Ayers et al. tracked search query volume from the US for phrases like “quit smoking” and “cheap cigarettes” before and after the new tax went into effect.  Thus, they were attempting to measure whether the effect of the policy was to encourage people to quit smoking or just to look for cheaper cigarettes.  The time-series of these searches in the US was then compared to a comparable time-series from Canada, a jurisdiction where there was no comparable tax change.


By comparing query volume in the US (solid lines) and Canada (dotted lines), the authors found that there was a surge in search volume for the phrase “quit smoking” (shown in blue) at the time of the tax increase, but that search volume quickly returned to its original level, suggesting that the tax change had no long-term effect on smoking behavior.
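The logic of this comparison—the change in the US series minus the change in the Canadian series—is a difference-in-differences estimate.  Here is a minimal sketch of that calculation using made-up numbers; the values below are purely illustrative, not the authors’ data, and the real study worked with normalized search query volumes rather than raw counts.

```python
# Difference-in-differences sketch with invented weekly search volumes.
# The "effect" is the change in the treated (US) series minus the
# change in the control (Canada) series over the same period.

def mean(xs):
    return sum(xs) / len(xs)

def diff_in_diff(treated_before, treated_after, control_before, control_after):
    """Change in the treated series minus change in the control series."""
    treated_change = mean(treated_after) - mean(treated_before)
    control_change = mean(control_after) - mean(control_before)
    return treated_change - control_change

# Hypothetical weekly "quit smoking" search volumes (illustrative only)
us_before, us_after = [50, 52, 48, 51], [70, 68, 72, 66]   # US: tax change
ca_before, ca_after = [49, 50, 51, 50], [52, 51, 53, 52]   # Canada: no change

effect = diff_in_diff(us_before, us_after, ca_before, ca_after)
print(round(effect, 2))  # surge in US searches beyond the Canadian trend
```

The control series matters: without Canada as a baseline, a seasonal rise in searches could be mistaken for an effect of the tax.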

The use of legal changes and jurisdictional boundaries to create natural experiments has a long history and numerous potential pitfalls, as described in more detail in Thad Dunning’s excellent book Natural Experiments in the Social Sciences: A Design-Based Approach. However, for the purposes of this blog post, there are two important features of this study. First, exogenous variation created offline can be merged with big data sources. Second, the outcome in the big data sources (e.g., search volume) was not the same as the real outcome of interest (e.g., change in actual smoking behavior). This mismatch is common because it is unlikely that the real outcome of interest will just happen to be collected. Therefore, an important consideration in natural experiments is the fit between what is measured and what is actually important.

For more on this study, see:

Natural experiment created by online change

In addition to natural experiments created by offline changes, it is also possible to study natural experiments created by online changes.  For example, Brown et al. took advantage of a change in eBay policy to study the effect of price shrouding on revenue.  Price shrouding occurs when a seller hides part of the price of an item from the buyer (e.g., a seller might list a shipping and handling fee in very small print).  Economic theory offers conflicting predictions about the effect of price shrouding on revenue.  If consumers are fully informed, price shrouding might have no effect, but if consumers do not have all the necessary information, price shrouding could increase revenue (by essentially tricking consumers) or decrease revenue (by causing suspicious consumers to withdraw from the market).

On October 28, 2004 eBay changed their policy on the shrouding of shipping information, providing researchers a chance to estimate the effect of this information on revenue.  Before the policy change it was relatively easy to shroud shipping costs, but after the change this information could not be shrouded.  How did this change affect revenue?  By comparing similar items — more specifically gold and silver coins — sold before and after the policy change, Brown et al. concluded that raising shipping charges increases revenue and that the effect is bigger when shipping prices can be shrouded.
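The structure of this comparison can be sketched in a few lines of code.  All of the numbers below are invented for illustration—they are not the authors’ data—and the real analysis involved careful matching of similar coin auctions, which this sketch omits.

```python
# Hedged sketch: compare how total revenue for listings with high
# shipping charges changed after the disclosure rule, net of the
# change for low-shipping listings (all values are invented).

def mean(xs):
    return sum(xs) / len(xs)

# Hypothetical total revenues (price + shipping) for similar coins
high_shipping_before = [31.0, 32.5, 30.0]   # shrouding possible
high_shipping_after  = [29.0, 30.0, 28.5]   # shrouding not possible
low_shipping_before  = [27.0, 26.5, 27.5]
low_shipping_after   = [27.0, 26.0, 27.5]

# Difference-in-differences: change for high-shipping listings minus
# change for low-shipping listings
effect = (mean(high_shipping_after) - mean(high_shipping_before)) - \
         (mean(low_shipping_after) - mean(low_shipping_before))
print(round(effect, 2))  # negative here: high-shipping revenue fell
```

In these illustrative numbers, revenue for high-shipping listings falls once shipping costs can no longer be shrouded, consistent with the idea that shrouding had been boosting revenue for those sellers.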

As with the study of search volume following a change in cigarette taxes, the study of the effect of shrouding takes advantage of changes in policy.  However, these two studies have different strengths and weaknesses.  While the study of the effect of the increase in cigarette taxes suffered because the outcome information (search volume) was different from the desired outcome information (smoking behavior), in the case of the eBay study, the outcome information (revenue) is exactly the desired outcome.  However, the eBay study suffers from the limitation that it only measures the effect of shrouded shipping prices at one place (eBay) at one time (September to December 2004) for two types of items (gold and silver coins).  If the effect of shrouding on revenue is heterogeneous, then no single measurement, no matter how clever, can fully describe the effect of shrouding on revenue.

For more on this study, see:


As described in an earlier post, merging natural experiments happening in the world with automatically collected big data can provide researchers with new ways to learn about cause-and-effect relationships.  As more and more behavior is recorded digitally, each natural experiment will enable researchers to study responses in many different ways.  Sometimes the source of the natural experiment can be an online change (e.g., a change in eBay policy) and sometimes it can be an offline change (e.g., a change in cigarette taxes).  The two examples in this blog post also highlight some possible limitations of natural experiments.  In some situations the outcome measure that is automatically recorded in big data sources will not be the outcome that is really important.  Also, an effect measured from one particular natural experiment might not provide a full understanding of the heterogeneity of treatment effects.