When I discuss open and reproducible research with graduate students, their minds often drift toward all the bad things that could happen from having their code and data available. It is certainly true that bad things could happen, but my sense is that people often overestimate these kinds of risks and underestimate the benefits of being open. So, in this post I wanted to highlight an example of something good that can happen from being open with data and code: it can raise the visibility of your work and help make it more useful to others.
I’ve never met Kevin Munger, but I wanted to thank him for posting his data and code. It helped me, and it helped my students. This is just one small example of a good thing that can come from being more open.
My colleagues and I are currently working on a study that applies several network reporting methods—survey methods that ask respondents to report about other people—to estimate crack use, mortality rates, and migration rates in Brazil. We have interviewed a random sample of 25,000 people spread across 27 different cities, and we’re excited about analyzing the data. But, before we can do that, the written survey forms—which were completed during face-to-face interviews—had to be entered into an electronic database. In our case, the company that we hired to implement our survey did this data entry. However, we wanted to be sure that the data entry was done well, so we also conducted an independent round of data entry for a subset of the completed questionnaires using Captricity, a new online service that converts paper forms to digital data using a combination of machine learning and human verification.
In this post, we explain the system we developed in order to use Captricity to check the quality of data entry; we summarize our findings (it turns out that our survey company did a good job); and we offer advice for people who might wish to use Captricity in the future.
Bad Hessian is a “collaborative computational social science blog” that is “Don Knuth meets Charles Tilly.” If you know who both of those guys are, then I expect that you will enjoy reading Bad Hessian.
The distinction between “basic” and “applied” research is made so often that it seems natural to many people. In fact, depending on where you sit — in academia or in industry — you probably look down on one or the other. One example, however, shows this dichotomy to be false: the research of Louis Pasteur. In fact, Pasteur’s work not only demolishes the current framework, it also suggests a better way to think about research.
The argument that I’m going to summarize here was first put forward by Donald Stokes in his book Pasteur’s Quadrant. But, before getting to how Pasteur should change how you think, let’s examine how we got to where we are today.
As World War II was winding down President Roosevelt asked Vannevar Bush, the director of the wartime Office of Scientific Research and Development, to prepare a plan for the role of scientific research in peacetime. Bush’s report — Science, the Endless Frontier — had two main elements that are still recognizable today in the way that many scholars and funders think about research:
Basic research and applied research sit on opposite ends of a one-dimensional continuum. In other words, your research cannot be basic and applied at the same time.
Insights from basic research trickle down to applied research and finally product development.
Bush’s report, and these two ideas in particular, have framed the way that many Americans think about science and science policy. The only problem is that this framework is broken, as is demonstrated by the work of Louis Pasteur.
Was Pasteur’s research basic or applied? In one series of projects, Pasteur worked on the problem of converting beet juice into alcohol for an industrialist in Lille; it is hard to think of a problem that seems more applied. Yet,
“as he pursued this research, he began to fashion a framework for understanding a whole new class of natural phenomena, and he obtained the strikingly original result that certain microorganisms were capable of living without free oxygen. This work launched his assault on the medieval doctrine of the spontaneous generation of life and led to brilliant later studies in which he developed the germ theory of disease.” (Stokes, 1997, p. 13)
In other words, Pasteur’s work was both “basic” (e.g., the germ theory of disease) and “applied” (e.g., converting beet juice into alcohol) at the same time. Building on the example of Pasteur, Stokes proposed a two-dimensional classification scheme based on 1) whether the research is use-inspired and 2) whether the research involves a quest for fundamental understanding. This figure pretty much summarizes the argument:
I find this two-dimensional framework very appealing because it matches my own experience of how research happens, and Pasteur’s Quadrant crystallized my own belief that solving real problems is in no way in conflict with doing “real science.” We can’t all be Pasteur, but if we design our research correctly, we can all be in Pasteur’s quadrant.
P.S. I’d like to thank Neal Patel for first telling me about Pasteur’s Quadrant.
P.P.S. Coincidentally, I checked out my copy of Pasteur’s Quadrant from the Donald E. Stokes Library which is directly beneath my office. Stokes has a library named after him because he was the Dean of Princeton’s Woodrow Wilson School of Public and International Affairs for 18 years (1974 to 1992). My guess is that being the Dean of a policy school brings one into frequent contact with use-inspired research that seeks fundamental understanding.
P.P.P.S. Google also seems to use the framework developed in Pasteur’s Quadrant to think about the research they do. In this article, they write “. . . in the terminology of Pasteur’s Quadrant, we do ‘use-inspired basic’ and ‘pure applied’ research.” In other words, they are in Edison’s quadrant and Pasteur’s quadrant.