Bit by Bit: Social Research in the Digital Age is now in Open Review

Bit by Bit: Social Research in the Digital Age is a book for social scientists who want to do more data science and data scientists who want to do more social science.  I’m happy to announce that the entire manuscript is now in Open Review.  That means that you can read it and help make it better by adding annotations.

Here’s how the book starts:

In the summer of 2009, mobile phones were ringing all across Rwanda. In addition to the millions of calls between family, friends, and business associates, about 1,000 Rwandans received a call from Joshua Blumenstock and his colleagues. The researchers were studying wealth and poverty by conducting a survey of people who had been randomly sampled from a database of 1.5 million customers from Rwanda’s largest mobile phone provider. Blumenstock and colleagues asked the participants if they wanted to participate in a survey, explained the nature of the research to them, and then asked a series of questions about their demographic, social, and economic characteristics.

Everything I have said up until now makes this sound like a traditional social science survey. But, what comes next is not traditional, at least not yet. They used the survey data to train a machine learning model to predict someone’s wealth from their call data, and then they used this model to estimate the wealth of all 1.5 million customers. Next, they estimated the place of residence of all 1.5 million customers by using the geographic information embedded in the call logs. Putting these two estimates together—the estimated wealth and the estimated place of residence—Blumenstock and colleagues were able to produce high-resolution estimates of the geographic distribution of wealth across Rwanda. In particular, they could produce an estimated wealth for each of Rwanda’s 2,148 cells, the smallest administrative unit in the country.

It was impossible to validate these estimates because no one had ever produced estimates for such small geographic areas in Rwanda. But, when Blumenstock and colleagues aggregated their estimates to Rwanda’s 30 districts, they found that their estimates were similar to estimates from the Demographic and Health Survey, the gold standard of surveys in developing countries. Although these two approaches produced similar estimates in this case, the approach of Blumenstock and colleagues was about 10 times faster and 50 times cheaper than the traditional Demographic and Health Surveys. These dramatically faster and lower cost estimates create new possibilities for researchers, governments, and companies (Blumenstock, Cadamuro, and On 2015).

In addition to developing a new methodology, this study is kind of like a Rorschach inkblot test; what people see depends on their background. Many social scientists see a new measurement tool that can be used to test theories about economic development. Many data scientists see a cool new machine learning problem. Many business people see a powerful approach for unlocking value in the digital trace data that they have already collected. Many privacy advocates see a scary reminder that we live in a time of mass surveillance. Many policy makers see a way that new technology can help create a better world. In fact, this study is all of those things, and that is why it is a window into the future of social research.

If you want to see why I think that study is a window into the future of social research,  check out the rest of the book:

a gallery of personal networks

In an earlier post I described how you could create a visualization of your personal network on Facebook. To give you some sense of what these visualizations look like, here are a few that were donated by my students.

If you are teaching a class in social networks this semester and would like to have your students do this activity too, my students would be excited to compare the results.



lagged evaluation


The standard time to evaluate a course is at the end of the semester.  That is when we typically solicit feedback from students in order to improve our course for the future.  But, that’s not the only time that we can get feedback.  If we got feedback during the class, we could use it to improve that particular class, and, in general, iterate and improve more quickly.  And, if we get feedback long after the class is over, I think that we can gain insights about deeper and more interesting kinds of things.  Did the students actually remember what we taught?  What turned out to be most useful to them?  What did they think was useful that turned out not to be?  Therefore, to evaluate my course from last spring, I did an evaluation this spring.  Here’s what I learned from my lagged evaluation.


the visible hand: an online field experiment to measure discrimination


Source: Fig 1 of Doleac and Stein (working paper)

In several previous posts, I’ve written about the value of moving beyond simple measures of treatment effects in experiments.  That is, in addition to measuring an effect of a treatment, we would like to know 1) when the effect might be big or small and 2) why the effect occurs.  In this post, I’ll write about the online field experiment of Doleac and Stein (2013) that makes use of digital technologies to study discrimination in market transactions and that attempts to understand the when and why of the effect they measured.

To provide some context for the Doleac and Stein study, there is intense debate about the amount of racial discrimination that takes place in contemporary America.  Experimental approaches are a promising way to meet this measurement challenge because they can clearly detect any difference in outcomes for two people who are identical other than their race.  There are two main experimental approaches to measuring discrimination—correspondence studies and audit studies—and the Doleac and Stein study combines some of the best features of both approaches.

Correspondence studies, which usually involve sending written application materials to potential employers, signal the race of the applicant by manipulating the applicant’s name.  A great example of a correspondence study is Bertrand and Mullainathan’s (2004) paper with the memorable title “Are Emily and Greg More Employable Than Lakisha and Jamal? A Field Experiment on Labor Market Discrimination.”  Correspondence studies have relatively low cost per observation, which enables a single researcher to collect thousands of observations in a typical study.  But, correspondence studies of racial discrimination have been questioned because names potentially signal many things in addition to the race of the applicant.  That is, names such as Greg, Emily, Lakisha, and Jamal may signal social class in addition to race.  Thus, any difference in treatment between Greg’s and Jamal’s resumes might be due to more than the presumed race difference between the applicants.  Audit studies, on the other hand, involve hiring actors of different races to apply in person for jobs or housing.  A great example of an audit study is Ayres and Siegelman (1995) in which actors negotiated the purchase of new cars and found that black customers were consistently quoted higher prices than white customers.  Unfortunately, audit studies also have limitations, and they are extremely expensive per observation.  This cost structure typically limits the size of audit studies to hundreds of observations.  Doleac and Stein take advantage of a web-based marketplace to perform something of a hybrid.  They are able to collect data at relatively low cost per observation, resulting in thousands of observations (as in a correspondence study), and they are able to signal race using photographs, which creates a signal of race that is unconfounded (as in an audit study).

To study the effect of race on market transactions, the authors advertised thousands of iPods in an online marketplace (e.g., craigslist).  The advertisements that they posted varied along three main dimensions.  First, they varied the characteristics of the seller, which were signaled by the hand photographed holding the iPod as shown at the top of this post [white, black, white with tattoo].  Second, they varied the asking price [$90, $110, $130].  Third, they varied the quality of the ad text [high-quality and low-quality (e.g., cApitalization errors and spelin errors)].  Thus, the authors had a 3 X 3 X 2 design which was deployed across more than 300 local markets ranging from towns (e.g., Kokomo, IN and North Platte, NE) to mega-cities (e.g., New York and Los Angeles).  Averaged across all conditions, the outcomes were better for the white seller than the black seller, with the tattooed seller having intermediate results.  For example, white sellers received more offers and had higher final sale prices.
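To make the 3 X 3 X 2 design concrete, here is a minimal Python sketch that enumerates every ad condition.  The factor levels come from the description above; the variable and function names are my own, and this is only an illustration of a full factorial design, not the authors’ actual code.

```python
from itertools import product

# Factor levels as described in the post; the names are illustrative assumptions.
SELLER_HANDS = ["white", "black", "white with tattoo"]
ASKING_PRICES = [90, 110, 130]  # US dollars
AD_QUALITIES = ["high", "low"]


def build_conditions():
    """Return every combination of seller hand, asking price, and ad quality."""
    return [
        {"hand": hand, "price": price, "quality": quality}
        for hand, price, quality in product(SELLER_HANDS, ASKING_PRICES, AD_QUALITIES)
    ]


conditions = build_conditions()
print(len(conditions))  # 18 distinct ad conditions (3 * 3 * 2)
```

Each of the 18 conditions corresponds to one version of the ad; how the authors assigned conditions to the 300+ local markets is not covered by this sketch.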

Beyond these average effects, the experimental design allows for richer comparisons that can help us understand where and why discrimination might be happening.  For example, one prediction from theory is that discrimination would be less in markets that are more competitive.  Using the number of offers received as a proxy for market competition, the authors found that black sellers do indeed receive worse offers in markets with a low degree of competition between buyers.  Further, by comparing outcomes for the ads with high-quality and low-quality text, the authors find that ad quality does not impact the disadvantage faced by black and tattooed sellers.  Finally, taking advantage of the fact that advertisements were placed in more than 300 markets, the authors find that black sellers are more disadvantaged in cities with high crime rates and high residential segregation.

None of these results give us a precise understanding of exactly why black sellers had worse outcomes, but, when combined with the results of other studies, they can begin to inform theories about the causes of racial discrimination in different types of economic transactions.  Thus, by making use of the web, Doleac and Stein were able to combine the scale of correspondence studies with an unconfounded signal of race.  Their design could serve as a useful model for others interested in studying discrimination.

To read the entire paper, check out:

For a related study, check out:

Zotero and BibTeX

I love using Zotero as my bibliography management system.  It is easy to use, free, and open-source.  However, I’ve recently encountered a minor technical problem exporting my Zotero database to a BibTeX file: I was having trouble preserving the capitalization in the titles of journal articles.  This problem stumped me for longer than I care to report, so in the hopes of saving others some time, here’s how I solved the problem.

Problem: When Zotero was exporting to a BibTeX file, it was not properly preserving the capitalization for words in the titles of papers.  For example, I have a paper with the title “An analysis of respondent driven sampling with injection drug users (IDU) in Albania and the Russian Federation”.  When you export it, the title Zotero creates for BibTeX is: “An analysis of respondent driven sampling with injection drug users ({IDU)} in Albania and the Russian Federation”.  The problem here is that given this formatting, BibTeX does not know that “Albania” and “Russian Federation” are proper nouns that should be capitalized.  To tell that to BibTeX the appropriate title in the BibTeX file should be: “An analysis of respondent driven sampling with injection drug users ({IDU)} in {Albania} and the {Russian Federation}”.  Using the {} tells BibTeX to preserve the case.  So, now the question: how can I get Zotero to create that as the BibTeX file?  As a first guess I tried in Zotero: “An analysis of respondent driven sampling with injection drug users (IDU) in {Albania} and the {Russian Federation}” but that produced in BibTeX something like: “An analysis of respondent driven sampling with injection drug users (IDU) in {\{Albania\}} and the {\{Russian Federation\}}”.  In essence, Zotero assumed that I actually wanted the “{” and “}” to appear in the title, so it was adding the “\” to preserve them.  After trying many variations and doing lots of web searching, I found a great blog post by Ohad Schneider, which led me to a solution.
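As an aside, the brace convention itself is easy to illustrate.  The sketch below is not the Zotero fix described in this post; it is a small stand-alone Python example, with a hypothetical helper name (protect_terms), showing what a title looks like once the proper nouns are wrapped in {} so that BibTeX preserves their case.

```python
import re


def protect_terms(title, terms):
    """Wrap each listed term in braces so BibTeX keeps its capitalization.

    Hypothetical helper for illustration only; it is not part of Zotero.
    """
    for term in terms:
        # Skip occurrences that are already directly wrapped in braces.
        pattern = r"(?<!\{)\b" + re.escape(term) + r"\b(?!\})"
        title = re.sub(pattern, lambda m: "{" + m.group(0) + "}", title)
    return title


title = (
    "An analysis of respondent driven sampling with injection drug users "
    "(IDU) in Albania and the Russian Federation"
)
print(protect_terms(title, ["Albania", "Russian Federation"]))
```

The list of terms to protect has to be supplied by hand here, and terms that overlap one another (for example “Russian” and “Russian Federation”) are not handled.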

To get things to work as I wanted, I needed to manually edit the Zotero BibTeX export function.  It might sound a bit scary, but Ohad’s blog post led me to the right spot.  Here’s what you can do.  Find this part of the file BibTeX.js located in the “translators” sub-directory of your Zotero data directory

Change it to (note the changes in lines 8 and 9):

After making this change, I closed Firefox and everything seems to work now.

Update: You may need to do this change more than once.  It seems that every time that Zotero updates itself, it overwrites this change.