Announcing the Open Review Toolkit

Originally posted on Freedom to Tinker

[Image: Open Review Toolkit logo]

I’m happy to announce the release of the Open Review Toolkit, open source software that enables you to convert your book manuscript into a website that can be used for Open Review. During the Open Review process everyone can read and annotate your manuscript, and you can collect valuable data to help launch your book. The goals of the Open Review process are better books, higher sales, and increased access to knowledge. In an earlier post, I described some of the helpful feedback that I’ve received during the Open Review of my book Bit by Bit: Social Research in the Digital Age.  Now, in this post I’ll describe more about the Open Review Toolkit—which has been generously supported by a grant from the Alfred P. Sloan Foundation—and how you can use it for your book.

Continue reading

Open Review leads to better books

Originally posted on Freedom to Tinker

[Image: Bit by Bit in Open Review]

My book manuscript, Bit by Bit: Social Research in the Digital Age, is now in Open Review. That means that while the book manuscript goes through traditional peer review, I also posted it online for a parallel Open Review. During the Open Review everyone—not just traditional peer reviewers—can read the manuscript and help make it better.

[Image: schematic of the Open Review process]

I think that the Open Review process will lead to better books, higher sales, and increased access to knowledge.  In this blog post, I’d like to describe the feedback that I’ve received during the first month of Open Review and what I’ve learned from the process.

Continue reading

Bit by Bit: Social Research in the Digital Age is now in Open Review

Bit by Bit: Social Research in the Digital Age is a book for social scientists who want to do more data science and data scientists who want to do more social science.  I’m happy to announce that the entire manuscript is now in Open Review.  That means that you can read it and help make it better by adding annotations.

Here’s how the book starts:

In the summer of 2009, mobile phones were ringing all across Rwanda. In addition to the millions of calls between family, friends, and business associates, about 1,000 Rwandans received a call from Joshua Blumenstock and his colleagues. The researchers were studying wealth and poverty by conducting a survey of people who had been randomly sampled from a database of 1.5 million customers from Rwanda’s largest mobile phone provider. Blumenstock and colleagues asked the participants if they wanted to participate in a survey, explained the nature of the research to them, and then asked a series of questions about their demographic, social, and economic characteristics.

Everything I have said up until now makes this sound like a traditional social science survey. But, what comes next is not traditional, at least not yet. They used the survey data to train a machine learning model to predict someone’s wealth from their call data, and then they used this model to estimate the wealth of all 1.5 million customers. Next, they estimated the place of residence of all 1.5 million customers by using the geographic information embedded in the call logs. Putting these two estimates together—the estimated wealth and the estimated place of residence—Blumenstock and colleagues were able to produce high-resolution estimates of the geographic distribution of wealth across Rwanda. In particular, they could produce an estimated wealth for each of Rwanda’s 2,148 cells, the smallest administrative unit in the country.

It was impossible to validate these estimates because no one had ever produced estimates for such small geographic areas in Rwanda. But, when Blumenstock and colleagues aggregated their estimates to Rwanda’s 30 districts, they found that their estimates were similar to estimates from the Demographic and Health Survey, the gold standard of surveys in developing countries. Although these two approaches produced similar estimates in this case, the approach of Blumenstock and colleagues was about 10 times faster and 50 times cheaper than the traditional Demographic and Health Surveys. These dramatically faster and lower cost estimates create new possibilities for researchers, governments, and companies (Blumenstock, Cadamuro, and On 2015).

In addition to developing a new methodology, this study is kind of like a Rorschach inkblot test; what people see depends on their background. Many social scientists see a new measurement tool that can be used to test theories about economic development. Many data scientists see a cool new machine learning problem. Many business people see a powerful approach for unlocking value in the digital trace data that they have already collected. Many privacy advocates see a scary reminder that we live in a time of mass surveillance. Many policy makers see a way that new technology can help create a better world. In fact, this study is all of those things, and that is why it is a window into the future of social research.

If you want to see why I think that study is a window into the future of social research,  check out the rest of the book: http://www.bitbybitbook.com.
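To make the approach described in that excerpt a bit more concrete, here is a minimal sketch of that kind of two-step pipeline in Python. The file names, column names, and model choice are my own illustrative assumptions, not the actual code or data from Blumenstock and colleagues.

```python
# A rough, hypothetical sketch of the pipeline described in the excerpt.
# File names, column names, and the model are illustrative assumptions only.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Features derived from call logs for all customers, plus a survey-based
# wealth index for the roughly 1,000 sampled respondents.
call_features = pd.read_csv("call_features.csv", index_col="customer_id")
survey = pd.read_csv("survey_wealth.csv", index_col="customer_id")

# 1. Train a model to predict survey-measured wealth from call-log features.
train_X = call_features.loc[survey.index]
model = RandomForestRegressor(n_estimators=500, random_state=0)
model.fit(train_X, survey["wealth_index"])

# 2. Predict wealth for every customer in the database.
call_features["predicted_wealth"] = model.predict(call_features)

# 3. Attach each customer's estimated place of residence (e.g., inferred from
#    the geographic information in the call logs) and aggregate upward.
residence = pd.read_csv("estimated_residence.csv", index_col="customer_id")
combined = call_features.join(residence)  # adds a "district" column
district_estimates = combined.groupby("district")["predicted_wealth"].mean()

# 4. Validate against an external benchmark such as the
#    Demographic and Health Survey, aggregated to the same districts.
dhs = pd.read_csv("dhs_district_wealth.csv", index_col="district")
print(district_estimates.corr(dhs["dhs_wealth_index"]))
```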

Planning a book manuscript workshop

I recently finished a manuscript workshop for my book-in-progress, Bit by Bit: Social Research in the Digital Age.  The book is for social scientists who want to do more data science and data scientists who want to do more social science.  I'm very grateful to everyone who participated in the workshop; I know that it will make my book much better.  The goal of this blog post is to write down everything that I learned from planning and participating in the workshop in order to make it easier for others in the future.

Continue reading

Turning paper forms into digital data: Our experience using Captricity

Guest blog post by Dennis Feehan

[Image: example handwritten digit from a scanned survey form]

My colleagues and I are currently working on a study that applies several network reporting methods—survey methods that ask respondents to report about other people—to estimate crack use, mortality rates, and migration rates in Brazil. We have interviewed a random sample of 25,000 people spread across 27 different cities, and we’re excited about analyzing the data.  But, before we can do that, the written survey forms—which were created during face-to-face interviews—had to be entered into an electronic database.  In our case, the company that we hired to implement our survey did this data entry.  However, we wanted to be sure that the data entry was done well, so we also conducted an independent round of data entry for a subset of the completed questionnaires using Captricity, a new online service that converts paper forms to digital data using a combination of machine learning and human verification.

In this post, we explain the system we developed in order to use Captricity to check the quality of data entry; we summarize our findings (it turns out that our survey company did a good job); and we offer advice for people who might wish to use Captricity in the future.
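As a rough illustration of what a double-entry check can look like, here is a minimal sketch in Python; the file names and key column are hypothetical, and the full post describes the actual system we built around Captricity.

```python
# A minimal sketch of a double-entry comparison, assuming hypothetical file and
# column names; it is not the actual system described in the post.
import pandas as pd

# Primary data entry from the survey company and independent re-entry via
# Captricity, both keyed by a unique questionnaire ID.
primary = pd.read_csv("primary_entry.csv", index_col="questionnaire_id")
recheck = pd.read_csv("captricity_entry.csv", index_col="questionnaire_id")

# Restrict to the subset of questionnaires that were entered twice,
# and to the fields common to both files.
shared_ids = primary.index.intersection(recheck.index)
shared_cols = primary.columns.intersection(recheck.columns)
a = primary.loc[shared_ids, shared_cols].fillna("")  # treat blanks consistently
b = recheck.loc[shared_ids, shared_cols].fillna("")

# Field-level disagreement rate: the share of cells where the two entries differ.
disagreement = (a != b).mean()
print(disagreement.sort_values(ascending=False).head(10))
```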

Continue reading

Fast, flexible, and scalable feedback on teaching with end-of-class micro surveys

[Image: SOC 204 Spring 2015 end-of-class micro survey]

I just received the feedback that Princeton collected from students in my undergraduate course in Social Networks this spring.  But, by now, all my students have left for the summer, and I’m not going to teach this class again for a while.  In other words, this university-collected feedback might be good for evaluating me as a teacher, but it is not well-suited for making me a better teacher.

The timeliness and granularity of this end-of-semester feedback differ from what I've seen happening inside tech companies like Microsoft, Facebook, and Google (and even in some of my own online research projects).  I think that one reason that online systems are improving at an impressive rate is that there is often a very tight feedback loop between action and feedback.  And, this tight feedback loop enables continual improvement.  Therefore, this semester I tried to create a tighter loop between my teaching and student feedback.  My teaching assistants and I created a simple system for micro surveys that we deployed at the end of each class.  I found the feedback very helpful, and it caused me to make two concrete improvements to my teaching: more demonstrations and better class endings.  In this post, I'll describe exactly what we did and how it could be better next time.  I'll also include an example report and a link to the open source code that we used to generate it.
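To give a flavor of the kind of same-day summary a micro survey can feed, here is a small sketch in Python; the file and column names ("date", "question", "response") are assumptions for illustration, not the actual format or report code we used in the course.

```python
# A small sketch of a per-class micro survey summary.
# The input format is an assumption: one row per student answer.
import pandas as pd

responses = pd.read_csv("microsurvey_responses.csv")

# Tally responses to each question for each class session, so feedback can be
# reviewed the same day rather than at the end of the semester.
summary = (
    responses
    .groupby(["date", "question", "response"])
    .size()
    .unstack(fill_value=0)
)
print(summary)
```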

Continue reading