Guest blog post by Dennis Feehan
My colleagues and I are currently working on a study that applies several network reporting methods—survey methods that ask respondents to report about other people—to estimate crack use, mortality rates, and migration rates in Brazil. We have interviewed a random sample of 25,000 people spread across 27 different cities, and we’re excited about analyzing the data. But before we could do that, the paper survey forms—which were filled in during face-to-face interviews—had to be entered into an electronic database. In our case, the company that we hired to implement our survey did this data entry. However, we wanted to be sure that the data entry was done well, so we also conducted an independent round of data entry for a subset of the completed questionnaires using Captricity, a new online service that converts paper forms to digital data using a combination of machine learning and human verification.
In this post, we describe the system we developed to use Captricity to check the quality of data entry; we summarize our findings (it turns out that our survey company did a good job); and we offer advice for others who might wish to use Captricity in the future.
To conduct an independent round of data entry, we scanned about 1,000 of the survey forms from one of the 27 cities in our sample. We then used Captricity to turn these scanned forms into a second electronic dataset. Finally, we compared the datasets from the survey company and from Captricity. Wherever the two data sources disagreed, members of our study team manually reviewed the questionnaires to determine the correct value. This process required us to develop some software (see figure for an overview). These tools were mostly written in Python, using the webpy framework, a sqlite database, and bootstrap.
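The comparison step can be sketched roughly as follows. The record format and the question names here are illustrative assumptions, not our actual schema: each form is a dictionary of entered values, keyed by questionnaire ID.

```python
# Compare two independently entered versions of the same dataset,
# field by field, and collect "diffs": cells where the survey company
# and Captricity disagree. (Illustrative schema, not our actual one.)

def find_diffs(company_data, captricity_data):
    diffs = []
    for form_id, company_record in company_data.items():
        captricity_record = captricity_data.get(form_id, {})
        for variable, company_value in company_record.items():
            captricity_value = captricity_record.get(variable)
            if company_value != captricity_value:
                # flag this cell for manual review by the study team
                diffs.append((form_id, variable, company_value, captricity_value))
    return diffs

company = {"F001": {"q228": "0", "q229": ""},
           "F002": {"q228": "3", "q229": "1"}}
captricity = {"F001": {"q228": "0", "q229": "0"},  # rater typed a crossed-out 0
              "F002": {"q228": "3", "q229": "1"}}

print(find_diffs(company, captricity))  # [('F001', 'q229', '', '0')]
```

Every tuple that comes out of a comparison like this became one item in our manual-review queue.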
The first set of software tools we developed was focused on converting scanned questionnaires into an electronic dataset using Captricity’s API. All told, this first set of tools required us to write scripts to scrape uploaded scans; chop the scans into pieces to be sent through Captricity; upload these chopped-up pieces to the Captricity service; download the resulting dataset; and then insert the results into a database. As part of this step, we also had to use Captricity’s online interface to specify exactly which parts of each scanned survey form corresponded to which electronic variables in our dataset.
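As one illustration of the last step, loading downloaded Captricity results into a sqlite database might look roughly like this. The table and column names are invented for the example, and the rows would in practice come from the download scripts rather than being hard-coded:

```python
import sqlite3

def store_results(conn, rows):
    """Insert (form_id, variable, value) rows from a downloaded
    Captricity dataset into sqlite. Schema is illustrative."""
    conn.execute("""CREATE TABLE IF NOT EXISTS captricity_entries (
                        form_id  TEXT,
                        variable TEXT,
                        value    TEXT,
                        PRIMARY KEY (form_id, variable))""")
    # INSERT OR REPLACE makes re-running a download script idempotent
    conn.executemany(
        "INSERT OR REPLACE INTO captricity_entries VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
store_results(conn, [("F001", "q228", "0"), ("F001", "q229", "")])
n = conn.execute("SELECT COUNT(*) FROM captricity_entries").fetchone()[0]
print(n)  # 2
```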
The second set of software tools we developed was used to resolve what we call diffs: cases where Captricity produced results that differed from the electronic dataset the survey company provided to us. We wanted to review these diffs ourselves to ascertain whether each diff resulted from
- a mistake by Captricity;
- a mistake by the survey company;
- mistakes by both Captricity and the survey company; or
- a mistake in our software.
Therefore, the centerpiece of these tools was a web application that would show us an image of the part of the survey form where Captricity and the survey company disagreed. A member of the study team would then type in what the image showed, and the result would be stored in the database. Originally, we wanted this third round of data entry to be based only on the image of the actual question on the form. However, as described below, it proved very useful to let study team members see the question in the context of the survey form, so we added the ability to view the entire form as well as the image for a particular diff. This ended up giving us a much better understanding of what was causing many of the discrepancies between the Captricity and survey company datasets.
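The storage side of that review application can be sketched in a few lines. This is a simplified stand-in for our actual web application, with an invented schema: each diff row holds the two conflicting values plus the value a study team member reads off the scanned image.

```python
import sqlite3

# Minimal sketch of the diff-review bookkeeping: one row per diff,
# with a "resolved" column filled in by a human reviewer.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE diffs (
                    form_id    TEXT,
                    variable   TEXT,
                    company    TEXT,
                    captricity TEXT,
                    resolved   TEXT)""")

def record_resolution(conn, form_id, variable, resolved_value):
    """Store what the reviewer read off the scanned image."""
    conn.execute(
        "UPDATE diffs SET resolved = ? WHERE form_id = ? AND variable = ?",
        (resolved_value, form_id, variable))
    conn.commit()

conn.execute("INSERT INTO diffs VALUES ('F001', 'q229', '', '0', NULL)")
# reviewer decides the value is structurally missing (a crossed-out 0)
record_resolution(conn, "F001", "q229", "")
print(conn.execute("SELECT resolved FROM diffs").fetchone()[0])
```

In the real application, a web form (we used webpy) would call something like `record_resolution` after showing the reviewer the relevant image.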
Quantitatively, the results of this analysis confirmed that the data entry performed by the survey company was highly accurate; we crudely estimate that about 99.7% – 99.9% of the database entries are exactly correct.
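The headline range comes from a back-of-the-envelope calculation along these lines; the counts below are invented for illustration and are not our actual tallies. The two bounds correspond to pessimistic versus optimistic treatment of ambiguous diffs:

```python
# Crude accuracy estimate: of all database cells checked, how many
# were exactly correct? Counts are invented for illustration only.
cells_checked = 100_000
errors_upper = 300   # pessimistic: count every ambiguous diff as an error
errors_lower = 100   # optimistic: count only confirmed company errors

accuracy_low = 1 - errors_upper / cells_checked
accuracy_high = 1 - errors_lower / cells_checked
print(f"{accuracy_low:.1%} - {accuracy_high:.1%}")  # 99.7% - 99.9%
```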
Qualitatively, we learned several things that might be helpful for future researchers who want to use Captricity or similar tools with survey forms. Somewhat surprisingly to us, most of the diffs our analysis produced were not substantive disagreements about the contents of the survey form. Instead, most diffs resulted from special cases where contextual information was available to the data entry clerks at the survey company but not to the machine-learning systems and human raters involved in the Captricity process. We think two examples may be particularly helpful for other survey researchers who are considering Captricity.
First, it turns out that many Brazilians write numerals in distinctive ways that are difficult to read for people who are not from Brazil. This seems to be especially true of the numerals 1 and 9, which Captricity raters often confused for other values. Here are some examples of the way some Brazilians write the numeral ‘1’:
And here are some examples of the way some Brazilians write the numeral ‘9’:
If we did this study again, we think we could avoid this problem through some combination of (1) clearer instructions to Captricity raters about handling this special case; (2) instructing survey interviewers to write numbers on the form in a pre-defined, standard way; and (3) more structured survey forms that do not rely on handwritten numerals (possibly including digital entry, as with a tablet computer).
The second thing we learned that might be helpful to future researchers using Captricity is that editorial marks made on the survey form after surveys are completed can confuse Captricity raters, who do not get to see the form in its entirety. For example, in many cases our interviewers filled in values that were implied by skip patterns on the survey form. Question q228 asks how many people the respondent is connected to who have ever used crack. If the respondent reports 0, then the interviewer should skip q229—which asks how many people the respondent is connected to who used crack in the past 6 months—and go directly to q310. In other words, a respondent who reported 0 in q228 implicitly also has 0 for q229. Some interviewers explicitly wrote in this implied 0 on the form. The city supervisor or the data entry team at the survey company later crossed these responses out (since they should be structurally missing, as part of the skip pattern). In these cases, the Captricity raters either indicated that it was not possible to read the result (because it was crossed out) or they reported the answer that had been crossed out. We suspect that problems like these could be minimized in future studies by giving Captricity raters clearer instructions about how to handle cases like crossed-out responses.
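Diffs caused by this particular skip pattern can also be filtered out automatically before manual review. Here is a hedged sketch under some assumptions we are inventing for the example: records are dictionaries keyed by question number, and a structurally missing value is stored as an empty string.

```python
# If q228 is 0, q229 is skipped and should be structurally missing.
# A blank and an (erroneously transcribed) 0 on q229 are then
# equivalent, so that diff need not go to a human reviewer.
def skip_pattern_equivalent(record_a, record_b):
    """True if two versions of a form differ only in how the
    skip-pattern-implied zero on q229 was recorded."""
    if record_a.get("q228") == "0" == record_b.get("q228"):
        allowed = {"", "0"}  # structurally missing vs. written-in zero
        return (record_a.get("q229") in allowed and
                record_b.get("q229") in allowed)
    return False

company = {"q228": "0", "q229": ""}      # entered as structurally missing
captricity = {"q228": "0", "q229": "0"}  # rater typed the crossed-out 0
print(skip_pattern_equivalent(company, captricity))  # True
```

A rule like this would be written once per skip pattern on the instrument; anything it does not catch still goes to a human.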
We learned a lot through our analysis of data entry quality, and we think online services like Captricity can be a powerful tool for survey researchers. For example, we think Captricity could be very useful to rapidly pilot a few different versions of a survey module. Captricity may also be an appealing cost-saving option for a large-scale survey, but in that situation, Captricity-related considerations should be built into the design of the survey instrument, interviewer training protocols, and so on.
We would like to thank Chang Chung and Ale Abdo for their help developing our software. We would also like to thank the people we worked with at Captricity—especially Andrea Spillmann-Gajek and Kuang Chen—who were amazingly helpful. Captricity provided us a discount from their usual rate, and this discount is also available to other researchers and non-profit organizations through the captricity.org program.