As I’ve written about in other posts (here, here, and here), this semester I taught the second course in my department’s quantitative methods sequence that is required for all of our graduate students: Advanced Data Analysis for the Social Science. Sociology departments around the country all have a pretty similar required sequence. In teaching the course this time, I tried to modernize it so that it would train students for the future (not just the present or the past).

One big aspect of this modernization was requiring students to complete a project where they replicate and extend an already published paper. Overall, this change was a big success, and I’d recommend that other classes also try it. In this post, I’ll share some of what worked about the project and how I will do it better next time. I’ve also made all of the materials that we’ve used available on the class website.

**Why do a replication and extension project?**

There are two main reasons that I decided to add a replication and extension project to my class: 1) it is great for the students and 2) it is great for the field. As students transition from consumers of quantitative research to creators of quantitative research, they encounter a variety of challenges related to analyzing real, messy data. These challenges are hard to describe to students but are painfully familiar to anyone who has created a serious piece of quantitative research. I’ve come to believe that courses where students work exclusively with toy data (such as courses that I’ve taught previously) do not fully prepare students to become creators of research. In fact, this realization, that I was not preparing students for “real” research, came from one of the most important pieces of feedback that I’ve received about my old courses.

In addition to helping my students, I hope that the replication and extension projects will help my field move to a system of open and reproducible research.

**What exactly did the students do?**

Just deciding the scope of the project was difficult. To me the most important things were that students should reproduce (or try to reproduce) a paper exactly; they should try to do something new; and they should *not* try to write a whole new paper. After my course, students take a yearlong course where they write an entire empirical paper. I did not want to duplicate that class. Rather, I wanted to give them focused practice on just one part of writing a paper: the data analysis.

I ended up with a six-part structure.

- Pick a paper
- Get your plan approved
- Reproduce the results exactly
- Get feedback including from peers
- Do something new
- Get feedback including from peers

More information about each step is available on the class webpage.

**How did it work?**

As I said earlier, I think it was a big success. I was impressed by how much more students seemed to care about their projects than about their homework. I don’t yet have a way of getting detailed, real-time measures of student learning, but I would *guess* that they learned more per hour working on their projects than on their homework. Further, I would guess that the things that they learned on their projects were higher-order skills that are harder to acquire from a book or a video.

Also, the projects really enriched our class discussions. For example, when we were learning about multi-level modeling, the students who were working on a paper with multi-level modeling could explain that paper to their peers.

Ultimately, the best measure of the success of the project will be the students’ experience doing their own research. Fortunately, they will do that next year, and I’ll do an evaluation at the end of that to see how it turned out.

**What would you do differently next time?**

Although it might all sound smooth here, there were a lot of rocky parts to this project. Here’s what I would do differently next time. I hope that these tips can save you and your students some pain.

*More support choosing papers*

By far the biggest problem was that some students picked papers that were too difficult given their background. The schedule calls for students to reproduce the results of a paper in six weeks, which is quite fast. Because of this, I required students to choose papers where the data is already public. Still, several students ran into problems. To address this, I’ve developed an improved format for the project proposals. This highly structured format might seem strict, but it would have alerted us to all of the papers that turned out to be too hard. I guess this is something that I’ll just get better at understanding over time.

*More work pairing students*

Students completed the project in pairs (similar to how Gary King does it in Gov 2001). This turned out to work well, as many of the pairs were made up of students with complementary skills. However, next time I would work a bit more to ensure that students with strong backgrounds in data analysis and coding were more evenly distributed across the groups. Pairs with two novice students really struggled at points.

*More chances for the students to present their progress*

Each team made a 5-minute presentation at the end of the semester about their project. Next time, I would offer students more opportunities to share their work, which I think would motivate progress and promote collaboration across teams. A better presentation schedule would be: halfway point (replication complete or nearly complete); three-quarters point (extension in progress); and end of the semester (everything complete).

*Really push hard on replication by spring break*

The schedule called for students to complete their replication by spring break, the midpoint of our semester. Honestly, very few of the groups really made this deadline, and many were far from it. This created problems later in the semester because it meant that many groups had little to no time for their extensions. Perhaps by choosing more reasonable papers and pushing hard, the students could have more time for an extension.

In designing this project, I benefited from the experience of a number of teachers who have assigned similar projects in their classes: Kosuke Imai, Nicole Janz, Gary King, Brandon Stewart, and Cristobal Young. I want to thank them for sharing their advice either in person or in writing. Finally, I especially want to thank our class TA Angela Dixon, who helped our students with their projects and helped me make this project more focused and more useful.
