Good things can come from being open

When I discuss open and reproducible research with graduate students, their minds often drift toward all the bad things that could happen from having their code and data available. It is certainly true that bad things could happen, but my sense is that people often overestimate these kinds of risks and underestimate the benefits of being open. So, in this post I wanted to highlight an example of something good that can happen from being open with data and code: it can raise the visibility of your work and help make it more useful to others.

This fall I was teaching an undergraduate course on data analysis, and I was looking for a compelling, modern example of real research that involved dummy variables. Fortunately, Kevin Munger had done an interesting experiment on harassment on Twitter, and all of the data and code were available on GitHub. So, I downloaded his data and code, tweaked them a bit, and then built my lecture around his study. In case they are helpful to someone else, here are the slides (and here are the slides in R Markdown format).

I’ve never met Kevin Munger, but I wanted to thank him for posting his data and code. It helped me, and it helped my students. This is just one small example of a good thing that can come from being more open.

Rapid feedback on code with lintr

As I’ve written about in previous posts (here, here, and here), this semester I taught a course called Advanced Data Analysis for the Social Science, which is the second course in our department’s required sequence for Ph.D. students.  Sociology departments around the US all have a pretty similar required sequence.  In teaching the course this time, I tried to modernize it so that it would train students for the future, not just the present or the past.  Two main themes of that modernization were 1) borrowing ideas from software engineering and 2) borrowing ideas from MOOCs.  Both of those themes came together with the idea of linting.

DataCamp, dplyr, and blended learning

As I’ve written about in previous posts (here, here, and here), this semester I taught a course called Advanced Data Analysis for the Social Science, which is the second course in our department’s required sequence for Ph.D. students.  I’ve taught this course in the past, and in teaching the course this time, I tried to modernize it both in content and in form.  Therefore, I partnered with DataCamp to make their dplyr course, taught by Garrett Grolemund, available to my students.   This combination of face-to-face teaching and online content is called blended learning, and it’s something that I’d like to explore more in future classes.  For a first attempt, I think it worked pretty well, and the people at DataCamp were very helpful.  Here’s more about what happened.

Git and GitHub in a data analysis class

As I’ve written about in other posts (here, here, and here), this semester I taught a course called Advanced Data Analysis for the Social Science, which is the second course in our department’s required sequence for Ph.D. students. Sociology departments around the US all have a pretty similar required sequence. In teaching the course this time, I tried to modernize it so that it would train students for the future, not just the present or the past.

Because so much of actually doing data analysis requires writing code, I wanted to teach my students some modern software engineering practices. This is not because I wanted to make them software engineers. Rather, I wanted to empower them to be creative social scientists, and writing clean, reliable, reusable code really helps with that.

So, this semester, I required all the students in my class to use Git and GitHub. I was a bit hesitant to do it because Git is notoriously confusing and I didn’t even know how to use it myself. But, it all worked out pretty well, and I would recommend it to others. In this post, I’ll describe what we did and how it worked.

Replication and extension projects: making class more interesting and useful

As I’ve written about in other posts (here, here, and here), this semester I taught the second course in my department’s quantitative methods sequence that is required for all of our graduate students: Advanced Data Analysis for the Social Science. Sociology departments around the country all have a pretty similar required sequence. In teaching the course this time, I tried to modernize it so that it would train students for the future (not just the present or the past).

One big aspect of this modernization was requiring students to complete a project where they replicate and extend an already published paper. Overall, this change was a big success, and I’d recommend that other classes also try it. In this post, I’ll share some of what worked about the project and how I will do it better next time. I’ve also made all of the materials that we’ve used available on the class website.

Reflections on teaching statistics

I’ve just finished teaching Sociology 504, which is the second and final required statistics course for all of the Ph.D. students in the Sociology department. The students have been great, and I’ve learned a lot about statistics this semester. But, it has been challenging to teach a group of students that is so diverse, in terms of both interests and technical background. Now that the semester is over, I thought that I would reflect on what worked well and what I could do better next time.
