Good things can come from being open

When I discuss open and reproducible research with graduate students, their minds often drift toward all the bad things that could happen from having their code and data available. It is certainly true that bad things could happen, but my sense is that people often overestimate these kinds of risks and underestimate the benefits of being open. So, in this post I wanted to highlight an example of something good that can happen from being open with data and code: it can raise the visibility of your work and help make it more useful to others.

This fall I was teaching an undergraduate course on data analysis, and I was looking for a compelling, modern example of real research that involved dummy variables. Fortunately, Kevin Munger had done an interesting experiment on harassment on Twitter, and all of the data and code were available on GitHub. So, I downloaded his data and code, tweaked them a bit, and then built my lecture around his study. In case they are helpful to someone else, here are the slides (and here are the slides in R Markdown format).

I’ve never met Kevin Munger, but I wanted to thank him for posting his data and code. It helped me, and it helped my students. This is just one small example of a good thing that can come from being more open.

meetup about teaching computational social science at ASA

Please join me for an informal meetup about teaching computational social science on Monday, August 14 at 3pm. We will meet at the Princeton University Press booth in the exhibit hall at ASA. The purpose of the meetup is for people teaching computational social science, or thinking about teaching it, to share experiences and troubleshoot common problems. The number and variety of courses on computational social science are growing rapidly, and I think that we can all benefit from hearing about the exciting things that people are doing. I look forward to seeing you in Montreal.

fast, flexible, and scalable feedback on teaching with end-of-class micro surveys

I just received the feedback that Princeton collected from students in my undergraduate course in Social Networks this spring.  But, by now, all my students have left for the summer, and I’m not going to teach this class again for a while.  In other words, this university-collected feedback might be good for evaluating me as a teacher, but it is not well-suited for making me a better teacher.

The timeliness and granularity of this end-of-semester feedback differ from what I’ve seen happening inside of tech companies like Microsoft, Facebook, and Google (and even in some of my own online research projects). I think that one reason that online systems are improving at an impressive rate is that there is often a very tight loop between action and feedback. And, this tight loop enables continual improvement. Therefore, this semester I tried to create a tighter loop between teaching and feedback. My teaching assistants and I created a simple system for micro surveys that we deployed at the end of each class. I found the feedback very helpful, and it caused me to make two concrete improvements to my teaching: more demonstrations and better class endings. In this post, I’ll describe exactly what we did and how it could be better next time. I’ll also include an example report and a link to the open source code that we used to generate it.

Continue reading

Rapid feedback on code with lintr

https://i2.wp.com/imgs.xkcd.com/comics/code_quality.png

As I’ve written about in previous posts (here, here, and here), this semester I taught a course called Advanced Data Analysis for the Social Science, which is the second course in our department’s required sequence for Ph.D. students.  Sociology departments around the US all have a pretty similar required sequence.  In teaching the course this time, I tried to modernize it so that it would train students for the future, not just the present or the past.  Two main themes of that modernization were 1) borrowing ideas from software engineering and 2) borrowing ideas from MOOCs.  Both of those themes came together with the idea of linting.

Continue reading

DataCamp, dplyr, and blended learning

As I’ve written about in previous posts (here, here, and here), this semester I taught a course called Advanced Data Analysis for the Social Science, which is the second course in our department’s required sequence for Ph.D. students.  I’ve taught this course in the past, and in teaching the course this time, I tried to modernize it both in content and in form.  Therefore, I partnered with DataCamp to make their dplyr course, taught by Garrett Grolemund, available to my students.   This combination of face-to-face teaching and online content is called blended learning, and it’s something that I’d like to explore more in future classes.  For a first attempt, I think it worked pretty well, and the people at DataCamp were very helpful.  Here’s more about what happened.

Continue reading

Git and GitHub in a data analysis class

As I’ve written about in other posts (here, here, and here), this semester I taught a course called Advanced Data Analysis for the Social Science, which is the second course in our department’s required sequence for Ph.D. students. Sociology departments around the US all have a pretty similar required sequence. In teaching the course this time, I tried to modernize it so that it would train students for the future, not just the present or the past.

Because so much of actually doing data analysis requires writing code, I wanted to teach my students some modern software engineering practices. This is not because I wanted to make them software engineers. Rather, I wanted to empower them to be creative social scientists, and writing clean, reliable, reusable code really helps with that.

So, this semester, I required all the students in my class to use Git and GitHub. I was a bit hesitant to do it because Git is notoriously confusing and I didn’t even know how to use it myself. But, it all worked out pretty well, and I would recommend it to others. In this post, I’ll describe what we did and how it worked.

Continue reading