As I’ve written about in previous posts (here, here, and here), this semester I taught a course called Advanced Data Analysis for the Social Sciences, which is the second course in our department’s required sequence for Ph.D. students. Sociology departments around the US all have a pretty similar required sequence. In teaching the course this time, I tried to modernize it so that it would train students for the future, not just the present or the past. Two main themes of that modernization were 1) borrowing ideas from software engineering and 2) borrowing ideas from MOOCs. Both of those themes came together with the idea of linting.
For those who are not familiar with it, linting is the process of automatically checking code for syntax errors, style inconsistencies, and other possible problems. Linting is standard practice in software engineering in industry because it leads to more readable, more consistent code. For example, this talk describes the linting process that Google uses for R code. So what does this have to do with teaching? This semester we were able to automatically lint students’ homework assignments. That meant that they could get perfectly consistent feedback on their code style at any time and as often as they wanted. And it meant that Angela Dixon (my awesome teaching assistant) and I could focus our feedback on substantive issues.
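To make that concrete, here is a small, made-up example of running lintr by hand on a single file (the file name and the issues shown are invented for illustration, and the exact messages vary across lintr versions):

```r
library(lintr)

# Suppose homework.R assigns with "=" instead of "<-" and has a line longer
# than 80 characters. lint() prints a report of every style issue it finds.
lint("homework.R")
#> homework.R:3:3: style: Use <-, not =, for assignment.
#> homework.R:7:81: style: Lines should not be more than 80 characters.
```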
How did it work?
The main elements of our system were:
- lintr, the great R linting package by Jim Hester (thank you Jim!)
- Travis-CI, a continuous integration service
And, all of this required the students to be working with Git and GitHub.
Once we got it set up correctly (more on that below), whenever a student pushed their homework to their repo on GitHub, a process was automatically triggered on Travis-CI that ran lintr on their code. Because of the way everything was set up, a student could get feedback many times before actually submitting their homework.
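Under the hood, the Travis-CI job does not need to be fancy: it just needs to run lintr and fail if anything is flagged. Our actual configuration is linked below, but here is a sketch of the kind of check script such a job can run (the file name and details are my own illustration, not our exact setup):

```r
# check_lint.R: a sketch (hypothetical name) of the kind of script a CI job
# can run. It lints every R and R Markdown file in the repository and exits
# with a non-zero status if any problems are found, which fails the build.
library(lintr)

files <- list.files(pattern = "\\.(R|Rmd)$", recursive = TRUE)
results <- unlist(lapply(files, lint), recursive = FALSE)

if (length(results) > 0) {
  print(results)
  quit(save = "no", status = 1)  # a non-zero exit makes Travis-CI mark the build as failed
}
```

The .travis.yml then mostly just tells Travis-CI how to install R and lintr and to run something like `Rscript check_lint.R` as its test step.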
For example, here’s how one student, Andres Lajous, used lintr when completing his fourth assignment; he’s OK with me sharing this, and as you will see, he writes very nice code. Andres committed his first draft of the homework, and then he immediately received the lintr report below about his code. As you can see from the report, the only issues were lines longer than 80 characters (other students had more serious problems with their code, and you can see the full list of things that lintr checks).
In response to this feedback, Andres made some fixes and pushed again. That version still did not quite pass the tests, so he revised and pushed again, and then again. Finally, he pushed this version of the assignment, and it passed all of the tests.
Once the code passed all the tests, he submitted his homework. Then, we could give him feedback about logistic regression (the statistical topic of the assignment) rather than about his code.
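As an aside, if you are curious which checks lintr applies out of the box, the package exports its default set of linters, so you can list them yourself:

```r
library(lintr)

# The checks lintr runs by default; the exact set (and the linters' names)
# depends on which version of the package is installed.
names(default_linters)
```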
What would you do differently next time?
It actually took us a long, long time to get this set up, for a variety of reasons that are pretty boring. If I were going to do this again, I would start with the particular combination of elements that we settled on: Travis-CI and lintr (thank you Brandon Stewart for recommending lintr).
Also, if this system were in place from the beginning of the semester, I would tell students that after the first assignment we would not grade any assignments that don’t pass all the lintr tests.
Was this worth it?
It might sound like this was a lot of work just to get students to write code consistent with a style guide. Could that possibly be worth the effort? In this case, I think it was, because one of my goals for the course was to have students “write clean, reusable, and reliable R code.” This might seem unrelated to data analysis in the social sciences, but I think modern data analysis requires clean, reusable, and reliable code.
More generally though, I was attracted to the idea of linting because it moves the course in an important direction. If we want to give students rapid, consistent, and abundant feedback, then we are going to have to automate at least some parts of that process. Linting for code style is just the beginning. In the future, I hope to automate more and more of the feedback process, not to remove humans from teaching, but rather to let the humans focus on the human parts. And, I expect that linting and more sophisticated automated feedback systems are going to improve quickly in the coming years given the energy currently devoted to developing MOOCs.
How can I do this with my class?
Actually getting this to work required a lot of fiddling. Here’s our .travis.yml file in case it helps.
Given the current state of things, it would have taken me or Angela, my teaching assistant, way too long to get this figured out. However, for a professional software developer who is familiar with these tools, it should not take too long; maybe 5 hours. Therefore, if you want to try this with your class, I’d recommend working with a pro to get things set up. This semester I worked with Luke Baker from the Agathon Group, and he’s great. We worked together completely remotely, so he can work with someone anywhere in the world.
Also, if you or someone you know is already doing something like this, please let me know. I’d be curious to see how it worked.