Intermediate Statistics

Intermediate statistics - Rutgers School of Criminal Justice (27:202:543)


Intermediate Statistics 27:202:543
Lecture: Friday, 10:00AM - 12:40PM Room: CLJ-574
Lab: Tuesday 1:00PM - 2:30PM Room: CLJ-567
frank.edwards@rutgers.edu Office hours: Wednesday 12:00PM - 2:00PM

Course description

This course introduces students to Bayesian data analysis and applied regression modeling.

Communication

I’ve set up a Slack page for us to communicate about the course. This can be a resource for you to collaborate and ask me questions about homework, and will also be a spot where course announcements are posted. Invites will be circulated before the course begins.

Course Slack

Course goals

  1. Develop familiarity with principles of Bayesian data analysis
  2. Master the principles of model building and model critique
  3. Develop familiarity with principles of contemporary causal inference and design

Expectations

Prerequisites

A prior graduate-level course in statistics is required. This course assumes students are comfortable with multivariate linear regression, basic probability, and statistical computing.

Review resources

These math camp materials from UChicago neatly cover the math you need for graduate-level statistics courses.

Jenny Bryan’s STAT 545 course at UBC provides a very comprehensive overview of programming in R and efficient data science workflows.

Software

All instruction will be conducted in the R statistical programming language. R is free and open-source, and can be downloaded here.

We will be using the RStudio integrated development environment. RStudio provides a powerful text editor and a range of very useful utilities.

In addition to writing code, RStudio is a great tool for writing reports, papers, and slides using RMarkdown. This syllabus, most of my course materials, and most of my academic papers are based on Markdown and occasionally LaTeX. I strongly recommend that you use RMarkdown to complete course assignments. Other plaintext editors (emacs, vim, sublime, atom, etc.) are acceptable substitutes for RStudio, but try to avoid using MS Word or other WYSIWYG editors for assignments.

Lastly, I recommend learning some form of version control to ensure your work is a) backed up, b) easily accessible to collaborators and c) reproducible. Git and GitHub are great and flexible tools for software development that have powerful applications for researchers. Here’s a useful intro to GitHub for R users.

Books

We will work primarily from two books.

McElreath, Statistical Rethinking: A Bayesian Course with Examples in R and Stan

Wickham’s R for Data Science is available as a free online textbook, though print versions are available if you prefer to purchase a copy.

Assignments and grading

Course grading is based on a combination of course participation (20 percent) and homework assignments (80 percent).

Homeworks

Homework should be submitted to me via email by 10AM on the due date.

Each week, I’ll provide a list of homework questions for you to complete. Students may choose to attempt either the medium or the hard problem set. Students attempting the medium problem set can obtain a maximum grade of 90. Students attempting the hard problem set can obtain a maximum grade of 100.

Each student may request, without penalty, one 5-day extension during the semester. I must receive an email requesting this extension before the homework due date.

Late homework will be penalized at 5 points per day late.
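
To make the arithmetic concrete, here is a minimal R sketch of one possible reading of the rules above. The function names (`homework_score`, `course_grade`) and their arguments are hypothetical and are not part of the course materials. The sketch assumes that raw scores are on a 0-100 scale, that the late penalty is subtracted after the problem-set cap is applied, that scores cannot fall below zero, and that `days_late` is counted from the effective due date (that is, after any approved extension).

```r
# Hypothetical helper: score for a single homework.
# Assumptions (not stated in the syllabus): `raw` is on a 0-100 scale,
# the cap is applied before the late penalty, and the floor is 0.
homework_score <- function(raw, problem_set = c("medium", "hard"), days_late = 0) {
  problem_set <- match.arg(problem_set)
  # Medium sets are capped at 90, hard sets at 100.
  cap <- if (problem_set == "hard") 100 else 90
  # Late work loses 5 points per day.
  max(min(raw, cap) - 5 * days_late, 0)
}

# Hypothetical helper: overall course grade,
# 20 percent participation and 80 percent homework average.
course_grade <- function(participation, homework_scores) {
  0.2 * participation + 0.8 * mean(homework_scores)
}

# Example: a medium set submitted on time and a hard set submitted 2 days late.
scores <- c(homework_score(95, "medium"), homework_score(95, "hard", days_late = 2))
course_grade(participation = 100, homework_scores = scores)
```

Under these assumptions, a strong medium submission tops out at 90, while the same raw score on a hard set submitted two days late loses 10 points.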

Course topics and schedule

Date | Topic | Reading
1/24 | Introduction | McElreath Preface, 1, 2
1/31 | Sampling from the posterior | McElreath 3
2/7 | Linear regression | McElreath 4
2/14 | Multiple regression | McElreath 5
2/21 | Causality | McElreath 6
2/28 | Overfitting and comparison | McElreath 7
3/6 | Interactions | McElreath 8
3/13 | Markov Chain Monte Carlo | McElreath 9
3/20 | Spring break | McElreath 10
3/27 | Generalized Linear Models (1) | McElreath 10, 11.1, 11.2
4/3 | Generalized Linear Models (2) | McElreath 11.3, 11.4
4/10 | Mixture models | McElreath 12
4/17 | Multilevel models (intercepts) | McElreath 13
4/24 | Multilevel models (slopes) | McElreath 14
5/1 | Measurement error and missing data | McElreath 15
5/8 | Bayesian data analysis using tidyverse and brms | McElreath 17