Descriptive Statistics


  • We have access to many descriptive statistics that summarise the location, spread and shape of our data.
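
As a minimal illustration (the lesson itself works in R; this sketch uses Python's standard-library `statistics` module instead), the hypothetical `describe` helper below computes common indicators of location (mean, median), spread (standard deviation, interquartile range, range) and shape (moment-based skewness). The sample values are made up. Quartiles use `method="inclusive"`, which interpolates the same way as R's default `quantile()` (type 7); other quantile definitions give slightly different IQRs.

```python
import statistics

def describe(values):
    """Summarise the location, spread and shape of a numeric sample."""
    n = len(values)
    mean = statistics.fmean(values)
    # Quartiles by linear interpolation (same scheme as R's default quantile type 7)
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    # Moment-based skewness: > 0 suggests a longer right tail, < 0 a longer left tail
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return {
        "mean": mean,                          # location
        "median": statistics.median(values),   # location
        "sd": statistics.stdev(values),        # spread (sample SD, n - 1 denominator)
        "iqr": q3 - q1,                        # spread
        "range": max(values) - min(values),    # spread
        "skewness": m3 / m2 ** 1.5,            # shape
    }

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample
summary = describe(data)
for name, value in summary.items():
    print(f"{name:>8}: {value:.3f}")
```

For this sample the mean (5) lies above the median (4.5) and the skewness is positive, both hinting at a longer right tail.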

Tidy Data


  • Tidy data provides a consistent way of organizing data: each variable is a column and each observation is a row
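
The reshaping below is a hand-written sketch of going from a "wide" layout (one column per time point) to a tidy "long" layout (one row per observation). The table, its column names and the `pivot_longer` helper are hypothetical; the helper only loosely imitates the idea behind `tidyr::pivot_longer()` in R and is not its implementation.

```python
# Hypothetical wide table: one row per subject, one column per week
wide = [
    {"subject": "A", "week1": 5.1, "week2": 5.6},
    {"subject": "B", "week1": 4.8, "week2": 5.0},
]

def pivot_longer(rows, id_col, value_cols, names_to, values_to):
    """Reshape wide rows into tidy (long) rows: one observation per row."""
    long_rows = []
    for row in rows:
        for col in value_cols:
            # The old column name becomes a value of the new `names_to` variable
            long_rows.append({id_col: row[id_col], names_to: col, values_to: row[col]})
    return long_rows

tidy = pivot_longer(wide, "subject", ["week1", "week2"], names_to="week", values_to="weight")
for r in tidy:
    print(r)
```

After reshaping, `week` and `weight` each sit in a single column, so the same code can filter, group or plot them regardless of how many weeks were recorded.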

The normal distribution


Testing for normality


Linear regression


  • Linear regression models the linear relationship between a dependent variable and an independent variable.
  • The normality assumption applies to the residuals, not to the data!
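
To make the point about residuals concrete, here is a minimal ordinary least squares fit of a straight line in plain Python (in R this would typically be `lm(y ~ x)` followed by `residuals()`). The `x` and `y` values and the `fit_line` helper are made up for illustration. Any normality check (for example a Q-Q plot, or `shapiro.test()` in R) should be applied to `residuals`, not to `y`.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

x = [1, 2, 3, 4, 5]               # hypothetical predictor
y = [2.1, 3.9, 6.2, 7.8, 10.1]    # hypothetical response
intercept, slope = fit_line(x, y)

# Residuals = observed minus fitted; these, not y, are assumed to be normal
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
print(f"intercept={intercept:.3f} slope={slope:.3f}")
print("residuals:", [round(r, 3) for r in residuals])
```

Here the slope is 1.99 and the intercept 0.05; the residuals sum to (numerically) zero, as they always do for a least squares fit that includes an intercept.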

Multiple Linear Regression


  • We can do linear regression on multiple independent variables
  • Be careful not to overfit: only retain variables that are significant (and sensible)
  • We can also fit on categorical variables, but make sure they are stored as categorical (e.g. as factors in R), not as numeric codes
  • Interpreting linear models with multiple variables is not trivial
  • Interpreting linear models with interaction terms is even less trivial
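
The sketch below shows what happens behind a model such as `lm(y ~ x * group)` in R: the categorical variable `group` is expanded into a 0/1 dummy column relative to a reference level, and the interaction term is the product of `x` and that dummy. If `group` were stored as numeric codes (1, 2, ...) it would instead get a single slope, which is why categorical variables must actually be treated as categorical. The data are hypothetical, built from known coefficients; the helpers `design_matrix`, `solve` and `least_squares` are illustrative, and the fit uses the normal equations for brevity rather than the QR decomposition that R's `lm()` uses. Column names imitate R's coefficient labels.

```python
def design_matrix(x, group, reference):
    """Build columns for y ~ x * group with treatment (dummy) coding."""
    others = [lvl for lvl in sorted(set(group)) if lvl != reference]
    names = (["(Intercept)", "x"] + [f"group{lvl}" for lvl in others]
             + [f"x:group{lvl}" for lvl in others])
    rows = []
    for xi, gi in zip(x, group):
        dummies = [1.0 if gi == lvl else 0.0 for lvl in others]
        # Interaction columns are the product of x and each dummy
        rows.append([1.0, float(xi)] + dummies + [xi * d for d in dummies])
    return names, rows

def solve(a, b):
    """Solve a square system a @ coef = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * coef[c] for c in range(r + 1, n))
        coef[r] = (m[r][n] - s) / m[r][r]
    return coef

def least_squares(rows, y):
    """OLS coefficients from the normal equations (X^T X) beta = X^T y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)

# Hypothetical data generated from y = 1 + 2x + 3*groupb + 0.5*x*groupb
x = [1, 2, 3, 1, 2, 3]
group = ["a", "a", "a", "b", "b", "b"]
y = [3.0, 5.0, 7.0, 6.5, 9.0, 11.5]

names, rows = design_matrix(x, group, reference="a")
coef = least_squares(rows, y)
for name, value in zip(names, coef):
    print(f"{name:>12}: {value:.3f}")
```

The fit recovers the coefficients used to build the data: intercept 1, `x` 2, `groupb` 3, `x:groupb` 0.5. Reading them takes care: 2 is the slope of `x` only in the reference group `a`; in group `b` the slope is 2 + 0.5 = 2.5, and the `groupb` coefficient of 3 is the difference between the groups at `x = 0`, not an overall group effect.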