Reproducible Data Analysis


  • Use RMarkdown to enforce reproducible analysis

Reading data from file: Country, Name, Phone number


  • readr's read_csv() is preferred over base R's read.csv()
  • Remember that CSV files are not always separated by commas; semicolons and tabs are also common.
  • The haven package contains functions for reading common proprietary file formats.
  • In general, a package will exist for reading unusual file formats. Google is your friend!
  • Use code to read in your data
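
As a minimal sketch of the points above, the following writes a small semicolon-separated file to a temporary location and reads it back with readr. The file contents and column names are made up for illustration. read_csv2() assumes semicolon separators and decimal commas, while read_delim() lets you state the delimiter explicitly.

```r
library(readr)

# Hypothetical example data: a "csv" file that actually uses semicolons
path <- tempfile(fileext = ".csv")
writeLines(c("country;name;value", "DK;Anna;1,5", "SE;Karin;2,25"), path)

# read_csv2() expects ';' as separator and ',' as decimal mark
people <- read_csv2(path, show_col_types = FALSE)

# read_delim() makes the delimiter and the decimal mark explicit
people_explicit <- read_delim(
  path,
  delim = ";",
  locale = locale(decimal_mark = ","),
  show_col_types = FALSE
)

print(people)
```

Because the import is written as code, rerunning the script repeats exactly the same steps, which is what makes the analysis reproducible.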

Descriptive Statistics: Central tendency, Measures of variance


  • Some statistical points about this
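
For orientation, here is a small base R sketch of the two groups of measures named in the heading. The numbers in `x` are invented for the example and do not come from the lesson.

```r
# Made-up example values
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

# Central tendency
mean(x)    # arithmetic mean: 5
median(x)  # middle value: 4.5

# Measures of variance (spread)
var(x)     # sample variance (divides by n - 1)
sd(x)      # sample standard deviation, the square root of var(x)
range(x)   # minimum and maximum
IQR(x)     # interquartile range
```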

Table One


  • A Table One provides a compact description of the data we are working with
  • With a little bit of work we can control the content of the table.
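
Dedicated packages exist for building a Table One. The sketch below uses only base R and the built-in mtcars data to show the idea of controlling the content yourself. The choice of variables and the "mean (SD)" format are assumptions made for illustration, not the lesson's own method.

```r
# Built-in mtcars data; 'am' (0 = automatic, 1 = manual) is the grouping variable
summarise_var <- function(v) sprintf("%.1f (%.1f)", mean(v), sd(v))

# One row per group, one "mean (SD)" cell per variable
table_one <- aggregate(
  cbind(mpg, wt, hp) ~ am,
  data = mtcars,
  FUN = summarise_var
)

# Add group sizes so the reader can see how many observations each row describes
table_one$n <- as.vector(table(mtcars$am))

print(table_one)
```

Changing `summarise_var` (for example to report median and interquartile range) changes every cell of the table in one place.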

Tidy Data


  • Tidy data provides a consistent way of organizing data
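
A small sketch of what this means in practice, assuming the tidyr package is available. The data frame `wide` and its values are hypothetical: it stores one column per year, and pivot_longer() reorganizes it so that each row is one observation and each column is one variable.

```r
library(tidyr)

# Hypothetical wide data: one column per year
wide <- data.frame(
  country = c("Denmark", "Sweden", "Norway"),
  `2020` = c(10, 20, 30),
  `2021` = c(11, 21, 31),
  check.names = FALSE
)

# Tidy form: one row per country-year observation, one column per variable
long <- pivot_longer(
  wide,
  cols = -country,
  names_to = "year",
  values_to = "cases"
)

print(long)
```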

The normal distribution


  • Use .md files for episodes when you want static content
  • Use .Rmd files for episodes when you need to generate output
  • Run sandpaper::check_lesson() to identify any issues with your lesson
  • Run sandpaper::build_lesson() to preview your lesson locally

Testing for normality



How is the data distributed?


  • The data generating function is not necessarily the same as the distribution that best fits the data
  • Choose the distribution that best describes your data - not the one that fits best
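
One way to compare candidate distributions is sketched below, assuming the MASS package (distributed with standard R installations) and simulated data. The data, the two candidates and the use of AIC are choices made for this illustration. A numerical fit measure is only one input: the chosen distribution should also make sense for the data, for example a normal distribution allows negative values that waiting times cannot take.

```r
library(MASS)  # provides fitdistr()

set.seed(42)
# Simulated, strictly positive data; here we know it was generated
# from an exponential distribution
x <- rexp(200, rate = 0.5)

# Fit two candidate distributions by maximum likelihood
fit_exp  <- fitdistr(x, "exponential")
fit_norm <- fitdistr(x, "normal")

# Lower AIC indicates a better trade-off between fit and complexity
aic <- c(exponential = AIC(fit_exp), normal = AIC(fit_norm))
print(aic)
```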

Linear regression



Multiple Linear Regression


  • We can do linear regression on multiple independent variables
  • Be careful not to overfit - only retain variables that are significant (and sensible)
  • We can fit just as well on categorical variables - but make sure they are categorical
  • Interpreting linear models with multiple variables is not trivial
  • Interpreting linear models with interaction terms is even less trivial
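
The points above can be sketched with base R and the built-in mtcars data. The choice of variables is only for illustration. Note the explicit conversion of `cyl` to a factor: left as a number, it would be fitted as a single slope instead of as categories.

```r
# Built-in mtcars data; 'cyl' is stored as a number, so convert it explicitly
cars <- mtcars
cars$cyl <- factor(cars$cyl)

# Two numeric predictors plus one categorical predictor
fit <- lm(mpg ~ wt + hp + cyl, data = cars)
summary(fit)$coefficients  # estimates, standard errors, t values and p-values

# Interaction: the effect of wt is allowed to depend on hp
fit_int <- lm(mpg ~ wt * hp, data = cars)
coef(fit_int)
```

In the interaction model the coefficient for `wt` is the slope when `hp` is zero, which is one reason such models are harder to interpret.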

Logistic regression: fit the model, coefficients and p-values, predict



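The three steps named in the heading can be sketched in base R as follows. The built-in mtcars data and the single predictor `wt` are assumptions chosen for illustration, and `new_car` is a hypothetical observation.

```r
# Built-in mtcars data; 'am' is 0 (automatic) or 1 (manual)
fit <- glm(am ~ wt, data = mtcars, family = binomial)

# Coefficients (on the log-odds scale) with standard errors and p-values
coefs <- summary(fit)$coefficients
print(coefs)

# Predicted probability of a manual gearbox for a hypothetical car
# weighing 3000 lbs (wt is measured in 1000 lbs)
new_car <- data.frame(wt = 3)
p_manual <- predict(fit, newdata = new_car, type = "response")
print(p_manual)
```

Without `type = "response"`, predict() returns values on the log-odds scale rather than probabilities.
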
Central Limit Theorem


  • For sufficiently large samples, the mean of a sample can be treated as if it is approximately normally distributed
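
A short simulation illustrates this. The exponential distribution, the sample size of 30 and the 5000 repetitions are arbitrary choices for the sketch. The individual values are strongly skewed, yet the sample means cluster symmetrically around the true mean with a standard deviation close to 1 / sqrt(n).

```r
set.seed(1)

# Draw 5000 samples of size 30 from a skewed (exponential) distribution
# and keep the mean of each sample
sample_means <- replicate(5000, mean(rexp(30, rate = 1)))

# Theory: mean 1, standard deviation 1 / sqrt(30)
mean(sample_means)
sd(sample_means)
1 / sqrt(30)

# The histogram of the means is close to a bell curve
hist(sample_means, breaks = 40, main = "Means of 5000 samples (n = 30)")
```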

Nicer barcharts


  • Relatively small changes to a bar chart can make it look much more professional
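
As an illustration, assuming ggplot2 is installed, the sketch below applies a few such changes to a bar chart of the built-in mtcars data: bars sorted by height, a single fill colour, readable labels and a lighter theme. The specific styling choices are suggestions, not the lesson's own recipe.

```r
library(ggplot2)

# Count cars per number of cylinders in the built-in mtcars data
counts <- as.data.frame(table(cyl = mtcars$cyl))

p <- ggplot(counts, aes(x = reorder(cyl, -Freq), y = Freq)) +
  geom_col(fill = "steelblue") +                 # one colour for all bars
  labs(
    x = "Cylinders",
    y = "Number of cars",
    title = "Cars by number of cylinders"
  ) +
  theme_minimal() +                              # lighter background
  theme(panel.grid.major.x = element_blank())    # drop vertical grid lines

print(p)
```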

Power calculations



k-means



Factor Analysis



structure-you-work



fence-test



design-principles

