This lesson is being piloted (Beta version)

R EDA: Glossary

Key Points

Before we Start
  • A good project is an organized project

Getting to know the data
  • Always start by looking at the data

  • Always keep track of the metadata
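
A first look at a data frame might go something like the sketch below. It uses R's built-in `iris` data as a stand-in, which is an assumption made for illustration; the lesson's own dataset is not reproduced here.

```r
# Stand-in data: the built-in iris data frame (not the lesson's dataset).
data(iris)

head(iris)   # first six rows
str(iris)    # column types and a preview of the values
dim(iris)    # number of rows and columns
names(iris)  # column names, useful to check against the metadata
```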

Exploring with summary statistics
  • Summary statistics tell us about the distribution of data
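
As a minimal sketch, the base R functions below compute common summary statistics for one numeric variable. The built-in `iris` data and the variable name `sepal` are illustrative assumptions, not part of the lesson.

```r
# Stand-in data: sepal lengths from the built-in iris data frame.
sepal <- iris$Sepal.Length

summary(sepal)                           # min, quartiles, median, mean, max
mean(sepal)                              # central tendency
median(sepal)
sd(sepal)                                # spread
quantile(sepal, probs = c(0.25, 0.75))   # first and third quartiles
```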

Joining data
  • Data is often organized in separate tables; joining them can enrich the data we are analysing
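
The idea of a join can be sketched with two small, made-up tables that share a key column. All table and column names here are hypothetical, and base R's `merge()` is used so the example runs without extra packages; `dplyr::left_join()` does a comparable job in a dplyr workflow.

```r
# Hypothetical tables for illustration only.
samples <- data.frame(
  id = c(1, 2, 3),
  species_code = c("A", "B", "A")
)
species <- data.frame(
  species_code = c("A", "B"),
  species_name = c("Alpha", "Beta")
)

# Keep every row of `samples` and add the matching columns from `species`.
joined <- merge(samples, species, by = "species_code", all.x = TRUE)
joined
```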

Boxplots and linear regressions
  • Boxplots are useful for comparing distributions

  • Boxplots can hide multiple distributions in a variable

  • Density plots can reveal multiple distributions in variables

  • Correlations between variables can be quantified using linear models
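
The points above can be tried out with base R graphics and `lm()`. This is a sketch on the built-in `iris` data, chosen here as an assumption for illustration; the lesson itself may use different data and ggplot2.

```r
# Stand-in data: the built-in iris data frame.

# A single boxplot of petal length shows one box and gives no hint of subgroups.
boxplot(iris$Petal.Length, main = "Petal length, all species")

# A density plot of the same variable shows two peaks.
plot(density(iris$Petal.Length), main = "Density of petal length")

# Boxplots per group make the distributions comparable.
boxplot(Petal.Length ~ Species, data = iris)

# A linear model quantifies the relationship between two variables.
model <- lm(Petal.Width ~ Petal.Length, data = iris)
coef(model)                 # intercept and slope
summary(model)$r.squared    # share of variance explained
```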

What is the next step?
  • Practice is important!

  • Working on data that YOU find interesting is a really good idea.

  • The amount of resources online is immense.

  • KUB Datalab is there for you.

Glossary

Cheat sheet of functions used in the lessons

Lesson 1 – Introduction to R

Lesson 2 – Starting with Data

Lesson 3 – Data Wrangling with dplyr and tidyr

Lesson 4 – Data Visualization with ggplot2

Lesson 5 – Processing JSON data