Reproducible Data Analysis


Figure 1

One example of these problems is shown every time we load tidyverse: Screenshot of messages from attaching core tidyverse packages including conflicts


Figure 2

Screenshot of dialog box from RStudio asking about saving workspace image
Should we save the environment? No!

Figure 3

A glam rock band comprised of 3 fuzzy round monsters labeled as “Text”, “Outputs” and “Code” performing together. Stylized title text reads: “R Markdown - we’re getting the band back together.
The Rmarkdown family. Artwork by @allison_horst, https://twitter.com/allison_horst, CC-BY

Figure 4

Screenshot of initial demo Rmarkdown document in RStudio
An RMarkdown document

Figure 5

You will see a new button in RStudio: Screenshot of the Knit-button in RStudio


Reading data from fileCountryNamePhonenumber


Figure 1


Figure 2


Figure 3


Figure 4


Descriptive Statistics


Figure 1

What is lenght and depth of penguin bills?{Copyright Allison Horst}


Figure 2


Figure 3


Figure 4

Or even specify the exact intervals we want, here intervals from 0 to 6500 gram in intervals of 250 gram:


Figure 5

The histogram provides us with a visual indication of both range, the variation of the values, and an idea about where the data is located.


Figure 6


Figure 7


Histograms


Figure 1


Figure 2


Figure 3


Figure 4


Figure 5


Table One


Tidy Data


Figure 1


Figure 2


Figure 3


The normal distribution


Figure 1


Figure 2

The Normal Distribution. The area under the curve is 1, equivalent to 100%.


Figure 3


Testing for normality


Figure 1


Figure 2

Our histogram does not really look like the theoretical curve. The fact that mean and median are almost identical was not a sufficient criterium for normalcy.


Figure 3


Figure 4


Figure 5


How is the data distributed?


Figure 1


Linear regression


Figure 1


Figure 2


Figure 3


Figure 4

They are relatively close to normal.


Multiple Linear Regression


Logistic regression


Figure 1


Central Limit Theorem


Figure 1

This is definitely not normally distributed.


Figure 2


Nicer barcharts


Figure 1


Figure 2

It is not strictly necessary to remove the label of the x-axis, but it is superfluous in this case.


Figure 3

This facilitates the reading of the graph - it becomes very easy to see that the most frequent species of penguin is Adelie penguins.


Figure 4


Figure 5

We also changed the scaling of the title of the plot. The size of that is now 10% larger than the base size. We can do that by specifying a specific size, but here we have done it using the rel() function which changes the size relative to the base font size in the plot.


Figure 6

We control what is happening on the x-scale by using the family of scale_x functions. Because it is a continuous scale, more specifically scale_x_continuous().


Figure 7

First we change the default theme of the plot from theme_grey to theme_minimal, which gets rid of the grey background. In the additional theme() function we remove the gridlines, both major and minor gridlines, on the y-axis, by setting them to the speciel plot element element_blank()


Figure 8


Figure 9


Figure 10


Power Calculations


k-means


Figure 1


Figure 2


Figure 3

No. Even though there might actually be clusters in the data, the algorithm is not necessarily able to find them. Consider this data: There is obviously a cluster centered around (0,0). And another cluster more or lesss evenly spread around it.


Figure 4

But not the ones we want.


ANOVA


Figure 1

That looks reasonable.


Figure 2


Cohens Kappa


R on Ucloud


Figure 1


Figure 2


Figure 3


Figure 4


Figure 5


Figure 6

In it we find a listing of what is in our drive:


Figure 7


Figure 8


Figure 9


Figure 10


Figure 11


Figure 12


Figure 13

Sometimes we close a window, or nagivate away from it. Where can we find it again? In the navigationbar to the left in Ucloud we find this icon. It provides us with a list or running jobs (yes, we can have more than one). Click on the job, and we get back to the job, where we can extend time or open the interface again.


Figure 14

The easy way is to do it through RStudio. Select the folder and/or files you want to save, click on the little cogwheel saying “More” and chose “Export”. RStudio will now zip the files, and ask you where to save them.


Figure 15


Figure 16


A deeper dive into pipes


Figure 1


Figure 2

Place a tickmark in the box next to “Use native pipe operator, |>”.


Setup for GIS


Setup for Git


Figure 1

Begin by introducing yourself to Git. Go to the terminal


Figure 2

Click “Generate token”, and copy the token:


Figure 3


Practice makes perfect


Statistical tests


Figure 1


Figure 2


Figure 3


When install.packages fail


Figure 1


Figure 2

allow us to chose one or more of the repositories that R knows about:


Fences på vores undervisningssider


Make a new course