Reproducible Data Analysis


Figure 1

One example of these problems is shown every time we load tidyverse: Tidyverse conflicts


Figure 2

Should we save the environment? No!

Figure 3

The Rmarkdown family. Artwork by @allison_horst, https://twitter.com/allison_horst, CC-BY

Figure 4

An RMarkdown document

Figure 5

You will see a new button in RStudio: The Knit button


Reading data from file

Country, Name, Phonenumber


Figure 1


Figure 2


Descriptive Statistics

Central tendency

Measures of variance


Figure 1

What are the length and depth of penguin bills? Copyright Allison Horst


Figure 2


Figure 3


Figure 4

Or even specify the exact intervals we want, here intervals from 0 to 6500 gram in intervals of 250 gram:
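A minimal sketch of how such exact intervals could be specified, using base R's hist(). The body masses are simulated stand-ins (the variable name body_mass_g is an assumption, not taken from the lesson), chosen so that every value falls inside the requested range.

```r
set.seed(42)

# Simulated body masses in gram, for illustration only (not the lesson's data).
body_mass_g <- sample(seq(2700, 6300, by = 25), size = 300, replace = TRUE)

# Interval boundaries from 0 to 6500 gram in steps of 250 gram
breaks <- seq(0, 6500, by = 250)

# hist() uses the boundaries exactly as given; it stops with an error
# if any value lies outside them.
h <- hist(body_mass_g, breaks = breaks,
          main = "Body mass", xlab = "Body mass (g)")

# Number of intervals and the count in each
print(length(h$counts))
print(h$counts)
```

With ggplot2, the same vector of boundaries can be passed to the breaks argument of geom_histogram().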


Figure 5

The histogram gives us a visual indication of the range and the variation of the values, and an idea of where the data is located.


Figure 6

We should probably describe this one in a bit more detail.


Figure 7


Table One


Tidy Data


Figure 1


Figure 2


Figure 3


The normal distribution


Figure 1

The Normal Distribution. The area under the curve is 1, equivalent to 100%.
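The statement about the area can be checked numerically. This sketch integrates the standard normal density dnorm() over the whole real line and, as an extra illustration not taken from the caption, computes the share of the area within one standard deviation of the mean.

```r
# Numerically integrate the standard normal density over the whole real line
total_area <- integrate(dnorm, lower = -Inf, upper = Inf)
print(total_area$value)

# Share of the area within one standard deviation of the mean (about 68%)
within_one_sd <- pnorm(1) - pnorm(-1)
print(within_one_sd)
```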


Testing for normality


Figure 1


Figure 2


Figure 3


Figure 4


How is the data distributed?


Figure 1


Linear regression


Figure 1


Figure 2


Figure 3


Figure 4

They are relatively close to normal.


Multiple Linear Regression


Logistic regression

Fit the model

Coefficients and p-values

predict


Figure 1

The function looks like this:


Central Limit Theorem


Figure 1

This is definitely not normally distributed.
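A sketch of the idea behind the theorem, under assumptions: the skewed population here is simulated with rexp(), and the sample size of 50 and the 1000 repetitions are arbitrary choices, none of them taken from the lesson.

```r
set.seed(123)

# A clearly skewed "population", simulated for illustration only
population <- rexp(10000, rate = 1)

# Draw many samples of size 50 and keep the mean of each
sample_means <- replicate(1000, mean(sample(population, size = 50)))

# The population is skewed; the sample means are far more symmetric
hist(population, main = "Population", xlab = "Value")
hist(sample_means, main = "Means of samples of size 50", xlab = "Sample mean")

# The sample means centre on the population mean
print(mean(population))
print(mean(sample_means))
```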


Figure 2


Nicer barcharts


Figure 1


Figure 2

It is not strictly necessary to remove the label of the x-axis, but it is superfluous in this case.


Figure 3

This makes the graph easier to read: it is immediately clear that the most frequent penguin species is the Adelie penguin.


Figure 4


Figure 5

We also changed the size of the plot title, which is now 10% larger than the base size. We could have specified an exact size, but here we used the rel() function, which sets the size relative to the base font size of the plot.
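A minimal sketch of the rel() approach with ggplot2. The species counts and the title text are invented for illustration; only rel() and the 10% figure come from the text above.

```r
library(ggplot2)

# Hypothetical counts, invented purely for illustration
species_counts <- data.frame(
  species = c("Adelie", "Chinstrap", "Gentoo"),
  n = c(150, 70, 120)
)

p <- ggplot(species_counts, aes(x = n, y = species)) +
  geom_col() +
  labs(title = "Number of penguins per species") +
  # rel(1.1) asks for a title 10% larger than the font size it inherits
  theme(plot.title = element_text(size = rel(1.1)))

print(p)
```

An absolute alternative would be element_text(size = 14), but a relative size keeps the title in proportion if the base font size of the theme is changed later.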


Figure 6

We control what happens on the x-scale with the family of scale_x functions. Because the scale is continuous, the function to use is scale_x_continuous().
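As an illustration, the sketch below sets the tick positions and the padding of a continuous x-scale. The counts, the break positions and the expansion values are assumptions made for the example, not the lesson's settings.

```r
library(ggplot2)

# Hypothetical counts, invented purely for illustration
species_counts <- data.frame(
  species = c("Adelie", "Chinstrap", "Gentoo"),
  n = c(150, 70, 120)
)

p <- ggplot(species_counts, aes(x = n, y = species)) +
  geom_col() +
  scale_x_continuous(
    # Put a tick mark at every 25 penguins
    breaks = seq(0, 150, by = 25),
    # No padding at the left edge, 5% padding at the right edge
    expand = expansion(mult = c(0, 0.05))
  )

print(p)
```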


Figure 7

First we change the default theme of the plot from theme_grey to theme_minimal, which gets rid of the grey background. In the additional theme() function we remove both the major and the minor gridlines on the y-axis by setting them to the special plot element element_blank().
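The two steps could look like this in ggplot2. The data frame is a hypothetical stand-in for the lesson's data; theme_minimal(), theme() and element_blank() are used as described above.

```r
library(ggplot2)

# Hypothetical counts, invented purely for illustration
species_counts <- data.frame(
  species = c("Adelie", "Chinstrap", "Gentoo"),
  n = c(150, 70, 120)
)

p <- ggplot(species_counts, aes(x = n, y = species)) +
  geom_col() +
  # Replace the default grey theme, removing the grey background
  theme_minimal() +
  # Remove the major and minor gridlines that belong to the y-axis
  theme(
    panel.grid.major.y = element_blank(),
    panel.grid.minor.y = element_blank()
  )

print(p)
```

The order matters: theme_minimal() replaces the whole theme, so the theme() adjustments have to be added after it.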


Figure 8


Figure 9


Figure 10


Power calculations


k-means


Figure 1


Figure 2


Figure 3

There is obviously one cluster centered around (0,0), and another cluster more or less evenly spread around it.
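A sketch of how k-means could be tried on data of this shape. The points are simulated here (a blob at the origin and a ring of radius 3, both assumptions), so the result need not match the lesson's figure.

```r
set.seed(2024)

# Simulated points, for illustration only: a blob at (0, 0) and a ring around it
n_inner <- 100
n_outer <- 200
angle <- runif(n_outer, min = 0, max = 2 * pi)
pts <- data.frame(
  x = c(rnorm(n_inner, sd = 0.3), 3 * cos(angle) + rnorm(n_outer, sd = 0.2)),
  y = c(rnorm(n_inner, sd = 0.3), 3 * sin(angle) + rnorm(n_outer, sd = 0.2)),
  shape = rep(c("centre", "ring"), times = c(n_inner, n_outer))
)

# Ask k-means for two clusters, using several random starts
fit <- kmeans(pts[, c("x", "y")], centers = 2, nstart = 10)

# Compare the k-means assignment with the simulated shapes
print(table(cluster = fit$cluster, shape = pts$shape))

# Colour the points by the cluster k-means assigned them to
plot(pts$x, pts$y, col = fit$cluster, xlab = "x", ylab = "y")
```

k-means assigns each point to the nearest cluster centre, so clusters that a reader sees at a glance, such as a ring around a blob, are not necessarily the ones it finds. The cross-table shows how the two compare in this simulation.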


Factor Analysis


Figure 1

The rule of thumb is that we reject factors with an eigenvalue lower than 1.0.
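The rule of thumb can be applied to the eigenvalues of the correlation matrix. The sketch below uses simulated data (six variables built from two latent factors, an assumption made for the example) and base R's eigen().

```r
set.seed(1)

# Simulated data, for illustration only: six variables driven by two latent factors
n <- 200
f1 <- rnorm(n)
f2 <- rnorm(n)
dat <- data.frame(
  a = f1 + rnorm(n, sd = 0.5),
  b = f1 + rnorm(n, sd = 0.5),
  c = f1 + rnorm(n, sd = 0.5),
  d = f2 + rnorm(n, sd = 0.5),
  e = f2 + rnorm(n, sd = 0.5),
  f = f2 + rnorm(n, sd = 0.5)
)

# Eigenvalues of the correlation matrix, largest first
eig <- eigen(cor(dat))$values
print(round(eig, 2))

# Keep the factors whose eigenvalue is at least 1.0
n_factors <- sum(eig >= 1)
print(n_factors)
```

The eigenvalues of a correlation matrix sum to the number of variables, so an eigenvalue below 1.0 means the factor explains less variance than a single original variable would.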


structure-you-work


fence-test


design-principles


Figure 1

Do the gridlines add value to the plot? In general the answer is no, and we have better ways of adding the value they might bring.


Figure 2