Reproducible Data Analysis
Figure 1
One example of these problems is shown every time we load tidyverse:
Figure 2
Figure 3
Figure 4
Figure 5
You will see a new button in RStudio:
Reading data from file
Country, Name, Phonenumber
Figure 1
Figure 2
Descriptive Statistics
Central tendency
Measures of variance
Figure 1
{Copyright Allison Horst}
Figure 2
Figure 3
Figure 4
Or we can specify the exact intervals we want, here from 0 to 6500 grams in steps of 250 grams:
Figure 5
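The explicit intervals described above can be sketched in a few lines. This is a minimal example using base R's hist() rather than whichever plotting function the lesson itself uses; the simulated weights and the name weight_g are assumptions made for illustration.

```r
# Simulated weights in grams; a stand-in for the lesson's data, which is not shown here.
set.seed(1)
weight_g <- rnorm(300, mean = 4200, sd = 800)

# hist() requires the breaks to cover every value, so clamp to the 0-6500 range.
weight_g <- pmin(pmax(weight_g, 0), 6500)

# Explicit breaks: 0, 250, 500, ..., 6500
breaks <- seq(0, 6500, by = 250)

h <- hist(weight_g, breaks = breaks,
          main = "Weights in 250 g intervals", xlab = "Weight (g)")
```

Passing a vector of break points, instead of a number of bins, gives full control over where each interval starts and ends.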
The histogram gives us a visual indication of the range and variation of the values, and an idea of where the data is located.
Figure 6
This probably needs to be described in a bit more detail.
Figure 7
Table One
Tidy Data
Figure 1
Figure 2
Figure 3
The normal distribution
Figure 1
The area under the curve is 1, equivalent to 100%.
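The claim about the area can be checked numerically. A minimal sketch using base R's integrate() on the standard normal density dnorm(); the choice of the standard normal (mean 0, standard deviation 1) is an assumption, since the curve in the figure is not specified here.

```r
# Integrate the standard normal density over the whole real line.
area <- integrate(dnorm, lower = -Inf, upper = Inf)
area$value  # numerically very close to 1

# The cumulative distribution function gives the same answer directly:
# the probability of a value below +Inf is 1, i.e. 100%.
total_probability <- pnorm(Inf)
```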
Testing for normality
Figure 1
Figure 2
Figure 3
Figure 4
How is the data distributed?
Figure 1
Linear regression
Figure 1
Figure 2
Figure 3
Figure 4
They are relatively close to normal.
Multiple Linear Regression
Logistic regression
Fit the model
Coefficients and p-values
Predict
Figure 1
The function looks like this:
Central Limit Theorem
Figure 1
This is definitely not normally distributed.
Figure 2
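A small simulation can illustrate the theorem. This is a sketch under stated assumptions: the exponential distribution, the sample size of 50, and the 2000 repetitions are all chosen for illustration and are not taken from the lesson.

```r
set.seed(42)

# A clearly skewed starting point: draws from an exponential distribution (rate 1).
individual_values <- rexp(10000, rate = 1)

# Draw 2000 samples of size 50 and keep the mean of each sample.
sample_means <- replicate(2000, mean(rexp(50, rate = 1)))

# Compare the two distributions side by side.
par(mfrow = c(1, 2))
hist(individual_values, main = "Individual values", xlab = "Value")
hist(sample_means, main = "Means of samples (n = 50)", xlab = "Sample mean")
```

The individual values are strongly right-skewed, while the sample means pile up symmetrically around the population mean of 1.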
Nicer barcharts
Figure 1
Figure 2
It is not strictly necessary to remove the label of the x-axis, but it is superfluous in this case.
Figure 3
This facilitates the reading of the graph - it becomes very easy to see that the most frequent species of penguin is Adelie penguins.
Figure 4
Figure 5
We also changed the scaling of the title of the plot. Its size is now 10% larger than the base size. We could do that by specifying an absolute size, but here we have used the rel() function, which changes the size relative to the base font size in the plot.
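A minimal sketch of a relative title size, assuming ggplot2 is installed. The data frame species_counts, its counts, and the plot title are illustrative stand-ins, not the lesson's data.

```r
library(ggplot2)

# Illustrative counts per species; stand-in data, not taken from the lesson.
species_counts <- data.frame(
  species = c("Adelie", "Chinstrap", "Gentoo"),
  n = c(150, 70, 120)
)

p <- ggplot(species_counts, aes(x = n, y = species)) +
  geom_col() +
  ggtitle("Number of penguins per species") +
  # rel(1.1) scales the title to 1.1 times the base font size of the theme
  theme(plot.title = element_text(size = rel(1.1)))

p
```

Because the size is relative, the title keeps its proportion if the base font size of the theme is changed later.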
Figure 6
We control what is happening on the x-scale by using the family of scale_x functions. Because it is a continuous scale, we use scale_x_continuous() more specifically.
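As a sketch of what scale_x_continuous() can control, the example below sets the tick positions and the padding around the bars. The specific breaks and expansion values, as well as the species_counts data, are assumptions for illustration; the lesson's actual settings are not shown here.

```r
library(ggplot2)

# Illustrative counts per species; stand-in data, not taken from the lesson.
species_counts <- data.frame(
  species = c("Adelie", "Chinstrap", "Gentoo"),
  n = c(150, 70, 120)
)

p <- ggplot(species_counts, aes(x = n, y = species)) +
  geom_col() +
  scale_x_continuous(
    breaks = seq(0, 150, by = 50),          # tick marks at 0, 50, 100, 150
    expand = expansion(mult = c(0, 0.05))   # no gap at zero, 5% padding on the right
  )

p
```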
Figure 7
First we change the default theme of the plot from theme_grey to theme_minimal, which gets rid of the grey background. In the additional theme() function we remove the gridlines on the y-axis, both major and minor, by setting them to the special plot element element_blank().
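The theme changes described above can be sketched as follows, assuming ggplot2 is installed. The species_counts data is an illustrative stand-in for the lesson's data.

```r
library(ggplot2)

# Illustrative counts per species; stand-in data, not taken from the lesson.
species_counts <- data.frame(
  species = c("Adelie", "Chinstrap", "Gentoo"),
  n = c(150, 70, 120)
)

p <- ggplot(species_counts, aes(x = n, y = species)) +
  geom_col() +
  theme_minimal() +                         # replaces the default theme_grey
  theme(
    panel.grid.major.y = element_blank(),   # remove major gridlines on the y-axis
    panel.grid.minor.y = element_blank()    # remove minor gridlines on the y-axis
  )

p
```

The order matters: theme_minimal() replaces the whole theme, so the theme() adjustments have to come after it.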
Figure 8
Figure 9
Figure 10
Power calculations
k-means
Figure 1
Figure 2
Figure 3
There is obviously a cluster centered around (0,0), and another cluster more or less evenly spread around it.
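Data with this shape can be simulated and passed to base R's kmeans() to compare its assignment with the two groups we see by eye. This is a sketch under assumptions: the radius of the outer cluster, the spreads, and all variable names are invented for illustration and are not the lesson's data.

```r
set.seed(7)
n <- 200

# Cluster 1: points centred on (0, 0)
inner <- data.frame(x = rnorm(n, sd = 0.5), y = rnorm(n, sd = 0.5))

# Cluster 2: points spread evenly around it, at a radius of roughly 4
angle <- runif(n, 0, 2 * pi)
radius <- rnorm(n, mean = 4, sd = 0.3)
ring <- data.frame(x = radius * cos(angle), y = radius * sin(angle))

points_df <- rbind(inner, ring)
truth <- rep(c("inner", "ring"), each = n)

# Ask k-means for two clusters; nstart = 10 tries several random starts.
fit <- kmeans(points_df, centers = 2, nstart = 10)

# Cross-tabulate the simulated groups against the k-means assignment.
table(truth, cluster = fit$cluster)
plot(points_df, col = fit$cluster, asp = 1)
```

Because k-means assigns each point to its nearest cluster centre, the boundary between two clusters is always a straight line, which is worth keeping in mind when reading the table.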
Factor Analysis
Figure 1
The rule of thumb is that we reject factors with an eigenvalue lower than 1.0.
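The rule of thumb can be applied directly to the eigenvalues of a correlation matrix. A minimal sketch in base R, using the built-in mtcars dataset as a stand-in because the lesson's data is not shown here.

```r
# mtcars ships with R; it stands in for the lesson's data.
eigenvalues <- eigen(cor(mtcars))$values

# Keep factors with an eigenvalue of at least 1.0, reject the rest.
keep <- eigenvalues >= 1
n_factors <- sum(keep)

round(eigenvalues, 2)
n_factors
```

The eigenvalues of a correlation matrix sum to the number of variables, so an eigenvalue of 1.0 corresponds to a factor that explains as much variance as a single original variable.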
structure-you-work
fence-test
design-principles
Figure 1
Do the gridlines add value to the plot? In general the answer is no, and we have better ways of adding the value they might bring.