R-toolbox: All Images

Reproducible Data Analysis

Figure 1

One example of these problems is shown every time we load tidyverse. Screenshot of the messages shown when the core tidyverse packages are attached, including the conflicts.

Figure 2

Screenshot of the RStudio dialog box asking whether to save the workspace image.

Figure 3

A glam rock band composed of three fuzzy round monsters labeled “Text”, “Outputs” and “Code” performing together. Stylized title text reads: “R Markdown - we’re getting the band back together.”

Figure 4

Screenshot of the initial demo R Markdown document in RStudio.

Figure 5

You will see a new button in RStudio. Screenshot of the Knit button in RStudio.

Reading data from file

(Table columns: Country, Name, Phonenumber)

Figure 1

Figure 2

Figure 3

Figure 4

Descriptive Statistics

Figure 1

What are the length and depth of penguin bills? (Copyright Allison Horst)

Figure 2

Figure 3

Figure 4

Or we can specify the exact intervals we want, here from 0 to 6500 grams in steps of 250 grams:
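A minimal ggplot2 sketch of such a histogram. It assumes the penguin data comes from the palmerpenguins package, with body mass in the column `body_mass_g`. `seq(0, 6500, by = 250)` produces the interval boundaries:

```r
library(ggplot2)
library(palmerpenguins)

# Interval boundaries: 0, 250, 500, ..., 6500 grams (27 edges, 26 intervals)
mass_breaks <- seq(0, 6500, by = 250)

p <- ggplot(penguins, aes(x = body_mass_g)) +
  # na.rm = TRUE silently drops the two penguins without a recorded body mass
  geom_histogram(breaks = mass_breaks, na.rm = TRUE)
p
```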

Figure 5

The histogram gives us a visual indication of the range of the values, how much they vary, and where the data are located.
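The same three properties can also be expressed as numbers. The sketch below again assumes palmerpenguins' `body_mass_g` column. `na.rm = TRUE` is needed because two penguins have no recorded body mass:

```r
library(palmerpenguins)

mass <- penguins$body_mass_g

mass_range  <- range(mass, na.rm = TRUE)   # smallest and largest value
mass_sd     <- sd(mass, na.rm = TRUE)      # variation of the values around the mean
mass_median <- median(mass, na.rm = TRUE)  # where the middle of the data is located

mass_range
mass_sd
mass_median
```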

Figure 6

Figure 7

The normal distribution

Figure 1

Figure 2

The Normal Distribution. The area under the curve is 1, equivalent to 100%.
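We can check the statement numerically in base R. `dnorm()` is the density curve of the normal distribution, and `pnorm()` gives the area under that curve to the left of a point. This small sketch uses the standard normal distribution (mean 0, standard deviation 1):

```r
# Total area under the standard normal density curve
total_area <- integrate(dnorm, lower = -Inf, upper = Inf)$value

# Area within one and within two standard deviations of the mean
within_1sd <- pnorm(1) - pnorm(-1)
within_2sd <- pnorm(2) - pnorm(-2)

total_area  # 1, i.e. 100%
within_1sd  # about 0.683
within_2sd  # about 0.954
```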

Figure 3

Testing for normality

Figure 1

Figure 2

Our histogram does not really look like the theoretical curve. The fact that the mean and median are almost identical was not a sufficient criterion for normality.
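A simulated example makes the same point. A symmetric sample with two peaks has identical mean and median, but it is clearly not normal. The construction is an arbitrary illustration: normal values shifted to -3, plus their mirror image around +3. The sketch then compares the sample to a normal distribution with a Q-Q plot and the Shapiro-Wilk test, both from base R:

```r
set.seed(1)

y <- rnorm(250)
# Mirror the values: one half centred on -3, the other on +3 (exactly symmetric)
x <- c(y - 3, 3 - y)

mean(x)    # 0 (up to rounding), because the sample is symmetric
median(x)  # also 0

# Visual check: for normal data the points follow the straight line
qqnorm(x)
qqline(x)

# Formal check: a small p-value is evidence against normality
sw <- shapiro.test(x)
sw$p.value
```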

Figure 3

Figure 4

Figure 5

Central Limit Theorem

Figure 1

This is definitely not normally distributed.
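The Central Limit Theorem says that the means of repeated samples from such data will still be approximately normally distributed. Below is a minimal base R simulation. A skewed exponential distribution stands in for the data in the figure (an assumption), and each sample contains 50 values:

```r
set.seed(42)

n_samples   <- 1000  # how many samples we draw
sample_size <- 50    # how many values in each sample

# Strongly right-skewed population: exponential with rate 1 (mean 1, sd 1)
sample_means <- replicate(n_samples, mean(rexp(sample_size, rate = 1)))

# The sample means pile up in a bell shape around the population mean ...
hist(sample_means, breaks = 30)

# ... with a spread close to sd / sqrt(n) = 1 / sqrt(50), about 0.14
mean(sample_means)
sd(sample_means)
```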

Figure 2

Nicer barcharts

Figure 1

Figure 2

Removing the x-axis label is not strictly necessary, but the label is superfluous in this case.

Figure 3

This makes the graph easier to read: it is now easy to see that the most frequent penguin species is Adelie.
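The plotting code is not shown here, so this is an assumption about how the ordering was done. One way to get bars ordered by frequency is to reorder the factor levels with `fct_infreq()` from the forcats package. The sketch also assumes the palmerpenguins data. `labs(x = NULL)` drops the x-axis label:

```r
library(ggplot2)
library(forcats)
library(palmerpenguins)

# fct_infreq() orders the species from most to least frequent
p <- ggplot(penguins, aes(x = fct_infreq(species))) +
  geom_bar() +
  labs(x = NULL)  # species names on the axis make an axis title superfluous
p
```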

Figure 4

Figure 5

We also changed the size of the plot title, which is now 10% larger than the base size. We could do that by specifying an absolute size, but here we have used the rel() function, which sets the size relative to the base font size of the plot.
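In ggplot2 this is done through `theme()` and `element_text()`. Below is a minimal sketch on a bar chart of the palmerpenguins data. The horizontal bars and the title text are made up for the example:

```r
library(ggplot2)
library(palmerpenguins)

p <- ggplot(penguins, aes(y = species)) +
  geom_bar() +
  labs(title = "Number of penguins per species") +
  # rel(1.1): the title is 1.1 times the size it inherits from the base font size
  theme(plot.title = element_text(size = rel(1.1)))
p
```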

Figure 6

We control what happens on the x-axis with the scale_x_*() family of functions. Because the x-axis is continuous, the one we need is scale_x_continuous().
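Below is a sketch of what such a call can look like. It assumes horizontal bars with the counts on the continuous x-axis. The chosen breaks and the removal of the gap between the bars and the axis are illustrative choices, not necessarily those used in the figure:

```r
library(ggplot2)
library(palmerpenguins)

# Species on the (discrete) y-axis, counts on the (continuous) x-axis
p <- ggplot(penguins, aes(y = species)) +
  geom_bar() +
  scale_x_continuous(
    breaks = seq(0, 150, by = 25),         # tick marks every 25 penguins
    expand = expansion(mult = c(0, 0.05))  # no gap between the bars and the axis
  )
p
```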

Figure 7

First we change the theme of the plot from the default theme_grey() to theme_minimal(), which gets rid of the grey background. In the additional theme() call we remove both the major and minor gridlines on the y-axis by setting them to the special plot element element_blank().
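Below is a minimal sketch of these two steps, again on a horizontal bar chart of the palmerpenguins data (an assumed example plot):

```r
library(ggplot2)
library(palmerpenguins)

p <- ggplot(penguins, aes(y = species)) +
  geom_bar() +
  theme_minimal() +                        # replaces the default theme_grey()
  theme(
    panel.grid.major.y = element_blank(),  # no major gridlines on the y-axis
    panel.grid.minor.y = element_blank()   # no minor gridlines on the y-axis
  )
p
```

The order matters. theme_minimal() replaces the complete theme, so the theme() adjustments have to come after it, or they would be overwritten.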

Figure 8

Figure 9

Figure 10

Power Calculations

k-means

Figure 1

Figure 2

Figure 3

No. Even if there are clusters in the data, the algorithm is not necessarily able to find them. Consider this data: there is obviously a cluster centered around (0,0), and another cluster spread more or less evenly around it.
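Data of this kind can be simulated in base R as a compact cluster around (0, 0) with a ring of points around it. The radius, spread and number of points are arbitrary choices for the illustration. `kmeans()` with two centres does return two clusters, but it cuts the picture roughly in half instead of separating the inner cluster from the ring:

```r
set.seed(123)

# Inner cluster: 100 points scattered around (0, 0)
inner <- cbind(rnorm(100, sd = 0.5), rnorm(100, sd = 0.5))

# Outer cluster: 200 points spread evenly on a ring with radius about 5
angle  <- runif(200, min = 0, max = 2 * pi)
radius <- rnorm(200, mean = 5, sd = 0.3)
outer  <- cbind(radius * cos(angle), radius * sin(angle))

xy         <- rbind(inner, outer)
true_group <- rep(c("inner", "ring"), times = c(100, 200))

km <- kmeans(xy, centers = 2, nstart = 20)

# Cross-tabulate the true groups against the clusters k-means found
table(true_group, km$cluster)

# Colour the points by the cluster k-means assigned them to
plot(xy, col = km$cluster, asp = 1)
```

k-means assigns each point to the nearest cluster centre. Both centres therefore end up on opposite sides of the origin, and the ring is split between them.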

Figure 4

It does find clusters, but not the ones we want.

ANOVA

Figure 1

That looks reasonable.

Figure 2

Cohen's Kappa

R on UCloud

Figure 1

Figure 2

Figure 3

Figure 4

Figure 5

Figure 6

In it we find a listing of what is in our drive:

Figure 7

Figure 8

Figure 9

Figure 10

Figure 11

Figure 12

Figure 13

Sometimes we close a window or navigate away from it. Where can we find it again? In the navigation bar on the left in UCloud we find this icon. It gives us a list of running jobs (yes, we can have more than one). Click on the job to get back to it, where we can extend the time or open the interface again.

Figure 14

The easy way is to do it through RStudio. Select the folder and/or files you want to save, click the small cogwheel labelled “More” and choose “Export”. RStudio will then zip the files and ask you where to save them.