Different types of plots

Last updated on 2025-04-15

Overview

Questions

  • What other types of plots can we make?
  • How can we control the order of categories in plots?

Objectives

  • Learn how to make histograms, barcharts, boxplots and violinplots

Scatterplots are very useful, but we often need other types of plots. In this part of the course, we are going to look at some of the more common types.

Histograms


Histograms split all observations of a variable into a number of “bins” and count how many observations fall in each bin. For each bin we then plot a column with a height equal to the number of observations in it.

Note that here we use the pipe to get the diamonds data into ggplot(). We could also pass diamonds directly as the first argument; both methods work. However, piping the data into ggplot() is useful if we need to manipulate the data before plotting, e.g. by filtering it.

R

diamonds %>% 
  ggplot(mapping = aes(carat)) +
  geom_histogram()

OUTPUT

`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Note that we get a message from geom_histogram that the number of bins is set to 30 by default. 30 bins will almost never be the correct number of bins, and we should choose a better value ourselves.

R

diamonds %>% 
  ggplot(aes(carat)) +
  geom_histogram(bins = 25)

What number of bins should we choose? Some heuristics for choosing do exist, but in general we recommend that you experiment with different numbers of bins to find the one that best shows your data.
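As a rough illustration of such heuristics, base R ships a few rules of thumb in the grDevices package, for example nclass.Sturges() and nclass.FD() (Freedman-Diaconis). The sketch below is not part of the original lesson. It assumes the tidyverse is installed, and the suggested numbers are only starting points for experimenting. The variable names n_sturges and n_fd and the bin width of 0.1 carat are chosen purely for illustration.

```r
library(tidyverse)

# Two rules of thumb for the number of bins, both from base R (grDevices)
n_sturges <- nclass.Sturges(diamonds$carat)  # depends only on the number of observations
n_fd      <- nclass.FD(diamonds$carat)       # Freedman-Diaconis, uses the interquartile range

n_sturges
n_fd

# Use one of the suggestions as a starting point
diamonds %>% 
  ggplot(aes(carat)) +
  geom_histogram(bins = n_sturges)

# Alternatively, fix the width of each bin in data units (here 0.1 carat)
diamonds %>% 
  ggplot(aes(carat)) +
  geom_histogram(binwidth = 0.1)
```

The two rules can disagree considerably, which is one more reason to look at several versions of the plot before settling on one.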

Note that we left out the "mapping =" part of the call to ggplot(). The first argument of ggplot() is always data, and we supply that via the pipe. The second argument is always mapping, so we do not need to name it.
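To make the argument matching concrete, the following sketch builds the same histogram in three ways. The object names p1, p2 and p3 are only used here for comparison and are not part of the lesson.

```r
library(tidyverse)

# Both arguments named explicitly
p1 <- ggplot(data = diamonds, mapping = aes(x = carat)) +
  geom_histogram(bins = 25)

# Both arguments matched by position: data first, mapping second
p2 <- ggplot(diamonds, aes(carat)) +
  geom_histogram(bins = 25)

# Data supplied through the pipe, mapping matched by position
p3 <- diamonds %>% 
  ggplot(aes(carat)) +
  geom_histogram(bins = 25)

# The data computed for the histogram layer is the same in all three plots
all.equal(layer_data(p1), layer_data(p2))
all.equal(layer_data(p1), layer_data(p3))
```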

In the following we are sometimes going to specify the mapping argument. There are two reasons for that. One: We have forgotten to be consistent. Two: In some cases it is useful to remind ourselves that we are actually mapping data to something.
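As mentioned at the start of this section, piping is most useful when the data has to be prepared before plotting. A minimal sketch, assuming we only want to look at diamonds below 3 carat (the threshold and the name small_diamonds are arbitrary choices for illustration):

```r
library(tidyverse)

# Keep only the diamonds below 3 carat
small_diamonds <- diamonds %>% 
  filter(carat < 3)

# The filtered data flows into ggplot() as its first argument
small_diamonds %>% 
  ggplot(aes(carat)) +
  geom_histogram(bins = 25)
```

The intermediate object is not required; the filter() step can be piped straight into ggplot().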

Barcharts


Not to be confused with histograms, barcharts count the number of observations in different groups. Where the scale in histograms is continuous, and split into bins, the scale in barcharts is discrete.

Here we map the color-variable to the x-axis in the barchart. geom_bar counts the number of observations itself - we do not need to provide a count:

R

diamonds %>% 
  ggplot(aes(color)) +
  geom_bar()
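For comparison, here is a sketch of what it would look like if we did the counting ourselves. It uses count() from dplyr and geom_col(), which plots the values it is given rather than counting rows. The name colour_counts is chosen for illustration only.

```r
library(tidyverse)

# Count the number of diamonds of each colour; the counts end up in a column called n
colour_counts <- diamonds %>% 
  count(color)

colour_counts

# geom_col() needs both an x and a y aesthetic, because it does no counting itself
colour_counts %>% 
  ggplot(aes(x = color, y = n)) +
  geom_col()
```

The resulting plot should match the one from geom_bar(); the explicit version is mainly useful when the counts already exist in the data.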

A small excursion


Why are the columns in the barchart above in that order?

One might guess that they are simply in alphabetical order.

Not quite! The order here happens to match the alphabet, but that is not what determines it. color is a categorical variable. Diamonds either have the colour “D” (which is the best colour), or another colour (like “J”, which is the worst).

There are no “D.E” colours; the colours do not exist on a continuous range.

In R, this kind of variable is called a “factor”. The data in a factor can take one of several values, called levels, and the order of these levels is what controls the order in the plot.

The order can be either arbitrary, or there can exist an implicit order in the data, like with the colour of the diamonds, where D is the best colour, and J is the worst. These types of ordered categorical data are called ordinal data.

They look like this:

R

diamonds %>% 
  select(cut, color, clarity) %>% 
  str()

OUTPUT

tibble [53,940 × 3] (S3: tbl_df/tbl/data.frame)
 $ cut    : Ord.factor w/ 5 levels "Fair"<"Good"<..: 5 4 2 4 2 3 3 3 1 3 ...
 $ color  : Ord.factor w/ 7 levels "D"<"E"<"F"<"G"<..: 2 2 2 6 7 7 6 5 2 5 ...
 $ clarity: Ord.factor w/ 8 levels "I1"<"SI2"<"SI1"<..: 2 3 5 4 2 6 7 3 4 5 ...

Note that even though the colour “D” is better than “E”, the levels of the color factor indicate that “D<E”.

All this just to say: We can control the order of columns in the plot by controlling the order of the levels of the categorical variable we are plotting:

R

diamonds %>% 
  mutate(color = fct_rev(color)) %>%  
  ggplot(aes(color)) +
  geom_bar()

fct_rev is a function that reverses the order of the levels of a factor. It comes from the package forcats, which makes it easier to work with categorical data.
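The levels themselves can be inspected with levels(). The sketch below also shows fct_infreq(), another forcats function, which orders the levels by how often they occur. It is offered as one possible alternative to reversing and is not part of the original lesson.

```r
library(tidyverse)

# The levels of the factor, in the order in which they are stored
levels(diamonds$color)

# Is this an ordered factor?
is.ordered(diamonds$color)

# Reversing the factor reverses the order of its levels
levels(fct_rev(diamonds$color))

# Order the levels by frequency, most common colour first, and plot
diamonds %>% 
  mutate(color = fct_infreq(color)) %>% 
  ggplot(aes(color)) +
  geom_bar()
```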

Boxplots


Boxplots are suitable for visualising the distribution of data. We can make a boxplot of a single variable in the data - or we can make several boxplots in one plot:

R

diamonds %>% 
  ggplot(aes(x = carat, y = cut)) +
  geom_boxplot()

Here the variable we are summarising, carat, is on the x-axis, and the plot is split into one boxplot per cut along the y-axis.

What is a boxplot?

Boxplots are useful for comparing distributions. The thick line in the middle of the box is the median, the two ends of the box are the first and third quartiles, and the two whiskers (or lines) on either side of the box show the minimum and maximum values, excluding outliers, defined for this purpose as values that lie more than 1.5 times the interquartile range from the box.
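The numbers behind a boxplot can be approximated by hand. The following is a sketch that follows the description above, using quantile() and IQR() with their default settings. It is not necessarily identical in every detail to what geom_boxplot computes internally, and the column names and the name box_stats are chosen for illustration.

```r
library(tidyverse)

box_stats <- diamonds %>% 
  group_by(cut) %>% 
  summarise(
    q1     = quantile(carat, 0.25),  # lower end of the box
    median = median(carat),          # the thick line in the box
    q3     = quantile(carat, 0.75),  # upper end of the box
    iqr    = IQR(carat),             # interquartile range, q3 - q1
    # The whiskers end at the most extreme observations
    # that lie within 1.5 * IQR of the box
    lower_whisker = min(carat[carat >= q1 - 1.5 * iqr]),
    upper_whisker = max(carat[carat <= q3 + 1.5 * iqr]),
    # Everything beyond the whiskers is drawn as an outlier
    n_outliers = sum(carat < lower_whisker | carat > upper_whisker)
  )

box_stats
```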

Violinplots


Boxplots are not necessarily the best option for showing distributions. A good alternative could be violinplots. They show a density plot - basically a histogram with infinitely many bins - for each group, plotted symmetrically around an axis.
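To see what such a density plot looks like on its own, here is a small sketch using geom_density, with one panel per cut. The layout with facet_wrap and the name p_density are choices made for this illustration. A violin is, roughly speaking, each of these curves mirrored around its baseline.

```r
library(tidyverse)

# One density curve of carat for each cut, stacked in a single column of panels
p_density <- diamonds %>% 
  ggplot(aes(carat)) +
  geom_density() +
  facet_wrap(~cut, ncol = 1)

p_density
```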

Exercise

The geom_ for making violin plots is geom_violin. Look at the help for geom_violin and make a violinplot with carat on the x-axis and cut on the y-axis.

R

diamonds %>% 
  ggplot(aes(x = carat, y = cut)) +
  geom_violin()
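Because plots are built in layers, the two approaches can also be combined. The sketch below draws a narrow boxplot on top of each violin. The width of 0.1, the choice to hide the outlier points with outlier.shape = NA, and the name p_combined are assumptions made for this illustration, not recommendations from the lesson.

```r
library(tidyverse)

p_combined <- diamonds %>% 
  ggplot(aes(x = carat, y = cut)) +
  geom_violin() +
  # A narrow boxplot inside each violin; outlier points are not drawn
  geom_boxplot(width = 0.1, outlier.shape = NA)

p_combined
```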

And many more


ggplot2 comes with a multitude of different plot types. A complete list would be very long, and going through it would take up all the time for this course. Take a look at The R Graph Gallery or at Graphs in R (NB: a work in progress), where we will collect weird and wonderful plots, along with when to use them, when not to use them, and how to make them.

ggplot2 is written as an extensible package, meaning that developers can create packages making plots that are not included in ggplot2, or introduce more advanced functionality around plots. Two of the more interesting extensions are:

ggforce extends ggplot2 with specialised plot types.

gganimate makes it easyish to make animated plots using ggplot2.

Key Points

  • Categorical data, aka factors, can control the order of data in plots
  • ggplot makes it easy to make many different types of plots
  • ggplot has many useful extensions