This lesson is being piloted (Beta version)

Facetting

Overview

Teaching: 10 min
Exercises: 5 min
Questions
  • What is facetting?

Objectives
  • Learn to use small multiples in your plots

Small multiples

If we make only one plot, we quickly run into the problem of trying to show too much information in it. Here we plot price against carat, color the points by the color of the diamonds, and represent their clarity by the shape of the points:

ggplot(data = diamonds, mapping = aes(x = carat, y = price, color = color, shape = clarity)) +
  geom_point()
Warning: Using shapes for an ordinal variable is not advised
Warning: The shape palette can deal with a maximum of 6 discrete values because
more than 6 becomes difficult to discriminate; you have 8. Consider
specifying shapes manually if you must have them.
Warning: Removed 5445 rows containing missing values (`geom_point()`).

This is probably not the best way to discover patterns in the data. It is actually so bad that ggplot warns us that we are using too many different shapes.
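The last warning can be traced back to the data. As a rough check, assuming the default shape scale hands out shapes to the first six levels of clarity and leaves the remaining levels without one, the dropped rows should be those in the two highest clarity levels:

```r
library(ggplot2)  # provides the diamonds dataset

# clarity is an ordered factor; count its levels
nlevels(diamonds$clarity)

# Levels beyond the sixth get no shape under the default palette (assumption)
no_shape <- levels(diamonds$clarity)[-(1:6)]

# Number of rows that fall in those levels
dropped <- sum(diamonds$clarity %in% no_shape)
dropped
```

If the assumption holds, `dropped` matches the number of rows reported as removed in the warning.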

One way of handling that is to plot “small multiples” of the data.

Instead of plotting information on the clarity of the diamonds in one plot, along with all the other information, we make one plot for each value of clarity. This is called facetting:

ggplot(data = diamonds, mapping = aes(x = carat, y = price, color = color)) +
  geom_point() +
  facet_wrap(~clarity)

Here we can see that the price rises more rapidly with size for the better clarities, something that would have been impossible to see in the previous plot.
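facet_wrap() takes a few arguments that control how the panels are laid out. A minimal sketch, reusing the plot above: ncol sets the number of columns, and scales = "free_y" gives each panel its own y-axis. The free scale is only an illustration; it can make individual panels easier to read, but prices become harder to compare between panels.

```r
library(ggplot2)

p <- ggplot(data = diamonds, mapping = aes(x = carat, y = price, color = color)) +
  geom_point() +
  # four columns of panels, each with its own y-axis range
  facet_wrap(~clarity, ncol = 4, scales = "free_y")
p
```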

Hidden connections

facet_wrap is specified using the ~clarity notation.

This is similar to the way R specifies formulas:

In a regression in R, for example, y ~ x specifies that y is a function of x.

Here we can think of it as “making plots that are a function of clarity”.
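A short sketch of what the ~ notation produces, along with the vars() notation that ggplot2 also accepts for specifying facets:

```r
library(ggplot2)

# A one-sided formula is an ordinary R object
f <- ~clarity
class(f)      # "formula"
all.vars(f)   # "clarity"

# The same facetting written with vars() instead of a formula
p_vars <- ggplot(data = diamonds, mapping = aes(x = carat, y = price, color = color)) +
  geom_point() +
  facet_wrap(vars(clarity))
```

Both notations name the variable whose values define the panels; which one to use is mostly a matter of taste.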

The fundamental idea behind faceting is the concept of “small multiples”, popularised by Edward Tufte. He describes it as resembling “the frames of a movie: a series of graphics, showing the same combination of variables, indexed by changes in another variable.” The method is also known as “trellis”, “lattice”, “grid” or “panel” charts. They allow us to break down a very “busy” chart that contains too much information, so that the reader can walk through the panels one category at a time and make comparisons.

Exercise

Plot price as a function of depth (price on the y-axis, depth on the x-axis), and facet by cut. If you want a colorful plot, color the points by color.

Solution

ggplot(data = diamonds, mapping = aes(x = depth, y = price, color = color)) +
  geom_point() +
  facet_wrap(~cut)

Note that for the better cuts, diamonds are cut to fairly specific proportions: their depth values fall in a narrow range. Diamonds with the worst cut (Fair) have more varied proportions.

More than one multiple

We can expand on the “small multiple” concept by plotting the facets in a grid defined by two categorical variables.

In this plot we show price as a function of carat, and make an individual plot for each combination of clarity and color, with one row per clarity and one column per color:

diamonds %>% 
  ggplot(aes(x = carat, y = price, color = color)) +
    geom_point() +
    facet_grid(clarity ~ color)

Be careful when using facets, especially facet_grid, with small datasets. You might end up with too little data in each facet.
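One way to check this before facetting is to count the observations in each combination of the two variables. A minimal sketch using base R's table() on the diamonds data; the cut-off of 30 is an arbitrary choice for illustration, not a recommendation:

```r
library(ggplot2)

# Rows are clarity levels and columns are color levels,
# the same layout as facet_grid(clarity ~ color)
counts <- table(diamonds$clarity, diamonds$color)
counts

# Number of facets with fewer than 30 observations (30 is an arbitrary cut-off)
sum(counts < 30)
```

Cells with very low counts correspond to panels that will look nearly empty in the grid.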

Key Points

  • Facetting can make busy plots more understandable

  • Grid facetting in two dimensions allows us to plot even more variables