This lesson is being piloted (Beta version)

Further mapping

Overview

Teaching: 10 min
Exercises: 5 min
Questions
  • Can we show data using something other than position?

  • What is correct, colour or color?

  • How do I find out what a geom_ can do?

Objectives
  • Learn to plot more than just positions

We saw how to map data to a position in a scatterplot. But we can also map data to other elements of a plot, e.g. the colour of the points.

ggplot(data = diamonds, mapping = aes(x = carat, y = price, colour = color)) +
  geom_point() 
plot of chunk chunk1


The argument we map the column color to is itself called colour, which makes the code look a bit weird.

Are these colours suitable? Probably not. The authors of this course material are not able to distinguish all of the colours. We will return to how to change colours in plots later in this course.

Spelling

Color, and some other words, can be spelled in more than one way. For argument names, ggplot understands both the British English spelling colour and the American spelling color.

Note that this only applies to the arguments in the functions. If the column in the dataset is called color, ggplot will not find it if you write colour instead.
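
Both points can be checked directly. The following is a minimal sketch, assuming ggplot2 is installed and using its bundled diamonds dataset: aes() accepts either spelling of the argument name, while the column name must match the dataset exactly.

```r
library(ggplot2)

# Both spellings of the argument are accepted; ggplot2 stores them
# under the same aesthetic name.
british  <- aes(x = carat, y = price, colour = color)
american <- aes(x = carat, y = price, color = color)
identical(names(british), names(american))  # TRUE

# The column, however, exists under one spelling only.
"color" %in% names(diamonds)   # TRUE
"colour" %in% names(diamonds)  # FALSE
```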

Not surprisingly, diamonds of the “best” color, “D”, have higher prices at a given carat than diamonds of the “worst” color, “J”.

A common mistake is to put the colour argument in the wrong place:

ggplot(data = diamonds, mapping = aes(x = carat, y = price), colour = color) +
  geom_point()
plot of chunk chunk2


What happened to the colour? The colour argument is outside the aes() function. That means that we are not mapping data to the colour!
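
One way to see the difference without drawing anything is to look at what aes() itself receives in each case. This is only a sketch: the names mapping_wrong and mapping_right are made up for illustration, and the first one reproduces just the aes() part of the mistaken call.

```r
library(ggplot2)

# Mapping as written in the mistaken call: colour sits outside aes(),
# so aes() only ever sees x and y.
mapping_wrong <- aes(x = carat, y = price)

# Mapping with colour placed inside aes().
mapping_right <- aes(x = carat, y = price, colour = color)

names(mapping_wrong)  # "x" "y"
names(mapping_right)  # "x" "y" "colour"
```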

What else can we map data to?

ggplot(data = diamonds, mapping = aes(x = carat, y = price, colour = color, size = table)) +
  geom_point()
plot of chunk unnamed-chunk-2


Not a good plot… We need to think about which combination of variables we want to show. Often two plots are better than trying to cram everything into a single plot.

What can be mapped to the plot depends on the geom we are using.

Calling the help function on a geom, e.g. ?geom_point, answers that question. The help page for geom_point() shows x and y in bold, which marks them as mandatory.

The list of stuff we can map data to in geom_point:

Different geom_ functions have different mandatory/required aesthetics.
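
The same information can also be read from R itself. The sketch below looks at GeomPoint and GeomText, the objects ggplot2 uses behind geom_point() and geom_text(). These are closer to an implementation detail than the help pages, so treat the output as a convenience rather than the documented reference.

```r
library(ggplot2)

# Required aesthetics differ between geoms.
GeomPoint$required_aes
GeomText$required_aes

# All aesthetics geom_point() understands, required and optional.
GeomPoint$aesthetics()
```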

Not really mapping. Sorta.

Rather than mapping values from the data to an aesthetic, we can provide values directly. One very useful aesthetic to play with, at least when we have as many data points as we have here, is alpha:

ggplot(data = diamonds, mapping = aes(x = carat, y = price, color = color)) +
  geom_point(alpha = 0.1)
plot of chunk unnamed-chunk-3


alpha controls the transparency of the points plotted, from 0 (fully transparent) to 1 (fully opaque). It is a handy way of handling overplotting, the phenomenon that many data points are drawn on top of each other, so we cannot see how many there are.
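
To get a feeling for why a low alpha reveals dense regions, here is a small calculation. It assumes standard alpha compositing, where each point hides a fraction alpha of whatever lies beneath it. The helper stacked_opacity is made up for this illustration and is not part of ggplot2.

```r
# Opacity after stacking n points, each drawn with the same alpha,
# assuming each point hides a fraction alpha of what lies beneath it.
stacked_opacity <- function(alpha, n) {
  1 - (1 - alpha)^n
}

# A single point, a small pile, and two larger piles of points.
stacked_opacity(alpha = 0.1, n = c(1, 5, 10, 50))
```

Under that assumption a lone point stays faint, while areas with many overlapping points get close to fully opaque.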

geoms

geom_point() is the function we use to make scatter plots, because a point is a geometric object. Other geometric objects can be plotted:

  • geom_histogram() will plot histograms

  • geom_line() will plot lines

All geometries in ggplot2 are named using the pattern geom_ followed by the name of the geometry.
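
Because of this naming pattern, the available geoms can be listed from R. The sketch below assumes ggplot2 is installed. The histogram of price and the choice of bins = 30 are illustrative choices, not part of the lesson.

```r
library(ggplot2)

# List every exported ggplot2 function whose name starts with geom_.
geoms <- grep("^geom_", getNamespaceExports("ggplot2"), value = TRUE)
sort(geoms)

# Same dataset, different geom: a histogram only needs x.
p_hist <- ggplot(data = diamonds, mapping = aes(x = price)) +
  geom_histogram(bins = 30)
p_hist
```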

What kind of shapes can we plot?

In principle any kind of shape. But without having to program them ourselves, these are available directly in ggplot. They are numbered, because it is easier to write “14” than “square box with upwards-pointing triangle inside”.

plot of chunk point_shapes
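
A chart like this can be drawn with ggplot itself. The following is a sketch, not the code used for the lesson's figure: the grid layout, the sizes and the red fill are arbitrary choices. Shapes are numbered 0 to 25, and only shapes 21 to 25 use the fill colour.

```r
library(ggplot2)

# One row per shape number, laid out on a grid six columns wide.
shapes <- data.frame(
  shape = 0:25,
  x = 0:25 %% 6,
  y = -(0:25 %/% 6)
)

p_shapes <- ggplot(shapes, aes(x = x, y = y)) +
  geom_point(aes(shape = shape), size = 5, fill = "red") +  # draw each shape
  geom_text(aes(label = shape), nudge_y = -0.3, size = 3) + # label it with its number
  scale_shape_identity() +  # use the numbers in the data as shape codes
  theme_void()
p_shapes
```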


Key Points

  • Data can be plotted as something other than position

  • Types of plots are determined by geom_ functions