Basic Visualisations in Pandas

Plotting in Pandas

Learning Objectives

Questions:

  • How to get started with plotting using Pandas.

Objectives:

  • Understand the basics of the .plot() syntax.

  • Visualise your data with various plots.

Basic usage

The .plot() method in Pandas is a simple and convenient way to create visualisations from your data. You can call this method directly on a Series or DataFrame object, and it will generate a plot based on the data it contains.
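The sketch below shows both cases on a small made-up DataFrame rather than the Titanic data. The column names and values are illustrative assumptions. Selecting the "Agg" backend only lets the example run without a display, and you can leave that line out in a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up data for illustration: two numeric columns indexed by year
df = pd.DataFrame(
    {"sales": [120, 135, 150, 145, 160], "costs": [80, 85, 90, 95, 100]},
    index=[2019, 2020, 2021, 2022, 2023],
)

# Close any open figures so the Series plot starts on a fresh figure
plt.close("all")

# Series: one line, with the index on the x-axis and the values on the y-axis
ax_series = df["sales"].plot(title="Sales")
series_lines = len(ax_series.get_lines())
plt.close("all")

# DataFrame: one line per numeric column, plus a legend naming the columns
ax_frame = df.plot(title="Sales and costs")
frame_lines = len(ax_frame.get_lines())
frame_has_legend = ax_frame.get_legend() is not None
frame_title = ax_frame.get_title()
plt.close("all")

print(series_lines, frame_lines)  # 1 2
```

In both cases `.plot()` returns a matplotlib `Axes` object, which is why the notebook cells further down print `<Axes: ...>`.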

One of the key features of the .plot() method is its flexibility in creating different types of plots. By using the kind parameter, you can specify the type of plot you want to generate, such as a line plot, bar plot, histogram, or scatter plot, among others. Additionally, the .plot() method provides several other parameters that allow you to customise the appearance of the plot, such as setting labels for the axes, choosing specific columns to plot, and adjusting the layout.


The .plot() syntax

The general syntax for using the .plot() method is as follows:

DataFrame.plot(kind='plot_type', x='x_column', y='y_column', title='Plot Title', **kwargs)
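Here is the general form filled in with concrete arguments, as a sketch on made-up data (the `month` and `visitors` columns are hypothetical). Pandas also exposes each kind as an accessor method, so `df.plot(kind="bar", ...)` and `df.plot.bar(...)` draw the same chart.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up data for illustration
df = pd.DataFrame({"month": ["Jan", "Feb", "Mar"], "visitors": [320, 410, 380]})

# General form: kind, x, y and title passed as keyword arguments
ax1 = df.plot(kind="bar", x="month", y="visitors", title="Visitors per month")

# Equivalent accessor form: df.plot.<kind>(...)
ax2 = df.plot.bar(x="month", y="visitors", title="Visitors per month")

# Both calls draw the same three bars, in the same order
heights1 = [bar.get_height() for bar in ax1.patches]
heights2 = [bar.get_height() for bar in ax2.patches]
print(heights1 == heights2)  # True
plt.close("all")
```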

Key parameters

  • kind: Specifies the type of plot to generate. The available options include:

    • 'line': Line plot (default)

    • 'bar': Vertical bar plot

    • 'barh': Horizontal bar plot

    • 'hist': Histogram

    • 'box': Boxplot

    • 'kde'/'density': Kernel Density Estimation plot

    • 'area': Area plot

    • 'pie': Pie plot

    • 'scatter': Scatter plot (DataFrame only)

    • 'hexbin': Hexbin plot (DataFrame only)

  • x: (Optional) Specifies the column to use for the x-axis (required for 'scatter' and 'hexbin')

  • y: (Optional) Specifies the column, or columns, to use for the y-axis (required for 'scatter' and 'hexbin')

  • title: (Optional) Specifies the title of the plot

  • kwargs: (Optional) Represents additional keyword arguments for customisation, such as axis labels, colours, and figure size

We will not cover the additional keyword arguments here, as this is just a quick introduction. You can find the full list in the official Pandas documentation for the .plot() method.
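The sketch below cycles through most of the `kind` values listed above on made-up data. It is meant only to show which inputs each kind expects. The random data is non-negative because 'area' stacks columns by default and rejects columns that mix positive and negative values. 'kde'/'density' is left out because it needs SciPy installed. 'scatter' and 'hexbin' get explicit `x` and `y` columns, and the bar and pie kinds are shown on a small labelled Series.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up, non-negative data (fixed seed so the sketch is repeatable)
rng = np.random.default_rng(0)
df = pd.DataFrame({"a": rng.uniform(size=100), "b": rng.uniform(size=100)})

drawn = []  # titles of the plots that were drawn successfully

# Kinds that plot every numeric column of the DataFrame
for kind in ["line", "hist", "box", "area"]:
    ax = df.plot(kind=kind, title=f"kind={kind!r}")
    drawn.append(ax.get_title())
    plt.close(ax.get_figure())

# Kinds that need explicit x and y columns (DataFrame only)
for kind in ["scatter", "hexbin"]:
    ax = df.plot(kind=kind, x="a", y="b", title=f"kind={kind!r}")
    drawn.append(ax.get_title())
    plt.close(ax.get_figure())

# Bar and pie kinds suit a few labelled values, e.g. a small Series of counts
counts = pd.Series({"x": 3, "y": 5, "z": 2})
for kind in ["bar", "barh", "pie"]:
    ax = counts.plot(kind=kind, title=f"kind={kind!r}")
    drawn.append(ax.get_title())
    plt.close(ax.get_figure())

print(len(drawn))  # 9
```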


Examples

# This line imports the pandas library and aliases it as 'pd'.
import pandas as pd

# Load the Titanic dataset from a CSV file into a DataFrame named 'titanic'.
titanic = pd.read_csv('data/titanic.csv')
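If data/titanic.csv is not at hand, you can still try the steps on a tiny CSV held in memory, since `pd.read_csv` accepts a file-like object as well as a path. The rows below are made-up values that reuse four of the Titanic column names (Sex, Age, Fare, Pclass). They are not real passenger records.

```python
import io

import pandas as pd

# Made-up CSV text with four Titanic-style columns; the rows are illustrative only
csv_text = """Sex,Age,Fare,Pclass
male,30,10.0,3
female,40,80.0,1
female,25,15.0,2
male,50,60.0,1
female,,12.5,3
"""

# read_csv reads from a file-like object just as it reads from a path on disk
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)                # (5, 4)
print(sample["Age"].isna().sum())  # 1  (the empty Age field is read as NaN)
print(sample["Pclass"].value_counts().sort_index().tolist())  # [2, 1, 2]
```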

Pie chart

This pie chart visualises the distribution of passengers by sex (the Sex column).
It shows the proportion of male and female passengers in the dataset.

# Basic pie chart of passenger distribution by sex
titanic['Sex'].value_counts().plot(kind='pie', title='Passenger Distribution by Sex')
<Axes: title={'center': 'Passenger Distribution by Sex'}, ylabel='count'>
[Figure: pie chart "Passenger Distribution by Sex"]
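One possible refinement is to print each wedge's percentage and drop the default 'count' label. Here is a sketch on a made-up stand-in for `titanic['Sex']`. `autopct` is an argument of matplotlib's `pie()` that pandas passes through, and the label is cleared with the matplotlib method `set_ylabel` on the returned Axes.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in for titanic['Sex']
sex = pd.Series(["male", "female", "male", "male", "female"], name="Sex")

# autopct is forwarded to matplotlib's pie() and writes each wedge's share on it
ax = sex.value_counts().plot(
    kind="pie",
    autopct="%1.1f%%",
    title="Passenger Distribution by Sex",
)
ax.set_ylabel("")  # remove the 'count' label taken from the value_counts() result

# The wedge labels and percentage annotations are Text objects on the Axes
wedge_texts = [t.get_text() for t in ax.texts]
print(sorted(wedge_texts))
plt.close("all")
```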

Scatter plot

This scatter plot illustrates the relationship between passengers’ age and fare.
Each dot represents a passenger, with their age on the x-axis and the fare they paid on the y-axis.

# Scatter plot (requires x and y columns)
titanic.plot(kind='scatter', x='Age', y='Fare', title='Age vs Fare')
<Axes: title={'center': 'Age vs Fare'}, xlabel='Age', ylabel='Fare'>
[Figure: scatter plot "Age vs Fare"]
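A third column can also set the colour of each point. The sketch below does this on made-up rows that reuse the Titanic column names. The `c` parameter takes a column name, `colormap` picks the palette, and `alpha` (passed through to matplotlib) makes overlapping points easier to see.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows using Titanic-style column names; not real passenger data
df = pd.DataFrame({
    "Age": [5, 18, 25, 33, 47, 60],
    "Fare": [15.0, 10.0, 80.0, 12.0, 55.0, 90.0],
    "Pclass": [2, 3, 1, 3, 2, 1],
})

# Colour each point by passenger class; a numeric c column also adds a colour bar
ax = df.plot(
    kind="scatter",
    x="Age",
    y="Fare",
    c="Pclass",
    colormap="viridis",
    alpha=0.7,
    title="Age vs Fare, coloured by class",
)

# The points live in the first collection on the Axes
n_points = len(ax.collections[0].get_offsets())
print(n_points)  # 6
plt.close("all")
```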

Bar plot

This bar plot shows the distribution of passengers across the different classes (Pclass), sorted by class with sort_index(), because value_counts() on its own orders the classes by passenger count.
The height of each bar indicates the number of passengers in each class.

# Bar plot sorted by class
titanic['Pclass'].value_counts().sort_index().plot(kind='bar', title='Passenger Class Distribution (Sorted)')
<Axes: title={'center': 'Passenger Class Distribution (Sorted)'}, xlabel='Pclass'>
[Figure: bar plot "Passenger Class Distribution (Sorted)"]
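The sketch below uses made-up data to show what `sort_index()` changes. It then splits each class by sex. `pd.crosstab` counts every (class, sex) pair, and a bar plot of the resulting DataFrame draws one group of bars per class with one colour per sex. This grouped view is one option among several, not part of the original example.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in data using Titanic-style column names
df = pd.DataFrame({
    "Pclass": [3, 1, 3, 2, 3, 1, 3, 1],
    "Sex": ["male", "female", "female", "male", "male", "male", "female", "female"],
})

# value_counts() orders classes by frequency; sort_index() restores 1, 2, 3
by_count = df["Pclass"].value_counts()
by_class = by_count.sort_index()
print(by_count.index.tolist())  # [3, 1, 2]
print(by_class.index.tolist())  # [1, 2, 3]

# Count each (class, sex) pair, then draw grouped bars
table = pd.crosstab(df["Pclass"], df["Sex"])
ax = table.plot(kind="bar", rot=0, title="Passengers by class and sex")
n_bars = len(ax.patches)  # 3 classes x 2 sexes
plt.close("all")
```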

Customising plots

The .plot() method is highly customisable. Besides the plot type, you can set the axis labels, title, colours, figure size, grid and legend directly in the call. Because .plot() returns a matplotlib Axes object, you can also refine the result afterwards with matplotlib methods. Together, these let you turn a quick exploratory chart into a polished figure with just a few lines of code.
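Here is a sketch that combines several of these options on made-up data. `xlabel`, `ylabel`, `figsize`, `grid` and `legend` are parameters of `DataFrame.plot` (`xlabel`/`ylabel` need pandas 1.1 or later). `color` and `marker` are passed through to matplotlib. The final `set_ylim` call is plain matplotlib applied to the returned Axes.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up data for illustration
df = pd.DataFrame({"year": [2020, 2021, 2022, 2023], "visitors": [150, 230, 310, 280]})

ax = df.plot(
    kind="line",
    x="year",
    y="visitors",
    title="Visitors per year",
    xlabel="Year",        # axis labels
    ylabel="Visitors",
    color="darkgreen",    # forwarded to matplotlib
    marker="o",           # forwarded to matplotlib
    figsize=(6, 4),       # figure size in inches
    grid=True,
    legend=False,         # a single series needs no legend
)

# Further refinement with matplotlib on the returned Axes
ax.set_ylim(0, 400)
```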


Key points

  • Pandas’ .plot() method is a tool for quickly generating a wide range of plots directly from your data.

  • It supports various types of plots, including line plots, bar charts, scatter plots, and histograms.

  • The .plot() method allows for both quick exploratory analysis and more polished, presentation-ready graphics.

  • For more elaborate and detailed plotting, consider using the matplotlib library directly. Pandas draws these plots with matplotlib, so the two can be combined.
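A common way to combine the two is to build the figure and axes with matplotlib and let pandas draw into them through the `ax` parameter. The sketch below does this on made-up data. The two-panel layout and the in-memory PNG are illustrative choices only.

```python
import io

import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows using Titanic-style column names
df = pd.DataFrame({
    "Age": [5, 18, 25, 33, 47, 60],
    "Fare": [15.0, 10.0, 80.0, 12.0, 55.0, 90.0],
})

# matplotlib creates the figure with two side-by-side Axes
fig, (left, right) = plt.subplots(1, 2, figsize=(10, 4))

# pandas draws into the Axes it is given via ax=
df["Age"].plot(kind="hist", bins=5, ax=left, title="Age")
df.plot(kind="scatter", x="Age", y="Fare", ax=right, title="Age vs Fare")

# Finish with plain matplotlib calls on the same figure
fig.suptitle("Two pandas plots in one matplotlib figure")
fig.tight_layout()

# Save to an in-memory buffer instead of a file
buf = io.BytesIO()
fig.savefig(buf, format="png")
png_bytes = buf.getvalue()
plt.close(fig)
```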