Getting Started with Pandas

Before we get started, you may need to install Pandas, depending on how and where you are running Python. This guide is intended for people running a local installation of Python and using Jupyter Notebook or JupyterLab.

There are many ways to install Pandas; this guide covers just one. You can read about the others in the official Pandas installation documentation.

Installing (or upgrading) Pandas from PyPI

Pandas can be installed from PyPI (the Python Package Index) using pip, Python’s package installer.

If you have not installed Pandas before, the following command downloads and installs the latest version of Pandas along with its required dependencies.

pip install pandas

pip commands are most often run through a shell such as the Terminal on macOS or the Command Prompt on Windows.
However, you can run shell commands directly from Jupyter notebooks by adding an exclamation mark ! in front of the command:

! pip install pandas
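One caveat is that a shell command such as pip may resolve to a different Python installation than the one your notebook is running on. The sketch below is one way to guard against that, not the only one: it builds the pip command from sys.executable, the interpreter that is currently running. The helper name pip_command is our own invention for illustration and is not part of pip or Pandas.

```python
import subprocess
import sys
from typing import List


def pip_command(*args: str) -> List[str]:
    """Build a pip command that runs under the current Python interpreter."""
    return [sys.executable, "-m", "pip", *args]


command = pip_command("install", "pandas")
print(" ".join(command))

# To actually run the installation, uncomment the next line:
# subprocess.check_call(command)
```

Running pip as "python -m pip" with the interpreter's full path makes it more likely that the package ends up in the environment your code is actually using.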

Adding the --upgrade flag installs Pandas if it is missing and updates it to the latest version if it is already installed. This is useful for making sure you have the newest features and bug fixes.

pip install --upgrade pandas

Again, we can run this shell command directly in Jupyter notebooks:

! pip install --upgrade pandas
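After installing or upgrading, it can be handy to check which version is actually present. The following is a minimal sketch using only Python's standard library (importlib.metadata, available from Python 3.8); the helper name installed_version is hypothetical and chosen just for this example.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the version string, or None if pandas is not installed.
print("pandas version:", installed_version("pandas"))
```

Because the helper returns None instead of raising an error, it can be run safely before Pandas has been installed.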

ModuleNotFoundError

When using Pandas, you may encounter a ModuleNotFoundError. Python raises this error when code tries to import a library that is not installed; in our case, the missing library is typically one of Pandas’ dependencies.

Pandas relies on other libraries to function correctly. These libraries are called dependencies: external libraries that provide functionality the main library (in this case, Pandas) needs in order to operate.

When you install Pandas using pip, the required dependencies are installed automatically. However, optional dependencies are not installed by default and must be installed separately if needed.
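To see what this error looks like in practice, the sketch below deliberately tries to import a module that does not exist and catches the resulting ModuleNotFoundError. The module name not_a_real_module_xyz is made up purely to trigger the error.

```python
import importlib

# 'not_a_real_module_xyz' is a made-up name used only to trigger the error.
missing = None
try:
    importlib.import_module("not_a_real_module_xyz")
except ModuleNotFoundError as exc:
    # The exception's `name` attribute holds the module that could not be found.
    missing = exc.name
    print(f"Missing module: {missing}")
```

The name of the missing module is usually the best hint for which package you need to install.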

In our case, we will be working with Excel files. All Excel-related dependencies can be installed in one go with this command:

pip install "pandas[excel]"

Or, if run in a Jupyter notebook:

! pip install "pandas[excel]"
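If you are unsure whether the Excel-related libraries are present, a quick check along the following lines may help. This is a sketch under assumptions: the module names in the list are examples of libraries commonly used for Excel files, not an authoritative list of what the excel extra installs, and check_modules is a hypothetical helper written for this guide.

```python
from importlib.util import find_spec
from typing import Dict, Iterable


def check_modules(names: Iterable[str]) -> Dict[str, bool]:
    """Map each top-level module name to True if it can be imported."""
    return {name: find_spec(name) is not None for name in names}


# Assumed examples of Excel-related packages; adjust the list to your needs.
excel_modules = ["openpyxl", "xlrd", "xlsxwriter"]

for name, available in check_modules(excel_modules).items():
    print(f"{name}: {'installed' if available else 'missing'}")
```

The check only looks for the modules without importing them, so it is fast and has no side effects.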

Importing Pandas

Even though we have now installed Pandas, we still need to import it whenever we want to use it in our code.
This is conventionally done at the top of the script (together with any other imports) to make it easier for future readers (including ourselves!) to see which packages are used.

Aliasing pandas as pd is a widely adopted convention that shortens the code needed to access its functionality.
After the import statement, you can use pd to access everything the pandas library provides.

# This line imports the pandas library and aliases it as 'pd'.
import pandas as pd
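As a quick check that the import worked, the short sketch below uses the pd alias to print the installed version and to build a small table. The data is made up for illustration only.

```python
import pandas as pd

# A tiny, made-up table used only for illustration.
df = pd.DataFrame({"name": ["Ada", "Grace"], "year": [1815, 1906]})

print(pd.__version__)  # the installed pandas version
print(df.shape)        # (number of rows, number of columns)
```

If both lines print without an error, Pandas is installed and ready to use.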
