Getting Started with Pandas
Before we get started, you may need to install Pandas, depending on how and where you are running Python. This guide is intended for people running a local installation of Python and using Jupyter Notebook or JupyterLab.
There are many ways to install Pandas. Ours is just one. You can read more here.
Installing (or upgrading) Pandas from PyPI
Pandas can be installed via pip from PyPI.
The command below installs the pandas library using pip, Python’s package installer.
If you have not installed Pandas before, this will download and install the latest version of Pandas along with its dependencies.
pip install pandas
pip commands are most often run through a shell such as the Terminal on macOS or the Command Prompt on Windows.
However, you can run shell commands directly from Jupyter notebooks by adding an exclamation mark ! in front of the command:
! pip install pandas
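On some setups the pip found by the shell belongs to a different Python installation than the one the notebook kernel runs on, so the package ends up somewhere the notebook cannot see. A minimal sketch of one way around this, assuming you are happy to call pip from Python itself, is to run pip as a module of the current interpreter. The helper names pip_install_command and pip_install are made up for this illustration and are not part of pip or Pandas.

```python
import subprocess
import sys


def pip_install_command(package, *pip_args):
    """Build a 'pip install' command for the interpreter running this code.

    Hypothetical helper for illustration: sys.executable is the Python
    that executes this cell, so pip installs into that same environment.
    """
    return [sys.executable, "-m", "pip", "install", *pip_args, package]


def pip_install(package, *pip_args):
    """Run the command; raises subprocess.CalledProcessError if pip fails."""
    subprocess.run(pip_install_command(package, *pip_args), check=True)


# Show the command without running it.
print(pip_install_command("pandas"))

# To actually install, uncomment the next line.
# pip_install("pandas")
```

Building the command as a list avoids shell quoting issues, and keeping the install call commented out means the cell is safe to run as-is.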
Adding the --upgrade flag installs pandas if it is missing and, if Pandas is already installed, updates it to the latest version. This is useful to make sure you have the newest features and bug fixes.
pip install --upgrade pandas
Again, we can run this shell command directly in Jupyter notebooks:
! pip install --upgrade pandas
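To see whether an install or upgrade took effect, you can ask Python which version it finds. The sketch below uses the standard library module importlib.metadata; the helper name installed_version is only an illustrative choice.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version of a package as a string, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the version string, or None if pandas is not installed
# in the environment this code runs in.
print("pandas version:", installed_version("pandas"))
```

Running this before and after the upgrade command is a quick way to confirm that the version changed.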
ModuleNotFoundError
When using Pandas, you may encounter the error message ModuleNotFoundError. This is often caused by a missing Pandas dependency.
When you install the pandas library for Python, it requires other libraries to function correctly. These required libraries are called dependencies.
Dependencies are external libraries that provide additional functionality or capabilities that the main library (in this case, Pandas) relies on to operate.
When you install Pandas using pip, Python’s package installer, it automatically installs any required dependencies. However, optional dependencies are not installed by default and must be installed separately if needed.
You can read more about Pandas dependencies here.
In our case, we will be working with Excel files. All Excel-related dependencies can be installed in one go with this command:
pip install "pandas[excel]"
Or - if run in a Jupyter notebook:
! pip install "pandas[excel]"
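Before reading or writing Excel files, it can help to check which optional modules are actually importable. The sketch below uses importlib.util.find_spec from the standard library. The helper name missing_modules is hypothetical, and the two module names are only assumed examples of Excel-related packages; the full set depends on your Pandas version and the file types you work with.

```python
from importlib.util import find_spec


def missing_modules(module_names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in module_names if find_spec(name) is None]


# Assumed examples of Excel-related modules; adjust to the ones you need.
excel_modules = ["openpyxl", "xlrd"]

missing = missing_modules(excel_modules)
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All listed Excel modules are available.")
```

If anything is reported as not installed, running the install command above should resolve it.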
Importing Pandas
Even though we have now installed Pandas, we still need to import Pandas when we want to use it in our code.
This is conventionally done at the top of the script (together with any other imports) to make it easier for future readers (including ourselves!) to see which packages are used.
Aliasing pandas as pd is a widely adopted convention that shortens the syntax for accessing its functionality.
After the import statement below, you can use pd to access everything the pandas library provides.
# This line imports the pandas library and aliases it as 'pd'.
import pandas as pd
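As a quick check that the import and the alias work, the sketch below builds a tiny table through pd. The column names and values are made up purely for illustration.

```python
import pandas as pd

# A tiny made-up table, just to show that the alias works.
df = pd.DataFrame({"name": ["Ada", "Grace", "Linus"], "age": [36, 45, 28]})

print(pd.__version__)  # version of the imported pandas library
print(df.shape)        # (3, 2): three rows, two columns
print(df)
```

If this runs without errors, Pandas is installed correctly and available as pd in your notebook.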