{
"cells": [
{
"cell_type": "markdown",
"id": "513cfe15-50b1-4385-a0a9-16b337d970bc",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": ""
},
"tags": []
},
"source": [
"# Subsets"
]
},
{
"cell_type": "markdown",
"id": "f9ddee47-e8ab-4f68-b283-2e6c71b6aa52",
"metadata": {},
"source": [
"```{admonition} Learning Objectives\n",
"\n",
"Questions:\n",
"* How can we select and filter specific columns and rows in a `DataFrame`?\n",
"\n",
"Objectives:\n",
"* Select specific columns from a `DataFrame` using square bracket notation (`[]`).\n",
"* Use the `loc` property for label-based indexing in a `DataFrame`.\n",
"* Use the `iloc` property for position-based indexing in a `DataFrame`.\n",
"* Filter rows based on the presence or absence of missing values.\n",
"* Understand the differences and complementary uses of `.loc[]` and `.iloc[]`. "
]
},
{
"cell_type": "markdown",
"id": "b90cc6fd-2f57-4f94-bff7-cf6b0aace5f5",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": ""
},
"tags": []
},
"source": [
"## Import Pandas"
]
},
{
"cell_type": "markdown",
"id": "75fa3e37-d901-4cdc-8822-5b6b0102ce4e",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": ""
},
"tags": [
"remove-cell"
]
},
"source": [
"Aliasing pandas as `pd` is a widely adopted convention that simplifies the syntax for accessing its functionalities.\\\n",
"After this statement, you can use `pd` to access all the functionalities provided by the pandas library."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "765b4040-4b8a-420d-bad5-219654bc6174",
"metadata": {
"tags": [
"hide-output"
]
},
"outputs": [],
"source": [
"# This line imports the pandas library and aliases it as 'pd'.\n",
"\n",
"import pandas as pd"
]
},
{
"cell_type": "markdown",
"id": "4a72659e-005b-4390-a8fe-4dd61b356945",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": ""
},
"tags": []
},
"source": [
"___"
]
},
{
"cell_type": "markdown",
"id": "b4f7a487-146c-4573-904e-45601338891f",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": ""
},
"tags": []
},
"source": [
"## Creating a `DataFrame` from a CSV file"
]
},
{
"cell_type": "markdown",
"id": "f1e434ec-fdb0-4c8a-89a4-7281de89308f",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": ""
},
"tags": []
},
"source": [
"The `pd.read_csv()` function is used to read the data from the file 'data/titanic.csv'."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "fe1c7a91-0caf-4eae-bf55-0938bcdaa851",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": ""
},
"tags": [
"hide-output"
]
},
"outputs": [],
"source": [
"# Load the Titanic dataset from a CSV file into a DataFrame named 'titanic'.\n",
"\n",
"titanic = pd.read_csv('data/titanic.csv')"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "9d5a2159-bb3b-4a20-94be-21ea37755071",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": ""
},
"tags": [
"hide-output"
]
},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" PassengerId | \n",
" Survived | \n",
" Pclass | \n",
" Name | \n",
" Sex | \n",
" Age | \n",
" SibSp | \n",
" Parch | \n",
" Ticket | \n",
" Fare | \n",
" Cabin | \n",
" Embarked | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 1 | \n",
" 0 | \n",
" 3 | \n",
" Braund, Mr. Owen Harris | \n",
" male | \n",
" 22.0 | \n",
" 1 | \n",
" 0 | \n",
" A/5 21171 | \n",
" 7.2500 | \n",
" NaN | \n",
" S | \n",
"
\n",
" \n",
" 1 | \n",
" 2 | \n",
" 1 | \n",
" 1 | \n",
" Cumings, Mrs. John Bradley (Florence Briggs Th... | \n",
" female | \n",
" 38.0 | \n",
" 1 | \n",
" 0 | \n",
" PC 17599 | \n",
" 71.2833 | \n",
" C85 | \n",
" C | \n",
"
\n",
" \n",
" 2 | \n",
" 3 | \n",
" 1 | \n",
" 3 | \n",
" Heikkinen, Miss. Laina | \n",
" female | \n",
" 26.0 | \n",
" 0 | \n",
" 0 | \n",
" STON/O2. 3101282 | \n",
" 7.9250 | \n",
" NaN | \n",
" S | \n",
"
\n",
" \n",
" 3 | \n",
" 4 | \n",
" 1 | \n",
" 1 | \n",
" Futrelle, Mrs. Jacques Heath (Lily May Peel) | \n",
" female | \n",
" 35.0 | \n",
" 1 | \n",
" 0 | \n",
" 113803 | \n",
" 53.1000 | \n",
" C123 | \n",
" S | \n",
"
\n",
" \n",
" 4 | \n",
" 5 | \n",
" 0 | \n",
" 3 | \n",
" Allen, Mr. William Henry | \n",
" male | \n",
" 35.0 | \n",
" 0 | \n",
" 0 | \n",
" 373450 | \n",
" 8.0500 | \n",
" NaN | \n",
" S | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 886 | \n",
" 887 | \n",
" 0 | \n",
" 2 | \n",
" Montvila, Rev. Juozas | \n",
" male | \n",
" 27.0 | \n",
" 0 | \n",
" 0 | \n",
" 211536 | \n",
" 13.0000 | \n",
" NaN | \n",
" S | \n",
"
\n",
" \n",
" 887 | \n",
" 888 | \n",
" 1 | \n",
" 1 | \n",
" Graham, Miss. Margaret Edith | \n",
" female | \n",
" 19.0 | \n",
" 0 | \n",
" 0 | \n",
" 112053 | \n",
" 30.0000 | \n",
" B42 | \n",
" S | \n",
"
\n",
" \n",
" 888 | \n",
" 889 | \n",
" 0 | \n",
" 3 | \n",
" Johnston, Miss. Catherine Helen \"Carrie\" | \n",
" female | \n",
" NaN | \n",
" 1 | \n",
" 2 | \n",
" W./C. 6607 | \n",
" 23.4500 | \n",
" NaN | \n",
" S | \n",
"
\n",
" \n",
" 889 | \n",
" 890 | \n",
" 1 | \n",
" 1 | \n",
" Behr, Mr. Karl Howell | \n",
" male | \n",
" 26.0 | \n",
" 0 | \n",
" 0 | \n",
" 111369 | \n",
" 30.0000 | \n",
" C148 | \n",
" C | \n",
"
\n",
" \n",
" 890 | \n",
" 891 | \n",
" 0 | \n",
" 3 | \n",
" Dooley, Mr. Patrick | \n",
" male | \n",
" 32.0 | \n",
" 0 | \n",
" 0 | \n",
" 370376 | \n",
" 7.7500 | \n",
" NaN | \n",
" Q | \n",
"
\n",
" \n",
"
\n",
"
891 rows × 12 columns
\n",
"
"
],
"text/plain": [
" PassengerId Survived Pclass \\\n",
"0 1 0 3 \n",
"1 2 1 1 \n",
"2 3 1 3 \n",
"3 4 1 1 \n",
"4 5 0 3 \n",
".. ... ... ... \n",
"886 887 0 2 \n",
"887 888 1 1 \n",
"888 889 0 3 \n",
"889 890 1 1 \n",
"890 891 0 3 \n",
"\n",
" Name Sex Age SibSp \\\n",
"0 Braund, Mr. Owen Harris male 22.0 1 \n",
"1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 \n",
"2 Heikkinen, Miss. Laina female 26.0 0 \n",
"3 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 \n",
"4 Allen, Mr. William Henry male 35.0 0 \n",
".. ... ... ... ... \n",
"886 Montvila, Rev. Juozas male 27.0 0 \n",
"887 Graham, Miss. Margaret Edith female 19.0 0 \n",
"888 Johnston, Miss. Catherine Helen \"Carrie\" female NaN 1 \n",
"889 Behr, Mr. Karl Howell male 26.0 0 \n",
"890 Dooley, Mr. Patrick male 32.0 0 \n",
"\n",
" Parch Ticket Fare Cabin Embarked \n",
"0 0 A/5 21171 7.2500 NaN S \n",
"1 0 PC 17599 71.2833 C85 C \n",
"2 0 STON/O2. 3101282 7.9250 NaN S \n",
"3 0 113803 53.1000 C123 S \n",
"4 0 373450 8.0500 NaN S \n",
".. ... ... ... ... ... \n",
"886 0 211536 13.0000 NaN S \n",
"887 0 112053 30.0000 B42 S \n",
"888 2 W./C. 6607 23.4500 NaN S \n",
"889 0 111369 30.0000 C148 C \n",
"890 0 370376 7.7500 NaN Q \n",
"\n",
"[891 rows x 12 columns]"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Display the DataFrame 'titanic'.\n",
"\n",
"titanic"
]
},
{
"cell_type": "markdown",
"id": "13331c76-ef8d-4881-9229-b6dbf6a34447",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": ""
},
"tags": []
},
"source": [
"___"
]
},
{
"cell_type": "markdown",
"id": "688e5d6e-f20e-41ac-b44f-aaea410cd399",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": ""
},
"tags": []
},
"source": [
"## Selecting specific columns\n",
"\n",
""
]
},
{
"cell_type": "markdown",
"id": "13a500a8-9996-4478-8b37-496110b587d1",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": ""
},
"tags": []
},
"source": [
"Selecting specific columns is useful when you want to work with specific subsets of the dataset e.g. you might want to perform statistical calculations based on only the colum `Age` or later on when you want to access and manipulate your data in more complex ways. \n",
"\n",
"Access the 'Age' column from the `DataFrame` 'titanic' to return a `Series` object containing all the data in the 'Age' column:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "6752e1f0-6227-4128-b2f5-323c429ffe5c",
"metadata": {
"tags": [
"hide-output"
]
},
"outputs": [
{
"data": {
"text/plain": [
"0 22.0\n",
"1 38.0\n",
"2 26.0\n",
"3 35.0\n",
"4 35.0\n",
" ... \n",
"886 27.0\n",
"887 19.0\n",
"888 NaN\n",
"889 26.0\n",
"890 32.0\n",
"Name: Age, Length: 891, dtype: float64"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"titanic['Age']"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "0e3d6cdf-0b96-489c-9271-b0dcf887682d",
"metadata": {
"tags": [
"hide-output"
]
},
"outputs": [
{
"data": {
"text/plain": [
"pandas.core.series.Series"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Check the type of the 'Age' column in 'titanic' using the 'type()' function.\n",
"\n",
"type(titanic['Age'])"
]
},
{
"cell_type": "markdown",
"id": "f2c9d264-6190-49dd-96c5-ea5ffb177a41",
"metadata": {},
"source": [
"Use the `shape` attribute to determine the dimensions of the `Series`.\\\n",
"It returns a tuple representing the number of rows and columns (rows, columns)."
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "ddcdfad9-cfb3-413c-bf2a-f2b485ec0d50",
"metadata": {
"tags": [
"hide-output"
]
},
"outputs": [
{
"data": {
"text/plain": [
"(891,)"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"titanic['Age'].shape"
]
},
{
"cell_type": "markdown",
"id": "1b48c4ac-45cf-449b-bcf1-da1ef4408e41",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": ""
},
"tags": []
},
"source": [
"___"
]
},
{
"cell_type": "markdown",
"id": "54af87e3-2151-4b56-9cca-6fb72cada392",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": ""
},
"tags": []
},
"source": [
"### Calling multiple `Series`"
]
},
{
"cell_type": "markdown",
"id": "14237564-b732-47a6-a52d-c2af0c00694a",
"metadata": {},
"source": [
"Select columns 'Age' and 'Sex' from the 'titanic' `DataFrame` using double square brackets:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "dc8709c5-0b04-4869-8668-cc5144571fd5",
"metadata": {
"tags": [
"hide-output"
]
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Age | \n",
" Sex | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 22.0 | \n",
" male | \n",
"
\n",
" \n",
" 1 | \n",
" 38.0 | \n",
" female | \n",
"
\n",
" \n",
" 2 | \n",
" 26.0 | \n",
" female | \n",
"
\n",
" \n",
" 3 | \n",
" 35.0 | \n",
" female | \n",
"
\n",
" \n",
" 4 | \n",
" 35.0 | \n",
" male | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 886 | \n",
" 27.0 | \n",
" male | \n",
"
\n",
" \n",
" 887 | \n",
" 19.0 | \n",
" female | \n",
"
\n",
" \n",
" 888 | \n",
" NaN | \n",
" female | \n",
"
\n",
" \n",
" 889 | \n",
" 26.0 | \n",
" male | \n",
"
\n",
" \n",
" 890 | \n",
" 32.0 | \n",
" male | \n",
"
\n",
" \n",
"
\n",
"
891 rows × 2 columns
\n",
"
"
],
"text/plain": [
" Age Sex\n",
"0 22.0 male\n",
"1 38.0 female\n",
"2 26.0 female\n",
"3 35.0 female\n",
"4 35.0 male\n",
".. ... ...\n",
"886 27.0 male\n",
"887 19.0 female\n",
"888 NaN female\n",
"889 26.0 male\n",
"890 32.0 male\n",
"\n",
"[891 rows x 2 columns]"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Select the columns 'Age' and 'Sex' from the 'titanic' DataFrame.\n",
"\n",
"titanic[['Age', 'Sex']]"
]
},
{
"cell_type": "markdown",
"id": "2f8e7d7d-4ae1-4542-8caf-31787d7d8081",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": ""
},
"tags": []
},
"source": [
"```{note}\n",
"The inner square brackets define a Python `list` with column names, whereas the outer square brackets are used to select the data from a pandas `DataFrame` as seen in the previous example."
]
},
{
"cell_type": "markdown",
"id": "8190374b-8681-48c0-a0cc-028dd9e1f936",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": ""
},
"tags": []
},
"source": [
"Select columns 'Age' and 'Sex' from the 'titanic' `DataFrame` using double square brackets, and then apply the `type()` function to the resulting `DataFrame` subset:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "41856f83-9f98-4b7a-9800-4acf1c222588",
"metadata": {
"tags": [
"hide-output"
]
},
"outputs": [
{
"data": {
"text/plain": [
"pandas.core.frame.DataFrame"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Use the 'type()' function to determine the data type of the DataFrame subset.\n",
"\n",
"type(titanic[['Age', 'Sex']])"
]
},
{
"cell_type": "markdown",
"id": "e6c073f3-f58f-462c-98f1-d027cb0dfc41",
"metadata": {},
"source": [
"Select columns 'Age' and 'Sex' from the 'titanic' `DataFrame` using double square brackets, and then apply the `shape` attribute to the resulting `DataFrame` subset:"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "b0388ea7-0e3e-4ebe-b9e7-dad3e8271928",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": ""
},
"tags": [
"hide-output"
]
},
"outputs": [
{
"data": {
"text/plain": [
"(891, 2)"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Use the 'shape' attribute to determine the dimensions of the DataFrame subset.\n",
"\n",
"titanic[['Age', 'Sex']].shape"
]
},
{
"cell_type": "markdown",
"id": "6cb6bcc8-8a2a-428f-9102-394ee6c333d3",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": ""
},
"tags": []
},
"source": [
"___"
]
},
{
"cell_type": "markdown",
"id": "6035c0e3-bd8f-4b93-bb6a-65ed95db22ea",
"metadata": {},
"source": [
"## Introducing `.loc[]` and `.iloc[]`"
]
},
{
"cell_type": "markdown",
"id": "975963cb-241a-4036-b90e-c3c1290511a4",
"metadata": {},
"source": [
"Now that you’ve learned how to select specific columns from a `DataFrame`, it’s time to explore more advanced and flexible ways to access and manipulate your data. As you start filtering rows based on conditions or need to select both rows and columns simultaneously, using `.loc[]` and `.iloc[]` will give you greater control and clarity in your code.\n",
"\n",
"These powerful tools allow you to:\n",
"\n",
"* Safely and explicitly select rows and columns by labels with `.loc[]`.\n",
"* Access rows and columns by their integer positions using `.iloc[]`.\n",
"* Avoid potential issues that can arise with more complex operations, ensuring your data manipulations are clear, consistent, and reliable.\n",
"\n",
"With these methods, you’ll be better equipped to handle more complex data tasks efficiently."
]
},
{
"cell_type": "markdown",
"id": "b4d65d36-9d60-41a2-adcf-ba6ac1c5e2e3",
"metadata": {},
"source": [
"```{admonition} What exactly are loc and iloc?\n",
":class: tip dropdown\n",
"\n",
"`loc` and `iloc` in Pandas are technically properties that return a specialized indexer object. These indexer objects allow for label-based (`loc`) and integer-based (`iloc`) indexing. Although they behave similarly to methods in that you use them to perform actions (i.e., selecting or slicing data), they are properties because they return an indexing object rather than directly executing a function.\n",
"\n",
"When you access `loc` or `iloc` on a `DataFrame`, what you are getting is not a simple value but an object that has its own methods and behaviors, specifically designed to handle indexing operations. This design allows for the concise and powerful data selection syntax that Pandas is known for.\n",
"```"
]
},
{
"cell_type": "markdown",
"id": "e0f2bd5c-8ea4-4fd9-9d4a-2c527ab072ea",
"metadata": {},
"source": [
"```{admonition} Avoid ambiguous operations\n",
":class: warning dropdown\n",
"\n",
"Some of the operations we’ll be covering can technically be performed without using `.loc[]` or `.iloc[]`. For instance, you might see data selections or modifications done with simple square brackets like `df[df['column'] > value]`. While this can work for basic tasks, it can lead to issues in more complex scenarios, such as unexpected behavior, ambiguity, or performance inefficiencies.\n",
"\n",
"**Why is this a bad idea?**\n",
"\n",
"- **Ambiguity**: It’s not always clear if you're modifying a copy of your data or the original `DataFrame`, which can lead to unintended side effects.\n",
"- **Potential errors**: Using square brackets for complex selections might trigger warnings like `SettingWithCopyWarning`, indicating that your code might not be doing what you think it is.\n",
"- **Lack of flexibility**: As your data operations become more complex, relying on square brackets can limit your ability to write clear and maintainable code.\n",
"\n",
"For these reasons, we recommend using `.loc[]` and `.iloc[]` as they provide a more reliable and explicit way to handle data selection and modification in Pandas.\n",
"\n",
"If you’re interested in learning more about the potential pitfalls of using square brackets and the benefits of `.loc[]` and `.iloc[]`, you can read more in the [Pandas documentation on indexing and selecting data](https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html).\n",
"```"
]
},
{
"cell_type": "markdown",
"id": "99663d1a-3f33-44fa-9387-1e9b1766748d",
"metadata": {},
"source": [
"___"
]
},
{
"cell_type": "markdown",
"id": "961bdfdf-11b0-474a-bd74-4500fad73448",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": ""
},
"tags": []
},
"source": [
"## Filtering rows based on conditional expressions\n",
"\n",
""
]
},
{
"cell_type": "markdown",
"id": "7ece850f-2344-49d6-8dfb-6fed8c58f18c",
"metadata": {},
"source": [
"With `.loc[]` and `.iloc[]`, you have more control and flexibility when filtering rows and selecting columns in a `DataFrame`.\n",
"\n",
"Let’s start by exploring how you can use these tools to filter rows based on conditions."
]
},
{
"cell_type": "markdown",
"id": "903513f6-4ba8-4a19-850f-918b5dd97b21",
"metadata": {},
"source": [
"### Filtering specific rows with `.loc[]`\n",
"\n",
"To filter rows based on a condition, you can place the condition inside the `.loc[]` selector.\n",
"\n",
"Here’s how you can filter rows where the `Age` column is greater than 35:"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "60ff0a11-3cbe-48e0-8ee0-28fd8d492d45",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": ""
},
"tags": [
"hide-output"
]
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" PassengerId | \n",
" Survived | \n",
" Pclass | \n",
" Name | \n",
" Sex | \n",
" Age | \n",
" SibSp | \n",
" Parch | \n",
" Ticket | \n",
" Fare | \n",
" Cabin | \n",
" Embarked | \n",
"
\n",
" \n",
" \n",
" \n",
" 1 | \n",
" 2 | \n",
" 1 | \n",
" 1 | \n",
" Cumings, Mrs. John Bradley (Florence Briggs Th... | \n",
" female | \n",
" 38.0 | \n",
" 1 | \n",
" 0 | \n",
" PC 17599 | \n",
" 71.2833 | \n",
" C85 | \n",
" C | \n",
"
\n",
" \n",
" 6 | \n",
" 7 | \n",
" 0 | \n",
" 1 | \n",
" McCarthy, Mr. Timothy J | \n",
" male | \n",
" 54.0 | \n",
" 0 | \n",
" 0 | \n",
" 17463 | \n",
" 51.8625 | \n",
" E46 | \n",
" S | \n",
"
\n",
" \n",
" 11 | \n",
" 12 | \n",
" 1 | \n",
" 1 | \n",
" Bonnell, Miss. Elizabeth | \n",
" female | \n",
" 58.0 | \n",
" 0 | \n",
" 0 | \n",
" 113783 | \n",
" 26.5500 | \n",
" C103 | \n",
" S | \n",
"
\n",
" \n",
" 13 | \n",
" 14 | \n",
" 0 | \n",
" 3 | \n",
" Andersson, Mr. Anders Johan | \n",
" male | \n",
" 39.0 | \n",
" 1 | \n",
" 5 | \n",
" 347082 | \n",
" 31.2750 | \n",
" NaN | \n",
" S | \n",
"
\n",
" \n",
" 15 | \n",
" 16 | \n",
" 1 | \n",
" 2 | \n",
" Hewlett, Mrs. (Mary D Kingcome) | \n",
" female | \n",
" 55.0 | \n",
" 0 | \n",
" 0 | \n",
" 248706 | \n",
" 16.0000 | \n",
" NaN | \n",
" S | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 865 | \n",
" 866 | \n",
" 1 | \n",
" 2 | \n",
" Bystrom, Mrs. (Karolina) | \n",
" female | \n",
" 42.0 | \n",
" 0 | \n",
" 0 | \n",
" 236852 | \n",
" 13.0000 | \n",
" NaN | \n",
" S | \n",
"
\n",
" \n",
" 871 | \n",
" 872 | \n",
" 1 | \n",
" 1 | \n",
" Beckwith, Mrs. Richard Leonard (Sallie Monypeny) | \n",
" female | \n",
" 47.0 | \n",
" 1 | \n",
" 1 | \n",
" 11751 | \n",
" 52.5542 | \n",
" D35 | \n",
" S | \n",
"
\n",
" \n",
" 873 | \n",
" 874 | \n",
" 0 | \n",
" 3 | \n",
" Vander Cruyssen, Mr. Victor | \n",
" male | \n",
" 47.0 | \n",
" 0 | \n",
" 0 | \n",
" 345765 | \n",
" 9.0000 | \n",
" NaN | \n",
" S | \n",
"
\n",
" \n",
" 879 | \n",
" 880 | \n",
" 1 | \n",
" 1 | \n",
" Potter, Mrs. Thomas Jr (Lily Alexenia Wilson) | \n",
" female | \n",
" 56.0 | \n",
" 0 | \n",
" 1 | \n",
" 11767 | \n",
" 83.1583 | \n",
" C50 | \n",
" C | \n",
"
\n",
" \n",
" 885 | \n",
" 886 | \n",
" 0 | \n",
" 3 | \n",
" Rice, Mrs. William (Margaret Norton) | \n",
" female | \n",
" 39.0 | \n",
" 0 | \n",
" 5 | \n",
" 382652 | \n",
" 29.1250 | \n",
" NaN | \n",
" Q | \n",
"
\n",
" \n",
"
\n",
"
217 rows × 12 columns
\n",
"
"
],
"text/plain": [
" PassengerId Survived Pclass \\\n",
"1 2 1 1 \n",
"6 7 0 1 \n",
"11 12 1 1 \n",
"13 14 0 3 \n",
"15 16 1 2 \n",
".. ... ... ... \n",
"865 866 1 2 \n",
"871 872 1 1 \n",
"873 874 0 3 \n",
"879 880 1 1 \n",
"885 886 0 3 \n",
"\n",
" Name Sex Age SibSp \\\n",
"1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 \n",
"6 McCarthy, Mr. Timothy J male 54.0 0 \n",
"11 Bonnell, Miss. Elizabeth female 58.0 0 \n",
"13 Andersson, Mr. Anders Johan male 39.0 1 \n",
"15 Hewlett, Mrs. (Mary D Kingcome) female 55.0 0 \n",
".. ... ... ... ... \n",
"865 Bystrom, Mrs. (Karolina) female 42.0 0 \n",
"871 Beckwith, Mrs. Richard Leonard (Sallie Monypeny) female 47.0 1 \n",
"873 Vander Cruyssen, Mr. Victor male 47.0 0 \n",
"879 Potter, Mrs. Thomas Jr (Lily Alexenia Wilson) female 56.0 0 \n",
"885 Rice, Mrs. William (Margaret Norton) female 39.0 0 \n",
"\n",
" Parch Ticket Fare Cabin Embarked \n",
"1 0 PC 17599 71.2833 C85 C \n",
"6 0 17463 51.8625 E46 S \n",
"11 0 113783 26.5500 C103 S \n",
"13 5 347082 31.2750 NaN S \n",
"15 0 248706 16.0000 NaN S \n",
".. ... ... ... ... ... \n",
"865 0 236852 13.0000 NaN S \n",
"871 1 11751 52.5542 D35 S \n",
"873 0 345765 9.0000 NaN S \n",
"879 1 11767 83.1583 C50 C \n",
"885 5 382652 29.1250 NaN Q \n",
"\n",
"[217 rows x 12 columns]"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Filter rows in the 'titanic' DataFrame where the 'Age' column is greater than 35.\n",
"\n",
"titanic.loc[titanic['Age'] > 35]"
]
},
{
"cell_type": "markdown",
"id": "38ac84cf-9f52-4c01-bca9-e46fb87d66c1",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": ""
},
"tags": []
},
"source": [
"This command filters the rows where the condition `titanic['Age'] > 35` is `True`, and returns the entire `DataFrame` for those rows.\n",
"\n",
"If you only want to select specific columns (e.g., `Name` and `Pclass`) for those filtered rows, you can add the column names after the comma:"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "13936ecd-9c28-45bd-9edb-e4727a81d081",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": ""
},
"tags": [
"hide-output"
]
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Name | \n",
" Pclass | \n",
"
\n",
" \n",
" \n",
" \n",
" 1 | \n",
" Cumings, Mrs. John Bradley (Florence Briggs Th... | \n",
" 1 | \n",
"
\n",
" \n",
" 6 | \n",
" McCarthy, Mr. Timothy J | \n",
" 1 | \n",
"
\n",
" \n",
" 11 | \n",
" Bonnell, Miss. Elizabeth | \n",
" 1 | \n",
"
\n",
" \n",
" 13 | \n",
" Andersson, Mr. Anders Johan | \n",
" 3 | \n",
"
\n",
" \n",
" 15 | \n",
" Hewlett, Mrs. (Mary D Kingcome) | \n",
" 2 | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 865 | \n",
" Bystrom, Mrs. (Karolina) | \n",
" 2 | \n",
"
\n",
" \n",
" 871 | \n",
" Beckwith, Mrs. Richard Leonard (Sallie Monypeny) | \n",
" 1 | \n",
"
\n",
" \n",
" 873 | \n",
" Vander Cruyssen, Mr. Victor | \n",
" 3 | \n",
"
\n",
" \n",
" 879 | \n",
" Potter, Mrs. Thomas Jr (Lily Alexenia Wilson) | \n",
" 1 | \n",
"
\n",
" \n",
" 885 | \n",
" Rice, Mrs. William (Margaret Norton) | \n",
" 3 | \n",
"
\n",
" \n",
"
\n",
"
217 rows × 2 columns
\n",
"
"
],
"text/plain": [
" Name Pclass\n",
"1 Cumings, Mrs. John Bradley (Florence Briggs Th... 1\n",
"6 McCarthy, Mr. Timothy J 1\n",
"11 Bonnell, Miss. Elizabeth 1\n",
"13 Andersson, Mr. Anders Johan 3\n",
"15 Hewlett, Mrs. (Mary D Kingcome) 2\n",
".. ... ...\n",
"865 Bystrom, Mrs. (Karolina) 2\n",
"871 Beckwith, Mrs. Richard Leonard (Sallie Monypeny) 1\n",
"873 Vander Cruyssen, Mr. Victor 3\n",
"879 Potter, Mrs. Thomas Jr (Lily Alexenia Wilson) 1\n",
"885 Rice, Mrs. William (Margaret Norton) 3\n",
"\n",
"[217 rows x 2 columns]"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Filter rows where 'Age' is greater than 35 and select only the 'Name' and 'Pclass' columns.\n",
"\n",
"titanic.loc[titanic['Age'] > 35, ['Name', 'Pclass']]"
]
},
{
"cell_type": "markdown",
"id": "346705f6-3f5a-4f50-a4c2-23e05aa9f353",
"metadata": {},
"source": [
"```{admonition} When using loc/iloc\n",
":class: tip\n",
"\n",
"The part *before the comma* is the *rows* you want to select.\n",
"\n",
"The part *after the comma* is the *columns* you want to select.\n",
"```"
]
},
{
"cell_type": "markdown",
"id": "39b84234-a6f9-48c8-82a6-e9515d0afa7e",
"metadata": {},
"source": [
"### Using `.iloc[]` for positional indexing\n",
"\n",
"The `.iloc[]` method works similarly to `.loc[]`, but it selects rows and columns based on their integer positions rather than labels.\n",
"\n",
"For example, to select the first 5 rows and the first 2 columns, you can use:"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "7dad5185-69b3-41ce-9de0-626ea02ef6b4",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": ""
},
"tags": [
"hide-output"
]
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" PassengerId | \n",
" Survived | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 1 | \n",
" 0 | \n",
"
\n",
" \n",
" 1 | \n",
" 2 | \n",
" 1 | \n",
"
\n",
" \n",
" 2 | \n",
" 3 | \n",
" 1 | \n",
"
\n",
" \n",
" 3 | \n",
" 4 | \n",
" 1 | \n",
"
\n",
" \n",
" 4 | \n",
" 5 | \n",
" 0 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" PassengerId Survived\n",
"0 1 0\n",
"1 2 1\n",
"2 3 1\n",
"3 4 1\n",
"4 5 0"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Select the first 5 rows and the first 2 columns using .iloc[].\n",
"\n",
"titanic.iloc[:5, :2]"
]
},
{
"cell_type": "markdown",
"id": "c3762ceb-ae60-4877-a117-03260cbe3c7a",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": ""
},
"tags": []
},
"source": [
"### Inclusive vs. Exclusive Slicing\n",
"\n",
"When slicing with `.loc[]`, the slicing is **inclusive** of the start and end labels. For example:"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "f1d37a3e-c4c1-4687-99b1-6366fb1b83bb",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": ""
},
"tags": [
"hide-output"
]
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Name | \n",
" Pclass | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" Braund, Mr. Owen Harris | \n",
" 3 | \n",
"
\n",
" \n",
" 1 | \n",
" Cumings, Mrs. John Bradley (Florence Briggs Th... | \n",
" 1 | \n",
"
\n",
" \n",
" 2 | \n",
" Heikkinen, Miss. Laina | \n",
" 3 | \n",
"
\n",
" \n",
" 3 | \n",
" Futrelle, Mrs. Jacques Heath (Lily May Peel) | \n",
" 1 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Name Pclass\n",
"0 Braund, Mr. Owen Harris 3\n",
"1 Cumings, Mrs. John Bradley (Florence Briggs Th... 1\n",
"2 Heikkinen, Miss. Laina 3\n",
"3 Futrelle, Mrs. Jacques Heath (Lily May Peel) 1"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Select rows 0 through 3 (inclusive) and columns 'Name' and 'Pclass'.\n",
"\n",
"titanic.loc[0:3, ['Name', 'Pclass']]"
]
},
{
"cell_type": "markdown",
"id": "47842350-bee8-4683-91f0-da8905771656",
"metadata": {},
"source": [
"This command returns rows 0, 1, 2, and 3, including both the start and end indices.\n",
"\n",
"When slicing with `.iloc[]`, the slicing is **exclusive** of the end position:"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "319db576-5646-4549-83dc-67d4c1b9c2c7",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": ""
},
"tags": [
"hide-output"
]
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" PassengerId | \n",
" Survived | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 1 | \n",
" 0 | \n",
"
\n",
" \n",
" 1 | \n",
" 2 | \n",
" 1 | \n",
"
\n",
" \n",
" 2 | \n",
" 3 | \n",
" 1 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" PassengerId Survived\n",
"0 1 0\n",
"1 2 1\n",
"2 3 1"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Select rows 0 through 2 (exclusive of 3) and the first two columns using .iloc[].\n",
"\n",
"titanic.iloc[0:3, 0:2]"
]
},
{
"cell_type": "markdown",
"id": "e0be5523-1168-4a3d-be4c-0cea81a4a2cc",
"metadata": {},
"source": [
"### Using `.loc[]` and `.iloc[]` for complex conditions"
]
},
{
"cell_type": "markdown",
"id": "b48d6163-81a7-4bea-9fa8-cbdab2288cb5",
"metadata": {},
"source": [
"You can use both `.loc[]` and `.iloc[]` to filter rows based on more complex conditions, combining multiple criteria with logical operators.\n",
"\n",
"For instance, to filter rows where `Pclass` is either 2 or 3 using `.loc[]`:"
]
},
{
"cell_type": "code",
"execution_count": 27,
"id": "0e5b553f-8987-4f9a-90af-30a3ac194a88",
"metadata": {
"tags": [
"hide-output"
]
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" PassengerId | \n",
" Survived | \n",
" Pclass | \n",
" Name | \n",
" Sex | \n",
" Age | \n",
" SibSp | \n",
" Parch | \n",
" Ticket | \n",
" Fare | \n",
" Cabin | \n",
" Embarked | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 1 | \n",
" 0 | \n",
" 3 | \n",
" Braund, Mr. Owen Harris | \n",
" male | \n",
" 22.0 | \n",
" 1 | \n",
" 0 | \n",
" A/5 21171 | \n",
" 7.2500 | \n",
" NaN | \n",
" S | \n",
"
\n",
" \n",
" 2 | \n",
" 3 | \n",
" 1 | \n",
" 3 | \n",
" Heikkinen, Miss. Laina | \n",
" female | \n",
" 26.0 | \n",
" 0 | \n",
" 0 | \n",
" STON/O2. 3101282 | \n",
" 7.9250 | \n",
" NaN | \n",
" S | \n",
"
\n",
" \n",
" 4 | \n",
" 5 | \n",
" 0 | \n",
" 3 | \n",
" Allen, Mr. William Henry | \n",
" male | \n",
" 35.0 | \n",
" 0 | \n",
" 0 | \n",
" 373450 | \n",
" 8.0500 | \n",
" NaN | \n",
" S | \n",
"
\n",
" \n",
" 5 | \n",
" 6 | \n",
" 0 | \n",
" 3 | \n",
" Moran, Mr. James | \n",
" male | \n",
" NaN | \n",
" 0 | \n",
" 0 | \n",
" 330877 | \n",
" 8.4583 | \n",
" NaN | \n",
" Q | \n",
"
\n",
" \n",
" 7 | \n",
" 8 | \n",
" 0 | \n",
" 3 | \n",
" Palsson, Master. Gosta Leonard | \n",
" male | \n",
" 2.0 | \n",
" 3 | \n",
" 1 | \n",
" 349909 | \n",
" 21.0750 | \n",
" NaN | \n",
" S | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 884 | \n",
" 885 | \n",
" 0 | \n",
" 3 | \n",
" Sutehall, Mr. Henry Jr | \n",
" male | \n",
" 25.0 | \n",
" 0 | \n",
" 0 | \n",
" SOTON/OQ 392076 | \n",
" 7.0500 | \n",
" NaN | \n",
" S | \n",
"
\n",
" \n",
" 885 | \n",
" 886 | \n",
" 0 | \n",
" 3 | \n",
" Rice, Mrs. William (Margaret Norton) | \n",
" female | \n",
" 39.0 | \n",
" 0 | \n",
" 5 | \n",
" 382652 | \n",
" 29.1250 | \n",
" NaN | \n",
" Q | \n",
"
\n",
" \n",
" 886 | \n",
" 887 | \n",
" 0 | \n",
" 2 | \n",
" Montvila, Rev. Juozas | \n",
" male | \n",
" 27.0 | \n",
" 0 | \n",
" 0 | \n",
" 211536 | \n",
" 13.0000 | \n",
" NaN | \n",
" S | \n",
"
\n",
" \n",
" 888 | \n",
" 889 | \n",
" 0 | \n",
" 3 | \n",
" Johnston, Miss. Catherine Helen \"Carrie\" | \n",
" female | \n",
" NaN | \n",
" 1 | \n",
" 2 | \n",
" W./C. 6607 | \n",
" 23.4500 | \n",
" NaN | \n",
" S | \n",
"
\n",
" \n",
" 890 | \n",
" 891 | \n",
" 0 | \n",
" 3 | \n",
" Dooley, Mr. Patrick | \n",
" male | \n",
" 32.0 | \n",
" 0 | \n",
" 0 | \n",
" 370376 | \n",
" 7.7500 | \n",
" NaN | \n",
" Q | \n",
"
\n",
" \n",
"
\n",
"
675 rows × 12 columns
\n",
"
"
],
"text/plain": [
" PassengerId Survived Pclass Name \\\n",
"0 1 0 3 Braund, Mr. Owen Harris \n",
"2 3 1 3 Heikkinen, Miss. Laina \n",
"4 5 0 3 Allen, Mr. William Henry \n",
"5 6 0 3 Moran, Mr. James \n",
"7 8 0 3 Palsson, Master. Gosta Leonard \n",
".. ... ... ... ... \n",
"884 885 0 3 Sutehall, Mr. Henry Jr \n",
"885 886 0 3 Rice, Mrs. William (Margaret Norton) \n",
"886 887 0 2 Montvila, Rev. Juozas \n",
"888 889 0 3 Johnston, Miss. Catherine Helen \"Carrie\" \n",
"890 891 0 3 Dooley, Mr. Patrick \n",
"\n",
" Sex Age SibSp Parch Ticket Fare Cabin Embarked \n",
"0 male 22.0 1 0 A/5 21171 7.2500 NaN S \n",
"2 female 26.0 0 0 STON/O2. 3101282 7.9250 NaN S \n",
"4 male 35.0 0 0 373450 8.0500 NaN S \n",
"5 male NaN 0 0 330877 8.4583 NaN Q \n",
"7 male 2.0 3 1 349909 21.0750 NaN S \n",
".. ... ... ... ... ... ... ... ... \n",
"884 male 25.0 0 0 SOTON/OQ 392076 7.0500 NaN S \n",
"885 female 39.0 0 5 382652 29.1250 NaN Q \n",
"886 male 27.0 0 0 211536 13.0000 NaN S \n",
"888 female NaN 1 2 W./C. 6607 23.4500 NaN S \n",
"890 male 32.0 0 0 370376 7.7500 NaN Q \n",
"\n",
"[675 rows x 12 columns]"
]
},
"execution_count": 27,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Filter rows where 'Pclass' is either 2 or 3 using .loc[].\n",
"\n",
"titanic.loc[titanic['Pclass'].isin([2, 3])]"
]
},
{
"cell_type": "markdown",
"id": "3db35580-9be7-4c17-9d1d-975da8faab2f",
"metadata": {},
"source": [
"Similarly, if you want to perform position-based filtering using `.iloc[]`, you would first need to identify the positions of the relevant rows and columns. However, `.iloc[]` is typically less useful for condition-based filtering because it requires knowledge of the exact row and column positions rather than labels."
]
},
{
"cell_type": "markdown",
"id": "5b2120f5-0c03-41a4-8b98-4fbfb43ba993",
"metadata": {},
"source": [
"You can also combine multiple conditions using logical operators like `&` (AND) and `|` (OR):"
]
},
{
"cell_type": "code",
"execution_count": 26,
"id": "a6ec973c-37e0-4aa7-bd61-ad4ef508206b",
"metadata": {
"tags": [
"hide-output"
]
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Name | \n",
" Pclass | \n",
"
\n",
" \n",
" \n",
" \n",
" 13 | \n",
" Andersson, Mr. Anders Johan | \n",
" 3 | \n",
"
\n",
" \n",
" 15 | \n",
" Hewlett, Mrs. (Mary D Kingcome) | \n",
" 2 | \n",
"
\n",
" \n",
" 25 | \n",
" Asplund, Mrs. Carl Oscar (Selma Augusta Emilia... | \n",
" 3 | \n",
"
\n",
" \n",
" 33 | \n",
" Wheadon, Mr. Edward H | \n",
" 2 | \n",
"
\n",
" \n",
" 40 | \n",
" Ahlin, Mrs. Johan (Johanna Persdotter Larsson) | \n",
" 3 | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 854 | \n",
" Carter, Mrs. Ernest Courtenay (Lilian Hughes) | \n",
" 2 | \n",
"
\n",
" \n",
" 860 | \n",
" Hansen, Mr. Claus Peter | \n",
" 3 | \n",
"
\n",
" \n",
" 865 | \n",
" Bystrom, Mrs. (Karolina) | \n",
" 2 | \n",
"
\n",
" \n",
" 873 | \n",
" Vander Cruyssen, Mr. Victor | \n",
" 3 | \n",
"
\n",
" \n",
" 885 | \n",
" Rice, Mrs. William (Margaret Norton) | \n",
" 3 | \n",
"
\n",
" \n",
"
\n",
"
113 rows × 2 columns
\n",
"
"
],
"text/plain": [
" Name Pclass\n",
"13 Andersson, Mr. Anders Johan 3\n",
"15 Hewlett, Mrs. (Mary D Kingcome) 2\n",
"25 Asplund, Mrs. Carl Oscar (Selma Augusta Emilia... 3\n",
"33 Wheadon, Mr. Edward H 2\n",
"40 Ahlin, Mrs. Johan (Johanna Persdotter Larsson) 3\n",
".. ... ...\n",
"854 Carter, Mrs. Ernest Courtenay (Lilian Hughes) 2\n",
"860 Hansen, Mr. Claus Peter 3\n",
"865 Bystrom, Mrs. (Karolina) 2\n",
"873 Vander Cruyssen, Mr. Victor 3\n",
"885 Rice, Mrs. William (Margaret Norton) 3\n",
"\n",
"[113 rows x 2 columns]"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Filter rows where ('Pclass' is either 2 or 3) AND ('Age' is greater than 35),\n",
"# and then select the 'Name' and 'Pclass' columns.\n",
"\n",
"titanic.loc[((titanic['Pclass'] == 2) | (titanic['Pclass'] == 3)) & (titanic['Age'] > 35), ['Name', 'Pclass']]"
]
},
{
"cell_type": "markdown",
"id": "da9cb04b-4246-4c5f-be72-bf8f85fa0a5b",
"metadata": {},
"source": [
"### Handling missing values with `.loc[]` and `.iloc[]`"
]
},
{
"cell_type": "markdown",
"id": "406599e3-1a91-4e45-b609-cf6f3e1f7cc0",
"metadata": {},
"source": [
"Both `.loc[]` and `.iloc[]` allow you to filter rows based on the presence or absence of missing values.\\\n",
"(However, `.loc[]` is generally more convenient when working with labeled data, while `.iloc[]` requires knowing the exact positions.)\n",
"\n",
"For example, to filter rows where the `Embarked` column is not null using `.loc[]`:"
]
},
{
"cell_type": "code",
"execution_count": 22,
"id": "6ac6403c-8cf1-4161-bd7c-13607d4dfe90",
"metadata": {
"tags": [
"hide-output"
]
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" PassengerId | \n",
" Survived | \n",
" Pclass | \n",
" Name | \n",
" Sex | \n",
" Age | \n",
" SibSp | \n",
" Parch | \n",
" Ticket | \n",
" Fare | \n",
" Cabin | \n",
" Embarked | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" 1 | \n",
" 0 | \n",
" 3 | \n",
" Braund, Mr. Owen Harris | \n",
" male | \n",
" 22.0 | \n",
" 1 | \n",
" 0 | \n",
" A/5 21171 | \n",
" 7.2500 | \n",
" NaN | \n",
" S | \n",
"
\n",
" \n",
" 1 | \n",
" 2 | \n",
" 1 | \n",
" 1 | \n",
" Cumings, Mrs. John Bradley (Florence Briggs Th... | \n",
" female | \n",
" 38.0 | \n",
" 1 | \n",
" 0 | \n",
" PC 17599 | \n",
" 71.2833 | \n",
" C85 | \n",
" C | \n",
"
\n",
" \n",
" 2 | \n",
" 3 | \n",
" 1 | \n",
" 3 | \n",
" Heikkinen, Miss. Laina | \n",
" female | \n",
" 26.0 | \n",
" 0 | \n",
" 0 | \n",
" STON/O2. 3101282 | \n",
" 7.9250 | \n",
" NaN | \n",
" S | \n",
"
\n",
" \n",
" 3 | \n",
" 4 | \n",
" 1 | \n",
" 1 | \n",
" Futrelle, Mrs. Jacques Heath (Lily May Peel) | \n",
" female | \n",
" 35.0 | \n",
" 1 | \n",
" 0 | \n",
" 113803 | \n",
" 53.1000 | \n",
" C123 | \n",
" S | \n",
"
\n",
" \n",
" 4 | \n",
" 5 | \n",
" 0 | \n",
" 3 | \n",
" Allen, Mr. William Henry | \n",
" male | \n",
" 35.0 | \n",
" 0 | \n",
" 0 | \n",
" 373450 | \n",
" 8.0500 | \n",
" NaN | \n",
" S | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 886 | \n",
" 887 | \n",
" 0 | \n",
" 2 | \n",
" Montvila, Rev. Juozas | \n",
" male | \n",
" 27.0 | \n",
" 0 | \n",
" 0 | \n",
" 211536 | \n",
" 13.0000 | \n",
" NaN | \n",
" S | \n",
"
\n",
" \n",
" 887 | \n",
" 888 | \n",
" 1 | \n",
" 1 | \n",
" Graham, Miss. Margaret Edith | \n",
" female | \n",
" 19.0 | \n",
" 0 | \n",
" 0 | \n",
" 112053 | \n",
" 30.0000 | \n",
" B42 | \n",
" S | \n",
"
\n",
" \n",
" 888 | \n",
" 889 | \n",
" 0 | \n",
" 3 | \n",
" Johnston, Miss. Catherine Helen \"Carrie\" | \n",
" female | \n",
" NaN | \n",
" 1 | \n",
" 2 | \n",
" W./C. 6607 | \n",
" 23.4500 | \n",
" NaN | \n",
" S | \n",
"
\n",
" \n",
" 889 | \n",
" 890 | \n",
" 1 | \n",
" 1 | \n",
" Behr, Mr. Karl Howell | \n",
" male | \n",
" 26.0 | \n",
" 0 | \n",
" 0 | \n",
" 111369 | \n",
" 30.0000 | \n",
" C148 | \n",
" C | \n",
"
\n",
" \n",
" 890 | \n",
" 891 | \n",
" 0 | \n",
" 3 | \n",
" Dooley, Mr. Patrick | \n",
" male | \n",
" 32.0 | \n",
" 0 | \n",
" 0 | \n",
" 370376 | \n",
" 7.7500 | \n",
" NaN | \n",
" Q | \n",
"
\n",
" \n",
"
\n",
"
889 rows × 12 columns
\n",
"
"
],
"text/plain": [
" PassengerId Survived Pclass \\\n",
"0 1 0 3 \n",
"1 2 1 1 \n",
"2 3 1 3 \n",
"3 4 1 1 \n",
"4 5 0 3 \n",
".. ... ... ... \n",
"886 887 0 2 \n",
"887 888 1 1 \n",
"888 889 0 3 \n",
"889 890 1 1 \n",
"890 891 0 3 \n",
"\n",
" Name Sex Age SibSp \\\n",
"0 Braund, Mr. Owen Harris male 22.0 1 \n",
"1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 \n",
"2 Heikkinen, Miss. Laina female 26.0 0 \n",
"3 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 \n",
"4 Allen, Mr. William Henry male 35.0 0 \n",
".. ... ... ... ... \n",
"886 Montvila, Rev. Juozas male 27.0 0 \n",
"887 Graham, Miss. Margaret Edith female 19.0 0 \n",
"888 Johnston, Miss. Catherine Helen \"Carrie\" female NaN 1 \n",
"889 Behr, Mr. Karl Howell male 26.0 0 \n",
"890 Dooley, Mr. Patrick male 32.0 0 \n",
"\n",
" Parch Ticket Fare Cabin Embarked \n",
"0 0 A/5 21171 7.2500 NaN S \n",
"1 0 PC 17599 71.2833 C85 C \n",
"2 0 STON/O2. 3101282 7.9250 NaN S \n",
"3 0 113803 53.1000 C123 S \n",
"4 0 373450 8.0500 NaN S \n",
".. ... ... ... ... ... \n",
"886 0 211536 13.0000 NaN S \n",
"887 0 112053 30.0000 B42 S \n",
"888 2 W./C. 6607 23.4500 NaN S \n",
"889 0 111369 30.0000 C148 C \n",
"890 0 370376 7.7500 NaN Q \n",
"\n",
"[889 rows x 12 columns]"
]
},
"execution_count": 22,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Filter rows where 'Embarked' is not null using .loc[].\n",
"\n",
"titanic.loc[titanic['Embarked'].notna()]"
]
},
{
"cell_type": "markdown",
"id": "af3013f4-ed33-4052-b5eb-c9c52d7156da",
"metadata": {},
"source": [
"You can then compare the number of extracted rows with the original DataFrame to identify how many rows have missing data:"
]
},
{
"cell_type": "code",
"execution_count": 23,
"id": "10cd8859-5c33-45c9-b327-3708c4e0b05d",
"metadata": {
"tags": [
"hide-output"
]
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"RangeIndex: 891 entries, 0 to 890\n",
"Data columns (total 12 columns):\n",
" # Column Non-Null Count Dtype \n",
"--- ------ -------------- ----- \n",
" 0 PassengerId 891 non-null int64 \n",
" 1 Survived 891 non-null int64 \n",
" 2 Pclass 891 non-null int64 \n",
" 3 Name 891 non-null object \n",
" 4 Sex 891 non-null object \n",
" 5 Age 714 non-null float64\n",
" 6 SibSp 891 non-null int64 \n",
" 7 Parch 891 non-null int64 \n",
" 8 Ticket 891 non-null object \n",
" 9 Fare 891 non-null float64\n",
" 10 Cabin 204 non-null object \n",
" 11 Embarked 889 non-null object \n",
"dtypes: float64(2), int64(5), object(5)\n",
"memory usage: 83.7+ KB\n"
]
}
],
"source": [
"# Display information about the DataFrame to compare row counts.\n",
"\n",
"titanic.info()"
]
},
{
"cell_type": "markdown",
"id": "47535c74-e158-4a89-9bc9-a1cea1aa70df",
"metadata": {},
"source": [
"Comparing the two number of rows (889 and 891) tells us, that two rows are missing embarkation data.\n",
"\n",
"As we saw in the exercise on the previous page on [reading and writing tabular data](02_pandas_tabular_data), the two rows missing embarkation data can be found like this:"
]
},
{
"cell_type": "code",
"execution_count": 28,
"id": "e4a29062-fc16-4519-bde3-ad8f802a0b14",
"metadata": {
"tags": [
"hide-output"
]
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" PassengerId | \n",
" Survived | \n",
" Pclass | \n",
" Name | \n",
" Sex | \n",
" Age | \n",
" SibSp | \n",
" Parch | \n",
" Ticket | \n",
" Fare | \n",
" Cabin | \n",
" Embarked | \n",
"
\n",
" \n",
" \n",
" \n",
" 61 | \n",
" 62 | \n",
" 1 | \n",
" 1 | \n",
" Icard, Miss. Amelie | \n",
" female | \n",
" 38.0 | \n",
" 0 | \n",
" 0 | \n",
" 113572 | \n",
" 80.0 | \n",
" B28 | \n",
" NaN | \n",
"
\n",
" \n",
" 829 | \n",
" 830 | \n",
" 1 | \n",
" 1 | \n",
" Stone, Mrs. George Nelson (Martha Evelyn) | \n",
" female | \n",
" 62.0 | \n",
" 0 | \n",
" 0 | \n",
" 113572 | \n",
" 80.0 | \n",
" B28 | \n",
" NaN | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" PassengerId Survived Pclass Name \\\n",
"61 62 1 1 Icard, Miss. Amelie \n",
"829 830 1 1 Stone, Mrs. George Nelson (Martha Evelyn) \n",
"\n",
" Sex Age SibSp Parch Ticket Fare Cabin Embarked \n",
"61 female 38.0 0 0 113572 80.0 B28 NaN \n",
"829 female 62.0 0 0 113572 80.0 B28 NaN "
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"titanic[titanic['Embarked'].isna()]"
]
},
{
"cell_type": "markdown",
"id": "c50b13ad-3a6a-4162-ac04-fc0aa86da086",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": ""
},
"tags": []
},
"source": [
"___"
]
},
{
"cell_type": "markdown",
"id": "e3975713-1cbe-4c43-a97a-14c6d6b18554",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": ""
},
"tags": []
},
"source": [
"## Selecting rows and columns\n",
"\n",
""
]
},
{
"cell_type": "markdown",
"id": "55aa6a2d-a5bb-42c9-84fa-ab55160ccf39",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": ""
},
"tags": []
},
"source": [
"The true power of `.loc[]` becomes evident when you want to filter rows and simultaneously select specific columns. For example:"
]
},
{
"cell_type": "code",
"execution_count": 24,
"id": "8fe0feca-1a72-4596-a9a7-d9e30d8fed0f",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": ""
},
"tags": [
"hide-output"
]
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Name | \n",
" Pclass | \n",
"
\n",
" \n",
" \n",
" \n",
" 1 | \n",
" Cumings, Mrs. John Bradley (Florence Briggs Th... | \n",
" 1 | \n",
"
\n",
" \n",
" 6 | \n",
" McCarthy, Mr. Timothy J | \n",
" 1 | \n",
"
\n",
" \n",
" 11 | \n",
" Bonnell, Miss. Elizabeth | \n",
" 1 | \n",
"
\n",
" \n",
" 13 | \n",
" Andersson, Mr. Anders Johan | \n",
" 3 | \n",
"
\n",
" \n",
" 15 | \n",
" Hewlett, Mrs. (Mary D Kingcome) | \n",
" 2 | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 865 | \n",
" Bystrom, Mrs. (Karolina) | \n",
" 2 | \n",
"
\n",
" \n",
" 871 | \n",
" Beckwith, Mrs. Richard Leonard (Sallie Monypeny) | \n",
" 1 | \n",
"
\n",
" \n",
" 873 | \n",
" Vander Cruyssen, Mr. Victor | \n",
" 3 | \n",
"
\n",
" \n",
" 879 | \n",
" Potter, Mrs. Thomas Jr (Lily Alexenia Wilson) | \n",
" 1 | \n",
"
\n",
" \n",
" 885 | \n",
" Rice, Mrs. William (Margaret Norton) | \n",
" 3 | \n",
"
\n",
" \n",
"
\n",
"
217 rows × 2 columns
\n",
"
"
],
"text/plain": [
" Name Pclass\n",
"1 Cumings, Mrs. John Bradley (Florence Briggs Th... 1\n",
"6 McCarthy, Mr. Timothy J 1\n",
"11 Bonnell, Miss. Elizabeth 1\n",
"13 Andersson, Mr. Anders Johan 3\n",
"15 Hewlett, Mrs. (Mary D Kingcome) 2\n",
".. ... ...\n",
"865 Bystrom, Mrs. (Karolina) 2\n",
"871 Beckwith, Mrs. Richard Leonard (Sallie Monypeny) 1\n",
"873 Vander Cruyssen, Mr. Victor 3\n",
"879 Potter, Mrs. Thomas Jr (Lily Alexenia Wilson) 1\n",
"885 Rice, Mrs. William (Margaret Norton) 3\n",
"\n",
"[217 rows x 2 columns]"
]
},
"execution_count": 24,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Filter rows where 'Age' is greater than 35 and select only the 'Name' and 'Pclass' columns.\n",
"\n",
"titanic.loc[titanic['Age'] > 35, ['Name', 'Pclass']]"
]
},
{
"cell_type": "markdown",
"id": "0ee5aef8-30b9-4e07-ad18-d938822d9ad2",
"metadata": {},
"source": [
"### Combining `loc` and `iloc`"
]
},
{
"cell_type": "markdown",
"id": "cc456912-4e0f-4245-856b-06e2d6286456",
"metadata": {},
"source": [
"We can use `iloc` to select:\n",
"\n",
"* Rows from index position 9 to 25 (exclusive).\n",
"* Columns from index position 2 to 6 (exclusive).\n",
"\n",
"Then further filter these rows using `loc` based on the condition where:\n",
"\n",
"* 'Age' is greater than 35."
]
},
{
"cell_type": "code",
"execution_count": 25,
"id": "40868771-b1f1-491b-9cdf-3397b731ada7",
"metadata": {
"tags": [
"hide-output"
]
},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" Pclass | \n",
" Name | \n",
" Sex | \n",
" Age | \n",
"
\n",
" \n",
" \n",
" \n",
" 11 | \n",
" 1 | \n",
" Bonnell, Miss. Elizabeth | \n",
" female | \n",
" 58.0 | \n",
"
\n",
" \n",
" 13 | \n",
" 3 | \n",
" Andersson, Mr. Anders Johan | \n",
" male | \n",
" 39.0 | \n",
"
\n",
" \n",
" 15 | \n",
" 2 | \n",
" Hewlett, Mrs. (Mary D Kingcome) | \n",
" female | \n",
" 55.0 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Pclass Name Sex Age\n",
"11 1 Bonnell, Miss. Elizabeth female 58.0\n",
"13 3 Andersson, Mr. Anders Johan male 39.0\n",
"15 2 Hewlett, Mrs. (Mary D Kingcome) female 55.0"
]
},
"execution_count": 25,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"titanic.iloc[9:25, 2:6].loc[titanic['Age'] > 35]"
]
},
{
"cell_type": "markdown",
"id": "d06d4633-9550-4978-a275-0d6920770a9c",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": ""
},
"tags": []
},
"source": [
"___"
]
},
{
"cell_type": "markdown",
"id": "4114093e-a6fd-40f0-ac86-d0b2493f9682",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": ""
},
"tags": []
},
"source": [
"## Key points\n",
"\n",
"* Access a single column or multiple columns from a `DataFrame` using square bracket notation (`[]`).\n",
"\n",
"* Use label-based indexing (`.loc[]`) and position-based indexing (`.iloc[]`) to select specific rows and columns.\n",
"\n",
"* Slicing with `.loc[]` is inclusive, while slicing with `.iloc[]` is exclusive.\n",
"\n",
"* Filter rows based on conditions and apply additional conditions or select specific columns within the same operation."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "95ec0280-28be-43f5-9c20-5db2f795bf04",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": ""
},
"tags": [
"hide-output"
]
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.8"
}
},
"nbformat": 4,
"nbformat_minor": 5
}