{ "cells": [ { "cell_type": "markdown", "id": "513cfe15-50b1-4385-a0a9-16b337d970bc", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "# Subsets" ] }, { "cell_type": "markdown", "id": "f9ddee47-e8ab-4f68-b283-2e6c71b6aa52", "metadata": {}, "source": [ "```{admonition} Learning Objectives\n", "\n", "Questions:\n", "* How can we select and filter specific columns and rows in a `DataFrame`?\n", "\n", "Objectives:\n", "* Select specific columns from a `DataFrame` using square bracket notation (`[]`).\n", "* Use the `loc` property for label-based indexing in a `DataFrame`.\n", "* Use the `iloc` property for position-based indexing in a `DataFrame`.\n", "* Filter rows based on the presence or absence of missing values.\n", "* Understand the differences and complementary uses of `.loc[]` and `.iloc[]`. " ] }, { "cell_type": "markdown", "id": "b90cc6fd-2f57-4f94-bff7-cf6b0aace5f5", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "## Import Pandas" ] }, { "cell_type": "markdown", "id": "75fa3e37-d901-4cdc-8822-5b6b0102ce4e", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [ "remove-cell" ] }, "source": [ "Aliasing pandas as `pd` is a widely adopted convention that simplifies the syntax for accessing its functionalities.\\\n", "After this statement, you can use `pd` to access all the functionalities provided by the pandas library." ] }, { "cell_type": "code", "execution_count": 1, "id": "765b4040-4b8a-420d-bad5-219654bc6174", "metadata": { "tags": [ "hide-output" ] }, "outputs": [], "source": [ "# This line imports the pandas library and aliases it as 'pd'.\n", "\n", "import pandas as pd" ] }, { "cell_type": "markdown", "id": "4a72659e-005b-4390-a8fe-4dd61b356945", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "___" ] }, { "cell_type": "markdown", "id": "b4f7a487-146c-4573-904e-45601338891f", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "## Creating a `DataFrame` from a CSV file" ] }, { "cell_type": "markdown", "id": "f1e434ec-fdb0-4c8a-89a4-7281de89308f", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "The `pd.read_csv()` function is used to read the data from the file 'data/titanic.csv'." ] }, { "cell_type": "code", "execution_count": 2, "id": "fe1c7a91-0caf-4eae-bf55-0938bcdaa851", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [ "hide-output" ] }, "outputs": [], "source": [ "# Load the Titanic dataset from a CSV file into a DataFrame named 'titanic'.\n", "\n", "titanic = pd.read_csv('data/titanic.csv')" ] }, { "cell_type": "code", "execution_count": 3, "id": "9d5a2159-bb3b-4a20-94be-21ea37755071", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [ "hide-output" ] }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PassengerIdSurvivedPclassNameSexAgeSibSpParchTicketFareCabinEmbarked
0103Braund, Mr. Owen Harrismale22.010A/5 211717.2500NaNS
1211Cumings, Mrs. John Bradley (Florence Briggs Th...female38.010PC 1759971.2833C85C
2313Heikkinen, Miss. Lainafemale26.000STON/O2. 31012827.9250NaNS
3411Futrelle, Mrs. Jacques Heath (Lily May Peel)female35.01011380353.1000C123S
4503Allen, Mr. William Henrymale35.0003734508.0500NaNS
.......................................
88688702Montvila, Rev. Juozasmale27.00021153613.0000NaNS
88788811Graham, Miss. Margaret Edithfemale19.00011205330.0000B42S
88888903Johnston, Miss. Catherine Helen \"Carrie\"femaleNaN12W./C. 660723.4500NaNS
88989011Behr, Mr. Karl Howellmale26.00011136930.0000C148C
89089103Dooley, Mr. Patrickmale32.0003703767.7500NaNQ
\n", "

891 rows × 12 columns

\n", "
" ], "text/plain": [ " PassengerId Survived Pclass \\\n", "0 1 0 3 \n", "1 2 1 1 \n", "2 3 1 3 \n", "3 4 1 1 \n", "4 5 0 3 \n", ".. ... ... ... \n", "886 887 0 2 \n", "887 888 1 1 \n", "888 889 0 3 \n", "889 890 1 1 \n", "890 891 0 3 \n", "\n", " Name Sex Age SibSp \\\n", "0 Braund, Mr. Owen Harris male 22.0 1 \n", "1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 \n", "2 Heikkinen, Miss. Laina female 26.0 0 \n", "3 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 \n", "4 Allen, Mr. William Henry male 35.0 0 \n", ".. ... ... ... ... \n", "886 Montvila, Rev. Juozas male 27.0 0 \n", "887 Graham, Miss. Margaret Edith female 19.0 0 \n", "888 Johnston, Miss. Catherine Helen \"Carrie\" female NaN 1 \n", "889 Behr, Mr. Karl Howell male 26.0 0 \n", "890 Dooley, Mr. Patrick male 32.0 0 \n", "\n", " Parch Ticket Fare Cabin Embarked \n", "0 0 A/5 21171 7.2500 NaN S \n", "1 0 PC 17599 71.2833 C85 C \n", "2 0 STON/O2. 3101282 7.9250 NaN S \n", "3 0 113803 53.1000 C123 S \n", "4 0 373450 8.0500 NaN S \n", ".. ... ... ... ... ... \n", "886 0 211536 13.0000 NaN S \n", "887 0 112053 30.0000 B42 S \n", "888 2 W./C. 6607 23.4500 NaN S \n", "889 0 111369 30.0000 C148 C \n", "890 0 370376 7.7500 NaN Q \n", "\n", "[891 rows x 12 columns]" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Display the DataFrame 'titanic'.\n", "\n", "titanic" ] }, { "cell_type": "markdown", "id": "13331c76-ef8d-4881-9229-b6dbf6a34447", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "___" ] }, { "cell_type": "markdown", "id": "688e5d6e-f20e-41ac-b44f-aaea410cd399", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "## Selecting specific columns\n", "\n", "![Selecting specific columns from a DataFrame](images/03_subset_columns.svg)" ] }, { "cell_type": "markdown", "id": "13a500a8-9996-4478-8b37-496110b587d1", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "Selecting specific columns is useful when you want to work with specific subsets of the dataset e.g. you might want to perform statistical calculations based on only the colum `Age` or later on when you want to access and manipulate your data in more complex ways. \n", "\n", "Access the 'Age' column from the `DataFrame` 'titanic' to return a `Series` object containing all the data in the 'Age' column:" ] }, { "cell_type": "code", "execution_count": 4, "id": "6752e1f0-6227-4128-b2f5-323c429ffe5c", "metadata": { "tags": [ "hide-output" ] }, "outputs": [ { "data": { "text/plain": [ "0 22.0\n", "1 38.0\n", "2 26.0\n", "3 35.0\n", "4 35.0\n", " ... \n", "886 27.0\n", "887 19.0\n", "888 NaN\n", "889 26.0\n", "890 32.0\n", "Name: Age, Length: 891, dtype: float64" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "titanic['Age']" ] }, { "cell_type": "code", "execution_count": 5, "id": "0e3d6cdf-0b96-489c-9271-b0dcf887682d", "metadata": { "tags": [ "hide-output" ] }, "outputs": [ { "data": { "text/plain": [ "pandas.core.series.Series" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Check the type of the 'Age' column in 'titanic' using the 'type()' function.\n", "\n", "type(titanic['Age'])" ] }, { "cell_type": "markdown", "id": "f2c9d264-6190-49dd-96c5-ea5ffb177a41", "metadata": {}, "source": [ "Use the `shape` attribute to determine the dimensions of the `Series`.\\\n", "It returns a tuple representing the number of rows and columns (rows, columns)." ] }, { "cell_type": "code", "execution_count": 6, "id": "ddcdfad9-cfb3-413c-bf2a-f2b485ec0d50", "metadata": { "tags": [ "hide-output" ] }, "outputs": [ { "data": { "text/plain": [ "(891,)" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "titanic['Age'].shape" ] }, { "cell_type": "markdown", "id": "1b48c4ac-45cf-449b-bcf1-da1ef4408e41", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "___" ] }, { "cell_type": "markdown", "id": "54af87e3-2151-4b56-9cca-6fb72cada392", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "### Calling multiple `Series`" ] }, { "cell_type": "markdown", "id": "14237564-b732-47a6-a52d-c2af0c00694a", "metadata": {}, "source": [ "Select columns 'Age' and 'Sex' from the 'titanic' `DataFrame` using double square brackets:" ] }, { "cell_type": "code", "execution_count": 7, "id": "dc8709c5-0b04-4869-8668-cc5144571fd5", "metadata": { "tags": [ "hide-output" ] }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
AgeSex
022.0male
138.0female
226.0female
335.0female
435.0male
.........
88627.0male
88719.0female
888NaNfemale
88926.0male
89032.0male
\n", "

891 rows × 2 columns

\n", "
" ], "text/plain": [ " Age Sex\n", "0 22.0 male\n", "1 38.0 female\n", "2 26.0 female\n", "3 35.0 female\n", "4 35.0 male\n", ".. ... ...\n", "886 27.0 male\n", "887 19.0 female\n", "888 NaN female\n", "889 26.0 male\n", "890 32.0 male\n", "\n", "[891 rows x 2 columns]" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Select the columns 'Age' and 'Sex' from the 'titanic' DataFrame.\n", "\n", "titanic[['Age', 'Sex']]" ] }, { "cell_type": "markdown", "id": "2f8e7d7d-4ae1-4542-8caf-31787d7d8081", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "```{note}\n", "The inner square brackets define a Python `list` with column names, whereas the outer square brackets are used to select the data from a pandas `DataFrame` as seen in the previous example." ] }, { "cell_type": "markdown", "id": "8190374b-8681-48c0-a0cc-028dd9e1f936", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "Select columns 'Age' and 'Sex' from the 'titanic' `DataFrame` using double square brackets, and then apply the `type()` function to the resulting `DataFrame` subset:" ] }, { "cell_type": "code", "execution_count": 8, "id": "41856f83-9f98-4b7a-9800-4acf1c222588", "metadata": { "tags": [ "hide-output" ] }, "outputs": [ { "data": { "text/plain": [ "pandas.core.frame.DataFrame" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Use the 'type()' function to determine the data type of the DataFrame subset.\n", "\n", "type(titanic[['Age', 'Sex']])" ] }, { "cell_type": "markdown", "id": "e6c073f3-f58f-462c-98f1-d027cb0dfc41", "metadata": {}, "source": [ "Select columns 'Age' and 'Sex' from the 'titanic' `DataFrame` using double square brackets, and then apply the `shape` attribute to the resulting `DataFrame` subset:" ] }, { "cell_type": "code", "execution_count": 9, "id": "b0388ea7-0e3e-4ebe-b9e7-dad3e8271928", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [ "hide-output" ] }, "outputs": [ { "data": { "text/plain": [ "(891, 2)" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Use the 'shape' attribute to determine the dimensions of the DataFrame subset.\n", "\n", "titanic[['Age', 'Sex']].shape" ] }, { "cell_type": "markdown", "id": "6cb6bcc8-8a2a-428f-9102-394ee6c333d3", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "___" ] }, { "cell_type": "markdown", "id": "6035c0e3-bd8f-4b93-bb6a-65ed95db22ea", "metadata": {}, "source": [ "## Introducing `.loc[]` and `.iloc[]`" ] }, { "cell_type": "markdown", "id": "975963cb-241a-4036-b90e-c3c1290511a4", "metadata": {}, "source": [ "Now that you’ve learned how to select specific columns from a `DataFrame`, it’s time to explore more advanced and flexible ways to access and manipulate your data. As you start filtering rows based on conditions or need to select both rows and columns simultaneously, using `.loc[]` and `.iloc[]` will give you greater control and clarity in your code.\n", "\n", "These powerful tools allow you to:\n", "\n", "* Safely and explicitly select rows and columns by labels with `.loc[]`.\n", "* Access rows and columns by their integer positions using `.iloc[]`.\n", "* Avoid potential issues that can arise with more complex operations, ensuring your data manipulations are clear, consistent, and reliable.\n", "\n", "With these methods, you’ll be better equipped to handle more complex data tasks efficiently." ] }, { "cell_type": "markdown", "id": "b4d65d36-9d60-41a2-adcf-ba6ac1c5e2e3", "metadata": {}, "source": [ "```{admonition} What exactly are loc and iloc?\n", ":class: tip dropdown\n", "\n", "`loc` and `iloc` in Pandas are technically properties that return a specialized indexer object. These indexer objects allow for label-based (`loc`) and integer-based (`iloc`) indexing. Although they behave similarly to methods in that you use them to perform actions (i.e., selecting or slicing data), they are properties because they return an indexing object rather than directly executing a function.\n", "\n", "When you access `loc` or `iloc` on a `DataFrame`, what you are getting is not a simple value but an object that has its own methods and behaviors, specifically designed to handle indexing operations. This design allows for the concise and powerful data selection syntax that Pandas is known for.\n", "```" ] }, { "cell_type": "markdown", "id": "e0f2bd5c-8ea4-4fd9-9d4a-2c527ab072ea", "metadata": {}, "source": [ "```{admonition} Avoid ambiguous operations\n", ":class: warning dropdown\n", "\n", "Some of the operations we’ll be covering can technically be performed without using `.loc[]` or `.iloc[]`. For instance, you might see data selections or modifications done with simple square brackets like `df[df['column'] > value]`. While this can work for basic tasks, it can lead to issues in more complex scenarios, such as unexpected behavior, ambiguity, or performance inefficiencies.\n", "\n", "**Why is this a bad idea?**\n", "\n", "- **Ambiguity**: It’s not always clear if you're modifying a copy of your data or the original `DataFrame`, which can lead to unintended side effects.\n", "- **Potential errors**: Using square brackets for complex selections might trigger warnings like `SettingWithCopyWarning`, indicating that your code might not be doing what you think it is.\n", "- **Lack of flexibility**: As your data operations become more complex, relying on square brackets can limit your ability to write clear and maintainable code.\n", "\n", "For these reasons, we recommend using `.loc[]` and `.iloc[]` as they provide a more reliable and explicit way to handle data selection and modification in Pandas.\n", "\n", "If you’re interested in learning more about the potential pitfalls of using square brackets and the benefits of `.loc[]` and `.iloc[]`, you can read more in the [Pandas documentation on indexing and selecting data](https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html).\n", "```" ] }, { "cell_type": "markdown", "id": "99663d1a-3f33-44fa-9387-1e9b1766748d", "metadata": {}, "source": [ "___" ] }, { "cell_type": "markdown", "id": "961bdfdf-11b0-474a-bd74-4500fad73448", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "## Filtering rows based on conditional expressions\n", "\n", "![Filtering specific rows from a DataFrame](images/03_subset_rows.svg)" ] }, { "cell_type": "markdown", "id": "7ece850f-2344-49d6-8dfb-6fed8c58f18c", "metadata": {}, "source": [ "With `.loc[]` and `.iloc[]`, you have more control and flexibility when filtering rows and selecting columns in a `DataFrame`.\n", "\n", "Let’s start by exploring how you can use these tools to filter rows based on conditions." ] }, { "cell_type": "markdown", "id": "903513f6-4ba8-4a19-850f-918b5dd97b21", "metadata": {}, "source": [ "### Filtering specific rows with `.loc[]`\n", "\n", "To filter rows based on a condition, you can place the condition inside the `.loc[]` selector.\n", "\n", "Here’s how you can filter rows where the `Age` column is greater than 35:" ] }, { "cell_type": "code", "execution_count": 12, "id": "60ff0a11-3cbe-48e0-8ee0-28fd8d492d45", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [ "hide-output" ] }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PassengerIdSurvivedPclassNameSexAgeSibSpParchTicketFareCabinEmbarked
1211Cumings, Mrs. John Bradley (Florence Briggs Th...female38.010PC 1759971.2833C85C
6701McCarthy, Mr. Timothy Jmale54.0001746351.8625E46S
111211Bonnell, Miss. Elizabethfemale58.00011378326.5500C103S
131403Andersson, Mr. Anders Johanmale39.01534708231.2750NaNS
151612Hewlett, Mrs. (Mary D Kingcome)female55.00024870616.0000NaNS
.......................................
86586612Bystrom, Mrs. (Karolina)female42.00023685213.0000NaNS
87187211Beckwith, Mrs. Richard Leonard (Sallie Monypeny)female47.0111175152.5542D35S
87387403Vander Cruyssen, Mr. Victormale47.0003457659.0000NaNS
87988011Potter, Mrs. Thomas Jr (Lily Alexenia Wilson)female56.0011176783.1583C50C
88588603Rice, Mrs. William (Margaret Norton)female39.00538265229.1250NaNQ
\n", "

217 rows × 12 columns

\n", "
" ], "text/plain": [ " PassengerId Survived Pclass \\\n", "1 2 1 1 \n", "6 7 0 1 \n", "11 12 1 1 \n", "13 14 0 3 \n", "15 16 1 2 \n", ".. ... ... ... \n", "865 866 1 2 \n", "871 872 1 1 \n", "873 874 0 3 \n", "879 880 1 1 \n", "885 886 0 3 \n", "\n", " Name Sex Age SibSp \\\n", "1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 \n", "6 McCarthy, Mr. Timothy J male 54.0 0 \n", "11 Bonnell, Miss. Elizabeth female 58.0 0 \n", "13 Andersson, Mr. Anders Johan male 39.0 1 \n", "15 Hewlett, Mrs. (Mary D Kingcome) female 55.0 0 \n", ".. ... ... ... ... \n", "865 Bystrom, Mrs. (Karolina) female 42.0 0 \n", "871 Beckwith, Mrs. Richard Leonard (Sallie Monypeny) female 47.0 1 \n", "873 Vander Cruyssen, Mr. Victor male 47.0 0 \n", "879 Potter, Mrs. Thomas Jr (Lily Alexenia Wilson) female 56.0 0 \n", "885 Rice, Mrs. William (Margaret Norton) female 39.0 0 \n", "\n", " Parch Ticket Fare Cabin Embarked \n", "1 0 PC 17599 71.2833 C85 C \n", "6 0 17463 51.8625 E46 S \n", "11 0 113783 26.5500 C103 S \n", "13 5 347082 31.2750 NaN S \n", "15 0 248706 16.0000 NaN S \n", ".. ... ... ... ... ... \n", "865 0 236852 13.0000 NaN S \n", "871 1 11751 52.5542 D35 S \n", "873 0 345765 9.0000 NaN S \n", "879 1 11767 83.1583 C50 C \n", "885 5 382652 29.1250 NaN Q \n", "\n", "[217 rows x 12 columns]" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Filter rows in the 'titanic' DataFrame where the 'Age' column is greater than 35.\n", "\n", "titanic.loc[titanic['Age'] > 35]" ] }, { "cell_type": "markdown", "id": "38ac84cf-9f52-4c01-bca9-e46fb87d66c1", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "This command filters the rows where the condition `titanic['Age'] > 35` is `True`, and returns the entire `DataFrame` for those rows.\n", "\n", "If you only want to select specific columns (e.g., `Name` and `Pclass`) for those filtered rows, you can add the column names after the comma:" ] }, { "cell_type": "code", "execution_count": 13, "id": "13936ecd-9c28-45bd-9edb-e4727a81d081", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [ "hide-output" ] }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NamePclass
1Cumings, Mrs. John Bradley (Florence Briggs Th...1
6McCarthy, Mr. Timothy J1
11Bonnell, Miss. Elizabeth1
13Andersson, Mr. Anders Johan3
15Hewlett, Mrs. (Mary D Kingcome)2
.........
865Bystrom, Mrs. (Karolina)2
871Beckwith, Mrs. Richard Leonard (Sallie Monypeny)1
873Vander Cruyssen, Mr. Victor3
879Potter, Mrs. Thomas Jr (Lily Alexenia Wilson)1
885Rice, Mrs. William (Margaret Norton)3
\n", "

217 rows × 2 columns

\n", "
" ], "text/plain": [ " Name Pclass\n", "1 Cumings, Mrs. John Bradley (Florence Briggs Th... 1\n", "6 McCarthy, Mr. Timothy J 1\n", "11 Bonnell, Miss. Elizabeth 1\n", "13 Andersson, Mr. Anders Johan 3\n", "15 Hewlett, Mrs. (Mary D Kingcome) 2\n", ".. ... ...\n", "865 Bystrom, Mrs. (Karolina) 2\n", "871 Beckwith, Mrs. Richard Leonard (Sallie Monypeny) 1\n", "873 Vander Cruyssen, Mr. Victor 3\n", "879 Potter, Mrs. Thomas Jr (Lily Alexenia Wilson) 1\n", "885 Rice, Mrs. William (Margaret Norton) 3\n", "\n", "[217 rows x 2 columns]" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Filter rows where 'Age' is greater than 35 and select only the 'Name' and 'Pclass' columns.\n", "\n", "titanic.loc[titanic['Age'] > 35, ['Name', 'Pclass']]" ] }, { "cell_type": "markdown", "id": "346705f6-3f5a-4f50-a4c2-23e05aa9f353", "metadata": {}, "source": [ "```{admonition} When using loc/iloc\n", ":class: tip\n", "\n", "The part *before the comma* is the *rows* you want to select.\n", "\n", "The part *after the comma* is the *columns* you want to select.\n", "```" ] }, { "cell_type": "markdown", "id": "39b84234-a6f9-48c8-82a6-e9515d0afa7e", "metadata": {}, "source": [ "### Using `.iloc[]` for positional indexing\n", "\n", "The `.iloc[]` method works similarly to `.loc[]`, but it selects rows and columns based on their integer positions rather than labels.\n", "\n", "For example, to select the first 5 rows and the first 2 columns, you can use:" ] }, { "cell_type": "code", "execution_count": 14, "id": "7dad5185-69b3-41ce-9de0-626ea02ef6b4", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [ "hide-output" ] }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PassengerIdSurvived
010
121
231
341
450
\n", "
" ], "text/plain": [ " PassengerId Survived\n", "0 1 0\n", "1 2 1\n", "2 3 1\n", "3 4 1\n", "4 5 0" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Select the first 5 rows and the first 2 columns using .iloc[].\n", "\n", "titanic.iloc[:5, :2]" ] }, { "cell_type": "markdown", "id": "c3762ceb-ae60-4877-a117-03260cbe3c7a", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "### Inclusive vs. Exclusive Slicing\n", "\n", "When slicing with `.loc[]`, the slicing is **inclusive** of the start and end labels. For example:" ] }, { "cell_type": "code", "execution_count": 15, "id": "f1d37a3e-c4c1-4687-99b1-6366fb1b83bb", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [ "hide-output" ] }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NamePclass
0Braund, Mr. Owen Harris3
1Cumings, Mrs. John Bradley (Florence Briggs Th...1
2Heikkinen, Miss. Laina3
3Futrelle, Mrs. Jacques Heath (Lily May Peel)1
\n", "
" ], "text/plain": [ " Name Pclass\n", "0 Braund, Mr. Owen Harris 3\n", "1 Cumings, Mrs. John Bradley (Florence Briggs Th... 1\n", "2 Heikkinen, Miss. Laina 3\n", "3 Futrelle, Mrs. Jacques Heath (Lily May Peel) 1" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Select rows 0 through 3 (inclusive) and columns 'Name' and 'Pclass'.\n", "\n", "titanic.loc[0:3, ['Name', 'Pclass']]" ] }, { "cell_type": "markdown", "id": "47842350-bee8-4683-91f0-da8905771656", "metadata": {}, "source": [ "This command returns rows 0, 1, 2, and 3, including both the start and end indices.\n", "\n", "When slicing with `.iloc[]`, the slicing is **exclusive** of the end position:" ] }, { "cell_type": "code", "execution_count": 18, "id": "319db576-5646-4549-83dc-67d4c1b9c2c7", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [ "hide-output" ] }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PassengerIdSurvived
010
121
231
\n", "
" ], "text/plain": [ " PassengerId Survived\n", "0 1 0\n", "1 2 1\n", "2 3 1" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Select rows 0 through 2 (exclusive of 3) and the first two columns using .iloc[].\n", "\n", "titanic.iloc[0:3, 0:2]" ] }, { "cell_type": "markdown", "id": "e0be5523-1168-4a3d-be4c-0cea81a4a2cc", "metadata": {}, "source": [ "### Using `.loc[]` and `.iloc[]` for complex conditions" ] }, { "cell_type": "markdown", "id": "b48d6163-81a7-4bea-9fa8-cbdab2288cb5", "metadata": {}, "source": [ "You can use both `.loc[]` and `.iloc[]` to filter rows based on more complex conditions, combining multiple criteria with logical operators.\n", "\n", "For instance, to filter rows where `Pclass` is either 2 or 3 using `.loc[]`:" ] }, { "cell_type": "code", "execution_count": 27, "id": "0e5b553f-8987-4f9a-90af-30a3ac194a88", "metadata": { "tags": [ "hide-output" ] }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PassengerIdSurvivedPclassNameSexAgeSibSpParchTicketFareCabinEmbarked
0103Braund, Mr. Owen Harrismale22.010A/5 211717.2500NaNS
2313Heikkinen, Miss. Lainafemale26.000STON/O2. 31012827.9250NaNS
4503Allen, Mr. William Henrymale35.0003734508.0500NaNS
5603Moran, Mr. JamesmaleNaN003308778.4583NaNQ
7803Palsson, Master. Gosta Leonardmale2.03134990921.0750NaNS
.......................................
88488503Sutehall, Mr. Henry Jrmale25.000SOTON/OQ 3920767.0500NaNS
88588603Rice, Mrs. William (Margaret Norton)female39.00538265229.1250NaNQ
88688702Montvila, Rev. Juozasmale27.00021153613.0000NaNS
88888903Johnston, Miss. Catherine Helen \"Carrie\"femaleNaN12W./C. 660723.4500NaNS
89089103Dooley, Mr. Patrickmale32.0003703767.7500NaNQ
\n", "

675 rows × 12 columns

\n", "
" ], "text/plain": [ " PassengerId Survived Pclass Name \\\n", "0 1 0 3 Braund, Mr. Owen Harris \n", "2 3 1 3 Heikkinen, Miss. Laina \n", "4 5 0 3 Allen, Mr. William Henry \n", "5 6 0 3 Moran, Mr. James \n", "7 8 0 3 Palsson, Master. Gosta Leonard \n", ".. ... ... ... ... \n", "884 885 0 3 Sutehall, Mr. Henry Jr \n", "885 886 0 3 Rice, Mrs. William (Margaret Norton) \n", "886 887 0 2 Montvila, Rev. Juozas \n", "888 889 0 3 Johnston, Miss. Catherine Helen \"Carrie\" \n", "890 891 0 3 Dooley, Mr. Patrick \n", "\n", " Sex Age SibSp Parch Ticket Fare Cabin Embarked \n", "0 male 22.0 1 0 A/5 21171 7.2500 NaN S \n", "2 female 26.0 0 0 STON/O2. 3101282 7.9250 NaN S \n", "4 male 35.0 0 0 373450 8.0500 NaN S \n", "5 male NaN 0 0 330877 8.4583 NaN Q \n", "7 male 2.0 3 1 349909 21.0750 NaN S \n", ".. ... ... ... ... ... ... ... ... \n", "884 male 25.0 0 0 SOTON/OQ 392076 7.0500 NaN S \n", "885 female 39.0 0 5 382652 29.1250 NaN Q \n", "886 male 27.0 0 0 211536 13.0000 NaN S \n", "888 female NaN 1 2 W./C. 6607 23.4500 NaN S \n", "890 male 32.0 0 0 370376 7.7500 NaN Q \n", "\n", "[675 rows x 12 columns]" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Filter rows where 'Pclass' is either 2 or 3 using .loc[].\n", "\n", "titanic.loc[titanic['Pclass'].isin([2, 3])]" ] }, { "cell_type": "markdown", "id": "3db35580-9be7-4c17-9d1d-975da8faab2f", "metadata": {}, "source": [ "Similarly, if you want to perform position-based filtering using `.iloc[]`, you would first need to identify the positions of the relevant rows and columns. However, `.iloc[]` is typically less useful for condition-based filtering because it requires knowledge of the exact row and column positions rather than labels." ] }, { "cell_type": "markdown", "id": "5b2120f5-0c03-41a4-8b98-4fbfb43ba993", "metadata": {}, "source": [ "You can also combine multiple conditions using logical operators like `&` (AND) and `|` (OR):" ] }, { "cell_type": "code", "execution_count": 26, "id": "a6ec973c-37e0-4aa7-bd61-ad4ef508206b", "metadata": { "tags": [ "hide-output" ] }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NamePclass
13Andersson, Mr. Anders Johan3
15Hewlett, Mrs. (Mary D Kingcome)2
25Asplund, Mrs. Carl Oscar (Selma Augusta Emilia...3
33Wheadon, Mr. Edward H2
40Ahlin, Mrs. Johan (Johanna Persdotter Larsson)3
.........
854Carter, Mrs. Ernest Courtenay (Lilian Hughes)2
860Hansen, Mr. Claus Peter3
865Bystrom, Mrs. (Karolina)2
873Vander Cruyssen, Mr. Victor3
885Rice, Mrs. William (Margaret Norton)3
\n", "

113 rows × 2 columns

\n", "
" ], "text/plain": [ " Name Pclass\n", "13 Andersson, Mr. Anders Johan 3\n", "15 Hewlett, Mrs. (Mary D Kingcome) 2\n", "25 Asplund, Mrs. Carl Oscar (Selma Augusta Emilia... 3\n", "33 Wheadon, Mr. Edward H 2\n", "40 Ahlin, Mrs. Johan (Johanna Persdotter Larsson) 3\n", ".. ... ...\n", "854 Carter, Mrs. Ernest Courtenay (Lilian Hughes) 2\n", "860 Hansen, Mr. Claus Peter 3\n", "865 Bystrom, Mrs. (Karolina) 2\n", "873 Vander Cruyssen, Mr. Victor 3\n", "885 Rice, Mrs. William (Margaret Norton) 3\n", "\n", "[113 rows x 2 columns]" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Filter rows where ('Pclass' is either 2 or 3) AND ('Age' is greater than 35),\n", "# and then select the 'Name' and 'Pclass' columns.\n", "\n", "titanic.loc[((titanic['Pclass'] == 2) | (titanic['Pclass'] == 3)) & (titanic['Age'] > 35), ['Name', 'Pclass']]" ] }, { "cell_type": "markdown", "id": "da9cb04b-4246-4c5f-be72-bf8f85fa0a5b", "metadata": {}, "source": [ "### Handling missing values with `.loc[]` and `.iloc[]`" ] }, { "cell_type": "markdown", "id": "406599e3-1a91-4e45-b609-cf6f3e1f7cc0", "metadata": {}, "source": [ "Both `.loc[]` and `.iloc[]` allow you to filter rows based on the presence or absence of missing values.\\\n", "(However, `.loc[]` is generally more convenient when working with labeled data, while `.iloc[]` requires knowing the exact positions.)\n", "\n", "For example, to filter rows where the `Embarked` column is not null using `.loc[]`:" ] }, { "cell_type": "code", "execution_count": 22, "id": "6ac6403c-8cf1-4161-bd7c-13607d4dfe90", "metadata": { "tags": [ "hide-output" ] }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PassengerIdSurvivedPclassNameSexAgeSibSpParchTicketFareCabinEmbarked
0103Braund, Mr. Owen Harrismale22.010A/5 211717.2500NaNS
1211Cumings, Mrs. John Bradley (Florence Briggs Th...female38.010PC 1759971.2833C85C
2313Heikkinen, Miss. Lainafemale26.000STON/O2. 31012827.9250NaNS
3411Futrelle, Mrs. Jacques Heath (Lily May Peel)female35.01011380353.1000C123S
4503Allen, Mr. William Henrymale35.0003734508.0500NaNS
.......................................
88688702Montvila, Rev. Juozasmale27.00021153613.0000NaNS
88788811Graham, Miss. Margaret Edithfemale19.00011205330.0000B42S
88888903Johnston, Miss. Catherine Helen \"Carrie\"femaleNaN12W./C. 660723.4500NaNS
88989011Behr, Mr. Karl Howellmale26.00011136930.0000C148C
89089103Dooley, Mr. Patrickmale32.0003703767.7500NaNQ
\n", "

889 rows × 12 columns

\n", "
" ], "text/plain": [ " PassengerId Survived Pclass \\\n", "0 1 0 3 \n", "1 2 1 1 \n", "2 3 1 3 \n", "3 4 1 1 \n", "4 5 0 3 \n", ".. ... ... ... \n", "886 887 0 2 \n", "887 888 1 1 \n", "888 889 0 3 \n", "889 890 1 1 \n", "890 891 0 3 \n", "\n", " Name Sex Age SibSp \\\n", "0 Braund, Mr. Owen Harris male 22.0 1 \n", "1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 \n", "2 Heikkinen, Miss. Laina female 26.0 0 \n", "3 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 \n", "4 Allen, Mr. William Henry male 35.0 0 \n", ".. ... ... ... ... \n", "886 Montvila, Rev. Juozas male 27.0 0 \n", "887 Graham, Miss. Margaret Edith female 19.0 0 \n", "888 Johnston, Miss. Catherine Helen \"Carrie\" female NaN 1 \n", "889 Behr, Mr. Karl Howell male 26.0 0 \n", "890 Dooley, Mr. Patrick male 32.0 0 \n", "\n", " Parch Ticket Fare Cabin Embarked \n", "0 0 A/5 21171 7.2500 NaN S \n", "1 0 PC 17599 71.2833 C85 C \n", "2 0 STON/O2. 3101282 7.9250 NaN S \n", "3 0 113803 53.1000 C123 S \n", "4 0 373450 8.0500 NaN S \n", ".. ... ... ... ... ... \n", "886 0 211536 13.0000 NaN S \n", "887 0 112053 30.0000 B42 S \n", "888 2 W./C. 6607 23.4500 NaN S \n", "889 0 111369 30.0000 C148 C \n", "890 0 370376 7.7500 NaN Q \n", "\n", "[889 rows x 12 columns]" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Filter rows where 'Embarked' is not null using .loc[].\n", "\n", "titanic.loc[titanic['Embarked'].notna()]" ] }, { "cell_type": "markdown", "id": "af3013f4-ed33-4052-b5eb-c9c52d7156da", "metadata": {}, "source": [ "You can then compare the number of extracted rows with the original DataFrame to identify how many rows have missing data:" ] }, { "cell_type": "code", "execution_count": 23, "id": "10cd8859-5c33-45c9-b327-3708c4e0b05d", "metadata": { "tags": [ "hide-output" ] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "RangeIndex: 891 entries, 0 to 890\n", "Data columns (total 12 columns):\n", " # Column Non-Null Count Dtype \n", "--- ------ -------------- ----- \n", " 0 PassengerId 891 non-null int64 \n", " 1 Survived 891 non-null int64 \n", " 2 Pclass 891 non-null int64 \n", " 3 Name 891 non-null object \n", " 4 Sex 891 non-null object \n", " 5 Age 714 non-null float64\n", " 6 SibSp 891 non-null int64 \n", " 7 Parch 891 non-null int64 \n", " 8 Ticket 891 non-null object \n", " 9 Fare 891 non-null float64\n", " 10 Cabin 204 non-null object \n", " 11 Embarked 889 non-null object \n", "dtypes: float64(2), int64(5), object(5)\n", "memory usage: 83.7+ KB\n" ] } ], "source": [ "# Display information about the DataFrame to compare row counts.\n", "\n", "titanic.info()" ] }, { "cell_type": "markdown", "id": "47535c74-e158-4a89-9bc9-a1cea1aa70df", "metadata": {}, "source": [ "Comparing the two number of rows (889 and 891) tells us, that two rows are missing embarkation data.\n", "\n", "As we saw in the exercise on the previous page on [reading and writing tabular data](02_pandas_tabular_data), the two rows missing embarkation data can be found like this:" ] }, { "cell_type": "code", "execution_count": 28, "id": "e4a29062-fc16-4519-bde3-ad8f802a0b14", "metadata": { "tags": [ "hide-output" ] }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PassengerIdSurvivedPclassNameSexAgeSibSpParchTicketFareCabinEmbarked
616211Icard, Miss. Ameliefemale38.00011357280.0B28NaN
82983011Stone, Mrs. George Nelson (Martha Evelyn)female62.00011357280.0B28NaN
\n", "
" ], "text/plain": [ " PassengerId Survived Pclass Name \\\n", "61 62 1 1 Icard, Miss. Amelie \n", "829 830 1 1 Stone, Mrs. George Nelson (Martha Evelyn) \n", "\n", " Sex Age SibSp Parch Ticket Fare Cabin Embarked \n", "61 female 38.0 0 0 113572 80.0 B28 NaN \n", "829 female 62.0 0 0 113572 80.0 B28 NaN " ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "titanic[titanic['Embarked'].isna()]" ] }, { "cell_type": "markdown", "id": "c50b13ad-3a6a-4162-ac04-fc0aa86da086", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "___" ] }, { "cell_type": "markdown", "id": "e3975713-1cbe-4c43-a97a-14c6d6b18554", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "## Selecting rows and columns\n", "\n", "![Selecting specific rows and columns from a DataFrame](images/03_subset_columns_rows.svg)" ] }, { "cell_type": "markdown", "id": "55aa6a2d-a5bb-42c9-84fa-ab55160ccf39", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "The true power of `.loc[]` becomes evident when you want to filter rows and simultaneously select specific columns. For example:" ] }, { "cell_type": "code", "execution_count": 24, "id": "8fe0feca-1a72-4596-a9a7-d9e30d8fed0f", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [ "hide-output" ] }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NamePclass
1Cumings, Mrs. John Bradley (Florence Briggs Th...1
6McCarthy, Mr. Timothy J1
11Bonnell, Miss. Elizabeth1
13Andersson, Mr. Anders Johan3
15Hewlett, Mrs. (Mary D Kingcome)2
.........
865Bystrom, Mrs. (Karolina)2
871Beckwith, Mrs. Richard Leonard (Sallie Monypeny)1
873Vander Cruyssen, Mr. Victor3
879Potter, Mrs. Thomas Jr (Lily Alexenia Wilson)1
885Rice, Mrs. William (Margaret Norton)3
\n", "

217 rows × 2 columns

\n", "
" ], "text/plain": [ " Name Pclass\n", "1 Cumings, Mrs. John Bradley (Florence Briggs Th... 1\n", "6 McCarthy, Mr. Timothy J 1\n", "11 Bonnell, Miss. Elizabeth 1\n", "13 Andersson, Mr. Anders Johan 3\n", "15 Hewlett, Mrs. (Mary D Kingcome) 2\n", ".. ... ...\n", "865 Bystrom, Mrs. (Karolina) 2\n", "871 Beckwith, Mrs. Richard Leonard (Sallie Monypeny) 1\n", "873 Vander Cruyssen, Mr. Victor 3\n", "879 Potter, Mrs. Thomas Jr (Lily Alexenia Wilson) 1\n", "885 Rice, Mrs. William (Margaret Norton) 3\n", "\n", "[217 rows x 2 columns]" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Filter rows where 'Age' is greater than 35 and select only the 'Name' and 'Pclass' columns.\n", "\n", "titanic.loc[titanic['Age'] > 35, ['Name', 'Pclass']]" ] }, { "cell_type": "markdown", "id": "0ee5aef8-30b9-4e07-ad18-d938822d9ad2", "metadata": {}, "source": [ "### Combining `loc` and `iloc`" ] }, { "cell_type": "markdown", "id": "cc456912-4e0f-4245-856b-06e2d6286456", "metadata": {}, "source": [ "We can use `iloc` to select:\n", "\n", "* Rows from index position 9 to 25 (exclusive).\n", "* Columns from index position 2 to 6 (exclusive).\n", "\n", "Then further filter these rows using `loc` based on the condition where:\n", "\n", "* 'Age' is greater than 35." ] }, { "cell_type": "code", "execution_count": 25, "id": "40868771-b1f1-491b-9cdf-3397b731ada7", "metadata": { "tags": [ "hide-output" ] }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PclassNameSexAge
111Bonnell, Miss. Elizabethfemale58.0
133Andersson, Mr. Anders Johanmale39.0
152Hewlett, Mrs. (Mary D Kingcome)female55.0
\n", "
" ], "text/plain": [ " Pclass Name Sex Age\n", "11 1 Bonnell, Miss. Elizabeth female 58.0\n", "13 3 Andersson, Mr. Anders Johan male 39.0\n", "15 2 Hewlett, Mrs. (Mary D Kingcome) female 55.0" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "titanic.iloc[9:25, 2:6].loc[titanic['Age'] > 35]" ] }, { "cell_type": "markdown", "id": "d06d4633-9550-4978-a275-0d6920770a9c", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "___" ] }, { "cell_type": "markdown", "id": "4114093e-a6fd-40f0-ac86-d0b2493f9682", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "## Key points\n", "\n", "* Access a single column or multiple columns from a `DataFrame` using square bracket notation (`[]`).\n", "\n", "* Use label-based indexing (`.loc[]`) and position-based indexing (`.iloc[]`) to select specific rows and columns.\n", "\n", "* Slicing with `.loc[]` is inclusive, while slicing with `.iloc[]` is exclusive.\n", "\n", "* Filter rows based on conditions and apply additional conditions or select specific columns within the same operation." ] }, { "cell_type": "code", "execution_count": null, "id": "95ec0280-28be-43f5-9c20-5db2f795bf04", "metadata": { "editable": true, "slideshow": { "slide_type": "" }, "tags": [ "hide-output" ] }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.8" } }, "nbformat": 4, "nbformat_minor": 5 }