Getting Started with Data Analysis
Pandas is a Python library designed for working with tabular data. It is widely used alongside other Python libraries, including the machine-learning package scikit-learn and the numerical computing library NumPy. Pandas stores tabular data in DataFrames, which are roughly analogous to Excel spreadsheets: tables of data with labelled rows and columns. The DataFrame and its associated methods will be the central focus of this course.
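To make the spreadsheet comparison concrete, here is a minimal sketch that builds a small DataFrame by hand. The column names and values are invented purely for illustration. Each dictionary key becomes a column, and methods such as `mean()` operate on whole columns at once.

```python
import pandas as pd

# A tiny, made-up table: each key becomes a column, each list holds that column's values
df = pd.DataFrame(
    {
        "name": ["Alice", "Bob", "Carol"],
        "age": [29, 34, 41],
        "city": ["Leeds", "York", "Bath"],
    }
)

print(df)                # displays the rows and columns, much like a spreadsheet view
print(df.shape)          # (number of rows, number of columns)
print(df["age"].mean())  # a column-wise method: the average of the 'age' column
```

In practice you will rarely type data in like this; most DataFrames are created by reading a file, as with the Titanic data used in this course.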
Building on the knowledge you have accumulated in the introductory courses, Getting Started with Data Analysis will teach you how to import, analyse, and visualise data in Pandas. If you already have some experience coding in Python, have a look at our Python for Absolute Beginners Part 1 and Part 2 materials; if these seem manageable, feel free to hop right into this class.
Download the Titanic data
For much of this course, we will be working with the Titanic dataset. The full dataset can be found on Kaggle, but we will only be using the training data, which we have made available here.
To follow the exercises and use the code exactly as written below, download the data and save it as ‘titanic.csv’ inside a folder called ‘data’. This ‘data’ folder must sit in the same directory as your Jupyter notebook, so that the notebook’s folder contains a subfolder named ‘data’, which in turn holds ‘titanic.csv’.
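Once the file is in place, a quick way to check the folder layout is to load it. The sketch below assumes the layout described above and uses a relative path, which Jupyter usually resolves against the notebook’s own folder. The helper name `load_titanic` is hypothetical, introduced here only to give a clearer error message when the file cannot be found; `pd.read_csv` on its own is enough once everything is set up.

```python
from pathlib import Path

import pandas as pd

# Assumed layout: <notebook folder>/data/titanic.csv (relative to the working directory)
DATA_PATH = Path("data") / "titanic.csv"


def load_titanic(path=DATA_PATH):
    """Hypothetical helper: read the Titanic CSV into a DataFrame,
    raising a more descriptive error if the file is missing."""
    path = Path(path)
    if not path.exists():
        raise FileNotFoundError(
            f"Could not find {path.resolve()}. Check that the 'data' folder "
            "sits next to your notebook and contains 'titanic.csv'."
        )
    return pd.read_csv(path)


# Usage in the notebook:
# titanic = load_titanic()
# titanic.head()  # shows the first five rows
```

If the error message appears, the printed absolute path shows exactly where Python looked, which usually makes a misplaced folder easy to spot.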