Pandas Intermediate

Use the template below to explore some further functionality of Pandas. Create new cells with # %% as needed.

Use the Data Management section and the Pandas Documentation for help.

Template

# %%
# Import Pandas


# %%
# Read data from the CSV file (read_csv):
# https://gitlab.com/alping/python-data-science/-/raw/main/data/external/heart-disease.csv


# %%
# Inspect the data [.info, .describe]


# %%
# Get the number of males and females [.value_counts]


# %%
# Create a 2x2 table for the variables sex and exang (exercise-induced angina)


# %%
# Inspect the age distribution as a histogram [.hist] and
# adjust the number of bins


# %%
# Inspect the age distribution, stratified by sex


# %%
# Keep only observations with chol > 200 [.query]


# %%
# Change the sex variable to have the values male/female,
# instead of 1/0 [.assign, .replace]


# %%
# Create a new categorical age variable, binning ages in
# decades (0, 10, 20, ...) [.assign, pd.cut]


# %%
# In a new data variable, using method chaining:
# - Change the sex variable and create the age variable as above, but in one assign statement
# - Rename the column "exang" to "exercise_angina" [.rename]
# - Keep only those with age between 18 and 50, inclusive [.query]
# - Sort by "chol" [.sort_values]
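
The method-chaining step at the end of the template can be sketched on a small toy table before trying it on the real file. This is a minimal sketch and not a reference solution: the DataFrame `toy` and all its values are invented for illustration (only the column names mirror the exercise), and the bins use `right=False` on the assumption that a decade should include its lower bound, so that age 30 falls in the 30 to 39 group.

```python
import pandas as pd

# Hypothetical toy data: column names mirror the exercise, values are invented
toy = pd.DataFrame(
    {
        "age": [17, 29, 34, 45, 50, 63],
        "sex": [1, 0, 1, 1, 0, 0],
        "exang": [0, 0, 1, 1, 0, 1],
        "chol": [180, 250, 210, 199, 305, 240],
    }
)

# Counts per category, and a 2x2 cross-tabulation of two binary columns
sex_counts = toy["sex"].value_counts()
table = pd.crosstab(toy["sex"], toy["exang"])

# One chain: recode, bin, rename, filter, sort. The original frame is untouched.
tidy = (
    toy
    .assign(
        # Recode 1/0 to readable labels
        sex=lambda d: d["sex"].replace({1: "male", 0: "female"}),
        # Left-closed decade bins: [0, 10), [10, 20), ..., [90, 100)
        age_group=lambda d: pd.cut(
            d["age"], bins=list(range(0, 101, 10)), right=False
        ),
    )
    .rename(columns={"exang": "exercise_angina"})
    .query("age >= 18 and age <= 50")
    .sort_values("chol")
)

print(table)
print(tidy)
```

Using lambdas inside `.assign` lets each new column be computed from the frame as it exists at that point in the chain, which is what makes a single `assign` call sufficient for both the recoded `sex` column and the new `age_group` column.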
