Pandas Intermediate
Use the Template to explore some further functionality of Pandas.
Create new cells with # %% as necessary.
Use the Data Management section and the Pandas Documentation for help.
Template
# %%
# Import Pandas
# %%
# Read data from the CSV file [pd.read_csv]:
# https://gitlab.com/alping/python-data-science/-/raw/main/data/external/heart-disease.csv
# %%
# Inspect the data [.info, .describe]
# %%
# Get the number of males and females [.value_counts]
# %%
# Create a 2x2 table for the variables sex and exang (exercise-induced angina)
# %%
# Inspect the age distribution as a histogram [.hist] and
# adjust the number of bins
# %%
# Inspect the age distribution, stratified by sex
# %%
# Keep only observations with chol > 200 [.query]
# %%
# Change the sex variable to have the values male/female,
# instead of 1/0 [.assign, .replace]
# %%
# Create a new categorical age variable, binning ages in
# decades (0, 10, 20, ...) [.assign, pd.cut]
# %%
# In a new data variable, using method chaining:
# - Change the sex variable and create the age variable as above, but in one assign statement
# - Rename the column "exang" to "exercise_angina" [.rename]
# - Keep only those with age between 18 and 50, inclusive [.query]
# - Sort by "chol" [.sort_values]
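
The counting, recoding, binning and method-chaining steps in the template can be sketched on a tiny stand-in table before trying them on the real file. This is only an illustration under stated assumptions: the six rows below are made up and merely reuse the column names age, sex, chol and exang; the coding 1 = male, 0 = female is inferred from the "male/female, instead of 1/0" wording; and the names df, sex_counts, table, clean and age_group are hypothetical.

```python
import pandas as pd

# Made-up rows that only mimic the column names used in the exercise;
# these are NOT values from heart-disease.csv.
df = pd.DataFrame(
    {
        "age": [29, 45, 50, 63, 17, 38],
        "sex": [1, 0, 1, 1, 0, 0],
        "chol": [204, 180, 250, 233, 199, 210],
        "exang": [0, 1, 0, 1, 0, 1],
    }
)

# Number of rows per category of sex
sex_counts = df["sex"].value_counts()

# 2x2 table of sex (rows) against exang (columns)
table = pd.crosstab(df["sex"], df["exang"])

# One method chain: recode sex and bin age in a single assign,
# then rename, filter and sort.
clean = (
    df
    .assign(
        # Assumption: 1 means male and 0 means female
        sex=lambda d: d["sex"].replace({1: "male", 0: "female"}),
        # Left-closed decade bins: [0, 10), [10, 20), ..., [90, 100)
        age_group=lambda d: pd.cut(
            d["age"], bins=list(range(0, 101, 10)), right=False
        ),
    )
    .rename(columns={"exang": "exercise_angina"})
    .query("age >= 18 and age <= 50")
    .sort_values("chol")
)

print(sex_counts)
print(table)
print(clean)
```

Passing lambdas to assign makes each step operate on the data frame as it exists at that point in the chain, so the whole pipeline can be written without intermediate variables. Setting right=False in pd.cut places an age of exactly 50 in the [50, 60) bin; with the default right=True it would fall in (40, 50] instead.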