Modelling / Analysis

Try out the basics of modelling and analysis with statsmodels, lifelines, and scikit-learn.


p-hacking is the misuse of statistics to find patterns in data without accounting for the risk of false positives. This is of course not something we should do in our research, so let's try it out now that we're on a course and no harm can be done. Explore the lab events data and try to find an interesting association between two lab values. First draw a scatter plot of the values, then fit a model (OLS or ML), and finally plot the regression line over the scatter plot. You can also model non-linear relationships by adding polynomial terms. Happy hunting!
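To see why unaccounted-for false positives matter, here is a minimal simulation sketch. It does not use the course data: the sample size, the number of "lab values", and the 0.05 threshold are all assumptions chosen for illustration. Every column is pure random noise, yet testing every pair still turns up some "significant" correlations.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

n_patients = 200  # hypothetical sample size
n_labs = 20       # hypothetical number of unrelated "lab values"

# Pure noise: by construction no column is associated with any other
noise = rng.normal(size=(n_patients, n_labs))

# Test every pair of columns and collect the p-values
p_values = []
for i in range(n_labs):
    for j in range(i + 1, n_labs):
        r, p = stats.pearsonr(noise[:, i], noise[:, j])
        p_values.append(p)
p_values = np.array(p_values)

n_tests = len(p_values)  # 20 * 19 / 2 = 190 pairs
n_significant = int((p_values < 0.05).sum())
print(f"{n_significant} of {n_tests} pairs have p < 0.05")
print(f"Expected by chance alone: about {0.05 * n_tests:.1f}")
```

Any pair flagged here is a false positive, since the data contain no real signal. The same thing can happen when you hunt through many lab value pairs and report only the ones that look interesting.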

# %%
# Import packages (copy this from the solution)
# %%
# Convenience function (copy this from the solution)
# %%
# Data management (copy this from the solution)
# %%
# Assess which lab values are the most common (copy this from the solution)
# %%
# Replace MCH and MCV with other lab values (copy this from the solution)
# %%
# --- START HERE ---
# Assess the association with a scatter plot, change alpha as needed
# %%
# Fit a linear regression using OLS and inspect the summary
# %%
# Extract the coefficients with confidence intervals
# %%
# Plot the regression line
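
The steps in the skeleton above could look roughly like the following sketch. It is not the course solution: it runs on synthetic data, and the column names `lab_a` and `lab_b` are hypothetical placeholders for whichever two lab values you pick. It assumes the statsmodels formula interface; the optional last cell adds a quadratic term as one way to try a polynomial model.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.formula.api as smf

# %%
# Synthetic stand-in for the lab events data (hypothetical columns)
rng = np.random.default_rng(0)
lab_a = rng.normal(loc=30, scale=3, size=300)
lab_b = 60 + 1.0 * lab_a + rng.normal(scale=4, size=300)
df = pd.DataFrame({"lab_a": lab_a, "lab_b": lab_b})

# %%
# Scatter plot; a low alpha keeps dense regions readable
fig, ax = plt.subplots()
ax.scatter(df["lab_a"], df["lab_b"], alpha=0.3)
ax.set_xlabel("lab_a")
ax.set_ylabel("lab_b")

# %%
# Fit a linear regression using OLS and inspect the summary
model = smf.ols("lab_b ~ lab_a", data=df).fit()
print(model.summary())

# %%
# Coefficients with 95% confidence intervals in one table
coefs = pd.concat([model.params, model.conf_int()], axis=1)
coefs.columns = ["coef", "ci_lower", "ci_upper"]
print(coefs)

# %%
# Regression line over the scatter plot, predicted on an evenly spaced grid
grid = pd.DataFrame(
    {"lab_a": np.linspace(df["lab_a"].min(), df["lab_a"].max(), 100)}
)
ax.plot(grid["lab_a"], model.predict(grid), color="red")

# %%
# Optional: add a quadratic term to allow a non-linear relationship
model_quad = smf.ols("lab_b ~ lab_a + I(lab_a ** 2)", data=df).fit()
ax.plot(grid["lab_a"], model_quad.predict(grid), color="green", linestyle="--")
```

Predicting on a sorted grid rather than on the raw observations keeps the plotted line smooth, which matters most once polynomial terms are in the model.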