Examination
This examination consists of 4 questions (1 point each) and 4 tasks (24 points each), for a total of 100 points. To pass, you need a score of 65 points or more.
Answer the questions and solve all tasks in the same Python file, named python_course_examination.py.
Provide answers to the questions as comments at the top of the file.
Since grading is done anonymously, please do not write your name anywhere in the file.
When you are done, upload the file to Canvas. The file should be structured as follows:
# Questions
# 1. ...
# 2. ...
# 3. ...
# 4. ...

# %%
# Task 1
...

# %%
# Task 2
...

# %%
# Task 3
...

# %%
# Task 4
...
The deadline is Sunday 5/10 at 24:00, but the exam should not take more than a few hours to complete. Since the code has to run on our computers for us to grade it, please read the data directly from the URLs provided in the tasks (i.e. use the web address instead of a local file path when reading the data).
Questions
- What is the purpose of using virtual environments when using Python for your projects?
- What is the command for starting a new project with uv?
- What is the command for installing a package with uv (e.g. polars)?
- What are the main packages we have learnt about in this course for:
  - Data management?
  - Visualization?
  - Statistical modelling?
Task 1 - Python Basics
Write a simple Python program that goes through a list of scores from an exam and determines how many students have passed the course. Create a second function that calculates the mean score. Finally, print a nicely formatted message containing the information from your functions.
- Create a function called analyze_scores(scores, passing_score) that takes two parameters:
  - scores: a list of student scores
  - passing_score: the minimum score required to pass
- Use a for loop (or list comprehension) to iterate through the list of scores
- Count how many students passed (score >= passing_score)
- Return the count
- Create a function called get_mean_score(scores) that takes one parameter:
  - scores: a list of student scores
- Calculate the mean score using the built-in sum() and len() functions
- Return the mean
- Test your functions with the following data:
  - student_scores = [85, 92, 78, 65, 88, 73, 95, 82, 70, 68]
  - passing_score = 75 (only for analyze_scores)
- Use an f-string to print the results in the format:
  - “k out of n students passed the exam (p%), mean score was m”
    - k is the count of students passing
    - n is the total number of students
    - p is the proportion of students passing in percent with one decimal
    - m is the mean score with one decimal
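The format specifier `:.1f` is one way to get the one-decimal rounding asked for in the message. The sketch below only illustrates the formatting step; the variable names and values are made up for illustration and are not the exam data.

```python
# Illustrative values only (assumptions, not the exam data).
passed = 3            # k: number of students who passed
total = 8             # n: total number of students
mean_score = 71.375   # m: mean score

# p: proportion of students passing, expressed in percent
percent_passed = 100 * passed / total

# ":.1f" formats a float with exactly one decimal
message = (
    f"{passed} out of {total} students passed the exam "
    f"({percent_passed:.1f}%), mean score was {mean_score:.1f}"
)
print(message)
```

Splitting the f-string across two adjacent literals keeps the line short; Python joins them into a single string.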
Task 2 - Data Management
- Import the data in the admissions and diagnoses_icd CSV files using the URLs:
- Filter the diagnoses in diagnoses_icd to keep only ICD version 10
- Create a new column in admissions for the duration of the hospital stay in days
- Join the two data sets on hadm_id, choosing an appropriate join type and the correct validation
- Identify the 10 most common ICD codes in the joined data
- Filter the joined data to keep only the 10 most common ICD codes identified
- Calculate the mean and standard deviation for the duration of the hospital stay, stratified (grouped) by ICD code
- Display the stratified mean and standard deviation nicely rounded to one decimal:
┌──────────┬────────────┐
│ icd_code ┆ mean_std   │
╞══════════╪════════════╡
│ code     ┆ xxx (xxx)  │
│ ...      ┆ ...        │
└──────────┴────────────┘
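The arithmetic behind the duration column and the "mean (SD)" display can be sketched in plain Python. This is not the data frame solution the task asks for: the rows, ICD codes and timestamps below are hypothetical, and `statistics.stdev` computes the sample standard deviation.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, stdev

# Hypothetical rows: (icd_code, admission time, discharge time).
rows = [
    ("A00", datetime(2020, 1, 1, 12, 0), datetime(2020, 1, 3, 12, 0)),  # 2.0 days
    ("A00", datetime(2020, 2, 1, 0, 0), datetime(2020, 2, 5, 0, 0)),    # 4.0 days
    ("B11", datetime(2020, 3, 1, 6, 0), datetime(2020, 3, 2, 18, 0)),   # 1.5 days
    ("B11", datetime(2020, 4, 1, 0, 0), datetime(2020, 4, 4, 12, 0)),   # 3.5 days
]

SECONDS_PER_DAY = 86_400

# Duration of stay in (fractional) days, collected per ICD code
durations_by_code = defaultdict(list)
for code, admitted, discharged in rows:
    days = (discharged - admitted).total_seconds() / SECONDS_PER_DAY
    durations_by_code[code].append(days)

# One "mean (SD)" string per code, both rounded to one decimal
summary = {
    code: f"{mean(days):.1f} ({stdev(days):.1f})"
    for code, days in durations_by_code.items()
}

for code, mean_std in summary.items():
    print(code, mean_std)
```

Dividing the total number of seconds by 86,400 keeps partial days, which a whole-day count would discard.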
Task 3 - Visualization
- Import the data in the patients CSV file using the URL:
- Join the patients data with the admissions data from the previous task (including the new duration of stay column)
- Get the year of admission and calculate the age at admission as: age_at_admission = anchor_age + (admission_year - anchor_year)
- Plot a scatter plot with age at admission on the x-axis and duration of stay on the y-axis
- Change the color and alpha of the markers
- Set the x and y limits and labels
- Give the plot a title
- Set the location of the major x and y-ticks
- Add a grid underneath the plot
- Save the figure as figure-1.svg
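As a worked example of the age formula above, the following sketch applies it to a single hypothetical patient. The function name and the values are assumptions for illustration; in the task itself the calculation would be done on whole columns.

```python
from datetime import datetime


def age_at_admission(anchor_age: int, anchor_year: int, admitted: datetime) -> int:
    """Apply age_at_admission = anchor_age + (admission_year - anchor_year)."""
    admission_year = admitted.year
    return anchor_age + (admission_year - anchor_year)


# Hypothetical patient: aged 52 in anchor year 2180, admitted in 2183
example_age = age_at_admission(52, 2180, datetime(2183, 5, 14, 9, 30))
print(example_age)
```

Because only the years are compared, the result is a whole number of years and ignores the month and day of admission.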
Task 4 - Modelling / Analysis
- Using the data from the previous task
- Estimate the association between age at admission (independent variable/exposure) and duration of stay (dependent variable/outcome) using ordinary least squares regression (OLS, in statsmodels using the formula interface)
- Show the summary of the fitted model
- Plot the regression line over the scatter plot from the previous task. You can use:
  - x = np.linspace(10, 100, 2) from the numpy package, for the x values of the line
  - The linear combination of the coefficients (from the fitted model) and x, for the y values of the line
- Save the updated figure as figure-2.svg
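The "linear combination" for the y values is simply intercept + slope * x. A minimal sketch follows, with made-up coefficients standing in for the ones you would read from your fitted model.

```python
import numpy as np

# Hypothetical coefficients (assumptions); in the task these would be
# taken from the fitted OLS model rather than typed in by hand.
intercept = 2.0
slope = 0.05

# Two x values are enough to draw a straight line
x = np.linspace(10, 100, 2)

# Linear combination of the coefficients and x
y = intercept + slope * x

print(x, y)
```

The resulting x and y arrays can then be passed to a line plot drawn on the same axes as the scatter plot.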