Examination

This examination consists of 4 questions (1 point each) and 4 tasks (24 points each), for a total of 100 points. To pass, you need a score of 65 points or more.

Answer the questions and solve all tasks in a single Python file named python_course_examination.py. Provide your answers to the questions as comments at the top of the file. Since grading is done anonymously, please do not write your name anywhere in the file. Structure the file as shown below and, when you are done, upload it to Canvas:

# Questions
# 1. ...
# 2. ...
# 3. ...
# 4. ...
# %%
# Task 1
...
# %%
# Task 2
...
# %%
# Task 3
...
# %%
# Task 4
...

The deadline is Sunday 5/10 24:00, but the exam should not take more than a few hours to get through. Since the code has to run on our computers for us to grade it, please make sure to read the data directly from the URLs provided in the tasks (i.e. pass the web address instead of a local file path when reading the data).


  1. What is the purpose of using virtual environments when using Python for your projects?
  2. What is the command for starting a new project with uv?
  3. What is the command for installing a package with uv (e.g. polars)?
  4. What are the main packages we have learnt about in this course for:
    • Data management?
    • Visualization?
    • Statistical modelling?

Write a simple Python program that goes through a list of exam scores and determines how many students have passed the course. Create a second function that calculates the mean score. Finally, print a nicely formatted message containing the information from your functions.

  • Create a function called analyze_scores(scores, passing_score) that takes two parameters:
    • scores: a list of student scores
    • passing_score: the minimum score required to pass
    • Use a for loop (or list comprehension) to iterate through the list of scores
    • Count how many students passed (score >= passing_score)
    • Return the count
  • Create a function called get_mean_score(scores) that takes one parameter:
    • scores: a list of student scores
    • Calculate the mean score using the builtin sum() and len() functions
    • Return the mean
  • Test your functions with the following data:
    • student_scores = [85, 92, 78, 65, 88, 73, 95, 82, 70, 68]
    • passing_score = 75 (only for analyze_scores)
  • Use an f-string to print the results in the format:
    • k out of n students passed the exam (p%), mean score was m
    • k is the count of students passing
    • n is the total number of students
    • p is the proportion of students passing in percent with one decimal
    • m is the mean score with one decimal
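
The steps above can be sketched as follows. This is one possible reading of the task, not a reference solution; the variable names n_passed, n_total, mean_score and message are illustrative choices and are not prescribed by the task.

```python
def analyze_scores(scores, passing_score):
    """Return how many scores are at least passing_score."""
    passed = 0
    for score in scores:
        if score >= passing_score:
            passed += 1
    return passed


def get_mean_score(scores):
    """Return the arithmetic mean using the builtin sum() and len()."""
    return sum(scores) / len(scores)


student_scores = [85, 92, 78, 65, 88, 73, 95, 82, 70, 68]
passing_score = 75

n_passed = analyze_scores(student_scores, passing_score)
n_total = len(student_scores)
mean_score = get_mean_score(student_scores)

# ".1%" multiplies the proportion by 100 and appends "%"; ".1f" keeps one decimal.
message = (
    f"{n_passed} out of {n_total} students passed the exam "
    f"({n_passed / n_total:.1%}), mean score was {mean_score:.1f}"
)
print(message)
```

A list comprehension such as sum(1 for s in scores if s >= passing_score) would count the same thing in a single line; the explicit loop is used here only because it mirrors the wording of the task.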
  • Import the data in the admissions and diagnoses_icd CSV files using the URLs:
  • Filter the diagnoses in diagnoses_icd to keep only ICD version 10
  • Create a new column in admissions for the duration of the hospital stay in days
  • Join the two data sets on hadm_id choosing an appropriate join type and the correct validation
  • Identify the 10 most common ICD codes in the joined data
  • Filter the joined data to keep only the 10 most common ICD codes identified
  • Calculate the mean and standard deviation of the duration of the hospital stay, stratified (grouped) by ICD code
  • Display the stratified mean and standard deviation nicely rounded to one decimal:
    ┌──────────┬────────────┐
    │ icd_code ┆ mean_std   │
    ╞══════════╪════════════╡
    │ code     ┆ xxx (xxx)  │
    │ ...      ┆ ...        │
    └──────────┴────────────┘
  • Import the data in the patients CSV file using the URL:
  • Join the patients data with the admissions data from the previous task (including the new duration of stay column)
  • Get the year of admission and calculate the age at admission as: age_at_admission = anchor_age + (admission_year - anchor_year)
  • Create a scatter plot with age at admission on the x-axis and duration of stay on the y-axis
  • Change the color and alpha of the markers
  • Set the x and y limit and labels
  • Give the plot a title
  • Set the locations of the major x- and y-ticks
  • Add a grid underneath the plot
  • Save the figure as figure-1.svg
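
The plotting steps above could look roughly like this in matplotlib. The data here are randomly generated placeholders rather than the joined patients and admissions data, and the colour, axis limits, tick spacing, labels and title are arbitrary illustrative choices.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so the script runs without a display

import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import MultipleLocator

# Synthetic placeholder data (hypothetical values, not the course data).
rng = np.random.default_rng(seed=1)
anchor_age = rng.integers(18, 90, size=200)
anchor_year = rng.integers(2110, 2180, size=200)
admission_year = anchor_year + rng.integers(0, 5, size=200)
stay_days = rng.gamma(shape=2.0, scale=3.0, size=200)

# Formula given in the task.
age_at_admission = anchor_age + (admission_year - anchor_year)

fig, ax = plt.subplots(figsize=(6, 4))
ax.scatter(age_at_admission, stay_days, color="tab:purple", alpha=0.4)
ax.set_xlim(10, 100)
ax.set_ylim(0, 40)
ax.set_xlabel("Age at admission (years)")
ax.set_ylabel("Duration of stay (days)")
ax.set_title("Duration of hospital stay by age at admission")
ax.xaxis.set_major_locator(MultipleLocator(10))  # major x-tick every 10 years
ax.yaxis.set_major_locator(MultipleLocator(5))  # major y-tick every 5 days
ax.grid(True)
ax.set_axisbelow(True)  # draw the grid beneath the markers
fig.savefig("figure-1.svg")
```

Without set_axisbelow(True) the grid lines may be drawn on top of the markers, which is why it is called here in addition to grid(True).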
  • Using the data from the previous task
  • Estimate the association between age at admission (independent variable/exposure) and duration of stay (dependent variable/outcome) using ordinary least squares regression (OLS, in statsmodels using the formula interface)
  • Show the summary of the fitted model
  • Plot the regression line over the scatter plot from the previous task. You can use:
    • x = np.linspace(10, 100, 2) from the numpy package, for the x values of the line
    • The linear combination of the coefficients (from the fitted model) and x, for y values of the line
  • Save the updated figure as figure-2.svg
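
A sketch of the regression step using the statsmodels formula interface. The data are simulated with a known slope so the result can be sanity-checked, and the column names age_at_admission and stay_days are assumptions. A pandas DataFrame is used here because that is what the formula interface is usually given; data held in polars would need converting first.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so the script runs without a display

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated placeholder data with a true slope of 0.05 days per year of age.
rng = np.random.default_rng(seed=1)
age = rng.uniform(18, 90, size=200)
stay = 2.0 + 0.05 * age + rng.normal(0.0, 1.0, size=200)
data = pd.DataFrame({"age_at_admission": age, "stay_days": stay})

# Outcome on the left of "~", exposure on the right.
model = smf.ols("stay_days ~ age_at_admission", data=data).fit()
print(model.summary())

# Two points are enough to draw a straight line.
intercept = model.params["Intercept"]
slope = model.params["age_at_admission"]
x = np.linspace(10, 100, 2)
y = intercept + slope * x

fig, ax = plt.subplots(figsize=(6, 4))
ax.scatter(data["age_at_admission"], data["stay_days"], color="tab:purple", alpha=0.4)
ax.plot(x, y, color="black", label="OLS fit")
ax.set_xlabel("Age at admission (years)")
ax.set_ylabel("Duration of stay (days)")
ax.legend()
fig.savefig("figure-2.svg")
```

In the exam itself the line would be added to the existing axes from the previous task instead of a freshly created figure; the scatter is recreated here only to keep the example self-contained.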