Best Practices

Data analysis with Python is fairly straightforward, but a few best practices are worth keeping in mind.

Virtual Environments

Virtual environments isolate the Python executable and packages of each project from other projects and from the default Python version that ships with some operating systems. This matters because new versions of Python and of packages can introduce breaking changes, so upgrading them for one project can break another unless the projects are kept separate.
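To make the isolation concrete, here is a minimal sketch that reports which interpreter is running and whether it belongs to a virtual environment. The helper names `in_virtual_environment` and `describe_interpreter` are made up for this illustration. The check compares `sys.prefix` with `sys.base_prefix`, which works for environments created by venv or virtualenv (the mechanism pipenv builds on); a conda environment is a separate installation and would typically not be detected this way.

```python
import sys


def in_virtual_environment() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside such an environment sys.prefix points at the environment itself,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


def describe_interpreter() -> str:
    """Summarise the running Python version and where it lives."""
    version = ".".join(str(part) for part in sys.version_info[:3])
    return (
        f"Python {version} at {sys.executable} "
        f"(virtual environment: {in_virtual_environment()})"
    )


print(describe_interpreter())
```

Running this once with the system Python and once from inside a project environment should show two different executables, and possibly two different versions.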

A virtual environment gives each project its own development environment with its own specified versions of Python and packages. Common tools for working with virtual environments are pipenv and conda; conda and its optimized relative mamba are widely used in data science.
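As a rough illustration of what "a separate environment per project" means on disk, the sketch below creates one with the standard library's `venv` module. This is a stand-in chosen because it needs no extra installation; it is not how pipenv or conda work internally. The `.venv` directory name and the `create_environment` helper are assumptions made for this example.

```python
import tempfile
import venv
from pathlib import Path


def create_environment(project_dir: Path) -> Path:
    """Create a virtual environment inside project_dir and return its path."""
    env_dir = project_dir / ".venv"  # conventional name, assumed here
    # with_pip=False keeps the sketch fast and offline; a real project would
    # normally want pip available inside the environment.
    venv.create(env_dir, with_pip=False)
    return env_dir


# A temporary directory stands in for a project folder.
with tempfile.TemporaryDirectory() as tmp:
    env_dir = create_environment(Path(tmp))
    # pyvenv.cfg records which base Python the environment was created from.
    config_text = (env_dir / "pyvenv.cfg").read_text()
    print(config_text)
```

The printed `pyvenv.cfg` shows the base installation and Python version that the environment is tied to, which is the information that keeps one project's setup from leaking into another.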

One reason conda is more popular in data science is that many of the Python packages used in this field are actually written in C or another language that must be compiled for the specific platform in use. Compiling them yourself is especially inconvenient on Windows, and conda solves this by providing a large library of prebuilt binaries ready for use.
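The platform dependence of compiled packages can be seen from Python itself. The following sketch, which uses only the standard library, prints the platform identifier and the file suffixes that this interpreter accepts for compiled extension modules. A prebuilt binary has to match them, which is why a package built for one platform cannot simply be copied to another.

```python
import importlib.machinery
import sysconfig

# Identifier for the platform this interpreter was built for.
platform_tag = sysconfig.get_platform()

# File suffixes this interpreter recognises for compiled extension modules.
extension_suffixes = importlib.machinery.EXTENSION_SUFFIXES

print(f"Platform: {platform_tag}")
print("Compiled extension modules must end with one of:")
for suffix in extension_suffixes:
    print(f"  {suffix}")
```

The output differs between operating systems and Python versions, so no single expected result is shown here.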

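# Type hints annotate variables with the type they are expected to hold.
# Python does not enforce them at runtime.
x: int = 1
y: int = 2

# Parameters and the return value can be annotated in the same way.
def my_function(a: int, b: int) -> int:
    return a + b

z: int = my_function(x, y)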
