Thesis Project Spring 2025

  1. Analysing Python Code at Scale (Open as of 2025-11-10)

Analysing Python Code at Scale

Help us make Python data-race free! We are working on extending Python with a concurrency model that is safe from data races. A concurrency model is the way in which threads or tasks are created and coordinated, allowing a program to use multiple cores to get more work done in a shorter time.

As part of this work, we need to study Python code at scale as input to design decisions. We are interested in what Python programs look like, what kinds of object structures they give rise to, and so on. To answer these questions, we are looking to mine sources such as GitHub for Python programs and then apply various program analysis techniques to understand "what they do".

This project will involve GitHub mining, program analysis, and probably a little bit of statistics in order to draw conclusions from large numbers of projects.
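To give a flavour of the kind of analysis involved, here is a minimal sketch (not the project's actual tooling) that uses Python's standard `ast` module to count a few syntactic features across a set of source files. The function names, the chosen features, and the `CONCURRENCY_MODULES` set are all assumptions made for illustration; a real study would pick its metrics from the research questions.

```python
import ast
from collections import Counter

# Hypothetical choice of modules treated as "concurrency-related" in this sketch.
CONCURRENCY_MODULES = {"threading", "multiprocessing", "asyncio", "concurrent"}


def summarise_source(source: str) -> Counter:
    """Count a few syntactic features in one Python source string."""
    counts = Counter()
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef):
            counts["classes"] += 1
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            counts["functions"] += 1
        elif isinstance(node, ast.Global):
            counts["global_statements"] += 1
        elif isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] in CONCURRENCY_MODULES:
                    counts["concurrency_imports"] += 1
        elif isinstance(node, ast.ImportFrom):
            # node.module is None for relative imports such as "from . import x".
            if node.module and node.module.split(".")[0] in CONCURRENCY_MODULES:
                counts["concurrency_imports"] += 1
    return counts


def summarise_corpus(sources):
    """Aggregate counts over many source strings, skipping unparsable ones."""
    total = Counter()
    skipped = 0
    for src in sources:
        try:
            total += summarise_source(src)
        except (SyntaxError, ValueError):
            # Mined code is often broken or written for another Python version.
            skipped += 1
    return total, skipped
```

In practice the source strings would come from files in mined repositories, and the aggregated counts would feed the statistical part of the project. Keeping track of skipped files matters, because silently dropping unparsable code can bias the conclusions.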

A basic understanding of Python is essential. The project will contain lots of scripting tasks (acquiring the Python code, studying it, etc.), which will almost certainly require developing some non-trivial analysis tools in whatever programming language you prefer.

Your results will be essential for informing our ongoing research project on making Python data-race free by construction. Many Python programmers are not computer scientists, so this is a meaningful endeavour: it will help keep Python programming simple.

If you are a master's student at IT/UU interested in this project, send me an email so we can set up a meeting. I appreciate (but don't require) grade transcripts so I can look at your courses, and links to e.g. your GitHub repos so I can look at your projects.

This project may be split into two projects with two different thesis reports (which may have some limited overlap).

(The project will be supervised by my PhD student Fridtjof Peer Stoldt, and reviewed by me.)