What I read in July 2023

Aug 28, 23

In July I found myself reading mostly about conformal prediction/probabilistic forecasting and some deeper internals of Python that I’d never gotten around to learning.

Starting off with the former. I came across this primer on calibrating probabilities in classification problems; it’s a topic I’m familiar with, but it was good to revisit it with some explicit code examples and visualizations. This reminded me of some interesting tweets I’d seen about the promise of conformal prediction for probabilistic prediction, which I’d never really followed up on. Conformal prediction is essentially a technique for creating calibrated prediction sets for nearly any machine learning model without making any assumptions beyond exchangeability of the data. This paper is a deep first-principles introduction to conformal prediction; it’s great, but a lot to consume in one sitting. I followed it with this paper extending the framework to time series and then this blog post by Valeriy Manokhin (the godfather of conformal prediction), which succinctly describes how to evaluate your probabilistic forecasts in the wild. I had a good read through the MAPIE documentation, a fantastic sklearn-compatible package for conformal prediction, and managed to get my first probabilistic forecast about an hour later - not bad!
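To make the idea a bit more concrete, here is a minimal sketch of split conformal prediction for regression, written with plain NumPy rather than MAPIE. Everything in it is an assumption for illustration: the synthetic data, the straight-line model, the `predict` helper and the 90% target coverage are all made up, and this is just the simplest variant of the technique, not what any of the linked papers or packages do internally.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic data: y = 2x + Gaussian noise.
n = 2000
x = rng.uniform(0, 10, size=n)
y = 2.0 * x + rng.normal(0, 1.0, size=n)

# Split into training, calibration and test sets.
x_train, y_train = x[:1000], y[:1000]
x_cal, y_cal = x[1000:1500], y[1000:1500]
x_test, y_test = x[1500:], y[1500:]

# Fit any point-prediction model; here a least-squares line.
slope, intercept = np.polyfit(x_train, y_train, deg=1)


def predict(values):
    """Point prediction from the fitted line (illustrative helper)."""
    return slope * values + intercept


# Conformity scores: absolute residuals on the held-out calibration set.
alpha = 0.1  # target miscoverage, i.e. aim for roughly 90% coverage
scores = np.sort(np.abs(y_cal - predict(x_cal)))

# Finite-sample corrected quantile: the ceil((n + 1)(1 - alpha))-th smallest score.
n_cal = len(scores)
k = min(int(np.ceil((n_cal + 1) * (1 - alpha))), n_cal)
q_hat = scores[k - 1]

# Prediction intervals: point prediction plus/minus the calibrated quantile.
lower = predict(x_test) - q_hat
upper = predict(x_test) + q_hat

# Empirical coverage on the test set should land near 1 - alpha.
coverage = np.mean((y_test >= lower) & (y_test <= upper))
print(f"interval half-width: {q_hat:.3f}, empirical coverage: {coverage:.3f}")
```

The only thing the intervals rely on is that the calibration and test points are exchangeable; the model itself could be swapped for anything that produces point predictions.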

Back to Python land. I didn’t know much about how threading works in Python, so I read the first…bit of this full guide to threading in Python; however, about a quarter of the way through, I started to feel as if it were slightly pointless to learn how to use threads in Python due to the existence of the global interpreter lock (GIL). Instead, I turned to reading PEP 703 - the proposal for making the GIL optional in a future version of CPython. This really is well worth the read: it has some great arguments for adding this feature, and the uber-technical section is interesting too, though I largely flicked through it. I briefly got interested in Python’s abstract syntax tree (`ast`) module at the beginning of the month and found this decent explainer - though I swiftly realised I couldn’t solve the problem I wanted to with it, and threw it on the back burner.
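For a flavour of what the `ast` module does, here is a small sketch that parses a snippet of source code and pulls out the functions it defines and the names it calls. The `source` string and the variable names are hypothetical, chosen purely for illustration, and this is not the problem I was originally trying to solve.

```python
import ast

# Hypothetical source code to inspect; it is parsed, never executed.
source = """
def add(a, b):
    return a + b

def main():
    print(add(1, 2))
"""

# Parse the source text into an abstract syntax tree.
tree = ast.parse(source)

# Walk every node and collect the names of function definitions.
function_names = sorted(
    node.name for node in ast.walk(tree) if isinstance(node, ast.FunctionDef)
)

# Collect the names of plain function calls such as add(...) or print(...).
called_names = sorted(
    {
        node.func.id
        for node in ast.walk(tree)
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
    }
)

print(function_names)  # ['add', 'main']
print(called_names)    # ['add', 'print']
```

The results are sorted because `ast.walk` makes no promise about the order in which it visits nodes.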