Reinforcement Learning

The back story

In Spring 2020, I enrolled in Stanford's CS 229 class, Machine Learning, and immediately became passionate about one of its subdisciplines, reinforcement learning (RL). RL is a family of algorithms designed to solve one of the fundamental challenges in artificial intelligence (AI) and machine learning (ML): learning to make good decisions under uncertainty.

In November 2020, I decided to fully commit to RL and began absorbing everything I could find on the subject. I started by studying David Silver's amazing lecture series, available for free on his website and on YouTube. Then I took Stanford's CS 234, a 3-month course offered in the Winter quarter and taught by Professor Emma Brunskill from the Computer Science (CS) department. CS 234 was probably the best class I have seen at Stanford in terms of teaching, content, and help provided by the TAs.

Goal of these notes

My goal here is not just to summarize the theorems and properties used in RL settings: there is already a lot of really good material available online for free. Here, I am trying to derive the proofs of the main theorems and results, which are usually claimed to be "obvious", overlooked, or not explained in detail.

By going through these proofs, I became comfortable manipulating the mathematical tools and learned many of the recurring tricks used in RL. As always, these tricks are not difficult, but getting familiar with them requires practice. And of course, nobody tells you explicitly how to use them, either because they don't exactly know, or because they've done it so many times that they have forgotten how difficult it can be at first.

Notes outline

Disclaimer: I am still finding time to clean up my notes, so this is a work in progress :)

  1. Markov Decision Processes

Useful links

Here is a list of links/material I found extremely useful when I studied reinforcement learning:

Acknowledgements

  • The Markov Decision Process chapter is based on the notes from my colleague Rahul Sarkar, former CS 234 teaching assistant (TA). Rahul gave me the inspiration for creating my own notes and adding the proofs/details that I needed to fully understand the concepts.
  • Steve MacCabe, one of my teammates for the CS 234 class, suggested that I put my RL notes online.

