Math | The Kiseki Log

Reinforcement Learning illustration (by [Google Gemini](https://gemini.google.com/))

Fully Annotated Guide to "A (Long) Peek into Reinforcement Learning"

This is a fully annotated guide to Lilian Weng’s post “A (Long) Peek into Reinforcement Learning”.

Multi-armed bandit (by [ChatGPT Images 2.0](https://openai.com/index/introducing-chatgpt-images-2-0/))

Fully Annotated Guide to "The Multi-Armed Bandit Problem and Its Solutions"

The multi-armed bandit problem is a classic exploration–exploitation dilemma in reinforcement learning. Lilian Weng’s post is an excellent introduction, but some mathematical details and motivations can be cryptic. This article annotates it with step-by-step explanations and supplementary notes.

Diagram of the probability transition process in speculative sampling.

How is the Speculative Decoding Algorithm Constructed?

A simple mathematical derivation of how the algorithm in the paper “Fast Inference from Transformers via Speculative Decoding” is constructed.

Diagram of the basic principle of diffusion models, showing recovery of an image from noise. Generated by [Google Nano Banana (gemini-2.5-flash-image-preview)](https://www.nano-banana.ai/)

Fully Annotated Guide to "What are Diffusion Models?"

Diffusion models are the de facto standard for image generation. Lilian Weng’s “What Are Diffusion Models?” is an excellent introduction to them, but readers without a solid mathematical background may struggle. This article fills that gap with clear, step-by-step derivations and explanations.