Posts

On RMS matched Muon

A short note on some aspects of long context attention

Simple Rules, Complex Dynamics – Part I: Foundations & Intuition

The modded nanogpt speedrun, but in JAX and on TPUs

Theoretical properties of optimizers on a toy problem, and some intuition

August 2, 2025 · 48 min · 10137 words · nor

Deriving RoPE the proper way

July 28, 2025 · 25 min · 5177 words · nor

Solving the IMO 2025 problems

July 19, 2025 · 17 min · 3514 words · nor

Quantizing LLMs for inference

May 14, 2025 · 32 min · 6685 words · nor

A Math Academy review

Calibrating Confidence