machine-learning

Remark #2: The Adam update

Remark #1: On RMS matched Muon

A short note on some aspects of long context attention