machine-learning

On RMS-matched Muon

A short note on some aspects of long-context attention