The modded nanoGPT speedrun, but in JAX and on TPUs
A writeup of porting the modded nanoGPT speedrun to pure JAX on TPU v6e, including hardware bottlenecks, bugs, optimizations, and open performance questions.
A short PSA on increasing stack limits before MHC, with commands, compiler flags, and platform caveats.
A discussion of using C on Codeforces.
An explanation of GCC optimization and target pragmas, what common fake pragmas do not do, and when these flags help or hurt.