ai

Improving the lower bound for the unit distance problem

A refinement of Sawin's explicit unit-distance lower bound, mostly by GPT 5.5 Pro, for the Erdős problem recently solved by OpenAI's internal model.

· 17 min · 3599 words · nor

The modded nanogpt speedrun, but in JAX and on TPUs

A writeup of porting the modded nanoGPT speedrun to pure JAX on TPU v6e, including hardware bottlenecks, bugs, optimizations, and open performance questions.

Theoretical properties of optimizers on a toy problem, and some intuition

A theoretical comparison of normalized gradient descent, Muon, and Adam-style updates on a tractable matrix optimization toy problem, showing finite-time convergence and why Muon's guarantees come out nicer than Adam's.

· 48 min · 10137 words · nor

Deriving RoPE the proper way

A rigorous derivation showing RoPE is almost optimally expressive under certain natural constraints, characterizing the allowable positional rotations with a free N-dimensional generalization and constructions for the rotation vectors.

· 25 min · 5177 words · nor

Quantizing LLMs for inference

A practical overview of LLM inference quantization, from why memory bandwidth dominates local inference to GGUF, EXL, AWQ, GPTQ, KV-cache quants, and hardware tradeoffs.