Remark #2: The Adam update
A short note on Adam's update size, the coupling of beta1 and beta2, bias correction, epsilon, and how these relate to stability and warmup.
A refinement of Sawin's explicit unit-distance lower bound, mostly by GPT 5.5 Pro, for the Erdős problem recently solved by OpenAI's internal model.
A short note on why RMS-matching Muon to AdamW can break width transfer, causing either undertraining or instability depending on scale.
A research note on what breaks in long-context attention, deriving a logit scaling and covering QK-norm, hybrid/local attention, gating, and small-scale experiments.
An intuition-building tour of dynamical systems, using canonical examples to connect feedback, thresholds, coupling, noise, reinforcement, spatial structure, and phase transitions.
A theoretical comparison of normalized gradient descent, Muon, and Adam-style updates on a tractable matrix optimization toy problem, showing finite-time convergence and why Muon's guarantees come out nicer than Adam's.
A rigorous derivation showing RoPE is almost optimally expressive under certain natural constraints, characterizing the allowable positional rotations with a free N-dimensional generalization and constructions for the rotation vectors.
Proof sketches and commentary for the IMO 2025 problems, written after doing the contest as a mock.
A practical overview of LLM inference quantization, from why memory bandwidth dominates local inference to GGUF, EXL, AWQ, GPTQ, KV-cache quants, and hardware tradeoffs.
A personal review of Math Academy from the perspective of a self-taught math-heavy user, focusing on its pedagogy, strengths, and annoyances.
An intuition-first explanation of Simpson's paradox as a mismatch between local and global comparisons, followed by the algebra behind it.
A derivation-oriented guide to FFT implementations, moving from recursive Cooley-Tukey to iterative and in-place variants without explicit bit reversal.
An elementary way to solve recurrences by changing variables until simpler structure appears, recovering characteristic-equation intuition without heavy machinery.
A practical introduction to the Akra-Bazzi theorem as a way to analyze divide-and-conquer recurrences with unequal splits, floors, and offsets.
A tutorial on lambdas, from lambda-calculus context and C++ closure mechanics to recursion, stateful patterns, STL use, and competitive-programming examples.
A tour of Floyd-Warshall as an instance of aggregating over graph paths, leading to transitive closure, Kleene algebras, and the algebraic path problem.
A beginner guide to inequalities involving floors and ceilings, with algebraic rules, examples, identities, and programming-language caveats.
A guide to permutations through orderings, cycles, and composition, with pointers to common competitive-programming applications.
A tutorial on greedoids as a framework for understanding when greedy-style reasoning works, with examples from matroids, antimatroids, and related structures.
An introduction to probability, conditional expectation, martingales, and stopping times, with math and competitive-programming examples.
A tutorial on Catalan numbers and uniform random balanced bracket generation, building bijections and generators from first principles.
An introduction to Möbius inversion through incidence algebras of posets, with inclusion-exclusion, subset transforms, number theory, and finite differences as examples.