math | nor's blog

Why wide neural networks share the same loss-curve fine structure

An elementary finite-time account of why wide networks trained on the same minibatches develop matching local loss fluctuations, and how width and batch size control initialization, data, and interaction noise, with implications for scaling.

Remark #2: The Adam update

A short note on Adam's update size, the coupling of beta1 and beta2, bias correction, epsilon, and how these relate to stability and warmup.

Improving the lower bound for the unit distance problem

A refinement of Sawin's explicit unit-distance lower bound, mostly by GPT 5.5 Pro, for the Erdős problem recently solved by OpenAI's internal model.

Remark #1: On RMS matched Muon

A short note on why RMS-matching Muon to AdamW can break width transfer, causing either undertraining or instability depending on scale.

A short note on some aspects of long context attention

A research note on what breaks in long-context attention, deriving a logit scaling and covering QK-norm, hybrid/local attention, gating, and small-scale experiments.

Simple Rules, Complex Dynamics – Part I: Foundations & Intuition

An intuition-building tour of dynamical systems, using canonical examples to connect feedback, thresholds, coupling, noise, reinforcement, spatial structure, and phase transitions.

Theoretical properties of optimizers on a toy problem, and some intuition

A theoretical comparison of normalized gradient descent, Muon, and Adam-style updates on a tractable matrix optimization toy problem, showing finite-time convergence and why Muon's guarantees come out nicer than Adam's.

Deriving RoPE the proper way

A rigorous derivation showing RoPE is almost optimally expressive under certain natural constraints, characterizing the allowable positional rotations with a free N-dimensional generalization and constructions for the rotation vectors.

Solving the IMO 2025 problems

Proof sketches and commentary for the IMO 2025 problems, written after doing the contest as a mock.

Quantizing LLMs for inference

A practical overview of LLM inference quantization, from why memory bandwidth dominates local inference to GGUF, EXL, AWQ, GPTQ, KV-cache quants, and hardware tradeoffs.

A Math Academy review

A personal review of Math Academy from the perspective of a self-taught math-heavy user, focusing on its pedagogy, strengths, and annoyances.

The intuition and the math behind Simpson's paradox

An intuition-first explanation of Simpson's paradox as a mismatch between local and global comparisons, followed by the algebra behind it.

Implementing FFT

A derivation-oriented guide to FFT implementations, moving from recursive Cooley-Tukey to iterative and in-place variants without explicit bit reversal.

An elementary way of solving recurrences

An elementary way to solve recurrences by changing variables until simpler structure appears, recovering characteristic-equation intuition without heavy machinery.

The Akra-Bazzi theorem - a generalization of the master theorem for recurrences

A practical introduction to the Akra-Bazzi theorem as a way to analyze divide-and-conquer recurrences with unequal splits, floors, and offsets.

On lambdas, C++ and otherwise: the what, the why, and the how

A tutorial on lambdas, from lambda-calculus context and C++ closure mechanics to recursion, stateful patterns, STL use, and competitive-programming examples.

The Floyd-Warshall algorithm and its generalizations

A tour of Floyd-Warshall as an instance of aggregating over graph paths, leading to transitive closure, Kleene algebras, and the algebraic path problem.

Floors, ceilings and inequalities for beginners (with some programming tips)

A beginner's guide to inequalities involving floors and ceilings, with algebraic rules, examples, identities, and programming-language caveats.

A comprehensive guide to permutations for beginners

A guide to permutations through orderings, cycles, and composition, with pointers to common competitive-programming applications.

Greedoids: a formal way to look at families of greedily-solvable problems

A tutorial on greedoids as a framework for understanding when greedy-style reasoning works, with examples from matroids, antimatroids, and related structures.

Probability 101, the intuition behind martingales and solving problems with them

An introduction to probability, conditional expectation, martingales, and stopping times, with math and competitive-programming examples.

Catalan Numbers and Generating Uniform Balanced Bracket Sequences

A tutorial on Catalan numbers and uniform random balanced bracket generation, building bijections and generators from first principles.

Generalized Möbius Inversion on Posets

An introduction to Möbius inversion through incidence algebras of posets, with inclusion-exclusion, subset transforms, number theory, and finite differences as examples.