- Introduction
- Some simple ideas and baselines - growth, local linearity and equilibrium
- Oscillations from feedback and delay
- Thresholds, alignment, and phase transitions
- Reinforcement, herding, and heavy tails
- Swarms and distributed coordination
- Selection dynamics
- Spatial structure: patterns from local rules
- A toolbox
- Main objects of study
- Linearization and local analysis
- 2D phase-plane portraits
- Lyapunov functions and LaSalle’s invariance
- Bifurcations
- Discrete-time systems and chaos
- Delays: compartments vs true delay differential equations
- Invariants, positivity, comparison, monotone structure
- Networks and coupling
- Non-dimensionalization and scaling
- Stochastic dynamics and metastability
- Traveling fronts
- Contraction analysis
- Singular perturbations and slow-fast decompositions
- Koopman operator and data-driven surrogates
- Ergodic theory and mixing
- Non-smooth and hybrid dynamics (thresholds and edges)
- Control and feedback
- Numerical methods that respect structure
- References and Further Reading
- Acknowledgements
Introduction
Simple local interactions can lead to a large variety of global behavior. Many examples come from physics, biology, economics, and ML, but the machinery used to look at them remains essentially the same. Looking at things through a certain lens - asking questions such as “What’s the feedback?”, “How are things coupled?”, “Is there noise?”, “What are the timescales?” - can help intuit the macroscopic outcomes of these systems: growth, cycles, alignment and discontinuities, swarms, heavy tails, and different spatio-temporal patterns.
Part I introduces some canonical examples that illustrate important ideas in dynamical systems. Part II will be about applying these ideas to ML.
We’ll use some math here - linear algebra, ordinary differential equations, basic multivariate calculus, and basic probability/stochastic processes - and build up the remaining things along the way. We will look at lots of examples in order to build intuition, instead of hammering it in with theory.
For the sake of being systematic, you should look out for the following things in most (if not all) of the examples. Doing this deliberately is left as an exercise to the reader.
- State variables: what is tracked?
- Local rule: how does a thing update locally, given its neighbors or the population?
- Coupling: how are things coupled together?
- Feedback: what kinds of feedback does the system give? E.g., linear, saturating, thresholded, multiplicative.
- Timescales/delays: instantaneous/lagged.
- Macroscopic behavior: fixed points, cycles, bifurcations, phase transitions, criticality.
Don’t worry if things don’t make sense yet. They will as we consider examples.
We’ll gradually build things up - we’ll start with systems without any interaction structure or constraints, then add cross-coupling and delays, then thresholds and noise, then reinforcement and multiplicative effects. After that come continuous alignment, selection dynamics, and spatial diffusion. For each of these, we’ll see which naturally arising systems have dynamics that can be studied using these ideas.
Some simple ideas and baselines - growth, local linearity and equilibrium
Let’s start from the ground up and build some ideas that will help us later on.
The question is: what do systems with no interactions or with simple feedback do?
Exponential and logistic growth
We first look at the simplest feedback mechanisms. The state of this system is going to be \(x(t) \ge 0\), a scalar. For our purposes, we’ll think of it as some population.
What are the local rules? How do we look at feedback?
For exponential growth, we model per-capita growth to be constant (say \(r\)). Then, \(\dot{x} = rx\), which gives \(x(t) = x_0 \exp(rt)\). Here, \(\dot{x}\) denotes the time-derivative of \(x\).
For logistic growth, we model per-capita growth as declining linearly with \(x\) due to some finite “capacity” \(K\). Then, \(\dot{x} = rx(1 - x/K)\). Solving this ODE gives us \(\frac{x(t)}{K - x(t)} = \frac{x_0}{K - x_0} \exp(rt)\), or equivalently, \(x(t) = \frac{K}{1 + \frac{K - x_0}{x_0} \exp(-rt)}\).
In the first case, the growth is clearly exponential. In the second, the growth is S-shaped, saturating at \(x = K\), with the maximum growth rate at \(x = \frac{K}{2}\). The reason it saturates is that \(K\) acts as an “attractor”: the capacity term pushes the growth rate down as \(x\) approaches \(K\). When nothing pushes back (or in more informal terms, the “attractor” is at \(\infty\)), we get exponential growth, as expected. In the logistic case, when \(x\) starts from a point beyond \(K\), the system still relaxes back towards \(K\), so calling \(K\) an attractor is more appropriate than calling it a barrier - this corresponds to things like starting from a population overshoot and the consequent decline (but not “growth” in the usual sense).
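To make this concrete, here is a minimal numerical sketch (all parameter values are arbitrary choices for illustration) that integrates the logistic equation with a forward-Euler step and compares it against the closed-form solutions above.
```python
import numpy as np

# Arbitrary illustrative parameters
r, K, x0, dt, T = 0.5, 100.0, 1.0, 0.01, 20.0
t = np.arange(0.0, T, dt)

# Closed-form solutions from the text
x_exp = x0 * np.exp(r * t)
x_log = K / (1.0 + (K - x0) / x0 * np.exp(-r * t))

# Forward-Euler integration of dx/dt = r x (1 - x/K)
x = np.empty_like(t)
x[0] = x0
for k in range(1, len(t)):
    x[k] = x[k - 1] + dt * r * x[k - 1] * (1.0 - x[k - 1] / K)

print("max |Euler - closed form| for the logistic model:", np.max(np.abs(x - x_log)))
print("exponential vs logistic at the final time:", x_exp[-1], x_log[-1])
print("x at maximum logistic growth rate (expect ~K/2):",
      x_log[np.argmax(r * x_log * (1.0 - x_log / K))])
```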

Figure 1: Fig 1A - Exponential vs logistic growth (linear and log-y)

Figure 2: Fig 1B - Logistic growth r x (1 - x / K)
Oscillators and local linearity
The state of this system is going to be \(x(t)\), a scalar, which can be thought of as the displacement of a mass, and \(\dot{x}\), which can be thought of as the velocity of the mass. This is a point in the “phase space”. Consider the following:
\[ \ddot{x} + \gamma \dot{x} + \omega^2 x = 0 \]
The reason why we consider this is simple. Near any stable equilibrium of any reasonable system in one variable, the leading behavior after linearization looks like something of the above form. (More generally, an oscillatory mode near a stable equilibrium looks like this).
I recommend going through the proof below for understanding the general ideas (in particular, asymptotic and Lyapunov stability are something to keep in mind).
Informally speaking, the idea lends itself to physical intuition quite well - consider a mass rolling in a parabolic well which also has friction. The bottom of the well is clearly a stable equilibrium. If the damping were negative (i.e., force aligned with velocity), it would gain velocity and escape, making it an unstable equilibrium. Similarly, if the well were upturned, it would again be an unstable equilibrium.
We will prove something slightly more general - a statement about the modes of oscillation of an \(n\)-dimensional system.
Let the state of a system be described by a vector \(x(t) \in \mathbb{R}^n\). The evolution of the system over time is governed by a system of first-order ordinary differential equations \(\dot{x} = F(x)\) where \(F: \mathbb{R}^n \to \mathbb{R}^n\) is a continuously differentiable function. This formulation is completely general, as any \(k\)-th order ODE can be rewritten as a system of \(k\) first-order ODEs.
An equilibrium point, denoted \(x_0\), is a state where the system does not evolve, i.e., it is a constant solution. Mathematically, this is a point where the vector field vanishes, i.e., \(F(x_0) = 0\).
A system can have multiple equilibrium points. We are concerned with those that are stable. An equilibrium point \(x_0\) is defined as Lyapunov stable if, for any arbitrarily small neighborhood of tolerance \(\mathcal{V}\) around \(x_0\), there exists a neighborhood of initial conditions \(\mathcal{U}\) around \(x_0\) such that any trajectory \(x(t)\) starting in \(\mathcal{U}\) remains within \(\mathcal{V}\) for all future time \(t \ge 0\). More formally, for every \(\epsilon > 0\), there exists a \(\delta > 0\) such that if \(\|x(0) - x_0\| < \delta\), then \(\|x(t) - x_0\| < \epsilon\) for all \(t \ge 0\).
Note that this definition of stability doesn’t say anything about convergence to the equilibrium point (so it’s possible that the trajectory looks like a closed loop around the equilibrium point, but never converging). A stronger condition is asymptotic stability - it requires that the trajectory converges to the equilibrium point.
Now, to analyze the behavior of the system near the equilibrium point \(x_0\), we consider a small perturbation \(\eta(t) = x(t) - x_0\). The dynamics of this perturbation are given by \(\dot{\eta}(t) = \dot{x}(t) = F(x_0 + \eta(t))\). Since \(\eta(t)\) is assumed to be small, we can perform a Taylor series expansion of \(F\) around the point \(x_0\): \(F(x_0 + \eta) = F(x_0) + J(x_0)\eta + \mathcal{O}(\|\eta\|^2)\). Here, \(J(x_0)\) is the Jacobian matrix of \(F\) evaluated at \(x_0\), with entries \(J_{ij} = \frac{\partial F_i}{\partial x_j}\bigg|_{x=x_0}\).
By definition of equilibrium, \(F(x_0) = 0\). For a sufficiently small perturbation \(\eta\), the higher-order terms \(\mathcal{O}(\|\eta\|^2)\) are negligible compared to the linear term. The dynamics of the perturbation are therefore well-approximated by the linearized system \(\dot{\eta} = J \eta\).
About the validity of this approximation: the definition of the derivative itself is that it provides the best linear approximation to a function locally. To be rigorous, the Hartman-Grobman theorem (which is probably out of scope as of writing) formalizes the topological equivalence between the non-linear and linear systems under certain conditions, but the approximation itself is a direct consequence of the differentiability of \(F\).
We now try to figure out properties of the Jacobian matrix \(J\) under stability constraints. The behavior of the linear system \(\dot{\eta} = J\eta\) is entirely determined by the eigenvalues of \(J\). The general solution (assuming \(J\) is diagonalizable for simplicity, though the argument holds generally using the Jordan normal form) is a linear combination of terms of the form \(v_i e^{\lambda_i t}\), where \(\lambda_i\) are the eigenvalues and \(v_i\) are the corresponding eigenvectors: \(\eta(t) = \sum_{i=1}^{n} c_i v_i e^{\lambda_i t}\)
Each eigenvalue \(\lambda_i\) is, in general, a complex number: \(\lambda_i = \Re(\lambda_i) + i \cdot \Im(\lambda_i)\). The magnitude of the term \(e^{\lambda_i t}\) is given by \(|e^{\lambda_i t}| = |e^{(\Re(\lambda_i) + i \cdot \Im(\lambda_i))t}| = |e^{\Re(\lambda_i)t}| \cdot |e^{i \cdot \Im(\lambda_i)t}| = e^{\Re(\lambda_i)t}\). The term \(|e^{i \cdot \Im(\lambda_i)t}|\) has a magnitude of 1 and represents oscillation. The growth or decay of the perturbation is governed entirely by the sign of the real part of the eigenvalue, \(e^{\Re(\lambda_i)t}\).
We claim that for the equilibrium point \(x_0\) to be stable, all eigenvalues \(\lambda_i\) of the Jacobian \(J\) must have non-positive real parts, i.e., \(\Re(\lambda_i) \le 0\) for all \(i=1, \dots, n\).
The proof is by contradiction. Assume there exists at least one eigenvalue, say \(\lambda_k\), with a strictly positive real part, \(\Re(\lambda_k) > 0\). Consider an initial perturbation \(\eta(0)\) that has a non-zero component in the direction of the eigenvector \(v_k\). That is, in the sum \(\eta(0) = \sum c_i v_i\), the coefficient \(c_k \ne 0\). Such an initial condition can be chosen arbitrarily close to the equilibrium point (i.e., for any \(\delta > 0\)). The solution \(\eta(t)\) will contain the term \(c_k v_k e^{\lambda_k t}\). The magnitude of this term grows exponentially as \(\|\eta_k(t)\| = |c_k| \|v_k\| e^{\Re(\lambda_k)t}\). Since \(\Re(\lambda_k) > 0\), the term \(e^{\Re(\lambda_k)t} \to \infty\) as \(t \to \infty\). This holds for all \(t\), not just small \(t\). This roughly exponential growth means that the perturbation \(\eta(t)\) will eventually become arbitrarily large. No matter how small the initial neighborhood \(\mathcal{U}\) (i.e., how small we choose \(\eta(0)\)) and how large the tolerance neighborhood \(\mathcal{V}\) (i.e., the value of \(\epsilon\)), the trajectory will eventually leave \(\mathcal{V}\). This violates the definition of Lyapunov stability. So, for a stable equilibrium, it is necessary that \(\Re(\lambda_i) \le 0\) for all eigenvalues of the Jacobian matrix.
A nitpick - when \(\Re(\lambda_i) = 0\) for some eigenvalue, the equilibrium is called non-hyperbolic. Linearization is then generally insufficient to comment on the stability of the original system (though it is fine if we restrict ourselves to the linearized problem), and the Hartman-Grobman theorem does not apply to such cases - we generally need to look at the non-linearities specifically to say anything. So we will not consider cases where \(\Re(\lambda_i) = 0\), and will assume that \(\Re(\lambda_i) < 0\) for all \(i\). This also gives us asymptotic stability, which is stronger than Lyapunov stability, and is enough to talk about the stability of the original system.
The dynamics of the full \(n\)-dimensional linear system can be decomposed into subspaces corresponding to the eigenvalues of \(J\). Oscillatory behavior is associated with complex conjugate pairs of eigenvalues; the real eigenvalues lead to exponential decay, and are not interesting in general (but a similar argument works for them too).
Let’s consider a dominant “mode” of oscillation within the system, which will be governed by a pair of complex conjugate eigenvalues, \(\lambda\) and \(\bar{\lambda}\). From our stability proof, we know that \(\Re(\lambda) \le 0\). Let us write the eigenvalues as \(\lambda = -\alpha + i\omega'\) and its complex conjugate \(\bar{\lambda}\) - note that \(\omega'\) is not the same \(\omega\) as before. Due to the stability requirement, the damping factor \(\alpha = -\Re(\lambda)\) must be non-negative, i.e., \(\alpha \ge 0\). The value \(\omega' = \Im(\lambda)\) represents the frequency of oscillation.
The dynamics in the two-dimensional real invariant subspace corresponding to this eigenvalue pair can be described by a 2x2 real matrix, which is similar to the canonical form \(A = \begin{pmatrix} -\alpha & -\omega' \\ \omega' & -\alpha \end{pmatrix}\).
Let the state within this subspace be represented by a vector \(y(t) = \begin{pmatrix} y_1(t) \\ y_2(t) \end{pmatrix}\). The dynamics are given by \(\dot{y} = Ay\), i.e., \(\dot{y}_1 = -\alpha y_1 - \omega' y_2\) and \(\dot{y}_2 = \omega' y_1 - \alpha y_2\). We can combine these two first-order equations into a single second-order equation for one of the components, say \(y_1\). Differentiate the first equation with respect to time and substitute the expression for \(\dot{y}_2\) from the second equation to get \(\ddot{y}_1 = -\alpha \dot{y}_1 - \omega'(\omega' y_1 - \alpha y_2) = -\alpha \dot{y}_1 - \omega'^2 y_1 + \alpha\omega' y_2\). Eliminating \(y_2\) and rearranging the terms, we arrive at the standard form of a second-order linear homogeneous ODE \(\ddot{y}_1 + 2\alpha \dot{y}_1 + (\alpha^2 + \omega'^2)y_1 = 0\).
This is the equation for a damped linear oscillator, \(m\ddot{y}_1 + b\dot{y}_1 + ky_1 = 0\), term by term:
- The term \(m\ddot{y}_1\) corresponds to \(\ddot{y}_1\) (we can set the “mass” \(m = 1\) without loss of generality by scaling the equation).
- The term \(b\dot{y}_1\) corresponds to \(2\alpha \dot{y}_1\). This is the damping term.
- The term \(ky_1\) corresponds to \((\alpha^2 + \omega'^2)y_1\). This is the restoring term.
From the definition of stability, we saw that \(\Re(\lambda) \le 0\), which implies that the coefficient \(\alpha = -\Re(\lambda)\) must be non-negative (\(\alpha \ge 0\)), so, the coefficient of the damping term, \(b = 2\alpha\), must also be non-negative.
Thus, any system near a stable equilibrium point, when exhibiting oscillatory behavior, can be mathematically modeled as a damped second-order linear oscillator.
Let’s consider some cases for this second-order system in one variable. Let \(r_1\) and \(r_2\) be the zeros of \(r^2 + \gamma r + \omega^2\).
Then we have \(-\gamma = r_1 + r_2\) and \(\omega^2 = r_1 r_2\), so \((\ddot{x} - r_1 \dot{x}) - r_2 (\dot{x} - r_1 x) = 0\). Setting \(y = \dot{x} - r_1 x\), this gives a homogeneous linear ODE in \(y\). Solving it for \(y\) gives us a linear ODE in \(x\). The solutions differ in different cases (a short numerical check follows the list below):
- Both roots real (i.e., \(\gamma^2 > 4\omega^2\)): the system here is called overdamped, and the two negative real roots imply no oscillation. \(x\) is a sum of two decaying exponentials in time.
- Repeated real root (i.e., \(\gamma^2 = 4\omega^2\)): the system here is called critically damped, and this is the fastest non-oscillatory return to equilibrium. \(x\) is a linear polynomial times a decaying exponential in time.
- Both roots complex (i.e., \(\gamma^2 < 4\omega^2\)): the system here is called underdamped, and there are oscillations here. The form is similar to the both-real-roots case (except the roots here are complex), but with constraints that make the resulting \(x\) real. In general it looks like a sinusoidal function times a decaying exponential in time.
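As a quick check of the three regimes (a minimal sketch; the \(\gamma, \omega\) values below are arbitrary), we can compute the roots of \(r^2 + \gamma r + \omega^2\) and classify the damping.
```python
import numpy as np

def classify(gamma, omega):
    """Classify the oscillator x'' + gamma x' + omega^2 x = 0 by the roots of its characteristic polynomial."""
    disc = gamma**2 - 4.0 * omega**2
    roots = np.roots([1.0, gamma, omega**2])
    if disc > 0:
        regime = "overdamped (sum of two decaying exponentials)"
    elif disc == 0:
        regime = "critically damped ((A + B t) exp(-gamma t / 2))"
    else:
        regime = "underdamped (decaying oscillation)"
    return regime, roots

for gamma, omega in [(3.0, 1.0), (2.0, 1.0), (0.5, 1.0)]:   # arbitrary example parameters
    regime, roots = classify(gamma, omega)
    print(f"gamma={gamma}, omega={omega}: {regime}, roots={roots}")
```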

Figure 3: Fig 1C - Damped oscillator phase portrait
There are also other variants of this system that have a forcing component - e.g., instead of \(0\) on the right-hand side, we have a sinusoidal (in time) term that drives the system. It’s a good exercise to solve this case and to work out how the steady-state amplitude depends on the forcing frequency. You will find that “resonance” (peak amplitude) occurs when the forcing frequency is \(\sqrt{\omega^2 - \frac{\gamma^2}{2}}\) at low damping; when damping is high enough that the term under the square root is negative, the amplitude decreases monotonically with the forcing frequency and is maximized by a static force.

Figure 4: Fig 1D - Resonance curve for forced oscillator
Invariants and conservation
When things are changing, it is often wise to focus on what doesn’t.
Many systems have things that remain constant over time - these are called invariants or conserved quantities. They help make life easier by reducing the degrees of freedom (or the moving parts) the system has, generally becoming one less thing we need to integrate over.
In physical systems, there are things such as energy or momentum that may be conserved.
Similarly, there are monovariants - things that change monotonically as a process progresses. In physics, we have things like the entropy of an isolated system that seem to be monotonically non-decreasing.
Our first example of an invariant comes from Hooke’s law for springs. Consider a mass attached to a spring. Let the position from the equilibrium position be \(x\), and the state of the system is determined by \(x\) and \(\dot{x}\), as earlier. Hooke’s law states that \(m\ddot{x} + kx = 0\) (restoring force of \(-kx\) with extension \(x\)). Multiplying by \(\dot{x}\) and integrating gives us a conserved quantity \(\frac{1}{2} m \dot{x}^2 + \frac{1}{2} k x^2\). This is precisely the conservation of total mechanical energy - the energy is a function of the state that remains preserved in this system, and hence an invariant/conserved quantity.
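Here is a small sketch (mass, spring constant, and step size are arbitrary) that integrates \(m\ddot{x} + kx = 0\) with a velocity-Verlet step and checks that \(\frac{1}{2} m \dot{x}^2 + \frac{1}{2} k x^2\) stays essentially constant.
```python
import numpy as np

m, k = 1.0, 4.0          # arbitrary mass and spring constant
x, v = 1.0, 0.0          # initial displacement and velocity
dt, steps = 0.01, 5000

def energy(x, v):
    return 0.5 * m * v**2 + 0.5 * k * x**2

E0 = energy(x, v)
for _ in range(steps):
    # Velocity-Verlet (leapfrog) step for x'' = -(k/m) x
    a = -(k / m) * x
    x = x + v * dt + 0.5 * a * dt**2
    a_new = -(k / m) * x
    v = v + 0.5 * (a + a_new) * dt

print("relative energy drift after", steps, "steps:", abs(energy(x, v) - E0) / E0)
```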

Figure 5: Fig 2A - Energy contours in (x, v)
We give another example, this time from biology - the Hardy-Weinberg principle. It states that under no evolutionary “forces”, the allele and genotype distributions remain the same (the former holds from the first generation itself, while the latter holds from the second generation onwards).
To give some context - a gene is a segment of DNA that codes for a specific trait (e.g., flower color). Different versions of a gene are called alleles (e.g., the A allele for purple flowers and the a allele for white flowers). An individual’s genetic makeup, or genotype, consists of two alleles (e.g., AA, Aa, or aa).
Allele frequency is the proportion of a specific allele within a population’s entire gene pool. We denote the frequency of the dominant allele (A) as \(p\) and the recessive allele (a) as \(q\). By definition, \(p + q = 1\).
Under conditions such as random mating, infinite population, no selection/mutation/migration/drift, the Hardy-Weinberg principle says that the allele frequencies remain \(p\) and \(q\) for each generation, and the genotype frequencies reach \(p^2\), \(2pq\), \(q^2\) after the first generation and stay there.
This is fairly straightforward to derive: for the allele frequencies, the probability that an entity in the new generation inherits A (respectively, a) is \(p\) (respectively, \(q\)). The same can be done for the genotypes.
So even if incrementing generations introduces more members to the population, these two ratios are invariant.
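A quick Monte Carlo sketch of this invariant under the idealized assumptions (random mating, effectively infinite population; the starting frequency \(p\) and the population size below are arbitrary): offspring draw two alleles independently from the gene pool, and both the allele frequency and the \(p^2, 2pq, q^2\) genotype proportions hold generation after generation.
```python
import numpy as np

rng = np.random.default_rng(0)
p, N, generations = 0.3, 500_000, 3     # arbitrary allele frequency and (large) population size

for g in range(generations):
    # Each offspring draws two alleles independently from the current gene pool (random mating).
    allele1 = rng.random(N) < p
    allele2 = rng.random(N) < p
    AA = np.mean(allele1 & allele2)
    Aa = np.mean(allele1 ^ allele2)
    aa = np.mean(~allele1 & ~allele2)
    p = np.mean(np.concatenate([allele1, allele2]))   # allele frequency in the new generation
    print(f"gen {g + 1}: p = {p:.4f}, AA = {AA:.4f} (~p^2), Aa = {Aa:.4f} (~2pq), aa = {aa:.4f} (~q^2)")
```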

Figure 6: Fig 2B - Hardy-Weinberg checkerboard and frequencies
Oscillations from feedback and delay
Now, real systems also generally have interaction between entities and have delays. We’ll study this with a few examples.
Lotka-Volterra predator-prey model
This is a model for populations of species that are in a predator-prey relationship. Prey grows at a per-capita rate of \(\alpha\) but is eaten at the per-capita rate of \(\beta\) times the predator population. Predators die without sustenance at the per-capita rate of \(\gamma\) and grow at the per-capita rate of \(\delta\) times the prey population.
If the prey population is \(x\) and the predator population is \(y\), we get the following equations:
\[\dot{x} = \alpha x - \beta xy\]
\[\dot{y} = -\gamma y + \delta xy\]
This is what we call a cross-coupled system. Let’s think of it empirically first. What are the most trivial solutions?
- If there are no predators, the prey grows exponentially (absurd, but we will live with it in this simple model for now).
- If there is no prey, predators die out with an exponential decay.
- Equilibrium means that \(x\) and \(y\) don’t change - this gives us a solution with \(y = \frac{\alpha}{\beta}\) and \(x = \frac{\gamma}{\delta}\), or when neither species has any individuals. This is of course assuming that \(\beta\) and \(\delta\) are non-zero - this is a physically valid assumption. We’ll in general assume that \(\alpha\), \(\beta\), \(\gamma\) and \(\delta\) are positive.
Let’s look at what happens when either of the populations has a zero rate of change (i.e., a nullcline for that species’ population). For \(x\), they are \(x = 0\) and \(y = \frac{\alpha}{\beta}\), and for \(y\) they are \(y = 0\) and \(x = \frac{\gamma}{\delta}\). This divides the first quadrant (positive \(x\) and \(y\)) into four parts, and where these parts intersect is the non-trivial equilibrium point. By looking at signs for \(\dot{x}\) and \(\dot{y}\) we can see that the path is counterclockwise around the equilibrium point.

Figure 7: Fig 3A - LV nullclines and orbits
Another empirical property, as we’ll see later, is that these interactions can cause oscillations. When the prey populations increase a lot, the predator populations start to increase, and there comes a point where large predator populations lead to reduction in the prey populations. This in turn leads to a reduction in predator populations since they’re dependent on prey, and the prey have a chance to recover.
Let’s try to solve this problem mathematically. Eliminating \(t\) we get \(\frac{dx}{dy} = \frac{x}{y} \frac{\alpha - \beta y}{-\gamma + \delta x}\). Separating variables and integrating, we get that the following quantity is constant:
\[\delta x - \gamma \log x + \beta y - \alpha \log y\]
If we plot contours of this quantity on the Cartesian plane, we’ll get closed loops. It’s easy to see why: \(\delta x - \gamma \log x\) is bounded below, and so is \(\beta y - \alpha \log y\). Since they sum to a constant, each of them is also bounded above, so contours are always finite. This is also a continuous curve with no more than 2 solutions for any given \(x\) (or \(y\)), and the set of \(x\) and \(y\) where such solutions exist is a closed interval.
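A minimal numerical check (parameters arbitrary): integrate the Lotka-Volterra equations with tight tolerances and verify that \(\delta x - \gamma \log x + \beta y - \alpha \log y\) is (nearly) constant along the orbit, consistent with closed loops.
```python
import numpy as np
from scipy.integrate import solve_ivp

alpha, beta, gamma, delta = 1.0, 0.1, 1.5, 0.075   # arbitrary positive parameters

def lv(t, z):
    x, y = z
    return [alpha * x - beta * x * y, -gamma * y + delta * x * y]

def invariant(x, y):
    return delta * x - gamma * np.log(x) + beta * y - alpha * np.log(y)

sol = solve_ivp(lv, (0.0, 50.0), [10.0, 5.0], rtol=1e-10, atol=1e-12)
H = invariant(sol.y[0], sol.y[1])
print("invariant min/max along the orbit:", H.min(), H.max())   # nearly identical up to integration error
```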

Figure 8: Fig 3B - LV first integral level sets
Let’s see what happens near the non-trivial equilibrium point - linearization gives us a Jacobian with eigenvalues \(\pm i \sqrt{\alpha \gamma}\), so this system oscillates.
Note that a major issue with this is the jump from discrete populations to continuous populations - in practice these cycles don’t exist under extreme pressure (e.g., you can’t have more than 0 but less than 1 member of a species).
We now add a simple change to the standard Lotka-Volterra model - a carrying capacity (like the logistic growth case from earlier) \(K\) for the prey - and see how things change. \(\alpha x\) becomes \(\alpha x (1 - x/K)\).
The equilibrium points become \((0, 0)\), \((K, 0)\) and \((\gamma/\delta, (\alpha/\beta) (1 - \gamma/(\delta K)))\) (the last of which makes sense only if \(\gamma/\delta < K\)).
The Jacobian at \((0, 0)\) has eigenvalues \(\alpha\) and \(-\gamma\) which are of opposite signs, so it is a saddle point. At \((K, 0)\), the eigenvalues are \(-\alpha\) and \(\delta K - \gamma\), and we get a stable equilibrium if \(\gamma / \delta > K\) and a saddle point when the inequality is flipped. When equality holds instead, a different phenomenon called a (transcritical) bifurcation happens - i.e., two equilibrium points (this one and the coexistence point) swap their stability characteristics. In this specific case, at this exact point, a perturbation that increases the prey population is unstable, and one that decreases the prey population is stable, as can be verified by considering the original non-linear system.
The Jacobian at the coexistence point (which makes physical sense only for \(\gamma / \delta < K\), equality gives us the previous case) has non-zero eigenvalues. The trace is always negative and the determinant is always positive, so this is always a stable equilibrium. Whether it is a direct decay or oscillatory decay depends on the discriminant of the characteristic equation and is easy to determine.
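As a sanity check of these claims (a sketch; the parameters below are arbitrary and chosen so that \(\gamma/\delta < K\)), we can evaluate the Jacobian of the modified system at the coexistence point and inspect its trace, determinant, and eigenvalues.
```python
import numpy as np

alpha, beta, gamma, delta, K = 1.0, 0.1, 1.5, 0.075, 40.0   # arbitrary; gamma/delta = 20 < K

# Coexistence equilibrium of  x' = alpha x (1 - x/K) - beta x y,  y' = -gamma y + delta x y
xs = gamma / delta
ys = (alpha / beta) * (1.0 - gamma / (delta * K))

# Jacobian of the right-hand side evaluated at (xs, ys)
J = np.array([
    [alpha * (1.0 - 2.0 * xs / K) - beta * ys, -beta * xs],
    [delta * ys,                               -gamma + delta * xs],
])

eigvals = np.linalg.eigvals(J)
print("trace:", np.trace(J), " det:", np.linalg.det(J))   # trace < 0, det > 0 => stable
print("eigenvalues:", eigvals)
print("oscillatory decay:", bool(np.any(np.iscomplex(eigvals))))
```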

Figure 9: Fig 3C - Logistic-prey LV phase portrait and stability
SEIR model with vital dynamics
The Lotka-Volterra model showed us how instantaneous cross-coupling can produce sustained oscillations. However, many real-world systems, particularly in epidemiology, involve a time lag between cause and effect. An individual does not become instantaneously infectious upon being exposed to a pathogen; instead, there is an incubation period. The SEIR model with vital dynamics is an example that demonstrates how this delay changes the behavior, going from the neutral cycles of the Lotka-Volterra model to damped oscillations (or monotone approach) converging on a stable state.
It is what we call a compartmental model, meaning we divide the total population, \(N\), into distinct groups based on their disease status. We assume the population is large and well-mixed, allowing us to treat the number of individuals in each compartment as a continuous variable.
The state variables are:
- \(S(t)\): the number of susceptible individuals - those who are healthy but can become infected.
- \(E(t)\): the number of exposed individuals - those who have been infected but are not yet infectious to others (i.e., they are in the incubation period).
- \(I(t)\): the number of infectious individuals - those who can transmit the disease to susceptible individuals.
- \(R(t)\): the number of recovered (or removed) individuals - those who are no longer infectious and have acquired immunity (or have died).
The flow between these compartments under this model is as follows:
- The transmission rate, \(\beta\), governs the rate at which susceptible individuals become infected through contact with infectious individuals. The total rate of new infections is proportional to the number of encounters between \(S\) and \(I\), which in a well-mixed population is proportional to \(S \times I\). We often write this term as \(\frac{\beta S I}{N}\) to represent the per-susceptible infection rate as a function of the fraction of the population that is infectious, \(\frac{I}{N}\).
- The incubation rate, \(\sigma\), is the rate at which exposed individuals become infectious. The average duration of the incubation period is \(1/\sigma\).
- The recovery rate, \(\gamma\), is the rate at which infectious individuals recover and gain immunity (or die). The average duration of the infectious period is \(1/\gamma\).
- The birth and death rate, \(\mu\). To account for long-term dynamics, we assume a constant birth rate that adds new individuals to the susceptible pool, and a natural death rate that removes individuals from all compartments. For a stable population size \(N\), the total birth rate must equal the total death rate, so we assume a birth rate of \(\mu N\) and a per-capita death rate of \(\mu\).
These rules give rise to the following system of coupled ordinary differential equations:
\[\dot{S} = \mu N - \frac{\beta S I}{N} - \mu S\] \[\dot{E} = \frac{\beta S I}{N} - (\sigma + \mu) E\] \[\dot{I} = \sigma E - (\gamma + \mu) I\] \[\dot{R} = \gamma I - \mu R\]
An immediate (and obvious) observation is that \(\dot{S} + \dot{E} + \dot{I} + \dot{R} = 0\). This shows that, as expected, the total population \(N = S + E + I + R\) is a conserved quantity and lets us reduce the system’s dimensionality from four to three by setting \(S(t) = N - E(t) - I(t) - R(t)\).
Epidemic models are generally made to answer the question: under what conditions will an outbreak occur? Intuitively, an epidemic will take off if, on average, a single infectious person introduced into a fully susceptible population infects more than one other person. This threshold quantity is called the basic reproduction number, \(R_0\).
If \(R_0 < 1\), each infected individual produces, on average, fewer than one new infection, and the disease will die out. If \(R_0 > 1\), each infected individual produces more than one new infection, and the epidemic will grow.
Let’s compute \(R_0\). We consider the stability of the disease-free equilibrium (DFE), where \((S, E, I, R) = (N, 0, 0, 0)\). An outbreak occurs if this state is unstable. Near the DFE, \(S \approx N\), so the linearized system for the infected compartments (\(E\) and \(I\)) is:
\[\frac{d}{dt} \begin{pmatrix} E \\ I \end{pmatrix} = \begin{pmatrix} -(\sigma + \mu) & \beta \\ \sigma & -(\gamma + \mu) \end{pmatrix} \begin{pmatrix} E \\ I \end{pmatrix}\]
The DFE is stable if and only if the eigenvalues of this Jacobian matrix have negative real parts. This requires the determinant to be positive (since the trace is already negative), leading to the condition for stability:
\[R_0 = \frac{\beta}{\gamma+\mu} \cdot \frac{\sigma}{\sigma+\mu} < 1\]
We can understand this formula intuitively too. For an infection to be passed on, an individual must first survive the incubation period (which occurs with probability \(\frac{\sigma}{\sigma+\mu}\)) and then infect others during their infectious period. The average duration of the infectious period is \(\frac{1}{\gamma+\mu}\), and the transmission rate is \(\beta\). Thus, \(R_0\) is the product of the transmission rate, the fraction of individuals who survive incubation, and the average duration of infectiousness.
The system has two equilibrium points:
- Disease-free equilibrium (DFE): \((S, E, I, R) = (N, 0, 0, 0)\). As we just showed, this equilibrium is stable if \(R_0 < 1\).
- Endemic equilibrium (EE): if \(R_0 > 1\), the DFE becomes unstable, and the system converges to a state where the disease persists in the population. By setting the derivatives to zero, we can solve for the steady-state values (\(S^*, E^*, I^*, R^*\)). Assuming \(I^* \ne 0\), we find \(S^* = \frac{N}{R_0}\). The existence of a physically feasible endemic equilibrium (where compartment sizes are positive) requires \(S^* < N\), which implies \(R_0 > 1\), consistent with our stability analysis. The remaining values can be found using the conservation of total population.
Note that when \(R_0 > 1\), the system does not simply move monotonically to the endemic equilibrium. Instead, it typically exhibits damped oscillations.
The trajectory starts with a rapid decline in susceptibles (\(S\)) and a corresponding rise in the exposed (\(E\)) and infectious (\(I\)) populations. The peak of the exposed population occurs before the peak of the infectious population. By the time the infectious population (\(I\)) reaches its peak, the susceptible population (\(S\)) has been significantly depleted. This lack of new “fuel” causes the infectious population to crash. This is followed by a slow recovery of the susceptible population due to births, which can lead to smaller, subsequent waves until the system eventually dampens into the steady endemic state.
This oscillatory behavior is a direct result of the time lag introduced by the \(E\) compartment. Unlike instantaneous feedback, the delay between becoming infected and becoming infectious creates an “inertial” effect. The system cannot immediately correct for changes, leading to the characteristic overshoots that define damped oscillations around the stable endemic equilibrium.
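A minimal simulation sketch of this behavior (the rates below are purely illustrative, loosely “per day”): integrate the SEIR equations, compute \(R_0\), and watch the trajectory spiral in toward the endemic equilibrium with \(S^* = N/R_0\).
```python
import numpy as np
from scipy.integrate import solve_ivp

N = 1_000_000.0
beta, sigma, gamma, mu = 0.3, 1 / 5.0, 1 / 10.0, 1 / (70 * 365.0)   # illustrative rates

R0 = (beta / (gamma + mu)) * (sigma / (sigma + mu))
print("R0 =", R0)

def seir(t, y):
    S, E, I, R = y
    dS = mu * N - beta * S * I / N - mu * S
    dE = beta * S * I / N - (sigma + mu) * E
    dI = sigma * E - (gamma + mu) * I
    dR = gamma * I - mu * R
    return [dS, dE, dI, dR]

y0 = [N - 100.0, 0.0, 100.0, 0.0]
# 50 simulated years; the damped waves take decades to settle toward the endemic state.
sol = solve_ivp(seir, (0.0, 50 * 365.0), y0, rtol=1e-8, atol=1e-6)

print("predicted endemic S* = N / R0 =", N / R0, " simulated final S =", sol.y[0, -1])
print("total population conserved:", bool(np.allclose(sol.y.sum(axis=0), N)))
```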

Figure 10: Fig 3D - SEIR S, E, I, R time series (damped waves)

Figure 11: Fig 3E - SEIR S-I phase portrait to endemic equilibrium
Thresholds, alignment, and phase transitions
So far we’ve looked at examples where feedback and delays lead to continuous changes in state. We now consider a different class of phenomena, where the global behavior is not a smooth wave but is discontinuous. Systems that have some sense of thresholds where units make discrete decisions provide some examples that show this kind of behavior. These microscopic discrete behaviors lead to macroscopic behaviors such as cascades, spontaneous alignment, and segregation - phase transitions of some sort. The question we want to answer is: how do simple threshold rules aggregate to give us sudden changes in global behaviors?
We’ll look at three examples first - Granovetter’s threshold model which shows social cascades, the Ising model for magnetism that’ll help us understand alignment, noise, and emergent order, and Schelling’s model of social segregation arising from mild spatial preferences.
Granovetter’s threshold model for collective behavior
This model provides a simple explanation for how collective behavior, like strikes or the adoption of innovations, can emerge suddenly and unpredictably. The main idea is that an individual’s decision to participate depends on how many others are already participating. Each individual has a personal threshold on the number of other people they need to see participating before they join in themselves.
The state of the system at time \(t\) is the number of individuals who exhibit the behavior \(k(t)\). Each individual \(i\) in the population has a fixed integer threshold \(T_i\) such that they will exhibit the behavior at time \(t + 1\) if \(k(t) \ge T_i\).
The coupling here is mean-field - each person’s decision is based on a global count and not by any sort of local statistics.
The macroscopic behavior of the system is entirely determined by the distribution of thresholds across the population. A small change in this distribution can drastically change the outcome of the experiment.
For example, consider the case where the \(i\)-th person had threshold \(i - 1\). Then at each time step, another person exhibits the behavior, until there are no people left. If we change the threshold for a specific person \(i\) from \(i - 1\) to \(i\), then there will only be \(i - 1\) people who exhibit the behavior, no matter how long the experiment is run. In the worst case, it could mean \(0\) people showing the behavior. This behavior can be extremely counterintuitive to an outside observer who doesn’t know the individual thresholds.
This is also a bifurcation. The parameter that controls this behavior (the control parameter) is the shape of the threshold distribution.
Let’s try to look at it in the continuous limit. Let \(k(t)\) now be the fraction of individuals who exhibit the behavior, and express each individual’s threshold as a fraction \(\theta_i\) of the population. Then \(k(t + 1) = \mathbb{P}[\theta_i \le k(t)] = G(k(t))\) where \(G\) is the CDF of the threshold distribution.
A fixed point occurs when \(k^* = G(k^*)\). If no one has a threshold of exactly zero and every threshold is at most \(1\), then \(k^* = 0\) and \(k^* = 1\) are fixed points, and there may be others in between. Stability of a fixed point of this map requires \(|G'(k^*)| < 1\). If \(G\) is steep near the origin, then a small seed can make the system go to a non-zero equilibrium.
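A sketch of iterating the map \(k(t+1) = G(k(t))\) from a small seed (a normal threshold distribution is an arbitrary choice here): small changes in the spread of thresholds flip the outcome from a stalled cascade to near-full participation.
```python
import numpy as np
from scipy.stats import norm

def cascade(mean, sd, k0=0.01, steps=200):
    """Iterate k <- G(k), where G is the CDF of a Normal(mean, sd) threshold distribution."""
    G = lambda k: norm.cdf(k, loc=mean, scale=sd)
    k = k0
    for _ in range(steps):
        k = G(k)
    return k

# With sd = 0.05 or 0.10 the cascade stalls near zero; with sd = 0.20 it takes off to ~1.
for sd in [0.05, 0.10, 0.20]:
    print(f"mean=0.25, sd={sd}: final participation = {cascade(0.25, sd):.3f}")
```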

Figure 12: Fig 4A - Granovetter response map k(t + 1) = G(k(t))
Note that we can also have smooth thresholds here. Suppose each individual exhibits the behavior with probability \(\frac{1}{1 + \exp(-\beta (f(k) - c))}\) - letting \(\beta\) go to \(\infty\) gives us hard thresholds. I recommend looking back at this after reading the Ising model section and trying to find similarities.
The Ising model: alignment, noise and criticality
The Ising model from statistical physics gives a microscopic theory for magnetism based on local interactions, and exhibits cooperative phenomena and phase transitions. The main observation here is that local interactions lead to global alignment, and there is a critical level of noise above which this alignment goes away.
It’s worth noting that this is one of the most studied models in physics (so we will definitely not be able to do justice to it) and has inspired energy-based models in ML as well (Boltzmann machines, Hopfield networks).
The state consists of a vector of “spins” \(s_i\), generally (but not necessarily) arranged on a lattice. Each spin is either \(1\) (up) or \(-1\) (down).
The behavior can be described using an energy function that assigns a scalar energy to every possible configuration. The system generally tries to go towards states of lower energy. Consider a system with nearest-neighbor interactions and an external field \(h\). It has energy \(H(s) = -J \sum_{\langle i, j \rangle} s_i s_j - h \sum_i s_i\). The first term is the coupling between adjacent spins (the angular brackets mean \(i, j\) are neighbors). \(J\), the coupling constant, determines the nature of interaction - \(J > 0\) means ferromagnetic (aligned spins lower energy) and \(J < 0\) means antiferromagnetic (aligned spins increase energy). We will consider the ferromagnetic case. The second term explains how the system interacts with an external magnetic field \(h\).
In this model, as is usual in statistical physics, we consider the effect of temperature. At zero “temperature” (no noise), the system would just find the minimum energy configuration. However when noise is added to the system (thermal fluctuations), the dynamics are defined using a certain statistical ensemble - a combination of states with all possible different energies with different probabilities depending on both the energy and the temperature. The distribution is the Boltzmann distribution - \(P(s) \propto \exp(-H(s) / k_B T)\) where \(k_B\) is the Boltzmann constant (we’ll set it to 1), and the constant of proportionality is the inverse of the partition function (for normalization purposes, but the partition function in general describes all macroscopic properties!). Note that at lower temperatures, the exponential is much sharper and strongly prefers states with lower energies, but at higher temperatures, this preference drops off, and the system has a more significant contribution from higher energy states than at lower temperature, making the coupling \(J\) less important.
To simulate this model, we generally simulate sequences of states (i.e., by doing spin flips) using algorithms like Glauber dynamics or Metropolis-Hastings (since we only know the proportionality, not the actual probabilities themselves). To use Glauber dynamics, at every step we pick a random spin \(s_i\), check the change in the energy of the system if it flipped, and accept the change with probability \(\frac{1}{1 + \exp(\Delta E / T)}\). The Metropolis-Hastings algorithm and Glauber dynamics both have similar properties under certain assumptions, so it doesn’t matter which we choose.
The macroscopic property that we care about is the average magnetization (mean of the spins). This is an “order parameter” in the sense that it tells us how much order there is in the system.
Naively, it seems that the symmetry of the problem would imply that the magnetization must be \(0\). However, this turns out not to be the case at low enough temperatures on large enough (practically infinite) lattices - a 2D grid, for instance. Let’s first look at what happens at very high and very low temperatures. At high temperatures, the thermal noise overwhelms the coupling energy, and aligning spins together has barely any benefit. So on average, there are as many up spins as down spins, and the average magnetization becomes \(0\). At low temperatures, something much more interesting happens. Not correctly aligning with the neighbors has a huge cost, so spins strongly prefer to be aligned, forming large “domains” of the same spin. Practically, the system spontaneously breaks symmetry to get into a super-aligned state like this. We generally think of this as happening under an “infinitesimal” magnetic field \(h\) (which dictates which spin dominates) as we cross the critical temperature, after we take the limit as the number of spins goes to \(\infty\) (the order of these limits is important).
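A compact Glauber-dynamics sketch (lattice size and sweep count kept small for speed; periodic boundaries and a start from the all-up state are simplifications): below \(T_c\) the magnetization stays close to \(1\), above it the order melts away, and near \(T_c\) the result is noisy due to finite size and the short run.
```python
import numpy as np

rng = np.random.default_rng(0)

def glauber_magnetization(L=24, T=2.0, J=1.0, sweeps=300):
    """Glauber dynamics for the 2D Ising model (periodic boundaries), started from the all-up state."""
    s = np.ones((L, L), dtype=int)
    for _ in range(sweeps * L * L):
        i, j = rng.integers(L), rng.integers(L)
        # Sum of the four neighbors, with periodic wrap-around.
        nb = s[(i + 1) % L, j] + s[(i - 1) % L, j] + s[i, (j + 1) % L] + s[i, (j - 1) % L]
        dE = 2.0 * J * s[i, j] * nb               # energy change if s[i, j] is flipped
        if rng.random() < 1.0 / (1.0 + np.exp(dE / T)):
            s[i, j] = -s[i, j]
    return np.mean(s)

Tc = 2.0 / np.log(1.0 + np.sqrt(2.0))             # ~2.269 for J = 1, k_B = 1
for T in [1.5, 2.27, 3.5]:
    print(f"T = {T}: M ~ {glauber_magnetization(T=T):.3f}   (T_c ~ {Tc:.3f})")
```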

Figure 13: Fig 4B - Ising spin fields above/at/below T_c
A more rigorous explanation is the following. As long as \(N\) (the number of spins) is finite, the energy barrier needed to go from \(M_0\) to \(-M_0\) is finite, so after a long enough time, the system will “tunnel” through this energy barrier. However, as \(N\) becomes larger (and in the limit as \(N\) goes to \(\infty\)), this becomes infeasible. This is also an instance of what’s called ergodicity breaking - i.e., the time average (how it is realized physically) is not the same as the ensemble average (how we think about averages in math).
The most surprising part is that the transition during cool-down (or heating) happens at a sharp critical temperature called the Curie temperature \(T_c\). For a 2D square lattice, this temperature is \(\frac{2J}{k_B \log (1 + \sqrt{2})}\) (we will not derive this, as it is quite involved). The reason the transition is sharp is that as the temperature approaches \(T_c\), the correlation length (i.e., the length scale over which the correlation between spins decays by a fixed factor) diverges - to be precise, far from \(T_c\), the correlation function decays exponentially with distance, while at \(T_c\) it becomes a power law (and hence the correlation length diverges).
For \(T > T_c\), there is only one equilibrium state (\(\langle M \rangle = 0\)), but for \(T < T_c\) there are two stable equilibrium states - \(\pm M_0\) (and \(0\) becomes unstable). This is a change in the number of stable states as a parameter is varied. Mean-field yields a pitchfork bifurcation, and in the lattice model this corresponds to a phase transition at \(T_c\).
It is worth noting that this does not happen in one dimension. An intuitive reason is as follows - in one dimension, the energy required to flip a segment is constant (the cost comes only from its two endpoints), so the barrier is easily broken at any non-zero temperature, but in two dimensions, the energy required to flip a domain is proportional to the perimeter of the domain, which grows as the domain grows.
There are also ways to reason about it, e.g., using mean-field theory methods to replace the neighbor average by the average magnetization, but due to errors with this approximation, it gets the constant factor in the critical temperature and the critical exponents wrong. Additionally, it also gets the qualitative behavior for one dimension wrong.
It’s still instructive, though. Let the spin \(s_i = M + \delta s_i\). Plugging this in, and using the approximation \(\delta s_i \delta s_j = 0\), we get the energy term as \(NqJM^2/2 - (h + JqM)\sum_{i} s_i\), where \(q = 4\) is the coordination number (number of neighbors). Using the “self-consistency” equation \(M = \langle s_i \rangle\) and doing some algebra, we get \(M = \tanh(JqM / T)\). One obvious solution to this is \(M = 0\), which is indeed the only (and stable) equilibrium when \(T\) is large, i.e., \(> Jq\) (because the slope of the tanh at the origin is lower than \(1\)), and is an unstable equilibrium for the other case. When \(T < Jq\), there are two other solutions to this equation: \(\pm M_0\). (Note here that the constant factor is incorrect, but we will live with that).
To figure out the stability of these points, we can do two things - either use the physics definition of free energy, do an asymptotic expansion in \(M\), and check the second derivative; or perturb the system at a microscopic level. We’ll go down the simulation path and consider what happens when we simulate the Glauber dynamics. Let’s say that in time \(dt\), a fraction \(\alpha dt\) of the spins are considered by the algorithm. The change in the total \(+1\) spins would be \(dt(\alpha N_{-1} P(-1\text{ flips}) - \alpha N_{+1} P(1\text{ flips}))\). Writing this in terms of magnetization and computing the energy barriers for each flip and simplifying, we get \(\dot{M} = \alpha(\tanh(JqM/T) - M)\). To check whether an equilibrium is stable, we only need to perturb \(M\) and see if it comes back (which amounts to checking the sign of the derivative of the right-hand side at the equilibrium, or equivalently the second derivative of the free energy). A simple calculation confirms that below \(T = Jq\), \(\pm M_0\) are stable equilibria and \(0\) is an unstable equilibrium, and above it, \(0\) is the only equilibrium and is stable.
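A tiny sketch of the self-consistency condition with \(J = 1\) and \(q = 4\): fixed-point iteration of \(M = \tanh(JqM/T)\) from a small positive seed converges to \(M_0 > 0\) below the mean-field critical temperature \(T = Jq\) and to \(0\) above it.
```python
import numpy as np

J, q = 1.0, 4.0    # coupling and coordination number (2D square lattice)

def mean_field_M(T, iters=10_000):
    """Fixed-point iteration of the mean-field self-consistency equation M = tanh(J q M / T)."""
    M = 1e-3                       # small positive seed (an infinitesimal field picks the sign)
    for _ in range(iters):
        M = np.tanh(J * q * M / T)
    return M

for T in [2.0, 3.9, 4.1, 6.0]:
    print(f"T = {T}: M0 ~ {mean_field_M(T):.4f}")   # nonzero below T = Jq = 4, zero above
```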

Figure 14: Fig 4C - Magnetization and susceptibility vs temperature

Figure 15: Fig 4D - Hysteresis loop M(h) at T < T_c
Now consider the effect of an external field \(h\) at a low temperature (\(T < T_c\)).
- Start with a strong negative field, \(h \ll 0\). The system is in a state with \(M \approx -1\).
- Slowly increase \(h\) to 0 and then into positive values. Because the spins are strongly coupled, there is an energy barrier to flipping from the all-down state to the all-up state. The system can remain “stuck” in the \(M \approx -1\) state even for small positive \(h\). This is a metastable state. It is a local, but not global, energy minimum.
- As \(h\) increases further, the energy benefit of aligning with the field eventually overcomes the coupling barrier, and the system undergoes a rapid transition to the \(M \approx +1\) state.
- If we now reverse the process and decrease \(h\), the system will remain at \(M \approx +1\) even for small negative \(h\), until it abruptly flips back to \(M \approx -1\).
Plotting \(M\) as a function of \(h\) as we sweep \(h\) from negative to positive and back gives us what we call a hysteresis loop. The state of the system depends not only on the current value of \(h\), but also on its history. This “memory” is characteristic of systems with multiple stable states and the energy barriers between them. Note that this happens due to the system remaining in a metastable local minimum - if we have finite volume and an infinitely slow sweep, then the loop area reduces to zero as opposed to a large volume with a fast sweep (since it takes time to cross the energy barrier).
The discontinuous jump of magnetization at \(h = 0\) (at equilibrium below the critical temperature) is an example of a first-order phase transition (as a function of \(h\) at fixed \(T < T_c\)), while the transition as we cross the critical temperature is called a second-order phase transition (the “order” refers to the lowest-order derivative of the free energy that is discontinuous at the transition).
Schelling’s model of segregation
Schelling’s model applies the logic of thresholds and spatial interactions to a social phenomenon: residential segregation. It turns out that extreme macroscopic segregation can arise from the collective actions of individuals who have only mild preferences for living near others of their own type.
The state here is a 2D grid (like a chessboard, which is why it’s also called Schelling’s chessboard model) populated by two kinds of residents (red and blue) and some empty cells. Each resident has a threshold \(\tau\) such that they’re happy if the fraction of neighbors of the same type is at least \(\tau\).
We again simulate this model - pick a random unhappy resident, move them to a randomly chosen vacant cell on the grid, and repeat until all residents are happy.
The way we’ll track the state would be through the “segregation index” which is the mean fraction of same-type neighbors over all residents. A larger value means more segregation.
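A simulation sketch along these lines (grid size, vacancy fraction, the 8-cell Moore neighborhood with wrap-around, and a sweep-style update that relocates all currently unhappy residents each round are all simplifying choices; the rule described above moves one unhappy resident at a time):
```python
import numpy as np

rng = np.random.default_rng(1)

def schelling(L=30, empty_frac=0.1, tau=0.5, rounds=60):
    """Schelling sketch: +1/-1 residents, 0 = vacant; unhappy residents move to random vacant cells."""
    n_cells = L * L
    n_empty = int(empty_frac * n_cells)
    n_occ = n_cells - n_empty
    cells = np.concatenate([np.zeros(n_empty, dtype=int),
                            np.ones(n_occ - n_occ // 2, dtype=int),
                            -np.ones(n_occ // 2, dtype=int)])
    grid = rng.permutation(cells).reshape(L, L)

    def same_frac(i, j):
        """Fraction of occupied neighbors (Moore neighborhood, wrap-around) of the same type as (i, j)."""
        me, same, occ = grid[i, j], 0, 0
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                if di == 0 and dj == 0:
                    continue
                v = grid[(i + di) % L, (j + dj) % L]
                if v != 0:
                    occ += 1
                    same += (v == me)
        return same / occ if occ else 1.0

    for _ in range(rounds):
        unhappy = [(i, j) for i, j in np.argwhere(grid != 0) if same_frac(i, j) < tau]
        if not unhappy:
            break
        for i, j in unhappy:
            if grid[i, j] == 0 or same_frac(i, j) >= tau:
                continue                                   # situation may have changed this round
            empties = np.argwhere(grid == 0)
            ei, ej = empties[rng.integers(len(empties))]
            grid[ei, ej], grid[i, j] = grid[i, j], 0       # relocate to a random vacant cell

    # Segregation index: mean fraction of same-type neighbors over all residents.
    return np.mean([same_frac(i, j) for i, j in np.argwhere(grid != 0)])

for tau in [0.3, 0.5, 0.7]:
    print(f"tau = {tau}: segregation index ~ {schelling(tau=tau):.2f}")
```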
Note that once a significant number of unhappy residents start moving to places where they’re happy, this leads to feedback loops: the neighborhoods they left become less attractive for their own type, and the ones they join become more attractive for it. So when tolerance is low (i.e., \(\tau\) is high), the system self-organizes into highly segregated clusters.
The interesting part is the sharpness of this transition. There is a critical value of \(\tau\) where the system’s behavior flips. Even a seemingly small threshold (depending on occupancy, neighborhood and the move rule) can be enough to tip a mixed society into a highly segregated one. The global pattern of segregation ends up being far more severe than any individual’s stated preference.
Schelling’s model can be formally connected to the Ising model (and this is left as a straightforward exercise for the reader). We can define an “unhappiness” function that is analogous to the energy or Hamiltonian. A state has lower energy if fewer residents are unhappy. The dynamic of residents moving to reduce their unhappiness is a form of energy minimization. The tolerance threshold \(\tau\) plays a role similar to that of temperature or the coupling constant, controlling the “strength” of the preference for alignment. Unfortunately, though, a good formula for this transition threshold is not known for this problem, but simulations do show sharp transitions.

Figure 16: Fig 4E - Schelling segregation vs tolerance tau
Reinforcement, herding, and heavy tails
Without any coupling or special structure, repeated random fluctuations tend to average out: independent additive noise plus bounded feedback gives thin tails and concentration (central limit theorem, Hoeffding bounds). The moment we introduce positive feedback that strengthens what already happened, independence breaks, and large deviations become far more likely. We’ll therefore discuss reinforcement and herding in this section, and some of their characteristics: rich-get-richer dynamics, path dependence, lock-in, and heavy-tailed statistics.
We will look at three archetypes:
- Polya-type reinforcement and the Kirman-Folmer herding model (path dependence and bistability)
- The sandpile model (self-organized criticality and power-law avalanches)
- Multiplicative growth and Kesten processes (how random multiplicative amplification creates Pareto tails)
We will also contrast when to expect concentration (independence, boundedness) vs fat tails (reinforcement, multiplicative cascades, criticality).
Polya’s urn: reinforcement and path dependence
The state consists of \(R_t\) and \(B_t\) - the number of red and blue balls at time \(t\). The total is \(N_t = R_t + B_t\).
At each step, we draw one ball uniformly at random, observe its color, return it to the urn, and add another ball of the same color. If red is drawn, \((R_{t + 1}, B_{t + 1}) = (R_t + 1, B_t)\); if blue, \((R_{t + 1}, B_{t + 1}) = (R_t, B_t + 1)\). There is clear positive feedback here - the probability of drawing red is \(R_t/N_t\). A red draw increases \(R\), which increases the probability of drawing red next time.
A useful quantity in this case is the fraction of red balls \(p_t = R_t/N_t\). A short calculation shows that \(p_t\) is a martingale:
\[ \mathbb{E}[p_{t+1} \mid p_t] = \frac{R_t}{N_t}\cdot \frac{R_t+1}{N_t+1} + \frac{B_t}{N_t}\cdot \frac{R_t}{N_t+1} = \frac{R_t}{N_t} = p_t \]
Since \(p_t \in [0, 1]\) is a bounded martingale, it converges almost surely to a limit \(p_\infty\), which is itself a random variable.
What is the distribution of \(p_\infty\)? If we start with \(R_0 = a\) and \(B_0 = b\), it can be shown that \(p_\infty \sim \text{Beta}(a, b)\). Similarly, for a fixed time step \(n\), the number of red draws is \(\text{Beta-Binomial}(n; a, b)\).
This shows some interesting macroscopic behavior. From the same initial condition \((a, b)\), different realizations end up with different final proportions, rather than converging to the same value, as the central limit theorem would have suggested in the no-reinforcement case. So, there is a sense of path dependence here that affects the final state. Also, the distribution \(p_\infty\) is quite broad (but not heavy-tailed - the support is \([0, 1]\), as it should be for a fraction). Proportional reinforcement makes early shocks have persistent effects, and this also shows up in heavy tails once we move to unbounded quantities.
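A quick simulation sketch (run lengths arbitrary): many urns started from the same \((a, b) = (1, 1)\) end up at very different limiting fractions, spread out like a Beta\((1, 1)\) (i.e., uniform) distribution rather than concentrating.
```python
import numpy as np

rng = np.random.default_rng(0)

def polya_fraction(a=1, b=1, steps=2000):
    """One Polya urn run: return the fraction of red balls after `steps` draws."""
    R, B = a, b
    for _ in range(steps):
        if rng.random() < R / (R + B):
            R += 1
        else:
            B += 1
    return R / (R + B)

samples = np.array([polya_fraction() for _ in range(1000)])
# For a = b = 1 the limit p_inf is Beta(1, 1), i.e. uniform on [0, 1].
print("mean (expect ~0.5):", round(float(samples.mean()), 3))
print("std  (expect ~1/sqrt(12) ~ 0.289):", round(float(samples.std()), 3))
print("bin fractions (expect roughly flat):",
      np.round(np.histogram(samples, bins=5, range=(0, 1))[0] / len(samples), 2))
```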

Figure 17: Fig 5A - Polya urn sample paths (p_t trajectories)

Figure 18: Fig 5B - Terminal p_inf histogram (Beta)
A similar idea called preferential attachment can be found in network growth, and leads to heavy tails. If each new node attaches to existing nodes with probability proportional to their current degree (this is called the Barabási-Albert model), this is another instantiation of “rich get richer”. We can show that the resulting degree distribution has a power-law tail \(P(\text{deg} \ge k) \sim k^{-2}\) (or \(P(\text{deg} = k) \sim k^{-3}\)). (Proof sketch: write down the rate of change of the “continuous” degree of a node introduced at time \(T\), evaluated at a later time \(t\), and integrate to get its degree at time \(t > T\). Setting this degree \(> k\) and solving for the latest introduction time of a node with degree \(> k\) - which is also the number of nodes added up to that point, i.e., the count of such nodes - gives the required power law in \(k\).) The coupling “probability \(\propto\) current size” is the same reinforcement as we’ve seen here. For this model specifically, we also see a first-mover advantage - older nodes generally have more edges.
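A sketch of the simplest variant (each new node brings a single edge; sizes arbitrary), using the standard trick that sampling a uniformly random edge endpoint is the same as sampling a node with probability proportional to its degree:
```python
import numpy as np

rng = np.random.default_rng(0)

def preferential_attachment(n=30_000):
    """Each new node attaches one edge to an existing node chosen with probability ~ its degree."""
    degrees = np.zeros(n, dtype=int)
    degrees[0] = degrees[1] = 1
    endpoints = [0, 1]          # every edge endpoint appears once; uniform sampling from this
    for new in range(2, n):     # list is degree-proportional sampling of nodes
        target = endpoints[rng.integers(len(endpoints))]
        degrees[new] += 1
        degrees[target] += 1
        endpoints.extend([new, target])
    return degrees

deg = preferential_attachment()
# P(deg >= k) ~ k^(-2): each doubling of k should shrink the tail by roughly 4x.
for k in [2, 4, 8, 16, 32]:
    print(k, round(float(np.mean(deg >= k)), 4))
```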

Figure 19: Fig 5C - Preferential attachment degree tail (log-log)
Kirman-Folmer herding: bistability
This is a simple continuous-time herding model for \(N\) agents choosing between two options (e.g., two food sources for ants).
Let \(i\) be the number of agents in option A; then \(N-i\) are in option B. The state here will be \(i\). The local rule is that an agent can switch spontaneously (idiosyncratic switching) at rate \(\varepsilon\), or be attracted by the number already in the other option (herding) at rate proportional to that number with coefficient \(h > 0\). The transition rates for \(i\) are \(q_{i \to i + 1} = \varepsilon (N - i) + h i (N - i)\) and \(q_{i \to i - 1} = \varepsilon i + h i (N - i)\), where \(q_{i \to i + 1}\) means “a B joins A”, and \(q_{i \to i - 1}\) means “an A joins B”.
This defines an irreducible birth-death Markov chain on \(\{0, 1, \dots, N\}\). The stationary distribution \(\pi\) can be found by solving the equilibrium constraints - \(\pi(i) q_{i \to i + 1} = \pi(i + 1) q_{i + 1 \to i}\). Solving the resulting recursion gives a Beta-Binomial law:
\[ \pi(i) \propto \binom{N}{i} \frac{\Gamma(\alpha + i) \Gamma(\alpha + N - i)}{\Gamma(\alpha)^2} \]
where \(\alpha = \frac{\varepsilon}{h}\).
In the large-\(N\) limit, the proportion \(x = i / N\) has a \(\text{Beta}(\alpha, \alpha)\) stationary density.
If \(\alpha > 1\) (idiosyncratic switching dominates), the density is unimodal around \(x = \frac{1}{2}\): the population splits roughly evenly. If \(\alpha < 1\) (herding dominates), the density becomes U-shaped and the chain spends long periods near the extremes \(x \approx 0\) or \(x \approx 1\), with intermittent switches between them. This is bistability with stochastic switching.
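A simulation sketch of the chain (rates arbitrary; holding times drawn explicitly so that time averages are weighted correctly, and only a rough estimate for a finite run): comparing \(\alpha = \varepsilon/h\) above and below \(1\) shows the switch from a balanced population to long stretches near the extremes.
```python
import numpy as np

rng = np.random.default_rng(0)

def kirman(N=100, eps=0.002, h=0.01, steps=100_000):
    """Simulate the Kirman birth-death chain; return visited proportions and their holding times."""
    i = N // 2
    xs, ws = np.empty(steps), np.empty(steps)
    for t in range(steps):
        up = eps * (N - i) + h * i * (N - i)       # rate of i -> i + 1
        down = eps * i + h * i * (N - i)           # rate of i -> i - 1
        total = up + down
        xs[t] = i / N
        ws[t] = rng.exponential(1.0 / total)       # time spent in the current state
        i += 1 if rng.random() < up / total else -1
    return xs, ws

h = 0.01
for eps in [0.02, 0.002]:                          # alpha = eps / h = 2 (unimodal) vs 0.2 (bimodal)
    xs, ws = kirman(eps=eps, h=h)
    extreme = np.sum(ws[(xs < 0.1) | (xs > 0.9)]) / np.sum(ws)
    print(f"alpha = {eps / h:.1f}: fraction of time with x < 0.1 or x > 0.9 ~ {extreme:.2f}")
```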

Figure 20: Fig 5D - Kirman stationary density Beta(alpha, alpha)
This model makes explicit how a simple herding term changes the stationary distribution of the proportion from concentrated to bimodal, with substantial mass near the boundaries. Like the threshold and Ising models from earlier, it is mean-field, but with inherently stochastic switching.
Sandpiles: self-organized criticality and avalanches
The sandpile model is a simple example (but extremely rich, related to the chip-firing game and surprisingly many areas of math) of self-organized criticality (SOC): slow driving plus threshold dynamics and conservation push the system to a critical state without external fine-tuning (i.e., no parameter like temperature needs to be added to exhibit criticality). At criticality, responses (“avalanches”) have no characteristic size; and their distribution follows a power law.
Let’s consider a 2D square grid. Each site \(i\) has a non-negative integer height \(z_i\). We add one grain at a uniformly random site. After a grain is added, we check each site for the possibility of toppling: if \(z_j \ge 4\), then site \(j\) “topples” (\(z_j \leftarrow z_j - 4\)) and each of its neighbors receives \(+1\). If a neighbor is missing (i.e., at the boundaries), that grain drops off the grid. We repeat this until no site can topple. The set of topplings triggered by the addition of a grain is called an avalanche; the avalanche size is the number of distinct sites that toppled (or the total number of topplings - both have scale-free statistics, though usually with different power laws).
There are two structural properties to note here. First, far from the boundaries, toppling only redistributes grains; the slow addition increases the total mass until dissipation events at the boundary balance it on average. The boundaries matter: the power-law distribution of avalanches only holds up to a cutoff scale that depends on the system size. Second, the final stable configuration after an addition does not depend on the order of topplings (which is why this is also called the Abelian sandpile model).
After an initial transient, the system reaches a stationary ensemble of “recurrent” (in the Markov chain sense) minimally stable configurations, all reachable from one another. For this model, this is a deep result that connects the grid (viewed as a graph) to the Kirchhoff matrix-tree theorem via a bijection between recurrent states and spanning trees of the graph. In this SOC state, the avalanche size distribution obeys a power law \(P(s \ge S) \sim C S^{-(\tau-1)}\) or \(P(s = S) \sim C' S^{-\tau}\) with an exponent \(\tau > 1\) that depends on the dimension. Deriving this requires a lot of theory, so we will not attempt to derive power laws here.

Figure 21: Fig 5E - Sandpile avalanche sizes (PDF/CCDF, log-log)
Why would power laws arise? A branching-process heuristic (which gives the wrong exponent, but helps build intuition) is the following: think of a toppling as “creating” new potential topplings among its neighbors. Let \(m\) be the average number of new topplings caused by one toppling. If \(m<1\), avalanches die out quickly (exponential tail); if \(m>1\), they blow up. The slow drive plus conservation (of the local grain count) tunes the system to \(m\approx 1\), i.e., criticality. We can roughly connect this to a quantity such as the average height: if the average height is too low, avalanches are too small, dissipation is slow, and grains accumulate; if it is too high, dissipation outpaces accumulation - so there is a negative feedback that brings such a quantity to an equilibrium value. For a critical Galton-Watson process (a type of branching process) with finite variance, the total progeny has a tail \(P(s = S) \sim c S^{-3/2}\). The sandpile is not exactly a branching process (spatial correlations develop between sites, and loops can form, which is impossible in a branching process), but this roughly captures the mechanism: at criticality, there is no characteristic scale, hence power laws.
What is new here compared to earlier threshold models is that no parameter is tuned to a critical value externally. This behavior is also universal (for more reading, see universality classes); the critical exponents like \(\tau\) are robust and do not depend on microscopic details like the exact toppling threshold, but rather on general properties like the system’s dimension. The combination of slow input and boundary dissipation self-organizes the system to criticality (but it depends on timescale separation - the addition of grains is slow enough that the avalanches always finish before the next grain is added). The result is heavy-tailed fluctuations in response to tiny perturbations.
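For intuition (not a careful study), here is a small simulation sketch of the toppling rule above, with an arbitrary \(24 \times 24\) grid and avalanche size counted as total topplings; it already shows a very skewed size distribution (a proper tail estimate would also discard the initial transient):
```python
import numpy as np

def sandpile_avalanches(L=24, n_grains=10_000, seed=2):
    """Drop grains on an L x L Abelian sandpile; record the avalanche size (total topplings) per drop."""
    rng = np.random.default_rng(seed)
    z = np.zeros((L, L), dtype=int)
    sizes = []
    for _ in range(n_grains):
        i0, j0 = rng.integers(L), rng.integers(L)
        z[i0, j0] += 1
        topplings = 0
        while True:
            unstable = np.argwhere(z >= 4)
            if len(unstable) == 0:
                break
            for i, j in unstable:
                z[i, j] -= 4
                topplings += 1
                for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ni, nj = i + di, j + dj
                    if 0 <= ni < L and 0 <= nj < L:
                        z[ni, nj] += 1        # grains pushed past the boundary simply fall off
        sizes.append(topplings)
    return np.array(sizes)

sizes = sandpile_avalanches()
nz = sizes[sizes > 0]
print("fraction of drops that cause an avalanche:", round(len(nz) / len(sizes), 2))
print("avalanche size quantiles (50/90/99 %):", np.percentile(nz, [50, 90, 99]).round(1))
```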
Multiplicative growth and Kesten tails
A second, probably much more common way to get to heavy tails is multiplicative amplification. Consider the linear stochastic recurrence (Kesten process):
\[ X_{t+1} = A_t X_t + B_t \]
where \(\{(A_t, B_t)\}\) are i.i.d., and \(A_t \ge 0\), \(B_t \ge 0\), and \(X_t \ge 0\). Under mild conditions (e.g., \(\mathbb{E}[\log A_t] < 0\) so that the process does not explode on average, \(P(A_t > 1) > 0\), so occasional amplifications occur, \(P(B_t > 0) > 0\), aperiodicity, and \(B_t\) has light tails), there exists a unique stationary distribution. Its upper tail is a power law:
\[ \lim_{x \to \infty} x^\kappa P(X > x) = C \in (0, \infty) \]
where the tail index \(\kappa > 0\) is the unique solution to \(\mathbb{E}\left[A_t^{\kappa}\right] = 1\).
This is the Kesten-Goldie theorem (we omit the proof due to length). The intuition is that rare runs in which \(A_t\) is greater than 1 produce very large values of \(X\) - the probability of such runs decays geometrically, and the balance of these probabilities with the multiplicative growth produces a power law.

Figure 22: Fig 5F - Kesten process tail check (slope -kappa)
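A simulation sketch makes the tail index concrete. Assuming lognormal \(A_t\) (so \(\mathbb{E}[A_t^\kappa] = e^{\kappa\mu + \kappa^2 s^2/2}\) and the predicted index is \(\kappa = -2\mu/s^2\)) and exponential \(B_t\) - both illustrative choices - a Hill estimate on the simulated trajectory should land roughly near the prediction:
```python
import numpy as np

rng = np.random.default_rng(3)
mu, s = -0.25, 0.5                 # log A_t ~ Normal(mu, s^2): E[log A] < 0, P(A > 1) > 0
T, burn = 500_000, 1_000
A = rng.lognormal(mu, s, size=T)
B = rng.exponential(1.0, size=T)

x = np.empty(T)
x[0] = 1.0
for t in range(T - 1):
    x[t + 1] = A[t] * x[t] + B[t]   # the Kesten recursion
samples = x[burn:]

kappa_theory = -2 * mu / s ** 2     # solves E[A^kappa] = 1 for the lognormal choice
k = 5_000                           # Hill estimator over the top-k order statistics
tail = np.sort(samples)[-k:]
kappa_hill = 1.0 / np.mean(np.log(tail[1:] / tail[0]))
print("predicted kappa:", kappa_theory, " Hill estimate:", round(kappa_hill, 2))
```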
There are a lot of examples of such behaviors in economics: if wealth grows roughly multiplicatively with random shocks, with small additive floors (\(B_t\)), stationary wealth distributions display Pareto tails. Similarly, multiplicative cascades in volatility models (e.g., stochastic volatility) can produce heavy-tailed returns. Connecting this to preferential attachment and Yule processes, expected degree grows proportionally to current degree; the induced multiplicative advantage yields heavy-tailed degree distributions. This is a discrete analogue of Kesten growth.
Note the contrast with additive models: for \(X_{t+1} = X_t + \xi_t\) with i.i.d. \(\xi_t\) having finite variance, stationary fluctuations (if any) are thin-tailed (Gaussian-like), and large deviations are exponentially rare. So multiplicative amplification changes the picture qualitatively, not just quantitatively.
Swarms and distributed coordination
We now turn to distributed alignment: how local rules that couple “headings” or velocities generate coherent large-scale motion. We will look at two canonical models and the coarse-grained picture they lead to. As before, keep track of: state variables, local update, coupling/topology, noise, and macroscopic summaries (order parameters, conserved quantities, convergence criteria).
Vicsek alignment: headings, noise, and a mean-field self-consistency
Let’s look at the state and the local rule. Agents \(i = 1, \dots, N\) have positions \(x_i(t) \in \mathbb{R}^2\) and “headings” (directions they’re heading towards) \(\theta_i(t) \in (-\pi, \pi]\). All of them move with a fixed speed \(v > 0\). With “metric” interactions (radius \(r\)), at discrete time step \(\Delta t\), the dynamics are as follows:
\[ \theta_i(t + \Delta t) = \text{arg} \left(\sum_{j: \|x_j(t) - x_i(t)\| \le r} e^{i \theta_j(t)}\right) + \eta \xi_i(t) \]
where \(\xi_i(t)\) are i.i.d. zero-mean. Positions update by
\[ x_i(t + \Delta t) = x_i(t) + v \Delta t (\cos \theta_i(t + \Delta t), \sin \theta_i(t + \Delta t)) \]
We define the “order parameter” as
\[ \phi(t) = \frac{1}{N} \left\| \sum_{i=1}^N e^{i\theta_i(t)} \right\|\in[0,1]. \]
Randomly distributed headings give \(\phi \approx 0\) and near-perfect alignment of directions gives \(\phi \approx 1\). The dynamics are rotationally symmetric; in an ordered phase the symmetry is broken by selecting a macroscopic heading, similar to what we saw in the Ising model.

Figure 23: Fig 6A - Vicsek order parameter vs noise/density
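The transition is easy to see in a small simulation sketch (illustrative choices: \(N = 300\) agents on a periodic box of side \(L = 10\), \(r = 1\), \(v = 0.5\), noise \(\eta \xi_i\) with \(\xi_i \sim \mathrm{Uniform}(-\tfrac{1}{2}, \tfrac{1}{2})\)):
```python
import numpy as np

def vicsek_phi(N=300, L=10.0, r=1.0, v=0.5, eta=0.5, steps=300, seed=4):
    """Discrete-time Vicsek model on a periodic box; returns the order parameter after `steps` updates."""
    rng = np.random.default_rng(seed)
    pos = rng.uniform(0, L, size=(N, 2))
    theta = rng.uniform(-np.pi, np.pi, size=N)
    for _ in range(steps):
        d = pos[:, None, :] - pos[None, :, :]
        d -= L * np.round(d / L)                           # periodic boundary conditions
        neighbors = (d ** 2).sum(axis=-1) <= r ** 2        # includes self
        mean_heading = np.angle((neighbors * np.exp(1j * theta)[None, :]).sum(axis=1))
        theta = mean_heading + eta * rng.uniform(-0.5, 0.5, size=N)   # zero-mean angular noise
        pos = (pos + v * np.column_stack((np.cos(theta), np.sin(theta)))) % L
    return np.abs(np.exp(1j * theta).mean())

for eta in (0.5, 2.0, 6.0):
    print(f"eta = {eta}: phi ~ {vicsek_phi(eta=eta):.2f}")
```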
Consider the mean-field (well-mixed, continuous-time) limit. At high density (mean neighbor count \(m\approx \rho\pi r^2\), with \(\rho=N/L^2\) on a periodic square of side \(L\)) and small \(\Delta t\), one can pass to a drift-diffusion in angle (update \(\theta\) only partially towards the mean heading with an interpolation fraction \(\alpha\), set \(\alpha = K \Delta t\) and \(\eta = \sqrt{2D \Delta t}\), and approximate the deviation from \(\psi\) to first order by a \(\sin\); this can also be interpreted as an Ising-type model with a cosine potential in the angle difference):
\[ d\theta_i = K r(t) \sin(\psi(t) - \theta_i) dt + \sqrt{2D} dW_i(t) \]
where \(r e^{i\psi} = \frac{1}{N} \sum_j e^{i \theta_j}\), \(K > 0\) scales with alignment strength times the mean neighbor count, and \(D\) is an angular diffusivity induced by noise (\(D\) increases with \(\eta\)). Let \(f(\theta,t)\) be the density on the circle. Using the Fokker-Planck equation, we get that
\[ \partial_t f = D\partial_\theta^2 f + \partial_\theta \left( K r \sin(\theta-\psi) f \right) \]
In a steady state with fixed \(\psi\) (we can without loss of generality set \(\psi=0\)), integrating once yields \(Df’(\theta) + K r \sin \theta f(\theta) = 0\) which gives \(f(\theta) = \frac{1}{2\pi I_0(\kappa)} e^{\kappa\cos\theta}\) where \(\kappa = \frac{K r}{D}\). This is the von Mises distribution (the angular equivalent of the normal distribution in the sense of the central limit theorem).
The order parameter then satisfies the following equation, derived from self-consistency:
\[ r = \int_{-\pi}^{\pi} \cos \theta f(\theta) d\theta = \frac{I_1(\kappa)}{I_0(\kappa)} = \frac{I_1 \left(\frac{K}{D} r \right)}{I_0 \left(\frac{K}{D} r \right)} \]
Expanding \(I_1/I_0\) at a small argument \(r\) gives \(r \approx \frac{K}{2D} r - \frac{1}{16}(\frac{K}{D})^3 r^3 + \cdots\). When \(K > K_c = 2D\), a nonzero solution branches off \(r = 0\) (a pitchfork-type bifurcation), so an ordered phase appears.
We can interpret this as the density and noise having a tradeoff, since \(K\) grows with the mean neighbor count \(m\) and \(D\) with \(\eta\).
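The self-consistency condition is easy to probe numerically; a small sketch using scipy's modified Bessel functions and plain fixed-point iteration shows the nonzero branch appearing once \(K/D\) exceeds \(2\):
```python
import numpy as np
from scipy.special import iv   # modified Bessel functions I_nu

def mean_field_r(K_over_D, iters=300):
    """Fixed-point iteration of r = I1(K r / D) / I0(K r / D), starting from r = 0.5."""
    r = 0.5
    for _ in range(iters):
        x = K_over_D * r
        r = iv(1, x) / iv(0, x)
    return r

for K_over_D in (1.5, 2.5, 4.0):
    print(f"K/D = {K_over_D}: r -> {mean_field_r(K_over_D):.3f}")
# Below K/D = 2 the iteration collapses to r = 0; above it, a nonzero ordered branch appears.
```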
There are some caveats to this solution (as with all mean-field solutions) though. The mean-field argument ignores the coupling between density and order created by self-propulsion. In the metric Vicsek model at finite speed and size, we observe traveling high-density (and high-order) bands and a weakly first-order transition with hysteresis. Still, the mean-field picture captures a bunch of important things - the existence of a disorder-order transition, von Mises stationary angles deep in the ordered phase, and the density-noise tradeoff.

Figure 24: Fig 6B - Vicsek snapshots: disordered gas vs aligned band
This model has been studied extensively as a minimal model of collective motion; there are theories that treat it both at the hydrodynamic level and at arbitrary particle densities, and study its dynamics in detail.
Cucker-Smale: continuous-time velocity alignment and convergence
In this model, agents \(x_i(t)\in\mathbb{R}^d\), \(v_i(t)\in\mathbb{R}^d\) follow \(\dot{x}_i = v_i\), \(\dot{v}_i = \frac{1}{N} \sum_{j=1}^N \psi(\|x_i - x_j\|)(v_j - v_i)\) with a non-increasing influence kernel, e.g., \(\psi( r) = \frac{K}{(1 + r^2)^\beta}\) with \(K > 0\) and \(\beta \ge 0\) - the velocities try to align with that of the remaining agents.
Let’s look at some invariants and potential monovariants. By summing over the acceleration equations, the mean velocity is conserved. Consider the variance in velocity (or, the kinetic energy in the center-of-mass frame, which we expect to decrease due to the dissipative force terms) \(\mathcal{V}(t) = \frac{1}{2N} \sum_i \|v_i-v_c\|^2 = \frac{1}{4N^2} \sum_{i,j} \|v_i-v_j\|^2\). A simple calculation gives \(\dot{\mathcal{V}}(t) = -\frac{1}{2N^2} \sum_{i,j} \psi(\|x_i-x_j\|) \|v_i-v_j\|^2 \le -2 \psi(R(t)) \mathcal{V}(t)\) where \(R(t)=\max_{i,j}\|x_i-x_j\|\). Hence \(\mathcal{V}(t) \le \mathcal{V}(0) \exp \left(-2 \int_0^t \psi(R(s)) ds \right)\), which controls how the velocity variance decays over time.

Figure 25: Fig 6C - Cucker-Smale velocity variance convergence
Let \(U(t) = \max_{i, j} \|v_i(t) - v_j(t)\|\) be the velocity spread. Taking the maximizing pair and using the acceleration equation, we get \(\dot{R}(t) \le U(t)\) and \(\dot{U}(t) \le -\psi(R(t)) U(t)\). So if we define \(L(t) = U(t) + \int_0^{R(t)} \psi(s) ds\), then \(\dot{L} \le 0\), i.e., \(L\) is non-increasing. This means that if \(\int_0^\infty \psi(s)ds = \infty\), then \(R(t)\) stays bounded and \(\mathcal{V}(t)\to 0\), i.e., velocities align to the conserved \(v_c\). When this integral is finite, flocking still occurs if the initial spread is small enough: \(\int_{R(0)}^\infty \psi(s) ds > U(0)\) implies \(R(t)\) stays bounded, because if \(R\) ever exceeded an \(R'\) with \(\int_{R(0)}^{R'} \psi(s) ds > U(0)\), then \(L(t) \le L(0)\) would force \(U\) to be negative, which is impossible.
In particular, for \(\psi( r)=K(1+r^2)^{-\beta}\) with \(\beta \le \frac{1}{2}\), the integral diverges, so velocities always align regardless of initial conditions.
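A direct integration sketch (explicit Euler with illustrative parameters \(N = 50\), \(K = 1\), \(\beta = 0.4 \le \tfrac{1}{2}\)) shows the velocity spread collapsing while the mean velocity stays fixed:
```python
import numpy as np

def cucker_smale_spread(N=50, beta=0.4, K=1.0, T=30.0, dt=0.01, seed=5):
    """Euler-integrate the Cucker-Smale ODEs; track the maximum deviation from the mean velocity."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, 5.0, size=(N, 2))
    v = rng.normal(0.0, 1.0, size=(N, 2))
    spread = lambda v: np.linalg.norm(v - v.mean(axis=0), axis=1).max()
    s0, v_mean0 = spread(v), v.mean(axis=0)
    for _ in range(int(T / dt)):
        diff = x[:, None, :] - x[None, :, :]
        dist = np.linalg.norm(diff, axis=-1)
        psi = K / (1.0 + dist ** 2) ** beta                        # influence kernel
        dv = (psi[:, :, None] * (v[None, :, :] - v[:, None, :])).mean(axis=1)
        v = v + dt * dv
        x = x + dt * v
    print("velocity spread: initial", round(s0, 3), "-> final", round(spread(v), 5))
    print("mean velocity before/after:", v_mean0.round(3), v.mean(axis=0).round(3))

cucker_smale_spread()
```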
We can also view this as a graph problem. With weights \(w_{ij}(t)=\psi(\|x_i-x_j\|)/N\) and Laplacian \(L(t)\), we have \(\dot{v}=-L(t)v\). The decay of \(\mathcal{V}\) is controlled by the time-varying algebraic connectivity; a slowly decaying \(\psi\) keeps the connectivity more persistent and makes consensus easier to reach.
It is also interesting to see what happens if we added some noise to the acceleration equations - this is left as an exercise for the reader.
Hydrodynamics and long-range order
Note that this section is worth revisiting after the section on spatial dynamics.
When we have a large number of particles, we move to a continuous system (a field) by introducing the density \(\rho(x,t)\) and mean velocity \(u(x,t)\) (“polarization”). We will consider a process that shows interesting behavior. Firstly, as is expected from matter, we have a mass-conservation equation:
\[ \partial_t \rho + \nabla \cdot(\rho u) = 0 \]
The actual process is driven by the following differential equation:
\[ \partial_t u + c (u \cdot \nabla) u = \alpha u - \beta |u|^2 u - \nabla P(\rho) + \nu \nabla^2 u + \xi \]
with constants \(c, \alpha, \beta, \nu \ge 0\), an effective pressure \(P(\rho)\) increasing in \(\rho\), and a zero-mean noise \(\xi\).
This is a special case of the Toner-Tu equation from hydrodynamics. Here, we can intuitively read off what each term does. The \(c (u \cdot \nabla) u\) term is advection: it transports the flow along the velocity vector (i.e., values of \(u\) are carried downstream by the current \(u\)). The terms with \(\alpha\) and \(\beta\) select a steady speed: when \(\alpha > 0\) they drive \(|u|\) towards \(\sqrt{\alpha/\beta}\), and when \(\alpha \le 0\) they damp \(u\) to zero. The \(P(\rho)\) term is a pressure that increases with \(\rho\), so high-density regions push outwards (a force due to “compression pressure”). The \(\nu \nabla^2 u\) term is a Laplacian smoothing term (viscosity) that damps small-scale (high-frequency) patterns in \(u\) (the easiest way to see this is to write out the discrete Euler step for it - it looks like local averaging). And the \(\xi\) term just models noise.
Let’s try to reconcile this back with the Vicsek model (discrete agents instead of the continuous fields we’ve been considering). In that model, we saw alignment gain beating random noise as the reason behind the phase transition. Here \(\alpha\) is what does that job: when \(\alpha \le 0\), we get \(u\) dying down (so no polarization), and when \(\alpha > 0\), we see that in the steady state, \(\alpha = \beta |u|^2\). Intuitively speaking, \(\alpha\) increases with how strong the alignment is and the neighborhood density, \(\nu\) increases with noise and mixing, and \(c\) is a term that controls how strongly advection is important.
Doing a small perturbation in \(\rho\) and \(u\) at the stable state (and ignoring \(\xi\)), and keeping only first order terms in the perturbations, and splitting the perturbations in \(u\) into parallel and perpendicular components, we get the following three equations:
Parallel component:
\[ \partial_t \delta u_\parallel + c |u_0| \partial_{\parallel} \delta u_\parallel = - P’(\rho_0) \partial_{\parallel} \delta \rho + \nu \nabla^2 \delta u_\parallel - 2 \beta |u_0|^2 \delta u_\parallel \]
Perpendicular component:
\[ \partial_t \delta u_\perp + c|u_0| \partial_{\parallel} \delta u_\perp = - P’(\rho_0) \nabla_\perp \delta \rho + \nu \nabla^2 \delta u_\perp \]
And the mass conservation (continuity) equation:
\[ \partial_t \delta\rho + |u_0| \partial_{\parallel} \delta\rho + \rho_0(\partial_{\parallel} \delta u_\parallel + \nabla_\perp \cdot \delta u_\perp) = 0 \]
There is an advection term here as usual; only the parallel component has a restoring force (the \(-2\beta|u_0|^2 \delta u_\parallel\) term), and the Laplacian smoothing damps both components. So for \(\alpha > 0\) and \(k > 0\) this is a stable equilibrium. It is worth looking at specific instantiations of these equations to get a better feel for what is happening (e.g., a periodic box, applying a Fourier transform to these equations).
Let’s think about how bands and hysteresis show up here. In particle models with metric neighborhoods, alignment effectively strengthens with local density. Then a small density bump slightly increases the local effective \(\alpha\), which increases \(|u|\), which advects more mass forward, which further steepens the front. Pressure and viscosity oppose this but, near onset, the positive feedback can balance dissipation into a moving profile that connects a low-density disordered “gas” to a high-density ordered “liquid”. Coexistence implies hysteresis: ordered bands persist above the disordering noise on the up-sweep and disordered gas persists below on the down-sweep. This matches our observations from the metric Vicsek settings - mean-field predicts the existence of an order-disorder threshold but misses the coexistence window.
Selection dynamics
We now look at selection, i.e., what happens when “types” of things (e.g., species) reproduce or get copied in proportion to how well they do. We’ll see that the same set of ideas works here too. We will primarily look at the replicator equation, which is a simple model of how relative advantage can lead to changes in population composition.
Replicator dynamics
Let \(p(t) = (p_1, p_2, \dots, p_n)\) be the population composition of \(n\) types, with \(p_i \ge 0\) and \(\sum_i p_i = 1\). Let each type’s fitness be \(f_i(p)\). The replicator ODE is
\[\dot{p}_i = p_i (f_i(p) - \bar{f}(p))\]
where \(\bar{f}(p) = \sum_{k = 1}^n p_k f_k(p)\). This is derived from the population counts growing as \(\dot{N}_i = f_i(p) N_i\).
There are two immediate things to notice: \(p\) always remains in the probability simplex (if \(p_i = 0\) it stays at zero, and \(\sum_i p_i\) is conserved), and the log-ratio of any two frequencies integrates the fitness difference: \(\frac{d}{dt} \log \frac{p_i}{p_j} = f_i(p) - f_j(p)\).
Let’s first consider the special case of only two types A and B. The equation simplifies to \(\dot p = p (1 - p) \Delta f(p)\), where \(p\) is the population fraction of A and \(\Delta f(p)\) is the fitness difference between A and B. If this difference is constant, this becomes the logistic growth case we saw earlier. If payoffs depend linearly on \(p\) (a 2x2 game), there are fixed points at \(0\), \(1\), and possibly an interior point that depends on the payoffs. The interior point is stable iff the game is an anti-coordination game, and \(0\) and \(1\) are both stable iff it is a coordination game.

Figure 26: Fig 7A - 2-by-2 replicator portraits
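A tiny integration sketch of the two-type case, with made-up payoff matrices for a coordination and an anti-coordination game, illustrates both behaviors:
```python
import numpy as np

def replicator_2x2(A, p0=0.4, T=60.0, dt=0.01):
    """Integrate dp/dt = p (1 - p) (f_A(p) - f_B(p)) for a symmetric 2x2 payoff matrix A."""
    p = p0
    for _ in range(int(T / dt)):
        fA = A[0, 0] * p + A[0, 1] * (1 - p)
        fB = A[1, 0] * p + A[1, 1] * (1 - p)
        p += dt * p * (1 - p) * (fA - fB)
    return p

coordination = np.array([[2.0, 0.0], [0.0, 1.0]])       # both pure states stable; interior point (1/3) unstable
anti_coordination = np.array([[0.0, 2.0], [1.0, 0.0]])  # interior point (2/3) stable
for name, A in (("coordination", coordination), ("anti-coordination", anti_coordination)):
    print(f"{name}: p(0) = 0.4 -> p(T) = {replicator_2x2(A):.3f}")
```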
Let’s now go back to the \(n\) dimensional case. Firstly, consider a constant \(f\) for the sake of simplicity. Then we get \(\dot{\bar{f}} = \text{Var}_p(f) \ge 0\), which is a good monovariant (in the context of dynamical systems, such a function that helps us prove convergence to an equilibrium is called a Lyapunov function). Note that \(\bar{f}\) is precisely the function whose (standard Euclidean) gradient is \(f\). This motivates a more general observation: suppose \(f\) is the gradient of a function \(\Phi\). Then simple algebra shows that \(\dot{\Phi} = \text{Var}_p (\nabla \Phi) \ge 0\), so replicator dynamics do an ascent on \(\Phi\) (the geometry under which they do this ascent is defined by the Fisher information metric).

Figure 27: Fig 7B - Replicator ascent on a potential
There is another Lyapunov function that is useful here - the KL divergence \(D_{\mathrm{KL}}(p^* \| p)\) between the equilibrium distribution \(p^*\) and the current state \(p\). Consider the case where the fitness vector is \(Ap\). Then it can be shown that an evolutionarily stable strategy is a stable equilibrium: the time derivative of this KL divergence is \(u(p, p) - u(p^*, p)\), where \(u(p, q) = p^T A q\) is the (symmetric, in game-theoretic terms) payoff function, and by the definition of an evolutionarily stable strategy this is negative for \(p\) near \(p^*\), so perturbations decay.
Replicator-mutator dynamics
Selection generally does not operate in isolation. Mutation/innovation keeps types from dying out completely. Modeling mutation as a (row-stochastic) matrix \(Q\), where \(Q_{ji}\) denotes the probability that an offspring of type \(j\) is of type \(i\), we have \(\dot{p}_i = \sum_j p_j f_j(p) Q_{ji} - \bar{f}(p) p_i\) (in matrix form, \(\dot{p} = Q^T F(p) p - \bar{f}(p) p\), where \(F(p) = \text{diag}(f(p))\)).
If \(Q\) has positive entries (and is hence ergodic) and \(f\) is bounded below, then there is an attractor in the interior of the simplex and it balances selection and mutation. This can be shown by considering the equations at the boundaries of the simplex.
Let’s try a different type of mutator dynamics. If, as before, we consider a potential setting \(f = \nabla \Phi\), with \(\dot{p}_i = p_i (f_i - \bar{f}) + \tau(1/n - p_i)\) (uniform mutations independent of fitness), we get ascent (under the same geometry) of \(\Phi + \frac{\tau}{n} \sum_{i = 1}^n \log p_i\). The proof is simple: the time derivative of this quantity is \(\sum_i p_i \left(f_i - \bar{f} + \frac{\tau}{np_i} - \tau\right)^2 \ge 0\). Note that \(\frac{\tau}{n} \sum_{i = 1}^n \log p_i\) is proportional to the negative cross-entropy of \(p\) against a uniform “ground truth” - so the maximizer also tries to push \(p\) towards the uniform distribution. Under a logit mutation setup, instead of this reverse-KL-style term we would have had a forward-KL-style term, which makes it an entropy term instead of a sum of log-probabilities.

Figure 28: Fig 7C - Replicator-mutator interior fixed point
Ricardian trade as selection of specializations
Consider two goods \(X, Y\) and two regions \(A, B\). Producing one unit of \(s \in \{X, Y\}\) in \(r \in \{A, B\}\) needs \(a_s^r > 0\) units of labor. Each region has \(L^r\) units of labor. Let the outputs in region \(r\) be \((q_X^r, q_Y^r)\). The constraint is on the total units of labor used - \(\sum_s a_s^r q_s^r \le L^r\), \(q_s^r \ge 0\). Fix a relative price \(p = P_X / P_Y\).
It is optimal to use all labor units. Let \(s\) be the share of labor used by \(X\). Then \(q^r_X(s) = sL^r / a_X^r\) and \(q_Y^r(s) = (1 - s) L^r / a^r_Y\). The value created is proportional to \(sp/a^r_X + (1 - s)/a^r_Y\), which is affine in \(s\), so an optimal strategy is to put all labor on \(X\) if \(p \ge a^r_X/a^r_Y = \theta_r\), and we call this threshold the relative cost of \(X\) in region \(r\).
If \(\theta_A < \theta_B\), we say \(A\) has the comparative advantage in \(X\) and \(B\) has the comparative advantage in \(Y\).
When we combine the outputs of the two regions, and figure out the efficient boundary, i.e., how to maximize \(Q_Y\) for a given value of \(Q_X\), it becomes a piecewise linear function with a break at the point where \(A\) produces only \(X\) and \(B\) produces only \(Y\). Moving left means we reassign some of \(A\)’s labor from \(X\) to \(Y\), and moving right means we reassign some of \(B\)’s labor from \(Y\) to \(X\).
Note that whenever \(p\) is between \(\theta_A\) and \(\theta_B\), it is optimal for both regions to specialize in the goods they have a comparative advantage in, while outside of this range, it is optimal for them to specialize in the same thing.
Starting from non-optimal allocations, under simple assumptions the two-type replicator model applies to this difference in advantage, and it shows that, with no further restrictions, the system converges to an equilibrium that is also an optimum.
Spatial structure: patterns from local rules
So far we’ve mostly worked in well-mixed (mean-field) settings. However, a lot of interactions are spatial - things diffuse, infections spread across different regions. We saw a preview of this in the discussion on hydrodynamics earlier. The idea of transport and dispersion arises from spatial interactions, and we get to see some interesting phenomena like waves.
Reaction-diffusion and diffusion-driven instability
Let \(u(x, t)\) and \(v(x, t)\) be two concentrations that can react with each other and diffuse. A simple continuous model is: \(\partial_t u = f(u, v) + D_u \nabla^2 u\), \(\partial_t v = g(u, v) + D_v \nabla^2 v\), where \(D_u, D_v > 0\).
Let’s try to understand this in a bit more detail. The left-hand sides of the above equations are time derivatives, expressing how much the concentration changes at the current position. The right-hand sides have two terms - the first term is an interaction term that shows how \(u\) and \(v\) interact (or “react”) to change the concentrations over time. The second term corresponds to the Laplacian, and it measures how much the neighborhood average of the function differs from the value at the point (consider a discrete 1D grid - \(\nabla^2 u(x_i) \approx (u_{i-1} - 2u_i + u_{i+1}) / (\Delta x)^2\)). If a point has more concentration than its neighboring points, we’ll get flow from that point to its neighbors (for more intuition, look up gradients, flux, divergence, Laplacian and Fick’s law).
Pick a homogeneous steady state \((u^*, v^*)\) of the ODE (i.e., such that the values of the reaction terms \(f\) and \(g\) are \(0\)). Let \(J\) be the Jacobian of the reactions at that point.
\[J=\begin{pmatrix} f_u & f_v\\ g_u & g_v\end{pmatrix}\Big|_{(u^*,v^*)}\]
Suppose the ODE is locally stable there (i.e., \(\text{tr} J < 0\) and \(\det J > 0\)). We find something interesting - even when the steady state is stable, adding diffusion can create spatial instabilities that lead to stripes/spots; this is called a Turing instability. We do the math below.
Let’s add a small perturbation \(u = u^* + \delta u\), \(v = v^* + \delta v\) and keep only the linear terms.
\[ \partial_t \begin{pmatrix} \delta u \\ \delta v \end{pmatrix} = \left(J + \text{diag}(D_u, D_v) \nabla^2 \right) \begin{pmatrix} \delta u \\ \delta v \end{pmatrix} \]
On a periodic box, we use the Fourier basis \(\exp(ik \cdot x)\). The Laplacian acts on them by \(\nabla^2 \exp(ik \cdot x) = -|k|^2 \exp(ik \cdot x)\). We try solutions of the form \(\begin{pmatrix} \delta u \\ \delta v \end{pmatrix} = \begin{pmatrix} U \\ V \end{pmatrix} \exp(\lambda t) \exp(ik \cdot x)\). Each spatial frequency \(k\) evolves independently. Plugging it in, we get \(\det (\lambda I - J + |k|^2 \text{diag}(D_u, D_v)) = 0\), i.e.,
\[ \lambda^2 - \lambda\left(\text{tr}J - (D_u + D_v) |k|^2 \right) + \left(\det J - |k|^2 (D_u g_v + D_v f_u) + D_u D_v |k|^4 \right) = 0 \]
Because \(\text{tr} J < 0\), the trace term gets more negative as \(|k|\) increases (we subtract \((D_u + D_v)|k|^2\)), so the trace can’t lead to an instability. The only way to get growth is for the determinant term to become negative for some \(|k| > 0\). Let \(q = |k|^2 \ge 0\) and define \(P(q) = \det J - q (D_u g_v + D_v f_u) + D_u D_v q^2\).
This is an upward-opening parabola in \(q\). We know \(P(0) = \det J > 0\) (ODE stability). For some spatial band to go unstable we need the parabola to go below zero at some positive \(q\). That happens exactly when the following three conditions hold:
- \(\text{tr} J < 0\) and \(\det J > 0\) (for ODE stability)
- \(D_u g_v + D_v f_u > 0\) (for the parabola minimum to be at a positive \(q\))
- \((D_u g_v + D_v f_u)^2 > 4 D_u D_v \det J\) (for the minimum dipping below \(0\))
The second condition puts the minimum at \(q_* = \frac{D_u g_v + D_v f_u}{2 D_u D_v} > 0\). The third condition says that the minimum value \(P(q_*) = \det J - \dfrac{(D_u g_v + D_v f_u)^2}{4 D_u D_v}\) is negative.
When these conditions hold, there is an unstable band \(q \in (q_-, q_+)\) where one eigenvalue \(\lambda(q)\) has positive real part.
The intuition is the following - the reaction part is stabilizing at zero wavenumber (\(\text{tr} J < 0\), \(\det J > 0\)), but the diffusion part tilts the balance differently for different spatial perturbations. For instance, if the species that suppresses variations (a fast “inhibitor”) spreads faster than the species that amplifies them (a slow “activator”), then small bumps grow into a regular pattern.
Let’s consider a specific minimal example for chemical kinetics that shows these instabilities - the Schnakenberg model, which considers the simultaneous reactions \(X \rightleftharpoons A\), \(B \to Y\) and \(2X + Y \to 3X\). Upon eliminating concentrations of \(A\) and \(B\), we get a system that looks like the following:
Let \(f(u, v) = a - u + u^2 v\), \(g(u, v) = b - u^2 v\) where \(a, b > 0\).
The homogeneous steady state is \(u^* = a + b\), \(v^* = \frac{b}{(a + b)^2}\). We then have \(f_u = \frac{b - a}{a + b}\), \(f_v = (a + b)^2\), \(g_u = \frac{-2b}{a + b}\) and \(g_v = -(a + b)^2\).
Pick, for instance, \(a=0.2,\ b=1.3\). Then \(u^* = 1.5\), \(v^* \approx 0.578\), \(f_u \approx 0.733\), \(f_v = 2.25\), \(g_u \approx -1.733\), \(g_v = -2.25\).
The ODE is stable because \(\text{tr} J = f_u + g_v \approx -1.517 < 0\), \(\det J \approx 2.25 > 0\).
Now let diffusion coefficients be \(D_u=1\) (slow activator) and \(D_v=30\) (fast inhibitor). Then \(D_u g_v + D_v f_u \approx -2.25 + 30 \cdot 0.733 \approx 19.75 > 0\), \((D_u g_v + D_v f_u)^2 - 4 D_u D_v \det J \approx 120.06 > 0\), so a Turing instability exists. The unstable band in \(q = |k|^2\) is \(q_\pm = \frac{D_u g_v + D_v f_u \pm \sqrt{(D_u g_v + D_v f_u)^2 - 4 D_u D_v \det J}}{2 D_u D_v}\) which is \(0.147\) to \(0.512\).
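These numbers are straightforward to verify programmatically; the short check below just re-evaluates the Jacobian entries and the three conditions for the same \(a\), \(b\), \(D_u\), \(D_v\):
```python
import numpy as np

# Schnakenberg reaction Jacobian at the homogeneous steady state, for the a, b used in the text.
a, b = 0.2, 1.3
u = a + b                          # u* = a + b
v = b / (a + b) ** 2               # v* = b / (a + b)^2
fu, fv = -1 + 2 * u * v, u ** 2
gu, gv = -2 * u * v, -u ** 2
trJ, detJ = fu + gv, fu * gv - fv * gu

Du, Dv = 1.0, 30.0
h = Du * gv + Dv * fu                      # must be > 0 for the parabola minimum to sit at q > 0
disc = h ** 2 - 4 * Du * Dv * detJ         # must be > 0 for the minimum to dip below zero
q_minus = (h - np.sqrt(disc)) / (2 * Du * Dv)
q_plus = (h + np.sqrt(disc)) / (2 * Du * Dv)
print("ODE stable:", trJ < 0 and detJ > 0)
print("Turing conditions:", h > 0, disc > 0)
print("unstable band in q = |k|^2:", round(q_minus, 3), "to", round(q_plus, 3))
```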

Figure 29: Fig 8A - Schnakenberg dispersion relation: unstable band

Figure 30: Fig 8B - Turing patterns (spots/stripes)
In a simulation on a periodic square with side length much larger than the pattern wavelength \(\ell_* = 2\pi/\sqrt{q_*}\), small random noise around \((u^*, v^*)\) gets amplified into spots/stripes with a typical spacing set by the fastest-growing wavenumber.
Traveling fronts: Fisher-KPP
Another spatial phenomenon is invasion - a stable state spreads into an unstable state as a moving “front”. The simple model is the Fisher-KPP equation (which was proposed to describe the spatial spread of an advantageous allele):
\[ \partial_t u = D \nabla^2 u + r u \left(1 - \frac{u}{K} \right) \]
Here \(u=0\) is unstable (due to the positive growth \(ru\)), and \(u=K\) is stable - note that the reaction term is just the logistic growth applied pointwise in space.
A front is a profile that keeps its shape while translating, e.g., in 1D we write \(u(x, t) = U(z)\) where \(z = x - ct\) - here \(U\) is the “shape”, and the whole picture moves to the right with speed \(c > 0\) (so if we ride along with the wave at speed \(c\), the profile looks frozen).
We look for a wave that connects the two homogeneous states, the “invaded” state \(u = K\) behind the front and the “empty” state \(u = 0\). The biologically relevant fronts are the monotone decreasing ones, i.e., the density drops from \(K\) to \(0\) as you move in the direction the front travels.
Far ahead of the front, \(U\) is very small, so the non-linear term is negligible, and the equation linearizes to \(-cU’ \approx DU’’ + rU\). Trying an exponential \(\exp(-\lambda z)\) with positive \(\lambda\) gives us \(D \lambda^2 - c \lambda + r = 0\) (which is a kind of dispersion relation).
For a real \(\lambda\) to exist, the quadratic must have a non-negative discriminant, so the wave must move at least at the minimal speed \(c_* = 2 \sqrt{Dr}\); slower speeds would force an oscillatory (sign-changing) leading edge.
Let’s think about it from a physical perspective. Consider what happens when a tiny population suddenly arrives at the origin. Two things happen at once - diffusion, which spreads it like a Gaussian blob of width \(\sim \sqrt{2Dt}\), and exponential growth \(\exp(rt)\) while it is small. The combined effect of these things near the front is roughly \(u(x, t) \approx \exp(rt) \times \exp(-x^2 / (4Dt))\), and looking at how the contours move gives us \(x(t) \approx 2 \sqrt{Dr} t\). This also explains why the linearization at \(u = 0\) controls the speed - the front is pulled by the leading edge where \(u\) is small, not by the (saturated) bulk behind it which is at \(K\).
This equation behaves differently for \(c < c_* = 2 \sqrt{Dr}\) and for \(c \ge c_*\). Firstly, it can be shown that a front must be strictly monotone, and since for \(c < c_*\) the leading edge would oscillate, there is no traveling wave in that case. For \(c \ge c_*\), it can be shown that there is a unique front (up to translation). What happens for initial data decaying like \(e^{-\lambda x}\) with \(\lambda > \lambda_*\) or \(\lambda < \lambda_*\), where \(\lambda_* = \sqrt{r/D}\) is the decay rate of the minimal-speed front? In the first case, the solution develops a front that travels at \(c_*\); in the latter, the front travels at the larger speed \(D \lambda + r / \lambda\).
We now consider an example from epidemiology again, but with the SIS model this time.
Let \(I(x, t)\) be the density of infectious individuals and suppose the susceptible population density is near \(1\) initially (so early growth is approximately linear). A linearization looks like \(\partial_t I \approx D \partial_{xx} I + (\beta - \gamma) I\), where \(\beta\) is the transmission rate and \(\gamma\) is the recovery rate. This is the same as the Fisher-KPP linearization with growth rate \(\beta - \gamma = \gamma (R_0 - 1)\). The predicted minimal speed is \(2 \sqrt{D \gamma (R_0 - 1)}\), so, other things being equal, higher mobility \(D\) and higher \(R_0\) both increase the speed of spread of an outbreak. Non-linear saturation shapes the profile, but the speed is again determined by the diffusion-growth balance.

Figure 31: Fig 8C - Fisher-KPP traveling front and speed
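A crude finite-difference sketch (explicit Euler in time, centered differences in space, with illustrative values \(D = r = K = 1\)) reproduces the minimal front speed reasonably well:
```python
import numpy as np

def kpp_front_speed(D=1.0, r=1.0, K=1.0, L=200.0, nx=2000, T=60.0):
    """Explicit finite-difference integration of Fisher-KPP in 1D; estimate the front speed."""
    dx = L / nx
    dt = 0.2 * dx ** 2 / D                      # CFL-safe explicit time step
    x = np.linspace(0, L, nx)
    u = np.where(x < 5.0, K, 0.0)               # the stable state initially occupies the left edge
    def front_position(u):
        return x[np.argmax(u < K / 2)]          # leftmost point where u drops below K/2
    t, t_mark, x_mark = 0.0, None, None
    while t < T:
        lap = (np.roll(u, 1) - 2 * u + np.roll(u, -1)) / dx ** 2
        lap[0] = lap[-1] = 0.0                  # crude treatment of the ends (avoid periodic wrap-around)
        u = u + dt * (D * lap + r * u * (1 - u / K))
        t += dt
        if t_mark is None and t > 20.0:         # let the front settle before measuring
            t_mark, x_mark = t, front_position(u)
    speed = (front_position(u) - x_mark) / (t - t_mark)
    print("measured speed:", round(speed, 3), " theory 2*sqrt(D*r):", 2 * np.sqrt(D * r))

kpp_front_speed()
```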
Other patterns
The spatial structures we saw above were stripes/spots and wavefronts, but these are not the only ones. Another common route to spatial order is phase separation and coarsening. Consider how oil and water separate - if we start well-mixed and let go, droplets form, the interfaces sharpen, and over long horizons small droplets disappear while big ones grow. We will look at two models of this. As you’ll see, they are related to the Ising/Schelling models we saw earlier - domains form because “like prefers like” and interfaces carry a cost.
The common starting point for both is an energy function. Let \(u(x, t)\) be a smooth field on a domain \(\Omega \subset \mathbb{R}^d\), where \(u = +1\) means phase A and \(u = -1\) means phase B. The energy is:
\[E[u] = \int_\Omega \left( \frac{\varepsilon^2}{2} |\nabla u|^2 + W(u) \right) dx\]
Here, \(W(u)\) is a double-well potential \((u^2 - 1)^2 / 4\) which prefers \(u = \pm 1\), and the gradient term penalizes steep spatial changes and helps define the interface width between the two phases.
Note the similarity with the Ising model: with spins \(s_i = \pm 1\), the gradient term \(|\nabla u|^2\) corresponds (up to a constant) to the neighbor-spin interaction, and the double-well potential plays the role of the on-site term that pushes the spins towards the two values \(\pm 1\).
Consider the variational derivative of this energy functional - it is the function \(\delta E / \delta u\) characterized by \(d/d\theta E[u + \theta \eta] \big|_{\theta = 0} = \int_\Omega \delta E / \delta u (x) \eta(x) dx\) for all test functions \(\eta\).
Expanding and integrating the first term by parts, we get \(\frac{\delta E}{\delta u} = -\varepsilon^2 \nabla^2 u + W’(u)\). We call this the chemical potential and denote it by \(\mu\).
There are multiple ways to minimize the energy. The first is the naive one: \(\partial_t u = -\frac{\delta E}{\delta u} = \varepsilon^2 \nabla^2 u - W'(u)\). This is called the Allen-Cahn equation, and corresponds to a gradient flow for \(E\): \(\dot{E} = -\int |\mu|^2 dx \le 0\), so the system always moves to lower energy and only stops when \(\mu = 0\), i.e., at a steady state. The “mass” \(\int_\Omega u \, dx\) is not conserved here, because the \(-W'(u)\) term creates or destroys the \(\pm 1\) phases locally - the time derivative of the mass is the volume integral of \(-W'(u)\). Solving the steady state equation (setting \(\mu = 0\), multiplying by \(\partial_x u\) and integrating) gives a tanh profile that joins \(u = -1\) to \(u = +1\) over a length scale \(\sim \varepsilon\). An interesting exercise is to see how the interface moves in higher dimensions. The intuition is that most of the energy sits in the interfaces, and the Laplacian term tries to smooth them out. This should remind you of surface tension, and it can be shown that this equation makes the interface move with normal speed proportional to its mean curvature. If domains have size \(L(t)\), the curvature scales like \(\frac{1}{L}\), so \(\dot{L} \sim 1/L\), which gives \(L(t) \sim t^{1/2}\).
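A quick 1D sketch (explicit Euler, illustrative \(\varepsilon = 0.05\)) relaxes a sharp step under these dynamics and recovers an interface width of order \(\varepsilon\):
```python
import numpy as np

def allen_cahn_1d(eps=0.05, L=4.0, nx=400, T=5.0):
    """Relax a sharp step under Allen-Cahn dynamics; the interface approaches a tanh profile."""
    dx = L / nx
    dt = 0.2 * dx ** 2 / eps ** 2               # explicit step, stable for the stiff gradient term
    x = np.linspace(-L / 2, L / 2, nx)
    u = np.sign(x)                              # initial condition: sharp interface at x = 0
    for _ in range(int(T / dt)):
        lap = (np.roll(u, 1) - 2 * u + np.roll(u, -1)) / dx ** 2
        lap[0] = lap[-1] = 0.0                  # ends stay pinned at -1 and +1
        u = u + dt * (eps ** 2 * lap - (u ** 3 - u))   # W'(u) = u^3 - u for W = (u^2 - 1)^2 / 4
    width = 1.0 / np.abs(np.gradient(u, dx)).max()     # tanh(x/a) has maximum slope 1/a
    print("measured interface width:", round(width, 3), " expected sqrt(2)*eps =", round(np.sqrt(2) * eps, 3))

allen_cahn_1d()
```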
Note that this first method didn’t conserve mass. So we keep the same energy, but choose dynamics that preserve \(\int u dx\). To do this, note that \(\frac{d}{dt} \int_\Omega u dx = \int_\Omega \partial_t u dx\). One way to make this \(0\) is to use the divergence theorem, and indeed by picking \(\partial_t u = \nabla^2 \mu\) (i.e., diffusion of chemical potential), we get \(\int_\Omega \partial_t u dx = \int_{\partial \Omega} \nabla \mu \cdot n dS = 0\), where the last term is zero due to no flux conditions. This is called the Cahn-Hilliard equation. Looking at \(\dot{E}\) again, it becomes \(-\int |\nabla \mu|^2 dx \le 0\) using integration by parts. So this is also a gradient flow of \(E\), but one which conserves mass.
The dynamics this time are slightly different. Starting from a noisy \(u\) with zero average, the system first separates into two phases (this is called spinodal decomposition) while thin interfaces form. Then coarsening starts - small droplets disappear and big ones grow - driven by the energy stored in interfaces (“surface tension”). For a droplet of radius \(R\), the curvature \(1/R\) induces a chemical potential difference \(\sim \sigma / R\), which drives diffusive flux from high-curvature droplets to low-curvature ones (this is called Ostwald ripening, observed for example when water recrystallizes into larger ice grains in old ice cream). To quantify how the domain size grows, note that the driving force is the chemical potential difference \(\sim 1/R\), the gradient across a length \(L \sim R\) is \(\sim 1/R^2\), the flux is proportional to this, and the interface speed is proportional to the flux per unit jump in \(u\), so \(\dot{R} \sim J \sim 1/R^2\), which gives \(t^{1/3}\) scaling, slower than the previous case.
Note that these two dynamics are different - the former corresponds to something like Glauber dynamics, which does not conserve total spin, while the latter corresponds to something like Kawasaki dynamics (swapping two spins instead of flipping one), which does conserve it, or to the Schelling model.
A toolbox
Note that most of the tools we used above are relatively elementary. The theory of dynamical systems has tools that give us both different perspectives and the ability to analyze other systems. We did not use much of this theory, but it is worth listing these ideas alongside the ones we did use, even if we don’t cover them in detail. It is a helpful exercise to map the items below to the examples we saw, and to read more about the things not covered in this post.
Main objects of study
A dynamical system evolves a state \(x(t)\) in time by either a differential equation (a flow) or an update rule (a map).
- Continuous time (flow): \(\dot x = F(x, \theta)\), \(x \in \mathbb{R}^n\). The flow \(\Phi_t(x_0)\) gives the state at time \(t\) from initial condition \(x_0\).
- Discrete time (map): \(x_{k+1} = f(x_k, \theta)\), \(k \in \mathbb{N}\).
The following are some things we use:
- An equilibrium/fixed point is a state at which the system stops evolving. For a continuous system, \(F(x^*) = 0\). For a discrete one, \(f(x^*) = x^*\).
- A set \(S\) with \(\Phi_t(S) \subseteq S\) (or \(f(S) \subseteq S\)) is called an invariant set. Invariant regions constrain motion (e.g., non-negative orthant for populations, the simplex for frequencies).
- The \(\omega\)-limit set is the set of accumulation points of a trajectory as \(t \to \infty\). Attractors are invariant sets that attract a neighborhood of initial conditions (their basin of attraction). Attractors can be fixed points, limit cycles (periodic orbits), or more complex sets.
- A limit cycle is a closed orbit \(\gamma\) that is isolated and attracts or repels nearby trajectories. Its stability is determined by the return map (Poincare map) transverse to \(\gamma\): if all multipliers (except the trivial one) are inside the unit circle, the cycle is asymptotically stable.
- Lyapunov stability means trajectories starting close stay close. Asymptotic stability adds attraction back to the equilibrium/limit cycle. Exponential stability adds a uniform convergence rate.
Equilibria are everywhere (logistic, SEIR, predator-prey); invariant orthants/simplices show up in population models; limit cycles show up as sustained oscillations (e.g., predator-prey with saturation, but not the neutral Lotka-Volterra model); attractors describe the Ising ordered phases and Vicsek alignment.
Linearization and local analysis
Near an equilibrium \(x^*\), we generally try to replace the system by its best local linear approximation. Write \(x = x^* + \eta\). Then \(F(x) = F(x^*) + J \eta + O(\|\eta\|^2)\) where \(J = \text{D}F(x^*)\), so \(\dot\eta = J \eta\). In discrete time, \(\eta_{k+1} = A \eta_k\) with \(A = \text{D}f(x^*)\).
What do we do with this linear system? In continuous time, \(\Re\lambda(J) < 0\) for all eigenvalues gives local exponential asymptotic stability, and any \(\Re\lambda>0\) gives instability. In discrete time, all eigenvalues strictly inside the unit circle give local asymptotic stability, and any with \(|\lambda|>1\) gives instability.
The important caveat here is non-hyperbolic points. If some \(\Re\lambda = 0\) (flow) or \(|\lambda| = 1\) (map), linearization alone is inconclusive, and you need center manifolds and normal forms (the way we tackled this above was by just looking at higher order terms).
We also saw how as a consequence of Hartman-Grobman, “Hooke’s law near a well” explains local oscillations almost everywhere.
When do we use this? It should generally be the first thing to look at around any fixed point.
2D phase-plane portraits
In systems of the form \(\dot x = f(x,y)\) and \(\dot y=g(x, y)\), it is possible to visualize things using diagrams that can tell us a lot.
Trace-determinant test: at an equilibrium, \(\det J < 0\) gives a saddle point; \(\det J > 0\) with \(\text{tr} J < 0\) gives a stable equilibrium (node vs focus is determined by the sign of \((\text{tr} J)^2 - 4 \det J\)); \(\det J > 0\) with \(\text{tr} J > 0\) gives an unstable one.
Nullclines are the curves where one of the derivatives vanishes, i.e., the \(f = 0\) and \(g = 0\) curves. They constrain the flow: their intersections are equilibria, and the signs of \(f\) and \(g\) in each region between them give the direction arrows.
To check for periodicity, we can sometimes use the Poincare-Bendixson theorem: in 2D, if a trajectory is trapped in a compact set and its limit set contains no equilibria, the limit set must be a limit cycle. Conversely, to rule out cycles, the Bendixson-Dulac theorem says: if you can find a smooth \(D(x, y)\) such that \(\nabla \cdot(D F)\) has a fixed sign (and is not identically zero) in a simply connected region, then there are no closed orbits there.
Lyapunov functions and LaSalle’s invariance
A scalar “energy” \(V\) that never increases along trajectories can simplify analysis quite a lot, especially when solving for the trajectory explicitly is not possible.
Locally speaking, if \(V(x^*) = 0\), \(V > 0\) nearby, and \(\dot V \le -c\|x - x^*\|^2\), you get exponential stability. Globally speaking, if \(\dot V \le 0\), LaSalle’s invariance principle says that the trajectory approaches the largest invariant subset of \(\dot V = 0\).
How do we come up with \(V\)? We can generally look at things like physical energy (as we did for oscillators), consensus variance \(V = \sum\|v_i - \bar v\|^2\) (as we did for Cucker-Smale), potentials (as we did for replicator in potential games).
Bifurcations
There are a bunch of different types of bifurcations (with some canonical forms for them). Some of the local ones (i.e., parameter changes change the stability of an equilibrium) are:
- Saddle-node: \(\dot x = \mu - x^2\). A pair of equilibria appears/disappears.
- Transcritical: \(\dot x = \mu x - x^2\). Two equilibria swap stability (e.g., prey-only vs coexistence with logistic prey).
- Pitchfork: \(\dot x = \mu x - x^3\) (supercritical). Symmetry breaks and two stable branches appear (e.g., mean-field Ising).
- Hopf: a complex pair crosses the imaginary axis. In polar form, we have \(\dot r = \alpha \mu r - \beta r^3 + \dots\), \(\dot\theta = \omega + \dots\). With \(\alpha, \beta > 0\) a small stable cycle is created.
There are also global bifurcations, where the structure of trajectories changes as invariant sets collide with equilibria or with each other.
Discrete-time systems and chaos
For a fixed point \(x^*\) of a discrete-time system, \(|f’(x^*)| < 1\) means that it is stable, \(|f’(x^*)| > 1\) means it is unstable. Cobweb plots help qualitatively sketch the stability of fixed points.
The Lyapunov exponent \(\lambda = \lim_{n \to \infty} \frac{1}{n} \sum_{k = 0}^{n - 1} \log |f’(x_k)|\) helps study chaos - \(\lambda > 0\) means that infinitesimally close trajectories will diverge arbitrarily when a long enough period of time has passed.
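For instance, this formula can be evaluated directly for the logistic map \(x \mapsto r x (1 - x)\) - a quick sketch, with the specific \(r\) values chosen purely for illustration:
```python
import numpy as np

def lyapunov_logistic(r, n=200_000, x0=0.4, burn=1_000):
    """Estimate the Lyapunov exponent of x -> r x (1 - x) by averaging log|f'(x_k)| along an orbit."""
    x = x0
    for _ in range(burn):                       # discard the transient
        x = r * x * (1 - x)
    total = 0.0
    for _ in range(n):
        total += np.log(abs(r * (1 - 2 * x)))   # f'(x) = r (1 - 2x)
        x = r * x * (1 - x)
    return total / n

for r in (2.8, 3.5, 3.9):
    print(f"r = {r}: lambda ~ {lyapunov_logistic(r):.3f}")
# Negative in the stable fixed-point and period-4 regimes, positive (chaos) at r = 3.9.
```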
Poincare maps reduce a periodic ODE to a discrete map, and can also help reason about stability of equilibria for the flow.
Delays: compartments vs true delay differential equations
Distributed lags (e.g., using compartments) vs sharp lags (DDEs) have different dynamics.
Compartment chains (e.g., \(S \to E \to I\)) effectively give gamma-distributed delays and tend to produce damped oscillations back to equilibrium. The lag is implicit and smeared out.
True DDEs \(\dot x(t) = F(x(t), x(t - \tau))\) inject \(e^{-\lambda\tau}\) into the characteristic equation, creating infinitely many eigenvalues. Even scalar DDEs can undergo Hopf bifurcations and oscillate.
Invariants, positivity, comparison, monotone structure
Conserved quantities/first integrals \(I(x)\) with \(\dot{I} = 0\) reduce the effective dimension of the problem (as we saw for oscillator energy and Lotka-Volterra’s implicit integral), effectively constraining where you can be in the state space. Things like \(x_i \ge 0\) for populations and \(S + E + I + R = N\), \(\sum p_i = 1\) for replicator dynamics put us in the right part of the whole space as well.
Differential inequalities can often be integrated: comparison principles and Gronwall-type bounds turn them into useful bounds on solutions.
Monotone/cooperative systems (\(\partial f_i / \partial x_j \ge 0\) for \(i \neq j\)) preserve order and come with strong convergence theorems.
Networks and coupling
Graphs are quite general and important - especially because any manifold can be approximated as a graph with meaningful guarantees. So it makes sense to consider graph formulations of problems that, e.g., have a natural spatial nature.
When we have consensus mechanisms on a graph, we can sometimes model them using a Laplacian - with weights \(w_{ij} \ge 0\), \(L = D - W\) and \(\dot x = -Lx\). The algebraic connectivity \(\lambda_2(L)\) quantifies how fast disagreements die out.
Spectra of graphs (either adjacency or Laplacian) can provide important information about how systems on graphs behave. For example, a linearized SIS model grows if \(\beta / \gamma > 1 / \rho(A)\), where \(\rho(A)\) is the adjacency spectral radius. Similarly, Kuramoto-type phase models synchronize when coupling beats frequency spread, with the onset shaped by the network spectrum.
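As a concrete instance of the consensus statement above, here is a sketch on a path graph (an arbitrary choice), comparing the observed decay rate of disagreement with \(\lambda_2(L)\):
```python
import numpy as np

# Consensus dynamics dx/dt = -L x on a path graph with n nodes.
n = 20
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0
Lap = np.diag(W.sum(axis=1)) - W
lam2 = np.sort(np.linalg.eigvalsh(Lap))[1]        # algebraic connectivity

rng = np.random.default_rng(6)
x = rng.normal(size=n)
dt, T = 0.01, 200.0
disagreement0 = np.linalg.norm(x - x.mean())
for _ in range(int(T / dt)):
    x = x - dt * (Lap @ x)                        # explicit Euler step of dx/dt = -L x
disagreement = np.linalg.norm(x - x.mean())
print("lambda_2:", round(lam2, 4))
print("observed decay rate:", round(-np.log(disagreement / disagreement0) / T, 4))
```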
Non-dimensionalization and scaling
Picking natural units for state and time and rescaling often simplifies things a lot. Parameters collapse into dimensionless values that determine the dynamics, e.g., \(R_0\) in epidemics, the diffusion ratio \(D_v / D_u\) for Turing patterns, Fisher-KPP’s \(c_* = 2 \sqrt{Dr}\), density/noise ratios in Vicsek. We get fewer knobs, clearer phase diagrams, as well as easier mapping to standard problems.
Stochastic dynamics and metastability
As we’ve seen, noise can greatly affect the dynamics of a system.
Going from an SDE to the Fokker-Planck equation is quite useful: \(dX_t = b(X_t) dt + \sigma(X_t) dW_t\) gives
\[ \partial_t p = -\nabla \cdot (bp) + \frac{1}{2} \sum_{i,j} \partial_i \partial_j \left[ (\sigma \sigma^T)_{ij} \, p \right] \]
which for constant \(\sigma\) reduces to \(\partial_t p = -\nabla \cdot (bp) + \frac{1}{2} \nabla \cdot (\sigma \sigma^T \nabla p)\).
For \(b = -\nabla U\) with constant \(\sigma\), the stationary density is \(\propto e^{-2U / \sigma^2}\) (by detailed balance).
In a double-well with small noise \(\varepsilon\), mean escape time scales like \(\exp\{\Delta U / \varepsilon^2\}\) (the Kramers problem). That is the story behind metastability and hysteresis loops. For large deviations, the Freidlin-Wentzell theorem makes these exponentials precise and gives most-likely escape paths.
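As a sketch of the constant-\(\sigma\) case, we can simulate overdamped Langevin dynamics in the double well \(U(x) = (x^2 - 1)^2/4\) (an illustrative potential, with \(\sigma\) chosen large enough that the wells mix on the simulated horizon) and compare the empirical histogram with \(e^{-2U/\sigma^2}\):
```python
import numpy as np

# Overdamped Langevin dynamics dX = -U'(X) dt + sigma dW in the double well U(x) = (x^2 - 1)^2 / 4.
rng = np.random.default_rng(7)
sigma, dt, n = 0.8, 1e-3, 1_000_000
noise = sigma * np.sqrt(dt) * rng.normal(size=n)
x, xs = 0.0, np.empty(n)
for i in range(n):
    x += -(x ** 3 - x) * dt + noise[i]          # U'(x) = x^3 - x
    xs[i] = x

# Compare the empirical histogram with p(x) proportional to exp(-2 U(x) / sigma^2).
bins = np.linspace(-2, 2, 41)
hist, _ = np.histogram(xs, bins=bins, density=True)
centers = 0.5 * (bins[1:] + bins[:-1])
p = np.exp(-2 * ((centers ** 2 - 1) ** 2 / 4) / sigma ** 2)
p /= p.sum() * (centers[1] - centers[0])        # normalize on the binning grid
print("max deviation between empirical and predicted density:", round(np.abs(hist - p).max(), 3))
```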
Traveling fronts
Stability of fronts in the comoving frame (e.g., in Fisher-KPP) is analyzed spectrally, and the Evans function is the standard tool for doing so.
Contraction analysis
If the Jacobian is uniformly contracting in some matrix measure \(\mu\) (i.e., \(\mu(\text{D}F(x)) \le -c < 0\) for all \(x\)), then any two trajectories converge to each other exponentially at rate \(c\). That gives a unique global attractor (possibly modulo a symmetry - e.g., a limit cycle under phase shift - since the transverse distance goes to \(0\)) without any sort of local casework.
When coupling is strong and signs/coercivity are clear (consensus/flocking in velocities), this can be pretty helpful.
Singular perturbations and slow-fast decompositions
Tikhonov’s theorem talks about separated timescales.
\[\dot x = f(x, y, \varepsilon)\] \[\varepsilon \dot y = g(x, y, \varepsilon)\]
with \(0 < \varepsilon \ll 1\). As \(\varepsilon \to 0\), trajectories collapse rapidly to a slow manifold \(g(x, y, 0) = 0\) and then evolve along it, provided normal hyperbolicity. This both reduces dimension and isolates the mechanism you care about.
Koopman operator and data-driven surrogates
Nonlinear dynamics become linear in the (infinite-dimensional) space of observables - more specifically, the Koopman operator composes observables with the flow and is a linear operator. Practically, techniques like dynamic mode decomposition (DMD) and Extended DMD (EDMD) approximate it and extract dominant growth/oscillation modes from data. If you suspect sparse structure, SINDy (sparse identification of non-linear dynamics) recovers parsimonious equations from time series. However, as with empirical methods, this is only an approximation and needs to be validated before use.
Ergodic theory and mixing
With an invariant measure (equivalently, with a measure-preserving transform), ergodicity says that time averages along typical trajectories are equal to ensemble-based averages. In some sense, mixing leads to a decay of correlations. However, finite large systems can be metastable - they linger in one phase for astronomically long times (as we saw with Ising), so practical time averages differ from ensemble ones on realistic horizons.
Non-smooth and hybrid dynamics (thresholds and edges)
Filippov systems are systems where vector fields jump discontinuously across a surface, and as a result weird things happen at boundaries (e.g., sliding motion along the boundary). In discrete systems, border-collision bifurcations in piecewise-smooth maps also have cases that are not covered by smooth theory. This is important to keep in mind for systems with hard thresholds.
Control and feedback
Intervening into a system can let you change the phase portrait, and control theory studies these sorts of systems and interventions. E.g., feedback linearization and Lyapunov-based designs give constructive stabilizers.
Numerical methods that respect structure
Discretization of a continuous system can introduce surprising artifacts (spurious damping, instability, loss of invariants), so the solver should be aligned with the system and its constraints - e.g., symplectic integrators for Hamiltonian structure, positivity-preserving schemes for populations, and conservative schemes for conserved quantities.
References and Further Reading
The following references are grouped by topic for convenience.
Note that this list was compiled with the help of an LLM with search results, so it may contain errors (a random sample of 20% of the entries appeared to check out; please let me know if you find any issues).
General dynamical systems & stability
- Strogatz, S. H. (2015). Nonlinear Dynamics and Chaos (2nd ed.). Westview.
- Hirsch, M. W., Smale, S., & Devaney, R. L. (2012). Differential Equations, Dynamical Systems, and an Introduction to Chaos (3rd ed.). Academic Press.
- Khalil, H. K. (2002). Nonlinear Systems (3rd ed.). Prentice Hall.
- LaSalle, J. P. (1960). Some extensions of Liapunov’s second method. IRE Transactions on Circuit Theory, 7(4), 520–527.
- Hartman, P. (1960). A lemma in the theory of structural stability. Proc. AMS, 11(4), 610–620.
- Grobman, D. (1959). Homeomorphisms of systems of differential equations. Doklady, 128, 880–881.
Growth, oscillators, invariants
- Murray, J. D. (2002, 2003). Mathematical Biology I & II (3rd eds.). Springer.
- Edelstein-Keshet, L. (2005). Mathematical Models in Biology. SIAM (Classics).
- Hethcote, H. W. (2000). The mathematics of infectious diseases. SIAM Review, 42(4), 599–653.
Predator–prey & ecological dynamics
- Lotka, A. J. (1925). Elements of Physical Biology. Williams & Wilkins.
- Volterra, V. (1926). Fluctuations in the abundance of a species considered mathematically. Nature, 118, 558–560.
- Rosenzweig, M. L., & MacArthur, R. H. (1963). Graphical representation and stability conditions of predator–prey interactions. American Naturalist, 97(895), 209–223.
- Kot, M. (2001). Elements of Mathematical Ecology. Cambridge.
Epidemics & \(R_0\)
- Kermack, W. O., & McKendrick, A. G. (1927). A contribution to the mathematical theory of epidemics. Proc. R. Soc. A, 115, 700–721.
- Anderson, R. M., & May, R. M. (1991). Infectious Diseases of Humans: Dynamics and Control. Oxford.
- Diekmann, O., Heesterbeek, J. A. P., & Metz, J. A. J. (1990). On the definition and computation of \(R_0\). J. Math. Biol., 28, 365–382.
- Keeling, M. J., & Rohani, P. (2008). Modeling Infectious Diseases in Humans and Animals. Princeton.
- Heffernan, J. M., Smith, R. J., & Wahl, L. M. (2005). Perspectives on \(R_0\). J. R. Soc. Interface, 2(4), 281–293.
Thresholds, cascades, segregation
- Granovetter, M. (1978). Threshold models of collective behavior. American Journal of Sociology, 83(6), 1420–1443.
- Schelling, T. C. (1971). Dynamic models of segregation. J. Math. Sociology, 1(2), 143–186.
- Schelling, T. C. (1978). Micromotives and Macrobehavior. Norton.
Ising, phase transitions, criticality
- Ising, E. (1925). Beitrag zur Theorie des Ferromagnetismus. Z. Physik, 31, 253–258.
- Onsager, L. (1944). Crystal statistics. I. A two-dimensional model with an order-disorder transition. Phys. Rev., 65, 117–149.
- Metropolis, N., et al. (1953). Equation of state calculations by fast computing machines. J. Chem. Phys., 21, 1087–1092.
- Glauber, R. J. (1963). Time-dependent statistics of the Ising model. J. Math. Phys., 4, 294–307.
- Stanley, H. E. (1971). Introduction to Phase Transitions and Critical Phenomena. Oxford.
- Yeomans, J. M. (1992). Statistical Mechanics of Phase Transitions. Oxford.
- Sethna, J. P. (2006). Statistical Mechanics: Entropy, Order Parameters, and Complexity. Oxford.
Reinforcement, herding, heavy tails
- Pólya, G. (1930). Sur quelques points de la théorie des probabilités. Ann. Inst. H. Poincaré, 1, 117–161.
- Pemantle, R. (2007). A survey of random processes with reinforcement. Probab. Surveys, 4, 1–79.
- Barabási, A.-L., & Albert, R. (1999). Emergence of scaling in random networks. Science, 286, 509–512.
- Dorogovtsev, S. N., & Mendes, J. F. F. (2003). Evolution of Networks: From Biological Nets to the Internet and WWW. Oxford.
- Newman, M. E. J. (2010). Networks: An Introduction. Oxford.
- Kirman, A. (1993). Ants, rationality, and recruitment. QJE, 108(1), 137–156.
- Mitzenmacher, M. (2004). A brief history of generative models for power law and lognormal distributions. Internet Mathematics, 1(2), 226–251.
- Kesten, H. (1973). Random difference equations… Acta Mathematica, 131, 207–248.
- Goldie, C. M. (1991). Implicit renewal theory and tails of solutions of random equations. Ann. Appl. Prob., 1(1), 126–166.
- Bak, P., Tang, C., & Wiesenfeld, K. (1987). Self-organized criticality. Phys. Rev. Lett., 59, 381–384.
- Dhar, D. (2006). The abelian sandpile. Physica A, 369, 29–70.
Swarms, alignment, active matter
- Vicsek, T., Czirók, A., Ben-Jacob, E., Cohen, I., & Shochet, O. (1995). Novel phase transition in self-driven particles. Phys. Rev. Lett., 75, 1226–1229.
- Chaté, H., Ginelli, F., Grégoire, G., Peruani, F., & Raynaud, F. (2008). Collective motion of self-propelled particles. Phys. Rev. E, 77, 046113.
- Toner, J., & Tu, Y. (1995). Long-range order in a two-dimensional dynamical XY model: How birds fly together. Phys. Rev. Lett., 75, 4326–4329.
- Toner, J., & Tu, Y. (1998). Flocks, herds, and schools. Phys. Rev. E, 58, 4828–4858.
- Marchetti, M. C., et al. (2013). Hydrodynamics of soft active matter. Rev. Mod. Phys., 85, 1143–1189.
- Cucker, F., & Smale, S. (2007). Emergent behavior in flocks. IEEE TAC, 52(5), 852–862.
- Ha, S.-Y., & Liu, J.-G. (2009). A simple proof of Cucker–Smale flocking. Comm. Math. Sci., 7(2), 297–325.
- Carrillo, J. A., Fornasier, M., Toscani, G., & Vecil, F. (2010). Particle, kinetic, and hydrodynamic models of swarming. In Math. Modeling of Collective Behavior in Socio-Economic and Life Sciences (Birkhäuser).
Replicator dynamics & evolutionary games
- Taylor, P. D., & Jonker, L. (1978). Evolutionary stable strategies and game dynamics. Math. Biosciences, 40, 145–156.
- Hofbauer, J., & Sigmund, K. (1998). Evolutionary Games and Population Dynamics. Cambridge.
- Weibull, J. W. (1995). Evolutionary Game Theory. MIT Press.
- Sandholm, W. H. (2010). Population Games and Evolutionary Dynamics. MIT Press.
- Maynard Smith, J. (1982). Evolution and the Theory of Games. Cambridge.
Reaction–diffusion, Turing, fronts
- Turing, A. M. (1952). The chemical basis of morphogenesis. Phil. Trans. R. Soc. B, 237, 37–72.
- Schnakenberg, J. (1979). Simple chemical reaction systems with limit cycle behavior. J. Theor. Biol., 81(3), 389–400.
- Fisher, R. A. (1937). The wave of advance of advantageous genes. Ann. Eugenics, 7, 355–369.
- Kolmogorov, A., Petrovsky, I., & Piskunov, N. (1937). Study of diffusion equation… Bull. Moscow Univ., 1, 1–25.
- Allen, S. M., & Cahn, J. W. (1979). A microscopic theory for antiphase boundary motion. Acta Metall., 27, 1085–1095.
- Cahn, J. W., & Hilliard, J. E. (1958). Free energy of a non-uniform system. J. Chem. Phys., 28, 258–267.
- Bray, A. J. (1994). Theory of phase-ordering kinetics. Advances in Physics, 43(3), 357–459.
- Langer, J. S. (1971). Theory of spinodal decomposition. Ann. Phys., 65, 53–86.
- Hohenberg, P. C., & Halperin, B. I. (1977). Theory of dynamic critical phenomena. Rev. Mod. Phys., 49, 435–479.
Networks, spectra, SIS thresholds, Kuramoto
- Chung, F. (1997). Spectral Graph Theory. AMS.
- Brouwer, A. E., & Haemers, W. H. (2011). Spectra of Graphs. Springer.
- Pastor-Satorras, R., & Vespignani, A. (2001). Epidemic spreading in scale-free networks. Phys. Rev. Lett., 86, 3200–3203.
- Van Mieghem, P., et al. (2009). Virus spread in networks. IEEE/ACM Trans. Networking, 17(1), 1–14.
- Kuramoto, Y. (1975). Self-entrainment of a population of coupled non-linear oscillators. In Int. Symp. Math. Problems in Theoretical Physics, Lecture Notes in Physics 39, Springer, pp. 420–422.
Stochastic dynamics, large deviations, Koopman & data-driven
- Kramers, H. A. (1940). Brownian motion in a field of force. Physica, 7, 284–304.
- Freidlin, M. I., & Wentzell, A. D. (1984). Random Perturbations of Dynamical Systems. Springer.
- Koopman, B. O. (1931). Hamiltonian systems and transformation in Hilbert space. PNAS, 17, 315–318.
- Schmid, P. J. (2010). Dynamic mode decomposition of numerical and experimental data. J. Fluid Mech., 656, 5–28.
- Williams, M. O., Kevrekidis, I. G., & Rowley, C. W. (2015). A data-driven approximation of the Koopman operator: Extending dynamic mode decomposition. J. Nonlinear Sci., 25, 1307–1346.
- Brunton, S. L., Proctor, J. L., & Kutz, J. N. (2016). Discovering governing equations. PNAS, 113(15), 3932–3937.
Ricardian Trade
- Ricardo, D. (1817). On the Principles of Political Economy and Taxation.
- Dornbusch, R., Fischer, S., & Samuelson, P. A. (1977). Comparative advantage with a continuum of goods. AER, 67(5), 823–839.
- Krugman, P., Obstfeld, M., & Melitz, M. (2015). International Economics (10th ed.). Pearson.
Acknowledgements
I would like to thank genetyx8 for feedback on the post (and others who volunteered to go through a similarly painful experience). GPT-5 Thinking compiled the list of references and partially wrote the visualization code; all remaining errors are mine.
Some generous reviewers suggested releasing this post as a series, which is great advice for readability. For now, I've kept it as a single piece to preserve the connecting threads and the shared toolbox, which would have been harder to maintain across separate posts - and partly because if I didn't publish it now, it might never see the light of day, like many other drafts of mine.
Cite this post
@online{simple-rules-complex-dynamics-part-1-2025,
author = {nor},
title = {Simple Rules, Complex Dynamics -- Part I: Foundations \& Intuition},
year = {2025},
month = {09},
day = {17},
url = {https://nor-blog.pages.dev/posts/2025-09-17-simple-rules-complex-dynamics-part-1},
}