Introduction and how to read this post

Simple local interactions can lead to a large variety of global behavior. Many examples come from physics, biology, economics, and ML, but the machinery used to look at them remains essentially the same. Looking at things through a certain lens - asking questions such as “What’s the feedback?”, “How are things coupled?”, “Is there noise?”, “What are the timescales?” - can help intuit the macroscopic outcomes of these systems: growth, cycles, alignment and discontinuities, swarms, heavy tails, and different spatio-temporal patterns.

Part I is more about introducing some canonical examples that show some of the important ideas in dynamical systems - the point of this post is to provide the reader with a new lens, some examples, intuition and some tools to be able to look at a dynamical system and explain its behavior. I started writing this post because someone gave me a small seed list of dynamical systems to talk about, and I thought it would be nice to connect them to more pervasive themes in dynamical systems theory. Part II will be about applying these ideas to ML.

We’ll use some math here - linear algebra, ordinary differential equations, basic multivariate calculus, and basic probability/stochastic processes - and build up the remaining things along the way. We will look at lots of examples in order to build intuition, instead of hammering it in with theory.

This post is best read with pen and paper (or equivalent). It is important that you write down the details yourself whether they’re explicitly laid out in the post or not - doing this properly for one section will generally provide more value than reading another section. At times, it might also be worth leaving a section/sub-section for a second pass over the post if the best understanding you have so far is nebulous (this is a huge subject and some non-linearities in presentation are bound to have crept in). Lastly, this is in no way a substitute for a proper book on the subject, so it might be a better idea to go through the references (general ones as well as the ones arranged section-wise) if you feel like reading in more detail.

For the sake of being systematic, you should try to figure out answers to the following questions in most (if not all) of the examples.

  • State variables: what is tracked?
  • Local rule: how does a thing update locally, given its neighbors or the population?
  • Coupling: how are things coupled together?
  • Feedback: what kinds of feedback does the system give? E.g., linear, saturating, thresholded, multiplicative.
  • Timescales/delays: are updates instantaneous or lagged?
  • Macroscopic quantities and behavior: statistics of a system, fixed points, cycles, bifurcations, phase transitions, criticality.
  • Relations: how is the current example related to the previous ones in the above aspects?

Don’t worry if things don’t make sense yet. They will make sense as we consider examples.

For maximizing the value you get out of this post, at the end of each section, you should have the following for the ideas in that section:

  • The above checklist
  • A mental picture/model of what’s happening
  • A minimal equation that describes it
  • Some recipe (concrete or vibes) on how to deal with similar problems

We’ll gradually build up things - we’ll start with systems without any interaction structures or constraints, then add cross-coupling and delays, then thresholds and noise, then reinforcement and multiplicative systems. Then things like continuous alignment, selection dynamics, and spatial diffusion. For each of these, we’ll see what systems arising naturally have dynamics that can be studied using them.

For a skim, read the “Preview” at the start of each subsection, look at the figures, and read the “Summary” and “Connections” discussions at the end.

The section on common threads tries to distill things into intuition, and the toolbox section provides a set of tools that are quite useful while thinking about systems.

Some simple ideas and baselines - growth, local linearity and equilibrium

Let’s start from the ground up and build some ideas that will help us later on.

The question is: what do systems with no interactions or with simple feedback do?

We begin with three lenses we will use everywhere: (1) the shape of feedback (linear vs saturating), (2) local linearity near equilibria (eigenvalues determine decay vs oscillation), and (3) invariants/monotone quantities that simplify dynamics. These basics will see use in everything that follows.

Exponential and logistic growth

Preview: Two baseline feedbacks: constant per-capita growth gives exponential growth; linearly decreasing per-capita growth gives logistic S-shaped growth with a stable capacity K. We’ll use this idea everywhere.

We first look at the simplest feedback mechanisms. The state of this system is going to be \(x(t) \ge 0\), a scalar. For our purposes, we’ll think of it as some population.

What are the local rules? How do we look at feedback?

For exponential growth, we model per-capita growth to be constant (say \(r\)). Then, \(\dot{x} = rx\), which gives \(x(t) = x_0 \exp(rt)\). Here, \(\dot{x}\) denotes the time-derivative of \(x\).

For logistic growth, we model per-capita growth as declining linearly with \(x\) due to some finite “capacity” \(K\). Then, \(\dot{x} = rx(1 - x/K)\). Solving this ODE gives us \(\frac{x(t)}{K - x(t)} = \frac{x_0}{K - x_0} \exp(rt)\), or equivalently, \(x(t) = \frac{K}{1 + \frac{K - x_0}{x_0} \exp(-rt)}\).

In the first case, the growth is clearly exponential - per-capita growth being constant means that the higher the value, the higher the rate of growth. In the second, the growth is S-shaped, saturating at \(x = K\), with the maximum growth rate occurring at \(x = \frac{K}{2}\). The reason it saturates is that \(K\) acts as an “attractor”: at every point, the capacity term pushes the growth rate back towards it (yet another form of feedback, this time arising from the interaction between the system and the environment). When nothing pushes back (or, in more informal terms, when the “attractor” is at \(\infty\)), we recover exponential growth, as expected. Note that when \(x\) starts from a point beyond \(K\), the system still relaxes back to \(K\) - so calling \(K\) an attractor is more appropriate than calling it a barrier. This corresponds to things like starting from a population overshoot and the consequent decline (but not “growth” in the usual sense).
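
As a quick sanity check, here is a minimal numerical sketch of the two feedback shapes (assuming Python with numpy; the parameter values and the simulate helper are my own illustrative choices):

```python
import numpy as np

def simulate(f, x0, dt=0.01, steps=1000):
    """Forward-Euler integration of dx/dt = f(x)."""
    xs = [x0]
    for _ in range(steps):
        xs.append(xs[-1] + dt * f(xs[-1]))
    return np.array(xs)

r, K = 1.0, 100.0
exponential = simulate(lambda x: r * x, x0=1.0)
logistic = simulate(lambda x: r * x * (1 - x / K), x0=1.0)

# Exponential keeps accelerating; logistic saturates near K,
# with the fastest growth as x passes K / 2.
print(exponential[-1])  # roughly e^10 ~ 2e4 (up to Euler error)
print(logistic[-1])     # ~ 100 = K
```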

Figure 1: Fig 1A - Exponential vs logistic growth (linear and log-y)

Figure 2: Fig 1B - Logistic growth r x (1 - x / K)

Summary:

  • Knobs: \(r\) (growth), \(K\) (capacity).
  • Behavior: no coupling; logistic has a stable equilibrium at \(x = K\).
  • Mechanism: linear vs saturating per-capita feedback.

Connections with other parts of the post:

  • The logistic saturation term reappears pointwise in space in Fisher-KPP (reaction term \(ru(1 - u/K)\)).
  • The two-type replicator with a constant payoff advantage reduces to a logistic-like ODE (same S-shaped approach).
  • Adding saturation to Lotka-Volterra prey (\(x(1 - x/K)\)) plays the same role: it turns neutral cycles into decay to equilibrium.
  • We’ll repeatedly ‘read off’ behavior from the feedback shape: linear leads to runaway behavior; saturating gives a stable level.

Oscillators and local linearity

Preview: Near any stable equilibrium, the leading behavior along oscillatory directions looks like a damped oscillator: restoring force + damping determine monotone vs oscillatory return.

The state of this system is going to be \(x(t)\), a scalar, which can be thought of as the displacement of a mass, and \(\dot{x}\), which can be thought of as the velocity of the mass. This is a point in the “phase space”. Consider the following:

\[ \ddot{x} + \gamma \dot{x} + \omega^2 x = 0 \]

The reason why we consider this is simple. Near any stable equilibrium of any reasonable system in one variable, the leading behavior after linearization looks like something of the above form. (More generally, an oscillatory mode near a stable equilibrium looks like this).

I recommend going through the proof below for understanding the general ideas (in particular, asymptotic and Lyapunov stability are something to keep in mind).

Informally speaking, the idea lends itself to physical intuition quite well - consider a mass rolling in a parabolic well which also has friction. The bottom of the well is clearly a stable equilibrium. If the damping were negative (i.e., force aligned with velocity), it would gain velocity and escape, making it an unstable equilibrium. Similarly, if the well were upturned, it would again be an unstable equilibrium.

We will prove something slightly more general - a statement about the modes of oscillation of an \(n\)-dimensional system.

Let the state of a system be described by a vector \(x(t) \in \mathbb{R}^n\). The evolution of the system over time is governed by a system of first-order ordinary differential equations \(\dot{x} = F(x)\) where \(F: \mathbb{R}^n \to \mathbb{R}^n\) is a continuously differentiable function. This formulation is completely general, as any \(k\)-th order ODE can be rewritten as a system of \(k\) first-order ODEs.

An equilibrium point, denoted \(x_0\), is a state where the system does not evolve, i.e., it is a constant solution. Mathematically, this is a point where the vector field vanishes, i.e., \(F(x_0) = 0\).

A system can have multiple equilibrium points. We are concerned with those that are stable. An equilibrium point \(x_0\) is defined as Lyapunov stable if, for any arbitrarily small neighborhood of tolerance \(\mathcal{V}\) around \(x_0\), there exists a neighborhood of initial conditions \(\mathcal{U}\) around \(x_0\) such that any trajectory \(x(t)\) starting in \(\mathcal{U}\) remains within \(\mathcal{V}\) for all future time \(t \ge 0\). More formally, for every \(\epsilon > 0\), there exists a \(\delta > 0\) such that if \(\|x(0) - x_0\| < \delta\), then \(\|x(t) - x_0\| < \epsilon\) for all \(t \ge 0\).

Note that this definition of stability doesn’t say anything about convergence to the equilibrium point (so it’s possible that the trajectory looks like a closed loop around the equilibrium point, but never converges to a single point). A stronger condition is asymptotic stability - it requires that the trajectory converges to the equilibrium point.

Now, to analyze the behavior of the system near the equilibrium point \(x_0\), we consider a small perturbation \(\eta(t) = x(t) - x_0\). The dynamics of this perturbation are given by \(\dot{\eta}(t) = \dot{x}(t) = F(x_0 + \eta(t))\). Since \(\eta(t)\) is assumed to be small, we can perform a Taylor series expansion of \(F\) around the point \(x_0\): \(F(x_0 + \eta) = F(x_0) + J(x_0)\eta + \mathcal{O}(\|\eta\|^2)\). Here, \(J(x_0)\) is the Jacobian matrix of \(F\) evaluated at \(x_0\), with entries \(J_{ij} = \frac{\partial F_i}{\partial x_j}\bigg|_{x=x_0}\).

By definition of equilibrium, \(F(x_0) = 0\). For a sufficiently small perturbation \(\eta\), the higher-order terms \(\mathcal{O}(\|\eta\|^2)\) are negligible compared to the linear term. The dynamics of the perturbation are therefore well-approximated by the linearized system \(\dot{\eta} = J \eta\).

About the validity of this approximation: the definition of the derivative itself is that it provides the best linear approximation to a function locally. To be rigorous, the Hartman-Grobman theorem (which is probably out of scope as of writing) formalizes the topological equivalence between the non-linear and linear systems under certain conditions, but the approximation itself is a direct consequence of the differentiability of \(F\).

We now try to figure out properties of the Jacobian matrix \(J\) under stability constraints. The behavior of the linear system \(\dot{\eta} = J\eta\) is entirely determined by the eigenvalues of \(J\). The general solution (assuming \(J\) is diagonalizable for simplicity, though the argument holds generally using the Jordan normal form) is a linear combination of terms of the form \(v_i e^{\lambda_i t}\), where \(\lambda_i\) are the eigenvalues and \(v_i\) are the corresponding eigenvectors: \(\eta(t) = \sum_{i=1}^{n} c_i v_i e^{\lambda_i t}\).

A reviewer notes here, and I agree, that linear ODEs with degenerate matrices (i.e. matrices that are non-diagonalizable) are interesting in their own right, as they have interesting transient behavior, and a lot of the interesting “bifurcations” in more complex systems are related to the Jacobian being degenerate (for now, think about a “bifurcation” as a qualitative change in the behavior of a system as a parameter changes).

Each eigenvalue \(\lambda_i\) is, in general, a complex number: \(\lambda_i = \Re(\lambda_i) + i \cdot \Im(\lambda_i)\). The magnitude of the term \(e^{\lambda_i t}\) is given by \(|e^{\lambda_i t}| = |e^{(\Re(\lambda_i) + i \cdot \Im(\lambda_i))t}| = |e^{\Re(\lambda_i)t}| \cdot |e^{i \cdot \Im(\lambda_i)t}| = e^{\Re(\lambda_i)t}\). The term \(|e^{i \cdot \Im(\lambda_i)t}|\) has a magnitude of 1 and represents oscillation. The growth or decay of the perturbation is governed entirely by the sign of the real part of the eigenvalue, \(e^{\Re(\lambda_i)t}\).

We claim that for the equilibrium point \(x_0\) to be stable, all eigenvalues \(\lambda_i\) of the Jacobian \(J\) must have non-positive real parts, i.e., \(\Re(\lambda_i) \le 0\) for all \(i=1, \dots, n\).

The proof is by contradiction. Assume there exists at least one eigenvalue, say \(\lambda_k\), with a strictly positive real part, \(\Re(\lambda_k) > 0\). Consider an initial perturbation \(\eta(0)\) that has a non-zero component in the direction of the eigenvector \(v_k\). That is, in the sum \(\eta(0) = \sum c_i v_i\), the coefficient \(c_k \ne 0\). Such an initial condition can be chosen arbitrarily close to the equilibrium point (i.e., for any \(\delta > 0\)). The solution \(\eta(t)\) will contain the term \(c_k v_k e^{\lambda_k t}\), whose magnitude grows exponentially: \(\|c_k v_k e^{\lambda_k t}\| = |c_k| \|v_k\| e^{\Re(\lambda_k)t}\). Since \(\Re(\lambda_k) > 0\), we have \(e^{\Re(\lambda_k)t} \to \infty\) as \(t \to \infty\) (within the linearized system, this growth holds for all \(t\), not just initially). This means the perturbation \(\eta(t)\) will eventually become arbitrarily large. No matter how small the initial neighborhood \(\mathcal{U}\) (i.e., how small we choose \(\eta(0)\)) and how large the tolerance neighborhood \(\mathcal{V}\) (i.e., the value of \(\epsilon\)), the trajectory will eventually leave \(\mathcal{V}\). This violates the definition of Lyapunov stability. So, for a stable equilibrium, it is necessary that \(\Re(\lambda_i) \le 0\) for all eigenvalues of the Jacobian matrix.

A nitpick - when \(\Re(\lambda_i) = 0\) for some eigenvalue, the equilibrium is called non-hyperbolic. Linearization is then generally insufficient to determine the stability of the original system (though it is fine if we restrict ourselves to the linearized problem): the Hartman-Grobman theorem doesn’t apply to such cases, and we typically need to look at the non-linearities directly to say anything. So we will assume that \(\Re(\lambda_i) < 0\) for all \(i\). This also gives us asymptotic stability, which is stronger than Lyapunov stability, and is enough to conclude stability of the original system.
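
This recipe - find an equilibrium, take the Jacobian, look at eigenvalue real parts - is mechanical enough to sketch in code. A minimal version (Python with numpy assumed; the damped pendulum is my illustrative stand-in, not a system from this post):

```python
import numpy as np

GAMMA = 0.5  # illustrative damping

def F(x):
    """Damped pendulum as a stand-in nonlinear system: x = (angle, angular velocity)."""
    return np.array([x[1], -np.sin(x[0]) - GAMMA * x[1]])

def jacobian(F, x0, eps=1e-6):
    """Finite-difference Jacobian J_ij = dF_i / dx_j at x0."""
    n = len(x0)
    J = np.zeros((n, n))
    for j in range(n):
        dx = np.zeros(n)
        dx[j] = eps
        J[:, j] = (F(x0 + dx) - F(x0 - dx)) / (2 * eps)
    return J

x0 = np.array([0.0, 0.0])  # an equilibrium: F(x0) = 0
lam = np.linalg.eigvals(jacobian(F, x0))
print(lam)  # complex pair with negative real part -> oscillatory decay
print("asymptotically stable" if np.all(lam.real < 0) else "linearization inconclusive/unstable")
```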

The dynamics of the full $n$-dimensional linear system can be decomposed into subspaces corresponding to the eigenvalues of \(J\). Oscillatory behavior is associated with complex conjugate pairs of eigenvalues; the real eigenvalues lead to exponential decay, and are not interesting in general (but a similar argument works for them too).

Let’s consider a dominant “mode” of oscillation within the system, which will be governed by a pair of complex conjugate eigenvalues, \(\lambda\) and \(\bar{\lambda}\). From our stability proof, we know that \(\Re(\lambda) \le 0\). Let us write the eigenvalues as \(\lambda = -\alpha + i\omega'\) and its complex conjugate \(\bar{\lambda}\) - note that this is not the same \(\omega\) as before. Due to the stability requirement, the damping factor \(\alpha = -\Re(\lambda)\) must be non-negative, i.e., \(\alpha \ge 0\). The value \(\omega' = \Im(\lambda)\) represents the frequency of oscillation.

The dynamics in the two-dimensional real invariant subspace corresponding to this eigenvalue pair can be described by a 2x2 real matrix, which is similar to the canonical form \(A = \begin{pmatrix} -\alpha & -\omega' \\ \omega' & -\alpha \end{pmatrix}\).

Let the state within this subspace be represented by a vector \(y(t) = \begin{pmatrix} y_1(t) \\ y_2(t) \end{pmatrix}\). The dynamics are given by \(\dot{y} = Ay\), i.e., \(\dot{y}_1 = -\alpha y_1 - \omega' y_2\) and \(\dot{y}_2 = \omega' y_1 - \alpha y_2\). We can combine these two first-order equations into a single second-order equation for one of the components, say \(y_1\). Differentiate the first equation with respect to time and substitute the expression for \(\dot{y}_2\) from the second equation to get \(\ddot{y}_1 = -\alpha \dot{y}_1 - \omega'(\omega' y_1 - \alpha y_2) = -\alpha \dot{y}_1 - \omega'^2 y_1 + \alpha\omega' y_2\). Eliminating \(y_2\) using \(\alpha\omega' y_2 = -\alpha \dot{y}_1 - \alpha^2 y_1\) (which follows from the first equation) and rearranging the terms, we arrive at the standard form of a second-order linear homogeneous ODE \(\ddot{y}_1 + 2\alpha \dot{y}_1 + (\alpha^2 + \omega'^2)y_1 = 0\).

This is the equation for a damped linear oscillator.

  • The term \(m\ddot{y}_1\) corresponds to \(\ddot{y}_1\) (we can set the “mass” \(m = 1\) without loss of generality by scaling the equation).
  • The term \(b\dot{y}_1\) corresponds to \(2\alpha \dot{y}_1\). This is the damping term.
  • The term \(ky_1\) corresponds to \((\alpha^2 + \omega’^2)y_1\). This is the restoring term.

From the definition of stability, we saw that \(\Re(\lambda) \le 0\), which implies that the coefficient \(\alpha = -\Re(\lambda)\) must be non-negative (\(\alpha \ge 0\)), so, the coefficient of the damping term, \(b = 2\alpha\), must also be non-negative.

Thus, any system near a stable equilibrium point, when exhibiting oscillatory behavior, can be mathematically modeled as a damped second-order linear oscillator.

Let’s consider some cases for this second-order system in one variable. Let \(r_1\) and \(r_2\) be the zeros of \(r^2 + \gamma r + \omega^2\).

Then we have \(-\gamma = r_1 + r_2\) and \(\omega^2 = r_1 r_2\), so \((\ddot{x} - r_1 \dot{x}) - r_2 (\dot{x} - r_1 x) = 0\). Setting \(y = \dot{x} - r_1 x\), this gives a homogeneous linear ODE in \(y\). Solving it for \(y\) gives us a linear ODE in \(x\). The solutions differ in different cases:

  • Both roots real (i.e., \(\gamma^2 > 4\omega^2\)): the system here is called overdamped, and the two negative real roots imply no oscillation. \(x\) is a sum of two decaying exponentials in time.
  • Repeated real root (i.e., \(\gamma^2 = 4\omega^2\)): the system here is called critically damped, and this is the fastest non-oscillatory return to equilibrium. \(x\) is a linear polynomial times a decaying exponential in time.
  • Both roots complex (i.e., \(\gamma^2 < 4\omega^2\)): the system here is called underdamped, and there are oscillations here. The form is similar to the both-real-roots case (except the roots here are complex), but with constraints that make the resulting \(x\) real. In general it looks like a sinusoidal function times a decaying exponential in time.
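
Here is a small simulation sketch showing all three regimes side by side (Python with numpy; \(\omega = 1\), so the split is at \(\gamma = 2\), and the \(\gamma\) values are arbitrary picks on either side of it):

```python
import numpy as np

def damped(gamma, omega=1.0, x0=1.0, v0=0.0, dt=1e-3, T=20.0):
    """Integrate x'' + gamma x' + omega^2 x = 0 as a first-order system (explicit Euler)."""
    x, v, xs = x0, v0, []
    for _ in range(int(T / dt)):
        x, v = x + dt * v, v + dt * (-gamma * v - omega**2 * x)
        xs.append(x)
    return np.array(xs)

# omega = 1, so the regimes split at gamma^2 = 4 omega^2, i.e. gamma = 2.
for gamma, label in [(3.0, "overdamped"), (2.0, "critical"), (0.2, "underdamped")]:
    xs = damped(gamma)
    crossings = int(np.sum(np.diff(np.sign(xs)) != 0))  # sign changes indicate oscillation
    print(f"{label:12s} gamma = {gamma}: zero crossings = {crossings}")
```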

Figure 3: Fig 1C - Damped oscillator phase portrait

There are also other variants of this system that have a forcing component - e.g., instead of \(0\) on the right-hand side, we have a sinusoidal (in time) component that drives the system, and it’s a good exercise to figure out how to solve this (and to figure out what happens in specific cases, what happens to the steady-state amplitude at different forcing frequencies, and so on). You will find that “resonance” (peak steady-state amplitude) occurs when the forcing frequency is \(\sqrt{\omega^2 - \frac{\gamma^2}{2}}\) at low damping; when damping is high enough that the term under the square root becomes negative, the amplitude decreases monotonically with forcing frequency, and the largest response is to a static force.
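
For the standard sinusoidal forcing \(\ddot{x} + \gamma \dot{x} + \omega^2 x = F \cos(\Omega t)\), the steady-state amplitude works out to \(A(\Omega) = F / \sqrt{(\omega^2 - \Omega^2)^2 + \gamma^2 \Omega^2}\). A quick numerical check of the peak location (Python with numpy; parameter values arbitrary):

```python
import numpy as np

def amplitude(Omega, omega=1.0, gamma=0.3, F=1.0):
    """Steady-state amplitude of x'' + gamma x' + omega^2 x = F cos(Omega t)."""
    return F / np.sqrt((omega**2 - Omega**2)**2 + (gamma * Omega)**2)

Omega = np.linspace(0.01, 2.0, 200_001)
peak = Omega[np.argmax(amplitude(Omega))]
print(peak)                        # numerical peak location
print(np.sqrt(1.0 - 0.3**2 / 2))  # analytic sqrt(omega^2 - gamma^2 / 2) ~ 0.977
```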

Figure 4: Fig 1D - Resonance curve for forced oscillator

Summary:

  • Knobs: \(\gamma\) (damping), \(\omega\) (natural frequency).
  • Behavior: overdamped (no oscillation), critical, underdamped (oscillatory decay).
  • Mechanism: eigenvalues’ real parts determine decay, imaginary parts determine oscillation.

Connections with other parts of the post:

  • Linearizing near an equilibrium will generally be our first move in many systems.
  • The damped focus picture explains the spirals you’ll see later in Lotka-Volterra logistic variant and SEIR: linearization gives complex eigenvalues with negative real part.
  • Eigenvalues determining things reappears in reaction-diffusion.
  • If a complex pair crosses the imaginary axis (not here, but in true delays), that gives a Hopf bifurcation (birth of a limit cycle - stable or unstable). The delays section in the toolbox is relevant.

Invariants and conservation

Preview: When things change, first ask what doesn’t: invariants (conserved quantities) and monovariants (monotone quantities, always increasing or decreasing) simplify analysis and constrain motion.

When things are changing, it is often wise to focus on what doesn’t.

Many systems have things that remain constant over time - these are called invariants or conserved quantities. They make life easier by reducing the degrees of freedom (or the moving parts) of the system - each invariant is generally one less equation we need to integrate. We will see many invariants in the coming sections.

In physical systems, there are things such as energy or momentum that may be conserved, and such conservation laws make solving equations much easier. In classical mechanics, invariants are quite important - e.g., Noether’s theorem shows that symmetries give rise to invariants and makes dealing with mechanical systems and reasoning about their properties much easier.

Similarly, there are monovariants - things that change monotonically as a process progresses. In physics, we have things like the entropy of an isolated system that seem to be monotonically non-decreasing. Monovariants also help us reason about different properties of the system, and are especially useful to show convergence.

Our first example of an invariant comes from Hooke’s law for springs. Consider a mass attached to a spring. Let the position from the equilibrium position be \(x\), and the state of the system is determined by \(x\) and \(\dot{x}\), as earlier. Hooke’s law states that \(m\ddot{x} + kx = 0\) (restoring force of \(-kx\) with extension \(x\)). Multiplying by \(\dot{x}\) and integrating gives us a conserved quantity \(\frac{1}{2} m \dot{x}^2 + \frac{1}{2} k x^2\). This is precisely the conservation of total mechanical energy - the energy is a function of the state that remains preserved in this system, and hence an invariant/conserved quantity. This also helps us solve for the dynamics of the equation by giving us \(\dot{x}\) as a function of \(x\) which is solvable using elementary methods. Note how we reduced the order of the differential equation by doing this - this is one way invariants are quite useful.
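
A quick numerical check of the invariant (Python with numpy; I use the velocity Verlet/leapfrog scheme, a standard choice known to track energy well, with arbitrary \(m\) and \(k\)):

```python
import numpy as np

m, k, dt = 1.0, 2.0, 1e-3  # arbitrary mass, spring constant, time step
x, v = 1.0, 0.0

def energy(x, v):
    """The invariant: total mechanical energy."""
    return 0.5 * m * v**2 + 0.5 * k * x**2

E0 = energy(x, v)
for _ in range(100_000):
    # velocity Verlet (leapfrog): tracks the energy surface closely
    v += 0.5 * dt * (-k * x / m)
    x += dt * v
    v += 0.5 * dt * (-k * x / m)

print(abs(energy(x, v) - E0))  # tiny; naive forward Euler would drift noticeably
```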

Figure 5: Fig 2A - Energy contours in (x, v)

We give another example, this time from biology - the Hardy-Weinberg principle. It states that under no evolutionary “forces”, the allele and genotype distributions remain the same (the former holds from the first generation itself, while the latter holds from the second generation onwards).

To give some context - a gene is a segment of DNA that codes for a specific trait (e.g., flower color). Different versions of a gene are called alleles (e.g., the A allele for purple flowers and the a allele for white flowers). An individual’s genetic makeup, or genotype, consists of two alleles (e.g., AA, Aa, or aa).

Allele frequency is the proportion of a specific allele within a population’s entire gene pool. We denote the frequency of the dominant allele (A) as \(p\) and the recessive allele (a) as \(q\). By definition, \(p + q = 1\).

Under conditions such as random mating, infinite population, no selection/mutation/migration/drift, the Hardy-Weinberg principle says that the allele frequencies remain \(p\) and \(q\) for each generation, and the genotype frequencies reach \(p^2\), \(2pq\), \(q^2\) after the first generation and stay there.

This is fairly straightforward to derive: for the allele frequencies, the probability that an individual in the new generation inherits A (respectively, a) from a given parent is \(p\) (respectively, \(q\)). The same computation can be done for the genotypes.

So even if incrementing generations introduces more members to the population, these two ratios are invariant.
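
A small sampling sketch of one round of random mating (Python with numpy; the principle assumes an infinite population, so a finite sample only matches up to noise):

```python
import numpy as np

rng = np.random.default_rng(0)
p, N = 0.3, 1_000_000  # frequency of allele A; number of offspring sampled

# Random mating: each offspring draws two alleles independently from the gene pool.
a1 = rng.random(N) < p
a2 = rng.random(N) < p

print(np.mean(a1 & a2))                 # genotype AA: expect p^2 = 0.09
print(np.mean(a1 ^ a2))                 # genotype Aa: expect 2pq = 0.42
print((np.mean(a1) + np.mean(a2)) / 2)  # allele frequency: expect p = 0.3 (the invariant)
```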

Figure 6: Fig 2B - Hardy-Weinberg checkerboard and frequencies

Summary:

  • Invariants reduce effective dimension (oscillator energy; allele frequencies).
  • Monotone quantities help show convergence.

Connections with other parts of the post:

  • Lotka-Volterra’s closed orbits come from a first integral (like oscillator energy).
  • In Cucker-Smale, the velocity variance is a monotone Lyapunov-like quantity (always decreasing).
  • In phase separation (Allen-Cahn/Cahn-Hilliard), the energy drops over time; with conservation (Cahn-Hilliard) total ‘mass’ of u is also preserved.
  • In Ising, the energy is not conserved but it organizes the stationary distribution: lower energy configurations are favored.

A quick primer on bifurcations

Preview: A bifurcation is a qualitative change as a parameter moves: equilibria appear/disappear or swap stability; oscillations can be born.

A bifurcation is a qualitative change in the long-term behavior of a dynamical system as a parameter varies. Intuitively, you can think of them as a phase change, but for dynamical systems: equilibria may appear/disappear, swap stability, or oscillations can be born.

For the sake of simplicity, we’ll stick to 1D continuous time systems of the form \(\dot{x} = f(x, r)\), where \(x \in \mathbb{R}\) is the state, and \(r \in \mathbb{R}\) is a parameter that we will vary later.

Equilibria are values \(x^*\) with \(f(x^*, r) = 0\), and stability is determined by the sign of \(f'(x^*, r)\) - if it is \(< 0\), nearby trajectories decay back (stable), and if \(> 0\), they are repelled (unstable). A bifurcation typically occurs when this derivative passes through zero at an equilibrium.

The following are some “normal forms” for some types of bifurcations - minimal models that capture the dynamics of a bifurcation:

  • Saddle-node (fold) bifurcation: \(\dot{x} = r + x^2\). For \(r < 0\), two equilibria (one stable, one unstable). At \(r = 0\) they collide and annihilate. For \(r > 0\), there are no equilibria.
  • Transcritical bifurcation: \(\dot{x} = r x - x^2\). Two branches cross and exchange stability at \(r = 0\).
  • Pitchfork (supercritical) bifurcation: \(\dot{x} = r x - x^3\). A single stable equilibrium becomes unstable at \(r = 0\) and two new stable equilibria emerge symmetrically.
  • Hopf (in 2D or higher) bifurcation: a stable equilibrium loses stability and a limit cycle (oscillation) is born.

There are many others, but these four cover a lot of ground.

A bifurcation diagram is a diagram that shows the values visited or approached asymptotically (fixed points, periodic orbits or chaotic attractors) of a system as a function of a bifurcation parameter. In simpler words, it shows how things like the stability of a system change as we change a parameter that the dynamical system depends on. This often helps build intuition about the qualitative behavior of a system.

Let’s look at how to visualize such a diagram for a pitchfork bifurcation, for instance.

Our system is \(\dot{x} = r x - x^3\), with the bifurcation parameter being \(r\). The equilibria are \(x_0 = 0\) for all \(r\), and \(x_\pm = \pm \sqrt{r}\) for \(r \ge 0\). If \(r < 0\), \(x_0\) is stable. If \(r > 0\), \(x_0\) is unstable and \(x_\pm\) are stable.

We plot the parameter on the x-axis and the state on the y-axis, generally denoting the stable branches with a solid line and unstable branches with a dashed line. This is how it looks:
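
A sketch of how one might generate the data behind such a diagram for the pitchfork normal form (Python with numpy; the equilibria helper is mine):

```python
import numpy as np

def equilibria(r):
    """Equilibria of f(x) = r x - x^3, classified via the sign of f'(x) = r - 3 x^2."""
    points = [0.0] + ([np.sqrt(r), -np.sqrt(r)] if r > 0 else [])
    return [(x, "stable" if r - 3 * x**2 < 0 else "unstable") for x in points]

for r in [-1.0, -0.1, 0.1, 1.0]:
    print(r, equilibria(r))
# r < 0: only x = 0 (stable).  r > 0: x = 0 turns unstable and ±sqrt(r) appear (stable).
```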

Figure 7: Fig 1E - Bifurcation diagram for a pitchfork bifurcation

Summary:

  • There are many types of bifurcations; we’ll see saddle-node, transcritical, pitchfork, and (in passing) Hopf.
  • A bifurcation diagram can help visualize the dynamics of a system.

Connections with other parts of the post:

  • Transcritical: Lotka-Volterra + logistic has a prey-only vs coexistence equilibrium swap as \(\gamma / \delta\) crosses K.
  • Pitchfork: mean-field Ising magnetization and mean-field Vicsek alignment both obey a pitchfork-like self-consistency (symmetry breaking).
  • Hopf: true delay equations (as opposed to compartments) can cross into oscillations via Hopf bifurcations (but these bifurcations are not limited to true delays).

Oscillations from feedback and delay

Building on the baselines, we now add two features found in real systems: interaction between entities or cross-coupling (Lotka-Volterra) and a simple lag (SEIR’s incubation). The first gives neutral cycles via a first integral; the second turns those cycles into damped waves. The same phase-plane and linearization ideas apply.

Lotka-Volterra predator-prey model

Preview: Pure cross-coupling (no saturation) yields neutrally stable cycles with a first integral. Adding logistic saturation to prey turns cycles into decay to a stable equilibrium and creates a point where stabilities swap.

This is a model for populations of species that are in a predator-prey relationship. Prey grows at a per-capita rate of \(\alpha\) but is eaten at the per-capita rate of \(\beta\) times the predator population. Predators die without sustenance at the per-capita rate of \(\gamma\) and grow at the per-capita rate of \(\delta\) times the prey population.

If the prey population is \(x\) and the predator population is \(y\), we get the following equations:

\[\dot{x} = \alpha x - \beta xy\]

\[\dot{y} = -\gamma y + \delta xy\]

This is what we call a cross-coupled system. Let’s think of it empirically first. What are the most trivial solutions?

  • If there are no predators, the prey grows exponentially (absurd, but we will live with it in this simple model for now).
  • If there is no prey, predators die out with an exponential decay.
  • Equilibrium means that \(x\) and \(y\) don’t change - this gives us a solution with \(y = \frac{\alpha}{\beta}\) and \(x = \frac{\gamma}{\delta}\), or when neither species has any individuals. This is of course assuming that \(\beta\) and \(\delta\) are non-zero - this is a physically valid assumption. We’ll in general assume that \(\alpha\), \(\beta\), \(\gamma\) and \(\delta\) are positive.

Let’s look at what happens when either of the populations has a zero rate of change (i.e., a nullcline for that species’ population). For \(x\), they are \(x = 0\) and \(y = \frac{\alpha}{\beta}\), and for \(y\) they are \(y = 0\) and \(x = \frac{\gamma}{\delta}\). This divides the first quadrant (positive \(x\) and \(y\)) into four parts, and where these parts intersect is the non-trivial equilibrium point. By looking at signs for \(\dot{x}\) and \(\dot{y}\) we can see that the path is counterclockwise around the equilibrium point.

Figure 8: Fig 3A - LV nullclines and orbits

Another empirical property, as we’ll see later, is that these interactions can cause oscillations. When the prey populations increase a lot, the predator populations start to increase, and there comes a point where large predator populations lead to reduction in the prey populations. This in turn leads to a reduction in predator populations since they’re dependent on prey, and the prey have a chance to recover.

Let’s try to solve this problem mathematically. Eliminating \(t\) we get \(\frac{dx}{dy} = \frac{x}{y} \frac{\alpha - \beta y}{-\gamma + \delta x}\). Separating variables and integrating, we get that the following quantity is constant:

\[\delta x - \gamma \log x + \beta y - \alpha \log y\]

If we plot contours of this quantity on the Cartesian plane, we’ll get closed loops. It’s easy to see why: \(\delta x - \gamma \log x\) is bounded below, and so is \(\beta y - \alpha \log y\). Since they sum to a constant, each of them is also bounded above, so contours are always finite. This is also a continuous curve with no more than 2 solutions for any given \(x\) (or \(y\)), and the set of \(x\) and \(y\) where such solutions exist is a closed interval.
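
A numerical check that trajectories really do stay on one level set of this quantity (Python with numpy; the RK4 integrator and parameter values are my own illustrative choices):

```python
import numpy as np

alpha, beta, gamma_, delta = 1.0, 0.5, 1.0, 0.2  # arbitrary positive rates

def f(z):
    x, y = z
    return np.array([alpha * x - beta * x * y, -gamma_ * y + delta * x * y])

def rk4_step(z, dt):
    k1 = f(z)
    k2 = f(z + dt / 2 * k1)
    k3 = f(z + dt / 2 * k2)
    k4 = f(z + dt * k3)
    return z + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

def V(z):
    """The first integral: delta x - gamma log x + beta y - alpha log y."""
    x, y = z
    return delta * x - gamma_ * np.log(x) + beta * y - alpha * np.log(y)

z = np.array([10.0, 5.0])
v0 = V(z)
for _ in range(100_000):
    z = rk4_step(z, 1e-3)
print(abs(V(z) - v0))  # ~0 up to integrator error: the orbit stays on one closed contour
```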

Figure 9: Fig 3B - LV first integral level sets

Let’s see what happens near the non-trivial equilibrium point - linearization gives us a Jacobian with purely imaginary eigenvalues \(\pm i \sqrt{\alpha \gamma}\), so this system oscillates around it (neutrally, consistent with the closed orbits above).

Note that a major issue with this is the jump from discrete populations to continuous populations - in practice these cycles don’t exist under extreme pressure (e.g., you can’t have more than 0 but less than 1 member of a species).

We now add a simple change to the standard Lotka-Volterra model - a carrying capacity (like the logistic growth case from earlier) \(K\) for the prey - and see how things change. \(\alpha x\) becomes \(\alpha x (1 - x/K)\).

The equilibrium points become \((0, 0)\), \((K, 0)\) and \((\gamma/\delta, (\alpha/\beta) (1 - \gamma/(\delta K)))\) (the last of which makes sense only if \(\gamma/\delta < K\)).

The Jacobian at \((0, 0)\) has eigenvalues \(\alpha\) and \(-\gamma\) which are of opposite signs, so it is a saddle point. At \((K, 0)\), the eigenvalues are \(-\alpha\) and \(\delta K - \gamma\), and we get a stable equilibrium if \(\gamma / \delta > K\) and a saddle point when the inequality is flipped. When equality holds instead, a different phenomenon called a (transcritical) bifurcation happens - i.e., two equilibrium points (this one and the coexistence point) swap their stability characteristics. In this specific case, at this exact point, a perturbation that increases the prey population is unstable, and one that decreases the prey population is stable, as can be verified by considering the original non-linear system.

The Jacobian at the coexistence point (which makes physical sense only for \(\gamma / \delta < K\), equality gives us the previous case) has non-zero eigenvalues. The trace is always negative and the determinant is always positive, so this is always a stable equilibrium. Whether it is a direct decay or oscillatory decay depends on the discriminant of the characteristic equation and is easy to determine.
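
These sign conditions are easy to verify numerically for any concrete parameter choice (Python with numpy; the values below are arbitrary, chosen so that \(\gamma/\delta < K\)):

```python
import numpy as np

alpha, beta, gamma_, delta, K = 1.0, 0.5, 1.0, 0.2, 20.0  # gamma_/delta = 5 < K
x = gamma_ / delta
y = (alpha / beta) * (1 - gamma_ / (delta * K))  # coexistence equilibrium

J = np.array([[alpha - 2 * alpha * x / K - beta * y, -beta * x],
              [delta * y, -gamma_ + delta * x]])
print(np.trace(J), np.linalg.det(J))  # trace < 0 and det > 0 => stable
print(np.linalg.eigvals(J))           # complex pair => damped spiral; real => node
```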

Figure 10: Fig 3C - Logistic-prey LV phase portrait and stability

Summary:

  • Knobs: the prey capacity \(K\) and rates.
  • Behavior: neutral cycles without saturation, damped approach to equilibrium with saturation, transcritical bifurcation at \(\gamma / \delta = K\).

Connections with other parts of the post:

  • 2D phase-plane tools (nullclines, trace-determinant) used here carry over to any planar system.
  • Adding logistic prey (\(x(1 - x/K)\)) mirrors logistic growth and introduces damping; the stability swap is a transcritical bifurcation (see bifurcations primer).
  • Compare to SEIR: LV cycles are neutral because of a first integral; SEIR’s E-stage acts like a lag and creates damped oscillations instead.

SEIR model with vital dynamics

Preview: The decay-like term in the E-component dynamics introduces a soft “time lag” into the system - outbreaks overshoot and then damp towards an endemic equilibrium when \(R_0 > 1\).

The Lotka-Volterra model showed us how instantaneous cross-coupling can produce sustained oscillations. However, many real-world systems, particularly in epidemiology, involve a time lag between cause and effect. An individual does not become instantaneously infectious upon being exposed to a pathogen; instead, there is an incubation period. The SEIR model with vital dynamics is an example that demonstrates how this delay changes the behavior, going from the neutral cycles of the Lotka-Volterra model to damped oscillations (or monotone approach) converging on a stable state.

It is what we call a compartmental model, meaning we divide the total population, \(N\), into distinct groups based on their disease status. We assume the population is large and well-mixed, allowing us to treat the number of individuals in each compartment as a continuous variable.

The state variables are:

  • \(S(t)\): the number of susceptible individuals - those who are healthy but can become infected.
  • \(E(t)\): the number of exposed individuals - those who have been infected but are not yet infectious to others (i.e., they are in the incubation period).
  • \(I(t)\): the number of infectious individuals - those who can transmit the disease to susceptible individuals.
  • \(R(t)\): the number of recovered (or removed) individuals - those who are no longer infectious and have acquired immunity (or have died).

The flow between these compartments under this model is as follows:

  • The transmission rate, \(\beta\), governs the rate at which susceptible individuals become infected through contact with infectious individuals. The total rate of new infections is proportional to the number of encounters between \(S\) and \(I\), which in a well-mixed population is proportional to \(S \times I\). We often write this term as \(\frac{\beta S I}{N}\) to represent the per-susceptible infection rate as a function of the fraction of the population that is infectious, \(\frac{I}{N}\).
  • The incubation rate, \(\sigma\), is the rate at which exposed individuals become infectious. The average duration of the incubation period is \(1/\sigma\).
  • The recovery rate, \(\gamma\), is the rate at which infectious individuals recover and gain immunity (or die). The average duration of the infectious period is \(1/\gamma\).
  • The birth and death rate, \(\mu\). To account for long-term dynamics, we assume a constant birth rate that adds new individuals to the susceptible pool, and a natural death rate that removes individuals from all compartments. For a stable population size \(N\), the total birth rate must equal the total death rate, so we assume a birth rate of \(\mu N\) and a per-capita death rate of \(\mu\).

These rules give rise to the following system of coupled ordinary differential equations:

\[\dot{S} = \mu N - \frac{\beta S I}{N} - \mu S\] \[\dot{E} = \frac{\beta S I}{N} - (\sigma + \mu) E\] \[\dot{I} = \sigma E - (\gamma + \mu) I\] \[\dot{R} = \gamma I - \mu R\]

An immediate (and obvious) observation is that \(\dot{S} + \dot{E} + \dot{I} + \dot{R} = 0\). This shows that, as expected, the total population \(N = S + E + I + R\) is a conserved quantity and lets us reduce the system’s dimensionality from four to three by setting \(S(t) = N - E(t) - I(t) - R(t)\).

Epidemic models are generally made to answer the question: under what conditions will an outbreak occur? Intuitively, an epidemic will take off if, on average, a single infectious person introduced into a fully susceptible population infects more than one other person. This threshold quantity is called the basic reproduction number, \(R_0\).

If \(R_0 < 1\), each infected individual produces, on average, fewer than one new infection, and the disease will die out. If \(R_0 > 1\), each infected individual produces more than one new infection, and the epidemic will grow.

Let’s compute \(R_0\). We consider the stability of the disease-free equilibrium (DFE), where \((S, E, I, R) = (N, 0, 0, 0)\). An outbreak occurs if this state is unstable. Near the DFE, \(S \approx N\), so the linearized system for the infected compartments (\(E\) and \(I\)) is:

\[\frac{d}{dt} \begin{pmatrix} E \\ I \end{pmatrix} = \begin{pmatrix} -(\sigma + \mu) & \beta \\ \sigma & -(\gamma + \mu) \end{pmatrix} \begin{pmatrix} E \\ I \end{pmatrix}\]

The DFE is stable if and only if the eigenvalues of this Jacobian matrix have negative real parts. This requires the determinant to be positive (since the trace is already negative), leading to the condition for stability:

\[R_0 = \frac{\beta}{\gamma+\mu} \cdot \frac{\sigma}{\sigma+\mu} < 1\]

We can understand this formula intuitively too. For an infection to be passed on, an individual must first survive the incubation period (which occurs with probability \(\frac{\sigma}{\sigma+\mu}\)) and then infect others during their infectious period. The average duration of the infectious period is \(\frac{1}{\gamma+\mu}\), and the transmission rate is \(\beta\). Thus, \(R_0\) is the product of the transmission rate, the fraction of individuals who survive incubation, and the average duration of infectiousness.

The system has two equilibrium points:

  1. Disease-free equilibrium (DFE): \((S, E, I, R) = (N, 0, 0, 0)\). As we just showed, this equilibrium is stable if \(R_0 < 1\).
  2. Endemic equilibrium (EE): if \(R_0 > 1\), the DFE becomes unstable, and the system converges to a state where the disease persists in the population. By setting the derivatives to zero, we can solve for the steady-state values (\(S^*, E^*, I^*, R^*\)). Assuming \(I^* \ne 0\), we find \(S^* = \frac{N}{R_0}\). The existence of a physically feasible endemic equilibrium (where compartment sizes are positive) requires \(S^* < N\), which implies \(R_0 > 1\), consistent with our stability analysis. The remaining values can be found using the conservation of total population.

Note that when \(R_0 > 1\), the system does not simply move monotonically to the endemic equilibrium. Instead, it typically exhibits damped oscillations.

The trajectory starts with a rapid decline in susceptibles (\(S\)) and a corresponding rise in the exposed (\(E\)) and infectious (\(I\)) populations. The peak of the exposed population occurs before the peak of the infectious population. By the time the infectious population (\(I\)) reaches its peak, the susceptible population (\(S\)) has been significantly depleted. This lack of new “fuel” causes the infectious population to crash. This is followed by a slow recovery of the susceptible population due to births, which can lead to smaller, subsequent waves until the system eventually dampens into the steady endemic state.

This oscillatory behavior is a direct result of the time lag introduced by the \(E\) compartment. Unlike instantaneous feedback, the delay between becoming infected and becoming infectious creates an “inertial” effect. The system cannot immediately correct for changes, leading to the characteristic overshoots that define damped oscillations around the stable endemic equilibrium.
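
A minimal simulation showing the overshoot-and-damp behavior and the endemic value \(S^* = N / R_0\) (Python with numpy; the rates are arbitrary illustrative choices with \(R_0 > 1\)):

```python
import numpy as np

N, beta, sigma, gamma_, mu = 1.0, 0.5, 0.2, 0.1, 0.01  # arbitrary illustrative rates
R0 = beta / (gamma_ + mu) * sigma / (sigma + mu)       # ~4.3 here, so an outbreak occurs

def f(u):
    S, E, I, R = u
    new_inf = beta * S * I / N
    return np.array([mu * N - new_inf - mu * S,
                     new_inf - (sigma + mu) * E,
                     sigma * E - (gamma_ + mu) * I,
                     gamma_ * I - mu * R])

u = np.array([N - 1e-4, 0.0, 1e-4, 0.0])
dt = 0.1
for _ in range(50_000):  # t = 5000: long enough for the damped waves to settle
    u = u + dt * f(u)    # forward Euler is adequate for a picture

print(R0, u[0], N / R0)  # S settles near N / R0 at the endemic equilibrium
```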

Figure 11: Fig 3D - SEIR S, E, I, R time series (damped waves)

Figure 12: Fig 3E - SEIR S-I phase portrait to endemic equilibrium

Summary:

  • Knobs: \(R_0\), incubation rate \(\sigma\) and the birth/death rate \(\mu\).
  • Behavior: disease dies out if \(R_0 < 1\), otherwise damped waves to a stable endemic state.
  • Mechanism: the decay-like term in the E-compartment leads to an effective time lag, in the form of overshoot and damping.

Connections with other parts of the post:

  • Adding diffusion to I (and S) produces spatial waves; linearization at the leading edge gives a Fisher-KPP-like front speed.
  • The \(R_0\) threshold mirrors network thresholds later: on graphs, stability depends on a spectral radius (see toolbox: Networks & spectra).

Thresholds, alignment, and phase transitions

So far we’ve looked at examples where feedback and delays lead to continuous changes in state. We now consider a different class of phenomena, where the global behavior is not a smooth wave but discontinuous. Systems in which units make discrete, thresholded decisions provide examples of this kind of behavior. These microscopic discrete behaviors lead to macroscopic phenomena such as cascades, spontaneous alignment, and segregation - phase transitions of some sort. The question we want to answer is: how do simple threshold rules aggregate into sudden changes in global behavior?

We’ll look at three examples first - Granovetter’s threshold model which shows social cascades, the Ising model for magnetism that’ll help us understand alignment, noise, and emergent order, and Schelling’s model of social segregation arising from mild spatial preferences.

Granovetter’s threshold model for collective behavior

Preview: A simple response map \(k_{t + 1} = G(k_t)\) turns individual thresholds into macroscopic “cascades”, and small changes in \(G\) lead to drastically different outcomes.

This model provides a simple explanation for how collective behavior, like strikes or the adoption of innovations, can emerge suddenly and unpredictably. The main idea is that an individual’s decision to participate depends on how many others are already participating. Each individual has a personal threshold on the number of other people they need to see participating before they join in themselves.

The state of the system at time \(t\) is \(k(t)\), the number of individuals who exhibit the behavior. Each individual \(i\) in the population has a fixed integer threshold \(T_i\) such that they will exhibit the behavior at time \(t + 1\) if \(k(t) \ge T_i\).

The coupling here is mean-field - each person’s decision is based on a global count and not by any sort of local statistics.

The macroscopic behavior of the system is entirely determined by the distribution of thresholds across the population. A small change in this distribution can drastically change the outcome of the experiment.

For example, consider the case where the \(i\)-th person has threshold \(i - 1\). Then at each time step, one more person joins in, until the entire population exhibits the behavior. If we change the threshold of a single person \(i\) from \(i - 1\) to \(i\), the cascade stalls there: only \(i - 1\) people will ever exhibit the behavior, no matter how long the experiment is run. In the worst case (changing the first person’s threshold from \(0\) to \(1\)), nobody exhibits the behavior at all. This can be extremely counterintuitive to an outside observer who doesn’t know the individual thresholds.

This is also a bifurcation. The parameter that controls this behavior (the control parameter) is the shape of the threshold distribution.

Let’s try to look at it in the continuous limit. Let \(k(t)\) now be the fraction of individuals who exhibit the behavior, and let each individual’s threshold \(\theta_i\) also be expressed as a fraction of the population. Then \(k(t + 1) = \mathbb{P}[\theta_i \le k(t)] = G(k(t))\), where \(G\) is the CDF of the threshold distribution.

A fixed point occurs when \(k^* = G(k^*)\). Obvious solutions to this are \(k^* = 0\) and \(1\), and maybe some solutions in the middle. Stability requires that \(|G'(k^*)| < 1\). If \(G\) is steep near the origin, then a small seed can make the system go to a non-zero equilibrium.
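
A small sketch of the map iteration (Python with numpy; the clipped-normal threshold distribution and the seed values are my own illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical threshold distribution: normal, clipped to [0, 1].
thresholds = np.clip(rng.normal(0.3, 0.15, size=100_000), 0, 1)

def G(k):
    """Empirical CDF: fraction of individuals whose threshold is <= k."""
    return np.mean(thresholds <= k)

for seed in [0.01, 0.05, 0.2]:
    k = seed
    for _ in range(200):
        k = G(k)
    print(f"seed {seed} -> final participation {k:.3f}")
# Small seeds get stuck at a low fixed point of k = G(k); a larger seed
# clears the unstable middle crossing and cascades to near-full adoption.
```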

Figure 13: Fig 4A - Granovetter response map k(t + 1) = G(k(t))

Note that we can also have smooth thresholds here. Suppose each individual exhibits the behavior with probability \(\frac{1}{1 + \exp(-\beta (f(k) - c))}\) - letting \(\beta\) go to \(\infty\) gives us hard thresholds. I recommend looking back at this after reading the Ising model section and trying to find similarities.

Summary:

  • Knobs: the steepness and the shape of \(G\), especially near \(0\).
  • Behavior: all or partial adoption of behavior depending on where the stable fixed points are.
  • Mechanism: fixed-point slope determines stability, and a steep \(G\) near the origin leads to cascades.

Connections with other parts of the post:

  • The analysis we did for fixed-point stability is the discrete-time analog for 1D flow stability.
  • The smooth threshold setup parallels Ising at a finite temperature, and the steepness \(\beta\) acts as an inverse temperature.

The Ising model: alignment, noise and criticality

Preview: The theme here is local alignment versus thermal noise: above a critical temperature \(T_c\), spins are disordered (magnetization near 0); below \(T_c\) the system picks a direction (\(|M| > 0\)). Hysteresis appears because flipping between the two ordered states requires breaking through an energy barrier.

The Ising model from statistical physics gives a microscopic theory for magnetism based on local interactions, and exhibits cooperative phenomena and phase transitions. The main observation here is that local interactions lead to global alignment, and there is a critical level of noise above which this alignment goes away.

It’s worth noting that this is one of the most studied models in physics (so we will definitely not be able to do justice to it) and has inspired energy-based models in ML as well (Boltzmann machines, Hopfield networks).

The state consists of a vector of “spins” \(s_i\), generally (but not necessarily) arranged on a lattice. Each spin is either \(1\) (up) or \(-1\) (down).

The behavior can be described using an energy function that assigns a scalar energy to every possible configuration. The system generally tries to go towards states of lower energy. Consider a system with nearest-neighbor interactions and an external field \(h\). It has energy \(H(s) = -J \sum_{\langle i, j \rangle} s_i s_j - h \sum_i s_i\). The first term is the coupling between adjacent spins (the angular brackets mean \(i, j\) are neighbors). \(J\), the coupling constant, determines the nature of interaction - \(J > 0\) means ferromagnetic (aligned spins lower energy) and \(J < 0\) means antiferromagnetic (aligned spins increase energy). We will consider the ferromagnetic case. The second term explains how the system interacts with an external magnetic field \(h\).

In this model, as is usual in statistical physics, we consider the effect of temperature. At zero “temperature” (no noise), the system would just find the minimum energy configuration. However when noise is added to the system (thermal fluctuations), the dynamics are defined using a certain statistical ensemble - a combination of states with all possible different energies with different probabilities depending on both the energy and the temperature. The distribution is the Boltzmann distribution - \(P(s) \propto \exp(-H(s) / k_B T)\) where \(k_B\) is the Boltzmann constant (we’ll set it to 1), and the constant of proportionality is the inverse of the partition function (for normalization purposes, but the partition function in general describes all macroscopic properties!). Note that at lower temperatures, the exponential is much sharper and strongly prefers states with lower energies, but at higher temperatures, this preference drops off, and the system has a more significant contribution from higher energy states than at lower temperature, making the coupling \(J\) less important.

To simulate this model, we generally simulate sequences of states (i.e., by doing spin flips) using algorithms like Glauber dynamics or Metropolis-Hastings (since we only know the proportionality, not the actual probabilities themselves). To use Glauber dynamics, at every step we pick a random spin \(s_i\), check the change in the energy of the system if it flipped, and accept the change with probability \(\frac{1}{1 + \exp(\Delta E / T)}\). The Metropolis-Hastings algorithm and Glauber dynamics both have similar properties under certain assumptions, so it doesn’t matter which we choose.
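
A minimal Glauber simulation on a periodic 2D lattice (Python with numpy; the lattice size, temperatures, and sweep count are arbitrary choices, and a production simulation would vectorize this loop):

```python
import numpy as np

rng = np.random.default_rng(0)
L, J = 32, 1.0

def glauber(s, T, sweeps):
    """Glauber dynamics: flip a random spin with probability 1 / (1 + exp(dE / T))."""
    for _ in range(sweeps * L * L):
        i, j = rng.integers(L), rng.integers(L)
        nbrs = s[(i + 1) % L, j] + s[(i - 1) % L, j] + s[i, (j + 1) % L] + s[i, (j - 1) % L]
        dE = 2 * J * s[i, j] * nbrs  # energy change if s[i, j] flips
        if rng.random() < 1 / (1 + np.exp(dE / T)):
            s[i, j] = -s[i, j]
    return s

for T in [1.5, 3.5]:                # below vs above the 2D critical temperature ~2.27 J
    s = np.ones((L, L), dtype=int)  # start fully aligned
    s = glauber(s, T, sweeps=200)
    print(T, abs(s.mean()))         # order survives below T_c, melts above it
```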

The macroscopic property that we care about is the average magnetization (mean of the spins). This is an “order parameter” in the sense that it tells us how much order there is in the system.

Naively, it seems that the symmetry of the problem would imply that the magnetization must be \(0\). However, it is noticed that this is not the case at low enough temperatures for large enough (practically infinite) lattices with, for instance, a 2D grid. Let’s first look at what happens at very high and very low temperatures. At high temperatures, the thermal noise overwhelms the coupling energy, and aligning spins together has barely any benefit. So on average, there are as many up spins as down spins, and the average magnetization becomes \(0\). At low temperatures, something much more interesting happens. Not correctly aligning with the neighbors has a huge cost, so spins strongly prefer to be aligned, forming large “domains” of the same spin. Practically, the system spontaneously breaks symmetry to get into a super-aligned state like this. We generally think of this as happening under an “infinitesimal” magnetic field \(h\) (which dictates which spin dominates) as we cross the critical temperature, after we take the limit as the number of spins goes to \(\infty\) (the order of these limits is important).

Figure 14: Fig 4B - Ising spin fields above/at/below T_c

A more rigorous explanation is the following. As long as \(N\) (the number of spins) is finite, the energy barrier needed to go from \(M_0\) to \(-M_0\) is finite, so after a long enough time, the system will “tunnel” through this energy barrier. However, as \(N\) becomes larger (and in the limit as \(N\) goes to \(\infty\)), this becomes infeasible. This is also an instance of what’s called ergodicity breaking - i.e., the time average (how it is realized physically) is not the same as the ensemble average (how we think about averages in math).

The most surprising part is that the transition during cool-down (or heating) happens at a sharp critical temperature called the Curie temperature \(T_c\). For a 2D square lattice, this temperature is \(\frac{2J}{k_B \log (1 + \sqrt{2})}\) (we will not derive this, as it is quite involved). The reason this transition is sharp is that as we approach the critical temperature, the correlation length (i.e., the length scale over which spin-spin correlations decay by a given factor, say half) diverges - to be precise, away from \(T_c\) the correlation function decays exponentially with distance, while at \(T_c\) it decays as a power law (and hence the correlation length diverges).

For \(T > T_c\), there is only one equilibrium state (\(\langle M \rangle = 0\)), but for \(T < T_c\) there are two stable equilibrium states - \(\pm M_0\) (and \(0\) becomes unstable). This is a change in the number of stable states as a parameter is varied. Mean-field here yields what is called a pitchfork bifurcation, and in the lattice model this corresponds to a phase transition at \(T_c\).

It is worth noting that this does not happen in one dimension. An intuitive reason is as follows - in one dimension, the energy required to flip a contiguous segment is constant (it only creates two domain walls), so the barrier is easily crossed at any temperature; in two dimensions, the energy required to flip a domain is proportional to the perimeter of the domain, which grows as the domain grows.

There are also approximate ways to reason about the transition, e.g., mean-field theory, which replaces the neighbor average by the average magnetization. Due to the errors of this approximation, it gets the constant factor in the critical temperature and the critical exponents wrong, and it also gets the qualitative behavior in one dimension wrong.

It’s still instructive, though. Let the spin \(s_i = M + \delta s_i\). Plugging this in, and using the approximation \(\delta s_i \delta s_j = 0\), we get the energy as \(NqJM^2/2 - (h + JqM)\sum_{i} s_i\), where \(q\) is the coordination number (the number of neighbors; \(q = 4\) for the 2D square lattice). Using the “self-consistency” equation \(M = \langle s_i \rangle\) and doing some algebra, we get (at \(h = 0\)) \(M = \tanh(JqM / T)\). One obvious solution to this is \(M = 0\), which is indeed the only (and stable) equilibrium when \(T\) is large, i.e., \(T > Jq\) (because the slope of the tanh at the origin is then lower than \(1\)), and is an unstable equilibrium otherwise. When \(T < Jq\), there are two other solutions to this equation: \(\pm M_0\). (Note that the constant factor in the critical temperature is incorrect here, but we will live with that.)

To figure out the stability of these points, we can do two things - either use the physics definition of free energy and do an asymptotic expansion in \(M\) and check the double derivative; or perturb the system at a microscopic level. We’ll go down the simulation path and consider what happens when we simulate the Glauber dynamics. Let’s say that in time \(dt\), a fraction \(\alpha dt\) of the spins are considered by the algorithm. The change in the total \(+1\) spins would be \(dt(\alpha N_{-1} P(-1\text{ flips}) - \alpha N_{+1} P(1\text{ flips}))\). Writing this in terms of magnetization and computing the energy barriers for each flip and simplifying, we get \(\dot{M} = \alpha(\tanh(JqM/T) - M)\). To check whether an equilibrium is stable, we only need to perturb \(M\) and see if it comes back (which amounts to checking the sign of \(\partial \dot{M} / \partial M\) at the equilibrium). A simple calculation confirms that below \(T = Jq\), \(\pm M_0\) are stable equilibria and \(0\) is an unstable equilibrium, and above it, \(0\) is the only equilibrium and is stable.
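
This mean-field picture is easy to check numerically. A small sketch (assuming NumPy/SciPy; the temperatures below are arbitrary) that finds the nonzero root of \(M = \tanh(JqM/T)\) and tests stability via the sign of \(\partial \dot{M}/\partial M\):

```python
import numpy as np
from scipy.optimize import brentq

J, q = 1.0, 4  # coupling and coordination number for the 2D square lattice

def M_star(T):
    """Positive root of M = tanh(J*q*M/T); exists only for T < J*q."""
    g = lambda M: np.tanh(J * q * M / T) - M
    return brentq(g, 1e-9, 1.0) if T < J * q else 0.0

def is_stable(M, T):
    """An equilibrium of Mdot = alpha*(tanh(J*q*M/T) - M) is stable exactly
    when d(Mdot)/dM < 0 there, i.e. (J*q/T) / cosh(J*q*M/T)^2 < 1."""
    return (J * q / T) / np.cosh(J * q * M / T) ** 2 < 1

for T in (2.0, 6.0):  # below and above the mean-field T_c = J*q = 4
    M0 = M_star(T)
    print(f"T={T}: M0={M0:.3f}, stable(M0)={is_stable(M0, T)}, stable(0)={is_stable(0.0, T)}")
```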

Figure 15: Fig 4C - Magnetization and susceptibility vs temperature

Figure 16: Fig 4D - Hysteresis loop M(h) at T < T_c

Now consider the effect of an external field \(h\) at a low temperature (\(T < T_c\)).

  1. Start with a strong negative field, \(h \ll 0\). The system is in a state with \(M \approx -1\).
  2. Slowly increase \(h\) to 0 and then into positive values. Because the spins are strongly coupled, there is an energy barrier to flipping from the all-down state to the all-up state. The system can remain “stuck” in the \(M \approx -1\) state even for small positive \(h\). This is a metastable state. It is a local, but not global, energy minimum.
  3. As \(h\) increases further, the energy benefit of aligning with the field eventually overcomes the coupling barrier, and the system undergoes a rapid transition to the \(M \approx +1\) state.
  4. If we now reverse the process and decrease \(h\), the system will remain at \(M \approx +1\) even for small negative \(h\), until it abruptly flips back to \(M \approx -1\).

Plotting \(M\) as a function of \(h\) as we sweep \(h\) from negative to positive and back gives us what we call a hysteresis loop. The state of the system depends not only on the current value of \(h\), but also on its history. This “memory” is characteristic of systems with multiple stable states and the energy barriers between them. Note that this happens because the system remains in a metastable local minimum - with a finite volume and an infinitely slow sweep, the loop area shrinks to zero, whereas a large volume with a fast sweep gives a wide loop (since crossing the energy barrier takes time).

The discontinuous jump of magnetization at \(h = 0\) (at equilibrium below the critical temperature) is an example of a first-order phase transition (as a function of \(h\) at fixed \(T < T_c\)), while the transition as we cross the critical temperature is called a second-order phase transition (the order refers to the lowest-order derivative of the free energy that is discontinuous at the transition).

Summary:

  • Knobs: temperature \(T\) (noise), coupling \(J\) (alignment)
  • Behavior: at \(T_c\), we go from disorder to order, and see metastability and hysteresis for \(T < T_c\).
  • Mechanism: local coupling creates domains, noise breaks them, and correlation length blows up near \(T_c\).

Connections with other parts of the post:

  • Mean-field self-consistency \(M = \tanh(kM)\) has the same structure as a pitchfork bifurcation normal form in 1D.
  • Domain growth and coarsening is similar to the behavior we see in Allen-Cahn/Cahn-Hilliard setups - the alignment term is similar to the surface tension cost we see there.
  • Hysteresis here is similar to the Vicsek model’s bands and hysteresis - history matters when there are multiple locally stable states.
  • Schelling’s ‘unhappiness’ metric plays the role of an energy, and moves that try to reduce it lead to clustering, a phenomenon similar to the domain formation we see here.

Schelling’s model of segregation

Preview: Mild local preferences (tolerance \(\tau\)) can create strong global segregation via relocation feedback.

Schelling’s model applies the logic of thresholds and spatial interactions to a social phenomenon: residential segregation. It turns out that extreme macroscopic segregation can arise from the collective actions of individuals who have only mild preferences for living near others of their own type.

The state here is a 2D grid (like a chessboard, which is why it’s also called Schelling’s chessboard model) populated by two kinds of residents (red and blue) and some empty cells. Each resident has a threshold \(\tau\) such that they’re happy if the fraction of neighbors of the same type is at least \(\tau\).

We again simulate this model - pick a random unhappy resident, move them to a randomly chosen vacant cell on the grid, and repeat until all residents are happy.

The way we’ll track the state would be through the “segregation index” which is the mean fraction of same-type neighbors over all residents. A larger value means more segregation.
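
Here’s a minimal simulation sketch of the whole setup. The grid size, occupancy, sweep count, and the Moore (8-cell) neighborhood are all illustrative choices, not canonical ones:

```python
import numpy as np

rng = np.random.default_rng(1)

def schelling(L=40, tau=0.5, empty_frac=0.1, sweeps=200):
    """Toy Schelling run on an L x L torus with a Moore (8-cell) neighborhood.
    0 = empty, +1/-1 = the two resident types. Returns the segregation index
    (mean same-type neighbor fraction over all residents)."""
    grid = rng.choice([0, 1, -1], size=(L, L),
                      p=[empty_frac, (1 - empty_frac) / 2, (1 - empty_frac) / 2])
    empties = [tuple(x) for x in np.argwhere(grid == 0)]

    def same_frac(i, j, t):
        occ = same = 0
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                if (di, dj) == (0, 0):
                    continue
                n = grid[(i + di) % L, (j + dj) % L]
                occ += n != 0
                same += n == t
        return 1.0 if occ == 0 else same / occ

    for _ in range(sweeps * L * L):
        i, j = rng.integers(0, L, size=2)
        t = grid[i, j]
        if t == 0 or same_frac(i, j, t) >= tau:
            continue  # empty cell, or an already-happy resident
        k = rng.integers(len(empties))
        (ei, ej), empties[k] = empties[k], (i, j)  # swap resident and vacancy
        grid[ei, ej], grid[i, j] = t, 0
    return np.mean([same_frac(i, j, grid[i, j]) for i, j in np.argwhere(grid != 0)])

print(schelling(tau=0.3), schelling(tau=0.6))  # mild vs strong segregation
```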

Note that once a significant number of unhappy residents start moving to places where they are happy, feedback loops appear: the neighborhoods they leave become less attractive to their own type, and the ones they join become more attractive. So when residents are less tolerant (higher \(\tau\)), the system self-organizes into highly segregated clusters.

The interesting part is the sharpness of this transition. There is a critical value of \(\tau\) where the system’s behavior flips. Even a seemingly small threshold (depending on occupancy, neighborhood and the move rule) can be enough to tip a mixed society into a highly segregated one. The global pattern of segregation ends up being far more severe than any individual’s stated preference.

Schelling’s model can be connected to the Ising model (and setting up an analogous model is left as a straightforward exercise for the reader). We can define an “unhappiness” function that is analogous to the energy or Hamiltonian. A state has lower energy if fewer residents are unhappy. The dynamic of residents moving to reduce their unhappiness is a form of energy minimization. The tolerance threshold \(\tau\) plays a role similar to that of temperature or the coupling constant, controlling the “strength” of the preference for alignment. Note that the total residents of each type are conserved here, so Glauber dynamics (which don’t conserve spin) are not sufficient - we instead need something like the Kawasaki dynamics. Also, a good formula for this transition threshold is not known for this problem, but simulations do show sharp transitions.

Figure 17: Fig 4E - Schelling segregation vs tolerance tau

Summary:

  • Knobs: tolerance \(\tau\) and occupancy.
  • Behavior: jump from mixed to segregated.
  • Mechanism: feedback amplifies like-with-like neighborhoods, leading to segregation

Connections with other parts of the post:

  • This is similar to Ising - like-prefers-like (alignment), clusters grow and interfaces have a cost.
  • With moves that swap positions (unlike the Glauber dynamics we saw), this resembles conserved dynamics (see Kawasaki dynamics) and is connected to the Cahn-Hilliard coarsening that we look at later.
  • The jump in segregation as \(\tau\) changes is similar to threshold-driven bifurcations.

Reinforcement, herding, and heavy tails

Without any coupling or special structure, repeated random fluctuations tend to average out: independent additive noise plus bounded feedback gives thin tails and concentration (central limit theorem, Hoeffding bounds). The moment we introduce positive feedback that strengthens what already happened, independence breaks, and large deviations become far more likely. We’ll therefore discuss reinforcement and herding in this section, and some of their characteristics: rich-get-richer dynamics, path dependence, lock-in, and heavy-tailed statistics.

We will look at three archetypes:

  • Polya-type reinforcement and the Kirman-Folmer herding model (path dependence and bistability)
  • The sandpile model (self-organized criticality and power-law avalanches)
  • Multiplicative growth and Kesten processes (how random multiplicative amplification creates Pareto tails)

We will also contrast when to expect concentration (independence, boundedness) vs fat tails (reinforcement, multiplicative cascades, criticality).

Polya’s urn: reinforcement and path dependence

Preview: Rich-get-richer dynamics - early luck sticks and determines the limiting behavior.

The state consists of \(R_t\) and \(B_t\) - the number of red and blue balls at time \(t\). The total is \(N_t = R_t + B_t\).

At each step, we draw one ball uniformly at random, observe its color, return it to the urn, and add another ball of the same color. If red is drawn, \((R_{t + 1}, B_{t + 1}) = (R_t + 1, B_t)\); if blue, \((R_{t + 1}, B_{t + 1}) = (R_t, B_t + 1)\). There is clear positive feedback here - the probability of drawing red is \(R_t/N_t\). A red draw increases \(R\), which increases the probability of drawing red next time.

A useful quantity in this case is the fraction of red balls \(p_t = R_t/N_t\). A short calculation shows that \(p_t\) is a martingale:

\[ \mathbb{E}[p_{t+1} \mid p_t] = \frac{R_t}{N_t}\cdot \frac{R_t+1}{N_t+1} + \frac{B_t}{N_t}\cdot \frac{R_t}{N_t+1} = \frac{R_t}{N_t} = p_t \]

Since \(p_t \in [0, 1]\) is a bounded martingale, the martingale convergence theorem says it converges almost surely to a random limit \(p_\infty\).

What is the distribution of \(p_\infty\)? If we start with \(R_0 = a\) and \(B_0 = b\), it can be shown that \(p_\infty \sim \text{Beta}(a, b)\). Similarly, for a fixed time step \(n\), the number of red draws is \(\text{Beta-Binomial}(n; a, b)\).

This shows some interesting macroscopic behavior. From the same initial condition \((a, b)\), different realizations end up with different final proportions, rather than converging to the same value, as the central limit theorem would have suggested in the no-reinforcement case. So, there is a sense of path dependence here that affects the final state. Also, the distribution \(p_\infty\) is quite broad (but not heavy-tailed - the support is \([0, 1]\), as it should be for a fraction). Proportional reinforcement makes early shocks have persistent effects, and this also shows up in heavy tails once we move to unbounded quantities.
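
A quick simulation sketch of the urn (run counts and step counts are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)

def polya_fraction(a=1, b=1, steps=2_000):
    """One Polya urn run starting from (a, b); returns the final red fraction."""
    R, B = a, b
    for _ in range(steps):
        if rng.random() < R / (R + B):
            R += 1
        else:
            B += 1
    return R / (R + B)

# From the same start, different runs land all over (0, 1). For a = b = 1 the
# limiting law is Beta(1, 1), i.e. uniform: mean ~ 1/2, variance ~ 1/12.
samples = np.array([polya_fraction() for _ in range(1_000)])
print(samples.mean(), samples.var())
```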

Figure 18: Fig 5A - Polya urn sample paths (p_t trajectories)

Figure 19: Fig 5B - Terminal p_inf histogram (Beta)

A similar idea called preferential attachment can be found in network growth, and leads to heavy tails. If each new node attaches to existing nodes with probability proportional to their current degree (this is called the Barabasi-Albert model), this is another instantiation of “rich get richer”. We can show that the resulting degree distribution has a power-law tail \(P(\text{deg} \ge k) \sim k^{-2}\) (or \(P(\text{deg} = k) \sim k^{-3}\)). Proof sketch: write the rate of change of the “continuous” degree of a node introduced at time \(T\) and integrate; its degree at a later time \(t\) grows like \((t/T)^{1/2}\). A node has degree \(> k\) exactly when it was introduced early enough, before a time \(\propto t/k^2\), and the number of such nodes is itself \(\propto t/k^2\), which is the required power law in \(k\). The coupling “probability \(\propto\) current size” is the same reinforcement as we’ve seen here. For this model specifically, we also see a first-mover advantage - older nodes generally have more edges.
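
Here’s a sketch of this growth process with one edge per new node; sampling a uniform entry of the edge-endpoint list is a standard trick for degree-proportional sampling (sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)

def ba_degrees(n=20_000):
    """Grow a Barabasi-Albert graph, one edge per new node. Picking a uniform
    entry of the edge-endpoint list selects a node with probability
    proportional to its degree - the preferential-attachment rule."""
    endpoints = [0, 1]  # start from a single edge between nodes 0 and 1
    deg = np.zeros(n, dtype=int)
    deg[0] = deg[1] = 1
    for v in range(2, n):
        u = endpoints[rng.integers(len(endpoints))]  # degree-biased choice
        endpoints += [u, v]
        deg[u] += 1
        deg[v] += 1
    return deg

deg = ba_degrees()
for k in (1, 2, 4, 8, 16, 32):
    print(k, (deg >= k).mean())  # CCDF; on log-log axes the slope is ~ -2
```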

Figure 20: Fig 5C - Preferential attachment degree tail (log-log)

Summary:

  • Knobs: initial counts \((a, b)\).
  • Behavior: path dependence, broad limiting distribution on \([0, 1]\) instead of concentration.
  • Mechanism: proportional reinforcement preserves early dynamics (persistence).

Connections with other parts of the post:

  • Preferential attachment networks have the same probability \(\propto\) current size local rules and this leads to heavy tailed degrees and a similar persistence of what happens early on.
  • Kesten processes are somewhat related - multiplicative amplification is preferential in some sense.
  • Additive noise in well-mixed systems leads to concentration (and convergence to a point instead of a distribution), which is something that explicitly does not happen here.

Kirman-Folmer herding: bistability

Preview: Two forces compete: idiosyncratic switching \(\epsilon\) vs herding \(h\). When herding dominates, the system spends long times near the extremes (bistability).

This is a simple continuous-time herding model for \(N\) agents choosing between two options (e.g., two food sources for ants).

Let \(i\) be the number of agents in option A; then \(N-i\) are in option B. The state here will be \(i\). The local rule is that an agent can switch spontaneously (idiosyncratic switching) at rate \(\varepsilon\), or be attracted by the number already in the other option (herding) at rate proportional to that number with coefficient \(h > 0\). The transition rates for \(i\) are \(q_{i \to i + 1} = \varepsilon (N - i) + h i (N - i)\) and \(q_{i \to i - 1} = \varepsilon i + h i (N - i)\), where \(q_{i \to i + 1}\) means “a B joins A”, and \(q_{i \to i - 1}\) means “an A joins B”.

This defines an irreducible birth-death Markov chain on \(\{0, 1, \dots, N\}\). The stationary distribution \(\pi\) can be found by solving the equilibrium constraints - \(\pi(i) q_{i \to i + 1} = \pi(i + 1) q_{i + 1 \to i}\). Solving the resulting recursion gives a Beta-Binomial law:

\[ \pi(i) \propto \binom{N}{i} \frac{\Gamma(\alpha + i) \Gamma(\alpha + N - i)}{\Gamma(\alpha)^2} \]

Where \(\alpha = \frac{\varepsilon}{h}\).

In the large-\(N\) limit, the proportion \(x = i / N\) has a \(\text{Beta}(\alpha, \alpha)\) stationary density.

If \(\alpha\) is large (idiosyncratic switching dominates), the density is unimodal around \(x = \frac{1}{2}\). The population splits roughly evenly. And if \(\alpha\) is small (herding dominates), the density becomes U-shaped and the chain spends long periods near the extremes \(x \approx 0\) or \(x \approx 1\), with intermittent switches between them. This is bistability with stochastic switching.
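
A small simulation sketch (we sample the embedded jump chain rather than weighting states by holding times, which is enough to see the shape; the parameter values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)

def kirman_fractions(N=100, eps=0.002, h=0.01, steps=200_000):
    """Simulate the embedded jump chain of Kirman's model and return the
    visited fractions i/N. (Weighting by holding times would be more precise,
    but the jump chain is enough to see the shape of the distribution.)"""
    i, xs = N // 2, np.empty(steps)
    for t in range(steps):
        up = eps * (N - i) + h * i * (N - i)    # rate of i -> i + 1
        down = eps * i + h * i * (N - i)        # rate of i -> i - 1
        i += 1 if rng.random() < up / (up + down) else -1
        xs[t] = i / N
    return xs

# alpha = eps/h = 0.2 < 1: U-shaped histogram (herding dominates). Raising
# eps to 0.05 (alpha = 5) makes it unimodal around 1/2.
x = kirman_fractions()
print(np.histogram(x, bins=10, range=(0, 1))[0])
```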

Figure 21: Fig 5D - Kirman stationary density Beta(alpha, alpha)

This model makes explicit how a simple herding term changes the stationary distribution from concentrated to bimodal and heavy at the boundaries (for the proportion). It is also mean-field in flavor, like the threshold and Ising ideas from earlier, but with inherently stochastic switching.

Summary:

  • Knobs: \(\alpha = \epsilon / h\)
  • Behavior: unimodal (balanced) vs bimodal (herding dominated)
  • Mechanism: attraction to the majority affects the shape of the stationary distribution

Connections with other parts of the post:

  • Note how this is similar to Ising’s two ordered phases - the population lives around extremes in the bimodal case, but switching remains possible.

Sandpiles: self-organized criticality and avalanches

Preview: Slow addition + local thresholds + boundary loss push the system to criticality without tuning (self-organized). Avalanche sizes follow a power law.

The sandpile model is a simple example (but extremely rich, related to the chip-firing game and surprisingly many areas of math) of self-organized criticality (SOC): slow driving plus threshold dynamics and conservation push the system to a critical state without external fine-tuning (i.e., no parameter like temperature needs to be added to exhibit criticality). At criticality, responses (“avalanches”) have no characteristic size; and their distribution follows a power law.

Let’s consider a 2D square grid. Each site \(i\) has a non-negative integer height \(z_i\). We add one grain at a uniformly random site \(i\). After we’ve added a grain, we check each site for the possibility of toppling: if \(z_j \ge 4\), then the site \(j\) “topples” (\(z_j \leftarrow z_j - 4\)) and each of its neighbors receives a \(+1\). Grains sent to missing neighbors (at the boundaries) drop off the grid. We repeat this until no site can topple. The set of topplings triggered by the addition of a grain is called an avalanche; the avalanche size is the number of distinct sites toppled (or the total number of topplings - both have scale-free statistics, but usually with different exponents).
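
Here’s a minimal simulation sketch, using the total number of topplings as the avalanche size (grid size and grain count are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(5)

def avalanche_sizes(L=30, grains=30_000):
    """Abelian sandpile on an L x L open grid: drop grains at random sites,
    topple sites with height >= 4, record the number of topplings per grain.
    Grains pushed past the boundary are lost."""
    z = np.zeros((L, L), dtype=int)
    sizes = []
    for _ in range(grains):
        i, j = rng.integers(0, L, size=2)
        z[i, j] += 1
        topplings = 0
        stack = [(i, j)] if z[i, j] >= 4 else []
        while stack:
            a, b = stack.pop()
            if z[a, b] < 4:
                continue
            z[a, b] -= 4
            topplings += 1
            for na, nb in ((a + 1, b), (a - 1, b), (a, b + 1), (a, b - 1)):
                if 0 <= na < L and 0 <= nb < L:
                    z[na, nb] += 1
                    if z[na, nb] >= 4:
                        stack.append((na, nb))
            if z[a, b] >= 4:
                stack.append((a, b))
        sizes.append(topplings)
    return np.array(sizes)

s = avalanche_sizes()[10_000:]  # drop the transient, then plot the CCDF
print(np.mean(s > 0), np.max(s))
```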

There are two structural properties to note here: far from the boundaries, toppling only redistributes grains. The slow addition increases the total mass until dissipation events balance it on average. The presence of these boundaries is important, as the power-law distribution of avalanches only holds up to a cutoff scale that is dependent on the system size. The final stable configuration after an addition also does not depend on the order of topplings (which is why this is also called an Abelian sandpile model).

After an initial transient, the system reaches a stationary ensemble of “recurrent” minimally stable configurations (in the Markov chain sense), all reachable from one another. In this specific case, there is a deep result connecting the recurrent states to the graph structure of the grid: the Kirchhoff matrix-tree theorem gives a bijection between the recurrent states and the spanning trees of the graph. In this SOC state, the avalanche size distribution obeys a power law \(P(s \ge S) \sim C S^{-(\tau-1)}\) or \(P(s = S) \sim C' S^{-\tau}\) with an exponent \(\tau > 1\) that depends on the dimension. Deriving this requires a lot of theory, so we will omit it.

Figure 22: Fig 5E - Sandpile avalanche sizes (PDF/CCDF, log-log)

Why would power laws arise? A branching-process heuristic (that gives the wrong answer, but helps build intuition) would be the following: think of a toppling as “creating” new potential topplings among its neighbors. Let \(m\) be the average number of new topplings caused by one toppling. If \(m<1\), avalanches die out quickly (exponential tail); if \(m>1\), they blow up. The slow drive plus conservation (local count of grains) tunes the system to \(m\approx 1\), i.e., criticality. We can roughly think of it as being connected to a metric such as the average height - if the average height is too low, avalanches are small and dissipation is slow, so grains accumulate, and if it is too high, dissipation outpaces accumulation - so there is some sort of negative feedback here that brings such a metric to an equilibrium value. For a critical Galton-Watson process (a type of branching process) with finite variance, the total progeny has a tail \(P(s = S) \sim c S^{-3/2}\). The sandpile is not exactly a branching process (spatial correlations develop between sites, and loops are possible, which a branching process cannot capture), but this roughly approximates the mechanism: at criticality, there is no characteristic scale, hence power laws.

What is new here compared to earlier threshold models is that no parameter is tuned to a critical value externally. This behavior is also universal (for more reading, see universality classes); the critical exponents like \(\tau\) are robust and do not depend on microscopic details like the exact toppling threshold, but rather on general properties like the system’s dimension. The combination of slow input and boundary dissipation self-organizes the system to criticality (but it depends on timescale separation - the addition of grains is slow enough that the avalanches always finish before the next grain is added). The result is heavy-tailed fluctuations in response to tiny perturbations.

Summary:

  • Knobs: none (but grid sizes do affect scaling laws).
  • Behavior: self-organized criticality - system naturally goes to a critical state (an attractor), and such a state has power law behavior in terms of avalanche sizes/durations, as long as the scales are smaller than the size of the board.
  • Mechanism: conservation in the bulk, thresholds and slow drive

Connection to the rest of the post:

  • The system stabilizes on its own due to a separation of timescales - as we see later in the toolbox, at separated timescales, the solution to the system’s equations lives on a manifold, which here turns out to be a set of recurrent states.
  • As we saw in Ising, at criticality, the correlation scales blow up at \(T_c\). Something similar happens here too, as the system (when started from an empty board) goes from a below-criticality state to criticality.
  • Universality classes are an important concept we briefly talked about here - this concept shows up in a large variety of models such as the Ising model, but is out of scope for this blog post.

Multiplicative growth and Kesten tails

Preview: Random multiplicative amplification - with occasional multipliers above one and an additive floor - produces stationary Pareto tails, whose exponent is determined by the distribution of the multipliers.

A second, probably much more common way to get to heavy tails is multiplicative amplification. Consider the linear stochastic recurrence (Kesten process):

\[ X_{t+1} = A_t X_t + B_t \]

where \(\{(A_t, B_t)\}\) are i.i.d., and \(A_t \ge 0\), \(B_t \ge 0\), and \(X_t \ge 0\). Under mild conditions (e.g., \(\mathbb{E}[\log A_t] < 0\) so that the process does not explode on average, \(P(A_t > 1) > 0\), so occasional amplifications occur, \(P(B_t > 0) > 0\), aperiodicity, and \(B_t\) has light tails), there exists a unique stationary distribution. Its upper tail is a power law:

\[ \lim_{x \to \infty} x^\kappa P(X > x) = C \in (0, \infty) \]

where the tail index \(\kappa > 0\) is the unique solution to \(\mathbb{E}\left[A_t^{\kappa}\right] = 1\).

This is the Kesten-Goldie theorem (we omit the proof due to length). The intuition is that rare runs in which \(A_t\) is greater than 1 produce very large values of \(X\) - the probability of such runs decays geometrically, and the balance of these probabilities with the multiplicative growth produces a power law.
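
A quick numerical sketch with lognormal multipliers, for which \(\mathbb{E}[A^\kappa] = 1\) has a closed-form solution (the parameter values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(6)

def kesten_samples(mu=-0.1, sigma=0.5, T=200_000, burn=1_000):
    """X_{t+1} = A_t X_t + B_t with lognormal A (E[log A] = mu < 0 but
    P(A > 1) > 0) and constant B = 1. Returns post-burn-in samples."""
    X, out = 1.0, np.empty(T - burn)
    for t in range(T):
        X = np.exp(rng.normal(mu, sigma)) * X + 1.0
        if t >= burn:
            out[t - burn] = X
    return out

# For lognormal A, E[A^k] = exp(k*mu + k^2 * sigma^2 / 2) = 1 gives the tail
# index kappa = -2*mu/sigma^2 = 0.8 here. A crude check: if the tail is
# Pareto, P(X > 100)/P(X > 10) should be around 10^(-0.8) ~ 0.16.
X = kesten_samples()
print(np.mean(X > 100) / np.mean(X > 10))
```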

Figure 23: Fig 5F - Kesten process tail check (slope -kappa)

There are a lot of examples of such behaviors in economics: if wealth grows roughly multiplicatively with random shocks, with small additive floors (\(B_t\)), stationary wealth distributions display Pareto tails. Similarly, multiplicative cascades in volatility models (e.g., stochastic volatility) can produce heavy-tailed returns. Connecting this to preferential attachment and Yule processes, expected degree grows proportionally to current degree; the induced multiplicative advantage yields heavy-tailed degree distributions. This is a discrete analogue of Kesten growth.

Note the contrast with additive models: for \(X_{t+1} = X_t + \xi_t\) with i.i.d. \(\xi_t\) having finite variance, stationary fluctuations (if any) are thin-tailed (Gaussian-like), and large deviations are exponentially rare. So, multiplicative amplification changes the picture significantly even qualitatively.

Summary:

  • Knobs: distribution of \(A\) and \(B\). Stationarity needs \(E[\log A] < 0\) with some mass on \(A > 1\) (along with some other technical conditions).
  • Behavior: Pareto tail with index \(\kappa\) determined by \(E[A^\kappa] = 1\).
  • Mechanism: rare multiplicative run-ups balance against their geometrically decaying probabilities to create a power law.

Connections with other parts of the post:

  • While Polya’s shares are bounded, Kesten gives unbounded heavy tails with power-law decay.
  • Again, additive-noise models concentrate to distributions where large deviations are exponentially rare, which is unlike the power law tails we see here.

Swarms and distributed coordination

We now turn to distributed alignment: how local rules that couple “headings” or velocities generate coherent large-scale motion. As in Ising, local coupling competes with noise and order parameters capture global motion. We will look at two canonical models and the coarse-grained picture they lead to. As before, keep track of: state variables, local update, coupling/topology, noise, and macroscopic summaries (order parameters, conserved quantities, convergence criteria).

Vicsek alignment: headings, noise, and a mean-field self-consistency

Preview: Local heading alignment versus noise: at low noise/high density the group picks a common direction (order parameter \(\phi > 0\)); at high noise it stays disordered (\(\phi \approx 0\)). Mean-field captures the onset but misses traveling bands and hysteresis seen at finite speed.

Let’s look at the state and the local rule. Agents \(i = 1, \dots, N\) have positions \(x_i(t) \in \mathbb{R}^2\) and “headings” (directions they’re heading towards) \(\theta_i(t) \in (-\pi, \pi]\). All of them move with a fixed speed \(v > 0\). With “metric” interactions (radius \(r\)), at discrete time step \(\Delta t\), the dynamics are as follows:

\[ \theta_i(t + \Delta t) = \text{arg} \left(\sum_{j: \|x_j(t) - x_i(t)\| \le r} e^{i \theta_j(t)}\right) + \eta \xi_i(t) \]

where \(\xi_i(t)\) are i.i.d. zero-mean. Positions update by

\[ x_i(t + \Delta t) = x_i(t) + v \Delta t (\cos \theta_i(t + \Delta t), \sin \theta_i(t + \Delta t)) \]

We define the “order parameter” as

\[ \phi(t) = \frac{1}{N} \left\| \sum_{i=1}^N e^{i\theta_i(t)} \right\|\in[0,1]. \]

Randomly distributed headings give \(\phi \approx 0\) and near-perfect alignment of directions gives \(\phi \approx 1\). The dynamics are rotationally symmetric; in an ordered phase the symmetry is broken by selecting a macroscopic heading, similar to what we saw in the Ising model.
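
Here’s a compact simulation sketch of these update rules. The box size, speed, and noise levels are arbitrary, and we take the noise uniform on \([-\eta\pi, \eta\pi]\), which is one common convention:

```python
import numpy as np

rng = np.random.default_rng(7)

def vicsek_phi(N=300, L=10.0, r=1.0, v=0.3, eta=0.2, steps=400):
    """Discrete-time Vicsek model on a periodic box; returns the final order
    parameter phi. Noise is uniform on [-eta*pi, eta*pi]."""
    x = rng.uniform(0, L, size=(N, 2))
    th = rng.uniform(-np.pi, np.pi, size=N)
    for _ in range(steps):
        d = x[:, None, :] - x[None, :, :]
        d -= L * np.round(d / L)                 # periodic wrap-around
        nbr = (d ** 2).sum(-1) <= r ** 2         # includes self
        mean_dir = (nbr * np.exp(1j * th)[None, :]).sum(axis=1)
        th = np.angle(mean_dir) + eta * np.pi * rng.uniform(-1, 1, size=N)
        x = (x + v * np.c_[np.cos(th), np.sin(th)]) % L
    return np.abs(np.exp(1j * th).mean())

print(vicsek_phi(eta=0.1), vicsek_phi(eta=0.8))  # ordered vs disordered
```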

Figure 24: Fig 6A - Vicsek order parameter vs noise/density

Consider the mean-field (well-mixed, continuous-time) limit. At high density (mean neighbor count \(m \approx \rho \pi r^2\), with \(\rho = N / L^2\) on a periodic square of side \(L\)) and small \(\Delta t\), one can pass to a drift-diffusion in angle (use a partial change in \(\theta\) with an interpolation fraction \(\alpha\) between the initial and final angle, set \(\alpha = K \Delta t\) and \(\eta = \sqrt{2D \Delta t}\), and use \(\sin\) to approximate the “unwrapping” of deviations from \(\psi\) to first order, which can also be interpreted as an Ising-type model with a cosine-in-angle-diff potential):

\[ d\theta_i = K r(t) \sin(\psi(t) - \theta_i) dt + \sqrt{2D} dW_i(t) \]

where \(r e^{i\psi} = \frac{1}{N} \sum_j e^{i \theta_j}\), \(K > 0\) scales with alignment strength times the mean neighbor count, and \(D\) is an angular diffusivity induced by noise (\(D\) increases with \(\eta\)). Let \(f(\theta,t)\) be the density on the circle. Using the Fokker-Planck equation, we get that

\[ \partial_t f = D\partial_\theta^2 f + \partial_\theta \left( K r \sin(\theta-\psi) f \right) \]

In a steady state with fixed \(\psi\) (we can without loss of generality set \(\psi=0\)), integrating once yields \(Df'(\theta) + K r \sin \theta f(\theta) = 0\) which gives \(f(\theta) = \frac{1}{2\pi I_0(\kappa)} e^{\kappa\cos\theta}\) where \(\kappa = \frac{K r}{D}\). This is the von Mises distribution (the angular equivalent of the normal distribution in the sense of the central limit theorem).

The order parameter then satisfies the following equation, derived from self-consistency:

\[ r = \int_{-\pi}^{\pi} \cos \theta f(\theta) d\theta = \frac{I_1(\kappa)}{I_0(\kappa)} = \frac{I_1 \left(\frac{K}{D} r \right)}{I_0 \left(\frac{K}{D} r \right)} \]

Expanding \(I_1/I_0\) at a small argument gives \(r \approx \frac{K}{2D} r - \frac{1}{16}\left(\frac{K}{D}\right)^3 r^3 + \cdots\). When \(K > K_c = 2D\), we see a bifurcation: a nonzero solution \(r > 0\) appears alongside \(r = 0\).

We can interpret this as the density and noise having a tradeoff, since \(K\) grows with the mean neighbor count \(m\) and \(D\) with \(\eta\).
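
The self-consistency equation is also easy to solve numerically - a sketch, assuming SciPy’s modified Bessel functions:

```python
import numpy as np
from scipy.optimize import brentq
from scipy.special import iv  # modified Bessel functions I_nu

def order_parameter(K, D):
    """Solve r = I1(K r / D) / I0(K r / D). A nonzero root exists iff K > 2D
    (the small-r slope is K / (2D)); below that, only r = 0 remains."""
    g = lambda r: iv(1, K * r / D) / iv(0, K * r / D) - r
    return brentq(g, 1e-9, 1.0) if K > 2 * D else 0.0

print([round(order_parameter(K, 1.0), 3) for K in (1.5, 2.5, 4.0)])
```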

There are some caveats to this solution (as with all mean-field solutions) though. The mean-field argument ignores the coupling between density and order created by self-propulsion. In the metric Vicsek model at finite speed and size, we observe traveling high-density (and high-order) bands and a weakly first-order transition with hysteresis. Still, the mean-field picture captures a bunch of important things - the existence of a disorder-order transition, von Mises stationary angles deep in the ordered phase, and the density-noise tradeoff.

Figure 25: Fig 6B - Vicsek snapshots: disordered gas vs aligned band

This model specifically has been studied a lot for understanding collective motion, and there exist theories that treat it both in a hydrodynamic context and at arbitrary particle densities, and study the dynamics of this system.

Summary:

  • Knobs: noise \(\eta\) and density through neighbor count
  • Behavior: disorder-order transition, bands and hysteresis
  • Mechanism: local averaging of headings beats random angular forces

Connections with other parts of the post:

  • Mean-field consistency leads to a bifurcation here that is similar to the Ising case pitchfork bifurcation.
  • The hydrodynamic equations below explain bands and hysteresis in this model (in the continuous limit) that the mean-field model fails to describe.
  • Cucker-Smale is a related continuous time velocity-alignment model which also uses local averaging to reduce disagreement between peers.

Cucker-Smale: continuous-time velocity alignment and convergence

Preview: With distance-weighted velocity averaging, the velocity variance decays: velocities align if interactions are strong/long-ranged enough.

In this model, agents \(x_i(t)\in\mathbb{R}^d\), \(v_i(t)\in\mathbb{R}^d\) follow \(\dot{x}_i = v_i\), \(\dot{v}_i = \frac{1}{N} \sum_{j=1}^N \psi(\|x_i - x_j\|)(v_j - v_i)\) with a non-increasing influence kernel, e.g., \(\psi(r) = \frac{K}{(1 + r^2)^\beta}\) with \(K > 0\) and \(\beta \ge 0\) - each velocity relaxes toward a weighted average of the others.

Let’s look at some invariants and potential monovariants. By summing over the acceleration equations, the mean velocity is conserved. Consider the variance in velocity (or, kinetic energy in the center-of-mass frame, which we expect to reduce due to the dissipative force terms) \(\mathcal{V}(t) = \frac{1}{2N} \sum_i \|v_i-v_c\|^2 = \frac{1}{4N^2} \sum_{i,j} \|v_i-v_j\|^2\). A simple calculation gives \(\dot{\mathcal{V}}(t) = -\frac{1}{2N^2} \sum_{i,j} \psi(\|x_i-x_j\|) \|v_i-v_j\|^2 \le -2 \psi(R(t)) \mathcal{V}(t)\) where \(R(t)=\max_{i,j}\|x_i-x_j\|\). Hence \(\mathcal{V}(t) \le \mathcal{V}(0) \exp \left(-2 \int_0^t \psi(R(s)) ds \right)\), which bounds the variance as time grows.
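
A short Euler-integration sketch that shows this variance decay (the kernel parameters and step sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(8)

def velocity_variance(N=50, K=1.0, beta=0.3, dt=0.01, steps=5_000):
    """Euler integration of Cucker-Smale; returns the velocity variance
    V(t) over time, which should decay towards zero (flocking)."""
    x = rng.normal(0, 5, size=(N, 2))
    v = rng.normal(0, 1, size=(N, 2))
    var = np.empty(steps)
    for t in range(steps):
        d = np.linalg.norm(x[:, None] - x[None, :], axis=-1)
        w = K / (1 + d ** 2) ** beta
        dv = (w[..., None] * (v[None, :] - v[:, None])).mean(axis=1)
        x, v = x + dt * v, v + dt * dv
        var[t] = ((v - v.mean(axis=0)) ** 2).sum() / (2 * N)
    return var

V = velocity_variance()
print(V[0], V[-1])  # variance at the start vs after integration
```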

Figure 26: Fig 6C - Cucker-Smale velocity variance convergence

Let \(U(t) = \max_{i, j} \|v_i(t) - v_j(t)\|\) be the velocity spread. Taking the maximizing pair and using the acceleration definition, we get \(\dot{R}(t) \le U(t)\) and \(\dot{U}(t) \le -\psi(R(t)) U(t)\). So if we define \(L(t) = U(t) + \int_0^{R(t)} \psi(s) ds\), then \(\dot{L} \le 0\), which means \(L\) is non-increasing. This means that if \(\int_0^\infty \psi(s)ds = \infty\), then \(R(t)\) stays bounded and \(\mathcal{V}(t)\to 0\), i.e., velocities align to the conserved \(v_c\). When this integral is finite, flocking still occurs when the initial spread is small enough (more specifically, \(\int_{R(0)}^\infty \psi(s) ds > U(0)\) implies \(R(t)\) stays bounded, because if \(R\) tried to exceed some \(R'\) with \(\int_{R(0)}^{R'} \psi(s) ds > U(0)\), then \(L(t) \le L(0)\) would force \(U(\cdot) < 0\), a contradiction).

In particular, for \(\psi(r)=K(1+r^2)^{-\beta}\) with \(\beta \le \frac{1}{2}\), the integral diverges, so velocities always converge.

We can also look at this as a graph. With weights \(w_{ij}(t)=\psi(\|x_i-x_j\|)/N\) and Laplacian \(L(t)\), we have \(\dot{v}=-L(t)v\). The decay of \(\mathcal{V}\) is controlled by the time-varying algebraic connectivity. A slowly decaying \(\psi\) means connectivity is more persistent, which makes consensus more robust.

It is also interesting to see what happens if we added some noise to the acceleration equations - this is left as an exercise for the reader.

Summary:

  • Knobs: influence kernel decay \(\beta\) and the initial spread.
  • Behavior: convergence of velocities (flocking) if the total influence dominates in the end.
  • Mechanism: variance acts as a Lyapunov function and the mean velocity is conserved.

Connection to the rest of the post:

  • Note how the non-increasing variance as a Lyapunov function is an example of the monovariant idea we developed earlier.
  • See the networks and spectra section in the toolbox for some more discussion.
  • This is also quite similar to Vicsek in terms of having alignment terms.

Hydrodynamics and long-range order

Preview: A continuum version of Vicsek: alignment gain, pressure, and viscosity determine whether uniform motion is stable. If alignment grows with density, long waves can destabilize, creating traveling bands and hysteresis.

Note that this section is worth revisiting after the section on spatial dynamics.

When we have a large number of particles, we move to a continuous system (a field) by introducing the density \(\rho(x,t)\) and mean velocity \(u(x,t)\) (“polarization”). We will consider a process that shows interesting behavior. Firstly, as is expected from matter, we have a mass-conservation equation:

\[ \partial_t \rho + \nabla \cdot(\rho u) = 0 \]

The actual process is driven by the following differential equation:

\[ \partial_t u + c (u \cdot \nabla) u = \alpha u - \beta |u|^2 u - \nabla P(\rho) + \nu \nabla^2 u + \xi \]

with constants \(c, \alpha, \beta, \nu \ge 0\), an effective pressure \(P(\rho)\) increasing in \(\rho\), and a zero-mean noise \(\xi\).

This is a special case of the Toner-Tu equation from hydrodynamics. Here, we can intuitively read off what each term does. The \(c (u \cdot \nabla) u\) term is an advection term that basically transports the flow along the velocity vector (i.e., values of \(u\) are carried downstream by the current \(u\)). The terms with \(\alpha\) and \(\beta\) form a non-linear function that drives \(|u|\) to the steady value \(\sqrt{\alpha/\beta}\) when \(\alpha > 0\), and damps \(u\) to zero when \(\alpha \le 0\). The \(P(\rho)\) is a pressure that increases with \(\rho\), so high density regions push outwards (force due to “compression pressure”). The \(\nabla^2 u\) term is a Laplace smoothing term (viscosity) that finishes off small scale (high frequency) patterns in \(u\) (the easiest way to see this is to try to simulate the Euler step discretely for this - it looks like a local averaging). And the \(\xi\) term simulates noise.

Let’s try to reconcile this back with the Vicsek model (discrete agents instead of the continuous fields we’ve been considering). In that model, we saw alignment gain beating random noise as the reason behind the phase transition. Here \(\alpha\) is what does that job: when \(\alpha \le 0\), we get \(u\) dying down (so no polarization), and when \(\alpha > 0\), we see that in the steady state, \(\alpha = \beta |u|^2\). Intuitively speaking, \(\alpha\) increases with how strong the alignment is and the neighborhood density, \(\nu\) is an effective viscosity that diffuses polarization, \(\xi\) sets the level of randomness, and \(c\) is a term that controls how strongly advection is important.

Doing a small perturbation in \(\rho\) and \(u\) at the stable state (and ignoring \(\xi\)), and keeping only first order terms in the perturbations, and splitting the perturbations in \(u\) into parallel and perpendicular components, we get the following three equations:

Parallel component:

\[ \partial_t \delta u_\parallel + c |u_0| \partial_{\parallel} \delta u_\parallel = - P'(\rho_0) \partial_{\parallel} \delta \rho + \nu \nabla^2 \delta u_\parallel - 2 \beta |u_0|^2 \delta u_\parallel \]

Perpendicular component:

\[ \partial_t \delta u_\perp + c|u_0| \partial_{\parallel} \delta u_\perp = - P'(\rho_0) \nabla_\perp \delta \rho + \nu \nabla^2 \delta u_\perp \]

And the mass conservation (continuity) equation:

\[ \partial_t \delta\rho + |u_0| \partial_{\parallel} \delta\rho + \rho_0(\partial_{\parallel} \delta u_\parallel + \nabla_\perp \cdot \delta u_\perp) = 0 \]

There is an advection term here as usual, only the parallel component has a restoring force, and the Laplace smoothing term damps both components. So for \(\alpha > 0\) and any wavenumber \(k > 0\), this is a stable equilibrium (unless \(\alpha\) is a function of \(\rho\), in which case this is not always the case - we look at this next). It might be worth looking at some specific instantiations of these equations to get a better feel for what is happening here (e.g., a periodic box and applying Fourier to this equation).

We now look at how bands and hysteresis show up here. Let’s look at bands first. In particle models with metric neighborhoods, alignment effectively strengthens with local density. Let’s say \(\alpha = \alpha(\rho)\) with \(\alpha’(\rho_0) > 0\). Let’s ignore the noise term \(\xi\) as earlier, linearize around a uniform ordered state \(\rho = \rho_0\) and \(u = u_0 \hat{e}\).

Since \((\rho,u)=(\rho_0,u_0)\) is a uniform ordered equilibrium, we have \(\alpha(\rho_0) u_0 - \beta u_0^3 = 0\), so \(u_0 = \sqrt{\frac{\alpha(\rho_0)}{\beta}}\).

Set \(a_0 = \alpha(\rho_0)\), \(a_1 = \alpha’(\rho_0)\), \(p_1 = P’(\rho_0)\).

For perturbations \(\delta \rho, \delta u \propto e^{\sigma t + ikx}\), we get \(\sigma \delta \rho + ik(u_0 \delta \rho + \rho_0 \delta u) = 0\), \(\sigma \delta u + ik c u_0 \delta u = (a_0 - 3 \beta u_0^2) \delta u + a_1 u_0 \delta \rho - ik p_1 \delta \rho - \nu k^2 \delta u\).

For a non-trivial solution, we have \(\det \begin{pmatrix} \sigma + ik u_0 & ik \rho_0\\ -a_1 u_0 + ik p_1 & \sigma + ik c u_0 + 2 \beta u_0^2 + \nu k^2 \end{pmatrix} = 0\).

Let \(\sigma(k)\) be the branch with \(\sigma(0) = 0\). Expanding \(\sigma(k) = s_1 k + s_2 k^2 + s_3 k^3 + \mathcal{O}(k^4)\), we get \(s_1 = -i \left(u_0 + \frac{a_1 \rho_0}{2 \beta u_0} \right)\) and \(s_2 = \frac{\rho_0 (-4 \beta^2 u_0^2 p_1 + a_1^2 \rho_0 - 2 a_1 \beta (c - 1) u_0^2)}{8 \beta^3 u_0^4}\). The expression for \(s_3\) is not needed for the instability threshold.

Note that the growth/decay at long wavelength (and small wave number \(k\)) is set by \(\Re \sigma \approx s_2 k^2\), so there is an instability if \(s_2 > 0\).

In the special case of \(c = 1\), we get \(a_1 > 2 \sqrt{\frac{\beta a_0 p_1}{\rho_0}}\). This can be interpreted as the density gain in alignment \(a_1\) being required to overcome the restoring pressure stiffness \(p_1\) (and the ordered-state non-linearity through \(a_0\) and \(\beta\)). Viscosity is not present at order \(k^2\) (it regularizes high \(k\) modes, i.e., modes with short wavelength).

Now, let’s look at what happens when we have two states: a disordered gas state and an ordered liquid state (\(u_g = 0\) and \(u_l = \sqrt{\alpha(\rho_l) / \beta}\)). We look for a 1D traveling wave with speed \(s\): \(\rho(x, t) = \rho(z)\), \(u(x, t) = u(z)\), \(z = x - st\), connecting the asymptotic states \((\rho, u) \to (\rho_g, u_g)\) as \(z \to -\infty\) and \((\rho, u) \to (\rho_l, u_l)\) as \(z \to +\infty\).

From continuity, we get \(\partial_z \left(\rho(u - s)\right) = 0\), so \(J = \rho(u - s)\) is constant. This means that \(\rho(z) = \frac{J}{u(z) - s}\). Plugging in the asymptotic conditions, we get \(s = \frac{\rho_l u_l}{\rho_l - \rho_g}\).

In the traveling frame, we get \(\nu u'' + (c u - s) u' = \alpha \left(\frac{J}{u - s}\right) u - \beta u^3 - \frac{d}{dz} P \left(\frac{J}{u - s}\right)\) where \('\) denotes \(d/dz\). Integrating this over \(z\) gives us the solutions to the equation.

Let’s consider an explicit parametric form \(\alpha = a(\rho - \rho_c)\) and \(P = c_s^2 \rho\) where \(a, c_s > 0\). This gives us \(u_0 = \sqrt{\frac{a (\rho_0 - \rho_c)}{\beta}}\). For the long-wavelength instability, we use the condition derived earlier (and assume \(c = 1\)) to get \(a \rho_0 > 4 \beta c_s^2 (\rho_0 - \rho_c)\). Now, using the relation between \(s\) and the densities, we can plug it back into the integrated equation to check coexistence for different density pairs (this requires numerical integration though).

Hysteresis arises in this system too - let’s see how. Consider the set of parameters for which the uniform disordered state \(\rho_0, u_0 = 0\) and the uniform ordered state \(\rho_0, u_0 > 0\) are both linearly stable and the traveling-wave constraints admit a connection between \((\rho_g, 0)\) and \((\rho_l, u_l)\) with \(\rho_g < \rho_l\). Then:

  • On down-sweeps (decreasing noise or increasing density), trajectories started near the disordered state remain there unless a finite-amplitude perturbation satisfies the traveling wave constraints, which delays the transition.
  • On up-sweeps, a pre-existing band remains admissible up to a larger noise (or lower density), which delays the loss of order.

This produces an interval where both states coexist and hence gives a hysteresis loop in the order parameter. Note how this is different from the mean-field analysis of the Vicsek model, which misses such hysteresis behavior.

Summary:

  • Knobs: alignment gain \(\alpha\) (can depend on density), pressure stiffness, viscosity.
  • Behavior: uniform order can destabilize into bands and hysteresis can be observed
  • Mechanism: coupling of density and polarization via advection and density-dependent alignment

Connections with other parts of the post:

  • Advection and density-dependent alignment terms explain traveling bands and hysteresis in Vicsek.
  • The \(-Dk^2\) term here acts in a manner similar to how it works in reaction-diffusion - \(k\)-dependent damping competes with alignment gain.

Selection dynamics

We now look at selection, i.e., what happens when “types” of things (e.g., species) reproduce or get copied in proportion to how well they do. We’ll see that the same set of ideas works here too. We will primarily look at the replicator equation, which is a simple model of how relative advantage can lead to changes in population composition. This can be treated as a more realistic variant of Hardy-Weinberg dynamics, where selection forces are at play.

Replicator dynamics

Preview: Copying in proportion to success; with two types it reduces to a logistic-like equation for population proportion; in potential settings the mean payoff acts like a Lyapunov function.

Let \(p(t) = (p_1, p_2, \dots, p_n)\) be the population composition of \(n\) types, with \(p_i \ge 0\) and \(\sum_i p_i = 1\). Let each type’s fitness be \(f_i(p)\). The replicator ODE is

\[\dot{p}_i = p_i (f_i(p) - \bar{f}(p))\]

where \(\bar{f}(p) = \sum_{k = 1}^n p_k f_k(p)\). This is derived from the population counts growing as \(\dot{N}_i = f_i(p) N_i\).

There are two immediate things to notice: \(p\) always remains in the probability simplex (i.e., when \(p_i = 0\), it stays at zero, and the sum of \(p_i\) is conserved), and the log ratio between two proportions is the integral of the difference between their fitnesses: \(\frac{d}{dt} \log(p_i/p_j) = f_i(p) - f_j(p)\).

Let’s first consider the special case when there are only two types A and B. The equation simplifies to \(\dot p = p (1 - p) \Delta f(p)\), where \(p\) is the population fraction of A and \(\Delta f(p)\) is the fitness difference between A and B. If this difference is constant, this becomes the logistic growth case we saw earlier. If payoffs depend linearly on \(p\), there are fixed points at \(0\), \(1\), and at another interior point that depends on the coefficients. The interior point is stable iff the game is an anti-coordination game, and \(0\) and \(1\) are stable if and only if it is a coordination game.
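
A tiny integration sketch for the two-type case; the payoff matrix below is an arbitrary coordination game:

```python
import numpy as np

def replicator_2type(a, b, c, d, p0, dt=0.01, steps=2_000):
    """Two-type replicator with linear payoffs: f_A = a*p + b*(1-p) and
    f_B = c*p + d*(1-p). Euler-integrates pdot = p(1-p)(f_A - f_B)."""
    p = p0
    for _ in range(steps):
        fA, fB = a * p + b * (1 - p), c * p + d * (1 - p)
        p += dt * p * (1 - p) * (fA - fB)
    return p

# Coordination game (a > c and d > b): the interior point p* = 1/3 is
# unstable, so trajectories starting on either side end up at 1 or 0.
print(replicator_2type(2, 0, 0, 1, p0=0.4), replicator_2type(2, 0, 0, 1, p0=0.2))
```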

Figure 27: Fig 7A - 2-by-2 replicator portraits

Let’s now go back to the \(n\) dimensional case. Firstly, consider a constant \(f\) for the sake of simplicity. Then we get \(\dot{\bar{f}} = \text{Var}_p(f) \ge 0\), which is a good monovariant (in the context of dynamical systems, such a function that helps us prove convergence to an equilibrium is called a Lyapunov function). Note that \(\bar{f}\) is precisely the function whose (standard Euclidean) gradient is \(f\). This motivates a more general observation: suppose \(f\) is the gradient of a function \(\Phi\). Then simple algebra shows that \(\dot{\Phi} = \text{Var}_p (\nabla \Phi) \ge 0\), so replicator dynamics do an ascent on \(\Phi\) (the geometry under which they do this ascent is defined by the Fisher information metric).

Figure 28: Fig 7B - Replicator ascent on a potential

There is another Lyapunov function that is useful here - the KL divergence between the equilibrium probabilities and the perturbed probabilities. Consider the problem where the fitness function vector is \(Ap\). Then it can be shown that an evolutionarily stable strategy is a stable equilibrium - the time derivative of \(D_{KL}(p^* \| p)\) is \(u(p, p) - u(p^*, p)\), where \(u\) is the symmetric payoff function (in game theoretic terms) and \(u(p, q) = p^T A q\); since an evolutionarily stable strategy satisfies \(u(p^*, p) > u(p, p)\) for nearby \(p \neq p^*\), the divergence decreases and perturbations die out.

Summary:

  • Knobs: payoff differences and game structure.
  • Behavior: population proportions (shares) move towards higher payoffs, and the interior vs boundary stability depends on the game.
  • Mechanism: relative payoff drives growth, and in potential games, we see ascent on the potential

Connections with other parts of the post:

  • The 2-type case is logistic growth with a payoff-dependent equilibrium.
  • In potential games, the mean payoff is a Lyapunov function, which is a monovariant that keeps increasing, and hence converges.

Replicator-mutator dynamics

Preview: Mutation keeps types from vanishing: with a positive mutation matrix, the long-run state sits in the interior and balances selection with mutation.

Selection generally does not operate in isolation. Mutation/innovation keeps types from dying out completely. Modeling mutation as a row-stochastic matrix \(Q\) (each row sums to one), where \(Q_{ji}\) denotes the probability that an offspring of type \(j\) has type \(i\), we have \(\dot{p}_i = \sum_j p_j f_j(p) Q_{ji} - \bar{f}(p) p_i\) (and in matrix form, \(\dot{p} = Q^T F(p) p - \bar{f}(p) p\), where \(F(p) = \text{diag}(f(p))\)).

If \(Q\) has positive entries (and is hence ergodic) and \(f\) is bounded below, then there is an attractor in the interior of the simplex and it balances selection and mutation. This can be shown by considering the equations at the boundaries of the simplex.

Let’s try a different type of mutator dynamics. If we, as before, consider a potential setting \(f = \nabla \Phi\), with \(\dot{p}_i = p_i (f_i - \bar{f}) + \tau(1/n - p_i)\) (uniform mutations independent of fitness), we get ascent (under the same geometry) of \(\Phi + \tau/n \sum_{i = 1}^n \log p_i\). The proof is simple: the time derivative of this value is \(\sum_i p_i \left(f_i - \bar{f} + \frac{\tau}{np_i} - \tau\right)^2 \ge 0\). Note that \(\tau/n \sum_{i = 1}^n \log p_i\) is proportional to the negative cross entropy under a ground truth of the uniform distribution - so the ascent also pushes \(p\) towards the uniform distribution. Under a logit mutation setup, instead of the reverse KL divergence, we would have had a forward KL divergence, which makes it an entropy term instead of a sum of log-probabilities.

Figure 29: Fig 7C - Replicator-mutator interior fixed point

Summary:

  • Knobs: mutation rates (\(Q\)) and the payoffs.
  • Behavior: interior attractor with positive mutation, trades off fit vs diversity.
  • Mechanism: selection term plus encouragement towards spread out probabilities instead of clustered ones

Connections with other parts of the post:

  • Mutation trading off against selection is like how noise pulls systems back from the corners (similar to how finite-temperature Ising is never perfectly saturated).

Ricardian trade as selection of specializations

Preview: With a relative price \(p\), each region specializes in the good where it has a lower relative labor cost (comparative advantage). As \(p\) moves between the two regions’ cost ratios, the efficient outcome flips to specialization in different goods.

Consider two goods \(X, Y\) and two regions \(A, B\). Producing one unit of \(s \in \{X, Y\}\) in \(r \in \{A, B\}\) needs \(a_s^r > 0\) units of labor. Each region has \(L^r\) units of labor. Let the outputs in region \(r\) be \((q_X^r, q_Y^r)\). The constraint is on the total units of labor used - \(\sum_s a_s^r q_s^r \le L^r\), \(q_s^r \ge 0\). Fix a relative price \(p = P_X / P_Y\).

It is optimal to use all labor units. Let \(s\) be the share of labor used by \(X\). Then \(q^r_X(s) = sL^r / a_X^r\) and \(q_Y^r(s) = (1 - s) L^r / a^r_Y\). The value created is proportional to \(sp/a^r_X + (1 - s)/a^r_Y\), which is affine in \(s\), so an optimal strategy is to put all labor on \(X\) if \(p \ge a^r_X/a^r_Y = \theta_r\), and we call this threshold the relative cost of \(X\) in region \(r\).

If \(\theta_A < \theta_B\), we say \(A\) has the comparative advantage in \(X\) and \(B\) has the comparative advantage in \(Y\).

When we combine the outputs of the two regions, and figure out the efficient boundary, i.e., how to maximize \(Q_Y\) for a given value of \(Q_X\), it becomes a piecewise linear function with a break at the point where \(A\) produces only \(X\) and \(B\) produces only \(Y\). Moving left means we reassign some of \(A\)’s labor from \(X\) to \(Y\), and moving right means we reassign some of \(B\)’s labor from \(Y\) to \(X\).

Note that whenever \(p\) is between \(\theta_A\) and \(\theta_B\), it is optimal for both regions to specialize in the goods they have a comparative advantage in, while outside of this range, it is optimal for them to specialize in the same thing.

Starting from non-optimal allocations, under simple assumptions, the two-type replicator model applies to this advantage difference, and it shows that, with no further restrictions, the system converges to an equilibrium that is also an optimum.

Summary:

  • Knobs: relative price \(p\)
  • Behavior: specialization patterns switch at \(\theta_A\) and \(\theta_B\).
  • Mechanism: value is piecewise linear in the labor share, which leads to corner solutions.

Spatial structure: patterns from local rules

So far we’ve mostly worked in well-mixed (mean-field) settings. However, a lot of interactions are spatial - things diffuse, infections spread across different regions. We saw a preview of this in the discussion on hydrodynamics earlier. The idea of transport and dispersion arises from spatial interactions, and we get to see some interesting phenomena like waves, phase separation and so on.

Reaction-diffusion and diffusion-driven instability

Preview: A steady state that’s stable without space can become unstable when diffusion rates differ: a fast ‘inhibitor’ and slow ‘activator’ create stripes/spots in a band of spatial wavelengths.

Let \(u(x, t)\) and \(v(x, t)\) be two concentrations that can react with each other and diffuse. A simple continuous model is: \(\partial_t u = f(u, v) + D_u \nabla^2 u\), \(\partial_t v = g(u, v) + D_v \nabla^2 v\), where \(D_u, D_v > 0\).

Let’s try to understand this in a bit more detail. The left-hand sides of the above equations are time derivatives, expressing how much the concentration changes at the current position. The right-hand sides have two terms - the first term is simply an interaction term, that shows how \(u\) and \(v\) interact (or “react”) to change the concentrations over time. The second term corresponds to the Laplacian, and it measures how much the neighborhood average of the function differs from the value at the point (consider a discrete 1D grid - \(\nabla^2 u(x_i) \approx (u_{i-1} - 2u_i + u_{i+1}) / (\Delta x)^2\)). If a point has more concentration than its neighboring points, we’ll get flow from that point to its neighbors (for more intuition, look up gradients, flux, divergence, Laplacian and Fick’s law).

Pick a homogeneous steady state \((u^*, v^*)\) of the ODE (i.e., such that the values of the reaction terms \(f\) and \(g\) are \(0\)). Let \(J\) be the Jacobian of the reactions at that point.

\[J = \begin{pmatrix} f_u & f_v\\ g_u & g_v\end{pmatrix}\Big|_{(u^*,v^*)}\]

Suppose the ODE is locally stable there (i.e., \(\text{tr} J < 0\) and \(\det J > 0\)). We find something interesting - even when the steady state is stable, adding more diffusion can create spatial instabilities that lead to stripes/spots; this is called Turing instability. We do the math below.

Let’s add a small perturbation \(u = u^* + \delta u\), \(v = v^* + \delta v\) and keep only the linear terms.

\[ \partial_t \begin{pmatrix} \delta u \\ \delta v \end{pmatrix} = \left(J + \text{diag}(D_u, D_v) \nabla^2 \right) \begin{pmatrix} \delta u \\ \delta v \end{pmatrix} \]

On a periodic box, we use the Fourier basis \(\exp(ik \cdot x)\). The Laplacian acts on them by \(\nabla^2 \exp(ik \cdot x) = -|k|^2 \exp(ik \cdot x)\). We try solutions of the form \(\begin{pmatrix} \delta u \\ \delta v \end{pmatrix} = \begin{pmatrix} U \\ V \end{pmatrix} \exp(\lambda t) \exp(ik \cdot x)\). Each spatial frequency \(k\) evolves independently. Plugging it in, we get \(\det (\lambda I - J + |k|^2 \text{diag}(D_u, D_v)) = 0\), i.e.,

\[ \lambda^2 - \lambda\left(\text{tr}J - (D_u + D_v) |k|^2 \right) + \left(\det J - |k|^2 (D_u g_v + D_v f_u) + D_u D_v |k|^4 \right) = 0 \]

Because \(\text{tr} J < 0\), the trace term gets more negative as \(|k|\) increases (we subtract \((D_u + D_v)|k|^2\)), so the trace can’t lead to an instability. The only way to get growth is for the determinant term to become negative for some \(|k| > 0\). Let \(q = |k|^2 \ge 0\) and define \(P(q) = \det J - q (D_u g_v + D_v f_u) + D_u D_v q^2\).

This is an upward-opening parabola in \(q\). We know \(P(0) = \det J > 0\) (ODE stability). For some spatial band to go unstable we need the parabola to go below zero at some positive \(q\). That happens exactly when the following three conditions hold:

  1. \(\text{tr} J < 0\) and \(\det J > 0\) (for ODE stability)
  2. \(D_u g_v + D_v f_u > 0\) (for the parabola minimum to be at a positive \(q\))
  3. \((D_u g_v + D_v f_u)^2 > 4 D_u D_v \det J\) (for the minimum dipping below \(0\))

The second condition puts the minimum at \(q_* = \frac{D_u g_v + D_v f_u}{2 D_u D_v} > 0\). The third condition says that the minimum value \(P(q_*) = \det J - \dfrac{(D_u g_v + D_v f_u)^2}{4 D_u D_v}\) is negative.

When these conditions hold, there is an unstable band \(q \in (q_-, q_+)\) where one eigenvalue \(\lambda(q)\) has positive real part.

The intuition is the following - the reaction part is stabilizing at zero wavenumber (\(\text{tr} J < 0\), \(\det J > 0\)), but the diffusion part tilts the balance differently for different spatial perturbations. For instance, if the species that suppresses variations spreads faster than the species that amplifies them, then small bumps grow into a regular pattern - a fast “inhibitor” and a slow “activator”.

Let’s consider a specific minimal example for chemical kinetics that shows these instabilities - the Schnakenberg model, which considers the simultaneous reactions \(X \rightleftharpoons A\), \(B \to Y\) and \(2X + Y \to 3X\). Upon eliminating concentrations of \(A\) and \(B\), we get a system that looks like the following:

Let \(f(u, v) = a - u + u^2 v\), \(g(u, v) = b - u^2 v\) where \(a, b > 0\).

The homogeneous steady state is \(u^* = a + b\), \(v^* = \frac{b}{(a + b)^2}\). We then have \(f_u = \frac{b - a}{a + b}\), \(f_v = (a + b)^2\), \(g_u = \frac{-2b}{a + b}\) and \(g_v = -(a + b)^2\).

Pick, for instance, \(a = 0.2, b = 1.3\). Then \(u^* = 1.5\), \(v^* \approx 0.578\), \(f_u \approx 0.733\), \(f_v = 2.25\), \(g_u \approx -1.733\), \(g_v = -2.25\).

The ODE is stable because \(\text{tr} J = f_u + g_v \approx -1.517 < 0\), \(\det J \approx 2.25 > 0\).

Now let diffusion coefficients be \(D_u = 1\) (slow activator) and \(D_v = 30\) (fast inhibitor). Then \(D_u g_v + D_v f_u \approx -2.25 + 30 \cdot 0.733 \approx 19.75 > 0\), \((D_u g_v + D_v f_u)^2 - 4 D_u D_v \det J \approx 120.06 > 0\), so a Turing instability exists. The unstable band in \(q = |k|^2\) is \(q_\pm = \frac{D_u g_v + D_v f_u \pm \sqrt{(D_u g_v + D_v f_u)^2 - 4 D_u D_v \det J}}{2 D_u D_v}\) which is \(0.147\) to \(0.512\).
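
If you want to check these numbers yourself, here is a minimal sketch (plain NumPy; the parameters are the ones above, nothing else is assumed) that evaluates the three conditions and the unstable band:

```python
# A minimal numerical check of the three Turing conditions for the
# Schnakenberg example above (a = 0.2, b = 1.3, D_u = 1, D_v = 30).
import numpy as np

a, b, Du, Dv = 0.2, 1.3, 1.0, 30.0
J = np.array([[(b - a) / (a + b), (a + b) ** 2],
              [-2 * b / (a + b), -(a + b) ** 2]])   # [[f_u, f_v], [g_u, g_v]]
tr, det = np.trace(J), np.linalg.det(J)
s = Du * J[1, 1] + Dv * J[0, 0]                     # D_u g_v + D_v f_u
disc = s ** 2 - 4 * Du * Dv * det

print("ODE stable:", tr < 0 and det > 0)            # condition 1
print("parabola minimum at q > 0:", s > 0)          # condition 2
print("minimum dips below 0:", disc > 0)            # condition 3
q_lo = (s - np.sqrt(disc)) / (2 * Du * Dv)
q_hi = (s + np.sqrt(disc)) / (2 * Du * Dv)
print(f"unstable band: q in ({q_lo:.3f}, {q_hi:.3f})")  # ~ (0.147, 0.512)
```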

Figure 30: Fig 8A - Schnakenberg dispersion relation: unstable band

Figure 31: Fig 8B - Turing patterns (spots/stripes)

In a simulation, on a periodic square with side length much larger than \(\ell_*\), small random noise around \((u^*, v^*)\) gets amplified into spots/stripes with a typical spacing \(\sim 2\pi / \sqrt{q_*}\), set by the fastest-growing wavenumber.
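
Here is a rough simulation sketch of that experiment - forward Euler with a 5-point periodic Laplacian. The grid size, time step (chosen below the explicit stability limit \(h^2 / (4 D_v)\)), run length and seed are arbitrary picks, not tuned values:

```python
# A rough sketch that grows Turing spots/stripes from small noise around
# (u*, v*) for the Schnakenberg example; forward Euler on a periodic grid.
import numpy as np

a, b, Du, Dv = 0.2, 1.3, 1.0, 30.0
n, h, dt, steps = 100, 1.0, 0.005, 40_000    # dt < h^2 / (4 Dv) for stability
rng = np.random.default_rng(0)
u = (a + b) + 0.01 * rng.standard_normal((n, n))
v = b / (a + b) ** 2 + 0.01 * rng.standard_normal((n, n))

def lap(w):  # 5-point periodic Laplacian
    return (np.roll(w, 1, 0) + np.roll(w, -1, 0) +
            np.roll(w, 1, 1) + np.roll(w, -1, 1) - 4 * w) / h**2

for _ in range(steps):
    f = a - u + u**2 * v
    g = b - u**2 * v
    u, v = u + dt * (Du * lap(u) + f), v + dt * (Dv * lap(v) + g)

# u now shows a pattern with spacing ~ 2 pi / sqrt(q_*); e.g. view it with
# matplotlib's plt.imshow(u). Run longer if the pattern hasn't locked in.
```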

Summary:

  • Knobs: diffusion coefficients \(D_u\) and \(D_v\) (and the reaction interactions).
  • Behavior: stripes/spots when the unstable band exists.
  • Mechanism: diffusion shifts the eigenvalues by \(-Dk^2\) and this can flip stability at some \(k > 0\).

Connections with other parts of the post:

  • Consider the connection to the linearization analysis. Diffusion shifts eigenvalues by \(-D|k|^2\), which is why spatial modes can flip stability even when the ODE is stable.

Traveling fronts: Fisher-KPP

Preview: A stable state spreads into an unstable one as a traveling front. The minimal speed depends only on diffusion and linear growth.

Another spatial phenomenon is invasion - a stable state spreads into an unstable state as a moving “front”. The simple model is the Fisher-KPP equation (which was proposed to describe the spatial spread of an advantageous allele):

\[ \partial_t u = D \nabla^2 u + r u \left(1 - \frac{u}{K} \right) \]

Here \(u=0\) is unstable (due to the positive growth \(ru\)), and \(u=K\) is stable - note that the reaction term is just the logistic growth applied pointwise in space.

A front is a profile that keeps its shape while translating, e.g., in 1D we write \(u(x, t) = U(z)\) where \(z = x - ct\) - here \(U\) is the “shape”, and the whole picture moves to the right with speed \(c > 0\) (so if we ride along with the wave at speed \(c\), the profile looks frozen).

We look for a wave that connects the two homogeneous states, the “invaded” state \(u = K\) behind the front and the “empty” state \(u = 0\). The biologically relevant fronts are the monotone decreasing ones, i.e., the density drops from \(K\) to \(0\) as you move in the direction the front travels.

Far ahead of the front, \(U\) is very small, so the non-linear term is negligible, and the equation linearizes to \(-cU’ \approx DU’’ + rU\). Trying an exponential \(\exp(-\lambda z)\) with positive \(\lambda\) gives us \(D \lambda^2 - c \lambda + r = 0\) (which is a kind of dispersion relation).

For real \(\lambda\) to exist, the quadratic must have a non-negative discriminant, i.e., \(c^2 \ge 4Dr\). So there is a minimal speed \(c_* = 2 \sqrt{Dr}\): a front must move at \(c \ge c_*\) in order not to have oscillatory (sign-changing) solutions at the leading edge.

Let’s think about it from a physical perspective. Consider what happens when a tiny population suddenly arrives at the origin. Two things happen at once - diffusion, which spreads it like a Gaussian blob of width \(\sim \sqrt{2Dt}\), and exponential growth \(\exp(rt)\) while it is small. The combined effect of these things near the front is roughly \(u(x, t) \approx \exp(rt) \times \exp(-x^2 / (4Dt))\), and looking at how the contours move gives us \(x(t) \approx 2 \sqrt{Dr} t\). This also explains why the linearization at \(u = 0\) controls the speed - the front is pulled by the leading edge where \(u\) is small, not by the (saturated) bulk behind it which is at \(K\).

This equation has interesting behavior depending on whether \(c < c_* = 2 \sqrt{Dr}\) or \(c \ge c_*\). Firstly, it can be shown that any front must be strictly monotone, and since a front with \(c < c_*\) would oscillate around \(0\) at the leading edge, this case has no wave at all. For each \(c \ge c_*\), it can be shown that there is a unique solution (up to translation). What happens for initial conditions that decay like \(\exp(-\lambda x)\), with \(\lambda > \lambda_* = \sqrt{r/D}\) or \(\lambda < \lambda_*\)? In the first case, the solution develops a front that travels at \(c_*\), and in the latter, the front travels faster, at the larger speed \(c(\lambda) = D \lambda + r / \lambda\).
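
The minimal speed is easy to check numerically. The sketch below (explicit finite differences; the grid, time step and initial blob are arbitrary choices) tracks where the profile crosses \(K/2\) and fits a speed, which should come out close to \(c_* = 2\sqrt{Dr}\):

```python
# A sketch measuring the Fisher-KPP front speed in 1D and comparing it
# with c_* = 2 sqrt(D r). Convergence to c_* is slow, so expect ~2.0-ish.
import numpy as np

D, r, K = 1.0, 1.0, 1.0
n, h, dt = 2000, 0.5, 0.05                    # dt < h^2 / (2 D) for stability
x = np.arange(n) * h
u = np.where(x < 10, K, 0.0)                  # compact initial blob on the left

positions, times = [], []
for step in range(8000):
    lap = (np.roll(u, 1) + np.roll(u, -1) - 2 * u) / h**2
    lap[0] = lap[-1] = 0.0                    # crude fix at the ends (no wrap)
    u = u + dt * (D * lap + r * u * (1 - u / K))
    if step % 400 == 0:                       # record where u first drops below K/2
        positions.append(x[np.argmax(u < K / 2)])
        times.append(step * dt)

c_measured = np.polyfit(times[5:], positions[5:], 1)[0]  # skip the transient
print(c_measured, 2 * np.sqrt(D * r))
```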

We now consider an example from epidemiology again, but with the SIS model this time.

Let \(I(x, t)\) be the density of infectious individuals and suppose the susceptible population density is near \(1\) initially (so early growth is approximately linear). A linearization looks like \(\partial_t I \approx D \partial_{xx} I + (\beta - \gamma) I\) where \(\beta\) is the transmission rate and \(\gamma\) is the recovery rate. This is the same as the Fisher-KPP linearization with the growth rate \(\beta - \gamma = \gamma (R_0 - 1)\). The predicted minimal speed is \(2 \sqrt{D \gamma (R_0 - 1)}\), so other things equal, higher mobility \(D\) and higher \(R_0\) both increase the speed of spread of an outbreak. Non-linear saturation shapes the profile but the speed is determined by the diffusion-growth balance again.

Figure 32: Fig 8C - Fisher-KPP traveling front and speed

Summary:

  • Knobs: \(D\) and \(r\)
  • Behavior: a pulled front with a lower bound on speed
  • Mechanism: the leading edge sets the speed

Connections with other parts of the post:

  • The logistic term here is the same as our baseline logistic ODE, now acting pointwise.
  • The front speed depending only on the linearization at \(u \approx 0\) is similar to how \(R_0\) alone decides outbreak takeoff.
  • In SIS with diffusion, we get similar scaling.

Other patterns

Preview: Energy descent (with interface-energy-like terms), with and without mass conservation, produces phase separation and coarsening: without conservation, interfaces move by curvature; with conservation, transport is diffusive and coarsening is slower.

The spatial structures we saw above were stripes, spots, and wavefronts, but these are not the only ones. There is another common way to get spatial order - phase separation and coarsening. Consider how oil and water separate - if we start well-mixed and let go, droplets form, the interfaces become sharper, and over long horizons small droplets disappear and big ones grow. We will look at two models of this process. As you’ll see later, these are related to the Ising/Schelling models we saw earlier - domains form because “like prefers like” and interfaces carry a cost.

The common starting point for both is an energy function. Let \(u(x, t)\) be a smooth field on a domain \(\Omega \subset \mathbb{R}^d\), where \(u = +1\) means phase A and \(u = -1\) means phase B. The energy is:

\[E[u] = \int_\Omega \left( \frac{\varepsilon^2}{2} |\nabla u|^2 + W(u) \right) dx\]

Here, \(W(u)\) is a double-well potential \((u^2 - 1)^2 / 4\) which prefers \(u = \pm 1\), and the gradient term penalizes steep spatial changes and helps define the interface width between the two phases.

Consider the similarity between this and the Ising model - the gradient term \(|\nabla u|^2\) corresponds, up to a constant, to the interaction of neighboring spins, and the double-well plays the role of the \(s_i = \pm 1\) constraint, pushing the field toward the two values \(\pm 1\).

Consider the variational derivative of this energy functional - it is the function \(\delta E / \delta u\) characterized by \(d/d\theta E[u + \theta \eta] \big|_{\theta = 0} = \int_\Omega \delta E / \delta u (x) \eta(x) dx\) for all test functions \(\eta\).

Expanding and integrating the first term by parts, we get \(\frac{\delta E}{\delta u} = -\varepsilon^2 \nabla^2 u + W’(u)\). We call this the chemical potential and denote it by \(\mu\).

There are multiple ways to minimize energy. The first is the naive way \(\partial_t u = -\frac{\delta E}{\delta u} = \varepsilon^2 \nabla^2 u - W’(u)\). This is called the Allen-Cahn equation, and corresponds to a gradient flow for \(E\). \(\dot{E}\) turns out to be \(-\int |\mu|^2 dx \le 0\), so the system always moves to lower energy and only stops when \(\mu = 0\), which is a steady state. The “mass” \(\int_\Omega u dx\) is not conserved here because the \(-W’(u)\) term creates or destroys the \(\pm 1\) phases locally - the derivative of the mass is the volume integral of \(-W’\). Solving the steady state equation (setting \(\mu = 0\), multiplying by \(\partial_x u\) and integrating) gives a tanh profile that joins \(u = -1\) to \(u = +1\) over a width \(\sim \varepsilon\). An interesting exercise is to see how the interface moves in higher dimensions. The intuition here is that most of the energy is in the interfaces - and the Laplacian term tries to smooth the interface. This should remind you of surface tension, and it can be shown that this equation causes the interface to move with normal speed proportional to the mean curvature. Suppose domains have size \(L(t)\); then the curvature scales like \(\frac{1}{L}\), which means \(\dot{L} \sim 1/L\), and that shows that \(L(t) \sim t^{1/2}\).

Note that this first method didn’t conserve mass. So we keep the same energy, but choose dynamics that preserve \(\int u dx\). To do this, note that \(\frac{d}{dt} \int_\Omega u dx = \int_\Omega \partial_t u dx\). One way to make this \(0\) is to use the divergence theorem, and indeed by picking \(\partial_t u = \nabla^2 \mu\) (i.e., diffusion of chemical potential), we get \(\int_\Omega \partial_t u dx = \int_{\partial \Omega} \nabla \mu \cdot n dS = 0\), where the last term is zero due to no flux conditions. This is called the Cahn-Hilliard equation. Looking at \(\dot{E}\) again, it becomes \(-\int |\nabla \mu|^2 dx \le 0\) using integration by parts. So this is also a gradient flow of \(E\), but one which conserves mass.

The dynamics this time are slightly different. Starting with a noisy \(u\) with \(0\) average, the system first separates into two phases (this is called spinodal decomposition) while thin interfaces form. Then coarsening starts - small droplets disappear and big ones grow - due to energy in interfaces (“surface tension”). For a droplet of radius \(R\), curvature \(1/R\) induces a chemical potential difference of \(\sigma / R\), which creates diffusive flux from high-curvature droplets to low-curvature droplets (this is called Ostwald ripening and is observed in colloidal sols, like water crystallizing into ice in old ice-cream). To quantify how fast domains grow here, note that the driving force is the difference in chemical potential, which is \(\sim 1/R\); the gradient across a length \(L \sim R\) is \(\sim 1/R^2\); the flux is proportional to this value; and the interface speed is proportional to flux per unit jump in \(u\), so \(\dot{R} \sim J \sim 1/R^2\), which gives \(R(t) \sim t^{1/3}\) scaling, slower than the previous case.

Note that these two dynamics are different - the former corresponds to something like the Glauber dynamics, which doesn’t conserve spin, but the latter corresponds to something like the Kawasaki dynamics (swapping two spins instead of flipping one) which conserves spin, or to Schelling.
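
To see coarsening concretely, here is a minimal Cahn-Hilliard sketch using a standard semi-implicit Fourier-spectral step (the stiff \(\varepsilon^2 \nabla^4\) part is treated implicitly, the nonlinearity explicitly). The grid size, \(\varepsilon\), time step and seed are arbitrary choices for illustration:

```python
# A minimal semi-implicit spectral sketch of Cahn-Hilliard coarsening on a
# periodic square: du/dt = lap(mu), mu = -eps^2 lap(u) + W'(u).
import numpy as np

n, L, eps, dt, steps = 128, 128.0, 1.0, 0.1, 5000
k = 2 * np.pi * np.fft.fftfreq(n, d=L / n)    # angular wavenumbers
kx, ky = np.meshgrid(k, k, indexing="ij")
k2 = kx**2 + ky**2

rng = np.random.default_rng(1)
u = 0.1 * rng.standard_normal((n, n))         # noisy start with ~zero mean
for _ in range(steps):
    w = u**3 - u                              # W'(u) for W = (u^2 - 1)^2 / 4
    u_hat = (np.fft.fft2(u) - dt * k2 * np.fft.fft2(w)) / (1 + dt * eps**2 * k2**2)
    u = np.real(np.fft.ifft2(u_hat))

# The k = 0 mode is untouched by the update, so mass is conserved up to
# roundoff: u.mean() stays ~0 while domains form and coarsen.
```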

Summary:

  • Knobs: conservation on/off, \(\varepsilon\) determines the interface width.
  • Behavior: domains form then coarsen.
  • Mechanism: curvature motion vs diffusive ripening

Connections with other parts of the post:

  • As noted before, coarsening here mirrors domain growth in either Ising or Schelling (without or with conservation respectively).
  • Note that we have gradient flows, which both have Lyapunov formulations.

Some common threads

Having seen so many examples, you might have noticed some patterns and themes pop up throughout the post. This small set of mechanisms explains most of the qualitative behavior, and once you fix those, the rest is bookkeeping.

The most reliable first move is to look at the shape of the local feedback. Constant per-capita effects give exponentials; adding a linear pushback turns the curve into a logistic S with a finite attractor; thresholds create discontinuous responses; multiplicative updates amplify relative differences and make extremes common as a departure from additive updates.

Many phenomena in the post come from a competition between two tendencies. Growth competes with capacity, alignment competes with noise, aggregation competes with diffusion, selection competes with mutation. The point where the balance flips creates a qualitative change. Below \(R_0 = 1\) the disease-free state is stable and outbreaks die; above it the endemic equilibrium takes over, often with damped waves. In the logistic-prey predator-prey system, a coexistence point takes over from prey-only via a transcritical bifurcation. In Ising and mean-field Vicsek, symmetry breaking has pitchfork bifurcations: a disordered state loses stability and two ordered states appear. Not all transitions are the same kind though - SEIR’s threshold is a clean exchange of stability with no diverging correlations, while the Ising Curie point \(T_c\) is a second-order phase transition with scale-free fluctuations and a diverging correlation length - and in Vicsek, the particle model shows a weakly first-order transition with bands and hysteresis.

Local linearity near equilibria makes the picture clearer. Linearizing at a fixed point and reading off the eigenvalues tells whether return is monotone or oscillatory, and at what rate. That is why so many scenarios look like a damped oscillator close to a stable state: a negative real part determines the decay rate, an imaginary part determines the frequency. The Lotka-Volterra coexistence with logistic prey spirals in; the SEIR endemic state does the same because the exposed stage acts like a lag; the same trace-determinant and phase-plane tools answer all of these. When true delays are present rather than compartment lags, the characteristic equation picks up exponentials and even scalar equations can generate oscillations via Hopf bifurcations.

Time structure matters even without explicit memory. Compartment chains create distributed lags and produce overshoot and damping. Slow-fast structure shows up in other places too. In sandpiles, slow addition and fast relaxation separate timescales; the system spends almost all of its time on a low-dimensional set of recurrent configurations and returns there between avalanches. In phase separation, interfaces sharpen on a fast timescale and coarsen slowly, with different long-time laws depending on whether mass is conserved.

Space changes a lot of things too. Diffusion shifts eigenvalues, so each spatial wavelength evolves with its own effective stability. A uniform steady state that is stable to spatially uniform perturbations can become unstable for a band of wavelengths when diffusion rates differ: a fast inhibitor and a slow activator create Turing patterns from homogeneous initial data. Growth plus diffusion creates coherent invading fronts: in Fisher-KPP the speed is set by the linear growth at the leading edge. In hydrodynamic limits of Vicsek-type models, advection and density-dependent alignment couple density and polarization; long waves can destabilize uniform motion, producing traveling bands and hysteresis, while viscosity damps short waves. Phase separation arises out of energy minimization: “like prefers like” with an interface cost leads to domain formation and coarsening; without conservation, interfaces move by mean curvature, and with conservation, transport is diffusive and growth is slower.

Conserved and monotone quantities simplify these analyses. First integrals confine motion to lower-dimensional sets, as with the Lotka-Volterra closed loops. Lyapunov-type functions, when they exist, tell us which way the system must move: oscillator energy in conservative systems, velocity variance in Cucker-Smale, mean payoff in potential replicator dynamics, the free energies behind Allen-Cahn/Cahn-Hilliard. Conservation of totals - population, probability mass, total mass, grains in the bulk - either reduces dimension directly or forces motion along slow manifolds. In sandpiles, conservation in the interior together with thresholds and boundary loss produces a stationary ensemble with avalanches that have no characteristic size up to a system-size cutoff.

Reinforcement and multiplication are where independence breaks and early behaviors persist. In Polya’s urn, the share process is a bounded martingale and converges almost surely to a random limit; different runs end at different shares from the same start. Preferential attachment uses the same “probability proportional to current size” rule and produces heavy-tailed (power-law, in fact) degrees in the simplest version. Kesten’s multiplicative recurrence has a unique stationary law whose upper tail is Pareto. All of these contrast with additive noise under bounded feedback, which concentrates and forgets early fluctuations.

Multiple attractors create memory at the macroscopic scale. Below \(T_c\) the Ising free-energy landscape has two wells; finite systems spend long times in one well and flip on timescales set by barriers and noise, and in the thermodynamic limit they break ergodicity and never leave. Finite-speed Vicsek supports both disordered gas and ordered bands over a range of parameters; which state you see depends on the path you took and the finite-amplitude perturbations you experienced, so order-disorder and disorder-order transitions occur at different points when you sweep a control parameter.

The reason these threads recur across domains is that the math is shared. The same normal forms are seen everywhere, the same types of linearization appear in different systems, the same mean-field arguments work in many places, the same types of energies show up recurrently, and graphs approximate manifolds so a lot of the structure we see in graphs ends up being very general.

Reading a new system then becomes quite systematic - identify the local feedback and whether it saturates; look for conserved and monotone quantities; check the linearization to locate thresholds and to tell oscillatory return from monotone decay; check whether delays are present and in what form, implicit or explicit; ask how growth scales, whether there is noise and how it affects the system, and what different states look like from a macroscopic lens. A lot of the qualitative picture follows from those ingredients.

A toolbox

Note that most of the tools we used above are relatively elementary. The theory of dynamical systems has useful tools that give us both different perspectives and the ability to analyze harder systems. We did not use a lot of this theory, but it is worth noting these ideas down alongside the ones we did use, even if we don’t cover them in detail. It is also a helpful exercise to map the items below to the examples we saw, and to read more about the ones not covered in this post.

Main objects of study

A dynamical system evolves a state \(x(t)\) in time by either a differential equation (a flow) or an update rule (a map).

  • Continuous time (flow): \(\dot x = F(x, \theta)\), \(x \in \mathbb{R}^n\). The flow \(\Phi_t(x_0)\) gives the state at time \(t\) from initial condition \(x_0\).
  • Discrete time (map): \(x_{k+1} = f(x_k, \theta)\), \(k \in \mathbb{N}\).

The following are some things we use:

  • An equilibrium/fixed point is a state at which the system stops evolving. For a continuous system, \(F(x^*) = 0\). For a discrete one, \(f(x^*) = x^*\).
  • A set \(S\) with \(\Phi_t(S) \subseteq S\) (or \(f(S) \subseteq S\)) is called an invariant set. Invariant regions constrain motion (e.g., non-negative orthant for populations, the simplex for frequencies).
  • The \(\omega\)-limit set is the set of accumulation points of a trajectory as \(t \to \infty\). Attractors are invariant sets that attract a neighborhood of initial conditions (their basin of attraction). Attractors can be fixed points, limit cycles (periodic orbits), or more complex sets.
  • A limit cycle is a closed orbit \(\gamma\) that is isolated and attracts or repels nearby trajectories. Its stability is determined by the return map (Poincare map) transverse to \(\gamma\): if all multipliers are inside the unit circle (except the trivial one), the cycle is asymptotically stable.
  • Lyapunov stability means trajectories starting close stay close. Asymptotic stability adds attraction back to the equilibrium/limit cycle. Exponential stability adds a uniform convergence rate.

Equilibria are everywhere (logistic, SEIR, predator-prey), invariant orthants/simplices are found in population models, limit cycles show up as sustained oscillations (e.g., predator-prey with saturation, but not neutral LV), and the Ising ordered phases and the Vicsek aligned state are attractors.

Linearization and local analysis

Near an equilibrium \(x^*\), we generally try to replace the system by its best local linear approximation. Write \(x = x^* + \eta\). Then \(F(x) = F(x^*) + J \eta + O(\|\eta\|^2)\) where \(J = \text{D}F(x^*)\), so \(\dot\eta = J \eta\). In discrete time, \(\eta_{k+1} = A \eta_k\) with \(A = \text{D}f(x^*)\).

What do we do with this linear system? In continuous time, \(\Re\lambda(J) < 0\) for all eigenvalues gives local exponential asymptotic stability, and any \(\Re\lambda>0\) gives instability. In discrete time, all eigenvalues strictly inside the unit circle give local asymptotic stability, and any with \(|\lambda|>1\) gives instability.

The important caveat here is non-hyperbolic points. If some \(\Re\lambda = 0\) (flow) or \(|\lambda| = 1\) (map), linearization alone is inconclusive, and you need center manifolds and normal forms (the way we tackled this above was by just looking at higher order terms).

We also saw how, as a consequence of Hartman-Grobman, “Hooke’s law near a well” explains local oscillations almost everywhere.

When do we use this? It should generally be the first thing to look at around any fixed point.
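
As a sketch of this first move in code - a finite-difference Jacobian at a fixed point, followed by reading off the eigenvalues. The function and point below are the Schnakenberg reaction part from earlier; the step size \(10^{-6}\) is an arbitrary choice:

```python
# A sketch of the "first thing to do": numerically linearize at a fixed
# point and classify it by the eigenvalues of the Jacobian.
import numpy as np

def F(z, a=0.2, b=1.3):                       # Schnakenberg reaction part
    u, v = z
    return np.array([a - u + u**2 * v, b - u**2 * v])

def jacobian(F, z, eps=1e-6):                 # central finite differences
    n = len(z)
    J = np.zeros((n, n))
    for j in range(n):
        dz = np.zeros(n)
        dz[j] = eps
        J[:, j] = (F(z + dz) - F(z - dz)) / (2 * eps)
    return J

z_star = np.array([1.5, 1.3 / 1.5**2])        # u* = a + b, v* = b / (a + b)^2
lam = np.linalg.eigvals(jacobian(F, z_star))
print(lam, "stable" if np.all(lam.real < 0) else "unstable")
# Complex pair with negative real part here: a stable focus (damped spiral).
```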

2D phase-plane portraits

In systems of the form \(\dot x = f(x,y)\) and \(\dot y = g(x, y)\), the phase plane can be visualized directly, and a few standard diagrams tell us a lot.

Trace-determinant test: at an equilibrium, having \(\det J < 0\) gives a saddle point, \(\det J > 0\) with \(\text{tr} J < 0\) gives a stable equilibrium (node vs focus is determined by \((\text{tr} J)^2 - 4 \det J\)). And signs flip for unstable equilibrium.

Nullclines are where the derivative of something is \(0\), i.e., the \(f = 0\) and \(g = 0\) curves. These constrain the flow. Intersections of these curves are equilibria, and the signs of \(f\) and \(g\) in the regions they cut out give the direction arrows.

To check whether there is periodicity or not, we can sometimes use the Poincare-Bendixson theorem: in 2D, if a trajectory is trapped in a compact set containing no equilibria, its limit set must be a limit cycle. Similarly, to rule out cycles, the Bendixson-Dulac theorem says: if you can find a smooth \(D(x, y)\) such that \(\nabla \cdot(D F)\) has a constant sign (and is not identically zero) on a simply connected region, then no closed orbits are present there.

Lyapunov functions and LaSalle’s invariance

A scalar “energy” \(V\) that never increases along trajectories can simplify analysis quite a lot, especially when solving for the trajectory explicitly is not possible.

Locally speaking, if \(V(x^*) = 0\), \(V > 0\) nearby, and \(\dot V \le -c\|x - x^*\|^2\), you get exponential stability. Globally speaking, if \(\dot V \le 0\), LaSalle’s invariance principle says that the trajectory approaches the largest invariant subset of \(\dot V = 0\).

How do we come up with \(V\)? We can generally look at things like physical energy (as we did for oscillators), consensus variance \(V = \sum\|v_i - \bar v\|^2\) (as we did for Cucker-Smale), potentials (as we did for replicator in potential games).

Bifurcations

There are a bunch of different types of bifurcations (with some canonical forms for them). Some of the local ones (i.e., where a parameter change alters the stability of an equilibrium) are:

  • Saddle-node: \(\dot x = \mu - x^2\). A pair of equilibria appears/disappears.
  • Transcritical: \(\dot x = \mu x - x^2\). Two equilibria swap stability (e.g., prey-only vs coexistence with logistic prey).
  • Pitchfork: \(\dot x = \mu x - x^3\) (supercritical). Symmetry breaks and two stable branches appear (e.g., mean-field Ising).
  • Hopf: a complex pair crosses the imaginary axis. In polar form, we have \(\dot r = \alpha \mu r - \beta r^3 + \dots\), \(\dot\theta = \omega + \dots\). With \(\alpha, \beta > 0\) a small stable cycle is created.

There are also global bifurcations, where the structure of trajectories changes as invariant sets collide with equilibria.

Discrete-time systems and chaos

For a fixed point \(x^*\) of a discrete-time system, \(|f’(x^*)| < 1\) means that it is stable, \(|f’(x^*)| > 1\) means it is unstable. Cobweb plots help qualitatively sketch the stability of fixed points.

The Lyapunov exponent \(\lambda = \lim_{n \to \infty} \frac{1}{n} \sum_{k = 0}^{n - 1} \log |f’(x_k)|\) helps study chaos - \(\lambda > 0\) means that infinitesimally close trajectories diverge exponentially, at average rate \(\lambda\).
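
A minimal sketch of this estimate for the logistic map \(x \mapsto rx(1-x)\) (the parameter values, seed point and burn-in below are arbitrary choices); for \(r = 4\) the exponent is known to be \(\log 2\):

```python
# A sketch estimating the Lyapunov exponent of the logistic map by
# averaging log |f'(x_k)| along a long orbit.
import numpy as np

def lyapunov(r, x0=0.3, n=100_000, burn=1000):
    x, s = x0, 0.0
    for i in range(n + burn):
        x = r * x * (1 - x)
        if i >= burn:                          # skip the transient
            s += np.log(abs(r * (1 - 2 * x)))  # f'(x) = r (1 - 2x)
    return s / n

print(lyapunov(3.2), lyapunov(4.0))            # < 0 (period-2) vs ~ log 2 (chaos)
```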

Poincare maps reduce a periodic ODE to a discrete map, and can also help reason about stability of equilibria for the flow.

Delays: compartments vs true delay differential equations

Distributed lags (e.g., using compartments) vs sharp lags (DDEs) have different dynamics.

  • Compartment chains (e.g., \(S \to E \to I\)) effectively give gamma-distributed delays and tend to produce damped oscillations back to equilibrium. The lag is implicit and smeared out.

  • True DDEs \(\dot x(t) = F(x(t), x(t - \tau))\) inject \(e^{-\lambda\tau}\) into the characteristic equation, creating infinitely many eigenvalues. Even scalar DDEs can undergo Hopf bifurcations and oscillate.
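
A minimal sketch of the second point, for the classic scalar DDE \(\dot x(t) = -x(t - \tau)\), which is stable for \(\tau < \pi/2\) and oscillates with growing amplitude beyond it (Euler with a history buffer; the step size and horizon are arbitrary choices):

```python
# A sketch showing a pure delay destabilizing even a scalar system:
# dx/dt = -x(t - tau) decays for tau < pi/2 and grows for tau > pi/2.
import numpy as np

def simulate(tau, dt=0.001, T=60.0):
    lag = int(round(tau / dt))
    x = np.ones(int(T / dt) + lag)             # constant history x = 1 for t <= 0
    for i in range(lag, len(x) - 1):
        x[i + 1] = x[i] + dt * (-x[i - lag])   # Euler step with lagged state
    return x[lag:]

for tau in (1.0, 2.0):                         # pi/2 ~ 1.57 sits between these
    x = simulate(tau)
    print(tau, np.max(np.abs(x[-5000:])))      # tiny for 1.0, large for 2.0
```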

Invariants, positivity, comparison, monotone structure

Conserved quantities/first integrals \(I(x)\) with \(\dot{I} = 0\) reduce the effective dimension of the problem (as we saw for oscillator energy and Lotka-Volterra’s implicit integral), effectively constraining where you can be in the state space. Things like \(x_i \ge 0\) for populations and \(S + E + I + R = N\), \(\sum p_i = 1\) for replicator dynamics put us in the right part of the whole space as well.

We can often integrate inequalities: comparison principles and Gronwall-type bounds (e.g., \(\dot x \le a x + b\) gives \(x(t) \le e^{at} x(0) + \frac{b}{a}(e^{at} - 1)\) for \(a \neq 0\)) bound solutions usefully.

Monotone/cooperative systems (\(\partial f_i / \partial x_j \ge 0\) for \(i \neq j\)) preserve order and come with strong convergence theorems.

Networks and coupling

Graphs are quite general and important - especially because any manifold can be approximated as a graph with meaningful guarantees. So it makes sense to consider graph formulations of problems that, e.g., have a natural spatial nature.

When we have consensus mechanisms on a graph, we can sometimes model them using a Laplacian - with weights \(w_{ij} \ge 0\), \(L = D - W\) and \(\dot x = -Lx\). The algebraic connectivity \(\lambda_2(L)\) quantifies how fast disagreements die out.

Spectra of graphs (either adjacency or Laplacian) can provide important information about how systems on graphs behave. For example, a linearized SIS grows if \(\beta / \gamma > 1 / \rho(A)\), where \(\rho(A)\) is the adjacency spectral radius. Similarly, Kuramoto-type phase models synchronize when coupling beats frequency spread, with the onset shaped by the network spectrum.
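
A small sketch tying these together - build the Laplacian of a path graph (an arbitrary example), read off \(\lambda_2\), and watch disagreement decay at roughly that rate:

```python
# A sketch of Laplacian consensus: disagreement under dx/dt = -Lx decays
# like exp(-lambda_2 t), where lambda_2 is the algebraic connectivity.
import numpy as np

n = 10
W = np.zeros((n, n))                           # path graph 0-1-...-9
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0
L = np.diag(W.sum(axis=1)) - W

lam2 = np.sort(np.linalg.eigvalsh(L))[1]       # algebraic connectivity
x = np.arange(n, dtype=float)                  # initial disagreement
dt = 0.01
for _ in range(2000):                          # Euler steps of dx/dt = -Lx, t = 20
    x = x - dt * (L @ x)

spread = np.linalg.norm(x - x.mean())
print(lam2, spread)                            # spread shrinks ~ exp(-lam2 * t)
```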

Reviewer note (lightly paraphrased): You can view a lot of the simpler models’ dynamics as “coarse-grained” versions of full network models. For example, the scalar equations of an epidemic model can be obtained as the mean field over the whole population, with any transmission parameter being scaled by the mean degree divided by the population size. You can do the same with the degree based approximations, where instead you coarse grain along degree classes, and the scaling factor/epidemic threshold is the dominant eigenvalue of the “coarse-grained adjacency matrix”. This is also conceptually adjacent to renormalization (in the statistical mechanics sense).

Non-dimensionalization and scaling

Picking natural units for state and time and rescaling often simplifies things a lot. Parameters collapse into dimensionless values that determine the dynamics, e.g., \(R_0\) in epidemics, the diffusion ratio \(D_v / D_u\) for Turing patterns, Fisher-KPP’s \(c_* = 2 \sqrt{Dr}\), density/noise ratios in Vicsek. We get fewer knobs, clearer phase diagrams, as well as easier mapping to standard problems.

Stochastic dynamics and metastability

As we’ve seen, noise can greatly affect dynamics of any system.

Going from an SDE to Fokker-Planck is quite useful, and this is how it’s done: \(dX_t = b(X_t) dt + \sigma(X_t) dW_t\) gives

\[ \partial_t p = -\nabla \cdot (bp) + \frac{1}{2} \sum_{i,j} \partial_i \partial_j \left[ (\sigma \sigma^T)_{ij} \, p \right] \]

which, for constant \(\sigma\), simplifies to \(\partial_t p = -\nabla \cdot (bp) + \frac{1}{2} \nabla \cdot (\sigma \sigma^T \nabla p)\). For \(b = -\nabla U\) with constant \(\sigma\), the stationary density is \(\propto e^{-2U / \sigma^2}\) (by detailed balance).
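
A quick empirical check of that stationary density - Euler-Maruyama in a double well (the potential, step size and run length below are arbitrary choices), with the histogram compared against \(e^{-2U/\sigma^2}\):

```python
# A sketch checking the stationary density by simulation: Euler-Maruyama
# for dX = -U'(X) dt + sigma dW, histogram vs exp(-2 U / sigma^2).
import numpy as np

sigma, dt, steps = 1.0, 1e-3, 1_000_000
U  = lambda x: (x**2 - 1) ** 2 / 4             # double well
dU = lambda x: x**3 - x
rng = np.random.default_rng(0)
noise = sigma * np.sqrt(dt) * rng.standard_normal(steps)

x, xs = 0.0, np.empty(steps)
for i in range(steps):
    x += -dU(x) * dt + noise[i]
    xs[i] = x

hist, edges = np.histogram(xs, bins=80, range=(-2.5, 2.5), density=True)
mids = (edges[:-1] + edges[1:]) / 2
p = np.exp(-2 * U(mids) / sigma**2)
p /= p.sum() * (mids[1] - mids[0])             # normalize numerically
print(np.max(np.abs(hist - p)))                # small once the chain has mixed
```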

In a double-well with small noise \(\varepsilon\), the mean escape time scales like \(\exp\{2\Delta U / \varepsilon^2\}\) (the Kramers problem), consistent with the stationary density above. That is the story behind metastability and hysteresis loops. For large deviations, the Freidlin-Wentzell theory makes these exponentials precise and gives most-likely escape paths.

Traveling fronts

Stability of fronts in the comoving frame (e.g., in Fisher-KPP) is analyzed spectrally, and the Evans function is the standard tool for doing so.

Contraction analysis

If the Jacobian is uniformly contracting in some matrix measure \(\mu\) (i.e., \(\mu(\text{D}F(x)) \le -c < 0\) for all \(x\)), then any two trajectories converge to each other exponentially at rate \(c\). That gives a unique global attractor (possibly modulo a symmetry - e.g., a limit cycle under phase shift - since the transverse distance goes to \(0\)) without any sort of local casework.

When coupling is strong and signs/coercivity are clear (consensus/flocking in velocities), this can be pretty helpful.

Singular perturbations and slow-fast decompositions

Tikhonov’s theorem talks about separated timescales.

\[\dot x = f(x, y, \varepsilon)\] \[\varepsilon \dot y = g(x, y, \varepsilon)\]

with \(0 < \varepsilon \ll 1\). As \(\varepsilon \to 0\), trajectories collapse rapidly to a slow manifold \(g(x, y, 0) = 0\) and then evolve along it, provided the slow manifold is normally hyperbolic (the fast dynamics are attracting). This both reduces dimension and isolates the mechanism you care about.

Koopman operator and data-driven surrogates

Nonlinear dynamics become linear in the (infinite-dimensional) space of observables - more specifically, the Koopman operator composes observables with the flow and is a linear operator. Practically, techniques like dynamic mode decomposition (DMD) and Extended DMD (EDMD) try to approximate it and extract dominant growth/oscillation modes from data. If you suspect sparse structure, SINDy (sparse identification of non-linear dynamics) recovers parsimonious equations from time series. However, as with empirical methods, this is only an approximation and needs to be validated before use.

Reviewer note (lightly paraphrased): The data-driven Koopman stuff is somewhat recent and not all of it is quite rigorous. One really cool part of Koopman operators on the theoretical side is that they’re really about taking a (stochastic or deterministic) system and, instead of looking at the nonlinear dynamics at the trajectory level, looking at linear dynamics at the distribution level. This can actually help make sense out of chaotic/ergodic systems, and was argued by people like Prigogine to be the right way to look at such systems. One nice correspondence: any first integral of the system corresponds to a stationary eigenfunction of the Koopman operator, and the stationary distribution is the dominant eigenfunction of the Perron-Frobenius (transfer) operator, the dual of the Koopman operator.

Ergodic theory and mixing

With an invariant measure (equivalently, with a measure-preserving transform), ergodicity says that time averages along typical trajectories are equal to ensemble-based averages. In some sense, mixing leads to a decay of correlations. However, finite large systems can be metastable - they linger in one phase for astronomically long times (as we saw with Ising), so practical time averages differ from ensemble ones on realistic horizons.

Non-smooth and hybrid dynamics (thresholds and edges)

Filippov systems are systems where vector fields jump discontinuously across a surface, and as a result weird things happen at boundaries (e.g., sliding motion along the boundary). In discrete systems, border-collision bifurcations in piecewise-smooth maps also have cases that are not covered by smooth theory. This is important to keep in mind for systems with hard thresholds.

Control and feedback

Intervening into a system can let you change the phase portrait, and control theory studies these sorts of systems and interventions. E.g., feedback linearization and Lyapunov-based designs give constructive stabilizers.

Numerical methods that respect structure

Discretization of a continuous system can do surprising things - e.g., explicit Euler turns the harmonic oscillator’s neutral center into an unstable spiral - so the solver should be aligned with the system and its constraints (and important properties like invariants).
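
As a tiny illustration, compare explicit Euler with its symplectic variant on the harmonic oscillator - the first inflates the energy exponentially, the second keeps it bounded (the step size and horizon below are arbitrary choices):

```python
# A sketch of why structure matters in solvers: explicit vs symplectic
# Euler on q' = p, p' = -q, with energy E = (q^2 + p^2) / 2.
import numpy as np

dt, steps = 0.01, 100_000
E = lambda q, p: (q**2 + p**2) / 2

q, p = 1.0, 0.0
for _ in range(steps):                         # explicit Euler: both from old state
    q, p = q + dt * p, p - dt * q
print("explicit Euler energy:", E(q, p))       # grows like 0.5 * (1 + dt^2)^steps

q, p = 1.0, 0.0
for _ in range(steps):                         # symplectic Euler: p uses the new q
    q = q + dt * p
    p = p - dt * q
print("symplectic Euler energy:", E(q, p))     # stays near 0.5 forever
```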

References and Further Reading

The following is a list of references grouped by topic for ease of reference (please let me know if there are any errors).

General dynamical systems & stability

  • Strogatz, S. H. (2015). Nonlinear Dynamics and Chaos (2nd ed.). Westview.
  • Hirsch, M. W., Smale, S., & Devaney, R. L. (2012). Differential Equations, Dynamical Systems, and an Introduction to Chaos (3rd ed.). Academic Press.
  • Khalil, H. K. (2002). Nonlinear Systems (3rd ed.). Prentice Hall.
  • LaSalle, J. P. (1960). Some extensions of Liapunov’s second method. IRE Transactions on Circuit Theory, 7(4), 520–527.
  • Hartman, P. (1960). A lemma in the theory of structural stability. Proc. AMS, 11(4), 610–620.
  • Grobman, D. (1959). Homeomorphisms of systems of differential equations. Doklady, 128, 880–881.

Growth, oscillators, invariants

  • Murray, J. D. (2002, 2003). Mathematical Biology I & II (3rd eds.). Springer.
  • Edelstein-Keshet, L. (2005). Mathematical Models in Biology. SIAM (Classics).
  • Hethcote, H. W. (2000). The mathematics of infectious diseases. SIAM Review, 42(4), 599–653.

Predator–prey & ecological dynamics

  • Lotka, A. J. (1925). Elements of Physical Biology. Williams & Wilkins.
  • Volterra, V. (1926). Fluctuations in the abundance of a species considered mathematically. Nature, 118, 558–560.
  • Rosenzweig, M. L., & MacArthur, R. H. (1963). Graphical representation and stability conditions of predator–prey interactions. American Naturalist, 97(895), 209–223.
  • Kot, M. (2001). Elements of Mathematical Ecology. Cambridge.

Epidemics & \(R_0\)

  • Kermack, W. O., & McKendrick, A. G. (1927). A contribution to the mathematical theory of epidemics. Proc. R. Soc. A, 115, 700–721.
  • Anderson, R. M., & May, R. M. (1991). Infectious Diseases of Humans: Dynamics and Control. Oxford.
  • Diekmann, O., Heesterbeek, J. A. P., & Metz, J. A. J. (1990). On the definition and computation of \(R_0\). J. Math. Biol., 28, 365–382.
  • Keeling, M. J., & Rohani, P. (2008). Modeling Infectious Diseases in Humans and Animals. Princeton.
  • Heffernan, J. M., Smith, R. J., & Wahl, L. M. (2005). Perspectives on \(R_0\). J. R. Soc. Interface, 2(4), 281–293.

Thresholds, cascades, segregation

  • Granovetter, M. (1978). Threshold models of collective behavior. American Journal of Sociology, 83(6), 1420–1443.
  • Schelling, T. C. (1971). Dynamic models of segregation. J. Math. Sociology, 1(2), 143–186.
  • Schelling, T. C. (1978). Micromotives and Macrobehavior. Norton.

Ising, phase transitions, criticality

  • Ising, E. (1925). Beitrag zur Theorie des Ferromagnetismus. Z. Physik, 31, 253–258.
  • Onsager, L. (1944). Crystal statistics. I. A two-dimensional model with an order-disorder transition. Phys. Rev., 65, 117–149.
  • Metropolis, N., et al. (1953). Equation of state calculations by fast computing machines. J. Chem. Phys., 21, 1087–1092.
  • Glauber, R. J. (1963). Time-dependent statistics of the Ising model. J. Math. Phys., 4, 294–307.
  • Stanley, H. E. (1971). Introduction to Phase Transitions and Critical Phenomena. Oxford.
  • Yeomans, J. M. (1992). Statistical Mechanics of Phase Transitions. Oxford.
  • Sethna, J. P. (2006). Statistical Mechanics: Entropy, Order Parameters, and Complexity. Oxford.

Reinforcement, herding, heavy tails

  • Pólya, G. (1930). Sur quelques points de la théorie des probabilités. Ann. Inst. H. Poincaré, 1, 117–161.
  • Pemantle, R. (2007). A survey of random processes with reinforcement. Probab. Surveys, 4, 1–79.
  • Barabási, A.-L., & Albert, R. (1999). Emergence of scaling in random networks. Science, 286, 509–512.
  • Dorogovtsev, S. N., & Mendes, J. F. F. (2003). Evolution of Networks: From Biological Nets to the Internet and WWW. Oxford.
  • Newman, M. E. J. (2010). Networks: An Introduction. Oxford.
  • Kirman, A. (1993). Ants, rationality, and recruitment. QJE, 108(1), 137–156.
  • Mitzenmacher, M. (2004). A brief history of generative models for power law and lognormal distributions. Internet Mathematics, 1(2), 226–251.
  • Kesten, H. (1973). Random difference equations… Acta Mathematica, 131, 207–248.
  • Goldie, C. M. (1991). Implicit renewal theory and tails of solutions of random equations. Ann. Appl. Prob., 1(1), 126–166.
  • Bak, P., Tang, C., & Wiesenfeld, K. (1987). Self-organized criticality. Phys. Rev. Lett., 59, 381–384.
  • Dhar, D. (2006). The abelian sandpile. Physica A, 369, 29–70.

Swarms, alignment, active matter

  • Vicsek, T., Czirók, A., Ben-Jacob, E., Cohen, I., & Shochet, O. (1995). Novel phase transition in self-driven particles. Phys. Rev. Lett., 75, 1226–1229.
  • Chaté, H., Ginelli, F., Grégoire, G., Peruani, F., & Raynaud, F. (2008). Collective motion of self-propelled particles. Phys. Rev. E, 77, 046113.
  • Toner, J., & Tu, Y. (1995). Long-range order in a two-dimensional XY model with noise. Phys. Rev. Lett., 75, 4326–4329.
  • Toner, J., & Tu, Y. (1998). Flocks, herds, and schools. Phys. Rev. E, 58, 4828–4858.
  • Marchetti, M. C., et al. (2013). Hydrodynamics of soft active matter. Rev. Mod. Phys., 85, 1143–1189.
  • Cucker, F., & Smale, S. (2007). Emergent behavior in flocks. IEEE TAC, 52(5), 852–862.
  • Ha, S.-Y., & Liu, J.-G. (2009). A simple proof of Cucker–Smale flocking. Comm. Math. Sci., 7(2), 297–325.
  • Carrillo, J. A., Fornasier, M., Toscani, G., & Vecil, F. (2010). Particle, kinetic, and hydrodynamic models of swarming. In Math. Modeling of Collective Behavior in Socio-Economic and Life Sciences (Birkhäuser).

Replicator dynamics & evolutionary games

  • Taylor, P. D., & Jonker, L. (1978). Evolutionary stable strategies and game dynamics. Math. Biosciences, 40, 145–156.
  • Hofbauer, J., & Sigmund, K. (1998). Evolutionary Games and Population Dynamics. Cambridge.
  • Weibull, J. W. (1995). Evolutionary Game Theory. MIT Press.
  • Sandholm, W. H. (2010). Population Games and Evolutionary Dynamics. MIT Press.
  • Maynard Smith, J. (1982). Evolution and the Theory of Games. Cambridge.

Reaction–diffusion, Turing, fronts

  • Turing, A. M. (1952). The chemical basis of morphogenesis. Phil. Trans. R. Soc. B, 237, 37–72.
  • Schnakenberg, J. (1979). Simple chemical reaction systems with limit cycle behavior. J. Theor. Biol., 81(3), 389–400.
  • Fisher, R. A. (1937). The wave of advance of advantageous genes. Ann. Eugenics, 7, 355–369.
  • Kolmogorov, A., Petrovsky, I., & Piskunov, N. (1937). Study of diffusion equation… Bull. Moscow Univ., 1, 1–25.
  • Allen, S. M., & Cahn, J. W. (1979). A microscopic theory for antiphase boundary motion. Acta Metall., 27, 1085–1095.
  • Cahn, J. W., & Hilliard, J. E. (1958). Free energy of a non-uniform system. J. Chem. Phys., 28, 258–267.
  • Bray, A. J. (1994). Theory of phase-ordering kinetics. Advances in Physics, 43(3), 357–459.
  • Langer, J. S. (1971). Theory of spinodal decomposition. Ann. Phys., 65, 53–86.
  • Hohenberg, P. C., & Halperin, B. I. (1977). Theory of dynamic critical phenomena. Rev. Mod. Phys., 49, 435–479.

Networks, spectra, SIS thresholds, Kuramoto

  • Chung, F. (1997). Spectral Graph Theory. AMS.
  • Brouwer, A. E., & Haemers, W. H. (2011). Spectra of Graphs. Springer.
  • Pastor-Satorras, R., & Vespignani, A. (2001). Epidemic spreading in scale-free networks. Phys. Rev. Lett., 86, 3200–3203.
  • Van Mieghem, P., et al. (2009). Virus spread in networks. IEEE/ACM Trans. Networking, 17(1), 1–14.
  • Kuramoto, Y. (1975). Self-entrainment of a population of coupled linear oscillators. In Int. Symp. Math. Problems in Theoretical Physics, Lecture Notes in Physics 39, Springer, pp. 420–422.

Stochastic dynamics, large deviations, Koopman & data-driven

  • Kramers, H. A. (1940). Brownian motion in a field of force. Physica, 7, 284–304.
  • Freidlin, M. I., & Wentzell, A. D. (1984). Random Perturbations of Dynamical Systems. Springer.
  • Koopman, B. O. (1931). Hamiltonian systems and transformation in Hilbert space. PNAS, 17, 315–318.
  • Schmid, P. J. (2010). Dynamic mode decomposition of numerical and experimental data. J. Fluid Mech., 656, 5–28.
  • Williams, M. O., Kevrekidis, I. G., & Rowley, C. W. (2015). A Data-Driven Approximation of the Koopman Operator: Extending Dynamic Mode Decomposition. J. Nonlinear Sci., 25, 1307–1346.
  • Brunton, S. L., Proctor, J. L., & Kutz, J. N. (2016). Discovering governing equations. PNAS, 113(15), 3932–3937.
  • Lasota A., & Mackey M.C. (1985). Probabilistic Properties of Deterministic Systems. Cambridge University Press.

Ricardian Trade

  • Ricardo, D. (1817). On the Principles of Political Economy and Taxation.
  • Dornbusch, R., Fischer, S., & Samuelson, P. A. (1977). Comparative advantage with a continuum of goods. AER, 67(5), 823–839.
  • Krugman, P., Obstfeld, M., & Melitz, M. (2015). International Economics (10th ed.). Pearson.

Acknowledgements

I would like to thank genetyx8 for feedback on the post (and others who volunteered to go through a similarly painful experience). GPT-5 Thinking did the list of references and partially wrote the visualization code - in all other aspects, all remaining errors are mine.

Some generous reviewers suggested releasing this post as a series, which is great advice for readability. For now, I’ve kept it as a single piece to preserve the connecting threads and the shared toolbox, which felt harder to maintain across separate posts - and partly because not publishing the post right now might mean this post not seeing the light of day, like many other drafts of mine.

Cite this post
@online{simple-rules-complex-dynamics-part-1-2025,
  author    = {nor},
  title     = {Simple Rules, Complex Dynamics -- Part I: Foundations \& Intuition},
  year      = {2025},
  month     = {09},
  day       = {17},
  url       = {https://nor-blog.pages.dev/posts/2025-09-17-simple-rules-complex-dynamics-part-1},
}