Generalized Möbius Inversion on Posets

December 27, 2021 · 19 min · 3965 words · nor

This post will be more about developing ideas about certain structures rather than actually coding up stuff. The ideas we'll be developing here have many applications, some of which involve:

  • Number Theoretic Möbius Inversion
  • Principle of Inclusion-Exclusion
  • Directed Acylic Graphs
  • Subset Sum Convolution
  • Finite Difference Calculus (and prefix sums)

Note that some of the terminology used below might be non-standard. Also, some parts of the post make a reference to abstract algebra without much explanation; such parts are just for people who know some algebra, and can safely be skipped by those who don't know those concepts at all.

Prerequisites:

  • Some knowledge about graphs.
  • Matrix multiplication
  • Basic combinatorics

Definition of a poset and some intuition

All of these have one thing in common: they have an underlying "poset". Let's understand what a poset is.

Rigorously speaking, a poset is a partially ordered set, i.e., a set SS equipped with a relation \le that is reflexive, antisymmetric, and transitive.

More precisely,

  • \le is reflexive means that aaa \le a for all aa in SS.
  • \le is antisymmetric means that if aba \le b and bab \le a, then a=ba = b
  • \le is transitive means that if aba \le b and bcb \le c, then aca \le c

To get some intuition about posets, it's helpful to think of them in terms of directed acyclic graphs (DAGs), equipped with the relation of reachability; i.e., vertices aa is related to bb iff there is a path from aa to bb. In fact, there is always at least one DAG corresponding to every poset that is equivalent to it in terms of semantics.

To make this notion a bit clearer, we construct a "DAG" (we allow self-loops, so it is not precisely a DAG, but we can get rid of these self-loops) as follows: construct a vertex for each element of the set, and there is an edge from uu to vv iff uvu \le v. For the lack of a better word, we shall call this the "relational DAG" of the poset. Note that this notation is not standard.

Remark: By the properties of a poset, this "DAG" has the property that if there is a path from uu to vv, there is an edge between uu and vv (i.e., reachability and adjacency are equivalent in this graph). In other words, this graph is its own reflexive transitive closure. Also, a topological sort of this "DAG" (ignoring the self-loops) is called a linear extension of the poset (flattening out the "DAG" induces a natural linear order on the vertices, which are just the elements of the poset).

It is usually easier to look at this graph after removing all redundancies (i.e., if uvu \le v and vwv \le w, then remove the edge (u,w)(u, w) from the graph if it exists). The resulting graph is called the Hasse diagram of the poset, and is super useful for reasoning about posets.

Note that it is NOT necessary that for any two elements (u,v)(u, v), one of uvu \le v and vuv \le u must hold. For instance, consider the poset given by the set S={a1,a2,a3,a4}S = \{a_1, a_2, a_3, a_4\} and the relation \le given by {(a,a)aS}{(a1,a2),(a1,a3),(a1,a4),(a2,a4),(a3,a4)}\{(a, a) \mid a \in S\} \cup \{(a_1, a_2), (a_1, a_3), (a_1, a_4), (a_2, a_4), (a_3, a_4)\}. This satisfies the constraints that \le must be reflexive, antisymmetric and transitive; however, neither of a2a3a_2 \le a_3 and a3a2a_3 \le a_2 hold.

For the remainder of this post, it will help to think of \le in terms of the Hasse diagram of the poset.

Note: when we say a<ba < b, we mean that aba \le b and aba \ne b.

Some examples of posets

  • Either of the sets ,,,\mathbb{R}, \mathbb{N}, \mathbb{Z}, \mathbb{Q} equipped with the usual \le is a poset.
  • For any set AA, the set of its subsets, equipped with the relation \subseteq (or \supseteq) is a poset.
  • The set \mathbb{N} equipped with the relation \mid (divisibility) is a poset.

The incidence algebra of a poset

Note for the pedantic reader: In what follows, we only look at locally finite posets (i.e., posets in which the set of zz such that xzyx \le z \le y is finite for all x,yx, y). In particular, we can use the following ideas with infinite DAGs too, though it might be possible that infinitely large (but "locally sparse") matrices may not make sense.

In particular, the following doesn't apply to the posets (,),(,)(\mathbb{R}, \le), (\mathbb{Q}, \le) and so on.

Let (S,)(S, \le) be some poset. Consider the adjacency matrix MM of its relational DAG. It has the property that if xyx \not\leq y, then the entry at (x,y)(x, y) is 00, and otherwise it is 11. We can also interpret this matrix MM as a function from S×S{0,1}S \times S \to \{0, 1\}.

We generalize this notion a bit. Rather than having M(x,y)=1M(x, y) = 1 whenever xyx \le y, we assign arbitrary complex numbers to M(x,y)M(x, y) (which is equivalent to assigning arbitrary edge weights (possibly 00) to edges in the relational DAG).

This leads to a set of functions α:S×S\alpha : S \times S \to \mathbb{C}, with the property that xyα(x,y)=0x \not\leq y \implies \alpha(x, y) = 0. We call this the "incidence algebra" of the poset (for those who know what an algebra is, we'll see why this is an algebra). We won't distinguish between these functions and their corresponding matrices in what follows.

The main idea behind the whole approach is to look at these functions as matrices and extract useful information using matrix operations.

To define the "product" on this set of functions, we can simply use matrix multiplication to define its semantics. That is, for functions α,β\alpha, \beta in the incidence algebra, we define the product as (αβ)(x,y)=zSα(x,z)β(z,y)(\alpha\beta)(x, y) = \sum_{z \in S} \alpha(x, z) \beta(z, y), which is some kind of convolution.

We can see that the incidence algebra is closed under product (intuitively, think of it as aggregating some combination of edge weights over paths of length 2; if there is no path from xx to yy, the set of such paths is empty anyway, so the answer is still 0, the identity of the aggregation function which is ++).

Similarly, we can define scalar multiplication and addition in terms of matrix operations, and see that the incidence algebra is closed under these operations too.

Important remark: Note that if ff is a function from SS to \mathbb{C}, then a vector with f(x)f(x) at the position corresponding to xx also satisfies the usual matrix multiplication identities with the elements of the incidence algebra, with the semantics of multiplication with a column vector being analogous to aggregating over paths that end at a given vertex (and for row vector left-multiplication, paths that end at a given vertex).

Optional remark: For those who know what an algebra is, matrix multiplication is a bilinear operator so product is the bilinear operator in the incidence algebra. A further generalization would be to define a different underlying field than \mathbb{C} and another bilinear operator, but we won't look into this further.

Some special members of the incidence algebra

Okay, now that we have defined this set and some operations on it, let's see some examples of members of this set, what information we can get out of these operations, and what kind of operations are "nice" in this context.

Firstly, the most obvious choice of a matrix is OO, the zero matrix (which is clearly a member of the incidence algebra). This doesn't really give us any information though. It still has a role as the additive identity for addition in this algebra.

The second most obvious choice is II, the identity matrix. Note that this is a member of the incidence algebra due to reflexivity of \le. This is a multiplicative identity, and that's something we will need for Möbius inversion.

Thirdly, let's consider the adjacency matrix MM of the relational DAG. Let the corresponding function be ζ\zeta. Clearly, this is also in the incidence algebra. We have the property that ζ(x,y)=1\zeta(x, y) = 1 if xyx \le y, and 00 otherwise.

Spoiler alert: the inverse of this matrix is the Möbius function for the poset and corresponds to things like the number theoretic Möbius function in the divisibility poset and so on.

The Möbius function

Finally, we define the Möbius function μ\mu as follows (It might not feel intuitive at first, but bear with me for a while):

For xyx \not\leq y, μ(x,y)=0\mu(x, y) = 0, μ(x,x)=1xS\mu(x, x) = 1 \, \forall x \in S, and inductively define

μ(x,y)=xz<yμ(x,z)\mu(x, y) = -\sum_{x \le z < y} \mu(x, z)

Note that μ(x,y)\mu(x, y) is always an integer. Also, note that xzyμ(x,z)=1\sum_{x \le z \le y} \mu(x, z) = 1 if x=yx = y and 00 otherwise (in the case x<yx < y, it follows from the identity, in the case when x,yx, y are not comparable, the sum is empty, and in the case when x=yx = y, it consists of only one term equal to 11).

Hence we have for all x,ySx, y \in S,

zSμ(x,z)ζ(z,y)=xzyμ(x,z)={1if x=y0otherwise=I(x,y)\sum_{z \in S} \mu(x, z) \zeta(z, y) = \sum_{x \le z \le y} \mu(x, z) = \begin{cases} 1 & \text{if } x = y\\ 0 & \text{otherwise} \end{cases} = I(x, y)

However, the expression on the left is just (μζ)(x,y)(\mu \zeta)(x, y). Since this holds for all x,yx, y, we have μζ=I\mu\zeta = I.

As we mentioned earlier, this means that μ\mu is the inverse of ζ\zeta, so we must also have ζμ=I\zeta\mu = I (and expanding this gives xzyμ(z,y)\sum_{x \le z \le y} \mu(z, y) is 11 if x=yx = y and 00 otherwise).

The Möbius inversion formula

The Möbius inversion formula in this setting just reduces to a simple identity corresponding to the multiplication of a matrix and a vector.

Formally, the Möbius inversion formula states that if f,gf, g are functions from SS to \mathbb{C} such that g(x)=yxf(y)g(x) = \sum_{y \le x} f(y), then we have f(x)=yxμ(y,x)g(y)f(x) = \sum_{y \le x} \mu(y, x) g(y). The converse also holds true.

For a proof, consider a vector FF with the element at position corresponding to xx being f(x)f(x). Define GG similarly.

Now the condition merely means that G=ζFG = \zeta^\intercal F. Left-multiplying this by μ\mu^\intercal, we get μG=μζF=IF=F\mu^\intercal G = \mu^\intercal \zeta^\intercal F = IF = F. Expanding this gives the Möbius inversion formula. The converse can be proved by reversing all these steps, since μ\mu^\intercal is invertible.

Note that this formula is just a trivial consequence of some matrix multiplication properties; so there's a lot more to the theory we have developed above than just this formula. However, we shall limit ourselves to only the applications of this formula to get to some interesting results.

Some applications

The generalized principle of inclusion-exclusion

Let's first talk about the contexts in which we use generalized inclusion-exclusion. We usually want to compute some function ff of a finite set AA, but it is much easier to compute the sum of ff over all subsets of AA.

That is, it is easy to compute g(A)=BAf(B)g(A) = \sum_{B \subseteq A} f(B), but hard to compute f(A)f(A) itself.

The generalized principle of inclusion-exclusion states that f(A)=BA(1)|A||B|g(B)f(A) = \sum_{B \subseteq A} (-1)^{|A| - |B|} g(B).

For a proof, let's look at the poset corresponding to the power set of a finite set FF such that AFA \subseteq F, equipped with the \subseteq relation. Clearly, using the Möbius inversion formula, we have f(A)=BAμ(B,A)g(B)f(A) = \sum_{B \subseteq A} \mu(B, A) g(B).

We claim that μ(B,A)=(1)|A||B|\mu(B, A) = (-1)^{|A| - |B|} for BAB \subseteq A. We'll do this by fixing BB and doing an induction on |A||A|.

For |A|=|B||A| = |B|, this is clear. Now suppose this is true for all |A|<k|A| < k. Consider the case when |A|=k|A| = k. We have (from the definition of the Möbius function) that

μ(B,A)=BCA(1)|C||B|=C\BA\B(1)|C\B|=(1)|A||B|\mu(B, A) = -\sum_{B \subseteq C \subsetneq A} (-1)^{|C| - |B|} = -\sum_{\emptyset \subseteq C \setminus B \subsetneq A \setminus B} (-1)^{|C \setminus B|} = (-1)^{|A| - |B|}

where the last equality follows from the binomial theorem, and the proof is complete.

The usual principle of inclusion-exclusion as a special case

Now let's see how the usual inclusion-exclusion principle is just a special case of this generalized formula.

The inclusion-exclusion principle states that for sets A1,A2,,AnA_1, A_2, \ldots, A_n, we have

|A1An|=1in|Ai|1i<jn|AiAj|++(1)n1|A1An||A_1 \cup \cdots \cup A_n| = \sum_{1 \le i \le n} |A_i| - \sum_{1 \le i < j \le n} |A_i \cap A_j| + \cdots + (-1)^{n-1}|A_1 \cap \cdots \cap A_n|

For a proof, consider the poset corresponding to the power set of [n]:={1,,n}[n] := \{1, \ldots, n\}, equipped with \supseteq.

Also define U=A1AnU = A_1 \cup \ldots \cup A_n.

For a set S[n]S \subseteq [n], define

f(S)=f(S) = number of elements in all AiA_i, where iSi \in S, but not in any other AjA_j-s.

g(S)=g(S) = number of elements in all AiA_i, where iSi \in S, and possibly in other AjA_j-s.

Then we have g(S)=|iSAi|g(S) = |\cap_{i \in S} A_i|. Also, we have g(S)=TSf(T)g(S) = \sum_{T \supseteq S} f(T). Using the Möbius inversion formula, we get

0=f()=Aμ(,A)g(A)=A2[n](1)|A||iAAi|0 = f(\emptyset) = \sum_{A \supseteq \emptyset} \mu(\emptyset, A) g(A) = \sum_{A \in 2^{[n]}} (-1)^{|A|} |\cap_{i \in A} A_i|

which completes the proof.

Subset sum convolution

I won't be explaining the ideas behind subset sum convolution in too much detail, but would refer you to this post that covers the necessary material.

The poset in consideration is the power set of [n][n], and its elements are in bijection with nn bit integers where the ithi^\mathrm{th} bit is 11 iff ii is in the set.

The transforms zz, μ\mu are the ζ\zeta and μ\mu functions here (note that a two-variable function in the incidence algebra is just a map that transforms a one-variable function to another one-variable function, which can be thought of as the multiplication of the matrix corresponding to the two-variable function with the vector corresponding to the one-variable function), and σ\sigma corresponds to a diagonal matrix with the element on the diagonal being 1 if the element of the poset has an even number of elements, and -1 otherwise.

The rest of that post can conveniently be studied in terms of the incidence algebra for this poset.

The number-theoretic Möbius function

For this, we will need to cover a bit more ground, since the divisibility poset has nicer properties compared to other posets (it is what is called a lattice).

What is a lattice

A lattice is a special kind of a poset, where for each pair of elements (x,y)(x, y), there exist elements xyx \land y (called the meet of x,yx, y) and xyx \lor y (called the join of x,yx, y) such that

  • zxz \le x and zyz \le y \implies zxyz \le x \land y
  • xyxx \land y \le x and xyyx \land y \le y
  • xzx \le z and yzy \le z \implies xyzx \lor y \le z
  • xxyx \le x \lor y and yxyy \le x \lor y

Intuitively, this basically means that each pair of elements in the Hasse diagram has a least common ancestor and highest common descendent. For instance, the power set poset and the divisibility poset are both lattices (exercise: what are the meet and the join functions for these lattices?).

Note that inductively, every finite subset of a lattice has a meet and a join. Also, due to antisymmetry of \le, the meet and the join are unique for each pair x,yx, y. Hence for finite lattices LL, there is a unique "maximum" (say L\top_L) and a unique "minimum" (say L\bot_L).

A theorem for lattices

Theorem: For a lattice (L,)(L, \le), and L<aL\bot_L < a \in L, xa=Lμ(L,x)=0\sum_{x \lor a = \top_L} \mu(\bot_L, x) = 0.

Essentially, this theorem states that the sum of the Möbius function applied on all L,x\bot_L, x where the "LCA" of aa and xx is L\top_L is 0.

For a proof, we first note that because of the second Möbius identity (using the product of μ\mu and ζ\zeta), we have:

xa=Lμ(L,x)=xμ(L,x)xayμ(y,L)\sum_{x \lor a = \top_L} \mu(\bot_L, x) = \sum_{x} \mu(\bot_L, x) \sum_{x \lor a \le y} \mu(y, \top_L)

Using the definition of \lor and rearranging the sums reduces the sum to ayμ(y,L)xyμ(L,x)\sum_{a \le y} \mu(y, \top_L) \sum_{x \le y} \mu(\bot_L, x)

Using the first Möbius identity and noting that yy is not L\bot_L, reduces this expression to 00, as needed.

Now let's try to compute μ(a,b)\mu(a, b) for a,ba, b in the divisibility poset. Note that if aa doesn't divide bb or if a>ba > b, this is just 0. So it suffices to consider the case where b=axb = ax for some natural number xx.

Let's look at all positive integers rr such that 1rx1 \mid r \mid x. This forms a lattice LL. Similarly, the set of positive integers rr' such that araxa \mid r' \mid ax also forms a lattice, which is isomorphic to LL. Also, since by the inductive definition, μ(a,ar)\mu(a, ar) depends only on the elements ss such that asara \mid s \mid ar, so we must have μ(a,ar)=μ(1,r)\mu(a, ar) = \mu(1, r). So it suffices to compute μ(1,x)\mu(1, x) for all xx.

We claim that for xx being square-free, μ(1,x)\mu(1, x) is (1)k(-1)^k if xx is a product of kk distinct primes, and it is 00 otherwise. To show this, we induct on xx.

Note that 1=L1 = \bot_L and x=Lx = \top_L. The claim is trivial for x=1x = 1. Now suppose pp is a prime that divides xx. Using the previously proved theorem, we get μ(1,x)=yx,yp=xμ(1,y)\mu(1, x) = -\sum_{y \ne x, y \lor p = x} \mu(1, y).

If p2p^2 divides bb, then the sum on the right is empty (as \lor is just the LCM). Otherwise, the sum contains only one yy, which equals x/px / p, so using the inductive hypothesis, we are done.

As a side-note, for "reasonable" functions that only depend on the ratio between their parameters, we can "compress" them into a function that takes a single parameter. The "product" in that case becomes Dirichlet convolution, which has tons of interesting properties. A relevant post for that can be found here.

Also note that if we restrict ourselves to the divisibility posets of square-free numbers nn, they are isomorphic to the power set poset of the set of prime divisors of nn, so all power set posets are just special cases of divisibility posets!

Finite Difference Calculus

In this case, the poset is particularly simple, and it equals ({0},)(\mathbb{N} \cup \{0\}, \le). The function μ\mu takes a value of 11 at (n,n)(n, n), 1-1 at (n,n+1)(n, n + 1), and 00 everywhere else. "Convolution" with μ\mu is equivalent to the difference operator, and using Möbius inversion, the prefix sum operator is that with ζ\zeta.

Final notes and remarks

Firstly, the content of this post might not be super-useful in competitive programming, but the intent behind this was to introduce a very standard technique in combinatorics that people don't realize is super-flexible.

Secondly, it might not be easy to understand in a first reading, so it's encouraged to work out the math through the post, and understand why and how these things are tied together. It will also help to make some Hasse diagrams of posets, and look at concrete examples of operations we described here.

Finally, there might be errors and typos in the post, as well as better ways to approach the content of the post. If you find any, please let me know in the comments!

Cite this post
@online{mobius-inversion-on-posets-2021,
  author    = {nor},
  title     = {Generalized Möbius Inversion on Posets},
  year      = {2021},
  month     = {12},
  day       = {27},
  url       = {https://nor-blog.pages.dev/posts/2021-12-27-mobius-inversion-on-posets},
}