Tuesday, December 8, 2015

Lovász Local Lemma (non-constructive)

I read the short (and interesting) presentation of the original local Lovász lemma from a lecture by Alistair Sinclair. I report here the proof of this lemma (which was left as an exercise, although a simpler case was proven whose proof contains all the ingredients).

This lemma is extremely useful in the probabilistic method: if you can prove that the set of objects satisfying a certain number of constraints has positive probability (or density), then this set cannot be empty. In particular, an object satisfying those constraints do exist. However, this method does not explicitly build the desired object. The constructive version of the Lovász local lemma was given by Moser and Tardos, as explained in this previous post.

  1. Lovász Local Lemma

Let $A_1,\dots,A_n$ be a set of bad events. These events are not necessarily mutually independent. For each index $1\le i \le n$, we denote by $D_i \subseteq \{1,\dots,n\}$ the set of indices such that the events $A_i, A_j$ with $j \not\in D_i$ are mutually independent. In other words, $D_i$ is the dependency set of the event $A_i$.

The lemma states:
If there exists real values $x_1,\dots,x_n \in [0,1)$ such that $$ \mathbb{P}(A_i) \le x_i \cdot \prod_{j \in D_i} (1-x_j) ~~~~~ (\star) $$ then $$\mathbb{P}\left(\bigwedge\limits_{i=1}^n \overline{A}_i\right) \ge \prod_{i=1}^n (1-x_i).$$
 The proof goes as follows. We first show that for any index $1\le i \le n$ and any strict subset $S \subset \{1,\dots,n\}$, we have
$$
  \mathbb{P}\left( A_i ~\bigg|~ \bigwedge\limits_{j \in S} \overline{A_j} \right) \le x_i.
$$
We prove this claim by induction on the size $m = |S|$. The case $m = 0$, i.e., $S = \emptyset$, follows from $(\star)$. Now, assume the result holds for all subset of size less than $m$. We split the set $S$ in two disjoint parts $S_D = S \cap D_i$ and $S_I = S - S_D$. The indices in $S_D$ (resp. $S_I$) are exactly the indices in $S$ of events that are not independent (resp. that are independent) from $A_i$.  We have, by the chain rule of conditional probabilities:
$$
  \begin{align*}
    \mathbb{P}\left(A_i ~\bigg|~ \bigwedge\limits_{j \in S} \overline{A_j} \right)
     &= \mathbb{P}\left(A_i ~\bigg|~ \bigwedge\limits_{k \in S_D} \overline{A_k} \land \bigwedge\limits_{l \in S_I} \overline{A_l} \right) \\
     &= \frac{\mathbb{P}\left(A_i \land \bigwedge\limits_{k \in S_D} \overline{A_k} ~\bigg|~ \bigwedge\limits_{l \in S_I} \overline{A_l} \right)}{\mathbb{P}\left(\bigwedge\limits_{k \in S_D} \overline{A_k} ~\bigg|~ \bigwedge\limits_{l \in S_I} \overline{A_l} \right)}\\
     &\le \frac{\mathbb{P}\left(A_i ~\bigg|~ \bigwedge\limits_{l \in S_I} \overline{A_l} \right)}{\mathbb{P}\left(\bigwedge\limits_{k \in S_D} \overline{A_k} ~\bigg|~ \bigwedge\limits_{l \in S_I} \overline{A_l} \right)}
 \end{align*}
$$
Since the events $A_i, A_l$ , $l \in S_I$, are mutually independent, we have:
$$
  \begin{align*}
    \mathbb{P}\left(A_i ~\bigg|~ \bigwedge\limits_{j \in S} \overline{A_j} \right)
     &\le \frac{\mathbb{P}\left(A_i \right)}{\mathbb{P}\left(\bigwedge\limits_{k \in S_D} \overline{A_k} ~\bigg|~ \bigwedge\limits_{l \in S_I} \overline{A_l} \right)}\\
     &\le  \frac{x_i \cdot \prod_{j\in D_i} (1-x_j)}{\mathbb{P}\left(\bigwedge\limits_{k \in S_D} \overline{A_k} ~\bigg|~ \bigwedge\limits_{l \in S_I} \overline{A_l} \right)}
 \end{align*}
$$
Regarding the denominator, denoting $S_D = \{k_1,\dots,k_s\}$, and $S_D^r = \{k_1,\dots,k_r\}$ for any $1 \le r \le s$, we have
$$
  \begin{align*}
    \mathbb{P}\left(\bigwedge\limits_{k \in S_D} \overline{A_k} ~\bigg|~ \bigwedge\limits_{l \in S_I} \overline{A_l} \right)
     &= \prod_{r = 1}^s \mathbb{P}\left( \overline{A}_{k_r} ~\bigg|~ \bigwedge\limits_{l \in S_I \sqcup S_D^r} \overline{A_l} \right)\\
      &=  \prod_{r = 1}^s \left(1 - \mathbb{P}\left( A_{k_r} ~\bigg|~ \bigwedge\limits_{l \in S_I \sqcup S_D^r} \overline{A_l} \right)\right) \\
      &\ge \prod_{r = 1}^s (1 - x_{k_r}) =  \prod_{j \in S_I} (1 - x_j).
  \end{align*}
$$
The last inequality follows from the induction hypothesis. We conclude that
$$
  \begin{align*}
    \mathbb{P}\left(A_i ~\bigg|~ \bigwedge\limits_{j \in S} \overline{A_j} \right)
     &\le x_i \cdot \frac{\prod_{j \in D_i} (1 - x_j)}{\prod_{j \in S_I} (1 - x_j)} \\
      &\le x_i ~~~ \text{ since } S_I \subseteq D_i.
  \end{align*}
$$
This concludes the induction proof.

We apply this inequality to prove the Lovász lemma. Indeed, we have
$$
  \begin{align*}
    \mathbb{P}\left(\bigwedge\limits_{i=1}^n \overline{A}_i\right)
        &= \prod_{i=1}^n \left( 1 - \mathbb{P}\left( A_i ~\bigg|~ \bigwedge\limits_{1 \le j < i} \overline{A}_j \right)\right) \\
        &\ge \prod_{i=1}^n (1-x_i).
  \end{align*}
$$

  2. Other formulations

The previous formulation is quite general. A restricted formulation is given by the following:
Assume $\mathbb{P}(A_i) \le p < 1$ for all $1 \le i \le n$. Let $d$ be the maximum size of the dependency sets $D_i$'s. If $e\cdot p\cdot (d+1) \le 1$, then $$\mathbb{P}\left(\bigwedge\limits_{i=1}^n \overline{A}_i\right) \ge (1- \frac{1}{d+1})^n > 0.$$
Indeed,let's take $x_i = 1/(d+1)$. It suffices to prove that
$$
  p \le \frac{1}{d+1} \cdot \left(1-\frac{1}{d+1}\right)^d
$$
But we have
$$
  \begin{align*}
    \left(1-\frac{1}{d+1}\right)^d \ge \left(1-\frac{1}{d}\right)^d
                                                         \ge \frac{1}{e}
  \end{align*}
$$

Another restricted formulation is given as follows:
If $\sum_{j \in D_i} \mathbb{P}(A_j) \le 1/4$ for all $i$, then $$\mathbb{P}\left(\bigwedge\limits_{i=1}^n \overline{A}_i\right) \ge \prod_{i=1}^n \left(1- 2\mathbb{P}(A_i)\right) > 0.$$
Indeed, let's take $x_i = 2\mathbb{P}(A_i)$. The condition imposes that each $x_i \in [0,1]$. One can easily shows by induction on the size of $D_i$ that
$$
  \begin{align*}
  \prod_{j \in D_i} \left(1- 2\mathbb{P}(A_j)\right) &\ge 1 - \sum_{j\in D_i} 2\mathbb{P}(A_j) \\
           &\ge \frac{1}{2}
  \end{align*}
$$
Therefore, $\mathbb{P}(A_i) \le x_i \prod_{D_i} (1-x_j)$ is satisfied for all $i$.

pb

Thursday, December 3, 2015

Gibbs inequality applied to binomial inequality

  1. Gibbs inequality

Let $P = \{p_x\}_{x \in X}$ and $Q  = \{q_x\}_{x \in X}$ be two probability distributions on a finite set $X$. I simply assume that the $q_x$ are all positive, $q_x > 0$. The Gibbs inequality states that
$$
 - \sum_{x \in X}  p_x \cdot \log p_x \le - \sum_{x \in X} p_x \cdot \log q_x
$$
with equality iff the two distributions are identical. We use the convention that $x\cdot \log x = 0$ if $x = 0$. This result follows from the classic technique of convex inequality   (a.k.a Jensen's inequality). Indeed, the function $f(x) = x\cdot \log x$ is convex on the positive reals, so
$$
  \begin{align*}
    \sum_x p_x \cdot \log  \frac{p_x}{q_x} &= \sum_x q_x \cdot \frac{p_x}{q_x} \cdot \log \frac{p_x}{q_x} \\
                &= \sum_x q_x \cdot f\left( \frac{p_x}{q_x}\right) \\
                &\ge f\left(\sum_x q_x \frac{p_x}{q_x}\right) \\
                &\ge f(1) = 0
  \end{align*}
$$

  2. Binomial and entropy

Varying the probability distributions in the Gibbs inequality can yield quite surprising result. For instance, let $n$ be a positive integer, and $0 \le p \le 1$ a real number such that $n\cdot p$ is an integer. Then
$$
  \begin{equation*}
    \binom{n}{n p} \le 2^{n\cdot h(p)}
  \end{equation*}
$$
where $h(p) = - p\cdot \log - (1-p)\cdot \log (1-p)$. The function $h$ represents the entropy of a coin with bias $p$. The intuition behind this inequality is the following: to identify a subset of $np$ elements in a set of $n$ elements $\{1,2,\dots,n\}$, you need at least $\log \binom{n}{np}$ bits. Now, there is an approximate procedure to identify such a subset: for each element $i = 1,2,\dots,n$, you toss the bias coin, and add the element $i$ to your subset iff the coin turns, e.g., head. With such a procedure, on average, you identify a subset of size $np$, and you could associate with this subset the corresponding sequence of coin tosses as an identifier. By Shannon's source coding theorem, you can store such sequences of coin tosses using about $n\cdot h(p)$ bits. The inequality states that the latter procedure uses (slightly) more bits than necessary.

The proof goes as follows. Let $X_1,\dots,X_n$ be independent random variables following the same distribution
$$
  \begin{align*}
    \mathbb{P}(X_i = 1) &= p \\
    \mathbb{P}(X_i = 0) &= 1-p
  \end{align*}
$$
Let $X = (X_1,\dots,X_n)$ be the joint variable, with values in $\{0,1\}^n$. Consider also the uniform random variable $U$ with values in the subset of vectors $u \in \{0,1\}^n$ having exactly $k = np$ entries set to $1$ (and the others to $0$). We denote by $|u|$ the number of entries set to $1$ in $u$. The probability distribution of $U$ is given by, for any $u \in \{0,1\}^n$,
$$
  \begin{equation*}
    \mathbb{P}(U = u) = \left\{ \begin{array}{cc}
                                                   0 & \text{ if } |u| \ne k\\
                                                   \binom{n}{k}^{-1} & \text{ otherwise}
                                                   \end{array}
                                          \right.
  \end{equation*}
$$

Applying the Gibbs inequality to the distributions of $X$ and $U$, we get
$$
  \begin{align*}
    \sum_{u \in \{0,1\}^n} - \mathbb{P}(U=u) \cdot \log \mathbb{P}(U = u)
    &\le
    \sum_{u \in \{0,1\}^n} - \mathbb{P}(U=u) \cdot \log \mathbb{P}(X = u)\\
  \end{align*}
$$
$$
  \begin{align*}
    \log \binom{n}{k}
    &\le
    \sum_{u,~ |u|=k} - \binom{n}{k}^{-1} \cdot \log p^k(1-p)^{n-k}\\

    &\le -k \cdot \log p -(n-k)\cdot\log (1-p) \\
     &\le n \cdot h(p) ~~~~\text{ (since } k = np \text{)}.
  \end{align*}
$$

  3. The bound is quite tight

The bound on $\binom{n}{np}$ is quite tight. Indeed, Stirling's formula states that:
$$
  n! \sim \sqrt{2\pi n}\cdot n^n \cdot e^{-n}
$$
Applying this formula to the binomial, we get (with $q = 1-p$)
$$
  \begin{align*}
    \binom{n}{np} &= \frac{n!}{(np)!(nq)!} \\
                               &= \frac{\sqrt{2\pi n}\cdot n^n \cdot e^{-n}}{\sqrt{2\pi n p}\cdot (np)^{np} \cdot e^{-np}\cdot \sqrt{2\pi nq}\cdot (nq)^{nq} \cdot e^{-nq}}\\
              &= \frac{2^{n\cdot h(p)}}{\sqrt{2\pi \cdot n \cdot p\cdot (1-p)}}.
  \end{align*}
$$
Taking the logarithm of both sides, we see that
$$
  \frac{1}{n} \log \binom{n}{np} \rightarrow h(p) \text{ as } n \rightarrow \infty
$$
This means that, as $n$ gets larger and larger, the ``coin-tossing procedure'' to identify a subset of size $np$ in $n$ gets more and more precise, in the sense that it uses the optimal number of bits $\log \binom{n}{np} \simeq n \cdot h(p)$.

pb

Tuesday, December 1, 2015

Weird binomial identities

In this post, I present some relatively weird binomial identities. I will start by the symmetric case, and then consider the twisted case.

  1. Symmetric case

We prove the following identity
$$
  C_{l,l}(y) \underset{def}{=} \sum_{i=0}^l \binom{l+i}{l} y^i = \frac{(1+\rho)^l}{1-y} - \frac{y^{l+1}}{1-y}\cdot\frac{\rho^l}{2}\cdot\left( 1+\frac{1}{\rho}\right)^{2l+1}
$$
where $y$ is a formal variable, and $\rho = y/(1-y)$. This identity follows from the classical method of coefficient identification.

Indeed, let's introduce a second variable $x$ and consider the polynomial
$$
  P = \sum_{i=0}^l y^i \cdot (1+x)^{l+i}
$$
By applying the binomial formula to $(1+x)^{l+i}$, we see that the coefficient of $x^l$ in $P$ is
$$
  C_{l,l}(y) = \sum_{i=0}^l \binom{l+i}{l} y^i
$$

On the other hand, in $P$, we have a geometric series
$$
  \begin{align}
    P &= \sum_{i=0}^l y^i  (1+x)^{l+i} \\
       &= (1+x)^l \cdot \sum_{i=0}^l (y+y x)^i \\
       &= (1+x)^l \cdot \frac{1-y^{l+1}(1+x)^{l+1}}{1-y-y x} \\
       &= \left( \frac{1}{1-y} (1+x)^l - \frac{y^{l+1}}{1-y} (1+x)^{2l+1} \right) \cdot \frac{1}{1-\rho x} \\
       &= \left( \frac{1}{1-y} (1+x)^l - \frac{y^{l+1}}{1-y} (1+x)^{2l+1} \right) \cdot \sum_{s \ge 0} \rho^s  x^s
  \end{align}
$$

From this expression, the coefficient of $x^l$ is given by the convolution
$$
  \begin{align}
  C_{l,l}(y) &= \sum_{s = 0}^l \left( \frac{1}{1-y} \binom{l}{l-s} - \frac{y^{l+1}}{1-y} \binom{2l+1}{l-s}\right) \cdot \rho^s \\
        &= \frac{1}{1-y} \cdot \sum_{s = 0}^l \binom{l}{l-s} \rho^s - \frac{y^{l+1}}{1-y} \sum_{s=0}^l \binom{2l+1}{l-s} \rho^s \\
        &= \frac{(1+\rho)^l}{1-y} - \frac{y^{l+1}}{1-y} \rho^l \sum_{s=0}^l \binom{2l+1}{l-s} \left(\frac{1}{\rho}\right)^{l-s} \\
  \end{align}
$$

Thanks to the symmetry $\binom{2l+1}{l-s} = \binom{2l+1}{l + 1 + s}$, we see that the sum in the last expression is
$$
   \frac{1}{2} \sum_{s = 0}^{2l+1} \binom{2l+1}{s} \left(\frac{1}{\rho}\right)^s = \frac{1}{2} \left(1+\frac{1}{\rho}\right)^{2l+1}
$$

Therefore, writing the two expressions for the coefficient yields the claim
$$
  \sum_{i=0}^l \binom{l+i}{l} y^i = \frac{(1+\rho)^l}{1-y} - \frac{y^{l+1}}{1-y}\cdot\frac{\rho^l}{2}\cdot\left( 1+\frac{1}{\rho}\right)^{2l+1}
$$

In particular, with $y = 1/2$, we get
$$
   \sum_{i=0}^l \binom{l+i}{l} \frac{1}{2^i} = 2^l
$$

  2. Twisted case

We now consider the twisted case
$$
  C_{a,b}(y) \underset{def}{=} \sum_{i=0}^a \binom{b+i}{b} y^i.
$$
The method is exactly the same except that we compute the coefficient of $x^b$ in the polynomial
$$
  \sum_{i=0}^a y^i (1+x)^{b+i}.
$$
We get
$$
  C_{a,b}(y) = \frac{(1+\rho)^b}{1-y} - \frac{y^{a+1}}{1-y} \cdot \sum_{s=0}^b \binom{a+b+1}{b-s} \rho^s.
$$
Unfortunately, the last sum does not simplify as before because of the asymmetry $a \ne b$ (a priori).

However, we can measure this twist by focusing on the quantity $y^b C_{a,b}(y) + y^a C_{b,a}(y)$. Regarding the parts containing the sums, we have
$$
  \begin{align}
            \sum_{s=0}^b &\binom{a+b+1}{b-s} \rho^s + \sum_{s=0}^a \binom{a+b+1}{a-s} \rho^s \\
     &=  \rho^b \cdot \sum_{s=0}^b \binom{a+b+1}{s} \left(\frac{1}{\rho}\right)^s + \rho^a \cdot \sum_{s=0}^a \binom{a+b+1}{s} \left(\frac{1}{\rho}\right)^s \\

     &= \rho^b \left(1+\frac{1}{\rho}\right)^{a+b+1} - \rho^b \cdot \sum_{s=b+1}^{a+b+1} \binom{a+b+1}{s} \left(\frac{1}{\rho}\right)^s + \dots \\


     &= \rho^b \left(1+\frac{1}{\rho}\right)^{a+b+1} - \rho^b \cdot \sum_{s=0}^{a} \binom{a+b+1}{a-s} \left(\frac{1}{\rho}\right)^s + \dots \\

     &= \rho^b \left(1+\frac{1}{\rho}\right)^{a+b+1} + (\rho^a - \rho^b) \cdot \sum_{s=0}^{a} \binom{a+b+1}{a-s} \left(\frac{1}{\rho}\right)^s \\  \end{align}
$$

Whence the identities
$$
     y^b C_{a,b}(y) + y^a C_{b,a}(y) \\
     = \frac{y^a(1+\rho)^a + y^b (1+\rho)^b}{1-y}
      - \frac{y^{a+b+1}}{1-y} \rho^b \left(1+\frac{1}{\rho}\right)^{a+b+1} - \frac{y^{a+b+1}}{1-y} (\rho^a - \rho^b) \sum_{s=0}^a \binom{a+b+1}{a-s} \left(\frac{1}{\rho}\right)^s \\

     = \frac{y^a(1+\rho)^a + y^b (1+\rho)^b}{1-y}
     - \frac{y^{a+b+1}}{1-y} \rho^a \left(1+\frac{1}{\rho}\right)^{a+b+1} - \frac{y^{a+b+1}}{1-y} (\rho^b - \rho^a) \sum_{s=0}^b \binom{a+b+1}{b-s} \left(\frac{1}{\rho}\right)^s
$$

In particular, if $y = 1/2$, we have $\rho = 1$, and the twist is canceled so that
$$
  \sum_{i=0}^a \binom{b+i}{b} \frac{1}{2^{b+i}} + \sum_{i=0}^b \binom{a+i}{a} \frac{1}{2^{a+i}} = 2
$$
which is, somewhat, funny.

Question. Is there a combinatorial proof of these identities ?

pb

Friday, November 20, 2015

Explicit computations of some direct limits of ordered groups

The basic definition of direct limits of groups is not so difficult. But concrete computations of these turn out not to be so easy for beginners (like me). I present in this post some computations of this kind.

    1. Direct limit - definition and construction

The general definition of direct limit can be found on wikipedia. Here, I will focus on direct limits of partially ordered abelian groups (poa-groups). Recall that a poa-group is simply an abelian group $(G,+,0)$ equipped with a partial order $\le$ which is translation invariant: $a \le b$  implies $a+g \le b+g $ for all $g$. The subset $G_+$ of group elements $a \ge 0$ is the positive cone of $G$. A morphism $f : G \rightarrow H$ of poa-groups is a group homomorphism that agrees with the partial orders: $a \le b$ in $G$ implies $f(a) \le f(b)$ in $H$.

The ingredients for a direct limit are: a directed set $(I,\le)$ of ``indices'', a family $(G_i)_{i \in I}$ of poa-groups indexed by $I$, and a family $f_{i,j} : G_i \rightarrow G_j$ of morphisms for every pair $i \le j$ in $I$. These morphisms have to satisfy a compatibility condition, namely, for any $i \le j \le k$ in $I$, $f_{i,k} = f_{j,k} \circ f_{i,j}$. This data forms what is called a direct system of poa-groups. Then there is a universal poa-group $L$ and morphisms $\phi_i : G_i \rightarrow L$ such that for every pair $i \le j$ in $I$, $\phi_j = f_{i,j} \circ \phi_i$. The term universal refers to the usual property in category theory of ``being the most general entity satisfying the given constraints''. Thanks to this universality, the poa-group $L$ is unique up to isomorphism, and is usually denoted by
$$
  L = \lim G_i \xrightarrow{f_{i,j}} G_j
$$
Because I want to perform concrete computations, I won't insist on the universal characterization. Instead, I present a (classical) explicit construction of $L$. We first take the disjoint union $U = \bigsqcup_{i \in I} G_i$. As a set, $U$ is made of elements $(i,a)$ with $i \in I$ and $a \in G_i$. We now define an equivalence relation: $(i,a) \sim (j,b)$ iff there exists $k \ge i,j$ such that $f_{i,k}(a) = f_{j,k}(b)$. Intuitively, two elements are equivalent iff they eventually agree. We denote by $[i,a]$ the equivalence class of $(i,a)$. The poa-structure is defined as follows:
  • The zero element is defined by $0 = [i,0]$ for any $i \in I$. The choice of $i$ does not matter.
  • The addition is defined by $[i,a] + [j,b] = [k, f_{i,k}(a) + f_{j,k}(b)]$ for some $k \ge i,j$. The choice of the representatives and $k$ does not matter.
  • The partial order is defined by: $[i,a] \le [j,b]$ iff for some $k \ge i,j$, $f_{i,k}(a) \le f_{j,k}(b)$ in $G_k$. Again, the choice of the representatives and $k$ does not matter.

In the following examples, we will consider an even more restricted settings. Indeed, any poa-group morphism $f : G \rightarrow G$ yields a direct system $f_{i,j} : G_i \rightarrow G_j$ where the directed set is the set of natural integers $I = \mathbb{N}$ (with the usual order), each $G_i$ is a copy of $G$, and $f_{i,j} = f^{j-i}$ is the $j-i$-th iterate of $f$.By ``computing the direct limit'', I mean finding a poa-group isomorphic to the direct limit, but which is easier to work with.

   2. Dyadic rationals

We consider the direct limit generated by the multiplication by $2$ on integers, denoted by $\mathbb{Z} \xrightarrow{2} \mathbb{Z}$.
$$
  L = \lim \mathbb{Z} \xrightarrow{2} \mathbb{Z} \xrightarrow{2} \dots
$$
The intuition goes as follows. By the definition given above, we have $[i,a] = [i+1,2 \cdot a]$, i.e., each time we move one step forward, we multiply the data by $2$. Therefore, intuitively, moving one step backward amounts to dividing by $2$. Abusing the notations, we could write $[i,a] = [i-1,a/2]$, and thus $[i,a] = [0,a/2^i]$. This suggests considering the poa-group $\mathbb{Z}[\frac{1}{2}]$ of dyadic rationals:
  • Its elements are the fractions $\frac{a}{2^i}$ in $\mathbb{Q}$ with $a \in \mathbb{Z}$ and $i \in \mathbb{Z}$.
  • The poa-structure is the one induced by $\mathbb{Q}$.
We now define $\phi : \mathbb{Z}[\frac{1}{2}] \rightarrow L$ by
$$
  \phi : \frac{a}{2^i} \mapsto [i,a]
$$
We show that $\phi$ is an isomorphism of poa-groups. First, it is well defined: if $a/2^i = b/2^j$ in  $\mathbb{Z}[\frac{1}{2}]$, then for any $k \ge i,j$, we have $2^{k-i} \cdot a = 2^{k-j}\cdot b$, whence $[i,a] = [j,b]$. Second, $\phi$ agrees with addition since
$$
  \frac{a}{2^i} + \frac{b}{2^j} = \frac{2^{k-i}\cdot a + 2^{k-j}\cdot b}{2^k}
$$
Third, $\phi$ agrees with the partial order since
$$
   \frac{a}{2^i} \le \frac{b}{2^j} \Leftrightarrow \frac{2^{k-i}\cdot a}{2^k} \le \frac{2^{k-j}\cdot b}{2^k}
$$
Finally, $\phi(a/2^i) = 0 = [i,0]$ implies that $a = 0$. Since $\phi$ is obviously surjective, $\phi$ is an isomorphism of poa-groups.

    3. ``Fibonacci'' integers

I do not know if this name is appropriate, but it turns out that the construction below is related to the famous Fibonacci sequence; yet, I will not cover this topic here.

Consider the poa-group $\mathbb{Z}^2$ with $(a,b) \le (c,d)$ iff $a \le b$ and $c \le d$, and the multiplication $\mathbb{Z}^2 \xrightarrow{A} \mathbb{Z}^2$ by the matrix
$$
    A = \left(\begin{array}{cc}
                   1 & 1 \\
                   1 & 0 \\
                   \end{array}\right)
$$
We compute the direct limit $L = \lim \mathbb{Z}^2 \xrightarrow{A} \mathbb{Z}^2 \dots$. As in the previous section, the idea is consider the element $[k,u]$ as the informal element $u/A^k$. To give a coherent meaning to this element, we ``notice'' the following. Let $\tau = (1+\sqrt{5})/2$ denote the golden mean. We have the $\tau^2 = \tau + 1$. Therefore, the group $G = \mathbb{Z}[\tau]$ of integral combinations of powers of $\tau$ decomposes as $G = \mathbb{Z}\tau + \mathbb{Z}$. If we identify the vectors $(1,0)$  and $(0,1)$ in $\mathbb{Z}^2$ with $\tau$ and $1$ in $\mathbb{Z}[\tau]$ respectively, then multiplication by $A$ on $\mathbb{Z}^2$ translates into multiplication by $\tau$ in $\mathbb{Z}[\tau]$. Also, the order structure on $G$ is defined by: $\tau\cdot a+ b \le \tau\cdot c + d$ iff $a \le c$ and $b \le d$. By the matrix form, we see that multiplication by $\tau$ agrees with this order: $u \le v$ implies $\tau\cdot u \le \tau\cdot v$. Thanks to this trick, the direct limit can be written (is isomorphic to)
$$
  L = \lim G \xrightarrow{\tau} G \dots
$$
and we can compute it as in the case of dyadic integers. We consider the poa-group defined as follows:
  • Its elements are $\frac{\tau\cdot a + b}{\tau^k}$ (the quotient being taken in $\mathbb{R}$) with $a,b,k \in \mathbb{Z}$.
  • Its poa-structure is the one induced by $\mathbb{R}$.
  • Since $1/\tau = \tau-1$, this poa-group is actually $\mathbb{Z}[\tau] = \tau\mathbb{Z} + \mathbb{Z}$ with the order induced by the one of $\mathbb{R}$. Note that it is important to distinguish $G$ and $\mathbb{Z}[\tau]$ although they have the same underlying group structure. The only difference is between their order relations.
We then define the function $\phi : \mathbb{Z}[\tau] \rightarrow L$ by
$$
    \frac{\tau \cdot a + b}{\tau^k} \mapsto [k, \tau\cdot a + b]
$$
As in the case of dyadic integers, we verify that $\phi$ is a poa-isomorphism. First, it is well defined: if $(\tau\cdot a + b)/\tau^k = (\tau\cdot c + d)/\tau^l$, then $\tau^{m-k}\cdot(\tau\cdot a + b) = \tau^{m-l}\cdot(\tau\cdot c + d)$ for some $m \ge k,l$, and $[k,\tau\cdot a+b] = [l,\tau\cdot c + d]$. Second, $\phi$ agrees with addition since
$$
  \frac{\tau\cdot a + b}{\tau^k} + \frac{\tau\cdot c + d}{\tau^l} = \frac{\tau^{m-k}\cdot(\tau\cdot a + b) + \tau^{m-l}\cdot(\tau\cdot c + d)}{\tau^m}.
$$
The fact that $\phi$ agrees with the order is less trivial. Since $\phi$ agrees with addition, it suffices to check that $\phi$ sends the positive cone of $\mathbb{Z}[\tau]$ to the positive cone of $L$. This amounts to prove that if $\tau\cdot a + b \ge 0$ in $\mathbb{R}$ with $a,b \in \mathbb{Z}$, then there exists $k\in \mathbb{Z}$ and two non-negative integers $a',b' \in \mathbb{N}$  such that
$$
   \tau\cdot a + b = \frac{\tau\cdot a' + b'}{\tau^k} ~~~~(\bigstar)
$$ To prove this, we shall turn back to the matrix form. In the base $(\tau,1)$, multiplication by $\tau$ is modeled by the matrix $A$. Consider the action of $A$ on the plane $\mathbb{R}^2$. Let $\Delta$ denote the line $\tau\cdot x + y = 0$, and $\Delta^+$ the half-plane $\tau\cdot x + y \ge 0$. The proof of $(\bigstar)$ amounts to show that iterating $A$ on any point of $\Delta^+$ eventually leads to a point of the positive quadrant $\{(x,y)~|~ x,y \ge 0\}$.

Basic matrix algebra shows that the eigenvalues of $A$ are $\tau$ and $\overline{\tau} = (1-\sqrt{5})/2$. The eigenspace associated with $\tau$ is $\nabla ~:~ \overline{\tau}\cdot x + y = 0$, while the eigenspace associated with $\overline{\tau}$ is $\Delta ~:~ \tau\cdot x + y = 0$. We have $|\tau| > 1$ and $|\overline{\tau}| < 1$, so $A$ dilates $\nabla$, while $A$ contracts $\Delta$. By Figure 1, we see that iterating $A$ sufficiently enough moves any point of the half-plane into the positive quadrant.

Fig. 1 - Action of $A$


Therefore, we just showed that the direct limit $L$ is isomorphic to $\mathbb{Z}\tau + \mathbb{Z}$, with positive cone $\{a\cdot\tau + b \ge 0 ~|~ a,b \in \mathbb{Z}\}$.

pb

Tuesday, November 3, 2015

Distributed interpretation of the constructive Lovász Local Lemma

I follow the paper A Kolmogorov complexity proof of the Lovász local lemma for satisfiability, by J. Messner and T. Thierauf. Let $\phi$ be a $k$-CNF formula with $n$ variables, and $m$ clauses, each clause containing exactly $k$ literals (i.e., a variable, or the negation of a variable). The goal is to explicitly build an assignment of truth values to the variables so that each clause contains at least one literal evaluating to true.

  1. Original algorithm

We define the graph $\Gamma'$, having the clauses of $\phi$ as nodes. In $\Gamma'$,  two clauses $C$ and $D$ define an edge if $C$ has a literal that occurs negated in $D$. We denote by $d$ the maximum degree of $\Gamma'$.

The symmetric version of the Lovász local lemma states that $\phi$ is satisfiable if the clauses do not "interact too much", i.e., more precisely
$$
  \frac{d^d}{(d-1)^{(d-1)}} \leq 2^k - 1
$$
Moser and Tardos gave a constructive proof of the previous result, in the sense that they defined a (randomized) algorithm that (efficiently) produces a satisfying assignment for $\phi$ within $O(m)$ time steps. This algorithm runs as follows:
  • Pick a random assignment for the variables of $\phi$
  • While some clause in $\phi$ is not satisfied
    • Choose (deterministically) an unsatisfied clause $C$
    • Reassign the variables in $C$ independently at random
  •  Output the satisfying assignment

      2. Reformulation

    We now proceed to the interpretation of the proof from the point of view of distributed algorithms. We define an event $e$ as a pair $(C,b)$ where $C$ is a clause, and $b$ is a bit assignment of the variables in $C$. Two events are said to be independent if their clauses are not neighbours in $\Gamma'$.

    Let $\gamma$ be a configuration, i.e., a bit assignment of the $n$ variables in the formula. The event $e = (C,b)$ is enabled in $\gamma$ if $\gamma$ does not satisfy  $C$, i.e., all the literals in $C$ evaluate to false. We say that the event  is applied to $\gamma$ when the variables of clause $C$ are reassigned according to $b$. If $\gamma'$ is the resulting configuration, we denote by $\gamma \xrightarrow{e} \gamma'$ this transition.

    We now introduce a normal form for executions inspired by the work of Cartier and Foata in trace monoids. The idea stems from the fact that if two events $e,e'$ are independent and enabled in $\gamma$, then these events can be applied in any order, and yield the same configuration. One can model a schedule of events as a word $e_0\cdot e_1 \dots e_{s-1}$ in the trace monoid generated by the events, i.e., the quotient of the free monoid over the event alphabet by the commutativity relations induced by event independence. Cartier and Foata showed that any word $w$ in the trace monoid can be uniquely represented by the following normal form
    $$
         V_0 | \dots |  V_{t-1}
    $$
    where each $V_i$ is composed of pairwise independent events, and each event in $V_{i+1}$ depends on some event in $V_i$. An execution can then be defined as a pair $(\gamma_0, S)$ where $\gamma_0$ is the initial configuration, and $S$ is a schedule given in the normal form above. If $S$ contains $s$ events, then the execution has consumed exactly $n + s\cdot k$ random bits. The normal form can be depicted as a forest:

    Fig. 1 Compact representation of a schedule.
    The dependence relations are subsumed by the graph $\Gamma'$. Each dot on a line $C_i$ represents an event with clause $C_i$. The bit assignments are omitted in the picture. The arrows represent dependences. In particular, there are at most $d$ arrows out of any node. Let $[S]$ denote the the schedule $S$ without the bit assignments, i.e., $[S]$ retains only the causal order of clauses, pretty much as depicted in Fig. 1. We will refer to $[S]$ as the causal structure of $S$.

    I don't know if Messner, Thierauf were aware of this, but it turns out that the forest construction they give in their paper corresponds almost (I have not checked it thoroughly) to the method of Cartier and Foata for computing a normal form of a word in a trace monoid.

      3. Final argument

    The crucial point in the proof of Messner and Thierauf consists in the following observation: a finite execution $(\gamma_0,S)$ is entirely determined by the final configuration $\gamma_{t-1}$ and the causal structure $[S]$. 

    Indeed, if you know $\gamma_{t-1}$ and the last group $U_{t-1}$ of pairwise independent clause in $[S]$, then the bit assignment associated with any clause $C \in U_{t-1}$ is simply the restriction of $\gamma_{t-1}$ to the variables of $C$ since they were lastly modified. In particular, $\gamma_{t-2}$ is obtained from $\gamma_{t-1}$ by simply inverting the value of every variable occurring in one of the clauses of $U_{t-1}$.

    The consequence of this is that the $n + s\cdot k$ (Kolmogorov) random bits can be computed from the pair $(\gamma_{t-1},[S])$. The classic compression method in algorithmic information theory yields
    $$
        n + s\cdot k \leq K(\gamma_{t-1},[S]) + O(1)
    $$
    where $K(\cdot)$ is the Kolmogorov complexity. An upper bound for the right-hand side is given by $n$ (number of bits to encode $\gamma_{t-1}$), and the logarithm of the number of causal structures $[S]$ with exactly $s$ (bit-free) events. Combinatorics yields
    $$
      n + s\cdot k \leq n + (d\cdot s + m) \cdot h\left(\frac{1}{d}\right)+ O(1)
    $$
    where $h(p) = - p\cdot \log p - (1-p)\cdot \log (1-p)$. The bound on $d$ gives $s = O(m)$.

    pb

    [1] Robin A. Moser, Gábor Tardos, A constructive proof of the Lovász local lemma, arxiv.org/abs/0903.0544, 2009
    [2] Jochen Messner, Thomas Thierauf, A Kolmogorov complexity proof of the Lovász local lemma for satisfiability, Theoretical Computer Science, volume 461, pages 55-64, 2012
    [3] Pierre Cartier, Dominique Foata, Problèmes combinatoire de commutation et de réarrangements, Lecture notes in Mathematics, 85, Springer Verlag, 1969 

    Monday, November 2, 2015

    The relativization obstacle

    I was wondering what was exactly the meaning of trying to compare complexity classes relatively to some oracle, and, more precisely, how such an approach may help to compare these classes without oracles. In this post, I will focus on the celebrated result by Baker, Gill and Solovay, stating that:
    (BGS) There exist oracles $A$ and $B$ such that $P^A = NP^A$ and $P^B  \neq NP^B$.
    This result imposes a strong constraint on any attempt to prove or refute $P ~=?~ NP$. Indeed, such a proof has to be non-relativizable. Roughly speaking, a proof $\pi$ of a statement $S$ about complexity classes is said to be relativizable if for any oracle $O$ the same proof $\pi$ proves the statement $S^O$ which is $S$ with all complexity classes being relativized to the oracle $O$. For example, if there were a relativizable proof that $P = NP$, then we would have $P^O = NP^O$ for all oracles $O$; which contradicts the BGS theorem above.

    The lesson to retain is the following:
    For any statement $S$ about complexity classes, if there exists an oracle $O$ such that $S^O$ holds (resp. does not hold), then there are no relativizable proofs that $S$ does not hold (resp. holds).
    But what does a relativizable proof looks like ? Basically, it is a proof that combines composition of functions, and use of universal machines.

    Let's try to prove that $P$ is a strict subset of $NP$ (sic!). More precisely, let's try to prove this claim using diagonal/simulation arguments as in the proof of the time hierarchy theorem. We could reduce (polynomial-time reductions) some $NP$-complete problem to some problem $H$ consisting of pairs $(i,n)$ such that the deterministic machine $i$  accepts input $n$ within $poly(|n|)$, plus, perhaps, additional properties. Then, we would assume that $H$ belongs to $P$, and try to derive a contradiction. We would have a deterministic polynomial-time machine $F$ which accepts $(i,n)$ if $(i,n)$ belongs to $H$, or rejects it otherwise. Then we would build a machine $K$ that satisfies the specification of $H$ on input $i$ if $F(i,i) = 0$, or does something contradicting the specification of $H$ otherwise. If $K$ on input $K$ satisfies the specification of $H$, then $F(K,K) = 0$, i.e., the machine $K$ on input $K$ does not satisfy the specification of $H$. If $K$ on input $K$ does not satisfy the specification of $H$, then $F(K,K) = 1$, i.e., the machine $K$ on input $K$ does satisfy the specification of $H$. Whence the 1'000'000 dollar contradiction.

    The issue is, if we look at the proof above, we see that we can replace each Turing machine $M$ involved by the same machine $M^O$ augmented with some oracle $O$, without affecting the flow of arguments. In other words, such a proof is relativizing. The BGS theorem above prevents any such proof from solving the $P$ vs $NP$ problem. Bye bye dear 1'000'000 dollars.

    pb

    Occurrences of the diagonal argument

    Diagonalization occurs in many places, since its first use by Cantor to prove that the real numbers are uncountable. I will try to maintain in this post, a list of the occurrences of this argument. Just for, so to speak: fun.

    The main idea is the following. We have a set $I$ of ``codes'', and a set, e.g., N of ``inputs''. Very often, if the goal were not true, then there would be a function $F : I \times N \rightarrow \{0,1\}$ that ``efficiently'' encodes some property about the pair $(i,n) \in I \times N$. One can then build from the diagonal a new code $K \in I$ such that, for all $n \in N$, $F(K,n) = 1 - F(n,n)$. Very often, every element $i \in I$ can be encoded as an element of $N$. The contradiction is obtained with $F(K,K) = 1-F(K,K)$.

    Let's see some concrete examples.

        #1. Real numbers are uncountable


    Let $I = N = \mathbb{N}$. Assume there exists a function $F : I \times N \rightarrow \{0,1\}$ such that for every real number $x \in [0,1]$, there exists a code $i \in I$ such that $F(i,n)$ is the $n$-th bit of $x$ in its binary expansion.

    We now build a member $K \in I$ from the diagonal of $F$ by $F(K,n) = 1-F(n,n)$, i.e., the real number encoded by K is the one whose $n$-th bit is $1-F(n,n)$. But then the $K$-th bit of $K$ is $F(K,K) = 1 - F(K,K)$. Whence a contradiction.

        #2. The Halting problem

    Let $I$ be the set of codes of Turing machines. Assume that there exists a machine $F$ which solves the Halting problem, i.e., for any $(i,n)$, $F$ terminates on  $(i,n)$, and $F(i,n) = 1$ if the $i$-th machine terminates on input $n$, and $0$ otherwise. Again, we can think of $F$ as an array with rows indexed by Turing machines, and columns indexed by all possible inputs.

    We now build a machine $K \in I$ from the diagonal of $F$. On the input $n$ (a code for a Turing machine), the machine $K$ returns (any value is ok) if $F(n,n) = 0$ and runs forever otherwise. Since $K \in I$, the value $F(K,K)$ is defined. But, $K$ does not terminate on input $K$ iff $F(K,K) = 0$ iff $K$ terminates on input $K$. Whence a contradiction. Note that here the expression ``$F(K,n) = 1 - F(n,n)$'' from the introduction is to be understood as: $K$ does not terminate on input $n$ if $F(n,n)$ returns $1$.

        #3. The Time Hierarachy theorem

    Let $I$ be the set of (codes of) Turing machines. I am merely restating the proof on wikipedia. Let $f : \mathbb{N} \rightarrow \mathbb{N}$ be a time constructible function. Consider the following problem
    $$
      H = \{ (i,n),~\text{the machine } i\in I \text{ accepts } n \text{ within } f(|n|) \text{ steps}\}
    $$
    Here, ``accept'' means returning $1$. It is not difficult to see that $H \in \mathsf{DTIME}(f(m)^3)$ where $m$ denotes the size of $(i,n)$. The goal is to prove that $H$ does not belong to $\mathsf{DTIME}(f(m/2))$. Assume it is the case. Then, there exists a machine $F$ which terminates on each input $(i,n)$ within $f(m/2)$ steps such that $F(i,n) = 1$ if $(i,n) \in H$  or $0$ otherwise.

    We now build a machine $K$ from the diagonal of $F$. The machine $K$ accepts the input $i$ if $F$ rejects $(i,i)$, and rejects it otherwise. Let $s$ be the size of $K$, and $m \simeq 2s+1$ be the size of $(K,K)$. The function $F$ takes about $f(m/2) \simeq f(s)$ steps to compute $F(K,K)$. In particular, $K$ terminates within $f(s)$ steps on input $K$. Now, $K$ accepts input $K$ (which it does within $f(s)$ steps) iff $F$ rejects input $(K,K)$ iff $K$ does not accept $K$ within $f(s)$ steps iff $K$ rejects $K$ (which it does within $f(s)$ steps). Whence a contradiction. Here, the expression ``$F(K,n) = 1 - F(n,n)$'' is to be understood as: $K$ rejects (accepts) $n$ if $F$ accepts (rejects) $(n,n)$.

    pb

    Sunday, June 28, 2015

    Ranking function for multiset combinations

    Consider a totally ordered finite alphabet $A$. Fix a vector $d = (d_s)_{s \in A}$ of non-negative integers with $n = \sum d_s$. In this post, I would like to present a somewhat simple ranking function for the set $S(A^n,d)$ of vectors $v  \in A^n$ satisfying
    $$
         d_s = \text{ number of occurrences of } s \text{ in } v
    $$
    Intuitively, if $A= \{0, \dots, k-1\}$, then $S(A^n,d)$ represents all the ways to choose $d_0$ elements of type $0$, $d_1$ elements of type $1$, etc. such that $d_0 + \dots + d_{k-1} = n$. The size of $S(A^n,d)$ is the multinomial coefficient
    $$
        |S(A^n,d)| = \binom{n}{(d_s)_{s \in A}}
    $$

    By ranking function, I mean an explicit bijection
    $$
         r(A^n,d) : S(A^n,d) \rightarrow \{0, \dots, |S(A^n,d)| - 1\}
    $$

    1. Case $|A| = 2$

    Without loss of generality, we assume $A = \{0,1\}$. Let $d = (b,a) = (n-a, a)$. Our goal is to rank the set of vectors $v = (v_0,\dots,v_{n-1}) \in \{0,1\}^n$ having exactly $a$ entries set to $1$ (the others being set to $0$). There are exactly $\binom{n}{a}$ such vectors.

    To see what is going on, we will try to build progressively a vector $v \in S(A^n,b,a)$. Let's focus, say, on the last entry $v_{n-1}$. The trick stems from the following observation:
    • (a) If $v_{n-1} = 1$ then it remains to build a vector $v' = (v_0,\dots,v_{n-2}) \in S(A^{n-1}, b,a-1)$, i.e., a vector of dimension $n-1$ containing exactly $a-1$ entries set to $1$.
    • (b) If $v_{n-1} = 0$ then it remains to build a vector $v' = (v_0,\dots,v_{n-2}) \in S(A^{n-1}, b,a)$, i.e., a vector of dimension $n-1$ containing exactly $a$ entries set to $1$.
     We see that we can define the ranking function by induction (on the dimension $n$). Since there are exactly $\binom{n-1}{a-1}$ vectors of type (a), we can reserve the ranks $0,\dots,\binom{n-1}{a-1}-1$ for them, whereas the remaining ranks are reserved for the vectors of type (b). The induction formula is thus:
    $$
      r(A^n,b,a)(v) = \left\{
                                      \begin{array}{lr}
                                          r(A^{n-1},b,a-1)(v') & \text{if } v_{n-1} = 1 \\
                                          \binom{n-1}{a-1} + r(A^{n-1},b,a)(v')    & \text{otherwise}
                                      \end{array}
                                   \right.
    $$
    Note that, if we had used any 2-element alphabet instead of $\{0,1\}$, the above formula would remain the same.

    2. General case

    We now consider an arbitrary alphabet size. Let $v \in S(A^n,d)$ and $s_* = \min A$. We can associate with $v$ the two following vectors. First, the vector $u \in \{0,1\}^n$ obtained from $v$ by replacing every entry containing $s_*$ with a $0$, and the others with a $1$. Second, the vector $w \in (A - \{s_*\})^{n - d_{s_*}}$ obtained from $v$ by removing all the entries containing $s_*$ and concatenating the remaining parts (orderly). For instance, if $v = (0,2,1,2,0,1,1,0,2)$ on the alphabet $A = \{0,1,2\}$ then
    $u = (0,1,1,1,0,1,1,0,1)$ and $w = (2,1,2,1,1,2)$. Note that we can easily rebuild $v$ from the pair $(u,w)$.

    Again, we build the ranking function by induction. The trick stems from the following observation. For each $v \in S(A^n, d)$
    $$
       u \in S(\{0,1\}^n, d_{s_*}, n-d_{s_*})\\
       w \in S((A-\{s_*\})^{n-d_{s_*}}, d|_{A - \{s_*\}})
    $$

    To build $v$, these two vectors can be chosen independently. In particular, once $w$ is fixed, there remain exactly $\binom{n}{d_{s_*}}$ possible choices for $v$. Therefore, we can define
    $$
        r(A^n,d)(v) = r(D_*)(w) \cdot \binom{n}{d_{s_*}} + r(D_0)(u)
    $$
    where $D_* = ((A-\{s_*\})^{n-d_{s_*}}, d|_{A - \{s_*\}})$ and $D_0 = (\{0,1\}^n, d_{s_*}, n-d_{s_*})$. Note that the rank $r(D_0)(u)$ corresponds to the case of a $2$-element alphabet.

    3. Conclusion

    The inductive definitions given above cannot be used directly in real code because the recursive calls will blow the stack. However, it is not difficult to encode these definitions using basic loops.

    pb