## Wednesday, October 29, 2014

### Martin-Löf randomness

In this post, I sum up the basic idea of Martin-Löf randomness. Actually, this post is mainly a series of personal notes on the matter. I decided though to publish it here as it may interests someone else. In particular, for the sake of brevity, I rely on informal definitions of computable functions, Turing machines, recursive enumerable sets, etc. The reader should just think about them as any methods which can be translated in a computer program.

1. Statistics background

Let's fix a finite alphabet $X$ of size $d$. We denote by $X^n$ the set of sequences of length $n$ on $X$. We denote by $X^*$ (resp. $X^{\omega}$) the set of finite (resp. infinite) sequences on $X$. As usual, the concatenation of words is denoted by $u \cdot v$, or simply $uv$. The length of a finite word $u$ is denoted by $|u|$. We assume that $X$ is given the uniform probability measure $\mu(x) = d^{-1}$. This measure naturally extends to $X^n$, and $X^{\omega}$. For instance,
$$\begin{equation*} \forall n,~ \forall w \in X^n,~ \mu (w \cdot X^{\omega}) = \mu(w_1) \dots \mu(w_n) \end{equation*}$$
We are interested in finding what it means for an individual sequence in $X^{\omega}$ to be random. Martin-Löf's definition is inspired by the notion of statistical tests. We start with a basic example. Some of the commonly accepted (or "ought to be") features of random sequences comprise several laws of large numbers. For example, in a "truly" infinite random sequence $S$, the frequency of any symbol $x \in X$ should be close to $\mu(x)$. Put another way, after fixing some arbitrary $k$, we can compute the frequency $freq(x,S \upharpoonright k)$ of the symbol $x$ in the prefix $S \upharpoonright k$ of length $k$, and if we find that this quantity deviates too much from $\mu(x)$, we conclude that the sequence $S$ is not random.

Reformulating this in the statististics jargon, we are designing a test for the null hypothesis "$H_0$: the sequence is random" with the statistics given by the frequency of the symbol $x$ in the prefix of length $k$ of $S$. Obviously, the fact that this statistics deviates from its expected value does not ensure at 100% that the sequence is "not random", but we want to bound the probability of such an error. Hence, we fix a confidence level $\epsilon > 0$, define a function $f(\epsilon)$ and subset $U(\epsilon) \subseteq X^*$ such that, for all $n$,
\begin{align} ~&U(\epsilon) = \{ w \in X^*,~ |freq(x, w) - \mu(x)| > f(\epsilon) \} \\ ~&\mu(U(\epsilon)) < \epsilon \end{align}
The function $f$ can be chosen to be the best one satisfying this relation, i.e., any smaller function would increase $\mu(U(\epsilon))$ above $\epsilon$. The equation above means that if $S$ is drawn "randomly", then the probability that the frequency computed from $S \upharpoonright k$ deviates from its expected value by the quantity $f(\epsilon)$ is at most $\epsilon$. In other words, the probability that a randomly selected infinite sequence $S$ is rejected by the test, i.e., the probability that $S \upharpoonright n \in U(\epsilon)$, is bounded by $\epsilon$. Statisticians say that the set $U(\epsilon)$ is the critical region of the test at level $\epsilon$. Actually, a test can be viewed as a subset $U \subseteq \mathbb{R}^+ \times X^*$ with $U(\epsilon) = \{S,~ (\epsilon,S) \in U\}$.

Intuitively, the sequence $S$ is non-random if it is rejected by the test at all levels. Thus $S$ is random if
$$S \in X^{\omega} - \bigcap_{\epsilon > 0} U(\epsilon) \cdot X^{\omega}$$

2. Effective tests

One is rightly dubious about the definition of randomness based solely on the frequency of a given symbol in the prefix of length $n$ of an infinite sequence. Following the original intuition, one would define a random sequence as a sequence that passes any imaginable statistical test. But we cannot consider all the tests, i.e., all the subsets of $\mathbb{R}^+ \times X^*$. The major contribution of Martin-Löf was to focus on the effective tests, that is, roughly speaking, tests that can be performed by a Turing machine.

To do so, instead of indexing the confidence level by $\epsilon \in \mathbb{R}^+$ and without loss of generality, we use the levels $\epsilon_m = 2^{-m}$, $m \in \mathbb{N}$. An effective test is then defined as a recursively enumerable subset $U \subseteq \mathbb{N} \times X^*$ such that the subsets $U(m) = \{w \in X^*,~ (m,w) \in U\}$ satisfies, for all $n,m$
\begin{align} ~&U(m) \supseteq U(m+1)\\ ~&\mu(U(m)) < \epsilon_m \end{align}
The major point is the requirement that the set $U$ is recursively enumerable. Roughly speaking, this means that there exists an algorithm that recognizes the inputs $(m,w) \in U$, i.e., that is able to detect sequences that fail the test at level $\epsilon_m$.

To be precise, one also requires that, if $(m,w) \in U$, then for all $n \leq m$, $v \sqsupseteq w$, $(n,v) \in U$. This point comes from the fact we want to test randomness for infinite sequences $S \in X^{\omega}$. If $(m, S \upharpoonright k) \in U$, i.e., $S$ is rejected by the test at level $\epsilon_m$, then it is natural to require that any longer prefix $S \upharpoonright l$, $l \geq k$, is also rejected by the test at less constrained levels $\epsilon_n$, $n \leq m$. Intuitively, since $S \upharpoonright l$ comprises the information in $S \upharpoonright k$, and if $S \upharpoonright k$ provides a clue to consider $S$ as being non-random, then $S \upharpoonright l$ should also cause $S$ to be considered non-random.

3. Universal test and random sequences

It is well known that there exists a universal Turing machine, i.e., a computer that is able to simulate every other computer. In the same spirit, Martin-Löf has proved that there exists a universal effective test $U$ that "simulates" every other effective test. More precisely, if $V$ is any other effective test, then there exists a constant $c$, depending only on $U$ and $V$, such that the critical regions of $U$ and $V$ satisfies, for all $m$,
$$V_{m+c} \subseteq U_m$$
Now, let $R$ be the set of random sequences considering the universal test $U$, i.e.
$$R = X^{\omega} - \bigcap_{m} U(m) \cdot X^{\omega}$$
Then, by the universal property of $U$, every $S \in R$ is random relatively to any effective test $V$.

A direct consequence of this definition is that $\mu(R) = 1$ since $\mu(U(m)) \rightarrow 0$. In particular, $R$ is dense in $X^{\omega}$. This means that, for any finite sequence $w \in X^*$, there is a random sequence $S \in R$ that extends $w$, i.e., $w \sqsubset S$. Otherwise, $w\cdot X^{\omega} \cap R = \emptyset$, i.e., $w \cdot X^{\omega}$ is included in the complement of $R$, and thus $0 < \mu(w \cdot X^{\omega}) \leq 0$. In other words, from a measure theoretical point of view, $R$ is large.

Finally, one should notice that the set of random sequences $R$ has been defined in terms of the probability distribution $\mu$. Actually, one could  also take any probability measure on $X^{\omega}$, and define a set $R_{\mu}$ of infinite sequences random with respect to $\mu$. The only restriction is that $\mu$ must be computable, as otherwise, we cannot define effective tests. The main advantage of Martin-Löf's approach is that it can be easily transposed in many other measured topological space, as long as an appropriate definition of "computable stuff" is available.

pb

1. Per Martin-Löf, The Definition of Random Sequences, Information and Control, 9(6): 602 - 619, 1966.