Friday, November 7, 2014

Funny nonsense about Chaitin $\Omega$ and Riemann $\zeta$

This post will be brief. I don't know anything about Riemann $\zeta$, except the basic definition and the million dollar prize which attracts the most terrible bounty hunters mathematicians. I hardly know more about Chaitin $\Omega$. Yet, I found that they look similar in some sense. Maybe, it's completely obvious, and trivial. Maybe, it's related to something slightly deeper. Anyway, it looks funny.

The celebrated Riemann $\zeta(s)$ has the following form, for all $s \in \mathbb{C}$ with real part greater than $1$
$$
  \begin{equation}
    \zeta(s) =  \sum_{n \geq 1} n^{-s}
  \end{equation}
$$
Actually, $\zeta(s)$ can be analytically continued over all $\mathbb{C} - \{1\}$, with a unique pole at $1$.

To define Chaitin $\Omega$, we first have to fix some universal prefix-free partial recursive function $U : \{0,1\}^* \rightarrow \{0,1\}^*$. This simply means that we fix some computer whose legal programs, encoded as binary strings, have some sort of "end of file" marker: no legal program is the prefix of another legal program. In addition, this computer is universal, i.e., is able to simulate every other computer. Denoting by $Dom(U)$ the set of legal programs of $U$, the Chaitin's number is
$$
  \Omega = \sum_{p \in Dom(U)} 2^{-|p|}
$$
where $|p|$ denotes the length of the program $p$. This number can be interpreted as a probability. Indeed, if we flip infinitely many times a fair coin, then $2^{-|p|}$ is the probability that the prefix of length $|p|$ in the generated infinite binary sequence is equal to $p$. Since $U$ is prefix-free, this also represents the probability that $U$ will execute the program $p$ and halt. Hence, $\Omega$ is the probability that $U$ halts when given the generated infinite binary sequence. It can be shown that the binary expansion of $\Omega$ is a Martin-Löf random sequence.

In [1], Tadaki generalizes Chaitin $\Omega$, and define, for all $0 < D < 1$
$$
  \begin{equation}
    \Omega(D) = \sum_{p \in Dom(U)} 2^{-\frac{|p|}{D}}
  \end{equation}
$$

Now, I guess everyone has caught it, but let's put bluntly the (not so) funny part. For all (real) $s > 1$
$$
  \begin{align}
     \zeta(s) &= \sum_{n \geq 1} \left( 2^{\log n}\right)^{-s} \\
     \Omega(s^{-1}) &= \sum_{p \in Dom(U)} \left( 2^{|p|} \right)^{-s}
  \end{align}
$$
Of course, these two quantities are not equal: $\Omega(s^{-1})$ is a sub-sum of $\zeta(s)$ which does not pick any term with $n$ not a power of $2$. We can make them look even more similar by introducing, for every $n$, the function $m(l) = |\{ p \in Dom(U),~ |p| = l\}|$ counting the number of legal programs of length $l$.
$$
  \begin{align}
     \zeta(s) &= \sum_{n \geq 1} n^{-s} \\
     \Omega(s^{-1}) &= \sum_{n \geq 1} m(\log n)  \cdot n^{-s}
  \end{align}
$$
The definition of $\Omega(s^{-1})$ really looks like a form of $\zeta$ regularization of the counting function $m(\log n)$. Actually, we could even define, for any universal (not necessarily prefix-free) function $C$, with $m_C(\log n)$ counting the number of legal programs of $C$ with length $\log n$
$$
  \begin{equation}
    \Omega_C(s^{-1}) = \sum_{n \geq 1} m_C(\log n) \cdot n^{-s}
  \end{equation}
$$
provided that the real part of $s$ is sufficiently large. In particular, because the domain of $C$ is not necessarily prefix-free, the function may diverge when $s \rightarrow 1$.

Question. Does such a regularization give more insight on random strings ?
pb

EDIT: It turns out that Tadaki has already proposed an analogy [2] between $\Omega(s^{-1})$ and partition functions in statistical physics. Since $\zeta$ naturally arises in statistical physics, the above similarity between $\Omega(s^{-1})$ and $\zeta(s)$ is not a surprise.

EDIT: Truly, there is something going on here. I found this paper [3] by Calude, introducing zeta numbers for Turing machines.

[1] Tadaki, Kohtaro. A generalization of Chaitin's halting probability $\Omega$ and halting self-similar sets. Hokkaido Mathematical Journal 31 (2002), no. 1, 219--253. doi:10.14492/hokmj/1350911778. http://projecteuclid.org/euclid.hokmj/1350911778

[2] Tadaki, Kohtaro. A statistical mechanical interpretation of algorithmic information theory. http://arxiv.org/abs/0801.4194

[3] Cristian S. Calude, Michael A. Stay, Natural halting probabilities, partial randomness, and zeta functions, Information and Computation, Volume 204, Issue 11, November 2006, Pages 1718-1739, ISSN 0890-5401, http://dx.doi.org/10.1016/j.ic.2006.07.003.

Wednesday, October 29, 2014

Martin-Löf randomness

In this post, I sum up the basic idea of Martin-Löf randomness. Actually, this post is mainly a series of personal notes on the matter. I decided though to publish it here as it may interests someone else. In particular, for the sake of brevity, I rely on informal definitions of computable functions, Turing machines, recursive enumerable sets, etc. The reader should just think about them as any methods which can be translated in a computer program.

    1. Statistics background

Let's fix a finite alphabet $X$ of size $d$. We denote by $X^n$ the set of sequences of length $n$ on $X$. We denote by $X^*$ (resp. $X^{\omega}$) the set of finite (resp. infinite) sequences on $X$. As usual, the concatenation of words is denoted by $u \cdot v$, or simply $uv$. The length of a finite word $u$ is denoted by $|u|$. We assume that $X$ is given the uniform probability measure $\mu(x) = d^{-1}$. This measure naturally extends to $X^n$, and $X^{\omega}$. For instance,
$$
  \begin{equation*}
    \forall n,~ \forall w \in X^n,~ \mu (w \cdot X^{\omega}) = \mu(w_1) \dots \mu(w_n)
  \end{equation*}
$$
We are interested in finding what it means for an individual sequence in $X^{\omega}$ to be random. Martin-Löf's definition is inspired by the notion of statistical tests. We start with a basic example. Some of the commonly accepted (or "ought to be") features of random sequences comprise several laws of large numbers. For example, in a "truly" infinite random sequence $S$, the frequency of any symbol $x \in X$ should be close to $\mu(x)$. Put another way, after fixing some arbitrary $k$, we can compute the frequency $freq(x,S \upharpoonright k)$ of the symbol $x$ in the prefix $S \upharpoonright k$ of length $k$, and if we find that this quantity deviates too much from $\mu(x)$, we conclude that the sequence $S$ is not random.

Reformulating this in the statististics jargon, we are designing a test for the null hypothesis "$H_0$: the sequence is random" with the statistics given by the frequency of the symbol $x$ in the prefix of length $k$ of $S$. Obviously, the fact that this statistics deviates from its expected value does not ensure at 100% that the sequence is "not random", but we want to bound the probability of such an error. Hence, we fix a confidence level $\epsilon > 0$, define a function $f(\epsilon)$ and subset $U(\epsilon) \subseteq X^*$ such that, for all $n$,
$$
  \begin{align}
    ~&U(\epsilon) = \{ w \in X^*,~ |freq(x, w) - \mu(x)| > f(\epsilon) \} \\
    ~&\mu(U(\epsilon)) < \epsilon
  \end{align}
$$
The function $f$ can be chosen to be the best one satisfying this relation, i.e., any smaller function would increase $\mu(U(\epsilon))$ above $\epsilon$. The equation above means that if $S$ is drawn "randomly", then the probability that the frequency computed from $S \upharpoonright k$ deviates from its expected value by the quantity $f(\epsilon)$ is at most $\epsilon$. In other words, the probability that a randomly selected infinite sequence $S$ is rejected by the test, i.e., the probability that $S \upharpoonright n \in U(\epsilon)$, is bounded by $\epsilon$. Statisticians say that the set $U(\epsilon)$ is the critical region of the test at level $\epsilon$. Actually, a test can be viewed as a subset $U \subseteq \mathbb{R}^+ \times X^*$ with $U(\epsilon) = \{S,~ (\epsilon,S) \in U\}$.

Intuitively, the sequence $S$ is non-random if it is rejected by the test at all levels. Thus $S$ is random if
$$
  \begin{equation}
    S \in X^{\omega} - \bigcap_{\epsilon > 0} U(\epsilon) \cdot X^{\omega}
  \end{equation}
$$

    2. Effective tests

One is rightly dubious about the definition of randomness based solely on the frequency of a given symbol in the prefix of length $n$ of an infinite sequence. Following the original intuition, one would define a random sequence as a sequence that passes any imaginable statistical test. But we cannot consider all the tests, i.e., all the subsets of $\mathbb{R}^+ \times X^*$. The major contribution of Martin-Löf was to focus on the effective tests, that is, roughly speaking, tests that can be performed by a Turing machine.

To do so, instead of indexing the confidence level by $\epsilon \in \mathbb{R}^+$ and without loss of generality, we use the levels $\epsilon_m = 2^{-m}$, $m \in \mathbb{N}$. An effective test is then defined as a recursively enumerable subset $U \subseteq \mathbb{N} \times X^*$ such that the subsets $U(m) = \{w \in X^*,~ (m,w) \in U\}$ satisfies, for all $n,m$
$$
  \begin{align}
    ~&U(m) \supseteq U(m+1)\\
    ~&\mu(U(m)) < \epsilon_m
  \end{align}
$$
The major point is the requirement that the set $U$ is recursively enumerable. Roughly speaking, this means that there exists an algorithm that recognizes the inputs $(m,w) \in U$, i.e., that is able to detect sequences that fail the test at level $\epsilon_m$.

To be precise, one also requires that, if $(m,w) \in U$, then for all $n \leq m$, $v \sqsupseteq w$, $(n,v) \in U$. This point comes from the fact we want to test randomness for infinite sequences $S \in X^{\omega}$. If $(m, S \upharpoonright k) \in U$, i.e., $S$ is rejected by the test at level $\epsilon_m$, then it is natural to require that any longer prefix $S \upharpoonright l$, $l \geq k$, is also rejected by the test at less constrained levels $\epsilon_n$, $n \leq m$. Intuitively, since $S \upharpoonright l$ comprises the information in $S \upharpoonright k$, and if $S \upharpoonright k$ provides a clue to consider $S$ as being non-random, then $S \upharpoonright l$ should also cause $S$ to be considered non-random.

  3. Universal test and random sequences

It is well known that there exists a universal Turing machine, i.e., a computer that is able to simulate every other computer. In the same spirit, Martin-Löf has proved that there exists a universal effective test $U$ that "simulates" every other effective test. More precisely, if $V$ is any other effective test, then there exists a constant $c$, depending only on $U$ and $V$, such that the critical regions of $U$ and $V$ satisfies, for all $m$,
$$
  \begin{equation}
    V_{m+c} \subseteq U_m
  \end{equation}
$$
Now, let $R$ be the set of random sequences considering the universal test $U$, i.e.
$$
  \begin{equation}
    R = X^{\omega} - \bigcap_{m} U(m) \cdot X^{\omega}
  \end{equation}
$$
Then, by the universal property of $U$, every $S \in R$ is random relatively to any effective test $V$.

A direct consequence of this definition is that $\mu(R) = 1$ since $\mu(U(m)) \rightarrow 0$. In particular, $R$ is dense in $X^{\omega}$. This means that, for any finite sequence $w \in X^*$, there is a random sequence $S \in R$ that extends $w$, i.e., $w \sqsubset S$. Otherwise, $w\cdot X^{\omega} \cap R = \emptyset$, i.e., $w \cdot X^{\omega}$ is included in the complement of $R$, and thus $0 < \mu(w \cdot X^{\omega}) \leq 0$. In other words, from a measure theoretical point of view, $R$ is large.

Finally, one should notice that the set of random sequences $R$ has been defined in terms of the probability distribution $\mu$. Actually, one could  also take any probability measure on $X^{\omega}$, and define a set $R_{\mu}$ of infinite sequences random with respect to $\mu$. The only restriction is that $\mu$ must be computable, as otherwise, we cannot define effective tests. The main advantage of Martin-Löf's approach is that it can be easily transposed in many other measured topological space, as long as an appropriate definition of "computable stuff" is available.

pb


1. Per Martin-Löf, The Definition of Random Sequences, Information and Control, 9(6): 602 - 619, 1966.