Sunday, June 28, 2015

Ranking function for multiset combinations

Consider a totally ordered finite alphabet $A$. Fix a vector $d = (d_s)_{s \in A}$ of non-negative integers with $n = \sum d_s$. In this post, I would like to present a somewhat simple ranking function for the set $S(A^n,d)$ of vectors $v  \in A^n$ satisfying
$$
     d_s = \text{ number of occurrences of } s \text{ in } v
$$
Intuitively, if $A= \{0, \dots, k-1\}$, then $S(A^n,d)$ represents all the ways to choose $d_0$ elements of type $0$, $d_1$ elements of type $1$, etc. such that $d_0 + \dots + d_{k-1} = n$. The size of $S(A^n,d)$ is the multinomial coefficient
$$
    |S(A^n,d)| = \binom{n}{(d_s)_{s \in A}}
$$

By ranking function, I mean an explicit bijection
$$
     r(A^n,d) : S(A^n,d) \rightarrow \{0, \dots, |S(A^n,d)| - 1\}
$$

1. Case $|A| = 2$

Without loss of generality, we assume $A = \{0,1\}$. Let $d = (b,a) = (n-a, a)$. Our goal is to rank the set of vectors $v = (v_0,\dots,v_{n-1}) \in \{0,1\}^n$ having exactly $a$ entries set to $1$ (the others being set to $0$). There are exactly $\binom{n}{a}$ such vectors.

To see what is going on, we will try to build progressively a vector $v \in S(A^n,b,a)$. Let's focus, say, on the last entry $v_{n-1}$. The trick stems from the following observation:
  • (a) If $v_{n-1} = 1$ then it remains to build a vector $v' = (v_0,\dots,v_{n-2}) \in S(A^{n-1}, b,a-1)$, i.e., a vector of dimension $n-1$ containing exactly $a-1$ entries set to $1$.
  • (b) If $v_{n-1} = 0$ then it remains to build a vector $v' = (v_0,\dots,v_{n-2}) \in S(A^{n-1}, b,a)$, i.e., a vector of dimension $n-1$ containing exactly $a$ entries set to $1$.
 We see that we can define the ranking function by induction (on the dimension $n$). Since there are exactly $\binom{n-1}{a-1}$ vectors of type (a), we can reserve the ranks $0,\dots,\binom{n-1}{a-1}-1$ for them, whereas the remaining ranks are reserved for the vectors of type (b). The induction formula is thus:
$$
  r(A^n,b,a)(v) = \left\{
                                  \begin{array}{lr}
                                      r(A^{n-1},b,a-1)(v') & \text{if } v_{n-1} = 1 \\
                                      \binom{n-1}{a-1} + r(A^{n-1},b,a)(v')    & \text{otherwise}
                                  \end{array}
                               \right.
$$
Note that, if we had used any 2-element alphabet instead of $\{0,1\}$, the above formula would remain the same.

2. General case

We now consider an arbitrary alphabet size. Let $v \in S(A^n,d)$ and $s_* = \min A$. We can associate with $v$ the two following vectors. First, the vector $u \in \{0,1\}^n$ obtained from $v$ by replacing every entry containing $s_*$ with a $0$, and the others with a $1$. Second, the vector $w \in (A - \{s_*\})^{n - d_{s_*}}$ obtained from $v$ by removing all the entries containing $s_*$ and concatenating the remaining parts (orderly). For instance, if $v = (0,2,1,2,0,1,1,0,2)$ on the alphabet $A = \{0,1,2\}$ then
$u = (0,1,1,1,0,1,1,0,1)$ and $w = (2,1,2,1,1,2)$. Note that we can easily rebuild $v$ from the pair $(u,w)$.

Again, we build the ranking function by induction. The trick stems from the following observation. For each $v \in S(A^n, d)$
$$
   u \in S(\{0,1\}^n, d_{s_*}, n-d_{s_*})\\
   w \in S((A-\{s_*\})^{n-d_{s_*}}, d|_{A - \{s_*\}})
$$

To build $v$, these two vectors can be chosen independently. In particular, once $w$ is fixed, there remain exactly $\binom{n}{d_{s_*}}$ possible choices for $v$. Therefore, we can define
$$
    r(A^n,d)(v) = r(D_*)(w) \cdot \binom{n}{d_{s_*}} + r(D_0)(u)
$$
where $D_* = ((A-\{s_*\})^{n-d_{s_*}}, d|_{A - \{s_*\}})$ and $D_0 = (\{0,1\}^n, d_{s_*}, n-d_{s_*})$. Note that the rank $r(D_0)(u)$ corresponds to the case of a $2$-element alphabet.

3. Conclusion

The inductive definitions given above cannot be used directly in real code because the recursive calls will blow the stack. However, it is not difficult to encode these definitions using basic loops.

pb