De Bruijn Sequence and Universal Cycle Constructions

Binary Shift Rule Output
Algorithm
Order $n$
Constructing DB sequences using binary shift rules

Let $\mathbf{B}(n)$ denote the set of binary strings of length $n$. A function $f : \mathbf{B}(n) \rightarrow \{0,1\}$ is a DB-successor if there exists a DB sequence $\mathcal{D}$ such that each string $\alpha = a_1a_2\cdots a_n \in \mathbf{B}(n)$ is followed by the symbol $f(\alpha)$ in $\mathcal{D}$. In the best case, such a successor will run in $O(n)$ time per symbol generated using $O(n)$ space. Each such DB-successor is based on an underlying feedback function $g$ which partitions the de Bruijn graph $G(n{-}1,k)$ into a factor of small cycles. Then, special conditions are determined for joining the small cycles together. This approach is similar to Hierholzer's algorithm for generating an Euler cycle. Generally, the simpler the feedback function the simpler the conditions for a related DB-successor. The following are among the simplest feedback functions where $\oplus$ denote addition mod 2:

  1. Pure Cycling Register (PCR):   $g(\alpha) = a_1$
  2. Complementing Cycling Register (CCR):   $g(\alpha) = a_1 \oplus 1 = \overline{a_1}$
  3. Pure Run-length Register (PRR):   $g(\alpha) = a_1 \oplus a_2 \oplus a_n$

DB-successors based on the PCR

Using the PCR as the underlying feedback function to initially partition $\mathbf{B}(n)$ into smaller cycles, the following four simple DB-successors are obtained [GSWW18]. These rules test whether or not a string is a necklace, which is a string that is lexicographically smallest amongst all its rotations. This can be done in $O(n)$ time. The shift rule for the Granddaddy [Fre72] corresponds to the lexicographically smallest binary DB sequence that can also be constructed by the Prefer-smallest greedy approach and a concatenation construction. The Grandmama [DHS+18] can also be constructed via a concatenation construction.

PCR1 Granddaddy DB-successor

      $f(\alpha) = \left\{ \begin{array}{ll} \overline{a}_1 &\ \ \mbox{if $\color{blue}{a_ja_{j+1}\cdots a_n 0 1^{j-2}}$ is a necklace;}\\ {a_1} \ &\ \ \mbox{otherwise,}\end{array} \right.$

where $j$ is the smallest index of $\alpha$ such that $a_j = 0$ and $j > 1$, or $j=n{+}1$ if no such index exists.

PCR2 Grandmama DB-successor

      $f(\alpha) = \left\{ \begin{array}{ll} \overline{a}_1 &\ \ \mbox{if $\color{blue}{0^{n-j}1a_2\cdots a_j}$ is a necklace;}\\ {a_1} \ &\ \ \mbox{otherwise,}\end{array} \right.$

where $j$ is the largest index of $\alpha$ such that $a_j = 1$, or $j=0$ if no such index exists.

The following two successors are perhaps the simplest to state DB-successors. The PCR3 DB-successor [SWW16b] is a special case of a more generic DB-successor [JFB91].

PCR3 DB-successor

      $ f(\alpha) = \left\{ \begin{array}{ll} \overline{a}_1 &\ \ \mbox{if $\color{blue}{a_2a_3\cdots a_n1}$ is a necklace;}\\ {a_1} \ &\ \ \mbox{otherwise.}\end{array} \right.$

PCR4 DB-successor

      $f(\alpha) = \left\{ \begin{array}{ll} \overline{a}_1 &\ \ \mbox{if $\color{blue}{0a_2 a_3 \cdots a_{n}}$ is a necklace;}\\ {a_1} \ &\ \ \mbox{otherwise.}\end{array} \right.$

DB-successors based on the CCR

Using the CCR as the underlying feedback function, the following three simple DB-successors are obtained [GSWW18]. A string $\gamma$ is a co-necklace if $\gamma \overline{\gamma}$ is a necklace.

CCR1 DB-successor

      $f(\alpha) = \left\{ \begin{array}{ll} a_1 &\ \ \mbox{if $\color{blue}{a_ja_{j+1}\cdots a_n 1 0^{j-2}}$ is a co-necklace;}\\ \overline{a}_1 \ &\ \ \mbox{otherwise.}\end{array} \right.$

where $j$ is the smallest index of $a_2a_3\cdots a_n$ such that $a_j = 0$, or $j=n{+}1$ if no such index exists.

CCR2 DB-successor

      $f(\alpha) = \left\{ \begin{array}{ll} a_1 &\ \ \mbox{if $\color{blue}{0^{n-j}1\overline{a}_2 \overline{a}_3\cdots \overline{a}_{j}}$ is a co-necklace;}\\ \overline{a}_1 \ &\ \ \mbox{otherwise.}\end{array} \right.$

where $j$ is the largest index of $\alpha$ such that $a_j = 1$, or $j=n$ if no such index exits

CCR3 DB-successor

      $f(\alpha) = \left\{ \begin{array}{ll} a_1 &\ \ \mbox{if $\color{blue}{a_2 a_3 \cdots a_{n}0}$ is a co-necklace not equal to $0^n$;}\\ \overline{a}_1 \ &\ \ \mbox{otherwise.}\end{array} \right.$

Like PCR3, the CCR3 DB-succesor is an efficient application of the more generic rule by [JFB91]. The DB-successor given by Huang [Hua90] is also based on the CCR, however, it is more complicated. Etzion [Etz87] also presented a DB-successor based on the CCR, but it is also more complex and has yet to be implemented.

DB-successors based on the PRR

By considering the PRR as the underlying feedback function, the underlying structure of the greedy Prefer-same and Prefer-opposite constructions have been unveiled [SSA20]. These sequences are the lexicographically largest and smallest sequences, respectively, with respect to their run-length encoding [ASS20]. Below are two additional constructions related to each sequence. They are used as building blocks to defining the successors. Each of the following six DB-successors can be implemented in $O(n)$ time per symbol generated using $O(n)$ space.

Prefer-same: Consider the following three representatives for equivalence classes induced by the PRR, where a string is said to be special if it begins and ends with 0 and has run-length encoding of the form $(21^{2x})^y1^z, \mbox{where $x \geq 0$, $y\geq 2$, and $z\geq 2$}.$

  • RL-rep: The string with the lexicographically largest run-length encoding; if there are two such strings it is the one beginning with 1.

  • LC-rep: The string $\omega$ such that $t{+}1$ applications of the PRR starting with $\omega$ yields the RL-rep, where $t$ is the number of consecutive 1s at the end of the run-length encoding of the RL-rep.

  • same-rep: $\left\{ \begin{array}{ll} \mbox{RL-rep} &\ \ \mbox{if the RL-rep is special}\\ \mbox{LC-rep}\ &\ \ \mbox{otherwise.}\end{array} \right.$

PRR-RunLength DB-successor

      $f(\alpha) = \left\{ \begin{array}{ll} a_1 \oplus a_2 \oplus a_n \oplus 1 &\ \ \mbox{if $\color{blue}{0a_2 a_3 \cdots a_{n}}$ or $\color{blue}{1a_2 a_3 \cdots a_{n}}$ is an RL-rep;}\\ a_1 \oplus a_2 \oplus a_n\ &\ \ \mbox{otherwise.}\end{array} \right.$

The following successor can be used to construct the lexicographic composition concatenation construction [FK77].

PRR-LexComp DB-successor

      $f(\alpha) = \left\{ \begin{array}{ll} a_1 \oplus a_2 \oplus a_n \oplus 1 &\ \ \mbox{if $\color{blue}{0a_2 a_3 \cdots a_{n}}$ or $\color{blue}{1a_2 a_3 \cdots a_{n}}$ is an LC-rep;}\\ a_1 \oplus a_2 \oplus a_n \ &\ \ \mbox{otherwise.}\end{array} \right.$

PRR-PrefSame DB-successor

      $f(\alpha) = \left\{ \begin{array}{ll} a_1 \oplus a_2 \oplus a_n \oplus 1 &\ \ \mbox{if $\color{blue}{0a_2 a_3 \cdots a_{n}}$ or $\color{blue}{1a_2 a_3 \cdots a_{n}}$ is a same-rep;}\\ a_1 \oplus a_2 \oplus a_n \ &\ \ \mbox{otherwise.}\end{array} \right.$


Prefer-opposite: Consider the following three representatives for equivalence classes induced by the PRR, where a string is said to be special2 if it begins with 10 and has run-length encoding of the form $1x^t y, \mbox{where $t$ is odd and $y>x$}.$

  • RL2-rep: The string with the lexicographically smallest run-length encoding; if there are two such strings it is the one beginning with 0.

  • LC2-rep: The string $\omega$ such that $t$ applications of the PRR starting with $\omega$ yields the RL2-rep, where $t$ is last value in the run-length encoding of the RL2-rep.

  • opposite-rep: $\left\{ \begin{array}{ll} \mbox{RL2-rep} &\ \ \mbox{if the RL2-rep is special2}\\ \mbox{LC2-rep}\ &\ \ \mbox{otherwise.}\end{array} \right.$

PRR-RunLength2 DB-successor

      $f(\alpha) = \left\{ \begin{array}{ll} a_1 \oplus a_2 \oplus a_n \oplus 1 &\ \ \mbox{if $\color{blue}{0a_2 a_3 \cdots a_{n}}$ or $\color{blue}{1a_2 a_3 \cdots a_{n}}$ is an RL2-rep;}\\ a_1 \oplus a_2 \oplus a_n\ &\ \ \mbox{otherwise.}\end{array} \right.$

PRR-LC2 DB-successor

      $f(\alpha) = \left\{ \begin{array}{ll} a_1 \oplus a_2 \oplus a_n \oplus 1 &\ \ \mbox{if $\color{blue}{0a_2 a_3 \cdots a_{n}}$ or $\color{blue}{1a_2 a_3 \cdots a_{n}}$ is an LC2-rep;}\\ a_1 \oplus a_2 \oplus a_n \ &\ \ \mbox{otherwise.}\end{array} \right.$

PRR-PrefOpposite DB-successor

      $f(\alpha) = \left\{ \begin{array}{ll} a_1 \oplus a_2 \oplus a_n \oplus 1 &\ \ \mbox{if $\color{blue}{0a_2 a_3 \cdots a_{n}}$ or $\color{blue}{1a_2 a_3 \cdots a_{n}}$ is an opposite-rep;}\\ a_1 \oplus a_2 \oplus a_n \ &\ \ \mbox{otherwise.}\end{array} \right.$

References

  • [ASS20] A. Alhakim, E. Sala, and J. Sawada. Revisiting the prefer-same and prefer-opposite de Bruijn sequence constructions. Theoretical Computer Science, 2020.

  • [DHS+18] P. B. Dragon, O. I. Hernandez, J. Sawada, A. Williams, and D. Wong. Constructing de Bruijn sequences with co-lexicographic order: the $k$-ary Grandmama sequence. European J. Combin., 72:1-11, 2018.

  • [Etz87] T. Etzion. Self-dual sequences. J. Combin. Theory Ser. A, 44(2):288-298, 1987.

  • [Fre72] H. Fredricksen. Generation of the Ford sequence of length $2n$; $n$ large. J. Combinatorial Theory Ser. A, 12:153-154, 1972.

  • [FK77] H. Fredricksen and I. Kessler. Lexicographic compositions and de Bruijn sequences. J. Combinatorial Theory Ser. A, 22(1):17-30, 1977.

  • [GSWW18] D. Gabric, J. Sawada, A. Williams, and D. Wong. A framework for constructing de Bruijn sequences via simple successor rules. Discrete Math., 341(11):2977-2987, 2018.

  • [Hua90] Y. J. Huang. A new algorithm for the generation of binary de Bruijn sequences. J. Algorithms, 11(1):44-51, 1990.

  • [JFB91] C. J. A. Jansen, W. G. Franx, and D. E. Boekee. An efficient algorithm for the generation of de Bruijn cycles. IEEE Trans. Inform. Theory, 37(5):1475-1478, 1991.

  • [SSA20] E. Sala, J. Sawada, A. Alhakim. Efficiently constructing the Prefer same de Bruijn sequence. Submitted 2020.

  • [SWW16b] J. Sawada, A. Williams, and D. Wong. A surprisingly simple de Bruijn sequence construction. Discrete Math., 339(1):127-131, 2016.