De Bruijn Sequence and Universal Cycle Constructions

Greedy Construction Output
Algorithm
Order $n$
Alphabet size $k$
Constructing DB sequences using greedy algorithms

Perhaps the most surprising DB sequence constructions are the greedy constructions. A greedy construction starts with a seed string, then repeatedly applies some greedy rule to determine the next symbol of a sequence. The algorithm stops when it is impossible to add another symbol without creating a duplicate substring of length $n$, or some termination condition is reached. Such constructions, however, have a major drawback: they require exponential space.

The Prefer-largest [Mar34] (smallest) greedy approach does not use any information about the string currently generated to prioritize the next symbol. There are two approaches whose greedy decision is based on the last symbol generated: the Prefer-opposite [Alh10] and the Prefer-same [EGGR58,Fre82,ASS19]. These greedy approaches can extended when the preferences are based more generally on the last $j$ symbols generated [Alh12].

The Prefer-largest construction

The most well-known greedy construction is the Prefer-largest attributed to Martin [Mar34] which works as follows:

Prefer-largest Greedy Construction
  1. Seed with $0^{n-1}$
  2. Repeat until no new symbol can be added: Append the largest symbol in $\{k{-}1, \ldots , 2,1,0\}$ that does not create a duplicate length $n$ substring
  3. Remove the seed
For example, applying this construction for $n=4$ and $k=2$ we obtain the string:

          000 1111011001010000.

Note the importance of the string $0^{n-1}$ to seed the algorithm. Starting with any other seed will not produce a DB sequence using this greedy approach.

The Prefer-smallest is equivalent to the Prefer-largest construction when the symbol 0 is replaced by $k{-}1$, 1 is replaced by $k{-}2$ and so on. The Prefer-smallest greedy algorithm produces the lexicographically smallest DB sequence, and it can be constructed more efficiently by either a shift rule or by a concatenation approach. This sequence is also known as the Granddaddy.

The Prefer-same construction

The original presentations of a binary Prefer-same construction (in the appendix of [EGGR58,Fre82]) required an additional counting test for certain substrings. However, when seeded with an appropriate string, these tests are not required [ASS19].

Binary Prefer-same Greedy Construction
  1. Seed with length $n-1$ string $\cdots 01010$
  2. Append 1
  3. Repeat until no new bit is added: Append the same bit as the last if it does not create a duplicate length $n$ substring; otherwise append the opposite bit as the last if it does not create a duplicate length $n$ substring
  4. Remove the seed

The resulting sequence has the interesting property that it is the lexicographically largest with respect to a run-length encoding. Note that the this holds even when the roles of the 0 and 1 are interchanged. The Prefer-same sequence can also be constructed efficiently in $O(n)$ time per bit via a shift rule (coming soon). This construction can be generalized to a larger alphabet of size $k$ by letting the preference order for the last symbol $s$ be: $s, s{+}1, \ldots, k{-}1, 0, 1, \ldots s{-}1$. The seed becomes the length $n-1$ string $\cdots (k{-}1)\cdots 210(k{-}1)\cdots 210$.

The Prefer-opposite construction

The Prefer-opposite approach presented below is slightly different than its initial presentation in [Alh10]; here, the initial seed of $0^{n-1}$ is rotated to the end so the resulting sequence is the lexicographically smallest with respect to a run-length endcoding [ASS19].

Binary Prefer-opposite Greedy Construction
  1. Seed with $0^{n-1}$
  2. Append 0
  3. Repeat until the suffix is $1^{n-1}$: Append the opposite bit as the last if it does not create a duplicate length $n$ substring; otherwise append the same bit as the last
  4. Remove the seed
  5. Append $10^{n-1}$

The Prefer-opposite sequence can also be constructed efficiently in $O(n)$ time per bit via a shift rule (coming soon). To generalize this construction to a larger alphabet size $k$, a valid preference order for the last sybmol $s$ is: $s{+}1, \ldots, k{-}1, 0, 1, \ldots, s$ where the string $(k{-}1)(k{-}2)^n \cdots 2^n1^n0^{n-1}$ is appended to the end instead of $10^{n-1}$ when the length $n{-}1$ suffix $(k{-1})^{n-1}$ is reached. Try it out above!

Download source code

» C source code

References

  • [Alh10] A. Alhakim. A simple combinatorial algorithm for de Bruijn sequences. Amer. Math. Monthly, 117(8):728-732, 2010.

  • [Alh12] A. Alhakim. Spans of preference functions for de Bruijn sequences. Discrete Appl. Math., 160(7-8):992-998, 2012.

  • [ASS19] A. Alhakim, J. Sawada, E. Sala. Restarting the prefer-same algorithm. Manuscript, 2019.

  • [EGGR58] C. Eldert, H. Gray, H. Gurk, and M. Rubino . Shifting counters. AIEE Trans., 77:70- 74, 1958.

  • [Mar34] M. H. Martin. A problem in arrangements. Bull. Amer. Math. Soc., 40(12):859-864, 1934.

  • [Fre82] H. Fredricksen. A survey of full length nonlinear shift register cycle algorithms. Siam Review, 24(2):195-221, 1982.