De Bruijn Sequence and Universal Cycle Constructions

Greedy Construction Output
Algorithm
Order $n$
Symbols $k$
Constructing DB sequences using greedy algorithms

Perhaps the most surprising DB sequence constructions are the greedy constructions. A greedy construction starts with a seed string, then repeatedly applies some greedy rule to determine the next symbol of a sequence. The algorithm stops when it is impossible to add another symbol without creating a duplicate substring of length $n$, or some termination condition is reached. Such constructions, however, have a major drawback: they require exponential space.

The Prefer-largest [Mar34] (smallest) greedy approach does not use any information about the string currently generated to prioritize the next symbol. There are two approaches whose greedy decision is based on the last symbol generated: the Prefer-opposite [Alh10] and the Prefer-same [EGGR58,Fre82,ASS19]. These greedy approaches can be extended when the preferences are based more generally on the last $j$ symbols generated [Alh12], and in fact they are all special cases of an Euler cycle construction based on the de Bruijn graph.

The Prefer-largest construction

The most well-known greedy construction is the Prefer-largest [Mar34], which generates the lexicographically largest DB sequence.

Prefer-largest Greedy Construction
  1. Seed with $0^{n-1}$
  2. Repeat until no new symbol can be added: Append the largest symbol in $\{k{-}1, \ldots , 2,1,0\}$ that does not create a duplicate length $n$ substring
  3. Remove the seed
For example, applying this construction for $n=4$ and $k=2$ we obtain the string:

          000 1111011001010000.

Note the importance of the string $0^{n-1}$ to seed the algorithm. Starting with any other seed will not produce a DB sequence using this greedy approach.

The Prefer-smallest is equivalent to the Prefer-largest construction when the symbol 0 is replaced by $k{-}1$, 1 is replaced by $k{-}2$ and so on. The Prefer-smallest greedy algorithm produces the lexicographically smallest DB sequence, and it can be constructed more efficiently by either a shift rule or by a concatenation approach.

The Prefer-same construction

The original presentations of a binary Prefer-same construction [EGGR58,Fre82] required an additional counting test for certain substrings. However, when seeded with an appropriate string, as described in the related Euler cycle algorithms, these tests are not required [ASS19].

Binary Prefer-same Greedy Construction
  1. Seed with length $n-1$ string $\cdots 10101$
  2. Append 0
  3. Repeat until no new bit is added: Append the same bit as the last if it does not create a duplicate length $n$ substring; otherwise append the opposite bit as the last if it does not create a duplicate length $n$ substring
  4. Remove the seed

The resulting sequence is the lexicographically largest with respect to a run-length encoding. Note that the this holds even when the roles of the 0 and 1 are interchanged. For example, the run-length encoding of the generated de Bruin sequence for $n=5$ is given by: $$00000111110001001100101110110101 = 5531222113121111.$$ The binary Prefer-same sequence can also be constructed efficiently in $O(n)$ time per bit via a shift rule. To generalize this greedy construction to a larger alphabet of size $k$:

  1. Let the preference order for the last symbol $s$ be: $s, s{+}1, \ldots, k{-}1, 0, 1, \ldots s{-}1$.
  2. The seed together with the 0 added from Step 2 is the length $n$ string: $\cdots (k{-}1)\cdots 210(k{-}1)\cdots 210$.

The Prefer-opposite construction

A strict Prefer-opposite approach does not yield a DB sequence. However, as described in the section on de Bruijn graphs, by modifying the preference for the special vertex $1^{n-1}$ in the underlying Euler cycle algorithm, we indeed obtain a DB sequence. The algorithm presented below produces a shifted instance of the originally presented sequence in [Alh10]. Here, the initial seed of $0^{n-1}$ is rotated to the end so the resulting sequence is the lexicographically smallest with respect to a run-length encoding [ASS19].

Prefer-opposite Greedy Construction
  1. Seed with $0^{n-1}$
  2. Append 0
  3. Repeat until no new bit is added:
    • If the current suffix is $1^{n-1}$ then: append $1$ if it is the first time $1^{n-1}$ has been seen, and append 0 otherwise.
    • Otherwise: append the opposite bit as the last if it does not create a duplicate length $n$ substring; otherwise append the same bit as the last
  4. Remove the seed

The binary Prefer-opposite sequence can also be constructed efficiently in $O(n)$ time per bit via a shift rule. To generalize this construction to a larger alphabet size $k$:

  1. Let the preference order for the last symbol $s$ be: $s{+}1, \ldots, k{-}1, 0, 1, \ldots, s$.
  2. The special case vertices are $1^{n-1}, 2^{n-1}, \ldots, (k{-}1)^{n-1}$, where their preference lists interchange the final two symbols $s{-}1$ and $s$. Thus, the special case is applied only after $k-2$ occurrences of a given special suffix have been visited.

These special case strings all occur in the final $kn$ symbols of the resulting sequence. Try it out above!

Download source code

» C program

References

  • [Alh10] A. Alhakim. A simple combinatorial algorithm for de Bruijn sequences. Amer. Math. Monthly, 117(8):728-732, 2010.

  • [Alh12] A. Alhakim. Spans of preference functions for de Bruijn sequences. Discrete Appl. Math., 160(7-8):992-998, 2012.

  • [ASS20] A. Alhakim, J. Sawada, E. Sala. Revisiting the prefer-same and prefer-opposite de Bruijn sequence construcctions. Theor. Comp. Sci, 2020 (to appear).

  • [EGGR58] C. Eldert, H. Gray, H. Gurk, and M. Rubino . Shifting counters. AIEE Trans., 77:70- 74, 1958.

  • [Fre82] H. Fredricksen. A survey of full length nonlinear shift register cycle algorithms. Siam Review, 24(2):195-221, 1982.

  • [Mar34] M. H. Martin. A problem in arrangements. Bull. Amer. Math. Soc., 40(12):859-864, 1934.