Greedy Construction | Output | ||||||||
|
Perhaps the most surprising DB sequence constructions are the greedy constructions. A greedy construction starts with a seed string, then repeatedly applies some greedy rule to determine the next symbol of a sequence. The algorithm stops when it is impossible to add another symbol without creating a duplicate substring of length $n$, or some termination condition is reached. Such constructions, however, have a major drawback: they require exponential space.
The Prefer-largest [Mar34] (smallest) greedy approach does not use any information about the string currently generated to prioritize the next symbol. There are two approaches whose greedy decision is based on the last symbol generated: the Prefer-opposite [Alh10] and the Prefer-same [EGGR58,Fre82,ASS19]. These greedy approaches can be extended when the preferences are based more generally on the last $j$ symbols generated [Alh12], and in fact they are all special cases of an Euler cycle construction based on the de Bruijn graph.
The most well-known greedy construction is the Prefer-largest [Mar34], which generates the lexicographically largest DB sequence.
Note the importance of the string $0^{n-1}$ to seed the algorithm. Starting with any other seed will not produce a DB sequence using this greedy approach.
The Prefer-smallest is equivalent to the Prefer-largest construction when the symbol 0 is replaced by $k{-}1$, 1 is replaced by $k{-}2$ and so on. The Prefer-smallest greedy algorithm produces the lexicographically smallest DB sequence, and it can be constructed more efficiently by either a shift rule or by a concatenation approach.
The original presentations of a binary Prefer-same construction [EGGR58,Fre82] required an additional counting test for certain substrings. However, when seeded with an appropriate string, as described in the related Euler cycle algorithms, these tests are not required [ASS19].
The resulting sequence is the lexicographically largest with respect to a run-length encoding. Note that the this holds even when the roles of the 0 and 1 are interchanged. For example, the run-length encoding of the generated de Bruin sequence for $n=5$ is given by: $$00000111110001001100101110110101 = 5531222113121111.$$ The binary Prefer-same sequence can also be constructed efficiently in $O(n)$ time per bit via a shift rule. To generalize this greedy construction to a larger alphabet of size $k$:
A strict Prefer-opposite approach does not yield a DB sequence. However, as described in the section on de Bruijn graphs, by modifying the preference for the special vertex $1^{n-1}$ in the underlying Euler cycle algorithm, we indeed obtain a DB sequence. The algorithm presented below produces a shifted instance of the originally presented sequence in [Alh10]. Here, the initial seed of $0^{n-1}$ is rotated to the end so the resulting sequence is the lexicographically smallest with respect to a run-length encoding [ASS19].
The binary Prefer-opposite sequence can also be constructed efficiently in $O(n)$ time per bit via a shift rule. To generalize this construction to a larger alphabet size $k$:
These special case strings all occur in the final $kn$ symbols of the resulting sequence. Try it out above!