De Bruijn Sequence and Universal Cycle Constructions

Cutdown Output
Algorithm
Length $m$
Symbols $k$
Cut-down DB sequences

For some applications, a universal cycle of arbitrary length $1 \leq m \leq k^n$ is required such that every $k$-ary string of length $n$ appears at most once. For instance, consider trying to apply the de Bruijn Card Trick on a complete deck of 52 cards. We call such cycles cut-down DB sequences. Such a sequence for $m=52$ is: $$0000011110011100011011010011000010111010110010100010.$$ As a smaller example, the following graph illustrates cut-down DB sequences $10$ (blue) and $1101100001$ (red) of length 2 and 10, respectively.

                     

Such universal cycles are known to exist for all $m$ and $k$ [Lem71,Yoe62], however only a few efficient constructions are known. A successor rule based on cycle joining requires $O(n)$ per symbol using $O(n)$ space [Etz86,CGS22]. Algorithms based on LFSRs, requiring a unique primitive polynomial for each $n$, are also known [Lan00,Gol17]. Both approaches are described below.

In general, cut-down DB sequences may have a poor balance of 0s to 1s; for instance, the length $m=19$ cut-down DB sequence 0000110001011010010 has twelve 0s and only seven 1s. A balanced cut-down DB sequence (called a $P_L^{(k)}$ sequence in [NW22]), has the additional property that every $k$-ary string of length $j\leq m$ appears either $\lfloor \frac{m}{k^j} \rfloor$ or $\lceil \frac{m}{k^j} \rceil$ times as a substring.

Example

The following is a balanced cut-down DB sequence for $m=52$ and $k=2$: $$0110001111101001101101011110011100000101100100101000.$$ There are 26 0s and 26 1s. Every length 2 substring occurs $52/4 = 13$ times, every length 3 substring occurs either $\lfloor 52/8 \rfloor = 6$ or $\lceil 52/8 \rceil = 7$ times, every length 4 substring occurs three or four times, and every length 5 substring occurs one or two times.

A recursive algorithm (outlined below) based on Lempel's lift produces balanced cut-down DB sequences in $O(1)$-amortized time per symbol; however, it requires exponential space and has an exponential time delay before outputting the first symbol [NW22]. A special case of these sequences when $j=n-1$ was previously known to exist [GHS19].

One downside to the algorithms presented below (except for the modified binary successor): There is no efficient way to test a priori whether or not a specific length-$n$ string is found on the cut-down DB sequence about to be generated.

Applying Lempel's lift to construct balanced cut-down DB sequences

A balanced cut-down DB sequence of length $m$ over an alphabet of size $k$ can be recursively constructed applying Lempel's lift $\mathcal{L}(\beta)$ [NW22]. Let $A_n$ denote a length $n$ substring of the cycle $012\cdots (k{-}1)$. For $n=7$ and $k=3$, such substrings are $\{0120120, 1201201, 2012012\}$.

Balance($m,k$)

  1. If $(m < k)$ then return $012\cdots(m{-}1)$
  2. $\beta \gets $ Balance($\lfloor m/k \rfloor,k$)
  3. Let $\alpha$ be the result of joining the cycles in $\mathcal{L}(\beta)$ together one at a time on shared substrings of the form $x A_{n-2}$ or $A_{n-2}x$; if neither exist, join safely on a shared substring of the form $A_{n-2}$, where $n$ is the smallest integer such that $m \leq n^k$
  4. return $\alpha$ with the substring $x^{n-1}$ replaced with $x^n$ for each $x \in \{1,2, \ldots, (m \bmod k)\}$
Example

Consider $m=47$ and $k=3$, which implies $n=4$. Recursively, $\beta = $ Balance(15,3) = 012021121001022. Then

    $\mathcal{L}(\beta) = $ { 010020101112210, 121101212220021, 202212020001102 }.

The first two cycles can be joined on the shared substrings 101 to yield 101112210010020101212220021121, which in turn can be joined with the third cycle on 011 to obtain

    $\alpha = $ 011122100100201012122200211211011022022120200.

Finally, the two substrings 111 and 222 are replaced with 1111 and 2222 to obtain the length 47 balanced cut-down DB sequence: $$ 01111221001002010121222200211211011022022120200.$$

A successor rule via cycle joining

A cycle-joining based construction for $k=2$ can produce an exponential number of cut-down DB sequences [Etz86]. The construction can be simplified to construct a single sequence by applying PCR3-alt as the underlying feedback function to join cycles, then generalized for $k\geq 2$ to generate a single cut-down DB sequence that runs in $O(n)$ time per symbol using $O(n)$ space after some initialization [CGS22]. The cycle-joining based construction can be summarized as follows, assuming $k^{n-1} < m \leq k^n$.

$k$-ary cut-down algorithm (cycle-join)

  1. Construct a main cycle $C$ that has length $m+j$ where $0 \leq j < n$.
  2. Cut out at most two small substrings (corresponding to small cycles in the related DB graph) from $C$ whose combined length totals $j$.
By starting with the string $0^{n-1}x$, where in most cases $x=(k{-}1)$, and applying the PCR3-alt as the underlying successor, the main cycle can be constructed by cutting out cycles based on their weight and period to obtain a length that meets the bounds. The small cycle(s) can be cut out by modifying the output given by PCR3-alt for at most two strings. The choice of PCR3-alt as the underlying successor is what leads to the simplicity of this construction. This approach has one downside: the context matters when generating the sequence, i.e., it cannot start with an arbitrary string on the sequence.

By applying a polynomial time unranking algorithm for fixed-density Lyndon words in the binary case, the successor rule can be adapted to start from any string on the sequence. The successor rule generates each symbol in $O(n)$ time using $O(n)$ space. Determining whether or not a string belongs to the sequence can be computed using $O(n^3)$ operations on $n$ bit numbers [CGS22]. The resulting algorithm is the modified successor rule available for download.

An LFSR-based construction

A simple LFSR based construction, as given by Golomb [Gol17, P.193], is summarized below, assuming $m$ is not a power of 2.

Binary cut-down algorithm (LFSR)

  1. Create an LFSR with feedback function $f$ based on a primitive polynomial of order $n$ where $2^{n-1} < m < 2^n$.
  2. Seed the LFSR with $\alpha = 0^{n-1}1$ and iterate $m$ times to get string $\beta$.
  3. Simultaneously run two copies of the LFSR, one seeded with $\alpha$ and one with $\beta$, until the registers contain a conjugate pair; the registers match in the last $n{-}1$ positions.
  4. Output the next $m$ bits produced by LFSR that started with $\alpha$.
Example

Consider $m=21$ and $n=5$. The primitive polynomial $1 + x^2 + x^5$ corresponds to the feedback function $f(a_1a_2a_3a_4a_5) = a_1 + a_4$. When the LFSR based on this feedback function is initialized with $\alpha = 00001$ the following bits are produced in the next 21 iterations: $$\underline{00001}~0101110110001111\underline{10011} $$ which yields $\beta = 10011$. Applying the LFSR starting with both $\alpha$ and $\beta$ respectively: $$\mbox{Start with $\alpha$: } 0000\underline{1010}$$ $$\mbox{Start with $\beta$: } 1001\underline{1010}$$ until the final $n{-}1=4$ positions match. Continue iterating the LFSR seeded with $\alpha$; the next 21 symbols form the cut-down DB sequence of length $m=21$: $$01010~\underline{111011000111110011010}. $$

This construction requires $O(n)$ space and produces the sequence in $O(n)$-amortized time per symbol. However, unlike the first algorithm, this algorithm requires an exponential time delay to produce the first symbol of the sequence; a unique primitive polynomial is required for each order $n$ to obtain the feedback function; and we are unable to determine whether or not a length $n$ string is a member of the constructed cut-down DB sequence without constructing the sequence.

References

  • [CGS22] B. Cameron, A. Gündoǧan, and J. Sawada. Cut-down de Bruijn sequences. arXiv preprint arXiv:2205.02815, 2022.

  • [Etz86] T. Etzion. An algorithm for generating shift-register cycles. Theoret. Comput. Sci., 44(2):209-224, 1986.

  • [GHS19] D. Gabric, S. Holub, and J. Shallit. Maximal state complexity and generalized de Bruijn words. Information and Computation, (in press), 2021.

  • [Gol17] S. W. Golomb. Shift Register Sequences. World Scientific, Singapore, 2017.

  • [Lan00] M. Landsberg. Feedback functions for generating cycles over a finite alphabet. Discrete Mathematics, 219(1):187–194, 2000.

  • [Lem71] A. Lempel. $m$-ary closed sequences. Journal of Combinatorial Theory, Series A, 10(3):253-258, 1971.

  • [NW22] A. Nellore and R. Ward. Arbitrary-Length Analogs to de Bruijn Sequences. In 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022), volume 223, pages 9:1–9:20,2022.

  • [Yoe62] M. Yoeli. Binary ring sequences. The American Mathematical Monthly, 69(9):852– 855, 1962.