Cutdown | Output | ||||||||
|
For some applications, a universal cycle of arbitrary length $1 \leq m \leq k^n$ is required such that every $k$-ary string of length $n$ appears at most once. For instance, consider trying to apply the de Bruijn Card Trick on a complete deck of 52 cards. We call such cycles cut-down DB sequences. Such a sequence for $m=52$ is: $$0000011110011100011011010011000010111010110010100010.$$ As a smaller example, the following graph illustrates cut-down DB sequences $10$ (blue) and $1101100001$ (red) of length 2 and 10, respectively.
Such universal cycles are known to exist for all $m$ and $k$ [Lem71,Yoe62], however only a few efficient constructions are known. A successor rule based on cycle joining requires $O(n)$ per symbol using $O(n)$ space [Etz86,CGS22]. Algorithms based on LFSRs, requiring a unique primitive polynomial for each $n$, are also known [Lan00,Gol17]. Both approaches are described below.
In general, cut-down DB sequences may have a poor balance of 0s to 1s; for instance, the length $m=19$ cut-down DB sequence 0000110001011010010 has twelve 0s and only seven 1s. A balanced cut-down DB sequence (called a $P_L^{(k)}$ sequence in [NW22]), has the additional property that every $k$-ary string of length $j\leq m$ appears either $\lfloor \frac{m}{k^j} \rfloor$ or $\lceil \frac{m}{k^j} \rceil$ times as a substring.
The following is a balanced cut-down DB sequence for $m=52$ and $k=2$: $$0110001111101001101101011110011100000101100100101000.$$ There are 26 0s and 26 1s. Every length 2 substring occurs $52/4 = 13$ times, every length 3 substring occurs either $\lfloor 52/8 \rfloor = 6$ or $\lceil 52/8 \rceil = 7$ times, every length 4 substring occurs three or four times, and every length 5 substring occurs one or two times.
One downside to the algorithms presented below (except for the modified binary successor): There is no efficient way to test a priori whether or not a specific length-$n$ string is found on the cut-down DB sequence about to be generated.
Applying Lempel's lift to construct balanced cut-down DB sequences
A balanced cut-down DB sequence of length $m$ over an alphabet of size $k$ can be recursively constructed applying Lempel's lift $\mathcal{L}(\beta)$ [NW22]. Let $A_n$ denote a length $n$ substring of the cycle $012\cdots (k{-}1)$. For $n=7$ and $k=3$, such substrings are $\{0120120, 1201201, 2012012\}$.
- If $(m < k)$ then return $012\cdots(m{-}1)$
- $\beta \gets $ Balance($\lfloor m/k \rfloor,k$)
- Let $\alpha$ be the result of joining the cycles in $\mathcal{L}(\beta)$ together one at a time on shared substrings of the form $x A_{n-2}$ or $A_{n-2}x$; if neither exist, join safely on a shared substring of the form $A_{n-2}$, where $n$ is the smallest integer such that $m \leq n^k$
- return $\alpha$ with the substring $x^{n-1}$ replaced with $x^n$ for each $x \in \{1,2, \ldots, (m \bmod k)\}$
Consider $m=47$ and $k=3$, which implies $n=4$. Recursively, $\beta = $ Balance(15,3) = 012021121001022. Then
$\mathcal{L}(\beta) = $ { 010020101112210, 121101212220021, 202212020001102 }.
The first two cycles can be joined on the shared substrings 101 to yield 101112210010020101212220021121, which in turn can be joined with the third cycle on 011 to obtain
$\alpha = $ 011122100100201012122200211211011022022120200.
Finally, the two substrings 111 and 222 are replaced with 1111 and 2222 to obtain the length 47 balanced cut-down DB sequence: $$ 01111221001002010121222200211211011022022120200.$$
A successor rule via cycle joining
A cycle-joining based construction for $k=2$ can produce an exponential number of cut-down DB sequences [Etz86]. The construction can be simplified to construct a single sequence by applying PCR3-alt as the underlying feedback function to join cycles, then generalized for $k\geq 2$ to generate a single cut-down DB sequence that runs in $O(n)$ time per symbol using $O(n)$ space after some initialization [CGS22]. The cycle-joining based construction can be summarized as follows, assuming $k^{n-1} < m \leq k^n$.
- Construct a main cycle $C$ that has length $m+j$ where $0 \leq j < n$.
- Cut out at most two small substrings (corresponding to small cycles in the related DB graph) from $C$ whose combined length totals $j$.
By applying a polynomial time unranking algorithm for fixed-density Lyndon words in the binary case, the successor rule can be adapted to start from any string on the sequence. The successor rule generates each symbol in $O(n)$ time using $O(n)$ space. Determining whether or not a string belongs to the sequence can be computed using $O(n^3)$ operations on $n$ bit numbers [CGS22]. The resulting algorithm is the modified successor rule available for download.
An LFSR-based construction
A simple LFSR based construction, as given by Golomb [Gol17, P.193], is summarized below, assuming $m$ is not a power of 2.
- Create an LFSR with feedback function $f$ based on a primitive polynomial of order $n$ where $2^{n-1} < m < 2^n$.
- Seed the LFSR with $\alpha = 0^{n-1}1$ and iterate $m$ times to get string $\beta$.
- Simultaneously run two copies of the LFSR, one seeded with $\alpha$ and one with $\beta$, until the registers contain a conjugate pair; the registers match in the last $n{-}1$ positions.
- Output the next $m$ bits produced by LFSR that started with $\alpha$.
Consider $m=21$ and $n=5$. The primitive polynomial $1 + x^2 + x^5$ corresponds to the feedback function $f(a_1a_2a_3a_4a_5) = a_1 + a_4$. When the LFSR based on this feedback function is initialized with $\alpha = 00001$ the following bits are produced in the next 21 iterations: $$\underline{00001}~0101110110001111\underline{10011} $$ which yields $\beta = 10011$. Applying the LFSR starting with both $\alpha$ and $\beta$ respectively: $$\mbox{Start with $\alpha$: } 0000\underline{1010}$$ $$\mbox{Start with $\beta$: } 1001\underline{1010}$$ until the final $n{-}1=4$ positions match. Continue iterating the LFSR seeded with $\alpha$; the next 21 symbols form the cut-down DB sequence of length $m=21$: $$01010~\underline{111011000111110011010}. $$
This construction requires $O(n)$ space and produces the sequence in $O(n)$-amortized time per symbol. However, unlike the first algorithm, this algorithm requires an exponential time delay to produce the first symbol of the sequence; a unique primitive polynomial is required for each order $n$ to obtain the feedback function; and we are unable to determine whether or not a length $n$ string is a member of the constructed cut-down DB sequence without constructing the sequence.
» Binary balanced recursive C program
» $k$-ary balanced recursive C program
» Binary successor rule C program
» Modified binary successor rule C program
- [CGS22] B. Cameron, A. Gündoǧan, and J. Sawada. Cut-down de Bruijn sequences. arXiv preprint arXiv:2205.02815, 2022.
- [Etz86] T. Etzion. An algorithm for generating shift-register cycles. Theoret. Comput. Sci., 44(2):209-224, 1986.
- [GHS19] D. Gabric, S. Holub, and J. Shallit. Maximal state complexity and generalized de Bruijn words. Information and Computation, (in press), 2021.
- [Gol17] S. W. Golomb. Shift Register Sequences. World Scientific, Singapore, 2017.
- [Lan00] M. Landsberg. Feedback functions for generating cycles over a finite alphabet. Discrete Mathematics, 219(1):187–194, 2000.
- [Lem71] A. Lempel. $m$-ary closed sequences. Journal of Combinatorial Theory, Series A, 10(3):253-258, 1971.
- [NW22] A. Nellore and R. Ward. Arbitrary-Length Analogs to de Bruijn Sequences. In 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022), volume 223, pages 9:1–9:20,2022.
- [Yoe62] M. Yoeli. Binary ring sequences. The American Mathematical Monthly, 69(9):852– 855, 1962.