Cutdown  Output  

For some applications, a universal cycle of arbitrary length $1 \leq m \leq k^n$ is required such that every $k$ary string of length $n$ appears at most once. For instance, consider trying to apply the de Bruijn Card Trick on a complete deck of 52 cards. We call such cycles cutdown DB sequences. Such a sequence for $m=52$ is: $$0000011110011100011011010011000010111010110010100010.$$ As a smaller example, the following graph illustrates cutdown DB sequences $10$ (blue) and $1101100001$ (red) of length 2 and 10, respectively.
Such universal cycles are known to exist for all $m$ and $k$ [Lem71,Yoe62], however only a few efficient constructions are known. A successor rule based on cycle joining requires $O(n)$ per symbol using $O(n)$ space [Etz86,CGS22]. Algorithms based on LFSRs, requiring a unique primitive polynomial for each $n$, are also known [Lan00,Gol17]. Both approaches are described below.
In general, cutdown DB sequences may have a poor balance of 0s to 1s; for instance, the length $m=19$ cutdown DB sequence 0000110001011010010 has twelve 0s and only seven 1s. A balanced cutdown DB sequence (called a $P_L^{(k)}$ sequence in [NW22]), has the additional property that every $k$ary string of length $j\leq m$ appears either $\lfloor \frac{m}{k^j} \rfloor$ or $\lceil \frac{m}{k^j} \rceil$ times as a substring.
The following is a balanced cutdown DB sequence for $m=52$ and $k=2$: $$0110001111101001101101011110011100000101100100101000.$$ There are 26 0s and 26 1s. Every length 2 substring occurs $52/4 = 13$ times, every length 3 substring occurs either $\lfloor 52/8 \rfloor = 6$ or $\lceil 52/8 \rceil = 7$ times, every length 4 substring occurs three or four times, and every length 5 substring occurs one or two times.
One downside to the algorithms presented below (except for the modified binary successor): There is no efficient way to test a priori whether or not a specific length$n$ string is found on the cutdown DB sequence about to be generated.
Applying Lempel's lift to construct balanced cutdown DB sequences
A balanced cutdown DB sequence of length $m$ over an alphabet of size $k$ can be recursively constructed applying Lempel's lift $\mathcal{L}(\beta)$ [NW22]. Let $A_n$ denote a length $n$ substring of the cycle $012\cdots (k{}1)$. For $n=7$ and $k=3$, such substrings are $\{0120120, 1201201, 2012012\}$.
 If $(m < k)$ then return $012\cdots(m{}1)$
 $\beta \gets $ Balance($\lfloor m/k \rfloor,k$)
 Let $\alpha$ be the result of joining the cycles in $\mathcal{L}(\beta)$ together one at a time on shared substrings of the form $x A_{n2}$ or $A_{n2}x$; if neither exist, join safely on a shared substring of the form $A_{n2}$, where $n$ is the smallest integer such that $m \leq n^k$
 return $\alpha$ with the substring $x^{n1}$ replaced with $x^n$ for each $x \in \{1,2, \ldots, (m \bmod k)\}$
Consider $m=47$ and $k=3$, which implies $n=4$. Recursively, $\beta = $ Balance(15,3) = 012021121001022. Then
$\mathcal{L}(\beta) = $ { 010020101112210, 121101212220021, 202212020001102 }.
The first two cycles can be joined on the shared substrings 101 to yield 101112210010020101212220021121, which in turn can be joined with the third cycle on 011 to obtain
$\alpha = $ 011122100100201012122200211211011022022120200.
Finally, the two substrings 111 and 222 are replaced with 1111 and 2222 to obtain the length 47 balanced cutdown DB sequence: $$ 01111221001002010121222200211211011022022120200.$$
A successor rule via cycle joining
A cyclejoining based construction for $k=2$ can produce an exponential number of cutdown DB sequences [Etz86]. The construction can be simplified to construct a single sequence by applying PCR3alt as the underlying feedback function to join cycles, then generalized for $k\geq 2$ to generate a single cutdown DB sequence that runs in $O(n)$ time per symbol using $O(n)$ space after some initialization [CGS22]. The cyclejoining based construction can be summarized as follows, assuming $k^{n1} < m \leq k^n$.
 Construct a main cycle $C$ that has length $m+j$ where $0 \leq j < n$.
 Cut out at most two small substrings (corresponding to small cycles in the related DB graph) from $C$ whose combined length totals $j$.
By applying a polynomial time unranking algorithm for fixeddensity Lyndon words in the binary case, the successor rule can be adapted to start from any string on the sequence. The successor rule generates each symbol in $O(n)$ time using $O(n)$ space. Determining whether or not a string belongs to the sequence can be computed using $O(n^3)$ operations on $n$ bit numbers [CGS22]. The resulting algorithm is the modified successor rule available for download.
An LFSRbased construction
A simple LFSR based construction, as given by Golomb [Gol17, P.193], is summarized below, assuming $m$ is not a power of 2.
 Create an LFSR with feedback function $f$ based on a primitive polynomial of order $n$ where $2^{n1} < m < 2^n$.
 Seed the LFSR with $\alpha = 0^{n1}1$ and iterate $m$ times to get string $\beta$.
 Simultaneously run two copies of the LFSR, one seeded with $\alpha$ and one with $\beta$, until the registers contain a conjugate pair; the registers match in the last $n{}1$ positions.
 Output the next $m$ bits produced by LFSR that started with $\alpha$.
Consider $m=21$ and $n=5$. The primitive polynomial $1 + x^2 + x^5$ corresponds to the feedback function $f(a_1a_2a_3a_4a_5) = a_1 + a_4$. When the LFSR based on this feedback function is initialized with $\alpha = 00001$ the following bits are produced in the next 21 iterations: $$\underline{00001}~0101110110001111\underline{10011} $$ which yields $\beta = 10011$. Applying the LFSR starting with both $\alpha$ and $\beta$ respectively: $$\mbox{Start with $\alpha$: } 0000\underline{1010}$$ $$\mbox{Start with $\beta$: } 1001\underline{1010}$$ until the final $n{}1=4$ positions match. Continue iterating the LFSR seeded with $\alpha$; the next 21 symbols form the cutdown DB sequence of length $m=21$: $$01010~\underline{111011000111110011010}. $$
This construction requires $O(n)$ space and produces the sequence in $O(n)$amortized time per symbol. However, unlike the first algorithm, this algorithm requires an exponential time delay to produce the first symbol of the sequence; a unique primitive polynomial is required for each order $n$ to obtain the feedback function; and we are unable to determine whether or not a length $n$ string is a member of the constructed cutdown DB sequence without constructing the sequence.
» Binary balanced recursive C program
» $k$ary balanced recursive C program
» Binary successor rule C program
» Modified binary successor rule C program
 [CGS22] B. Cameron, A. Gündoǧan, and J. Sawada. Cutdown de Bruijn sequences. arXiv preprint arXiv:2205.02815, 2022.
 [Etz86] T. Etzion. An algorithm for generating shiftregister cycles. Theoret. Comput. Sci., 44(2):209224, 1986.
 [GHS19] D. Gabric, S. Holub, and J. Shallit. Maximal state complexity and generalized de Bruijn words. Information and Computation, (in press), 2021.
 [Gol17] S. W. Golomb. Shift Register Sequences. World Scientific, Singapore, 2017.
 [Lan00] M. Landsberg. Feedback functions for generating cycles over a finite alphabet. Discrete Mathematics, 219(1):187–194, 2000.
 [Lem71] A. Lempel. $m$ary closed sequences. Journal of Combinatorial Theory, Series A, 10(3):253258, 1971.
 [NW22] A. Nellore and R. Ward. ArbitraryLength Analogs to de Bruijn Sequences. In 33rd Annual Symposium on Combinatorial Pattern Matching (CPM 2022), volume 223, pages 9:1–9:20,2022.
 [Yoe62] M. Yoeli. Binary ring sequences. The American Mathematical Monthly, 69(9):852– 855, 1962.