De Bruijn Sequence and Universal Cycle Constructions

Multisets Output
Algorithm
Order $n$
Multiset size $k$
Multisets

The number of $k$-multisets of $[n]$ = $\{1,2,\ldots, n\}$ is ${n+k-1 \choose k}$. UCs for the $k$-multisets of $[n]$ have been studied by considering a standard string representation, where for example, either 112, 121, or 211 can be used to represent the multiset $\{1,1,2\}$. This representation has a major drawback; UCs exist only if $n$ divides ${n+k-1 \choose k}$ [HGZ09]. Jackson first studied the existence of such sequences when $k=3$ and $k=4$ [Jac93].

Another method to represent a $k$-multiset is with a frequency map, which is a length-$n$ string $f_1f_2\cdots f_n$ where each $f_i$ represents the number of occurrences of $i$ in the multiset. Note, $\sum_{i=1}^n f_i = k$. It is a simple exercise to demonstrate that UCs for $k$-multisets do not exist using this representation for $n,k \geq 2$. However, observe that the final symbol $f_n$ in a frequency map is redundant: its value can be determined from the previous $n-1$ values, $f_n = k - \sum_{i=1}^{n-1} f_i$. Thus, we say $f_1\cdots f_{n-1}$ is a shorthand frequency representation for a multiset of $[n]$. UCs always exist and can be efficiently constructed using this representation [CJS25].

Like with $k$-subsets, a difference representation can be used for $k$-multisets: The first symbol is the smallest element in the multiset minus one, each successive symbol in the string is the difference of the next element in the non-decreasing multiset and the previous element. UCs always exist and can be efficiently constructed using this representation [CJS25].

These three representations are illustrated in the following table:

     

UCs for $n=3$ and $k=3$ are given below for the shorthand frequency and difference representations:

  • 0011021203 (shorthand frequency)
  • 0001011002 (difference)
UCs for $k$-multisets have applications related to proximity sensor networks [CWLW24]. If the number of occurrences of each element in a $k$-multiset is at most $t$, then such $t$-restricted multisets have been studied in [BG11].

Shorthand frequency representation

Applying the shorthand frequency representation, each $k$-multiset can be uniquely represented by a string over $\{\tt{0,1},\ldots, \tt{k}\}$ of length $n-1$, where the weight of each string is bounded above by $k$. A UC for this set of representatives corresponds to a DB sequence with bounded weight. Eight $O(n)$-time successor-rule constructions are provided in [GSWW20] based on the PCR1-PCR4 $k$-ary DB sequence constructions. Concatenation approaches for each algorithm are also given in [SSTW24], where each symbol can be generated in $O(1)$-amortized time. An MSR-based construction is also provided in [CJS25].

We provide two constructions above that have interesting concatenation properties. The generalized Grandmama construction visits necklaces of length $m$ with weight at most $w$ as they appear in colex order. The MSR-based construction visits necklaces of length $m+1$ with weight exactly $w$ as they appear in reverse colex order. Examples below are for $k$-multisets where $n=4$ and $k=4$: $$ \mbox{Grandmama} = \tt{0 \cdot 001 \cdot 011 \cdot 1 \cdot 021 \cdot 031 \cdot 002 \cdot 012 \cdot 112 \cdot 022 \cdot 003 \cdot 013 \cdot 004},$$ $$ \mbox{MSR-based} = \tt{0004 \cdot 0013 \cdot 0103 \cdot 0022 \cdot 0112 \cdot 02 \cdot 0031\cdot 0121 \cdot 0211 \cdot 1.}~~~~~ $$

Difference representation

By applying the difference representation, the set of $k$-multisets corresponds to all strings over the alphabet $\{\tt{0,1,\ldots, n{-}1}\}$ of length $k$, where the weight of each string is bounded above by $n-1$. Like the shorthand frequency representation, a UC for $k$-multisets using the difference representation is equivalent to a DB sequence with bounded weight [CJS25]. Examples below are for $k$-multisets where $n=4$ and $k=4$: $$ \mbox{Grandmama} = \tt{0 \cdot 0001 \cdot 01 \cdot 0011 \cdot 0111 \cdot 0021 \cdot 0002 \cdot 0102 \cdot 0012 \cdot 0003}, $$ $$ \mbox{MSR-based} = \tt{00003 \cdot 00012 \cdot 00102 \cdot 00021 \cdot 00111 \cdot 01011 \cdot 00201.}~~~~~$$

References

  • [BG11] A. Blanca and A. P. Godbole. On universal cycles for new classes of combinatorial structures. SIAM Journal on Discrete Mathematics, 25(4):1832–1842, 2011.

  • [CJS25] C. Campbell, L. Janik-Jones, and J. Sawada. Universal cycles for $k$-subsets and $k$-multisets of an $n$-set, submitted manuscript, 2025.

  • [GSWW20] D. Gabric, J. Sawada, A. Williams, and D. Wong. A successor rule framework for constructing $k$-ary de Bruijn sequences and universal cycles. IEEE Transactions on Information Theory, 66(1):679–687, 2020.

  • [HGZ09] G. Hurlbert, T. Johnson, and J. Zahl. On universal cycles for multisets. Discrete mathematics, 309(17):5321–5327, 2009.

  • [Jac93] B. W. Jackson. Universal cycles of $k$-subsets and $k$-permutations. Discrete mathematics, 117(1):141–150, 1993.

  • [SSTW24] J. Sawada, J. Sears, A. Trautrim, and A. Williams. Concatenation trees: A framework for efficient universal cycle and de Bruijn sequence constructions. arXiv preprint arXiv:2308.12405, 2024.