De Bruijn Sequence and Universal Cycle Constructions

Random Generation Output
Algorithm
Order $n$
Symbols $k$
Random generation

A DB sequence is in one to one correspondence with an Euler cycle in the de Bruijn graph. Each Euler cycle corresponds to a unique spanning in-tree (arborescence) with respect to some arbitrary root. Thus, a DB sequence can be generated uniformly at random as follows (see Fleury's Euler cycle algorithm).

Random DB sequence (uniform)
  1. Generate a random arborescence $T$ rooted arbitrarily at $0^{n-1}$ in the de Bruijn graph $G(n{-}1,k)$, and randomly order its $k$ outgoing edges
  2. Make each edge of $T$ (the bridges) the last edge on the adjacency list of the corresponding vertex, randomly assign the order of the remaining $k{-}1$ outgoing edges
  3. Starting from vertex $0^{n-1}$ in $G(n{-}1,k)$, traverse edges in a DFS manner by visiting the first unused edge in the current vertex's adjacency list

For Eulerian graphs, an arborescence can be generated uniformly at random by performing a simple random backwards walk until every vertex is visited. The first time a vertex is visited, the edge is recorded as a tree edge in the random arborescence [KMUW96]. The running time of the above algorithm depends on the cover time of the random walk, which is the number of steps it takes to visit every vertex; it has exponential time delay and requires exponential space. The following tables (for $k=2$) illustrate the minimum, maximum, and average ratios of the cover time to total edges ($2^n$) applying this method over 100,000 iterations for $n\leq 15$ and over 10,000 iterations for $16 \leq n \leq 24$.

          

If an application does not require a sequence generated uniformly at random, an algorithm which applies a Burrows-Wheeler Transform can be applied; it outputs each DB sequence with positive probability [LP24]. The algorithm requires exponential space and has an exponent time delay with respect to the order $n$, but produces the sequence in $\alpha(n)$-amortized time per symbol for $k=2$, where $\alpha(n)$ is the inverse Ackerman function. The first estimate on the mean discrepancy of DB sequences is obtained using this algorithm [LP24].

References

  • [KMUW96] D. Kandel, Y. Matias, R. Unger, and P. Winkler. Shuffling biological sequences. Discrete Applied Mathematics, 71(1):171–185, 1996.

  • [LP24] Z. Lipták and L. Parmigiani. A BWT-Based algorithm for random de Bruijn sequence construction. LATIN 2024: Theoretical Informatics, 130-145, 2024.