Low-Rank Methods for Parallelizing Dynamic Programming Algorithms
This article proposes efficient parallel methods for an important class of dynamic programming problems that includes Viterbi, Needleman-Wunsch, Smith-Waterman, and Longest Common Subsequence. In dynamic programming, the subproblems that do not depend on each other, and thus can be computed in parallel, form stages or wavefronts. The methods presented in this article provide additional parallelism allowing multiple stages to be computed in parallel despite dependencies among them. The correctness and the performance of the algorithm relies on rank convergence properties of matrix multiplication in the tropical semiring, formed with plus as the multiplicative operation and max as the additive operation. This article demonstrates the efficiency of the parallel algorithm by showing significant speedups on a variety of important dynamic programming problems. In particular, the parallel Viterbi decoder is up to 24× faster (with 64 processors) than a highly optimized commercial baseline.