Abstract: Multiple Sequence Alignment (MSA) is one of the fundamental tasks in biological sequence analysis that underlies applications such as phylogenetic trees, profiles, and structure prediction. The task, however, is NP-hard, and the current practice resorts to heuristic and local-search methods. Recently, a convex optimization approach for MSA was proposed based on the concept of atomic norm (Yen et al. 2016), which demonstrates significant improvement over existing methods in the quality of alignments. However, the convex program is challenging to solve due to the constraint given by the intersection of two atomic-norm balls, for which the existing algorithm can only handle sequences of length up to 50, with an iteration complexity subject to constants of unknown relation to the natural parameters of MSA. In this work, we propose an Accelerated Dual Decomposition algorithm that exploits entropy regularization to induce closed-form solutions for each atomic-norm-constrained subproblem, giving a single-loop algorithm of iteration complexity linear to the problem size (total length of all sequences). The proposed algorithm gives significantly better alignments than existing methods on sequences of length up to hundreds, where the existing convex programming method fails to converge in one day.
- Scalable Convex Multiple Sequence Alignment via Entropy-Regularized Dual Decomposition (pdf, software)
J. Zhang, I. Yen, P. Ravikumar, I. Dhillon.
In International Conference on Artificial Intelligence and Statistics (AISTATS), April 2017.