Project Summary

Graphical models have become an important field of machine learning research. In many applications, the structure of a graphical model is unknown and has to be learned from data: for example, recovering a gene-gene network from microarray data, or estimating the interactions between brain regions from fMRI images. The L1-regularized Gaussian maximum likelihood estimator (MLE) has been shown to have strong statistical guarantees for recovering the graph structure, even in high-dimensional settings. However, it requires solving a difficult non-smooth log-determinant program whose number of parameters scales quadratically with the number of Gaussian variables. In this work, we developed a new algorithm to estimate the sparse inverse covariance matrix for ultra-large-scale problems (i.e., problems with one million nodes).
The structure of a Gaussian graphical model is directly encoded by the sparsity pattern of its p × p inverse covariance (precision) matrix. Estimating this matrix typically requires O(p^3) time and O(p^2) memory to solve a non-smooth log-determinant minimization problem, so for large p both storage and computation become an issue.
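As a concrete toy illustration (not taken from the papers): the off-diagonal zeros of the precision matrix mark the missing edges of the graph, while the covariance matrix itself is generally dense.

```python
import numpy as np

# Hypothetical 4-variable chain graph with edges (0,1), (1,2), (2,3).
# Zeros in the precision matrix Theta encode conditional independence.
Theta = np.array([
    [ 2.0, -0.8,  0.0,  0.0],
    [-0.8,  2.0, -0.8,  0.0],
    [ 0.0, -0.8,  2.0, -0.8],
    [ 0.0,  0.0, -0.8,  2.0],
])

Sigma = np.linalg.inv(Theta)  # the covariance matrix: generally dense

# Edge set recovered from the sparsity pattern of Theta (off-diagonal nonzeros)
edges = {(i, j) for i in range(4) for j in range(i + 1, 4)
         if abs(Theta[i, j]) > 1e-12}
print(edges)  # -> {(0, 1), (1, 2), (2, 3)}: exactly the chain edges
```

Note that every entry of `Sigma` here is nonzero even though `Theta` is tridiagonal, which is why the graph must be read off the precision matrix, not the covariance matrix.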
In this project, we developed a new algorithm, BigQUIC, which can solve a one-million-dimensional problem in one day on a single machine with 32 cores. Our work includes the following innovations:
- A second-order Newton-like method (QUIC)
Most existing solvers for this problem are first-order methods that use only gradient information at each step, and therefore typically achieve at most a linear rate of convergence. To achieve a super-linear convergence rate, we developed a quadratic approximation method, QUIC, for the sparse inverse covariance estimation problem. At each iteration, we build a second-order Taylor expansion of the smooth part of the objective and solve the resulting lasso-like subproblem by coordinate descent.
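A heavily simplified sketch of one such iteration (naive dense NumPy, our own variable names, no line search or active-set selection; not the released QUIC software):

```python
import numpy as np

def quic_direction(Theta, S, lam, sweeps=50):
    """Newton direction for f(T) = -logdet(T) + tr(S T) + lam * ||T||_1:
    minimize the second-order model of the smooth part plus the L1 term
    by coordinate descent over a symmetric update D (QUIC-style sketch)."""
    p = Theta.shape[0]
    W = np.linalg.inv(Theta)   # gradient of the smooth part is S - W
    G = S - W
    D = np.zeros((p, p))
    for _ in range(sweeps):
        for i in range(p):
            for j in range(i, p):
                # curvature and linear terms of the 1-d coordinate problem
                a = W[i, i] ** 2 if i == j else W[i, j] ** 2 + W[i, i] * W[j, j]
                b = G[i, j] + W[i] @ D @ W[j]
                c = Theta[i, j] + D[i, j]
                # closed-form minimizer of 0.5*a*mu^2 + b*mu + lam*|c + mu|
                z = c - b / a
                mu = -c + np.sign(z) * max(abs(z) - lam / a, 0.0)
                D[i, j] += mu
                if i != j:
                    D[j, i] += mu
    return D
```

QUIC then applies an Armijo line search over Θ + αD to keep the iterate positive definite; the free/fixed variable selection and other refinements from the paper are omitted in this sketch.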
- A divide-and-conquer procedure (DC-QUIC)
To speed up training for large-scale problems, a natural strategy is divide and conquer: given a partition of the set of nodes, we solve the sparse inverse covariance estimation problem on each subset individually and then aggregate the subproblem solutions into a global solution. The main issue is how to choose a suitable partition. Based on our theoretical analysis, we proposed a normalized-cut spectral clustering algorithm to minimize the resulting error.
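The partitioning idea can be illustrated on a toy problem (a simplified stand-in for the actual DC-QUIC clustering; the affinity choice and names here are our own): bisect the variables using the Fiedler vector of the normalized graph Laplacian built from absolute sample covariances.

```python
import numpy as np

def spectral_bisection(A):
    """Split nodes into two clusters using the sign pattern of the Fiedler
    vector (2nd eigenvector) of the normalized Laplacian of affinity A."""
    d = A.sum(axis=1)
    d_is = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    L = np.eye(len(A)) - d_is[:, None] * A * d_is[None, :]
    _, vecs = np.linalg.eigh(L)   # eigenvalues in ascending order
    return vecs[:, 1] >= 0        # boolean cluster labels

# Toy data: two independent groups of 3 variables, each driven by one factor
rng = np.random.default_rng(1)
f1, f2 = rng.standard_normal((300, 1)), rng.standard_normal((300, 1))
X = np.hstack([f1 + 0.3 * rng.standard_normal((300, 3)),
               f2 + 0.3 * rng.standard_normal((300, 3))])
A = np.abs(np.cov(X, rowvar=False))   # affinity from |sample covariance|
np.fill_diagonal(A, 0.0)
labels = spectral_bisection(A)        # separates {0,1,2} from {3,4,5}
```

Each cluster's estimation problem can then be solved independently, and the block-diagonal result combined (and, in the paper, further refined on the full problem).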
- A memory-efficient scheme using block coordinate descent (BigQUIC)
To scale our algorithm to one-million-dimensional data, we must avoid the quadratic memory requirement. To address this, we carry out the coordinate descent computations in a blockwise manner; by selecting blocks carefully with an automated clustering scheme, we not only exploit the sparsity of the iterates but also cache computations effectively. We also reduce the computation of the log-determinant function to solving linear systems via the Schur complement. The resulting algorithm, BigQUIC, can solve problems with one million nodes in one day on a single machine.
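For instance, splitting a positive definite matrix into blocks lets one obtain the log-determinant from a smaller determinant plus a Schur complement computed by linear solves, without ever forming the full dense inverse (a minimal sketch; BigQUIC's actual scheme is more elaborate):

```python
import numpy as np

def logdet_blockwise(M, k):
    """logdet(M) = logdet(A) + logdet(D - B^T A^{-1} B) for the blocking
    M = [[A, B], [B^T, D]] with A of size k x k (M symmetric positive
    definite). The inner term is a Schur complement, obtained here with
    a linear solve rather than an explicit inverse."""
    A, B, D = M[:k, :k], M[:k, k:], M[k:, k:]
    S = D - B.T @ np.linalg.solve(A, B)   # Schur complement of A in M
    return np.linalg.slogdet(A)[1] + np.linalg.slogdet(S)[1]
```

Applied recursively, only one block and the associated solves need to reside in memory at a time.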
- An extension to recover other structures (Quic&Dirty)
We extended our approach to dirty statistical models, in which the underlying parameter is a superposition of several different structures (e.g., low-rank + sparse, or sparse + block-sparse). Beyond graph recovery, Quic&Dirty applies to a wide range of problems including robust PCA and multi-task learning.
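The superposition idea can be sketched with the two proximal operators involved: elementwise soft-thresholding for the sparse part and singular-value thresholding for the low-rank part (a toy illustration of the structures, not the Quic&Dirty algorithm itself, which handles both blocks within one quadratic approximation framework):

```python
import numpy as np

def soft_threshold(A, t):
    """Prox of t * ||.||_1: shrinks each entry toward zero (sparse part)."""
    return np.sign(A) * np.maximum(np.abs(A) - t, 0.0)

def svd_threshold(A, t):
    """Prox of t * ||.||_* (nuclear norm): shrinks the singular values
    toward zero, encouraging a low-rank result."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U * np.maximum(s - t, 0.0)) @ Vt

# Toy superposition M = sparse + low-rank, peeled apart by alternating
# proximal steps on the residual.
rng = np.random.default_rng(0)
L_true = np.outer(rng.standard_normal(10), rng.standard_normal(10))
S_true = np.zeros((10, 10))
S_true[2, 7], S_true[5, 1] = 5.0, -4.0
M = L_true + S_true
S_hat = np.zeros_like(M)
for _ in range(50):
    L_hat = svd_threshold(M - S_hat, 1.0)   # low-rank estimate
    S_hat = soft_threshold(M - L_hat, 0.5)  # sparse estimate
```

Each regularizer thus shapes its own parameter block, which is what lets one model express mixed structure such as sparse + low-rank.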
Publications

- QUIC & DIRTY: A Quadratic Approximation Approach for Dirty Statistical Models (pdf)
C. Hsieh, I. Dhillon, P. Ravikumar, S. Becker, P. Olsen.
In Neural Information Processing Systems (NIPS), pp. 2006–2014, December 2014.
- QUIC: Quadratic Approximation for Sparse Inverse Covariance Matrix Estimation (pdf, software)
C. Hsieh, M. Sustik, I. Dhillon, P. Ravikumar.
Journal of Machine Learning Research (JMLR), October 2014.
- BIG & QUIC: Sparse Inverse Covariance Estimation for a Million Variables (pdf, software)
C. Hsieh, M. Sustik, I. Dhillon, P. Ravikumar, R. Poldrack.
In Neural Information Processing Systems (NIPS), December 2013. (Oral)
- A Divide-and-Conquer Method for Sparse Inverse Covariance Estimation (pdf)
C. Hsieh, I. Dhillon, P. Ravikumar, A. Banerjee.
In Neural Information Processing Systems (NIPS), pp. 2339–2347, December 2012.
- Sparse Inverse Covariance Matrix Estimation using Quadratic Approximation (pdf)
C. Hsieh, M. Sustik, I. Dhillon, P. Ravikumar.
In Neural Information Processing Systems (NIPS), December 2011.