Large Scale Inverse Covariance Estimation

Project Summary

Graphical models have become an important area of machine learning research. In many applications the structure of a graphical model is unknown and must be learned from data: for example, recovering a gene-gene network from micro-array data, or estimating the interactions between brain regions from fMRI images. The L1-regularized Gaussian maximum likelihood estimator (MLE) has been shown to have strong statistical guarantees for recovering the graph structure even in high-dimensional settings. However, it requires solving a difficult non-smooth log-determinant program whose number of parameters scales quadratically with the number of Gaussian variables. In this work, we developed a new algorithm to estimate the sparse inverse covariance matrix for ultra-large-scale problems (i.e., problems with 1 million nodes); for instance, our proposed algorithm can recover the interactions between brain regions at the voxel level.

Project Description

The structure of a Gaussian graphical model is directly encoded by the sparsity pattern of its p × p inverse covariance matrix. Estimating this matrix requires solving a non-smooth log-determinant minimization problem, typically at a cost of O(p^3) time and O(p^2) memory, so for large p both storage and computation become an issue.
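Concretely, the estimator minimizes the L1-regularized negative Gaussian log-likelihood over positive-definite matrices. A minimal sketch of the objective (the names `S` for the sample covariance and `lam` for the regularization weight are illustrative):

```python
import numpy as np

def neg_log_likelihood_l1(Theta, S, lam):
    """L1-regularized Gaussian MLE objective:
    -log det(Theta) + trace(S @ Theta) + lam * ||Theta||_1,
    minimized over positive-definite inverse covariance matrices Theta."""
    sign, logdet = np.linalg.slogdet(Theta)
    assert sign > 0, "Theta must be positive definite"
    return -logdet + np.trace(S @ Theta) + lam * np.abs(Theta).sum()
```

The non-smooth L1 term is what induces sparsity in the estimated inverse covariance, and hence in the recovered graph.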

In this project, we developed a new algorithm, BigQUIC, which can solve a one-million-dimensional problem in one day on a single machine with 32 cores. Our work makes the following contributions:

  1. A second-order Newton-like method (QUIC)
    Most existing solvers for this problem are first-order methods that use only gradient information at each step, and therefore achieve at most linear convergence rates. To obtain a super-linear convergence rate, we developed a quadratic approximation method, QUIC, for the sparse inverse covariance estimation problem. At each iteration, QUIC builds a second-order Taylor expansion of the smooth part of the objective and solves the resulting subproblem by coordinate descent.
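The Newton step can be sketched as follows: with W = X^{-1}, the smooth part has gradient S - W and Hessian W ⊗ W, and the quadratic subproblem over the update D is a lasso solved by coordinate descent. This is an illustrative sketch (names are made up, and U = W D W is recomputed naively, which the real algorithm avoids by incremental updates):

```python
import numpy as np

def soft_threshold(z, r):
    return np.sign(z) * max(abs(z) - r, 0.0)

def quic_newton_direction(X, S, lam, sweeps=5):
    """Sketch of one QUIC-style Newton direction for
    min_X  -log det X + trace(S X) + lam * ||X||_1.
    The smooth part is replaced by its second-order Taylor expansion
    around X, and the lasso subproblem over D is solved by coordinate
    descent, updating symmetric pairs (i, j) and (j, i) together."""
    p = X.shape[0]
    W = np.linalg.inv(X)
    D = np.zeros_like(X)
    for _ in range(sweeps):
        for i in range(p):
            for j in range(i, p):
                U = W @ D @ W  # naive O(p^3); maintained incrementally in QUIC
                a = W[i, i] * W[j, j] + (W[i, j] ** 2 if i != j else 0.0)
                b = S[i, j] - W[i, j] + U[i, j]
                c = X[i, j] + D[i, j]
                mu = -c + soft_threshold(c - b / a, lam / a)
                D[i, j] += mu
                if i != j:
                    D[j, i] += mu  # keep D symmetric
    return D
```

A line search over X + alpha * D that keeps the iterate positive definite and decreases the objective completes one Newton step.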
  2. A divide-and-conquer procedure (DC-QUIC)
    A natural way to speed up training on large-scale problems is divide and conquer: given a partition of the set of nodes, we solve the sparse inverse covariance estimation problem on each subset individually and then aggregate the subproblem solutions into a global solution. The main difficulty is choosing a suitable partition. Guided by our theoretical analysis, we proposed a normalized-cut spectral clustering algorithm that minimizes the aggregation error.

    The Divide-and-Conquer procedure for recovering sparse graphs.

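The partitioning step can be illustrated with a two-way normalized spectral cut: split the nodes by the sign of the Fiedler vector of the normalized graph Laplacian, built from an affinity matrix such as the off-diagonal magnitudes of the sample covariance. This is a generic sketch, not the exact DC-QUIC clustering code:

```python
import numpy as np

def spectral_bipartition(A):
    """Two-way normalized cut (sketch): partition nodes by the sign of
    the Fiedler vector (eigenvector of the second-smallest eigenvalue)
    of the symmetric normalized Laplacian of affinity matrix A."""
    d = A.sum(axis=1)
    d_is = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    L = np.eye(len(A)) - d_is[:, None] * A * d_is[None, :]
    eigvals, eigvecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    fiedler = eigvecs[:, 1]
    return fiedler >= 0
```

Each side of the cut yields a smaller estimation subproblem that can be solved independently before the block solutions are combined into the global solve.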

  3. A memory-efficient scheme using block coordinate descent (BigQUIC)
    To scale our algorithm to one-million-dimensional data, we must avoid the quadratic memory requirement. We therefore carry out the coordinate descent computations in a blockwise manner; by selecting the blocks carefully with an automated clustering scheme, we both exploit the sparsity of the iterates and cache computations effectively. We also reduce the evaluation of the log-determinant function to solving linear systems via the Schur complement. The resulting algorithm, BigQUIC, can solve problems with 1 million nodes in one day on a single machine.

    The comparison of scalability on three types of graph structures. In all the experiments, BIGQUIC can solve larger problems than QUIC even with a single core, and using 32 cores BIGQUIC can solve million dimensional data in one day.

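The log-determinant reduction rests on the Schur-complement identity log det [[A, B], [B', C]] = log det A + log det(C - B' A^{-1} B), which trades the determinant of a large matrix for a sequence of small determinants and linear solves. A dense sketch (BigQUIC exploits sparsity in the blocks; the function name is illustrative):

```python
import numpy as np

def blockwise_logdet(X, block=2):
    """Sketch: log det of a positive-definite matrix, computed one
    leading block at a time via Schur complements, so only small
    determinants and linear solves against the leading block remain."""
    total = 0.0
    M = np.array(X, dtype=float)
    while len(M) > block:
        A, B = M[:block, :block], M[:block, block:]
        total += np.linalg.slogdet(A)[1]
        M = M[block:, block:] - B.T @ np.linalg.solve(A, B)  # Schur complement
    return total + np.linalg.slogdet(M)[1]
```

Because only one block and its Schur complement are touched at a time, the memory footprint is governed by the block size rather than the full matrix.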

  4. Results from BIGQUIC analyses of resting-state fMRI data. Left panel: Map of degree distribution across voxels, thresholded at degree=20. Regions showing high degree were generally found in the gray matter (as expected for truly connected functional regions), with very few high-degree voxels found in the white matter. Right panel: Left-hemisphere surface renderings of two network modules obtained through graph clustering. Top panel shows a sensorimotor network, bottom panel shows medial prefrontal, posterior cingulate, and lateral temporoparietal regions characteristic of the “default mode” generally observed during the resting state. Both of these are commonly observed in analyses of resting state fMRI data.


  5. An extension to recover other structures (Quic&Dirty)
    We extended our approach to dirty statistical models, in which the underlying parameter is a superposition of several structures (e.g., low-rank + sparse, or sparse + block-sparse). Beyond graph recovery, Quic&Dirty applies to a wide range of problems, including robust PCA and multi-task learning.

    Comparison of algorithms on the latent feature GMRF problem using gene expression datasets. Our algorithm (Quic&Dirty) is much faster than other approaches.

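The superposition idea can be illustrated by decomposing a matrix into sparse-plus-low-rank parts via proximal gradient descent on a least-squares fit, with an L1 penalty on one part and a nuclear-norm penalty on the other. This is a generic demonstration of a dirty model, not the Quic&Dirty solver itself; all names and penalty weights are illustrative:

```python
import numpy as np

def soft(X, t):
    """Entrywise soft-thresholding: the prox of t * ||.||_1."""
    return np.sign(X) * np.maximum(np.abs(X) - t, 0.0)

def svd_shrink(X, t):
    """Singular-value soft-thresholding: the prox of t * nuclear norm."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - t, 0.0)) @ Vt

def sparse_plus_lowrank(M, lam_s=0.1, lam_r=1.0, iters=100, step=0.5):
    """Sketch: minimize 0.5*||M - S - L||_F^2 + lam_s*||S||_1 + lam_r*||L||_*
    by proximal gradient, yielding a sparse part S and a low-rank part L."""
    S = np.zeros_like(M)
    L = np.zeros_like(M)
    for _ in range(iters):
        R = S + L - M  # gradient of the smooth fit term w.r.t. both parts
        S = soft(S - step * R, step * lam_s)
        L = svd_shrink(L - step * R, step * lam_r)
    return S, L
```

The two penalties pull the two parts toward different structures, which is the essence of a dirty model: each structure is handled by its own regularizer while the parts jointly explain the data.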

 

Main Publications