Large Scale Inverse Covariance Estimation

Project Summary

Graphical models have become an important area of machine learning research. In many applications the structure of a graphical model is unknown and must be learned from data: for example, recovering a gene-gene network from micro-array data, or estimating the interactions between brain regions from fMRI images. The L1-regularized Gaussian maximum likelihood estimator (MLE) has been shown to have strong statistical guarantees for recovering the graph structure even in high-dimensional settings. However, it requires solving a difficult non-smooth log-determinant program whose number of parameters scales quadratically with the number of Gaussian variables. In this work, we developed a new algorithm to estimate the sparse inverse covariance matrix for ultra-large-scale problems (i.e., problems with 1 million nodes); for instance, our proposed algorithm can recover the interactions between brain regions at the voxel level.

Project Description

The structure of a Gaussian graphical model is directly encoded by the sparsity pattern of its p × p inverse covariance matrix. Estimating this matrix requires solving a non-smooth log-determinant minimization problem, typically at a cost of O(p^3) time and O(p^2) memory, so for large p both storage and computation become an issue.
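Concretely, the estimator minimizes the L1-regularized negative Gaussian log-likelihood over positive-definite matrices. A minimal sketch of the objective (the names `S` for the sample covariance and `lam` for the regularization weight are illustrative):

```python
import numpy as np

def neg_log_likelihood_l1(Theta, S, lam):
    """L1-regularized Gaussian MLE objective:
    -log det(Theta) + trace(S @ Theta) + lam * ||Theta||_1,
    minimized over positive-definite inverse covariance matrices Theta."""
    sign, logdet = np.linalg.slogdet(Theta)
    assert sign > 0, "Theta must be positive definite"
    return -logdet + np.trace(S @ Theta) + lam * np.abs(Theta).sum()
```

The non-smooth L1 term is what induces sparsity in the estimated inverse covariance, and hence in the recovered graph.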

In this project, we developed a new algorithm, BigQUIC, which can solve a one-million-dimensional problem in one day on a single machine with 32 cores. Our work makes the following contributions:

  1. A second-order Newton-like method (QUIC)
    Most existing solvers for this problem are first-order methods that use only gradient information at each step, and therefore achieve at most linear convergence rates. To obtain a super-linear convergence rate, we developed a quadratic approximation method, QUIC, for the sparse inverse covariance estimation problem. At each iteration, QUIC builds a second-order Taylor expansion of the smooth part of the objective and solves the resulting subproblem by coordinate descent.
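The Newton step can be sketched as follows: with W = X^{-1}, the smooth part has gradient S - W and Hessian W ⊗ W, and the quadratic subproblem over the update D is a lasso solved by coordinate descent. This is an illustrative sketch (names are made up, and U = W D W is recomputed naively, which the real algorithm avoids by incremental updates):

```python
import numpy as np

def soft_threshold(z, r):
    return np.sign(z) * max(abs(z) - r, 0.0)

def quic_newton_direction(X, S, lam, sweeps=5):
    """Sketch of one QUIC-style Newton direction for
    min_X  -log det X + trace(S X) + lam * ||X||_1.
    The smooth part is replaced by its second-order Taylor expansion
    around X, and the lasso subproblem over D is solved by coordinate
    descent, updating symmetric pairs (i, j) and (j, i) together."""
    p = X.shape[0]
    W = np.linalg.inv(X)
    D = np.zeros_like(X)
    for _ in range(sweeps):
        for i in range(p):
            for j in range(i, p):
                U = W @ D @ W  # naive O(p^3); maintained incrementally in QUIC
                a = W[i, i] * W[j, j] + (W[i, j] ** 2 if i != j else 0.0)
                b = S[i, j] - W[i, j] + U[i, j]
                c = X[i, j] + D[i, j]
                mu = -c + soft_threshold(c - b / a, lam / a)
                D[i, j] += mu
                if i != j:
                    D[j, i] += mu  # keep D symmetric
    return D
```

A line search over X + alpha * D that keeps the iterate positive definite and decreases the objective completes one Newton step.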
  2. A divide-and-conquer procedure (DC-QUIC)
    A natural way to speed up training on large-scale problems is divide and conquer: given a partition of the set of nodes, we solve the sparse inverse covariance estimation problem on each subset individually and then aggregate the subproblem solutions into a global solution. The main difficulty is choosing a suitable partition. Guided by our theoretical analysis, we proposed a normalized-cut spectral clustering algorithm that minimizes the aggregation error.

    The Divide-and-Conquer procedure for recovering sparse graphs.

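The partitioning step can be illustrated with a two-way normalized spectral cut: split the nodes by the sign of the Fiedler vector of the normalized graph Laplacian, built from an affinity matrix such as the off-diagonal magnitudes of the sample covariance. This is a generic sketch, not the exact DC-QUIC clustering code:

```python
import numpy as np

def spectral_bipartition(A):
    """Two-way normalized cut (sketch): partition nodes by the sign of
    the Fiedler vector (eigenvector of the second-smallest eigenvalue)
    of the symmetric normalized Laplacian of affinity matrix A."""
    d = A.sum(axis=1)
    d_is = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    L = np.eye(len(A)) - d_is[:, None] * A * d_is[None, :]
    eigvals, eigvecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    fiedler = eigvecs[:, 1]
    return fiedler >= 0
```

Each side of the cut yields a smaller estimation subproblem that can be solved independently before the block solutions are combined into the global solve.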

  3. A memory-efficient scheme using block coordinate descent (BigQUIC)
    To scale our algorithm to one-million-dimensional data, we must avoid the quadratic memory requirement. We therefore carry out the coordinate descent computations in a blockwise manner; by selecting the blocks carefully with an automated clustering scheme, we both exploit the sparsity of the iterates and cache computations effectively. We also reduce the evaluation of the log-determinant function to solving linear systems via the Schur complement. The resulting algorithm, BigQUIC, can solve problems with 1 million nodes in one day on a single machine.

    The comparison of scalability on three types of graph structures. In all the experiments, BIGQUIC can solve larger problems than QUIC even with a single core, and using 32 cores BIGQUIC can solve million dimensional data in one day.

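The log-determinant reduction rests on the Schur-complement identity log det [[A, B], [B', C]] = log det A + log det(C - B' A^{-1} B), which trades the determinant of a large matrix for a sequence of small determinants and linear solves. A dense sketch (BigQUIC exploits sparsity in the blocks; the function name is illustrative):

```python
import numpy as np

def blockwise_logdet(X, block=2):
    """Sketch: log det of a positive-definite matrix, computed one
    leading block at a time via Schur complements, so only small
    determinants and linear solves against the leading block remain."""
    total = 0.0
    M = np.array(X, dtype=float)
    while len(M) > block:
        A, B = M[:block, :block], M[:block, block:]
        total += np.linalg.slogdet(A)[1]
        M = M[block:, block:] - B.T @ np.linalg.solve(A, B)  # Schur complement
    return total + np.linalg.slogdet(M)[1]
```

Because only one block and its Schur complement are touched at a time, the memory footprint is governed by the block size rather than the full matrix.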

  4. Results from BIGQUIC analyses of resting-state fMRI data. Left panel: Map of degree distribution across voxels, thresholded at degree=20. Regions showing high degree were generally found in the gray matter (as expected for truly connected functional regions), with very few high-degree voxels found in the white matter. Right panel: Left-hemisphere surface renderings of two network modules obtained through graph clustering. Top panel shows a sensorimotor network, bottom panel shows medial prefrontal, posterior cingulate, and lateral temporoparietal regions characteristic of the “default mode” generally observed during the resting state. Both of these are commonly observed in analyses of resting state fMRI data.


  5. An extension to recover other structures (Quic&Dirty)
    We extended our approach to dirty statistical models, in which the underlying parameter is a superposition of several structures (e.g., low-rank + sparse, or sparse + block-sparse). Beyond graph recovery, Quic&Dirty applies to a wide range of problems, including robust PCA and multi-task learning.

    Comparison of algorithms on the latent feature GMRF problem using gene expression datasets. Our algorithm (Quic&Dirty) is much faster than other approaches.

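The superposition idea can be illustrated by decomposing a matrix into sparse-plus-low-rank parts via proximal gradient descent on a least-squares fit, with an L1 penalty on one part and a nuclear-norm penalty on the other. This is a generic demonstration of a dirty model, not the Quic&Dirty solver itself; all names and penalty weights are illustrative:

```python
import numpy as np

def soft(X, t):
    """Entrywise soft-thresholding: the prox of t * ||.||_1."""
    return np.sign(X) * np.maximum(np.abs(X) - t, 0.0)

def svd_shrink(X, t):
    """Singular-value soft-thresholding: the prox of t * nuclear norm."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - t, 0.0)) @ Vt

def sparse_plus_lowrank(M, lam_s=0.1, lam_r=1.0, iters=100, step=0.5):
    """Sketch: minimize 0.5*||M - S - L||_F^2 + lam_s*||S||_1 + lam_r*||L||_*
    by proximal gradient, yielding a sparse part S and a low-rank part L."""
    S = np.zeros_like(M)
    L = np.zeros_like(M)
    for _ in range(iters):
        R = S + L - M  # gradient of the smooth fit term w.r.t. both parts
        S = soft(S - step * R, step * lam_s)
        L = svd_shrink(L - step * R, step * lam_r)
    return S, L
```

The two penalties pull the two parts toward different structures, which is the essence of a dirty model: each structure is handled by its own regularizer while the parts jointly explain the data.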

 

Main Publications