Abstract: Vanishing and exploding gradients are two of the
main obstacles in training deep neural networks,
especially in capturing long-range dependencies
in recurrent neural networks (RNNs). In this paper,
we present an efficient parameterization of
the transition matrix of an RNN that allows us
to stabilize the gradients that arise in its training.
Specifically, we parameterize the transition matrix
by its singular value decomposition (SVD),
which allows us to explicitly track and control
its singular values. We attain efficiency by using
tools that are common in numerical linear
algebra, namely Householder reflectors for representing
the orthogonal matrices that arise in the
SVD. By explicitly controlling the singular values,
our proposed Spectral-RNN method allows
us to easily solve the exploding gradient problem,
and we observe that, empirically, it alleviates the vanishing
gradient issue to a large extent. We note
that the SVD parameterization can be used for any
rectangular weight matrix, hence it can be easily
extended to any deep neural network, such as a
multi-layer perceptron. Theoretically, we demonstrate
that our parameterization does not lose any
expressive power, and show how it potentially
makes the optimization process easier. Our extensive
experimental results also demonstrate that
the proposed framework converges faster and generalizes
well, especially in capturing long-range dependencies,
as shown on the synthetic addition and copy tasks,
as well as on the MNIST and Penn Tree Bank data sets.
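To make the idea concrete, here is a minimal NumPy sketch (not the released software) of the kind of parameterization the abstract describes: the orthogonal factors U and V of the SVD are represented as products of Householder reflectors, each applied in O(n) without forming a full matrix, while the singular values are stored explicitly so they can be kept close to 1. All function names and the clamping range below are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def householder(x, v):
    """Apply H(v) = I - 2 v v^T / ||v||^2 to a vector x in O(n),
    without ever forming the n x n reflector."""
    return x - 2.0 * v * np.dot(v, x) / np.dot(v, v)

def matvec_svd(x, us, vs, sigma):
    """Compute W x for W = U diag(sigma) V^T, where
    U = H(u_1)...H(u_k) and V = H(v_1)...H(v_k) are products of
    Householder reflectors (hence exactly orthogonal) and sigma
    stores the singular values explicitly."""
    # V^T x = H(v_k) ... H(v_1) x : apply v_1 first.
    for v in vs:
        x = householder(x, v)
    # Scale by the singular values; keeping them in a band around 1
    # is what bounds the gradient norms.
    x = sigma * x
    # U y = H(u_1) ... H(u_k) y : apply u_k first.
    for u in reversed(us):
        x = householder(x, u)
    return x

# Toy usage: a 4x4 "transition matrix" built from 4 reflectors per factor,
# with singular values clamped to [1 - r, 1 + r] (hypothetical choice of r).
rng = np.random.default_rng(0)
n, r = 4, 0.1
us = [rng.standard_normal(n) for _ in range(n)]
vs = [rng.standard_normal(n) for _ in range(n)]
sigma = np.clip(1.0 + 0.5 * rng.standard_normal(n), 1 - r, 1 + r)
h = matvec_svd(rng.standard_normal(n), us, vs, sigma)
```

Since each reflector application costs O(n), a matrix-vector product with this parameterization stays comparable in cost to an ordinary dense multiply while guaranteeing that the singular values never leave the chosen band.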
Download: pdf, arXiv version
Citation
- Stabilizing Gradients for Deep Neural Networks via Efficient SVD Parameterization (pdf, arXiv, software)
J. Zhang, Q. Lei, I. Dhillon.
In International Conference on Machine Learning (ICML), pp. 5801–5809, July 2018.