Abstract: Training recurrent neural networks on tasks with long-term dependencies is notoriously challenging. One of the main reasons is the vanishing or exploding gradient problem, which prevents gradient information from propagating to early layers. In this paper we propose a simple recurrent architecture, the Fourier Recurrent Unit (FRU), that stabilizes the gradients arising in its training while providing stronger expressive power. Specifically, FRU summarizes the hidden states h^(t) along the temporal dimension with Fourier basis functions. This allows gradients to easily reach any layer, thanks to FRU's residual learning structure and the global support of trigonometric functions. We show that FRU's gradients have lower and upper bounds independent of the temporal dimension. We also show the strong expressivity of the sparse Fourier basis, from which FRU derives its expressive power. Our experimental study further demonstrates that, with fewer parameters, the proposed architecture outperforms other recurrent architectures on many tasks.
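
The sketch below illustrates the idea the abstract describes, not the paper's exact equations: each hidden state is weighted by cosine (Fourier) basis functions and accumulated into a summary along the temporal dimension, so every time step has a direct, residual-style path into the summary. All names, shapes, and the ReLU/mean choices are illustrative assumptions.

```python
import numpy as np

# Hypothetical sketch of a Fourier-style recurrent summary (illustrative only;
# the published FRU equations may differ). Each hidden state h^(t) is weighted
# by cosine basis functions and added into a running summary u, so every time
# step contributes to u through a short, residual-style path.

def fru_forward(x_seq, W_x, W_h, b, freqs, phases):
    """Simplified Fourier-recurrent forward pass.

    x_seq  : (T, d_in)   input sequence
    W_x    : (d_h, d_in) input-to-hidden weights
    W_h    : (d_h, d_h)  summary-to-hidden weights
    b      : (d_h,)      hidden bias
    freqs  : (K,)        Fourier frequencies (hyperparameters)
    phases : (K,)        Fourier phases (hyperparameters)
    Returns the Fourier summary u of shape (K, d_h).
    """
    T = x_seq.shape[0]
    K = freqs.shape[0]
    d_h = W_x.shape[0]
    u = np.zeros((K, d_h))        # Fourier summaries of the hidden states
    for t in range(1, T + 1):
        # Hidden state from the current input and the running summary
        # (collapsed here by a simple mean over the K frequency channels).
        h = np.maximum(0.0, W_x @ x_seq[t - 1] + W_h @ u.mean(axis=0) + b)
        # Cosine basis weights for this time step, one per frequency.
        w = np.cos(2.0 * np.pi * freqs * t / T + phases)
        # Residual-style accumulation: step t adds directly into u, so its
        # gradient does not pass through a long product of Jacobians.
        u += np.outer(w, h) / T
    return u


# Toy usage with random weights.
rng = np.random.default_rng(0)
T, d_in, d_h, K = 50, 8, 16, 4
u = fru_forward(rng.normal(size=(T, d_in)),
                0.1 * rng.normal(size=(d_h, d_in)),
                0.1 * rng.normal(size=(d_h, d_h)),
                np.zeros(d_h),
                np.array([0.0, 1.0, 2.0, 4.0]),
                np.zeros(K))
print(u.shape)  # (4, 16)
```

Because u is a sum of cosine-weighted hidden states rather than a product of step-to-step transformations, each step's contribution to u stays bounded regardless of T, which is the intuition behind the gradient bounds stated in the abstract.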
- Topics:
- Big Data
- Deep Learning
Download: pdf, arXiv version
Citation
- Learning long term dependencies via Fourier recurrent units (pdf, arXiv, software)
J. Zhang, Y. Lin, Z. Song, I. Dhillon.
In International Conference on Machine Learning (ICML), pp. 5810–5818, July 2018.
Bibtex: