Investigating Convergence of Restricted Boltzmann Machine Learning

NIPS 2010 Workshop on Deep Learning and Unsupervised Feature Learning

Restricted Boltzmann Machines (RBMs) are increasingly popular tools for unsupervised learning. They are very general, can cope with missing data, and are used to pretrain deep learning machines. RBMs learn a generative model of the data distribution. Since exact gradient ascent on the data likelihood is infeasible, Markov chain Monte Carlo approximations to the gradient, such as Contrastive Divergence (CD), are typically used. Although there are some theoretical insights into this algorithm, it is not guaranteed to converge. Recently it has been observed that, after an initial increase in likelihood, training degrades if no additional regularization is used. Suitable regularization parameters, however, cannot be determined even for medium-sized RBMs. In this work, we investigate the learning behavior of training algorithms by varying a minimal set of parameters and show that, with relatively simple variants of CD, good results can be obtained even without further regularization. Furthermore, we show that it is not necessary to tune many hyperparameters to obtain a good model: finding a suitable learning rate is sufficient. Fast learning, however, comes with a higher risk of divergence and therefore requires a stopping criterion. For this purpose, we investigate the commonly used Annealed Importance Sampling (AIS), an approximation to the true log-likelihood of the data, and find that it completely fails to detect divergence in certain cases.
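To make the CD approximation referred to above concrete, the following is a minimal NumPy sketch of a CD-1 update for a binary RBM: hidden units are sampled given the data, one Gibbs step produces a reconstruction, and the parameters move along the difference between data and reconstruction statistics. The variable names, learning rate, and toy data are illustrative assumptions, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(W, b, c, v0, lr=0.05):
    """One CD-1 parameter update for a binary RBM.

    W: (n_visible, n_hidden) weights, b: visible biases,
    c: hidden biases, v0: (batch, n_visible) binary data batch.
    """
    # Positive phase: hidden probabilities and samples given the data.
    h0_prob = sigmoid(v0 @ W + c)
    h0 = (rng.random(h0_prob.shape) < h0_prob).astype(float)
    # One Gibbs step: reconstruct the visibles, then recompute hidden probabilities.
    v1_prob = sigmoid(h0 @ W.T + b)
    v1 = (rng.random(v1_prob.shape) < v1_prob).astype(float)
    h1_prob = sigmoid(v1 @ W + c)
    # CD-1 gradient estimate: data statistics minus reconstruction statistics.
    batch = v0.shape[0]
    W += lr * (v0.T @ h0_prob - v1.T @ h1_prob) / batch
    b += lr * (v0 - v1).mean(axis=0)
    c += lr * (h0_prob - h1_prob).mean(axis=0)
    return W, b, c

# Toy usage: two complementary binary patterns (purely illustrative data).
W = 0.01 * rng.standard_normal((4, 3))
b = np.zeros(4)
c = np.zeros(3)
data = np.array([[1.0, 0.0, 1.0, 0.0],
                 [0.0, 1.0, 0.0, 1.0]])
for _ in range(200):
    W, b, c = cd1_update(W, b, c, data)
```

Running CD-k for larger k simply means taking more Gibbs steps before collecting the negative-phase statistics; the single-step version above is the cheapest variant and the one whose convergence behavior the paper probes.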