Differentiable Feature Selection by Discrete Relaxation

AISTATS

In this paper, we introduce Differentiable Feature Selection, a gradient-based search algorithm for feature selection. Our approach extends a recent result on the estimation of learnability in the sublinear data regime by showing that the calculation can be performed iteratively (i.e., in mini-batches) and in linear time and space with respect to both the number of features D and the sample size N. Together with a discrete-to-continuous relaxation of the search domain, this enables an efficient, gradient-based search over feature subsets for very large datasets. Our algorithm exploits higher-order correlations between features and targets in both the N > D and N < D regimes, unlike approaches that ignore such correlations or handle only one of the two regimes. We demonstrate the algorithm experimentally in both small and large sample- and feature-size settings.
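To make the discrete-to-continuous relaxation concrete, below is a minimal sketch of the general idea, not the paper's method: it replaces the paper's learnability-based objective with an ordinary squared-error loss, and relaxes each binary feature-inclusion variable to a soft mask via a sigmoid over a learnable logit, trained by mini-batch gradient descent with a sparsity penalty. All names (`logits`, `temperature`, `lam`) and the objective are illustrative assumptions.

```python
# Illustrative sketch only: gradient-based feature selection via a
# sigmoid relaxation of the discrete subset-selection variables.
# This is NOT the paper's learnability-based estimator.
import torch

torch.manual_seed(0)
N, D = 512, 100                              # sample size and feature count
X = torch.randn(N, D)
w_true = torch.zeros(D)
w_true[:5] = 1.0                             # only 5 informative features
y = X @ w_true + 0.1 * torch.randn(N)

logits = torch.zeros(D, requires_grad=True)  # relaxed selection variables
w = torch.zeros(D, requires_grad=True)       # linear model weights
opt = torch.optim.Adam([logits, w], lr=0.05)
temperature, lam = 0.5, 1e-2                 # relaxation sharpness, sparsity weight

for step in range(1000):
    idx = torch.randint(0, N, (64,))         # mini-batch, as in the iterative setting
    Xb, yb = X[idx], y[idx]
    mask = torch.sigmoid(logits / temperature)   # continuous surrogate for {0,1}^D
    pred = (Xb * mask) @ w
    loss = ((pred - yb) ** 2).mean() + lam * mask.sum()  # fit + sparsity penalty
    opt.zero_grad()
    loss.backward()
    opt.step()

# Round the relaxed mask back to a discrete feature subset.
selected = (torch.sigmoid(logits / temperature) > 0.5).nonzero().flatten()
print("selected features:", selected.tolist())
```

Note that each update touches one mini-batch and costs O(batch × D) time and O(D) extra memory, mirroring the linear-time, linear-space property the abstract claims for the paper's iterative calculation.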