Active Learning for Sparse Bayesian Multilabel Classification
- Deepak Vasisht ,
- Andreas Damianou ,
- Manik Varma ,
- Ashish Kapoor
Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining |
Published by ACM - Association for Computing Machinery
We study the problem of active learning for multilabel classification. We focus on the real-world scenario where the average number of positive (relevant) labels per data point is small leading to positive label sparsity. Carrying out mutual information based near-optimal active learning in this setting is a challenging task since the computational complexity involved is exponential in the total number of labels. We propose a novel inference algorithm for the sparse Bayesian multilabel model of [17]. The benefit of this alternate inference scheme is that it enables a natural approximation of the mutual information objective. We prove that the approximation leads to an identical solution to the exact optimization problem but at a fraction of the optimization cost. This allows us to carry out ecient, non-myopic, and near-optimal active learning for sparse multilabel classification. Extensive experiments reveal the effectiveness of the method.
© ACM. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version can be found at http://dl.acm.org.