Combinatorial Pure Exploration of Multi-Armed Bandits

Shouyuan Chen; Tian Lin; Irwin King; Michael R. Lyu; Wei Chen

Proceedings of the 28th Annual Conference on Advances in Neural Information Processing Systems (NIPS'2014) | December 2014

The PDF download link contains some bug fixes to the original paper.

We study the combinatorial pure exploration (CPE) problem in the stochastic multi-armed bandit setting, where a learner explores a set of arms with the objective of identifying the optimal member of a decision class, which is a collection of subsets of arms with certain combinatorial structures such as size-K subsets, matchings, spanning trees, or paths. The CPE problem represents a rich class of pure exploration tasks which covers not only many existing models but also novel cases where the object of interest has a nontrivial combinatorial structure. In this paper, we provide a series of results for the general CPE problem. We present general learning algorithms which work for all decision classes that admit offline maximization oracles, in both the fixed confidence and fixed budget settings. We prove problem-dependent upper bounds for our algorithms. Our analysis exploits the combinatorial structures of the decision classes and introduces a new analytic tool. We also establish a general problem-dependent lower bound for the CPE problem. Our results show that the proposed algorithms achieve the optimal sample complexity (within logarithmic factors) for many decision classes. In addition, applying our results back to the problems of top-K arms identification and multiple bandit best arms identification, we recover the best available upper bounds up to constant factors and partially resolve a conjecture on the lower bounds.
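To make the setting concrete, the following is a minimal Python sketch of a fixed budget CPE instance for the size-K decision class. It is not one of the algorithms proposed in the paper: it uses a naive uniform-sampling baseline, chosen only to show how an offline maximization oracle plugs into a pure exploration loop. The names `top_k_oracle`, `uniform_exploration`, and `pull` are hypothetical and introduced here for illustration.

```python
import random
from typing import Callable, FrozenSet, Sequence


def top_k_oracle(weights: Sequence[float], k: int) -> FrozenSet[int]:
    """Offline maximization oracle for the size-K decision class.

    Given one weight per arm, return the set of k arm indices whose
    total weight is maximal (simply the k largest weights).
    """
    ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    return frozenset(ranked[:k])


def uniform_exploration(
    pull: Callable[[int], float],
    n_arms: int,
    budget: int,
    oracle: Callable[[Sequence[float]], FrozenSet[int]],
) -> FrozenSet[int]:
    """Naive fixed-budget baseline (an assumption, not the paper's method).

    Split the budget evenly across arms, estimate each arm's mean reward,
    then ask the oracle for the best set under the empirical means.
    """
    pulls_per_arm = budget // n_arms
    if pulls_per_arm < 1:
        raise ValueError("budget must allow at least one pull per arm")
    empirical_means = []
    for arm in range(n_arms):
        samples = [pull(arm) for _ in range(pulls_per_arm)]
        empirical_means.append(sum(samples) / len(samples))
    return oracle(empirical_means)


if __name__ == "__main__":
    # Hypothetical instance: four arms with Gaussian rewards.
    true_means = [0.1, 0.9, 0.5, 0.7]
    rng = random.Random(0)
    noisy_pull = lambda arm: rng.gauss(true_means[arm], 1.0)
    chosen = uniform_exploration(
        noisy_pull, n_arms=4, budget=4000, oracle=lambda w: top_k_oracle(w, 2)
    )
    print(sorted(chosen))
```

Swapping `top_k_oracle` for a maximum-weight matching or spanning-tree routine would change the decision class without touching the exploration loop, which is the sense in which an oracle-based approach is generic. An adaptive algorithm would instead decide which arm to sample next based on the observations so far.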