Open Problem: First-Order Regret Bounds for Contextual Bandits

Conference on Learning Theory

Published by PMLR

We describe two open problems related to first-order regret bounds for contextual bandits. The first asks for an algorithm with a regret bound of \(\tilde{O}(\sqrt{K L^* \ln N})\), where there are \(K\) actions, \(N\) policies, and \(L^*\) is the cumulative loss of the best policy. The second asks for an optimization-oracle-efficient algorithm with regret \(\tilde{O}\big((L^*)^{2/3}\,\mathrm{poly}(K,\ln(N/\delta))\big)\). We describe some positive results, such as an inefficient algorithm for the second problem, and some partial negative results.
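
For reference, the targets above can be written out in standard adversarial contextual-bandit notation. The regret definition and loss range below are the usual conventions for this setting, stated here as an assumption rather than quoted from the abstract; the paper itself gives the precise protocol.

\[
L^* \;=\; \min_{\pi \in \Pi}\,\sum_{t=1}^{T}\ell_t\big(\pi(x_t)\big), \qquad |\Pi| = N, \qquad
\mathrm{Regret}_T \;=\; \sum_{t=1}^{T}\ell_t(a_t) \;-\; L^*, \qquad \ell_t \in [0,1]^K,
\]
\[
\text{Problem 1: } \ \mathrm{Regret}_T = \tilde{O}\!\left(\sqrt{K L^* \ln N}\right), \qquad
\text{Problem 2: } \ \mathrm{Regret}_T = \tilde{O}\!\left((L^*)^{2/3}\,\mathrm{poly}(K,\ln(N/\delta))\right)\ \text{with an optimization oracle.}
\]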