On Oracle-Efficient PAC Reinforcement Learning with Rich Observations

Christoph Dann; Nan Jiang; Akshay Krishnamurthy; Alekh Agarwal; John Langford; Robert E. Schapire

Advances in Neural Information Processing Systems | December 2018

We study the computational tractability of provably sample-efficient (PAC) reinforcement learning in episodic environments with rich observations. We present new sample-efficient algorithms for environments with deterministic hidden state dynamics and stochastic rich observations. These methods operate in an oracle model of computation — accessing policy and value function classes exclusively through standard optimization primitives — and therefore represent computationally efficient alternatives to prior algorithms that require enumeration. In the more general stochastic transition setting, we prove that the only known sample-efficient algorithm, Olive [1], cannot be implemented in our oracle model. We also present several examples that illustrate fundamental challenges of tractable PAC reinforcement learning in such general settings.