Target Transfer Q-learning and Its Convergence Analysis
Reinforcement learning (RL) techniques are powerful tools for learning how to interact with environments and have been successfully applied to various important applications. Q-learning is one of the most popular methods in RL, and it leverages the Bellman equation to update the Q-function. Because data collection in RL is both time-consuming and costly, and Q-learning converges slowly, various transfer RL algorithms have been designed to improve the sample complexity of new tasks. However, most previous transfer RL algorithms resemble transfer learning methods in deep learning and are heuristic, with no theoretical guarantee on the convergence rate. It is therefore important to understand clearly how and when transfer learning helps RL methods and to provide theoretical guarantees for the improvement in sample complexity. In this paper, we rethink the transfer RL problem from the RL perspective and propose to transfer the Q-function learned on the old task to the target Q-function used in Q-learning on the new task. We call this new transfer Q-learning method target transfer Q-learning (abbrev. TTQL). The transfer process is controlled by an error condition, which helps avoid the harm to the new task that an inappropriate transferred target could cause. We design the error condition in TTQL to check whether the Bellman error of the transferred target Q-function is smaller than that of the current Q-function. We show that TTQL with this error condition achieves a faster convergence rate than standard Q-learning. Our experiments are consistent with our theoretical results and verify the effectiveness of the proposed target transfer Q-learning method.
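To make the target-swap idea concrete, below is a minimal tabular sketch of TTQL under simplifying assumptions: the new task's transition tensor `P` (shape S x A x S) and reward matrix `R` (shape S x A) are assumed available so the Bellman error can be computed exactly, whereas the paper would estimate it from samples. The function names (`bellman_error`, `ttql`) and hyperparameters are illustrative, not the authors' implementation.

```python
import numpy as np

def bellman_error(Q, P, R, gamma):
    """Mean absolute Bellman error |Q - (R + gamma * P max_a' Q)| on a tabular MDP.
    Assumes exact knowledge of P and R (a simplification for illustration)."""
    target = R + gamma * (P @ Q.max(axis=1))   # shape (S, A)
    return np.abs(Q - target).mean()

def ttql(P, R, gamma, Q_source, steps=5000, alpha=0.1, eps=0.1, seed=0):
    """Sketch of target transfer Q-learning: use the source task's Q-function as
    the target while its Bellman error on the new task is smaller than that of
    the current Q-function; otherwise fall back to the current Q-function."""
    rng = np.random.default_rng(seed)
    S, A = R.shape
    Q = np.zeros((S, A))
    s = 0
    for _ in range(steps):
        # Error condition: keep the transferred target only while it is better.
        if bellman_error(Q_source, P, R, gamma) < bellman_error(Q, P, R, gamma):
            Q_target = Q_source
        else:
            Q_target = Q
        # Epsilon-greedy action selection on the current Q-function.
        a = rng.integers(A) if rng.random() < eps else int(Q[s].argmax())
        s_next = rng.choice(S, p=P[s, a])
        # Q-learning update, but with the (possibly transferred) target Q-function.
        td_target = R[s, a] + gamma * Q_target[s_next].max()
        Q[s, a] += alpha * (td_target - Q[s, a])
        s = s_next
    return Q
```

The only difference from standard Q-learning is the choice of `Q_target`: once the current Q-function's Bellman error drops below that of the transferred one, the update reduces to ordinary Q-learning, which is how the error condition prevents negative transfer.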