Collective Noise Contrastive Estimation for Policy Transfer Learning
- Weinan Zhang
- Ulrich Paquet
- Katja Hofmann
Thirtieth AAAI Conference on Artificial Intelligence (AAAI 2016)
Published by AAAI - Association for the Advancement of Artificial Intelligence
We address the problem of learning behaviour policies to optimise online metrics from heterogeneous usage data. While online metrics, e.g., click-through rate, can be optimised effectively using exploration data, such data is costly to collect in practice, as it temporarily degrades the user experience. Leveraging related data sources to improve online performance would be extremely valuable, but is not possible using current approaches.
We formulate this task as a policy transfer learning problem and propose a first solution, called collective noise contrastive estimation (collective NCE). NCE efficiently approximates the gradient of a log-softmax objective. Our approach jointly optimises embeddings of heterogeneous data to transfer knowledge from the source domain to the target domain. We demonstrate our approach by learning an effective policy for an online radio station jointly from user-generated playlists and usage data collected in an exploration bucket.
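To make the role of NCE concrete, the following is a minimal sketch of the standard, single-domain NCE loss next to the full log-softmax loss it stands in for. It is not the collective NCE objective proposed in the paper. The dot-product scoring function, the uniform noise distribution, and all names (`nce_loss`, `softmax_nll`, `item_embeddings`, `noise_probs`) are assumptions made for illustration only.

```python
import numpy as np

def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)).
    return -np.logaddexp(0.0, -x)

def softmax_nll(context_vec, item_embeddings, target):
    # Exact negative log-softmax: needs a score for every item.
    scores = item_embeddings @ context_vec
    return -(scores[target] - np.logaddexp.reduce(scores))

def nce_loss(context_vec, item_embeddings, target, noise_ids, noise_probs):
    # Generic NCE loss for one (context, target) pair (illustrative only).
    # Only the target and k sampled noise items are scored, so the cost
    # does not grow with the total number of items.
    k = len(noise_ids)
    s_target = item_embeddings[target] @ context_vec
    s_noise = item_embeddings[noise_ids] @ context_vec
    # Logit that an item came from the data rather than the noise
    # distribution q: score - log(k * q(item)).
    logit_target = s_target - np.log(k * noise_probs[target])
    logit_noise = s_noise - np.log(k * noise_probs[noise_ids])
    # Binary classification: target labelled "data", samples labelled "noise".
    return -(log_sigmoid(logit_target) + np.sum(log_sigmoid(-logit_noise)))

# Hypothetical toy setup: 1000 items, 16-dimensional embeddings, 20 noise samples.
rng = np.random.default_rng(0)
n_items, dim, k = 1000, 16, 20
emb = rng.normal(scale=0.1, size=(n_items, dim))
ctx = rng.normal(scale=0.1, size=dim)
q = np.full(n_items, 1.0 / n_items)       # assumed uniform noise distribution
noise = rng.choice(n_items, size=k, p=q)  # sampled noise item ids

loss = nce_loss(ctx, emb, target=3, noise_ids=noise, noise_probs=q)
exact = softmax_nll(ctx, emb, target=3)
```

In this sketch the NCE loss touches only `k + 1` embedding rows per example, whereas `softmax_nll` touches all `n_items` rows; this is the efficiency argument behind using NCE for large item vocabularies.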
Copyright (c) 2016, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.