{"id":170006,"date":"2008-11-23T19:01:32","date_gmt":"2008-11-23T19:01:32","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/project\/multi-armed-bandits\/"},"modified":"2021-11-11T17:24:18","modified_gmt":"2021-11-12T01:24:18","slug":"multi-armed-bandits","status":"publish","type":"msr-project","link":"https:\/\/www.microsoft.com\/en-us\/research\/project\/multi-armed-bandits\/","title":{"rendered":"Multi-Armed Bandits"},"content":{"rendered":"
This is an umbrella project for several related efforts at Microsoft Research Silicon Valley that address various Multi-Armed Bandit (MAB) formulations motivated by web search and ad placement. The MAB problem is a classical paradigm in Machine Learning in which an online algorithm chooses from a set of strategies in a sequence of trials so as to maximize the total payoff of the chosen strategies.<\/div>\n

<\/p>\n

\n

This page has been inactive since the closure of MSR-SVC in September 2014.
\n<\/em><\/p>\n<\/div>\n

\n

The name “multi-armed bandits” comes from a whimsical scenario in which a gambler faces several slot machines, a.k.a. “one-armed bandits”, that look identical at first but produce different expected winnings. The crucial issue here is the trade-off between acquiring new information (exploration<\/em>) and capitalizing on the information available so far (exploitation<\/em>). While MAB problems have been studied extensively in Machine Learning, Operations Research and Economics, many exciting questions remain open. One aspect we are particularly interested in concerns modeling, and efficiently using, the various types of side information that may be available to the algorithm.<\/p>\n
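The exploration–exploitation trade-off described above can be illustrated with a minimal sketch of the classic epsilon-greedy heuristic (not a method specific to this project; the Bernoulli arm payoffs, epsilon value, and horizon below are illustrative assumptions):

```python
import random

def epsilon_greedy(true_means, epsilon=0.1, horizon=10000, seed=0):
    """Epsilon-greedy sketch: with probability epsilon pull a uniformly
    random arm (exploration); otherwise pull the arm with the best
    empirical mean so far (exploitation). Arms pay Bernoulli rewards
    with the given (unknown to the algorithm) true means."""
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k        # pulls per arm
    means = [0.0] * k       # empirical mean reward per arm
    total = 0.0
    for _ in range(horizon):
        if rng.random() < epsilon:
            arm = rng.randrange(k)                        # explore
        else:
            arm = max(range(k), key=lambda a: means[a])   # exploit
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # running mean
        total += reward
    return total, counts

total, counts = epsilon_greedy([0.3, 0.5, 0.7])
```

Over a long enough horizon the empirically best arm accumulates most of the pulls, while the epsilon fraction of random pulls keeps refining the estimates of the other arms.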

Contact<\/u>: Alex Slivkins<\/a>.<\/p>\n

Research directions<\/h1>\n