{"id":709156,"date":"2020-12-07T07:55:00","date_gmt":"2020-12-07T15:55:00","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?p=709156"},"modified":"2021-06-24T14:20:48","modified_gmt":"2021-06-24T21:20:48","slug":"research-collection-reinforcement-learning-at-microsoft","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/research-collection-reinforcement-learning-at-microsoft\/","title":{"rendered":"Research Collection \u2013 Reinforcement Learning at Microsoft"},"content":{"rendered":"\n
Reinforcement learning is about agents taking information from the world and learning a policy for interacting with it, so that they perform better. So, you can imagine a future where, every time you type on the keyboard, the keyboard learns to understand you better. Or every time you interact with some website, it understands better what your preferences are, so the world just starts working better and better at interacting with people.<\/em><\/p>John Langford, Partner Research Manager, MSR NYC<\/cite><\/blockquote>\n\n\n\n
Fundamentally, reinforcement learning (RL) is an approach to machine learning in which a software agent interacts with its environment, receives rewards, and chooses actions that will maximize those rewards. Research on reinforcement learning goes back many decades and is rooted in work in many different fields, including animal psychology, and some of its basic concepts were explored in the earliest research on artificial intelligence \u2013 such as Marvin Minsky\u2019s 1951 SNARC machine, which used an ancestor of modern reinforcement learning techniques to simulate a rat solving a maze.<\/p>\n\n\n\n
In the 1990s and 2000s, theoretical and practical work in reinforcement learning began to accelerate, leading to the rapid progress we see today. The theory behind reinforcement learning continues to advance, while its applications in real-world scenarios are leading to meaningful impact in many areas \u2013 from training autonomous systems to operate more safely and reliably in real-world environments, to making games more engaging and entertaining, to delivering more personalized information and experiences on the web.<\/p>\n\n\n\n
Below is a timeline of advances that researchers and their collaborators across Microsoft have made in reinforcement learning, along with key milestones<\/em> in the field generally.<\/p>\n\n\n\n
Foundational work in reinforcement learning (1992-2014)<\/h3>\n\n\n\n
- In 1992, this paper (opens in new tab)<\/span><\/a> and its REINFORCE algorithm were instrumental in the development of policy optimization algorithms.<\/em><\/li>
- This 1995 paper (opens in new tab)<\/span><\/a> (and a later journal version (opens in new tab)<\/span><\/a>) presented a novel approach to solving the \u201cmultiarmed bandit problem\u201d without making any statistical assumptions about the distribution of payoffs.<\/em><\/li>
- This 1998 paper (opens in new tab)<\/span><\/a> (and a later journal version (opens in new tab)<\/span><\/a>) showed how to learn optimal behavior in solving Markov Decision Processes generally.<\/em><\/li>
- This 2002 paper (opens in new tab)<\/span><\/a> showed the first conditions under which learning to improve a policy locally achieves optimal policies.<\/em><\/li>
- In 2007, bandits that are generalized to use features and context are named contextual bandits (opens in new tab)<\/span><\/a>.<\/em><\/li>
- Also in 2007, the first public version of Vowpal Wabbit (opens in new tab)<\/span><\/a> is released, offering fast, efficient and flexible online machine learning techniques, as well as other machine learning approaches. John Langford and several of his colleagues on this project later join Microsoft Research to continue their work.<\/em><\/li>
- In 2013, Microsoft researcher John Langford (opens in new tab)<\/span><\/a> presents a tutorial (opens in new tab)<\/span><\/a> on interactive learning at the Neural Information Processing Systems conference (NIPS 2013).<\/em><\/li>
- In 2014, Richard Sutton and Andrew Barto publish Reinforcement Learning: An Introduction (opens in new tab)<\/span><\/a>, recounting work in the field that began in the late 1970s.<\/em><\/li><\/ul>\n\n\n\t
\n\t\t<\/span>\n\t\t\n\t\n\t
- \n\t\t\t
\n\t\t\t\t\t\t\t\t\n\t\t\t2016\t\t\t<\/div>\n\t\t\t\t\t\t\t\t<\/div>\n\t\t\n\nWork begins on Project Malmo<\/h3>\n\n\n\n