
Andrey Kolobov

Principal Research Manager

Tutorials

Approximation Algorithms for Planning Under Uncertainty

ICAPS’13


Probabilistic Planning with Markov Decision Processes

Slightly different versions of this tutorial were given at the ICAPS’12 Summer School and at AAAI’12

  • Markov Decision Processes (MDPs) are a powerful tool for sequential decision-making under uncertainty, a core subfield of Artificial Intelligence. Their applications include planning advertising campaigns, putting together a scientific agenda for a Mars Rover, playing games, and many others.

    This tutorial takes an algorithmic approach to covering MDPs. It starts by describing the basic MDP classes: finite-horizon, infinite-horizon discounted-reward, and stochastic shortest-path MDPs. It then proceeds to techniques for solving them, from the oldest optimal ones (Value Iteration, Policy Iteration), to heuristic search algorithms (LAO*, LRTDP, etc.), to state-of-the-art approximation approaches such as RFF, a special emphasis area of this tutorial. The lecture will conclude by discussing extensions to the established MDP classes and promising research directions. (Minimal Python sketches of Value Iteration and an RTDP-style trial appear after the outline below.)

    Tutorial attendees are not expected to have any prior familiarity with MDPs, and will walk away with knowledge of the field sufficient for conducting research in it.

    1. Introduction and a Theory of MDPs
      • Motivation
      • Common MDP classes with guaranteed solution existence
        • Finite-Horizon MDPs
        • Infinite-Horizon Discounted-Reward MDPs
        • Stochastic Shortest-Path MDPs
      • SSPs as the general problem class
      • Complexity of solving MDPs
    2. Optimal Solution Algorithms
      • Value Iteration
      • Asynchronous Value Iteration
      • Policy Iteration
      • Modified Policy Iteration
      • Prioritized Value Iteration
        • Prioritized Sweeping
        • Improved Prioritized Sweeping
        • Focused Dynamic Programming
        • Backward Value Iteration
      • Partitioned Value Iteration
        • General Framework
        • Topological Value Iteration
      • LP-based Algorithms
      • Special case of Infinite-Horizon Discounted-Reward MDPs
    3. Heuristic Search Algorithms
      • LAO* and extensions (iLAO*, RLAO*, BLAO*)
      • RTDP and extensions (LRTDP, FRTDP, BRTDP, Bayesian RTDP)
      • Focused Topological VI
      • A note on techniques for computing bounds (lower, upper)
      • Heuristic search and SSP MDPs with dead ends
    4. Approximation algorithms
      • Factored MDPs as the basis for approximation techniques
      • Determinization-based approaches
        • FF-Replan
        • FF-Hindsight
        • RFF
        • HMDPP
        • Determinization-based approaches and SSP MDPs with dead ends
      • Sampling-based approaches
        • UCT
      • Heuristic search with inadmissible heuristics
        • The FF Heuristic
        • The GOTH Heuristic
      • Dimensionality Reduction Approaches
        • ReTrASE
        • FPG
        • Approximate Policy Iteration, Approximate Linear Programming
      • Hierarchical Planning
      • Hybridized Planning
    5. Extensions to MDPs/Research Directions
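
    As promised above, here is a minimal Value Iteration sketch for an infinite-horizon discounted-reward MDP. The tiny two-state problem, the action names, and the parameter values are illustrative assumptions, not an example from the tutorial itself.

        # Minimal Value Iteration sketch for an infinite-horizon
        # discounted-reward MDP; the two-state problem is a made-up example.

        GAMMA = 0.9      # discount factor
        EPSILON = 1e-6   # convergence threshold on the Bellman residual

        # Transition model: T[state][action] = [(next_state, prob, reward), ...]
        T = {
            "s0": {"stay": [("s0", 0.9, 0.0), ("s1", 0.1, 0.0)],
                   "move": [("s1", 0.8, 1.0), ("s0", 0.2, 0.0)]},
            "s1": {"stay": [("s1", 1.0, 2.0)],
                   "move": [("s0", 1.0, 0.0)]},
        }

        def q_value(V, s, a, gamma):
            # Expected one-step reward plus discounted value of the successor.
            return sum(p * (r + gamma * V[s2]) for s2, p, r in T[s][a])

        def value_iteration(gamma=GAMMA, epsilon=EPSILON):
            V = {s: 0.0 for s in T}  # start from the all-zero value function
            while True:
                # One synchronous sweep of Bellman backups over all states.
                new_V = {s: max(q_value(V, s, a, gamma) for a in T[s]) for s in T}
                residual = max(abs(new_V[s] - V[s]) for s in T)
                V = new_V
                if residual < epsilon:  # stop once the largest change is tiny
                    break
            # Extract a greedy (epsilon-optimal) policy from V.
            policy = {s: max(T[s], key=lambda a: q_value(V, s, a, gamma))
                      for s in T}
            return V, policy

        V, policy = value_iteration()
        print(V, policy)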
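
    For the heuristic search portion, here is a similarly minimal trial-based sketch in the spirit of RTDP for a stochastic shortest-path MDP: each trial simulates the greedy policy from the start state and performs a Bellman backup on every state it visits. The toy problem and the all-zero admissible heuristic are again assumptions for illustration; the real algorithms (RTDP, LRTDP, and friends) are covered in Section 3 of the outline.

        # Minimal RTDP-style sketch for a stochastic shortest-path MDP
        # (minimizing expected cost to a goal); the problem is made up.
        import random

        GOAL = "goal"
        # Transition model: T[state][action] = (cost, [(next_state, prob), ...])
        T = {
            "s0": {"safe":  (2.0, [("s1", 1.0)]),
                   "risky": (1.0, [(GOAL, 0.5), ("s0", 0.5)])},
            "s1": {"go":    (1.0, [(GOAL, 1.0)])},
        }

        V = {s: 0.0 for s in T}  # all-zero admissible heuristic (lower bounds)
        V[GOAL] = 0.0            # reaching the goal costs nothing more

        def q_value(s, a):
            cost, outcomes = T[s][a]
            return cost + sum(p * V[s2] for s2, p in outcomes)

        def sample_successor(outcomes):
            # Draw the next state according to the outcome distribution.
            r, acc = random.random(), 0.0
            for s2, p in outcomes:
                acc += p
                if r <= acc:
                    return s2
            return outcomes[-1][0]

        def rtdp_trial(start, max_steps=100):
            s = start
            for _ in range(max_steps):
                if s == GOAL:
                    return
                a = min(T[s], key=lambda act: q_value(s, act))  # greedy action
                V[s] = q_value(s, a)  # Bellman backup on the visited state
                s = sample_successor(T[s][a][1])

        for _ in range(500):  # plain RTDP: just run a fixed number of trials
            rtdp_trial("s0")
        print(V)  # V["s0"] approaches 2.0: "risky" beats "safe" here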

I have also written Glutton and Gourmand, two solvers for large finite-horizon MDPs with high branching factors.