
Andrey Kolobov

Principal Research Manager

Tutorials

Approximation Algorithms for Planning Under Uncertainty

ICAPS’13


Probabilistic Planning with Markov Decision Processes

Slightly different versions of this tutorial were given at the ICAPS’12 Summer School and at AAAI’12

  • Markov Decision Processes (MDPs) are a powerful tool for sequential decision-making under uncertainty, a core subfield of Artificial Intelligence. Their applications include planning advertising campaigns, putting together a scientific agenda for a Mars Rover, playing games, and many others.

    This tutorial takes an algorithmic approach to covering MDPs. It starts by describing the basic MDP classes: finite-horizon, infinite-horizon discounted-reward, and stochastic shortest-path MDPs. It then proceeds to techniques for solving them, from the oldest optimal ones (Value Iteration, Policy Iteration), to heuristic search algorithms (LAO*, LRTDP, etc.), to state-of-the-art approximation approaches such as RFF, a special emphasis area of this tutorial. The lecture will conclude by discussing extensions to the established MDP classes and promising research directions. (Minimal Python sketches of Value Iteration and an RTDP-style trial appear after the outline below.)

    Tutorial attendees are not expected to have any prior familiarity with MDPs, and will walk away with knowledge of the field sufficient for conducting research in it.

    1. Introduction and a Theory of MDPs
      • Motivation
      • Common MDP classes with guaranteed solution existence
        • Finite-Horizon MDPs
        • Infinite-Horizon Discounted-Reward MDPs
        • Stochastic Shortest-Path MDPs
      • SSPs as the general problem class
      • Complexity of solving MDPs
    2. Optimal Solution Algorithms
      • Value Iteration
      • Asynchronous Value Iteration
      • Policy Iteration
      • Modified Policy Iteration
      • Prioritized Value Iteration
        • Prioritized Sweeping
        • Improved Prioritized Sweeping
        • Focused Dynamic Programming
        • Backward Value Iteration
      • Partitioned Value Iteration
        • General Framework
        • Topological Value Iteration
      • LP-based Algorithms
      • Special case of Infinite-Horizon Discounted-Reward MDPs
    3. Heuristic Search Algorithms
      • LAO* and extensions (iLAO*, RLAO*, BLAO*)
      • RTDP and extensions (LRTDP, FRTDP, BRTDP, Bayesian RTDP)
      • Focused Topological VI
      • A note on techniques for computing bounds (lower, upper)
      • Heuristic search and SSP MDPs with dead ends
    4. Approximation algorithms
      • Factored MDPs as the basis for approximation techniques
      • Determinization-based approaches
        • FF-Replan
        • FF-Hindsight
        • RFF
        • HMDPP
        • Determinization-based approaches and SSP MDPs with dead ends
      • Sampling-based approaches
        • UCT
      • Heuristic search with inadmissible heuristics
        • The FF Heuristic
        • The GOTH Heuristic
      • Dimensionality Reduction Approaches
        • ReTrASE
        • FPG
        • Approximate Policy Iteration, Approximate Linear Programming
      • Hierarchical Planning
      • Hybridized Planning
    5. Extensions to MDPs/Research Directions
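
    As promised above, here is a minimal Value Iteration sketch for an infinite-horizon discounted-reward MDP. The tiny two-state problem, the action names, and the parameter values are illustrative assumptions, not an example from the tutorial itself.

        # Minimal Value Iteration sketch for an infinite-horizon
        # discounted-reward MDP; the two-state problem is a made-up example.

        GAMMA = 0.9      # discount factor
        EPSILON = 1e-6   # convergence threshold on the Bellman residual

        # Transition model: T[state][action] = [(next_state, prob, reward), ...]
        T = {
            "s0": {"stay": [("s0", 0.9, 0.0), ("s1", 0.1, 0.0)],
                   "move": [("s1", 0.8, 1.0), ("s0", 0.2, 0.0)]},
            "s1": {"stay": [("s1", 1.0, 2.0)],
                   "move": [("s0", 1.0, 0.0)]},
        }

        def q_value(V, s, a, gamma):
            # Expected one-step reward plus discounted value of the successor.
            return sum(p * (r + gamma * V[s2]) for s2, p, r in T[s][a])

        def value_iteration(gamma=GAMMA, epsilon=EPSILON):
            V = {s: 0.0 for s in T}  # start from the all-zero value function
            while True:
                # One synchronous sweep of Bellman backups over all states.
                new_V = {s: max(q_value(V, s, a, gamma) for a in T[s]) for s in T}
                residual = max(abs(new_V[s] - V[s]) for s in T)
                V = new_V
                if residual < epsilon:  # stop once the largest change is tiny
                    break
            # Extract a greedy (epsilon-optimal) policy from V.
            policy = {s: max(T[s], key=lambda a: q_value(V, s, a, gamma))
                      for s in T}
            return V, policy

        V, policy = value_iteration()
        print(V, policy)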
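
    For the heuristic search portion, here is a similarly minimal trial-based sketch in the spirit of RTDP for a stochastic shortest-path MDP: each trial simulates the greedy policy from the start state and performs a Bellman backup on every state it visits. The toy problem and the all-zero admissible heuristic are again assumptions for illustration; the real algorithms (RTDP, LRTDP, and friends) are covered in Section 3 of the outline.

        # Minimal RTDP-style sketch for a stochastic shortest-path MDP
        # (minimizing expected cost to a goal); the problem is made up.
        import random

        GOAL = "goal"
        # Transition model: T[state][action] = (cost, [(next_state, prob), ...])
        T = {
            "s0": {"safe":  (2.0, [("s1", 1.0)]),
                   "risky": (1.0, [(GOAL, 0.5), ("s0", 0.5)])},
            "s1": {"go":    (1.0, [(GOAL, 1.0)])},
        }

        V = {s: 0.0 for s in T}  # all-zero admissible heuristic (lower bounds)
        V[GOAL] = 0.0            # reaching the goal costs nothing more

        def q_value(s, a):
            cost, outcomes = T[s][a]
            return cost + sum(p * V[s2] for s2, p in outcomes)

        def sample_successor(outcomes):
            # Draw the next state according to the outcome distribution.
            r, acc = random.random(), 0.0
            for s2, p in outcomes:
                acc += p
                if r <= acc:
                    return s2
            return outcomes[-1][0]

        def rtdp_trial(start, max_steps=100):
            s = start
            for _ in range(max_steps):
                if s == GOAL:
                    return
                a = min(T[s], key=lambda act: q_value(s, act))  # greedy action
                V[s] = q_value(s, a)  # Bellman backup on the visited state
                s = sample_successor(T[s][a][1])

        for _ in range(500):  # plain RTDP: just run a fixed number of trials
            rtdp_trial("s0")
        print(V)  # V["s0"] approaches 2.0: "risky" beats "safe" here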

I have also written Glutton and Gourmand, two solvers for large finite-horizon MDPs with high branching factors.