Convergence to Equilibrium of No-regret Dynamics in Congestion Games

  • Volkan Cevher,
  • Leello Dadi,
  • Jing Dong,
  • Ioannis Panageas,
  • Stratis Skoulakis,
  • Luca Viano,
  • Baoxiang Wang,
  • Siwei Wang,
  • Jingyu Wu

Conference on Web and Internet Economics (WINE)


The congestion game is a powerful model that encompasses a range of engineering systems such as traffic networks and resource allocation. It describes the behavior of a group of agents who share a common set of \(F\) facilities and whose actions are subsets of \(k\) facilities. In this work, we study the online formulation of congestion games, where agents participate in the game repeatedly and observe noisy feedback. We note that this paper is the result of merging [24] (arXiv:2306.13673) and [19] (arXiv:2401.09628).

In [24], we propose CongestEXP, a decentralized algorithm based on the classic exponential weights method. By maintaining weights at the facility level, the regret bound of CongestEXP avoids an exponential dependence on the number of possible facility sets, i.e., \(\binom{F}{k} \approx F^k\), and scales only linearly with \(F\). Specifically, we show that CongestEXP attains a regret upper bound of \(O(kF\sqrt{T})\) for every individual player, where \(T\) is the time horizon. If a strict Nash equilibrium exists, we show that CongestEXP converges to the strict Nash policy almost exponentially fast, at a rate of \(O(F \exp(-t^{1-\alpha}))\), where \(t\) is the number of iterations and \(\alpha \in (1/2, 1)\).

In [19], we present an online learning algorithm in the bandit feedback model that, once adopted by all agents of a congestion game, results in game dynamics that converge to an \(\epsilon\)-approximate Nash equilibrium in a number of rounds that is polynomial in \(1/\epsilon\), the number of players, and the number of available resources. The proposed algorithm also guarantees sublinear regret to any agent adopting it. As a result, our work answers an open question from [17] and extends the recent results of [37] to the bandit feedback model.
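
To make the facility-level bookkeeping concrete, below is a minimal Python sketch of an exponential-weights learner for a single player: it keeps one cumulative-loss counter per facility and weights each \(k\)-subset of facilities by the sum of its facilities' losses. This is an illustrative sketch under simplifying assumptions (a small enumerable action set, full per-facility loss feedback on the chosen action, and a toy loss function), not the CongestEXP pseudocode or analysis from [24].

```python
import itertools
import math
import random

# Hypothetical facility-level exponential-weights sketch for one player.
# Each facility f carries a cumulative loss; the probability of playing an
# action (a k-subset of facilities) is proportional to
# exp(-eta * sum of cumulative losses of its facilities).

def make_actions(num_facilities, k):
    """All k-subsets of the facility set (small instances only)."""
    return list(itertools.combinations(range(num_facilities), k))

def action_weights(cum_loss, actions, eta):
    """Unnormalized exponential weights, one per action."""
    return [math.exp(-eta * sum(cum_loss[f] for f in a)) for a in actions]

def play_round(cum_loss, actions, eta, facility_loss):
    """Sample an action, then charge each used facility its observed loss."""
    w = action_weights(cum_loss, actions, eta)
    action = random.choices(actions, weights=w, k=1)[0]
    for f in action:
        cum_loss[f] += facility_loss[f]  # assumes per-facility loss feedback
    return action

if __name__ == "__main__":
    F, k, eta, T = 5, 2, 0.1, 100
    actions = make_actions(F, k)
    cum_loss = [0.0] * F
    for t in range(T):
        # toy stand-in for congestion costs: facility f costs f / F this round
        losses = [f / F for f in range(F)]
        play_round(cum_loss, actions, eta, losses)
    print("cumulative facility losses:", cum_loss)
```

Note that the per-facility counters require only \(F\) numbers of state, which is the source of the linear dependence on \(F\) in the regret bound; the enumeration of all \(\binom{F}{k}\) actions above is only for clarity in this toy example.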