{"id":1141975,"date":"2025-06-12T20:16:48","date_gmt":"2025-06-13T03:16:48","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-blog-post&p=1141975"},"modified":"2025-06-13T08:56:20","modified_gmt":"2025-06-13T15:56:20","slug":"maag-a-new-framework-for-consistent-ai-generated-games","status":"publish","type":"msr-blog-post","link":"https:\/\/www.microsoft.com\/en-us\/research\/articles\/maag-a-new-framework-for-consistent-ai-generated-games\/","title":{"rendered":"MaaG: A new framework for consistent AI-generated games"},"content":{"rendered":"\n

World models are a key concept in AI, used to simulate how agents behave in virtual environments and enable immersive, interactive experiences. They\u2019re not only transforming game and media generation, they\u2019re also opening new frontiers for using AI in complex, dynamic settings.<\/p>\n\n\n\n

One emerging trend is generative games, where game environments are created frame by frame using neural networks. Microsoft\u2019s MUSE system, for example, can generate scenes from the game Bleeding Edge<\/em> using deep learning models.<\/p>\n\n\n\n

\"\u5fae\u8f6f\u63d0\u51fa\u7684
Figure 1. Microsoft\u2019s MUSE generates frames from Bleeding Edge using neural networks.<\/figcaption><\/figure>\n\n\n\n

Yet beneath the visual polish, generative games often contain noticeable inconsistencies. Background elements may disappear or shift abruptly after minor player actions, like a form of short-term memory loss. These disruptions highlight one of the field\u2019s biggest challenges: maintaining consistency.<\/p>\n\n\n\n

In response, researchers from Microsoft Research Asia, the Hong Kong University of Science and Technology, and the University of Chinese Academy of Sciences have introduced a new framework called Model as a Game (MaaG)<\/a>. This approach addresses two core inconsistencies in generative games: numerical and spatial.<\/p>\n\n\n\n

Defining the problem: Numerical and spatial consistency<\/h2>\n\n\n\n

Numerical consistency refers to the logical accuracy of score updates based on game events. For example, if an action yields a +1 score, the result should reflect that exact change. Spatial consistency, by contrast, means the environment remains visually coherent when players revisit previously explored areas.<\/p>\n\n\n\n

To examine these issues in a controlled setting, the team created a minimalist 2D game called Traveler<\/em>. In it, a small black block moves left and right. As it passes through empty spaces, colorful buildings are randomly generated, and the score increases by one.<\/p>\n\n\n\n

Despite its simplicity, Traveler<\/em> clearly reveals the limitations of current generative models. Notably, the game was generated using large language models (LLMs) and built with Pygame, a set of Python modules for writing video games. It also supports frame-by-frame data export with synchronized numerical states, offering a strong foundation for research.<\/p>\n\n\n\n

\"chart,
Figure 2. In Traveler<\/em>, a moving block generates buildings and scores, exposing consistency challenges.<\/figcaption><\/figure>\n\n\n\n

Inside the MaaG framework: Numerical and spatial modules<\/h2>\n\n\n\n

The MaaG framework uses a numerical module and a spatial module to enhance the Diffusion Transformer (DiT) architecture. Together, they work to ensure that generative models do more than just produce images, they also recognize and follow game logic.<\/p>\n\n\n\n

\"diagram\"
Figure 3: MaaG incorporates numerical (red, left) and spatial (blue, right) modules to improve consistency.<\/figcaption><\/figure>\n\n\n\n