ICME 2025 Grand Challenge on Video Super-Resolution for Video Conferencing

Super-Resolution (SR) is a pivotal challenge in the field of computer vision, aiming to reconstruct a high-resolution (HR) image from its low-resolution (LR) counterpart [1]. Over the past decade, numerous single-image super-resolution challenges have been organized, leading to substantial advancements in the field. These include the Image Super-Resolution [2]–[5] and Efficient Super-Resolution [6]–[8] challenge series.

The Video Super-Resolution (VSR) task extends SR to the temporal domain, aiming to reconstruct a high-resolution video from a low-resolution one. Models for VSR may build upon single-image SR techniques, employing temporal information propagation methods such as local propagation (sliding windows) or uni- and bi-directional propagation to enhance quality [9]. Alternatively, traditional upscaling methods like bicubic interpolation can be used, followed by restoration models to improve perceptual quality [10], [11].
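To make these two families concrete, below is a minimal PyTorch sketch, not any specific challenge entry; `restorer` and `sr_model` are hypothetical stand-ins for learned models:

```python
# Sketch of two common VSR strategies (illustrative only).
import torch
import torch.nn.functional as F

def bicubic_then_restore(lr_video: torch.Tensor, restorer, scale: int = 4):
    """lr_video: (T, C, H, W) stack of LR frames; `restorer` is a
    hypothetical per-frame restoration model improving perceptual quality."""
    up = F.interpolate(lr_video, scale_factor=scale,
                       mode="bicubic", align_corners=False)
    return restorer(up)

def sliding_window_sr(lr_video: torch.Tensor, sr_model, radius: int = 2):
    """Local (sliding-window) propagation: each output frame is predicted
    from a window of up to 2*radius+1 neighboring LR frames."""
    outputs = []
    for t in range(lr_video.shape[0]):
        lo, hi = max(0, t - radius), min(lr_video.shape[0], t + radius + 1)
        window = lr_video[lo:hi]          # (w, C, H, W) temporal window
        outputs.append(sr_model(window))  # model fuses temporal information
    return torch.stack(outputs)
```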

VSR has been a focus of challenges such as NTIRE 2019 [12], NTIRE 2021 [1], and AIM 2024, with the latest exploring efficient VSR [13]. These challenges have addressed various scenarios, including clean LR [1], [12], LR with motion blur [12], and LR with frame drops [1]. The NTIRE 2021 quality enhancement challenge considered input video encoded with H.265 under a fixed quantization parameter (QP) or a fixed bitrate [10], without upscaling. In the AIM 2024 challenge, the LR input was encoded with AV1 and the goal was efficient SR [13].

In these VSR challenges, model performance is evaluated using objective metrics such as PSNR [14], SSIM [15], and LPIPS [16]. However, it has been shown that PSNR, SSIM, and MS-SSIM do not correlate well with subjective opinions [17], [18], which can lead to misleading model rankings when human users are the target audience. Moreover, models trained on synthetic data often suffer from error propagation when processing videos with the various distortions present in real-world recordings [19]. Some models address this issue by including de-noising as a pre-processing step or by limiting the number of frames processed together [19]. However, our experiments indicate that these approaches can lead to other problems, such as unrealistic videos, flickering, or error propagation that appears in longer sequences (>200 frames).
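For context, PSNR [14] is a simple full-reference metric derived from the mean squared error; a minimal NumPy sketch (not the challenge's official evaluation script) is shown below:

```python
# PSNR = 10 * log10(MAX^2 / MSE) between two same-shaped frames.
import numpy as np

def psnr(reference: np.ndarray, distorted: np.ndarray,
         max_val: float = 255.0) -> float:
    diff = reference.astype(np.float64) - distorted.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical frames
    return 10.0 * np.log10(max_val ** 2 / mse)
```

SSIM and LPIPS, by contrast, require their respective reference implementations.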

Registration procedure

Registration is open! To register for the challenge, participants are required to email the VSR Challenge organizers at vsr_challenge@microsoft.com with the names of their team members, their emails and affiliations, the team name, the track(s) they are participating in, the team captain, and a tentative paper title. Participants also need to register on the Challenge CMT site, where they can submit the enhanced clips. Registration data is captured and stored in the US.

Submission instructions

Please use the Microsoft Conference Management Toolkit for submitting the results. The test set will be posted 1 week before the challenge’s end date, and only to the registered teams. These instructions are tentative and may be updated before the release of the test set. After logging in, complete the following steps to submit the results:

  1. Choose “Create new submission” in the Author Console.
  2. Enter the title, abstract, and co-authors, and upload a lastname.txt file (can be empty or contain additional information regarding the submission).
  3. Compress the enhanced result files into a single lastname.zip file, retaining the same folder and file names as the blind test set (see the sketch after this list).
  4. After creating the submission, return to the “Author Console” (by clicking on “Submissions” at the top of the page) and upload the lastname.zip file via “Upload Supplementary Material”.
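For step 3, here is a minimal Python sketch of the packaging, assuming the enhanced clips sit in a local results/ directory (a hypothetical path); it preserves relative folder and file names inside lastname.zip:

```python
# Package enhanced clips into lastname.zip, keeping the blind test
# set's folder structure intact.
import os
import zipfile

def package_results(results_dir: str = "results",
                    archive: str = "lastname.zip") -> None:
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _, files in os.walk(results_dir):
            for name in files:
                path = os.path.join(root, name)
                # Store paths relative to results_dir so folder and
                # file names match the blind test set.
                zf.write(path, os.path.relpath(path, results_dir))

package_results()
```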

Contact us: For questions, please contact vsr_challenge@microsoft.com

The Microsoft CMT service was used for managing the peer-reviewing process for this conference. This service was provided for free by Microsoft and they bore all expenses, including costs for Azure cloud services as well as for software development and support.

References

[1] Sanghyun Son, Suyoung Lee, Seungjun Nah, Radu Timofte, and Kyoung Mu Lee, “NTIRE 2021 challenge on video super-resolution,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 166–181.
[2] Zheng Chen, Zongwei Wu, Eduard Zamfir, Kai Zhang, Yulun Zhang, Radu Timofte, Xiaokang Yang, Hongyuan Yu, Cheng Wan, Yuxin Hong, et al., “NTIRE 2024 Challenge on Image Super-Resolution (x4): Methods and Results,” arXiv preprint arXiv:2404.09790, 2024.
[3] Yulun Zhang, Kai Zhang, Zheng Chen, Yawei Li, Radu Timofte, Junpei Zhang, Kexin Zhang, Rui Peng, Yanbiao Ma, Licheng Jia, et al., “NTIRE 2023 Challenge on Image Super-Resolution (x4): Methods and results,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 1865–1884.
[4] Marcos V Conde, Florin Vasluianu, and Radu Timofte, “BSRAW: Improving blind RAW image super-resolution,” in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024, pp. 8500–8510.
[5] Andreas Lugmayr, Martin Danelljan, and Radu Timofte, “Ntire 2020 challenge on real-world image super-resolution: Methods and results,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2020, pp. 494–495.
[6] Bin Ren, Yawei Li, Nancy Mehta, Radu Timofte, Hongyuan Yu, Cheng Wan, Yuxin Hong, Bingnan Han, Zhuoyuan Wu, Yajun Zou, et al., “The ninth NTIRE 2024 efficient super-resolution challenge report,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, pp. 6595–6631.
[7] Marcos V Conde, Eduard Zamfir, Radu Timofte, Daniel Motilla, Cen Liu, Zexin Zhang, Yunbo Peng, Yue Lin, Jiaming Guo, Xueyi Zou, et al., “Efficient deep models for real-time 4K image super-resolution. NTIRE 2023 benchmark and report,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 1495–1521.
[8] Kai Zhang, Martin Danelljan, Yawei Li, Radu Timofte, Jie Liu, Jie Tang, Gangshan Wu, Yu Zhu, Xiangyu He, Wenjie Xu, et al., “AIM 2020 challenge on efficient super-resolution: Methods and results,” in Computer Vision–ECCV 2020 Workshops: Glasgow, UK, August 23–28, 2020, Proceedings, Part III 16. Springer, 2020, pp. 5–40.
[9] Kelvin CK Chan, Xintao Wang, Ke Yu, Chao Dong, and Chen Change Loy, “BasicVSR: The search for essential components in video super-resolution and beyond,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 4947–4956.
[10] Ren Yang, “NTIRE 2021 challenge on quality enhancement of compressed video: Methods and results,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 647–666.
[11] Jeya Maria Jose Valanarasu, Rahul Garg, Andeep Toor, Xin Tong, Weijuan Xi, Andreas Lugmayr, Vishal M Patel, and Anne Menini, “ReBotNet: Fast real-time video enhancement,” arXiv preprint arXiv:2303.13504, 2023.
[12] Seungjun Nah, Radu Timofte, Shuhang Gu, Sungyong Baik, Seokil Hong, Gyeongsik Moon, Sanghyun Son, and Kyoung Mu Lee, “NTIRE 2019 challenge on video super-resolution: Methods and results,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2019.
[13] Marcos V Conde, Zhijun Lei, Wen Li, Christos Bampis, Ioannis Katsavounidis, and Radu Timofte, “AIM 2024 challenge on efficient video super-resolution for AV1 compressed content,” arXiv preprint arXiv:2409.17256, 2024.
[14] R. Gonzalez and R. Woods, Digital Image Processing, Prentice Hall, 3rd edition, 2006.
[15] Zhou Wang, A.C. Bovik, H.R. Sheikh, and E.P. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600–612, Apr. 2004.
[16] Richard Zhang, Phillip Isola, Alexei A. Efros, Eli Shechtman, and Oliver Wang, “The Unreasonable Effectiveness of Deep Features as a Perceptual Metric,” in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, June 2018, pp. 586–595, IEEE.
[17] Z. Li, A. Aaron, I. Katsavounidis, A. Moorthy, and M. Manohara, “Toward A Practical Perceptual Video Quality Metric,” Tech. Rep., 2016.
[18] Kalpana Seshadrinathan, Rajiv Soundararajan, Alan Bovik, and Lawrence Cormack, “A Subjective Study to Evaluate Video Quality Assessment Algorithms,” in Human Vision and Electronic Imaging, 2010.
[19] Kelvin CK Chan, Shangchen Zhou, Xiangyu Xu, and Chen Change Loy, “Investigating tradeoffs in real-world video super-resolution,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 5962–5971.