The Multimodal Brain Tumor Image Segmentation Benchmark (BRATS)
- Bjoern H. Menze
- Andras Jakab
- Stefan Bauer
- Jayashree Kalpathy-Cramer
- Keyvan Farahani
- Justin Kirby
- Yuliya Burren
- Nicole Porz
- Johannes Slotboom
- Roland Wiest
- Levente Lanczi
- Elizabeth Gerstner
- Marc-Andre Weber
- Tal Arbel
- Brian B. Avants
- Nicholas Ayache
- Patricia Buendia
- D. Louis Collins
- Nicolas Cordier
- Jason J. Corso
- Antonio Criminisi
- Tilak Das
- Hervé Delingette
- Cagatay Demiralp
- Christopher R. Durst
- Michel Dojat
- Senan Doyle
- Joana Festa
- Florence Forbes
- Ezequiel Geremia
- Ben Glocker
- Polina Golland
- Xiaotao Guo
- Andac Hamamci
- Khan M. Iftekharuddin
- Raj Jena
- Nigel M. John
- Ender Konukoglu
- Danial Lashkari
- Jose Antonio Mariz
- Raphael Meier
- Sergio Pereira
- Doina Precup
- Stephen J. Price
- Tammy Riklin Raviv
- Syed M. S. Reza
- Michael Ryan
- Duygu Sarikaya
- Lawrence Schwartz
- Hoo-Chang Shin
- Jamie Shotton
- Carlos A. Silva
- Nuno Sousa
- Nagesh K. Subbanna
- Gabor Szekely
- Thomas J. Taylor
- Owen M. Thomas
- Nicholas J. Tustison
- Gozde Unal
- Flor Vasseur
- Max Wintermark
- Dong Hye Ye
- Liang Zhao
- Binsheng Zhao
- D. Zikic
- Marcel Prastawa
- Mauricio Reyes
- Koen Van Leemput
IEEE Transactions on Medical Imaging, Vol. 34(10), pp. 1993-2024
In this paper we report the set-up and results of the Multimodal Brain Tumor Image Segmentation Benchmark (BRATS) organized in conjunction with the MICCAI 2012 and 2013 conferences. Twenty state-of-the-art tumor segmentation algorithms were applied to a set of 65 multi-contrast MR scans of low- and high-grade glioma patients—manually annotated by up to four raters—and to 65 comparable scans generated using tumor image simulation software. Quantitative evaluations revealed considerable disagreement between the human raters in segmenting various tumor sub-regions (Dice scores in the range 74%–85%), illustrating the difficulty of this task. We found that different algorithms worked best for different sub-regions (reaching performance comparable to human inter-rater variability), but that no single algorithm ranked in the top for all sub-regions simultaneously. Fusing several good algorithms using a hierarchical majority vote yielded segmentations that consistently ranked above all individual algorithms, indicating remaining opportunities for further methodological improvements. The BRATS image data and manual annotations continue to be publicly available through an online evaluation system as an ongoing benchmarking resource.
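To make the two quantitative ideas in the abstract concrete, the following is a minimal sketch of a Dice overlap score and of a plain majority vote over binary masks. It is an illustration under stated assumptions, not the benchmark's evaluation code: the function names `dice_score` and `majority_vote`, the flattened 0/1 mask representation, and the toy data are all hypothetical, and the vote shown is a simple per-voxel majority rather than the hierarchical scheme over nested tumor sub-regions that the paper describes.

```python
from typing import List, Sequence


def dice_score(a: Sequence[int], b: Sequence[int]) -> float:
    """Dice overlap 2|A∩B| / (|A| + |B|) between two flattened binary masks."""
    if len(a) != len(b):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for x, y in zip(a, b) if x and y)
    total = sum(1 for x in a if x) + sum(1 for y in b if y)
    # Assumed convention: two empty masks count as perfect agreement.
    return 1.0 if total == 0 else 2.0 * intersection / total


def majority_vote(masks: Sequence[Sequence[int]]) -> List[int]:
    """Label a voxel 1 when strictly more than half of the masks label it 1."""
    if not masks:
        raise ValueError("need at least one mask")
    n = len(masks)
    return [
        1 if 2 * sum(1 for v in votes if v) > n else 0
        for votes in zip(*masks)
    ]


# Toy data (hypothetical): one reference mask and three algorithm outputs.
reference = [0, 1, 1, 1, 0, 0]
algo_a = [0, 1, 1, 0, 0, 0]
algo_b = [0, 1, 1, 1, 1, 0]
algo_c = [1, 0, 1, 1, 0, 0]

fused = majority_vote([algo_a, algo_b, algo_c])

for name, mask in [("A", algo_a), ("B", algo_b), ("C", algo_c), ("fused", fused)]:
    print(name, round(dice_score(mask, reference), 3))
```

In this toy case each algorithm makes a different mistake, so the per-voxel vote cancels the errors and the fused mask overlaps the reference better than any single input. That is only an illustration of why fusion can help; it says nothing about how large the effect is on real scans.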