Strategic Subset Selection in Satellite Imagery: Machine Vision Insights

The abundance of currently available satellite and aerial imagery contrasts sharply with the scarcity of labels for these images. At this scale, labeling everything, even sparsely, is impractical. This raises an important question: which image patches should we prioritize for labeling? The data-centric machine learning challenge at the Machine Vision for Earth Observation 2023 workshop addressed this question using the DFC2022 dataset. Participants were given a set of image patches for land cover segmentation. Instead of training models, they were tasked with selecting three subsets of the training data (1%, 10%, and 25%) to be labeled. Each subset was then used to train a standard deep learning semantic segmentation model with a fixed training routine, and the resulting model was evaluated on a hidden annotated test set that was not disclosed to the contestants. The challenge highlighted two issues: choosing the training samples that benefit the model most, and managing the label noise present in the DFC2022 dataset. We present our winning methods for subset selection in satellite imagery.

Date:
Speakers:
Akram Zaytar, Simone Nsutezo Fobi
Affiliation:
Microsoft AI for Good