Vision Texture for Annotation

This paper demonstrates a new application of computer vision to digital libraries: the use of texture for annotation, the description of content. Vision-based annotation assists the user in attaching descriptions to large sets of images and video. If a user labels a piece of an image as "water," a texture model can be used to propagate this label to other "visually similar" regions. A serious problem, however, is that no single model has been found to be good enough to reliably match human perception of similarity in pictures. Rather than using one model, the system described here knows several texture models and is equipped with the ability to choose the one that "best explains" the regions selected by the user for annotation. If none of these models suffices, it creates new explanations by combining models. Examples are given of annotations propagated by the system on natural scenes. The system provides an average gain of four to one in label prediction over a set of 98 images.
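
The idea of choosing the model that "best explains" the user's selections can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not taken from the system described here: the two feature functions (`mean_intensity`, `roughness`) stand in for real texture models, regions are flattened lists of gray values, and the separation score and nearest-example rule are hypothetical choices. The step of combining models when none suffices is not shown.

```python
from statistics import mean

# Hypothetical stand-ins for texture models: each maps a region
# (a flat list of gray values) to a small feature vector.
def mean_intensity(region):
    return [mean(region)]

def roughness(region):
    # Average absolute difference between neighboring values.
    return [mean(abs(a - b) for a, b in zip(region, region[1:]))]

MODELS = {"mean_intensity": mean_intensity, "roughness": roughness}

def distance(u, v):
    # Euclidean distance between two feature vectors.
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

def separation(model, positives, negatives):
    # Assumed score: a model "explains" the selection well when the
    # positive examples lie close together and far from the negatives.
    # Requires at least two positives and one negative.
    pos = [model(r) for r in positives]
    neg = [model(r) for r in negatives]
    within = mean(distance(a, b) for i, a in enumerate(pos) for b in pos[i + 1:])
    between = mean(distance(a, b) for a in pos for b in neg)
    return between / (within + 1e-9)

def choose_model(positives, negatives):
    # Return the name of the model with the highest separation score.
    return max(MODELS, key=lambda name: separation(MODELS[name], positives, negatives))

def propagate(label, positives, negatives, candidates):
    # Label each candidate whose nearest example, under the chosen
    # model, is a positive one; leave the others unlabeled (None).
    name = choose_model(positives, negatives)
    model = MODELS[name]
    pos = [model(r) for r in positives]
    neg = [model(r) for r in negatives]
    labels = []
    for region in candidates:
        f = model(region)
        d_pos = min(distance(f, p) for p in pos)
        d_neg = min(distance(f, n) for n in neg)
        labels.append(label if d_pos < d_neg else None)
    return name, labels
```

Calling `propagate("water", positives, negatives, candidates)` returns the name of the chosen model together with one label (or `None`) per candidate region. In this sketch the user's positive examples alone decide which model is used, so the same label can be propagated with different models for different selections.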