{"id":668466,"date":"2020-06-23T09:19:53","date_gmt":"2020-06-23T16:19:53","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?p=668466"},"modified":"2020-07-06T08:48:56","modified_gmt":"2020-07-06T15:48:56","slug":"enhancing-your-photos-through-artificial-intelligence","status":"publish","type":"post","link":"https:\/\/www.microsoft.com\/en-us\/research\/blog\/enhancing-your-photos-through-artificial-intelligence\/","title":{"rendered":"Enhancing your photos through artificial intelligence"},"content":{"rendered":"


The amount of visual data we accumulate around the world is mind-boggling. However, not all of these images are captured by high-end DSLR cameras, and they often suffer from imperfections. Restoring these degraded images so that users can reuse them for design or other aesthetic purposes is of tremendous benefit.<\/p>\n

In this blog post, we present our latest efforts in image enhancement. The first technique increases the resolution of an image by referring to external reference images. Compared to traditional learning-based methods, this reference-based solution resolves the ambiguity of computer hallucination<\/a> and achieves impressive visual quality. The second technique restores old photographs, which suffer from a mix of degradations that are hard to model. To solve this, we propose a novel triplet domain translation network that leverages real photos along with massive synthetic image pairs, reviving old photos in a modern form. These two works enable users to enhance their photos with ease, and both were presented at CVPR 2020 (Computer Vision and Pattern Recognition)<\/a>.<\/p>\n

Learning texture transformer network for image super-resolution<\/h3>\n

Image super-resolution (SR) aims to recover natural, realistic textures in a high-resolution image from its degraded low-resolution counterpart, an important problem in the image enhancement field. Traditional single-image super-resolution usually trains a deep convolutional neural network to recover a high-resolution image from the low-resolution input. Models trained with pixel-wise reconstruction loss functions often produce blurry results for complex textures, which is far from satisfactory. Recently, some approaches have adopted generative adversarial networks (GANs) to alleviate this problem, but the hallucinations and artifacts that GANs introduce pose further challenges for image SR tasks.<\/p>\n
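Why pixel-wise losses blur textures can be seen in a toy example. This is a minimal NumPy sketch (not code from the paper): when several high-resolution textures are equally plausible for one low-resolution input, a model trained with an L2 reconstruction loss is optimal when it predicts their mean, which smears sharp structure.

```python
import numpy as np

# Two equally plausible high-resolution textures for the same
# low-resolution input: a toy 1-D "edge" shifted by one pixel.
texture_a = np.array([0.0, 0.0, 1.0, 1.0])
texture_b = np.array([0.0, 1.0, 1.0, 1.0])

# A model trained with a pixel-wise (L2) reconstruction loss minimizes
# expected squared error, so its optimal prediction is the mean of all
# plausible targets -- a blurred edge rather than a sharp one.
l2_optimal = (texture_a + texture_b) / 2

print(l2_optimal)  # [0.  0.5 1.  1. ] -- the sharp step is smeared
```

GAN losses avoid this averaging by rewarding any sharp, plausible texture, but at the cost of the hallucinations and artifacts noted above.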

To address these problems, reference-based image super-resolution (RefSR) has been proposed as a new direction in the image SR field. RefSR approaches utilize information from a high-resolution image similar to the input to assist the recovery process. Introducing a high-resolution reference image transforms the difficult texture generation problem into a simpler texture search and transfer, yielding significant improvement in visual quality. We propose a novel Texture Transformer Network for Image Super-Resolution (TTSR), which effectively searches for and transfers high-resolution texture information to the low-resolution input, making full use of the reference image to reduce blurry effects and artifacts.<\/p>\n
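The "search and transfer" idea can be sketched with patch matching. The following NumPy snippet is an illustrative simplification, not the TTSR implementation: the patch sizes and random features are hypothetical, and relevance is computed as cosine similarity between flattened feature patches.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical flattened feature patches (rows are patches).
lr_patches = rng.normal(size=(4, 9))   # patches from the (upsampled) input
ref_patches = rng.normal(size=(6, 9))  # patches from the HR reference image

def normalize(p):
    # Scale each patch (row) to unit length for cosine similarity.
    return p / np.linalg.norm(p, axis=1, keepdims=True)

# "Search": cosine similarity between every input patch and every
# reference patch, a stand-in for TTSR's relevance computation.
relevance = normalize(lr_patches) @ normalize(ref_patches).T  # (4, 6)

# For each input patch, the index of the most similar reference patch.
best_match = relevance.argmax(axis=1)

# "Transfer": copy the best-matching reference texture for each patch.
transferred = ref_patches[best_match]

print(best_match.shape, transferred.shape)  # (4,) (4, 9)
```

In the real network this matching happens on learned multi-scale features, and the transferred textures are fused back into the reconstruction rather than copied verbatim.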

Texture transformer features<\/h3>\n

The transformer architecture is widely used in natural language processing, where it has achieved remarkable results. However, transformers are rarely used in image generation tasks. Researchers at Microsoft Research Asia propose a novel texture transformer for image super-resolution that successfully applies the transformer to image generation. As shown in Figure 1, the texture transformer has four parts: the learnable texture extractor (LTE), the relevance embedding module (RE), the hard-attention module for feature transfer (HA), and the soft-attention module for feature synthesis (SA). Details are discussed below.<\/p>\n
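How RE, HA, and SA fit together can be sketched in a few lines. This is a hedged NumPy toy (the feature shapes, `q`/`k`/`v` names, and the simple additive fusion are assumptions for illustration; the actual network operates on multi-scale convolutional features from the LTE):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy features: 5 query positions, 8 reference positions, dimension 16.
q = rng.normal(size=(5, 16))   # queries from the low-resolution input
k = rng.normal(size=(8, 16))   # keys from the (down/up-sampled) reference
v = rng.normal(size=(8, 16))   # values: high-resolution reference textures

def normalize(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

# Relevance embedding (RE): cosine similarity between queries and keys.
r = normalize(q) @ normalize(k).T                 # (5, 8)

# Hard attention (HA): transfer only the single best-matching texture
# feature per query position, instead of a weighted sum over all of them.
idx = r.argmax(axis=1)
t = v[idx]                                        # (5, 16)

# Soft attention (SA): weight the transferred texture by its confidence
# (the maximum relevance) before fusing it with the input feature.
s = r.max(axis=1, keepdims=True)                  # (5, 1)
fused = q + s * t

print(fused.shape)  # (5, 16)
```

The hard argmax keeps transferred textures sharp, while the soft weight suppresses transfers whose reference match is unreliable.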



Figure 1: The proposed texture transformer.<\/p><\/div>\n