{"id":960144,"date":"2023-08-13T04:05:49","date_gmt":"2023-08-13T11:05:49","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-project&#038;p=960144"},"modified":"2023-09-18T19:02:30","modified_gmt":"2023-09-19T02:02:30","slug":"dragnuwa","status":"publish","type":"msr-project","link":"https:\/\/www.microsoft.com\/en-us\/research\/project\/dragnuwa\/","title":{"rendered":"DragNUWA"},"content":{"rendered":"\n<div class=\"wp-block-group is-layout-constrained wp-block-group-is-layout-constrained\"><section class=\"mb-3 moray-highlight\">\n\t<div class=\"card-img-overlay mx-lg-0\">\n\t\t<div class=\"card-background  has-background- card-background--full-bleed\">\n\t\t\t<img loading=\"lazy\" decoding=\"async\" width=\"2560\" height=\"1600\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2023\/08\/hills-2836301.jpg\" class=\"attachment-full size-full\" alt=\"DragNUWA background\" style=\"\" srcset=\"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2023\/08\/hills-2836301.jpg 2560w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2023\/08\/hills-2836301-300x188.jpg 300w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2023\/08\/hills-2836301-1024x640.jpg 1024w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2023\/08\/hills-2836301-768x480.jpg 768w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2023\/08\/hills-2836301-1536x960.jpg 1536w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2023\/08\/hills-2836301-2048x1280.jpg 2048w, https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2023\/08\/hills-2836301-240x150.jpg 240w\" sizes=\"(max-width: 2560px) 100vw, 2560px\" \/>\t\t<\/div>\n\t\t<!-- Foreground -->\n\t\t<div class=\"card-foreground d-flex mt-md-n5 my-lg-5 px-g px-lg-0\">\n\t\t\t<!-- Container -->\n\t\t\t<div class=\"container d-flex mt-md-n5 my-lg-5 \">\n\t\t\t\t<!-- Card wrapper -->\n\t\t\t\t<div class=\"w-100 w-lg-col-5\">\n\t\t\t\t\t<!-- Card -->\n\t\t\t\t\t<div class=\"card material-md-card py-5 px-md-5\">\n\t\t\t\t\t\t<div class=\"card-body \">\n\t\t\t\t\t\t\t\n\t\t\t\t\t\t\t\n\n<h1 class=\"wp-block-heading\" id=\"dragnuwa\">DragNUWA<\/h1>\n\n\n\n<p>DragNUWA is a video generation model that utilizes text, images, and trajectory as three essential control factors to facilitate highly controllable video generation.<\/p>\n\n\t\t\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t<\/div>\n\t\t<\/div>\n\t<\/div>\n<\/section>\n<\/div>\n\n\n\n\n\n<p><strong>DragNUWA <\/strong>is a video generation model that utilizes<strong> text, images, and trajectory<\/strong> as three essential control factors to facilitate <strong>highly controllable video generation<\/strong> from semantic, spatial, and temporal aspects. Distinct from existing research, DragNUWA enables users to manipulate backgrounds or objects within images directly, and the model seamlessly translates these actions into camera movements or object motions, generating the corresponding video.<\/p>\n\n\n\n<div class=\"annotations \" data-bi-aN=\"citation\">\n\t<ul class=\"annotations__list card depth-16 bg-body p-4 \">\n\t\t<li class=\"annotations__list-item\">\n\t\t\t\t\t\t<span class=\"annotations__type d-block text-uppercase font-weight-semibold text-neutral-300 small\">Publication<\/span>\n\t\t\t<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/publication\/dragnuwa-fine-grained-control-in-video-generation-by-integrating-text-image-and-trajectory\/\" target=\"_self\" class=\"annotations__link font-weight-semibold text-decoration-none\" data-bi-type=\"annotated-link\" aria-label=\"DragNUWA: Fine-grained Control in Video Generation by Integrating Text, Image, and Trajectory\" data-bi-aN=\"citation\" data-bi-cN=\"DragNUWA: Fine-grained Control in Video Generation by Integrating Text, Image, and Trajectory\">\n\t\t\t\tDragNUWA: Fine-grained Control in Video Generation by Integrating Text, Image, and Trajectory&nbsp;<span class=\"glyph-append glyph-append-chevron-right glyph-append-xsmall\"><\/span>\n\t\t\t<\/a>\n\t\t\t\t\t<\/li>\n\t<\/ul>\n<\/div>\n\n\n\n<p>Click the <strong>top-left &#8220;play&#8221; button<\/strong> to observe how DragNUWA manipulates the same image to create videos with desired camera movements and object motions.<\/p>\n\n\n\n<p><img loading=\"lazy\" decoding=\"async\" width=\"2057\" height=\"720\" class=\"wp-image-961641\" style=\"width: 2000px\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2023\/08\/Fig1-64d79c3b13269.gif\" alt=\"DragNUWA-Fig1\"><\/p>\n\n\n\n<div style=\"height:30px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p><img loading=\"lazy\" decoding=\"async\" width=\"1366\" height=\"720\" class=\"wp-image-961977\" style=\"width: 2000px\" src=\"https:\/\/www.microsoft.com\/en-us\/research\/uploads\/prod\/2023\/08\/Fig2-64d8b745bdc9c.gif\" alt=\"DragNUWA-Fig1\"><\/p>\n\n\n\n<div style=\"height:30px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n","protected":false},"excerpt":{"rendered":"<p>DragNUWA is a video generation model that utilizes text, images, and trajectory as three essential control factors to facilitate highly controllable video generation. DragNUWA is a video generation model that utilizes text, images, and trajectory as three essential control factors to facilitate highly controllable video generation from semantic, spatial, and temporal aspects. Distinct from existing [&hellip;]<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","footnotes":""},"research-area":[13556,13562,13554],"msr-locale":[268875],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-960144","msr-project","type-msr-project","status-publish","hentry","msr-research-area-artificial-intelligence","msr-research-area-computer-vision","msr-research-area-human-computer-interaction","msr-locale-en_us","msr-archive-status-active"],"msr_project_start":"","related-publications":[964119],"related-downloads":[],"related-videos":[],"related-groups":[],"related-events":[],"related-opportunities":[],"related-posts":[],"related-articles":[],"tab-content":[],"slides":[],"related-researchers":[],"msr_research_lab":[],"msr_impact_theme":[],"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/960144"}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-project"}],"version-history":[{"count":25,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/960144\/revisions"}],"predecessor-version":[{"id":964164,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-project\/960144\/revisions\/964164"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=960144"}],"wp:term":[{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=960144"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=960144"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=960144"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=960144"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}