{"id":711955,"date":"2020-12-09T22:58:47","date_gmt":"2020-12-10T06:58:47","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-research-item&p=711955"},"modified":"2020-12-09T22:58:47","modified_gmt":"2020-12-10T06:58:47","slug":"learning-dynamic-gmm-for-attention-distribution-on-single-face-videos","status":"publish","type":"msr-research-item","link":"https:\/\/www.microsoft.com\/en-us\/research\/publication\/learning-dynamic-gmm-for-attention-distribution-on-single-face-videos\/","title":{"rendered":"Learning Dynamic GMM for Attention Distribution on Single-face Videos"},"content":{"rendered":"

<p>The past decade has witnessed the growing popularity of video conferencing services such as FaceTime and Skype. In video conferencing, almost every frame contains a human face. Hence, it is necessary to predict attention on face videos via saliency detection, as saliency can serve as guidance for region-of-interest (ROI) processing in content-based applications. To this end, this paper proposes a novel approach for saliency detection in single-face videos. From a data-driven perspective, we first establish an eye-tracking database containing fixations of 70 single-face videos viewed by 40 subjects. Through analysis of our database, we find that most attention is attracted by the face in videos, and that the attention distribution within a face varies with face size and mouth movement. Inspired by previous work that applies a Gaussian mixture model (GMM) to face saliency detection in still images, we propose to model visual attention on the face region in videos by a dynamic GMM (DGMM), whose variation depends on face size, mouth movement, and facial landmarks. We then develop a long short-term memory (LSTM) neural network to estimate the DGMM for saliency detection in single-face videos, which we call LSTM-DGMM. Finally, experimental results show that our approach outperforms state-of-the-art approaches in saliency detection on single-face videos.<\/p>\n","protected":false},"excerpt":{"rendered":"

<p>The past decade has witnessed the growing popularity of video conferencing services such as FaceTime and Skype. In video conferencing, almost every frame contains a human face. Hence, it is necessary to predict attention on face videos via saliency detection, as saliency can serve as guidance for region-of-interest (ROI) processing in content-based applications. To this […]<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","footnotes":""},"msr-content-type":[3],"msr-research-highlight":[],"research-area":[13562],"msr-publication-type":[193716],"msr-product-type":[],"msr-focus-area":[],"msr-platform":[],"msr-download-source":[],"msr-locale":[268875],"msr-post-option":[],"msr-field-of-study":[],"msr-conference":[],"msr-journal":[],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-711955","msr-research-item","type-msr-research-item","status-publish","hentry","msr-research-area-computer-vision","msr-locale-en_us"],"msr_publishername":"","msr_edition":"","msr_affiliation":"","msr_published_date":"2017-6","msr_host":"","msr_duration":"","msr_version":"","msr_speaker":"","msr_other_contributors":"","msr_booktitle":"","msr_pages_string":"","msr_chapter":"","msr_isbn":"","msr_journal":"","msr_volume":"","msr_number":"","msr_editors":"","msr_series":"","msr_issue":"","msr_organization":"","msr_how_published":"","msr_notes":"","msr_highlight_text":"","msr_release_tracker_id":"","msr_original_fields_of_study":"","msr_download_urls":"","msr_external_url":"","msr_secondary_video_url":"","msr_longbiography":"","msr_microsoftintellectualproperty":1,"msr_main_download":"","msr_publicationurl":"","msr_doi":"","msr_publication_uploader":[{"type":"url","viewUrl":"false","id":"false","title":"https:\/\/openaccess.thecvf.com\/content_cvpr_2017_workshops\/w20\/papers\/Ren_Learning_Dynamic_GMM_CVPR_2017_paper.pdf","label_id":"243109","label":0}],"msr_related_uploader":"","msr_attachments":[],"msr-author-ordering":[{"type":"text","value":"Yun Ren","user_id":0,"rest_url":false},{"type":"text","value":"Zulin Wang","user_id":0,"rest_url":false},{"type":"text","value":"Mai Xu","user_id":0,"rest_url":false},{"type":"user_nicename","value":"Haoyu Dong","user_id":37128,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Haoyu Dong"},{"type":"text","value":"Shengxi Li","user_id":0,"rest_url":false}],
Li","user_id":0,"rest_url":false}],"msr_impact_theme":[],"msr_research_lab":[],"msr_event":[],"msr_group":[],"msr_project":[],"publication":[],"video":[],"download":[],"msr_publication_type":"inproceedings","related_content":[],"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/711955"}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-research-item"}],"version-history":[{"count":1,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/711955\/revisions"}],"predecessor-version":[{"id":711958,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/711955\/revisions\/711958"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=711955"}],"wp:term":[{"taxonomy":"msr-content-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-content-type?post=711955"},{"taxonomy":"msr-research-highlight","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-highlight?post=711955"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=711955"},{"taxonomy":"msr-publication-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-publication-type?post=711955"},{"taxonomy":"msr-product-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-product-type?post=711955"},{"taxonomy":"msr-focus-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-focus-area?post=711955"},{"taxonomy":"msr-platform","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-platform?post=711955"},{"taxonomy":"msr-download-source","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-download-source?post=711955"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=711955"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=711955"},{"taxonomy":"msr-field-of-study","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-field-of-study?post=711955"},{"taxonomy":"msr-conference","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-conference?post=711955"},{"taxonomy":"msr-journal","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-journal?post=711955"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=711955"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=711955"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}