{"id":804475,"date":"2021-12-13T23:11:17","date_gmt":"2021-12-14T07:11:17","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-research-item&p=804475"},"modified":"2021-12-14T04:25:19","modified_gmt":"2021-12-14T12:25:19","slug":"on-the-evaluation-of-neural-code-summarization","status":"publish","type":"msr-research-item","link":"https:\/\/www.microsoft.com\/en-us\/research\/publication\/on-the-evaluation-of-neural-code-summarization\/","title":{"rendered":"On the Evaluation of Neural Code Summarization"},"content":{"rendered":"

Source code summaries are important for the comprehension and maintenance of programs. However, there are plenty of programs with missing, outdated, or mismatched summaries. Recently, deep learning techniques have been exploited to automatically generate summaries for given code snippets. To achieve a profound understanding of how far we are from solving this problem, in this paper, we conduct a systematic and in-depth analysis of five state-of-the-art neural source code summarization models on three widely used datasets. Our evaluation results suggest that: (1) The BLEU metric, which is widely used by existing work for evaluating the performance of the summarization models, has many variants. Ignoring the differences among the BLEU variants could affect the validity of the claimed results. Furthermore, we discover an important, previously unknown bug about BLEU calculation in a commonly-used software package. 
(2) Code pre-processing choices can have a large impact on the summarization performance, therefore they should not be ignored. (3) Some important characteristics of datasets (corpus size, data splitting method, and duplication ratio) have a significant impact on model evaluation. Based on the experimental results, we give some actionable guidelines on more systematic ways for evaluating code summarization and choosing the best method in different scenarios. We also suggest possible future research directions. We believe that our results can be of great help for practitioners and researchers in this interesting area.<\/p>\n","protected":false},"excerpt":{"rendered":"

Source code summaries are important for the comprehension and maintenance of programs. However, there are plenty of programs with missing, outdated, or mismatched summaries. Recently, deep learning techniques have been exploited to automatically generate summaries for given code snippets. To achieve a profound understanding of how far we are from solving this problem, in this […]<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"footnotes":""},"msr-content-type":[3],"msr-research-highlight":[],"research-area":[13556,13560],"msr-publication-type":[193716],"msr-product-type":[],"msr-focus-area":[],"msr-platform":[],"msr-download-source":[],"msr-locale":[268875],"msr-field-of-study":[],"msr-conference":[259390],"msr-journal":[],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-804475","msr-research-item","type-msr-research-item","status-publish","hentry","msr-research-area-artificial-intelligence","msr-research-area-programming-languages-software-engineering","msr-locale-en_us"],"msr_publishername":"","msr_edition":"","msr_affiliation":"","msr_published_date":"2022-2-11","msr_host":"","msr_duration":"","msr_version":"","msr_speaker":"","msr_other_contributors":"","msr_booktitle":"","msr_pages_string":"","msr_chapter":"","msr_isbn":"","msr_journal":"","msr_volume":"","msr_number":"","msr_editors":"","msr_series":"","msr_issue":"","msr_organization":"","msr_how_published":"","msr_notes":"","msr_highlight_text":"","msr_release_tracker_id":"","msr_original_fields_of_study":"","msr_download_urls":"","msr_external_url":"","msr_secondary_video_url":"","msr_longbiography":"","msr_microsoftintellectualproperty":1,"msr_main_download":"","msr_publicationurl":"","msr_doi":"","msr_publication_uploader":[{"type":"url","viewUrl":"false","id":"false","title":"https:\/\/arxiv.org\/abs\/2107.07112","label_id":"252679","label":0}],"msr_
related_uploader":"","msr_attachments":[],"msr-author-ordering":[{"type":"text","value":"Ensheng Shi","user_id":0,"rest_url":false},{"type":"user_nicename","value":"Yanlin Wang","user_id":39141,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Yanlin Wang"},{"type":"user_nicename","value":"Lun Du","user_id":39144,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Lun Du"},{"type":"text","value":"Junjie Chen","user_id":0,"rest_url":false},{"type":"user_nicename","value":"Shi Han","user_id":33618,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Shi Han"},{"type":"text","value":"Hongyu Zhang","user_id":0,"rest_url":false},{"type":"user_nicename","value":"Dongmei Zhang","user_id":31665,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Dongmei Zhang"},{"type":"text","value":"Hongbin 
Sun","user_id":0,"rest_url":false}],"msr_impact_theme":[],"msr_research_lab":[199560],"msr_event":[838618],"msr_group":[],"msr_project":[],"publication":[],"video":[],"download":[],"msr_publication_type":"inproceedings","_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/804475"}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-research-item"}],"version-history":[{"count":1,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/804475\/revisions"}],"predecessor-version":[{"id":804481,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/804475\/revisions\/804481"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=804475"}],"wp:term":[{"taxonomy":"msr-content-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-content-type?post=804475"},{"taxonomy":"msr-research-highlight","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-highlight?post=804475"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=804475"},{"taxonomy":"msr-publication-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-publication-type?post=804475"},{"taxonomy":"msr-product-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-product-type?post=804475"},{"taxonomy":"msr-focus-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-focus-area?post=804475"},{"taxonomy":"msr-platform","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json
\/wp\/v2\/msr-platform?post=804475"},{"taxonomy":"msr-download-source","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-download-source?post=804475"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=804475"},{"taxonomy":"msr-field-of-study","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-field-of-study?post=804475"},{"taxonomy":"msr-conference","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-conference?post=804475"},{"taxonomy":"msr-journal","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-journal?post=804475"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=804475"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=804475"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}