{"id":804475,"date":"2021-12-13T23:11:17","date_gmt":"2021-12-14T07:11:17","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-research-item&#038;p=804475"},"modified":"2021-12-14T04:25:19","modified_gmt":"2021-12-14T12:25:19","slug":"on-the-evaluation-of-neural-code-summarization","status":"publish","type":"msr-research-item","link":"https:\/\/www.microsoft.com\/en-us\/research\/publication\/on-the-evaluation-of-neural-code-summarization\/","title":{"rendered":"On the Evaluation of Neural Code Summarization"},"content":{"rendered":"\n\n\n<p class=\"wp-block-paragraph\">Source code summaries are important for the comprehension and maintenance of programs. However, there are plenty of programs with missing, outdated, or mismatched summaries. Recently, deep learning techniques have been exploited to automatically generate summaries for given code snippets. To achieve a profound understanding of how far we are from solving this problem, in this paper, we conduct a systematic and in-depth analysis of five state-of-the-art neural source code summarization models on three widely used datasets. Our evaluation results suggest that: (1) The BLEU metric, which is widely used by existing work for evaluating the performance of the summarization models, has many variants. Ignoring the differences among the BLEU variants could affect the validity of the claimed results. Furthermore, we discover an important, previously unknown bug about BLEU calculation in a commonly-used software package. (2) Code pre-processing choices can have a large impact on the summarization performance, therefore they should not be ignored. (3) Some important characteristics of datasets (corpus size, data splitting method, and duplication ratio) have a significant impact on model evaluation. Based on the experimental results, we give some actionable guidelines on more systematic ways for evaluating code summarization and choosing the best method in different scenarios. We also suggest possible future research directions. We believe that our results can be of great help for practitioners and researchers in this interesting area.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Source code summaries are important for the comprehension and maintenance of programs. However, there are plenty of programs with missing, outdated, or mismatched summaries. Recently, deep learning techniques have been exploited to automatically generate summaries for given code snippets. 
To achieve a profound understanding of how far we are from solving this problem, in this [&hellip;]<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-author-ordering":[{"type":"text","value":"Ensheng Shi","user_id":0},{"type":"user_nicename","value":"Yanlin Wang","user_id":"39141"},{"type":"user_nicename","value":"Lun Du","user_id":"39144"},{"type":"text","value":"Junjie Chen","user_id":0},{"type":"user_nicename","value":"Shi Han","user_id":"33618"},{"type":"text","value":"Hongyu Zhang","user_id":0},{"type":"user_nicename","value":"Dongmei Zhang","user_id":"31665"},{"type":"text","value":"Hongbin Sun","user_id":0}],"msr_publishername":"","msr_publisher_other":"","msr_booktitle":"","msr_chapter":"","msr_edition":"","msr_editors":"","msr_how_published":"","msr_isbn":"","msr_issue":"","msr_journal":"","msr_number":"","msr_organization":"","msr_pages_string":"","msr_page_range_start":"","msr_page_range_end":"","msr_series":"","msr_volume":"","msr_copyright":"","msr_conference_name":"International Conference on Software Engineering 
(ICSE'22)","msr_doi":"","msr_arxiv_id":"","msr_mag_id":"","msr_other_authors":"","msr_other_contributors":"","msr_speaker":"","msr_award":"","msr_affiliation":"","msr_institution":"","msr_host":"","msr_version":"","msr_duration":"","msr_release_tracker_id":"","msr_highlight_type":"","msr_date_display_format":"","msr_main_download_label":"","msr_external_link_label":"","msr_doi_label":"","msr_published_date":"2022-02-11","msr_startdate":"","msr_presentation_date":"","msr_highlight_text":"","msr_notes":"","msr_longbiography":"","msr_publicationurl":"","msr_external_url":"","msr_secondary_video_url":"","msr_conference_url":"","msr_journal_url":"","msr_year":2022,"msr_month":2,"msr_day":11,"msr_microsoftintellectualproperty":true,"msr_pub_id":"","msr_publication_uploader":[{"type":"url","viewUrl":"false","id":false,"title":"https:\/\/arxiv.org\/abs\/2107.07112","label_id":252679,"label":0}],"msr_related_uploader":[],"msr_original_fields_of_study":[],"msr_s2_paper_id":"","msr_s2_pdf_url":"","msr_citation_count_updated":"","msr_citation_count":0,"msr_influential_citations":0,"msr_reference_count":0,"msr_s2_open_access":false,"msr_s2_author_ids":[],"msr_pub_ids":[],"msr_hide_image_in_river":0,"footnotes":""},"msr-research-highlight":[],"research-area":[13556,13560],"msr-publication-type":[193716],"msr-publisher":[],"msr-publication-cta":[],"msr-focus-area":[],"msr-locale":[268875],"msr-post-option":[],"msr-field-of-study":[],"msr-conference":[259390],"msr-journal":[],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-804475","msr-research-item","type-msr-research-item","status-publish","hentry","msr-research-area-artificial-intelligence","msr-research-area-programming-languages-software-engineering","msr-locale-en_us"],"msr_publishername":"","msr_edition":"","msr_affiliation":"","msr_published_date":"2022-02-11","msr_host":"","msr_duration":"","msr_version":"","msr_speaker":"","msr_other_contributors":"","msr_booktitle":"","msr_pages_string":"","msr_chapter":"","msr
_isbn":"","msr_journal":"","msr_volume":"","msr_number":"","msr_editors":"","msr_series":"","msr_issue":"","msr_organization":"","msr_how_published":"","msr_notes":"","msr_highlight_text":"","msr_release_tracker_id":"","msr_original_fields_of_study":"","msr_download_urls":"","msr_external_url":"","msr_secondary_video_url":"","msr_longbiography":"","msr_microsoftintellectualproperty":1,"msr_main_download":"","msr_publicationurl":"","msr_doi":"","msr_publication_uploader":[{"type":"url","viewUrl":"false","id":"false","title":"https:\/\/arxiv.org\/abs\/2107.07112","label_id":"252679","label":0}],"msr_related_uploader":[],"msr_citation_count":0,"msr_citation_count_updated":"","msr_s2_paper_id":"","msr_influential_citations":0,"msr_reference_count":0,"msr_arxiv_id":"","msr_s2_author_ids":[],"msr_s2_open_access":false,"msr_s2_pdf_url":null,"msr_attachments":[],"msr-author-ordering":[{"type":"text","value":"Ensheng Shi","user_id":0,"rest_url":false},{"type":"user_nicename","value":"Yanlin Wang","user_id":39141,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Yanlin Wang"},{"type":"user_nicename","value":"Lun Du","user_id":39144,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Lun Du"},{"type":"text","value":"Junjie Chen","user_id":0,"rest_url":false},{"type":"user_nicename","value":"Shi Han","user_id":33618,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Shi Han"},{"type":"text","value":"Hongyu Zhang","user_id":0,"rest_url":false},{"type":"user_nicename","value":"Dongmei Zhang","user_id":31665,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Dongmei Zhang"},{"type":"text","value":"Hongbin 
Sun","user_id":0,"rest_url":false}],"msr_impact_theme":[],"msr_research_lab":[199560],"msr_event":[838618],"msr_group":[],"msr_project":[],"publication":[],"video":[],"msr-tool":[],"msr_publication_type":"inproceedings","related_content":[],"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/804475","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-research-item"}],"version-history":[{"count":1,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/804475\/revisions"}],"predecessor-version":[{"id":804481,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/804475\/revisions\/804481"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=804475"}],"wp:term":[{"taxonomy":"msr-research-highlight","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-highlight?post=804475"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=804475"},{"taxonomy":"msr-publication-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-publication-type?post=804475"},{"taxonomy":"msr-publisher","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-publisher?post=804475"},{"taxonomy":"msr-publication-cta","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-publication-cta?post=804475"},{"taxonomy":"msr-focus-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-focus-area?post=804475"},{"taxonomy":"msr-locale","embeddable":true,"href":"ht
tps:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=804475"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=804475"},{"taxonomy":"msr-field-of-study","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-field-of-study?post=804475"},{"taxonomy":"msr-conference","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-conference?post=804475"},{"taxonomy":"msr-journal","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-journal?post=804475"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=804475"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=804475"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}