Can neural clone detection generalize to unseen functionalities?

Chenyao Liu; Jian-Guang Lou; Zeqi Lin; Lijie Wen; Dongmei Zhang

Can neural clone detection generalize to unseen functionalities?

Chenyao Liu ,
Jian-Guang Lou ,
Zeqi Lin ,
Lijie Wen ,
Dongmei Zhang

2021 Automated Software Engineering | November 2021

Organized by ACM

Download BibTex

Many recently proposed code clone detectors exploit neural networks to capture latent semantics of source code, thus achieving impressive results for detecting semantic clones. These neural clone detectors rely on the availability of large amounts of labeled training data. We identify a key oversight in the current evaluation methodology for neural clone detection: cross-functionality generalization (i.e., detecting semantic clones of which the functionalities are unseen in training). Specifically, we focus on this question: do nerual clone detectors truly learn the ability to detect semantic clones, or they just learn how to model specific functionalities in training data while cannot generalize to realistic unseen functionalities? This paper investigates how the generalizability can be evaluated and improved.

Our contributions are 3-folds: (1) We propose an evaluation methodology that can systematically measure the cross-functionality generalizability of neural clone detection. Based on this evaluation methodology, an empirical study is conducted and the results indicate that current neural clone detectors cannot generalize well as expected. (2) We conduct empirical analysis to understand key factors that can impact the generalizability. We investigate 3 factors: training data diversity, vocabulary, and locality. Results show that the performance loss on unseen functionalities can be reduced through addressing the out-of-vocabulary problem and increasing training data diversity. (3) We propose a human-in-the-loop mechanism that help adapt neural clone detectors to new code repositories containing lots of unseen functionalities. It improves annotation efficiency with the combination of transfer learning and active learning. Experimental results show that it reduces the amount of annotations by about 88%.