Journal article

Efficient plagiarism detection for large code repositories

Steven Burrows, SMM Tahaghoghi, Justin Zobel



Unauthorized re-use of code by students is a widespread problem in academic institutions, and raises liability issues for industry. Manual plagiarism detection is time-consuming, and current effective plagiarism detection approaches cannot be easily scaled to very large code repositories. While there are practical text-based plagiarism detection systems capable of working with large collections, this is not the case for code-based plagiarism detection. In this paper, we propose techniques for detecting plagiarism in program code using text similarity measures and local alignment. Through detailed empirical evaluation on small and large collections of programs, we show that our approach is hig..
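As a rough illustration of what local alignment means for program code, the sketch below scores the best-matching region between two token sequences using the standard Smith-Waterman recurrence. It is a generic textbook formulation, not the authors' implementation: the function name `local_alignment_score`, the scoring values (`match`, `mismatch`, `gap`), and the assumption that programs have already been split into tokens are all hypothetical choices made for this example.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best Smith-Waterman local alignment score of sequences a and b.

    The scoring parameters are illustrative assumptions, not values from the paper.
    """
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming table
    best = 0
    for x in a:
        curr = [0]  # column 0 is always 0 for local alignment
        for j, y in enumerate(b, start=1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            # A score never drops below 0, so an alignment can restart anywhere.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


# Hypothetical token streams: the second has one extra token inserted.
original = ["for", "(", "i", "=", "0", ";"]
suspect = ["for", "(", "int", "i", "=", "0", ";"]
score = local_alignment_score(original, suspect)
```

A high score relative to the length of the shorter sequence suggests a shared region that survives small insertions or edits, which is why local alignment suits comparing programs that have been lightly modified.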


University of Melbourne Researchers