Information has become easily accessible with the advent of the Web. Blogs, forums, repositories, etc. have made source code widely available to be read, copied, and modified. Programmers are tempted to re-use debugged and tested source code that can easily be found on the Web. The vast amount of resources on the Web makes manual analysis of suspected source code re-use unfeasible. There is a need to develop automatic systems for detecting source code re-use.

Software companies have a special interest in preserving their own intellectual property. In a survey of 3,970 developers, more than 75 percent of respondents admitted that they had re-used blocks of source code from elsewhere. Academic and programming environments have become a relevant scenario for research in source code re-use because it is a frequent practice among students. In this context, students are tempted to re-use source code because they all have to solve the same problem. This makes detection difficult, because all the submitted source code files will contain some degree of similarity.

The second edition of the SOCO task focuses on cross-language source code re-use detection. Participants will be provided with cross-language training and test sets of source code files. The task is to retrieve the pairs of source code files that have been re-used across programming languages.

Task Coordinators

Enrique Flores, Paolo Rosso, Lidia Moreno
Universitat Politècnica de València, Spain

Esaú Villatoro-Tello
Universidad Autónoma Metropolitana (UAM), Mexico