Information has become easily accessible with the advent of the Web. Blogs, forums, repositories, etc. have made source code widely available to be read, copied, and modified. Programmers are tempted to re-use debugged and tested source code that can easily be found on the Web. The vast amount of resources on the Web makes manual analysis of suspected source code re-use infeasible. There is a need to develop automatic systems for detecting source code re-use.

Software companies have a special interest in preserving their own intellectual property. In a survey of 3,970 developers, more than 75 percent of respondents admitted that they had re-used blocks of source code from elsewhere. Academic and programming environments have become a common scenario for research in source code re-use because it is a frequent practice among students. In this context, students are tempted to re-use source code because they all have to solve the same problem. This makes detection difficult, because all the submitted source code will share some degree of similarity.

SOCO is a new task focused on source code re-use. We will offer a task based around the detection of source code that has been re-used in a monolingual context such as an academic environment. The task will involve identifying and distinguishing the most similar source code pairs among a large source code collection.
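
To make the idea of finding "the most similar source code pairs" concrete, here is a minimal sketch of one possible baseline. It is an illustration only, not the method or the evaluation used by the task. It assumes that similarity is measured as the Jaccard overlap of character 3-grams after lowercasing and removing whitespace. The function names (`char_ngrams`, `jaccard`, `rank_pairs`) and the file names in the usage note are hypothetical.

```python
from itertools import combinations


def char_ngrams(source, n=3):
    """Return the set of character n-grams of a source file.

    Assumption: the text is lowercased and all whitespace is removed,
    so formatting changes do not affect the profile.
    """
    normalized = "".join(source.lower().split())
    return {normalized[i:i + n] for i in range(len(normalized) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_pairs(collection, n=3):
    """Score every unordered pair of files and sort by descending similarity.

    `collection` maps a file name to its source text. The number of pairs
    grows quadratically, so this is only practical for small collections.
    """
    profiles = {name: char_ngrams(text, n) for name, text in collection.items()}
    scored = [
        (jaccard(profiles[x], profiles[y]), x, y)
        for x, y in combinations(sorted(profiles), 2)
    ]
    return sorted(scored, key=lambda item: item[0], reverse=True)
```

For example, calling `rank_pairs` on a collection in which `a.c` and `b.c` differ only in spacing, and `c.c` is an unrelated function, places the pair (`a.c`, `b.c`) first. In a classroom setting, where every submission solves the same problem, a fixed threshold on such a score is unlikely to be enough, which is why ranking pairs relative to the rest of the collection is shown here.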

Task Coordinators

Enrique Flores, Paolo Rosso, Lidia Moreno
Universitat Politècnica de València, Spain

Esaú Villatoro-Tello
Universidad Autónoma Metropolitana (UAM), Mexico