Summary of Plagiarism Detection-Related Publications

This is a review of publications related to plagiarism and its automatic detection. For comments, contributions and corrections, write a message to Alberto Barrón-Cedeño.

Surveys

Reference Topics Year Link
Paul Clough. Plagiarism in natural and programming languages: an overview of current tools and technologies. Research Memoranda: CS-00-05, Department of Computer Science, University of Sheffield, UK
  • Plagiarism basics
  • Description of relevant features for plagiarism analysis
  • Description of plagiarism detection tools
2000
Paul Clough. Old and new challenges in automatic plagiarism detection. National UK Plagiarism Advisory Service
2003
Paul Clough, Robert Gaizauskas. Corpora and Text Re-use. Lüdeling, Kytö and McEnery (eds) Handbook of Corpus Linguistics. (Series: Handbooks of Linguistics and Communication Science), 1249–1271, Mouton de Gruyter.
2009
Thomas Lancaster, Fintan Culwin. Classifications of Plagiarism Detection Engines. ITALICS 4 (2)
2005
Hermann Maurer, Frank Kappe, Bilal Zaka. Plagiarism - A Survey. Journal of Universal Computer Science 12(8), pp. 1050-1084
  • Basics of plagiarism analysis
  • Definition of plagiarism
  • Discussion on the reaction of different institutions
  • Description of commercial tools
2006
SIGIR 2007 Workshop. Plagiarism Analysis, Authorship Identification and Near-Duplicate Detection.
  • Proceedings of the PAN 2007 Workshop
2007

Plagiarism Concepts

Reference Topics Year Link
Brian Martin. Plagiarism: policy against cheating or policy for learning? Faculty of Arts - Papers
2004
Chris Park. In Other (People’s) Words: plagiarism by university students—literature and lessons. Assessment and Evaluation in Higher Education 28 (5), pp. 471-488
2003
Justin Zobel. "Uni Cheats Racket": A Case Study in Plagiarism Investigation. Lister, Young (editors) Conferences in Research and Practice in Information Technology (30)
2004

Cross-Language Plagiarism Analysis

Reference Topics Year Link
Alberto Barrón-Cedeño, Paolo Rosso, David Pinto, Alfons Juan. On cross-lingual plagiarism analysis using a statistical model. Proceedings of the ECAI'08 PAN Workshop: Uncovering Plagiarism, Authorship and Social Software Misuse, pp. 9-13. Patras, Greece
  • Exploitation of alignment models
  • Text similarity calculation based on bilingual dictionaries
2008
Zdenek Ceska, Michal Toman, Karel Jezek. Multilingual Plagiarism Detection. Artificial Intelligence: Methodology, Systems, and Applications, LNCS (5253), pp. 83-92
2008
Chung-Hong Lee, Chih-Hong Wu, Hsin-Chang Yang. A Platform Framework for Cross-lingual Text Relatedness Evaluation and Plagiarism Detection. The 3rd International Conference on Innovative Computing Information and Control (ICICIC'08)
2008
David Pinto, Jorge Civera, Alberto Barrón-Cedeño, Alfons Juan, Paolo Rosso. A statistical approach to crosslingual natural language tasks. J. Algorithms doi:10.1016/j.jalgor.2009.02.005
  • Solution to different tasks based on machine translation techniques
  • Cross-language text similarity analysis based on statistical bilingual dictionaries
2009
Martin Potthast, Benno Stein, Maik Anderka. A Wikipedia-Based Multilingual Retrieval Model. Macdonald, Ounis, Plachouras, Ruthven and White (editors) 30th European Conference on IR Research, ECIR 2008, LNCS (4656), pp. 522-530 Glasgow
  • Exploitation of comparable corpora (Wikipedia)
  • CL analysis based on CL-Explicit Semantic Analysis
2008
Bruno Pouliquen, Ralf Steinberger, Camelia Ignat. Automatic Identification of Document Translations in Large Multilingual Document Collections. Angelova, Bontcheva, Mitkov, Nicolov, Nikolov (editors) Proceedings of the International Conference Recent Advances in Natural Language Processing (RANLP’03), pp. 401-408.
  • Cross-language retrieval of similar documents based on the EUROVOC thesaurus
2003

Extrinsic (Monolingual) Plagiarism Analysis

Reference Topics Year Link
JunPeng Bao, Caroline Lyon, Peter C. R. Lane, Wei Ji, James A. Malcolm. Comparing Different Text Similarity Methods. University of Hertfordshire
2007
Alberto Barrón-Cedeño, Paolo Rosso. On Automatic Plagiarism Detection based on n-grams Comparison. Boughanem et al. (Eds.) ECIR 2009, LNCS 5478, pp. 696-700, Springer-Verlag Berlin Heidelberg
  • Evaluation of n-gram-based plagiarism detection with the METER corpus
2009
Alberto Barrón-Cedeño, Paolo Rosso. Towards the Exploitation of Statistical Language Models for Plagiarism Detection with Reference. Stein, Koppel, and Stamatatos (editors) ECAI Workshop on Uncovering Plagiarism, Authorship and Social Software Misuse (PAN 08), pp. 15-19
  • First attempts at exploiting statistical language models and entropy for plagiarism detection
2008
Alberto Barrón-Cedeño, Paolo Rosso, José Miguel Benedí. Reducing the Plagiarism Detection Search Space on the Basis of the Kullback-Leibler Distance. Gelbukh A. (ed.) CICLing 2009, LNCS 5449, pp. 523-534 Springer-Verlag
  • Pre-selection of potential source documents given a suspicious text
  • Search space reduction based on the Kullback-Leibler distance
2009
Yaniv Bernstein, Justin Zobel. A Scalable System for Identifying Co-Derivative Documents. String Processing and Information Retrieval, LNCS (3246), pp. 55-67.
2004
Sergey Brin, James Davis, Hector Garcia-Molina. Copy Detection Mechanisms for Digital Documents. Proceedings of the ACM SIGMOD Annual Conference, pp. 398-409
  • Description of the COPS system
1995
Andrei Z. Broder. On the resemblance and containment of documents. Compression and Complexity of Sequences (SEQUENCES’97).
1997
Zdenek Ceska. Plagiarism Detection based on Singular Value Decomposition. Advances in Natural Language Processing, LNCS (5221), pp. 108-119, Springer Berlin / Heidelberg.
2008
Abdur Chowdhury, Ophir Frieder, David Grossman, Mary Catherine McCabe. Collection Statistics for Fast Duplicate Document Detection. ACM Transactions on Information Systems 20(2), pp. 171–191.
2002
Paul Clough, Robert Gaizauskas, Scott Piao. Building and annotating a corpus for the study of journalistic text reuse. Proceedings of the 3rd International Conference on Language Resources and Evaluation (V) (LREC-02), Spain, pp. 1678-1691
  • Generation of a journalistic corpus for text reuse analysis
2002
Paul Clough, Robert Gaizauskas, Scott Piao, Yorick Wilks. METER: Measuring Text Reuse. Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), Philadelphia, pp. 152-159
2002
Timothy C. Hoad, Justin Zobel. Methods for Identifying Versioned and Plagiarised Documents. Journal of the American Society for Information Science and Technology 54, pp. 203-215
2003
Parvati Iyer, Abhipsita Singh. Document Similarity Analysis for a Plagiarism Detection System. 2nd Indian Int. Conf. on Artificial Intelligence (IICAI-2005), pp. 2534-2544
2005
NamOh Kang, Alexander Gelbukh, SangYong Han. PPChecker: Plagiarism Pattern Checker in Document Copy Detection. Proc. TSD-2006: Text, Speech and Dialogue, LNAI (4188), pp. 661-667
  • Vocabulary expansion based on WordNet
  • Distinction between plagiarism levels: copy, rewording, word insertion, word deletion
2006
Aleksander Kolcz, Abdur Chowdhury, Joshua Alspector. Improved Robustness of Signature-Based Near-Replica Detection via Lexicon Randomization. KDD-2004 (The Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining), pp. 605-610, Seattle, WA
2004
Thomas Lancaster, Fintan Culwin. A visual argument for plagiarism detection using word pairs. Proceedings of the Plagiarism Conference
2004
Caroline Lyon, Ruth Barrett, James Malcolm. A theoretical basis to the automated detection of copying between texts, and its practical implementation in the Ferret plagiarism and collusion detector. Plagiarism: Prevention, Practice and Policies Conference
  • Description of the Ferret system
  • Method based on the exhaustive comparison of word n-grams
2004
Caroline Lyon, James Malcolm, Bob Dickerson. Detecting short passages of similar text in large document collections. Proceedings of the Conference on Empirical Methods in Natural Language Processing, pp. 118-125
  • Description of a method based on n-gram comparison
2001
Donald Metzler, Yaniv Bernstein, W. Bruce Croft, Alistair Moffat, Justin Zobel. Similarity Measures for Tracking Information Flow. Proc. CIKM’05
2005
Martin Potthast, Benno Stein. New Issues in Near-Duplicate Detection. Preisach, Burkhardt, Schmidt-Thieme, Decker (editors) Data Analysis, Machine Learning and Applications, pp. 601-609
2008
Narayanan Shivakumar, Hector Garcia-Molina. Building a scalable and accurate copy detection mechanism. Proceedings of 1st ACM Conference on Digital Libraries (DL'96), pp. 160-168
  • Detection of copied sentences via hash tables
1996
Mark Sanderson. Duplicate detection in the Reuters collection. Technical Report. Department of Computing Science, University of Glasgow
1997
Antonio Si, Hong Va Leong, Rynson W. H. Lau. CHECK: a document plagiarism detection system. Proc. of the 1997 ACM Symposium on Applied Computing, San Jose, CA, pp. 70-77
1997
Daniel R. White, Mike S. Joy. Sentence-Based Natural Language Plagiarism Detection. ACM Journal on Educational Resources in Computing 4(4)
2004

Intrinsic Plagiarism Analysis and Authorship Attribution

Reference Topics Year Link
Rosa Maria Coyotl-Morales, Luis Villaseñor-Pineda, Manuel Montes-y-Gómez, Paolo Rosso. Authorship attribution using Word Sequences. Proc. of the 11th Iberoamerican Congress on Pattern Recognition, (CIARP 2006), LNCS (4225), pp. 844-853.
  • Method based on Maximal Word Sequences comparison
2006
Ol'ga Feiguina, Graeme Hirst. Authorship attribution for small texts: Literary and forensic experiments. Stein, Koppel, and Stamatatos (editors) SIGIR Workshop on Plagiarism Analysis, Authorship Identification, and Near-Duplicate Detection (PAN 07)
2007
Marco Kimler. Using Style Markers for Detecting Plagiarism in Natural Language Documents. Thesis
2003
Kim Luyckx, Walter Daelemans. Personae: a corpus for author and personality prediction from text. Proceedings of LREC-2008, Sixth International Language Resources and Evaluation Conference.
2008
Sven Meyer zu Eissen, Benno Stein. Intrinsic Plagiarism Detection. Lalmas et al. (Eds.): Advances in Information Retrieval Proc. of the 28th European Conf. on IR research, ECIR 2006, pp. 565-569. London
  • Stylometric analysis of text
  • Statistical text features
  • Intrinsic plagiarism analysis
2006
Sven Meyer zu Eissen, Benno Stein, Marion Kulig. Plagiarism Detection without Reference Collections. In: Reinhold Decker and Hans J. Lenz (editors) Advances in Data Analysis, pp. 359-366
  • Classification of plagiarism analysis methods
  • Nice introduction to intrinsic plagiarism analysis
  • Description of corpus building process
2006
Efstathios Stamatatos, Nikos Fakotakis, George Kokkinakis. Automatic Text Categorization in Terms of Genre and Author. Computational Linguistics 26, pp. 471-495
  • Broad overview of authorship analysis
2000
Efstathios Stamatatos, Nikos Fakotakis, George Kokkinakis. Computer-Based Authorship Attribution Without Lexical Measures. Computers and the Humanities 35(2)
2001
Benno Stein, Nedim Lipka, Sven Meyer zu Eissen. Meta Analysis within Authorship Verification. 19th International Conference on Database and Expert Systems Application, DEXA 2008, pp. 34-39
2008
Benno Stein, Sven Meyer zu Eissen. Intrinsic Plagiarism Analysis with Meta Learning. Stein, Koppel, and Stamatatos (editors) SIGIR Workshop on Plagiarism Analysis, Authorship Identification, and Near-Duplicate Detection (PAN 07), pp. 45-50
2007
Yuta Tsuboi, Yuji Matsumoto. Authorship identification for heterogeneous documents. IPSJ SIG Notes
  • Authorship attribution of e-mails and web pages in Japanese
  • Exploitation of sequential word patterns for SVM-based classification
2002

Source Code Plagiarism Detection

Reference Topics Year Link
Christian Arwin, S.M.M. Tahaghoghi. Plagiarism Detection across Programming Languages. Twenty-Ninth Australasian Computer Science Conference (ACSC2006)
  • "Cross-Language" source code plagiarism analysis
2006
Francisco Rosales, Antonio García, Santiago Rodríguez, José L. Pedraza, Rafael Méndez, Manuel M. Nieto. Detection of Plagiarism in Programming Assignments. IEEE Transactions on Education 51(2), pp. 174-183
  • Description of pk2, a tool developed at the UPM that detects cases of plagiarism in high- and low-level programming
2008
Sam Grier. A Tool that Detects Plagiarism in Pascal Programs. Proceedings of the twelfth SIGCSE technical symposium on Computer science education, pp. 15-20
1981
Mike Joy, Michael Luck. Plagiarism in Programming Assignments. IEEE Transactions on Education 42(2), pp. 129-133.
1999
Thomas Lancaster, Mark Tetlow. Does automated anti-plagiarism have to be complex? Evaluating more appropriate software metrics for finding collusion. Proceedings of the 2005 Ascilite Conference.
2005
Karl J. Ottenstein. An Algorithmic Approach to the Detection and Prevention of Plagiarism. ACM SIGCSE Bulletin 8(4), pp. 30-41.
1976
Alan Parker, James O. Hamblen. Computer Algorithms for Plagiarism Detection. IEEE Transactions on Education 32, pp. 94-99
  • Survey of algorithms for the detection of source code plagiarism
1989
Kristina L. Verco, Michael J. Wise. Software for Detecting Suspected Plagiarism: Comparing Structure and Attribute-Counting Systems. First Australian Conference on Computer Science Education, Sydney, Australia
1996