Introduction


This edition of CL!TR focuses on journalistic text reuse. News agencies are a prolific source of text on the web, often published in multiple languages. Because news stories in different languages are written independently, there is a need to link stories that cover the same event across languages. Linking these stories greatly enhances the reader's experience: in a multilingual environment such as India, for example, a reader might want to refer to the local-language version of a news story.


News stories covering the same event in different languages can be rich sources of parallel and comparable text. Some fragments of these stories are parallel, for example personal quotes and translated versions of the same content. Identifying such highly similar news stories therefore serves two purposes: enhancing the reader's experience and generating valuable multilingual resources.


This year we will offer a task on the identification of highly similar journalistic articles and news stories in a cross-language setting. Participants will identify and link highly similar news stories that cover the same event but are published in different languages.



Task Coordinators


Parth Gupta, Paolo Rosso
NLE Lab @ Universitat Politècnica de València, Spain


Alberto Barrón-Cedeño
LSI @ Universitat Politècnica de Catalunya, Spain


Paul Clough, Mark Stevenson
IR & NLP Groups @ University of Sheffield, UK


Sobha Lalitha Devi
CLR Group @ AU-KBC Research Centre, Chennai, India







Contact: clinss@dsic.upv.es
