This edition of CL!TR focuses on journalistic text reuse. News agencies are a prolific source of text on the web and a valuable source of text in multiple languages. News stories are generated independently, so there is a need to link stories that cover the same events but are written in different languages. Linking these stories enhances the user’s experience: in a multilingual environment (such as India), a reader might want to refer to the local-language version of a news story.
News stories covering the same event published in different languages may be rich sources of parallel and comparable text. Some fragments in these stories are parallel, for example personal quotes and translated versions of the same content. Identifying such highly similar news stories serves two purposes: enhancing the reader’s experience and generating a valuable multilingual resource.
This year we will offer a task based around the identification of highly similar journalistic articles and news stories in a cross-language setting. The task will involve identifying and linking highly similar news stories covering the same event published in different languages.
Parth Gupta, Paolo Rosso
NLE Lab @ Universitat Politècnica de València, Spain
LSI @ Universitat Politècnica de Catalunya
Sobha Lalitha Devi
CLR Group @ AU-KBC Research Centre, Chennai, India