Evaluation Results


  • The results for English-Hindi*:
Rank  Run                               NDCG@1   NDCG@5   NDCG@10

1     run-1-english-hindi-palkovskii    0.3229   0.3259   0.3380
2     run-2-english-hindi-deriupm       0.2100   0.2136   0.2613
3     run-1-english-hindi-deriupm       0.1900   0.2110   0.2168
4     run-1-english-hindi-iiith         0.1939   0.1994   0.2154
5     run-3-english-hindi-deriupm       0.1500   0.1886   0.2030
6     run-3-english-hindi-iiith         0.1837   0.1557   0.1722
7     run-2-english-hindi-iiith         0.0204   0.0462   0.0512


* The ranking is based on the NDCG@10 score.
  • The results for English-Gujarati:
Rank  Run                                  NDCG@1   NDCG@5   NDCG@10

1     run-1-english-gujarati-palkovskii    0.0541   0.0843   0.0955


  • Participating Teams
Team        Members                            Affiliation

palkovskii  Yurii Palkovskii                   Zhytomyr State University, Ukraine / SkyLine LLC
deriupm     Nitish Aggarwal (1),               (1) DERI Galway and (2) UPM Madrid
            Kartik Asooja (2),
            Paul Buitelaar (1)
iiith       Rambhoopal Reddy, Krish Perumal    IIIT-H, India

  • Relevance Judgements (a loading sketch follows this list)
    English-Hindi: clinss12-en-hi.qrel
    English-Gujarati: clinss12-en-gu.qrel
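
These qrel files presumably follow the standard TREC layout of one judgement per line, <target-docid> <iteration> <source-docid> <relevance>; that field order is an assumption, not something this page confirms. A minimal Python sketch for loading them under that assumption:

from collections import defaultdict

def load_qrels(path):
    """Parse a TREC-style qrel file into {target-docid: {source-docid: grade}}.
    Assumes the standard 4-column layout; adjust if the CLINSS files differ."""
    qrels = defaultdict(dict)
    with open(path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            target_id, _iteration, source_id, rel = line.split()
            qrels[target_id][source_id] = int(rel)
    return qrels

qrels = load_qrels("clinss12-en-hi.qrel")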

Evaluation Task


Let T be a set of target news stories and S a set of potential source news stories. The task is to find and link the news stories s in S that report the same news event as a corresponding news story t in T. Moreover, each t must be linked to its corresponding s where t and s also share the same focal event.

Evaluation Corpus


The corpus contains a set of potential source news stories S, written in Hindi and Gujarati (provided separately), and a set of target news stories T, written in English. The corpus consists of plain text files encoded in UTF-8. The documents contain meta-information, including the news title and publication date. In the case of Gujarati, the title is not available separately.


Test Collection


The test collection is composed in the same way as the training collection: a set of suspicious documents together with the potential source documents.

  • Corpus Statistics
    • 11,889 source news stories in Gujarati
    • 50,691 source news stories in Hindi
    • 50 target news stories in English
The corpora can be downloaded from the Corpus section.



Submission of Detection Results


Participants are allowed to submit up to three runs per language pair in order to experiment with different settings.

The results of your detection must be formatted as follows:

<target-docid> Q0 <source-docid> <rank> <similarity>

where,

  • <target-docid> is the id of the corresponding English target document
  • Q0 is an unused parameter (use it as is)
  • <source-docid> is the id of the corresponding source document in the respective language
  • <rank> is the rank your system assigns to the source document for the given target document (must be an integer)
  • <similarity> is the similarity score your system assigns to the source document for the given target document (must be a double)


  • All fields are delimited by a single <space>.
  • Participants are required to submit a ranked list of up to 100 source news stories for each target news story.
  • Name of the run file should be in the format of run-<1/2/3>-english-<hindi/gujarati>-<teamname>.txt


For example, a standard run file will look like this:

english-document-00001.txt Q0 hindi-document-00345.txt 1 0.4644
english-document-00001.txt Q0 hindi-document-42325.txt 2 0.2823
.
.
english-document-00050.txt Q0 hindi-document-23443.txt 100 0.1123

The name of this file might be run-1-english-hindi-upv.txt.
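
For illustration, here is a minimal Python sketch that writes a run in this format; the team name, run number, and document ids below are placeholders, not real corpus data:

def write_run(results, run_no=1, pair="english-hindi", team="myteam"):
    """results maps each target docid to a list of (source-docid, similarity)
    pairs, already sorted by descending similarity."""
    filename = "run-%d-%s-%s.txt" % (run_no, pair, team)
    with open(filename, "w", encoding="utf-8") as out:
        for target_id in sorted(results):
            # at most 100 ranked source stories per target story
            for rank, (source_id, score) in enumerate(results[target_id][:100], start=1):
                # <target-docid> Q0 <source-docid> <rank> <similarity>
                out.write("%s Q0 %s %d %.4f\n" % (target_id, source_id, rank, score))

write_run({
    "english-document-00001.txt": [
        ("hindi-document-00345.txt", 0.4644),
        ("hindi-document-42325.txt", 0.2823),
    ],
})

With these placeholder values, the sketch produces run-1-english-hindi-myteam.txt containing the first two lines of the example above.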



Performance Measures


The success of the cross-language news story discovery will be measured in terms of NDCG@1, NDCG@5 and NDCG@10. The relevance level of the source news stories for the given test queries will be in {2, 1, 0}, where

  • 2 = "same news event + same focal event"
  • 1 = "same news event + different focal event" and
  • 0 = "different news event"
For a better understanding of "news event" and "focal event", see the Task Description.
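
As a reference point, NDCG@k can be computed from the graded relevance levels above as follows. This is the common linear-gain formulation with a log2 rank discount, given purely as an illustration; the organizers' official scoring script may differ in details:

import math

def dcg_at_k(grades, k):
    # rank r (1-based) is discounted by log2(r + 1);
    # enumerate is 0-based, hence log2(i + 2)
    return sum(g / math.log2(i + 2) for i, g in enumerate(grades[:k]))

def ndcg_at_k(grades, k):
    """grades: relevance levels (2, 1 or 0) of the retrieved source stories,
    in the rank order produced by the system for one target story."""
    ideal = dcg_at_k(sorted(grades, reverse=True), k)
    return dcg_at_k(grades, k) / ideal if ideal > 0 else 0.0

# e.g. the top five retrieved stories were judged 2, 0, 1, 0, 2
print(ndcg_at_k([2, 0, 1, 0, 2], k=5))  # ~0.87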

Contact: clinss@dsic.upv.es
