| Named Entity Recognition (NER) Task |
zip |
| ANERCorp: A corpus of more than 150,000 words annotated for the NER task. |
 |
| ANERGazet: A collection of three gazetteers: (i) Locations, containing names of continents, countries, cities, etc.; (ii) People, containing names of people collected manually from different Arabic websites; and (iii) Organizations, containing names of organizations such as companies, football teams, etc. |
 |
| Test-Bed for Passage Retrieval (PR) and Question Answering (QA) tasks |
zip |
| Documents: More than 11,000 Arabic Wikipedia articles in SGML format (the format adopted in CLEF and also the one accepted by the JIRS system). |
 |
| List of Questions: A list of 200 questions of different types. The proportion of each question type is the same as the proportion adopted in CLEF. |
 |
| List of Correct Answers: For each question in my list of questions, I provide a list of correct answers. This list is very important for automatic evaluation. |
 |
| Doc |
- |
| Arabic language rules (in Arabic): Somebody mailed me this PPS file, which summarizes all the Arabic rules. Unfortunately, there is no English version of the file. I would have translated it myself because it is really worth it, but the file contains 812 slides! |
 |