Introduction
Transcription of handwritten text in old documents is an important but
time-consuming task for digital libraries. State-of-the-art
technologies for automatic handwritten text recognition are still far
from perfect, so post-editing automatically generated output is
not clearly better than simply ignoring it. A more effective approach
to transcribing old text documents is to follow an Interactive
Handwriting Recognition (IHR) paradigm, in which the system is
guided by the human supervisor and the supervisor is assisted by the
system, so that the transcription task is completed as efficiently as possible.
Description
The aim of this competition is to promote the application of conventional, non-interactive handwriting recognition systems to IHR. To this end, we have chosen the task of transcribing an old Spanish manuscript from 1545 known as RODRIGO, which comprises about 20000 text line images and their corresponding transcriptions. For the competition, RODRIGO will be split into an initial part of about 3000 lines for training (available to participants) and a final part with the remaining lines for testing (not available to participants).
System submission
Participants will first be asked to submit conventional (unconstrained) systems to transcribe the test lines. Then, optionally, they will be asked to submit IHR (constrained) systems; that is, each test line image will be accompanied by a partially supervised transcription, and the system will compute the most likely transcription for the unsupervised parts.
Evaluation
In both cases, systems will be compared in terms of word error rate (WER). Additionally, in the IHR evaluation, the number of supervised words needed to complete the transcription task will be measured.
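As an illustration of the WER measure, the following minimal sketch computes it as the total word-level edit distance (substitutions, insertions and deletions) divided by the total number of reference words. It is not the official scoring tool: the function names, the whitespace tokenization and the pooling of errors over all test lines are assumptions made here for the example, and the organizers' exact normalization may differ.

```python
def word_edit_distance(ref_words, hyp_words):
    """Levenshtein distance between two word lists (unit costs)."""
    # prev[j] holds the distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def wer(references, hypotheses):
    """Corpus-level WER over paired lists of line transcriptions.

    Assumption: words are separated by whitespace and errors are pooled
    over all lines before dividing by the number of reference words.
    """
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    errors = 0
    total_words = 0
    for ref_line, hyp_line in zip(references, hypotheses):
        ref_words = ref_line.split()
        errors += word_edit_distance(ref_words, hyp_line.split())
        total_words += len(ref_words)
    if total_words == 0:
        raise ValueError("references contain no words")
    return errors / total_words
```

With these definitions, a call such as wer(reference_lines, system_lines) returns a value of 0.0 for a perfect transcription; values above 1.0 are possible when a system inserts many spurious words.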
Important dates
Release of training data: March 16th
Conventional and IHR system submission: May 23rd
Contact & Registration
Nicolás Serrano nserrano@iti.upv.es
Adrià Giménez agimenez@iti.upv.es
Alfons Juan ajuan@iti.upv.es
References
- Nicolás Serrano, Alfons Juan. The RODRIGO database. Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC 2010), May 19-21, 2010.