Latest news:

Jan, 2010:
Webpage updated.


ICFHR 2010 Contest

Quantitative evaluation of binarization algorithms of images of historical documents with bleeding noise


Evaluating and comparing binarization algorithms has proved to be a difficult task, since there is no objective way to compare the results. Several review papers have tried to compare binarization algorithms by using precision and recall analysis of the resulting words in the foreground, or by evaluating their effect on end-to-end character or word recognition performance in a complete archive document recognition system utilizing OCR. Every work that performed such a comparison presented some very interesting conclusions. However, the problem is that in all cases they use results from subsequent tasks in the document processing hierarchy to assess algorithm performance. Although in many cases this is the objective goal, it is not always possible. For historical documents, whose quality often obstructs recognition and sometimes even word segmentation, this way of evaluation can prove problematic.

On the other hand, we need a different evaluation technique, since the processing of historical documents is one of the hardest cases, and binarization may be required to remove the noise and facilitate their appropriate presentation. The ideal evaluation should be able to decide, for each pixel, whether it has ended up with the right color (black or white) after binarization. This is an easy task for a human observer, but very difficult to do automatically for all the pixels of several images.

The proposed method experiments on document archives built from synthetic noisy images: using image mosaicing techniques, old blank historical document pages are combined with noise-free pdf documents. After the binarization algorithms are applied to the synthetic images, it is easy to evaluate the results by comparing the resulting image with the original document, which serves as the ground truth image. This way, we are able to count the exact number of wrong pixels remaining in either the background or the foreground.

An example of a degraded image and its ground truth image:


As already mentioned, our intention is to be able to check, for every pixel, whether it is right or wrong. Thus, the pixel error will be used, that is, the total number of pixels that have the wrong color in the output image. The number of black pixels correctly classified as black is denoted by "b", and the number of white pixels correctly classified as white is denoted by "w". The total numbers of black and white pixels are B and W respectively. Document binarization can be considered an unbalanced problem, where the number of black pixels is much lower than the number of white pixels. In this case a better measure could be the geometric mean. The geometric-mean pixel accuracy is: sqrt((b/B)*(w/W))
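As an illustration only, the two measures above can be sketched in a few lines of Python. The function names and the image encoding (nested lists, with 1 for a black pixel and 0 for a white pixel) are assumptions made for this sketch and are not part of the contest specification.

```python
from math import sqrt


def count_pixels(output, truth):
    """Return (b, w, B, W) for two binary images of equal size.

    Assumed encoding (not from the contest page): nested lists of
    rows, where 1 is a black pixel and 0 is a white pixel.
    """
    b = w = B = W = 0
    for out_row, truth_row in zip(output, truth):
        for out_px, truth_px in zip(out_row, truth_row):
            if truth_px == 1:
                B += 1                # black pixel in the ground truth
                if out_px == 1:
                    b += 1            # correctly classified as black
            else:
                W += 1                # white pixel in the ground truth
                if out_px == 0:
                    w += 1            # correctly classified as white
    return b, w, B, W


def pixel_error(output, truth):
    """Total number of pixels with the wrong color in the output."""
    b, w, B, W = count_pixels(output, truth)
    return (B - b) + (W - w)


def geometric_mean_accuracy(output, truth):
    """Geometric-mean pixel accuracy: sqrt((b/B)*(w/W))."""
    b, w, B, W = count_pixels(output, truth)
    if B == 0 or W == 0:
        raise ValueError("ground truth needs both black and white pixels")
    return sqrt((b / B) * (w / W))


# Tiny hand-made example: 2 black and 6 white ground truth pixels.
truth = [[1, 1, 0, 0],
         [0, 0, 0, 0]]
output = [[1, 0, 0, 0],
          [0, 0, 0, 1]]
print(pixel_error(output, truth))              # one black and one white pixel are wrong
print(geometric_mean_accuracy(output, truth))  # sqrt((1/2)*(5/6))
```

In this small example only 2 of 8 pixels are wrong, yet half of the black pixels are lost; the geometric mean penalizes that loss more strongly than the plain pixel error suggests, which is the motivation for using it on unbalanced images.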


The evaluation of the binarization methods will be made on synthetic images. That is, starting from a clean document image (doc), which is considered the ground truth image, noise of different types is added (noisy images). This way, during the evaluation, it is possible to decide objectively, for every single pixel, whether its value is correct by comparing it with the corresponding pixel in the original image.

For training purposes, 60 document images obtained by using image mosaicing techniques are available. The doc set consists of three document images in pdf format, including tables, graphics, columns and many of the elements that can be found in a document. The noisy set consists of ten old blank images, taken from a digitized document archive of the 18th century. These include most kinds of problems that can be encountered in old documents: presence of stains and strains, backgrounds with large variations, uneven illumination, ink seepage, etc. The docs are used as target images, and all the noisy images are resized to A4 size. Then, two different blending techniques are used: the maximum intensity and the image averaging approaches. For the test set, different ground truth documents will be considered and a different number of images will be generated.
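The two blending approaches can be sketched roughly as follows. This is only a minimal illustration under stated assumptions, not the procedure actually used to build the contest images: images are assumed to be equally sized grayscale nested lists with 0 for black and 255 for white, "maximum intensity" is assumed to mean keeping the pixel with the most ink (the darker of the two), and the function names are hypothetical.

```python
def blend_max_intensity(doc, noisy):
    """Keep, for each position, the darker of the two pixels.

    Assumption for this sketch: "maximum intensity" refers to ink
    intensity, so with 0 = black and 255 = white it becomes a
    per-pixel minimum of the gray values.
    """
    return [[min(d, n) for d, n in zip(doc_row, noisy_row)]
            for doc_row, noisy_row in zip(doc, noisy)]


def blend_average(doc, noisy):
    """Average the two gray values at each position (integer result)."""
    return [[(d + n) // 2 for d, n in zip(doc_row, noisy_row)]
            for doc_row, noisy_row in zip(doc, noisy)]


# Clean document: one black text pixel on a white page.
doc = [[255, 0],
       [255, 255]]
# Old blank page: uneven background with a dark stain (value 90).
noisy = [[200, 180],
         [90, 220]]

print(blend_max_intensity(doc, noisy))  # [[200, 0], [90, 220]]
print(blend_average(doc, noisy))        # [[227, 90], [172, 237]]
```

Under these assumptions, the first approach keeps the text at full strength on top of the old background, while averaging fades both the text and the stains, which gives a harder image to binarize.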