Performance Analysis of Document Image Analysis Systems
Research in Document Image Analysis is becoming increasingly widespread. This has so far resulted in the development of a number of alternative methods for solving the various problems posed in the analysis of document images. Different approaches have been devised to suit different applications and, in most cases, each has been tested with very specific data.
The objective assessment of the performance of document analysis subsystems is still in its infancy (the performance evaluation of OCR methods being the only exception). Currently, test methods and ground-truth data that suit the diverse needs of each individual component (subsystem) of a Document Analysis System are not widely available. Moreover, the majority of existing methods and data are constrained by various assumptions about the nature of the image data, such as limitations on the freedom of the page layout.
Currently, research is being carried out to identify suitable methods, and a corresponding organisation of ground-truth data, to cater for documents with complex layouts. Particular attention is paid to the evaluation of methods applied before OCR.
Parts of the ongoing work have been presented at international conferences. For an overview of the framework and a more detailed account of the analysis of Page Segmentation approaches, see the Publications section.