University of Salford
PRImA - Pattern Recognition & Image Analysis Group

Further Details

The PAGE (Page Analysis and Ground-Truth Elements) Format Framework

S. Pletschacher, A. Antonacopoulos

Proceedings of the 20th International Conference on Pattern Recognition (ICPR 2010), Istanbul, Turkey, August 23-26, 2010, IEEE-CS Press, pp. 257-260.

Abstract

There is a plethora of established and proposed document representation formats but none that can adequately support individual stages within an entire sequence of document image analysis methods (from document image enhancement to layout analysis to OCR) and their evaluation. This paper describes PAGE, a new XML-based page image representation framework that records information on image characteristics (image borders, geometric distortions and corresponding corrections, binarisation etc.) in addition to layout structure and page content. The suitability of the framework to the evaluation of entire workflows as well as individual stages has been extensively validated by using it in high-profile applications such as in public contemporary and historical ground-truthed datasets and in the ICDAR Page Segmentation competition series.


