This dataset has been created primarily for the evaluation of layout analysis methods, both physical and logical. It contains realistic documents with a wide variety of layouts, reflecting the various challenges in layout analysis. Particular emphasis is placed on magazines and technical/scientific publications, which are likely to be the focus of digitisation efforts.
Each image in the dataset is accompanied by comprehensive and detailed ground truth, enabling in-depth evaluation.
In addition to the information provided, the dataset is presented through this interactive interface. The interface, together with the flexible structure of the database behind it, allows easy browsing, searching and selection of subsets (e.g. for evaluation on specific layout conditions).
A Realistic Dataset for Performance Evaluation of Document Layout Analysis
Proceedings of the 10th International Conference on Document Analysis and Recognition (ICDAR2009), Barcelona, Spain, July 2009, pp. 296-300