Layout Analysis is of fundamental importance among Document Image Analysis steps and has been (and continues to be) relatively well researched. The aim of the competition is to evaluate existing approaches using a realistic dataset and an objective performance analysis system.
The Historical Document Layout Analysis Competition follows the successful running of all previous ICDAR Page Segmentation competitions (2001, 2003, 2005, 2007 and 2009). The current competition focusses on historical documents, making use of a new dataset created by the IMPACT project. Historical documents are of particular interest as they pose a number of challenges and, at the same time, represent a very large proportion of printed documents in existence. With the increasing number of digitisation projects initiated by libraries world-wide, the problem of layout analysis of these documents is very topical.
The dataset to be used in this competition is a subset of the IMPACT dataset, representing key holdings of major European libraries. It is realistic in the sense that it covers a wide variety of layouts found in historical documents that are likely to be of broad interest for digitisation. It contains images and ground truth for pages taken mainly from books and (to a lesser extent) newspapers. While the majority of regions on each page are textual, graphic regions are also present. Textual regions with fonts of varying sizes may also be present on each page.
The competition will use the evaluation approach successfully employed in the ICDAR2009 Page Segmentation Competition. It takes into account a wide range of situations and provides considerable detail on the performance of layout analysis methods. The system performs a geometric comparison between the regions detected by a segmentation method and the ground-truth regions in order to identify erroneous mergers between regions, as well as split, missed, partially missed or misclassified regions. Each type of error is weighted according to the types of the regions involved and the situation in which they are found.
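As a rough illustration of this kind of geometric comparison, the following Python sketch classifies errors between ground-truth and detected regions. It is not the competition's evaluation system: regions are simplified to axis-aligned rectangles, and the names Region, find_errors and weighted_error, the thresholds min_overlap and full_cover, and the example weights are all assumptions made for this example only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Axis-aligned rectangle with a region type (a simplifying assumption)."""
    x0: int
    y0: int
    x1: int
    y1: int
    kind: str  # e.g. "text" or "graphic"

    def area(self) -> int:
        return (self.x1 - self.x0) * (self.y1 - self.y0)


def overlap_area(a: Region, b: Region) -> int:
    """Area of the intersection of two rectangles (0 if they are disjoint)."""
    w = min(a.x1, b.x1) - max(a.x0, b.x0)
    h = min(a.y1, b.y1) - max(a.y0, b.y0)
    return w * h if w > 0 and h > 0 else 0


def find_errors(ground_truth, detected, min_overlap=0.1, full_cover=0.9):
    """Return a list of (error_type, region_kind) pairs.

    Assumes every ground-truth region has positive area and that detected
    regions do not overlap one another. Both thresholds are hypothetical.
    """
    def significant(g, d):
        # A detected region "hits" a ground-truth region if it covers
        # at least min_overlap of the ground-truth area.
        return overlap_area(g, d) >= min_overlap * g.area()

    errors = []
    for g in ground_truth:
        hits = [d for d in detected if significant(g, d)]
        covered = sum(overlap_area(g, d) for d in detected) / g.area()
        if covered == 0:
            errors.append(("miss", g.kind))
        elif covered < full_cover:
            errors.append(("partial_miss", g.kind))
        if len(hits) > 1:
            errors.append(("split", g.kind))
        if any(d.kind != g.kind for d in hits):
            errors.append(("misclassified", g.kind))
    for d in detected:
        # One detected region spanning several ground-truth regions is a merge.
        if sum(significant(g, d) for g in ground_truth) > 1:
            errors.append(("merge", d.kind))
    return errors


def weighted_error(errors, weights, default=1.0):
    """Sum per-error weights keyed by (error_type, region_kind)."""
    return sum(weights.get(e, default) for e in errors)


# Two text paragraphs merged into one region; a graphic split in two,
# partially missed, and partly labelled as text.
gt = [Region(0, 0, 100, 50, "text"),
      Region(0, 60, 100, 110, "text"),
      Region(0, 120, 100, 200, "graphic")]
det = [Region(0, 0, 100, 110, "text"),
       Region(0, 120, 50, 200, "graphic"),
       Region(50, 120, 100, 160, "text")]
errors = find_errors(gt, det)
score = weighted_error(errors, {("merge", "text"): 2.0})
print(errors)
print(score)
```

Keying the weights by both error type and region type mirrors the idea that the same kind of error can matter more or less depending on the regions involved; the situation-dependent part of the weighting is not modelled here.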
The creation of the dataset used for this competition has been supported in part by:
EU 7th Framework Programme grant IMPACT (Ref: 215064)