Proceedings of the 2011 Workshop on Historical Document Imaging and Processing (HIP2011), Beijing, China, September 2011, pp. 106-111
Historical document images frequently exhibit geometric distortions, mostly caused by storage conditions (arbitrary warping) but also by the original printing process (non-straight text lines), the use of the document (folds) and the scanning method (page curl). Correcting such distortions improves both the recognition rate and the visual appearance (e.g. for easier human reading or on-demand printing). However, the nature of these documents, with layout irregularities and broken or touching characters in archaic fonts, poses significant challenges. In addition, for large-scale digitisation of books and newspapers, methods need to be robust, efficient and reversible, and must be applicable unsupervised to (possibly multi-column) documents that may or may not be warped (no distortion should be introduced on unwarped images). No such method exists in the literature. In this paper, an effective grid-based method is presented to geometrically model and correct arbitrarily warped historical documents with relatively complex layout (multi-column with graphics). A global grid, with sub-grids for differing parts of a page, is constructed by accurately determining text baselines. The warped image is corrected by transforming each quadrilateral sub-grid of the global grid into its intended rectangular form. Preliminary experimental results show that this method efficiently corrects arbitrarily warped historical documents, with improved performance over a leading geometric correction method and the industry-standard commercial system.
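The correction step, in which each quadrilateral sub-grid is transformed into a rectangle, can be illustrated with a minimal sketch. The abstract does not specify the transformation used, so the bilinear interpolation and nearest-neighbour sampling below are assumptions, and the names `quad_point` and `rectify_cell` are hypothetical rather than taken from the paper.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def quad_point(quad: Sequence[Point], u: float, v: float) -> Point:
    """Bilinearly interpolate a point inside a quadrilateral.

    quad is (top_left, top_right, bottom_right, bottom_left) in image
    coordinates; u and v are normalised positions in [0, 1].
    """
    (x0, y0), (x1, y1), (x2, y2), (x3, y3) = quad
    # Interpolate along the top and bottom edges, then between them.
    top_x, top_y = x0 + u * (x1 - x0), y0 + u * (y1 - y0)
    bot_x, bot_y = x3 + u * (x2 - x3), y3 + u * (y2 - y3)
    return (top_x + v * (bot_x - top_x), top_y + v * (bot_y - top_y))


def rectify_cell(image: List[List[int]], quad: Sequence[Point],
                 width: int, height: int) -> List[List[int]]:
    """Resample one warped grid cell into a width x height rectangle.

    Assumption: inverse mapping with nearest-neighbour sampling; each
    output pixel looks up its source position in the warped image.
    """
    rows, cols = len(image), len(image[0])
    out = []
    for y in range(height):
        v = y / (height - 1) if height > 1 else 0.0
        row = []
        for x in range(width):
            u = x / (width - 1) if width > 1 else 0.0
            sx, sy = quad_point(quad, u, v)
            # Round to the nearest pixel and clamp to the image bounds.
            ix = min(max(math.floor(sx + 0.5), 0), cols - 1)
            iy = min(max(math.floor(sy + 0.5), 0), rows - 1)
            row.append(image[iy][ix])
        out.append(row)
    return out
```

Under these assumptions, a cell that is already rectangular maps onto itself, which matches the stated requirement that no distortion be introduced on unwarped images.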
P. Yang, A. Antonacopoulos, C. Clausner, S. Pletschacher, "Grid-Based Modelling and Correction of Arbitrarily Warped Historical Document Images for Large-Scale Digitisation", Proceedings of the 2011 Workshop on Historical Document Imaging and Processing (HIP2011), Beijing, China, September 2011, pp. 106-111