Flexible Text Recovery from Degraded Typewritten Historical Documents
A. Antonacopoulos, C. Casado Castilla
Proceedings of the 18th International Conference on Pattern Recognition (ICPR2006), Hong Kong, August 20-24, 2006, IEEE-CS Press, pp. 1062-1065
The conversion of large collections of historical typewritten documents into digital libraries and archives poses significant challenges that standard recognition techniques cannot address. The condition and individual nature of the characters in these degraded documents necessitate a departure from existing thresholding approaches. This paper presents an approach designed to overcome these difficulties by flexibly analysing each character individually and cautiously repairing it. The main sources of OCR errors are successfully addressed and reliable corrective actions are taken.
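
To illustrate why a single page-wide threshold can fail on typewritten material, the following minimal sketch contrasts one global threshold with a threshold computed separately for each character. It is not the method of the paper: Otsu's method is swapped in here only as a familiar stand-in, the character bounding boxes are assumed to be known already, and the toy page, grey levels, and all names (`otsu_threshold`, `crop`, `ink_count`, `render`) are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]          # grey levels: 0 = black ink, 255 = white paper
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), bottom/right exclusive


def otsu_threshold(pixels: List[int]) -> int:
    """Return the grey level t maximising between-class variance (Otsu).

    Pixels with value <= t are treated as ink. This is an illustrative
    stand-in, not the thresholding used in the paper.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels at or below t
    sum0 = 0    # sum of grey levels at or below t
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, variance undefined
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        var = w0 * w1 * (mean0 - mean1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def crop(image: Image, box: Box) -> List[int]:
    """Flatten the pixels inside a bounding box."""
    top, left, bottom, right = box
    return [image[r][c] for r in range(top, bottom) for c in range(left, right)]


def ink_count(image: Image, box: Box, threshold: int) -> int:
    """Count pixels in the box classified as ink at the given threshold."""
    return sum(1 for p in crop(image, box) if p <= threshold)


# Hypothetical 4x4 glyph; '#' marks ink, '.' marks paper.
PATTERN = [".##.", ".#..", ".#..", ".##."]


def render(ink: int, paper: int = 220) -> Image:
    """Render the glyph with the given ink and paper grey levels."""
    return [[ink if ch == "#" else paper for ch in row] for row in PATTERN]


# Toy page: a heavily struck character next to a faintly struck one.
heavy, faint = render(ink=30), render(ink=170)
page = [h + f for h, f in zip(heavy, faint)]
heavy_box, faint_box = (0, 0, 4, 4), (0, 4, 4, 8)

global_t = otsu_threshold(crop(page, (0, 0, 4, 8)))  # one threshold for the page
local_t = otsu_threshold(crop(page, faint_box))      # threshold for one character

print("faint character ink, global threshold:", ink_count(page, faint_box, global_t))
print("faint character ink, local threshold: ", ink_count(page, faint_box, local_t))
```

On this toy page the global threshold settles on the heavy character's ink level, so the faint character loses every ink pixel, whereas the per-character threshold recovers all six. Real degraded documents would also need character localisation and repair steps, which this sketch does not attempt.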