Efficient OCR Training Data Generation with Aletheia

C. Clausner, S. Pletschacher, A. Antonacopoulos

Short Paper Booklet of the 11th International Association for Pattern Recognition (IAPR) Workshop on Document Analysis Systems (DAS2014), Tours, France, April 2014, pp. 19-20

Abstract

We show how the ground-truthing tool Aletheia can be used to efficiently create training data for an open-source text recognition engine. The labelling process is sped up considerably through a top-down approach: text content is entered at region level, and the characters are then propagated automatically to glyph objects. In addition, segmentation is simplified by several semi-automated tools.
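
The top-down propagation described in the abstract can be illustrated with a minimal sketch. It is not Aletheia's implementation: the `Region`, `TextLine`, `Word` and `Glyph` classes and the `propagate_text` function are hypothetical names introduced here. The sketch assumes that the region is already segmented into lines, words and glyphs in reading order, that lines in the entered text are separated by newlines and words by whitespace, and that each glyph holds exactly one character.

```python
from dataclasses import dataclass, field
from typing import List


# Hypothetical layout classes; names are illustrative, not Aletheia's data model.
@dataclass
class Glyph:
    text: str = ""


@dataclass
class Word:
    glyphs: List[Glyph] = field(default_factory=list)


@dataclass
class TextLine:
    words: List[Word] = field(default_factory=list)


@dataclass
class Region:
    text: str = ""
    lines: List[TextLine] = field(default_factory=list)


def propagate_text(region: Region) -> None:
    """Copy region-level text down to glyphs, one character per glyph.

    Assumes the text and the segmentation agree in the number of lines,
    words per line and glyphs per word; raises ValueError otherwise.
    """
    line_texts = region.text.splitlines()
    if len(line_texts) != len(region.lines):
        raise ValueError("number of text lines does not match segmentation")
    for line, line_text in zip(region.lines, line_texts):
        word_texts = line_text.split()
        if len(word_texts) != len(line.words):
            raise ValueError("number of words does not match segmentation")
        for word, word_text in zip(line.words, word_texts):
            if len(word_text) != len(word.glyphs):
                raise ValueError("number of characters does not match glyphs")
            for glyph, char in zip(word.glyphs, word_text):
                glyph.text = char


# Usage: a region with two lines, segmented into glyphs but not yet labelled.
region = Region(
    text="ab c\nde",
    lines=[
        TextLine([Word([Glyph(), Glyph()]), Word([Glyph()])]),
        TextLine([Word([Glyph(), Glyph()])]),
    ],
)
propagate_text(region)
```

In this sketch, a mismatch between the entered text and the segmentation is reported as an error rather than silently resolved, so that the operator can correct either the text or the segmentation before the glyph labels are used as training data.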

Citation

C. Clausner, S. Pletschacher, A. Antonacopoulos, "Efficient OCR Training Data Generation with Aletheia", Short Paper Booklet of the 11th International Association for Pattern Recognition (IAPR) Workshop on Document Analysis Systems (DAS2014), Tours, France, April 2014, pp. 19-20

Related Projects

Europeana Newspapers