Two Approaches for Text Segmentation in Web Images
D. Karatzas, A. Antonacopoulos
Proceedings of the 7th International Conference on Document Analysis and Recognition (ICDAR2003), Edinburgh, UK, August 2003, pp. 131-136
There is a significant need to recognise the text in images on web pages, both for effective indexing and for presentation by non-visual means (e.g., audio). This paper presents and compares two novel methods for segmenting characters for subsequent extraction and recognition. The novelty of both approaches lies in combining topological features of characters (different in each case) with an anthropocentric perspective of colour perception, in preference to analysis in RGB space. Both approaches enable the extraction of text in complex situations, such as when the colour and texture of the characters and the background vary.