Representation of Digitized Documents Using Document Specific Alphabets and Fonts

S. Pletschacher

Proceedings of the 5th IS&T Archiving Conference, Bern, Switzerland, June 2008, pp. 198-202

Abstract

Today's digitization efforts produce huge collections of scanned documents. However, the means for automatic preparation and further processing, especially of ancient documents, are still limited. This paper presents the progress and implementation details of a framework for handling machine-printed documents without traditional OCR methods. The approach derives all information needed for encoding directly from the original itself, by extracting document-specific alphabets and corresponding fonts. In particular, the paper reports how preprocessing, text segmentation, alphabet extraction, font generation, document encoding, and the repository work and interact. It also describes the creation of ground truth data for evaluation and possible application scenarios for the system.

Citation

S. Pletschacher, "Representation of Digitized Documents Using Document Specific Alphabets and Fonts", Proceedings of the 5th IS&T Archiving Conference, Bern, Switzerland, June 2008, pp. 198-202
