Performance Analysis of Document Image Analysis Systems

Introduction

Research in Document Image Analysis is becoming increasingly widespread. This has resulted in the development of a number of alternative methods for solving the various problems posed in the analysis of document images. Different approaches have been devised to suit different applications and, in most cases, each has been tested only with very specific data.

The objective assessment of the performance of document analysis subsystems is still in its infancy (the performance of OCR methods being the only exception). Currently, test methods and ground-truth data are not widely available to suit the diverse needs of each individual component (subsystem) of a Document Analysis System. Moreover, the majority of existing methods and data are constrained by various assumptions about the nature of the image data, such as restrictions on the freedom of the page layout.

Current State

Currently, research is being carried out to identify suitable evaluation methods, and a corresponding organisation of ground-truth data, to cater for documents with complex layouts. Particular attention is paid to the evaluation of methods applied before OCR.

Further Information

Parts of the ongoing work have been presented at international conferences. For both an overview of the framework and a more detailed account of the analysis of Page Segmentation approaches, see the Publications section.

Members Involved

Dr Apostolos Antonacopoulos
David Bridson
