
Tools

Aletheia Document Analysis System

Aletheia is an advanced system for accurate yet cost-effective analysis, recognition and annotation of scanned documents. It supports the user with a number of automated and semi-automated tools, which were developed and fine-tuned based on feedback from major libraries across Europe and from their digitisation service providers, who use it in a production environment.

Read more »

Download the latest version


WebAletheia


A web-based version of the Aletheia Document Analysis System, supporting a selected subset of features.

Read more »

Try it


Performance Evaluation

A framework for performance analysis of OCR methods

Layout Evaluation

This tool is part of a framework for evaluating the performance of layout analysis methods. It combines efficiency and accuracy by using a special interval-based geometric representation of regions. A wide range of evaluation measures provides detailed insight into the analysed systems, going well beyond simple benchmarking. User-defined profiles allow the evaluation to be tuned for any real-world application scenario.

Read more »

Text Evaluation

The Text Evaluation tool implements the word and character accuracy measures developed at the University of Nevada, Las Vegas (UNLV; dissertation by S. V. Rice). It is complemented by a bag-of-words method, which is independent of the reading order.
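To illustrate the kind of measures described above, here is a minimal Python sketch. It is not the Text Evaluation tool's implementation: it assumes character accuracy is computed as (n - errors) / n, with errors taken as the edit distance to the ground truth, and it assumes a simple multiset comparison for the bag-of-words measure. The function names are hypothetical.

```python
from collections import Counter


def edit_distance(a, b):
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_accuracy(ground_truth, ocr_text):
    """Assumed definition: (n - errors) / n, n = ground-truth length."""
    n = len(ground_truth)
    if n == 0:
        return 1.0 if not ocr_text else 0.0
    return (n - edit_distance(ground_truth, ocr_text)) / n


def bag_of_words_accuracy(ground_truth, ocr_text):
    """Fraction of ground-truth words also found in the OCR output.

    Words are compared as multisets, so the result does not depend on
    the order in which the words appear.
    """
    gt_words = Counter(ground_truth.split())
    ocr_words = Counter(ocr_text.split())
    total = sum(gt_words.values())
    if total == 0:
        return 1.0 if not ocr_words else 0.0
    matched = sum((gt_words & ocr_words).values())  # multiset intersection
    return matched / total
```

Because the bag-of-words score ignores word order, an OCR result with all words recognised correctly but emitted in a different reading order would still score 1.0, whereas an order-sensitive measure would penalise it.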

Read more »


PAGE Libraries


Platform-independent libraries for Java and C++ for creating valid layout descriptions in PAGE XML format. The libraries can easily be integrated into other software projects, such as page segmentation methods for ICDAR competitions.

Read more »

Java

Read more »

Download the latest version

C++

Read more »

Download the latest version

PAGE XML-schema

Access the latest version »

Read more »


PAGE Converter and Validator


This tool converts page layout files to the latest PAGE XML format.

Read more »

Windows Version

Read more »

Download the latest version

Java Version

Read more »

Download the latest version

PAGE Metadata Scanner


PAGE Metadata Scanner is a Java command-line tool that scans a single PAGE XML file (document page layout and text content) and outputs its properties and statistics as comma-separated values.
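As a rough illustration of what scanning a PAGE XML file into comma-separated values can look like, here is a small Python sketch. It is not the PAGE Metadata Scanner itself (which is a Java tool), and the choice of output columns is an assumption made for this example. It relies only on the `Page` element with its `imageFilename`, `imageWidth` and `imageHeight` attributes, and on the `TextRegion`, `TextLine` and `Word` elements of the PAGE format.

```python
import csv
import io
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag):
    """Strip the XML namespace so any PAGE schema version is accepted."""
    return tag.rsplit("}", 1)[-1]


def scan_page_xml(xml_text):
    """Collect a few page properties and element counts (hypothetical columns)."""
    root = ET.fromstring(xml_text)
    page = next((el for el in root.iter() if local_name(el.tag) == "Page"), None)
    if page is None:
        raise ValueError("no Page element found")
    counts = Counter(local_name(el.tag) for el in page.iter())
    return {
        "imageFilename": page.get("imageFilename", ""),
        "imageWidth": page.get("imageWidth", ""),
        "imageHeight": page.get("imageHeight", ""),
        "textRegions": counts["TextRegion"],
        "textLines": counts["TextLine"],
        "words": counts["Word"],
    }


def to_csv(stats):
    """Serialise one statistics record as a CSV header plus one data row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(stats))
    writer.writeheader()
    writer.writerow(stats)
    return buffer.getvalue()
```

Calling `to_csv(scan_page_xml(open(path, encoding="utf-8").read()))` on a PAGE XML file would yield one header line and one data line, which can be concatenated across pages to build a collection-level table.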

Read more »

Download the latest version

Tesseract OCR to PAGE


Tesseract to PAGE is a Windows command-line tool that analyses a document image using the open-source OCR engine Tesseract and exports the results to PAGE (Page Analysis and Ground truth Elements) XML format.

Read more »

Download the latest version

Extractor / Exporter


The PAGE Extractor/Exporter is a Windows command-line tool that extracts document snippets (image / layout description) for layout elements of documents in PAGE XML format. In addition, the text content of layout regions can be serialised according to the reading order and exported to a text file.
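The idea of serialising region text according to the reading order can be sketched in Python as follows. This is an illustrative simplification, not the Extractor/Exporter's actual logic: it assumes a flat reading order made of `RegionRefIndexed` entries, ignores nested or unordered groups, and reads only the region-level `TextEquiv/Unicode` content. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the XML namespace so any PAGE schema version is accepted."""
    return tag.rsplit("}", 1)[-1]


def child(element, name):
    """Return the first direct child with the given local name, or None."""
    return next((c for c in element if local_name(c.tag) == name), None)


def region_text(region):
    """Text stored directly on the region (not on its lines or words)."""
    equiv = child(region, "TextEquiv")
    unicode_el = child(equiv, "Unicode") if equiv is not None else None
    return (unicode_el.text or "") if unicode_el is not None else ""


def text_in_reading_order(xml_text):
    """Join region texts following RegionRefIndexed order (assumed flat)."""
    root = ET.fromstring(xml_text)
    regions = {el.get("id"): el for el in root.iter()
               if local_name(el.tag) == "TextRegion"}
    refs = [el for el in root.iter() if local_name(el.tag) == "RegionRefIndexed"]
    refs.sort(key=lambda el: int(el.get("index")))
    ordered_ids = [el.get("regionRef") for el in refs]
    if not ordered_ids:
        ordered_ids = list(regions)  # no reading order: fall back to document order
    return "\n".join(region_text(regions[i]) for i in ordered_ids if i in regions)
```

The result of `text_in_reading_order(...)` can then be written to a plain text file; regions referenced in the reading order but missing from the page are skipped rather than treated as errors.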

Read more »

Download the latest version

PAGE Viewer


PAGE Viewer is a simple viewer for the page layout and text content of segmentation ground truth and of results from page recognition/OCR systems. The natively supported file format is PAGE XML; ALTO XML, FineReader XML, and hOCR files can be opened as well.

Read more »

Windows

Download the latest version

Linux

Download the latest version

Mac OS

Download the latest version

eMOP Layout Editor

Web-based crowdsourcing platform

A web-based page layout editor created for the eMOP Project, a crowdsourcing initiative funded by the Mellon Foundation.

Read more »