PAGE Metadata Scanner

Download the latest version

Overview

PAGE Metadata Scanner is a command-line tool that scans a single PAGE XML file (a description of a document page's layout and text content) and outputs its properties and statistics as comma-separated values.

The following properties are supported:

  • Metadata (ID, creator, creation time, modification time, width, height)
  • Border and print space (true/false)
  • Content object count (per type and sub-type)
  • Text content statistics (number of characters and whitespace characters)
  • Language and script (semicolon-separated list)
  • Reading order and layers (number of region references)
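
To make the kind of output concrete, here is a minimal Python sketch that reads a few of the listed properties from a PAGE XML document and writes them as one CSV row. It is not the tool's implementation and does not reproduce its actual column layout: the column names, the function names scan_page and to_csv, and the small sample document are assumptions made for illustration. Only top-level regions are counted here; nested regions and sub-types are ignored.

```python
import csv
import io
import xml.etree.ElementTree as ET
from collections import Counter

# Tiny hand-written PAGE-style sample (illustrative only, not real data).
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15" pcGtsId="p1">
  <Metadata>
    <Creator>example</Creator>
    <Created>2021-01-01T00:00:00</Created>
    <LastChange>2021-01-02T00:00:00</LastChange>
  </Metadata>
  <Page imageFilename="p1.png" imageWidth="1000" imageHeight="1500">
    <Border><Coords points="0,0 1000,0 1000,1500 0,1500"/></Border>
    <TextRegion id="r1"><TextEquiv><Unicode>Hello world</Unicode></TextEquiv></TextRegion>
    <TextRegion id="r2"><TextEquiv><Unicode>PAGE</Unicode></TextEquiv></TextRegion>
    <ImageRegion id="r3"/>
  </Page>
</PcGts>"""


def local(tag):
    """Strip the '{namespace}' prefix so any PAGE schema version matches."""
    return tag.rsplit("}", 1)[-1]


def scan_page(xml_text):
    """Collect a handful of page properties into a flat dict (hypothetical columns)."""
    root = ET.fromstring(xml_text)
    children = {local(c.tag): c for c in root}
    meta = {local(c.tag): (c.text or "").strip() for c in children["Metadata"]}
    page = children["Page"]
    page_children = [local(c.tag) for c in page]
    # Count top-level content objects per type, e.g. TextRegion, ImageRegion.
    regions = Counter(t for t in page_children if t.endswith("Region"))
    return {
        "id": root.get("pcGtsId", ""),
        "creator": meta.get("Creator", ""),
        "created": meta.get("Created", ""),
        "lastChange": meta.get("LastChange", ""),
        "width": page.get("imageWidth", ""),
        "height": page.get("imageHeight", ""),
        "border": str("Border" in page_children).lower(),
        "printSpace": str("PrintSpace" in page_children).lower(),
        "textRegions": regions["TextRegion"],
        "imageRegions": regions["ImageRegion"],
    }


def to_csv(row):
    """Render one header line and one value line as comma-separated text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(row), lineterminator="\n")
    writer.writeheader()
    writer.writerow(row)
    return buf.getvalue()


row = scan_page(SAMPLE)
print(to_csv(row), end="")
```

Matching on local element names keeps the sketch independent of the schema version in the namespace URI, which differs between PAGE releases.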

It is also possible to output statistics on all characters that appear in the text content of a PAGE file.
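
As a rough illustration of such character statistics, the following Python sketch tallies how often each character occurs in the region-level text of a PAGE document. It is an assumption-laden example rather than the tool's own logic: the function name char_stats, the sample document, and the output format (code point and count) are hypothetical, and only text stored directly on top-level text regions is read.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace of one PAGE schema version; other versions use a different date suffix.
NS = {"pc": "http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"}

# Tiny hand-written sample (illustrative only, not real data).
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">
  <Page imageFilename="p1.png" imageWidth="1000" imageHeight="1500">
    <TextRegion id="r1"><TextEquiv><Unicode>ab ba</Unicode></TextEquiv></TextRegion>
    <TextRegion id="r2"><TextEquiv><Unicode>a b</Unicode></TextEquiv></TextRegion>
  </Page>
</PcGts>"""


def char_stats(xml_text):
    """Count every character in the text of top-level text regions."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    for region in root.iterfind("pc:Page/pc:TextRegion", NS):
        # Read only the region's own TextEquiv so text repeated on
        # lines or words inside the region is not counted twice.
        unicode_el = region.find("pc:TextEquiv/pc:Unicode", NS)
        if unicode_el is not None and unicode_el.text:
            counts.update(unicode_el.text)
    return counts


counts = char_stats(SAMPLE)
total = sum(counts.values())
whitespace = sum(n for ch, n in counts.items() if ch.isspace())

print(f"characters,{total}")
print(f"whitespace,{whitespace}")
for ch, n in counts.most_common():
    # Print the code point rather than the raw character so that
    # whitespace and control characters stay readable in the output.
    print(f"U+{ord(ch):04X},{n}")
```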

Access the latest source code

Alternative download


Related Publications

A survey of OCR evaluation tools and metrics

C. Neudecker, K. Baierer, C. Clausner, A. Antonacopoulos, S. Pletschacher

In The 6th International Workshop on Historical Document Imaging and Processing (HIP '21). Association for Computing Machinery, New York, NY, USA, 13–18.