The recently concluded Census 1961 Feasibility Study was conducted to ascertain whether the complete 1961 Census data collection could be digitised, and the information extracted and made available online in a highly versatile form similar to that of more recent Censuses.
The study was conducted in two parts by the authors, in cooperation with the Office for National Statistics (ONS), from September 2015 to March 2017. Feasibility was tested by designing a digitisation pipeline, applying state-of-the-art page recognition systems, importing the extracted fields into a database, applying sophisticated post-processing and quality assurance techniques, and evaluating the results. The main questions to be answered were: What is the best way of digitising the material to maximise the quality of the output? And is that quality high enough to satisfy the requirements of a trustworthy, publicly accessible Census 1961 database?
A prototype of a fully functional pipeline was developed, comprising image preprocessing, page analysis and recognition, post-processing, and data export. Each part of the pipeline was evaluated separately by testing a range of analysis and recognition approaches on a representative data sample. Well-established performance evaluation metrics were used to measure precisely the impact of variations in the workflow on different types of data (e.g. varying image quality and page content). In addition, the accuracy of the extracted tabular data was evaluated using model-intrinsic rules, such as checking that values add up to the corresponding totals along table columns and/or rows and across different levels of geography.
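The model-intrinsic checks mentioned above can be illustrated with a minimal Python sketch. It assumes a hypothetical table layout in which the last column holds the printed row totals and the last row holds the printed column totals, and it assumes that a parent area's table should equal the cell-wise sum of its child areas' tables. The function names and sample figures are invented for illustration and are not taken from the study.

```python
from typing import List, Tuple

# Hypothetical layout: last column = printed row totals, last row = printed column totals.
Table = List[List[int]]


def check_row_and_column_totals(table: Table) -> List[Tuple[str, int]]:
    """Return ("row", i) / ("column", j) for every row or column whose
    cells do not add up to the printed total."""
    errors: List[Tuple[str, int]] = []
    body = table[:-1]  # all rows except the printed column-total row
    # Row check: data cells must sum to the printed row total in the last column.
    for i, row in enumerate(body):
        if sum(row[:-1]) != row[-1]:
            errors.append(("row", i))
    # Column check: body rows must sum to the printed total row
    # (this includes the row-total column, i.e. the grand total).
    total_row = table[-1]
    for j in range(len(total_row)):
        if sum(row[j] for row in body) != total_row[j]:
            errors.append(("column", j))
    return errors


def check_geography(parent: Table, children: List[Table]) -> List[Tuple[int, int]]:
    """Return (row, col) cells where the parent area's value differs from the
    sum of the same cell across its child areas (all tables share one shape)."""
    mismatches: List[Tuple[int, int]] = []
    for i, row in enumerate(parent):
        for j, value in enumerate(row):
            if sum(child[i][j] for child in children) != value:
                mismatches.append((i, j))
    return mismatches


# Invented sample data: two districts and the county that contains them.
district_a = [[10, 5, 15],
              [3, 4, 7],
              [13, 9, 22]]
district_b = [[2, 1, 3],
              [6, 8, 15],   # simulated recognition error: 14 misread as 15
              [8, 9, 17]]
county = [[12, 6, 18],
          [9, 12, 21],
          [21, 18, 39]]

print(check_row_and_column_totals(district_a))           # → []
print(check_row_and_column_totals(district_b))           # → [('row', 1), ('column', 2)]
print(check_geography(county, [district_a, district_b]))  # → [(1, 2)]
```

A single misread digit is flagged by both kinds of rule, and the intersection of the failing row and column points to the suspect cell. This kind of redundancy is what makes totals useful for quality assurance even without ground truth.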
Creating a Complete Workflow for Digitising Historical Census Documents: Considerations and Evaluation
Proceedings of the 2017 Workshop on Historical Document Imaging and Processing (HIP2017), Kyoto, Japan, November 2017, pp. 83-88
Unearthing the Recent Past: Digitising and Understanding Statistical Information from Census Tables
Proceedings of the Second International Conference on Digital Access to Textual Cultural Heritage (DATeCH 2017), Goettingen, Germany, 1-2 June 2017