This project aims to transform data tables from 1961 Census microfilm into a comprehensive, structured dataset that researchers can use for further analysis. We are working with data at several geographical levels, such as Districts, Wards, and Enumeration Districts.
Our workflow also includes a crowdsourcing component on Zooniverse to transcribe text that was not recognised correctly by the OCR pipeline.
The Census 1961 Feasibility Study (concluded in 2017) was conducted to ascertain whether the complete 1961 Census data collection could be digitised, and the information extracted and made available online in a highly versatile form similar to that of more recent Censuses.
The study was conducted in two parts by the authors in cooperation with the Office for National Statistics (ONS) from September 2015 to March 2017. Feasibility was tested by designing a digitisation pipeline, applying state-of-the-art page recognition systems, importing the extracted fields into a database, applying post-processing and quality assurance techniques, and evaluating the results. The main questions to be answered were: what is the best way of digitising the material to maximise the quality of the output, and is that quality high enough to satisfy the requirements of a trustworthy, publicly accessible Census 1961 database?
A prototype of a fully functional pipeline was developed, comprising image preprocessing, page analysis and recognition, post-processing, and data export. Each part of the pipeline was evaluated individually by testing a range of analysis and recognition approaches on a representative data sample. Well-established performance evaluation metrics were used to measure the impact of variations in the workflow on different types of data (image quality, page content, etc.). In addition, the accuracy of the extracted tabular data was evaluated using model-intrinsic rules, such as checking that values sum correctly along table columns and/or rows and across different levels of geography.
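The sum-based rules mentioned above can be illustrated with a minimal sketch. It assumes each extracted table row carries its own printed total, and that a higher geographical level (for example a District) should equal the sum of its lower-level units (for example Wards). The function names and the sample figures are hypothetical and are not taken from the study's actual implementation or data.

```python
from typing import List


def rows_failing_total(rows: List[List[int]], totals: List[int]) -> List[int]:
    """Return the indices of rows whose cells do not add up to the printed total."""
    return [
        index
        for index, (cells, total) in enumerate(zip(rows, totals))
        if sum(cells) != total
    ]


def parent_matches_children(parent_total: int, child_totals: List[int]) -> bool:
    """Check that a higher geographical level equals the sum of its sub-units."""
    return sum(child_totals) == parent_total


# Hypothetical recognised values: three rows of two cells, plus printed row totals.
recognised_rows = [[120, 135], [98, 102], [77, 81]]
printed_totals = [255, 200, 159]  # last total disagrees with 77 + 81 = 158

suspect_rows = rows_failing_total(recognised_rows, printed_totals)

# Hypothetical District figure compared against its three Ward figures.
district_consistent = parent_matches_children(613, [255, 200, 158])
```

A failed check does not say which value is wrong (a cell or the total itself), only that the row or geographical unit deserves another look, for instance through manual or crowdsourced correction.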
A dataset with Census 1961 images can be found here
Ground truth for a small number of pages is available here
Creating a Complete Workflow for Digitising Historical Census Documents: Considerations and Evaluation
Proceedings of the 2017 Workshop on Historical Document Imaging and Processing (HIP2017), Kyoto, Japan, November 2017, pp. 83-88
Unearthing the Recent Past: Digitising and Understanding Statistical Information from Census Tables
Proceedings of Second International Conference on Digital Access to Textual Cultural Heritage (DATeCH 2017), Goettingen, Germany, 01 - 02 June 2017