TY - JOUR
T1 - Visual domain knowledge-based multimodal zoning for textual region localization in noisy historical document images
AU - Pack, Chulwoo
AU - Soh, Leen-Kiat
AU - Lorang, Elizabeth
N1 - Funding Information:
This project was supported in part by the Trans-Atlantic Partnership Digging into Data Challenge (U.S. funder: Institute of Museum and Library Services, IMLS) and has received previous support from IMLS and the National Endowment for the Humanities (NEH). Any views, findings, conclusions, or recommendations expressed in this publication do not necessarily reflect those of IMLS or NEH. M. Datla, S. Kulwicki, and G. Thomas made early contributions to this project; Y. Liu also contributed.
Publisher Copyright:
© 2021 SPIE and IS&T.
PY - 2021/11/1
Y1 - 2021/11/1
AB - Document layout analysis, or zoning, is important for textual content analysis such as optical character recognition. Zoning document images such as digitized historical newspaper pages is challenging due to the noise in and quality of the document images. Recently, effective data-driven approaches, such as those leveraging deep learning, have been proposed, albeit with the concern of requiring large amounts of training data and thus incurring the additional cost of ground truthing. We propose a zoning solution that incorporates a knowledge-driven document representation, the gravity map, into a multimodal deep learning framework to reduce the amount of time and data required for training. We first generate a gravity map for each image, considering the centroid distance and area between a cell in a Voronoi tessellation and its content, to encode visual domain knowledge of the zoning task. Second, we inject the gravity maps into a deep convolutional neural network (DCNN) during training as an additional modality to boost performance. We report on two investigations using two state-of-the-art DCNN architectures and three datasets: two sets of historical newspapers and a set of born-digital contemporary documents. Evaluations show that our solution achieved comparable segmentation accuracy using fewer training epochs and less training data compared to a naïve training scheme.
KW - document image processing
KW - image analysis
KW - image decomposition
KW - image recognition
KW - image segmentation
UR - http://www.scopus.com/inward/record.url?scp=85122680308&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85122680308&partnerID=8YFLogxK
DO - 10.1117/1.JEI.30.6.063028
M3 - Article
AN - SCOPUS:85122680308
SN - 1017-9909
VL - 30
JO - Journal of Electronic Imaging
JF - Journal of Electronic Imaging
IS - 6
M1 - 063028
ER -