TY - JOUR
T1 - Investigating coupling preprocessing with shallow and deep convolutional neural networks in document image classification
AU - Liu, Yi
AU - Soh, Leen Kiat
AU - Lorang, Elizabeth
N1 - Funding Information:
This project was supported in part by the Institute of Museum and Library Services and has received previous support from the National Endowment for the Humanities. Charles Nugent helped build the initial convolutional neural network architecture that shaped its development. This work was completed utilizing the Holland Computing Center of the University of Nebraska, which receives support from the Nebraska Research Initiative.
Publisher Copyright:
© 2021 The Authors. Published by SPIE.
PY - 2021/7/1
Y1 - 2021/7/1
N2 - Convolutional neural networks (CNNs) are effective for image classification, and deeper CNNs are being used to improve classification performance. Indeed, as the need to search vast collections of printed document images grows, powerful CNNs have been used in place of conventional image processing. However, the better performance of deep CNNs comes at the expense of computational complexity. Is the additional training effort required by deeper CNNs worth the improvement in performance? Or could a shallow CNN coupled with conventional image processing (e.g., binarization and consolidation) outperform deeper CNN-based solutions? We investigate performance gaps among shallow (LeNet-5, -7, and -9), deep (ResNet-18), and very deep (ResNet-152, MobileNetV2, and EfficientNet) CNNs for noisy printed document images, e.g., historical newspapers and document images in the RVL-CDIP repository. Our investigation considers two classification tasks: (1) identifying poems in historical newspapers and (2) classifying 16 document types in document images. Empirical results show that a shallow CNN coupled with computationally inexpensive preprocessing can remain robust with significantly fewer training samples; that deep CNNs coupled with preprocessing can outperform very deep CNNs both effectively and efficiently; and that aggressive preprocessing is not helpful, as it could remove potentially useful information from document images.
KW - convolutional neural network
KW - document classification
KW - document image
KW - document image analysis
KW - historical newspapers
KW - image denoising
KW - poetic content classification
UR - http://www.scopus.com/inward/record.url?scp=85114617596&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85114617596&partnerID=8YFLogxK
U2 - 10.1117/1.JEI.30.4.043024
DO - 10.1117/1.JEI.30.4.043024
M3 - Article
AN - SCOPUS:85114617596
VL - 30
JO - Journal of Electronic Imaging
JF - Journal of Electronic Imaging
SN - 1017-9909
IS - 4
M1 - 043024
ER -