TY - GEN
T1 - Is More Always Better? Effects of Patch Sampling in Distinguishing Chronic Lymphocytic Leukemia from Transformation to Diffuse Large B-Cell Lymphoma
AU - Bandyopadhyay, Rukhmini
AU - Chen, Pingjun
AU - El Hussein, Siba
AU - Rojas, Frank R.
AU - Ebare, Kingsley
AU - Wistuba, Ignacio I.
AU - Solis Soto, Luisa M.
AU - Medeiros, L. Jeffrey
AU - Zhang, Jianjun
AU - Khoury, Joseph D.
AU - Wu, Jia
N1 - Publisher Copyright: © 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.
PY - 2022
Y1 - 2022
AB - Distinguishing chronic lymphocytic leukemia (CLL), accelerated phase of CLL (aCLL), and diffuse large B-cell lymphoma transformation of CLL (Richter transformation; RT) has important clinical implications that greatly influence patient management. However, distinguishing between these disease phases on histologic grounds may be challenging in routine practice due to the presence of similar structures and homogeneous intensity, among other factors. In this work, we propose a whole slide image (WSI) level computational framework based on the integration of deep transfer learning, patch level random sampling, and machine learning modeling to distinguish CLL from aCLL and RT. The motivation behind the proposed random sampling-based classification is to address a fundamental question in WSI analysis: is it true that more data is always better? To answer this question, we apply this framework to a pilot cohort of 56 patients (95 WSIs in total). Interestingly, we observe that the tested machine learning models demonstrate robust performance with just 1% of randomly sampled patches from WSIs, on par with models built on the entire WSI data. Among the three tested machine learning algorithms, multi-instance learning (MIL) achieved the best prediction, outperforming the SVM and XGBoost models. Taken together, our pilot study shows that machine learning models can potentially achieve reasonable performance with a substantially smaller amount of data from WSIs. This observation may help shape future WSI analysis, where the computational burden could be reduced by using fewer patches rather than all the data in WSIs. However, these results need to be validated and cautiously interpreted, as the findings may be fundamentally driven by the homogeneous appearance of CLL in pathology slides. It remains unclear whether this finding will hold when testing is performed on more heterogeneous cancer types.
KW - Accelerated CLL
KW - Chronic Lymphocytic Leukemia (CLL)
KW - Patch level analysis
KW - Random sampling
KW - Richter Transformation (RT)
UR - http://www.scopus.com/inward/record.url?scp=85140458693&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85140458693&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-17266-3_2
DO - 10.1007/978-3-031-17266-3_2
M3 - Conference contribution
AN - SCOPUS:85140458693
SN - 9783031172656
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 11
EP - 20
BT - Computational Mathematics Modeling in Cancer Analysis - 1st International Workshop, CMMCA 2022, Held in Conjunction with MICCAI 2022, Proceedings
A2 - Qin, Wenjian
A2 - Zaki, Nazar
A2 - Zhang, Fa
A2 - Wu, Jia
A2 - Yang, Fan
PB - Springer Science and Business Media Deutschland GmbH
T2 - 1st International Workshop on Computational Mathematics Modeling in Cancer Analysis, CMMCA 2022, held in conjunction with the 25th International Conference on Medical Image Computing and Computer Assisted Intervention, MICCAI 2022
Y2 - 18 September 2022
ER -