Distinguishing chronic lymphocytic leukemia (CLL), accelerated-phase CLL (aCLL), and diffuse large B-cell lymphoma transformation of CLL (Richter transformation; RT) has important clinical implications that greatly influence patient management. However, distinguishing these disease phases on histologic grounds can be challenging in routine practice due to, among other factors, their similar structures and homogeneous staining intensity. In this work, we propose a whole slide image (WSI)-level computational framework that integrates deep transfer learning, patch-level random sampling, and machine learning modeling to distinguish CLL from aCLL and RT. The motivation behind the proposed random sampling-based classification is to address a fundamental question in WSI analysis: is more data always better? To answer this question, we apply the framework to a pilot cohort of 56 patients (95 WSIs in total). Interestingly, we observe that the tested machine learning models perform robustly with just 1% of patches randomly sampled from each WSI, on par with models built on the entire WSI data. Among the three tested machine learning algorithms, multi-instance learning (MIL) achieved the best prediction performance, outperforming the SVM and XGBoost models. Taken together, our pilot study shows that machine learning models can achieve reasonable performance with a substantially smaller amount of data from WSIs. This observation may help shape future WSI analysis, where the computational burden can be reduced by using a small fraction of patches rather than all the data in a WSI, thereby improving computational efficiency. However, these results need to be validated and interpreted cautiously, as the findings may be driven primarily by the homogeneous appearance of CLL in pathology slides. It remains unclear whether this finding will hold when testing is performed on more heterogeneous cancer types.
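The patch-level random sampling step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the 1% default fraction, the minimum-patch floor, and the fixed seed are all illustrative assumptions.

```python
import random

def sample_patches(patch_ids, fraction=0.01, min_patches=1, seed=0):
    """Uniformly sample a fraction of a WSI's patches without replacement.

    Hypothetical sketch of patch-level random sampling: keep `fraction`
    of the tiled patches (with a small floor so tiny WSIs still yield
    at least `min_patches`), drawn uniformly at random.
    """
    k = max(min_patches, int(len(patch_ids) * fraction))
    rng = random.Random(seed)  # fixed seed for reproducibility in this sketch
    return rng.sample(patch_ids, k)

# Example: a WSI tiled into 10,000 patches; a 1% sample keeps 100 patches,
# which would then be passed to the feature extractor and classifier.
all_patches = [f"patch_{i}" for i in range(10_000)]
subset = sample_patches(all_patches, fraction=0.01)
print(len(subset))  # 100
```

In a pipeline of this kind, only the sampled subset would be fed through the transfer-learned feature extractor, which is where the computational savings over processing the full WSI would come from.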