TY - GEN
T1 - Classification and cluster analysis of complex time-of-flight secondary ion mass spectrometry for biological samples
AU - Tian, Xue
AU - Reichenbach, Stephen E.
AU - Tao, Qingping
AU - Henderson, Alex
PY - 2009
Y1 - 2009
N2 - Identifying and separating subtly different biological samples is one of the most critical tasks in biological analysis. Time-of-flight secondary ion mass spectrometry (ToF-SIMS) is becoming a popular and important technique in the analysis of biological samples, because it can detect molecular information and characterize chemical composition. ToF-SIMS spectra of biological samples are enormously complex with large mass ranges and many peaks. As a result the classification and cluster analysis are challenging. This study presents a new classification algorithm, the most similar neighbor with a probability-based spectrum similarity measure (MSN-PSSM), which uses all the information in the entire ToF-SIMS spectra. MSN-PSSM is applied to automatically classify bacterial samples which are major causal agents of urinary tract infections. Experimental results show that MSN-PSSM is an accurate classification algorithm. It outperforms traditional techniques such as decision trees, principal component analysis (PCA) with discriminant function analysis (DFA), and soft independent modeling of class analogy (SIMCA). This study also applies a modern clustering algorithm, normalized spectral clustering, to automatically cluster the bacterial samples at the species level. Experimental results demonstrate that normalized spectral clustering is able to show accurate quantitative separations. It outperforms traditional techniques such as hierarchical clustering analysis, k-means, and PCA with k-means.
AB - Identifying and separating subtly different biological samples is one of the most critical tasks in biological analysis. Time-of-flight secondary ion mass spectrometry (ToF-SIMS) is becoming a popular and important technique in the analysis of biological samples, because it can detect molecular information and characterize chemical composition. ToF-SIMS spectra of biological samples are enormously complex with large mass ranges and many peaks. As a result the classification and cluster analysis are challenging. This study presents a new classification algorithm, the most similar neighbor with a probability-based spectrum similarity measure (MSN-PSSM), which uses all the information in the entire ToF-SIMS spectra. MSN-PSSM is applied to automatically classify bacterial samples which are major causal agents of urinary tract infections. Experimental results show that MSN-PSSM is an accurate classification algorithm. It outperforms traditional techniques such as decision trees, principal component analysis (PCA) with discriminant function analysis (DFA), and soft independent modeling of class analogy (SIMCA). This study also applies a modern clustering algorithm, normalized spectral clustering, to automatically cluster the bacterial samples at the species level. Experimental results demonstrate that normalized spectral clustering is able to show accurate quantitative separations. It outperforms traditional techniques such as hierarchical clustering analysis, k-means, and PCA with k-means.
UR - http://www.scopus.com/inward/record.url?scp=78951483412&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=78951483412&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:78951483412
SN - 9781615676538
T3 - International Conference on Bioinformatics, Computational Biology, Genomics and Chemoinformatics 2009, BCBGC 2009
SP - 78
EP - 85
BT - International Conference on Bioinformatics, Computational Biology, Genomics and Chemoinformatics 2009, BCBGC 2009
T2 - 2009 International Conference on Bioinformatics, Computational Biology, Genomics and Chemoinformatics, BCBGC 2009
Y2 - 13 July 2009 through 16 July 2009
ER -