Performance comparison and an ensemble approach of transcriptome assembly

Sairam Behera, Adam Voshall, Jitender S. Deogun, Etsuko N. Moriyama

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Scopus citation

Abstract

Accurate transcriptome assembly from next-generation sequencing data is crucial for gene expression analysis. However, different assemblers generate significantly different outputs from the same RNA-Seq data, and even a single method often assembles different sets of transcripts when its parameters are changed. In this study, we performed a comparative analysis of various transcriptome assemblers, including four de novo and three genome-guided methods, using simulated RNA-Seq data modeling Illumina HiSeq sequencing of the Arabidopsis thaliana and Zea mays strain B73 transcriptomes. No assembler reconstructed all of the reference transcripts correctly, and a large fraction (∼30%) of transcripts were not assembled correctly by any assembler. Furthermore, each assembler recovered a different set of reference transcripts, with very few common to all. Although the de novo tools correctly assembled about as many transcripts as the genome-guided tools for one dataset, they also produced far more incorrectly assembled transcripts. These results indicate that substantial room remains for improving transcriptome assembly. We therefore investigated a consensus-based ensemble approach. Taking the consensus contig set shared among, for example, three or more de novo assemblers correctly identified 10% more transcripts for the Arabidopsis thaliana datasets. The incorrect-to-correct contig ratio ranged from 4.9 (Trinity) to 10.7 (SOAPdenovo) for the de novo assemblers, compared with 1.3 to 1.7 for the genome-guided methods. The consensus de novo method reduced this ratio to 1.5, a level close to or even lower than those obtained by the genome-guided methods. The results of this study provide a direction for building a better ensemble approach that can reconstruct all the correct transcripts.
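The consensus idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes that two contigs "match" only when their sequences are identical, which the abstract does not specify, and the function names, assembler labels, and toy sequences are all hypothetical.

```python
from collections import Counter


def consensus_contigs(assemblies, min_support=3):
    """Return contig sequences reported by at least `min_support` assemblers.

    `assemblies` maps an assembler label to an iterable of contig sequences.
    Exact sequence identity is an assumption made for this sketch.
    """
    support = Counter()
    for contigs in assemblies.values():
        # Count each sequence once per assembler, even if it is repeated.
        support.update(set(contigs))
    return {seq for seq, n in support.items() if n >= min_support}


def incorrect_to_correct_ratio(contigs, reference):
    """Ratio of contigs absent from `reference` to contigs present in it."""
    contigs = set(contigs)
    correct = len(contigs & set(reference))
    incorrect = len(contigs) - correct
    return incorrect / correct if correct else float("inf")


# Hypothetical toy data: four assemblers and a three-transcript reference.
toy_assemblies = {
    "assembler_a": ["ATGC", "GGCC", "TTTT"],
    "assembler_b": ["ATGC", "GGCC", "CCCC"],
    "assembler_c": ["ATGC", "TTAA", "GGGG"],
    "assembler_d": ["ATGC", "GGCC", "AAAA"],
}
toy_reference = {"ATGC", "GGCC", "TTAA"}

toy_consensus = consensus_contigs(toy_assemblies, min_support=3)
```

In this toy case the consensus set keeps only the sequences that at least three assemblers agree on, so contigs unique to one assembler are filtered out. That filtering is the mechanism by which a consensus could lower the incorrect-to-correct ratio. A real pipeline would need a more tolerant matching criterion than exact identity.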

Original language: English (US)
Title of host publication: Proceedings - 2017 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2017
Editors: Illhoi Yoo, Jane Huiru Zheng, Yang Gong, Xiaohua Tony Hu, Chi-Ren Shyu, Yana Bromberg, Jean Gao, Dmitry Korkin
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 2226-2228
Number of pages: 3
ISBN (Electronic): 9781509030491
DOIs
State: Published - Dec 15 2017
Event: 2017 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2017 - Kansas City, United States
Duration: Nov 13 2017 to Nov 16 2017

Publication series

Name: Proceedings - 2017 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2017
Volume: 2017-January

Other

Other: 2017 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2017
Country: United States
City: Kansas City
Period: 11/13/17 to 11/16/17

Keywords

  • Ensemble Method
  • Transcriptome Assembly

ASJC Scopus subject areas

  • Biomedical Engineering
  • Health Informatics

