TY - JOUR
T1 - Another lesson from unmapped reads
T2 - in-depth analysis of RNA-Seq reads from various horse tissues
AU - Gurgul, Artur
AU - Szmatoła, Tomasz
AU - Ocłoń, Ewa
AU - Jasielczuk, Igor
AU - Semik-Gurgul, Ewelina
AU - Finno, Carrie J.
AU - Petersen, Jessica L.
AU - Bellone, Rebecca
AU - Hales, Erin N.
AU - Ząbek, Tomasz
AU - Arent, Zbigniew
AU - Kotula-Balak, Małgorzata
AU - Bugno-Poniewierska, Monika
N1 - Publisher Copyright: © 2022, The Author(s), under exclusive licence to Institute of Plant Genetics Polish Academy of Sciences.
PY - 2022/9
Y1 - 2022/9
AB - In recent years, a vast amount of sequencing data has been generated and large improvements have been made to reference genome sequences. Despite these advances, significant portions of reads still do not map to reference genomes, and these reads have been considered junk or artificial sequences. Recent studies have shown that these reads can be useful, e.g., for refining reference genomes or detecting contaminating microorganisms present in the analyzed biological samples. A special case of this is RNA sequencing (RNA-Seq) reads that come from tissue transcriptomes. Unmapped reads from RNA-Seq have received much less attention than those from whole-genome sequencing. In particular, in the horse, an analysis of unmapped RNA reads has not been performed yet. Thus, in this study, we analyzed the unmapped reads originating from the RNA-Seq performed through the Functional Annotation of Animal Genomes (FAANG) project in the horse, using eight different tissues from two mares. We demonstrated that unmapped reads from RNA-Seq could be easily assembled into transcripts relating to many important genes present in the sequences of other mammals. Large portions of these transcripts did not have coding potential and, thus, can be considered non-coding RNA. Moreover, reads that were not mapped to the reference genome but aligned to entries in the NCBI database of horse proteins were enriched for biological processes that largely correspond to the functions of the organ from which the RNA was isolated and thus are presumably true transcripts of genes associated with cell metabolism in those tissues. In addition, a portion of reads aligned to common pathogenic or neutral microbiota, of which the most common was Brucella spp. These data suggest that unmapped reads can be an important target for in-depth analysis that may substantially enrich the results of initial RNA-Seq experiments for various tissues and organs.
KW - Equine
KW - Genome assembly
KW - Misassembled
KW - Transcriptome
UR - http://www.scopus.com/inward/record.url?scp=85131522487&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85131522487&partnerID=8YFLogxK
U2 - 10.1007/s13353-022-00705-z
DO - 10.1007/s13353-022-00705-z
M3 - Article
C2 - 35670911
AN - SCOPUS:85131522487
SN - 1234-1983
VL - 63
SP - 571
EP - 581
JO - Journal of Applied Genetics
JF - Journal of Applied Genetics
IS - 3
ER -