Identification of novel genes in human genome

  • Wang, San Ming (PI)

Project: Research project

Project Details


DESCRIPTION (provided by applicant): A comprehensive understanding of the complexity of gene expression during development and differentiation requires the identification of all mRNA transcripts. The SAGE (serial analysis of gene expression) technique is one of the most comprehensive methods presently available to achieve this goal. Because it does not require prior knowledge of the transcripts expressed in a sample, SAGE can identify novel transcripts. We have identified a large number of novel SAGE tags from hematopoietic cells. Our analysis indicates that most of these novel SAGE tags are derived from novel transcripts, and that many of these transcripts may represent novel genes not yet identified in the human genome. In this proposal, we have set three specific aims to isolate full-length novel cDNAs starting from novel SAGE tags: Specific Aim 1. Convert 10,000 novel SAGE tags identified in human CD34+ stem/progenitor cells into 3' cDNAs; Specific Aim 2. Convert about 6,000 novel 3' cDNAs generated from Specific Aim 1 into 8,000 to 10,000 full-length cDNAs, including alternatively spliced variants; and Specific Aim 3. Annotate the full-length novel cDNAs generated by Specific Aim 2 to the human genome sequence. We have developed a high-throughput system to achieve these aims. In this system, a 14-base SAGE tag is extended to the 3' end of the cDNA, yielding a fragment averaging 140 bases that can be mapped to the human genomic sequence. For mapped sequences that show novelty, we will use the putative first exon predicted by the computational program FirstEF to design a sense primer. Together with an antisense primer designed from the 3' cDNA, we will amplify the full-length cDNA represented by the novel SAGE tag and integrate the information into the human genome annotation.
Our analysis should reveal a significant number of novel genes, novel alternatively spliced transcripts originating from novel or known genes, non-coding RNAs, and transcripts from pseudogenes. The success of our proposal should make a significant contribution to the annotation of the human genome. It should also provide information for studying normal and abnormal hematopoiesis and for improving genome annotation algorithms. Our approach should also be applicable to studies of other tissues of both human and non-human origin.
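The tag-to-primer steps described above can be illustrated with a minimal Python sketch. It is not the applicant's pipeline. It assumes the common SAGE convention in which the 14-base tag is the 3'-most CATG anchoring site plus the 10 bases downstream of it, and it picks primers by simply taking fixed-length windows, with no melting-temperature or specificity checks. The function names, the 20-base primer length, and the input sequences are all hypothetical.

```python
# Illustrative sketch only: names and parameters are assumptions, not from the project.

ANCHOR_SITE = "CATG"   # assumption: NlaIII anchoring-enzyme site, as in standard SAGE
TAG_LENGTH = 14        # tag length stated in the project description

_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq.upper()))


def extract_sage_tag(cdna: str):
    """Return the 14-base tag starting at the 3'-most anchor site, or None.

    None is returned when there is no anchor site, or when fewer than
    TAG_LENGTH bases remain from the last anchor site to the end.
    """
    cdna = cdna.upper()
    pos = cdna.rfind(ANCHOR_SITE)
    if pos == -1 or pos + TAG_LENGTH > len(cdna):
        return None
    return cdna[pos:pos + TAG_LENGTH]


def design_primer_pair(first_exon: str, three_prime_cdna: str, primer_len: int = 20):
    """Pick a naive primer pair for amplifying a full-length cDNA.

    The sense primer is the first primer_len bases of the predicted first
    exon; the antisense primer is the reverse complement of the last
    primer_len bases of the 3' cDNA fragment.
    """
    if len(first_exon) < primer_len or len(three_prime_cdna) < primer_len:
        raise ValueError("sequences must be at least primer_len bases long")
    sense = first_exon[:primer_len].upper()
    antisense = reverse_complement(three_prime_cdna[-primer_len:])
    return sense, antisense
```

In practice, primer selection would also weigh melting temperature, GC content, and uniqueness in the genome; the fixed windows here only show how the predicted first exon and the 3' cDNA fragment each contribute one primer.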
Effective start/end date: 9/19/03 → 6/30/07


  • National Institutes of Health: $94,629.00
  • National Institutes of Health: $504,196.00
  • National Institutes of Health: $509,626.00
  • National Institutes of Health: $574,527.00


  • Medicine(all)
  • Biochemistry, Genetics and Molecular Biology(all)

