FASTA-SWAP and FASTA-PAT: Pattern database searches using combinations of aligned amino acids, and a novel scoring theory

István Ladunga, Brent A. Wiese, Randall F. Smith

Research output: Contribution to journal › Article › peer-review

12 Scopus citations

Abstract

We introduce two new pattern database search tools that utilize statistical significance and information theory to improve protein function identification. Both the general pattern scoring theory with the specific matrices introduced here and the low redundancy of pattern databases increase search sensitivity and selectivity. Pattern scoring preferentially rewards matches at conserved positions in a pattern with higher scores than matches at variable positions, and assigns more negative scores to mismatches at conserved positions than to mismatches at variable positions. The theory of pattern scoring can be used to create log-odds pattern scores for patterns derived from any set of multiple alignments. This theoretical framework can be used to adapt existing sequence database search tools to pattern analysis. Our FASTA-SWAP and FASTA-PAT tools are extensions of the FASTA program that search a sequence query against a pattern database. In the first step, FASTA-SWAP searches the diagonals of the query sequence and the library pattern for high-scoring segments, while FASTA-PAT performs an extended version of hashing. In the second step, both methods refine the alignments and the scores using dynamic programming. The tools utilize an extremely compact binary representation of all possible combinations of amino acid residues in aligned positions. Our FASTA-SWAP and FASTA-PAT tools are well suited for functional identification of distant relatives that may be missed by sequence database search methods. FASTA-SWAP and FASTA-PAT searches can be performed using our World-Wide Web Server (http://dot.imgen.bcm.tmc.edu:9331/seq-search/Options/fastapat.htm1).
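
The two ideas in the abstract that lend themselves to a small illustration are the compact binary representation of residue combinations and the position-dependent scoring. The Python sketch below is not the authors' implementation and does not use their matrices. It assumes a hypothetical 20-bit mask per alignment column and a toy log-odds score in which each residue observed in a column is taken as equally likely, against a uniform background of 1/20. The mismatch penalty is simply the mirrored value, which is an assumption made only to reproduce the qualitative behaviour described above. All function names are illustrative.

```python
import math

# The 20 standard amino acids; each one is assigned one bit of an integer.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
BIT = {aa: 1 << i for i, aa in enumerate(AMINO_ACIDS)}


def column_mask(residues):
    """Encode the set of residues seen in one aligned column as a bitmask."""
    mask = 0
    for aa in residues:
        mask |= BIT[aa]
    return mask


def pattern_from_alignment(rows):
    """Build one bitmask per column of an ungapped multiple alignment."""
    return [column_mask(column) for column in zip(*rows)]


def position_score(query_aa, mask):
    """Toy log-odds score (an assumption, not the published matrices).

    A column with k distinct residues scores log2(20 / k) for a match and
    -log2(20 / k) for a mismatch, so conserved columns (small k) give both
    the largest rewards and the largest penalties.
    """
    k = bin(mask).count("1")
    weight = math.log2(len(AMINO_ACIDS) / k)
    return weight if mask & BIT[query_aa] else -weight


def score_diagonal(query, pattern, offset=0):
    """Sum position scores along one ungapped diagonal of query vs. pattern."""
    return sum(
        position_score(query[offset + i], mask)
        for i, mask in enumerate(pattern)
        if 0 <= offset + i < len(query)
    )


# Hypothetical three-sequence alignment: columns {A}, {C}, {D,E}, {K,R}.
alignment = ["ACDK", "ACEK", "ACDR"]
pattern = pattern_from_alignment(alignment)
print(score_diagonal("ACEK", pattern))  # every position matches
print(score_diagonal("GCEK", pattern))  # mismatch at a conserved column
```

In this sketch a match at the fully conserved first column scores higher than a match at the two-residue third column, and a mismatch at the first column is penalised more heavily than one at the third. A real search would follow such a diagonal scan with a dynamic programming refinement, which is omitted here.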

Original language: English (US)
Pages (from-to): 840-854
Number of pages: 15
Journal: Journal of Molecular Biology
Volume: 259
Issue number: 4
State: Published - Jun 21 1996
Externally published: Yes

Keywords

  • Amino acid sequence pattern
  • FASTA
  • Protein database search
  • Protein function identification
  • Scoring theory

ASJC Scopus subject areas

  • Structural Biology
  • Molecular Biology
