Motivation: PHYSEAN predicts protein classes with highly variable sequences on the basis of their physical, chemical and biological characteristics, such as diverse hydrophobicity, structural propensity and steric properties. These characteristics, calculated from multiple positions in a sequence, may be conserved even between sequences that fail to produce alignments at any acceptable level of statistical significance. PHYSEAN complements methods that require sequence alignments (BLAST, FASTA, dynamic programming) by adding less residue- and position-specific physicochemical information on the protein or the domain.

Results: We predict proteins or their domains, such as signal peptides, using physical, chemical, geometric and biological properties of the 20 amino acids. This comprehensive set of properties may cover the diagnostic functional and structural aspects of a domain or a protein class. We automatically select and weight a subset of properties so as to discriminate between, e.g., signal peptides and amino-termini of cytosolic proteins with the lowest number of incorrect predictions. This optimal selection of properties and their weights significantly decreases the number of incorrect predictions compared with any single property or any combination of unweighted properties. Weights have been optimized by high-performance linear programming models that systematically find the optimal solution from among an astronomical number of property/weight combinations. PHYSEAN's performance is demonstrated by highly accurate predictions of signal peptides (the vehicles for protein transport across membranes) and their cleavage sites. The results indicate that reliable predictions are possible even in the absence of sequence conservation, using an automated physical and chemical analysis of proteins.

Availability: The source code for the prediction program will be available to collaborators.
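The idea of scoring a sequence by a weighted combination of residue properties can be illustrated with a minimal sketch. It is not the PHYSEAN implementation: the two properties used (the Kyte-Doolittle hydropathy index and a simple net-charge scale), the function names, the example sequences and the weights are all assumptions chosen for illustration, and the weights are set by hand rather than optimized by linear programming.

```python
# Illustrative sketch only: weighted physicochemical scoring of a sequence.
# Property choice, names and weights are hypothetical, not taken from PHYSEAN.

# Kyte-Doolittle hydropathy index for the 20 amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Simplified side-chain charge: +1 for K/R, -1 for D/E, 0 otherwise.
CHARGE = {"K": 1.0, "R": 1.0, "D": -1.0, "E": -1.0}

# Each property maps a residue to a number.
PROPERTIES = {
    "hydropathy": lambda aa: KD[aa],
    "charge": lambda aa: CHARGE.get(aa, 0.0),
}


def property_means(seq):
    """Average each property over all positions of the sequence."""
    return {
        name: sum(fn(aa) for aa in seq) / len(seq)
        for name, fn in PROPERTIES.items()
    }


def weighted_score(seq, weights):
    """Linear combination of the position-averaged properties."""
    means = property_means(seq)
    return sum(weights[name] * means[name] for name in weights)


# Hand-picked weights (an assumption); an optimizer would choose these.
EXAMPLE_WEIGHTS = {"hydropathy": 1.0, "charge": 0.5}

if __name__ == "__main__":
    hydrophobic_like = "MKLLVLLAALLA"   # made-up, signal-peptide-like
    hydrophilic_like = "MDEKRSDEEDKR"   # made-up, charged and polar
    print(weighted_score(hydrophobic_like, EXAMPLE_WEIGHTS))
    print(weighted_score(hydrophilic_like, EXAMPLE_WEIGHTS))
```

Because the score depends on averaged properties rather than on residue identity at fixed positions, two sequences with no detectable alignment can still receive similar scores. In a setting like the one described above, the weights would be chosen by an optimizer to minimize misclassifications on labeled examples.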
ASJC Scopus subject areas
- Statistics and Probability
- Molecular Biology
- Computer Science Applications
- Computational Theory and Mathematics
- Computational Mathematics