Introduction
FOLDpro is a web server that predicts protein 3D structure using
a machine learning fold recognition approach. It makes predictions in three steps.
- Step 1: Use a machine learning information retrieval approach to rank
template proteins for the query protein, integrating a variety of similarity
features including sequence/family information, sequence-sequence alignment,
sequence-profile (profile-sequence) alignment, profile-profile alignment, and
structural features (see Table 1 for all the features).
- Step 2: Generate profile-profile alignments between the query protein and the
top-ranked template proteins. Multiple templates are used, if necessary, to improve
both the alignment and the structure modeling.
- Step 3: Based on the query-template alignments and the 3D structures of the templates,
Modeller (Sali and Blundell, 1993) generates structure models for the
query protein. Five models are generated for the query. Higher-ranked models
are built from higher-ranked templates, so they are presumably,
but not always, better than lower-ranked models
(e.g., fold1.pdb is likely better than fold2.pdb, and fold2.pdb is likely better than
fold3.pdb).
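The ranking in Step 1 and the model ordering in Step 3 can be sketched as follows. This is a minimal illustration, not FOLDpro's implementation: it assumes the ranking scores are already available as a mapping from template identifier to score, and it simplifies by pairing each model with a single lead template, although the server may combine several templates. The function name and the template identifiers are hypothetical.

```python
from typing import Dict, List, Tuple


def assign_models(scores: Dict[str, float],
                  n_models: int = 5) -> List[Tuple[str, str, float]]:
    """Sort templates by score (highest first) and pair the top ones
    with model file names fold1.pdb, fold2.pdb, ...

    Assumption: one lead template per model. FOLDpro may use several
    templates per model, so this only illustrates the ordering.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [
        (f"fold{rank}.pdb", template, score)
        for rank, (template, score) in enumerate(ranked[:n_models], start=1)
    ]


if __name__ == "__main__":
    # Hypothetical template identifiers and scores, for illustration only.
    example = {"tmplA": 0.2, "tmplB": 1.7, "tmplC": -0.4}
    for model, template, score in assign_models(example):
        print(model, template, score)
```

Because the list is sorted by descending score, fold1.pdb is always paired with the best-scoring template, which mirrors why it is expected, but not guaranteed, to be the best model.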
Input and Output
- Inputs to the web server are the target name, sequence, and email address.
Processing one query usually takes several hours, depending on the server load and
sequence length. The sequence must be entered as a plain string of amino acids. The maximum
allowed sequence length is 500 amino acids.
- Outputs include the top 20 ranked templates with their ranking scores and the predicted 3D structures. The ranking scores
are generated by a support vector machine (SVM). A positive score usually indicates that the
query and template are significantly related; the higher the score, the more significant
the match. The PDB files and the corresponding alignment files used to generate
the structures are attached to the returned email. The alignment files are in PIR
format. A ranked list of all templates for the query is also returned as a file: query.rank.
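Before submitting a query, it can help to check the sequence against the input constraints described above. The sketch below is not part of FOLDpro. It assumes that "plain sequence" means the 20 standard one-letter amino acid codes with no FASTA header, and that whitespace and letter case can be normalized; the server's actual rules may differ. The function name is hypothetical.

```python
MAX_LENGTH = 500  # maximum sequence length stated by the server

# Assumption: only the 20 standard one-letter codes are accepted.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def clean_query(raw: str) -> str:
    """Return the sequence in upper case with whitespace removed.

    Raises ValueError if the sequence is empty, contains characters
    outside the assumed alphabet (including a FASTA '>' header), or
    is longer than MAX_LENGTH.
    """
    sequence = "".join(raw.split()).upper()
    if not sequence:
        raise ValueError("sequence is empty")
    invalid = sorted(set(sequence) - AMINO_ACIDS)
    if invalid:
        raise ValueError(f"unexpected characters: {''.join(invalid)}")
    if len(sequence) > MAX_LENGTH:
        raise ValueError(
            f"sequence has {len(sequence)} residues; the limit is {MAX_LENGTH}"
        )
    return sequence


if __name__ == "__main__":
    print(clean_query("mkt aya\nKQR"))
```

Rejecting any character outside the assumed alphabet is a deliberately strict choice: it catches pasted FASTA headers and stray digits early, at the cost of refusing ambiguity codes such as X that the server might tolerate.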
Download the feature dataset used in FOLDpro
Reference
- J. Cheng and P. Baldi. A Machine Learning Information Retrieval Approach to Protein Fold Recognition.
Bioinformatics, vol. 22, no. 12, pp. 1456-1463, 2006.