J. Cheng and P. Baldi. A Machine Learning Information Retrieval Approach to Protein Fold Recognition. Bioinformatics, vol. 22, no. 12, pp. 1456-1463, 2006.

The whole feature dataset generated from Lindahl's dataset can be downloaded here.

The 54 Selected Similarity Features Ranked by Information Gain
Feature Name Information Gain
HHSearch score 0.0375
COMPASS evalue 0.0370
PRC reverse score on chk profile 0.0354
PRC reverse score on HMM profile 0.0341
HMMer pfam evalue 0.0287
dot product of SS and SA vectors 0.0266
HMMer search evalue 0.0264
SS match ratio 0.0263
correlation of SS and SA vectors 0.0263
PRC simple score on HMM profile 0.0248
cosine of SS and SA vectors 0.0246
Gaussian kernel on SS and SA vectors 0.0237
COMPASS score 0.0235
PRC coemis score on HMM profile 0.0220
PSI-BLAST evalue 0.0205
IMPALA evalue 0.0181
RPS-BLAST evalue 0.0180
SA match ratio 0.0154
cosine of residue contact num (8AA) 0.0150
HMMer search score 0.0142
cosine of residue contact num (12AA) 0.0141
PRC simple score on chk profile 0.0140
normalized length of Palign alignment 0.0135
normalized contact probability (8AA) 0.0132
Gaussian kernel of sequence dimer composition 0.0121
correlation of residue contact num (8AA) 0.0120
cosine of residue contact order (8AA) 0.0117
correlation of family monomer composition 0.0116
correlation of family dimer composition 0.0116
PSI-BLAST alignment score 0.0116
Gaussian kernel of family dimer composition 0.0115
cosine of family dimer composition 0.0113
RPS-BLAST alignment score 0.0113
cosine of family monomer composition 0.0112
IMPALA alignment length 0.0111
RPS-BLAST alignment length 0.0111
Gaussian kernel of family monomer composition 0.0110
correlation of vectors of residue contact order 0.0109
IMPALA alignment score 0.0109
Palign alignment score 0.0102
normalized beta residue pairing probability 0.0100
PRC coemis score on chk profile 0.0091
correlation of residue contact num (12AA) 0.0083
correlation of residue contact order (8AA) 0.0074
cosine of residue contact order (12AA) 0.0072
correlation of residue contact order (12AA) 0.0068
cosine of sequence dimer composition 0.0065
normalized contact probability (12AA) 0.0058
Clustalw profile alignment score 0.0050
cosine of sequence monomer composition 0.0033
Gaussian kernel of sequence monomer composition 0.0033
correlation of sequence monomer composition 0.0027
correlation of sequence dimer composition 0.0022
Clustalw sequence alignment score 0.0010
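
The ranking criterion in the table above, information gain, measures how much knowing a feature's value reduces the entropy of the class label. The page does not state how the continuous features were discretized or which logarithm base was used, so the following is only a minimal sketch under assumptions: equal-width binning, a hypothetical bin count `n_bins`, entropy in bits, and illustrative function names `entropy` and `information_gain` that are not taken from the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(values, labels, n_bins=10):
    """Information gain of a numeric feature with respect to class labels.

    Assumption for this sketch: the feature is discretized into
    n_bins equal-width bins before the conditional entropy is computed.
    """
    lo, hi = min(values), max(values)
    # A constant feature gives a zero width; fall back to 1.0 so that
    # every value lands in bin 0.
    width = (hi - lo) / n_bins or 1.0
    bins = {}
    for v, y in zip(values, labels):
        idx = min(int((v - lo) / width), n_bins - 1)
        bins.setdefault(idx, []).append(y)
    # Entropy of the labels within each bin, weighted by bin size.
    conditional = sum(len(ys) / len(labels) * entropy(ys) for ys in bins.values())
    return entropy(labels) - conditional
```

Under these assumptions, a feature that separates the two classes perfectly reaches the full label entropy, while a constant feature scores zero. Ranking a set of features then amounts to sorting them by this value in descending order.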