Monday, December 5, 2011
1:00 pm, MRB 200 Conference Room
Dr. Andrea Bazzoli
A protein design algorithm combining free-energy minimization and alignment to a target sequence profile
A working paradigm in protein design is that, among all candidate amino acid sequences, those with lower free energy in the target structure are more likely to adopt it. Nonetheless, the shaping of viable sequence–structure pairs over evolutionary time is also dictated by other selective pressures, such as constraints on folding kinetics and the presence of binding partners, which generally compete with the minimization of free energy. In this talk I will present a new algorithm for protein design, in which free-energy minimization is combined with a statistical term that implicitly accounts for these other evolutionary pressures. In particular, this term biases the search toward the region of sequence space occupied by a family of structures that share the same fold as the target. The bias is exerted by assigning higher sampling probabilities to candidate designed sequences that align better to a sequence profile representative of the family. The protein design algorithm is evaluated by I-TASSER tertiary structure prediction of the designed sequences. Over a test set of 48 globular target structures, the best I-TASSER model for a designed sequence has a TM-score to the target above 0.5 in all cases and of 0.90 on average, which indicates that, if any globular fold is adopted by the designed sequences, it will likely be the target fold. This specificity of the designed sequences for the target fold is achieved despite their relatively low identity to the native sequence (25% on average). Similar levels of folding specificity and native sequence identity were previously obtained by a protein design algorithm based on free-energy minimization alone. On the other hand, thanks to the profile alignment term, the distribution of amino acid types in the designed sequences correlates much better with the native distribution, with a correlation coefficient of 0.90 versus 0.23 under pure free-energy minimization.
The present algorithm may therefore be regarded as a promising, general tool for protein design, given that the strongest assumption it makes, namely the availability of a family of sequences with target-like structure, holds in most cases.
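For readers who want a concrete picture of the idea, the following is a minimal sketch of how an energy term and a profile term might be combined in a sequence search. It is not the speaker's implementation: the abstract does not give the scoring function, the weighting, or the search procedure, so the Metropolis Monte Carlo scheme, the log-frequency profile score, and the names `profile_score`, `combined_score`, `design`, `weight`, and `energy_fn` are all assumptions made for illustration.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def profile_score(seq, profile, floor=1e-3):
    """Sum of log-frequencies of each residue under a per-position profile.

    `profile` is a list of dicts mapping amino acid -> frequency; residues
    missing from a column get a small floor frequency (an assumption).
    """
    return sum(math.log(profile[i].get(aa, floor)) for i, aa in enumerate(seq))


def combined_score(seq, energy_fn, profile, weight):
    """Hypothetical objective (lower is better): energy minus weighted profile fit."""
    return energy_fn(seq) - weight * profile_score(seq, profile)


def design(energy_fn, profile, weight=1.0, steps=2000, temperature=1.0, seed=0):
    """Metropolis search over sequences using single-residue mutations."""
    rng = random.Random(seed)
    length = len(profile)
    seq = [rng.choice(AMINO_ACIDS) for _ in range(length)]
    score = combined_score(seq, energy_fn, profile, weight)
    best_seq, best_score = list(seq), score
    for _ in range(steps):
        pos = rng.randrange(length)
        old = seq[pos]
        seq[pos] = rng.choice(AMINO_ACIDS)  # propose a point mutation
        new_score = combined_score(seq, energy_fn, profile, weight)
        delta = new_score - score
        # Accept improvements always, worse moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            score = new_score
            if score < best_score:
                best_seq, best_score = list(seq), score
        else:
            seq[pos] = old  # reject: restore the previous residue
    return "".join(best_seq), best_score
```

In this sketch, sequences that fit the profile better receive a lower combined score and are therefore accepted more often, which is one simple way of giving them higher sampling probability. Setting `weight=0` reduces the search to energy minimization alone.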