Knowledge of a protein's structure is essential for understanding its function, evolution, dynamics, stability, and interactions, and for knowledge-based protein or drug design. The rate of experimental structure determination, however, is far exceeded by that of next-generation sequencing, so that fewer than 1 in 1000 known proteins has an experimentally resolved structure. Computational structure prediction seeks to alleviate this problem.
We are building a suite of meta-programs called TopSuite for computational structure prediction to benefit from the strengths of different methods and counteract their weaknesses. The suite includes programs for template selection (TopThreader), sequence and structure alignment (TopAligner), template-based model building (TopBuilder), model quality assessment (TopScore), model combination and refinement (TopRefiner), ab initio residue contact prediction (TopContact), protein-protein contact prediction (TopInterface), and protein-protein docking (TopDock). We developed the pipeline TopModel, which combines TopThreader, TopAligner, TopScore, TopBuilder, and TopRefiner to produce protein structure predictions based on detected templates. TopModel combines and improves predictions from a wide range of primary predictors using large, diverse datasets, deep neural networks, and highly accurate model quality assessment to optimize template selection, template-target alignment, model selection, and model combination and refinement. TopModel has been used in several collaborative projects [100,124,135,163], producing high-quality models for a wide range of systems.
Because low-resolution techniques such as FRET or EPR provide only sparse experimental information, determining atomic-resolution protein structures from these experiments alone is impossible. Fortunately, computational methods can provide complementary information, such as detailed structural features. Combining structure prediction with experimental input in a hybrid approach can lead to the generation and verification of detailed multi-domain protein structure models [139] or quaternary structures [78], because experimental data such as FRET distance information can guide the model building. The key to hybrid modeling lies in the close interplay between computer simulations and experiments, so that the strengths of both sides are exploited as effectively as possible.
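As a rough illustration of how distance information can guide model selection, the following minimal sketch ranks candidate models by their agreement with a set of experimental distance restraints. It is not the procedure used in the cited work. The function names, the data layout, and the example coordinates and restraint values are hypothetical, and the model distance is simplified to the distance between two representative atoms, whereas a real FRET analysis would also have to account for the dyes and their linkers.

```python
import math

def restraint_chi2(coords, restraints):
    """Reduced chi-squared between model distances and experimental distances.

    coords:     dict mapping residue number -> (x, y, z) in Angstrom
                (one representative atom per residue; a simplifying assumption)
    restraints: list of (residue_i, residue_j, d_exp, sigma) tuples,
                with distances and uncertainties in Angstrom
    """
    total = 0.0
    for res_i, res_j, d_exp, sigma in restraints:
        d_model = math.dist(coords[res_i], coords[res_j])
        total += ((d_model - d_exp) / sigma) ** 2
    return total / len(restraints)

def rank_models(models, restraints):
    """Return model names sorted from best to worst agreement with the data."""
    return sorted(models, key=lambda name: restraint_chi2(models[name], restraints))

# Hypothetical two-domain protein in two candidate arrangements.
models = {
    "compact":  {10: (0.0, 0.0, 0.0), 150: (30.0, 0.0, 0.0)},
    "extended": {10: (0.0, 0.0, 0.0), 150: (60.0, 0.0, 0.0)},
}
# One hypothetical restraint: residues 10 and 150 are 55 +/- 5 Angstrom apart.
restraints = [(10, 150, 55.0, 5.0)]

print(rank_models(models, restraints))  # ['extended', 'compact']
```

In an iterative hybrid workflow, a score of this kind could be used both to filter an ensemble of predicted models and to indicate which additional measurements would best discriminate between the remaining candidates.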
G protein-coupled receptors (GPCRs) are currently among the most important drug targets, yet only a few GPCRs have been crystallized. Thus, when developing drugs for GPCRs that have not yet been crystallized, computational structure prediction is required for knowledge-based drug design. Although the number of unique X-ray crystal structures of GPCRs is steadily increasing, many GPCRs must still be modeled from templates with low sequence identity to the target. This necessitates the use of as much evolutionary and mutational information as possible, as well as a multi-template approach. For example, the sequences of the family of a specific GPCR can be used to guide the alignment of the templates to the target sequence, as is done in the TopModel pipeline. For binding mode predictions in GPCR models, we perform molecular docking to a variety of models to account for the flexibility of the binding pocket. When complemented by mutational analysis and molecular dynamics simulations in an iterative, integrated modeling approach, as done with TGR5 [120], this can result in a very detailed, predictive binding mode model.
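Docking to several receptor models, as described above, yields one score per ligand and model, and these scores have to be aggregated in some way. The sketch below shows one common and very simple choice: keeping the best-scoring model for each ligand. It is an assumption for illustration, not necessarily the aggregation used in our work. The ligand and model names and all score values are hypothetical placeholders, and lower scores are assumed to be better.

```python
def best_model_per_ligand(scores):
    """Pick, for each ligand, the receptor model with the lowest docking score.

    scores: dict mapping ligand name -> dict mapping model name -> score
            (lower is assumed to mean more favorable binding)
    Returns a dict mapping ligand name -> (best model name, best score).
    """
    best = {}
    for ligand, by_model in scores.items():
        model = min(by_model, key=by_model.get)
        best[ligand] = (model, by_model[model])
    return best

# Hypothetical docking scores for two ligands against three receptor models.
scores = {
    "ligand_A": {"model_1": -7.2, "model_2": -9.1, "model_3": -8.0},
    "ligand_B": {"model_1": -6.5, "model_2": -6.1, "model_3": -6.9},
}

for ligand, (model, score) in best_model_per_ligand(scores).items():
    print(ligand, model, score)
```

Because different ligands may prefer different pocket conformations, the selected model can differ between ligands. The resulting binding modes would then still need to be checked against mutational data and molecular dynamics simulations.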