New Physics: Sae Mulli (새물리)

pISSN 0374-4914 eISSN 2289-0041
Article

Research Paper

New Phys.: Sae Mulli 2022; 72: 946-952

Published online December 31, 2022 https://doi.org/10.3938/NPSM.72.946

Copyright © New Physics: Sae Mulli.

Structure Optimization of the Trp-Cage Protein in the Three-Dimensional Off-Lattice AB Protein Model

Seung-Yeon Kim*

School of Liberal Arts and Sciences, Korea National University of Transportation, Chungju 27469, Korea

Correspondence to: *E-mail: sykimm@ut.ac.kr

Received: August 26, 2022; Revised: October 4, 2022; Accepted: October 4, 2022

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

The three-dimensional off-lattice AB protein model is one of the most successful simplified protein models that incorporates the essential physics of protein folding. In this study, more than one million three-dimensional conformations of the Trp-cage protein are effectively searched in the three-dimensional off-lattice AB protein model using a powerful global optimization method known as conformational space annealing. Among the more than one million three-dimensional conformations of the Trp-cage protein, we focus on ten thousand low-energy conformations. The energy landscape and the optimized three-dimensional structures of the Trp-cage protein are investigated by analyzing the ten thousand low-energy conformations of the Trp-cage protein.

Keywords: Structure optimization, Off-lattice AB protein model

Proteins[1-6] are linear biopolymers composed of hydrophilic and hydrophobic residues (amino acids). They are built from twenty different residue types arranged in a defined sequence, called the primary structure. The human body contains approximately one hundred thousand different types of proteins, which regulate almost all biological activities and functions. A protein's biological activity and function are determined by its three-dimensional native tertiary structure.

A protein's hydrophobic residues drive its folding from the one-dimensional primary structure into the three-dimensional native tertiary structure. The native tertiary structure is related to the structure at the global minimum of the protein's energy function[7]. With an all-atom energy function, however, understanding how the native tertiary structure follows from the primary structure is too difficult. Simplified protein models that incorporate the essential physics of proteins have therefore been used extensively to understand protein folding.

The HP model[8,9] is one of the most successful protein models, in which a protein consists of two types of residues, the hydrophobic (A) and hydrophilic (B) residues, and it is configured as a self-avoiding walk on a simple-cubic lattice. The three-dimensional off-lattice AB protein model[10] is a more generalized and realistic protein model with an energy function that includes the van der Waals interaction and bending energy. The three-dimensional off-lattice AB protein model has been extensively studied for protein models with Fibonacci sequences of A and B residues[11-27]. Additionally, the three-dimensional off-lattice AB protein model has been applied to natural proteins[24,25,27-29].

In this study, the Trp-cage protein (PDB ID: 1L2Y)[30-32] in the three-dimensional off-lattice AB protein model is investigated using a powerful global optimization method known as conformational space annealing[33-38]. The primary structure of the Trp-cage protein is NLYIQWLKDGGPSSGRPPPS. In the three-dimensional off-lattice AB protein model, this primary structure can be expressed as BAAABAABBAAABBABAAAB. More than one million three-dimensional conformations of the Trp-cage protein are searched in this study using conformational space annealing. Among the more than one million three-dimensional conformations of the Trp-cage protein, we focus on ten thousand lowest-energy conformations. The energy landscape and the optimized three-dimensional structures of the Trp-cage protein are investigated by analyzing the ten thousand lowest-energy conformations of the Trp-cage protein.
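The translation from the primary structure to an AB string can be illustrated with a short Python sketch. The hydrophobic and hydrophilic residue sets below are an assumption inferred only from the Trp-cage sequence pair quoted above, not a general assignment table taken from the paper, and the names `HYDROPHOBIC`, `HYDROPHILIC`, and `to_ab` are hypothetical. Residue types that do not occur in this sequence are deliberately left unassigned.

```python
# A/B assignment inferred only from the Trp-cage sequence pair quoted in the
# text (assumption); residue types absent from that sequence are not assigned.
HYDROPHOBIC = set("LYIWGP")  # residues that appear as A in the quoted mapping
HYDROPHILIC = set("NQKDSR")  # residues that appear as B in the quoted mapping


def to_ab(sequence: str) -> str:
    """Translate a one-letter amino-acid sequence into an AB string."""
    out = []
    for res in sequence:
        if res in HYDROPHOBIC:
            out.append("A")
        elif res in HYDROPHILIC:
            out.append("B")
        else:
            raise ValueError(f"no A/B assignment known for residue {res!r}")
    return "".join(out)


TRP_CAGE = "NLYIQWLKDGGPSSGRPPPS"
ab = to_ab(TRP_CAGE)
print(ab, ab.count("A"), ab.count("B"))  # BAAABAABBAAABBABAAAB 12 8
```

Under this assignment the twenty-residue chain contains twelve hydrophobic (A) and eight hydrophilic (B) residues.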

In the three-dimensional off-lattice AB protein model[10-29], the location of each residue is represented by its central Cα-atom position. The N residues of a protein are linearly connected by (N-1) rigid bonds between the i-th and (i+1)-th residues to form a three-dimensional linear biopolymer. Therefore, the shape of a protein is determined by its (N-1) rigid bond vectors.

In the three-dimensional off-lattice AB protein model, the energy function of a protein with N residues is represented by the following equation:

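$$E = \sum_{i=1}^{N-2} \sum_{j=i+2}^{N} \left[ \frac{4}{r_{ij}^{12}} - \frac{C(i,j)}{r_{ij}^{6}} \right] + \frac{1}{4} \sum_{i=2}^{N-1} \left( 1 + \cos\theta_i \right), \tag{1}$$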

where rij is the distance between two non-neighboring residues i and j, and θi is the bond angle determined by the three neighboring residues i-1, i, and i+1. All bond lengths in the three-dimensional off-lattice AB protein model are fixed to unity, so all physical quantities are dimensionless. The first energy term is the sum of the van der Waals interaction energies over all non-neighboring residue pairs. The second energy term is the sum of the bending energies determined by three neighboring residues. The van der Waals constant C(i,j) is defined by the following equation:

$$C(i,j) = \begin{cases} +4 & \text{(AA residue pair)} \\ +2 & \text{(BB residue pair)} \\ -2 & \text{(AB residue pair)}. \end{cases} \tag{2}$$

According to Eq. (2), the van der Waals constant means a strong attraction for an AA residue pair, a weak attraction for a BB residue pair, and a weak repulsion for an AB residue pair.
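The energy function described above can be evaluated directly from a set of residue coordinates. The following is a minimal sketch, not the author's code. It assumes that θi is the angle at residue i between the vectors pointing to residues i-1 and i+1, so that a straight chain has zero bending energy. The function names `vdw_constant` and `ab_energy` are hypothetical, and the unit bond lengths are taken for granted rather than checked.

```python
import numpy as np


def vdw_constant(a: str, b: str) -> float:
    """C(i,j): +4 for an AA pair, +2 for a BB pair, -2 for an AB pair."""
    if a == "A" and b == "A":
        return 4.0
    if a == "B" and b == "B":
        return 2.0
    return -2.0


def ab_energy(coords: np.ndarray, ab: str) -> float:
    """Energy of a chain with coordinates of shape (N, 3) and AB string `ab`."""
    n = len(ab)
    energy = 0.0
    # van der Waals term over all non-neighboring pairs (j >= i + 2)
    for i in range(n - 2):
        for j in range(i + 2, n):
            r = np.linalg.norm(coords[i] - coords[j])
            energy += 4.0 / r**12 - vdw_constant(ab[i], ab[j]) / r**6
    # bending term over interior residues (assumed angle convention, see lead-in)
    for i in range(1, n - 1):
        u = coords[i - 1] - coords[i]
        v = coords[i + 1] - coords[i]
        cos_theta = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
        energy += 0.25 * (1.0 + cos_theta)
    return float(energy)


# Three residues on a straight line: one non-neighboring pair at distance 2
line = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
print(ab_energy(line, "AAA"))  # attractive AA pair, negative energy
print(ab_energy(line, "AAB"))  # repulsive AB pair, positive energy
```

For the straight three-residue chain the bending term vanishes, so the two printed values isolate the pair term: 4/2^12 - 4/2^6 for the AA pair and 4/2^12 + 2/2^6 for the AB pair.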

A powerful global optimization method known as conformational space annealing[33-38] is used to search efficiently for the optimized low-energy conformations of the Trp-cage protein in the three-dimensional off-lattice AB protein model. Conformational space annealing combines the key ideas of three global optimization methods: simulated annealing, the genetic algorithm, and the basin-hopping algorithm. As in the basin-hopping algorithm, only the phase space of local minima is considered. As in the genetic algorithm, many conformations are considered collectively, and a subset of seed conformations is perturbed using the information in other conformations. More importantly, an annealing distance parameter is gradually reduced, playing a role similar to that of temperature in simulated annealing.
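The three ingredients named above can be illustrated with a deliberately simplified toy sketch. It is not the author's implementation and does not use the AB protein energy. It minimizes the Rastrigin test function instead, uses plain gradient descent as the local minimizer, and applies a bank-update rule (compete with the nearest bank member inside the distance cutoff, otherwise with the highest-energy member) that is an assumption about how such a scheme is commonly set up. All names and parameter values (`csa_toy`, `bank_size`, `n_seeds`, `shrink`, and so on) are hypothetical.

```python
import numpy as np


def f(x):
    """Toy rugged energy (Rastrigin function); stands in for a protein energy."""
    return float(10 * x.size + np.sum(x**2 - 10 * np.cos(2 * np.pi * x)))


def local_minimize(x, steps=300, lr=0.002):
    """Plain gradient descent to a nearby local minimum (basin-hopping idea)."""
    x = x.copy()
    for _ in range(steps):
        x -= lr * (2 * x + 20 * np.pi * np.sin(2 * np.pi * x))
    return x


def csa_toy(dim=4, bank_size=20, n_seeds=5, iters=30, shrink=0.9, seed=0):
    rng = np.random.default_rng(seed)
    # Bank of locally minimized random starting points
    bank = [local_minimize(rng.uniform(-5, 5, dim)) for _ in range(bank_size)]
    energy = [f(x) for x in bank]
    initial_best = min(energy)
    # Initial distance cutoff: half the average pairwise distance in the bank
    d_cut = 0.5 * np.mean([np.linalg.norm(a - b)
                           for i, a in enumerate(bank) for b in bank[i + 1:]])
    for _ in range(iters):
        for s in rng.choice(bank_size, n_seeds, replace=False):
            # Genetic-algorithm-like move: copy some coordinates from a partner
            partner = bank[rng.integers(bank_size)]
            mask = rng.random(dim) < 0.5
            trial = local_minimize(np.where(mask, partner, bank[s]))
            e_trial = f(trial)
            dists = [np.linalg.norm(trial - b) for b in bank]
            nearest = int(np.argmin(dists))
            # Inside d_cut: compete with the nearest member; otherwise compete
            # with the highest-energy member of the bank
            target = nearest if dists[nearest] < d_cut else int(np.argmax(energy))
            if e_trial < energy[target]:
                bank[target], energy[target] = trial, e_trial
        d_cut *= shrink  # gradual reduction of the annealing distance parameter
    return initial_best, min(energy)


initial_best, final_best = csa_toy()
print(initial_best, final_best)
```

Because a bank member is only ever replaced by a trial of lower energy, the best energy in the bank cannot increase. A large cutoff early on keeps the bank diverse, and shrinking it later lets the search refine the low-energy regions.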

The method of conformational space annealing is used to search more than one million three-dimensional conformations of the Trp-cage protein in the three-dimensional off-lattice AB protein model. To understand the energy landscape of the Trp-cage protein, we concentrate on the ten thousand low-energy conformations, among the more than one million, with energies between E0=-25.70396 and E=-24.78937. The lowest-energy conformation with E0=-25.70396 may correspond to the global-minimum energy conformation of the Trp-cage protein. To investigate the energy landscape of the Trp-cage protein in the three-dimensional off-lattice AB protein model, we measure the radius of gyration (Rg) of each of the ten thousand low-energy conformations and its root-mean-square deviation (RMSD) from the global-minimum energy conformation with E0=-25.70396.
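The two structural measures can be computed from residue coordinates as sketched below. The text does not state how the RMSD is evaluated. This sketch assumes an optimal rigid superposition (the Kabsch algorithm) onto the reference conformation and equal weights for all residues. The function names `radius_of_gyration` and `rmsd` are hypothetical.

```python
import numpy as np


def radius_of_gyration(coords: np.ndarray) -> float:
    """Root-mean-square distance of the residues from their centroid."""
    centered = coords - coords.mean(axis=0)
    return float(np.sqrt((centered**2).sum(axis=1).mean()))


def rmsd(coords: np.ndarray, reference: np.ndarray) -> float:
    """RMSD after optimal rigid superposition (Kabsch algorithm, assumed)."""
    p = coords - coords.mean(axis=0)
    q = reference - reference.mean(axis=0)
    u, _, vt = np.linalg.svd(p.T @ q)
    # Exclude improper rotations (reflections) from the superposition
    d = -1.0 if np.linalg.det(u @ vt) < 0 else 1.0
    rot = u @ np.diag([1.0, 1.0, d]) @ vt  # acts on row vectors: p @ rot
    diff = p @ rot - q
    return float(np.sqrt((diff**2).sum(axis=1).mean()))


# Usage: a rigidly rotated and shifted copy of a random 20-residue reference
rng = np.random.default_rng(1)
ref = rng.normal(size=(20, 3))
c, s = np.cos(0.7), np.sin(0.7)
rot_z = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
moved = ref @ rot_z.T + np.array([1.0, -2.0, 0.5])
print(rmsd(moved, ref), radius_of_gyration(ref), radius_of_gyration(moved))
```

For the rigidly moved copy, the RMSD is zero up to rounding and the two Rg values agree, since both measures are invariant under rotation and translation.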

Figure 1 shows the distributions of the RMSD as a function of energy for all ten thousand low-energy conformations, measured from the global-minimum energy conformation. Most of the low-energy conformations of the Trp-cage protein have RMSD values greater than 0.5, as shown in Fig. 1. All ten thousand low-energy conformations of the Trp-cage protein can therefore conveniently be classified into two groups: the native group of the low-energy conformations with RMSD<0.4 and the majority group of the low-energy conformations with RMSD>0.4. As shown in Fig. 1, the region 0.3<RMSD<0.4 contains the smallest number of different conformations, so any value within the range 0.3≤RMSD≤0.4 can be chosen to separate the native and the majority groups. In the majority group of the low-energy conformations, the conformation with the lowest energy has E=-25.69674, which is comparable to the energy of the global-minimum energy conformation. The energy difference between these two conformations is very small (ΔE=0.00722), reflecting the rugged energy landscape of the Trp-cage protein.

Figure 1. (Color online) Distributions of the root-mean-square deviation (RMSD) as a function of energy for all ten thousand low-energy conformations of the Trp-cage protein, from the global-minimum energy conformation with E0=-25.70396.

Figure 2 shows the distributions of the radius of gyration (Rg) as a function of energy for all ten thousand low-energy conformations of the Trp-cage protein. Most of the low-energy conformations of the Trp-cage protein have Rg values between 1.41 and 1.46, as shown in Fig. 2. The Rg value of the global-minimum energy conformation is 1.4635, which is comparatively high. However, the lowest-energy conformation in the majority group has an Rg value of 1.4631, similar to that of the global-minimum energy conformation. Figure 3 shows the distributions of the radius of gyration (Rg) as a function of the root-mean-square deviation (RMSD) for all ten thousand low-energy conformations of the Trp-cage protein. The native group of the low-energy conformations is clearly distinguished from the majority group of the low-energy conformations, as shown in Fig. 3. The majority group of the low-energy conformations in Fig. 3 is centered at approximately RMSD=1.25 and Rg=1.44.

Figure 2. (Color online) Distributions of the radius of gyration (Rg) as a function of energy for all ten thousand low-energy conformations of the Trp-cage protein.

Figure 3. (Color online) Distributions of the radius of gyration (Rg) as a function of the root-mean-square deviation (RMSD) for all ten thousand low-energy conformations of the Trp-cage protein.

This section shows a few representative low-energy conformations of the Trp-cage protein in the three-dimensional off-lattice AB protein model. Figure 4 shows the stereographic view of the global-minimum energy conformation (E0=-25.70396), whose radius of gyration is Rg=1.4635. As shown in Fig. 4, all hydrophobic residues of the global-minimum energy conformation form a hydrophobic core. Figure 5 shows the stereographic view of the lowest-energy conformation in the majority group of the Trp-cage protein, whose energy is E=-25.69674, RMSD is 1.0680 (from the global-minimum energy conformation), and radius of gyration is Rg=1.4631 (similar to that of the global-minimum energy conformation). The local structures of the lowest-energy conformation in the majority group are very similar to those of the global-minimum energy conformation, as shown in Fig. 5. However, the global arrangement of the local structures of the lowest-energy conformation in the majority group differs from that of the global-minimum energy conformation. The global-minimum energy conformation and the lowest-energy conformation in the majority group are rare conformations. They are shown here because of their energy values rather than because they are typical.

Figure 4. (Color online) Stereographic view for the global-minimum energy conformation (E0=-25.70396) of the Trp-cage protein whose radius of gyration is Rg=1.4635. Red and blue spheres correspond to hydrophobic (A) residues and hydrophilic (B) residues, respectively.

Figure 5. (Color online) Stereographic view for the lowest-energy conformation in the majority group of the Trp-cage protein whose energy is E=-25.69674, RMSD is 1.0680, and radius of gyration is Rg=1.4631.

Figure 6 shows the stereographic view for an average low-energy conformation of the Trp-cage protein whose energy is E=-24.95165, RMSD is 1.2221, and radius of gyration is Rg=1.4377 (more compact than the global-minimum energy conformation). The energy, RMSD, and radius of gyration values for this conformation are among the most common values in the ten thousand low-energy conformations. Figure 7 shows the stereographic view of the low-energy conformation with the minimum value of the radius of gyration of the Trp-cage protein whose energy is E=-24.86942, RMSD is 1.3066, and radius of gyration is Rg=1.3800. Although this is the most compact conformation, its energy is comparatively high.

Figure 6. (Color online) Stereographic view for an average low-energy conformation of the Trp-cage protein whose energy is E=-24.95165, RMSD is 1.2221, and radius of gyration is Rg=1.4377.

Figure 7. (Color online) Stereographic view for the low-energy conformation with the minimum value of the radius of gyration of the Trp-cage protein whose energy is E=-24.86942, RMSD is 1.3066, and radius of gyration is Rg=1.3800.

Figure 8 shows the stereographic view for the low-energy conformation with the maximum value of the radius of gyration of the Trp-cage protein whose energy is E=-24.82598, RMSD is 1.0920, and radius of gyration is Rg=1.5541. Although this conformation is not compact, its energy is similar to that of the most compact conformation. Figure 9 shows the stereographic view for the low-energy conformation with the maximum value of RMSD of the Trp-cage protein whose energy is E=-25.17824, RMSD is 1.7165, and radius of gyration is Rg=1.4321. Although this conformation is farthest from the global-minimum energy conformation, it is relatively compact and has comparatively low energy.

Figure 8. (Color online) Stereographic view for the low-energy conformation with the maximum value of the radius of gyration of the Trp-cage protein whose energy is E=-24.82598, RMSD is 1.0920, and radius of gyration is Rg=1.5541.

Figure 9. (Color online) Stereographic view for the low-energy conformation with the maximum value of the RMSD of the Trp-cage protein whose energy is E=-25.17824, RMSD is 1.7165, and radius of gyration is Rg=1.4321.
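The six representative conformations quoted above can be collected in a small table and split by the RMSD cutoff, which reproduces the two-group classification and the energy gap ΔE. This is a sketch in plain Python. The labels and variable names are hypothetical, the cutoff 0.4 is the value used in the text, the RMSD of the global-minimum conformation from itself is zero by definition, and the global-minimum energy is taken as negative, consistent with the quoted ΔE=0.00722.

```python
# (label, energy E, RMSD from the global minimum, radius of gyration Rg)
conformations = [
    ("global minimum",     -25.70396, 0.0,    1.4635),
    ("lowest in majority", -25.69674, 1.0680, 1.4631),
    ("average",            -24.95165, 1.2221, 1.4377),
    ("minimum Rg",         -24.86942, 1.3066, 1.3800),
    ("maximum Rg",         -24.82598, 1.0920, 1.5541),
    ("maximum RMSD",       -25.17824, 1.7165, 1.4321),
]

RMSD_CUT = 0.4  # any value in the range 0.3 to 0.4 gives the same split here

native = [c for c in conformations if c[2] < RMSD_CUT]
majority = [c for c in conformations if c[2] >= RMSD_CUT]

lowest_majority = min(majority, key=lambda c: c[1])
gap = lowest_majority[1] - min(c[1] for c in native)
most_compact = min(conformations, key=lambda c: c[3])

print(len(native), len(majority))           # 1 5
print(lowest_majority[0], round(gap, 5))    # lowest in majority 0.00722
print(most_compact[0], most_compact[3])     # minimum Rg 1.38
```

Only the global-minimum conformation falls in the native group among these six examples, and the most compact conformation is not the one with the lowest energy.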

According to the folding-funnel theory[4-6,39] of protein folding, a few conformations, including the native tertiary structure, exist in a narrow and deep folding funnel, while very diverse conformations, including some similar to the native tertiary structure, lie outside the folding funnel, resulting in a rugged energy landscape. Our findings, with a sparsely populated native group separated from a large and diverse majority group whose lowest energy is very close to that of the global minimum, are in some ways consistent with the folding-funnel theory of protein folding.

Understanding how a protein folds from its one-dimensional primary structure into its three-dimensional native tertiary structure is too difficult with an all-atom energy function. The three-dimensional off-lattice AB protein model is one of the most successful simplified protein models for understanding protein folding. A protein's hydrophobic residues drive its folding from the one-dimensional primary structure into the three-dimensional native tertiary structure. In the three-dimensional off-lattice AB protein model, the twenty different residues are simplified into two types, the hydrophobic (A) residue and the hydrophilic (B) residue, and the energy function consists of the van der Waals interaction energy and the bending energy. We effectively searched more than one million three-dimensional conformations of the Trp-cage protein in the three-dimensional off-lattice AB protein model using conformational space annealing, a powerful global optimization method. Among these conformations, we concentrated on the ten thousand lowest-energy conformations. We investigated the energy landscape and the optimized three-dimensional structures of the Trp-cage protein by analyzing these ten thousand lowest-energy conformations.

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (grant number: NRF-2017R1D1A3B06035840).

  1. W. E. Creighton, Proteins: Structures and Molecular Properties. 2nd ed. (W. H. Freeman and Company, New York, 1993).
  2. A. M. Lesk, Introduction to Protein Architecture. (Oxford University Press, Oxford, 2001).
  3. A. Tramontano, The Ten Most Wanted Solutions in Protein Bioinformatics. (Chapman & Hall, Boca Raton, 2005).
  4. A. M. Lesk, Introduction to Protein Science. (Oxford University Press, Oxford, 2004).
  5. I. Roterman-Konieczna, Protein Folding in Silico. (Woodhead Publishing, Cambridge, 2012).
  6. R. Schweitzer-Stenner, Protein and Peptide Folding, Misfolding, and Non-Folding. (John Wiley & Sons, Hoboken, 2012).
  7. C. B. Anfinsen, Science 181, 223 (1973).
  8. K. F. Lau and K. A. Dill, Macromolecules 22, 3986 (1989).
  9. J. H. Lee, S.-Y. Kim and J. Lee, AIP Adv. 5, 127211 (2015).
  10. F. Stillinger, T. Head-Gordon and C. L. Hirshfeld, Phys. Rev. E 48, 1469 (1993).
  11. H. P. Hsu, V. Mehra and P. Grassberger, Phys. Rev. E 68, 037703 (2003).
  12. M. Bachmann, H. Arkin and W. Janke, Phys. Rev. E 71, 031906 (2005).
  13. S.-Y. Kim, S. B. Lee and J. Lee, Phys. Rev. E 72, 011916 (2005).
  14. M. Chen and W. Q. Huang, J. Zhejiang Univ. Sci. B 7, 7 (2006).
  15. W. Q. Huang, M. Chen and Z. P. Lü, Phys. Rev. E 74, 041907 (2006).
  16. C. Zhang and J. Ma, Phys. Rev. E 76, 036708 (2007).
  17. J. F. Liu and W. Q. Huang, J. Comput. Sci. Technol. 22, 569 (2007).
  18. X. Zhang and W. Cheng, 2008 2nd International Conference on Bioinformatics and Biomedical Engineering, 184-187 (2008).
  19. J. Kim, J. E. Straub and T. Keyes, Phys. Rev. E 76, 011913 (2007).
  20. J. W. Lee, K. H. Joo, S.-Y. Kim and J. Lee, J. Comput. Chem. 29, 2479 (2008).
  21. J. F. Liu, Chin. Phys. B 18, 2615 (2009).
  22. X. L. Zhang et al., BMC Syst. Biol. 4, 6 (2010).
  23. Y. Wang, G. D. Guo and L. F. Chen, Mol. Biol. 47, 894 (2013).
  24. T. Lipinski-Paes and O. N. de Souza, Electron. Notes Theor. Comput. Sci. 306, 45 (2014).
  25. B. Li, R. Chiong and M. Lin, Comput. Biol. Chem. 54, 1 (2015).
  26. S.-Y. Kim, Chaos Solit. Fractals 90, 111 (2016).
  27. B. Bošković and J. Brest, Inf. Sci. 454, 178 (2018).
  28. B. Li et al., J. Mol. Model. 21, 261 (2015).
  29. S.-Y. Kim, J. Phys.: Conf. Ser. 738, 012127 (2016).
  30. J. W. Neidigh, R. M. Fesinmeyer and N. H. Andersen, Nat. Struct. Biol. 9, 425 (2002).
  31. K. H. Mok et al., Nature 447, 106 (2007).
  32. I.-H. Lee and S.-Y. Kim, BioMed Res. Int. 2013, 973867 (2013).
  33. J. Lee, H. A. Scheraga and S. Rackovsky, J. Comput. Chem. 18, 1222 (1997).
  34. S.-Y. Kim, S. J. Lee and J. Lee, J. Chem. Phys. 119, 10274 (2003).
  35. J. Lee et al., Proteins 56, 704 (2004).
  36. S.-Y. Kim, W. Lee and J. Lee, J. Chem. Phys. 125, 194908 (2006).
  37. S.-Y. Kim, J. Chem. Phys. 133, 135102 (2010).
  38. J. Lee et al., Nat. Commun. 8, 15443 (2017).
  39. K. A. Dill and H. S. Chan, Nat. Struct. Biol. 4, 10 (1997).
