February 3, 2023

Medical Trend

Medical News and Medical Resources

Protein Folding Knowledge


Protein folding is the process by which a protein obtains its functional structure and conformation.

Through this physical process, the protein folds from random coils into a specific functional three-dimensional structure.

When first translated from an mRNA sequence into a linear peptide chain, the protein exists as an unfolded polypeptide, or random coil.


Basic unit: amino acid
Amino acid characteristics: hydrophilic, hydrophobic, positively charged
Theoretical models: framework model, hydrophobic collapse model, etc.

  1. Introduction
  2. Research summary
  3. Theoretical model
    ▪ Framework model
    ▪ Hydrophobic collapse model
    ▪ Diffusion-collision-adhesion model
    ▪ Nucleation-condensation-growth model
    ▪ Jigsaw puzzle model
    ▪ Grid model
  4. Molecular chaperone
  5. Significance
  6. Prospects
    ▪ Inclusion body renaturation
    ▪ Protein design
    ▪ Pathogenic mechanism
    ▪ Reveal function
  7. Diseases caused by protein folding




Introduction

Structure determines function. Knowing the sequence of the genome alone does not allow us to fully understand the function of a protein, let alone how it works. Proteins can assemble themselves in the cellular environment (specific pH, temperature, etc.) by virtue of their own interactions. This self-assembly process is called protein folding.


The protein folding problem is listed as an important topic of “Biophysics in the 21st Century”, and it is a major biological problem left unresolved by the central dogma of molecular biology. Predicting the tertiary structure of a protein molecule from its primary sequence, and from there predicting its function, is a challenging task.


The study of protein folding, especially its early stages, that is, the folding of nascent peptides, is fundamental to completing our understanding of the central dogma. In this field, discoveries in recent years have fundamentally revised the traditional concept that nascent peptides fold spontaneously.


In this research, X-ray crystal diffraction, various spectroscopic techniques, and electron microscopy have played an extremely important role. At the 13th International Conference on Biophysics, Nobel Prize winner Ernst emphasized in his report that one of the main advantages of NMR for studying proteins is that it can examine the dynamics of protein molecules in great detail, that is, the dynamic structure, or the relationship between structural motion and the function of the protein molecule.


Current NMR technology can observe the motion of protein structures on time scales from seconds to picoseconds, including the motion of the main chain and side chains, as well as the folding and unfolding of proteins at various temperatures and pressures.

Structural analysis of protein macromolecules no longer aims only to solve one specific structure; it pays increasing attention to the fluctuation and motion of that structure. For example, enzymes and proteins that transport small molecules usually have two conformations, ligand-bound and ligand-free.


Structural fluctuations within a conformation are a necessary prelude to conformational change. It is therefore necessary to combine spectroscopic methods with X-ray structure analysis to study the equilibrium of structural fluctuations, the conformational changes, and the various intermediate states formed during the change.

As another example, to understand how a protein folds, it is necessary to know the time scale and mechanism of several basic processes during folding, including the formation of secondary structures (helices and sheets), the formation of coils, long-range interactions, and the overall collapse of the unfolded peptide. A variety of techniques are used to study these sub-processes, such as fast nuclear magnetic resonance and fast spectroscopic techniques (fluorescence, far-ultraviolet and near-ultraviolet circular dichroism).



Research summary

In organisms, the flow of biological information can be divided into two parts. In the first, the genetic information stored in the DNA sequence is transferred to the primary sequence of the protein through transcription and translation. This is a transfer of one-dimensional information, mediated by triplet codons.

In the second, the peptide chain folds through hydrophobic collapse, spatial twisting, and side chain packing to form the native conformation of the protein, and in doing so acquires biological activity and thereby expresses the information of life. The protein is the carrier through which this information is expressed, and the specific spatial structure formed by its folding is the basis of its biological function. In other words, the conversion of one-dimensional information into three-dimensional information is necessary for the expression of biological activity.


In the 1960s, Anfinsen found that bovine pancreatic RNase that had been reduced and denatured could recover its native structure, without the help of any other substance, once the denaturant and reducing agent were removed. On this basis he proposed the “self-assembly theory”, which holds that “the amino acid sequence of the polypeptide chain contains all the information necessary to form a thermodynamically stable native conformation”. Since then, with the extensive development of protein folding research, the theory of protein folding has been further supplemented and expanded.

Anfinsen’s “self-assembly thermodynamic hypothesis” has been supported by many in vitro experiments. Many proteins, especially those of low molecular weight, can indeed undergo reversible denaturation and renaturation in vitro, but this is not true of all proteins. Moreover, because of the special conditions inside the cell, the folding of proteins in vivo is far from this simple.


The folding of proteins in vivo often requires the participation of other cofactors and is accompanied by the hydrolysis of ATP. For this reason, in 1987 Ellis proposed the “assisted assembly theory” of protein folding. This shows that protein folding is not only a thermodynamic process; it is clearly also under kinetic control.

Some proteins with similar amino acid sequences fold into different structures, while other proteins with different amino acid sequences are structurally similar. Based on this phenomenon, some scholars have hypothesized that mRNA secondary structure may act as a genetic code that influences protein structure.

So far, however, there is no experimental evidence for this hypothesis, only some purely mathematical arguments [3]. So how does the amino acid sequence of a protein determine its spatial conformation? Researchers have done a great deal of excellent work on this question, but our understanding of the protein folding mechanism is still incomplete, and some current views may even be mistaken.


A classic study that made an important contribution in this regard is the work of C. B. Anfinsen’s group in the United States on the denaturation and renaturation of bovine pancreatic ribonuclease.

Bovine pancreatic ribonuclease contains 124 amino acid residues, and its 8 sulfhydryl groups form 4 disulfide bonds.

It can be calculated that there are 105 possible ways for the 8 sulfhydryl groups in the enzyme molecule to form 4 disulfide bonds, which provides a quantitative benchmark for assessing refolding.


Under mildly alkaline conditions, 8 mol/L urea and a large excess of mercaptoethanol completely reduce the four disulfide bonds; the entire molecule becomes a random coil and the enzyme is denatured.

When the urea is removed by dialysis in the presence of oxygen, the disulfide bonds re-form and the enzyme molecule is completely renatured.

The sulfhydryl groups pair in the disulfide bonds exactly as in the native enzyme. The renatured molecule can be crystallized and gives the same X-ray diffraction pattern as crystals of the native enzyme.

This confirms that the enzyme molecule not only refolds spontaneously during renaturation, but also selects only one of the 105 possible disulfide bond pairing patterns.
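The figure of 105 can be checked directly. The first sulfhydryl group can pair with any of 7 partners, the next unpaired group with any of 5, and so on, giving 7 × 5 × 3 × 1 = 105, or in general (2n)! / (2^n · n!) ways to pair 2n groups into n bonds. The sketch below is only an illustration of that count; the function name `disulfide_pairings` is our own and does not come from the original study.

```python
from math import factorial


def disulfide_pairings(n_sulfhydryl: int) -> int:
    """Count the ways to pair all sulfhydryl groups into disulfide bonds.

    Uses (2n)! / (2**n * n!), the number of perfect matchings of 2n items.
    """
    if n_sulfhydryl < 0 or n_sulfhydryl % 2:
        raise ValueError("need a non-negative, even number of sulfhydryl groups")
    n_bonds = n_sulfhydryl // 2
    return factorial(n_sulfhydryl) // (2 ** n_bonds * factorial(n_bonds))


# Bovine pancreatic ribonuclease: 8 sulfhydryl groups, 4 disulfide bonds.
print(disulfide_pairings(8))  # 105
```

If the pairing were chosen at random, the native pattern would therefore be hit with probability 1/105, that is, less than 1% of the time.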




Theoretical model


Framework Model

The framework model [4] assumes that the local conformation of a protein depends on the local amino acid sequence.

In the initial stage of polypeptide chain folding, unstable secondary structure units called “flickering clusters” form rapidly. These secondary structures then come into contact with one another to form a stable secondary structure framework. Finally, the frameworks are joined together and the peptide chain gradually tightens, forming the tertiary structure of the protein.

According to this model, even a small protein can fold part by part, and the subdomains formed along the way are important structures of the folding intermediates.


Hydrophobic Collapse Model

In the hydrophobic collapse model [5], the hydrophobic force is considered to be the decisive factor in the protein folding process. Before any secondary and tertiary structures are formed, a rapid non-specific hydrophobic collapse occurs first.


Diffusion-Collision-Adhesion Model

According to this model, the folding of a protein starts at several sites on the extended peptide chain. Unstable secondary structural units or hydrophobic clusters form at these sites, maintained mainly by short-range or medium-range (3-4 residues) interactions within the local sequence.

These units diffuse, collide, and adhere to one another through non-specific Brownian motion, forming larger structures and thereby gaining stability.

Further collisions produce a molten globule intermediate with a hydrophobic core and secondary structure.

This intermediate then rearranges into a compact, inactive, highly ordered molten globule that resembles the native structure.

Finally, the inactive, highly ordered molten globule is transformed into the fully active native state.


Nucleation-Condensation-Growth Model

According to this model, certain regions of the peptide chain can form a “folding nucleus”, and the rest of the chain continues to fold around it until the native conformation is reached.

The so-called nucleus is actually a network of native-like interactions formed by certain special amino acid residues.

These residues are held together not by non-specific hydrophobic interactions, but by specific interactions.

The residues form a tightly packed arrangement. The formation of the nucleus is the rate-limiting step in the initial stage of folding.


Jig-Saw Puzzle Model

The central idea of this model [9] is that the polypeptide chain can fold along multiple different pathways. Along each pathway, more and more native structure accumulates, and eventually the native conformation is formed.

Compared with folding along a single pathway, the polypeptide chain folds faster when multiple pathways are available. In addition, small changes in the external physiological and biochemical environment, or mutations, may have a large impact on a single folding pathway.

With multiple pathways, such changes may affect one folding pathway without affecting the others, and so will not interfere with the folding of the polypeptide chain as a whole, unless the changes are large enough to affect folding fundamentally.



Grid Model

The grid model (also referred to as the HP model) was first proposed by Dill et al. in 1989. It comes in two types: a two-dimensional model and a three-dimensional model.


The two-dimensional grid model uses an orthogonal grid of unit length in the plane. Each amino acid is placed on a grid intersection in sequence order. Amino acids that are adjacent in the sequence must also be adjacent on the grid, that is, the distance between them in the grid model is 1.

Note that each intersection can hold at most one amino acid. If an amino acid in the sequence already occupies a grid point, no later amino acid can be placed at that point.

If, while placing the amino acids, one of them has nowhere left to go, the configuration is invalid and the chain must be placed again.

The three-dimensional grid model is similar to the two-dimensional one, but uses a unit-length grid in three-dimensional space. Amino acids are placed in the same way. In the two-dimensional model, every amino acid except the first two in the sequence has only three directions to choose from; in the three-dimensional model the complexity increases considerably, with up to five directions available for each placement.
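The placement rules above can be sketched in a few lines for the two-dimensional case. This is a minimal illustration, not a reference implementation: the move letters `U`, `D`, `L`, `R` and the function names are our own choices, and the energy function assumes the usual HP-model convention (not spelled out in the text above) that each residue is labelled hydrophobic (`H`) or polar (`P`) and that every pair of `H` residues that are neighbours on the grid but not in the sequence contributes -1.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[int, int]

# Unit steps on the square grid; the letter codes are an illustrative choice.
MOVES: Dict[str, Point] = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}


def place_chain(moves: str) -> Optional[List[Point]]:
    """Place a chain on the grid, one unit step per move.

    A chain of N residues needs N - 1 moves. Returns the list of grid
    points, or None if a residue would land on an occupied point.
    """
    pos: Point = (0, 0)
    coords = [pos]
    occupied = {pos}
    for move in moves:
        dx, dy = MOVES[move]
        pos = (pos[0] + dx, pos[1] + dy)
        if pos in occupied:  # each intersection holds at most one residue
            return None
        occupied.add(pos)
        coords.append(pos)
    return coords


def hp_energy(sequence: str, coords: List[Point]) -> int:
    """Return minus the number of H-H contacts between non-consecutive residues."""
    index = {point: i for i, point in enumerate(coords)}
    contacts = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Look only right and up so that each neighbouring pair is counted once.
        for dx, dy in ((1, 0), (0, 1)):
            j = index.get((x + dx, y + dy))
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                contacts += 1
    return -contacts


square = place_chain("RUL")        # four residues folded into a unit square
print(square)                      # [(0, 0), (1, 0), (1, 1), (0, 1)]
print(hp_energy("HPPH", square))   # -1: the two end residues touch
print(place_chain("RULD"))         # None: the fifth residue hits the first
```

Searching over all valid move strings for the lowest energy is what makes the model computationally demanding; the sketch only evaluates one given configuration.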






Molecular chaperone

In 1978, while studying the in vitro assembly of histones and DNA at physiological ionic strength, Laskey found that an acidic nuclear protein, nucleoplasmin, had to be present before the two could assemble into nucleosomes; otherwise precipitation occurred. On this basis, Laskey called it a “molecular chaperone.”

A molecular chaperone is a type of protein that can bind to and stabilize an unstable conformation of another protein and, through controlled binding and release, promote the folding of nascent polypeptide chains, the assembly or degradation of multimers, and the transmembrane transport of organelle proteins [10,11].


Molecular chaperones are defined in terms of function. All proteins with this function are molecular chaperones, and their structures can be completely different.

This concept has now been extended to many proteins, and the molecular chaperones identified so far mainly belong to three highly conserved protein families [12]: the stress 90 family, the stress 70 family, and the stress 60 family. Members of the stress 60 family are found in the mitochondria of eukaryotes (called Hsp58 in mammals), in chloroplasts (called cpn60), and in the cytoplasm of prokaryotes (called GroEL).






Significance

Clarifying the protein folding mechanism would reveal the second set of genetic codes in life; this is its theoretical significance. In a narrower sense, protein folding research studies the rules by which a protein forms its specific three-dimensional structure, the stability of that structure, and its relationship to biological activity. Conceptually, the field divides into thermodynamic and kinetic problems, into folding in vitro and folding inside the cell, and into theoretical and experimental research.


The most fundamental scientific question here is: how does the primary structure of the polypeptide chain determine its spatial structure? Since the former determines the latter, there must be a definite relationship between the two. Is there a code, comparable to the “triplet code” by which nucleotides determine the sequence of amino acids? Some call this hypothetical code, by which the primary structure determines the spatial structure, the “second genetic code.”


If the “triplet code” has been deciphered and is now effectively plain text, then deciphering the “second genetic code” amounts to “protein structure prediction”, the most direct theoretical approach to the protein folding problem. It is one of the last unsolved mysteries in protein research. “Protein structure prediction” is a theoretical, thermodynamic problem: predicting, from the measured primary sequence of a protein, the specific spatial structure that the Anfinsen principle says it determines.


Determining the amino acid sequence of a protein, and especially the nucleotide sequence that encodes it, has become almost a routine technique. From a complementary DNA (cDNA) sequence, the amino acid sequence can be deduced according to the “triplet code”. These major breakthroughs in molecular biology techniques over the last century have greatly accelerated the determination of protein primary structures. At present, there are about 170,000 primary structures in the protein database, but only about 12,000 proteins whose spatial structure has been determined.


Many of these are very similar homologous proteins, and only a little more than 1,000 are truly distinct. With the successful completion of the Human Genome Project and the reading of the entire human DNA sequence, the amount of primary structure data will inevitably explode, while spatial structure determination lags far behind. The gap between the two will keep widening, making protein structure prediction all the more necessary.
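Deducing an amino acid sequence from a cDNA sequence with the triplet code, as mentioned above, is mechanical enough to sketch in a few lines. The example below is a simplified illustration only: it assumes the standard genetic code, a coding-strand sequence that already starts in the correct reading frame, and one-letter amino acid codes; the function name `translate` and the example sequence are our own.

```python
BASES = "TCAG"
# One-letter amino acids of the standard genetic code, ordered by first,
# second, then third base, each running T, C, A, G ("*" marks a stop codon).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(cdna: str) -> str:
    """Translate a coding-strand cDNA sequence, stopping at the first stop codon.

    Trailing bases that do not fill a complete codon are ignored.
    """
    cdna = cdna.upper()
    protein = []
    for start in range(0, len(cdna) - 2, 3):
        residue = CODON_TABLE[cdna[start:start + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


print(translate("ATGGCCTGGTAA"))  # MAW
```

This step is a simple table lookup, which is exactly why sequence data accumulates so much faster than structural data: no comparable lookup exists for going from sequence to three-dimensional structure.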






Prospects

Research on protein folding also has important potential applications, such as the following:


Inclusion body renaturation

▲Recombinant DNA technology can introduce foreign genes into host cells. However, the expression products of recombinant genes often form inactive, insoluble inclusion bodies. Clarifying the folding mechanism will be of great help in renaturing inclusion bodies.



Protein design

▲Advances in recombinant DNA and peptide synthesis technology allow us to design longer peptide chains at will. But because we cannot know what conformation such a polypeptide will fold into, we cannot yet design proteins with the specific functions we need.


Pathogenic mechanism

▲Many diseases, such as Alzheimer’s disease, mad cow disease (bovine spongiform encephalopathy, BSE), Creutzfeldt-Jakob disease (CJD), amyotrophic lateral sclerosis (ALS), and Parkinson’s disease, are caused by mutations in certain important cellular proteins that lead to protein aggregation or misfolding. An in-depth understanding of the relationship between protein folding and misfolding will therefore be of great help in elucidating the pathogenic mechanisms of these diseases and in the search for treatments.


Reveal function

▲Advances in genome sequencing have given us a large number of protein sequences. Obtaining structural information is very important for revealing their biological functions.

Relying on the existing methods (X-ray crystal diffraction, NMR, and electron microscopy) to determine the structure of a protein requires a long time, so the pace of structural analysis has lagged behind the pace of discovery of new proteins.

Although the structure prediction method is fast, it is not reliable. Only when we have a better understanding of the physical and chemical factors that maintain protein structure and drive protein folding, can this method be fundamentally improved.

In addition, research on the relationship between structure and function, such as protein-protein interactions and the interactions between ligands and proteins, also depends on elucidating the protein folding mechanism.


Diseases caused by Protein Folding

Even when the amino acid sequence of a protein molecule is unchanged, changes in its structure or conformation can cause disease. Such diseases are called “conformational diseases” or “folding diseases”.


Mad cow disease is caused by infection with prion protein, which can also infect humans and cause neurological disease. In a healthy body, the prion protein is required for normal neural activity. The disease-causing prion and the normal prion have exactly the same primary structure, but their spatial structures differ.


Diseases caused by abnormal protein folding, which leads to molecular aggregation or even precipitation, or to failure of the protein to reach its proper location, include Alzheimer’s disease, cystic fibrosis, familial hypercholesterolemia, familial amyloidosis, certain tumors, cataracts, and others. Because molecular chaperones play a crucial role in protein folding, mutations in the chaperones themselves can also cause abnormal protein folding and thus folding diseases.


As protein folding research deepens, scientists will uncover the true causes of more diseases, develop more targeted treatments, and design more effective drugs.





(source:internet, reference only)
