[proteamdavis] Fwd: steric effects on protein folding

  • From: Paul Limb <paulimb@xxxxxxxxx>
  • To: proteamdavis@xxxxxxxxxxxxx
  • Date: Sat, 3 Jul 2004 11:23:33 -0700

---------- Forwarded message ----------
From: Paul Limb <paulimb@xxxxxxxxx>
Date: Mon, 28 Jun 2004 17:27:16 -0700
Subject: Fwd: steric effects on protein folding
To: jtmorgan@xxxxxxxxxxx

---------- Forwarded message ----------
From: Paul Limb <paulimb@xxxxxxxxx>
Date: Mon, 28 Jun 2004 10:12:53 -0700
Subject: steric effects on protein folding
To: paulimb@xxxxxxxxx

A physical basis for protein secondary structure
Rajgopal Srinivasan* and George D. Rose†

Department of Biophysics and Biophysical Chemistry, Johns Hopkins
University School of Medicine, 725 North Wolfe Street, Baltimore, MD

Communicated by Carl Frieden, Washington University School of
Medicine, St. Louis, MO, October 1, 1999 (received for review July 8,

* Present address: Jenkins Department of Biophysics, Johns Hopkins
University, 3400 North Charles Street, Baltimore, MD 21218.
† To whom reprint requests should be addressed. E-mail



A physical theory of protein secondary structure is proposed and
tested by performing exceedingly simple Monte Carlo simulations. In
essence, secondary structure propensities are predominantly a
consequence of two competing local effects, one favoring hydrogen bond
formation in helices and turns, the other opposing the attendant
reduction in sidechain conformational entropy on helix and turn
formation. These sequence specific biases are densely dispersed
throughout the unfolded polypeptide chain, where they serve to
preorganize the folding process and largely, but imperfectly,
anticipate the native secondary structure.


Elements of secondary structure (α-helix, β-sheet, and tight turns) are
ubiquitous in proteins (1). What is the physical reason for their
pervasive occurrence? Do these patterns arise as a direct consequence
of formative interactions within the elements themselves (i.e.,
locally determined), or are they an indirect consequence of longer
range interactions (i.e., globally determined)?

Surprisingly, the field lacks a simple physicochemical theory of
secondary structure in peptides and proteins (2, 3). Instead,
prediction methods tend to be based on statistical likelihoods (4) or,
more recently, on neural nets (5). Alternating patterns of hydrophilic
and hydrophobic residues have been noted in amphipathic helices and
strands (6, 7), but the interactions they engender are exerted
primarily within folded proteins and fail to explain the appearance of
corresponding structures in isolated peptides. Statistical mechanical
treatments (see, e.g., ref. 8) of secondary structure can be effective
(9) but require numerous adjustable, empirical parameters. Surely, the
absence of a simple physical theory of secondary structure has
contributed to the continuing suspicion that none exists.

Yet, numerous experiments on the kinetics of protein folding show that
native-like secondary structure elements form early and rapidly,
before substantial tertiary organization. Still, such elements might
be statistical accidents that play little or no role in guiding
subsequent folding events.

Here, we propose a physical theory for secondary structure based on
sterics and local interactions. Our findings demonstrate that local,
intrinsic, sequence-dependent biases to be in helix, strand, and turns
are densely dispersed throughout the polypeptide chain and are
unlikely to be merely accidental (2, 10). At root, these biases are
grounded in sterics (11), the most important organizing factor in
protein conformation (12). Work in this area began with Sasisekharan
(13) and Ramachandran (14), who showed that the conformational space
available to amino acids is highly restricted. All residues except
glycine and proline are largely constrained to occupy either of two
mainchain regions. In one, the polypeptide chain is contracted; in the
other, it is extended. Apart from these two, remaining alternatives
are disfavored because of steric interference.

In essence, secondary structure bias is largely a consequence of the
balance between two opposing local forces that govern the position of
equilibrium between these two mainchain states. The competing forces
are attractive local interactions vs. sidechain conformational
restriction. The former is enthalpic and favorable; the latter is
entropic and unfavorable. Contracted conformations are compatible with
local hydrogen bonds, both mainchain–mainchain and mainchain–sidechain,
but the bulky backbone can interfere with sidechain flexibility.
Steric interference between mainchain and side chains is relieved in
extended conformations, but hydrogen bonds are sacrificed in this
state. In some cases, short polar side chains can compensate for loss
of conformational freedom by forming hydrogen bonds to the backbone.
The equilibrium between these two states, contracted and extended, is
sequence-specific because sidechains differ in their steric
characteristics and ability to form hydrogen bonds (15–17). Glycine
and proline add further complexity to this picture because their
backbone geometry differs from that of the other 18 residues, but no
additional principles need be invoked.

This physical explanation is applicable to both repetitive and
nonrepetitive secondary structure. In repetitive structures (helix and
strand), the energetic "tug-of-war" is largely between sidechain
conformational entropy and mainchain hydrogen bonding. In
nonrepetitive structure, i.e., tight turns (18), the peptide chain is
contracted, similar to a single turn of helix, and sidechains may
clash with the bulky backbone, but stabilizing sidechain-to-mainchain
hydrogen bonds can provide energetic compensation.

Driven primarily by sterics and local hydrogen bonds, these secondary
structure biases are expected to emerge in the unfolded state and to
preorganize all subsequent folding events. Segments with strong biases
are poised to form persisting structure, especially when fortified by
additional stabilizing interactions.

We test these ideas by performing short Monte Carlo simulations using
LINUS (19) for a diverse set of experimentally interesting proteins.
Computer simulations are an especially effective tool in this regard
because, unlike actual experiments, only interactions of interest are
included; all others can be eliminated. As described below, we find
that sterics and local interactions are sufficient to engender
pronounced conformational biases that largely, but imperfectly,
anticipate the native secondary structure of the protein.


Protein conformational space is explored by using a conventional
Metropolis Monte Carlo procedure (20). Initially, the starting
conformation, C, is set to an extended chain. Progressing from the
amino to the carboxy terminus, successive residues, taken three at a
time, are perturbed at random, using a predefined move set, to produce
a trial conformation, C′. Next, C′ is evaluated: if free of steric
clash and if application of the Metropolis criterion leads to
acceptance, C is set to C′. Otherwise, C′ is rejected and C is
retained. A "cycle" is said to be completed when the chain has been
traversed from one end to the other, using this procedure. On
completion of every cycle, the structure is saved. All proteins were
simulated three times, 1,000 cycles per simulation. Additional details
are given below.
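
The procedure above can be sketched in a few lines of Python. This is a minimal illustration, not the LINUS source: the conformation is any list-like object, `perturb`, `has_clash`, and `score` are hypothetical callables supplied by the caller, the `temperature` parameter is an assumption (no value is stated here), and the three-residue window is assumed to slide one residue at a time.

```python
import math
import random


def metropolis_accept(old_score, new_score, temperature, rng):
    """Metropolis criterion for a score that is rewarded (higher is better)."""
    delta = new_score - old_score
    if delta >= 0:
        return True  # equal or better trials are always accepted
    # Worse trials are accepted with probability exp(delta / temperature).
    return rng.random() < math.exp(delta / temperature)


def run_cycle(conf, perturb, has_clash, score, temperature, rng):
    """One cycle: traverse the chain from N- to C-terminus, three residues at a time."""
    for start in range(len(conf) - 2):
        trial = perturb(conf, start, rng)  # trial conformation C'
        if has_clash(trial):
            continue  # steric clash: C' is rejected and C is retained
        if metropolis_accept(score(conf), score(trial), temperature, rng):
            conf = trial  # C is set to C'
    return conf
```

A simulation in this sketch would call `run_cycle` repeatedly (1,000 times per run in the text) and save the returned conformation after every call.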

Chain Geometry. Each residue, except glycine, is represented by
alanine: specifically, four backbone atoms (N, Cα, C′, O) and a
β-carbon (Cβ). Also, each residue, except glycine and alanine, has
either one or two side chain pseudoatoms, depending on whether the
side chain is β-branched. In particular, valine, threonine, and
isoleucine have two additional side chain atoms; others have only one.
All relevant geometric parameters for each amino acid are given in
Table 5, published as supplemental data on the PNAS web site,
www.pnas.org.

Scoring Function. The scoring function used in the Metropolis
criterion consists of four terms, one repulsive and three attractive:
steric clashes are penalized and hydrogen bonds, hydrophobic contacts,
and salt bridges are all rewarded. To preclude nonlocal effects,
attractive forces are limited to nearby chain neighbors. Specifically,
the three attractive terms are evaluated only between amino acids
separated by no more than five residues in sequence. These four terms
are now described explicitly.
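
As a rough sketch of the locality rule, the attractive part of the score can be written as a sum over residue pairs that are no more than five positions apart. The function names and the idea of passing each attractive term as a callable `term(i, j)` are assumptions made for illustration; the repulsive term is not summed here because clashing conformations are rejected outright.

```python
MAX_SEQUENCE_SEPARATION = 5  # attractive terms act only within five residues


def is_local(i, j, max_separation=MAX_SEQUENCE_SEPARATION):
    """True when residues i and j are distinct and close enough in sequence."""
    return i != j and abs(i - j) <= max_separation


def attractive_score(n_residues, attractive_terms):
    """Sum every attractive term over all local residue pairs (i < j)."""
    total = 0.0
    for i in range(n_residues):
        for j in range(i + 1, n_residues):
            if is_local(i, j):
                total += sum(term(i, j) for term in attractive_terms)
    return total
```

With this filter, no pair of sequentially distant residues can contribute, which is what keeps the simulated biases strictly local.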

Electronic clouds of atoms are not allowed to overlap. Accordingly,
all conformations with a steric clash are rejected. Atomic radii are
given in the supplemental data.

An H-bond of maximal strength (0.5 units) is assigned to residues i
and j when the distance between the amide nitrogen of i and the
carbonyl oxygen of j is ≤3.5 Å, and the out-of-plane dihedral
O(j)−N(i)−CA(i)−C(i−1) is >140°. This score scales linearly to 0.0 as
the distance between donor and acceptor increases from 3.5 to 5.0 Å.
All backbone amide nitrogens (except proline) are considered H-bond
donors, and all backbone carbonyl oxygens are considered H-bond
acceptors. The side chains of Ser, Thr, Asn, Asp, Gln,
and Glu are also considered H-bond acceptors, with a maximal score of
1.0 unit. Two additional restrictions apply: (i) a donor and
acceptor must be at least three residues apart in sequence, and (ii)
no donor can participate in more than one H-bond.
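
The distance dependence of the H-bond term can be illustrated with a short function. This is a sketch under stated assumptions: the dihedral cutoff is assumed to apply over the whole distance ramp, the function names are hypothetical, and the one-bond-per-donor rule is left to the caller.

```python
def hbond_score(distance, dihedral_deg, max_strength=0.5):
    """Score for a donor N / acceptor O pair; distance in angstroms."""
    if dihedral_deg <= 140.0:
        return 0.0  # out-of-plane dihedral criterion not met (assumed for all distances)
    if distance <= 3.5:
        return max_strength  # maximal strength at or below 3.5 angstroms
    if distance >= 5.0:
        return 0.0
    # Linear fall-off from max_strength at 3.5 to zero at 5.0 angstroms.
    return max_strength * (5.0 - distance) / (5.0 - 3.5)


def may_pair(donor_index, acceptor_index, donor_is_proline=False):
    """Restriction (i): donor and acceptor at least three residues apart."""
    if donor_is_proline:
        return False  # proline has no backbone amide hydrogen to donate
    return abs(donor_index - acceptor_index) >= 3
```

For a side chain acceptor, the same function would be called with `max_strength=1.0`.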

A hydrophobic contact is assigned between side chain carbon atoms i
and j of two residues when

    distance(i, j) ≤ radius_i + radius_j + 1.4 Å,

where radius_x is the atom's contact radius. The maximal value is
realized when the two atoms are in contact, and it scales linearly to
zero as the separation distance increases to 1.4 Å. The maximal value
is 0.5 units when both residues are hydrophobic (Cys, Ile, Leu, Met,
Phe, Trp, Val), 0.25 units when one residue is hydrophobic and the
other is amphipathic (Ala, His, Thr, Tyr), and 0.0 units for all other
pairs.

A salt bridge is assigned to contacts between oppositely charged
groups (namely, Arg or Lys with Glu or Asp), with a maximal strength
of 0.5 units that scales linearly to 0.0 over a separation interval of
1.4 Å.
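
The hydrophobic and salt-bridge terms share the same linear ramp, which the following sketch makes explicit. It assumes that "separation" means the interatomic distance minus the sum of the two contact radii; the radii themselves are caller-supplied placeholders (the real values are in the supplemental data), and all function names are hypothetical.

```python
HYDROPHOBIC = {"Cys", "Ile", "Leu", "Met", "Phe", "Trp", "Val"}
AMPHIPATHIC = {"Ala", "His", "Thr", "Tyr"}
POSITIVE = {"Arg", "Lys"}
NEGATIVE = {"Glu", "Asp"}
RAMP_WIDTH = 1.4  # angstroms


def ramp(gap, max_value, width=RAMP_WIDTH):
    """Full value at contact (gap <= 0), falling linearly to zero at gap == width."""
    if gap <= 0.0:
        return max_value
    if gap >= width:
        return 0.0
    return max_value * (1.0 - gap / width)


def hydrophobic_max(res_a, res_b):
    """Maximal contact value for a residue pair, by residue class."""
    a_hyd, b_hyd = res_a in HYDROPHOBIC, res_b in HYDROPHOBIC
    if a_hyd and b_hyd:
        return 0.5
    if (a_hyd and res_b in AMPHIPATHIC) or (b_hyd and res_a in AMPHIPATHIC):
        return 0.25
    return 0.0


def hydrophobic_score(res_a, res_b, distance, radius_a, radius_b):
    """Assumes gap = interatomic distance minus the sum of contact radii."""
    gap = distance - (radius_a + radius_b)
    return ramp(gap, hydrophobic_max(res_a, res_b))


def salt_bridge_score(res_a, res_b, gap):
    """0.5 units at contact for oppositely charged residues, else zero."""
    opposite = (res_a in POSITIVE and res_b in NEGATIVE) or (
        res_a in NEGATIVE and res_b in POSITIVE
    )
    return ramp(gap, 0.5) if opposite else 0.0
```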

Move Set. LINUS uses a "smart" move set in which three consecutive
residues are perturbed simultaneously. Initially, a move consists of
choosing one of four equiprobable categories (19) at random: α-helix,
β-strand, β-turn, and random coil. Side chain torsion values are
chosen at random in the range [0°, 359°]. Both β-turn and random coil
moves have multiple subcategories. Four β-turn types are included:
types I, I′, II, and II′. A β-turn move defines the conformation of
two consecutive residues uniquely, with the third residue set to a
randomly chosen value. Specifically, a three residue sequence i-j-k
would have either i-j or j-k set to a turn conformation, with k or
i, respectively, chosen randomly, resulting in eight possibilities.
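
The count of eight turn moves follows from four turn types times two placements within the triple, as the enumeration below shows. The data layout and names are illustrative assumptions, and the whole-degree sampling of side chain torsions is one possible reading of the stated range.

```python
import itertools
import random

CATEGORIES = ("helix", "strand", "turn", "coil")  # four equiprobable categories
TURN_TYPES = ("I", "I'", "II", "II'")
TURN_PLACEMENTS = (("i", "j"), ("j", "k"))  # which two residues carry the turn


def turn_moves():
    """Enumerate every turn move for a three-residue window i-j-k."""
    moves = []
    for turn_type, placement in itertools.product(TURN_TYPES, TURN_PLACEMENTS):
        free = "k" if placement == ("i", "j") else "i"  # residue set at random
        moves.append(
            {"type": turn_type, "turn_residues": placement, "random_residue": free}
        )
    return moves


def choose_category(rng):
    """Pick one of the four move categories with equal probability."""
    return rng.choice(CATEGORIES)


def random_side_chain_torsion(rng):
    """Whole-degree torsion in [0, 359] (integer sampling is an assumption)."""
    return rng.randint(0, 359)
```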

To extract biases, secondary structure is assigned for all 1,000 saved
conformers in a simulation, using the procedure outlined below. This
ensemble is evaluated, and for every residue the fraction of
conformers in each of the four secondary structures is determined.
This fraction is a statistical weight, the probability that the given
residue will adopt one of the four secondary structures: helix,
strand, turn, or coil. We note in passing that an earlier version of
LINUS enforced biases by "freezing" the chain, an undesirable strategy
that abolished reversibility. The current protocol, which uses
LINUS-evolved biases as sample weights, does not suffer from this
limitation.
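
The bias calculation amounts to counting, for each residue, how often each class appears across the saved conformers. A minimal sketch, assuming the ensemble is stored as a list of per-residue label lists (a hypothetical layout chosen for illustration):

```python
from collections import Counter

CLASSES = ("helix", "strand", "turn", "coil")


def residue_biases(ensemble):
    """Per-residue fraction of conformers in each secondary structure class.

    ensemble: list of conformers, each a list with one class label per residue.
    """
    n_conformers = len(ensemble)
    n_residues = len(ensemble[0])
    biases = []
    for r in range(n_residues):
        counts = Counter(conformer[r] for conformer in ensemble)
        biases.append({c: counts[c] / n_conformers for c in CLASSES})
    return biases
```

Each returned dictionary sums to 1 for a residue whose labels all fall in the four classes, so its entries can be read directly as statistical weights.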

Secondary Structure Assignment. Secondary structure is assigned to
protein conformation based solely on backbone torsion angles; hydrogen
bonding considerations are excluded deliberately. Our assignment
criteria are suited to simulations in which only sequentially local
interactions between residues are allowed, a restriction that
precludes formation of β-sheet or other H-bonded interactions between
sequentially distant residues. If an H-bond based method, such as DSSP
(21), were used to assign secondary structure, then β-strands would
evade detection.

Backbone conformation space is partitioned into 36 coarse-grained
bins, each represented by a letter code (Table 1). Initially, φ, ψ, and
ω values for each residue are computed and mapped into the closest
letter code. Conformation codes are then mapped into a secondary
structure class. Three codes (M, O, R) belong to two classes; 28 codes
belong to no class. Secondary structure classes are S = { A,F,G,L,M,R
}; H = { O }; T = { J,O,P }; T′ = { j,o,p }; U = { M,R }; and U′ = {
m, r }.
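
The class memberships listed above can be held in a small lookup table, which also makes it easy to check which codes are shared between classes. This sketch covers only the code-to-class step; the mapping from torsion angles to the 36 letter codes depends on Table 1, which is not reproduced here, and the function names are hypothetical.

```python
CLASSES = {
    "S": set("AFGLMR"),
    "H": set("O"),
    "T": set("JOP"),
    "T'": set("jop"),
    "U": set("MR"),
    "U'": set("mr"),
}


def classes_of(code):
    """Names of every class that contains the given conformation code."""
    return sorted(name for name, codes in CLASSES.items() if code in codes)


def codes_in_two_classes():
    """Codes that belong to exactly two classes (case-sensitive)."""
    all_codes = set().union(*CLASSES.values())
    return sorted(c for c in all_codes if len(classes_of(c)) == 2)
```

Running `codes_in_two_classes()` on this table returns M, O, and R, in agreement with the statement that three codes belong to two classes.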

Progressing along the sequence, conformation codes for each triple of
consecutive residues, (j, j + 1, j + 2), are used to classify the
central residue, j + 1, into the first applicable category satisfying
one of the following definitions:


The simulation protocol described in Methods has been applied to
dozens of proteins, with a similar degree of success in all cases.
Twelve molecules were selected for presentation here, based on their
perceived interest to the experimental folding community: (i)
chymotrypsin inhibitor [3ci2], (ii) intestinal fatty acid binding
protein [1ifb], (iii) phage lysozyme [2lzm], (iv) myoglobin [1mbo],
(v) myohemerythrin [2hmq], (vi) plastocyanin [6pcy], (vii) protein G
[1gb1], (viii) ribonuclease A [7rsa], (ix) ribonuclease S-peptide, (x)
ribonuclease H [2rn2], (xi) staphylococcal nuclease [1stg], and (xii)
ubiquitin [1ubq]. Protein Data Bank ID codes (22) are given in square
brackets. In every case, three sets of simulations were performed,
each with uniform sample weights. Little variation was seen in the
final sample weights among the three sets. Accordingly, the weights
from all three were averaged for presentation (see Fig. 2, published
as supplemental data on the PNAS web site, www.pnas.org). In each
protein, local biases extracted from simulations suggest the actual
secondary structure, though imperfectly.

We seek to compare these simulations to corresponding experimental
data. Given the nature of the simulations (local interactions and
sterics), perhaps the ideal data for comparison would be the population
that emerges in the dead time of most experiments, an elusive
quantity. Fragment studies are also revealing, when available.
Equilibrium folding studies of partially folded states are useful as
well.

Of course, comparison with the native structure is irresistible.
Detailed comparisons are given in Table 2. For each secondary
structure element in every protein, Table 2 lists the fraction of
conformers in helix, strand, turn, and coil. In Table 3, the standard
errors for native segments computed from 10 independent simulations
are shown for two proteins, myoglobin and GB1. The examples represent
worst-case and typical-case LINUS simulations, respectively; in either
case, standard errors are slight.

Fig. 1 summarizes these data for the 36 helices, 63 strands, and 74
turns in the total set of proteins. In our simulations, sequences
corresponding to actual helices have helical biases that range between
4 and 78%. With one exception, all such sequences populate helical
conformers in at least 10% of the ensemble, and half of the sequences
populate helical conformers in at least 35% of the ensemble. Sequences
corresponding to actual strands have even stronger biases, ranging
between 15 and 93%. All but four populate strand conformers in at
least one-third of the ensemble. Sequences corresponding to actual
turns have turn biases that range between 1 and 38%. Although weaker
than both helices and strands, all but eight populate turn conformers
in at least 10% of the ensemble.

Often, the sum of turn and helix weights is high, indicating a
contracted conformation, although not specifically a β-turn or
α-helix. In fact, there is only a slight difference in conformation
between a turn of helix and a Type I or Type III peptide chain turn.
Accordingly, Fig. 1 also plots generalized turns, defined as the sum
of contracted conformations (i.e., helix + turn biases). Sequences
corresponding to actual turns have generalized turn biases ranging
between 2 and 76%; with one exception, all exceed 10%, and all but 12
exceed 25%.

Fig. 1 and Tables 2 and 3 demonstrate that a pronounced bias toward
the native conformation is detectable in almost every element of
secondary structure, despite the simplicity of these simulations and
the absence of all long range attractive interactions. To be sure, the
native structure does not necessarily have the highest weights in
every case. Segments in which either helix or strand bias toward a
non-native conformation exceeds that of the native conformation are
annotated with an asterisk in Tables 2 and 3. In this regard, it is
important to emphasize that these simulations should not be viewed as
a secondary structure prediction algorithm. Rather, they are only
intended to test our physical explanation for secondary structure
formation based on sterics and short-range attractive interactions,
particularly hydrogen-bonding. As seen in Fig. 1, a substantial bias
toward the native conformation is present in almost every case. It can
happen that segments with locally high helix or strand weights undergo
a conformational transition when longer range interactions are
included, but this issue is not addressed here.

Chymotrypsin Inhibitor. Chymotrypsin inhibitor has been studied
extensively by Fersht and coworkers (23), who find that the only
region with structure before the transition state is near the helix N
terminus (namely, residue 16). The simulations reveal such a bias,
along with other features of the native protein.

Intestinal Fatty Acid Binding Protein. Consistent with NMR studies
(24), biases for the second helix are weak. However, residues 67–73, a
β-strand in the folded protein, have a clear helix/turn bias in the
simulations, and, to our knowledge, no other experimental data is
available about this site.

T4 Phage Lysozyme. Using pulsed hydrogen exchange, Lu and Dahlquist
(25) find that helices A and E, together with the N-terminal β-sheet,
form an early folding intermediate. Although not the most prominent
simulated bias, helix E is readily apparent, as is the N-terminal
β-sheet. Biases for helix A exhibit considerable turn/helix weights.
This N-terminal helix belongs to the C-terminal domain (26), but our
simulations are too local to include contributions from such
interactions. Both helices D and H have simulated high strand weights;
neither appears to be involved in formation of the early intermediate.

Myoglobin. The structure of apomyoglobin has been studied extensively
by NMR (27). In equilibrium studies, Wright and coworkers (28)
characterized progressively folded states of the molecule. In their
hierarchic picture of the folding dynamics, helices A, D, and H are
the first to emerge; all have clear helical biases in simulation. In
contrast, helical bias is conspicuously absent in the region of the G
helix. A peptide fragment corresponding to this region was studied
experimentally by Waltho et al., who found "little propensity for
helix formation in aqueous solution" (ref. 29, p. 6346).

Myohemerythrin. This four-helix bundle protein was studied by Dyson et
al. (30), who synthesized peptide fragments that cover the molecule
and analyzed their conformational preferences by NMR. Fragments
corresponding to the native helices exhibit clear preferences for
helix-like conformations, which are more pronounced in the A and D
helices, and less pronounced in the B and C helices. Simulated biases
show the opposite tendency: regions corresponding to the B and C
helices have higher helical weights than those corresponding to the A
and D helices.

Plastocyanin. The native structure is a Greek key β-barrel. Barrel
staves bracketed by turns are well delineated by the biases, despite a
complete absence of interstrand hydrogen bonds, which are precluded by
our simulation protocol. The region of non-native turn/helix bias
surrounding residue 60 was observed in NMR experiments of Dyson et al.
(31), who studied the conformational preferences of peptide fragments
that cover the molecule. They noted conspicuous "prepartitioning of
the conformational space sampled by the polypeptide backbone" (ref.
31, p. 819) in these isolated peptides.

Protein G B1 Domain. Fragment studies of Blanco and Serrano (32)
confirm a tendency to populate native-like conformations in peptides
corresponding to both the initial and final β-hairpins and the central
helix. Simulation biases also reflect these tendencies.

Ribonuclease A and S-Peptide. Ribonuclease S-peptide (33), residues
1–20, is the progenitor of all peptide fragment studies, and the stop
signal for the N-terminal helix (residues 3–13) is known to be
preserved in the isolated peptide (34). In our simulation, a bias
toward helix spans the first two helices but continues through the
interconnecting nonhelical region. Puzzled by this result, we simulated
S-peptide in isolation; the stop signal is apparent in this case,
as shown in Fig. 2 in the supplemental data.

Ribonuclease H. Summarizing multiple kinetic and equilibrium
experiments, Chamberlain and Marqusee (35) find a self-consistent
hierarchic folding pathway for the molecule in which helices A and D
fold first and are then augmented by helix B and β-strand 4. Each of
these regions has pronounced, native-like biases. In fact, the only
discrepant region between the native structure and the simulated
biases is around residues 78–82, corresponding to an irregular kink
between helices B and C.

Staphylococcal Nuclease. Wang and Shortle (36) synthesized several
fragments, one of them corresponding to residues 92–99, which overlaps
residues 87–93, a β-strand in the x-ray structure with significant
helical weights in the simulation (see supplemental data).
Unfortunately, no conclusion can be drawn because the region of
overlap is slight and the synthesized fragment has a residue
substitution (I92G).

Ubiquitin. Fragment studies of Cox et al. (37) using CD and NMR show a
marked tendency toward native-like structure in the molecule's
N-terminal half but not in the C-terminal half. Notably, the
N-terminal β-hairpin (residues 1–17) can be detected in the A-state.
In another study, Muñoz and Serrano (9) synthesized a fragment
(residues 62–76) that includes the final strand of β-sheet (residues
65–71) and found it to have modest (≈8%), non-native helical content
by CD. Both studies are consistent with the simulation biases.

Our simulations include additional details not presented here. Among
them, regions with high turn weights can be assigned to specific turn
types (38) from their backbone dihedral angles. To better understand
the physical basis for turns, a separate series of host-guest turn
simulations was conducted (see Fig. 3 in the supplemental data).

Turn Simulations. A 14-residue host sequence
(Val5-Ala-Pro-Gly-Ala-Val5) with a central turn-forming sequence (namely,
Pro-Gly) was simulated by using the protocol described in Methods. Six
guest residues were introduced at position six to probe
residue-specific effects: Asp, Asn, Ser, Leu, Glu, and Thr. Relative
to the alanyl host, Ser, Asp, Asn, and Leu increase the turn
propensity of the Pro-Gly sequence whereas Glu and Thr decrease the
turn propensity. For Ser, Asp, and Asn, the preferred turn
conformation is Type I or III, either of which enables the guest
residue sidechain to form a stabilizing hydrogen bond with the
backbone amide of Gly (i + 2) and/or Ala (i + 3). For Leu, Ala, and
Glu, which lack side chain to mainchain hydrogen bonds, the preferred
turn conformation is Type II. Thr does not show a marked preference.
In the case of Leu, a hydrophobic contact (in lieu of an H-bond) can
be made with Ala (i + 3) or Val (i + 5). Details are summarized in
Table 4 and in the supplemental material.

These simulated turn preferences are consistent with the usual
turn-formers, namely, Asp, Asn, and Ser (38, 39), and they arise for
understandable physical reasons (e.g., hydrogen bonding). LINUS
simulations are sensitive enough to distinguish between Asp, which
forms sidechain-backbone H-bonds readily, and Glu, which fails to do
so. The simulations also show that even a nonturn former, e.g., Leu,
can nonetheless stabilize a turn by using a hydrophobic interaction.


Our central purpose in this paper has been to demonstrate that
pronounced biases toward protein secondary structure are present in
natural protein sequences, that these biases have a discernible
physical basis, and that their existence begs reinterpretation of
current folding models. Unlike more sophisticated simulations that use
a comprehensive potential function (e.g., ref. 40), the biases evident in
Tables 2 and 3 are a consequence of sterics and local interactions;
longer range interactions were suppressed in the simulation protocol.
In every case, these biases largely, albeit imperfectly, anticipate
the observed secondary structure of the folded molecule. In several
cases in which the LINUS-evolved biases differ from native secondary
structure and in which data describing early folding intermediates are
available, the simulations are consistent with these experimental data
(e.g., myoglobin, plastocyanin, and ubiquitin).

There has been considerable debate in the literature about whether
secondary structure formation is an early folding event (2). The
simulations shown here, together with dozens of others that were
conducted but not presented, confirm that sterically driven segments of
nascent secondary structure can emerge in the unfolded state and
preorganize all subsequent folding events.

If these simulations reproduced early folding events reliably, then
chain regions with a strong bias toward the "wrong" secondary
structure could signal the presence of a non-native intermediate. This
need not be true for discrepancies involving weak biases, which may
simply have lacked ample opportunity to develop. However, a strong
bias toward a discrepant contracted conformation, such as bias toward
helix in a known β-strand, would indicate the presence (though not the
stability) of an early, non-native intermediate; examples include the
non-native helices in intestinal fatty acid-binding protein and
plastocyanin, described in the previous section, or those in
β-lactoglobulin, described in the review by Baldwin and Rose (3).

Conformational biases arise for several reasons, but the primary
factor involves steric interplay between the α- and β-regions of the
φ,ψ map. The α-region (near φ = −60°, ψ = −40°) is compatible with the
formation of local hydrogen bonds, but in this contracted state,
sidechains tend to clash with local backbone, resulting in unfavorable
conformational restriction. The price of restriction is measured as
loss of sidechain conformational entropy (11, 41). As that price
mounts, chain segments are driven toward the remaining alternative,
the β-region (near φ = −120°, ψ = +130°), an extended conformation in
which steric clash between sidechain and backbone is relieved.

In this physical context, β-strand is appropriately regarded as
authentic secondary structure, even in the absence of a
hydrogen-bonded partner strand. Accordingly, β-sheet, composed of two
or more H-bonded β-strands, is more appropriately classified as
tertiary structure, in that it involves the spatial organization of
multiple β-strands, which are often removed from each other in
sequence. This distinction, or the lack of it, has spawned considerable
confusion about suitable procedures to identify secondary structure
from atomic coordinates (42) and motivated our own approach (in
Methods), which is based solely on dihedral angles, not hydrogen
bonding.

The conformational biases were extracted from Monte Carlo simulations
in which all moves are weighted equally. As such, these values almost
certainly underestimate the true bias in the protein. A better
estimate could have been obtained by using the extracted biases as
weights in another round of simulation. In fact, our simulations are
typically run by using just such a protocol. However, the simpler
protocol was adopted here deliberately because nothing more
complicated than that is needed to demonstrate the existence of
sharply differentiated, broadly dispersed chain bias.

Many proteins are found to adopt molten globule intermediates (43) at
low pH, a state having substantial secondary structure but lacking in
specific tertiary interactions. In this regard, the existence of
nascent secondary structure segments, as described here, anticipates
such states. Sterically driven biases are expected to manifest
themselves under essentially all folding conditions, and they would
become independently observable whenever specific conditions can be
found that destabilize the native protein (relative to the unfolded
form) but not some intermediate form.

Conformational Entropy and Protein Folding. Anfinsen proposed that
proteins attain their native state by folding to a global minimum of
Gibbs free energy (44). Typically, this hypothesis has been
interpreted to mean that the native conformation of individual
molecules also corresponds to a global minimum in internal energy
because a fully folded protein will have lost its conformational
entropy, or almost so. Thus, conformational entropy is thought to play
an insignificant role in the thermodynamics of protein folding.
Specifically, the Boltzmann-weighted populations of any two states x
and y, (g_y/g_x) exp[−(U_y − U_x)/kT] (where k = the Boltzmann constant
and T = absolute temperature), are thought to depend predominantly on
their energy difference, U_y − U_x, and not on the degeneracy of
state, g_y/g_x. In contrast, the work presented here reaches the
conclusion that conformational entropy, reflected in the degeneracy,
is the main factor that discriminates between the two energetically
degenerate ground states, α and β, and, in so doing, preorganizes
the folding process.
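
The role of degeneracy in the Boltzmann expression can be checked numerically. The sketch below is illustrative only: it uses the molar gas constant in kcal/(mol·K) in place of k, which assumes energies are given per mole, and the function name and example numbers are hypothetical rather than taken from the simulations.

```python
import math

GAS_CONSTANT = 0.0019872  # kcal/(mol*K); molar form of k (unit choice is an assumption)


def population_ratio(g_x, g_y, u_x, u_y, temperature, k=GAS_CONSTANT):
    """Population of state y relative to state x: (g_y/g_x) * exp(-(U_y - U_x)/kT)."""
    return (g_y / g_x) * math.exp(-(u_y - u_x) / (k * temperature))
```

When the two states have equal energy, the exponential factor is 1 and the ratio reduces to g_y/g_x. For example, `population_ratio(1.0, 3.0, 0.0, 0.0, 300.0)` depends only on the threefold difference in degeneracy, which is the situation the text describes for the energetically degenerate α and β states.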

The Levinthal Paradox. The issue of secondary structure bias is
intimately related to the Levinthal paradox, which argues that a
folding protein does not explore conformational hyperspace freely;
otherwise, it would encounter an insoluble search problem (45). For
Levinthal, this insight was not a paradox at all, but a convincing
demonstration that some intrinsic constraint limits the effective size
of conformational space. In this view, proteins solve the "multiple
minimum problem" not by an extensive search that identifies the
deepest minimum but by a limited search that avoids false minima. The
existence of intrinsic bias resolves this paradox by prejudicing the
ensemble of available folding trajectories toward the native minimum
(46). Thus, a folding protein need not discriminate among an
astronomical number of conformations because intrinsic bias "steers"
the molecule toward a high degree of preorganization.

"Protein Micelles." The prevalence of native-like, stable subdomains
(47, 48) in proteins is an expected consequence of intrinsic chain
bias. Segments with strong biases are poised to form persisting
structure, especially when fortified by additional stabilizing
interactions. In this context, it is important to distinguish between
stability and specificity (49). Stability is associated with the
equilibrium between folded and unfolded forms in a cooperative,
two-state folding process. Specificity is associated with
conformational particulars of a given folded form (e.g., why does the
lysozyme sequence adopt the lysozyme fold and not, for example, the
ribonuclease fold?). If the protein's conformational specificity is
established primarily by built-in bias, as this paper has attempted to
demonstrate, then stabilizing interactions can be quite nonspecific.
Like folding up a carpenter's rule, the preorganized segments and
their interconnecting turns constrain the folding process, which can
then be exerted via nonspecific driving forces, such as
solvent-squeezing and hydrophobic burial. Thus, a chain segment long
enough to adopt conformations with protein-like surface-to-volume
ratios (i.e., approximately 35 residues or more) (50, 51), and that spans
several elements
of impending secondary structure with protein-like sequence
composition, would be sufficient to engender a stable subdomain. In
this view, such subdomains are merely "polypeptide micelles" with an
intrinsic chain bias. Indeed, many examples in the literature are
consistent with this interpretation (52–55).
