[proteamdavis] Fwd: protein packing measures

  • From: Paul Limb <paulimb@xxxxxxxxx>
  • To: proteamdavis@xxxxxxxxxxxxx
  • Date: Sat, 3 Jul 2004 11:23:48 -0700

---------- Forwarded message ----------
From: Paul Limb <paulimb@xxxxxxxxx>
Date: Mon, 28 Jun 2004 17:28:23 -0700
Subject: Fwd: protein packing measures
To: jtmorgan@xxxxxxxxxxx



---------- Forwarded message ----------
From: Paul Limb <paulimb@xxxxxxxxx>
Date: Mon, 28 Jun 2004 10:00:38 -0700
Subject: protein packing measures
To: paulimb@xxxxxxxxx

Measures of residue density in protein structures
Franck Baud and Samuel Karlin*

Department of Mathematics, Stanford University, Stanford, CA 94305-2125

Contributed by Samuel Karlin, August 26, 1999

* To whom reprint requests should be addressed. E-mail:
fd.zgg@xxxxxxxxxxxxxxxxxxxxxx

Top
Abstract
Introduction
Methods
Results
Discussion
References
Abstract

A hierarchy of residue density assessments and packing properties in
protein structures is contrasted, including a regular density, a
variety of charge densities, a hydrophobic density, a polar density,
and an aromatic density. These densities are investigated by
alternative distance measures and also at the interface of multiunit
structures. Amino acids are divided into nine structural categories
according to three secondary structure states and three solvent
accessibility levels. To take account of amino acid abundance
differences across protein structures, we normalize the observed
density by the expected density defining a density index. Solvent
accessibility levels exert the predominant influence in determinations
of the regular residue density. Explicitly, the regular density values
vary approximately linearly with respect to solvent accessibility
levels, the linearity parameters depending on the amino acid. The
charge index reveals pronounced inequalities between lysine and
arginine in their interactions with acidic residues. The aromatic
density calculations in all structural categories parallel the regular
density calculations, indicating that the aromatic residues are
distributed as a random sample of all residues. Moreover, aromatic
residues are found to be over-represented in the neighborhood of all
amino acids. This result might be attributed to nucleation sites and
protein stability being substantially associated with aromatic
residues.

protein folding | side-chain interactions | residue associations

Introduction

Packing density and residue neighbor preferences in protein structures
have been investigated by different methods, for different purposes,
and in different structural contexts generally emphasizing core
components, secondary structure elements, solvent accessibility
variation, and protein conformation. Protein structure studies
encompass multi-residue associations (1, 2), characterizations of
three-dimensional residue clusters (e.g., charge concentrations,
cysteine knots, hydrophobic adherences) (3–5), hydrogen-bond networks
(6), fold classifications (7), functional pathways (e.g., electron
transfer, proton pumping, substrate movements) (8, 9), protein
stability (10), and cotranslational folding processes (11, 12).
Methods of analysis can be based on partitioning protein structures
into Voronoi cells (13), dissecting protein domains with the aid of
contact matrices (14), and using residue scoring regimes in threading
predictions, in homology modeling, and in clustering protein
structures (15, 16).

In this paper, we offer a new approach to analyzing residue–residue
interactions and provide a great deal of statistical data on residue
environments. The d_m distance between a residue pair in the
three-dimensional protein structure is calculated as the minimum
distance between their side-chain atoms (the Cα atom of glycine is
considered as its side-chain atom). The D_m distance of a residue pair
is calculated as the minimum distance with respect to all atoms
(side-chain and backbone). For each protein structure, residues are
classified into nine structural categories (SC) determined by one of
three secondary structure (Ss) states (α-helix, β-strand, coil) and
three side-chain solvent accessibility levels (Sa). The SCs relate to
the environments introduced in refs. 17 and 18. These classifications
are implemented on a data set of 418 representative protein
structures. Table 1 summarizes frequencies of each amino acid (aa) in
each of the nine SCs.
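
The two distance definitions can be illustrated with a minimal Python sketch. The residue layout (a dict holding "backbone" and "side_chain" coordinate lists) and the function names are assumptions made for this example, not the authors' implementation; for glycine the Cα coordinate would simply be placed in the "side_chain" list.

```python
import math
from itertools import product

def min_distance(atoms_a, atoms_b):
    """Smallest Euclidean distance between any atom of A and any atom of B."""
    return min(math.dist(p, q) for p, q in product(atoms_a, atoms_b))

def d_m(res_a, res_b):
    """Side-chain minimum distance between two residues."""
    return min_distance(res_a["side_chain"], res_b["side_chain"])

def D_m(res_a, res_b):
    """All-atom minimum distance (side-chain and backbone atoms)."""
    atoms_a = res_a["side_chain"] + res_a["backbone"]
    atoms_b = res_b["side_chain"] + res_b["backbone"]
    return min_distance(atoms_a, atoms_b)

# Toy coordinates in angstroms, invented for illustration only.
res_1 = {"backbone": [(0.0, 0.0, 0.0)], "side_chain": [(0.0, 0.0, 3.0)]}
res_2 = {"backbone": [(2.0, 0.0, 0.0)], "side_chain": [(4.0, 0.0, 3.0)]}
print(d_m(res_1, res_2), D_m(res_1, res_2))  # prints 4.0 2.0
```

Because the all-atom set contains the side-chain set, D_m can never exceed d_m for the same residue pair.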

A variety of density assessments and residue associations in protein
structures are contrasted. A regular density (Reg-density) prescribes
a threshold (T) (say, 5 or 10 Å) about each amino acid in each protein
structure and ascertains the residue count within T distance (d_m or
D_m) averaged over all proteins in the set. We further consider three
charge densities [positive charge (+), negative charge (−), mixed
charge (±) (acidic or basic)], a hydrophobic density φ, a polar density
p, and an aromatic density Ar. For example, the acidic-density is the
average count in the structure data set of acidic residues {Asp or
Glu} within T distance of each reference amino acid. The different
densities may not reflect the effective amino acid preferences because
the 20 amino acids generally possess variable abundances in the
different protein structures. Accordingly, we normalize the density by
the expected density producing an association index (see Methods).

In this paper, we focus on residue-residue densities. In the companion
paper (19), we determine atom densities for each amino acid type
counting the numbers of carbon, nitrogen, or oxygen atoms within a 5-Å
distance about that amino acid type. The results of these density
assessments can assist in constructing residue and/or atom interaction
potentials. They can also help in deciding correctness of a protein
fold and provide insights for sequence protein threading and structure
clustering. In interpreting the different density measures and
indices, the following specific questions are addressed. How are
side-chain electrostatic, hydrophobic, and steric properties reflected
in the density assessments? How are amino acid abundances related to
three-dimensional structural elements, locations, and solvent
accessibility? Are hydrophobic residues nonspecifically associated with
other hydrophobics, or is there a hierarchical ordering? What kinds of
inequalities occur in the density contrasts, for example, among the
aromatics, {Trp, Phe, Tyr}, among hydroxyl residues, between amide
residues, between acidic residues, and between basic residues?

Methods

In this study, we use a representative set of 418 single-chain protein
structures with <25% pairwise sequence identity (20). The Protein Data
Bank codes are listed as supplemental material on the PNAS web site,
www.pnas.org.

Structural categories. We define for each amino acid nine structural
categories (see Introduction) determined by three side-chain solvent
accessibility levels and the three standard secondary structure
states. The solvent accessibility levels used are Sa ≤ 10% for the
buried state, 10% < Sa ≤ 40% for the partly buried state, and Sa > 40%
for the exposed state. The nine SCs are abbreviated (α-bu, α-pb, α-ex,
β-bu, β-pb, β-ex, c-bu, c-pb, c-ex).
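
As an illustration of this classification, the following Python sketch maps a secondary structure state and a side-chain solvent accessibility percentage to one of the nine SC labels. The thresholds (10% and 40%) and the abbreviations come from the text; the function names and the "helix"/"strand"/"coil" input strings are assumptions for the example.

```python
# Prefixes used in the SC abbreviations of the text.
SS_PREFIX = {"helix": "α", "strand": "β", "coil": "c"}

def accessibility_level(sa_percent):
    """Map side-chain solvent accessibility (in %) to bu, pb, or ex."""
    if sa_percent <= 10:
        return "bu"   # buried: Sa <= 10%
    if sa_percent <= 40:
        return "pb"   # partly buried: 10% < Sa <= 40%
    return "ex"       # exposed: Sa > 40%

def structural_category(ss_state, sa_percent):
    """Combine the Ss state and the Sa level into one of the nine SC labels."""
    return f"{SS_PREFIX[ss_state]}-{accessibility_level(sa_percent)}"

print(structural_category("helix", 7.5))    # α-bu
print(structural_category("strand", 25.0))  # β-pb
print(structural_category("coil", 62.0))    # c-ex
```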

Density measures. With each distance measure (d_m or D_m), the cutoff
thresholds T considered are 5 and 10 Å. Residues of a given type
(e.g., charge, hydrophobic, aromatic) within the prescribed threshold
from a reference amino acid are counted and averaged over all protein
structures for each amino acid type aa and each SC [designated (aa,
SC)].
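
The counting step can be sketched in Python as follows. This is a simplified reading under stated assumptions: each structure is supplied as a list of (aa, SC) labels plus a precomputed distance matrix (d_m or D_m), a neighbor is counted when its distance is at most T, and the average is pooled over all reference residues of the data set. The text does not spell out these details, and the function and variable names are hypothetical.

```python
from collections import defaultdict

POSITIVE = {"Arg", "Lys"}  # the (+) residue set named in the text

def density(structures, target_types, threshold=5.0):
    """Average number of target-type neighbors per reference residue,
    keyed by (aa, SC). `structures` is a list of (residues, dist) pairs:
    residues is a list of (aa, SC) labels, dist a square distance matrix."""
    totals = defaultdict(float)  # summed neighbor counts per (aa, SC)
    n_refs = defaultdict(int)    # number of reference residues per (aa, SC)
    for residues, dist in structures:
        for i, key in enumerate(residues):
            count = sum(
                1
                for j, (aa_j, _) in enumerate(residues)
                if j != i and aa_j in target_types and dist[i][j] <= threshold
            )
            totals[key] += count
            n_refs[key] += 1
    return {key: totals[key] / n_refs[key] for key in totals}

# One toy structure with three residues; distances invented for illustration.
residues = [("Asp", "α-ex"), ("Lys", "α-ex"), ("Arg", "c-ex")]
dist = [[0.0, 4.0, 6.0],
        [4.0, 0.0, 3.0],
        [6.0, 3.0, 0.0]]
print(density([(residues, dist)], POSITIVE, threshold=5.0))
```

Passing the full set of 20 amino acid names as `target_types` would give the regular density under the same assumptions.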

Index measures. Let S be a protein structure. For each amino acid of
type aa and of a given structural category SC, (aa, SC), and a count
measure for a prescribed threshold (say, 5 Å), let C_a(+; S) be the
observed number of positively charged (+) residues within 5 Å of amino
acid a in the structure S. Summing over all amino acids of type (aa,
SC) produces the total number C_S(+||aa, SC) = Σ_{a ∈ (aa, SC)} C_a(+; S),
and C(+||aa, SC) = Σ_S C_S(+||aa, SC) is the aggregate count over all
structures. Let f_S(+) be the frequency of positively charged residues
in structure S, and let n_S(Reg||aa, SC) be the total number of 5-Å
neighbors about amino acids of type (aa, SC) in structure S. The
quantity f_S(+) n_S(Reg||aa, SC) is the expected number of (+) residues
in structure S about amino acids of type (aa, SC) assuming that the (+)
residues are distributed randomly. Adding over the structures of the
data set, the expected count is E(+||aa, SC) = Σ_S f_S(+) n_S(Reg||aa,
SC). The (+) charge density index or association index for the (aa,
SC) type is calculated as I(+||aa, SC) = C(+||aa, SC)/E(+||aa, SC).
Similarly, we define the corresponding density indices: I(±||aa, SC),
I(−||aa, SC), I(p||aa, SC), I(φ||aa, SC), I(Ar||aa, SC). It is plain
that, for the regular density, I(Reg||aa, SC) ≡ 1 for all amino acids
and all SCs because f_S(Reg) ≡ 1 and n_S(Reg||aa, SC) ≡ C_S(Reg||aa,
SC). Index values between 0.6 and 1.4 can be considered to lie in the
random range; values outside it are statistically significant
on the low or high side, respectively (1).
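
The index calculation reduces to a ratio of two sums over structures, which the following Python sketch makes explicit for a single (aa, SC) class. The tuple layout, the function names, and the numbers in the usage example are invented for illustration; only the formula I = C/E and the 0.6 to 1.4 random range are taken from the text.

```python
def association_index(per_structure):
    """per_structure: list of (C_S, f_S, n_S) tuples, one per structure S, where
    C_S is the observed neighbor count of the chosen type, f_S the frequency of
    that type in S, and n_S the regular neighbor count for the (aa, SC) class.
    Returns I = C / E with C = sum of C_S and E = sum of f_S * n_S."""
    observed = sum(c for c, _, _ in per_structure)
    expected = sum(f * n for _, f, n in per_structure)
    return observed / expected

def classify(index, low=0.6, high=1.4):
    """Label an index relative to the random range quoted in the text."""
    if index < low:
        return "under-represented"
    if index > high:
        return "over-represented"
    return "random range"

# Two hypothetical structures: C = 30 + 18 = 48 and E = 0.12*200 + 0.10*150 = 39.
data = [(30, 0.12, 200), (18, 0.10, 150)]
index = association_index(data)
print(round(index, 2), classify(index))  # 1.23 random range
```

For the regular density every f_S equals 1 and every C_S equals n_S, so the same function returns 1, in line with the identity stated above.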

Results

Distributions Among the Amino Acid Structural Categories (SCs). Many
distributional tendencies (functional and structural properties of the
various amino acids) are familiar. Here, we highlight several
possibly new findings.

Aromatics. With respect to the three Sa levels, Phe performs like the
major aliphatic residues whereas, with respect to Ss states, Phe
performs similarly to Tyr and Trp. From an evolutionary perspective
(PAM120 exchange matrix), Phe and Tyr substitute easily for each
other. However, a variety of important functional and structural roles
are shared by Tyr and Trp. For example, Tyr and Trp are abundant in
antigen–antibody contacts (21, 22); Tyr and Trp through their
side-chains can engage in H-bonding and even provide solubility (23);
many posttranslational modifications are targeted primarily to Tyr
(e.g., phosphorylation, hydroxylation, and polymerization). Because of
their simultaneous hydrophobic/hydrophilic capacities, Tyr and Trp are
less often found buried. The three aromatic residues emphasize Arg
among their over-represented neighbors whereas Lys tends to be
underrepresented. Why this asymmetry? A possible interpretation is
that, although cationic residues can establish hydrogen bonds with the
polar groups of Tyr and Trp, only Arg has in real structures a
favorable cation–aromatic interaction (24).

Aliphatics. All bulky aliphatics including Phe essentially agree in
frequencies among the SCs. Actually, Val and Ile abundances are
largely congruent for almost all SCs, and they substitute
significantly one for the other in evolutionary replacements. Leu and
Met also score positively by the PAM120 evolutionary exchange matrix,
but the total frequencies in protein usages show a high value for Leu
of ≈8.9% and only 2.6% for Met. Because of the sulfur side-chain
component, Met participates in more refined catalytic activities than
Leu (e.g., in electron transfer when binding copper type I, in methyl
transference). Perhaps the reduced level of Met also helps to avoid
inappropriate starts of protein translation.

Cationic residues. Arg and Lys in evolutionary relatedness exchange
readily, but there is much asymmetry in structural associations
between these positively charged residues. Arg compared with Lys is
more buried, more often involved in salt bridges and H-bonds, and
participates in more cationic-aromatic contacts (24). Side-chain
interactions of Arg mainly involve the guanidinium group whereas Lys
has contacts with other residues about equally through its methylene
groups and its amino side-chain atom. Consistent with this asymmetry,
Lys and Arg usage frequencies in protein sequences are uncorrelated or
negatively correlated (25). Lys is found marginally more in coil
regions (45%) compared with arginine (40%) whereas arginine is more
buried (16%) vs. lysine (6%). Lysine often extends to the surface with
its amino group exposed.

Anionic residues. Although Glu and Asp in an evolutionary context
substitute easily for each other, they contrast sharply in Ss
propensities, with Glu favoring α-helices whereas Asp is largely found
in loops and is detrimental to secondary structure formations. Asp
also contributes more than Glu in catalytic capacities, e.g., at
protease active sites, in metal coordination especially for calcium
ions, and in N-capping helices.

Histidine. His is a versatile residue that is rather uniformly
distributed in terms of solvent accessibility levels and secondary
structure placements but favors coil locations. Histidine contributes
to many functional activities, including Cu2+, Fe2+, Zn2+, and heme
coordination, is part of the classical catalytic triad of protease
active sites, and can adopt flexible roles in conformation (8).

Small hydroxyl residues. Ser is marginally more exposed and more in
coil regions than Thr. Ser and Thr are versatile in hydrogen bonding
to backbone groups, side-chains, or solvent. The strong associations
of Ser and Thr with His might be explained by the histidine amphoteric
behavior as either an acceptor or donor of protons. These residues are
often structurally near to active sites (e.g., serine and metallo
proteases) and, more generally, occur proximally in surface locations.
Ser, Thr, and Asp are over-represented at amino helix caps and in
turns (His at the carboxyl helix cap), probably because of their
hydrogen bonding capacity and a favorable interaction with the helix
dipole (26).

Small amino acids. Ala is found primarily in α-helical states (47%)
and secondarily in a coil state (36%). With respect to solvent
accessibility, Ala is mostly buried (50%) but also significantly
exposed (29%). Gly is predominantly found in coil regions (69%) and
assiduously avoids secondary structures. Indeed, for steric reasons,
Gly tends to disrupt secondary structural elements but can contribute
structural flexibility between long helices and between strands. Gly
is about equally exposed as buried. In many protein families Gly is
among the most conserved residues [e.g., for heat shock proteins 60
and 70 (27) and for the repair proteins RecA/Rad51/RadA (28)].
Conserved glycine residues often appear as doublets (GG) and sometimes
as higher order runs that may ameliorate bent structural positions.
Generally, glycine residues cannot be replaced by other residues
because of steric constraints.

Residue Densities. For each amino acid, each structural category (SC)
and a prescribed threshold (say, 5 Å), the regular (Reg) d_m density
is the number of residues within a 5-Å d_m distance averaged over the
structure data set. For each amino acid, the regular density
variations for the nine SCs discriminate the buried from the partly
buried states and the partly buried states from the exposed states but
are largely independent of the three Ss states. On a continuous scale
of Sa values, the resulting curves are approximately linear.

Comparing regular densities among amino acids (Fig. 1). For
corresponding Sa levels, the density values are highest for the
aromatic residue Trp, secondly Tyr, thirdly Phe, and fourthly Arg,
although Arg has a larger side-chain surface area compared with Tyr.
Major hydrophobic residues at each Sa level obey the density
comparison: coil < (helix, beta). Independent of residue sizes, the
major aliphatics {Leu, Met, Ile, Val, Phe} possess quite congruent
density values for each Ss state. The packing densities of the two
acidic residues Asp and Glu are closely similar. However, this is not
the case for the basic residues Arg and Lys. At all Sa levels the
difference in density (Arg > Lys) is ≈1 residue count. This may be a
consequence of their size difference and/or their pKa difference.

Positive-charge (+) density; negative-charge (−) density. The (+) and
(−) densities are the average count of positive charge residues {Arg,
Lys} and negative charge residues {Asp, Glu}, respectively, within 5 Å
of the reference amino acid. For charge residues, (+) and (−) density
values are compared in the following display:

Asp and Glu have a higher (+) density in the partly buried state than
in a buried or exposed state. Glu and Asp are found predominantly in
the exposed state, and when a salt-bridge or H-bond engages their
side-chain atoms the solvent accessibility level is reduced and they
thereby score as partly buried. We also observe that the (−) density
increases as the Sa level decreases. This may indicate that charge
effects have been neutralized with the help of salt bridges, water
attachments, or small ligands. The net charge of most proteins is
slightly negative, but the counts of (+) and (−) charge residues are
both high (8). They are often paired in salt-bridges that will prefer
the low dielectric medium of buried conditions to the high dielectric
medium of exposed conditions (water).

There is a clear inequality with respect to (Glu > Asp) and (Arg > Lys)
for the (+) density. In assessments of the (−) density, the ordering
(Arg > Lys) holds at all Sa levels, and is greatest in the buried
state. Why these inequalities? Arg possesses enhanced capacity for
charge interactions via salt bridges and for cationic-aromatic
contacts compared with Lys (1, 24). Size differences may also apply.
For the (−) density, the ordering (Glu > Asp) happens only in the
buried state. Glu compared with Asp is more flexible in helix and coil
placements that ameliorate charge interactions.

Mixed-charge (±) density. The (±) density is the average count of all
charge residues {Lys, Arg, Glu, Asp} within (d_m distance) 5 Å of the
reference (aa, SC). The maximum (±) density is attained for Arg (1.92
in the buried state, 1.69 in the partly buried state, 1.11 in the
exposed state), compared with Lys (1.62 in the buried state) and next
Tyr, Trp, Glu (1.57, 1.56, 1.55 in the partly buried state). Why are
buried core positions prominent in (±) density? Buried Arg and Lys
residues are often involved in salt bridges and H-bonding
relationships. The lowest (±) density occurs for Ala (0.46) with range
(0.36–0.65). The second lowest density occurs for Gly (0.48) with range
(0.34–0.68) and then for Cys (0.6) with range (0.41–0.75). The highest
(±) density for all aliphatic residues is in the partially buried
state, with values ≈1.

Polar uncharged (p) density. The p-density is the average count of
polar residues {His, Thr, Ser, Asn, Gln, Tyr} within 5 Å of the
reference amino acid. The charge amino acids attain the highest
p-density under buried conditions (range 2.0–2.6). Under exposed
conditions, the p-density range is (0.8–0.9). For all Sa levels, the
p-density mainly traverses the range 0.8–2.5 whereas the aromatics Trp
and Tyr possess the elevated range 1.2–2.2. Phe is intermediate. The
lowest p-density occurs for Gly with range (0.39–1.1). Ala is second
lowest with range (0.40–1.13). Cys has range (0.6–1.35). An inequality
in the p-density applies to Ser < Thr.

Hydrophobic (φ) density. The φ-density is the average count of major
hydrophobic residues within 5 Å of the reference amino acid. The
combination of size and hydrophobicity underlies the highest φ-density
attained for Phe generally in the β-buried and α-buried states, 4.93
and 4.92, respectively; next for Trp in the α-buried (4.82) and
β-buried (4.35) states but reduced in situations of c-buried,
Phe (3.93), Trp (3.87). The highest φ-density for Tyr obtains for
α-buried (4.17), next β-buried (3.79), and then c-buried (3.08).
Apparently, among the aromatics, Tyr interacts preferentially with
nonhydrophobic residues. As expected, the bulky aliphatic residues
register relatively high φ-density, emphasizing β-buried and α-buried
states. The φ-density values have Leu (β-bu ≈ 4.7, α-bu ≈ 4.6), Met
(β-bu ≈ 4.2, α-bu ≈ 4.4), Ile (β-bu ≈ 4.6, α-bu ≈ 4.4), Val (β-bu ≈
4.0, α-bu ≈ 3.8). In these cases, the φ-density in c-buried states on
average is reduced by ≈1 residue count. The lowest φ-density (range
0.20–0.99) occurs in the c-exposed state for all amino acid types.
Under α-buried or β-buried conditions, Thr > Ser (range 2.49–3.06 for
Thr, range 1.77–2.14 for Ser).

Aromatic (Ar) density (Fig. 2). The Ar-density is the average count of
aromatic residues {Phe, Trp, Tyr} within 5 Å of the reference amino
acid. Sa levels strongly influence Ar-density values, showing a linear
relationship paralleling the regular density distribution across the
SC density values. This could be interpreted as a lack of specificity
with a high affinity by aromatic residues for all amino acids. The
over-representation of aromatics about aromatics reflects to some
extent stacking interactions. Arg and Lys for the β-buried state show
Ar-density values about 1.4. This may reflect cationic-aromatic and
H-bonding interactions of positively charged residues with aromatic
residues. Moreover, Arg > Lys in the Ar-density at all Sa levels. Leu
and Met in the buried state entail relatively high Ar-density values
(1.3–1.6).

Association Indices.

Positive-charge (+) index. As expected, Asp and Glu are
over-represented in all exposed and partly buried states (Table 2).
I(+||Asp, SC) values reach 2.35 in the α-exposed state whereas
I(+||Glu, SC) is highest in the β-exposed state. The only other
significantly over-represented amino acids are Trp (1.6) in the
β-exposed condition and Asn (1.5) when α-exposed (data not shown).
Under-representations are manifest for the major and minor hydrophobic
residues {Leu, Met, Ile, Val, Phe, Trp, Cys, Ala} at the buried Sa
level independent of the Ss state.

Negative-charge (−) index. The over-represented residues highlight Arg
and Lys in the α-exposed state, with index values 2.0 and 2.15,
respectively. Moreover, for all Sa levels, Arg and Lys obey the
inequalities I(−||Arg) < I(−||Lys). His carries the high index values
1.7 and 1.6 for β-exposed and c-exposed conditions, respectively.
Under-representations of (−) residues occur for all hydrophobics in
the buried state. Ala and Pro in the helix-exposed state show
significantly high (−) index values about 1.5.

Mixed-charge (±) index. All charged residues and His carry index (±)
values in the normal range about 0.8 to 1.4. The same applies to the
polar residues Ser, Thr, Asn, and Gln. Aliphatic residues register the
lowest index values under buried conditions.

Polar (p) index. There are no significant over- or
under-representations for polar residues. Charge, His, and amide
residues marginally favor polar neighbors. Aliphatic and aromatic
residues tend to disfavor polar neighbor residues.

Hydrophobic (φ) index. The greatest affinity for major hydrophobic
residues occurs for hydrophobic residues (I values about 2) in the
β-buried conditions (range 1.5–2.0). Ala shows the same extent of
overrepresentation. The lowest index calculations result for charge
residues (range 0.76–0.96) and the polar residues Ser, Asn
(0.95, 0.86).

Aromatic (Ar) index. Strikingly, under most Sa and Ss conditions, the
index values for aromatic residues exceed 1, and many are
significantly high, ≥1.4, in the buried and some in the partly buried
state. This is consistent with the finding that most residues have Tyr
over-represented as a nearest neighbor and similarly but to a lesser
extent for Trp (1). The small residues show strong attraction on
average for aromatics I(Ar||Gly) = 1.64, I(Ar||Pro) = 1.67, I(Ar||Ala)
= 1.65, and I(Ar||Cys) = 1.62 (Table 2).

Over- and under-representations. Some residues show unequivocal
affinity for specific residue types. These include charge and
hydrophobic. The other residues tend to be neutral or nonspecific in
neighbor preferences: {His, Ser, Thr, Asn, Gln, Gly, Cys, Pro} (Table
2). Lys favors negatively charged neighbors more than Arg. Glu
attracts marginally more positively charged residues than Asp except
when buried. In buried states, Ala and to some extent Cys behave like
hydrophobic residues. In the exposed state, the index values for {Leu,
Met, Ile, Val, Phe, Trp} and Tyr for all SCs are relatively tight
(Table 2). This means that in the exposed state these residues
associate nonspecifically with other residues. His attracts residues
of negative charge more than residues of positive charge. This also
applies to Ser and Thr and of course to Arg and Lys. For {Asp, Glu,
Arg, Lys, His, Ser, Thr, Asn, Cys} and Ala, hydrophobic neighbors are
under-represented.

Index of Unconditional Amino Acids. A strong attraction between
residues of opposite charge is manifest. In fact, the index value is
significantly high, almost 2, compared with the random range of ≈0.6
to 1.4. There is an index asymmetry between Arg and Lys, I(−||Lys) =
1.78 versus I(−||Arg) = 1.62, which indicates that overall Lys more
than Arg attracts acidic residues in its 5-Å neighborhood. Glu has a
slightly greater affinity for (+) charge residues than Asp. His is
significantly over-represented with neighboring aromatic residues
(I(Ar||His) = 1.59). Moderately low index values for residues of the
same charge sign are undoubtedly attributable to electrostatic
repulsion. Hydrophobic residues have a varied index of
over-representations reflected in their high hydrophobic and aromatic
indices, with under-representation of charged residues. These
estimates derive mainly from the fact that charged residues are
predominantly at the protein surface whereas hydrophobic residues tend
to be buried. The index is always lower for (−) charged residues than
for (+) charged residues, probably because of the property that
cationic residues have longer side-chains whose aliphatic parts tend
to be buried, unlike those of anionic residues.

D_m Index Measure. Independent of the SC, the range of the D_m index
of the different density types {±, +, −, Ar, p, φ} is persistently
smaller than the corresponding d_m range. A high correlation between
the d_m and the D_m index values for each amino acid except Ser, Asn,
Gln indicates that the d_m and the D_m index values are effectively
linearly related (data not shown). We interpret this to mean that
interactions involving backbone atoms have a minor influence on
residue associations whereas side-chain interactions have a major
influence. The d_m and the D_m index values are the most correlated
when amino acids are in the buried state.

Discussion

 General observations and implications of density measures. (i) The
Sa levels influence decisively values of the Reg-density and of the
Ar-density but only marginally influence the hydrophobic and polar
densities and not coherently the three charge densities. (ii) Density
values are high in the β-buried state, consistent with the property
that β-strands often occupy buried locations. (iii) The coil-exposed
SC registers the lowest density assessments of all types. This surely
reflects the preponderant surface locations and flexible
disposition of coil residues in protein structures. (iv) Density
inequalities are definite between Arg and Lys (Arg favored) for the
(−) density and between Glu and Asp (Glu favored) for the (+) density.
(v) For multimeric proteins, the interface density shows electrostatic
more than hydrophobic interactions (data not shown). (vi) The aromatic
density distributions for all of the amino acids in all of the SCs
parallel the Reg-density distributions. Actually, the aromatic density
performs as if the aromatic residues constitute a random sample of all
residues. Equivalently, most amino acids tend to associate with
aromatic residues in a nonspecific manner. (vii) The mixed-charge
index is about the same for all charge amino acids at favorable levels
but not significantly high, 1.25–1.4.

 Linearity of the Reg-density values with respect to Sa levels.
Linearity applies for each amino acid and all proximal distance
thresholds (5 Å, 10 Å). The slopes are correlated with the total
side-chain amino acid surface area (see Table 3 and Fig. 3 in the
supplemental material). What can account for the linear relationships?
Perhaps, on average, for any amino acid, a neighbor residue removes
the same amount of Sa surface area, or, equivalently, the amount of
surface buried and number of residue neighbors are highly correlated.

 Asymmetries in index values. Asymmetry in the index assessments
shows I(−||Lys) = 1.78 compared with I(−||Arg) = 1.62 such that, on
average, the 5-Å neighborhood of Lys contains more acidic residues
than are present among neighbors of Arg. This inequality seems at
variance with the finding in ref. 1 affirming that the nearest
neighbor (d_m distance) of Glu and Asp is pervasively tuned to Arg
more than to Lys. This might be accounted for by the following facts.
First, the Arg charge is delocalized over the guanidinium group versus
a localized charge for Lys; Arg carries a pKa about 11–13 dominating
the Lys pKa of 9–10. Moreover, the pKa of a Lys residue can be
suppressed depending on environmental influences. These considerations
would accommodate salt-bridge interactions with acidic nearest
neighbor residues in the protein interior more forcefully for Arg than
with Lys. Second, Arg is relatively more buried than Lys, neutralized
by salt-bridges or hydrogen bonds in the protein interior, whereas the
Lys side-chain tends to be surface exposed with a variable nearest
neighbor ambience. Along these lines, the protein surface generally
carries many acidic residues that can contribute to the 5-Å
neighborhood about Lys residues.

 Over-representation of aromatic residues. The index analysis shows
that aromatic residues, especially Trp and Tyr plus His, play a
distinctive role in protein structures. These combined residues show
high affinity in associating with most other amino acids. On this
basis, we propose that these aromatic residues participate in early
events of protein folding as nucleation sites (29, 30), preceding the
formation of a molten globule structure. Aromatics and especially Tyr
can sequester a microenvironment that allows interactions of both
hydrophobic (through its aromatic plane) and polar character (through
its hydroxyl group for Tyr). Trp also engages hydrogen bonding
potential through its imino nitrogen. All aromatic residues project
π-electron clouds with a hydrogen atom periphery capable of generating
electrostatic attractions with aromatic–aromatic, cation–aromatic, and
anion–aromatic interactions (29).

Nearest-neighbor interactions (d_m distance) among aromatic residues
emphasize distal primary sequence (primary sequence positions ≥5
apart) residues [e.g., Phe and Tyr have nearest neighbors 13.5%
proximal (primary sequence positions ≤4 apart), 75.5% distal],
suggesting a role of the aromatic rings in connecting distinct
secondary structure elements (31). Aromatic rings contribute to
hydrophobic interactions, but they also favorably interact with
solvent and polar residues (23). Individual secondary structures are
established by patterns of backbone hydrogen bonds and to some extent
are assisted by specific polar or ionic side-chain interactions. An
optimal hydrophobic packing of secondary structure elements might by
itself be sufficient to determine the native state conformation of the
protein, as in the jigsaw puzzle model (32). Alternatively, the
folding process may involve first formation of the hydrophobic core of
a flexible molten globule. Subsequently, other interactions, based on
extended polar Tyr, Trp, His, or ionic (Arg) residues, may help orient
the molten globule and maneuver various secondary structure groups
until a favorable (native) conformation is attained.

In view of the pervasive overrepresentation (d_m distance) of
side-chain interactions with aromatic residues, especially tyrosine
(Tyr), by almost all residue types and the significant
over-representations of tryptophan (Trp) and histidine (His) relative
to many residue types, the residues Tyr, Trp, and His might perform as
dynamic initiation and early intermediate foci of the protein fold.
How does our analysis on density distributions conform with this
hypothesis? The statistics that neighbors involving Tyr, Trp, Phe, and
His show equivalent counts among proximal and the many distal
interactions are consistent with their capacity for versatile
hydrophobic and hydrophilic interactions (1, 31). The predominance of
D_m distance nearest-neighbors for proximal positions among
hydrophobic pairings seems to reflect the early formation of
backbone–backbone hydrogen bonds in individual secondary structures.
The relatively low frequency of side-chain nearest-neighbor linear
proximal interactions and the preponderance of distal side-chain
interactions among hydrophobic (core) residues, and among Tyr, Trp,
His, and Arg residues, underscore the role of these side-chains in
inter-secondary-structure packing.
