The protein non-folding problem: amino acid determinants of intrinsic order and disorder

1 Extended Central Dogma of Molecular Biology and Intrinsic Disorder

  • DNA Sequence -> RNA Sequence -> AA Sequence -> 3D Structure -> Function
  • Protein folding problem: AA Sequence -> 3D Structure
  • Protein structure/function paradigm: AA Sequence -> 3D Structure -> Function
  • Protein non-folding problem: prediction of intrinsic disorder and order from AA sequence

2 Materials and Methods

  • Derivative database built using sequence homology: BLAST searches to find related sequences, followed by ClustalW to align them
  • Putative disordered regions identified by homology to known regions of disorder
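One way a known disordered region could be carried over to a homolog is sketched below. Given a pairwise alignment between a reference protein with an annotated disordered region and a homolog (for example, taken from ClustalW output), the region is mapped onto the homolog's residue numbering. This is a minimal sketch under assumptions the source does not state: the alignment is supplied as two equal-length gapped strings, regions use 1-based inclusive numbering, and the function name `transfer_disorder` and the toy sequences are hypothetical.

```python
def transfer_disorder(ref_aln, hom_aln, ref_start, ref_end):
    """Hypothetical helper: map the reference region [ref_start, ref_end]
    (1-based, inclusive, ungapped numbering) onto the homolog through a
    pairwise alignment given as two equal-length strings ('-' = gap).
    Returns (start, end) in homolog numbering, or None if no homolog
    residue is aligned to any residue of the region."""
    if len(ref_aln) != len(hom_aln):
        raise ValueError("aligned sequences must have equal length")
    ref_pos = hom_pos = 0
    mapped = []
    for r, h in zip(ref_aln, hom_aln):
        # Advance the ungapped residue counters for each sequence
        if r != "-":
            ref_pos += 1
        if h != "-":
            hom_pos += 1
        # Keep homolog residues aligned to a reference residue inside the region
        if r != "-" and h != "-" and ref_start <= ref_pos <= ref_end:
            mapped.append(hom_pos)
    if not mapped:
        return None
    return min(mapped), max(mapped)


# Toy alignment: reference residues 4-6 (A, Y, I) align to homolog residues 5-6
print(transfer_disorder("MKT-AYIAKQR", "MKSGAY--KQR", 4, 6))  # (5, 6)
```

Reporting only the span between the first and last aligned homolog residue is a simplification. A real pipeline might also require a minimum fraction of the region to be aligned before accepting the transfer.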

To compare the amino acid composition of the disordered database $a$ with that of the ordered database $b$, the following ratio is computed for each amino acid $j$:

$$\frac{M_j^a - M_j^b}{M_j^b}$$


  • $M_j^a$ -- the mole fraction of amino acid $j$ in the disordered database $a$
  • $M_j^b$ -- the mole fraction of amino acid $j$ in the ordered database $b$
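The composition comparison can be illustrated with a short Python sketch. It assumes, since the source does not specify, that each mole fraction is computed by pooling all residues of a database. The toy sequences and the function names `mole_fractions` and `fractional_difference` are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def mole_fractions(sequences):
    """Mole fraction of each standard amino acid, pooling all residues
    (assumption); non-standard characters are ignored."""
    counts = Counter()
    for seq in sequences:
        counts.update(aa for aa in seq if aa in AMINO_ACIDS)
    total = sum(counts.values())
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def fractional_difference(disordered, ordered):
    """(M_j^a - M_j^b) / M_j^b for each amino acid j, where a is the
    disordered set and b the ordered set; amino acids absent from the
    ordered set are skipped to avoid dividing by zero."""
    m_a = mole_fractions(disordered)
    m_b = mole_fractions(ordered)
    return {aa: (m_a[aa] - m_b[aa]) / m_b[aa] for aa in AMINO_ACIDS if m_b[aa] > 0}


# Toy data: S is twice as common in the disordered set, W is absent from it
print(fractional_difference(["SSPE"], ["SWWW"]))  # {'S': 1.0, 'W': -1.0}
```

A positive value means the amino acid is enriched in the disordered set relative to the ordered set. A negative value means it is depleted.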


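Variance of these ratios (first-order error propagation, treating $M_j^a$ and $M_j^b$ as independent):

$$\operatorname{Var}\left(\frac{M_j^a - M_j^b}{M_j^b}\right) \approx \left(\frac{M_j^a}{M_j^b}\right)^2 \left[\frac{\operatorname{Var}(M_j^a)}{(M_j^a)^2} + \frac{\operatorname{Var}(M_j^b)}{(M_j^b)^2}\right]$$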


  • $\operatorname{Var}(M_j^a)$ -- variance of the mole fraction of amino acid $j$ in database $a$
  • $\operatorname{Var}(M_j^b)$ -- variance of the mole fraction of amino acid $j$ in database $b$
  • Standard deviation = square root of the variance
  • 265 amino acid property scales (e.g., hydropathy, polarity, volume) were compiled; many are highly correlated with one another
  • Plots of the conditional probabilities of order and disorder versus property values were constructed, and the properties were then ranked by the relative degree of separation between the two probability curves using the area ratio method (described in other papers)
  • From each set of properties with a pairwise correlation coefficient ≥ 0.9, only the highest-ranking property was kept
  • Amino acids arranged by flexibility index (tendency of an amino acid to be buried or exposed): (buried) W C F I Y V L H M A T R G Q S N P D E K (exposed)
  • Disordered regions are depleted in W, C, F, I, Y, V, L, and N and enriched in A, R, G, Q, S, P, E, and K; most of these differences exceed 3 standard deviations relative to ordered proteins
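The variance formula and the 3-standard-deviation criterion can be sketched in Python. The source does not say how $\operatorname{Var}(M_j^a)$ and $\operatorname{Var}(M_j^b)$ are estimated, so this sketch takes them as inputs. The numbers in the example are invented, and `ratio_with_sd` and `exceeds_n_sd` are hypothetical names. Because $(M_j^a - M_j^b)/M_j^b = M_j^a/M_j^b - 1$ and subtracting a constant leaves a variance unchanged, the formula is the standard first-order variance of the quotient $M_j^a/M_j^b$.

```python
import math


def ratio_with_sd(m_a, m_b, var_a, var_b):
    """Return ((m_a - m_b) / m_b, standard deviation of that ratio) using
    first-order error propagation with independent m_a and m_b.
    Both mole fractions must be nonzero."""
    ratio = (m_a - m_b) / m_b
    var_ratio = (m_a / m_b) ** 2 * (var_a / m_a ** 2 + var_b / m_b ** 2)
    return ratio, math.sqrt(var_ratio)


def exceeds_n_sd(m_a, m_b, var_a, var_b, n_sd=3.0):
    """True if the ratio lies more than n_sd standard deviations from 0,
    i.e. from equal composition in the two databases."""
    ratio, sd = ratio_with_sd(m_a, m_b, var_a, var_b)
    return abs(ratio) > n_sd * sd


# Invented example: an amino acid at half its ordered-set mole fraction
ratio, sd = ratio_with_sd(0.02, 0.04, 1e-6, 4e-6)
print(round(ratio, 3), round(sd, 4))  # -0.5 0.0354
print(exceeds_n_sd(0.02, 0.04, 1e-6, 4e-6))  # True
```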
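The correlation filter, which keeps only the highest-ranking member of any group of properties correlated at $r \ge 0.9$, can be sketched as a greedy pass down the ranked list. This minimal Python sketch rests on several assumptions. Each property scale is a list of per-amino-acid values in a fixed order. The ranking, which comes from the area ratio method and is not reproduced here, is supplied as input. The toy scales and the names `pearson` and `prune_correlated` are hypothetical. The comparison uses $r$ as written in the text; using $|r|$ instead would also discard strongly anti-correlated scales. The optional `max_keep` stops after a fixed number of properties, for example 10.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("constant scale has undefined correlation")
    return sxy / math.sqrt(sxx * syy)


def prune_correlated(scales, ranked_names, cutoff=0.9, max_keep=None):
    """Walk properties from best to worst rank; keep one only if its
    correlation with every already-kept property is below the cutoff."""
    kept = []
    for name in ranked_names:
        if all(pearson(scales[name], scales[k]) < cutoff for k in kept):
            kept.append(name)
            if max_keep is not None and len(kept) == max_keep:
                break
    return kept


# Toy scales over four amino acids (real scales would have 20 values)
TOY_SCALES = {
    "p1": [1, 2, 3, 4],
    "p2": [2, 4, 6, 8.5],  # nearly collinear with p1, so it is dropped
    "p3": [4, 1, 3, 2],    # r = -0.4 with p1, so it is kept
}
print(prune_correlated(TOY_SCALES, ["p1", "p2", "p3"]))  # ['p1', 'p3']
```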

3 Discussion

  • Homology increases the amount of disordered data. One objection is that homologous sequences are correlated, so adding sequences this way does not effectively increase information content. However, disordered regions in many proteins show significantly less sequence similarity than ordered regions (work in progress), so the approach may still increase information content effectively
  • Ordered proteins contain a higher proportion of amino acids that tend to be buried; disordered proteins contain a higher proportion of amino acids that tend to be exposed on the surface of ordered proteins
  • Ordered set: 45% of amino acids from W C F I Y V L H M A, 55% from T R G Q S N P D E K
  • Disordered set dis_ALL: 34% from W C F I Y V L H M A, 66% from T R G Q S N P D E K
  • Disordered set dis_ALL: 37% from W C F I Y V L H M A, 63% from T R G Q S N P D E K
  • Thus, the balance between order-promoting and disorder-promoting amino acids correlates with whether a protein is classified as ordered or disordered
  • Contrary to expectation, disordered segments are not enriched in T, N, and D, even though all three lie on the exposed side of the flexibility ordering. This anomaly is speculated to arise from the hydrogen-bond-forming capability of their polar β-carbon branches, which could lower the configurational entropy of the backbone in the disordered state and so reduce their disorder-promoting effect
  • Goal: 10 attributes that rank in the top 15% for discriminating between order and disorder while being as weakly correlated with one another as possible (the correlation cutoff ends up being 0.9)
  1. 14 Å Contact Number
  2. Optimal matching hydrophobicity
  3. Beta sheet propensity
  4. HPLC Hydrophobicity
  5. Hydrophobic parameter π
  6. Fraction of site occupied by water
  7. Information measure for pleated sheet
  8. Partition free energy
  9. Coordination number
  10. Free-energy beta-strand conformation
  • "Packing capacity" ranks first in discriminating order from disorder for these data: 14 Å contact number and coordination number both relate to the number of side chains found close to a given side chain in a set of proteins of known structure
  • Four of the top ten properties are associated with hydrophobicity, one with polarity, and three with the propensity of amino acids to form β-strands
  • There may be biological selection against disordered regions with a high propensity to form β-sheets
  • Accuracies seem to improve when predictions are carried out flavor-by-flavor, i.e., separately for each subtype ("flavor") of disorder
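The group percentages quoted for the ordered and disordered sets (e.g., 45% versus 55% for the ordered set) split residue counts at the boundary between A and T in the flexibility ordering. The Python sketch below makes a few assumptions. It uses the two groups exactly as listed, W C F I Y V L H M A and T R G Q S N P D E K. Residues are pooled across all sequences, and any other characters are ignored. The toy sequence and the name `group_percentages` are hypothetical.

```python
# Two groups taken from the flexibility-index ordering given in the text
BURIED_SIDE = set("WCFIYVLHMA")
EXPOSED_SIDE = set("TRGQSNPDEK")


def group_percentages(sequences):
    """Return (% of residues from W C F I Y V L H M A,
    % of residues from T R G Q S N P D E K), pooling all residues;
    characters outside both groups are ignored."""
    buried = exposed = 0
    for seq in sequences:
        for aa in seq.upper():
            if aa in BURIED_SIDE:
                buried += 1
            elif aa in EXPOSED_SIDE:
                exposed += 1
    total = buried + exposed
    if total == 0:
        raise ValueError("no recognized amino acids")
    return 100 * buried / total, 100 * exposed / total


print(group_percentages(["WWAK"]))  # (75.0, 25.0)
```

The text reports only that this balance correlates with classification. Turning it into an order/disorder classifier would require a cutoff fitted to data, and the text does not give one.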