Sequence Complexity of Disordered Protein

  • Corollary to the principle that amino acid sequence determines protein 3D structure: structure is a prerequisite for function, via mechanisms such as lock-and-key or induced fit


1 Statistics (pg 39)

1.1 Shannon's Entropy

Used as a measure of sequence complexity

$$K_2 = -\sum_{i=1}^{N} \frac{n_i}{L} \log_2\left(\frac{n_i}{L}\right) = -\sum_{i=1}^{N} f_i \log_2 f_i$$


  • $N$ = number of letters in the alphabet, i.e. 20 for amino acids
  • $L$ = length of the window
  • $n_i$ = number of times letter $i$ appears in the window
  • $f_i = n_i / L$ = fraction of amino acid $i$ over the window
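A minimal sketch of the windowed entropy $K_2 = -\sum_i f_i \log_2 f_i$, assuming a window that slides one residue at a time. The function names `shannon_complexity` and `sliding_complexity` and the toy sequence are hypothetical, chosen only for illustration.

```python
import math
from collections import Counter


def shannon_complexity(window: str) -> float:
    """K2 = -sum_i f_i * log2(f_i), with f_i = n_i / L for the letters in the window."""
    L = len(window)
    if L == 0:
        raise ValueError("window must be non-empty")
    counts = Counter(window)  # n_i for each letter actually present
    # Letters with n_i = 0 are skipped: f * log2(f) -> 0 as f -> 0, so they add nothing.
    return -sum((n / L) * math.log2(n / L) for n in counts.values())


def sliding_complexity(seq: str, L: int) -> list[float]:
    """K2 for every window of length L, moving one residue at a time (hypothetical step size)."""
    if not 0 < L <= len(seq):
        raise ValueError("need 0 < L <= len(seq)")
    return [shannon_complexity(seq[k:k + L]) for k in range(len(seq) - L + 1)]


# Hypothetical toy sequence and window length.
print(sliding_complexity("AAACDE", 4))
```

A window made of one repeated letter scores 0; with the 20-letter amino acid alphabet and $L \ge 20$, the maximum is $\log_2 20 \approx 4.32$ bits, reached when every amino acid is equally frequent.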

1.2 Mole fractions for amino acids

$$P_j = \frac{\sum_i n_i P_{ji}}{\sum_i n_i}$$


  • $P_{ji}$ = frequency of amino acid $j$ in sequence $i$, of length $n_i$
  • The summations run over all sequences $i$ in a given database
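A minimal sketch of the length-weighted mole fraction $P_j = \sum_i n_i P_{ji} / \sum_i n_i$, assuming the database is simply a list of one-letter sequences. The function name `mole_fractions` and the two-sequence "database" are hypothetical. Because $n_i P_{ji}$ is just the count of amino acid $j$ in sequence $i$, the result equals the total count of $j$ divided by the total number of residues.

```python
from collections import Counter


def mole_fractions(sequences: list[str]) -> dict[str, float]:
    """Database-wide P_j = sum_i(n_i * P_ji) / sum_i(n_i)."""
    total = sum(len(s) for s in sequences)  # sum_i n_i
    if total == 0:
        raise ValueError("database contains no residues")
    weighted = Counter()
    for s in sequences:
        n_i = len(s)
        for aa, count in Counter(s).items():
            p_ji = count / n_i          # frequency of amino acid j in sequence i
            weighted[aa] += n_i * p_ji  # weight each frequency by sequence length n_i
    return {aa: w / total for aa, w in weighted.items()}


# Hypothetical two-sequence database.
print(mole_fractions(["AAC", "A"]))
```

By hand, $P_A = (3 \cdot 2/3 + 1 \cdot 1)/4 = 0.75$ and $P_C = (3 \cdot 1/3)/4 = 0.25$.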

1.3 Variation of amino acids

$$\mathrm{Var}(P_j) = \frac{\sum_i n_i^2 \, \mathrm{Var}(P_{ji})}{\left(\sum_i n_i\right)^2}$$

$$\mathrm{Var}(P_{ji}) = \frac{P_{ji}(1 - P_{ji})}{n_i}$$

The second expression is the binomial sampling variance of a frequency estimated from the $n_i$ residues of sequence $i$.
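A minimal sketch of the pooled variance $\mathrm{Var}(P_j) = \sum_i n_i^2 \mathrm{Var}(P_{ji}) / (\sum_i n_i)^2$, assuming the binomial estimate $\mathrm{Var}(P_{ji}) = P_{ji}(1 - P_{ji}) / n_i$ for each sequence. The function name `mole_fraction_variance` is hypothetical.

```python
def mole_fraction_variance(sequences: list[str], aa: str) -> float:
    """Var(P_j) = sum_i n_i^2 Var(P_ji) / (sum_i n_i)^2,
    with the binomial estimate Var(P_ji) = P_ji (1 - P_ji) / n_i."""
    total = sum(len(s) for s in sequences)  # sum_i n_i
    if total == 0:
        raise ValueError("database contains no residues")
    acc = 0.0
    for s in sequences:
        n_i = len(s)
        if n_i == 0:
            continue  # an empty sequence carries no weight
        p_ji = s.count(aa) / n_i               # frequency of aa in sequence i
        var_pji = p_ji * (1.0 - p_ji) / n_i    # binomial sampling variance
        acc += n_i ** 2 * var_pji
    return acc / total ** 2


# Hypothetical database: variance of the mole fraction of alanine.
print(mole_fraction_variance(["AC", "AAAA"], "A"))
```

Since $n_i^2 \mathrm{Var}(P_{ji}) = n_i P_{ji}(1 - P_{ji})$, long sequences dominate the numerator, and a sequence made entirely of (or entirely lacking) amino acid $j$ contributes nothing.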


Fractional difference in composition between two sets $a$ and $b$:

$$\frac{P_j^a - P_j^b}{P_j^b}$$


Variance of this ratio (first-order approximation, treating sets $a$ and $b$ as independent):

$$\mathrm{Var}\left(\frac{P_j^a - P_j^b}{P_j^b}\right) \approx \left(\frac{P_j^a}{P_j^b}\right)^2 \left[\frac{\mathrm{Var}(P_j^a)}{(P_j^a)^2} + \frac{\mathrm{Var}(P_j^b)}{(P_j^b)^2}\right]$$


  • $P_j^a$ is the mole fraction of amino acid $j$ for database $a$
  • $\mathrm{Var}(P_j^a)$ is the variance of the mole fraction of amino acid $j$ for database $a$
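A minimal sketch of the fractional difference $(P_j^a - P_j^b)/P_j^b$ with its approximate standard error, assuming the first-order (delta-method) ratio variance $(P_j^a/P_j^b)^2 [\mathrm{Var}(P_j^a)/(P_j^a)^2 + \mathrm{Var}(P_j^b)/(P_j^b)^2]$ for independent sets. The function name `fractional_difference` and the input numbers are hypothetical and do not come from any real database. Subtracting 1 does not change the variance, so the variance of $P_j^a/P_j^b$ carries over directly.

```python
import math


def fractional_difference(p_a: float, var_a: float,
                          p_b: float, var_b: float) -> tuple[float, float]:
    """Return ((P_a - P_b) / P_b, approximate standard error).

    First-order (delta-method) approximation, assuming sets a and b are independent:
    Var ~= (P_a / P_b)^2 * (Var(P_a) / P_a^2 + Var(P_b) / P_b^2).
    Requires P_a > 0 and P_b > 0.
    """
    if p_a <= 0 or p_b <= 0:
        raise ValueError("mole fractions must be positive")
    ratio = p_a / p_b
    diff = ratio - 1.0  # equals (p_a - p_b) / p_b
    var = ratio ** 2 * (var_a / p_a ** 2 + var_b / p_b ** 2)
    return diff, math.sqrt(var)


# Hypothetical inputs for illustration only.
d, se = fractional_difference(0.10, 1e-4, 0.05, 2.5e-5)
print(d, se)
```

Here the ratio is 2, so the bracketed relative variances (0.01 each) are multiplied by 4, giving a variance of 0.08 and a standard error of about 0.28 on a fractional difference of 1.0.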


2 Discussion

  • Globular ordered proteins need both polar and nonpolar amino acids to form their solvent-exposed surfaces and core regions (pg 44)
  • Disordered proteins contain a significantly larger fraction of low-complexity sequence than structured proteins (pg 44), but even very high-complexity sequences can be disordered (pg 45)
  • The disorder prediction score depends on attributes associated with depletion of most order-promoting amino acids and enrichment of disorder-promoting ones. Stronger depletion and enrichment skew the amino acid composition, and a skewed composition has lower complexity, so sequence complexity decreases as the prediction score increases
  • One possible explanation for non-fibrous, low-complexity segments is that they were selected over evolutionary time specifically for being disordered (pg 46)
  • Structural genomics projects will not succeed until they take disordered proteins into account
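The link between composition bias and complexity in the discussion can be checked numerically with $K_2$. This sketch compares a 20-residue window containing every amino acid once with a hypothetical window enriched in a few residues and depleted of the rest; the enriched letters are arbitrary and are not meant to represent a real disorder-promoting set.

```python
import math
from collections import Counter


def k2(window: str) -> float:
    """Shannon entropy K2 (bits) of the letter composition of a window."""
    L = len(window)
    return -sum((n / L) * math.log2(n / L) for n in Counter(window).values())


uniform = "ACDEFGHIKLMNPQRSTVWY"  # each of the 20 amino acids exactly once
skewed = "PPPPPEEEEESSSSKKKKAA"   # hypothetical: 4 residues enriched, 16 depleted to zero

print(round(k2(uniform), 3))  # 4.322 (= log2 20)
print(round(k2(skewed), 3))   # 2.261
```

The biased window scores roughly half the maximum, consistent with the statement that enrichment and depletion lower sequence complexity.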