Sequence Complexity of Disordered Protein
- Corollary to the view that amino acid sequence determines protein 3D structure: structure is a prerequisite to function, by mechanisms such as lock-and-key or induced fit
1 Statistics (pg 39)
1.1 Shannon's Entropy
Used as a measure of sequence complexity
<math>K_2 = -\sum_{i=1}^{N} \frac{n_i}{L} \log_2\left(\frac{n_i}{L}\right) = -\sum_{i=1}^{N} f_i \log_2 f_i</math>
- <math>N</math> = number of letters in the alphabet, i.e. 20 amino acids
- <math>L</math> = length of the window
- <math>n_i</math> = number of times letter <math>i</math> appears in the window
- <math>f_i = n_i / L</math> = fraction of amino acid <math>i</math> over the window
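As an illustration of the K_2 definition, a minimal Python sketch follows. The function names `shannon_entropy` and `sliding_entropy` are hypothetical, not taken from the paper, and the window length is left to the caller. Letters that do not occur in a window are skipped, on the usual convention that 0 log 0 = 0.

```python
import math
from collections import Counter
from typing import List


def shannon_entropy(window: str) -> float:
    """K_2 for one window: -sum_i f_i * log2(f_i), with f_i = n_i / L."""
    length = len(window)  # L, the window length
    if length == 0:
        raise ValueError("window must not be empty")
    counts = Counter(window)  # n_i for every letter that occurs
    # Letters with n_i = 0 are absent from counts and contribute nothing.
    return -sum((n / length) * math.log2(n / length) for n in counts.values())


def sliding_entropy(sequence: str, window_length: int) -> List[float]:
    """K_2 for every full window of the given length along the sequence."""
    if window_length <= 0:
        raise ValueError("window_length must be positive")
    last_start = len(sequence) - window_length
    return [
        shannon_entropy(sequence[start:start + window_length])
        for start in range(last_start + 1)
    ]
```

A window made of a single repeated residue gives the minimum value of 0, and a window in which all 20 amino acids are equally frequent gives the maximum, log2(20), roughly 4.32 bits.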
1.2 Mole fractions for amino acids
<math>P_j = \frac{\sum_i n_i P_{ji}}{\sum_i n_i}</math>
- <math>P_{ji}</math> = frequency of amino acid <math>j</math> in sequence <math>i</math> of length <math>n_i</math>
- the summations run over all sequences <math>i</math> in a given database
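The length-weighted average can be sketched in Python as below. This is an illustration only: the name `mole_fractions` and the 20-letter `AMINO_ACIDS` alphabet string are assumptions, and each n_i is taken to be the full sequence length, so any non-standard letters would count toward the length.

```python
from collections import Counter
from typing import Dict, Iterable

# Assumed alphabet: one-letter codes of the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def mole_fractions(sequences: Iterable[str]) -> Dict[str, float]:
    """P_j = sum_i(n_i * P_ji) / sum_i(n_i) for every amino acid j."""
    sequences = [seq for seq in sequences if seq]  # drop empty sequences
    total_length = sum(len(seq) for seq in sequences)  # sum_i n_i
    if total_length == 0:
        raise ValueError("need at least one non-empty sequence")
    weighted = {aa: 0.0 for aa in AMINO_ACIDS}
    for seq in sequences:
        n_i = len(seq)
        counts = Counter(seq)
        for aa in AMINO_ACIDS:
            p_ji = counts[aa] / n_i      # frequency of j in sequence i
            weighted[aa] += n_i * p_ji   # weight by sequence length
    return {aa: w / total_length for aa, w in weighted.items()}
```

Because each frequency is weighted by its sequence length, the result equals the overall residue count for each amino acid divided by the total number of residues in the database.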
1.3 Variation of amino acids
Variance of the mole fraction, with each per-sequence variance taken as binomial:
<math>\mathrm{Var}(P_j) = \frac{\sum_i n_i^2 \, \mathrm{Var}(P_{ji})}{\left(\sum_i n_i\right)^2}</math>
<math>\mathrm{Var}(P_{ji}) = \frac{P_{ji} (1 - P_{ji})}{n_i}</math>
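A short Python sketch of the two variance formulas follows. It assumes that the first formula gives the variance of the mole fraction P_j, and that the per-sequence variance is the binomial form P_ji (1 - P_ji) / n_i. The function name `mole_fraction_variance` is hypothetical.

```python
from typing import Iterable


def mole_fraction_variance(sequences: Iterable[str], residue: str) -> float:
    """Var(P_j) = sum_i(n_i^2 * Var(P_ji)) / (sum_i n_i)^2."""
    sequences = [seq for seq in sequences if seq]  # drop empty sequences
    total_length = sum(len(seq) for seq in sequences)
    if total_length == 0:
        raise ValueError("need at least one non-empty sequence")
    numerator = 0.0
    for seq in sequences:
        n_i = len(seq)
        p_ji = seq.count(residue) / n_i
        var_ji = p_ji * (1.0 - p_ji) / n_i  # assumed binomial variance
        numerator += n_i ** 2 * var_ji
    return numerator / total_length ** 2
```

For example, the two sequences "AC" and "AACC" both have P_ji = 0.5 for alanine, and the sketch returns 0.25 / 6, the same value a single pooled sequence of six residues would give.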
Fractional difference in composition between two sets <math>a</math> and <math>b</math>:
<math>\frac{P_j^a - P_j^b}{P_j^b}</math>
Variance for these ratios:
<math>\mathrm{Var}\left(\frac{P_j^a - P_j^b}{P_j^b}\right) = \left(\frac{P_j^a}{P_j^b}\right)^2 \left\{ \frac{\mathrm{Var}(P_j^a)}{(P_j^a)^2} + \frac{\mathrm{Var}(P_j^b)}{(P_j^b)^2} \right\}</math>
- <math>P_j^a</math> is the mole fraction of amino acid <math>j</math> for database <math>a</math>
- <math>\mathrm{Var}(P_j^a)</math> is the variance of the mole fraction of amino acid <math>j</math> for database <math>a</math>
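The fractional difference and its variance can be sketched in Python as follows. The sketch assumes the standard first-order error-propagation form for a ratio, in which the bracketed terms are Var(P_j^a) / (P_j^a)^2 and Var(P_j^b) / (P_j^b)^2. Both function names are hypothetical.

```python
def fractional_difference(p_a: float, p_b: float) -> float:
    """(P_j^a - P_j^b) / P_j^b for one amino acid j."""
    if p_b <= 0:
        raise ValueError("reference mole fraction p_b must be positive")
    return (p_a - p_b) / p_b


def fractional_difference_variance(
    p_a: float, p_b: float, var_a: float, var_b: float
) -> float:
    """(P^a / P^b)^2 * (Var(P^a) / (P^a)^2 + Var(P^b) / (P^b)^2).

    Assumed first-order propagation of error for the ratio P^a / P^b;
    subtracting the constant 1 does not change the variance.
    """
    if p_a <= 0 or p_b <= 0:
        raise ValueError("both mole fractions must be positive")
    ratio = p_a / p_b
    return ratio ** 2 * (var_a / p_a ** 2 + var_b / p_b ** 2)
```

A positive fractional difference means amino acid j is enriched in set a relative to set b, and a negative one means it is depleted. The square root of the variance gives an approximate error bar for that difference.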
2 Discussion
- Globular ordered proteins need both polar and nonpolar amino acids to define outside surfaces and core regions (pg 44)
- Disordered proteins have a significant fraction of low complexity sequences compared to structured proteins (pg 44), but even very high complexity sequences can be disordered (pg 45)
- The disorder prediction score depends on attributes associated with the depletion of most order-promoting amino acids and the enrichment of disorder-promoting ones; because stronger depletions and enrichments lower complexity, sequence complexity goes down as prediction scores go up
- A possible explanation for non-fibrous, low-complexity segments is that they were selected over evolutionary time specifically for being disordered (pg 46)
- Structural genomics projects will not succeed until disordered proteins are taken into account
| Date published | 1 January 2001 |
| Has author | P. Romero, Z. Obradovic, X. Li, E. C. Garner, C. J. Brown, and A. K. Dunker |
| Paper topic | Disordered proteins |
| PubMed ID | 11093259 |
| Published in | Proteins: Structure, Function, and Genetics |
| Title | Sequence Complexity of Disordered Protein |