Hierarchical and multi-criteria optimization systematics modeled on RNA selection
Peter Schuster
Institut für Theoretische Chemie und Molekulare Strukturbiologie der Universität Wien
BBAW Studiengruppe: Strukturbildung und Innovation (study group on structure formation and innovation), Berlin, 21-22 November 2003
[Figure: chemical structure of an RNA strand from the 5'-end to the 3'-end (ribose-phosphate backbone with Na counterions), nucleobases N_k = A, U, G, C; cloverleaf secondary structure with positions numbered 10 to 70]
Definition of RNA structure
[Figure: sequence, secondary structure, and symbolic notation of the same RNA molecule, each drawn from the 5'-end to the 3'-end]
Sequence: GCGGAUUUAGCUCAGDDGGGAGAGCMCCAGACUGAAYAUCUGGAGMUCCUGUGTPCGAUCCACAGAAUUCGCACCA
A symbolic notation of RNA secondary structure that is equivalent to the conventional graphs
[Figure: free energy G of the conformations of one sequence, with the minimum free energy structure S0 and the suboptimal conformations S1 to S9]
The minimum free energy structures on a discrete space of conformations
[Figure: free energy levels of the structures S0 to S10]
Minimum free energy structure (S0): T = 0 K, t → ∞
Suboptimal structures: T > 0 K, t → ∞
Kinetic structures: T > 0 K, t finite
Different notions of RNA structure including suboptimal conformations
[Figure: free energy G along a "reaction coordinate" from conformation S_k over the saddle point T_kl to conformation S_l, and the same pair of conformations drawn as a "barrier tree"]
Definition of a 'barrier tree'
[Figure: barrier tree of a sequence with two basins around S0 and S1 (leaves numbered 2 to 49; barrier heights 3.10, 3.30, 5.10, 5.90 and 7.40 marked); kinetic folding over the suboptimal structures S0 to S10 at finite folding time]
A typical energy landscape of a sequence with two (meta)stable conformations
Kinetics of RNA refolding between a long-living metastable conformation and the minimum free energy structure
Minimal hairpin loop size: n_lp ≥ 3
Minimal stack length: n_st ≥ 2
Recursion formula for the number of acceptable RNA secondary structures
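The recursion formula itself did not survive on this slide. As a rough illustration of how such a count works, the following Python sketch implements the simpler, classical recursion that enforces only a minimal hairpin loop size. It is an assumption-laden stand-in: the minimal stack length n_st is not enforced, the function name `count_structures` is hypothetical, and the numbers it produces are therefore upper bounds on the counts the slide refers to.

```python
def count_structures(n: int, min_loop: int = 3) -> int:
    """Count secondary structures on a chain of n nucleotides.

    A structure is a set of non-crossing base pairs in which every hairpin
    loop contains at least `min_loop` unpaired positions. The open chain
    counts as one structure. Base-pairing rules and the minimal stack
    length are NOT taken into account (simplifying assumption).
    """
    # s[m] = number of structures on a chain of length m
    s = [1] * (n + 1)
    for m in range(min_loop + 2, n + 1):
        # Case 1: the last position m is unpaired.
        total = s[m - 1]
        # Case 2: position m pairs with position k, leaving k-1 positions
        # to the left and m-k-1 positions enclosed by the pair.
        for k in range(1, m - min_loop):
            total += s[k - 1] * s[m - k - 1]
        s[m] = total
    return s[n]


if __name__ == "__main__":
    for length in (5, 10, 20, 30):
        print(length, count_structures(length))
```

The quadratic loop is enough for chain lengths of a few hundred nucleotides; Python integers do not overflow, so the rapidly growing counts stay exact.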
Computed numbers of minimum free energy structures over different nucleotide alphabets
P. Schuster, Molecular insights into evolution of phenotypes. In: J. Crutchfield & P. Schuster, Evolutionary Dynamics. Oxford University Press, New York 2003, pp. 163-215.
S_k = ψ(I_j)
f_k = f(S_k)
[Figure: sequence space, structure space, real numbers]
Mapping from sequence space into structure space and into function
[Figure: example sequences and their folded secondary structures, drawn from the 5'-end to the 3'-end]
GGCGCGCCCGGCGCC
GUAUCGAAAUACGUAGCGUAUGGGGAUGCUGGACGGUCCCAUCGGUACUCCA
UGGUUACGCGUUGGGGUAACGAAGAUUCCGAGAGGAGUUUAGUGACUAGAGG
Folding of RNA sequences into secondary structures of minimal free energy, ΔG^0_300
Hamming distance between two structures, e.g. d_H(S_1, S_2) = 4
(i) d_H(S_1, S_1) = 0
(ii) d_H(S_1, S_2) = d_H(S_2, S_1)
(iii) d_H(S_1, S_3) ≤ d_H(S_1, S_2) + d_H(S_2, S_3)
The Hamming distance between structures in parentheses notation forms a metric in structure space
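The three metric properties can be checked numerically on a handful of structures. The following is a minimal Python sketch; the three example structures in parentheses notation are made up for illustration and do not come from the slides.

```python
from itertools import product


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equally long strings differ."""
    if len(a) != len(b):
        raise ValueError("structures must have the same length")
    return sum(x != y for x, y in zip(a, b))


# Hypothetical example structures of chain length 12.
structures = [
    "((((....))))",
    "(((......)))",
    "............",
]

if __name__ == "__main__":
    for s1, s2, s3 in product(structures, repeat=3):
        assert hamming(s1, s1) == 0                                  # (i)
        assert hamming(s1, s2) == hamming(s2, s1)                    # (ii)
        assert hamming(s1, s3) <= hamming(s1, s2) + hamming(s2, s3)  # (iii)
    print("metric properties hold on the sample")
```

A check on a sample is of course no proof; the properties hold in general because the Hamming distance is a metric on strings of equal length over any alphabet, here the three symbols "(", ")" and ".".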
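[Figure: structures evaluated to the replication rate constants f0 to f7]
Replication rate constant: f_k = γ / [α + d_S(k)]
d_S(k) = d_H(S_k, S_τ), the Hamming distance between structure S_k and the target structure S_τ
Evaluation of RNA secondary structures yields replication rate constants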
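The evaluation step can be sketched in a few lines of Python: the replication rate constant falls off with the Hamming distance between a structure and the target structure. The parameter names `gamma` and `alpha`, their default values, and the example structures are assumptions for illustration only.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equally long strings differ."""
    if len(a) != len(b):
        raise ValueError("structures must have the same length")
    return sum(x != y for x, y in zip(a, b))


def replication_rate(structure: str, target: str,
                     gamma: float = 1.0, alpha: float = 0.01) -> float:
    """Rate constant f_k = gamma / (alpha + d_S(k)).

    d_S(k) is the Hamming distance between `structure` and `target`,
    both given in parentheses notation. gamma and alpha are hypothetical
    values; alpha > 0 keeps the rate finite when the target is reached.
    """
    return gamma / (alpha + hamming(structure, target))


if __name__ == "__main__":
    target = "((((....))))"          # made-up target structure
    for s in ("............", "(((......)))", target):
        print(s, replication_rate(s, target))
```

With this form the structure identical to the target replicates fastest, and a small `alpha` makes the last step towards the target the most strongly rewarded one.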
[Figure: flow reactor with stock solution and reaction mixture]
Replication rate constant: f_k = γ / [α + d_S(k)], with d_S(k) = d_H(S_k, S_τ)
Selection constraint: the number of RNA molecules is controlled by the flow, N(t) ≈ N ± √N
The flow reactor as a device for studies of evolution in vitro and in silico
[Figure: concentration over sequence space, showing the master sequence, the mutant cloud, and "off-the-cloud" mutations]
The molecular quasispecies in sequence space
[Figure: cycle of mutation (mutation matrix Q), genotype-phenotype mapping S_l = ψ(I_l), and evaluation of the phenotype f_l = f(S_l), for sequences I_1 to I_n and I_n+1 with rate constants f_1 to f_n and f_n+1]
Evolutionary dynamics including molecular phenotypes
In silico optimization in the flow reactor: Trajectory (biologists' view)
[Figure: evolutionary trajectory; x-axis: time (arbitrary units), 250 to 1250; y-axis: average distance from initial structure, 50 − d_S, 10 to 50]
In silico optimization in the flow reactor: Trajectory (physicists' view)
[Figure: evolutionary trajectory; x-axis: time (arbitrary units), 250 to 1250; y-axis: average structure distance to target d_S, 10 to 50]
Movies of optimization trajectories over the AUGC and the GC alphabet
[Figure: histogram; x-axis: runtime of trajectories, 1000 to 5000; y-axis: frequency, 0.05 to 0.2]
Statistics of the lengths of trajectories from initial structure to target (AUGC-sequences)
[Figure sequence: final part of the evolutionary trajectory with relay steps 36 to 44; x-axis: time; y-axis: average structure distance to target d_S; secondary axis: number of relay step]
Final conformation of the optimization: 44
Reconstruction of the last step: 43 → 44
Reconstruction of the last-but-one step: 42 → 43 (→ 44)
Reconstruction of step 41 → 42 (→ 43 → 44)
Reconstruction of step 40 → 41 (→ 42 → 43 → 44)
Reconstruction of the relay series: evolutionary process and reconstruction, 39 → 40 → 41 → 42 → 43 → 44
[Figure legend: transition-inducing point mutations; neutral point mutations]
Change in RNA sequences during the final five relay steps 39 → 44
In silico optimization in the flow reactor: Trajectory and relay steps
[Figure: evolutionary trajectory with relay steps marked; x-axis: time (arbitrary units), 250 to 1250; y-axis: average structure distance to target d_S, 10 to 50]
[Figure: section of the evolutionary trajectory up to t = 500 with relay steps 08 to 14 and bars marking uninterrupted presence; x-axis: time (arbitrary units); y-axis: average structure distance to target d_S]
28 neutral point mutations during a long quasi-stationary epoch
[Figure legend: transition-inducing point mutations; neutral point mutations]
Neutral genotype evolution during phenotypic stasis
[Figure: section of the evolutionary trajectory between t = 750 and t = 1250 with relay steps 18 to 31 and bars marking uninterrupted presence; x-axis: time (arbitrary units); y-axis: average structure distance to target d_S; secondary axis: number of relay step]
A random sequence of minor or continuous transitions in the relay series
In silico optimization in the flow reactor: Main transitions
[Figure: evolutionary trajectory with relay steps and main transitions marked; x-axis: time (arbitrary units), 250 to 1250; y-axis: average structure distance to target d_S, 10 to 50]
[Figure: structures at relay steps 00, 09, 31 and 44]
Three important steps in the formation of the tRNA clover leaf from a randomly chosen initial structure corresponding to three main transitions.
[Figure: histograms for all transitions and for main transitions; x-axis: number of transitions, 20 to 100; y-axis: frequency, 0.05 to 0.3]
Statistics of the numbers of transitions from initial structure to target (AUGC-sequences)
Alphabet   Runtime   Transitions   Main transitions   No. of runs
AUGC         385.6          22.5               12.6          1017
GUC          448.9          30.5               16.5           611
GC          2188.3          40.0               20.6           107
Statistics of trajectories and relay series (mean values of log-normal distributions)
Variation in genotype space during optimization of phenotypes
Mean Hamming distance within the population and drift velocity of the population center in sequence space.
Spread of population in sequence space during a quasistationary epoch, shown at t = 150, 170, 200, 350, 500, 650, 820, 825, 830, 835, 840, 845, 850 and 855
S_k = ψ(I_j)
f_k = f(S_k)
[Figure: mapping from sequence space into structure space and into the real numbers]
The pre-image of the structure Sk in sequence space is the neutral network Gk
Neutral networks are sets of sequences forming the same structure. G_k is the pre-image of the structure S_k in sequence space:
G_k = ψ⁻¹(S_k) = {I_j | ψ(I_j) = S_k}
The set is converted into a graph by connecting all pairs of sequences of Hamming distance one. Neutral networks of small RNA molecules can be computed by exhaustive folding of complete sequence spaces, i.e. all RNA sequences of a given chain length n. Their number, N = 4^n, grows so fast with chain length that exhaustive folding becomes computationally prohibitive for longer molecules. Neutral networks can instead be modelled by random graphs in sequence space. In this approach, nodes are inserted randomly into sequence space until the size of the pre-image, i.e. the number of neutral sequences, matches the neutral network to be studied.
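The step from a set of neutral sequences to a graph can be sketched as follows. This Python example assumes the pre-image is already known; the four-sequence toy set is made up and is not the result of any folding computation, and the function names are hypothetical.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equally long strings differ."""
    return sum(x != y for x, y in zip(a, b))


def neutral_graph(sequences):
    """Connect all pairs of sequences at Hamming distance one."""
    adjacency = {seq: set() for seq in sequences}
    for a, b in combinations(adjacency, 2):
        if hamming(a, b) == 1:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def components(adjacency):
    """Connected components of the graph, found by depth-first search."""
    seen, result = set(), []
    for start in adjacency:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        result.append(component)
    return result


if __name__ == "__main__":
    # Hypothetical pre-image of some structure (illustration only).
    toy_preimage = ["GGCC", "GGCU", "GGUU", "AACC"]
    for comp in components(neutral_graph(toy_preimage)):
        print(sorted(comp))
```

The pairwise comparison is quadratic in the number of neutral sequences, which is acceptable for a sketch; for large networks one would instead enumerate the one-error mutants of each sequence and look them up in a set.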
Degree of neutrality of a single sequence I_j, e.g. λ_j = 12/27 = 0.444
Mean degree of neutrality of the network: λ̄_k = Σ_{j ∈ G_k} λ_j(k) / |G_k|, where G_k = ψ⁻¹(S_k) = {I_j | ψ(I_j) = S_k}
Connectivity threshold: λ_cr = 1 − κ^(−1/(κ−1)), with alphabet size κ (κ = 4 for AUGC)
λ̄_k > λ_cr: network G_k is connected
λ̄_k < λ_cr: network G_k is not connected

κ   λ_cr    Alphabets
2   0.5     GC, AU
3   0.423   GUC, AUG
4   0.370   AUGC
Mean degree of neutrality and connectivity of neutral networks
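Reading the threshold formula as λ_cr = 1 − κ^(−1/(κ−1)), a reading that reproduces the tabulated values 0.5, 0.423 and 0.370, the connectivity criterion can be evaluated with a short Python sketch. The function names are hypothetical, and the example value 0.313 is used purely to illustrate the comparison.

```python
def connectivity_threshold(kappa: int) -> float:
    """Critical degree of neutrality for an alphabet of size kappa."""
    if kappa < 2:
        raise ValueError("alphabet size must be at least 2")
    return 1.0 - kappa ** (-1.0 / (kappa - 1))


def is_connected(mean_neutrality: float, kappa: int) -> bool:
    """Random-graph criterion: connected if the mean degree of
    neutrality exceeds the threshold for the given alphabet size."""
    return mean_neutrality > connectivity_threshold(kappa)


if __name__ == "__main__":
    for kappa in (2, 3, 4):
        print(kappa, round(connectivity_threshold(kappa), 3))
    # A mean degree of neutrality of 0.313 lies below the AUGC threshold.
    print(is_connected(0.313, 4))
```

The criterion applies to the random graph model of a neutral network; real networks obtained by folding may deviate from it.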
A connected neutral network
Giant Component
A multi-component neutral network
Alphabet   Degree of neutrality (three cloverleaf structures)
AU         -               -               0.073 ± 0.032
AUG        -               0.217 ± 0.051   0.201 ± 0.056
AUGC       0.275 ± 0.064   0.279 ± 0.063   0.313 ± 0.058
UGC        0.263 ± 0.071   0.257 ± 0.070   0.250 ± 0.064
GC         0.052 ± 0.033   0.057 ± 0.034   0.068 ± 0.034
Degree of neutrality of cloverleaf RNA secondary structures over different alphabets
Reference for postulation and in silico verification of neutral networks
[Figure: a structure and several sequences compatible with it, drawn from the 5'-end to the 3'-end]
Base pairs: AU, UA, GC, CG, GU, UG
Single nucleotides: A, U, G, C
[Figure: the neutral network G_k of structure S_k drawn inside the compatible set C_k, G_k ⊂ C_k]
The compatible set C_k of a structure S_k consists of all sequences that form S_k either as their minimum free energy structure (the neutral network G_k) or as one of their suboptimal structures.
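Compatibility in the sense used here can be tested without any energy calculation: every pair of positions joined in the structure must carry one of the six allowed base pairs. The following Python sketch makes that check explicit; the function names and the example sequences are hypothetical.

```python
ALLOWED_PAIRS = {"AU", "UA", "GC", "CG", "GU", "UG"}


def pair_list(structure: str):
    """Base pairs (i, j), i < j, of a structure in parentheses notation."""
    stack, pairs = [], []
    for i, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            pairs.append((stack.pop(), i))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r}")
    if stack:
        raise ValueError("unbalanced structure")
    return pairs


def is_compatible(sequence: str, structure: str) -> bool:
    """True if every pair in the structure is an allowed base pair."""
    if len(sequence) != len(structure):
        return False
    return all(sequence[i] + sequence[j] in ALLOWED_PAIRS
               for i, j in pair_list(structure))


if __name__ == "__main__":
    print(is_compatible("GGGAAACCC", "(((...)))"))
    print(is_compatible("GGGAAACCA", "(((...)))"))
```

A compatible sequence need not fold into the structure: the check says only that the structure is among the conformations the sequence could form, which is exactly the difference between the compatible set and the neutral network.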
[Figure: structure S_0 and structure S_1]
The intersection of two compatible sets is always non-empty: C_0 ∩ C_1 ≠ ∅
Reference for the definition of the intersection and the proof of the intersection theorem
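One way to see why the intersection cannot be empty is constructive. Each position has at most one partner per structure, so the union of both pairing patterns is a graph made of paths and even cycles, which can always be two-coloured. Writing G for one colour and C for the other yields a sequence compatible with both structures. The Python sketch below follows this argument; it is not claimed to be the proof given in the cited reference, and the function names and example structures are hypothetical.

```python
from collections import deque


def pair_list(structure: str):
    """Base pairs (i, j), i < j, of a structure in parentheses notation."""
    stack, pairs = [], []
    for i, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            pairs.append((stack.pop(), i))
    return pairs


def common_sequence(s0: str, s1: str) -> str:
    """Construct one sequence that is compatible with both structures."""
    if len(s0) != len(s1):
        raise ValueError("structures must have the same length")
    n = len(s0)
    # Dependency graph: positions joined by a pair in either structure.
    neighbours = [[] for _ in range(n)]
    for structure in (s0, s1):
        for i, j in pair_list(structure):
            neighbours[i].append(j)
            neighbours[j].append(i)
    # Two-colour every component by breadth-first search.
    colour = [None] * n
    for start in range(n):
        if colour[start] is not None:
            continue
        colour[start] = 0
        queue = deque([start])
        while queue:
            i = queue.popleft()
            for j in neighbours[i]:
                if colour[j] is None:
                    colour[j] = 1 - colour[i]
                    queue.append(j)
    # Colour 0 becomes G, colour 1 becomes C, so every pair is GC or CG.
    return "".join("G" if c == 0 else "C" for c in colour)


if __name__ == "__main__":
    a, b = "(((....)))..", "..(((....)))"
    print(common_sequence(a, b))
```

The result uses only G and C for simplicity; positions that are unpaired in both structures could carry any nucleotide, and GU pairs would enlarge the intersection further.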
[Figure: one sequence drawn in its minimum free energy conformation S_0 and in its suboptimal conformation S_1]
A sequence at the intersection of two neutral networks is compatible with both structures
[Figure: barrier tree with basin '0' around the minimum free energy structure S0 and basin '1' around the long-living metastable structure S1 (leaves numbered 2 to 49; barrier heights 3.10, 3.30, 5.10, 5.90 and 7.40 marked)]
Barrier tree for two long living structures
A ribozyme switch
E. A. Schultes, D. P. Bartel, Science 289 (2000), 448-452
Two ribozymes of chain length n = 88 nucleotides: an artificial ligase (A) and a natural cleavage ribozyme of hepatitis-δ-virus (B)
The sequence at the intersection: an RNA molecule that is 88 nucleotides long and can form both structures
Two neutral walks through sequence space with conservation of structure and catalytic activity
Sequence of mutants from the intersection to both reference ribozymes
Dolomites
Examples of rugged landscapes on Earth
Bryce Canyon
[Figure: fitness over genotype space, with the start and the end of the walk marked]
Evolutionary optimization in absence of neutral paths in sequence space
[Figure: fitness over genotype space, with the start and the end of the walk, random drift periods, and adaptive periods marked]
Evolutionary optimization including neutral paths in sequence space
Grand Canyon
Example of a landscape on Earth with ‘neutral’ ridges and plateaus
Neutral ridges and plateaus
Web-Page for further information: http://www.tbi.univie.ac.at/~pks