Accurate prediction for atomic-level protein design and its application in diversifying the near-optimal sequence space



slide-1
SLIDE 1

1

Accurate prediction for atomic-level protein design and its application in diversifying the near-optimal sequence space

Pablo Gainza
CPS 296: Topics in Computational Structural Biology
Department of Computer Science
Duke University

slide-2
SLIDE 2

2

Outline

1) Problem definition
2) Formulation as an inference problem
3) Graphical Models
4) tBMMF algorithm
5) Results
6) Conclusions

slide-3
SLIDE 3

3

Protein design algorithm

Inputs: protein structure; rotamer library; energy function f(x); positions to design and allowed rotamers/amino acids

www.cs.duke.edu/donaldlab

  • 1. Problem Definition

Output: GMEC $\in S$

slide-4
SLIDE 4

4

$r$: rotamer assignment (RA)

$r_i$: rotamer at position $i$ for RA $r$

[Figure: positions 1 and 2, with rotamer $r_1$ highlighted]

  • 1. Problem Definition (2)
slide-5
SLIDE 5

5

$E(r)$: energy of rotamer assignment $r$

$$E(r) = \sum_i E_i(r_i) + \sum_{i,j} E_{ij}(r_i, r_j)$$

$E_i(r_i)$: energy between rotamer $r_i$ and the fixed backbone

$E_{ij}(r_i, r_j)$: energy between rotamers $r_i$ and $r_j$

[Figure: positions 1 and 2, annotated with $E_i(r_1)$ and $E_{ij}(r_1, r_2)$]

  • 1. Problem Definition (3)
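The energy function above can be sketched in a few lines of Python. This is an illustrative sketch, not code from the presentation: the names `total_energy`, `singleton` and `pairwise` are hypothetical, and the choice to sum over ordered pairs (so each pairwise term is counted once per partner) is an assumption inferred from the worked example later in the slides, where E(r') = -10.

```python
def total_energy(r, singleton, pairwise):
    """E(r) = sum_i E_i(r_i) + sum_{i,j} E_ij(r_i, r_j).

    r         : tuple with one rotamer label per position
    singleton : singleton[i][ri] is E_i(ri)
    pairwise  : pairwise[(i, j)][(ri, rj)] is E_ij(ri, rj), stored once with i < j
    Assumption: the pairwise sum runs over ordered pairs (i, j), i != j,
    so each stored pair energy is added twice.
    """
    n = len(r)
    energy = sum(singleton[i][r[i]] for i in range(n))
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            key = (min(i, j), max(i, j))
            if key in pairwise:  # distant positions have no interaction term
                energy += pairwise[key][(r[key[0]], r[key[1]])]
    return energy


# Energies taken from the two-position example in the slides;
# the rotamer labels "a", "b1", "b2" are invented for this sketch.
singleton = [{"a": -1.0}, {"b1": -5.0, "b2": -3.0}]
pairwise = {(0, 1): {("a", "b1"): -2.0, ("a", "b2"): -4.0}}

print(total_energy(("a", "b1"), singleton, pairwise))  # -10.0
print(total_energy(("a", "b2"), singleton, pairwise))  # -12.0
```

Storing each pair once and looking it up from both directions keeps the table small while still matching the ordered-pair sum.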
slide-6
SLIDE 6

6

$T(k)$: returns the amino acid type of rotamer $k$

$T(r)$: returns the sequence of rotamer assignment $r$

Example (figure, positions 1 and 2): $T(r_1)$ = hexagon, $T(r_2)$ = cross, so $T(r)$ = (hexagon, cross)

  • 1. Problem Definition (4)
slide-7
SLIDE 7

7

Protein design algorithm

Inputs: protein structure; rotamer library; energy function f(x); positions to design and allowed rotamers/amino acids

$$S^* = T(\arg\min_r E(r))$$

www.cs.duke.edu/donaldlab

  • 1. Problem Definition (5)

Output: GMEC $\in S$
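A minimal sketch of the design goal $S^* = T(\arg\min_r E(r))$, solved by exhaustive enumeration. Everything here is an assumption made for illustration: the function names, the rotamer labels and the toy energies are invented, and real design algorithms avoid this exponential search.

```python
from itertools import product


def sequence_of(r, aa_type):
    """T(r): amino acid type of every rotamer in assignment r."""
    return tuple(aa_type[k] for k in r)


def gmec_sequence(rotamers, energy, aa_type):
    """Return (S*, r*) with r* = argmin_r E(r) and S* = T(r*).

    Exhaustive search: cost grows exponentially with the number of positions.
    """
    best = min(product(*rotamers), key=energy)
    return sequence_of(best, aa_type), best


# Hypothetical toy instance: two positions, invented rotamers and energies.
rotamers = [["LEU_a", "VAL_a"], ["SER_a", "SER_b"]]
aa_type = {"LEU_a": "LEU", "VAL_a": "VAL", "SER_a": "SER", "SER_b": "SER"}
toy_energy = {
    ("LEU_a", "SER_a"): -3.0, ("LEU_a", "SER_b"): -7.5,
    ("VAL_a", "SER_a"): -6.0, ("VAL_a", "SER_b"): -2.0,
}

s_star, r_star = gmec_sequence(rotamers, toy_energy.get, aa_type)
print(s_star, r_star)  # ('LEU', 'SER') ('LEU_a', 'SER_b')
```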

slide-8
SLIDE 8

8

Exact methods (DEE / A*, BroMAP, BWM): Global Minimum Energy Conformation

Probabilistic methods (SCMF, MCSA): low energy conformation

  • 1. Problem Definition (6)

Related Work

slide-9
SLIDE 9

9

Model Inaccurate!

Inputs: protein structure; rotamer library; energy function f(x); positions to design and allowed rotamers/amino acids

www.cs.duke.edu/donaldlab

  • 1. Problem Definition (7)
slide-10
SLIDE 10

10

Protein design algorithm

Algorithm: fast or provable

$$S^* = T(\arg\min_r E(r))$$

  • 1. Problem Definition (8)
slide-11
SLIDE 11

11

Low energy conformation may be:

  • Too stable
  • Of low binding specificity
  • Unable to fold to the target

  • 1. Problem Definition (9)
slide-12
SLIDE 12

12

Solution: Find a set of low energy sequences

  • Provable methods (DEE/A*): ordered set of gap-free low energy conformations, including the GMEC
  • Probabilistic methods (tBMMF): set of low energy conformations

  • 1. Problem Definition (10)
slide-13
SLIDE 13

13

Problem Definition: Summary

  • Protein design algorithms search for the sequence with the Global Minimum Energy Conformation (GMEC).
  • Our model is inaccurate: more than one low energy sequence is desirable.
  • Fromer et al. propose tBMMF to generate a set of low energy sequences.
slide-14
SLIDE 14

14

Probabilistic factor for self-interactions:

$$\psi_i(r_i) = e^{-E_i(r_i)/T}$$

Probabilistic factor for pairwise interactions:

$$\psi_{ij}(r_i, r_j) = e^{-E_{ij}(r_i, r_j)/T}$$

  • 2. Our problem as an inference problem
slide-15
SLIDE 15

15

Probability distribution for rotamer assignment $r$:

$$P(r_1, \ldots, r_N) = \frac{1}{Z} \prod_i \psi_i(r_i) \prod_{i,j} \psi_{ij}(r_i, r_j) = \frac{1}{Z} e^{-E(r)/T}$$

Partition function:

$$Z = \sum_r e^{-E(r)/T}$$

  • 2. Inference problem (2)
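The distribution above can be checked numerically. The sketch below is illustrative only: `boltzmann` is a hypothetical helper and the three energies are invented. It shows that the lowest energy assignment is also the most probable one, which is what lets the minimization be recast as inference.

```python
import math


def boltzmann(energies, T=1.0):
    """Return (Z, P) with P(r) = exp(-E(r)/T) / Z for a dict {r: E(r)}."""
    weights = {r: math.exp(-e / T) for r, e in energies.items()}
    Z = sum(weights.values())
    return Z, {r: w / Z for r, w in weights.items()}


# Invented energies for three hypothetical rotamer assignments.
energies = {"r_a": -1.0, "r_b": -2.5, "r_c": 0.5}
Z, P = boltzmann(energies)

lowest_energy = min(energies, key=energies.get)
most_probable = max(P, key=P.get)
print(lowest_energy, most_probable)  # r_b r_b
```

Because $e^{-E/T}$ is strictly decreasing in $E$ for any $T > 0$, the ordering by probability is the reverse of the ordering by energy.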
slide-16
SLIDE 16

16

Minimization goal (from definition):

$$S^* = T(\arg\min_r E(r))$$

Equivalent maximization goal for a graphical model problem:

$$S^* = T(\arg\max_r \Pr(r))$$

  • 2. Inference problem (3)

slide-17
SLIDE 17

17

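Example: Inference problem

Allowed rotamers at position #1 and position #2 (figure), with the energies of the two candidate assignments $r'$ and $r''$:

Assignment   E_i(r_1)   E_i(r_2)   E_ij(r_1, r_2)
r'           -1         -5         -2
r''          -1         -3         -4

$E(r') = ?$   $E(r'') = ?$

What is our GMEC?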

  • 2. Inference problem (4)
slide-18
SLIDE 18

18

[Figure: allowed rotamers and energies for positions #1 and #2, as on slide 17]

$$E(r') = (-1 + -2) + (-5 + -2) = -10$$

$$E(r'') = (-1 + -4) + (-3 + -4) = -12$$

$r''$ is our GMEC.

  • 2. Inference problem (5)
slide-19
SLIDE 19

19

[Figure: allowed rotamers and energies for positions #1 and #2, as on slide 17]

$T = 1$ (for our example)

$$\psi_i(r'_1) = e^{-E_i(r'_1)/T} = e$$

$$\psi_i(r'_2) = e^{-E_i(r'_2)/T} = e^{5}$$

$$\psi_i(r''_1) = e^{-E_i(r''_1)/T} = e$$

$$\psi_i(r''_2) = e^{-E_i(r''_2)/T} = e^{3}$$

  • 2. Inference problem (6)
slide-20
SLIDE 20

20

[Figure: allowed rotamers and energies for positions #1 and #2, as on slide 17]

$T = 1$ (for our example)

$$\psi_{ij}(r'_1, r'_2) = e^{-E_{ij}(r'_1, r'_2)/T} = e^{2}$$

$$\psi_{ij}(r''_1, r''_2) = e^{-E_{ij}(r''_1, r''_2)/T} = e^{4}$$

$$Z = \sum_r e^{-E(r)/T} = e^{10} + e^{12}$$

  • 2. Inference problem (7)
slide-21
SLIDE 21

21

[Figure: allowed rotamers and energies for positions #1 and #2, as on slide 17]

$T = 1$ (for our example)

$$P(r'_1, r'_2) = \frac{1}{Z} \prod_i \psi_i(r'_i) \prod_{i,j} \psi_{ij}(r'_i, r'_j) = \frac{e^{10}}{e^{10} + e^{12}}$$

  • 2. Inference problem (8)
slide-22
SLIDE 22

22

[Figure: allowed rotamers and energies for positions #1 and #2, as on slide 17]

$T = 1$ (for our example)

$$P(r''_1, r''_2) = \frac{1}{Z} \prod_i \psi_i(r''_i) \prod_{i,j} \psi_{ij}(r''_i, r''_j) = \frac{e^{12}}{e^{10} + e^{12}}$$

  • 2. Inference problem (9)
slide-23
SLIDE 23

23

[Figure: allowed rotamers and energies for positions #1 and #2, as on slide 17]

$T = 1$ (for our example)

$$S^* = T(\arg\max_r \Pr(r)) = T(r'')$$

  • 2. Inference problem (10)
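The numbers in this worked example can be reproduced with a short script. This is a sketch under stated assumptions: the energy values are the ones used in the example with $T = 1$, the dictionary layout and the `weight` helper are invented, and the pairwise factor is multiplied in twice to match the ordered-pair sum that gives $E(r') = -10$.

```python
import math

T = 1.0
# Energies from the example; position 1 has E_i = -1 in both assignments.
E1 = {"r'": -1.0, "r''": -1.0}
E2 = {"r'": -5.0, "r''": -3.0}
E12 = {"r'": -2.0, "r''": -4.0}


def weight(name):
    """Unnormalized probability: product of singleton and pairwise factors."""
    psi1 = math.exp(-E1[name] / T)
    psi2 = math.exp(-E2[name] / T)
    psi12 = math.exp(-E12[name] / T)
    # Assumption: ordered pairs (1,2) and (2,1) both contribute a factor.
    return psi1 * psi2 * psi12 * psi12


Z = sum(weight(name) for name in E1)          # e^10 + e^12
P = {name: weight(name) / Z for name in E1}   # normalized probabilities
best = max(P, key=P.get)
print(best, round(P[best], 4))  # r'' 0.8808
```

With these values $r''$ carries about 88% of the probability mass, in agreement with it being the GMEC.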
slide-24
SLIDE 24

24

Minimization goal (from definition):

$$S^* = T(\arg\min_r E(r))$$

Equivalent maximization goal for a graphical model problem:

$$S^* = T(\arg\max_r \Pr(r))$$

We still have a non-polynomial problem! But it is now formulated as an inference problem, so probabilistic methods apply.

  • 2. Inference problem (11)
slide-25
SLIDE 25

25

Summary: Inference problem

  • We model our problem as an inference problem.
  • We can use probabilistic methods to solve it.
slide-26
SLIDE 26

26

  • 3. Graphical models for protein design and belief propagation (BP)

  • 1. Model each design position as a random variable
  • 2. Build an interaction graph that shows conditional independence between variables

Source: Fromer M, Yanover, C. Proteins (2008)

SspB dimer interface: Inter-monomeric interactions (Cα)

slide-27
SLIDE 27

27

Example: Belief propagation

[Figure: protein structure with residues 1, 2, 3 and the matching graphical model; node 3 has rotamers $r'_3$ and $r''_3$]

Node in the graphical model: a random variable, corresponding to an interacting residue in the structure.

  • 3. Graphical Models/BP (2)
slide-28
SLIDE 28

28

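Example: Belief propagation

[Figure: protein structure with residues 1 to 4 and the matching graphical model; node 3 has rotamers $r'_3$ and $r''_3$]

Edge in the structure: energy interaction between two residues. If two residues are distant from each other, there is no edge between them.

Edge in the graphical model: causal relationship between two nodes.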

  • 3. Graphical Models/BP (3)
slide-29
SLIDE 29

29

Example: Belief propagation

[Figure: graphical model with nodes 1, 2, 3 and their rotamers $r'_1$, $r'_2$, $r'_3$, $r''_3$]

Every random variable can be in one of several states: the allowable rotamers for that position.

  • 3. Graphical Models/BP (4)
slide-30
SLIDE 30

30

Example: Belief propagation

[Figure: graphical model with nodes 1, 2, 3 and their rotamers $r'_1$, $r'_2$, $r'_3$, $r''_3$]

The energy of each state depends on:

  • its singleton energy
  • its pairwise energies
  • the energies of the states of its parents

  • 3. Graphical Models/BP (5)
slide-31
SLIDE 31

31

Example: Belief propagation

[Figure: graphical model with nodes 1, 2, 3; message $m_{2 \to 3}$ sent from node 2 to node 3]

Belief propagation: each node tells its neighboring nodes what it believes their state should be.

A message is sent from node $i$ to node $j$. The message is a vector with one dimension per allowed state/rotamer of the recipient, e.g. $m_{2 \to 3} = (m_{2 \to 3}(r'_3), m_{2 \to 3}(r''_3))$.

  • 3. Graphical Models/BP (6)
slide-32
SLIDE 32

32

Example: Belief propagation

[Figure: graphical model with nodes 1, 2, 3 and their rotamers $r'_1$, $r'_2$, $r'_3$, $r''_3$]

Who sends the first message?

  • 3. Graphical Models/BP (7)
slide-33
SLIDE 33

33

Example: Belief propagation

[Figure: graphical model with nodes 1, 2, 3 and their rotamers $r'_1$, $r'_2$, $r'_3$, $r''_3$]

Who sends the first message?

In a tree: the leaves.

  • Belief propagation is proven to be correct in a tree!

  • 3. Graphical Models/BP (8)
slide-34
SLIDE 34

34

Example: Belief propagation

[Figure: graphical model with nodes 1, 2, 3 and their rotamers $r'_1$, $r'_2$, $r'_3$, $r''_3$]

Who sends the first message?

In a graph with cycles:

  • Set initial values
  • Send in parallel

No guarantees can be made! The messages might not converge.

  • 3. Graphical Models/BP (9)
slide-35
SLIDE 35

35

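Example: Belief propagation

[Figure: graphical model with nodes 1, 2, 3 and messages $m_{1 \to 2}$, $m_{2 \to 1}$, $m_{1 \to 3}$, $m_{3 \to 1}$, $m_{2 \to 3}$, $m_{3 \to 2}$]

Initial messages, all set to 1:

  • $m_{2 \to 3}(r'_3) = 1$, $m_{2 \to 3}(r''_3) = 1$
  • $m_{1 \to 3}(r'_3) = 1$, $m_{1 \to 3}(r''_3) = 1$
  • $m_{2 \to 1}(r'_1) = 1$
  • $m_{1 \to 2}(r'_2) = 1$
  • $m_{3 \to 1}(r'_1) = 1$
  • $m_{3 \to 2}(r'_2) = 1$

We iterate from there.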

  • 3. Graphical Models/BP (10)
slide-36
SLIDE 36

36

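Example: Belief propagation

[Figure: graphical model with nodes 1, 2, 3; messages $m_{1 \to 3}$ and $m_{2 \to 3}$ arriving at node 3]

Node 3 receives messages from nodes 1 and 2:

  • $m_{2 \to 3}(r'_3) = 1$, $m_{2 \to 3}(r''_3) = 1$
  • $m_{1 \to 3}(r'_3) = 1$, $m_{1 \to 3}(r''_3) = 1$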

  • 3. Graphical Models/BP (11)
slide-37
SLIDE 37

37

Example: Belief propagation

[Figure: graphical model with nodes 1, 2, 3; message $m_{3 \to 1}$ sent from node 3 to node 1]

What message does node 3 send to node 1 on the next iteration?

$$m_{3 \to 1}(r'_1) = ?$$

  • 3. Graphical Models/BP (12)
slide-38
SLIDE 38

38

Belief propagation: message passing

Message that gets sent on each iteration:

$$m_{i \to j}(r_j) = \max_{r_i} \; e^{\frac{-E_i(r_i) - E_{ij}(r_i, r_j)}{t}} \prod_{k \in N(i) \setminus j} m_{k \to i}(r_i)$$

$N(i)$: neighbors of variable $i$
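The update rule can be sketched as a small Python function. This is a hedged illustration, not the authors' implementation: `send_message`, its argument layout, the node names and every energy value below are invented for the example.

```python
import math


def send_message(i, j, states, E_single, E_pair, incoming, t=1.0):
    """Max-product message m_{i->j}(r_j) for each state r_j of recipient j.

    incoming: list of messages m_{k->i} from the neighbors k of i other
              than j, each a dict {r_i: value}.
    """
    message = {}
    for rj in states[j]:
        candidates = []
        for ri in states[i]:
            value = math.exp((-E_single[i][ri] - E_pair[(i, j)][(ri, rj)]) / t)
            for m in incoming:  # product over k in N(i) \ j
                value *= m[ri]
            candidates.append(value)
        message[rj] = max(candidates)  # max over the sender's rotamers
    return message


# Hypothetical two-node example with invented energies.
states = {"i": ["a", "b"], "j": ["x", "y"]}
E_single = {"i": {"a": -1.0, "b": -2.0}}
E_pair = {("i", "j"): {("a", "x"): -3.0, ("a", "y"): 0.0,
                       ("b", "x"): 0.0, ("b", "y"): -1.0}}

m_ij = send_message("i", "j", states, E_single, E_pair, incoming=[])
print(m_ij)  # entries equal e^4 for "x" and e^3 for "y"
```

The returned dictionary has one entry per rotamer of the recipient, matching the description of a message as a vector over the recipient's states.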

slide-39
SLIDE 39

39

Example: Belief propagation

Singleton energies:

  • $E_i(r_1) = -1$
  • $E_i(r_2) = -2$
  • $E_i(r_3) = -6, -2$ (one value per rotamer of position #3)

Pairwise energies:

  • $E_{ij}(r_1, r_2) = -4$
  • $E_{ij}(r_2, r_3) = -3, -1$ (one value per rotamer of position #3)
  • $E_{ij}(r_1, r_3) = -4, -1$ (one value per rotamer of position #3)

Iteration 0:

$$m_{3 \to 1}(r'_1) = \max_{r_3} \left\{ e^{\frac{-E_i(r'_3) - E_{ij}(r'_3, r_1)}{t}} \, m_{2 \to 3}(r'_3), \;\; e^{\frac{-E_i(r''_3) - E_{ij}(r''_3, r_1)}{t}} \, m_{2 \to 3}(r''_3) \right\} = ?$$
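As a worked instance of the iteration-0 question, the arithmetic below rests on three assumptions that the slide does not state explicitly: $t = 1$; the first value listed for each position #3 energy belongs to $r'_3$ (so $E_i(r'_3) = -6$, $E_{ij}(r'_3, r_1) = -4$, $E_i(r''_3) = -2$, $E_{ij}(r''_3, r_1) = -1$); and the incoming messages are still at their initial value $m_{2 \to 3} = 1$.

```latex
\begin{align*}
m_{3 \to 1}(r'_1)
  &= \max\left\{ e^{(6 + 4)/1} \cdot 1,\; e^{(2 + 1)/1} \cdot 1 \right\} \\
  &= \max\left\{ e^{10},\; e^{3} \right\} \\
  &= e^{10}
\end{align*}
```

Under these assumptions node 3 tells node 1 that its best supporting state is $r'_3$.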

slide-40
SLIDE 40

40

Example: Belief propagation

[Figure: graphical model with nodes 1, 2, 3; messages $m_{1 \to 3}$ and $m_{2 \to 3}$ arriving at node 3]

Once it converges, we can compute the belief each node has about itself.

Belief about one's state: multiply all incoming messages by the singleton factor $e^{-E_i(r_i)/t}$.

  • 3. Graphical Models/BP (15)
slide-41
SLIDE 41

41

Belief propagation: Max-marginals

Belief about each rotamer:

$$MM_i(r_i) = e^{-E_i(r_i)/t} \prod_{k \in N(i)} m_{k \to i}(r_i)$$

Max-marginal:

$$\Pr\nolimits^1_i(r_i) = \max_{r' : r'_i = r_i} \Pr(r')$$

"Most likely" rotamer for position $i$:

$$r^*_i = \arg\max_{r_i \in Rots_i} \Pr\nolimits^1_i(r_i)$$
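Putting the message update and the max-marginals together, the sketch below runs max-product belief propagation on a three-node chain and compares the result with brute force. It is an illustration under assumptions: all names and energies are invented, the graph is a chain (a tree, where the slides note BP is proven correct), and each pairwise energy is counted once per edge.

```python
import math
from itertools import product

t = 1.0
states = {1: ["a", "b"], 2: ["c", "d"], 3: ["e", "f"]}
E_single = {1: {"a": -1.0, "b": 0.0}, 2: {"c": 0.0, "d": -0.5},
            3: {"e": -2.0, "f": 0.0}}
# Invented pairwise energies for the chain 1 - 2 - 3.
E_pair = {
    (1, 2): {("a", "c"): 0.0, ("a", "d"): -3.0, ("b", "c"): -1.0, ("b", "d"): 0.0},
    (2, 3): {("c", "e"): -1.0, ("c", "f"): 0.0, ("d", "e"): 0.0, ("d", "f"): -2.5},
}
neighbors = {1: [2], 2: [1, 3], 3: [2]}


def pair_energy(i, ri, j, rj):
    """Look up E_ij regardless of the order in which the edge is stored."""
    if (i, j) in E_pair:
        return E_pair[(i, j)][(ri, rj)]
    return E_pair[(j, i)][(rj, ri)]


def run_max_product(iterations=5):
    """Start all messages at 1 and update every message in parallel."""
    msgs = {(i, j): {rj: 1.0 for rj in states[j]}
            for i in neighbors for j in neighbors[i]}
    for _ in range(iterations):
        msgs = {
            (i, j): {
                rj: max(
                    math.exp((-E_single[i][ri] - pair_energy(i, ri, j, rj)) / t)
                    * math.prod(msgs[(k, i)][ri] for k in neighbors[i] if k != j)
                    for ri in states[i]
                )
                for rj in states[j]
            }
            for (i, j) in msgs
        }
    return msgs


def max_marginals(msgs):
    """MM_i(r_i) = exp(-E_i(r_i)/t) * product of all incoming messages."""
    return {i: {ri: math.exp(-E_single[i][ri] / t)
                * math.prod(msgs[(k, i)][ri] for k in neighbors[i])
                for ri in states[i]}
            for i in states}


def brute_force():
    """Exhaustive argmin of E(r), for comparison only."""
    def energy(r):
        a = dict(zip(states, r))
        return (sum(E_single[i][a[i]] for i in states)
                + sum(tab[(a[i], a[j])] for (i, j), tab in E_pair.items()))
    return min(product(*states.values()), key=energy)


mm = max_marginals(run_max_product())
bp_assignment = tuple(max(mm[i], key=mm[i].get) for i in states)
print(bp_assignment, brute_force())  # ('a', 'd', 'f') ('a', 'd', 'f')
```

On a graph with cycles the same loop can be run, but as the slides point out there is no guarantee that the messages converge.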

slide-42
SLIDE 42

42

Fromer M, Yanover, C. Proteins (2008)

  • 3. Graphical Models/BP (17)
slide-43
SLIDE 43

43

Fromer M, Yanover, C. Proteins (2008)

  • 3. Graphical Models/BP (18)
slide-44
SLIDE 44

44

  • 3. Graphical Models: Summary
  • Formulate as an inference problem
  • Model our design problem as a graphical model
  • Establish edges between interacting residues
  • Use Belief Propagation to find the beliefs for each position

slide-45
SLIDE 45

45

  • 4. tBMMF: type specific BMMF
  • Paper's main contribution
  • Builds on previous work by C. Yanover (2004)
  • Uses Belief propagation to find the lowest energy sequence and constrains the space to find subsequent sequences

slide-46
SLIDE 46

46

tBMMF (simplification)

  • 1. Find the lowest energy sequence using BP
  • 2. Find the next lowest energy sequence while excluding amino acids from the previous one
  • 3. Partition into two subspaces using constraints according to the next lowest energy sequence
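To make the target output concrete, the sketch below computes by brute force the ranked list of lowest energy sequences that the steps above aim to produce. It is not tBMMF: there is no belief propagation and no subspace partitioning here, only exhaustive enumeration on an invented toy instance, and all names and energies are hypothetical.

```python
from itertools import product


def lowest_energy_sequences(rotamers, aa_type, energy, k):
    """Return the k sequences with the lowest best-rotamer energy.

    Each sequence is scored by the minimum E(r) over all rotamer
    assignments r with T(r) equal to that sequence.
    """
    best = {}
    for r in product(*rotamers):
        seq = tuple(aa_type[x] for x in r)
        e = energy(r)
        if seq not in best or e < best[seq][0]:
            best[seq] = (e, r)
    ranked = sorted(best.items(), key=lambda item: item[1][0])
    return [(seq, e, r) for seq, (e, r) in ranked[:k]]


# Hypothetical toy instance: invented rotamers and energies.
rotamers = [["LEU_a", "LEU_b", "VAL_a"], ["SER_a", "THR_a"]]
aa_type = {"LEU_a": "LEU", "LEU_b": "LEU", "VAL_a": "VAL",
           "SER_a": "SER", "THR_a": "THR"}
toy_energy = {
    ("LEU_a", "SER_a"): -5.0, ("LEU_a", "THR_a"): -3.0,
    ("LEU_b", "SER_a"): -6.0, ("LEU_b", "THR_a"): -1.0,
    ("VAL_a", "SER_a"): -4.0, ("VAL_a", "THR_a"): -5.5,
}

top3 = lowest_energy_sequences(rotamers, aa_type, toy_energy.get, k=3)
for seq, e, r in top3:
    print(seq, e, r)
```

Successive results differ in amino acid sequence, not merely in rotamer choice: the two LEU/SER rotamer assignments collapse into a single entry.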

slide-47
SLIDE 47

47

Example: tBMMF (1)

Fromer M, Yanover, C. Proteins (2008)

  • 4. tBMMF (3)
slide-48
SLIDE 48

48

Example: tBMMF (2)

Fromer M, Yanover, C. Proteins (2008)

  • 4. tBMMF (4)
slide-49
SLIDE 49

49

Results

Fromer M, Yanover, C. Proteins (2008)

slide-50
SLIDE 50

50

Results (2)

Fromer M, Yanover, C. Proteins (2008)

slide-51
SLIDE 51

51

Results(3)

  • Algorithms tried:

    – DEE / A* (Goldstein, 1-split, 2-split, Magic Bullet)
    – tBMMF
    – Ros: Rosetta
    – SA: Simulated annealing over sequence space

slide-52
SLIDE 52

52

Results (4): Assessment results

Fromer M, Yanover, C. Proteins (2008)

slide-53
SLIDE 53

53

Fromer M, Yanover, C. Proteins (2008)

Results(5)

slide-54
SLIDE 54

54

Fromer M, Yanover, C. Proteins (2008)

Results(6)

slide-55
SLIDE 55

55

Fromer M, Yanover, C. Proteins (2008)

Results(6)

slide-56
SLIDE 56

56

Results (7)

  • DEE/A* was not feasible for any case except the prion

  • SspB: A* could only output one sequence
  • DEE also did not finish after 12 days
  • BD/K* did not finish after 12 days
slide-57
SLIDE 57

57

Results (8)

  • Predicted sequences were highly similar to one another (high sequence identity)
  • Very different from wild type sequence
  • Solution: grouped tBMMF: apply constraints to whole groups of amino acids (proof of concept only)
slide-58
SLIDE 58

58

Conclusions

  • Fast and accurate algorithm
  • Outperforms all other algorithms:

    – A* is not feasible
    – Better accuracy than other probabilistic algorithms

slide-59
SLIDE 59

59

Conclusions (2)

  • tBMMF produces a large set of very similar low energy results.
  • This might be due to the many inaccuracies in the model
  • Grouped tBMMF can produce a diverse set of low energy sequences

slide-60
SLIDE 60

60

Conclusions (3)

  • The results lack experimental data for validation.

slide-61
SLIDE 61

61

Related Work: (Fromer et al. 2008)

  • Fromer M, Yanover C. A computational framework to empower probabilistic protein design. ISMB 2008
  • Phage display:

    – $10^9$ to $10^{10}$ randomized protein sequences
    – Simultaneously tested for relevant biological function

slide-62
SLIDE 62

62

Related Work: (Fromer et al. 2008)

Fromer M, Yanover, C. Bioinformatics (2008)

slide-63
SLIDE 63

63

Related Work: (Fromer et al. 2008)

  • Uses sum-product instead of max-product
  • Obtain per-position amino acid probabilities
  • Tried until convergence or 100000 iterations; all structures converged
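For intuition about what per-position amino acid probabilities mean, the sketch below computes them exactly by enumeration on a toy instance. This is not the sum-product algorithm from the related work, which estimates such marginals by message passing instead of enumerating; the function name, rotamers and energies are all invented.

```python
import math
from collections import defaultdict
from itertools import product


def amino_acid_marginals(rotamers, aa_type, energy, T=1.0):
    """Exact per-position amino acid probabilities under P(r) ~ exp(-E(r)/T)."""
    weights = {r: math.exp(-energy(r) / T) for r in product(*rotamers)}
    Z = sum(weights.values())
    marginals = [defaultdict(float) for _ in rotamers]
    for r, w in weights.items():
        for i, ri in enumerate(r):
            marginals[i][aa_type[ri]] += w / Z  # sum over all other positions
    return [dict(m) for m in marginals]


# Hypothetical toy instance: invented rotamers and energies.
rotamers = [["LEU_a", "VAL_a"], ["SER_a", "SER_b"]]
aa_type = {"LEU_a": "LEU", "VAL_a": "VAL", "SER_a": "SER", "SER_b": "SER"}
toy_energy = {
    ("LEU_a", "SER_a"): -3.0, ("LEU_a", "SER_b"): -7.5,
    ("VAL_a", "SER_a"): -6.0, ("VAL_a", "SER_b"): -2.0,
}

marg = amino_acid_marginals(rotamers, aa_type, toy_energy.get)
print(marg[0])  # probabilities of LEU and VAL at the first position
print(marg[1])  # SER has probability 1 at the second position
```

Replacing the max in the message update with a sum is what turns the max-product scheme into one that targets these marginals rather than the single best assignment.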

slide-64
SLIDE 64

64

Related Work: (Fromer et al. 2008)

  • Conclusions:

    – Model results in probability distributions far from those observed experimentally.
    – Limitations of the model:

      • Imprecise energy function
      • Decomposition into pairwise energy terms
      • Assumption of a fixed backbone
      • Discretization of side chain conformations
slide-65
SLIDE 65

65

tBMMF algorithm