Sequence Alignment, Mark Voorhies, 4/12/2018



slide-1
SLIDE 1

Sequence Alignment

Mark Voorhies 4/12/2018

Mark Voorhies Sequence Alignment

slide-2
SLIDE 2

Exercise: Scoring an ungapped alignment

Given two sequences and a scoring matrix, find the offset that yields the best scoring ungapped alignment.


slide-3
SLIDE 3

Exercise: Scoring an ungapped alignment

Given two sequences and a scoring matrix, find the offset that yields the best scoring ungapped alignment.

    def score(S, x, y):
        # Sum the scoring-matrix entries over aligned positions.
        assert len(x) == len(y)
        s = 0
        for (i, j) in zip(x, y):
            s += S[i][j]
        return s

slide-4
SLIDE 4

Exercise: Scoring an ungapped alignment

Given two sequences and a scoring matrix, find the offset that yields the best scoring ungapped alignment.

    def subseqs(x, y, i):
        # Trim x and y to the region where they overlap at offset i.
        if i > 0:
            y = y[i:]
        elif i < 0:
            x = x[-i:]
        L = min(len(x), len(y))
        return x[:L], y[:L]

slide-5
SLIDE 5

Exercise: Scoring an ungapped alignment

Given two sequences and a scoring matrix, find the offset that yields the best scoring ungapped alignment.

    def alignment(x, y, i):
        # Pad x and y with "-" so both strings have equal length at offset i.
        if i > 0:
            x = "-" * i + x
        elif i < 0:
            y = "-" * (-i) + y
        L = len(y) - len(x)
        if L > 0:
            x += "-" * L
        elif L < 0:
            y += "-" * (-L)
        return x, y

slide-6
SLIDE 6

Exercise: Scoring an ungapped alignment

Given two sequences and a scoring matrix, find the offset that yields the best scoring ungapped alignment.

    def ungapped(S, x, y):
        # Try every offset with at least one overlapping position; keep the best.
        best = None
        best_score = None
        for i in range(-len(x) + 1, len(y)):
            (sx, sy) = subseqs(x, y, i)
            s = score(S, sx, sy)
            if (best_score is None) or (s > best_score):
                best_score = s
                best = i
        return best, best_score, alignment(x, y, best)
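To see the four functions work together, here is a minimal sketch that restates them compactly and runs them on a toy input. The +1/-1 scoring matrix and the two sequences are assumptions made up for illustration; they do not come from the slides.

```python
# Compact restatement of the slide functions plus a toy run.
# ASSUMPTION: the +1 match / -1 mismatch matrix and the sequences
# below are illustrative only.

def score(S, x, y):
    assert len(x) == len(y)
    return sum(S[i][j] for (i, j) in zip(x, y))

def subseqs(x, y, i):
    if i > 0:
        y = y[i:]
    elif i < 0:
        x = x[-i:]
    L = min(len(x), len(y))
    return x[:L], y[:L]

def alignment(x, y, i):
    if i > 0:
        x = "-" * i + x
    elif i < 0:
        y = "-" * (-i) + y
    L = len(y) - len(x)
    if L > 0:
        x += "-" * L
    elif L < 0:
        y += "-" * (-L)
    return x, y

def ungapped(S, x, y):
    best, best_score = None, None
    for i in range(-len(x) + 1, len(y)):
        sx, sy = subseqs(x, y, i)
        s = score(S, sx, sy)
        if best_score is None or s > best_score:
            best, best_score = i, s
    return best, best_score, alignment(x, y, best)

# Toy scoring matrix as a dict of dicts, so S[a][b] works as in score().
S = {a: {b: (1 if a == b else -1) for b in "ACGT"} for a in "ACGT"}

result = ungapped(S, "GATTACA", "TTAC")
print(result)  # (-2, 4, ('GATTACA', '--TTAC-'))
```

A negative offset means the second sequence starts partway into the first; here TTAC lines up with positions 2 to 5 of GATTACA.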


slide-8
SLIDE 8

Exercise: Scoring a gapped alignment

Write a new scoring function with separate penalties for opening a zero length gap (e.g., G = -11) and extending an open gap by one base (e.g., E = -1).

$S_{gapped}(x, y) = S(x, y) + \sum_{i}^{\text{gaps}} (G + E \cdot \text{len}(i))$
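As a worked instance of the formula, assume an alignment containing two gaps, one of length 3 and one of length 1, scored with the example penalties G = -11 and E = -1 (the gap lengths are made up for illustration):

```latex
\begin{align*}
S_{gapped}(x, y) &= S(x, y) + (G + 3E) + (G + 1E) \\
                 &= S(x, y) + (-11 - 3) + (-11 - 1) \\
                 &= S(x, y) - 26
\end{align*}
```

Merging the two gaps into a single gap of length 4 would cost only G + 4E = -15, which is why a large opening penalty favors fewer, longer gaps.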

slide-9
SLIDE 9

Exercise: Scoring a gapped alignment

$S_{gapped}(x, y) = S(x, y) + \sum_{i}^{\text{gaps}} (G + E \cdot \text{len}(i))$

slide-10
SLIDE 10

Exercise: Scoring a gapped alignment

$S_{gapped}(x, y) = S(x, y) + \sum_{i}^{\text{gaps}} (G + E \cdot \text{len}(i))$

    def gapped_score(seq1, seq2, s, g = 0, e = -1):
        # gap tracks which sequence (0 or 1) holds the currently open gap.
        gap = None
        score = 0
        for pair in zip(seq1, seq2):
            assert pair != ("-", "-")
            try:
                curgap = pair.index("-")
            except ValueError:
                # No gap character in this column: score the aligned pair.
                score += s[pair[0]][pair[1]]
                gap = None
            else:
                if gap != curgap:
                    # A new gap opens (or the gap switches sequences).
                    score += g
                    gap = curgap
                score += e
        return score

slide-11
SLIDE 11

Exercise: Scoring a gapped alignment

$S_{gapped}(x, y) = S(x, y) + \sum_{i}^{\text{gaps}} (G + E \cdot \text{len}(i))$

    def gapped_score(seq1, seq2, s, g = 0, e = -1):
        # gap tracks which sequence (0 or 1) holds the currently open gap.
        gap = None
        score = 0
        for pair in zip(seq1, seq2):
            assert pair != ("-", "-")
            try:
                curgap = pair.index("-")
            except ValueError:
                score += s[pair[0]][pair[1]]
                gap = None
            else:
                if gap != curgap:
                    score += g
                    gap = curgap
                score += e
        return score

    # The same function written with explicit branches instead of try/except.
    def gapped_score(seq1, seq2, s, g = 0, e = -1):
        gap = None
        score = 0
        for (c1, c2) in zip(seq1, seq2):
            if (c1 == "-") and (c2 == "-"):
                raise ValueError
            elif c1 == "-":
                if gap != 1:
                    score += g
                    gap = 1
                score += e
            elif c2 == "-":
                if gap != 2:
                    score += g
                    gap = 2
                score += e
            else:
                score += s[c1][c2]
                gap = None
        return score
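The sketch below runs a gapped scoring function on two toy alignments to show how one long gap compares with two short ones. The function follows the explicit-branch version from the slide; the +1/-1 scoring matrix and the aligned strings are assumptions for illustration, and G = -11, E = -1 are the example penalties from the exercise.

```python
def gapped_score(seq1, seq2, s, g=0, e=-1):
    """Score a gapped alignment with gap-open penalty g and extension e."""
    gap = None      # which sequence (1 or 2) holds the currently open gap
    score = 0
    for (c1, c2) in zip(seq1, seq2):
        if c1 == "-" and c2 == "-":
            raise ValueError("both sequences have a gap in the same column")
        elif c1 == "-":
            if gap != 1:
                score += g
                gap = 1
            score += e
        elif c2 == "-":
            if gap != 2:
                score += g
                gap = 2
            score += e
        else:
            score += s[c1][c2]
            gap = None
    return score

# ASSUMPTION: toy +1 match / -1 mismatch matrix, not from the slides.
S = {a: {b: (1 if a == b else -1) for b in "ACGT"} for a in "ACGT"}

# One gap of length 2: 5 matches + (G + 2E) = 5 - 13
one_gap = gapped_score("GATTACA", "GA--ACA", S, g=-11, e=-1)
# Two gaps of length 1: 5 matches + 2 * (G + E) = 5 - 24
two_gaps = gapped_score("GATTACA", "G-T-ACA", S, g=-11, e=-1)
print(one_gap, two_gaps)  # -8 -19
```

Both alignments have five matching columns and two gap characters, so the difference in score comes entirely from the second gap-opening penalty.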

slide-12
SLIDE 12

How many ways can we align two sequences?


slide-17
SLIDE 17

How many ways can we align two sequences?

Binomial formula: $\binom{k}{r} = \frac{k!}{(k - r)!\, r!}$

slide-18
SLIDE 18

How many ways can we align two sequences?

Binomial formula: $\binom{k}{r} = \frac{k!}{(k - r)!\, r!}$

$\binom{2n}{n} = \frac{(2n)!}{n!\, n!}$

slide-19
SLIDE 19

How many ways can we align two sequences?

Binomial formula: $\binom{k}{r} = \frac{k!}{(k - r)!\, r!}$

$\binom{2n}{n} = \frac{(2n)!}{n!\, n!}$

Stirling's approximation: $x! \approx \sqrt{2\pi}\, x^{x + \frac{1}{2}} e^{-x}$

slide-20
SLIDE 20

How many ways can we align two sequences?

Binomial formula: $\binom{k}{r} = \frac{k!}{(k - r)!\, r!}$

$\binom{2n}{n} = \frac{(2n)!}{n!\, n!}$

Stirling's approximation: $x! \approx \sqrt{2\pi}\, x^{x + \frac{1}{2}} e^{-x}$

$\binom{2n}{n} \approx \frac{2^{2n}}{\sqrt{\pi n}}$
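The approximation can be checked numerically. The sketch below compares the exact central binomial coefficient with the Stirling-based estimate for a few values of n; the function names and the chosen values of n are illustrative, and reading n as the length of each sequence is an assumption.

```python
import math

def exact_count(n):
    """Exact central binomial coefficient (2n choose n)."""
    return math.comb(2 * n, n)

def stirling_estimate(n):
    """Estimate 2^(2n) / sqrt(pi * n) obtained from Stirling's formula."""
    return 2 ** (2 * n) / math.sqrt(math.pi * n)

for n in (5, 10, 50, 100):
    exact = exact_count(n)
    approx = stirling_estimate(n)
    # The ratio approaches 1 as n grows.
    print(n, exact, f"{approx:.3e}", f"{approx / exact:.4f}")
```

The estimate slightly overshoots the exact value, and the count grows roughly as 4^n, which is why enumerating every alignment is impractical for sequences of realistic length.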

slide-21
SLIDE 21

Dynamic Programming


slide-22
SLIDE 22

Needleman-Wunsch


slide-36
SLIDE 36

Smith-Waterman

The implementation of local alignment is the same as for global alignment, with a few changes to the rules:

  • Initialize edges to 0 (no penalty for starting in the middle of a sequence).
  • The maximum score is never less than 0, and no pointer is recorded unless the score is greater than 0 (note that this implies negative scores for gaps and bad matches).
  • The trace-back starts from the highest score in the matrix and ends at a score of 0 (local, rather than global, alignment).

Because the naive implementation is essentially the same, the time and space requirements are also the same.
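The three rule changes can be sketched as follows. This is a minimal illustration, not the lecture's implementation: it assumes a linear gap penalty (a fixed cost per gap position, no opening penalty), a +1/-1 toy scoring matrix, and hypothetical names (`smith_waterman`, `H`, `ptr`).

```python
def smith_waterman(x, y, S, gap=-1):
    """Local alignment sketch. ASSUMPTION: linear gap penalty `gap`."""
    rows, cols = len(x) + 1, len(y) + 1
    # Rule 1: edges (and every other cell, to start with) are 0.
    H = [[0] * cols for _ in range(rows)]
    ptr = [[None] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            candidates = [
                (H[i - 1][j - 1] + S[x[i - 1]][y[j - 1]], "diag"),
                (H[i - 1][j] + gap, "up"),
                (H[i][j - 1] + gap, "left"),
            ]
            score, move = max(candidates, key=lambda c: c[0])
            # Rule 2: scores never drop below 0, and a pointer is
            # recorded only when the score is greater than 0.
            if score > 0:
                H[i][j], ptr[i][j] = score, move
                if score > best:
                    best, best_pos = score, (i, j)
    # Rule 3: trace back from the highest score until a 0 cell
    # (a cell with no pointer) is reached.
    i, j = best_pos
    ax, ay = [], []
    while ptr[i][j] is not None:
        move = ptr[i][j]
        if move == "diag":
            ax.append(x[i - 1]); ay.append(y[j - 1]); i -= 1; j -= 1
        elif move == "up":
            ax.append(x[i - 1]); ay.append("-"); i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1]); j -= 1
    return best, "".join(reversed(ax)), "".join(reversed(ay))

# ASSUMPTION: toy +1 match / -1 mismatch matrix and toy sequences.
S = {a: {b: (1 if a == b else -1) for b in "ACGT"} for a in "ACGT"}
local = smith_waterman("AGCGGTA", "GAGCGGA", S)
print(local)  # (5, 'AGCGG', 'AGCGG')
```

Ties are broken in favor of the first maximum found in row-major order, which is one of several reasonable conventions.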

slide-37
SLIDE 37

Smith-Waterman

[Figure: example Smith-Waterman scoring matrix for two short DNA sequences (letters shown: A G C G G T A and G A G C G G A); the highest cell score shown is 5.]

slide-38
SLIDE 38

Final Homework

Implement Needleman-Wunsch global alignment with zero gap opening penalties. Try attacking the problem in this order:

1. Initialize and fill in a dynamic programming matrix by hand (e.g., try reproducing the example from my slides on paper).
2. Write a function to create the dynamic programming matrix and initialize the first row and column.
3. Write a function to fill in the rest of the matrix.
4. Rewrite the initialize and fill steps to store pointers to the best sub-solution for each cell.
5. Write a backtrace function to read the optimal alignment from the filled in matrix.

If that isn't enough to keep you occupied, try implementing Smith-Waterman local alignment and/or non-zero gap opening penalties.
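One possible shape for steps 2 through 5 is sketched below as a starting point for comparison, not as the reference solution. It assumes a zero gap opening penalty with a per-position extension penalty `e`, a +1/-1 toy scoring matrix, and hypothetical names (`needleman_wunsch`, `F`, `ptr`); all of these are illustrative choices.

```python
def needleman_wunsch(x, y, S, e=-1):
    """Global alignment sketch with zero gap-opening penalty."""
    rows, cols = len(x) + 1, len(y) + 1
    # Step 2: create the matrix and initialize the first row and column.
    F = [[0] * cols for _ in range(rows)]
    ptr = [[None] * cols for _ in range(rows)]
    for i in range(1, rows):
        F[i][0], ptr[i][0] = i * e, "up"
    for j in range(1, cols):
        F[0][j], ptr[0][j] = j * e, "left"
    # Steps 3 and 4: fill the remaining cells, storing a pointer to the
    # best sub-solution for each cell.
    for i in range(1, rows):
        for j in range(1, cols):
            candidates = [
                (F[i - 1][j - 1] + S[x[i - 1]][y[j - 1]], "diag"),
                (F[i - 1][j] + e, "up"),
                (F[i][j - 1] + e, "left"),
            ]
            F[i][j], ptr[i][j] = max(candidates, key=lambda c: c[0])
    # Step 5: backtrace from the bottom-right corner to the top-left.
    i, j = len(x), len(y)
    ax, ay = [], []
    while ptr[i][j] is not None:
        move = ptr[i][j]
        if move == "diag":
            ax.append(x[i - 1]); ay.append(y[j - 1]); i -= 1; j -= 1
        elif move == "up":
            ax.append(x[i - 1]); ay.append("-"); i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1]); j -= 1
    return F[-1][-1], "".join(reversed(ax)), "".join(reversed(ay))

# ASSUMPTION: toy +1 match / -1 mismatch matrix and toy sequences.
S = {a: {b: (1 if a == b else -1) for b in "ACGT"} for a in "ACGT"}
nw_score, ax, ay = needleman_wunsch("GATTACA", "GATACA", S)
print(nw_score)
print(ax)
print(ay)
```

Unlike the local version, the edges carry accumulated gap penalties and the backtrace always runs from corner to corner, so every residue of both sequences appears in the output.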