SLIDE 1

S.Will, 18.417, Fall 2011

Sequence-Structure Alignment — A General Formulation

“Unifying view on Edit Distance, SA&F, ...”

IN

  • $S_1, \ldots, S_k \in \Sigma^*$: sequences
  • $P_1, \ldots, P_k$ with $P_i \subseteq \{1, \ldots, |S_i|\}^2$: sets of base pairs
  • score on alignments

OUT

Alignment $A = (S_1^*, P_1^*, \ldots, S_k^*, P_k^*)$ that maximizes $\mathrm{score}(A)$,
where $S_i^*|_\Sigma = S_i$, "$P_i^*|_\Sigma$" $\subseteq P_i$, . . .

The exact conditions and the score vary between problem classes, which restrict the input structures, the output structures, and the score.

SLIDE 2

Alignment with Fixed Input Structures

Jiang et al. A General Edit Distance between RNA Structures. JCB, 2002.

  • "$P_i^*|_\Sigma$" $= P_i$, i.e. output structure = input structure
  • score is a rather general edit distance (including the breaking of base pairs)
  • only pairwise, k = 2
  • efficient only for NESTED/CROSSING with a "not so general" score

SLIDE 3

Alignment with Fixed Input Structures – Pseudoknots

  • CROSSING/CROSSING, i.e. pseudoknots allowed
  • restricted pseudoknots, e.g., no crossing of 3 base pairs:
      • Patricia A. Evans. Finding common RNA pseudoknot structures in polynomial time. CPM 2006.
      [Figure: a) a three-knot; b) interleaved left-right endpoints]
      • Möhl, Will, Backofen. Lifting prediction to alignment of RNA pseudoknots. RECOMB 2009.
  • general crossing:
      • Möhl, Will, Backofen. Fixed parameter tractable alignment of RNA structures including arbitrary pseudoknots. CPM 2008.

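The difference between the NESTED and CROSSING classes can be made concrete with a small Python sketch: two base pairs (i, j) and (k, l) cross when i < k < j < l. The helper names are hypothetical, and `has_mutually_crossing_triple` assumes that the "crossing of 3 base pairs" excluded by the restricted class means three base pairs that pairwise cross each other.

```python
from itertools import combinations

Pair = tuple[int, int]


def crosses(p: Pair, q: Pair) -> bool:
    """True if base pairs p and q cross, i.e. form a pseudoknot."""
    (i, j), (k, l) = sorted([p, q])
    return i < k < j < l


def crossing_pairs(pairs: set[Pair]) -> list[tuple[Pair, Pair]]:
    """All crossing pairs of base pairs; the list is empty iff the structure is nested."""
    return [(p, q) for p, q in combinations(sorted(pairs), 2) if crosses(p, q)]


def has_mutually_crossing_triple(pairs: set[Pair]) -> bool:
    """True if some three base pairs cross each other pairwise (assumed three-knot)."""
    return any(
        crosses(p, q) and crosses(p, r) and crosses(q, r)
        for p, q, r in combinations(sorted(pairs), 3)
    )


print(crossing_pairs({(1, 10), (2, 9), (4, 6)}))          # nested structure
print(crossing_pairs({(1, 5), (3, 8)}))                   # simple pseudoknot
print(has_mutually_crossing_triple({(1, 4), (2, 5), (3, 6)}))
```

The pairwise check costs O(|P|²) and the triple check O(|P|³); both are only meant to classify a given structure, not to align anything.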

SLIDE 4

Simultaneous Alignment and Folding (SA&F)

David Sankoff. Simultaneous solution of the RNA folding, alignment and protosequence problems. SIAM J. Appl. Math., 1985.

  • "$P_i^*|_\Sigma$" $\subseteq P_i$
  • input structures are crossing (all potential base pairs)
  • output structures are non-crossing

Example input:

S1 = ACGGACUUACGGACUUGACUCGGACU, P1 = [Figure: potential base pairs of S1 over positions 1 to 26]
S2 = CGGAACGUAUACGGACUCCAGACUACGUGCA, P2 = [Figure: potential base pairs of S2 over positions 1 to 31]

SLIDE 5

Example SA&F

IN:

S1 = ACGGACUUACGGACUUGACUCGGACU, P1 = [Figure: potential base pairs of S1 over positions 1 to 26]
S2 = CGGAACGUAUACGGACUCCAGACUACGUGCA, P2 = [Figure: potential base pairs of S2 over positions 1 to 31]

OUT:

P∗1 ≡ ----.(.((..(........)..)).)...----
S∗1 = ----ACGGACUUACGGACUUGACUCGGACU----
S∗2 = CGGAACGUAUACGGACUCCAGACUACG---UGCA
P∗2 ≡ .....(.((..(........)..)).)---....
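The conditions $S_i^*|_\Sigma = S_i$ and "$P_i^*|_\Sigma$" can be checked directly on this example. The Python sketch below (function names hypothetical) drops gap columns from an alignment row and maps the base pairs of the gapped dot-bracket row back to 1-based sequence positions. Assumption: S∗1 and P∗1 start with four gap columns, so that all four rows have the same length of 34 columns.

```python
def parse_dot_bracket(structure: str) -> list[tuple[int, int]]:
    """Base pairs (1-based columns) of a nested dot-bracket string; '.' and '-' are unpaired."""
    stack: list[int] = []
    pairs: list[tuple[int, int]] = []
    for col, ch in enumerate(structure, start=1):
        if ch == "(":
            stack.append(col)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at column {col}")
            pairs.append((stack.pop(), col))
    if stack:
        raise ValueError("unmatched '('")
    return sorted(pairs)


def project(row: str, structure: str) -> tuple[str, list[tuple[int, int]]]:
    """Restrict an alignment row and its structure row to Σ, i.e. drop gap columns."""
    col_to_pos: dict[int, int] = {}
    pos = 0
    for col, ch in enumerate(row, start=1):
        if ch != "-":
            pos += 1
            col_to_pos[col] = pos
    # In a valid alignment a paired column never holds a gap, so the lookup succeeds.
    pairs = [(col_to_pos[i], col_to_pos[j]) for i, j in parse_dot_bracket(structure)]
    return row.replace("-", ""), pairs


CANONICAL = {"AU", "UA", "GC", "CG", "GU", "UG"}

seq1, pairs1 = project("----ACGGACUUACGGACUUGACUCGGACU----",
                       "----.(.((..(........)..)).)...----")
seq2, pairs2 = project("CGGAACGUAUACGGACUCCAGACUACG---UGCA",
                       ".....(.((..(........)..)).)---....")
print(pairs1, all(seq1[i - 1] + seq1[j - 1] in CANONICAL for i, j in pairs1))
print(pairs2, all(seq2[i - 1] + seq2[j - 1] in CANONICAL for i, j in pairs2))
```

Both projections recover the input sequences, and the shared consensus structure maps to four non-crossing, canonical base pairs in each sequence, which is what SA&F requires of its output.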

SLIDE 6

Incomplete history of SA&F

  • 1985 Sankoff: computationally heavy, no implementation
  • 1997 Foldalign (Gorodkin et al.): only stems, simpler energy
  • 2002 Dynalign (Mathews, Turner): first "full" implementation
  • 2004 PMcomp (Hofacker et al.): clever simplification
  • 2007 FoldalignM (Torarinsson et al.): PMcomp implementation
  • 2007 LocARNA (Will et al.): PMcomp-based, more time- and space-efficient, optionally local
  • 2008 RAF (Do et al.): PMcomp-based, sequence sparsity, machine learning
  • 2011 LocARNA-P (Will et al.): efficient partition function

SLIDE 7

PMcomp: A Realistic Nussinov-style Sankoff-Algorithm

Idea:

  • Simplify the energy model of SA&F: loop-based (Zuker-style) ⇒ base-pair-based (Nussinov-style)
  • Advantage?
  • Problem?
  • Add realistic energy scoring again: McCaskill base-pair probabilities

SLIDE 8

PMcomp: Nussinov-style Sankoff — Recursion

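$$
M_{ij;kl} = \max \begin{cases}
M_{i\,j-1;\,k\,l-1} + \sigma(A_j, B_l) \\
M_{i\,j-1;\,k\,l} + \gamma \\
M_{ij;\,k\,l-1} + \gamma \\
\max_{j', l'} \left( M_{i\,j'-1;\,k\,l'-1} + D_{j'j;\,l'l} \right)
\end{cases}
$$

$$
D_{ij;kl} = M_{i+1\,j-1;\,k+1\,l-1} + \tau(i, j, k, l)
$$

[Figure: subsequences A_i..A_j and B_k..B_l, with arcs (j′, j) and (l′, l)]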
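A minimal memoized Python sketch of this Nussinov-style Sankoff recursion, intended only for tiny inputs. Assumptions not stated on the slide: positions are 1-based and inclusive, an empty interval aligned against r positions scores r·γ, D is only used for pairs (j′, j) ∈ `pairs_A` and (l′, l) ∈ `pairs_B`, and `sigma`, `gamma`, `tau` are supplied by the caller (scores are maximized, so γ is negative).

```python
from functools import lru_cache
from typing import Callable


def pmcomp_score(A: str, B: str,
                 pairs_A: set[tuple[int, int]], pairs_B: set[tuple[int, int]],
                 sigma: Callable[[str, str], float], gamma: float,
                 tau: Callable[[int, int, int, int], float]) -> float:
    """Optimal value of M_{1 |A|; 1 |B|} (simultaneous alignment and Nussinov-style folding)."""

    @lru_cache(maxsize=None)
    def M(i: int, j: int, k: int, l: int) -> float:
        if j < i:                      # A-interval empty: gap out the B-interval
            return max(l - k + 1, 0) * gamma
        if l < k:                      # B-interval empty: gap out the A-interval
            return (j - i + 1) * gamma
        best = max(
            M(i, j - 1, k, l - 1) + sigma(A[j - 1], B[l - 1]),  # A_j aligned to B_l
            M(i, j - 1, k, l) + gamma,                           # A_j against a gap
            M(i, j, k, l - 1) + gamma,                           # B_l against a gap
        )
        # A_j and B_l close matched arcs (j', j) and (l', l): O(nm) choices per entry
        for jp in range(i, j):
            if (jp, j) not in pairs_A:
                continue
            for lp in range(k, l):
                if (lp, l) in pairs_B:
                    best = max(best, M(i, jp - 1, k, lp - 1) + D(jp, j, lp, l))
        return best

    @lru_cache(maxsize=None)
    def D(i: int, j: int, k: int, l: int) -> float:
        # Arc match (i, j) ~ (k, l): align the two interiors and add the arc score tau
        return M(i + 1, j - 1, k + 1, l - 1) + tau(i, j, k, l)

    return M(1, len(A), 1, len(B))


sigma = lambda x, y: 1.0 if x == y else -1.0
print(pmcomp_score("GAAAC", "GAAC", {(1, 5)}, {(1, 4)}, sigma, -1.0,
                   lambda i, j, k, l: 4.0))
```

In this toy example, matching the arc (1, 5) of GAAAC with the arc (1, 4) of GAAC beats the purely sequential alignment; with empty pair sets the function reduces to plain global sequence alignment.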

SLIDE 10

PMcomp — Scoring

$$
M_{ij;kl} = \max \begin{cases}
M_{i\,j-1;\,k\,l-1} + \sigma(A_j, B_l) \\
M_{i\,j-1;\,k\,l} + \gamma \\
M_{ij;\,k\,l-1} + \gamma \\
\max_{j', l'} \left( M_{i\,j'-1;\,k\,l'-1} + D_{j'j;\,l'l} \right)
\end{cases}
$$

$$
D_{ij;kl} = M_{i+1\,j-1;\,k+1\,l-1} + \tau(i, j, k, l)
$$

Idea:

  • $\tau(i, j, k, l) = \Psi^A_{ij} + \Psi^B_{kl}$
  • $\Psi^A_{ij}$, $\Psi^B_{kl}$: log-odds scores for base pairs
  • McCaskill base-pair probabilities vs. background

Hofacker et al. Alignment of RNA base pairing probability matrices. Bioinformatics, 2004.

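A sketch of how τ could be assembled from base-pair probabilities in Python. Assumption: the log-odds score is taken in its plainest form, $\Psi_{ij} = \log(p_{ij}/p_0)$ with one background probability $p_0$; PMcomp's exact normalization may differ, and the probabilities are assumed to come from a McCaskill computation done elsewhere. The function names are hypothetical.

```python
import math


def log_odds(pair_probs: dict[tuple[int, int], float],
             p0: float) -> dict[tuple[int, int], float]:
    """Psi_ij = log(p_ij / p0) for every base pair with positive probability."""
    return {bp: math.log(p / p0) for bp, p in pair_probs.items() if p > 0}


def make_tau(psi_A: dict[tuple[int, int], float], psi_B: dict[tuple[int, int], float]):
    """Return tau(i, j, k, l) = Psi^A_ij + Psi^B_kl, as on the slide."""
    def tau(i: int, j: int, k: int, l: int) -> float:
        return psi_A[(i, j)] + psi_B[(k, l)]
    return tau


psi_A = log_odds({(1, 5): 0.5, (2, 4): 0.0}, p0=0.05)
psi_B = log_odds({(1, 4): 0.05}, p0=0.05)
tau = make_tau(psi_A, psi_B)
print(psi_A, tau(1, 5, 1, 4))
```

Pairs more probable than the background get positive scores and less probable ones negative scores, so the alignment is pulled toward arcs that are likely in both sequences.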

SLIDE 11

Complexity PMcomp

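$$
M_{ij;kl} = \max \begin{cases}
M_{i\,j-1;\,k\,l-1} + \sigma(A_j, B_l) \\
M_{i\,j-1;\,k\,l} + \gamma \\
M_{ij;\,k\,l-1} + \gamma \\
\max_{j', l'} \left( M_{i\,j'-1;\,k\,l'-1} + D_{j'j;\,l'l} \right)
\end{cases}
$$

$$
D_{ij;kl} = M_{i+1\,j-1;\,k+1\,l-1} + \tau(i, j, k, l)
$$

  • $O(n^2 m^2)$ entries in M
  • per entry: $O(nm)$ time (maximization over $j'$ and $l'$)

Total complexity: $O(n^3 m^3)$ time, $O(n^2 m^2)$ space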

SLIDE 12

LocARNA: Making PMcomp/Sankoff practical

Ideas:

  • follow the PMcomp idea for scoring
  • only consider significant base pairs: "cut-off probability"
    [Figure: base pairs of an example sequence, positions 1 to 64]
  • reformulate the recursion
  • profit in time and space complexity

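The cut-off idea can be sketched in Python as a filter over a base-pair probability table. The dictionary representation, the helper names, and the probabilities are hypothetical (they are not the data behind the figures); pairs are kept if their probability is at least the cutoff.

```python
from collections import Counter

BasePair = tuple[int, int]


def filter_pairs(pair_probs: dict[BasePair, float], p_cutoff: float) -> set[BasePair]:
    """Keep only 'significant' base pairs, i.e. those with probability >= p_cutoff."""
    return {bp for bp, p in pair_probs.items() if p >= p_cutoff}


def max_partners(pairs: set[BasePair]) -> int:
    """Largest number of retained base pairs that share one sequence position."""
    counts: Counter = Counter()
    for i, j in pairs:
        counts[i] += 1
        counts[j] += 1
    return max(counts.values(), default=0)


# Hypothetical probabilities for a short sequence (not taken from the slides).
probs = {(1, 20): 0.6, (2, 19): 0.55, (3, 18): 0.3, (1, 12): 0.08,
         (5, 15): 0.02, (6, 14): 0.007, (7, 13): 0.001}
for cutoff in (0.005, 0.01, 0.05, 0.1):
    kept = filter_pairs(probs, cutoff)
    print(cutoff, len(kept), max_partners(kept))
```

Raising the cutoff shrinks the arc sets $P_1$, $P_2$ that the reformulated recursion iterates over, which is where the gain in time and space comes from.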
SLIDE 13

Effect of Base-Pair Filtering

pcutoff = 0.005

[Figure: base pairs of the example sequence (positions 1 to 64) retained at this cutoff]

SLIDE 14

Effect of Base-Pair Filtering

pcutoff = 0.01

[Figure: base pairs of the example sequence (positions 1 to 64) retained at this cutoff]

SLIDE 15

Effect of Base-Pair Filtering

pcutoff = 0.05

[Figure: base pairs of the example sequence (positions 1 to 64) retained at this cutoff]

SLIDE 16

Effect of Base-Pair Filtering

pcutoff = 0.1

[Figure: base pairs of the example sequence (positions 1 to 64) retained at this cutoff]

SLIDE 17

LocARNA Basic Algorithm: Matrices

[Figure: sequence A (positions 1 to n, arcs a1, a2, a3) and sequence B (positions 1 to m, arcs b1 to b4); matrix D with one entry per arc pair (a, b)]

SLIDE 18

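LocARNA Basic Algorithm: Matrices

[Figure: sequence A (positions 1 to n, arcs a1, a2, a3) and sequence B (positions 1 to m, arcs b1 to b4); matrix D with one entry per arc pair (a, b), and matrix M over positions 1 to n and 1 to m]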

SLIDE 20

LocARNA Basic Algorithm: Recursion

[Figure: arc $a = (a_l, a_r)$ in A matched with arc $b = (b_l, b_r)$ in B]

$$
D(a, b) = M(a, b; a_r - 1, b_r - 1) + \tau(a, b)
$$

SLIDE 21

LocARNA Basic Algorithm: Recursion

[Figure: inside the arc match (a, b), positions $a_l + 1$ to $i$ of A aligned with $b_l + 1$ to $j$ of B; an inner arc match $(a', b')$ ends at $i$ and $j$]

$$
M(a, b; i, j) = \max \begin{cases}
M(a, b; i-1, j-1) + \sigma(A_i, B_j) \\
M(a, b; i, j-1) + \gamma \\
M(a, b; i-1, j) + \gamma \\
\max_{a', b'} M(a, b; a'_l - 1, b'_l - 1) + D(a', b') \quad \text{where } a'_r = i,\ b'_r = j
\end{cases}
$$

SLIDE 22

LocARNA Basic Algorithm: Recursion

$$
M^{ab}(i, j) = \max \begin{cases}
M^{ab}(i-1, j-1) + \sigma(A_i, B_j) \\
M^{ab}(i-1, j) + \gamma \\
M^{ab}(i, j-1) + \gamma \\
\max_{a'b'} M^{ab}(a'_l - 1, b'_l - 1) + D(a', b') \quad \text{where } a'_r = i,\ b'_r = j
\end{cases}
$$

$$
D(a, b) = M^{ab}(a_r - 1, b_r - 1) + \tau(a, b)
$$
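A minimal Python sketch of this arc-based recursion, under assumptions that are not on the slide: positions are 1-based, arcs are (left, right) tuples, $M^{ab}(i, j)$ aligns $A_{a_l+1..i}$ with $B_{b_l+1..j}$, the outermost level uses pseudo-arcs $a_0 = (0, n+1)$ and $b_0 = (0, m+1)$, and scores are maximized with a negative γ. All names are hypothetical. For simplicity the sketch caches every $M^{ab}$ table, so unlike LocARNA it does not reach $O(|P_1||P_2| + nm)$ space.

```python
from functools import lru_cache
from typing import Callable

Arc = tuple[int, int]


def locarna_score(A: str, B: str, arcs_A: set[Arc], arcs_B: set[Arc],
                  sigma: Callable[[str, str], float], gamma: float,
                  tau: Callable[[Arc, Arc], float]) -> float:
    n, m = len(A), len(B)
    # Index arcs by their right end, for the case a'_r = i, b'_r = j.
    right_A: dict[int, list[Arc]] = {}
    for arc in arcs_A:
        right_A.setdefault(arc[1], []).append(arc)
    right_B: dict[int, list[Arc]] = {}
    for arc in arcs_B:
        right_B.setdefault(arc[1], []).append(arc)

    @lru_cache(maxsize=None)
    def D(a: Arc, b: Arc) -> float:
        return M_table(a, b)[(a[1] - 1, b[1] - 1)] + tau(a, b)

    @lru_cache(maxsize=None)
    def M_table(a: Arc, b: Arc) -> dict[tuple[int, int], float]:
        (al, ar), (bl, br) = a, b
        M: dict[tuple[int, int], float] = {}
        for i in range(al, ar):
            for j in range(bl, br):
                if i == al and j == bl:      # both inner intervals empty
                    M[(i, j)] = 0.0
                    continue
                cands = []
                if i > al and j > bl:
                    cands.append(M[(i - 1, j - 1)] + sigma(A[i - 1], B[j - 1]))
                if i > al:
                    cands.append(M[(i - 1, j)] + gamma)
                if j > bl:
                    cands.append(M[(i, j - 1)] + gamma)
                # inner arc match (a', b') ending at (i, j), strictly inside (a, b)
                for ap in right_A.get(i, []):
                    for bp in right_B.get(j, []):
                        if ap[0] > al and bp[0] > bl:
                            cands.append(M[(ap[0] - 1, bp[0] - 1)] + D(ap, bp))
                M[(i, j)] = max(cands)
        return M

    return M_table((0, n + 1), (0, m + 1))[(n, m)]


sigma = lambda x, y: 1.0 if x == y else -1.0
print(locarna_score("GAAAC", "GAAC", {(1, 5)}, {(1, 4)}, sigma, -1.0, lambda a, b: 4.0))
```

Each $M^{ab}$ table only spans the interior of one arc pair, which is why the work depends on $|P_1||P_2|$ rather than on all interval pairs.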

SLIDE 23

Complexity LocARNA

$$
M^{ab}(i, j) = \max \begin{cases}
M^{ab}(i-1, j-1) + \sigma(A_i, B_j) \\
M^{ab}(i-1, j) + \gamma \\
M^{ab}(i, j-1) + \gamma \\
\max_{a'b'} M^{ab}(a'_l - 1, b'_l - 1) + D(a', b') \quad \text{where } a'_r = i,\ b'_r = j
\end{cases}
$$

$$
D(a, b) = M^{ab}(a_r - 1, b_r - 1) + \tau(a, b)
$$

  • compute D(a, b) for all pairs of base-pair edges (arcs) $a \in P_1$, $b \in P_2$ [that are compatible] $\Rightarrow O(|P_1||P_2|)$ entries
  • combine the D(a, b) computations for common $(a_l, b_l)$ $\Rightarrow O(nm)$ groups
  • per $(a_l, b_l)$: $O(nm \cdot \mathrm{rdeg}_1\, \mathrm{rdeg}_2)$

Total complexity: $O(nm|P_1||P_2|)$ time, $O(|P_1||P_2| + nm)$ space
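A short worked bound, a standard argument that the slides do not spell out, connects the cut-off probability to this complexity. For any position $x$, the probabilities of all base pairs involving $x$ sum to the probability that $x$ is paired, which is at most 1; hence at most $1/p_{\mathrm{cutoff}}$ retained pairs can involve $x$ (the same bound applies to the rdeg factors). Summing over all positions:

$$
|P_1| \le \frac{n}{p_{\mathrm{cutoff}}}, \quad |P_2| \le \frac{m}{p_{\mathrm{cutoff}}}
\quad\Longrightarrow\quad
O(nm\,|P_1|\,|P_2|) \subseteq O\!\left(\frac{n^2 m^2}{p_{\mathrm{cutoff}}^2}\right)
$$

For a fixed cutoff this is $O(n^2 m^2)$ time, compared with $O(n^3 m^3)$ for PMcomp.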

SLIDE 24

Affine Gap Cost

  • Basic algorithm: linear gap cost
  • Affine gap cost $g(k) = \alpha + \beta \cdot k$: à la Gotoh

[Figure: matrices M, E and F over positions 1 to n and 1 to m]

SLIDE 25

Affine Gap Cost

$$
M^{ab}(i, j) = \max \begin{cases}
M^{ab}(i-1, j-1) + \sigma(A_i, B_j) \\
E^{ab}_i(j) \\
F^{ab}_i(j) \\
\max_{a'b'} M^{ab}(a'_l - 1, b'_l - 1) + D(a', b') \quad \text{where } a'_r = i,\ b'_r = j
\end{cases}
$$

$$
D(a, b) = M^{ab}(a_r - 1, b_r - 1) + \tau(a, b)
$$

$$
E^{ab}_i(j) = \max\{E^{ab}_{i-1}(j) + \beta,\ M^{ab}(i-1, j) + \alpha + \beta\}
$$

$$
F^{ab}_i(j) = \max\{F^{ab}_i(j-1) + \beta,\ M^{ab}(i, j-1) + \alpha + \beta\}
$$

SLIDE 26

Stacking

  • Distinguish stacked and unstacked base-pair matches
  • Implementation without changing the recursion structure
  • No additional computational cost

[Figure: sequence A (positions 1 to n, arcs a1, a2, a3) and sequence B (positions 1 to m, arcs b1 to b4)]

SLIDE 28

Stacking Recursion

$$
M^{ab}(i, j) = \max \begin{cases}
M^{ab}(i-1, j-1) + \sigma(A_i, B_j) \\
M^{ab}(i-1, j) + \gamma \\
M^{ab}(i, j-1) + \gamma \\
\max_{a'b'} M^{ab}(a'_l - 1, b'_l - 1) + D(a', b') \quad \text{where } a'_r = i,\ b'_r = j
\end{cases}
$$

$$
D(a, b) = \max \begin{cases}
M^{ab}(a_r - 1, b_r - 1) + \tau(a, b) \\
D(a', b') + \tau'(a, b) \quad \text{where } (a, b) \text{ stacked to } (a', b')
\end{cases}
$$

SLIDE 29

LocARNA: sequence local alignment

  • find the best alignment of subsequences
  • special "last" recursion for the pseudo-arcs $a_0$, $b_0$:

$$
M^{a_0 b_0}(i, j) = \max \begin{cases}
M^{a_0 b_0}(i-1, j-1) + \sigma(A_i, B_j) \\
M^{a_0 b_0}(i-1, j) + \gamma \\
M^{a_0 b_0}(i, j-1) + \gamma \\
\max_{a'b'} M^{a_0 b_0}(a'_l - 1, b'_l - 1) + D(a', b') \quad \text{where } a'_r = i,\ b'_r = j \\
0
\end{cases}
$$

  • back-trace from the maximal entry to a 0-entry (cf. local sequence alignment).
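The "cf. local sequence alignment" can be illustrated by the plain sequence case, Smith-Waterman without arcs: a 0 option lets an alignment start anywhere, and the back-trace starts at the maximal entry and stops at a 0-entry. A minimal Python sketch with hypothetical linear scores; LocARNA applies the same idea in its special recursion for the pseudo-arcs.

```python
def smith_waterman(A: str, B: str, match: float = 1.0, mismatch: float = -1.0,
                   gap: float = -1.0) -> tuple[float, str, str]:
    """Local alignment score and the two aligned substrings (linear gap score)."""
    n, m = len(A), len(B)
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    best, best_ij = 0.0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if A[i - 1] == B[j - 1] else mismatch
            # the extra 0 lets a local alignment start at any (i, j)
            H[i][j] = max(0.0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_ij = H[i][j], (i, j)
    # back-trace from the maximal entry until a 0-entry is reached
    i, j = best_ij
    row_a: list[str] = []
    row_b: list[str] = []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if A[i - 1] == B[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            row_a.append(A[i - 1])
            row_b.append(B[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            row_a.append(A[i - 1])
            row_b.append("-")
            i -= 1
        else:
            row_a.append("-")
            row_b.append(B[j - 1])
            j -= 1
    return best, "".join(reversed(row_a)), "".join(reversed(row_b))


print(smith_waterman("ACGT", "TTACGTT"))
```

The flanking positions of the second sequence are simply left out of the local alignment instead of being charged as gaps.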

SLIDE 30

LocARNA: structure local alignment

What is structure-local alignment?

[Figure: example of a structure-local alignment of two RNAs, with one allowed and one disallowed exclusion]

Find the best alignment of "connected" substructures.

Idea

  • exclusions: allow at most one per base-pair match and per sequence
  • counting: 0/1 exclusions in seq 1, 0/1 exclusions in seq 2 ⇒ 4 states/matrices
  • Gotoh's trick: exclusion opening + exclusion extension ⇒ 8 states/matrices

Reward: structure locality without increasing the complexity

SLIDE 31

Application of LocARNA: Clustering of RNAs

  • GOAL: identify groups of related RNAs
  • IN: set of RNAs
  • OUT: hierarchical clustering of RNAs
  • Steps:
      • compare RNAs all-against-all using LocARNA
      • build a cluster tree by hierarchical clustering (WPGMA)
      • identify meaningful clusters
  • Application: cluster RNAs from an RNAz screen

RNAz can identify potential non-coding RNAs in genomes. More about RNAz and the prediction of ncRNAs in genomes: guest lecture by Stefan Washietl, Thursday, Oct 27.
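The clustering step can be sketched as plain WPGMA in Python. Assumptions: the pairwise LocARNA scores have already been converted into distances (the conversion is not described here), a distance is given for every unordered pair of names, and the helper name is hypothetical. WPGMA repeatedly merges the two closest clusters and sets the distance from the merged cluster to any other cluster to the unweighted mean of the two old distances.

```python
from itertools import combinations


def wpgma(names: list[str], dist: dict[tuple[str, str], float]) -> list[tuple]:
    """Return the merge steps (cluster1, cluster2, distance) of WPGMA clustering."""
    D = {frozenset(pair): d for pair, d in dist.items()}
    clusters: list = list(names)
    merges = []
    while len(clusters) > 1:
        x, y = min(combinations(clusters, 2), key=lambda xy: D[frozenset(xy)])
        merges.append((x, y, D[frozenset((x, y))]))
        merged = (x, y)                      # a cluster is a nested tuple of names
        clusters = [c for c in clusters if c != x and c != y]
        for c in clusters:
            # WPGMA update: mean of the two old distances, regardless of cluster sizes
            D[frozenset((c, merged))] = (D[frozenset((c, x))] + D[frozenset((c, y))]) / 2
        clusters.append(merged)
    return merges


dist = {("a", "b"): 1.0, ("c", "d"): 2.0, ("a", "c"): 5.0,
        ("a", "d"): 5.0, ("b", "c"): 5.0, ("b", "d"): 5.0}
for step in wpgma(["a", "b", "c", "d"], dist):
    print(step)
```

The merge heights form the cluster tree; meaningful clusters are then read off as subtrees, which this sketch leaves to the caller.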

SLIDE 32

Evaluation: Reproducing RNA families of Rfam

Rfam = collection of RNA families and their alignments (= known classification)

[Figure: specificity = dd/(dd+sd) plotted against sensitivity = ss/(ss+ds), both axes from 0.0 to 1.0]

Minimum recall level | Average recall | Average precision
0.50 | 0.5818 | 0.8280
0.55 | 0.6996 | 0.7819
0.60 | 0.7277 | 0.7530
0.65 | 0.7596 | 0.7117
0.70 | 0.8092 | 0.6831
0.75 | 0.8519 | 0.5949
0.80 | 0.8763 | 0.5701
0.85 | 0.9381 | 0.4794
0.90 | 0.9599 | 0.4419
0.95 | 0.9766 | 0.3907

SLIDE 33

LocARNA: Clustering of RNAz ncRNA Predictions

  • Clustering of 3332 putative ncRNAs in Ciona intestinalis
  • Clustering of bacterial RNAz predictions

[Figure: cluster tree of the bacterial RNAz predictions (leaves ec_1 to ec_123, annotated where known, e.g. tRNAs, 16S and 23S ribosomal RNA, 5S rRNA, THI, 6S regulatory RNA, Spot 42 RNA); distance axis from 0 to about 1600]

SLIDE 34

LocARNA: Clustering of RNAz ncRNA Predictions

  • Clustering of 3332 putative ncRNAs in Ciona intestinalis
  • Clustering of bacterial RNAz predictions

[Figure: the same cluster tree, with the clusters of tRNAs and of 16S rRNA substructures highlighted]

SLIDE 35

LocARNA Cluster: Known and Predicted microRNAs

[Figure: alignment dot plots (alidot.ps) of LocARNA clusters of known and predicted microRNAs, including mir-124]

Cluster | N | SCI | MPI
735 | 3 | 0.79 | 29.18
743 | 11 | 0.54 | 24.96
739 | 7 | 0.86 | 26.8
742 | 4 | 1.03 | 33.58
740 | 2 | 1.03 | 68.97
738 | 4 | 1.08 | 27.65
741 | 2 | 1.13 | 36.84

  • taken from the clustering of 3332 predicted ncRNAs in C. intestinalis
  • local and global alignment of base pairing probability matrices
  • detection of conserved structural RNAs by clustering
  • successfully tested on Rfam

SLIDE 36

Case Study 1

ci_558069 ci_557306 ci_555831 ci_555830 ci_555401 ci_557531 ci_556678 ci_557415 ci_554789 ci_555710 ci_555454

cluster1378 cluster1381 cluster1383

alidot.ps A _ _ U _ _ C U U G C U U A A C A C A U C G G A G U A A G A G G C G G _ U U A A C C A A A A U G U U C G U G G A _ C G A A C A C A C C C A C G G A C C U _ U G G U _ A A U _ C A A C U U C A A A A U U A A A A C U A A C G A _ _ U _ _ C U U G C U U A A C A C A U C G G A G U A A G A G G C G G _ U U A A C C A A A A U G U U C G U G G A _ C G A A C A C A C C C A C G G A C C U _ U G G U _ A A U _ C A A C U U C A A A A U U A A A A C U A A C G A _ _ U _ _ C U U G C U U A A C A C A U C G G A G U A A G A G G C G G _ U U A A C C A A A A U G U U C G U G G A _ C G A A C A C A C C C A C G G A C C U _ U G G U _ A A U _ C A A C U U C A A A A U U A A A A C U A A C G A _ _ U _ _ C U U G C U U A A C A C A U C G G A G U A A G A G G C G G _ U U A A C C A A A A U G U U C G U G G A _ C G A A C A C A C C C A C G G A C C U _ U G G U _ A A U _ C A A C U U C A A A A U U A A A A C U A A C G A _ _ U _ _ C U U G C U U A A C A C AU C GG A G U A A G A GG C G G _ U U A A C C A A A A U G U U C G U G G A_C G A A C A C A C C C A C G G A C C U_ U GG U _ A A U_ C A A C U U C A A A A U U A A A A C U A A C G G U G _ A _ G C A U U G C A G U U A G C U C U G U A A A

cluster1378 N=5 MPI=34.15 SCI=0.74

alidot.ps G G _ G A A U U A C U C A U A G U C G C C _ _ _ U U G A _ A _ G C _ C U U A C C A A _ A G _ U A G U A A A C C _ U A C C A C A A G A U G A A G A A _ _ C U G A A A G A C U U G U A A _ U G G A U U G G U _ _ _ A U U U A G G _ G A A U U A C U C A U A G U C G C C _ _ _ U U G A _ A _ G C _ C U U A C C A A _ A G _ U A G U A A A C C _ U A C C A C A A G A U G A A G A A _ _ C U G A A A G A C U U G U A A _ U G G A U U G G U _ _ _ A U U U A G G _ G A A U U A C U C A U A G U C G C C _ _ _ U U G A _ A _ G C _ C U U A C C A A _ A G _ U A G U A A A C C _ U A C C A C A A G A U G A A G A A _ _ C U G A A A G A C U U G U A A _ U G G A U U G G U _ _ _ A U U U A G G _ G A A U U A C U C A U A G U C G C C _ _ _ U U G A _ A _ G C _ C U U A C C A A _ A G _ U A G U A A A C C _ U A C C A C A A G A U G A A G A A _ _ C U G A A A G A C U U G U A A _ U G G A U U G G U _ _ _ A U U U A G G _ G A A U U A C U C AU A G U C G C C _ _ _ U U G A _ A _ G C_ C U U A C C A A _ A G _ U A G U A A A C C _U A C CA C A A G A U G A A G A A _ _ C U G A A AG A C U U G U A A _ U G G A U U G G U _ _ _ A U U U A CG G _ A C A U A U U G A G _ A G A

cluster1381 N=4 MPI=25.36 SCI=0.79

alidot.ps U U C G A C C A _ A U C A C A G C _ C C C C C A A A C C G A C C C A _ C _ A A C C G C C C C C G A A A A A G A A A _ A C A A U A U A A A C A A A U G A C A _ C A A C _ A U C G C G G G _ C U A A G U _ A C A C C A A C A G A A C C G C C G U C U U C G A C C A _ A U C A C A G C _ C C C C C A A A C C G A C C C A _ C _ A A C C G C C C C C G A A A A A G A A A _ A C A A U A U A A A C A A A U G A C A _ C A A C _ A U C G C G G G _ C U A A G U _ A C A C C A A C A G A A C C G C C G U C U U C G A C C A _ A U C A C A G C _ C C C C C A A A C C G A C C C A _ C _ A A C C G C C C C C G A A A A A G A A A _ A C A A U A U A A A C A A A U G A C A _ C A A C _ A U C G C G G G _ C U A A G U _ A C A C C A A C A G A A C C G C C G U C U U C G A C C A _ A U C A C A G C _ C C C C C A A A C C G A C C C A _ C _ A A C C G C C C C C G A A A A A G A A A _ A C A A U A U A A A C A A A U G A C A _ C A A C _ A U C G C G G G _ C U A A G U _ A C A C C A A C A G A A C C G C C G U C U U C G A C C A _ A U C A C A G C _ C C C C C A A AC C G A C C C A _ C _ A A C C G C C C C C G A A A A A G A A A _ A C A A U A U A A A C A A A U G A C A _ C A A C_ A UC G C G G G_ C U A A G U _ A C A C C A A C A G A A C C G C C G U C A A A _ C C G C A C _ C C G U A G C C C

cluster1383 N=2 MPI=31.09 SCI=0.89

[Figure: alidot.ps plot of the cluster alignment]

cluster1382 N=9 MPI=26.41 SCI=0.45

[Figure: alidot.ps plot of the cluster alignment]

cluster1384 N=11 MPI=24.96 SCI=0.45

[Figure: cluster tree (scale 0.1) with labels cluster1382, cluster1384]

slide-37
SLIDE 37

S.Will, 18.417, Fall 2011

Case Study 2

[Figure: tree over the sequences ci_558075, ci_554658, ci_557675, ci_557680, ci_555810, ci_554114, ci_557993, ci_557520, ci_556669, ci_555017, ci_554243, ci_557863, ci_556203, ci_554798, with cluster labels cluster1241, cluster1248, cluster1238, cluster1240, cluster1245, cluster1247]

[Figure: alidot.ps plot of the cluster alignment]

cluster1249 N=14 MPI=22.89 SCI=0.46

[Figure: alidot.ps plot of the cluster alignment]

cluster1238 N=3 MPI=25.82 SCI=0.92

[Figure: alidot.ps plot of the cluster alignment]

cluster1245 N=5 MPI=26.05 SCI=1.00

[Figure: alidot.ps plot of the cluster alignment]

cluster1241 N=6 MPI=24.81 SCI=0.68

[Figure: alidot.ps plot of the cluster alignment]

cluster1240 N=3 MPI=23.72 SCI=0.99

[Figure: alidot.ps plot of the cluster alignment]

cluster1247 N=3 MPI=53.55 SCI=0.97

[Figure: alidot.ps plot of the cluster alignment]

cluster1248 N=8 MPI=27.49 SCI=0.54

[Figure: cluster tree (scale 0.1) with label cluster1249]

slide-38
SLIDE 38

S.Will, 18.417, Fall 2011

Multiple LocARNA: Progressive Alignment

fwdB  AU-GUUGGAGGGGAACCC-GUAAGGGACCCUCCAAG-AU
selD  UUACGAUGUGCCGAACCCUUUAAGGGAGGCACAUCGAAA
fruA  CC---UCGAGGGGAACCC-GAAAGGGACCCGAGA---GG

[Figure: sequences fruA, fwdB and selD, and a guide tree over A, B, C: A and B are aligned first (AB), then AB is aligned with C]

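  • all-against-all pairwise comparison
  • guide tree built from the pairwise scores
  • alignment of alignments along the guide tree
  • heuristic: can make mistakes (an error made in an early step cannot be undone later)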
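The three steps can be sketched as follows. This is an illustration, not mlocarna's code. The pairwise distances are hypothetical numbers standing in for normalized all-against-all pairwise alignment scores, and UPGMA (average linkage) is assumed as the tree-building method. The nested tuples that come back give the order in which alignments are aligned to each other.

```python
from itertools import combinations

def upgma_guide_tree(names, dist):
    """Build a guide tree by UPGMA (average linkage) clustering.

    names: sequence names
    dist:  dict frozenset({a, b}) -> distance between sequences a and b
    Returns nested tuples; each inner tuple is one alignment-merging step.
    """
    # map each current tree node to the list of sequence names below it
    clusters = {name: [name] for name in names}

    def avg_dist(x, y):
        # average linkage: mean distance over all cross-cluster sequence pairs
        pairs = [(a, b) for a in clusters[x] for b in clusters[y]]
        return sum(dist[frozenset(p)] for p in pairs) / len(pairs)

    while len(clusters) > 1:
        # join the two closest clusters into a new inner node
        x, y = min(combinations(clusters, 2), key=lambda p: avg_dist(*p))
        clusters[(x, y)] = clusters.pop(x) + clusters.pop(y)
    return next(iter(clusters))

def merge_steps(tree):
    """List the alignment steps in post-order: children before their parent."""
    if isinstance(tree, str):  # a leaf is a single sequence, nothing to align
        return []
    left, right = tree
    return merge_steps(left) + merge_steps(right) + [tree]
```

Take hypothetical distances d(A,B)=0.2, d(A,C)=0.6 and d(B,C)=0.7. With these, `upgma_guide_tree` returns `('C', ('A', 'B'))` and `merge_steps` yields `[('A', 'B'), ('C', ('A', 'B'))]`. A and B are aligned first, and C is then aligned to the AB alignment. Every merge is final, so an error made in an early step is carried into all later ones. This is why the approach is heuristic.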
slide-39
SLIDE 39

S.Will, 18.417, Fall 2011

BRALIBASE 2.1

Compilation of “true” RNA alignments from Rfam; benchmark set for multiple RNA alignment.

Set    Sequences per alignment    #Alignments
k2      2                         8976
k3      3                         4835
k5      5                         2405
k7      7                         1426
k10    10                          845
k15    15                          503

slide-40
SLIDE 40

S.Will, 18.417, Fall 2011

Bralibase SPS plots

[Figure: Bralibase 2.1, clustalw, k2. SPS (0.0 to 1.0) plotted against APSI (40 to 90); series: reference, clustalw/k15]
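These plots show SPS against APSI. The following definitions are the standard ones used with BRAliBase and are given here as background, not taken from the slides. APSI is the average pairwise sequence identity of the reference alignment. SPS, the sum-of-pairs score, is the fraction of residue pairs aligned in the reference that are also aligned in the test alignment. The sketch below assumes that both alignments list the same sequences in the same order.

```python
from itertools import combinations

GAPS = {"-", "_"}

def aligned_pairs(rows):
    """Set of aligned residue pairs (i, pos_i, j, pos_j) of a multiple alignment.

    rows: gapped sequences in a fixed order; pos_* are ungapped residue indices.
    """
    # map every column of every row to its residue index (None for gaps)
    index_rows = []
    for row in rows:
        k, idx = 0, []
        for c in row:
            if c in GAPS:
                idx.append(None)
            else:
                idx.append(k)
                k += 1
        index_rows.append(idx)

    pairs = set()
    for i, j in combinations(range(len(rows)), 2):
        for pi, pj in zip(index_rows[i], index_rows[j]):
            if pi is not None and pj is not None:
                pairs.add((i, pi, j, pj))
    return pairs

def sps(test_rows, ref_rows):
    """Sum-of-pairs score: share of reference residue pairs reproduced by the test."""
    ref = aligned_pairs(ref_rows)
    return len(ref & aligned_pairs(test_rows)) / len(ref)
```

With reference rows "AC-GU"/"ACCGU" and test rows "ACG-U"/"ACCGU", three of the four reference pairs are reproduced, so SPS = 0.75. Counting pairs of residues, rather than identical columns, lets alignments with different gap placement be compared directly.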

slide-41
SLIDE 41

S.Will, 18.417, Fall 2011

Bralibase SPS plots

[Figure: Bralibase 2.1, clustalw, k2. SPS (0.0 to 1.0) plotted against APSI (40 to 90); series: reference, clustalw/k15]

slide-42
SLIDE 42

S.Will, 18.417, Fall 2011

Pairwise LocARNA vs. Others

[Figure: Bralibase 2.1. SPS (0.4 to 1.0) plotted against APSI (30 to 90); series: reference, locarna/k2, lara/k2, foldalign/k2, stral/k2]

Data for Lara, Foldalign and Stral: Bauer, Klau, Reinert. BMC Bioinformatics 2007. Only data for APSI ≤ 50% are available.

slide-43
SLIDE 43

S.Will, 18.417, Fall 2011

Multiple LocARNA vs. Others - 7 sequences

[Figure: Bralibase 2.1, k7. SPS (0.4 to 1.0) plotted against APSI (30 to 90); series: reference, locarna/k7, lara/k7, foldalign/k7, stral/k7]

slide-44
SLIDE 44

S.Will, 18.417, Fall 2011

Multiple LocARNA vs. Others - 15 sequences

[Figure: Bralibase 2.1, k15. SPS (0.4 to 1.0) plotted against APSI (30 to 90); series: reference, locarna/k15, lara/k15, foldalign/k15, stral/k15]