Maximal Exact Matchings


SLIDE 1

S.Will, 18.417, Fall 2011

Maximal Exact Matchings

  • similar to (local) alignment, but only identify parts that are exactly identical (no gaps)
  • exact matches must be connected at sequence or structure level
  • faster than structure alignment: O(n²) time & space
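
To make the idea of a maximal exact match concrete, here is a minimal sketch restricted to the sequence level only. It is a simplification for illustration: it ignores structure, so it is not the sequence-structure matching from the slide. The function name `maximal_exact_matches` and the `min_len` parameter are assumptions introduced here. The dynamic-programming table gives the same O(n²) time and space bound for two sequences of length n.

```python
def maximal_exact_matches(a, b, min_len=2):
    """Return (i, j, length) triples with a[i:i+length] == b[j:j+length]
    that cannot be extended to the left or to the right.

    Sequence-only illustration; structure is deliberately ignored.
    """
    n, m = len(a), len(b)
    # run[i][j] = length of the longest common suffix of a[:i] and b[:j]
    run = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                run[i][j] = run[i - 1][j - 1] + 1

    matches = []
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            length = run[i][j]
            # Left-maximality holds by construction of run;
            # right-maximality means the next characters differ (or a string ends).
            extendable = i < n and j < m and a[i] == b[j]
            if length >= min_len and not extendable:
                matches.append((i - length, j - length, length))
    return matches


if __name__ == "__main__":
    print(maximal_exact_matches("GGACUU", "CGACUA", min_len=3))
```

With the two toy sequences in the usage line, the only match of length at least 3 is the shared substring GACU starting at position 1 in both sequences.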

SLIDE 2

Maximal Exact Matchings

ExpaRNA: compute alignments quickly with the help of exact matchings

  • step 1: compute matchings
  • step 2: chaining of matchings (select a “chain” of compatible matchings, i.e. no overlap, no crossing)
  • step 3: compute alignment using the chain of matchings as anchor constraints in LocARNA → speed-up over LocARNA

Steffen Heyne, Sebastian Will, Michael Beckstette, Rolf Backofen, Lightweight Comparison of RNAs Based on Exact Sequence-Structure Matches, Bioinformatics 2009
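
The chaining step (step 2) can be sketched as follows, under simplifying assumptions: each matching is reduced to a gap-free sequence interval pair `(i, j, length)`, and a chain is scored by its total matched length. The real method chains sequence-structure matchings, so this is an illustration of "no overlap, no crossing" rather than the published algorithm. The name `best_chain` and the scoring are assumptions made here.

```python
def best_chain(matches):
    """Pick a maximum-total-length chain of compatible matches.

    Each match is (i, j, length): a[i:i+length] pairs with b[j:j+length].
    Match s may precede match t only if s ends before t starts in BOTH
    sequences, which rules out overlaps and crossings.
    Runs in O(k^2) for k matches.
    """
    ms = sorted(matches)
    k = len(ms)
    if k == 0:
        return []
    score = [m[2] for m in ms]  # best chain score ending in match t
    prev = [-1] * k             # predecessor of t in that chain

    for t in range(k):
        i_t, j_t, len_t = ms[t]
        # A valid predecessor starts earlier in the first sequence,
        # so it always appears before t in the sorted order.
        for s in range(t):
            i_s, j_s, len_s = ms[s]
            compatible = i_s + len_s <= i_t and j_s + len_s <= j_t
            if compatible and score[s] + len_t > score[t]:
                score[t] = score[s] + len_t
                prev[t] = s

    # Trace back from the best-scoring chain end.
    t = max(range(k), key=score.__getitem__)
    chain = []
    while t != -1:
        chain.append(ms[t])
        t = prev[t]
    return chain[::-1]


if __name__ == "__main__":
    print(best_chain([(0, 0, 3), (2, 2, 4), (5, 1, 2), (7, 7, 3)]))
```

In the usage line, (0, 0, 3) overlaps (2, 2, 4), and (5, 1, 2) crosses both, so the best chain keeps (2, 2, 4) followed by (7, 7, 3).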

SLIDE 3

BRAliBase 2.1 Benchmark

[Figure: SPS plotted against sequence identity (sequence identity axis 20 to 80, SPS axis 0.3 to 1.0). Legend: reference, LocARNA, ExpLoc-P (heuristic), ExpLoc-P (suboptimal), ExpLoc (minsize10), ExpLoc (minsize9), RAF]

  • ExpLoc: Exact matchings as anchors in LocARNA; 4.4, 5.4 times speed-up
  • ExpLoc-P: Exact matchings from structure ensembles as anchors in LocARNA (submitted RECOMB’12; speed-up: 4.9, 6.0)
  • RAF: Do et al., Bioinformatics 2008; speed-up 15.9

SLIDE 4

Whole Genome Realignment for ncRNA Prediction

  • input: whole genome alignment
  • step 1: slice and filter by thermodynamic stability of single RNA structures → stable loci
  • step 2: realign based on sequence and structure → realigned loci
  • step 3: estimate ncRNA likelihood → predictions (q-values)

Original locus alignment (unstable conserved structure):

            ............................................................
DroMel_CAF1 -----------------------TTTGAGTG-TTTCTTGTGTTCATTAAGGTTTAATGAA 36
DroSim_CAF1 -----------------------TTTTAGTG-TTTCTTGGGTTCATTAAGGTTTAATGAA 36
DroYak_CAF1 -----------------------TTTGATGG-TTACTTTGCTTCATCAAGGTTTAATGGT 36
DroEre_CAF1 -----------------------TTTGATGG-TTTCTTTGCTTCATCAAGTTTTAATGAT 36
DroPse_CAF1 GGGCCATGGCCTCCTCTGATCGATTAG-GGGTTTTCTTGCTTGATTTATCGGTTGATGGA 59
DroPer_CAF1 GGGCCATGGCCTCCTCTGATCGATTAG-GGGTTTTCTTGCTTGATTTATCGGTTGATGGA 59
            .........10........20........30........40........50.........

            ..((((..((.((....((((........))))....)).))..))))............
DroMel_CAF1 TCTATGGAGCGAGTAATGCGCTTGAAGCTGTGTTTATCTGGTCACATGTAT---TGA--A 91
DroSim_CAF1 TCTATGGAGCGAGTACTGGGCTTGAAGCTGGGCTTATCTGGTCACATGTAT---TGA--A 91
DroYak_CAF1 TCTATGGAGCGAGTATTGGGCTTGAAGCTGTGTGTTTCTGGTCGCATGTAT---TGA--A 91
DroEre_CAF1 TCTATGAAGCGAGTATTGCGCTTGAAGCTGTGTGTTTCTGGTCACATGTAT---TGA--A 91
DroPse_CAF1 GCAATGGGGTG----ATGCTAGTGA--GTGGGTGATTCTGGCCATGGCCATAGGTGAATA 113
DroPer_CAF1 GCAATGGGGTG----ATGCTAGTGA--GTGGGTGATTCTGGCCATGGCCATAGGTGAATA 113
            .........70........80........90........100.......110........

Structure-based realignment (stable conserved structure):

            .((((.......((((((.((((((((((((......))))..)))))))).))))))..
DroMel_CAF1 UUUGAG------UGUUUCUUGUGUUCAUUAAG---GUUUAA--UGAAUCUAUGGAGCGAG 49
DroSim_CAF1 UUUUAG------UGUUUCUUGGGUUCAUUAAG---GUUUAA--UGAAUCUAUGGAGCGAG 49
DroYak_CAF1 UUUGAU------GGUUACUUUGCUUCAUCAAG---GUUUAA--UGGUUCUAUGGAGCGAG 49
DroEre_CAF1 UUUGAU------GGUUUCUUUGCUUCAUCAAG---UUUUAA--UGAUUCUAUGAAGCGAG 49
DroPse_CAF1 GGGCCAUGGCCUCCUCUGAUCGAUUAGGGGUUUUCUUGCUUGAUUUAUCGGUUGAUGGAG 60
DroPer_CAF1 GGGCCAUGGCCUCCUCUGAUCGAUUAGGGGUUUUCUUGCUUGAUUUAUCGGUUGAUGGAG 60
            .........10........20........30........40........50.........

            .....((((.(((((....)))..)).)))).......)))))...........
DroMel_CAF1 UAAUGCGCUUGAAGCUGU-GUUUAUCUGGUCACAUGUAUUGA----------A 91
DroSim_CAF1 UACUGGGCUUGAAGCUGG-GCUUAUCUGGUCACAUGUAUUGA----------A 91
DroYak_CAF1 UAUUGGGCUUGAAGCUGU-GUGUUUCUGGUCGCAUGUAUUGA----------A 91
DroEre_CAF1 UAUUGCGCUUGAAGCUGU-GUGUUUCUGGUCACAUGUAUUGA----------A 91
DroPse_CAF1 CAAUGGGGUGAUGCUAGUGAGUGGGUGAUUCUGGCCAUGGCCAUAGGUGAAUA 113
DroPer_CAF1 CAAUGGGGUGAUGCUAGUGAGUGGGUGAUUCUGGCCAUGGCCAUAGGUGAAUA 113
            .........70........80........90........100.......110.
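
The consensus structure lines in the alignments above use dot-bracket notation, and a structure line that is split across alignment blocks has to be concatenated before its brackets balance. As a small aid for reading them, the following sketch converts a dot-bracket string into a list of base-pair positions. The function name `base_pairs` is an assumption introduced here; it is not part of any tool mentioned on the slides.

```python
def base_pairs(dot_bracket):
    """Return the base pairs of a dot-bracket string as sorted (i, j) tuples.

    Positions are 0-based. "(" opens a pair, ")" closes the most recently
    opened one; every other character is treated as unpaired.
    """
    stack = []
    pairs = []
    for pos, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.append((stack.pop(), pos))
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


if __name__ == "__main__":
    # Two blocks of one structure line, joined before parsing.
    print(base_pairs("((.." + "..))"))
```

The usage line joins two blocks that are unbalanced on their own and yields the pairs (0, 7) and (1, 6).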

SLIDE 5

RNA Shapes: Idea

  • A more coarse-grained look at RNA structure
  • intuition: often general shape of RNA is more important for RNA function than “details”
  • example: cloverleaf structure of tRNAs

Shape can be considered at different levels of abstraction

Robert Giegerich, Björn Voss, Marc Rehmsmeier, Abstract shapes of RNA, Nucleic Acids Research, 2004

SLIDE 6

RNA Shapes: Different Shape Types

  • 5 Most abstract - helix nesting pattern and no unpaired regions
  • 4 Helix nesting pattern in internal loops and multiloops
  • 3 Nesting pattern for all loop types but no unpaired regions
  • 2 Nesting pattern for all loop types and unpaired regions in bulges, internal loops, and multiloops
  • 1 Most accurate - all loops and all unpaired

RNAshapes: Computes shape probabilities for a sequence (+ Shrep = representative structure for each shape)
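
As an illustration of the most abstract level (type 5), the sketch below maps a dot-bracket structure to its helix nesting pattern. It assumes the usual reading of type 5: unpaired bases are dropped, and stacks interrupted only by bulges or internal loops count as one helix, written as a single `[]`. This is a hand-written illustration, not the RNAshapes implementation, and the name `level5_shape` is an assumption made here.

```python
def level5_shape(dot_bracket):
    """Return the helix nesting pattern (shape type 5) of a dot-bracket string."""
    # Build the tree of base pairs: each node is the list of its child pairs.
    root = []
    stack = [root]
    for pos, ch in enumerate(dot_bracket):
        if ch == "(":
            node = []
            stack[-1].append(node)
            stack.append(node)
        elif ch == ")":
            if len(stack) == 1:
                raise ValueError(f"unmatched ')' at position {pos}")
            stack.pop()
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {pos}")
    if len(stack) != 1:
        raise ValueError("unmatched '('")

    def render(node, starts_helix):
        # A pair that is the only child of its parent continues the parent's
        # helix (stacking, bulge, or internal loop); otherwise a new helix begins.
        # Recursive, so very deeply nested structures may hit Python's
        # recursion limit.
        inner = "".join(render(child, len(node) != 1) for child in node)
        return f"[{inner}]" if starts_helix else inner

    # Every pair directly in the exterior loop starts a helix.
    return "".join(render(child, True) for child in root)


if __name__ == "__main__":
    cloverleaf = "((((..((((....))))..((((....))))..((((....))))..))))"
    print(level5_shape(cloverleaf))
```

For the cloverleaf-like toy structure in the usage line, the result is `[[][][]]`: one enclosing helix with three hairpins inside, which is the tRNA example from the previous slide at its coarsest level.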