Computational aspects of ncRNA research Mihaela Zavolan - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Computational aspects of ncRNA research

Mihaela Zavolan Biozentrum, Basel Swiss Institute of Bioinformatics

slide-2
SLIDE 2

Computational aspects of ncRNA research

Bacterial ncRNAs

  • Gene discovery
  • Target discovery
  • Discovery of transcription regulatory elements for ncRNAs

slide-3
SLIDE 3

Computational aspects of ncRNA research

miRNAs

  • Gene discovery: automated annotation, gene prediction
  • Expression profiling: sample comparisons, visualization
  • Target discovery: modeling miRNA-mRNA target interaction
  • Characterization of regulatory networks involving RNAs: miRNA target prediction, prediction of transcription regulatory elements

slide-4
SLIDE 4

Computational aspects of ncRNA research

siRNAs: design

  • Optimization of silencing efficacy
  • Minimization of off-target effects
slide-5
SLIDE 5

ncRNA gene prediction

Main feature: RNA secondary structure is important. Look for evidence of selection on the secondary structure.

Proportion of miRNA sequences with a P-value less than the specified threshold (Bonnet et al. (2004) Bioinformatics 20:2911)
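A P-value of this kind can be estimated by a randomization test: fold the sequence, fold many shuffled copies, and ask how often a shuffled copy is at least as stable. The sketch below is only an illustration of that idea, not the procedure of Bonnet et al. The function names are hypothetical, the shuffle preserves only mononucleotide composition, and `toy_energy` is a stand-in that must be replaced by a real thermodynamic folding routine.

```python
import random

def shuffle_sequence(seq, rng):
    """Random permutation of seq (keeps mononucleotide composition only)."""
    letters = list(seq)
    rng.shuffle(letters)
    return "".join(letters)

def empirical_p_value(seq, energy_fn, n_shuffles=1000, seed=0):
    """Estimate P(shuffled sequence folds at least as stably as seq).

    energy_fn maps a sequence to a folding free energy
    (more negative = more stable).
    """
    rng = random.Random(seed)
    native = energy_fn(seq)
    as_stable = sum(
        1
        for _ in range(n_shuffles)
        if energy_fn(shuffle_sequence(seq, rng)) <= native
    )
    # Add-one correction so the estimate is never exactly zero.
    return (as_stable + 1) / (n_shuffles + 1)

def toy_energy(seq):
    """ASSUMPTION: placeholder, not a real energy model.

    Returns minus the number of Watson-Crick or G-U pairs formed when
    the sequence is folded back on itself as a single hairpin.
    """
    pairs = {("A", "U"), ("U", "A"), ("G", "C"),
             ("C", "G"), ("G", "U"), ("U", "G")}
    half = len(seq) // 2
    return -sum(1 for i in range(half) if (seq[i], seq[-1 - i]) in pairs)

# Made-up hairpin: 16-nt arm, 4-nt loop, 16-nt complementary arm.
hairpin = "GGGAUCACGUAGCUAGUUCGCUAGCUACGUGAUCCC"
p_value = empirical_p_value(hairpin, toy_energy, n_shuffles=500)
```

With a real folding routine plugged in as `energy_fn`, a small `p_value` indicates that the native sequence is more stable than expected from its composition alone.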

Structure stabilization

GGACaag GUCC GUGCucauGUAC GGACag GUUC GUAUuuu GUAC

Identification of pairs of sites with high mutual information

Mutations that are fixed in evolution preserve RNA structure (covariance models behind tRNAscan-SE (S. Eddy), RNAalifold (I. Hofacker))
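Compensatory mutations show up as pairs of alignment columns with high mutual information. A minimal sketch of that computation follows; the four-sequence alignment is made up for illustration (a 4-bp stem in which the second base pair varies while staying complementary), and real tools such as covariance models do considerably more than this.

```python
import math
from collections import Counter

def mutual_information(alignment, i, j):
    """Mutual information (in bits) between alignment columns i and j."""
    n = len(alignment)
    col_i = Counter(seq[i] for seq in alignment)
    col_j = Counter(seq[j] for seq in alignment)
    joint = Counter((seq[i], seq[j]) for seq in alignment)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((col_i[a] / n) * (col_j[b] / n)))
    return mi

# Made-up alignment: stem columns 0-3 pair with columns 10-7.
alignment = [
    "GGACAAGGUCC",
    "GAACAAGGUUC",
    "GCACAAGGUGC",
    "GUACAAGGUAC",
]

covarying = mutual_information(alignment, 1, 9)   # columns change together
invariant = mutual_information(alignment, 0, 10)  # columns never change
```

Columns 1 and 9 vary together, so they carry the maximum of 2 bits, whereas the invariant pair (0, 10) carries none even though it is also base-paired. Mutual information therefore detects covariation, not pairing as such.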

slide-6
SLIDE 6

ncRNA gene prediction

Main feature: RNA secondary structure is important. Look for evidence of selection on the secondary structure.

(Figure panels: flanking sequence lengths 50-50, 200-200, 300-50)

mir-100 is expected to preserve its hairpin secondary structure through the various steps of miRNA biogenesis.

slide-7
SLIDE 7

Prediction of bacterial ncRNAs

slide-8
SLIDE 8

http://cwx.prenhall.com/horton/medialib/media_portfolio/

TATA box factor binding site

Promoter regions recognized by the σ70 subunit of E. coli RNA polymerase

slide-9
SLIDE 9

RNA hairpins regulate transcription termination

http://cwx.prenhall.com/horton/medialib/media_portfolio/

slide-10
SLIDE 10

Conserved secondary structures of Vibrio ncRNAs

Lenz et al. - The small RNA chaperone Hfq and multiple small RNAs control quorum sensing in Vibrio harveyi and Vibrio cholerae. Cell 118:69-82 (2004).

slide-11
SLIDE 11

miRNA gene discovery

Studies driven by computation (fast, incomplete):

  • 1. genome-wide computational prediction
  • 2. validation

(Lai et al., 2003 - fly; Lim et al., 2003 - worm; Lim et al., 2003 - vertebrates; Berezikov et al., 2005 - vertebrates; Pfeffer et al., 2005 - viruses).

Studies driven by experiment (laborious, exhaustive):

  • 1. large-scale cloning
  • 2. functional annotation
  • 3. miRNA gene prediction
  • 4. validation

(Houbaviy et al., 2003 - mouse; Dostie et al., 2003 - rat; Aravin et al., 2003 - fly; Suh et al., 2004 - man; Pfeffer et al., 2004 - man, viruses).

slide-12
SLIDE 12

Functional annotation of small RNAs

Small (16-30 nt) cloned RNAs

ALIGNMENT against:

  • Genome sequence
  • Sequences with known function (mRNA, rRNA, tRNA, miRNA, etc.)

slide-14
SLIDE 14

Functional annotation of small RNAs

Small (16-30 nt) cloned RNAs

  • match known sequences: rRNA, tRNA, miRNA, mRNA

slide-16
SLIDE 16

Functional annotation of small RNAs

Small (16-30 nt) cloned RNAs

  • Categories: rRNA, tRNA, miRNA, mRNA
  • match genome, multiple copies, hairpin, conservation: Novel miRNAs, rRNA, tRNA, miRNA, mRNA
  • multiple genome matches: Novel miRNAs, rRNA, tRNA, miRNA, mRNA, rasiRNA
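The annotation steps on these slides can be read as a decision cascade. The sketch below is one possible reading, not the pipeline actually used. The record fields, the category names for unmatched reads, and the `min_clones` threshold are hypothetical, and "multiple copies" is interpreted here as "cloned more than once", which is an assumption.

```python
from dataclasses import dataclass

@dataclass
class SmallRNA:
    sequence: str
    clone_count: int   # how many times the sequence was cloned
    genome_hits: int   # number of genomic loci the sequence maps to
    in_hairpin: bool   # surrounding genomic sequence folds into a stem loop
    conserved: bool    # locus is conserved in related genomes

def annotate(rna, known_sequences, min_clones=2):
    """Assign a category to one cloned small RNA.

    known_sequences maps a category (e.g. "tRNA") to a list of sequences
    with known function; a read matches if it is a substring of one.
    """
    # Step 1: match against sequences with known function.
    for category, sequences in known_sequences.items():
        if any(rna.sequence in known for known in sequences):
            return category
    # Step 2: match against the genome.
    if rna.genome_hits == 0:
        return "no genome match"
    # Step 3: miRNA-like evidence (ASSUMED combination of the slide's criteria).
    if rna.clone_count >= min_clones and rna.in_hairpin and rna.conserved:
        return "novel miRNA"
    # Step 4: repeat-derived small RNAs map to many loci.
    if rna.genome_hits > 1:
        return "rasiRNA"
    return "unannotated"

# Toy reference set and reads (made-up sequences except let-7).
known = {
    "tRNA": ["GCAUUGGUGGUUCAGUGGUAGAAUUCUCGCC"],
    "miRNA": ["UGAGGUAGUAGGUUGUAUAGUU"],
}
reads = [
    SmallRNA("GGUUCAGUGGUAGAAUUC", 1, 5, False, True),
    SmallRNA("ACGUACGUACGUACGUACGUAC", 4, 1, True, True),
    SmallRNA("AAUUCCGGAAUUCCGGAAUUCC", 1, 40, False, False),
]
labels = [annotate(read, known) for read in reads]
```

The order of the tests matters: known categories are assigned first, so a tRNA fragment that happens to map to several loci is never called a rasiRNA.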

slide-17
SLIDE 17

miRNA gene prediction

He & Hannon (Nat. Rev. Genet. 2004)

Main clue: miRNA precursors form stem loop structures

Issues:

  • find the locations in the genome that can give rise to miRNAs
  • predict the sequence of the mature miRNA
slide-18
SLIDE 18

... so do many other genomic regions

(Figure panels: let-7a, mir-147, fragment of a protein-coding gene)

slide-19
SLIDE 19

miRNA gene prediction using SVM

  • Build a model from positive and negative examples.
  • Detect candidate stem loops in (large) genomic sequences.
  • Classify candidate stem loops using the model.

slide-20
SLIDE 20

miRNA gene prediction using SVM

hsa-let-7c

L = 84, dG = -33.5 kcal/mol

Nucleotide composition: A - 20%, C - 19%, G - 29%, U - 32%

Paired nucleotides: A-U - 31%, G-U - 14%, G-C - 29%

Proportion of nucleotides in: symmetrical loops - 17%, asymmetrical loops - 4%

Other features: longest symmetrical region, longest slightly asymmetrical region, average distance between loops

Negative stem: L = 68, dG = -22.6 kcal/mol (longest symmetrical region and longest slightly asymmetrical region marked)

Pfeffer et al. 2005
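Several of the features listed on this slide can be read directly off a sequence and its predicted structure. A minimal sketch follows, assuming the structure is given in dot-bracket notation; that input format, the function names and the toy hairpin are assumptions, not taken from the slides. The free energy and the loop-symmetry features are not computed here; they would come from a folding program and a more detailed parse of the structure.

```python
from collections import Counter

PAIR_NAMES = {
    frozenset("AU"): "A-U",
    frozenset("GC"): "G-C",
    frozenset("GU"): "G-U",
}

def pair_table(structure):
    """Map every paired position to its partner (dot-bracket input)."""
    stack, partners = [], {}
    for pos, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(pos)
        elif symbol == ")":
            opening = stack.pop()
            partners[pos] = opening
            partners[opening] = pos
    return partners

def stem_loop_features(sequence, structure):
    """Length, nucleotide composition and paired-nucleotide fractions."""
    length = len(sequence)
    partners = pair_table(structure)
    composition = {nt: sequence.count(nt) / length for nt in "ACGU"}
    pair_counts = Counter()
    for i, j in partners.items():
        if i < j:  # count each base pair once
            name = PAIR_NAMES.get(frozenset((sequence[i], sequence[j])), "other")
            pair_counts[name] += 1
    # Fraction of all nucleotides that sit in each type of base pair.
    paired = {name: 2 * count / length for name, count in pair_counts.items()}
    return {
        "length": length,
        "composition": composition,
        "paired": paired,
        "unpaired_fraction": 1 - len(partners) / length,
    }

# Toy hairpin: 5-bp stem (2 G-C, 1 G-U, 2 A-U) closing a 4-nt loop.
toy_sequence = "GGAUCUUCGGAUUC"
toy_structure = "(((((....)))))"
features = stem_loop_features(toy_sequence, toy_structure)
```

The returned dictionary can be flattened into a numeric feature vector before being passed to a classifier.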

slide-21
SLIDE 21

miRNA gene prediction using SVM

Negatives: mRNAs, rRNAs, tRNAs, viral stem loops

Positives: human genomic regions containing known miRNAs

(Score distributions of negatives and positives: 29% false negatives, 3% false positives)

Features with largest negative weights:

  • Free energy
  • Nr. of nucleotides in symmetrical loops in LSAR
  • Nr. of nucleotides in asymmetrical loops in LSAR
  • Avg. size of asymmetrical loops

Features with largest positive weights:

  • Stem length
  • Length of longest symmetrical region
  • Nr. A-U pairs in LSAR
  • Nr. G-C pairs in LSAR

(LSAR: longest slightly asymmetrical region)

Used SVMlight http://svmlight.joachims.org/
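Per-feature weights are easiest to interpret if the classifier is linear, which is assumed here (the slides do not state the kernel). In that case the score of a candidate is a weighted sum of its features plus a bias. In the sketch below only the signs of the weights follow the slide; all magnitudes, the bias, the feature names and the candidate values are invented for illustration.

```python
def linear_svm_score(features, weights, bias=0.0):
    """Decision value of a linear classifier: w . x + b.

    A positive value means "predicted miRNA precursor".
    """
    return sum(weights[name] * value for name, value in features.items()) + bias

# HYPOTHETICAL weights: signs as on the slide, magnitudes made up.
weights = {
    "free_energy": -0.05,           # more negative dG raises the score
    "stem_length": 0.04,
    "asym_loop_nt_in_lsar": -0.3,
    "gc_pairs_in_lsar": 0.1,
}
bias = -3.0  # HYPOTHETICAL

# HYPOTHETICAL candidates (unscaled feature values).
stable_stem = {
    "free_energy": -33.5,
    "stem_length": 30,
    "asym_loop_nt_in_lsar": 1,
    "gc_pairs_in_lsar": 8,
}
weak_stem = {
    "free_energy": -10.0,
    "stem_length": 15,
    "asym_loop_nt_in_lsar": 6,
    "gc_pairs_in_lsar": 2,
}

stable_score = linear_svm_score(stable_stem, weights, bias)
weak_score = linear_svm_score(weak_stem, weights, bias)
```

Note the sign convention: because stable stems have a negative free energy, a negative weight on free energy increases the score of more stable candidates.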

slide-22
SLIDE 22

Detecting candidate stem loops

(Figure panels: flanking sequence lengths 50-50, 200-200, 300-50)

Search for stems whose secondary structure remains the same irrespective of their flanking sequences.

example: hsa-mir-100

86% of the known human microRNAs belong to such robust stems. Density of robust stems in human genome: approximately 1 every 10 kb.

slide-23
SLIDE 23

Classification of candidate stem loops

L = 78, dG = -31.6 kcal/mol

(Figure labels: LSR, LSAR)

miRNA precursor? SVM score: 0.8. Yes: miR-UL1 of CMV (cloning frequency: 101)

slide-24
SLIDE 24

Application: miRNA gene prediction in viruses

Identification of microRNAs of the herpesvirus family. Nature Methods (2005).

slide-25
SLIDE 25

Sensitivity-specificity plots for evaluating the performance of prediction programs

Sn = TP / (TP + FN),  Sp = TP / (TP + FP)
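A sensitivity-specificity plot is obtained by sweeping the score threshold of a predictor and computing Sn and Sp at each value. The sketch below uses the definitions on this slide; note that Sp defined as TP / (TP + FP) is what is elsewhere called precision or positive predictive value. The candidate names, scores and thresholds are made up.

```python
def sensitivity_specificity(predicted, actual):
    """Sn = TP / (TP + FN) and Sp = TP / (TP + FP) for two sets of items."""
    tp = len(predicted & actual)
    fn = len(actual - predicted)
    fp = len(predicted - actual)
    sn = tp / (tp + fn) if tp + fn else 0.0
    sp = tp / (tp + fp) if tp + fp else 0.0
    return sn, sp

def sn_sp_curve(scores, actual, thresholds):
    """One (threshold, Sn, Sp) point per score threshold."""
    curve = []
    for threshold in thresholds:
        predicted = {name for name, score in scores.items() if score >= threshold}
        curve.append((threshold, *sensitivity_specificity(predicted, actual)))
    return curve

# Made-up prediction scores and true positives.
scores = {"a": 0.9, "b": 0.8, "c": 0.4, "d": 0.2, "e": -0.1}
true_mirnas = {"a", "b", "d"}
curve = sn_sp_curve(scores, true_mirnas, thresholds=[0.5, 0.0])
```

Lowering the threshold from 0.5 to 0.0 raises Sn from 2/3 to 1 while Sp drops from 1 to 0.75, which is the trade-off such plots display.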

slide-29
SLIDE 29

Variations on miRNA gene prediction

Lim, L. P. et al. (2003) Genes & Dev. 17:991

slide-30
SLIDE 30

Variations on miRNA gene prediction

Berezikov, E. et al. (2005) Cell 120:21

Proportion of miRNA sequences with a P-value less than the specified threshold (Bonnet et al. (2004) Bioinformatics 20:2911)

slide-31
SLIDE 31

Variations on miRNA gene prediction

Xie, X. et al. (2005) Nature 434:338

slide-32
SLIDE 32

miRNA gene prediction servers

http://genes.mit.edu/mirscan/

http://www.mirz.unibas.ch

slide-33
SLIDE 33

Prediction of ncRNAs using comparative genomics

RNAz (www.tbi.univie.ac.at/~wash/RNAz)

  • Start with an alignment of homologous sequences
  • Compute the following features:
      • mean free energy of aligned sequences
      • structure conservation index (SCI)
      • mean pairwise identity
      • number of sequences in the alignment
  • Use an SVM to classify candidates

SCI = E_A / Ē

E_A is the free energy of the alignment (takes into account mutations that preserve the structure), and Ē is the mean free energy of the aligned sequences.
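Two of these features are simple to compute once the folding energies are available. The sketch below assumes the energies have already been obtained from a folding program; the numbers and the alignment are made up, and the gap handling in the identity calculation is an assumption that may differ from what RNAz does.

```python
from itertools import combinations

def structure_conservation_index(consensus_energy, single_energies):
    """SCI = E_A / mean(E): consensus energy over mean single-sequence energy."""
    mean_single = sum(single_energies) / len(single_energies)
    if mean_single == 0:
        raise ValueError("mean single-sequence energy is zero; SCI is undefined")
    return consensus_energy / mean_single

def mean_pairwise_identity(alignment):
    """Average fraction of identical columns over all sequence pairs.

    ASSUMPTION: columns where both sequences have a gap are ignored.
    """
    identities = []
    for first, second in combinations(alignment, 2):
        columns = [(x, y) for x, y in zip(first, second)
                   if not (x == "-" and y == "-")]
        identities.append(sum(x == y for x, y in columns) / len(columns))
    return sum(identities) / len(identities)

# Made-up values: consensus structure at -20 kcal/mol,
# individual sequences fold at -25, -24 and -26 kcal/mol.
sci = structure_conservation_index(-20.0, [-25.0, -24.0, -26.0])
identity = mean_pairwise_identity(["GGAC", "GGAC", "GGAU"])
```

An SCI close to 1 means the sequences lose little stability when forced into a common structure; values near 0 indicate that no common structure exists.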

slide-34
SLIDE 34

Modeling miRNA-mRNA interaction for target prediction

target: C.e._COG-1A
miRNA : cel-lsy-6

target 5'      C  CA           A 3'
                GU  CUUAUACAAAA
                CG  GAGUAUGUUUU
miRNA  3' GCUUUA  CA             5'

target: C.e_LIN-41A
miRNA : cel-let-7

target 5'  U          AUU       U 3'
            UUAUACAACC   CUGCCUC
            GAUAUGUUGG   GAUGGAG
miRNA  3' UU          AU        U 5'

target: C.e_hbl-1
miRNA : cel-let-7

target 5' U           GUU C      A 3'
           AUUAUACAACC   C ACCUCA
           UGAUAUGUUGG   G UGGAGU
miRNA  3' U           AU  A        5'

Hybrids generated using RNAhybrid http://bibiserv.techfak.uni-bielefeld.de/rnahybrid/

Known miRNA-mRNA interactions in C.elegans

slide-39
SLIDE 39

Modeling miRNA-mRNA interaction

Use evolutionary conservation to determine what defines an miRNA target site.

  • Define an interaction model (e.g. the first 8 nucleotides of the miRNA have to be perfectly paired with their mRNA target site).
  • Determine the locations of all candidate sites in a reference species (e.g. human).
  • Determine the number of these candidate sites that are conserved in a set of species that have the miRNA.
  • Compare with the number of conserved candidate sites that we get for a “random miRNA” that has approximately the same number of predicted sites in the reference species.

Lewis et al. 2005
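The four steps above can be sketched as a small signal-to-noise calculation. This is an illustration under strong simplifying assumptions, not the method of Lewis et al.: the UTR alignments are taken to be gap-free so that positions are directly comparable, a site counts as conserved only if it is present at the same position in every species, and the function names, the `tolerance` parameter, the toy UTRs and the control miRNA are all made up.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")

def seed_site(mirna, seed_length=8):
    """mRNA motif pairing perfectly with the first seed_length miRNA nucleotides."""
    return mirna[:seed_length].translate(COMPLEMENT)[::-1]

def count_sites(aligned_utrs, site, reference):
    """Return (sites in the reference species, sites conserved in all species)."""
    total = conserved = 0
    for by_species in aligned_utrs.values():
        ref_utr = by_species[reference]
        for pos in range(len(ref_utr) - len(site) + 1):
            if ref_utr.startswith(site, pos):
                total += 1
                if all(utr.startswith(site, pos) for utr in by_species.values()):
                    conserved += 1
    return total, conserved

def signal_to_noise(aligned_utrs, mirna, controls, reference, tolerance=0.2):
    """Conserved sites of the miRNA over the mean for matched control miRNAs."""
    total, conserved = count_sites(aligned_utrs, seed_site(mirna), reference)
    noise = []
    for control in controls:
        c_total, c_conserved = count_sites(aligned_utrs, seed_site(control), reference)
        # Keep only controls with a similar number of reference-species sites.
        if abs(c_total - total) <= tolerance * total:
            noise.append(c_conserved)
    if not noise or sum(noise) == 0:
        return None  # ratio undefined without usable controls
    return conserved / (sum(noise) / len(noise))

let7 = "UGAGGUAGUAGGUUGUAUAGUU"
control = "GCAUGCAUAAAAAAAAAAAAAA"  # made-up "random miRNA"

# Toy gap-free alignments: gene -> species -> 3' UTR.
utrs = {
    "gene1": {"human": "AAACUACCUCAGGG", "mouse": "AAACUACCUCAGGA"},
    "gene2": {"human": "CUACCUCAUUUU", "mouse": "CUACCUCAUUCU"},
    "gene3": {"human": "GGAUGCAUGCGG", "mouse": "GGAUGCAUGCGG"},
    "gene4": {"human": "UUAUGCAUGCAA", "mouse": "UUAUGGAUGCAA"},
}
ratio = signal_to_noise(utrs, let7, [control], reference="human")
```

Changing `seed_length`, or replacing `seed_site` with a different matching rule, corresponds to testing a different interaction model; the model with the highest ratio is the one best supported by conservation.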

slide-40
SLIDE 40

Modeling miRNA-mRNA interaction

(Figure axes: S/N ratio, interaction model)

Some miRNAs have hundreds of targets but many do not.
slide-41
SLIDE 41

miRNA target prediction servers

http://pictar.bio.nyu.edu/

slide-42
SLIDE 42

siRNA design

Empirical “rules” for siRNA design - derived from the work in the Tuschl Lab (siRNA user’s guide: http://www.rockefeller.edu/labheads/tuschl/sirna.html):

slide-43
SLIDE 43

siRNA design

Refining the rules by analyzing large datasets of siRNAs (Reynolds et al. 2004, many others): different siRNAs for the same gene can have markedly different silencing efficiencies.

slide-44
SLIDE 44

siRNA design

(Figure legend: silencing classes S<50%, S>50%, S>80%, S>95%; criterion scores +1, +1/A, +1, +1, +1, +1, -1, -1)
slide-45
SLIDE 45

siRNA design

Accessibility of the target site influences siRNA efficacy (Far et al. (2003) Nucl. Acids Res. 31:4417).

Target accessibility prediction server: http://sfold.wadsworth.org/index.pl