SLIDE 1

A NEW ANNOTATION TOOL FOR MALARIA BASED ON INFERENCE OF PROBABILISTIC GENETIC NETWORKS

  • J. Barrera¹, R. M. Cesar Jr.¹, D. C. Martins Jr.¹,
  • E. F. Merino², R. Z. N. Vêncio¹, F. G. Leonardi¹,
  • M. M. Yamamoto², C. A. B. Pereira¹, H. A. del Portillo²

UNIVERSITY OF SÃO PAULO, BRAZIL
¹ Institute of Mathematics and Statistics
² Institute of Biomedical Sciences

SLIDE 2

Outline

  • Introduction
  • Probabilistic genetic network (PGN)
  • PGN design
  • Data analysis pipeline
  • Biological interpretation

SLIDE 3

Introduction

SLIDE 4

Functional Classification

[Figure panels: sinusoidal signal; non-sinusoidal signal]

SLIDE 5

Interaction Graph

[Figure: interaction graph; labeled groups include glycolysis and plastid genome]

SLIDE 6

Probabilistic Genetic Network (PGN)

SLIDE 7

Network dynamics

State of the regulatory network at time t:

$$x[t] = (x_1[t], x_2[t], \ldots, x_n[t])$$

Expression of gene i at time t:

$$x_i[t] \in \{-1, 0, +1\}$$

Dynamics:

$$x[t+1] = \phi(x[t])$$

SLIDE 8

$$\phi = (\phi_1, \phi_2, \ldots, \phi_n), \qquad x_i[t+1] = \phi_i(x[t])$$

[Diagram: the predictors x_j[t] and x_k[t] are the inputs of φ_i, whose output is the target x_i[t+1]]

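SLIDE 9

Probabilistic Genetic Network (PGN)

Given the predictors x_j[t] and x_k[t] of gene i:

$$x_i[t+1] = \begin{cases} +1 & \text{with probability } p(+1 \mid x_j[t], x_k[t]) \\ 0 & \text{with probability } p(0 \mid x_j[t], x_k[t]) \\ -1 & \text{with probability } p(-1 \mid x_j[t], x_k[t]) \end{cases}$$

where one outcome dominates: there exist pairwise distinct y, z, w ∈ {−1, 0, +1} such that

$$p(y \mid x_j[t], x_k[t]) \gg p(z \mid x_j[t], x_k[t]) + p(w \mid x_j[t], x_k[t])$$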

SLIDE 10

This system:

  • is characterized by the conditional probabilities $p(x_i[t+1] \mid x[t])$;
  • depends only on the previous time step;
  • is time-translation invariant;
  • is a Markov chain in which the genes are conditionally independent given the previous state:

$$P(x[t+1] \mid x[t]) = \prod_{i=1}^{n} p(x_i[t+1] \mid x[t])$$
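A minimal Python sketch of the dynamics on slides 7 to 10, assuming two predictors per gene as in φ_i(x_j[t], x_k[t]). The three-gene network, its probability tables and the uniform fallback for predictor states missing from a table are hypothetical; the tables are chosen so that one outcome dominates, as slide 9 requires.

```python
import random

STATES = (-1, 0, 1)
UNIFORM = (1 / 3, 1 / 3, 1 / 3)

def next_state_probs(i, x, predictors, tables):
    """Return (P(-1), P(0), P(+1)) for x_i[t+1] given x[t] = x."""
    j, k = predictors[i]
    # Predictor states missing from the table fall back to uniform (assumption).
    return tables[i].get((x[j], x[k]), UNIFORM)

def step(x, predictors, tables, rng):
    """Sample x[t+1]; each gene is drawn independently given x[t]."""
    return tuple(
        rng.choices(STATES, weights=next_state_probs(i, x, predictors, tables))[0]
        for i in range(len(x))
    )

def transition_prob(x, y, predictors, tables):
    """P(x[t+1] = y | x[t] = x) as the product of per-gene probabilities."""
    p = 1.0
    for i, yi in enumerate(y):
        p *= next_state_probs(i, x, predictors, tables)[STATES.index(yi)]
    return p

# Hypothetical 3-gene network: gene i is predicted by the genes in predictors[i].
predictors = [(1, 2), (0, 2), (0, 1)]
tables = [
    {(1, -1): (0.05, 0.05, 0.90)},  # x_0[t+1] is almost surely +1
    {(1, -1): (0.90, 0.05, 0.05)},  # x_1[t+1] is almost surely -1
    {(1, 1): (0.10, 0.80, 0.10)},   # x_2[t+1] is most likely 0
]
x = (1, 1, -1)
print(step(x, predictors, tables, random.Random(0)))
print(transition_prob(x, (1, -1, 0), predictors, tables))  # 0.9 * 0.9 * 0.8
```

Because the genes are conditionally independent given x[t], the transition probabilities over all 3ⁿ next states sum to 1, which is a convenient sanity check.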

SLIDE 11

PGN Design

SLIDE 12

PGN Design

[Figure: the time series x[1], x[2], ..., x[48] and the target genes]

SLIDE 13

Entropy

Distribution of Y:

$$P: \{-1, 0, +1\} \to [0, 1], \qquad \sum_{y \in \{-1, 0, +1\}} P(y) = 1$$

$$H(Y) = -\sum_{y \in \{-1, 0, +1\}} P(y) \log P(y)$$

Mutual information:

$$I(X, Y) = H(Y) - H(Y \mid X) \ge 0$$

[Figure: three distributions P(Y), P(Y'), P(Y'') on {−1, 0, +1}, with H(Y) > H(Y') and H(Y') = H(Y'')]
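The definitions on this slide can be checked numerically with a short Python sketch. Logarithms are taken in base 2 (an assumption; the slide does not fix the base, which only rescales H and I), and the joint distributions are made-up examples: one where Y is a copy of X and one where they are independent.

```python
import math

def entropy(dist):
    """H(Y) = -sum_y P(y) log2 P(y); zero-probability terms contribute 0."""
    return sum(-p * math.log2(p) for p in dist.values() if p > 0)

def mutual_information(joint):
    """I(X, Y) = H(Y) - H(Y | X) for a joint table {(x, y): P(x, y)}."""
    p_x, p_y = {}, {}
    for (xv, yv), p in joint.items():
        p_x[xv] = p_x.get(xv, 0.0) + p
        p_y[yv] = p_y.get(yv, 0.0) + p
    h_y_given_x = 0.0
    for xv, px in p_x.items():
        cond = {yv: joint.get((xv, yv), 0.0) / px for yv in p_y}
        h_y_given_x += px * entropy(cond)
    return entropy(p_y) - h_y_given_x

STATES = (-1, 0, 1)
uniform = {y: 1 / 3 for y in STATES}
print(entropy(uniform))                    # log2(3) ≈ 1.585, the maximum for 3 values
print(entropy({-1: 0.0, 0: 1.0, 1: 0.0}))  # no uncertainty
copy = {(v, v): 1 / 3 for v in STATES}     # Y is a copy of X
independent = {(a, b): 1 / 9 for a in STATES for b in STATES}
print(mutual_information(copy), mutual_information(independent))
```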

SLIDE 14

Mean mutual information estimation

Mean conditional entropy:

$$E[H(Y \mid X)] = -\sum_{x} P(x) \sum_{y} P(y \mid x) \log P(y \mid x)$$

$$\hat{E}[H(Y \mid X)] = -\sum_{x} \hat{P}(x) \sum_{y} \hat{P}(y \mid x) \log \hat{P}(y \mid x)$$

Mean mutual information:

$$E[I(X, Y)] = H(Y) - E[H(Y \mid X)]$$

$$\hat{E}[I(X, Y)] = \hat{H}(Y) - \hat{E}[H(Y \mid X)]$$
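Given estimates P̂(X) and P̂(Y | X), the plug-in formulas above take only a few lines of Python. This sketch receives the estimates as inputs (how they are obtained is the subject of the following slides); the example tables are hypothetical, and computing Ĥ(Y) from the marginal implied by P̂(X) and P̂(Y | X) is an assumption made for the example.

```python
import math

def H(dist):
    """Base-2 entropy of a distribution given as {value: probability}."""
    return sum(-p * math.log2(p) for p in dist.values() if p > 0)

def mean_conditional_entropy(p_x, p_y_given_x):
    """Ê[H(Y|X)] = -sum_x P̂(x) sum_y P̂(y|x) log P̂(y|x)."""
    return sum(px * H(p_y_given_x[xv]) for xv, px in p_x.items())

def mean_mutual_information(p_y, p_x, p_y_given_x):
    """Ê[I(X,Y)] = Ĥ(Y) - Ê[H(Y|X)]."""
    return H(p_y) - mean_conditional_entropy(p_x, p_y_given_x)

# Hypothetical estimates for one target Y and one predictor pair X = (x_j, x_k).
uniform = {-1: 1 / 3, 0: 1 / 3, 1: 1 / 3}
p_x = {(1, 1): 0.5, (0, 0): 0.5}
p_y_given_x = {(1, 1): {1: 1.0}, (0, 0): uniform}
# Distribution of Y implied by the two tables above (assumption of this sketch).
p_y = {y: sum(p_x[xv] * p_y_given_x[xv].get(y, 0.0) for xv in p_x) for y in (-1, 0, 1)}
mi = mean_mutual_information(p_y, p_x, p_y_given_x)
print(mi)
```

Here the predictor state (1, 1) determines Y while (0, 0) tells nothing about it, so the estimate lies strictly between 0 and Ĥ(Y).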

SLIDE 15

Estimation of P(Y | X), for a fixed parameter n

Y: the target gene at t+1, that is, $Y = x_i[t+1]$.
X: the predictors at t, that is, $X = (x_j[t], x_k[t])$.

  • If #(X = (a,b)) ≥ n, then
    $$\hat{P}(Y = c \mid X = (a,b)) = \frac{\#\big((Y = c) \wedge (X = (a,b))\big)}{\#(X = (a,b))}$$
  • If #(X = (a,b)) < n, then $\hat{P}(Y \mid X = (a,b))$ is uniform.

Here #(·) is the number of observations that satisfy the condition.
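The estimator above can be sketched in Python as follows. Assumptions of this sketch: the data are a quantized series of network states, counts are taken over the transitions t → t+1, and n ≥ 1; the example series is made up.

```python
from collections import Counter

STATES = (-1, 0, 1)

def estimate_conditional(series, target, preds, n):
    """P̂(Y = c | X = (a, b)) from the transitions t -> t+1 of a quantized series.

    series[t] is the state x[t] (a tuple with entries in {-1, 0, +1});
    Y = x_target[t+1] and X = (x_j[t], x_k[t]) with (j, k) = preds.
    """
    j, k = preds
    count_x, count_xy = Counter(), Counter()
    for t in range(len(series) - 1):
        ab = (series[t][j], series[t][k])
        count_x[ab] += 1
        count_xy[(ab, series[t + 1][target])] += 1
    estimate = {}
    for ab in [(a, b) for a in STATES for b in STATES]:
        m = count_x[ab]
        if m >= n and m > 0:  # enough observations: relative frequencies
            estimate[ab] = {c: count_xy[(ab, c)] / m for c in STATES}
        else:                 # fewer than n observations: uniform distribution
            estimate[ab] = {c: 1 / 3 for c in STATES}
    return estimate

series = [(1, 1, 0), (1, 1, 1), (1, 1, 1), (1, 1, -1), (0, 0, 0)]
est = estimate_conditional(series, target=2, preds=(0, 1), n=2)
print(est[(1, 1)])  # {-1: 0.25, 0: 0.25, 1: 0.5}
print(est[(0, 0)])  # uniform: (0, 0) is never followed by a next state
```

The uniform fallback gives rarely observed predictor states the maximum conditional entropy, so they cannot make a predictor pair look informative by chance.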

SLIDE 16

Estimation of P(X), for a fixed parameter n

$X = (x_j[t], x_k[t])$

The estimate $\hat{P}(X = (a,b))$ is defined separately for the cases #(X = (a,b)) ≥ n and #(X = (a,b)) < n, using the counts

$$N^{-} = \sum_{(a,b):\ \#(X=(a,b)) < n} \#(X = (a,b)), \qquad N^{+} = \sum_{(a,b):\ \#(X=(a,b)) \ge n} \#(X = (a,b)),$$

the number of frequently observed pairs $|\{(a,b) : \#(X = (a,b)) \ge n\}|$, and the factor $1/3^2$, where $3^2 = 9$ is the number of possible pairs (a,b).
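The auxiliary counts on this slide are simple to compute. The Python sketch below only reproduces N⁺, N⁻ and the number of frequently observed pairs; it does not implement P̂(X) itself. Counting X over the transitions t → t+1 is an assumption, and the two-gene series is made up.

```python
from collections import Counter

STATES = (-1, 0, 1)

def auxiliary_counts(series, preds, n):
    """Return (N+, N-, number of pairs (a, b) with #(X=(a,b)) >= n).

    X = (x_j[t], x_k[t]) is counted over the transitions t -> t+1,
    i.e. for t = 0, ..., len(series) - 2 (an assumption of this sketch).
    """
    j, k = preds
    counts = Counter((series[t][j], series[t][k]) for t in range(len(series) - 1))
    pairs = [(a, b) for a in STATES for b in STATES]  # the 3^2 = 9 possible pairs
    n_plus = sum(counts[ab] for ab in pairs if counts[ab] >= n)
    n_minus = sum(counts[ab] for ab in pairs if counts[ab] < n)
    frequent = sum(1 for ab in pairs if counts[ab] >= n)
    return n_plus, n_minus, frequent

series = [(1, 1), (1, 0), (1, 1), (0, 0), (1, 1), (-1, 1)]
print(auxiliary_counts(series, preds=(0, 1), n=2))  # (3, 2, 1)
```

By construction N⁺ + N⁻ equals the number of observations of X.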

^

SLIDE 17

Building Interaction Graphs

  • For each target gene, rank all predictors by their estimated mean mutual information;
  • Choose the best predictors;
  • Design the interaction graph.

[Figure: a target gene and its predictor genes]
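The ranking step can be sketched in Python by scoring every pair of candidate predictors with a plug-in estimate of the mean mutual information. Assumptions of this sketch: P̂(X) and Ĥ(Y) are plain relative frequencies (the slide-16 correction for rarely observed pairs is not reproduced), rarely observed pairs get a uniform P̂(Y | X) as on slide 15, logarithms are base 2, and every gene pair, including pairs that contain the target, is a candidate. In the toy series, gene 2 repeats gene 0 one step later.

```python
import math
from collections import Counter
from itertools import combinations

STATES = (-1, 0, 1)

def H(probs):
    """Base-2 entropy of a list of probabilities."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

def mean_mi(series, target, preds, n):
    """Plug-in estimate of E[I(X, Y)], Y = x_target[t+1], X = (x_j[t], x_k[t])."""
    j, k = preds
    T = len(series) - 1  # number of transitions t -> t+1
    cx, cxy, cy = Counter(), Counter(), Counter()
    for t in range(T):
        xv, yv = (series[t][j], series[t][k]), series[t + 1][target]
        cx[xv] += 1
        cxy[(xv, yv)] += 1
        cy[yv] += 1
    h_y = H([cy[c] / T for c in STATES])
    h_y_given_x = 0.0
    for xv, m in cx.items():
        if m >= n:
            cond = [cxy[(xv, c)] / m for c in STATES]
        else:  # rarely observed pair: uniform P̂(Y|X), i.e. maximal entropy
            cond = [1 / 3] * 3
        h_y_given_x += (m / T) * H(cond)
    return h_y - h_y_given_x

def best_predictors(series, target, n, top=3):
    """Rank all gene pairs by estimated mean mutual information, best first."""
    genes = range(len(series[0]))
    scored = [(mean_mi(series, target, p, n), p) for p in combinations(genes, 2)]
    scored.sort(key=lambda s: s[0], reverse=True)  # stable: ties keep pair order
    return scored[:top]

x0 = [1, 0, -1] * 3 + [1]
x2 = [0] + x0[:-1]  # gene 2 repeats gene 0 one step later
series = [(a, 0, c, 1) for a, c in zip(x0, x2)]  # genes 1 and 3 are constant
for score, pair in best_predictors(series, target=2, n=2):
    print(pair, round(score, 3))
```

Every pair containing gene 0 ranks at the top, which is the expected outcome since gene 0 at t fully determines gene 2 at t+1.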

SLIDE 18

Data analysis pipeline

SLIDE 19

Data analysis pipeline

[Diagram components: GPR files; USP dataset determination; USP dataset; scaling and quantization; quantized USP dataset; PGN design (target genes, predictors); PlasmoDB; metabolic pathways; DeRisi's transcriptome table; GraphViz; functional groups; Overview set; output graph; biological interpretation]

SLIDE 20

USP-dataset

  • built directly from the original .gpr “raw” data;
  • intensity = foreground mean − background median;
  • mean over replicated time points;
  • a different definition of “weak” spots and different elimination rules;
  • no interpolation used;
  • consider ALL accepted oligos as unique entities (including the almost sinusoidal ones).

USP-dataset: 6532 oligos
Overview dataset: 3719 oligos
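The two intensity rules above amount to a background correction followed by averaging over replicates. A minimal Python sketch, assuming a single channel and a hypothetical input format of one (foreground mean, background median) pair per replicate; the weak-spot and elimination rules are not modelled.

```python
from statistics import mean

def spot_intensity(foreground_mean, background_median):
    """Background-corrected intensity: foreground mean - background median."""
    return foreground_mean - background_median

def time_point_value(replicates):
    """Mean of the corrected intensities of the replicates of one time point.

    replicates: list of (foreground mean, background median) pairs,
    one per replicated hybridization (hypothetical input format).
    """
    return mean(spot_intensity(f, b) for f, b in replicates)

print(time_point_value([(1200.0, 200.0), (1100.0, 300.0)]))  # mean of 1000 and 800
```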

SLIDE 21

Weak spots definition

Example: a signal with 46 time points, 9 of which equal 100 and the rest 0.

X = (0, 0, ..., 100, 100, ..., 100, 0, 0, ..., 0, 0)
⟨X⟩ = 9 × 100 / 46 ≈ 19.57
R = normalized Cy5/Cy3 = X / ⟨X⟩ = (0, 0, ..., 5.11, 5.11, ..., 5.11, 0, 0, ..., 0, 0)
log₂(R) = (−∞, −∞, ..., 2.35, 2.35, ..., 2.35, −∞, −∞, ..., −∞)

Not amenable to Fourier analysis because of the infinities.
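The arithmetic of this example can be reproduced in Python. The positions of the nine nonzero values are arbitrary here (only their number matters); since math.log2(0) raises ValueError, zero ratios are mapped to −∞ explicitly.

```python
import math

def normalized_log_ratios(x):
    """Divide a signal by its mean and take log2; zero entries give -inf."""
    m = sum(x) / len(x)
    r = [v / m for v in x]
    # math.log2(0) would raise ValueError, so -inf is written explicitly.
    log_r = [math.log2(v) if v > 0 else float("-inf") for v in r]
    return m, r, log_r

# Hypothetical layout: 46 time points, 9 of them equal to 100 and the rest 0.
x = [0.0] * 10 + [100.0] * 9 + [0.0] * 27
m, r, log_r = normalized_log_ratios(x)
print(round(m, 2), round(r[10], 2), round(log_r[10], 2))  # 19.57 5.11 2.35
print(sum(math.isinf(v) for v in log_r), "of", len(log_r), "values are -inf")
```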

SLIDE 22

Scaling

For each gene i, estimate the mean $\hat{E}[x_i[t]]$ and the standard deviation $\hat{\sigma}[x_i[t]]$, then apply the normal transform:

$$n_i[t] = \frac{x_i[t] - \hat{E}[x_i[t]]}{\hat{\sigma}[x_i[t]]}$$
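Per gene, the normal transform is a z-score over time. A Python sketch; the slide does not say whether σ̂ uses the sample (n − 1) or the population (n) convention, so the sample version here is an assumption.

```python
from statistics import mean, stdev

def normal_transform(xi):
    """n_i[t] = (x_i[t] - Ê[x_i[t]]) / σ̂[x_i[t]] for one gene's time series."""
    mu = mean(xi)
    sigma = stdev(xi)  # sample standard deviation; statistics.pstdev is the alternative
    return [(v - mu) / sigma for v in xi]

print(normal_transform([1.0, 2.0, 3.0]))  # [-1.0, 0.0, 1.0]
```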

SLIDE 23

Quantization

Let $n_i^{+}[t]$ and $n_i^{-}[t]$ denote, respectively, the normalized signals greater than and less than zero at t.

  • If $n_i^{+}[t] > \hat{E}[n_i^{+}[t]]$, then $x_i[t] = +1$;
  • If $n_i^{-}[t] > \hat{E}[n_i^{-}[t]]$ and $n_i^{+}[t] < \hat{E}[n_i^{+}[t]]$, then $x_i[t] = 0$;
  • If $n_i^{-}[t] < \hat{E}[n_i^{-}[t]]$, then $x_i[t] = -1$.
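A Python sketch of the three-level rule, under the reading that Ê[n_i⁺[t]] and Ê[n_i⁻[t]] are the means over time of the positive and negative normalized values of gene i (an assumption). Values exactly equal to a threshold are mapped to 0 here, a choice the slide leaves open.

```python
from statistics import mean

def quantize(ni):
    """Three-level quantization of one gene's normalized series n_i[t]."""
    pos = [v for v in ni if v > 0]
    neg = [v for v in ni if v < 0]
    upper = mean(pos) if pos else 0.0  # Ê[n_i^+]: mean of the positive values
    lower = mean(neg) if neg else 0.0  # Ê[n_i^-]: mean of the negative values
    levels = []
    for v in ni:
        if v > upper:
            levels.append(1)
        elif v < lower:
            levels.append(-1)
        else:  # between the two means (boundaries included): level 0
            levels.append(0)
    return levels

print(quantize([-2.0, -1.0, 0.5, 1.0, 3.0]))  # [-1, 0, 0, 0, 1]
```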

SLIDE 24

Output example

[Figure legend: plastid genome (in Overview); organellar translation machinery (in Overview); unknown group (in Overview); unknown group (not in Overview); node marking: in Overview / not in Overview]

SLIDE 25
SLIDE 26
SLIDE 27

SLIDE 28
SLIDE 29

Biological Interpretation

SLIDE 30
SLIDE 31
SLIDE 32

Glycolytic PGN network (single genes)

[Figure legend, functional groups: glycolysis; transcription machinery; cytoplasmic translation; ribonucleotide synthesis; deoxynucleotide synthesis; DNA replication; proteasome; plastid genome; kinases; actin-myosin motors; mitochondrial]

[Figure nodes: isomerase; G3PDH; 6-PFK; mutase; hexokinase; pyruvate kinase; TP isomerase; aldolase; pyruvate kinase; PG kinase; enolase]

SLIDE 33
SLIDE 34
SLIDE 35

No TCA genes

SLIDE 36

[Figure: 550 apicoplast proteins; 124 apicoplast proteins]

SLIDE 37
SLIDE 38

Apicoplast PGN network (singlets)

[Figure legend, functional groups: glycolysis; transcription machinery; cytoplasmic translation; ribonucleotide synthesis; deoxynucleotide synthesis; DNA replication; proteasome; plastid genome; kinases; actin-myosin motors; mitochondrial]

SLIDE 39
SLIDE 40

Apicoplast PGN network (doublets)

[Figure legend, functional groups: glycolysis; transcription machinery; cytoplasmic translation; ribonucleotide synthesis; deoxynucleotide synthesis; DNA replication; proteasome; plastid genome; kinases; actin-myosin motors; mitochondrial]

SLIDE 41

Biological validation

[Figure labels: 466 / bipartite; 124 / phase; PGN 676]

SLIDE 42
  • J. Barrera, R.M. Cesar Jr., C. P. Pereira, D. Martins,
  • R. Z. Vencio, E. F. Merino, M. M. Yamamoto