Learning by Fusing Heterogeneous Data: Marinka Zitnik, Thesis


slide-1
SLIDE 1

Learning by Fusing Heterogeneous Data

Marinka Zitnik

Thesis Defense, October 22 2015

slide-2
SLIDE 2

Marinka Zitnik - PhD Thesis

Motivation

slide-11
SLIDE 11

Large Heterogeneous Data Compendia

Large-scale physics experiments

Social networks, recommender systems

[Figure: users, movies and other objects connected by reviews and relationships. Movies: Good Will Hunting, Saving Private Ryan, The Terminal, Schindler’s List. Other objects: Matt Damon (actor), Tom Hanks (actor), Liam Neeson (actor), Steven Spielberg (director), Drama (genre), War (tag).]

Global navigation satellite systems

Molecular biology

[Figure: terms Response to stress, Response to external stimulus, Response to biotic stimulus, Response to external biotic stimulus, Response to other organisms, Response to bacterium, Defense response, Defense response to other organism, Defense response to bacterium; nodes T1 to T7 and V1 to V5.]

slide-15
SLIDE 15

Complex relationships

Objects of different types
Different points in time, space and scale
Different perspectives

slide-16
SLIDE 16

Warming-Up

slide-20
SLIDE 20

One Data Matrix

[Figure: a data matrix relating object types A and B]

Recipe matrix of A x Backbone matrix of A-B x Recipe matrix of B = Reconstructed matrix A-B
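
The product on this slide can be sketched in a few lines of NumPy. This is only an illustration: the sizes and random values are made up, the names `G_a`, `G_b` and `S_ab` are borrowed from the GA, GB and SAB labels used later in the deck, and transposing the recipe matrix of B so that the shapes line up is an assumption, since the slide does not show it.

```python
import numpy as np

rng = np.random.default_rng(0)

n_a, n_b = 6, 5   # number of objects of type A and of type B (made up)
k_a, k_b = 2, 3   # latent dimensions for A and B (made up)

G_a = rng.random((n_a, k_a))    # recipe matrix of A
G_b = rng.random((n_b, k_b))    # recipe matrix of B
S_ab = rng.random((k_a, k_b))   # backbone matrix of A-B

# Reconstructed matrix A-B: one row per A object, one column per B object.
# The transpose on G_b is an assumption made so the shapes agree.
R_hat = G_a @ S_ab @ G_b.T

print(R_hat.shape)
```

The reconstruction has as many rows and columns as the data matrix, but its rank is bounded by the smaller latent dimension, which is what makes the factorization a compressed description of the data.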

slide-25
SLIDE 25

Two Data Matrices

[Figure: two data matrices over object types A, B and C]

Shared factor

slide-26
SLIDE 26

Data Fusion by Collective Matrix Factorization

slide-35
SLIDE 35

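Many Data Matrices

[Figure: data matrices over object types A, B, C, D, E, F and G, with many shared factors]

Optimization Problem

Given a data matrix for each related pair of object types, such as A and B, find latent matrices $G_A$, $G_B$ and $S_{AB}$ that minimize the reconstruction error.

The problem is non-convex. The global optimum is unknown.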

slide-37
SLIDE 37

Many Data Matrices

[Figure: data matrices over object types A, B, C, D, E, F and G, with many shared factors]

Solution: DFMF Algorithm

Input: A set $\mathcal{R}$ of relation matrices $R_{ij}$; constraint matrices $\Theta^{(t)}$ for $t \in \{1, 2, \ldots, \max_i t_i\}$; ranks $k_1, k_2, \ldots, k_r$ ($i, j \in [r]$).
Output: Matrix factors $S$ and $G$.
1) Initialize $G_i$ for $i = 1, 2, \ldots, r$.
2) Repeat until convergence:

  • Construct $R$ and $G$ using their definitions in Eq. (1) and Eq. (3).
  • Update $S$ using:

    $S \leftarrow (G^T G)^{-1} G^T R G (G^T G)^{-1}$  (8)

  • Set $G_i^{(e)} \leftarrow 0$ for $i = 1, 2, \ldots, r$.
  • Set $G_i^{(d)} \leftarrow 0$ for $i = 1, 2, \ldots, r$.
  • For $R_{ij} \in \mathcal{R}$:

    $G_i^{(e)} \mathrel{+}= (R_{ij} G_j S_{ij}^T)^+ + G_i (S_{ij} G_j^T G_j S_{ij}^T)^-$
    $G_i^{(d)} \mathrel{+}= (R_{ij} G_j S_{ij}^T)^- + G_i (S_{ij} G_j^T G_j S_{ij}^T)^+$
    $G_j^{(e)} \mathrel{+}= (R_{ij}^T G_i S_{ij})^+ + G_j (S_{ij}^T G_i^T G_i S_{ij})^-$
    $G_j^{(d)} \mathrel{+}= (R_{ij}^T G_i S_{ij})^- + G_j (S_{ij}^T G_i^T G_i S_{ij})^+$  (10)

  • For $t = 1, 2, \ldots, \max_i t_i$:

    $G_i^{(e)} \mathrel{+}= [\Theta_i^{(t)}]^- G_i$ for $i = 1, 2, \ldots, r$
    $G_i^{(d)} \mathrel{+}= [\Theta_i^{(t)}]^+ G_i$ for $i = 1, 2, \ldots, r$  (11)

  • Construct $G$ as:

    $G \leftarrow G \circ \mathrm{Diag}\left( \sqrt{\frac{G_1^{(e)}}{G_1^{(d)}}}, \sqrt{\frac{G_2^{(e)}}{G_2^{(d)}}}, \ldots, \sqrt{\frac{G_r^{(e)}}{G_r^{(d)}}} \right)$  (12)

    where $\circ$ denotes the Hadamard product. The $\sqrt{\cdot}$ and $\frac{\cdot}{\cdot}$ are entry-wise operations.
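
To make the update rules more concrete, here is a minimal NumPy sketch for the simplest possible setting: a single relation matrix $R_{12}$ between two object types and no constraint matrices. It is not the author's implementation. The function name `dfmf_single_relation`, the random initialization, the fixed iteration count used in place of a convergence check, the use of a pseudo-inverse, and the small `eps` added to the denominator are all assumptions made for illustration.

```python
import numpy as np


def pos(X):
    """Positive part of X, entry-wise: (|X| + X) / 2."""
    return (np.abs(X) + X) / 2.0


def neg(X):
    """Negative part of X, entry-wise: (|X| - X) / 2."""
    return (np.abs(X) - X) / 2.0


def update_s(R12, G1, G2):
    """Eq. (8) restricted to the single block S12."""
    return (np.linalg.pinv(G1.T @ G1) @ G1.T @ R12
            @ G2 @ np.linalg.pinv(G2.T @ G2))


def dfmf_single_relation(R12, k1, k2, n_iter=50, eps=1e-9, seed=0):
    """Sketch of the DFMF updates for one relation matrix, no constraints."""
    rng = np.random.default_rng(seed)
    n1, n2 = R12.shape
    G1 = rng.random((n1, k1))   # step 1: random non-negative initialization
    G2 = rng.random((n2, k2))
    errors = []
    for _ in range(n_iter):     # step 2: fixed number of sweeps (assumption)
        S12 = update_s(R12, G1, G2)
        errors.append(np.linalg.norm(R12 - G1 @ S12 @ G2.T))

        # Eq. (10): numerator (e) and denominator (d) terms, computed from
        # the current G1 and G2 before either factor is changed.
        A = R12 @ G2 @ S12.T
        B = S12 @ G2.T @ G2 @ S12.T
        C = R12.T @ G1 @ S12
        D = S12.T @ G1.T @ G1 @ S12
        G1_e, G1_d = pos(A) + G1 @ neg(B), neg(A) + G1 @ pos(B)
        G2_e, G2_d = pos(C) + G2 @ neg(D), neg(C) + G2 @ pos(D)

        # Eq. (12): entry-wise multiplicative update. eps is not part of the
        # algorithm on the slide; it only guards against division by zero.
        G1 = G1 * np.sqrt(G1_e / (G1_d + eps))
        G2 = G2 * np.sqrt(G2_e / (G2_d + eps))
    S12 = update_s(R12, G1, G2)
    return G1, S12, G2, errors


# Usage on a small synthetic non-negative matrix of rank 3.
data_rng = np.random.default_rng(1)
R12 = data_rng.random((8, 3)) @ data_rng.random((3, 6))
G1, S12, G2, errors = dfmf_single_relation(R12, k1=3, k2=3, n_iter=30)
print(errors[0], errors[-1])
```

With several relation matrices, the (e) and (d) terms of each $G_i$ would be accumulated over every $R_{ij}$ that involves type $i$ before the multiplicative step, which is how one factor comes to be shared across data sets.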

slide-41
SLIDE 41

Two Case Studies of Collective Matrix Factorization

slide-42
SLIDE 42

#1: Amoeba

slide-43
SLIDE 43

Search for Bacterial Response Genes

Genetic screen: 50,000 clonal mutants
Genome: 12,000 genes
Found: 7 genes
Workload: 5 years
Estimated: ~200 genes

Gram+ defective: swp1, gpi, nagB1
Gram- defective: clkB, spc3, alyL, nip7

Nasser et al. (2013) Curr Biol

slide-45
SLIDE 45

Dictyostelium Bacterial Gene Hunt

A data-driven approach: 14 data sources, 4 Gram- seed genes, 9 candidate genes

[Figure: graph with nodes numbered 1 to 10. Node labels: Gene, Gene Ontology term, Phenotype Ontology term, PubMed identifier, MeSH descriptor, KEGG pathway, Reactome pathway, Development (Parikh et al. 2010), Bacterial RNA-seq (Nasser et al. 2013), ABC family (Miranda et al. 2013). Relation matrices: R1,2, R1,4, R1,5, R1,6, R1,7, R1,8, R1,9, R1,10, R2,3, R2,4, R5,4, R6,4, R6,5. Constraint matrix: Θ1.]

Žitnik et al. PLoS Comp Bio 2015

slide-53
SLIDE 53

Latent Chaining and Profiling

[Figure: chain over the object types Dicty genes, Drugs and Diseases, with nodes numbered 1, 2, 3 and latent matrices G1, S1,2 and S2,3]

Profile matrix (Dicty genes, Diseases) = G1 x S1,2 x S2,3

Latent chains

[Figure: graph with nodes numbered 1 to 10. Node labels: Gene, Gene Ontology term, Phenotype Ontology term, PubMed identifier, MeSH descriptor, KEGG pathway, Reactome pathway, Development (Parikh et al. 2010), Bacterial RNA-seq (Nasser et al. 2013), ABC family (Miranda et al. 2013). Relation matrices: R1,2, R1,4, R1,5, R1,6, R1,7, R1,8, R1,9, R1,10, R2,3, R2,4, R5,4, R6,4, R6,5. Constraint matrix: Θ1.]

[Figure: scoring of a candidate gene. Labels: Seed genes, Candidate gene, Chains i to ix, Similarity scoring, Similarity score aggregation, Scored candidate gene]

Žitnik et al. PLoS Comp Bio 2015
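
A rough sketch of chaining and profile-based scoring follows. Only the product G1 x S1,2 x S2,3 and the labels "similarity scoring" and "similarity score aggregation" come from the slide. The matrix sizes, the random factors, the choice of cosine similarity, the mean as the aggregation step, and the use of a single chain are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical sizes: 8 genes and latent ranks 3 (genes), 2 (drugs), 2 (diseases).
n_genes, k_genes, k_drugs, k_diseases = 8, 3, 2, 2

G1 = rng.random((n_genes, k_genes))       # latent factor of Dicty genes
S12 = rng.random((k_genes, k_drugs))      # backbone matrix genes-drugs
S23 = rng.random((k_drugs, k_diseases))   # backbone matrix drugs-diseases

# Chain the latent matrices: one profile (row) per gene.
profile = G1 @ S12 @ S23


def cosine(u, v):
    """Cosine similarity between two profile vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


seed_genes = [0, 1, 2, 3]   # hypothetical indices of seed genes
candidates = [g for g in range(n_genes) if g not in seed_genes]

# Score each candidate by its similarity to every seed gene, then aggregate.
scores = {
    g: float(np.mean([cosine(profile[g], profile[s]) for s in seed_genes]))
    for g in candidates
}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

With several chains through the graph, each chain would give its own profile matrix and its own similarity scores, and the aggregation step would combine scores across chains as well as across seed genes.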

slide-56
SLIDE 56

Dictyostelium Bacterial Gene Hunt

14 data sources, 4 Gram- seed genes, 9 candidate genes

[Figure: gene labels cf50-1, smlA, acbA, pirA, rps10, abpC, tirA, DDB_G0272184, pikB, vps46, pikA, swp1, ggtA, DDB_G0288519, pten, DDB_G0288551, tra2, DDB_G0286429, dscA-1, cinC, udpB, sfbA, modA, DDB_G0287399, prmt5]

[Figure: # of D. d cells (10 to 10^4) on Day 2 and Day 3 for strains AX4, abpC–, modA–, cf50-1–, tirA–, acbA–, smlA–, pikA–/pikB– and pten–]

8/9 predictions correct!

Žitnik et al. PLoS Comp Bio 2015

slide-69
SLIDE 69

#2: Functional Genomics

[Figure: two graphs with nodes numbered 1 to 6. First graph: Gene, GO Term, Experimental Condition, PMID, KEGG Pathway, MeSH Descriptor; relation matrices R12, R13, R14, R16, R42, R45, R62. Second graph: Chemical, Pharmacologic Action, PMID, Depositor, Substructure Fingerprint, Depositor Category; relation matrices R12, R13, R14, R15, R46; constraint matrix Θ1.]

Prediction task               DFMF          MKL           RF            tri-SPMF
                              F1     AUC    F1     AUC    F1     AUC    F1     AUC
100 D. discoideum genes       0.799  0.801  0.781  0.788  0.761  0.785  0.731  0.724
1000 D. discoideum genes      0.826  0.823  0.787  0.798  0.767  0.788  0.756  0.741
Whole D. discoideum genome    0.831  0.849  0.800  0.821  0.782  0.801  0.778  0.787
Pharmacologic actions         0.663  0.834  0.639  0.811  0.643  0.819  0.641  0.810

Žitnik & Zupan IEEE TPAMI 2015

Mining disease associations: Žitnik et al. Scientific Reports 2013
Predicting drug toxicity: Žitnik & Zupan Systems Biomedicine 2014 (CAMDA Award)
Predicting gene functions: Žitnik & Zupan In PSB 2014
Predicting cancer survival: Žitnik & Zupan Systems Biomedicine 2015 (CAMDA Award)

slide-78
SLIDE 78

Key Idea: Transfer of Knowledge

[Figure labels: Heterogeneous data domain space; Data view; Objects of one type; Model parameters]

Context jumping in the latent space

slide-79
SLIDE 79

Transfer of Knowledge: Another Example

Network Inference from Mixed Data

slide-82
SLIDE 82

Direct inference

threshold value

Model-based inference

model parameters

slide-87
SLIDE 87

Mixed Data

RNA-seq count data: count transcripts mapped to genomic locations
Distributions: Poisson, Negative binomial

Somatic mutations: No mutation, Single base substitution, Short indel
Distribution: Multinomial
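
As a small illustration of what "mixed data" means here, the sketch below simulates the two kinds of measurements with NumPy: count data drawn from Poisson and negative binomial distributions, and a three-category mutation status drawn from a categorical (single-trial multinomial) distribution. All sizes, rates and probabilities are invented for the example and are not taken from the thesis.

```python
import numpy as np

rng = np.random.default_rng(42)
n_samples, n_genes = 4, 6   # hypothetical sizes

# RNA-seq-like counts: Poisson, or negative binomial for overdispersed counts.
poisson_counts = rng.poisson(lam=200.0, size=(n_samples, n_genes))
negbin_counts = rng.negative_binomial(n=5, p=0.05, size=(n_samples, n_genes))

# Somatic mutation status per sample and gene: one of three categories.
mutation_states = ["No mutation", "Single base substitution", "Short indel"]
state_probs = [0.90, 0.08, 0.02]   # made-up category probabilities
mutation_codes = rng.choice(
    len(mutation_states), size=(n_samples, n_genes), p=state_probs
)

print(poisson_counts)
print(negbin_counts)
print(mutation_codes)
```

The point of the example is only that the two data types live on different scales (unbounded counts versus a handful of categories), so a single Gaussian model is a poor fit for both at once.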

slide-92
SLIDE 92

Nodes Edges Object weights Object-object interactions

Network Inference from Mixed Data

is an object of interest

slide-96
SLIDE 96

Objective function Latent factor reuse

Network Inference from Mixed Data

Data following distribution Data following distribution

slide-100
SLIDE 100

Data Data

Network Inference from Mixed Data

slide-103
SLIDE 103

FuseNet

[Figure labels: Data, Model, Network, "Neighborhood of"; scale from 0.0 to 0.5]

slide-105
SLIDE 105

Poisson Data

[Figure: table of example counts, ranging from 2 to 982, with columns Sample 1 to Sample 4; label: Poisson distribution]

slide-109
SLIDE 109

Recovery of Poisson Networks

[Figure: comparison of FuseNet, LPGM, NPN-Copula, GLASSO and Log-GLASSO, with a Baseline marked; axis ticks from 0.0 to 0.8 and from 0.1 to 0.4]

FuseNet - Our method; LPGM - Allen & Liu 2014; NPN-Copula - Liu et al. 2009; log-GLASSO - Gallopin et al. 2013; GLASSO - Friedman et al. 2007

slide-111
SLIDE 111

Functional Content of Inferred Cancer Networks

Figure: score axis from 0.0 to 1.0; higher score indicates a more informative network

Mutation & RNA-seq: Our method
RNA-seq: Allen & Liu 2014
Mutation: Jalali et al. 2011

Data from International Cancer Genome Consortium, BRCA

slide-112
SLIDE 112

Summary of Contributions

slide-117
SLIDE 117

Relation Heterogeneity

Markov network inference for mixed data
Epistasis network inference
Collective pairwise classification for multi-way data

Z & Z. JMLR 2012; Z & Z. Bioinformatics 2014 (in ISMB 2014); Z & Z. Bioinformatics 2015 (in ISMB 2015); Z & Z. In PSB 2016

Dual Heterogeneity

Network guided matrix completion
Survival regression by data fusion

Z & Z. Systems Biomedicine 2015; Z & Z. In RECOMB 2014; Z & Z. Journal of Comp Bio 2015

Object Heterogeneity

Latent profile chaining

Z et al. PLOS Comp Bio 2015

Triple Heterogeneity

Collective matrix factorization

Z et al. Scientific Reports 2013; Z & Z. Systems Biomedicine 2014; Z & Z. In PSB 2014; Z & Z. IEEE TPAMI 2015

Exploring Heterogeneity

Sensitivity estimation using Fréchet derivatives

slide-118
SLIDE 118

Best poster awards at BC^2 2015 (Basel, Switzerland); RECOMB 2014 (Pittsburgh, PA, USA)

I wonder what's next? All this excitement about data fusion!

Gene function prediction, disease associations, prediction of drug toxicity, gene prioritization, cancer networks, disease progression, drug interactions, pharmacogenomics.

slide-119
SLIDE 119

Blaz Zupan

Adam Kuspa Edward Nam Chris Dinh Gad Shaulsky Rafael Rosengarten Mariko Kurasawa Balaji Santhanam Thomas Helleday Jordi C. Puigvert Jure Leskovec Natasa Przulj Vuk Janjic Charles Boone Mojca M. Usaj Uroš Petrovic Petra Kaferle