SLIDE 1

Learning Polytrees with Constant Number of Roots from Data

Jan Manuch1,2, Javad Safaei1, Ladislav Stacho2

1. University of British Columbia, Department of Computer Science 2. Simon Fraser University, Department of Mathematics

SLIDE 2

Introduction

  • The goal is to learn a probabilistic graphical model (a Directed Acyclic Graph, or DAG) from a given dataset, optimizing an objective function.
  • Types of objective functions:

▫ Bayesian score ▫ Maximum Likelihood (ML) score

  • Chickering (1996) [1] has shown that learning optimal Bayesian DAGs is NP-complete. Similarly, learning minimal ML DAGs is NP-complete.
  • A minimal ML DAG is an ML DAG with the minimum number of edges.

SLIDE 3

Data Set

  • The data D is a set of m vectors (Dj, 1 ≤ j ≤ m).
  • Each vector X has a fixed number n of features (Xi, 1 ≤ i ≤ n).
  • Each feature Xi can take values val(Xi) = {v1, v2, …, vmi}.
  • The value of the i-th feature in the j-th vector is denoted Dj,i.
  • The ML score of D and a DAG G is the log-likelihood Σj log PG(Dj), where PG is the factorized distribution defined on the Factorization slide.
  • Example: m = 16, n = 4, mi = 2

[Table: 16 example vectors over four binary features X1–X4]

SLIDE 4

Learning Tree Structures

  • Chow and Liu (1968) [2] showed that learning ML trees is polynomial; the optimal tree can be computed in O(n²(m + log n)) time.

1. Compute the mutual information (MI) of every pair of vertices. 2. Find the maximum spanning tree (MST) using the MI values as edge weights. 3. Pick any vertex as the root and orient the edges away from it (it can be shown that the choice of root does not affect the ML score).
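The three steps can be sketched in Python (a minimal illustration under the assumption of small discrete feature columns, not the authors' implementation; `mutual_information` and `chow_liu_tree` are hypothetical helper names):

```python
import numpy as np
from itertools import combinations

def mutual_information(x, y):
    """Empirical mutual information of two discrete feature columns."""
    mi = 0.0
    for vx in set(x):
        for vy in set(y):
            pxy = np.mean((x == vx) & (y == vy))
            px, py = np.mean(x == vx), np.mean(y == vy)
            if pxy > 0:
                mi += pxy * np.log(pxy / (px * py))
    return mi

def chow_liu_tree(data):
    """Maximum spanning tree under MI weights, oriented away from vertex 0."""
    n = data.shape[1]
    w = {(i, j): mutual_information(data[:, i], data[:, j])
         for i, j in combinations(range(n), 2)}
    in_tree, edges = {0}, []          # grow the tree from an arbitrary root
    while len(in_tree) < n:
        u, v = max(((i, j) for i in in_tree for j in range(n) if j not in in_tree),
                   key=lambda e: w[min(e), max(e)])
        edges.append((u, v))          # directed pair: parent -> child
        in_tree.add(v)
    return edges
```

Orienting all edges away from vertex 0 is exactly step 3: any other choice of root would give the same ML score.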

  • Definition. Polytrees are directed graphs with no undirected loops (the underlying undirected graph is acyclic).
  • Dasgupta (1999) [3] showed that learning ML polytrees from data is NP-complete.
  • We study finding ML polytrees with a constant number of roots.
SLIDE 5

Factorization

  • Definition. The probability of an input vector Dj given a DAG G is defined as:

PG(Dj) = Πi P(Xi = Dj,i | Pa(Xi) = paj,i),

where Pa(Xi) is the set of all parent nodes of Xi in G, and paj,i are their values in vector Dj.

  • PG is also called the factorized form of the distribution P with respect to G.
  • P itself is called the empirical distribution and is computed from the data:

P(Xi = v) = (1/m) Σj 〈Dj,i = v〉,

where 〈·〉 equals 1 if the condition inside holds and 0 otherwise.
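The factorized probability can be estimated directly from empirical counts, as a sketch (assuming a small discrete dataset in a NumPy array; the `parents` mapping from feature index to parent indices is a hypothetical representation of G, and every parent configuration appearing in the query vector is assumed to occur in the data):

```python
import numpy as np

def factorized_prob(data, parents, d):
    """P_G(d) = prod_i P(X_i = d_i | Pa(X_i) = d[Pa(X_i)]), from empirical counts."""
    m, n = data.shape
    p = 1.0
    for i in range(n):
        pa = parents.get(i, [])
        # rows whose parent values agree with the query vector d
        mask = np.all(data[:, pa] == d[pa], axis=1) if pa else np.ones(m, bool)
        # empirical conditional probability P(X_i = d_i | parent values)
        p *= (data[mask, i] == d[i]).sum() / mask.sum()
    return p
```

For the chain X1 → X2 on four sample vectors, this computes P(X1 = v1) · P(X2 = v2 | X1 = v1).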

SLIDE 6

Merging nodes and edges

  • Definition. Vertices having more than one parent in a DAG are called merging nodes, and merging edges are all incoming edges of merging nodes.
  • Proposition 1 (Verma and Pearl 1990 [4]). Two DAGs with the same skeleton and the same merging edges factorize a distribution identically: if skeleton(G) = skeleton(G′) and ME(G) = ME(G′), then PG = PG′, where ME(G) is the collection of all merging edges of G.
  • Proposition 1 helps us avoid enumerating all orientations of the edges.
SLIDE 7

Learning Polytrees Algorithm

  • Proposition 2. In a polytree with k > 1 roots the following properties hold: 2 ≤ |Sℓ| ≤ k and Σℓ |Sℓ| = k + s − 1, where s is the total number of merging nodes and |Sℓ| is the number of parents of the ℓ-th merging node.
  • Algorithm for a k-root polytree:

1. Generate the selections of merging edges respecting Proposition 2. 2. For each selection of merging edges, run the MST algorithm, but do not allow any component to contain more than one merging node. 3. Pick any orientation of the remaining undirected edges that does not introduce a new merging node (valid by Proposition 1).
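Step 2 can be sketched with Kruskal's algorithm plus a union-find that tracks whether a component already contains a merging node (a simplified illustration under the assumption that the merging nodes of the current selection are given; `constrained_mst` is a hypothetical name):

```python
class DSU:
    """Union-find where each component knows if it contains a merging node."""
    def __init__(self, n, merging):
        self.parent = list(range(n))
        self.has_merging = [v in merging for v in range(n)]

    def find(self, v):
        while self.parent[v] != v:
            self.parent[v] = self.parent[self.parent[v]]  # path halving
            v = self.parent[v]
        return v

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        if self.has_merging[ra] and self.has_merging[rb]:
            return False  # would put two merging nodes in one component
        self.parent[ra] = rb
        self.has_merging[rb] = self.has_merging[rb] or self.has_merging[ra]
        return True

def constrained_mst(n, weighted_edges, merging_nodes):
    """Kruskal maximizing weight; never join two merging-node components."""
    dsu = DSU(n, set(merging_nodes))
    chosen = []
    for w, u, v in sorted(weighted_edges, reverse=True):
        if dsu.union(u, v):
            chosen.append((u, v))
    return chosen
```

Edges that would join two components each containing a merging node are skipped, so every resulting component contains at most one merging node, as the algorithm requires.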

SLIDE 8

Example

◮ Pick a selection of merging edges (n = 7, k = 3)

SLIDE 9

Example

◮ Run modified MST algorithm


SLIDE 15

Example

◮ Orient edges in the components containing merging nodes (the merging node is taken as the root of its component)

SLIDE 16

Example

◮ Orient edges in the other components (the root of each component can be picked arbitrarily)

SLIDE 17

Counting selections of merging edges

  • Let N(n, k) be the total number of selections of merging edges in polytrees with n nodes and k roots. A selection is determined by choosing the s merging nodes and their parent sets, whose sizes a1, …, as satisfy a1 + ⋯ + as = k + s − 1 and 2 ≤ aℓ ≤ k (by Proposition 2; note that s ≤ k − 1):

N(n, k) = Σ(s=1..k−1) Σ(a1+⋯+as = k+s−1) C(n, s) · Π(ℓ=1..s) C(n − 1, aℓ)

  • There are C(k − 2, s − 1) admissible size sequences for each s, and every summand is at most n^s · n^(k+s−1), hence

N(n, k) ≤ Σ(s=1..k−1) C(k − 2, s − 1) · n^(k+2s−1) ∈ O(n^(3(k−1))),

which is polynomial in n for every fixed k.
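The number of merging-edge selections can be counted by enumerating, for each number s of merging nodes, the parent-set sizes allowed by Proposition 2 (a sketch; `count_selections` is a hypothetical name, and it counts candidate selections, not only those that extend to valid polytrees):

```python
from math import comb

def count_selections(n, k):
    """Candidate merging-edge selections for n nodes and k roots."""
    def compositions(total, parts):
        # ordered size tuples (a_1..a_parts), each a_l >= 2, summing to total
        if parts == 0:
            return [()] if total == 0 else []
        return [(a,) + rest for a in range(2, total + 1)
                for rest in compositions(total - a, parts - 1)]
    total = 0
    for s in range(1, k):                      # s = number of merging nodes
        for sizes in compositions(k + s - 1, s):
            term = comb(n, s)                  # choose the merging nodes
            for a in sizes:
                term *= comb(n - 1, a)         # choose each parent set
            total += term
    return total
```

The upper bound 2 ≤ aℓ ≤ k holds automatically here, since the sizes sum to k + s − 1 with every part at least 2.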

SLIDE 18

Algorithm’s Complexity

  • The algorithm enumerates all selections of merging edges (polynomially many in n for every fixed k), and for each selection spends polynomial time in n and m on edge completion, orientation assignment, and likelihood computation. Hence the total complexity of our algorithm is polynomial for every fixed k.
  • Gaspers et al. [5] introduced k-branchings: polytrees that can be turned into directed forests by removing k arcs. They gave an algorithm for learning an optimal k-branching that also runs in polynomial time for every fixed k.
  • Proposition. Learning a k-branching is equivalent to learning a polytree with up to k + 1 roots. Our algorithm is faster than the algorithm of Gaspers et al. [5] by a factor of O(n).

SLIDE 19

Experiment: Identification of phosphorylation sites

  • Peptides are short sequences of amino acids. We consider peptides of length 9 centered at a phosphorylation site (Serine, Threonine, or Tyrosine) that is phosphorylated by protein kinases.
  • Two different peptide datasets are used:

▫ 803 peptides that are phosphorylated by the protein kinase PKC ▫ 1000 randomly selected peptides that are phosphorylated by some kinase

  • We learn the maximum likelihood polytrees with two and three roots.

SLIDE 20

Results

Algorithm                  | Peptides of PKC             | 1000 random peptides
                           | Score    Time    # Trees    | Score    Time     # Trees
MST (1 root) = tree        | −19.15     0.15        1    | −21.47     0.04         1
Heuristic: MST+ (2 roots)  | −18.14     1.07        9    | −20.40     1.13         8
Heuristic: MST+ (3 roots)  | −17.26     2.86       23    | −19.35     2.77        18
Exact: 2 roots             | −18.02    27.47      252    | −20.37    35.98       252
Exact: 3 roots             | −16.97  2551.50    23184    | −19.30  3235.38     23184

  • PKC peptides have a higher average likelihood score than the random peptides, as expected, since they are more similar to each other.
  • The higher the number of roots, the better the likelihood score, as expected.
SLIDE 21

Application?: Predicting peptide structure

  • If we assume that nodes connected in the learned polytree correspond to positions that are close in the peptide's 3D structure, then we could get some information about the 3D structure of the peptides:

[Figures: tree structure learned by MST; tree structure learned by the 3-root polytree]

SLIDE 22

Conclusions

  • We presented a polynomial-time algorithm for learning ML polytrees with n nodes and a constant number k of roots, which improves on the algorithm of Gaspers et al. [5] by a factor of O(n).
  • We applied this algorithm to the analysis of peptides that are phosphorylated (or phosphorylated by a particular kinase).
  • Is there an FPT algorithm for this problem?
SLIDE 23

References

  • [1] Chickering, D.M.: Learning Bayesian networks is NP-complete. In: Learning from Data, pp. 121–130. Springer (1996)
  • [2] Chow, C., Liu, C.: Approximating discrete probability distributions with dependence trees. IEEE Transactions on Information Theory 14(3), 462–467 (1968)
  • [3] Dasgupta, S.: Learning polytrees. In: Uncertainty in Artificial Intelligence, pp. 134–141 (1999)
  • [4] Verma, T.S., Pearl, J.: Equivalence and synthesis of causal models. In: Uncertainty in Artificial Intelligence (UAI), pp. 220–227 (1990)
  • [5] Gaspers, S., Koivisto, M., Liedloff, M., Ordyniak, S., Szeider, S.: On finding optimal polytrees. In: Twenty-Sixth AAAI Conference on Artificial Intelligence (2012)

SLIDE 24

Thank you

  • Any questions?