

SLIDE 1

Rank of tensors of l-out-of-k functions: an application in probabilistic inference

Jiří Vomlel

Institute of Information Theory and Automation (ÚTIA), Academy of Sciences of the Czech Republic

SLIDE 6

Contents

  • The computer game of Minesweeper
  • Probabilistic reasoning given evidence (using a simple example)
  • Improving the computational efficiency
  • Rank-one decomposition of probability tables representing addition
  • Results of experiments
SLIDE 7

The game of Minesweeper

SLIDE 11

Bayesian network for the game of Minesweeper

[Figure: Bayesian network fragment with node Y (the observed count ℓ) and its parents X1, X2, X3, the three unexplored neighboring fields]

P(Y = ℓ | X1 = x1, X2 = x2, X3 = x3) = 1 if ℓ = x1 + x2 + x3, and 0 otherwise.

P(Xi) = r / (s · t − o), where r is the number of mines, o is the number of observations, and s, t are the dimensions of the game grid.
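To make the two tables concrete, here is a minimal NumPy sketch (my illustration, not code from the talk) that builds the deterministic CPT P(Y | X1, . . . , Xk) and evaluates the prior r / (s · t − o). The grid size and mine count below are the ones used in the experiments reported later in the talk; o = 0 is just an example value.

```python
import numpy as np

def cpt_y_given_parents(k=3):
    """CPT P(Y = l | X1, ..., Xk) with shape (k + 1, 2, ..., 2).

    Entry [l, x1, ..., xk] is 1 exactly when l == x1 + ... + xk, else 0.
    """
    cpt = np.zeros((k + 1,) + (2,) * k)
    for x in np.ndindex(*(2,) * k):
        cpt[(sum(x),) + x] = 1.0
    return cpt

def prior_mine(r, s, t, o):
    """Prior probability that an unobserved field hides a mine: r / (s * t - o)."""
    return r / (s * t - o)

cpt = cpt_y_given_parents(3)
print(cpt[1])                                   # the table psi for l = 1
print(prior_mine(r=50, s=20, t=20, o=0))        # 0.125; o = 0 is an example value
```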

SLIDE 17

Bayes rule for updating probabilities

  • Assume we observe Y = ℓ.
  • We compute by Bayes rule

P(X1 = x1, X2 = x2, X3 = x3 | Y = ℓ)
    = P(Y = ℓ | X1 = x1, X2 = x2, X3 = x3) · ∏_{i=1}^{3} P(Xi = xi) / P(Y = ℓ)
    ∝ P(Y = ℓ | X1 = x1, X2 = x2, X3 = x3)

  • This is a probability table over 3 binary variables X1, X2, X3:

P(Y = ℓ | X1 = x1, X2 = x2, X3 = x3) = 1 if x1 + x2 + x3 = ℓ, and 0 otherwise
    = ψ(X1 = x1, X2 = x2, X3 = x3).
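A hedged sketch of the update on this slide (again my illustration, not the speaker's code): multiply the 0/1 table ψ by the product of the priors and renormalize, which implements the proportionality above. The prior 50/400 corresponds to a 20 × 20 game with 50 mines and no observations.

```python
import numpy as np
from itertools import product

def posterior_given_y(psi, p):
    """P(X1..Xk | Y = l), proportional to psi(x) * prod_i P(Xi = xi)."""
    k = psi.ndim
    joint = np.zeros_like(psi, dtype=float)
    for x in product((0, 1), repeat=k):
        prior = np.prod([p if xi == 1 else 1.0 - p for xi in x])
        joint[x] = psi[x] * prior
    return joint / joint.sum()        # the division by P(Y = l)

# psi for l = 1, k = 3; prior p = 50 / 400
psi = np.array([[[1.0 if x1 + x2 + x3 == 1 else 0.0
                  for x3 in (0, 1)] for x2 in (0, 1)] for x1 in (0, 1)])
post = posterior_given_y(psi, p=50 / 400)
print(post[1, 0, 0], post[0, 1, 0], post[0, 0, 1])   # equal by symmetry
```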

SLIDE 21

Tensors of ℓ-out-of-k functions

We can visualize probability table ψ as a tensor (for ℓ = 1): the 2 × 2 × 2 array with entry 1 wherever x1 + x2 + x3 = 1 and entry 0 elsewhere.

In this talk all tensors are functions from {0, 1}^k to the real numbers. We are interested in tensors of ℓ-out-of-k functions fℓ(x1, . . . , xk), where:

  • ℓ is the observed state of Y and
  • k is the number of binary variables, the parents of Y.

fℓ(x1, . . . , xk) = 1 if ℓ = ∑_{i=1}^{k} xi, and 0 otherwise.

In our example ℓ = 1 and k = 3.
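Following the definition just given, a minimal NumPy sketch (mine, not from the talk) of the tensor of an ℓ-out-of-k function:

```python
import numpy as np

def l_out_of_k_tensor(l, k):
    """Tensor of the l-out-of-k function: entry 1 where x1 + ... + xk == l, else 0."""
    sums = np.indices((2,) * k).sum(axis=0)     # entry-wise sum x1 + ... + xk
    return (sums == l).astype(float)

psi = l_out_of_k_tensor(1, 3)                   # the running example: l = 1, k = 3
print(psi[1, 0, 0], psi[0, 1, 1], psi[0, 0, 0]) # 1.0, 0.0, 0.0
```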

SLIDE 22

Combining information

[Figure: Minesweeper situation with several unexplored fields around an observed count of 1]

SLIDE 23

Combining information

[Figure: Bayesian network with observed count nodes Y1 and Y2 and their parents among X1, . . . , X6]

SLIDE 25

Combining information

[Figure: graph over X1, . . . , X6 obtained by combining the two tables]

ξ(X1, . . . , X6) = ψ(X1, . . . , X3) · ϕ(X1, X2, X4, . . . , X6)

Total table size is 2^3 + 2^5 = 8 + 32 = 40.

SLIDE 27

A more efficient way of combining information

[Figure: graph over X1, . . . , X6 with ψ factorized into univariate tables]

ξ(X1, . . . , X6) = ψ1(X1) · . . . · ψ3(X3) · ϕ(X1, X2, X4, . . . , X6)

Total table size is 3 · 2 + 2^5 = 6 + 32 = 38.

SLIDE 29

An even more efficient way of combining information

[Figure: graph over X1, . . . , X6 with the auxiliary binary variable B2 connected to the parents of Y2]

ξ(X1, . . . , X6) = ∑_{B2} ψ1(X1) · . . . · ψ3(X3) · ϕ1(B2, X1) · ϕ2(B2, X2) · ϕ4(B2, X4) · . . . · ϕ6(B2, X6)

Since B2 is binary, the total table size is 3 · 2 + 5 · 2^2 = 6 + 20 = 26.
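To see where the saving of 26 versus 40 stored entries comes from, here is a small sketch (assumptions mine: random tables stand in for the actual factors ψi and ϕi) that stores only the small factors and combines them lazily with numpy.einsum, summing out B2:

```python
import numpy as np

rng = np.random.default_rng(0)

# univariate factors psi_1(X1), psi_2(X2), psi_3(X3): 3 * 2 = 6 stored entries
psi = [rng.random(2) for _ in range(3)]
# pairwise factors phi_i(B2, Xi) for the five parents of Y2: 5 * 4 = 20 stored entries
phi = [rng.random((2, 2)) for _ in range(5)]

stored = sum(f.size for f in psi) + sum(f.size for f in phi)
print("stored entries:", stored)                # 26, versus 2**3 + 2**5 = 40

# xi(X1, ..., X6) = sum over B2 of
#   psi1(X1) psi2(X2) psi3(X3) phi1(B2,X1) phi2(B2,X2) phi4(B2,X4) phi5(B2,X5) phi6(B2,X6)
xi = np.einsum('a,b,c,ia,ib,id,ie,if->abcdef',
               psi[0], psi[1], psi[2],
               phi[0], phi[1], phi[2], phi[3], phi[4])
print(xi.shape)                                 # (2, 2, 2, 2, 2, 2)
```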

SLIDE 33

Tensor rank

We have just seen that

    ϕ(X1, X2, X4, . . . , X6) = ∑_{B2} ϕ1(B2, X1) · ϕ2(B2, X2) · ϕ4(B2, X4) · . . . · ϕ6(B2, X6).

But there is no way we can write

    ϕ(X1, X2, X4, . . . , X6) = ϕ1(X1) · ϕ2(X2) · ϕ4(X4) · . . . · ϕ6(X6).

What is the minimal number of states of a variable B so that it holds that

    ψ(X1, . . . , Xk) = ∑_{B} ∏_{i=1}^{k} ψi(B, Xi) ?

This number is called the rank of tensor ψ.
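A quick numerical way to see that ψ for ℓ = 1, k = 3 cannot have rank one (a check of mine, not from the talk): the rank of any matrix flattening of a tensor is a lower bound on its tensor rank, and a 2 × 4 flattening of ψ already has matrix rank 2.

```python
import numpy as np

psi = np.zeros((2, 2, 2))
for x1 in (0, 1):
    for x2 in (0, 1):
        for x3 in (0, 1):
            psi[x1, x2, x3] = 1.0 if x1 + x2 + x3 == 1 else 0.0

# flatten X2 and X3 together: rows indexed by X1, columns by (X2, X3)
flat = psi.reshape(2, 4)
print(flat)                              # [[0 1 1 0] [1 0 0 0]]
print(np.linalg.matrix_rank(flat))       # 2, hence tensor rank >= 2
```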

SLIDE 38

Symmetric rank of tensors of ℓ-out-of-k functions

  • Generally, finding the rank of a tensor is NP-hard.
  • However, tensors of ℓ-out-of-k functions define a restricted class of tensors.
  • These tensors are all symmetric. A tensor ψ is symmetric if ψ(X1 = x1, . . . , Xk = xk) = a_{x1 + . . . + xk}, where a = (a0, . . . , ak) is a vector of real numbers.
  • The symmetric rank of tensor ψ is the minimum number of symmetric tensors of rank one that sum up to ψ.

Theorem

The symmetric rank of a tensor representing an ℓ-out-of-k function (for 0 < ℓ < k) is at least max{ℓ + 1, k − ℓ}.
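To connect this definition with the earlier tensors, a short sketch (mine) checks that the ℓ-out-of-k tensor is symmetric with a being the indicator vector of ℓ, and builds a symmetric rank-one tensor of the form b · (1, t) ⊗ . . . ⊗ (1, t):

```python
import numpy as np
from itertools import product

def l_out_of_k_tensor(l, k):
    return (np.indices((2,) * k).sum(axis=0) == l).astype(float)

def symmetric_rank_one(b, t, k):
    """b * (1, t) tensored with itself k times; entries depend only on x1 + ... + xk."""
    out = np.array(float(b))
    for _ in range(k):
        out = np.multiply.outer(out, np.array([1.0, t]))
    return out

l, k = 1, 3
psi = l_out_of_k_tensor(l, k)
a = np.eye(k + 1)[l]                  # a_s = 1 if s == l, else 0
assert all(psi[x] == a[sum(x)] for x in product((0, 1), repeat=k))

r1 = symmetric_rank_one(2.0, 0.5, k)
print(r1[1, 0, 0], r1[0, 1, 0])       # both 1.0: the entry depends only on the sum
```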

SLIDE 40

Border rank of tensors of ℓ-out-of-k functions

Definition (Border rank)

The border rank of a tensor A is min{r : ∀ε > 0 ∃ tensor E : ||E|| < ε, rank(A + E) = r} , where || · || is any norm.

Theorem (Upper bound of the border rank)

The border rank of a tensor A(ℓ, k) representing an ℓ-out-of-k function is at most min{ℓ + 1, k − ℓ + 1}.
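As a concrete illustration of the theorem for ℓ = 1 (my construction, not necessarily the one behind the theorem): the rank-two tensors ((1, t)^⊗k − (1, 0)^⊗k) / t converge to the 1-out-of-k tensor as t → 0, so its border rank is at most 2 = min{ℓ + 1, k − ℓ + 1}, even though no exact rank-two decomposition is exhibited.

```python
import numpy as np

def l_out_of_k_tensor(l, k):
    return (np.indices((2,) * k).sum(axis=0) == l).astype(float)

def rank_one(t, k):
    """(1, t) tensored with itself k times."""
    out = np.array(1.0)
    for _ in range(k):
        out = np.multiply.outer(out, np.array([1.0, t]))
    return out

k = 3
exact = l_out_of_k_tensor(1, k)
for t in (1e-1, 1e-3, 1e-5):
    approx = (rank_one(t, k) - rank_one(0.0, k)) / t      # a sum of two rank-one terms
    print(t, np.abs(approx - exact).max())                # the error shrinks with t
```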

SLIDE 44

Tensor approximations

Given a symmetric tensor representing an ℓ-out-of-k function, our goal is to find another symmetric tensor:

  • of the same order and the same dimensions
  • having symmetric rank at most r = min{ℓ + 1, k − ℓ + 1}
  • that is a good approximation of the original tensor.

We used a kind of stochastic hill-climbing algorithm.
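The talk does not spell the algorithm out, so the following is only a minimal sketch of one possible stochastic hill-climbing scheme, under assumptions of my own: the candidate is parameterized by r weights wj and r values tj, each term being wj · (1, tj)^⊗k, and a random perturbation is kept only when it reduces the maximum absolute error.

```python
import numpy as np

def l_out_of_k_tensor(l, k):
    return (np.indices((2,) * k).sum(axis=0) == l).astype(float)

def symmetric_candidate(w, t, k):
    """sum_j w[j] * (1, t[j])^(tensor power k): symmetric, rank at most len(w)."""
    out = np.zeros((2,) * k)
    for wj, tj in zip(w, t):
        term = np.array(float(wj))
        for _ in range(k):
            term = np.multiply.outer(term, np.array([1.0, tj]))
        out = out + term
    return out

def hill_climb(l, k, r, iters=20000, seed=0):
    rng = np.random.default_rng(seed)
    target = l_out_of_k_tensor(l, k)
    w, t = rng.normal(size=r), rng.normal(size=r)
    best = np.abs(symmetric_candidate(w, t, k) - target).max()
    for _ in range(iters):
        step = 10.0 ** rng.uniform(-3, 0)                 # random step size
        w2 = w + step * rng.normal(size=r)
        t2 = t + step * rng.normal(size=r)
        err = np.abs(symmetric_candidate(w2, t2, k) - target).max()
        if err < best:                                    # keep only improving moves
            w, t, best = w2, t2, err
    return w, t, best

w, t, err = hill_climb(l=1, k=3, r=2)
print("max. abs. error of a symmetric rank-2 approximation:", err)
```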

SLIDE 48

Tensor approximations - example

The tensor for the 1-out-of-3 function (the 2 × 2 × 2 array with ones where x1 + x2 + x3 = 1)

    ∼ −0.19 · exp(−15.75) · (1, exp(−15.75)) ⊗ . . . ⊗ (1, exp(−15.75))
      + 1.19 · exp(−13.90) · (1, exp(−13.90)) ⊗ . . . ⊗ (1, exp(−13.90)),

which is a symmetric tensor whose entries, listed by the value of x1 + x2 + x3, are approximately

    2.33 · 10^−10 (sum 0),  1.0 (sum 1),  1.07 · 10^−6 (sum 2),  9.96 · 10^−13 (sum 3).

slide-49
SLIDE 49

Tensor with noisy inputs

In the real world there is usually a noise that modifies functional relations between variables.

slide-50
SLIDE 50

Tensor with noisy inputs

In the real world there is usually a noise that modifies functional relations between variables. Tensor N(ℓ, k, p, q) represents an ℓ-out-of-k function with noisy inputs if it holds for (i1, . . . , ik) ∈ {0, 1}k that

slide-51
SLIDE 51

Tensor with noisy inputs

In the real world there is usually a noise that modifies functional relations between variables. Tensor N(ℓ, k, p, q) represents an ℓ-out-of-k function with noisy inputs if it holds for (i1, . . . , ik) ∈ {0, 1}k that N(ℓ, k, p, q)i1,i2,...,ik =

  • (j1,j2,...,jk)∈{0,1}k

Aj1,j2,...,jk(ℓ, k) ·

k

  • n=1

Min,jn(p, q) ,

slide-52
SLIDE 52

Tensor with noisy inputs

In the real world there is usually a noise that modifies functional relations between variables. Tensor N(ℓ, k, p, q) represents an ℓ-out-of-k function with noisy inputs if it holds for (i1, . . . , ik) ∈ {0, 1}k that N(ℓ, k, p, q)i1,i2,...,ik =

  • (j1,j2,...,jk)∈{0,1}k

Aj1,j2,...,jk(ℓ, k) ·

k

  • n=1

Min,jn(p, q) , where Aj1,j2,...,jk(ℓ, k) represents the (exact) ℓ-out-of-k function,

slide-53
SLIDE 53

Tensor with noisy inputs

In the real world there is usually noise that modifies the functional relations between variables. Tensor N(ℓ, k, p, q) represents an ℓ-out-of-k function with noisy inputs if for all (i1, . . . , ik) ∈ {0, 1}^k it holds that

    N(ℓ, k, p, q)_{i1,i2,...,ik} = ∑_{(j1,j2,...,jk) ∈ {0,1}^k} A_{j1,j2,...,jk}(ℓ, k) · ∏_{n=1}^{k} M_{in,jn}(p, q) ,

where A_{j1,j2,...,jk}(ℓ, k) represents the (exact) ℓ-out-of-k function and M_{in,jn}(p, q) are the elements of the matrix M(p, q) defined by

    M_{in,jn}(p, q) = q       if jn = 0 and in = 0,
                      1 − q   if jn = 1 and in = 0,
                      1 − p   if jn = 0 and in = 1,
                      p       if jn = 1 and in = 1,

and 0 < p ≤ 1, 0 < q ≤ 1 are the parameters of the input noise.
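A small sketch (mine, following the formula above) that builds N(ℓ, k, p, q) by contracting the exact tensor A with the noise matrix M(p, q) along every axis; numpy.tensordot performs the sum over (j1, . . . , jk).

```python
import numpy as np

def l_out_of_k_tensor(l, k):
    return (np.indices((2,) * k).sum(axis=0) == l).astype(float)

def noise_matrix(p, q):
    """M[i, j]: rows indexed by the noisy input i_n, columns by the true value j_n."""
    return np.array([[q, 1.0 - q],
                     [1.0 - p, p]])

def noisy_tensor(l, k, p, q):
    A = l_out_of_k_tensor(l, k)
    M = noise_matrix(p, q)
    N = A
    for axis in range(k):
        # contract axis `axis` of N (the true value j_n) with the j-index of M,
        # then move the resulting noisy index i_n back to the same position
        N = np.moveaxis(np.tensordot(N, M, axes=([axis], [1])), -1, axis)
    return N

N = noisy_tensor(l=1, k=3, p=0.9, q=0.95)   # p, q chosen only for illustration
print(N.shape)                              # (2, 2, 2)
print(np.round(N, 4))
```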

SLIDE 58

Experiments

We performed experiments with the game of Minesweeper for the 20 × 20 grid size. We used a random selection of fields to be played and we assumed we never hit any of the fifty mines during the game. At each of the 350 steps of the game we created a Bayesian network and compiled it in two ways:

  1. the standard method, consisting of the moralization and triangulation steps, and
  2. the tensor rank-one decomposition applied to CPTs with more than three parents (for CPTs with fewer than four parents we used moralization), followed by the triangulation step.

In both networks we then used the lazy propagation method of Madsen and Jensen, with the computations performed with lists of tables over the junction trees.

SLIDE 59

Results of experiments

Numerical experiments reveal that we can get a gain of two orders of magnitude, but at the expense of a certain loss of precision.

See Figure.

[Figure: two plots against the number of observations (50 to 350). Left: log10 of the average maximum table size (about 1 to 6) for the rank-one decomposition versus the standard technique. Right: average median error (about 0.000 to 0.012).]