SLIDE 1
Rank of tensors of l-out-of-k functions: an application in probabilistic inference
Jiří Vomlel
Institute of Information Theory and Automation (ÚTIA), Academy of Sciences of the Czech Republic
SLIDE 2 Contents
- The computer game of Minesweeper
SLIDE 3 Contents
- The computer game of Minesweeper
- Probabilistic reasoning given evidence
(using a simple example)
SLIDE 4 Contents
- The computer game of Minesweeper
- Probabilistic reasoning given evidence
(using a simple example)
- Improving the computational efficiency
SLIDE 5 Contents
- The computer game of Minesweeper
- Probabilistic reasoning given evidence
(using a simple example)
- Improving the computational efficiency
- Rank-one decomposition of probability tables representing
addition
SLIDE 6 Contents
- The computer game of Minesweeper
- Probabilistic reasoning given evidence
(using a simple example)
- Improving the computational efficiency
- Rank-one decomposition of probability tables representing
addition
SLIDE 7
The game of Minesweeper
SLIDE 8
Bayesian network for the game of Minesweeper
[Figure: a Minesweeper fragment - a revealed cell showing the number ℓ with three unrevealed neighbouring cells.]
SLIDE 9
Bayesian network for the game of Minesweeper
[Figure: the Minesweeper fragment with the corresponding Bayesian network - binary parents X1, X2, X3 and child Y observed in state ℓ.]
SLIDE 10 Bayesian network for the game of Minesweeper
[Figure: the Minesweeper fragment with the corresponding Bayesian network - binary parents X1, X2, X3 and child Y observed in state ℓ.]
P(Y = ℓ | X1 = x1, X2 = x2, X3 = x3) = 1 if ℓ = x1 + x2 + x3 (and 0 otherwise)
SLIDE 11 Bayesian network for the game of Minesweeper
[Figure: the Minesweeper fragment with the corresponding Bayesian network - binary parents X1, X2, X3 and child Y observed in state ℓ.]
P(Y = ℓ | X1 = x1, X2 = x2, X3 = x3) = 1 if ℓ = x1 + x2 + x3 (and 0 otherwise)
P(Xi = 1) = r / (s · t − o), where r is the number of mines, o is the number of observations, and s, t are the dimensions of the game grid.
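The two tables on this slide are easy to build explicitly. Below is a minimal Python/numpy sketch (my addition, not part of the original talk); the values of r, s, t and o are hypothetical placeholders.

import itertools
import numpy as np

k = 3                           # number of neighbouring (parent) cells
r, s, t, o = 50, 20, 20, 10     # hypothetical values: mines, grid dimensions, observations

# P(Y = l | X1, ..., Xk): equals 1 exactly when l is the number of neighbouring mines
cpt = np.zeros((k + 1,) + (2,) * k)              # indexed as cpt[l, x1, ..., xk]
for xs in itertools.product([0, 1], repeat=k):
    cpt[(sum(xs),) + xs] = 1.0

# Prior P(Xi): the probability that an unobserved cell hides a mine
p_mine = r / (s * t - o)
prior = np.array([1.0 - p_mine, p_mine])         # P(Xi = 0), P(Xi = 1)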
SLIDE 12 Bayes rule for updating probabilities
SLIDE 13 Bayes rule for updating probabilities
- Assume we observe Y = ℓ.
- We compute by Bayes rule
P(X1 = x1, X2 = x2, X3 = x3|Y = ℓ)
SLIDE 14 Bayes rule for updating probabilities
- Assume we observe Y = ℓ.
- We compute by Bayes rule
P(X1 = x1, X2 = x2, X3 = x3 | Y = ℓ) = ( P(Y = ℓ | X1 = x1, X2 = x2, X3 = x3) · ∏_{i=1}^{3} P(Xi = xi) ) / P(Y = ℓ)
SLIDE 15 Bayes rule for updating probabilities
- Assume we observe Y = ℓ.
- We compute by Bayes rule
P(X1 = x1, X2 = x2, X3 = x3 | Y = ℓ) = ( P(Y = ℓ | X1 = x1, X2 = x2, X3 = x3) · ∏_{i=1}^{3} P(Xi = xi) ) / P(Y = ℓ)
∝ P(Y = ℓ | X1 = x1, X2 = x2, X3 = x3)
SLIDE 16 Bayes rule for updating probabilities
- Assume we observe Y = ℓ.
- We compute by Bayes rule
P(X1 = x1, X2 = x2, X3 = x3 | Y = ℓ) = ( P(Y = ℓ | X1 = x1, X2 = x2, X3 = x3) · ∏_{i=1}^{3} P(Xi = xi) ) / P(Y = ℓ)
∝ P(Y = ℓ | X1 = x1, X2 = x2, X3 = x3)
- This is a probability table over 3 binary variables X1, X2, X3:
P(Y = ℓ | X1 = x1, X2 = x2, X3 = x3) = 1 if x1 + x2 + x3 = ℓ (and 0 otherwise)
SLIDE 17 Bayes rule for updating probabilities
- Assume we observe Y = ℓ.
- We compute by Bayes rule
P(X1 = x1, X2 = x2, X3 = x3 | Y = ℓ) = ( P(Y = ℓ | X1 = x1, X2 = x2, X3 = x3) · ∏_{i=1}^{3} P(Xi = xi) ) / P(Y = ℓ)
∝ P(Y = ℓ | X1 = x1, X2 = x2, X3 = x3)
- This is a probability table over 3 binary variables X1, X2, X3:
P(Y = ℓ | X1 = x1, X2 = x2, X3 = x3) = 1 if x1 + x2 + x3 = ℓ (and 0 otherwise)
= ψ(X1 = x1, X2 = x2, X3 = x3) .
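The same update can be carried out numerically. The sketch below (my addition, reusing the cpt and prior arrays from the previous sketch) computes the posterior table for a hypothetical observation Y = 1.

l_obs = 1                                    # hypothetical observed value of Y
likelihood = cpt[l_obs]                      # psi(x1, x2, x3): a 2 x 2 x 2 table
joint = (likelihood
         * prior[:, None, None]              # P(X1 = x1)
         * prior[None, :, None]              # P(X2 = x2)
         * prior[None, None, :])             # P(X3 = x3)
posterior = joint / joint.sum()              # P(X1, X2, X3 | Y = l_obs); joint.sum() equals P(Y = l_obs)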
SLIDE 18 Tensors of ℓ-out-of-k functions
We can visualize probability table ψ as a tensor (for ℓ = 1). [Figure of the tensor omitted.]
In this talk all tensors are functions from {0, 1}^k to the real numbers.
SLIDE 19 Tensors of ℓ-out-of-k functions
We can visualize probability table ψ as a tensor (for ℓ = 1). [Figure of the tensor omitted.]
In this talk all tensors are functions from {0, 1}^k to the real numbers. We are interested in tensors of ℓ-out-of-k functions f_ℓ(x1, . . . , xk), where:
- ℓ is the observed state of Y and
- k is the number of binary variables (the parents of Y).
SLIDE 20 Tensors of ℓ-out-of-k functions
We can visualize probability table ψ as a tensor (for ℓ = 1). [Figure of the tensor omitted.]
In this talk all tensors are functions from {0, 1}^k to the real numbers. We are interested in tensors of ℓ-out-of-k functions f_ℓ(x1, . . . , xk), where:
- ℓ is the observed state of Y and
- k is the number of binary variables (the parents of Y).
f_ℓ(x1, . . . , xk) = 1 if ℓ = ∑_{i=1}^{k} xi, and 0 otherwise.
SLIDE 21 Tensors of ℓ-out-of-k functions
We can visualize probability table ψ as a tensor (for ℓ = 1). [Figure of the tensor omitted.]
In this talk all tensors are functions from {0, 1}^k to the real numbers. We are interested in tensors of ℓ-out-of-k functions f_ℓ(x1, . . . , xk), where:
- ℓ is the observed state of Y and
- k is the number of binary variables (the parents of Y).
f_ℓ(x1, . . . , xk) = 1 if ℓ = ∑_{i=1}^{k} xi, and 0 otherwise.
In our example ℓ = 1 and k = 3.
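A small helper (my own sketch, not from the talk) that materializes the tensor f_ℓ as a numpy array:

import itertools
import numpy as np

def l_out_of_k_tensor(l, k):
    # tensor over {0,1}^k with value 1 where the inputs sum to l, and 0 elsewhere
    f = np.zeros((2,) * k)
    for xs in itertools.product([0, 1], repeat=k):
        if sum(xs) == l:
            f[xs] = 1.0
    return f

psi = l_out_of_k_tensor(1, 3)   # the example from the talk: l = 1, k = 3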
SLIDE 22
Combining information
[Figure: a Minesweeper fragment - a revealed cell showing 1 with six unrevealed neighbouring cells.]
SLIDE 23
Combining information
[Figure: Bayesian network over X1, . . . , X6 with two observed children Y1 and Y2.]
SLIDE 24
Combining information
[Figure: the model over X1, . . . , X6 after the evidence on Y1 and Y2 has been absorbed into the tables ψ and ϕ.]
ξ(X1, . . . , X6) = ψ(X1, . . . , X3) · ϕ(X1, X2, X4, . . . , X6)
SLIDE 25
Combining information
[Figure: the model over X1, . . . , X6 after the evidence on Y1 and Y2 has been absorbed into the tables ψ and ϕ.]
ξ(X1, . . . , X6) = ψ(X1, . . . , X3) · ϕ(X1, X2, X4, . . . , X6)
Total table size is 2^3 + 2^5 = 8 + 32 = 40.
SLIDE 26
A more efficient way of combining information
[Figure: the model over X1, . . . , X6 with the table ψ replaced by the univariate factors ψ1, ψ2, ψ3.]
ξ(X1, . . . , X6) = ψ1(X1) · . . . · ψ3(X3) ·ϕ1(X1, X2, X4, . . . , X6)
SLIDE 27
A more efficient way of combining information
[Figure: the model over X1, . . . , X6 with the table ψ replaced by the univariate factors ψ1, ψ2, ψ3.]
ξ(X1, . . . , X6) = ψ1(X1) · . . . · ψ3(X3) · ϕ1(X1, X2, X4, . . . , X6)
Total table size is 3 · 2 + 2^5 = 6 + 32 = 38.
SLIDE 28 An even more efficient way of combining information
[Figure: the model with an auxiliary binary variable B2 that connects X1, X2, X4, X5, X6 in place of the table ϕ1.]
ξ(X1, . . . , X6) =
ψ1(X1) · . . . · ψ3(X3) · ∑_{B2} ϕ1(B2, X1) · ϕ2(B2, X2) · ϕ4(B2, X4) · . . . · ϕ6(B2, X6)
SLIDE 29 An even more efficient way of combining information
[Figure: the model with an auxiliary binary variable B2 that connects X1, X2, X4, X5, X6 in place of the table ϕ1.]
ξ(X1, . . . , X6) =
ψ1(X1) · . . . · ψ3(X3) · ∑_{B2} ϕ1(B2, X1) · ϕ2(B2, X2) · ϕ4(B2, X4) · . . . · ϕ6(B2, X6)
Since B2 is binary, the total table size is 3 · 2 + 5 · 2^2 = 6 + 20 = 26.
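The table-size bookkeeping on slides 25, 27 and 29 can be checked mechanically. A small sketch (my addition; the helper is hypothetical), assuming every variable, including B2, is binary:

def total_table_size(scopes):
    # each variable is binary, so a table over n variables has 2**n cells
    return sum(2 ** len(scope) for scope in scopes)

# slide 25: one table over 3 variables and one over 5 variables
print(total_table_size([("X1", "X2", "X3"), ("X1", "X2", "X4", "X5", "X6")]))        # 40
# slide 27: three 1-variable tables and one 5-variable table
print(total_table_size([("X1",), ("X2",), ("X3",),
                        ("X1", "X2", "X4", "X5", "X6")]))                            # 38
# slide 29: three 1-variable tables and five 2-variable tables containing B2
print(total_table_size([("X1",), ("X2",), ("X3",)]
                       + [("B2", x) for x in ("X1", "X2", "X4", "X5", "X6")]))       # 26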
SLIDE 30 Tensor rank
We have just seen that ϕ1(X1, X2, X4, . . . , X6) =
∑_{B2} ϕ1(B2, X1) · ϕ2(B2, X2) · ϕ4(B2, X4) · . . . · ϕ6(B2, X6) .
SLIDE 31 Tensor rank
We have just seen that ϕ1(X1, X2, X4, . . . , X6) =
∑_{B2} ϕ1(B2, X1) · ϕ2(B2, X2) · ϕ4(B2, X4) · . . . · ϕ6(B2, X6) .
But there is no way we can write
ϕ1(X1, X2, X4, . . . , X6) = ϕ1(X1) · ϕ2(X2) · ϕ4(X4) · . . . · ϕ6(X6) .
SLIDE 32 Tensor rank
We have just seen that ϕ1(X1, X2, X4, . . . , X6) =
∑_{B2} ϕ1(B2, X1) · ϕ2(B2, X2) · ϕ4(B2, X4) · . . . · ϕ6(B2, X6) .
But there is no way we can write
ϕ1(X1, X2, X4, . . . , X6) = ϕ1(X1) · ϕ2(X2) · ϕ4(X4) · . . . · ϕ6(X6) .
What is the minimal number of states of a variable B such that
ψ(X1, . . . , Xk) = ∑_B ∏_{i=1}^{k} ψi(B, Xi) ?
SLIDE 33 Tensor rank
We have just seen that ϕ1(X1, X2, X4, . . . , X6) =
∑_{B2} ϕ1(B2, X1) · ϕ2(B2, X2) · ϕ4(B2, X4) · . . . · ϕ6(B2, X6) .
But there is no way we can write
ϕ1(X1, X2, X4, . . . , X6) = ϕ1(X1) · ϕ2(X2) · ϕ4(X4) · . . . · ϕ6(X6) .
What is the minimal number of states of a variable B such that
ψ(X1, . . . , Xk) = ∑_B ∏_{i=1}^{k} ψi(B, Xi) ?
This number is called the rank of tensor ψ.
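The definition can be made concrete with a tiny verification routine (my own sketch, assuming numpy): build the sum of rank-one terms over the states of B from the factors ψi(B, Xi) and compare it with the target tensor. The example verifies a rank-2 decomposition of the 1-out-of-2 tensor.

import numpy as np

def compose(factors):
    # factors[i] has shape (B, 2); returns the sum over b of the outer products
    out = np.zeros((2,) * len(factors))
    for b in range(factors[0].shape[0]):
        term = np.array(1.0)
        for f in factors:
            term = np.multiply.outer(term, f[b])
        out += term
    return out

# a rank-2 decomposition of the 1-out-of-2 tensor [[0, 1], [1, 0]]
e0, e1 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
factors = [np.stack([e0, e1]),    # psi_1(B, X1)
           np.stack([e1, e0])]    # psi_2(B, X2)
print(compose(factors))           # [[0. 1.] [1. 0.]]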
SLIDE 34 Symmetric rank of tensors of ℓ-out-of-k functions
- Generally, finding the rank of a tensor is NP-hard.
SLIDE 35 Symmetric rank of tensors of ℓ-out-of-k functions
- Generally, finding the rank of a tensor is NP-hard.
- However, tensors of ℓ-out-of-k functions define a restricted
class of tensors.
SLIDE 36 Symmetric rank of tensors of ℓ-out-of-k functions
- Generally, finding the rank of a tensor is NP-hard.
- However, tensors of ℓ-out-of-k functions define a restricted
class of tensors.
- These tensors are all symmetric. A tensor ψ is symmetric if
ψ(X1 = x1, . . . , Xk = xk) = a_{x1+...+xk}, where a = (a_0, . . . , a_k) is a vector of real numbers.
SLIDE 37 Symmetric rank of tensors of ℓ-out-of-k functions
- Generally, finding the rank of a tensor is NP-hard.
- However, tensors of ℓ-out-of-k functions define a restricted
class of tensors.
- These tensors are all symmetric. A tensor ψ is symmetric if
ψ(X1 = x1, . . . , Xk = xk) = a_{x1+...+xk}, where a = (a_0, . . . , a_k) is a vector of real numbers.
- The symmetric rank of tensor ψ is the minimum number of
symmetric tensors of rank one that sum up to ψ.
SLIDE 38 Symmetric rank of tensors of ℓ-out-of-k functions
- Generally, finding the rank of a tensor is NP-hard.
- However, tensors of ℓ-out-of-k functions define a restricted
class of tensors.
- These tensors are all symmetric. A tensor ψ is symmetric if
ψ(X1 = x1, . . . , Xk = xk) = a_{x1+...+xk}, where a = (a_0, . . . , a_k) is a vector of real numbers.
- The symmetric rank of tensor ψ is the minimum number of
symmetric tensors of rank one that sum up to ψ.
Theorem
The symmetric rank of a tensor representing an ℓ-out-of-k function (for 0 < ℓ < k) is at least max{ℓ + 1, k − ℓ}.
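To make the symmetry condition tangible, the sketch below (my addition, reusing l_out_of_k_tensor from an earlier sketch) extracts the vector a = (a_0, . . . , a_k) from a tensor whose entries depend only on x1 + . . . + xk, and reports failure otherwise.

import itertools
import numpy as np

def symmetric_vector(tensor):
    # returns a with tensor[x] == a[sum(x)] for all x, or None if no such vector exists
    k = tensor.ndim
    a = [None] * (k + 1)
    for xs in itertools.product([0, 1], repeat=k):
        m = sum(xs)
        if a[m] is None:
            a[m] = tensor[xs]
        elif not np.isclose(a[m], tensor[xs]):
            return None
    return np.array(a)

print(symmetric_vector(l_out_of_k_tensor(1, 3)))   # [0. 1. 0. 0.]: a_m = 1 exactly when m = 1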
SLIDE 39
Border rank of tensors of ℓ-out-of-k functions
Definition (Border rank)
The border rank of a tensor A is min{r : ∀ε > 0 ∃ tensor E : ||E|| < ε, rank(A + E) = r} , where || · || is any norm.
SLIDE 40
Border rank of tensors of ℓ-out-of-k functions
Definition (Border rank)
The border rank of a tensor A is min{r : ∀ε > 0 ∃ tensor E : ||E|| < ε, rank(A + E) = r} , where || · || is any norm.
Theorem (Upper bound of the border rank)
The border rank of a tensor A(ℓ, k) representing an ℓ-out-of-k function is at most min{ℓ + 1, k − ℓ + 1}.
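The upper bound can be illustrated numerically. For ℓ = 2 and k = 3 it gives min{ℓ + 1, k − ℓ + 1} = 2, and a standard construction (my illustration, not taken from the slides) approximates the 2-out-of-3 tensor by two rank-one terms, (1/ε)(x + εy)^{⊗3} − (1/ε) x^{⊗3} with x = (0, 1) and y = (1, 0); the error is of order ε.

import numpy as np

def rank_one(vectors, weight=1.0):
    # weighted outer product of the given 2-vectors
    term = np.array(weight)
    for v in vectors:
        term = np.multiply.outer(term, v)
    return term

eps = 1e-4
x, y = np.array([0.0, 1.0]), np.array([1.0, 0.0])
approx = (rank_one([x + eps * y] * 3, weight=1 / eps)
          + rank_one([x] * 3, weight=-1 / eps))

exact = l_out_of_k_tensor(2, 3)            # from the earlier sketch
print(np.abs(approx - exact).max())        # roughly 1e-4; it shrinks as eps goes to 0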
SLIDE 41 Tensor approximations
Given a symmetric tensor representing an ℓ-out-of-k function our goal is to find another symmetric tensor:
- of the same order and the same dimensions
SLIDE 42 Tensor approximations
Given a symmetric tensor representing an ℓ-out-of-k function our goal is to find another symmetric tensor:
- of the same order and the same dimensions
- having symmetric rank at most r = min{ℓ + 1, k − ℓ + 1}
SLIDE 43 Tensor approximations
Given a symmetric tensor representing an ℓ-out-of-k function our goal is to find another symmetric tensor:
- of the same order and the same dimensions
- having symmetric rank at most r = min{ℓ + 1, k − ℓ + 1}
- that is a good approximation of the original tensor.
SLIDE 44 Tensor approximations
Given a symmetric tensor representing an ℓ-out-of-k function our goal is to find another symmetric tensor:
- of the same order and the same dimensions
- having symmetric rank at most r = min{ℓ + 1, k − ℓ + 1}
- that is a good approximation of the original tensor.
We used a kind of stochastic hill-climbing algorithm.
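The slides do not spell out the search, so the following is only a sketch of the objective such a stochastic search could optimize (my addition, assuming numpy and l_out_of_k_tensor from an earlier sketch): a symmetric rank-r candidate is parametrized by weights w_j and scalars a_j, giving the tensor ∑_j w_j · (1, a_j)^{⊗k}, and the objective is its distance from the exact tensor. A hill-climber would repeatedly perturb the parameters and keep changes that reduce this error.

import numpy as np

def symmetric_candidate(weights, scalars, k):
    # sum of weighted symmetric rank-one terms w_j * (1, a_j) x ... x (1, a_j)
    out = np.zeros((2,) * k)
    for w, a in zip(weights, scalars):
        term = np.array(float(w))
        for _ in range(k):
            term = np.multiply.outer(term, np.array([1.0, a]))
        out += term
    return out

def approximation_error(weights, scalars, l, k):
    diff = symmetric_candidate(weights, scalars, k) - l_out_of_k_tensor(l, k)
    return np.sqrt((diff ** 2).sum())        # Frobenius distance to the exact tensor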
SLIDE 45 Tensor approximations - example
The tensor for ℓ = 1
SLIDE 46 Tensor approximations - example
The tensor for ℓ = 1
∼ −0.19 exp(−15.75)(1, exp(−15.75)) ⊗ . . . ⊗ (1, exp(−15.75))
SLIDE 47 Tensor approximations - example
The tensor for ℓ = 1
∼ −0.19 exp(−15.75)(1, exp(−15.75)) ⊗ . . . ⊗ (1, exp(−15.75)) + 1.19 exp(−13.90)(1, exp(−13.90)) ⊗ . . . ⊗ (1, exp(−13.90))
SLIDE 48 Tensor approximations - example
The tensor for ℓ = 1
∼ −0.19 exp(−15.75)(1, exp(−15.75)) ⊗ . . . ⊗ (1, exp(−15.75)) + 1.19 exp(−13.90)(1, exp(−13.90)) ⊗ . . . ⊗ (1, exp(−13.90))
= [tensor with entries 2.33 · 10^−10, 1.0, 1.07 · 10^−6, 1.07 · 10^−6, 9.96 · 10^−13]
SLIDE 49
Tensor with noisy inputs
In the real world there is usually noise that modifies the functional relations between variables.
SLIDE 50
Tensor with noisy inputs
In the real world there is usually noise that modifies the functional relations between variables. Tensor N(ℓ, k, p, q) represents an ℓ-out-of-k function with noisy inputs if it holds for (i1, . . . , ik) ∈ {0, 1}^k that
SLIDE 51 Tensor with noisy inputs
In the real world there is usually noise that modifies the functional relations between variables. Tensor N(ℓ, k, p, q) represents an ℓ-out-of-k function with noisy inputs if it holds for (i1, . . . , ik) ∈ {0, 1}^k that
N(ℓ, k, p, q)_{i1,i2,...,ik} = ∑_{(j1,...,jk) ∈ {0,1}^k} A_{j1,j2,...,jk}(ℓ, k) · ∏_{n=1}^{k} M_{in,jn}(p, q) ,
SLIDE 52 Tensor with noisy inputs
In the real world there is usually noise that modifies the functional relations between variables. Tensor N(ℓ, k, p, q) represents an ℓ-out-of-k function with noisy inputs if it holds for (i1, . . . , ik) ∈ {0, 1}^k that
N(ℓ, k, p, q)_{i1,i2,...,ik} = ∑_{(j1,...,jk) ∈ {0,1}^k} A_{j1,j2,...,jk}(ℓ, k) · ∏_{n=1}^{k} M_{in,jn}(p, q) ,
where A_{j1,j2,...,jk}(ℓ, k) represents the (exact) ℓ-out-of-k function,
SLIDE 53 Tensor with noisy inputs
In the real world there is usually noise that modifies the functional relations between variables. Tensor N(ℓ, k, p, q) represents an ℓ-out-of-k function with noisy inputs if it holds for (i1, . . . , ik) ∈ {0, 1}^k that
N(ℓ, k, p, q)_{i1,i2,...,ik} = ∑_{(j1,...,jk) ∈ {0,1}^k} A_{j1,j2,...,jk}(ℓ, k) · ∏_{n=1}^{k} M_{in,jn}(p, q) ,
where A_{j1,j2,...,jk}(ℓ, k) represents the (exact) ℓ-out-of-k function, and M_{in,jn}(p, q) are elements of the matrix M(p, q) defined by
M_{in,jn}(p, q) = q if jn = 0 and in = 0,
1 − q if jn = 1 and in = 0,
1 − p if jn = 0 and in = 1,
p if jn = 1 and in = 1,
where 0 < p ≤ 1, 0 < q ≤ 1 are the parameters of the input noise.
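A compact way to realize this definition (my own sketch, assuming numpy and the l_out_of_k_tensor helper from an earlier sketch) is to contract the exact tensor A with the 2 × 2 noise matrix M(p, q) along every mode:

import numpy as np

def noisy_tensor(A, p, q):
    # N_{i1..ik} = sum over (j1..jk) of A_{j1..jk} * prod_n M[i_n, j_n]
    M = np.array([[q, 1.0 - q],        # row i_n = 0: q if j_n = 0, 1 - q if j_n = 1
                  [1.0 - p, p]])       # row i_n = 1: 1 - p if j_n = 0, p if j_n = 1
    N = A
    for n in range(A.ndim):
        # contract the j_n index of the current tensor with the j-index of M
        N = np.tensordot(M, N, axes=([1], [n]))
        # tensordot places the new i_n axis first; move it back to position n
        N = np.moveaxis(N, 0, n)
    return N

# example: a noisy version of the 1-out-of-3 tensor with hypothetical p = q = 0.9
print(noisy_tensor(l_out_of_k_tensor(1, 3), p=0.9, q=0.9))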
SLIDE 54
Experiments
We performed experiments with the game of Minesweeper for the 20 × 20 grid size.
SLIDE 55
Experiments
We performed experiments with the game of Minesweeper for the 20 × 20 grid size. We used a random selection of fields to be played and we assumed we never hit any of the fifty mines during the game.
SLIDE 56 Experiments
We performed experiments with the game of Minesweeper for the 20 × 20 grid size. We used a random selection of fields to be played and we assumed we never hit any of the fifty mines during the game. At each of the 350 steps of the game we created a Bayesian network and transformed it into a junction tree using two different methods:
- 1. the standard method consisting of moralization and
triangulation steps and
SLIDE 57 Experiments
We performed experiments with the game of Minesweeper for the 20 × 20 grid size. We used a random selection of fields to be played and we assumed we never hit any of the fifty mines during the game. At each of the 350 steps of the game we created a Bayesian network and transformed it into a junction tree using two different methods:
- 1. the standard method consisting of moralization and
triangulation steps and
- 2. the tensor rank-one decomposition applied to CPTs with
more than three parents (for CPTs with fewer than four parents we used moralization), followed by the triangulation step.
SLIDE 58 Experiments
We performed experiments with the game of Minesweeper for the 20 × 20 grid size. We used a random selection of fields to be played and we assumed we never hit any of the fifty mines during the game. At each of the 350 steps of the game we created a Bayesian network and transformed it into a junction tree using two different methods:
- 1. the standard method consisting of moralization and
triangulation steps and
- 2. the tensor rank-one decomposition applied to CPTs with
more than three parents (for CPTs with fewer than four parents we used moralization), followed by the triangulation step. In both networks we then used the lazy propagation method of Madsen and Jensen, with the computations performed on lists of tables over the junction trees.
SLIDE 59 Results of experiments
Numerical experiments reveal that we can gain about two orders of magnitude, but at the expense of a certain loss of precision. See the figure below.
[Figure: log10 of the average of the maximal table size vs. the number of observations (50-350), for the rank-one decomposition and the standard technique; a second panel plots values from 0.000 to 0.012 against the number of observations.]