Foundations of Causal Discovery
Frederick Eberhardt
KDD Causality Workshop 2016
Causal Discovery

[Figure: the causal discovery pipeline. An unknown true causal graph over x, y, z, w generates a data sample (rows of samples over x, y, z, w). An inference algorithm, together with assumptions such as the causal Markov condition, causal faithfulness, and restrictions on functional form, maps the sample to an equivalence class of causal graphs. The output can also be read as model specifications: matrices whose entries (0, ?, a, b) indicate which direct edges and which confounders are present, absent, or undetermined.]
From data to constraints to equivalence classes

[Figure: the pipeline annotated. The unknown true graph over x, y, z, w induces a joint distribution P(W, X, Y, Z). Statistical inference on the data sample yields probabilistic (in)dependence constraints such as x ⊥⊥ y | {z, w}. These probabilistic independences correspond to graphical separation conditions (d-separation) in the candidate graphs, and the graphs consistent with all constraints form the equivalence class.]
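The (in)dependence constraints are obtained from the data sample by statistical tests. As a minimal sketch, assuming linear Gaussian relations, a constraint such as x ⊥⊥ y | {z, w} can be judged with a partial-correlation test (Fisher z); the graph used to simulate the data below is a made-up example, not one from the slides.

```python
# Minimal sketch: judging x _||_ y | {z, w} from a data sample via partial
# correlation and the Fisher z-transform (assumes linear Gaussian relations).
import numpy as np
from scipy import stats

def partial_corr(data, i, j, cond):
    """Partial correlation of columns i and j given the columns in cond."""
    idx = [i, j] + list(cond)
    prec = np.linalg.inv(np.corrcoef(data[:, idx], rowvar=False))  # precision matrix
    return -prec[0, 1] / np.sqrt(prec[0, 0] * prec[1, 1])

def ci_test(data, i, j, cond, alpha=0.05):
    """Fisher z-test; returns (independent?, p-value)."""
    n, r = data.shape[0], partial_corr(data, i, j, cond)
    z = 0.5 * np.log((1 + r) / (1 - r)) * np.sqrt(n - len(cond) - 3)
    p = 2 * (1 - stats.norm.cdf(abs(z)))
    return p > alpha, p

# Hypothetical example graph: z and w are common causes of x and y, so
# x _||_ y | {z, w} holds while x and y are marginally dependent.
rng = np.random.default_rng(0)
n = 5000
z = rng.normal(size=n)
w = rng.normal(size=n)
x = 0.8 * z + 0.6 * w + rng.normal(size=n)
y = 0.7 * z + 0.5 * w + rng.normal(size=n)
data = np.column_stack([x, y, z, w])   # columns: 0=x, 1=y, 2=z, 3=w

print(ci_test(data, 0, 1, []))         # marginal: expected dependent
print(ci_test(data, 0, 1, [2, 3]))     # given {z, w}: expected independent
```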
Causal Markov condition

Every variable x is independent of its non-descendants given its parents in the causal graph.

[Figure: a causal graph over x, y, z, w, v, u illustrating the condition, followed by examples of violations of causal Markov.]
Causal Faithfulness condition

If x is independent of y given C in the probability distribution, then x is d-separated from y given C in the graph.

Violations of Causal Faithfulness:
- Path cancellation: with x → y → z and x → z carrying coefficients a, b and −ab, the two pathways from x to z cancel exactly, so x and z are independent despite the causal connection.
- Deterministic structure such as z = XOR(x, y) with fair-coin inputs: z is independent of x and of y individually, although both are its causes.
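A toy simulation of the path-cancellation violation sketched above (the coefficient values are made up; only the product structure a, b, −ab matters):

```python
# Faithfulness violation by path cancellation: x -> y -> z and x -> z with
# coefficients a, b and -a*b, so x and z appear independent despite x causing z.
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
a, b = 0.7, 0.9

x = rng.normal(size=n)
y = a * x + rng.normal(size=n)
z = b * y + (-a * b) * x + rng.normal(size=n)

print(np.corrcoef(x, y)[0, 1])  # clearly non-zero
print(np.corrcoef(y, z)[0, 1])  # clearly non-zero
print(np.corrcoef(x, z)[0, 1])  # ~0: x causes z, but the dependence cancels
```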
What the assumptions license

- Causal Markov licenses the step from dependence to causal connection: a (conditional) dependence implies some causal connection, which may be a direct edge, a mediating variable, or latent common causes (l1, l2).
- Causal Faithfulness licenses the step from independence to causal separation.

[Figure: three-node examples over x, y, z, with and without latent variables l1, l2, illustrating the different causal connections that can underlie an observed dependence.]
Markov equivalence

All graphs in an equivalence class have the same adjacencies and the same unshielded colliders, i.e. substructures x → z ← y in which x and y are non-adjacent. [Verma & Pearl 1990, Frydenberg 1990]
Assumptions and the resulting equivalence class

[Figure: the space of candidate causal graphs over x, y, z. The (in)dependence constraints, combined with the assumptions, successively rule out candidate graphs.]

The constraints are sufficient to determine the equivalence class; in this example the class contains a unique causal graph.
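A brute-force sketch of this pruning step, assuming causal sufficiency and acyclicity: enumerate every DAG over {x, y, z} and keep those whose d-separations match a given constraint set. The constraint set below is a hypothetical example; under Markov and faithfulness it leaves a single graph, the unshielded collider x → z ← y.

```python
# Enumerate all DAGs over {x, y, z} and keep those whose d-separations match
# a hypothetical set of (in)dependence constraints (Markov + faithfulness).
from itertools import combinations, product

NODES = ["x", "y", "z"]

def parents(node, edges):
    return {a for a, b in edges if b == node}

def ancestors(nodes, edges):
    """All ancestors of the given nodes, including the nodes themselves."""
    anc, frontier = set(nodes), set(nodes)
    while frontier:
        frontier = {a for a, b in edges if b in frontier} - anc
        anc |= frontier
    return anc

def d_separated(a, b, cond, edges):
    """d-separation via the moral graph of the relevant ancestral subgraph."""
    keep = ancestors({a, b} | set(cond), edges)
    sub = {(u, v) for u, v in edges if u in keep and v in keep}
    undirected = {frozenset(e) for e in sub}
    for node in keep:                      # moralize: marry the parents of each node
        for p, q in combinations(parents(node, sub), 2):
            undirected.add(frozenset((p, q)))
    reach, frontier = {a}, {a}             # reachability, avoiding the conditioning set
    while frontier:
        frontier = {v for v in keep - set(cond) - reach
                    if any(frozenset((u, v)) in undirected for u in frontier)}
        reach |= frontier
    return b not in reach

def is_acyclic(edges):
    return all(n not in ancestors(parents(n, edges), edges) for n in NODES)

def all_dags():
    pairs = list(combinations(NODES, 2))
    for choice in product([0, 1, 2], repeat=len(pairs)):   # absent / u->v / v->u
        edges = set()
        for (u, v), c in zip(pairs, choice):
            if c == 1:
                edges.add((u, v))
            elif c == 2:
                edges.add((v, u))
        if is_acyclic(edges):
            yield edges

# Hypothetical constraints, e.g. read off from tests: (pair, conditioning set, independent?)
constraints = [
    (("x", "y"), (), True),
    (("x", "z"), (), False),
    (("y", "z"), (), False),
    (("x", "y"), ("z",), False),
]

def satisfies(edges):
    return all(d_separated(a, b, cond, edges) == indep
               for (a, b), cond, indep in constraints)

for edges in all_dags():
    if satisfies(edges):
        print(sorted(edges))   # single survivor: [('x', 'z'), ('y', 'z')], i.e. x -> z <- y
```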
Completeness

For linear Gaussian and for multinomial causal relations, an algorithm that identifies the Markov equivalence class of the true model is complete. (Pearl & Geiger 1988, Meek 1995)

The assumptions can be adjusted, and the question is then what equivalence class results. (See the Zhalama and Tank talks at this workshop.)
LiNGaM: linear non-Gaussian acyclic models

x_i = Σ_{x_j ∈ Pa(x_i)} b_ij x_j + ε_i,   with the ε_i non-Gaussian and jointly independent
[Shimizu et al., 2006]

Under these assumptions the causal structure is identifiable from the joint distribution.
Forwards vs. backwards model (linear case)

True model: y = x + ε_y with x ⊥⊥ ε_y.
Backwards model: x = θy + ε̃_x, which would require y ⊥⊥ ε̃_x.

Substituting the true model into the backwards residual:
    ε̃_x = x − θy = x − θ(x + ε_y) = (1 − θ)x − θε_y

So for the backwards model, y = x + ε_y and ε̃_x = (1 − θ)x − θε_y would have to be independent, although both are linear combinations of the independent variables x and ε_y.

Theorem 1 (Darmois-Skitovich). Let X1, ..., Xn be independent, non-degenerate random variables. If two linear combinations
    l1 = a1 X1 + ... + an Xn,   with all ai ≠ 0,
    l2 = b1 X1 + ... + bn Xn,   with all bi ≠ 0,
are independent, then each Xi is normally distributed.

Hence a backwards model with independent residual exists only if x and ε_y are Gaussian; with non-Gaussian terms the causal direction is identifiable.
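A toy numeric check of this argument, with uniform (non-Gaussian) noise standing in for a generic non-Gaussian distribution. The correlation between squared residual and squared regressor is a crude stand-in for a proper independence test; LiNGaM itself estimates the model via ICA rather than this heuristic.

```python
# With non-Gaussian noise, only the true direction yields a regression residual
# that is independent of the regressor.
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
x = rng.uniform(-1, 1, size=n)             # non-Gaussian cause
y = 1.0 * x + rng.uniform(-1, 1, size=n)   # true model: x -> y

def residual(target, regressor):
    theta = np.cov(target, regressor)[0, 1] / np.var(regressor)
    return target - theta * regressor

def dependence_proxy(a, b):
    """Crude dependence measure: |corr(a^2, b^2)| (zero if a, b are independent)."""
    return abs(np.corrcoef(a**2, b**2)[0, 1])

eps_forward = residual(y, x)    # residual of y regressed on x
eps_backward = residual(x, y)   # residual of x regressed on y

print("forward  x->y:", dependence_proxy(eps_forward, x))   # ~0
print("backward y->x:", dependence_proxy(eps_backward, y))  # noticeably > 0
```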
Algorithms and their assumptions

| algorithm     | Markov | faithfulness | causal sufficiency | acyclicity | parametric assumption | output             |
|---------------|--------|--------------|--------------------|------------|-----------------------|--------------------|
| PC / GES      | ✓      | ✓            | ✓                  | ✓          | ✗                     | Markov equivalence |
| FCI           | ✓      | ✓            | ✗                  | ✓          | ✗                     | PAG                |
| CCD           | ✓      | ✓            | ✓                  | ✗          | ✗                     | PAG                |
| LiNGaM        | ✓      | ✗            | ✓                  | ✓          | linear non-Gaussian   | unique DAG         |
| lvLiNGaM      | ✓      | ✓            | ✗                  | ✓          | linear non-Gaussian   | set of DAGs        |
| cyclic LiNGaM | ✓      | ~            | ✓                  | ✗          | linear non-Gaussian   | set of graphs      |
The linear Gaussian case is not identifiable

True model: x = ε_x, y = x + ε_y, with ε_x, ε_y ∼ independent Gaussian.

[Figure (panels a-c, graphics from Hoyer et al. 2009): the data with the fitted conditionals p(y | x) and p(x | y). The forwards (true) model and the backwards model fit equally well.]
Non-linear additive noise models

x_j = f_j(pa(x_j)) + ε_j

If the functions f_j(.) are non-linear, is the causal structure represented by this class of models identifiable?
A non-linear example

True model: x = ε_x, y = x + x³ + ε_y, with ε_x, ε_y ∼ independent Gaussian.

Any backwards model x = g(y) + ε̃_x has y ⊥̸⊥ ε̃_x: the residual of the reverse fit remains dependent on y.

[Figure (panels d-f, graphics from Hoyer et al. 2009): the data with the fitted conditionals p(y | x) and p(x | y). The forwards (true) model fits with independent residuals; the backwards model does not.]
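A toy numeric version of this example. A polynomial regression stands in for a general nonparametric fit, and the squared-correlation proxy again stands in for a proper independence test such as HSIC.

```python
# Non-linear additive noise: fit both directions and compare residual dependence.
import numpy as np

rng = np.random.default_rng(3)
n = 50_000
x = rng.normal(size=n)
y = x + x**3 + rng.normal(size=n)          # true model: x -> y, non-linear

def fit_residual(target, regressor, degree=5):
    r = (regressor - regressor.mean()) / regressor.std()   # standardize for a stable fit
    coeffs = np.polyfit(r, target, degree)                 # flexible polynomial regression
    return target - np.polyval(coeffs, r)

def dependence_proxy(a, b):
    return abs(np.corrcoef(a**2, b**2)[0, 1])              # ~0 if independent

res_forward = fit_residual(y, x)    # fit y = f(x) + noise
res_backward = fit_residual(x, y)   # fit x = g(y) + noise

print("forward  x->y:", dependence_proxy(res_forward, x))   # ~0
print("backward y->x:", dependence_proxy(res_backward, y))  # noticeably > 0
```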
The Hoyer Condition

Hoyer Condition (HC): a technical condition on the relation between the function, the noise distribution and the parent distribution that, if satisfied, permits a backward model.

- If the noise is Gaussian, the only function that satisfies HC is linearity; otherwise the model is identifiable.
- For other noise distributions there are special combinations of function, noise and parent distribution that satisfy HC, but in general identifiability is guaranteed.
- In the linear non-Gaussian case one cannot fit a linear backwards model (LiNGaM), but there are cases where one can fit a non-linear backwards model.
Algorithms and their assumptions (extended)

The table above, extended with one more row:

| algorithm                 | Markov | faithfulness | causal sufficiency | acyclicity | parametric assumption     | output     |
|---------------------------|--------|--------------|--------------------|------------|---------------------------|------------|
| non-linear additive noise | ✓      | minimality   | ✓                  | ✓          | non-linear additive noise | unique DAG |
Background knowledge

Beyond (in)dependence constraints, causal discovery can exploit background knowledge, for example:
- known pathways (e.g. that x affects w, possibly indirectly),
- tier orderings (e.g. {x, w} causally precede {y, z}), as are common in biological settings,
- "priors" over edges or graphs,
- temporal structure, e.g. subsampled time series (see the Tank talk at this workshop).

[Figure: a latent-variable model over x, y, z with latents l1, l2, and small example graphs over x, y, z, w illustrating each kind of background knowledge.]
A constraint-based pipeline over multiple data sets

Data samples from different settings (e.g. one sample over x, y, z, w and another over x, y, w only), together with assumptions and background knowledge, yield (in)dependence constraints of the form

    x ⊥̸⊥ y | C || J

i.e. x and y are dependent given conditioning set C in the data set obtained under setting J (the set of intervened variables).

Encode these as logical constraints on the underlying graph structure and pass them to a (max)SAT-solver.
25
x ⊥ ⊥ y ⇐ ⇒ ¬A ∧ ¬B . . . A = ‘x → y is present’
constraints in propositional logic
25
x ⊥ ⊥ y ⇐ ⇒ ¬A ∧ ¬B . . . A = ‘x → y is present’ ¬A ∧ ¬B ∧ ¬(C ∧ D) ∧ ¬...
constraints in propositional logic
using a SAT-solver
25
x ⊥ ⊥ y ⇐ ⇒ ¬A ∧ ¬B . . . A = ‘x → y is present’ ¬A ∧ ¬B ∧ ¬(C ∧ D) ∧ ¬... ⇐ ⇒
x y
z
A = false B = false ...
constraints in propositional logic
using a SAT-solver
25
x ⊥ ⊥ y ⇐ ⇒ ¬A ∧ ¬B . . . A = ‘x → y is present’
complete
¬A ∧ ¬B ∧ ¬(C ∧ D) ∧ ¬... ⇐ ⇒
x y
z
A = false B = false ...
constraints in propositional logic
using a SAT-solver
25
x ⊥ ⊥ y ⇐ ⇒ ¬A ∧ ¬B . . . A = ‘x → y is present’
complete
UNsatisfiable
¬A ∧ ¬B ∧ ¬(C ∧ D) ∧ ¬... ⇐ ⇒
x y
z
A = false B = false ...
Conflicting constraints

With finite data, statistical tests can return conflicting constraints. Example over x, y, z, with a weight attached to each constraint:

- x ⊥⊥ y        (weight 3000)
- x ⊥̸⊥ z        (weight 2500)
- y ⊥̸⊥ z        (weight 500)
- x ⊥⊥ y | z    (weight 250)

No single graph satisfies all four: a graph entailing the first three constraints entails x ⊥̸⊥ y | z, while a graph entailing x ⊥⊥ y | z must give up one of the others. Maximizing the total weight of satisfied constraints resolves the conflict. (See also the Sridhar talk at this workshop.)
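A brute-force stand-in for the weighted maxSAT step on this example. The independences entailed by each candidate structure are hand-coded below rather than derived, and only a few causally sufficient candidates are listed; a real implementation encodes all constraints in propositional logic and hands weighted clauses to a maxSAT solver.

```python
# Score a few candidate structures over {x, y, z} by the total weight of the
# test constraints they satisfy (constraints and weights as on the slide).

# (statement, value claimed by the test, weight); value True means 'independent'.
constraints = [
    ("x ind y",     True,  3000),
    ("x ind z",     False, 2500),
    ("y ind z",     False, 500),
    ("x ind y | z", True,  250),
]

# Candidate structures and the (in)dependences each entails (hand-coded).
candidates = {
    "x -> z <- y (collider)": {
        "x ind y": True,  "x ind z": False, "y ind z": False, "x ind y | z": False},
    "x -> z -> y (chain)": {
        "x ind y": False, "x ind z": False, "y ind z": False, "x ind y | z": True},
    "x <- z -> y (common cause)": {
        "x ind y": False, "x ind z": False, "y ind z": False, "x ind y | z": True},
    "empty graph": {
        "x ind y": True,  "x ind z": True,  "y ind z": True,  "x ind y | z": True},
}

def score(entailed):
    """Total weight of the test constraints that the structure satisfies."""
    return sum(w for name, value, w in constraints if entailed[name] == value)

for name, entailed in sorted(candidates.items(), key=lambda kv: -score(kv[1])):
    print(f"{score(entailed):5d}  {name}")
# The collider scores highest (6000), violating only the lowest-weight constraint.
```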
Constraint reliability and weights

Write G ⊭ k for "constraint k is not satisfied by G". The solver returns a graph that minimizes the total weight of the constraints it violates, so each weight should reflect the reliability of the corresponding constraint. What are suitable weights? [Hyttinen et al. 2014]
Weights from the data

Classical statistics does not directly provide the probability of an (in)dependence hypothesis in light of the data. One option is to weight each constraint by the log of the probability of the corresponding hypothesis, comparing the two factorizations:

    x ⊥̸⊥ y | C :  P(x | C) P(y | x, C)      vs.      x ⊥⊥ y | C :  P(x | C) P(y | C)

[Hyttinen et al. 2014]
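One concrete way such log-probability weights might be computed, as a sketch under a linear Gaussian assumption: score the two factorizations by BIC-penalized log-likelihood and use the difference (the shared factor P(x | C) cancels). The actual weighting scheme of Hyttinen et al. 2014 may differ in its details.

```python
# Sketch: data-driven constraint weights from BIC-penalized log-likelihoods of
# the 'dependent' factorization P(y | x, C) vs. the 'independent' one P(y | C).
import numpy as np

def gauss_loglik(target, predictors):
    """Max log-likelihood of a linear Gaussian regression; returns (loglik, #params)."""
    n = len(target)
    X = np.column_stack([np.ones(n)] + list(predictors))
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ beta
    sigma2 = resid @ resid / n
    return -0.5 * n * (np.log(2 * np.pi * sigma2) + 1), X.shape[1] + 1

def constraint_weight(x, y, C):
    """Signed weight: positive favours 'x and y dependent given C'."""
    n = len(x)
    ll_dep, k_dep = gauss_loglik(y, list(C) + [x])   # P(y | x, C)
    ll_ind, k_ind = gauss_loglik(y, list(C))         # P(y | C)
    bic_dep = ll_dep - 0.5 * k_dep * np.log(n)
    bic_ind = ll_ind - 0.5 * k_ind * np.log(n)
    return bic_dep - bic_ind                         # log-scale evidence difference

rng = np.random.default_rng(4)
n = 2000
z = rng.normal(size=n)
x = z + rng.normal(size=n)
y = z + rng.normal(size=n)            # x and y are dependent only through z

print(constraint_weight(x, y, []))    # positive: evidence that x, y are dependent
print(constraint_weight(x, y, [z]))   # negative: evidence for x _||_ y | z
```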
Simulation results

[Figure: TPR vs. FPR curves for the SAT-based approach with d-separation constraints, varying the p-value cut-off, compared against PC and conservative PC (tests only). Setup: 6 observed variables, average degree 2; 500 samples, 200 models, linear Gaussian parameterization.] [Hyttinen et al. 2014]

[Figure: TPR vs. FPR curves comparing against FCI and conservative FCI (tests only).] [Hyttinen et al. 2014]

[Figure: TPR vs. FPR curves comparing weighting schemes: log-weights [sec. 4.3], constant weights [sec. 4.2], hard dependence constraints, and tests only.] [Hyttinen et al. 2014]
Background knowledge as weighted constraints

Background knowledge enters the same framework: for example, a believed causal pathway from x to w, or a tier ordering ({x, w} before {y, z}), is encoded as a constraint on the graph with its own weight (e.g. 0.8) and traded off against the data-derived constraints when solving for the graph.
Summary: algorithms and their assumptions

| algorithm                 | Markov | faithfulness | causal sufficiency | acyclicity | parametric assumption     | output             |
|---------------------------|--------|--------------|--------------------|------------|---------------------------|--------------------|
| PC / GES                  | ✓      | ✓            | ✓                  | ✓          | ✗                         | Markov equivalence |
| FCI                       | ✓      | ✓            | ✗                  | ✓          | ✗                         | PAG                |
| CCD                       | ✓      | ✓            | ✓                  | ✗          | ✗                         | PAG                |
| LiNGaM                    | ✓      | ✗            | ✓                  | ✓          | linear non-Gaussian       | unique DAG         |
| lvLiNGaM                  | ✓      | ✓            | ✗                  | ✓          | linear non-Gaussian       | set of DAGs        |
| cyclic LiNGaM             | ✓      | ~            | ✓                  | ✗          | linear non-Gaussian       | set of graphs      |
| non-linear additive noise | ✓      | minimality   | ✓                  | ✓          | non-linear additive noise | unique DAG         |
| maxSAT                    | ✓      | ✓            | ✗                  | ✗*         | ✗                         | query based        |
Computational cost

[Figure: solving time per instance (in seconds) across instances, sorted per line, for the log-weights, constant-weights and hard-dependence variants.] [Hyttinen et al. 2014]
Query-based output

Instead of having the (max)SAT-solver enumerate all graphs in the equivalence class (x → y? z → w? etc.), it can be queried directly. Query: is a given structural feature determined, i.e. does it hold in every member of the equivalence class, or only in some members? The response answers the query without ever listing the equivalence classes explicitly. (See also the Grant talk at this workshop.)
The do-calculus rules as separation conditions

Rule 1 (insertion/deletion of observations):
    P(y | do(x), z, w) = P(y | do(x), w)          if  Y ⊥⊥ Z | X, W || X
Rule 2 (action/observation exchange):
    P(y | do(x), do(z), w) = P(y | do(x), z, w)   if  Y ⊥⊥ I_Z | X, Z, W || X
Rule 3 (insertion/deletion of actions):
    P(y | do(x), do(z), w) = P(y | do(x), w)      if  Y ⊥⊥ I_Z | X, W || X

Here I_Z denotes the intervention variable for Z, and "|| X" indicates that the separation condition is evaluated in the setting in which X is intervened on. [Hyttinen et al. 2015]
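A toy numeric illustration of Rule 2 in its simplest instance (made-up coefficients): when x is unconfounded with y, observing x and intervening on x give the same conditional for y, so the observational regression slope matches the slope under randomized x; with a latent common cause the graphical condition fails and the two differ.

```python
# Sanity check of the action/observation exchange in two hand-built models.
import numpy as np

rng = np.random.default_rng(5)
n = 200_000
b = 1.5

def slope(x, y):
    return np.cov(x, y)[0, 1] / np.var(x)

# Case 1: x -> y, no confounding; P(y | do(x)) = P(y | x).
x_obs = rng.normal(size=n)
y_obs = b * x_obs + rng.normal(size=n)
x_do = rng.normal(size=n)                     # x set by a randomized intervention
y_do = b * x_do + rng.normal(size=n)
print(slope(x_obs, y_obs), slope(x_do, y_do))   # both ~ b = 1.5

# Case 2: latent common cause h of x and y; the condition of Rule 2 fails.
h = rng.normal(size=n)
x_obs = h + rng.normal(size=n)
y_obs = b * x_obs + 2.0 * h + rng.normal(size=n)
x_do = rng.normal(size=n)                     # the intervention breaks the h -> x link
y_do = b * x_do + 2.0 * h + rng.normal(size=n)
print(slope(x_obs, y_obs), slope(x_do, y_do))   # observational ~ 2.5, interventional ~ 1.5
```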
Causal effect queries

The same pipeline (data samples from multiple settings, assumptions, background knowledge; constraints x ⊥̸⊥ y | C || J encoded as logical constraints on the graph; a (max)SAT-solver) can be queried directly for causal effects, e.g. for P(y | do(x)), or for whether the effect of x on w is identified.
Further directions

- micro- to macro-variables [Chalupka et al. 2016]
- time [Maier et al. 2013]
- property: non-causal relations
- [Stekhoven et al. 2012]

(See also the Sokolova and Blondel talks at this workshop.)
Limitations

References (grouped by topic): LiNGaM; additive noise models; SAT; ... Sets, JMLR 2015; other references.
Thank you!