Machine learning and causal inference: a two-way road
Uri Shalit, Technion – Israel Institute of Technology. DATAIA Seminar, Paris, January 2020
(i) Johansson, F., Shalit, U., & Sontag, D. (2016). Learning representations for counterfactual inference. In International Conference on Machine Learning.
(ii) Shalit, U., Johansson, F., & Sontag, D. (2017). Estimating individual treatment effect: generalization bounds and algorithms. In International Conference on Machine Learning.
(iii) Johansson, F., Kallus, N., Shalit, U., & Sontag, D. (2020). Generalization bounds and representation learning for estimation of potential outcomes and causal effects.
Anna, May 15:
Age = 54, Gender = Female, Race = Asian, Blood sugar = 7.7%, WBC count = 6.8×10⁹/L, Temperature = 36.7 °C, Blood pressure = 150/95

What will Anna's blood pressure be under each treatment?
► Under treatment T = 0: Blood pressure = ? – potential outcome Y(0)
► Under treatment T = 1: Blood pressure = ? – potential outcome Y(1)

Only one is observed for any one patient!
► Historical records of treatments and outcomes (covariates x, treatment t, outcome Y):

Patient   Age   Prior disease activity   Observed treatment   Disease activity
Anna      54    High                     A                    High
Calvin    52    High                     A                    Low
John      48    Low                      B                    Low
Peter     60    Low                      B                    High

► Unobserved counterfactual outcomes:

Patient   Age   Prior disease activity   Disease activity (A)   Disease activity (B)
Anna      54    High                     High                   ?
Calvin    52    High                     Low                    ?
John      48    Low                      ?                      Low
Peter     60    Low                      ?                      High
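As a toy illustration of these tables, a minimal pandas sketch where NaN marks the counterfactual cell we can never observe (names illustrative):

```python
import pandas as pd
import numpy as np

# Each patient has two potential outcomes, but we only ever observe
# the one matching the treatment actually received (NaN = counterfactual).
records = pd.DataFrame({
    "patient":   ["Anna", "Calvin", "John", "Peter"],
    "age":       [54, 52, 48, 60],
    "prior":     ["High", "High", "Low", "Low"],
    "treatment": ["A", "A", "B", "B"],
    "outcome_A": ["High", "Low", np.nan, np.nan],
    "outcome_B": [np.nan, np.nan, "Low", "High"],
})

# The factual outcome is recoverable; its counterfactual never is.
records["observed"] = np.where(records["treatment"] == "A",
                               records["outcome_A"], records["outcome_B"])
print(records)
```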
Outcomes under alternative treatments
[Figure: Age (x) on the horizontal axis, mortality on the vertical axis. The control outcome curve E[Y(0) | x] and the treated outcome curve E[Y(1) | x]; the gap between them at each x is the effect of treatment, τ(x).]

[Figure: the covariate distributions of the two groups differ:
control group p₀(x) := p(x | t = 0), treated group p₁(x) := p(x | t = 1).]
1. Ignorability: "Patients with similar x respond similarly": ∀t : Y(t) ⊥ T | x
2. Overlap: ∀t, x : p(T = t | x) > 0
3. SUTVA: "No patient–patient interference"
4. Consistency: "We observe Y(t) for patients with T = t"
► E.g., assume a linear model: y = γᵀx + τ·t + ε. Goal: find τ!
(y is the observed outcome; τ is the treatment effect)

► The machine learning view – pick the hypothesis minimizing expected loss:

f̂* = argmin_{f̂∈F} E[ L(f̂, f) ], e.g. with squared loss: argmin_{f̂∈F} E[ ( f̂(x) − f(x) )² ]
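As a toy illustration of recovering τ by regression when all confounders are observed, a minimal numpy sketch on synthetic data (all coefficients and sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5000, 3
tau_true = 2.0                         # the treatment effect to recover
gamma = np.array([1.0, -0.5, 0.3])     # outcome coefficients on x

x = rng.normal(size=(n, d))
# Confounded assignment: treatment probability depends on x.
t = rng.binomial(1, 1 / (1 + np.exp(-x @ np.array([0.8, -0.4, 0.2]))))
y = x @ gamma + tau_true * t + rng.normal(scale=0.5, size=n)

# OLS on the design [x, t, 1]: because all confounders are included
# in x, the coefficient on t is a consistent estimate of tau.
design = np.column_stack([x, t, np.ones(n)])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
print("estimated tau:", round(coef[d], 3))   # close to 2.0
```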
Easier: Randomized Controlled Trials (RCT)
► Treatment is assigned uniformly at random: p(T = 1 | x) = p(T = 1)
► Here: every dot is a unit (axes x₁, x₂), color indicates the observed treatment (control, t = 0 vs. treated, t = 1)
► Predict the outcome under the unobserved treatment
► "Training set" distribution = "test set" distribution
► In randomized controlled trials there is no confounding – just do regression! ► New architecture for estimating counterfactuals and CATE ► One "head" per potential outcome – avoids washing away the treatment ► Shared representation layers Φ(x) for sample efficiency
TARNet (Treatment-Agnostic Representation Network):

[Architecture: input x → shared representation layers Φ(x) → two heads h₀, h₁; head h₀ is trained with loss L(h₀(Φ), Y(0)) if t = 0, head h₁ with loss L(h₁(Φ), Y(1)) if t = 1.]
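A minimal PyTorch sketch of a TARNet-style network (layer sizes and activations are illustrative choices, not the paper's exact configuration):

```python
import torch
import torch.nn as nn

class TARNet(nn.Module):
    """Shared representation Phi(x) feeding one outcome head per arm."""
    def __init__(self, dim_x, dim_rep=64):
        super().__init__()
        self.phi = nn.Sequential(
            nn.Linear(dim_x, dim_rep), nn.ELU(),
            nn.Linear(dim_rep, dim_rep), nn.ELU(),
        )
        self.h0 = nn.Sequential(nn.Linear(dim_rep, dim_rep), nn.ELU(),
                                nn.Linear(dim_rep, 1))   # predicts Y(0)
        self.h1 = nn.Sequential(nn.Linear(dim_rep, dim_rep), nn.ELU(),
                                nn.Linear(dim_rep, 1))   # predicts Y(1)

    def forward(self, x):
        rep = self.phi(x)
        return rep, self.h0(rep).squeeze(-1), self.h1(rep).squeeze(-1)

def factual_loss(model, x, t, y):
    # Each sample passes gradients only through the head of its observed
    # arm, so the single treatment bit is not washed away among the inputs.
    _, y0_hat, y1_hat = model(x)
    y_hat = torch.where(t.bool(), y1_hat, y0_hat)
    return ((y_hat - y) ** 2).mean()
```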
► Predict the outcome under the unobserved treatment
► Treatment is not assigned uniformly at random: p(T = 1 | x) ≠ p(T = 1)
► There is a non-negligible difference between the treatment group distributions (control, t = 0 vs. treated, t = 1)
► Example: a difference in means – "treated tend to be younger"
► Learn a representation Φ of the data that makes it more like an RCT ► A shared representation helps identify meaningful interactions ► Penalize the distributional distance between treatment groups
New type of bias-variance tradeoff
[Figure: units mapped from the original space (axes x₁, x₂) to a representation space Φ(x); in the representation space the control (t = 0) and treated (t = 1) groups are better balanced.]

► We do not want the treatment groups to be identical: in general p_Φ^{t=1}(x) ≠ p_Φ^{t=0}(x) – there is genuine treatment group imbalance
► Regularizer to improve counterfactual estimation ► Penalize treatment distributional distance in representation space ► Integral Probability Metrics (IPM) such as Wasserstein distance and MMD
[Architecture: as in TARNet – x → Φ(x) → heads h₀, h₁, with factual losses L(h₀(Φ), Y(0)) if t = 0 and L(h₁(Φ), Y(1)) if t = 1 – plus the penalty IPM_G( p̂_Φ^{t=0}, p̂_Φ^{t=1} ) computed on the representation.]
With G a function family:

IPM_G( p₀, p₁ ) := sup_{g∈G} | ∫_S g(s) ( p₀(s) − p₁(s) ) ds |
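When G is the unit ball of an RKHS, the IPM above becomes the maximum mean discrepancy (MMD). A minimal numpy sketch of a plug-in estimator (RBF kernel; the bandwidth sigma is an illustrative choice):

```python
import numpy as np

def rbf_kernel(a, b, sigma=1.0):
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def mmd2(rep0, rep1, sigma=1.0):
    """Plug-in squared MMD between control and treated representations."""
    return (rbf_kernel(rep0, rep0, sigma).mean()
            + rbf_kernel(rep1, rep1, sigma).mean()
            - 2 * rbf_kernel(rep0, rep1, sigma).mean())
```

Penalizing mmd2 between Φ(x) of the two treatment groups during training implements the regularizer; the Wasserstein variant instead takes G to be the 1-Lipschitz functions.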
► Precision in Estimation of Heterogeneous Effects (PEHE)¹:

τ̂(x) := h(Φ(x), 1) − h(Φ(x), 0)

ε_PEHE(Φ, h) = ∫ ( τ̂(x) − CATE(x) )² p(x) dx

► Theorem 1:

ε_PEHE(Φ, h) ≤ 2 [ ε_F^{t=0}(Φ, h) + ε_F^{t=1}(Φ, h) + C_Φ · IPM_G( p_Φ^{t=1}, p_Φ^{t=0} ) ]

effect error ≤ prediction error + treatment group distance

► ε_F^{t=0}, ε_F^{t=1} are the factual per-treatment-group prediction errors:

ε_F^{t=0}(Φ, h) = ∫ ( h(Φ(x), 0) − Y(0) )² p^{t=0}(x) dx
ε_F^{t=1}(Φ, h) = ∫ ( h(Φ(x), 1) − Y(1) )² p^{t=1}(x) dx

► The bound is too loose when we have overlap + infinite samples

¹Hill, Journal of Computational and Graphical Statistics, 2011
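Since PEHE compares against the true CATE, it can only be computed when both potential outcomes are known, i.e., on (semi-)synthetic benchmarks. A minimal sketch of the empirical version:

```python
import numpy as np

def pehe(y0_true, y1_true, y0_hat, y1_hat):
    """Empirical PEHE: MSE of the estimated CATE against the true
    effect; computable only when both potential outcomes are known."""
    return np.mean(((y1_hat - y0_hat) - (y1_true - y0_true)) ** 2)
```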
► Our full architecture learns a representation Φ(x), a re-weighting w_t(x), and hypotheses h_t(Φ) to trade off between the re-weighted loss w·ℓ and the imbalance between re-weighted representations

[Architecture: context x → representation Φ(x) (DNN) → hypotheses h₀, h₁ and weighting w; the treatment t selects the head; the objective combines the weighted loss w·ℓ with the imbalance term IPM( w₀ p̂_Φ^{t=0}, w₁ p̂_Φ^{t=1} ).]
► Theorem 2* (representation learning):

ε_CATE ≤ 2 Σ_{t∈{0,1}} ε_F^{w,t}(Φ, h) + C_Φ · IPM_G( w₀ p̂_Φ^{t=0}, w₁ p̂_Φ^{t=1} )

effect risk ≤ re-weighted factual loss + imbalance of re-weighted representations

► Letting Φ(x) = x and taking w_t(x) to be inverse propensity weights, we recover a classic result
► Minimizing the weighted loss and the IPM converges to the representation and hypothesis that minimize the CATE error

*Extension to finite samples available
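A hedged sketch of a bound-inspired training objective combining the pieces above. The model follows the TARNet sketch's interface; alpha is an illustrative trade-off hyperparameter, and for simplicity the weights here enter only the loss term, whereas the full method also re-weights the IPM:

```python
import torch

def mmd2_torch(a, b, sigma=1.0):
    # Torch analogue of the numpy MMD sketch above.
    def k(u, v):
        return torch.exp(-torch.cdist(u, v) ** 2 / (2 * sigma ** 2))
    return k(a, a).mean() + k(b, b).mean() - 2 * k(a, b).mean()

def cfr_objective(model, x, t, y, weights, alpha=1.0):
    """Re-weighted factual loss plus an imbalance penalty between the
    treated and control representations, mirroring the bound's two terms."""
    rep, y0_hat, y1_hat = model(x)
    y_hat = torch.where(t.bool(), y1_hat, y0_hat)
    weighted_loss = (weights * (y_hat - y) ** 2).mean()
    imbalance = mmd2_torch(rep[t == 0], rep[t == 1])
    return weighted_loss + alpha * imbalance
```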
► No ground truth – similar to off-policy evaluation in reinforcement learning
► Requires either:
► Knowledge of the true outcome (synthetic)
► Knowledge of the treatment assignment policy (e.g., a randomized controlled trial)
► Our framework has proven effective in both settings
► The Infant Health and Development Program (IHDP)¹
► Studied the effects of home visits and other interventions
► Real covariates and treatment, synthesized outcome
► Overlap is not satisfied (by design)
► Used to evaluate MSE in CATE prediction
¹Hill, JCGS, 2011
Method                                   CATE MSE
BART¹                                    2.3 ± 0.1
Neural net                               2.0 ± 0.0
Shared rep.²                             1.0 ± 0.0
Shared rep. + invariance²                0.8 ± 0.0
Shared rep. + invariance + weighting³    0.7 ± 0.0
► BART, Bayesian Additive Regression Trees, is a state-of-the-art baseline
► Standard neural networks are competitive
► Shared representation learning with ERM halves the MSE on IHDP²
► Minimizing upper bounds on the risk, including the distributional-distance term, further reduces the MSE
¹Hill, JCGS, 2011; ²Shalit, Johansson, Sontag, ICML 2017; ³Johansson, Kallus, Shalit, Sontag, arXiv 2018
► ML is well understood when test data ≈ training data
► Learning individualized policies from observational data requires going beyond test ≈ train
► Fewer/worse guarantees when assumptions are violated
"Off-Policy Evaluation in Partially Observable Environments", Tennenholtz, Mannor, Shalit, AAAI 2020
[Figure: a doctor–patient loop – the doctor observes the patient's state, modifies treatment, observes the new state, and so on. Figure: Shweta Bhatt]
► From the causal inference perspective: how to intervene directly and experiment optimally in a dynamic environment
► From the RL perspective: how to deal with cases where we cannot intervene directly
► Here: evaluation of a simple policy such as "treat everyone",
i. in a dynamic environment with ongoing actions,
ii. while we possibly do not have access to the same data as the agent
► Example: treatment in the intensive care unit (ICU)
► RL is often applied to such data without considering hidden confounders or overlap (common support / positivity)
(see "Guidelines for Reinforcement Learning in Healthcare", Gottesman et al. 2019)
We model the setting as a Partially Observable Markov Decision Process (POMDP):

Symbol   Causal name                      RL name                        Example
u_t      confounder (possibly "hidden")   state (possibly "unobserved")  information available to the doctor
a_t      action, treatment                action                         medications, procedures…
r_t      outcome                          reward                         mortality
π_b      treatment assignment process     behavioral policy              the way doctors treat patients
z_t      proxy variable                   observation                    electronic health record
► Goal: estimate the policy value (discounted over a finite horizon)
► …of a targeted evaluation policy π_e(z_t), e.g. one that treats more aggressively,
► …from data observed under the behavioral policy π_b, with u_t unobserved
► In general: IMPOSSIBLE without further assumptions
Related: Miao, Geng & Tchetgen Tchetgen, "Identifying causal effects with proxy variables of an unmeasured confounder", Biometrika (2018):
► requires that matrices of proxy-given-confounder conditionals, [M(a)]_{ij} = p(z = i | u = j, a), be invertible for all a
► in particular, z must have at least as many categories as the discrete u
► (untestable from data)

► Our assumption: the matrices

[M_t(a)]_{jk} := P_{π_b}( z_t = j | z_{t−1} = k, a_t = a )

are invertible for all a and t
► Theorem: if all M_t(a) are invertible, then we can evaluate the value of a proposed policy π_e(z_t) given observational data gathered under π_b, without observing u_t
► Caveat: the assumptions are unverifiable from data
► The value of π_e is then written as a sum over observation–action sequences of the proposed-policy terms π_e(a_t | z_0, a_0, …, z_{t−1}, a_{t−1}, z_t), multiplied by matrices of conditionals observable under π_b and the inverses M_t(a)⁻¹
► Practical caveat: the estimate is sensitive to the condition number of the estimated inverse matrices
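Since the identification formula inverts matrices that are themselves estimated from data, a natural sanity check is their condition number. A toy numpy sketch with categorical observations (all data and names illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_cats = 10_000, 4
z_prev = rng.integers(n_cats, size=n)           # toy observation sequence
acts = rng.integers(2, size=n)                  # toy binary actions
z_now = (z_prev + acts + rng.integers(2, size=n)) % n_cats

def transition_matrix(z_now, z_prev, actions, a, n_cats):
    """Estimate M(a)[j, k] = P(z_t = j | z_{t-1} = k, a_t = a) from counts."""
    mask = actions == a
    counts = np.zeros((n_cats, n_cats))
    np.add.at(counts, (z_now[mask], z_prev[mask]), 1)
    return counts / np.maximum(counts.sum(axis=0, keepdims=True), 1)

for a in (0, 1):
    M = transition_matrix(z_now, z_prev, acts, a, n_cats)
    # A large condition number means M(a)^{-1} amplifies estimation noise,
    # making the downstream policy-value estimate unreliable.
    print(f"a={a}: cond(M) = {np.linalg.cond(M):.1f}")
```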
"Robust learning with the Hilbert-Schmidt independence criterion", Greenfeld & Shalit, arXiv:1910.00270
Setting: covariate shift
► Source distribution P_s(x, y); a family of target distributions P_t(x, y)
► For all targets P_t(x, y) in the family: P_t(y | x) = P_s(y | x)
► Learning P_s(y | x) on the source is easy; the challenge is that the distribution of instances changes
► Learn the structure of causal models by learning functions g such that y − g(x) is approximately independent of x
► Take kernels k(·,·) and l(·,·) on x and y respectively
► Denote (with some abuse of notation) K the n × n kernel matrix on x, L the n × n kernel matrix on y

HSIC(X, Y; k, l) = 1/(n−1)² · tr(KHLH)

where H is the centering matrix, H_ij = δ_ij − 1/n
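A direct numpy transcription of this estimator (RBF kernels; the bandwidths are illustrative choices):

```python
import numpy as np

def rbf_gram(v, sigma=1.0):
    d2 = (v[:, None] - v[None, :]) ** 2
    if d2.ndim > 2:                      # multivariate input: sum over dims
        d2 = d2.sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def hsic(x, y, sigma_x=1.0, sigma_y=1.0):
    """HSIC(X, Y) = tr(KHLH) / (n - 1)^2 with centering matrix H."""
    n = len(x)
    K, L = rbf_gram(x, sigma_x), rbf_gram(y, sigma_y)
    H = np.eye(n) - np.ones((n, n)) / n
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2
```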
► Standard learning: min_{h∈H} E[ ℓ(y, h(x)) ]
► Instead, minimize the dependence of the residual on the input: min_{h∈H} HSIC( x, y − h(x); k, l )
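A sketch of the resulting learning rule in PyTorch: fit h by making the residuals y − h(x) approximately independent of x (architecture and bandwidth are illustrative). Note HSIC is unchanged by adding a constant to h, so the intercept must be pinned down separately, e.g. by matching the mean residual on the source data:

```python
import torch
import torch.nn as nn

def hsic_torch(x, y, sigma=1.0):
    # Differentiable version of the HSIC estimator above (RBF kernels).
    n = x.shape[0]
    def gram(v):
        return torch.exp(-torch.cdist(v, v) ** 2 / (2 * sigma ** 2))
    H = torch.eye(n) - torch.ones(n, n) / n
    return torch.trace(gram(x) @ H @ gram(y) @ H) / (n - 1) ** 2

def fit_by_hsic(x, y, steps=500, lr=1e-2):
    # x: (n, d), y: (n, 1). Train h so the residual is independent of x.
    h = nn.Sequential(nn.Linear(x.shape[1], 32), nn.Tanh(), nn.Linear(32, 1))
    opt = torch.optim.Adam(h.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = hsic_torch(x, y - h(x), sigma=1.0)
        loss.backward()
        opt.step()
    return h   # identified only up to an additive constant
```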
► Guarantees hold when f_target − f_source is "nice" in the sense of low RKHS norm
[Figure: source and target accuracy (60%–100%) of a CNN and of MLPs (2 or 4 layers, 256–1024 units) under two training schemes: HSIC minimization vs. cross-entropy.]
Recap – learning from observational data rests on assumptions about the treatment assignment process:
► No unmeasured factors that affect both treatment and outcome
► The T = 1 and T = 0 populations should be similar (overlap)
► We must be able to approximate E[Y | x, T = t]
"You have condition A." Treatment options: T = 0, T = 1
► Often the decision is obvious – "obviously, give T = 0" or "obviously, give T = 1" – and there is no need for algorithmic decision support
► The interesting case is "I'm not so sure…" – there the algorithm recommends, say, T = 0
► In such cases, recommending a suboptimal action is not as risky
► Setting: a conscious, point-in-time decision by trained decision makers
► Focus on cases with explicit decision uncertainty
► sign(CATE) is more important than the exact number – the goal is to treat the patient correctly
► Where there is physician uncertainty, the model can contribute: it has seen more data regarding treatment alternatives for similar patients
► Case study: hospitalized acute heart failure patients with kidney injury at Rambam Medical Center
► Physicians have poor guidance on how to prescribe diuretics and blood-pressure medications to these patients
► Features: demographics, lab tests, diagnoses, medications, administrative data and more
► In the data, half had decreased diuretics
► From the estimated CATE(x) we can derive a policy recommendation for treatment, e.g. treat if CATE(x) > 0
► Evaluation: the expected outcome if patients were treated by policy π
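A minimal sketch of turning CATE estimates into recommendations and scoring policies by their estimated value. Per-arm outcome estimates y0_hat, y1_hat are assumed given by a fitted model; the rule follows the slide's CATE(x) > 0 (flip the inequality if smaller outcomes are better):

```python
import numpy as np

def cate_policy(cate_hat):
    """Recommend treatment exactly when the estimated effect is positive."""
    return (cate_hat > 0).astype(int)

def policy_value(policy_t, y0_hat, y1_hat):
    """Estimated expected outcome if every patient were treated according
    to the policy, plugging in the model's per-arm outcome estimates."""
    return float(np.mean(np.where(policy_t == 1, y1_hat, y0_hat)))

# Comparing baselines as in the figure below, e.g.:
#   treat_all  = policy_value(np.ones(len(y0_hat)), y0_hat, y1_hat)
#   treat_none = policy_value(np.zeros(len(y0_hat)), y0_hat, y1_hat)
#   cate_based = policy_value(cate_policy(y1_hat - y0_hat), y0_hat, y1_hat)
```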
[Figure: estimated expected outcome (scale 0%–40%) under five policies: increase everyone, decrease everyone, a random policy, the doctors' policy, and the estimated-CATE policy.]