Introduction to Bayesian Statistics
Lecture 4: Multiparameter models (I)
Rung-Ching Tsai
Department of Mathematics, National Taiwan Normal University
March 18, 2015
Proper and improper prior distributions. Example: for $y_1, \ldots, y_n \sim \mathrm{normal}(\mu, \sigma^2)$ with $\sigma^2$ known, the uniform prior $p(\mu) \propto 1$ is improper (it does not integrate to a finite value), yet the resulting posterior distribution, $\mu \,|\, y \sim \mathrm{normal}(\bar{y}, \sigma^2/n)$, is proper.
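As a quick numerical illustration (my own, not from the slides), the sketch below integrates the unnormalized posterior under the flat prior for simulated data and confirms that its total mass is finite and equals the $\mathrm{normal}(\bar{y}, \sigma^2/n)$ normalizing constant.

```python
# Numerical check that an improper flat prior on mu still yields a proper
# posterior when sigma^2 is known: the unnormalized posterior
# exp(-n (mu - ybar)^2 / (2 sigma^2)) has finite mass.
import numpy as np
from scipy import integrate

rng = np.random.default_rng(1)
sigma2, n = 4.0, 25                      # known variance, sample size
y = rng.normal(2.0, np.sqrt(sigma2), size=n)
ybar = y.mean()

unnorm = lambda mu: np.exp(-n * (mu - ybar) ** 2 / (2 * sigma2))
mass, _ = integrate.quad(unnorm, -np.inf, np.inf)

# Finite, and equal to the normal(ybar, sigma2/n) normalizing constant.
print(mass, np.sqrt(2 * np.pi * sigma2 / n))
```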
Jeffreys' invariance principle: any rule for determining the prior density $p(\theta)$ should yield an equivalent result if applied to a transformed parameter $\phi = h(\theta)$, i.e., the prior obtained by the change of variables $p(\phi) = p(\theta)\,|d\theta/d\phi| = p(\theta)\,|h'(\theta)|^{-1}$ should match what the rule gives directly for $\phi$.

Jeffreys' prior: $p(\theta) \propto [J(\theta)]^{1/2}$, where $J(\theta)$ is the Fisher information for $\theta$:

$$J(\theta) = \mathrm{E}\!\left[\left(\frac{d\log p(y|\theta)}{d\theta}\right)^{\!2} \,\Big|\, \theta\right] = -\mathrm{E}\!\left[\frac{d^2\log p(y|\theta)}{d\theta^2} \,\Big|\, \theta\right].$$

To verify the invariance, evaluate $J(\phi)$ at $\theta = h^{-1}(\phi)$:

$$J(\phi) = -\mathrm{E}\!\left[\frac{d^2\log p(y|\phi)}{d\phi^2}\right] = -\mathrm{E}\!\left[\frac{d^2\log p(y\,|\,\theta = h^{-1}(\phi))}{d\theta^2}\right]\cdot\left|\frac{d\theta}{d\phi}\right|^{2} = J(\theta)\left|\frac{d\theta}{d\phi}\right|^{2}$$

(the cross term involving $\frac{d\log p}{d\theta}\frac{d^2\theta}{d\phi^2}$ has expectation zero); thus $J(\phi)^{1/2} = J(\theta)^{1/2}\,|d\theta/d\phi|$, so Jeffreys' rule applied to $\phi$ gives the same prior as transforming Jeffreys' prior for $\theta$.
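The invariance identity can be checked symbolically. Below is a minimal SymPy sketch (my own illustration), taking the binomial model of the next slide and the concrete transformation $\phi = h(\theta) = \log(\theta/(1-\theta))$; to sidestep square-root simplification it compares the squares of the two sides, which is equivalent since both are positive.

```python
# Symbolic check of J(phi)^(1/2) = J(theta)^(1/2) |dtheta/dphi| for the
# binomial model with phi = logit(theta).
import sympy as sp

phi = sp.Symbol('phi', real=True)
n, y = sp.Symbol('n', positive=True), sp.Symbol('y', nonnegative=True)

# Inverse transformation theta = h^{-1}(phi) for phi = log(theta/(1-theta)).
theta = sp.exp(phi) / (1 + sp.exp(phi))
dtheta_dphi = sp.diff(theta, phi)   # positive, so |dtheta/dphi| = dtheta/dphi

# Fisher information of phi computed directly from the transformed
# log-likelihood y*log(theta) + (n-y)*log(1-theta); its second derivative
# turns out to be free of y, so taking the expectation is trivial.
loglik = y * sp.log(theta) + (n - y) * sp.log(1 - theta)
J_phi = sp.simplify(-sp.diff(loglik, phi, 2))

# Fisher information of theta for the binomial model (next slide).
J_theta = n / (theta * (1 - theta))

# Compare the squares of the two sides of the invariance identity.
print(sp.simplify(J_phi - J_theta * dtheta_dphi**2))  # 0
```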
Example: Jeffreys' prior for the binomial model. For $y \,|\, \theta \sim \mathrm{Bin}(n, \theta)$,

$$\log p(y|\theta) = \text{constant} + y\log\theta + (n-y)\log(1-\theta).$$

The Fisher information is

$$J(\theta) = -\mathrm{E}\!\left[\frac{d^2\log p(y|\theta)}{d\theta^2} \,\Big|\, \theta\right] = \frac{n}{\theta(1-\theta)},$$

so Jeffreys' prior is $p(\theta) \propto \theta^{-1/2}(1-\theta)^{-1/2}$, i.e., $\theta \sim \mathrm{Beta}(\frac{1}{2}, \frac{1}{2})$.
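A brief numerical check (my own illustration, not from the slides): normalizing $[J(\theta)]^{1/2}$ over $(0,1)$ should recover exactly the $\mathrm{Beta}(\frac{1}{2}, \frac{1}{2})$ density, with the factor $\sqrt{n}$ cancelling in the normalization.

```python
# Confirm that normalizing sqrt(J(theta)), J(theta) = n / (theta (1 - theta)),
# recovers the Beta(1/2, 1/2) density.
import numpy as np
from scipy import integrate, stats

n = 10  # any n > 0: it cancels after normalization
sqrt_J = lambda t: np.sqrt(n / (t * (1 - t)))

Z, _ = integrate.quad(sqrt_J, 0, 1)  # integrable despite endpoint spikes
theta = np.linspace(0.01, 0.99, 5)
print(np.allclose(sqrt_J(theta) / Z, stats.beta(0.5, 0.5).pdf(theta)))  # True
```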
Multiparameter models: virtually every realistic problem involves more than one (often many) unknown parameters! Some of these are the target of inference (the parameters of interest), while others are regarded as nuisance parameters, for which we have no interest in making inferences but which are required in order to construct a realistic model. It is in dealing with nuisance parameters that the Bayesian approach reveals its principal advantage over other forms of inference.
The Bayesian strategy: obtain the joint posterior distribution of all unknowns, then integrate over the nuisance parameters to leave the marginal posterior distribution for the parameters of interest.

Equivalently, by simulation: draw samples from the joint posterior distribution (even this may be computationally difficult), look at the parameters of interest and ignore the rest.
Leading example: $y \,|\, \mu, \sigma^2 \sim \mathrm{normal}(\mu, \sigma^2)$, with both $\mu$ and $\sigma^2$ unknown; the mean $\mu$ is typically the parameter of interest and the variance $\sigma^2$ a nuisance parameter.
Notation: let $\theta = (\theta_1, \theta_2)$, where $\theta_1$ denotes the parameters of interest and $\theta_2$ the nuisance parameters, with prior $p(\theta) = p(\theta_1, \theta_2)$ and likelihood $p(y|\theta) = p(y|\theta_1, \theta_2)$. The joint posterior distribution is

$$p(\theta_1, \theta_2 \,|\, y) \propto p(\theta_1, \theta_2)\, p(y \,|\, \theta_1, \theta_2).$$
The marginal posterior distribution of $\theta_1$ is obtained by integrating the nuisance parameter out of the joint posterior:

$$p(\theta_1|y) = \int p(\theta_1, \theta_2 \,|\, y)\, d\theta_2 = \int p(\theta_1 \,|\, \theta_2, y)\, p(\theta_2 \,|\, y)\, d\theta_2. \qquad (1)$$

The marginal posterior $p(\theta_1|y)$ is thus a mixture of the conditional posterior distributions given the nuisance parameter $\theta_2$, $p(\theta_1|\theta_2, y)$, with the posterior $p(\theta_2|y)$ acting as a weighting function over the possible values of $\theta_2$ (representing, for example, different sub-models).
We rarely evaluate integral (1) explicitly, but it suggests an important strategy for constructing and computing with multiparameter models using simulation:

1. Draw $\theta_1^{(t)}$ from the conditional posterior distribution given the previously drawn value $\theta_2^{(t-1)}$, i.e., from $p(\theta_1 \,|\, \theta_2^{(t-1)}, y)$.
2. Draw $\theta_2^{(t)}$ from the conditional posterior distribution given the drawn value $\theta_1^{(t)}$, i.e., from $p(\theta_2 \,|\, \theta_1^{(t)}, y)$.

Iterating these two steps produces a sequence of draws $(\theta_1^{(t)}, \theta_2^{(t)})$ from the joint posterior $p(\theta_1, \theta_2 \,|\, y)$; the $\theta_1^{(t)}$ alone are then draws from the marginal posterior distribution of $\theta_1$.
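Below is a minimal runnable sketch of this alternating scheme (in effect a Gibbs sampler; my own illustration with simulated data), applied to the normal model developed on the following slides. The conditional $p(\mu \,|\, \sigma^2, y) = \mathrm{normal}(\bar{y}, \sigma^2/n)$ appears later in this lecture; the other conditional, $\sigma^2 \,|\, \mu, y \sim \text{Inv-}\chi^2(n, v(\mu))$ with $v(\mu) = \frac{1}{n}\sum_i (y_i-\mu)^2$, follows from the same joint posterior but is not derived on these slides.

```python
# Alternating conditional draws (Gibbs sampling) for the normal model with
# noninformative prior p(mu, sigma^2) proportional to 1/sigma^2; see the
# following slides for the model itself.
import numpy as np

rng = np.random.default_rng(2)
y = rng.normal(5.0, 2.0, size=30)        # simulated data
n, ybar = len(y), y.mean()

def draw_mu(sigma2):
    # Step 1: theta_1 given the previously drawn theta_2.
    return rng.normal(ybar, np.sqrt(sigma2 / n))

def draw_sigma2(mu):
    # Step 2: theta_2 given the drawn theta_1; a scaled Inv-chi^2(n, v(mu))
    # draw via n * v(mu) / chi^2_n.
    v = np.mean((y - mu) ** 2)
    return n * v / rng.chisquare(n)

mu, sigma2 = ybar, y.var(ddof=1)         # initialize at sample statistics
draws = []
for t in range(5000):
    mu = draw_mu(sigma2)
    sigma2 = draw_sigma2(mu)
    draws.append((mu, sigma2))

mus = np.array([m for m, _ in draws[500:]])   # discard warm-up iterations
print(mus.mean(), ybar)   # posterior mean of mu is close to ybar
```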
Example: estimating a normal mean with unknown variance. Suppose $y_1, \ldots, y_n \overset{\mathrm{iid}}{\sim} \mathrm{normal}(\mu, \sigma^2)$, with both $\mu$ and $\sigma^2$ unknown; we use the Bayesian approach to estimate $\mu$.

Noninformative prior: $p(\mu, \sigma^2) = p(\mu)\,p(\sigma^2) \propto 1 \cdot (\sigma^2)^{-1} = \sigma^{-2}$.

Likelihood:

$$p(y \,|\, \mu, \sigma^2) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{1}{2\sigma^2}(y_i-\mu)^2\right) \propto \sigma^{-n}\exp\!\left(-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i-\mu)^2\right).$$
For $y_1, \ldots, y_n \overset{\mathrm{iid}}{\sim} \mathrm{normal}(\mu, \sigma^2)$, the joint posterior distribution is

$$p(\mu, \sigma^2 \,|\, y) \propto p(\mu, \sigma^2)\,p(y \,|\, \mu, \sigma^2) \propto \sigma^{-n-2}\exp\!\left(-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i-\mu)^2\right)$$
$$= \sigma^{-n-2}\exp\!\left(-\frac{1}{2\sigma^2}\Big[\sum_{i=1}^{n}(y_i-\bar{y})^2 + n(\bar{y}-\mu)^2\Big]\right) = \sigma^{-n-2}\exp\!\left(-\frac{1}{2\sigma^2}\big[(n-1)s^2 + n(\bar{y}-\mu)^2\big]\right),$$

where $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(y_i-\bar{y})^2$ is the sample variance. The sufficient statistics are $\bar{y}$ and $s^2$.
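The decomposition $\sum_i (y_i-\mu)^2 = (n-1)s^2 + n(\bar{y}-\mu)^2$ used above is easy to check numerically; the snippet below (my own illustration) does so for a few arbitrary values of $\mu$.

```python
# Numerical check of the sum-of-squares decomposition behind the posterior.
import numpy as np

rng = np.random.default_rng(3)
y = rng.normal(size=12)
n, ybar, s2 = len(y), y.mean(), y.var(ddof=1)

for mu in (-1.0, 0.3, 2.5):
    lhs = np.sum((y - mu) ** 2)
    rhs = (n - 1) * s2 + n * (ybar - mu) ** 2
    print(np.isclose(lhs, rhs))  # True for every mu
```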
Conditional posterior distribution of $\mu$ given $\sigma^2$: with the noninformative prior $p(\mu) \propto 1$, we have $\mu \,|\, \sigma^2, y \sim \mathrm{normal}(\bar{y}, \sigma^2/n)$.
Marginal posterior distribution of $\sigma^2$: integrate $\mu$ out of the joint posterior,

$$p(\sigma^2 \,|\, y) \propto \int \sigma^{-n-2}\exp\!\left(-\frac{1}{2\sigma^2}\big[(n-1)s^2 + n(\bar{y}-\mu)^2\big]\right) d\mu;$$

that is, evaluating the simple normal integral

$$\int \exp\!\left(-\frac{n(\bar{y}-\mu)^2}{2\sigma^2}\right) d\mu = \sqrt{2\pi\sigma^2/n};$$

thus

$$p(\sigma^2 \,|\, y) \propto (\sigma^2)^{-(n+1)/2}\exp\!\left(-\frac{(n-1)s^2}{2\sigma^2}\right),$$

i.e., $\sigma^2 \,|\, y \sim \text{Inv-}\chi^2(n-1, s^2)$, a scaled inverse-$\chi^2$ distribution.
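Draws from a scaled inverse-$\chi^2$ are easy to generate: if $X \sim \chi^2_{n-1}$, then $(n-1)s^2/X \sim \text{Inv-}\chi^2(n-1, s^2)$. A minimal sketch (my own illustration with simulated data) checks the draws against the known mean $\nu s^2/(\nu-2)$ with $\nu = n-1$:

```python
# Sample sigma^2 | y ~ Inv-chi^2(n-1, s^2) via (n-1) s^2 / chi^2_{n-1} draws.
import numpy as np

rng = np.random.default_rng(4)
y = rng.normal(10.0, 3.0, size=20)
n, s2 = len(y), y.var(ddof=1)

sigma2_draws = (n - 1) * s2 / rng.chisquare(n - 1, size=200_000)

# Scaled Inv-chi^2(nu, s^2) has mean nu s^2 / (nu - 2); here nu = n - 1.
print(sigma2_draws.mean(), (n - 1) * s2 / (n - 3))
```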
Often the goal of a Bayesian analysis is the marginal posterior distribution of $\mu$. This can be obtained by integrating $\sigma^2$ out of the joint posterior, or by simulation: first draw $\sigma^2$ from $p(\sigma^2 \,|\, y)$, then draw $\mu$ from $p(\mu \,|\, \sigma^2, y)$.

The marginal posterior of $\mu$ is a mixture of normal distributions, mixed over the scaled inverse chi-squared distribution for the variance: a rare case where analytic results are available.
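A minimal sketch (my own illustration with simulated data) of this two-step simulation: draw $\sigma^2$ from its marginal posterior, then $\mu$ from its conditional posterior, and check the $\mu$ draws against the analytic $t_{n-1}(\bar{y}, s^2/n)$ marginal derived on the next slide.

```python
# Two-step simulation for the normal model: sigma^2 ~ Inv-chi^2(n-1, s^2),
# then mu ~ normal(ybar, sigma^2/n); the mu draws follow a t marginal.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
y = rng.normal(0.0, 1.5, size=15)
n, ybar, s2 = len(y), y.mean(), y.var(ddof=1)

sigma2 = (n - 1) * s2 / rng.chisquare(n - 1, size=100_000)
mu = rng.normal(ybar, np.sqrt(sigma2 / n))   # one mu per sigma^2 draw

# Compare the empirical CDF of the mu draws with t_{n-1}(ybar, s^2/n).
grid = np.linspace(ybar - 3, ybar + 3, 7)
t_cdf = stats.t(df=n - 1, loc=ybar, scale=np.sqrt(s2 / n)).cdf(grid)
sim_cdf = (mu[:, None] <= grid).mean(axis=0)
print(np.abs(sim_cdf - t_cdf).max())  # small, of simulation-error size
```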
Analytically,

$$p(\mu \,|\, y) = \int_0^\infty p(\mu, \sigma^2 \,|\, y)\, d\sigma^2.$$

Substituting $z = \frac{A}{2\sigma^2}$, where $A = (n-1)s^2 + n(\mu-\bar{y})^2$, the result is an unnormalized gamma integral:

$$p(\mu \,|\, y) \propto A^{-n/2}\int_0^\infty z^{(n-2)/2}\exp(-z)\, dz \propto \big[(n-1)s^2 + n(\mu-\bar{y})^2\big]^{-n/2} \propto \left[1 + \frac{n(\mu-\bar{y})^2}{(n-1)s^2}\right]^{-n/2},$$

i.e., $\mu \,|\, y \sim t_{n-1}(\bar{y}, s^2/n)$.
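As a check (my own, using scipy.stats), the kernel $\big[1 + n(\mu-\bar{y})^2/((n-1)s^2)\big]^{-n/2}$ is proportional to the $t_{n-1}$ density with location $\bar{y}$ and scale $\sqrt{s^2/n}$:

```python
# Confirm the derived kernel matches the t_{n-1}(ybar, s^2/n) density shape.
import numpy as np
from scipy import stats

n, ybar, s2 = 12, 1.7, 0.9   # arbitrary sufficient statistics
mu = np.linspace(ybar - 2, ybar + 2, 9)

kernel = (1 + n * (mu - ybar) ** 2 / ((n - 1) * s2)) ** (-n / 2)
pdf = stats.t(df=n - 1, loc=ybar, scale=np.sqrt(s2 / n)).pdf(mu)

ratio = pdf / kernel
print(np.allclose(ratio, ratio[0]))  # True: constant ratio, same shape
```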
Bayesian versus frequentist interpretations:

$$\frac{(n-1)s^2}{\sigma^2}\,\Big|\, y \sim \chi^2_{n-1} \quad \text{vs.} \quad \frac{(n-1)s^2}{\sigma^2}\,\Big|\, \mu, \sigma^2 \sim \chi^2_{n-1},$$

and, under the noninformative prior $p(\mu, \sigma^2) \propto (\sigma^2)^{-1}$,

$$\frac{\mu - \bar{y}}{s/\sqrt{n}}\,\Big|\, y \sim t_{n-1} \quad \text{vs.} \quad \frac{\bar{y} - \mu}{s/\sqrt{n}}\,\Big|\, \mu, \sigma^2 \sim t_{n-1},$$

where the ratio $\frac{\bar{y}-\mu}{s/\sqrt{n}}$ is called a pivotal quantity: its sampling distribution does not depend on the nuisance parameter $\sigma^2$.