

Slide 1

Outline: Introduction · Gaussian Process (GP) · Imprecise GP (IGP) · Constant mean IGP · Application · Conclusions

A prior near-ignorance Gaussian Process model for nonparametric regression

Francesca Mangili

francesca@idsia.ch Istituto “Dalle Molle” di Studi sull’Intelligenza Artificiale Lugano (Switzerland) http://www.ipg.idsia.ch/ ISIPTA 2015, Pescara

Slide 2

Introduction

Consider the regression model y = f(x) + v, where

◮ v = [v1, . . . , vn] := white Gaussian noise;
◮ x = [x1, . . . , xn] := vector of covariates;
◮ y = [y1, . . . , yn] := vector of observations;
◮ f(x) := unknown regression function.

Goals:

◮ make inferences about f(x);
◮ model prior near-ignorance about f(x).
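As a concrete illustration, this setup can be simulated in a few lines of Python; the particular regression function and noise level below are illustrative assumptions, not taken from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # Hypothetical "unknown" regression function, chosen only for illustration.
    return np.sin(2.0 * x)

n = 50
x = rng.uniform(-2.0, 2.0, size=n)   # covariates x = [x1, ..., xn]
v = rng.normal(0.0, 0.2, size=n)     # white Gaussian noise v = [v1, ..., vn]
y = f(x) + v                         # observations y = f(x) + v
```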

Slide 3

The Gaussian Process (GP)

f(x) ∼ GP(µ(x), k(x, x′))

µ(x) := mean function. Prior belief about the shape of f(x); usually set equal to 0.
k(x, x′) := covariance function. Example: the squared exponential kernel

k(x, x′) = σk² exp( −(x − x′)² / (2ℓ²) ),

(σk, ℓ) := hyperparameters.
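A minimal NumPy sketch of this kernel, vectorized over two input vectors:

```python
import numpy as np

def sq_exp_kernel(x, xp, sigma_k=1.0, ell=1.0):
    # k(x, x') = sigma_k^2 * exp(-(x - x')^2 / (2 * ell^2))
    x = np.asarray(x, dtype=float)
    xp = np.asarray(xp, dtype=float)
    d = x[:, None] - xp[None, :]      # matrix of pairwise differences
    return sigma_k**2 * np.exp(-0.5 * d**2 / ell**2)
```

Here σk controls the prior amplitude of f and ℓ the length scale over which f varies; on the diagonal (x = x′) the kernel equals σk².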

Slide 4

The Gaussian Process (GP)

Definition: A Gaussian process is a collection of random variables, any finite number of which have a joint Gaussian distribution.

[ f(x1) ; f(x2) ] ∼ N( [ µ(x1) ; µ(x2) ] , [ k(x1, x1)  k(x1, x2) ; k(x2, x1)  k(x2, x2) ] )

Short notation:

f(x) ∼ N( µ(x) , k(x, x) )
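The definition says that a GP restricted to any finite grid of inputs is just a multivariate normal, so prior samples of f can be drawn directly; this sketch assumes the squared exponential kernel and µ(x) = 0:

```python
import numpy as np

def kern(a, b, sigma_k=1.0, ell=1.0):
    # Squared exponential kernel, vectorized over input vectors a and b.
    d = np.asarray(a, float)[:, None] - np.asarray(b, float)[None, :]
    return sigma_k**2 * np.exp(-0.5 * d**2 / ell**2)

xs = np.linspace(-2.0, 2.0, 100)            # finite set of inputs
mu = np.zeros_like(xs)                      # mean function set to 0
K = kern(xs, xs) + 1e-10 * np.eye(xs.size)  # jitter keeps K positive definite
rng = np.random.default_rng(0)
samples = rng.multivariate_normal(mu, K, size=3)  # three prior draws of f over xs
```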

Slide 5

The Gaussian Process (GP)

Posterior. Observations: (x, y); generic covariate: x. Let Kn = k(x, x) + σν² I and kx = k(x, x). Then

[ y ; f(x) ] ∼ N( [ µ(x) ; µ(x) ] , [ Kn  kx ; kxᵀ  k(x, x) ] )

and so

f(x) | x, y ∼ GP( µ̂(x), k̂(x, x′) ), with

µ̂(x) = µ(x) + kxᵀ Kn⁻¹ (y − µ(x)),

k̂(x, x′) = kθ(x, x′) − kxᵀ Kn⁻¹ kx′.
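These posterior formulas translate directly into NumPy; a sketch for the zero-mean case (µ(x) = 0), assuming the squared exponential kernel:

```python
import numpy as np

def kern(a, b, sigma_k=1.0, ell=1.0):
    # Squared exponential kernel over input vectors a and b.
    d = np.asarray(a, float)[:, None] - np.asarray(b, float)[None, :]
    return sigma_k**2 * np.exp(-0.5 * d**2 / ell**2)

def gp_posterior(x_train, y, x_star, noise_var=1e-4):
    # Kn = k(x, x) + sigma_v^2 I;  kx = train/test cross-covariances.
    Kn = kern(x_train, x_train) + noise_var * np.eye(len(x_train))
    kx = kern(x_train, x_star)
    mean = kx.T @ np.linalg.solve(Kn, np.asarray(y, float))  # kx^T Kn^{-1} y
    cov = kern(x_star, x_star) - kx.T @ np.linalg.solve(Kn, kx)
    return mean, cov
```

With a tiny noise variance the posterior mean nearly interpolates the observations, and the posterior variance shrinks toward zero at the training inputs.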

Slide 6

The Imprecise Gaussian Process (IGP)

Definition: Given a base kernel k(x, x′), a function h(x) and a constant c > 0, we define the Imprecise Gaussian Process with base mean function h(x) (h-IGP) as the set

Gh = { GP( M h(x), k(x, x′) + kh(x, x′) ), M ≥ 0 }, with kh(x, x′) = ((M + 1)/c) h(x) h(x′).

If h(x) ≠ 0:

◮ a priori, the upper expectation of |f(x)| is +∞;
◮ the component kh increases with the mean, and thus

|prior mean of f(x)| / variance of f(x) = M |h(x)| / ( kθ(x, x) + ((M + 1)/c) h(x)² ) ≤ c/|h(x)| (bounded).
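A quick numerical check of this boundedness (the closed-form bound c/|h(x)| used below is my reconstruction of the slide's inequality):

```python
import numpy as np

# ratio(M) = M*|h| / (k(x,x) + (M+1)/c * h^2); it should stay below c/|h|.
k_xx, h, c = 1.0, 0.5, 2.0
Ms = np.array([0.0, 1.0, 10.0, 1e3, 1e6])
ratios = Ms * abs(h) / (k_xx + (Ms + 1.0) / c * h**2)
assert np.all(ratios <= c / abs(h))   # bounded for every M >= 0
assert np.all(np.diff(ratios) > 0)    # increasing toward the bound as M grows
```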

Slide 7

The H-IGP

We can generalize the h-IGP model by letting h(x) vary freely in a set of functions H.

Definition: We define the Imprecise Gaussian Process with set of base mean functions H (H-IGP) as the set of GPs GH = { Gh : h(x) ∈ H }.

Near-ignorance: If there exist both strictly positive and strictly negative values of h(x) for different h ∈ H, then

inf over M, h(x) of E[f(x)] = −∞,  sup over M, h(x) of E[f(x)] = +∞.

Learning: Any H-IGP such that h(x) is a nonzero vector for all h ∈ H can learn from the observations x, y.

Slide 8

The constant mean IGP (c-IGP)

Definition: We define the constant mean IGP (c-IGP) as the H-IGP with H = { h(x) = ±1 }. It satisfies

◮ prior near-ignorance about E[f(x)];
◮ learning.

Slide 9

The c-IGP

Posterior inferences:

◮ If |skᵀ y / Sk| ≤ 1 + c/Sk, the upper and lower posterior means are

E̅[f(x)], E̲[f(x)] = kxᵀ Kn⁻¹ y + (1 − kxᵀ sk) skᵀ y / Sk ± c |1 − kxᵀ sk| / Sk,

with sk = Kn⁻¹ 𝟙n and Sk = 𝟙nᵀ Kn⁻¹ 𝟙n.

◮ The parameter c determines the degree of imprecision of the model:

E̅[f(x)] − E̲[f(x)] = 2c |1 − kxᵀ sk| / Sk.
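Under these formulas, the lower and upper posterior means share a common center and differ by 2c|1 − kxᵀsk|/Sk. A hedged NumPy sketch of that computation (the function and variable names are mine, and the formulas follow my reconstruction of the slide):

```python
import numpy as np

def cigp_posterior_bounds(Kn, kx, y, c):
    # sk = Kn^{-1} 1_n,  Sk = 1_n^T Kn^{-1} 1_n
    ones = np.ones(len(y))
    sk = np.linalg.solve(Kn, ones)
    Sk = ones @ sk
    # Common center: kx^T Kn^{-1} y + (1 - kx^T sk) * sk^T y / Sk
    center = kx @ np.linalg.solve(Kn, y) + (1.0 - kx @ sk) * (sk @ y) / Sk
    half_width = c * abs(1.0 - kx @ sk) / Sk
    return center - half_width, center + half_width   # (lower, upper) means
```

The width upper − lower equals 2c|1 − kxᵀsk|/Sk, so the imprecision grows linearly in c, as stated above.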

Slide 10

Example

Estimates of E[f (x)] given n = 50 observations (x, y).

Slide 11

Application to hypothesis testing

Goal: Compare f1(x) and f2(x) given two independent samples (x⁽¹⁾, y⁽¹⁾) and (x⁽²⁾, y⁽²⁾).

Prior: fi ∼ GP( Mi hi, kθ(x, x′) + (Mi + 1)/c ), hi = ±1, Mi ≥ 0, i.e., a c-IGP for each fi.

Hypothesis: ∆µ(x) = E[f1(x) − f2(x)] = 0 in a region of interest XT.
Slide 12

Procedure

◮ Consider a vector x∗ of equispaced inputs in XT;
◮ Derive the credible region (CI) of ∆µ(x∗) from the chi-squared random variable

χ²s = [∆µ(x∗)]ᵀ (K̂∗∆)⁻¹ [∆µ(x∗)].

Prior near-ignorance: the lower value of χ²s is 0 and the upper value tends to +∞.

◮ If, a posteriori, 0 ∈ CI, conclude that f1 = f2.

Indecision: If different priors entail different decisions, a robust decision cannot be made in XT.
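The test statistic itself is a standard quadratic form; a minimal sketch, where K_delta stands for the posterior covariance K̂∗∆ of ∆µ(x∗) (the quantile threshold would come from chi-squared tables):

```python
import numpy as np

def chi2_statistic(dmu, K_delta):
    # chi2_s = dmu^T K_delta^{-1} dmu, with dmu = posterior mean of
    # delta_mu evaluated at the grid x*.
    dmu = np.asarray(dmu, dtype=float)
    return float(dmu @ np.linalg.solve(K_delta, dmu))

# 0 lies in the credible region when chi2_s <= the (1 - alpha) chi-squared
# quantile with len(dmu) degrees of freedom; the IGP repeats this check for
# every prior in the set and reports indecision when the answers disagree.
```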

Slide 13

Numerical example

Case A: xi⁽¹⁾ ∼ U[−2, 2], xi⁽²⁾ ∼ U[−2, 2].

[Figure: GP and c-IGP test results over XT = [−2, 0] and XT = [−2, 2].]

Slide 14

Numerical example

Case A: xi⁽¹⁾ ∼ U[−2, 2], xi⁽²⁾ ∼ U[−2, 2];
Case B: xi⁽¹⁾ ∼ U[−2, 0], xi⁽²⁾ ∼ U[−2, 4].

[Figure: GP and c-IGP test results over XT = [−2, 0] and XT = [−2, 2] for Cases A and B.]

Slide 15

Conclusions

◮ We have presented a general framework for modeling prior near-ignorance about f(x) based on the Gaussian process (IGP).

◮ We have derived an IGP model with a prior constant mean free to vary between −∞ and +∞:

⊲ with many observations, the IGP and GP inferences almost coincide;
⊲ where there are no observations, the imprecision of the IGP is very high, reflecting the actual lack of knowledge;
⊲ applied to hypothesis testing, the IGP acknowledges when the available data are not informative enough to make a robust decision.

◮ Future research should focus on:

⊲ the study of other prior near-ignorance models based on different sets H of base mean functions;
⊲ the development of models allowing for a weaker specification of the kernel function.