

Slide 1

Outline: Introduction · Gaussian Process (GP) · Imprecise GP (IGP) · Constant mean IGP · Application · Conclusions

A prior near-ignorance Gaussian Process model for nonparametric regression

Francesca Mangili

francesca@idsia.ch Istituto “Dalle Molle” di Studi sull’Intelligenza Artificiale Lugano (Switzerland) http://www.ipg.idsia.ch/ ISIPTA 2015, Pescara

Slide 2

Introduction

Consider the regression model y = f(x) + v, where

◮ v = [v1, . . . , vn] := white Gaussian noise;
◮ x = [x1, . . . , xn] := vector of covariates;
◮ y = [y1, . . . , yn] := vector of observations;
◮ f(x) := unknown regression function.

Goals:

◮ make inferences about f(x);
◮ model prior near-ignorance about f(x).
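As a concrete illustration, this setup can be simulated in a few lines of Python; the particular regression function and noise level below are illustrative assumptions, not taken from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # Hypothetical "unknown" regression function, chosen only for illustration.
    return np.sin(2.0 * x)

n = 50
x = rng.uniform(-2.0, 2.0, size=n)   # covariates x = [x1, ..., xn]
v = rng.normal(0.0, 0.2, size=n)     # white Gaussian noise v = [v1, ..., vn]
y = f(x) + v                         # observations y = f(x) + v
```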

Slide 3

The Gaussian Process (GP)

f(x) ∼ GP(µ(x), k(x, x′))

µ(x) := mean function. Prior belief about the shape of f(x); usually set equal to 0.
k(x, x′) := covariance function. Example: the squared exponential kernel

k(x, x′) = σk² exp( −(x − x′)² / (2ℓ²) ),

(σk, ℓ) := hyperparameters.
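A minimal NumPy sketch of this kernel, vectorized over two input vectors:

```python
import numpy as np

def sq_exp_kernel(x, xp, sigma_k=1.0, ell=1.0):
    # k(x, x') = sigma_k^2 * exp(-(x - x')^2 / (2 * ell^2))
    x = np.asarray(x, dtype=float)
    xp = np.asarray(xp, dtype=float)
    d = x[:, None] - xp[None, :]      # matrix of pairwise differences
    return sigma_k**2 * np.exp(-0.5 * d**2 / ell**2)
```

Here σk controls the prior amplitude of f and ℓ the length scale over which f varies; on the diagonal (x = x′) the kernel equals σk².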

Slide 4

The Gaussian Process (GP)

Definition: A Gaussian process is a collection of random variables, any finite number of which have a joint Gaussian distribution.

[ f(x1) ; f(x2) ] ∼ N( [ µ(x1) ; µ(x2) ] , [ k(x1, x1)  k(x1, x2) ; k(x2, x1)  k(x2, x2) ] )

Short notation:

f(x) ∼ N( µ(x) , k(x, x) )
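The definition says that a GP restricted to any finite grid of inputs is just a multivariate normal, so prior samples of f can be drawn directly; this sketch assumes the squared exponential kernel and µ(x) = 0:

```python
import numpy as np

def kern(a, b, sigma_k=1.0, ell=1.0):
    # Squared exponential kernel, vectorized over input vectors a and b.
    d = np.asarray(a, float)[:, None] - np.asarray(b, float)[None, :]
    return sigma_k**2 * np.exp(-0.5 * d**2 / ell**2)

xs = np.linspace(-2.0, 2.0, 100)            # finite set of inputs
mu = np.zeros_like(xs)                      # mean function set to 0
K = kern(xs, xs) + 1e-10 * np.eye(xs.size)  # jitter keeps K positive definite
rng = np.random.default_rng(0)
samples = rng.multivariate_normal(mu, K, size=3)  # three prior draws of f over xs
```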

Slide 5

The Gaussian Process (GP)

Posterior. Observations: (x, y); generic covariate: x. Let Kn = k(x, x) + σν² I and kx = k(x, x). Then

[ y ; f(x) ] ∼ N( [ µ(x) ; µ(x) ] , [ Kn  kx ; kxᵀ  k(x, x) ] )

and so

f(x) | x, y ∼ GP( µ̂(x), k̂(x, x′) ), with

µ̂(x) = µ(x) + kxᵀ Kn⁻¹ (y − µ(x)),

k̂(x, x′) = kθ(x, x′) − kxᵀ Kn⁻¹ kx′.
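These posterior formulas translate directly into NumPy; a sketch for the zero-mean case (µ(x) = 0), assuming the squared exponential kernel:

```python
import numpy as np

def kern(a, b, sigma_k=1.0, ell=1.0):
    # Squared exponential kernel over input vectors a and b.
    d = np.asarray(a, float)[:, None] - np.asarray(b, float)[None, :]
    return sigma_k**2 * np.exp(-0.5 * d**2 / ell**2)

def gp_posterior(x_train, y, x_star, noise_var=1e-4):
    # Kn = k(x, x) + sigma_v^2 I;  kx = train/test cross-covariances.
    Kn = kern(x_train, x_train) + noise_var * np.eye(len(x_train))
    kx = kern(x_train, x_star)
    mean = kx.T @ np.linalg.solve(Kn, np.asarray(y, float))  # kx^T Kn^{-1} y
    cov = kern(x_star, x_star) - kx.T @ np.linalg.solve(Kn, kx)
    return mean, cov
```

With a tiny noise variance the posterior mean nearly interpolates the observations, and the posterior variance shrinks toward zero at the training inputs.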

Slide 6

The Imprecise Gaussian Process (IGP)

Definition: Given a base kernel k(x, x′), a function h(x) and a constant c > 0, we define the Imprecise Gaussian Process with base mean function h(x) (h-IGP) as the set

Gh = { GP( M h(x), k(x, x′) + kh(x, x′) ), M ≥ 0 }, with kh(x, x′) = ((M + 1)/c) h(x) h(x′).

If h(x) ≠ 0:

◮ a priori, the upper expectation of |f(x)| is +∞;
◮ the component kh increases with the mean, and thus

|prior mean of f(x)| / variance of f(x) = M |h(x)| / ( kθ(x, x) + ((M + 1)/c) h(x)² ) ≤ c/|h(x)| (bounded).
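A quick numerical check of this boundedness (the closed-form bound c/|h(x)| used below is my reconstruction of the slide's inequality):

```python
import numpy as np

# ratio(M) = M*|h| / (k(x,x) + (M+1)/c * h^2); it should stay below c/|h|.
k_xx, h, c = 1.0, 0.5, 2.0
Ms = np.array([0.0, 1.0, 10.0, 1e3, 1e6])
ratios = Ms * abs(h) / (k_xx + (Ms + 1.0) / c * h**2)
assert np.all(ratios <= c / abs(h))   # bounded for every M >= 0
assert np.all(np.diff(ratios) > 0)    # increasing toward the bound as M grows
```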

Slide 7

The H-IGP

We can generalize the h-IGP model by letting h(x) vary freely in a set of functions H.

Definition: We define the Imprecise Gaussian Process with set of base mean functions H (H-IGP) as the set of GPs GH = { Gh : h(x) ∈ H }.

Near-ignorance: If there exist both strictly positive and strictly negative values of h(x) for different h ∈ H, then

inf over M, h(x) of E[f(x)] = −∞,  sup over M, h(x) of E[f(x)] = +∞.

Learning: Any H-IGP such that h(x) is a nonzero vector for all h ∈ H can learn from the observations x, y.

Slide 8

The constant mean IGP (c-IGP)

Definition: We define the constant mean IGP (c-IGP) as the H-IGP with H = { h(x) = ±1 }. It satisfies

◮ prior near-ignorance about E[f(x)];
◮ learning.

Slide 9

The c-IGP

Posterior inferences:

◮ If |skᵀ y / Sk| ≤ 1 + c/Sk, the upper and lower posterior means are

E̅[f(x)], E̲[f(x)] = kxᵀ Kn⁻¹ y + (1 − kxᵀ sk) skᵀ y / Sk ± c |1 − kxᵀ sk| / Sk,

with sk = Kn⁻¹ 𝟙n and Sk = 𝟙nᵀ Kn⁻¹ 𝟙n.

◮ The parameter c determines the degree of imprecision of the model:

E̅[f(x)] − E̲[f(x)] = 2c |1 − kxᵀ sk| / Sk.
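Under these formulas, the lower and upper posterior means share a common center and differ by 2c|1 − kxᵀsk|/Sk. A hedged NumPy sketch of that computation (the function and variable names are mine, and the formulas follow my reconstruction of the slide):

```python
import numpy as np

def cigp_posterior_bounds(Kn, kx, y, c):
    # sk = Kn^{-1} 1_n,  Sk = 1_n^T Kn^{-1} 1_n
    ones = np.ones(len(y))
    sk = np.linalg.solve(Kn, ones)
    Sk = ones @ sk
    # Common center: kx^T Kn^{-1} y + (1 - kx^T sk) * sk^T y / Sk
    center = kx @ np.linalg.solve(Kn, y) + (1.0 - kx @ sk) * (sk @ y) / Sk
    half_width = c * abs(1.0 - kx @ sk) / Sk
    return center - half_width, center + half_width   # (lower, upper) means
```

The width upper − lower equals 2c|1 − kxᵀsk|/Sk, so the imprecision grows linearly in c, as stated above.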

Slide 10

Example

Estimates of E[f (x)] given n = 50 observations (x, y).

Slide 11

Application to hypothesis testing

Goal: Compare f1(x) and f2(x) given two independent samples (x⁽¹⁾, y⁽¹⁾) and (x⁽²⁾, y⁽²⁾).

Prior: fi ∼ GP( Mi hi, kθ(x, x′) + (Mi + 1)/c ), hi = ±1, Mi ≥ 0, i.e., a c-IGP for each fi.

Hypothesis: ∆µ(x) = E[f1(x) − f2(x)] = 0 in a region of interest XT.
Slide 12

Procedure

◮ Consider a vector x∗ of equispaced inputs in XT;
◮ Derive the credible region (CI) of ∆µ(x∗) from the chi-squared random variable

χ²s = [∆µ(x∗)]ᵀ (K̂∗∆)⁻¹ [∆µ(x∗)].

Prior near-ignorance: the lower value of χ²s is 0 and the upper value tends to +∞.

◮ If, a posteriori, 0 ∈ CI, conclude that f1 = f2.

Indecision: If different priors entail different decisions, a robust decision cannot be made in XT.
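The test statistic itself is a standard quadratic form; a minimal sketch, where K_delta stands for the posterior covariance K̂∗∆ of ∆µ(x∗) (the quantile threshold would come from chi-squared tables):

```python
import numpy as np

def chi2_statistic(dmu, K_delta):
    # chi2_s = dmu^T K_delta^{-1} dmu, with dmu = posterior mean of
    # delta_mu evaluated at the grid x*.
    dmu = np.asarray(dmu, dtype=float)
    return float(dmu @ np.linalg.solve(K_delta, dmu))

# 0 lies in the credible region when chi2_s <= the (1 - alpha) chi-squared
# quantile with len(dmu) degrees of freedom; the IGP repeats this check for
# every prior in the set and reports indecision when the answers disagree.
```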

Slide 13

Numerical example

Case A: xi⁽¹⁾ ∼ U[−2, 2], xi⁽²⁾ ∼ U[−2, 2].

[Figure: GP and c-IGP test results over XT = [−2, 0] and XT = [−2, 2].]

Slide 14

Numerical example

Case A: xi⁽¹⁾ ∼ U[−2, 2], xi⁽²⁾ ∼ U[−2, 2];
Case B: xi⁽¹⁾ ∼ U[−2, 0], xi⁽²⁾ ∼ U[−2, 4].

[Figure: GP and c-IGP test results over XT = [−2, 0] and XT = [−2, 2] for Cases A and B.]

Slide 15

Conclusions

◮ We have presented a general framework for modeling prior near-ignorance about f(x) based on the Gaussian process (IGP).

◮ We have derived an IGP model with a prior constant mean free to vary between −∞ and +∞:

⊲ with many observations, the IGP and GP inferences almost coincide;
⊲ where there are no observations, the imprecision of the IGP is very high, reflecting the actual lack of knowledge;
⊲ applied to hypothesis testing, the IGP acknowledges when the available data are not informative enough to make a robust decision.

◮ Future research should focus on:

⊲ the study of other prior near-ignorance models based on different sets H of base mean functions;
⊲ the development of models allowing for a weaker specification of the kernel function.