

SLIDE 1

Bayesian Regression with Input Noise for High Dimensional Data

Jo-Anne Ting¹, Aaron D'Souza², Stefan Schaal¹

¹University of Southern California, ²Google, Inc.

June 26, 2006

SLIDE 2

ICML 2006

Agenda

• Relevance of high dimensional regression with input noise
• Introduction to Bayesian parameter estimation
  – EM-based Joint Factor Analysis
  – Automatic feature detection
  – Making predictions with noiseless query points
• Evaluation on a 100-dimensional synthetic dataset
• Application to Rigid Body Dynamics parameter identification
  – What are RBD parameters?
  – Formulate it as a linear regression problem
  – How to ensure physically consistent parameters?

SLIDE 3

We are interested in parameter estimation. Traditional regression techniques ignore noise in the input data; for example, linear regression* assumes the inputs are observed without error.

[Figure: with noiseless inputs, linear regression gives an unbiased solution; with noisy inputs, the solution is biased.]

* Solutions to linear problems can be easily extended to nonlinear systems via locally weighted methods (e.g. Atkeson et al. 1997)
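The bias from noisy inputs (classical "regression attenuation") can be seen in a few lines of simulation. This sketch is not from the talk; the variances and sample size are illustrative:

```python
# Sketch: input noise biases ordinary least squares toward zero.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
b_true = 2.0

t = rng.normal(0.0, 1.0, n)                   # noiseless inputs
y = b_true * t + rng.normal(0.0, 0.1, n)      # outputs with mild output noise

def ols_slope(x, y):
    # Least-squares slope for zero-mean data (no intercept needed).
    return np.dot(x, y) / np.dot(x, x)

b_clean = ols_slope(t, y)              # ~2.0: unbiased with noiseless inputs
x_noisy = t + rng.normal(0.0, 1.0, n)  # same inputs, observed with unit noise
b_noisy = ols_slope(x_noisy, y)        # ~1.0: shrunk by var_t/(var_t+var_x)

print(b_clean, b_noisy)
```

With unit signal and unit input-noise variance, the slope is attenuated by a factor of two, exactly the bias that errors-in-variables methods must undo.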

SLIDE 4

…and Prediction With Noiseless Query Points

For physical systems such as humanoid robots:

– Input data are noisy, with a large number of input dimensions, not all of which are relevant

We want to control these robots using model-based controllers.

[Figure: training phase (noisy data) vs. testing phase, where t is the desired (noiseless) target.]

SLIDE 5

Current Methods Are Unsuitable

• Suitable for high dimensional data, but ignores input noise:
  – LASSO & stepwise regression (Tibshirani 1996; Draper & Smith 1981)
• Accounts for input noise, but unsuitable for high dimensional data:
  – Total LS / orthogonal LS (e.g. Golub & Van Loan 1998; Hollerbach & Wampler 1996)
  – Joint Factor Analysis (JFA) (Massey 1965): computationally prohibitive in high dimensions
• Ignores input noise and unsuitable for high dimensional data:
  – OLS with robust matrix inversion (e.g. Belsley et al. 1980): O(d²) at best
• Suitable for high dimensional data AND accounts for input noise: ???

SLIDE 6

Agenda

• Relevance of high dimensional regression with input noise
• Introduction to Bayesian parameter estimation
  – EM-based Joint Factor Analysis
  – Automatic feature detection
  – Making predictions with noiseless query points
• Evaluation on a 100-dimensional synthetic dataset
• Application to Rigid Body Dynamics parameter identification
  – What are RBD parameters?
  – Formulate it as a linear regression problem
  – How to ensure physically consistent parameters?

SLIDE 7

Computationally Prohibitive? Not Any More!

• Joint Factor Analysis (JFA):

  y_i = \sum_{m=1}^{d} w_{zm} t_{im} + \epsilon_y

  x_{im} = w_{xm} t_{im} + \epsilon_{xm}

• Introduce hidden variables z_{im} (D'Souza et al. 2004):

  z_{im} = w_{zm} t_{im} + \epsilon_{zm}

  y_i = \sum_{m=1}^{d} z_{im} + \epsilon_y

• EM-based JFA: all EM update equations are O(d)
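The generative structure above can be sketched as a sampler. This is a minimal sketch, assuming scalar hidden factors per input dimension and illustrative noise variances (names and values are not from the talk):

```python
# Sketch of the hidden-variable factor model:
#   z_im = w_zm * t_im + eps_zm,   y_i = sum_m z_im + eps_y,
#   x_im = w_xm * t_im + eps_xm  (noisy observed inputs).
import numpy as np

rng = np.random.default_rng(1)
n, d = 500, 10
w_x = rng.normal(size=d)               # w_xm: input weights (illustrative)
w_z = rng.normal(size=d)               # w_zm: output weights (illustrative)
psi_z, psi_x, psi_y = 0.01, 0.1, 0.1   # noise variances (illustrative)

T = rng.normal(size=(n, d))                           # hidden factors t_im
Z = w_z * T + rng.normal(0, np.sqrt(psi_z), (n, d))   # hidden variables z_im
X = w_x * T + rng.normal(0, np.sqrt(psi_x), (n, d))   # noisy observed inputs
y = Z.sum(axis=1) + rng.normal(0, np.sqrt(psi_y), n)  # scalar outputs

print(X.shape, y.shape)
```

Each z_{im} decouples the output sum across dimensions, which is what lets every EM update touch each dimension independently, giving the O(d) cost per iteration.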

SLIDE 8

…but Remember the Important Parameters

From the JFA model:

  (1)  y_i - \epsilon_y = \sum_{m=1}^{d} w_{zm} t_{im}

  (2)  x_{im} - \epsilon_{xm} = w_{xm} t_{im}

Divide (1) by (2), i.e. substitute t_{im} = (x_{im} - \epsilon_{xm}) / w_{xm}, to get:

  y_i = \sum_{m=1}^{d} \frac{w_{zm}}{w_{xm}} (x_{im} - \epsilon_{xm}) + \epsilon_y

This is the solution to the regression problem y = b^T x, with b_m = w_{zm}/w_{xm}, which is what we need for prediction.
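The identity b_m = w_{zm}/w_{xm} can be checked numerically. A minimal sketch in the noiseless case (weights are illustrative; w_{xm} is kept away from zero so the ratio is well defined):

```python
# Sketch: with eps = 0, y = b^T x holds exactly for b_m = w_zm / w_xm.
import numpy as np

rng = np.random.default_rng(2)
n, d = 2000, 5
w_x = rng.uniform(0.5, 2.0, d)   # bounded away from zero (assumption)
w_z = rng.normal(size=d)

T = rng.normal(size=(n, d))
X = w_x * T                      # noiseless inputs: x_im = w_xm * t_im
y = (w_z * T).sum(axis=1)        # noiseless outputs: y_i = sum_m w_zm * t_im

b = w_z / w_x                    # regression vector from the two weight sets
assert np.allclose(X @ b, y)     # y = b^T x recovered exactly
```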

SLIDE 9

Next, We Add Automatic Feature Detection

Priors:

  p(\alpha_m) = Gamma(a_m, b_m)

  p(w_{zm} | \alpha_m) = Normal(0, 1/\alpha_m)

  p(w_{xm} | \alpha_m) = Normal(0, 1/\alpha_m)

• Coupled regularization of regression parameters: w_{zm} and w_{xm} share the precision \alpha_m
• Still O(d) per EM iteration
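The coupling can be sketched by sampling from these priors: a single Gamma-distributed precision per dimension governs both weight branches, so dimensions with large precision shrink in both at once. Hyperparameter values below are illustrative, not the paper's:

```python
# Sketch of the coupled ARD-style prior: one precision alpha_m per
# dimension, shared by w_zm and w_xm (a_m, b_m are illustrative).
import numpy as np

rng = np.random.default_rng(3)
d = 10_000
a_m, b_m = 2.0, 1.0

alpha = rng.gamma(shape=a_m, scale=1.0 / b_m, size=d)  # Gamma(a_m, rate b_m)
w_z = rng.normal(0.0, 1.0 / np.sqrt(alpha))            # N(0, 1/alpha_m)
w_x = rng.normal(0.0, 1.0 / np.sqrt(alpha))            # same alpha_m: coupled

# High-precision dimensions are pruned in *both* branches at once:
pruned = alpha > np.median(alpha)
print(np.abs(w_z[pruned]).mean() < np.abs(w_z[~pruned]).mean())  # True
```

Because b_m = w_{zm}/w_{xm}, shrinking the two branches jointly is what drives an entire irrelevant input dimension out of the regression.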

SLIDE 10

Making Predictions with Noiseless Query Points

For a noisy test input x_q and its unknown output y_q:

  p(y_q | x_q) = \int \int p(y_q, Z, T | x_q) \, dZ \, dT

  \langle y_q | x_q \rangle = \hat{b}_{noise}^T x_q

For a noiseless test input t_q and its unknown output y_q, the corresponding regression vector is \hat{b}_{true}.

Given \hat{b}_{noise}, we can infer \hat{b}_{true}: both have closed-form expressions in terms of the model parameters, with C = \psi_y 1 1^T + \Sigma_z, and they coincide in the limit of vanishing input noise:

  \lim_{\Sigma_x \to 0} \hat{b}_{noise} = \hat{b}_{true}
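The intuition behind this noiseless-input limit is easiest to see in one dimension, where the noisy-input estimator is the true slope shrunk by an attenuation factor. This is a 1-D illustration, not the paper's full multivariate formula:

```python
# Illustration (1-D errors-in-variables): the noisy-input slope is
# b_true * var_t / (var_t + var_x), and tends to b_true as var_x -> 0.
b_true, var_t = 2.0, 1.0

def b_noise(var_x):
    # Attenuated slope for input-noise variance var_x.
    return b_true * var_t / (var_t + var_x)

print(b_noise(1.0))    # heavy input noise: strong attenuation
print(b_noise(1e-9))   # noiseless limit: b_noise -> b_true
```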

SLIDE 11

Agenda

• Relevance of high dimensional regression with input noise
• Introduction to Bayesian parameter estimation
  – EM-based Joint Factor Analysis
  – Automatic feature detection
  – Making predictions with noiseless query points
• Evaluation on a 100-dimensional synthetic dataset
• Application to Rigid Body Dynamics parameter identification
  – What are RBD parameters?
  – Formulate it as a linear regression problem
  – How to ensure physically consistent parameters?

SLIDE 12

Construction of 100-dimensional dataset

• Constructed data with:
  – 10 relevant dimensions
  – 90 redundant and/or irrelevant dimensions
• Explored different combinations of redundant (r) and irrelevant (u) dimensions:
  – r = 90, u = 0: 90 redundant dimensions
  – r = 0, u = 90: 90 irrelevant dimensions
  – r = 30, u = 60
  – r = 60, u = 30
• Tested on strongly noisy (SNR = 2) and less noisy (SNR = 5) data
• Predicted outputs with noiseless test inputs
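A dataset in this spirit can be sketched as follows. The construction details (sample size, how redundancy is generated, how SNR is applied) are assumptions for illustration, not the talk's exact recipe:

```python
# Sketch: 10 relevant dimensions, r redundant (linear combinations of the
# relevant ones) and u irrelevant (pure noise), with r + u = 90, plus
# Gaussian input noise scaled to a chosen SNR.
import numpy as np

def make_dataset(n=1000, r=30, u=60, snr=2.0, seed=0):
    rng = np.random.default_rng(seed)
    T = rng.normal(size=(n, 10))          # 10 relevant dimensions
    b = rng.normal(size=10)
    y = T @ b                             # outputs depend only on T
    R = T @ rng.normal(size=(10, r))      # redundant: mixes of relevant dims
    U = rng.normal(size=(n, u))           # irrelevant: independent noise
    X = np.hstack([T, R, U])              # 100-dimensional inputs
    noise_sd = X.std(axis=0) / snr        # per-dimension noise at given SNR
    return X + rng.normal(0, noise_sd, X.shape), y

X, y = make_dataset()
print(X.shape)  # (1000, 100)
```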

SLIDE 13

10-70% Improvement for Strongly Noisy Data (SNR=2)

Bayesian parameter estimation generalizes 10-70% better for strongly noisy data

SLIDE 14

7-50% Improvement on Less Noisy Data (SNR=5)

Bayesian parameter estimation generalizes 7-50% better for less noisy data

SLIDE 15

Agenda

• Relevance of high dimensional regression with input noise
• Introduction to Bayesian parameter estimation
  – EM-based Joint Factor Analysis
  – Automatic feature detection
  – Making predictions with noiseless query points
• Evaluation on a 100-dimensional synthetic dataset
• Application to Rigid Body Dynamics parameter identification
  – What are RBD parameters?
  – Formulate it as a linear regression problem
  – How to ensure physically consistent parameters?

SLIDE 16

What are Rigid Body Dynamics (RBD) Parameters?

Using the Newton-Euler equations for a rigid body, we get the RBD equation (where q are the joint angles):

  \tau = M(q) \ddot{q} + C(q, \dot{q}) \dot{q} + G(q)

where M(q) is the mass matrix, C(q, \dot{q}) contains the centripetal and Coriolis terms, and G(q) is the vector of gravity terms. M, C and G are functions of the mass, centre of mass and moments of inertia, all of which are unknown; the q's and \tau are known.

We can re-express the above linearly as:

  \tau = Y(q, \dot{q}, \ddot{q}) \theta
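The linear form makes identification a least-squares problem. A minimal sketch for a toy 1-link pendulum, where \tau = a \ddot{q} + b cos(q), so Y = [\ddot{q}, cos(q)] and \theta = [a, b] (the system and parameter values are illustrative, not from the talk):

```python
# Sketch: RBD parameter identification as linear regression, tau = Y theta,
# for a toy 1-link pendulum.
import numpy as np

rng = np.random.default_rng(4)
n = 200
q = rng.uniform(-np.pi, np.pi, n)       # joint angles
qdd = rng.normal(size=n)                # joint accelerations
theta_true = np.array([0.5, 3.0])       # [inertia, m*g*l] (illustrative)

Y = np.column_stack([qdd, np.cos(q)])   # regressor matrix Y(q, qd, qdd)
tau = Y @ theta_true + rng.normal(0, 0.01, n)  # measured torques

theta_hat, *_ = np.linalg.lstsq(Y, tau, rcond=None)
print(theta_hat)  # close to [0.5, 3.0]
```

Real robots replace this 2-column Y with the full Newton-Euler regressor, and the noisy, correlated columns are exactly what makes the plain least-squares solution unreliable.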

SLIDE 17

Formulate RBD Parameter Identification As A Linear Regression Problem

(e.g. An et al. 1988)

  \tau = Y(q, \dot{q}, \ddot{q}) \theta

where the RBD parameters are

  \theta = [m, mc_x, mc_y, mc_z, I_{11}, I_{12}, I_{13}, I_{22}, I_{23}, I_{33}]^T

RBD parameters:
• Must satisfy physical constraints (positive mass, positive definite inertia matrix)
• But not all parameters are identifiable, due to insufficiently rich data and the constraints of the physical system (i.e. the data are ill-conditioned)

SLIDE 18

Specifically, a High Dimensional Noisy Linear Regression Problem

To enforce physical constraints on \theta, introduce virtual parameters \hat{\theta}:

  \theta_1 = \hat{\theta}_1^2
  \theta_2 = \hat{\theta}_2 \hat{\theta}_1^2
  \theta_3 = \hat{\theta}_3 \hat{\theta}_1^2
  \theta_4 = \hat{\theta}_4 \hat{\theta}_1^2
  \theta_5 = (\hat{\theta}_5^2 + \hat{\theta}_4^2 + \hat{\theta}_3^2) \hat{\theta}_1^2
  \theta_6 = (\hat{\theta}_5 \hat{\theta}_6 - \hat{\theta}_2 \hat{\theta}_3) \hat{\theta}_1^2
  \theta_7 = (\hat{\theta}_5 \hat{\theta}_7 - \hat{\theta}_2 \hat{\theta}_4) \hat{\theta}_1^2
  \theta_8 = (\hat{\theta}_6^2 + \hat{\theta}_8^2 + \hat{\theta}_2^2 + \hat{\theta}_4^2) \hat{\theta}_1^2
  \theta_9 = (\hat{\theta}_6 \hat{\theta}_7 + \hat{\theta}_8 \hat{\theta}_9 - \hat{\theta}_3 \hat{\theta}_4) \hat{\theta}_1^2
  \theta_{10} = (\hat{\theta}_7^2 + \hat{\theta}_9^2 + \hat{\theta}_{10}^2 + \hat{\theta}_2^2 + \hat{\theta}_3^2) \hat{\theta}_1^2
  \theta_{11} = \hat{\theta}_{11}^2

• 11 features per DOF
• For a system with s DOFs, there are 11s features

Consequently, for real-world systems, we have a noisy, high dimensional, ill-conditioned linear regression problem.
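The map above can be written directly as code. This sketch follows the equations as reconstructed here (the sign pattern in \theta_6, \theta_7, \theta_9 is an assumption from the standard parameterization); the squared common factor \hat{\theta}_1^2 is what guarantees a positive mass:

```python
# Sketch of the virtual-to-physical parameter map for one DOF.
import numpy as np

def virtual_to_physical(th):
    t = np.concatenate([[0.0], th])    # pad for 1-based indexing readability
    s = t[1] ** 2                      # common factor theta_hat_1^2 >= 0
    return np.array([
        s,                                                      # theta_1 (mass)
        t[2] * s,                                               # theta_2
        t[3] * s,                                               # theta_3
        t[4] * s,                                               # theta_4
        (t[5]**2 + t[4]**2 + t[3]**2) * s,                      # theta_5
        (t[5]*t[6] - t[2]*t[3]) * s,                            # theta_6
        (t[5]*t[7] - t[2]*t[4]) * s,                            # theta_7
        (t[6]**2 + t[8]**2 + t[2]**2 + t[4]**2) * s,            # theta_8
        (t[6]*t[7] + t[8]*t[9] - t[3]*t[4]) * s,                # theta_9
        (t[7]**2 + t[9]**2 + t[10]**2 + t[2]**2 + t[3]**2) * s, # theta_10
        t[11] ** 2,                                             # theta_11
    ])

theta = virtual_to_physical(np.random.default_rng(5).normal(size=11))
assert theta[0] > 0      # mass is positive for any real theta_hat
```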

SLIDE 19

How to Ensure Our Robust Parameter Estimates are Physically Consistent?

• Find physically consistent, robust parameter estimates that are as close to \hat{b}_{true} as possible
• Do a constrained optimization step to find \hat{\theta}_{optimal}:

  \hat{\theta}_{optimal} = \arg\min_{\hat{\theta}} \sum_m w_m (\hat{b}_{true,m} - f_m(\hat{\theta}))^2

  where w_m = 0 if dimension m is not relevant and w_m = 1 otherwise

• Finally, ensure redundant/irrelevant dimensions in \hat{b}_{true} remain so in \hat{\theta}_{optimal}
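This projection step can be sketched with an off-the-shelf optimizer. The map f and the weight vector below are toy stand-ins for the real virtual-parameter map and relevance weights:

```python
# Sketch: fit virtual parameters so that f(theta_hat) matches b_true on
# the relevant dimensions only (w_m = 0 masks irrelevant dimensions).
import numpy as np
from scipy.optimize import minimize

b_true = np.array([1.0, 0.5, 0.0])   # last dimension flagged irrelevant
w = np.array([1.0, 1.0, 0.0])        # w_m = 0 for irrelevant dimensions

def f(th):
    # Toy physically-consistent map: squaring keeps entries non-negative.
    return th ** 2

res = minimize(lambda th: np.sum(w * (b_true - f(th)) ** 2), x0=np.ones(3))
print(f(res.x))  # relevant dims match b_true; irrelevant dim unpenalized
```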
SLIDE 20

10-20% Improvement on Robotic Oculomotor Vision Head

Root mean squared errors:

  Algorithm           | Position (rad) | Velocity (rad/s) | Feedback (Nm)
  --------------------|----------------|------------------|--------------
  Stepwise regression | FAILURE        | FAILURE          | FAILURE
  LASSO regression    | 0.0308         | 0.2517           | 0.4274
  Bayesian de-noising | 0.0243         | 0.2189           | 0.3292
  Ridge regression    | 0.0291         | 0.2465           | 0.3969

• 7 DOFs: 3 in the neck, 2 in each eye
• 11 features per DOF; 77 features in total
• RBD parameter estimates from ALL algorithms satisfy the physical constraints
• Bayesian de-noising does ~10-20% better

SLIDE 21

5-17% Improvement on Robotic Anthropomorphic Arm

Root mean squared errors:

  Algorithm           | Position (rad) | Velocity (rad/s) | Feedback (Nm)
  --------------------|----------------|------------------|--------------
  Stepwise regression | FAILURE        | FAILURE          | FAILURE
  LASSO regression    | FAILURE        | FAILURE          | FAILURE
  Bayesian de-noising | 0.0201         | 0.0930           | 0.5297
  Ridge regression    | 0.0210         | 0.1119           | 0.5839

• 10 DOFs: 3 in the shoulder, 1 in the elbow, 3 in the wrist, 3 in the fingers
• 11 features per DOF; 110 features in total
• Bayesian de-noising does ~5-17% better

SLIDE 22

Summary

• Bayesian treatment of Joint Factor Analysis that performs parameter estimation with noisy input data
• O(d) complexity per EM iteration
• Automatic feature detection through joint regularization of both regression branches
• Significant improvement on synthetic data and real-world systems