
SLIDE 1

Amir massoud Farahmand(1), Csaba Szepesvári(1), Jean-Yves Audibert(2)

(1) Department of Computing Science, University of Alberta, Canada (2) CERTIS, Ecole Nationale des Ponts, France

Manifold-Adaptive Dimension Estimation

SLIDE 2

High-Dimensional Data Everywhere

  • Vision
  • Sensor Fusion
  • Feature Expansion
  • Kernel methods
  • ...
SLIDE 3

Curse of Dimensionality

[Figure: log–log plot of mean squared error vs. number of samples (10² to 10¹⁰) for D = 1, D = 5, and D = 100; at a fixed sample size, the error grows dramatically with D.]

SLIDE 4

Practical Implications

  • Thou shalt reduce the dimension of the data before working with it!
  • Thou shalt not add features unnecessarily!
  • Thou shalt not accept projects with high-dimensional data!
  • ...

Wait!

SLIDE 5

[Figure: three-dimensional scatter plot of data concentrated on a lower-dimensional submanifold.]

Regularities of Data

  • Smoothness
  • Sparsity
  • Low noise at boundary

✓ Lower-dimensional submanifold

  • LLE, IsoMap, Laplacian Eigenmap, Hessian Eigenmap, ...
  • Semi-supervised Learning, Reinforcement Learning, ...
SLIDE 6

Goal

  • Manifold-adaptive machine learning methods
  • Convergence rate independent of the dimension of the input space

SLIDE 7

Many open questions!

Here: dimension estimation :)

SLIDE 8

Why?

  • Needed in various learning methods
  • Not known a priori
SLIDE 9

New?

  • Many existing methods [Pettis et al. (1979), Kégl (2002), Costa & Hero (2004), Levina & Bickel (2005), Hein & Audibert (2005)]
  • No rigorous analysis
  • Only an asymptotic result [Levina & Bickel (2005)]
SLIDE 10

Our Contribution

  • New algorithm
  • K-NN
  • Manifold-adaptive convergence rate
SLIDE 11

General Idea

P(X_i ∈ B(x, r)) = η(x, r) r^d

(The probability mass of a small ball of radius r around x scales as r^d, where d is the intrinsic dimension of the manifold.)

SLIDE 12

P(X_i ∈ B(x, r)) = η(x, r) r^d
ln P(X_i ∈ B(x, r)) = ln η(x, r) + d ln r

SLIDE 16

P(X_i ∈ B(x, r)) = η(x, r) r^d
ln P(X_i ∈ B(x, r)) = ln η(x, r) + d ln r

Using the empirical estimate P(X_i ∈ B(x, r̂_k(x))) ≈ k/n:

ln(k/n) ≈ ln η_0 + d ln r̂_k(x)
ln(k/(2n)) ≈ ln η_0 + d ln r̂_⌈k/2⌉(x)

Subtracting and solving for d:

d̂(x) = ln 2 / ln( r̂_k(x) / r̂_⌈k/2⌉(x) )
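The two approximations can be solved for d directly, which gives the estimator above. A minimal sketch in Python (my own illustration, not the authors' code; the function name and interface are assumptions):

```python
import numpy as np

def dim_estimate(X, x, k):
    """Point estimate of the intrinsic dimension at query point x:
        d_hat(x) = ln 2 / ln( r_k(x) / r_ceil(k/2)(x) ),
    where r_j(x) is the distance from x to its j-th nearest neighbor in X.
    """
    dists = np.sort(np.linalg.norm(X - x, axis=1))
    r_k = dists[k - 1]                        # k-th nearest-neighbor distance
    r_half = dists[int(np.ceil(k / 2)) - 1]   # ceil(k/2)-th nearest-neighbor distance
    return np.log(2.0) / np.log(r_k / r_half)
```

If the query point x is itself one of the rows of X, its zero self-distance should be dropped before indexing.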

SLIDE 17

Finite Sample Convergence Rate

d̂(X_i) = ln 2 / ln( r̂_(k)(X_i) / r̂_(⌈k/2⌉)(X_i) )

Theorem: Under some regularity assumptions on η, provided that n/k > Ω(2^d), with probability at least 1 − δ,

| d̂(X_i) − d | ≤ O( d (k/n)^{1/d} + √( ln(4/δ) / k ) ).
SLIDE 18

Issues

d̂(X_i) = ln 2 / ln( r̂_(k)(X_i) / r̂_(⌈k/2⌉)(X_i) )

  • Inefficient use of data: r ≪ 1 ⇒ k ≪ n
  • High variance of d̂(X_i)

SLIDE 19

Aggregation

  • Voting: d̂_vote = argmax_{d′} Σ_{i=1}^n I{ [d̂(X_i)] = d′ }
  • Averaging: d̂_avg = (1/n) Σ_{i=1}^n d̂(X_i)

Theorem:

P( d̂_vote ≠ d ) ≤ e^{−c′ n / (c_d k)²},
P( [d̂_avg] ≠ d ) ≤ e^{−c′′ n / (D c_d k)²}.
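The two aggregation rules can be sketched as follows (again my own illustration; `d_hats` stands for the collection of point estimates d̂(X_i)):

```python
import numpy as np

def aggregate_vote(d_hats):
    """d_vote: the most frequent value among the rounded point estimates."""
    rounded = np.rint(np.asarray(d_hats)).astype(int)
    values, counts = np.unique(rounded, return_counts=True)
    return int(values[np.argmax(counts)])

def aggregate_avg(d_hats):
    """d_avg: the plain average of the point estimates."""
    return float(np.mean(d_hats))
```

Voting discards the fractional part of each estimate but is insensitive to a few wildly wrong d̂(X_i); the average is smoother, but its sensitivity grows with the range of the estimates, which can be as large as the ambient dimension D.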

SLIDE 20

Experiments

SLIDE 21

Varying the Manifold Dimension

[Figure: log–log plot of mean absolute dimension-estimation error vs. number of samples for the S4 and S8 datasets.]
SLIDE 22

Varying Embedding Space Dimension

[Figure: mean absolute dimension-estimation errors vs. number of samples (10 to 20000, log–log) for X (D = 3), X′ (D = 6), and X′′ (D = 12).]
SLIDE 23

Other Datasets

Data set     n=50      n=100      n=500      n=1000     n=5000
S1           98 (99)   100 (100)  100 (100)  100 (100)  100 (100)
S3           75 (19)   95 (20)    100 (15)   100 (19)   100 (62)
S5           33 (5)    50 (10)    100 (9)    98 (2)     100 (0)
S7           18 (2)    17 (3)     57 (1)     54 (1)     100 (0)
Sinusoid     92 (98)   100 (100)  100 (100)  100 (100)  100 (100)
10-Möbius    69 (47)   13 (74)    100 (98)   100 (99)   100 (100)
Swiss roll   62 (71)   49 (91)    88 (96)    100 (100)  100 (100)
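For intuition about how results of this kind are produced, here is a rough, self-contained sketch (my own, with hypothetical helper names; the authors' exact protocol, choice of k, and noise model are not reproduced): sample points uniformly from the unit d-sphere S^d, embed them in R^D, and report the voted dimension estimate.

```python
import numpy as np

def sphere_sample(n, d, D, rng):
    """n points uniform on the unit d-sphere S^d (a d-dimensional manifold
    sitting in R^(d+1)), zero-padded so the points live in R^D."""
    g = rng.normal(size=(n, d + 1))
    pts = g / np.linalg.norm(g, axis=1, keepdims=True)
    return np.hstack([pts, np.zeros((n, D - (d + 1)))])

def voted_dimension(X, k):
    """Vote over the per-point estimates d_hat(X_i)."""
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    dist = np.sort(dist, axis=1)[:, 1:]  # drop each point's zero self-distance
    r_k = dist[:, k - 1]
    r_half = dist[:, int(np.ceil(k / 2)) - 1]
    d_hats = np.log(2.0) / np.log(r_k / r_half)
    values, counts = np.unique(np.rint(d_hats).astype(int), return_counts=True)
    return int(values[np.argmax(counts)])
```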

SLIDE 24

Conclusions and Future Work

  • New algorithm
  • Competitive results
  • Manifold-adaptive convergence rate
  • Other ML methods?
  • K-NN regression can!
  • Penalized least squares in the works
  • Dimension Reduction?
SLIDE 25

Questions?

SLIDE 26

Curse of Dimensionality

High-dimensional data:
  • Increase the complexity of the function space
  • Higher variance with the same number of samples
  • More samples needed for the same precision

SLIDE 27

Lower Bound

Assume that m_n is a regression estimate of the random variable Y based on X and D_n = {(X_1, Y_1), ..., (X_n, Y_n)}, and let m(X) = E[Y | X]. What is the best possible performance of m_n in the L² sense, i.e. E[ |m_n(X) − m(X)|² ]?

For the class D(p, C) of (X, Y) distributions with X ∈ R^D, we have the following lower bound:

E[ |m_n(X) − m(X)|² ] ≥ Ω( n^{−2p/(2p+D)} )

SLIDE 28

Two sources of error:

  • Approximation error: assuming a fixed η(x, r)
  • Estimation error: estimating P(X ∈ B(x, r)) with the empirical estimate k/n

Both can be controlled by changing the size of the neighborhood r (which is related to k/n).
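This tradeoff is easy to see numerically. A small sketch (my own illustration, not from the talk; the function name is an assumption): on a circle (d = 1), a small k keeps neighborhoods nearly flat so d̂ ≈ 1, while a very large k makes the neighborhood radii saturate near the diameter, so r̂_k/r̂_⌈k/2⌉ approaches 1 and the estimate is biased upward.

```python
import numpy as np

def knn_dim_estimate(X, x, k):
    """d_hat(x) = ln 2 / ln(r_k / r_ceil(k/2)) from sorted neighbor distances."""
    dists = np.sort(np.linalg.norm(X - x, axis=1))[1:]  # drop distance to x itself
    return np.log(2.0) / np.log(dists[k - 1] / dists[int(np.ceil(k / 2)) - 1])

rng = np.random.default_rng(0)
angles = rng.uniform(0.0, 2.0 * np.pi, size=2000)
circle = np.column_stack([np.cos(angles), np.sin(angles)])  # 1-D manifold in R^2

x = circle[0]
small_k = knn_dim_estimate(circle, x, 20)    # local neighborhood: low bias
large_k = knn_dim_estimate(circle, x, 1900)  # neighborhood wraps the circle: biased
```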

SLIDE 29

Effect of k and n

[Figure: four panels, S4 − Averaging, S4 − Voting, S8 − Averaging, S8 − Voting, plotted over k (10 to 10³) and number of samples n (10 to 10⁴), both on log scales.]

SLIDE 30

Experiments

Noise Effect

[Figure: mean absolute estimation error vs. noise level (standard deviation 0.01 to 0.1) for 10-Möbius embedded in R^3, 10-Möbius embedded in R^12, S4 embedded in R^5, and S4 embedded in R^20.]

SLIDE 31

Effect of Noise

[Figure: two-dimensional scatter plot (axes −2 to 2) illustrating the effect of noise.]


SLIDE 35

Exponential Rate

[Figure: probability of error vs. number of samples (log–log) on S4, comparing the averaging and voting aggregates; the error probability decays rapidly with n.]