Efficient Multiple Kernel Learning (PowerPoint presentation by Lei Tang)


SLIDE 1

Efficient Multiple Kernel Learning

Lei Tang

SLIDE 2

Outline

  • What is Kernel Learning?
  • What’s the problem with existing formulation?
  • Two new formulations for large-scale kernel selection
    – SIL formulation (Cutting Planes)
    – More efficient MKL (Steepest Descent)

SLIDE 3
  • Data: {(x_i, y_i)}, i = 1, …, n
    – x_i ∈ R^d = feature vector
    – y_i ∈ {−1, +1} = label

[Figure: toy data points, each described by features HEART, URINE, DNA, BLOOD, SCAN]

Linear algorithm: binary classification

  • Question: design a classification rule y = f(x) such that, given a new x, it predicts y with minimal probability of error
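A classification rule of this kind can be sketched in a few lines of numpy; the data, weight vector, and bias below are assumed toy values (not learned), just to show the rule y = sign(w·x + b) in action:

```python
import numpy as np

# Hypothetical toy data: rows are feature vectors x in R^2, labels y in {-1, +1}.
X = np.array([[2.0, 1.0], [1.5, 2.0], [-1.0, -0.5], [-2.0, -1.5]])
y = [1, 1, -1, -1]

# An assumed (hand-picked, not learned) hyperplane (w, b).
w = np.array([1.0, 1.0])
b = 0.0

def predict(x, w, b):
    # Classification rule: y = sign(w . x + b)
    return int(np.sign(w @ x + b))

preds = [predict(x, w, b) for x in X]
```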

SLIDE 4
  • Find a good hyperplane (w, b) ∈ R^{d+1} that classifies this and future data points as well as possible

Linear algorithm: binary classification

[Figure: separating hyperplane between the two classes of toy data points]

Classification Rule: f(x) = sign(w · x + b)
SLIDE 5

Linear algorithm: binary classification

  • Intuition (Vapnik, 1965): if linearly separable
    – Separate the data
    – Place hyperplane “far” from the data: large margin

SLIDE 6

Linear algorithm: binary classification

  • Intuition (Vapnik, 1965): if linearly separable
    – Separate the data
    – Place hyperplane “far” from the data
  • Maximal Margin Classifier
    – Place hyperplane “far” from the data: large margin
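The margin can be computed directly: the signed distance of a point to the hyperplane w·x + b = 0 is y (w·x + b) / ‖w‖, and the margin is the distance of the closest point. A minimal sketch with assumed toy data and a fixed hyperplane:

```python
import numpy as np

# Assumed toy data and hyperplane (not learned), to illustrate the margin.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]])
y = np.array([1, 1, -1, -1])
w = np.array([1.0, 1.0])
b = 0.0

# Signed distance of each point to the hyperplane w.x + b = 0.
dist = y * (X @ w + b) / np.linalg.norm(w)

# The margin is the distance of the closest (correctly classified) point.
margin = dist.min()
```

The maximal margin classifier picks, among all separating hyperplanes, the (w, b) that makes this quantity largest.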

SLIDE 7

Linear algorithm: binary classification

  • If not linearly separable:
    – Allow some errors
    – Still, try to place the hyperplane “far” from each class
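“Allow some errors” is made concrete with the slack variables ξ_i that appear in the primal on the next slide: ξ_i = max(0, 1 − y_i (w·x_i + b)) is zero for points outside the margin and positive for margin violations. A sketch with assumed toy data and a fixed hyperplane:

```python
import numpy as np

# Assumed toy data where the middle point falls inside the margin.
X = np.array([[2.0, 0.0], [0.2, 0.0], [-2.0, 0.0]])
y = np.array([1, 1, -1])
w = np.array([1.0, 0.0])
b = 0.0

# Slack xi_i = max(0, 1 - y_i (w.x_i + b)): zero for points beyond the margin,
# positive for margin violations or misclassifications.
xi = np.maximum(0.0, 1.0 - y * (X @ w + b))
```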

SLIDE 8

SVM: Primal & Dual

Primal:

  min_{w,b,ξ} (1/2) ‖w‖² + C ∑_{i=1}^{N} ξ_i
  subject to y_i (w · x_i + b) ≥ 1 − ξ_i,  ξ_i ≥ 0,  ∀i

  Decision function: f(x) = w · x + b

Dual:

  max_α ∑_i α_i − (1/2) ∑_{i,j} α_i α_j y_i y_j x_iᵀ x_j
  subject to ∑_i α_i y_i = 0,  0 ≤ α_i ≤ C,  ∀i

SLIDE 9
Linear algorithm: binary classification

  • Training = convex optimization problem (QP):

  max_α ∑_i α_i − (1/2) ∑_{i,j} α_i α_j y_i y_j x_iᵀ x_j
  subject to ∑_i α_i y_i = 0,  0 ≤ α_i ≤ C,  ∀i

  • Implicit embedding: the data enter only through inner products x_iᵀ x_j, so collect them in a kernel matrix K with K_ij = x_iᵀ x_j. In matrix form (D_y = diag(y), e = vector of ones):

  max_α eᵀα − (1/2) αᵀ D_y K D_y α
  subject to αᵀ y = 0,  0 ≤ α_i ≤ C
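The summation form and the matrix form of the dual objective are the same quantity; this can be checked numerically. A sketch with assumed random data and an arbitrary (not optimized) α:

```python
import numpy as np

# Check that e^T a - 1/2 a^T D_y K D_y a matches
# sum_i a_i - 1/2 sum_{i,j} a_i a_j y_i y_j x_i^T x_j, on assumed random data.
rng = np.random.default_rng(0)
n, d = 6, 3
X = rng.normal(size=(n, d))
y = rng.choice([-1, 1], size=n)
a = rng.uniform(0, 1, size=n)   # arbitrary alpha, not a trained solution

K = X @ X.T                      # linear kernel: K_ij = x_i^T x_j
Dy = np.diag(y.astype(float))
e = np.ones(n)

matrix_form = e @ a - 0.5 * a @ Dy @ K @ Dy @ a
summation_form = a.sum() - 0.5 * sum(
    a[i] * a[j] * y[i] * y[j] * (X[i] @ X[j])
    for i in range(n) for j in range(n)
)
```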

SLIDE 10
Kernel algorithm: Support Vector Machine (SVM)

  • Training = convex optimization problem (QP):

  max_α eᵀα − (1/2) αᵀ D_y K D_y α
  subject to αᵀ y = 0,  0 ≤ α_i ≤ C

  • Classification rule: classify a new data point x:

  f(x) = sign(wᵀ x + b) = sign(∑_{i ∈ SV} α_i y_i x_iᵀ x + b)

  • Kernel algorithm!
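The two ways of writing the rule agree because w = ∑_i α_i y_i x_i, so wᵀx = ∑_i α_i y_i x_iᵀx, and the latter only needs kernel evaluations. A numeric sketch with assumed (not trained) coefficients:

```python
import numpy as np

# With arbitrary assumed alpha (not a trained SVM solution), check that
# sign(w^T x + b) with w = sum_i alpha_i y_i x_i equals the kernel-expansion
# rule sign(sum_i alpha_i y_i K(x_i, x) + b) for a linear kernel.
rng = np.random.default_rng(1)
n, d = 5, 4
X = rng.normal(size=(n, d))
y = rng.choice([-1, 1], size=n)
alpha = rng.uniform(0, 1, size=n)
b = 0.1
x_new = rng.normal(size=d)

w = (alpha * y) @ X                                    # w = sum_i alpha_i y_i x_i
f_primal = np.sign(w @ x_new + b)
f_kernel = np.sign(np.sum(alpha * y * (X @ x_new)) + b)
```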

SLIDE 11

Support Vector Machines (SVM)

  • Hand-writing recognition (e.g., USPS)
  • Computational biology (e.g., micro-array data)
  • Text classification
  • Face detection
  • Face expression recognition
  • Time series prediction (regression)
  • Drug discovery (novelty detection)
SLIDE 12

Different Kernels

  • Various kinds of kernels
    – Linear kernel
    – Gaussian kernel:  K(X, Y) = exp(−‖X − Y‖² / (2σ²))
    – Diffusion kernel
    – String kernel
    – ……
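The Gaussian kernel formula above translates directly to code; a minimal sketch, with σ = 1 as an assumed bandwidth:

```python
import numpy as np

# Gaussian (RBF) kernel from the slide: K(X, Y) = exp(-||X - Y||^2 / (2 sigma^2)).
def gaussian_kernel(x, y, sigma=1.0):
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2))

x = np.array([0.0, 0.0])
z = np.array([1.0, 1.0])

k_self = gaussian_kernel(x, x)   # a point has zero distance to itself, so K = 1
k_xz = gaussian_kernel(x, z)     # ||x - z||^2 = 2, so K = exp(-2 / 2) = exp(-1)
```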

SLIDE 13

Learning with Multiple Kernels

[Figure: several candidate kernels to be combined into a single kernel K; which combination?]

SLIDE 14

Learning the optimal Kernel

Overview of SVM with a single kernel: the optimal dual objective value G(K)

SLIDE 15

Learning the optimal Kernel

  • Learn a linear mix of kernels
  • G(K) gives an upper bound: the smaller, the better the guaranteed performance
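One reason a linear mix is convenient: any nonnegative combination K = ∑_m d_m K_m of valid kernels is again a valid (symmetric, positive semidefinite) kernel. A sketch with two assumed base kernels built from different feature subsets and hand-picked mixing weights:

```python
import numpy as np

# Assumed random data; K1 and K2 are linear kernels on disjoint feature subsets.
rng = np.random.default_rng(2)
X = rng.normal(size=(6, 4))

K1 = X[:, :2] @ X[:, :2].T       # kernel on the first two features
K2 = X[:, 2:] @ X[:, 2:].T       # kernel on the last two features
d = np.array([0.3, 0.7])         # assumed mixing weights: d >= 0, sum to 1

K = d[0] * K1 + d[1] * K2        # linear mix of kernels
symmetric = np.allclose(K, K.T)
psd = np.linalg.eigvalsh(K).min() >= -1e-10   # PSD up to float noise
```

The MKL problem is then to choose the weights d_m (rather than fixing them by hand as above) so that the resulting G(K) is as small as possible.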

SLIDE 16

To be Continued