Efficient Multiple Kernel Learning (PowerPoint presentation by Lei Tang)


SLIDE 1

Efficient Multiple Kernel Learning

Lei Tang

SLIDE 2

Outline

  • What is Kernel Learning?
  • What’s the problem with existing formulation?
  • Two new formulations for large-scale kernel selection
    – SIL formulation (Cutting Planes)
    – More efficient MKL (Steepest Descent)

SLIDE 3
  • Data: {(x_i, y_i)}, i = 1, …, n
    – x_i ∈ R^d = feature vector
    – y_i ∈ {−1, +1} = label

[Figure: toy data points, each described by features HEART, URINE, DNA, BLOOD, SCAN]

Linear algorithm: binary classification

  • Question: design a classification rule y = f(x) such that, given a new x, it predicts y with minimal probability of error
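A classification rule of this kind can be sketched in a few lines of numpy; the data, weight vector, and bias below are assumed toy values (not learned), just to show the rule y = sign(w·x + b) in action:

```python
import numpy as np

# Hypothetical toy data: rows are feature vectors x in R^2, labels y in {-1, +1}.
X = np.array([[2.0, 1.0], [1.5, 2.0], [-1.0, -0.5], [-2.0, -1.5]])
y = [1, 1, -1, -1]

# An assumed (hand-picked, not learned) hyperplane (w, b).
w = np.array([1.0, 1.0])
b = 0.0

def predict(x, w, b):
    # Classification rule: y = sign(w . x + b)
    return int(np.sign(w @ x + b))

preds = [predict(x, w, b) for x in X]
```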

SLIDE 4
  • Find a good hyperplane (w, b) ∈ R^{d+1} that classifies this and future data points as well as possible

Linear algorithm: binary classification

[Figure: separating hyperplane between the two classes of toy data points]

Classification Rule: f(x) = sign(w · x + b)
SLIDE 5

Linear algorithm: binary classification

  • Intuition (Vapnik, 1965): if linearly separable
    – Separate the data
    – Place hyperplane “far” from the data: large margin

SLIDE 6

Linear algorithm: binary classification

  • Intuition (Vapnik, 1965): if linearly separable
    – Separate the data
    – Place hyperplane “far” from the data
  • Maximal Margin Classifier
    – Place hyperplane “far” from the data: large margin
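The margin can be computed directly: the signed distance of a point to the hyperplane w·x + b = 0 is y (w·x + b) / ‖w‖, and the margin is the distance of the closest point. A minimal sketch with assumed toy data and a fixed hyperplane:

```python
import numpy as np

# Assumed toy data and hyperplane (not learned), to illustrate the margin.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]])
y = np.array([1, 1, -1, -1])
w = np.array([1.0, 1.0])
b = 0.0

# Signed distance of each point to the hyperplane w.x + b = 0.
dist = y * (X @ w + b) / np.linalg.norm(w)

# The margin is the distance of the closest (correctly classified) point.
margin = dist.min()
```

The maximal margin classifier picks, among all separating hyperplanes, the (w, b) that makes this quantity largest.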

SLIDE 7

Linear algorithm: binary classification

  • If not linearly separable:
    – Allow some errors
    – Still, try to place the hyperplane “far” from each class
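“Allow some errors” is made concrete with the slack variables ξ_i that appear in the primal on the next slide: ξ_i = max(0, 1 − y_i (w·x_i + b)) is zero for points outside the margin and positive for margin violations. A sketch with assumed toy data and a fixed hyperplane:

```python
import numpy as np

# Assumed toy data where the middle point falls inside the margin.
X = np.array([[2.0, 0.0], [0.2, 0.0], [-2.0, 0.0]])
y = np.array([1, 1, -1])
w = np.array([1.0, 0.0])
b = 0.0

# Slack xi_i = max(0, 1 - y_i (w.x_i + b)): zero for points beyond the margin,
# positive for margin violations or misclassifications.
xi = np.maximum(0.0, 1.0 - y * (X @ w + b))
```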

SLIDE 8

SVM: Primal & Dual

Primal:

  min_{w,b,ξ} (1/2) ‖w‖² + C ∑_{i=1}^{N} ξ_i
  subject to y_i (w · x_i + b) ≥ 1 − ξ_i,  ξ_i ≥ 0,  ∀i

  Decision function: f(x) = w · x + b

Dual:

  max_α ∑_i α_i − (1/2) ∑_{i,j} α_i α_j y_i y_j x_iᵀ x_j
  subject to ∑_i α_i y_i = 0,  0 ≤ α_i ≤ C,  ∀i

SLIDE 9
Linear algorithm: binary classification

  • Training = convex optimization problem (QP):

  max_α ∑_i α_i − (1/2) ∑_{i,j} α_i α_j y_i y_j x_iᵀ x_j
  subject to ∑_i α_i y_i = 0,  0 ≤ α_i ≤ C,  ∀i

  • Implicit embedding: the data enter only through inner products x_iᵀ x_j, so collect them in a kernel matrix K with K_ij = x_iᵀ x_j. In matrix form (D_y = diag(y), e = vector of ones):

  max_α eᵀα − (1/2) αᵀ D_y K D_y α
  subject to αᵀ y = 0,  0 ≤ α_i ≤ C
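The summation form and the matrix form of the dual objective are the same quantity; this can be checked numerically. A sketch with assumed random data and an arbitrary (not optimized) α:

```python
import numpy as np

# Check that e^T a - 1/2 a^T D_y K D_y a matches
# sum_i a_i - 1/2 sum_{i,j} a_i a_j y_i y_j x_i^T x_j, on assumed random data.
rng = np.random.default_rng(0)
n, d = 6, 3
X = rng.normal(size=(n, d))
y = rng.choice([-1, 1], size=n)
a = rng.uniform(0, 1, size=n)   # arbitrary alpha, not a trained solution

K = X @ X.T                      # linear kernel: K_ij = x_i^T x_j
Dy = np.diag(y.astype(float))
e = np.ones(n)

matrix_form = e @ a - 0.5 * a @ Dy @ K @ Dy @ a
summation_form = a.sum() - 0.5 * sum(
    a[i] * a[j] * y[i] * y[j] * (X[i] @ X[j])
    for i in range(n) for j in range(n)
)
```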

SLIDE 10
Kernel algorithm: Support Vector Machine (SVM)

  • Training = convex optimization problem (QP):

  max_α eᵀα − (1/2) αᵀ D_y K D_y α
  subject to αᵀ y = 0,  0 ≤ α_i ≤ C

  • Classification rule: classify a new data point x:

  f(x) = sign(wᵀ x + b) = sign(∑_{i ∈ SV} α_i y_i x_iᵀ x + b)

  • Kernel algorithm!
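The two ways of writing the rule agree because w = ∑_i α_i y_i x_i, so wᵀx = ∑_i α_i y_i x_iᵀx, and the latter only needs kernel evaluations. A numeric sketch with assumed (not trained) coefficients:

```python
import numpy as np

# With arbitrary assumed alpha (not a trained SVM solution), check that
# sign(w^T x + b) with w = sum_i alpha_i y_i x_i equals the kernel-expansion
# rule sign(sum_i alpha_i y_i K(x_i, x) + b) for a linear kernel.
rng = np.random.default_rng(1)
n, d = 5, 4
X = rng.normal(size=(n, d))
y = rng.choice([-1, 1], size=n)
alpha = rng.uniform(0, 1, size=n)
b = 0.1
x_new = rng.normal(size=d)

w = (alpha * y) @ X                                    # w = sum_i alpha_i y_i x_i
f_primal = np.sign(w @ x_new + b)
f_kernel = np.sign(np.sum(alpha * y * (X @ x_new)) + b)
```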

SLIDE 11

Support Vector Machines (SVM)

  • Hand-writing recognition (e.g., USPS)
  • Computational biology (e.g., micro-array data)
  • Text classification
  • Face detection
  • Face expression recognition
  • Time series prediction (regression)
  • Drug discovery (novelty detection)
SLIDE 12

Different Kernels

  • Various kinds of kernels
    – Linear kernel
    – Gaussian kernel:  K(X, Y) = exp(−‖X − Y‖² / (2σ²))
    – Diffusion kernel
    – String kernel
    – ……
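The Gaussian kernel formula above translates directly to code; a minimal sketch, with σ = 1 as an assumed bandwidth:

```python
import numpy as np

# Gaussian (RBF) kernel from the slide: K(X, Y) = exp(-||X - Y||^2 / (2 sigma^2)).
def gaussian_kernel(x, y, sigma=1.0):
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2))

x = np.array([0.0, 0.0])
z = np.array([1.0, 1.0])

k_self = gaussian_kernel(x, x)   # a point has zero distance to itself, so K = 1
k_xz = gaussian_kernel(x, z)     # ||x - z||^2 = 2, so K = exp(-2 / 2) = exp(-1)
```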

SLIDE 13

Learning with Multiple Kernels

[Figure: several candidate kernels to be combined into a single kernel K; which combination?]

SLIDE 14

Learning the optimal Kernel

Overview of SVM with a single kernel: the optimal dual objective value G(K)

SLIDE 15

Learning the optimal Kernel

  • Learn a linear mix of kernels
  • G(K) gives an upper bound: the smaller, the better the guaranteed performance
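One reason a linear mix is convenient: any nonnegative combination K = ∑_m d_m K_m of valid kernels is again a valid (symmetric, positive semidefinite) kernel. A sketch with two assumed base kernels built from different feature subsets and hand-picked mixing weights:

```python
import numpy as np

# Assumed random data; K1 and K2 are linear kernels on disjoint feature subsets.
rng = np.random.default_rng(2)
X = rng.normal(size=(6, 4))

K1 = X[:, :2] @ X[:, :2].T       # kernel on the first two features
K2 = X[:, 2:] @ X[:, 2:].T       # kernel on the last two features
d = np.array([0.3, 0.7])         # assumed mixing weights: d >= 0, sum to 1

K = d[0] * K1 + d[1] * K2        # linear mix of kernels
symmetric = np.allclose(K, K.T)
psd = np.linalg.eigvalsh(K).min() >= -1e-10   # PSD up to float noise
```

The MKL problem is then to choose the weights d_m (rather than fixing them by hand as above) so that the resulting G(K) is as small as possible.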

SLIDE 16

To be Continued