Parameter-Free Convex Learning through Coin Betting (PowerPoint PPT Presentation)




SLIDE 1

Parameter-Free Convex Learning through Coin Betting

Francesco Orabona and Dávid Pál, Yahoo Research, NY

SLIDE 2

Are You Still Tuning/Learning/Adapting Hyperparameters?

Standard Machine Learning procedures. Regularized empirical risk minimization:

$$\arg\min_{w \in \mathbb{R}^d} \; \frac{\lambda}{2} \|w\|^2 + \sum_{i=1}^{N} f(w, x_i, y_i)$$

where $f$ is convex in $w$.
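To make the objective concrete, here is a minimal sketch that evaluates it; the squared loss $f(w, x, y) = (\langle w, x\rangle - y)^2$, the toy data, and the value of λ are illustrative assumptions, not from the slides:

```python
import numpy as np

def regularized_erm_objective(w, X, Y, lam):
    """lam/2 * ||w||^2 + sum_i f(w, x_i, y_i), with squared loss as an
    illustrative convex choice of f."""
    losses = (X @ w - Y) ** 2              # f(w, x_i, y_i) = (<w, x_i> - y_i)^2
    return 0.5 * lam * np.dot(w, w) + losses.sum()

# toy data (illustrative)
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
Y = np.array([1.0, -1.0, 0.0])
w = np.zeros(2)
obj = regularized_erm_objective(w, X, Y, lam=0.1)
```

At $w = 0$ the regularizer vanishes and the objective is just the sum of losses on the toy data.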

SLIDE 3

Are You Still Tuning/Learning/Adapting Hyperparameters?

Standard Machine Learning procedures. Regularized empirical risk minimization:

$$\arg\min_{w \in \mathbb{R}^d} \; \frac{\lambda}{2} \|w\|^2 + \sum_{i=1}^{N} f(w, x_i, y_i)$$

where $f$ is convex in $w$.

■ How do you choose the regularizer weight λ?

SLIDE 4

Are You Still Tuning/Learning/Adapting Hyperparameters?

Standard Machine Learning procedures. Stochastic approximation: $w_t = w_{t-1} - \eta_t \nabla f(w_{t-1}, x_t, y_t)$, where $f$ is convex in $w$.

SLIDE 5

Are You Still Tuning/Learning/Adapting Hyperparameters?

Standard Machine Learning procedures. Stochastic approximation: $w_t = w_{t-1} - \eta_t \nabla f(w_{t-1}, x_t, y_t)$, where $f$ is convex in $w$.

■ How do you choose the learning rate ηt?
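The stochastic-approximation update above can be sketched in a few lines. The squared loss, the toy data stream, and the schedule $\eta_t = \eta_0/\sqrt{t}$ are illustrative assumptions; the schedule is exactly the kind of hand-tuned hyperparameter the slides question:

```python
import numpy as np

def sgd_step(w, x, y, eta):
    """One step w_t = w_{t-1} - eta_t * grad f(w_{t-1}, x_t, y_t),
    with squared loss f(w, x, y) = (<w, x> - y)^2 as an illustrative convex f."""
    grad = 2.0 * (np.dot(w, x) - y) * x    # gradient of the squared loss in w
    return w - eta * grad

w = np.zeros(2)
eta0 = 0.1                                  # hand-tuned: the problem the talk targets
stream = [(np.array([1.0, 0.0]), 1.0),
          (np.array([0.0, 1.0]), -1.0)]
for t, (x, y) in enumerate(stream, start=1):
    w = sgd_step(w, x, y, eta0 / np.sqrt(t))   # eta_t = eta_0 / sqrt(t)
```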

SLIDE 6

Wasn’t machine learning about learning automatically from data?

■ There is a 7-year history of parameter-free algorithms that have no learning rates or regularizers to tune.

■ But they were very unintuitive and complex.

SLIDE 7

One Coin to Rule Them All

Online learning is equivalent to coin betting: online coin-betting algorithms give rise to optimal, parameter-free learning algorithms.

SLIDE 8

Simple Algorithm & Good Results

■ Parameter-free
■ Extremely simple algorithm
■ Same complexity as SGD
■ Kernelizable

[Plot: test loss vs. SGD learning rate on the cpusmall dataset with absolute loss, comparing SGD across learning rates against the parameter-free KT-based algorithm.]

See how at the poster!
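The KT-based algorithm referenced in the plot can be sketched as follows. This is a minimal sketch of the Krichevsky-Trofimov coin-betting update for one-dimensional online linear optimization, assuming the gradients are scaled so that $|g_t| \le 1$; the toy gradient sequence and the initial wealth `eps` are illustrative:

```python
def kt_parameter_free(gradients, eps=1.0):
    """KT coin-betting iterates for 1-d online linear optimization.

    At round t the algorithm plays
        w_t = (sum of past -g_i) / t * Wealth_{t-1},
    where Wealth_{t-1} = eps + sum of past (-g_i * w_i).
    No learning rate or regularizer appears anywhere.
    Assumes |g_t| <= 1 (scaled gradients)."""
    wealth = eps        # initial endowment of the bettor (illustrative value)
    grad_sum = 0.0      # running sum of -g_i, the "coin outcomes" so far
    ws = []
    for t, g in enumerate(gradients, start=1):
        w = grad_sum / t * wealth   # KT betting fraction times current wealth
        ws.append(w)
        wealth += -g * w            # money gained or lost on this round's coin flip
        grad_sum += -g
    return ws

# toy gradient sequence (illustrative)
ws = kt_parameter_free([-1.0, -1.0, 1.0])
```

Note how the magnitude of the bets, and hence the effective step size, grows and shrinks with the accumulated wealth instead of following a tuned schedule.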