Parameter-Free Convex Learning through Coin Betting
Francesco Orabona and Dávid Pál Yahoo Research, NY
Standard Machine Learning procedures

Regularized empirical risk minimization:

arg min_{w ∈ ℝ^d}  (λ/2) ∥w∥² + ∑_{i=1}^{N} f(w, x_i, y_i)

where f is convex in w.
■ How do you choose the regularizer weight λ?
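To make the tuning problem concrete, here is a minimal sketch of the objective above for the special case of the squared loss, where the minimizer has a closed form (ridge regression). The `ridge_erm` helper and the synthetic dataset are illustrative, not from the talk; the point is that different choices of λ give visibly different solutions.

```python
import numpy as np

def ridge_erm(X, y, lam):
    """argmin_w  lam/2 * ||w||^2 + 1/2 * sum_i (w . x_i - y_i)^2.

    Setting the gradient to zero gives the normal equations
    (X^T X + lam * I) w = X^T y, solved here directly.
    """
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Synthetic data (illustrative): linear model plus small noise.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
w_true = np.arange(1.0, 6.0)
y = X @ w_true + 0.1 * rng.standard_normal(100)

# Each lambda yields a different solution -- this is the tuning problem.
for lam in (0.01, 1.0, 100.0):
    w = ridge_erm(X, y, lam)
    print(lam, np.round(np.linalg.norm(w - w_true), 3))
```

Larger λ shrinks the solution toward zero; too small a λ overfits the noise. Nothing in the objective itself tells you which value to pick.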
Standard Machine Learning procedures

Stochastic approximation:

w_t = w_{t−1} − η_t ∇f(w_{t−1}, x_t, y_t)

where f is convex in w.
■ How do you choose the learning rate ηt?
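A minimal sketch of the stochastic update above, using the common η_t = η₀/√t schedule (one of many possible choices; the slide's point is that η₀ itself still has to be tuned). The `sgd` helper and the absolute-loss example below are illustrative, not from the talk.

```python
import numpy as np

def sgd(grad, w0, samples, eta0=0.1):
    """Run w_t = w_{t-1} - eta_t * grad(w_{t-1}, x_t, y_t) with eta_t = eta0/sqrt(t)."""
    w = np.asarray(w0, dtype=float)
    for t, (x, y) in enumerate(samples, start=1):
        w = w - (eta0 / np.sqrt(t)) * grad(w, x, y)
    return w

# Example: absolute loss f(w, x, y) = |w . x - y|, whose subgradient is
# sign(w . x - y) * x.
def abs_loss_grad(w, x, y):
    return np.sign(w @ x - y) * x

# Synthetic stream (illustrative): noiseless linear targets.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true

w = sgd(abs_loss_grad, np.zeros(3), zip(X, y))
print(np.round(w, 2))
```

With a good η₀ the iterates approach w_true; with a poor one they crawl or oscillate. The learning-rate sensitivity shown in the plot at the end of these slides is exactly this effect.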
■ There is a 7-year history of parameter-free algorithms that require neither learning rates nor regularizers to tune.
■ But these algorithms were very unintuitive and complex.
Online Coin betting algorithms give rise to optimal and parameter-free learning algorithms
■ Parameter-free
■ Extremely simple algorithm
■ Same complexity as SGD
■ Kernelizable
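To illustrate the simplicity, here is a sketch of a one-dimensional Krichevsky–Trofimov (KT) coin-betting learner in the spirit of the authors' framework: each round we bet a signed fraction β_t = (sum of past "coin" outcomes)/t of the current wealth, the bet itself is the prediction w_t, and the coin outcome is the negative subgradient, assumed to lie in [−1, 1]. The class name and the toy objective |w − 1| are illustrative, not from the talk. Note that no learning rate appears anywhere.

```python
class KTLearner:
    """KT coin-betting learner in 1-d (sketch, assuming subgradients in [-1, 1])."""

    def __init__(self, eps=1.0):
        self.wealth = eps   # initial endowment
        self.sum_c = 0.0    # running sum of coin outcomes
        self.t = 0

    def predict(self):
        self.t += 1
        beta = self.sum_c / self.t    # KT betting fraction, always in (-1, 1)
        return beta * self.wealth     # prediction = signed bet

    def update(self, g, w):
        c = -g                        # coin outcome = negative subgradient
        self.wealth += c * w          # Wealth_t = Wealth_{t-1} + c_t * w_t
        self.sum_c += c

# Toy run (illustrative): subgradients of f(w) = |w - 1| drive w_t toward 1,
# with no step size to tune.
learner = KTLearner()
ws = []
for _ in range(200):
    w = learner.predict()
    ws.append(w)
    g = float((w > 1.0) - (w < 1.0))  # subgradient of |w - 1|
    learner.update(g, w)
print(round(sum(ws[-50:]) / 50, 2))
```

The per-round cost is a handful of scalar operations, matching the "same complexity as SGD" bullet; the multidimensional and kernelized versions in the paper build on the same wealth update.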
[Figure: test loss on the cpusmall dataset (absolute loss) as a function of the SGD learning rate (10⁻¹ to 10³), comparing SGD with the KT-based parameter-free algorithm.]