On the Local Minima of the Empirical Risk

SLIDE 1

On the Local Minima of the Empirical Risk

Chi Jin*¹, Lydia T. Liu*¹, Rong Ge², Michael I. Jordan¹

¹ EECS, University of California, Berkeley. ² Duke University.

1 / 6 Chi Jin On the Local Minima of the Empirical Risk

SLIDE 4

Overview

Nonconvex Optimization.

  • Gradient Descent (GD) → stationary points: local maxima, saddle points, local minima.
  • Perturbed GD [Jin et al. 2017] efficiently escapes local maxima and saddle points.
  • How to deal with spurious local minima?
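A minimal Python sketch of the perturbation idea, not the exact algorithm of Jin et al. 2017: whenever the gradient is nearly zero, add a small random kick before taking the next gradient step. The helper name `perturbed_gd`, the toy function, and all constants are assumptions chosen for illustration.

```python
import numpy as np

def perturbed_gd(grad, x0, step=0.1, g_thresh=1e-3, radius=1e-2, iters=200, seed=0):
    """Gradient descent that adds a small random kick whenever the gradient is tiny."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        g = grad(x)
        if np.linalg.norm(g) <= g_thresh:
            # Near a stationary point: perturb so a saddle or local max can be escaped.
            x = x + radius * rng.standard_normal(x.shape)
            g = grad(x)
        x = x - step * g
    return x

# Toy function (an assumption): F(x, y) = x^2 - y^2 + y^4 / 4.
# It has a saddle point at the origin and local minima at (0, +sqrt(2)) and (0, -sqrt(2)).
def grad_F(p):
    x, y = p
    return np.array([2 * x, -2 * y + y**3])

plain = perturbed_gd(grad_F, [0.0, 0.0], radius=0.0)  # no kick: plain GD
kicked = perturbed_gd(grad_F, [0.0, 0.0])             # with kicks
print("plain GD ends at    ", plain)
print("perturbed GD ends at", kicked)
```

Started exactly at the saddle, the run without kicks never moves, while the perturbed run drifts away from the saddle and settles close to one of the two minima.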

SLIDE 6

Local Minima

In general, finding global minima is NP-hard.

Avoiding “shallow” local minima.

Goal: find approximate local minima of a smooth nonconvex function F, given access only to an erroneous version f, where $\sup_x |F(x) - f(x)| \le \nu$.
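To see why a small pointwise error matters, here is a hedged toy example in Python. The choices F(x) = x², the sine wiggle, and ν = 0.01 are assumptions for illustration only. The erroneous f stays within ν of F everywhere, yet it has several shallow local minima where F has just one.

```python
import numpy as np

nu = 0.01  # hypothetical error level

def F(x):
    return x**2  # smooth target with a single minimum at 0

def f(x):
    # Erroneous version: a fast, small-amplitude wiggle, so |F(x) - f(x)| <= nu for all x.
    return F(x) + nu * np.sin(50 * x)

def count_local_minima(values):
    """Count grid points that are strictly lower than both neighbours."""
    inner = values[1:-1]
    return int(np.sum((inner < values[:-2]) & (inner < values[2:])))

xs = np.linspace(-1.0, 1.0, 20001)
gap = float(np.max(np.abs(F(xs) - f(xs))))
n_min_F = count_local_minima(F(xs))
n_min_f = count_local_minima(f(xs))
print(f"sup gap on grid = {gap:.4f}, local minima of F: {n_min_F}, of f: {n_min_f}")
```

The extra minima of f are only about ν deep, which is the sense in which they are “shallow”.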

SLIDE 8

Application

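Statistical Learning. Minimize the population risk R while having access only to the empirical risk $\hat{R}_n$:

$$R(\theta) = \mathbb{E}_{z \sim D}[L(\theta; z)], \qquad \hat{R}_n(\theta) = \frac{1}{n} \sum_{i=1}^{n} L(\theta; z_i).$$

Uniform convergence guarantees $\sup_\theta |R(\theta) - \hat{R}_n(\theta)| \le O(1/\sqrt{n})$.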
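A small numerical illustration in Python, under assumptions not taken from the slides: D = N(0, 1), squared loss L(θ; z) = (θ − z)², and θ restricted to a grid on [−1, 1]. In this toy case the population risk has the closed form R(θ) = θ² + 1, so the gap to the empirical risk can be measured directly.

```python
import numpy as np

def sup_risk_gap(n, rng, thetas):
    """Largest |R(theta) - R_hat_n(theta)| over a grid of theta values."""
    z = rng.standard_normal(n)           # z_1, ..., z_n drawn i.i.d. from D = N(0, 1)
    population = thetas**2 + 1.0         # R(theta) = E[(theta - z)^2] = theta^2 + 1
    empirical = np.array([np.mean((t - z) ** 2) for t in thetas])
    return float(np.max(np.abs(population - empirical)))

rng = np.random.default_rng(0)
thetas = np.linspace(-1.0, 1.0, 101)
gaps = {n: sup_risk_gap(n, rng, thetas) for n in (100, 10_000, 1_000_000)}
for n, gap in gaps.items():
    print(f"n = {n:>9,d}   sup gap = {gap:.4f}   1/sqrt(n) = {n ** -0.5:.4f}")
```

The measured gap plays the role of the error ν from the previous slide, with F = R and f = R̂n.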

SLIDE 11

Results

Goal: find ε-approximate local minima of F in polynomial time.

Central Questions:

  1. What algorithm can achieve this?
  2. How much error ν can be tolerated?

Zhang et al. [2017]: Stochastic Gradient Langevin Dynamics (SGLD) if $\nu \le \epsilon^2/d^8$.

This Work: Perturbed SGD on a “smoothed” version of f if $\nu \le \epsilon^{1.5}/d$.
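A rough Python sketch of what “perturbed SGD on a smoothed version of f” can look like, assuming Gaussian smoothing, i.e. the surrogate E[f(x + z)] with z ~ N(0, σ²I). Its gradient can be estimated from function values alone as z (f(x + z) − f(x)) / σ². The names `smoothed_grad` and `perturbed_sgd`, the test function, and all constants are hypothetical; the paper's actual algorithm and parameter choices may differ.

```python
import numpy as np

def smoothed_grad(f, x, sigma, rng, batch=64):
    """Estimate the gradient of E_z[f(x + z)], z ~ N(0, sigma^2 I), from values of f alone."""
    z = sigma * rng.standard_normal((batch, x.size))
    fx = f(x)
    diffs = np.array([f(x + zi) - fx for zi in z])
    return (z * diffs[:, None]).mean(axis=0) / sigma**2

def perturbed_sgd(f, x0, sigma=0.3, step=0.02, noise=1e-3, iters=1000, seed=0):
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        g = smoothed_grad(f, x, sigma, rng)
        xi = noise * rng.standard_normal(x.size)  # extra perturbation for escaping saddles
        x = x - step * (g + xi)
    return x

nu = 0.01  # hypothetical error level

def f(x):
    # F(x) = ||x||^2 plus a wiggle of size nu that creates shallow local minima.
    return float(np.sum(x**2) + nu * np.sin(50 * x[0]))

x_final = perturbed_sgd(f, [1.0, -0.5])
print("final iterate:", x_final)
```

Averaging over z washes out the fast wiggle, so the iterates follow the gradient of the underlying F and end up near its minimum at the origin instead of stalling in a shallow local minimum of f.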

SLIDE 14

Almost Sharp Guarantees

Are there better polynomial-time algorithms that tolerate larger error?

No! Complete characterization of error ν vs. accuracy ε and dimension d.

Poster: Wed 5-7 PM, #43. Thanks!
