MIT 9.520/6.860, Fall 2018 Statistical Learning Theory and Applications Class 06: Learning with Stochastic Gradients
Sasha Rakhlin
Why Optimization?

Much (but not all) of Machine Learning: write down an objective function involving the data and parameters, then find good parameters by optimizing it.
Example: least squares. With data X ∈ ℝ^{n×d}, Y ∈ ℝ^n and objective f(w) = ‖Y − Xw‖², the gradient is ∇f(w) = −2Xᵀ(Y − Xw) and the Hessian is 2XᵀX. Since f is quadratic, a single Newton step from any w₀ lands on the minimizer:

w₁ = w₀ + (XᵀX)⁻¹Xᵀ(Y − Xw₀) = (XᵀX)⁻¹XᵀY.

When examples arrive one at a time, XᵀX changes by a rank-one term x_t x_tᵀ, and its inverse can be maintained in O(d²) per step via the Sherman–Morrison formula:

(A + uvᵀ)⁻¹ = A⁻¹ − (A⁻¹ u vᵀ A⁻¹) / (1 + vᵀ A⁻¹ u).
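A minimal NumPy sketch of this recursive update (the function name sherman_morrison_update and the small ridge term lam are my additions, so the initial inverse exists; the slides only state the formula):

```python
import numpy as np

def sherman_morrison_update(A_inv, u, v):
    # (A + u v^T)^{-1} = A^{-1} - A^{-1} u v^T A^{-1} / (1 + v^T A^{-1} u)
    Au = A_inv @ u
    vA = v @ A_inv
    return A_inv - np.outer(Au, vA) / (1.0 + v @ Au)

rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.standard_normal((n, d))
Y = X @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

lam = 1e-3                   # small ridge term so the initial inverse exists
A_inv = np.eye(d) / lam      # (lam * I)^{-1}
b = np.zeros(d)              # running X^T Y
for x, y in zip(X, Y):
    A_inv = sherman_morrison_update(A_inv, x, x)  # O(d^2) per example
    b += y * x
w_online = A_inv @ b

# Matches the batch solution (X^T X + lam * I)^{-1} X^T Y.
w_batch = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)
print(np.allclose(w_online, w_batch))  # True
```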
Gradient descent: w_{t+1} = w_t − η ∇f(w_t). Suppose f is convex, ‖∇f(w)‖ ≤ G, and ‖w₁ − w∗‖ ≤ B. Convexity gives

f(w_t) − f(w∗) ≤ ∇f(w_t)ᵀ(w_t − w∗),

and expanding ‖w_{t+1} − w∗‖² = ‖w_t − η∇f(w_t) − w∗‖² yields the identity

2η ∇f(w_t)ᵀ(w_t − w∗) = ‖w_t − w∗‖² − ‖w_{t+1} − w∗‖² + η² ‖∇f(w_t)‖².

Summing over t = 1, …, T, the distances telescope:

(1/T) Σ_{t=1}^{T} ∇f(w_t)ᵀ(w_t − w∗) ≤ ‖w₁ − w∗‖² / (2ηT) + (η/2) G²,

and the choice η = B/(G√T) balances the two terms:

(1/T) Σ_{t=1}^{T} ∇f(w_t)ᵀ(w_t − w∗) ≤ BG/√T.

Combined with the convexity bound above, the average iterate w̄ = (1/T) Σ_t w_t satisfies f(w̄) − f(w∗) ≤ BG/√T.
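A quick numerical check of this rate, using f(w) = ‖w − w∗‖₂ as an assumed test function (convex and 1-Lipschitz, so G = 1; all variable names are mine). The averaged suboptimality stays within BG/√T, as the analysis predicts:

```python
import numpy as np

rng = np.random.default_rng(0)
d, T = 10, 10_000
w_star = rng.standard_normal(d)

def subgrad(w):
    # Subgradient of f(w) = ||w - w_star||_2 (zero at the minimizer).
    r = w - w_star
    nrm = np.linalg.norm(r)
    return r / nrm if nrm > 0 else np.zeros(d)

w = np.zeros(d)
B = np.linalg.norm(w - w_star)     # ||w_1 - w*|| <= B
G = 1.0
eta = B / (G * np.sqrt(T))         # step size from the analysis

gaps = []
for _ in range(T):
    gaps.append(np.linalg.norm(w - w_star))  # f(w_t) - f(w*), since f(w*) = 0
    w -= eta * subgrad(w)

# Average suboptimality is within the BG/sqrt(T) bound.
print(np.mean(gaps), B * G / np.sqrt(T))
```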
The gradient step can equivalently be written as minimizing the linearization of f plus a proximity term,

w_{t+1} = argmin_w { ∇f(w_t)ᵀ(w − w_t) + (1/(2η)) ‖w − w_t‖² },

since setting the gradient of the bracketed expression to zero recovers w_{t+1} = w_t − η ∇f(w_t).
Stochastic gradients: the empirical objective is an expectation over a uniformly random index,

(1/n) Σ_{i=1}^{n} ℓ(y_i, wᵀx_i) = E_{I∼unif[1:n]} ℓ(y_I, wᵀx_I).
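Because the objective is this expectation, the gradient of one uniformly drawn term is an unbiased estimate of the full gradient, which is what SGD exploits. A small sketch with the square loss ℓ(y, wᵀx) = (y − wᵀx)² (my choice of loss and step size, for illustration only):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 500, 5
X = rng.standard_normal((n, d))
y = X @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

def full_grad(w):
    # Gradient of (1/n) sum_i (y_i - w^T x_i)^2.
    return 2 * X.T @ (X @ w - y) / n

def stoch_grad(w):
    i = rng.integers(n)                   # I ~ unif[1:n]
    return 2 * (X[i] @ w - y[i]) * X[i]   # gradient of the I-th term alone

# Unbiasedness: the Monte Carlo average of stoch_grad approaches full_grad.
w = np.zeros(d)
est = np.mean([stoch_grad(w) for _ in range(100_000)], axis=0)
print(np.max(np.abs(est - full_grad(w))))  # small Monte Carlo error

# SGD with a decaying step size drifts toward the least-squares solution.
for t in range(1, 50_001):
    w -= (0.1 / np.sqrt(t)) * stoch_grad(w)
w_star = np.linalg.lstsq(X, y, rcond=None)[0]
print(np.linalg.norm(w - w_star))          # small
```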