Hyperparameter Optimization using Hyperopt - Yassine Alouini, Paul Coursaux



slide-1
SLIDE 1

Hyperparameter Optimization using Hyperopt

Yassine Alouini - Paul Coursaux 03/11/2016

@YassineAlouini

@qucit

slide-2
SLIDE 2

About us

Yassine

  • Data Scientist @ Qucit
  • Centrale Paris & Cambridge
  • Quora’s Top Writer 2016

Paul

  • Data Scientist @ Qucit
  • Centrale Paris
  • Market finance in London
  • Horse riding
slide-3
SLIDE 3

Outline

1. Hyperparameters in Machine Learning
2. How to Choose Hyperparameters?
3. Tree-structured Parzen Estimation Approach
4. Live-coding Example

slide-4
SLIDE 4
1. Hyperparameters in Machine Learning

slide-5
SLIDE 5

What are hyperparameters?

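Parameters (learned from the data during training):

Rent = a_1 × surface + a_2 × distance to city center + ...

Hyperparameters (set before training, here the regularization strength α):

RMSE_LASSO = RMSE + α × (|a_1| + …)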
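As a rough illustration of the distinction, the sketch below evaluates the penalized objective from this slide for a fixed set of coefficients and several values of α. The toy rent data, the coefficient values, and the function names are hypothetical; the objective follows the slide's simplified formula rather than any particular library's Lasso implementation.

```python
import math


def predict(coefs, features):
    """Linear model: sum of a_i * x_i (the a_i are the parameters)."""
    return sum(a * x for a, x in zip(coefs, features))


def rmse(coefs, rows, targets):
    """Root mean squared error of the linear model on the given rows."""
    errors = [(predict(coefs, row) - y) ** 2 for row, y in zip(rows, targets)]
    return math.sqrt(sum(errors) / len(errors))


def lasso_objective(coefs, rows, targets, alpha):
    """RMSE_LASSO = RMSE + alpha * (|a_1| + |a_2| + ...), as on the slide."""
    return rmse(coefs, rows, targets) + alpha * sum(abs(a) for a in coefs)


# Hypothetical toy data: (surface in m^2, distance to city center in km) -> rent
rows = [(30.0, 5.0), (50.0, 2.0), (80.0, 10.0)]
targets = [600.0, 1100.0, 1300.0]

# Parameters: normally learned by the training algorithm, fixed here for clarity.
coefs = [20.0, -10.0]

# Hyperparameter: chosen by us before training; it changes the objective itself.
for alpha in (0.0, 0.1, 1.0):
    print(alpha, lasso_objective(coefs, rows, targets, alpha))
```

With α = 0 the objective reduces to the plain RMSE; larger values of α penalize large coefficients more strongly.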

slide-6
SLIDE 6

The impact of hyperparameters

slide-7
SLIDE 7
2. How to choose hyperparameters?

slide-8
SLIDE 8

Cross validation

Cross-validation lets us choose the hyperparameter(s) that generalize best while making efficient use of the data.
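A minimal sketch of k-fold cross-validation used to pick a hyperparameter, written in plain Python. The one-coefficient ridge model, the toy data, and all function names are assumptions made for illustration; folds are contiguous and unshuffled to keep the example short.

```python
def k_fold_indices(n_samples, n_folds):
    """Yield (train_indices, validation_indices) for each contiguous fold."""
    fold_sizes = [
        n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
        for i in range(n_folds)
    ]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, val
        start += size


def cross_val_score(fit, loss, xs, ys, n_folds=5):
    """Average validation loss over the folds."""
    scores = []
    for train, val in k_fold_indices(len(xs), n_folds):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        scores.append(loss(model, [xs[i] for i in val], [ys[i] for i in val]))
    return sum(scores) / len(scores)


def make_ridge_fit(alpha):
    """Hypothetical model y = a * x, with a shrunk towards 0 by alpha."""
    def fit(xs, ys):
        return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + alpha)
    return fit


def mse(a, xs, ys):
    """Mean squared error of the prediction a * x."""
    return sum((a * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Toy data, roughly y = 2x with small noise.
xs = [float(i) for i in range(1, 11)]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 19.9]

candidates = [0.0, 0.1, 1.0, 10.0, 100.0]
scores = {
    alpha: cross_val_score(make_ridge_fit(alpha), mse, xs, ys)
    for alpha in candidates
}
best_alpha = min(scores, key=scores.get)
print(best_alpha, scores[best_alpha])
```

Every sample is used for validation exactly once and for training in all the other folds, which is what makes the procedure data-efficient.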

Figure credit: http://vinhkhuc.github.io/2015/03/01/how-many-folds-for-cross-validation.html

slide-9
SLIDE 9

How to choose the points to cross-validate?

Grid search vs. random search

Credits: https://medium.com/rants-on-machine-learning/smarter-parameter-sweeps-or-why-grid-search-is-plain-stupid-c17d97a0e881#.db7060phq https://districtdatalabs.silvrback.com/visual-diagnostics-for-more-informed-machine-learning-part-3
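The difference between the two strategies can be sketched as follows. The hyperparameter names and ranges are illustrative assumptions, not the values used in the talk.

```python
import itertools
import random


def grid_points(axes):
    """All combinations of the listed values (Cartesian product)."""
    names = list(axes)
    return [
        dict(zip(names, combo))
        for combo in itertools.product(*(axes[name] for name in names))
    ]


def random_points(bounds, n_points, seed=0):
    """Draw each hyperparameter uniformly and independently within its bounds."""
    rng = random.Random(seed)
    return [
        {name: rng.uniform(low, high) for name, (low, high) in bounds.items()}
        for _ in range(n_points)
    ]


grid = grid_points({
    "learning_rate": [0.01, 0.1, 0.3],
    "subsample": [0.5, 0.75, 1.0],
})
rand = random_points(
    {"learning_rate": (0.01, 0.3), "subsample": (0.5, 1.0)},
    n_points=9,
)

# Same budget of 9 evaluations, but the grid only ever tries 3 distinct
# values of each hyperparameter, while random search tries 9.
print(len({p["learning_rate"] for p in grid}))
print(len({p["learning_rate"] for p in rand}))
```

When only one of the hyperparameters really matters, random search therefore explores that dimension more finely for the same number of cross-validation runs.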

slide-10
SLIDE 10
3. Tree-structured Parzen Estimation Approach

slide-11
SLIDE 11

Sequential Model-based Global Optimization

slide-12
SLIDE 12

The Expected Improvement

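EI_{ε*}(α) = ∫ max(ε* − ε, 0) p_M(ε | α) dε

Here ε is the loss, ε* is the loss threshold to beat, and p_M(ε | α) is the surrogate model's predictive distribution of the loss for hyperparameter α.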
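To make the integral concrete, the sketch below evaluates the Expected Improvement for the special case where the surrogate's prediction for a given α is Gaussian with mean mu and standard deviation sigma. This Gaussian form is an assumption chosen for illustration (it is typical of Gaussian-process surrogates); TPE does not model the loss this way. The closed form is checked against a direct numerical integration of the formula above.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mu, sigma, eps_star):
    """Closed-form EI for a Normal(mu, sigma^2) predictive loss (minimization)."""
    z = (eps_star - mu) / sigma
    return (eps_star - mu) * normal_cdf(z) + sigma * normal_pdf(z)


def expected_improvement_numeric(mu, sigma, eps_star, n_steps=20000):
    """Midpoint-rule approximation of the EI integral."""
    low = mu - 10.0 * sigma   # the density is negligible below this point
    high = eps_star           # the integrand is zero above eps_star
    if high <= low:
        return 0.0
    width = (high - low) / n_steps
    total = 0.0
    for i in range(n_steps):
        eps = low + (i + 0.5) * width
        total += (eps_star - eps) * normal_pdf((eps - mu) / sigma) / sigma
    return total * width


# Hypothetical numbers: best loss so far 0.45, surrogate predicts 0.50 +/- 0.10.
print(expected_improvement(0.50, 0.10, 0.45))
print(expected_improvement_numeric(0.50, 0.10, 0.45))
```

A candidate can have a positive EI even when its predicted mean is worse than the current best, as long as the surrogate is uncertain enough about it.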

slide-13
SLIDE 13

How to Optimize the EI? (1)

slide-14
SLIDE 14

How to Optimize the EI? (2)

  • Lasso model on the Boston Housing Dataset
  • Distribution of the suggested αs
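The idea behind TPE can be sketched in one dimension: split the past trials into a "good" group (lowest losses) and a "bad" group, fit a Parzen (kernel density) estimator l(x) on the good points and g(x) on the bad ones, then propose the candidate that maximizes l(x) / g(x), which corresponds to maximizing the Expected Improvement in the TPE formulation. The code below is a simplified toy version, not hyperopt's implementation: the fixed bandwidth, the quantile gamma, the number of candidates, and the stand-in objective are all assumptions.

```python
import math
import random


def kde(points, bandwidth):
    """Return a Parzen (Gaussian kernel) density estimate built from points."""
    norm = len(points) * bandwidth * math.sqrt(2.0 * math.pi)

    def density(x):
        return sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points) / norm

    return density


def tpe_suggest(history, low, high, rng, gamma=0.25, n_candidates=24, bandwidth=0.5):
    """Propose the next x from (x, loss) pairs; needs at least 2 past trials."""
    ranked = sorted(history, key=lambda pair: pair[1])
    n_good = max(1, int(math.ceil(gamma * len(ranked))))
    good = [x for x, _ in ranked[:n_good]]
    bad = [x for x, _ in ranked[n_good:]]
    l, g = kde(good, bandwidth), kde(bad, bandwidth)
    # Sample candidates from l(x): pick a good point, add kernel noise, clip.
    candidates = [
        min(high, max(low, rng.gauss(rng.choice(good), bandwidth)))
        for _ in range(n_candidates)
    ]
    # Keep the candidate where good trials are dense and bad trials are sparse.
    return max(candidates, key=lambda x: l(x) / (g(x) + 1e-12))


def minimise(objective, low, high, n_startup=20, n_iter=40, seed=0):
    """Random startup trials followed by TPE-guided trials."""
    rng = random.Random(seed)
    history = []
    for _ in range(n_startup):
        x = rng.uniform(low, high)
        history.append((x, objective(x)))
    for _ in range(n_iter):
        x = tpe_suggest(history, low, high, rng)
        history.append((x, objective(x)))
    return history


def toy_loss(log10_alpha):
    """Hypothetical stand-in for a cross-validated loss, best at alpha = 0.01."""
    return (log10_alpha + 2.0) ** 2


history = minimise(toy_loss, low=-5.0, high=2.0)
best_x, best_loss = min(history, key=lambda pair: pair[1])
print("best alpha:", 10 ** best_x, "loss:", best_loss)
```

Plotting the x values in `history` would show the suggestions concentrating around the best region over time, which is the behavior the distribution of suggested αs on this slide illustrates.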

slide-15
SLIDE 15
4. Live-coding Example
slide-16
SLIDE 16

Description of the dataset

  • IMDb dataset
  • Dataset publicly available (from Kaggle)

Credits: screenshot, 24/10/2016, https://www.kaggle.com/deepmatrix/imdb-5000-movie-dataset

slide-17
SLIDE 17

Movies having the best score

Credits: http://www.impawards.com/1974/towering_inferno.html, http://www.impawards.com/1994/shawshank_redemption_ver1.html, http://ruthusher.com/wordpress/wp-includes/js/godfather-poster

slide-18
SLIDE 18

Movies having the worst score

Credits: https://en.wikipedia.org/wiki/Justin_Bieber:_Never_Say_Never, http://www.movieinsider.com/m766/foodfight, http://www.moviepostershop.com/superbabies-baby-geniuses-2-movie-poster-2004

slide-19
SLIDE 19

Task

  • Predict the IMDb movie score
  • Gradient Boosting algorithm (XGBoost package)
  • 3 hyperparameter optimization strategies:
      ○ A naive grid search
      ○ An expert grid search (*)
      ○ The TPE algorithm (hyperopt package)

(*) http://blog.kaggle.com/2016/07/21/approaching-almost-any-machine-learning-problem-abhishek-thakur/

slide-20
SLIDE 20

Features description

  • 28 features:
      ○ 14 movie-related
      ○ 4 review-related
      ○ 10 cast-related
  • 16 kept:
      ○ 11 numerical
      ○ 5 categorical
  • 12 removed
slide-21
SLIDE 21

Live demo

Our code is available here: https://github.com/yassineAlouini/hyperparameters-optimization-talk

slide-22
SLIDE 22

Conclusion

  • TPE (hyperopt) outperforms the standard methods (grid search, random search) in most cases
  • The choice of search space matters
  • Other Python libraries: Spearmint, BayesOpt, Scikit-Optimize
  • hyperopt supports distributed optimization (using MongoDB)
slide-23
SLIDE 23

Thanks for your attention.

Question time

Qucit is hiring!

slide-24
SLIDE 24

References

  • https://papers.nips.cc/paper/4443-algorithms-for-hyper-parameter-optimization.pdf
  • https://conference.scipy.org/proceedings/scipy2013/pdfs/bergstra_hyperopt.pdf
  • https://github.com/scikit-optimize
  • http://jaberg.github.io/hyperopt/
  • https://github.com/JasperSnoek/spearmint
  • https://github.com/fmfn/BayesianOptimization
  • http://xgboost.readthedocs.io/en/latest/
  • http://www.cs.ubc.ca/~hutter/papers/13-BayesOpt_EmpiricalFoundation.pdf