SLIDE 1

LSTM: A Search Space Odyssey

Presenters: Yijun Tian, Zhenyu Liu

Klaus Greff, Rupesh K. Srivastava, Jan Koutník, Bas R. Steunebrink, Jürgen Schmidhuber, 2015.

SLIDE 2

Abstract

  • In this paper, the authors analyze the performance of the LSTM and eight of its variants on three representative tasks: speech recognition, handwriting recognition, and polyphonic music modeling.
  • Hyperparameters for each variant were optimized individually using random search, and their importance was gauged using fANOVA (a tool for assessing hyperparameter importance).

SLIDE 3

Datasets

  • TIMIT: the TIMIT Speech corpus (speech recognition).
  • IAM Online: the IAM Online Handwriting Database (handwriting recognition).
  • JSB Chorales: a collection of 382 four-part harmonized chorales by J. S. Bach (polyphonic music modeling).

SLIDE 4

Vanilla LSTM

N: number of LSTM blocks per hidden layer; M: input size.
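
The slide's diagram is not reproduced here. For reference, the vanilla LSTM forward pass as defined in the paper (g and h are tanh, sigma is the logistic sigmoid; W_* are the N x M input weight matrices, R_* the N x N recurrent weight matrices, p_* the peephole weight vectors):

    \begin{aligned}
    z^t &= g(W_z x^t + R_z y^{t-1} + b_z)                          && \text{block input} \\
    i^t &= \sigma(W_i x^t + R_i y^{t-1} + p_i \odot c^{t-1} + b_i) && \text{input gate} \\
    f^t &= \sigma(W_f x^t + R_f y^{t-1} + p_f \odot c^{t-1} + b_f) && \text{forget gate} \\
    c^t &= z^t \odot i^t + c^{t-1} \odot f^t                       && \text{cell state} \\
    o^t &= \sigma(W_o x^t + R_o y^{t-1} + p_o \odot c^t + b_o)     && \text{output gate} \\
    y^t &= h(c^t) \odot o^t                                        && \text{block output}
    \end{aligned}

As a concrete illustration, a minimal NumPy sketch of one forward step (not the authors' code; lstm_step and the parameter layout are hypothetical names):

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def lstm_step(x, y_prev, c_prev, W, R, p, b):
        """One forward step of a vanilla LSTM block with peepholes.

        x: input (M,); y_prev: previous output (N,); c_prev: previous cell (N,).
        W[k]: (N, M), R[k]: (N, N), p[k]: (N,), b[k]: (N,) for k in "zifo"
        (peepholes p feed only the gates, not the block input z).
        """
        z = np.tanh(W["z"] @ x + R["z"] @ y_prev + b["z"])                    # block input
        i = sigmoid(W["i"] @ x + R["i"] @ y_prev + p["i"] * c_prev + b["i"])  # input gate
        f = sigmoid(W["f"] @ x + R["f"] @ y_prev + p["f"] * c_prev + b["f"])  # forget gate
        c = z * i + c_prev * f                                                # new cell state
        o = sigmoid(W["o"] @ x + R["o"] @ y_prev + p["o"] * c + b["o"])       # output gate
        y = np.tanh(c) * o                                                    # block output
        return y, c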

SLIDE 5

LSTM Variants

The eight variants evaluated in the paper, each changing a single aspect of the vanilla LSTM:

  • NIG: no input gate. NFG: no forget gate. NOG: no output gate.
  • NIAF: no input activation function. NOAF: no output activation function.
  • NP: no peephole connections. FGR: full gate recurrence.
  • CIFG: coupled input and forget gate (f^t = 1 - i^t).

SLIDE 6

Experiments

  • Performed 27 random searches (one for each combination of the nine variants and three datasets).
  • Each random search encompasses 200 trials of randomly sampling the following hyperparameters (a minimal sketch follows below):
  • Number of LSTM blocks per hidden layer.
  • Learning rate, momentum, and standard deviation of Gaussian input noise.
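
A minimal sketch of one such random search (not the authors' code; the sampling ranges and the train_and_eval routine are illustrative assumptions):

    import random

    def sample_trial():
        """Randomly sample one hyperparameter setting."""
        return {
            "hidden_size":   random.randint(20, 200),       # LSTM blocks per hidden layer
            "learning_rate": 10 ** random.uniform(-6, -2),  # sampled log-uniformly
            "momentum":      random.uniform(0.0, 0.99),
            "input_noise":   random.uniform(0.0, 1.0),      # std of Gaussian input noise
        }

    def random_search(train_and_eval, n_trials=200):
        """Run independent trials and keep the best validation result."""
        best = None
        for _ in range(n_trials):
            hp = sample_trial()
            error = train_and_eval(hp)  # trains one LSTM, returns validation error
            if best is None or error < best[0]:
                best = (error, hp)
        return best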

SLIDE 7

Results

NOAF and NFG perform significantly worse than the vanilla LSTM.

SLIDE 8

Results

Learning rate is the most important hyperparameter, followed by network size.
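
These estimates come from fANOVA, which fits a random-forest model of performance as a function of the hyperparameters and decomposes its variance. For intuition only, a simplified sketch of variance-based importance (this is not fANOVA; the function name and binning scheme are illustrative):

    import numpy as np

    def marginal_importance(values, errors, n_bins=10):
        """Fraction of error variance explained by one hyperparameter's
        binned marginal means. values, errors: arrays of length T (trials)."""
        edges = np.quantile(values, np.linspace(0, 1, n_bins + 1))
        which = np.clip(np.digitize(values, edges[1:-1]), 0, n_bins - 1)
        bin_means = np.array([errors[which == k].mean()
                              for k in range(n_bins) if np.any(which == k)])
        return bin_means.var() / errors.var()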

SLIDE 9

Conclusions and Insights

  • None of the variants improves upon the standard LSTM architecture significantly.
  • Coupling the input and forget gates (CIFG) or removing peephole connections (NP) are attractive simplifications that do not significantly hurt performance.
  • The forget gate and the output activation function are the most critical components of the LSTM block.
  • Learning rate and network size are the most important hyperparameters.
  • No apparent structure in hyperparameter interactions: they can be tuned virtually independently.

SLIDE 10

Thank you! Questions?

Take-home message: the most commonly used LSTM architecture (vanilla LSTM) performs reasonably well across a variety of datasets.