SLIDE 1
Natural Language Processing: Foundations and Applications
Lecture 11: Neural networks (2) Tim Van de Cruys & Philippe Muller 2016–2017
SLIDE 2 Introduction
Machine learning for NLP
- Standard approach: linear model trained over high-dimensional
but very sparse feature vectors
- Recently: non-linear neural networks over dense input vectors
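To make the contrast concrete, here is a minimal NumPy sketch of the two input representations; the vocabulary size, word index, and embedding dimension are illustrative assumptions, not course material:

```python
import numpy as np

vocab_size = 50_000   # hypothetical vocabulary size

# Sparse: a one-hot indicator vector, high-dimensional, almost all zeros.
sparse = np.zeros(vocab_size)
sparse[1234] = 1.0    # the feature "word #1234 is present"

# Dense: a low-dimensional embedding looked up from an embedding matrix
# (randomly initialised here; in practice it is trained).
emb_dim = 100
E = np.random.randn(vocab_size, emb_dim) * 0.01
dense = E[1234]       # 100-dimensional dense vector for the same word

print(sparse.shape, dense.shape)  # (50000,) (100,)
```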
SLIDE 3 Neural Network Architectures
Feed-forward neural networks
- Best known, standard neural network approach
- Fully connected layers
- Can be used as a drop-in replacement for typical NLP classifiers (see the sketch below)
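A minimal NumPy sketch of such a feed-forward classifier, with one fully connected hidden layer and a softmax output; all dimensions and the random initialisation are illustrative assumptions:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

d_in, d_hid, n_classes = 100, 50, 3
W1, b1 = np.random.randn(d_hid, d_in) * 0.01, np.zeros(d_hid)
W2, b2 = np.random.randn(n_classes, d_hid) * 0.01, np.zeros(n_classes)

x = np.random.randn(d_in)   # dense input vector (e.g. averaged embeddings)
h = np.tanh(W1 @ x + b1)    # fully connected hidden layer + non-linearity
y = softmax(W2 @ h + b2)    # probability distribution over classes
```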
SLIDE 4 Convolutional neural network
Introduction
- Type of feedforward neural network
- Certain layers are not fully connected but locally connected
(convolutional layers, pooling layers)
- the same local cues can appear in different places in the input (cf. vision)
SLIDE 5
Convolutional neural network
Intuition
SLIDE 6
Convolutional neural network
Intuition
SLIDE 7
Convolutional neural network
Intuition
SLIDE 8
Convolutional neural network
Architecture
SLIDE 9 Convolutional neural network
Encoding sentences
How to represent a variable number of features, e.g. the words in a sentence or document?
- Continuous Bag of Words (CBOW): sum the embedding vectors of the
corresponding features
- no ordering info ("not good quite bad" = "not bad quite good")
- Convolutional layer
- 'Sliding window' approach that takes local structure into account
- Combine individual windows to create vector of fixed size
SLIDE 10 Continuous bag of words
Variable number of features
- A feed-forward network assumes a fixed-dimensional input
- How to represent a variable number of features, e.g. the words in a
sentence or document?
- Continuous Bag of Words (CBOW): sum the embedding vectors of the
corresponding features (see the sketch below)
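A sketch of the CBOW encoding in NumPy; the vocabulary, word indices, and dimensions are made up for illustration. Note how any reordering of the words yields exactly the same vector:

```python
import numpy as np

emb_dim, vocab_size = 100, 50_000
E = np.random.randn(vocab_size, emb_dim) * 0.01   # embedding matrix

word_ids = [12, 407, 9001, 3]          # the words of the sentence, as indices
cbow = E[word_ids].sum(axis=0)         # fixed 100-dim vector, whatever the length

# Loss of order: any permutation of word_ids gives the same representation.
assert np.allclose(cbow, E[[3, 9001, 407, 12]].sum(axis=0))
```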
SLIDE 11 Convolutional neural network
Convolutional layer for NLP
- Goal: identify indicative local features (n-grams) in a large
structure and combine them into a fixed-size vector
- Convolution: apply a filter to each window (linear transformation
+ non-linear activation)
- Pooling: combine the window vectors by taking the maximum (max pooling), as sketched below
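A NumPy sketch of convolution plus max pooling over a sentence, under assumed sizes (window width k = 3, 64 filters); this illustrates the generic technique, not the course's exact setup:

```python
import numpy as np

emb_dim, k, n_filters = 100, 3, 64
sent = np.random.randn(10, emb_dim)            # 10 words, each a 100-dim embedding

W = np.random.randn(n_filters, k * emb_dim) * 0.01   # shared filter weights
b = np.zeros(n_filters)

windows = [sent[i:i + k].reshape(-1)           # concatenate k consecutive embeddings
           for i in range(len(sent) - k + 1)]
conv = np.tanh(np.stack(windows) @ W.T + b)    # one n_filters-dim vector per window

pooled = conv.max(axis=0)                      # max pooling: fixed-size sentence vector
print(pooled.shape)                            # (64,)
```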
SLIDE 12
Convolutional neural networks
Architecture for NLP
SLIDE 13 Neural Network Architectures
Recurrent (+ recursive) neural networks
- Handle structured data of arbitrary sizes
- Recurrent networks for sequences
- Recursive networks for trees
SLIDE 14 Recurrent neural network
Introduction
- CBOW: no ordering, no structure
- CNN: improvement, but mostly local patterns
- RNN: represent arbitrarily sized structured input as fixed-size
vectors, while paying attention to its structural properties
SLIDE 15 Recurrent neural network
Model
- x1: input layer (current word)
- a1: hidden layer of current timestep
- a0: hidden layer of previous timestep
- U, W and V: weight matrices
- f(·): element-wise activation function (sigmoid)
- g(·): softmax function to ensure probability distribution
a1 = f(U x1 + W a0)   (1)
y1 = g(V a1)          (2)
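A direct NumPy transcription of equations (1) and (2), unrolled over a short input sequence; the dimensions are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

d_in, d_hid, d_out = 100, 50, 10
U = np.random.randn(d_hid, d_in) * 0.01
W = np.random.randn(d_hid, d_hid) * 0.01
V = np.random.randn(d_out, d_hid) * 0.01

xs = [np.random.randn(d_in) for _ in range(5)]   # 5 input word vectors
a = np.zeros(d_hid)                              # a0: initial hidden state
for x in xs:
    a = sigmoid(U @ x + W @ a)                   # eq. (1): new hidden state
    y = softmax(V @ a)                           # eq. (2): distribution at this step
```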
SLIDE 16
Recurrent neural network
Graphical representation
SLIDE 17 Recurrent neural network
Training
- Consider the recurrent neural network as a very deep neural network
with parameters shared across time steps
- Backpropagation through time
- What kind of supervision?
- Acceptor: prediction based on the final state
- Transducer: an output for each input (e.g. language modeling)
- Encoder-decoder: one RNN to encode the sequence into a vector
representation, another RNN to decode it into a sequence (e.g. machine translation)
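A small sketch of where supervision attaches in the first two settings; the rnn_step function and all dimensions are assumptions for illustration:

```python
import numpy as np

d_in, d_hid = 100, 50
U = np.random.randn(d_hid, d_in) * 0.01
W = np.random.randn(d_hid, d_hid) * 0.01

def rnn_step(x, a):
    return np.tanh(U @ x + W @ a)

xs = [np.random.randn(d_in) for _ in range(5)]
a = np.zeros(d_hid)
states = []
for x in xs:
    a = rnn_step(x, a)
    states.append(a)

final_state = states[-1]   # acceptor: supervise one prediction from this vector
per_step = states          # transducer: supervise one prediction per input
```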
SLIDE 18
Recurrent neural network
Training: graphical representation
SLIDE 19 Recurrent neural network
Multi-layer RNN
- Multiple layers of RNNs
- The input of the next layer is the output of the RNN layer below it
- Empirically shown to work better
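A sketch of the stacking idea in NumPy (assumed dimensions): the hidden-state sequence of one layer becomes the input sequence of the next:

```python
import numpy as np

def run_rnn(xs, U, W):
    a = np.zeros(W.shape[0])
    out = []
    for x in xs:
        a = np.tanh(U @ x + W @ a)
        out.append(a)
    return out

d_in, d_hid = 100, 50
xs = [np.random.randn(d_in) for _ in range(5)]

U1, W1 = np.random.randn(d_hid, d_in) * 0.01, np.random.randn(d_hid, d_hid) * 0.01
U2, W2 = np.random.randn(d_hid, d_hid) * 0.01, np.random.randn(d_hid, d_hid) * 0.01

layer1 = run_rnn(xs, U1, W1)       # first RNN layer over the input words
layer2 = run_rnn(layer1, U2, W2)   # second layer reads the states below it
```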
SLIDE 20 Recurrent neural network
Bi-directional RNN
- Feed the input sequence both forward and backward to two different RNNs
- The representation is the concatenation of the forward and backward states
(A & A')
- Represents both history and future
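A NumPy sketch of the bi-directional scheme under assumed dimensions; A and A' on the slide correspond to fwd and bwd below:

```python
import numpy as np

def run_rnn(xs, U, W):
    a = np.zeros(W.shape[0])
    out = []
    for x in xs:
        a = np.tanh(U @ x + W @ a)
        out.append(a)
    return out

d_in, d_hid = 100, 50
xs = [np.random.randn(d_in) for _ in range(5)]

Uf, Wf = np.random.randn(d_hid, d_in) * 0.01, np.random.randn(d_hid, d_hid) * 0.01
Ub, Wb = np.random.randn(d_hid, d_in) * 0.01, np.random.randn(d_hid, d_hid) * 0.01

fwd = run_rnn(xs, Uf, Wf)                    # history: left-to-right states
bwd = run_rnn(xs[::-1], Ub, Wb)[::-1]        # future: right-to-left, re-aligned

bi = [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]  # 100-dim per position
```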
SLIDE 21
Concrete RNN architectures
Simple RNN
SLIDE 22 Concrete RNN architectures
LSTM
- Long short-term memory networks
- In practice, simple RNNs are only able to remember a narrow context
(vanishing gradients)
- LSTM: a more complex architecture able to capture long-term
dependencies (see the cell sketch below)
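A minimal LSTM cell in NumPy, following the standard formulation; gate names and dimensions are illustrative and may differ from the course's own notation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

d_in, d_hid = 100, 50
rng = np.random.default_rng(0)
# one weight matrix per gate, acting on the concatenation [x; h_prev]
Wi, Wf, Wo, Wg = (rng.normal(0, 0.01, (d_hid, d_in + d_hid)) for _ in range(4))
bi = bf = bo = bg = np.zeros(d_hid)

def lstm_step(x, h_prev, c_prev):
    z = np.concatenate([x, h_prev])
    i = sigmoid(Wi @ z + bi)          # input gate: what to write
    f = sigmoid(Wf @ z + bf)          # forget gate: what to keep from c_prev
    o = sigmoid(Wo @ z + bo)          # output gate: what to expose
    g = np.tanh(Wg @ z + bg)          # candidate values
    c = f * c_prev + i * g            # updated memory cell
    h = o * np.tanh(c)                # new hidden state
    return h, c

h, c = np.zeros(d_hid), np.zeros(d_hid)
for x in (rng.normal(size=d_in) for _ in range(5)):
    h, c = lstm_step(x, h, c)
```

The additive update of the memory cell c is what lets gradients flow over long distances instead of vanishing.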
SLIDE 23
Concrete RNN architectures
LSTM
SLIDE 24
Concrete RNN architectures
LSTM
SLIDE 25
Concrete RNN architectures
LSTM
SLIDE 26
Concrete RNN architectures
LSTM
SLIDE 27
Concrete RNN architectures
LSTM
SLIDE 28
Concrete RNN architectures
LSTM
SLIDE 29 Concrete RNN architectures
GRU
- LSTM: effective, but complex and computationally expensive
- GRU: a cheaper alternative that works well in practice
SLIDE 30 Concrete RNN architectures
GRU
- reset gate (r): how much information from the previous hidden
state needs to be included (reset with the current information?)
- update gate (z): controls updates to the hidden state (how much
does the hidden state need to be updated with the current information?)
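A minimal GRU cell in NumPy under one common formulation (the direction of the update interpolation varies between papers); names and dimensions are illustrative:

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

d_in, d_hid = 100, 50
rng = np.random.default_rng(0)
Wr, Wz, Wh = (rng.normal(0, 0.01, (d_hid, d_in)) for _ in range(3))
Ur, Uz, Uh = (rng.normal(0, 0.01, (d_hid, d_hid)) for _ in range(3))

def gru_step(x, h_prev):
    r = sigmoid(Wr @ x + Ur @ h_prev)             # reset gate
    z = sigmoid(Wz @ x + Uz @ h_prev)             # update gate
    h_cand = np.tanh(Wh @ x + Uh @ (r * h_prev))  # candidate, with reset history
    return (1 - z) * h_prev + z * h_cand          # interpolate old and new state
```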
SLIDE 31 Recursive neural networks
Introduction
- Generalization of RNNs from sequences to (binary) trees
- A linear transformation + non-linear activation function applied
recursively throughout a tree (sketched below)
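A NumPy sketch of the recursive composition over a binary tree; the nested-tuple tree encoding and all dimensions are illustrative assumptions:

```python
import numpy as np

d = 100
rng = np.random.default_rng(0)
W = rng.normal(0, 0.01, (d, 2 * d))
b = np.zeros(d)

def compose(left, right):
    # same linear transformation + non-linearity at every tree node
    return np.tanh(W @ np.concatenate([left, right]) + b)

def encode(tree):
    # a tree is either a leaf (a word embedding) or a pair of subtrees
    if isinstance(tree, tuple):
        return compose(encode(tree[0]), encode(tree[1]))
    return tree

w1, w2, w3 = (rng.normal(size=d) for _ in range(3))
root = encode(((w1, w2), w3))   # fixed-size vector for the whole tree
```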
SLIDE 32
Application
Image to caption generation
SLIDE 33
Application
Image to caption generation
SLIDE 34
Application
Neural machine translation
SLIDE 35
Application
Neural machine translation
SLIDE 36
Application
Neural dialogue generation (chatbot)
SLIDE 37 Software
- TensorFlow
- Python, C++
- http://www.tensorflow.org
- Theano
- Python
- http://deeplearning.net/software/theano/
- Keras
- Theano/TensorFlow-based modular deep learning library
- Lasagne
- Theano-based deep learning library