SLIDE 1

Traitement automatique des langues : Fondements et applications
(Natural language processing: foundations and applications)

Lecture 11: Neural networks (2)

Tim Van de Cruys & Philippe Muller

2016–2017

SLIDE 2

Introduction

Machine learning for NLP

  • Standard approach: linear models trained over high-dimensional but very sparse feature vectors
  • Recently: non-linear neural networks over dense input vectors (see the sketch below)
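
A toy numpy illustration of the sparse/dense contrast; the vocabulary size, word index, and embedding dimension are made-up values, not from the slides:

```python
import numpy as np

# Sparse, high-dimensional: a one-hot indicator feature over a toy 50k-word vocabulary
vocab_size = 50_000
sparse = np.zeros(vocab_size)
sparse[4242] = 1.0                 # only the observed feature fires

# Dense, low-dimensional: a learned embedding for the same word
E = np.random.default_rng(0).normal(size=(vocab_size, 100))
dense = E[4242]                    # all 100 dimensions carry information
```
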
SLIDE 3

Neural Network Architectures

Feed-forward neural networks

  • The best-known, standard neural network approach
  • Fully connected layers
  • Can be used as a drop-in replacement for typical NLP classifiers

SLIDE 4

Convolutional neural network

Introduction

  • A type of feed-forward neural network
  • Certain layers are not fully connected but locally connected (convolutional layers, pooling layers)
  • The same local cues appear in different places in the input (cf. vision)

SLIDES 5–7

Convolutional neural network

Intuition

(figures only)

SLIDE 8

Convolutional neural network

Architecture

SLIDE 9

Convolutional neural network

Encoding sentences

How do we represent a variable number of features, e.g. the words in a sentence or document?

  • Continuous Bag of Words (CBOW): sum the embedding vectors of the corresponding features
  • No ordering information ("not good quite bad" = "not bad quite good")
  • Convolutional layer: a 'sliding window' approach that takes local structure into account
  • Combine the individual windows to create a vector of fixed size

SLIDE 10

Continuous bag of words

Variable number of features

  • A feed-forward network assumes fixed-dimensional input
  • How do we represent a variable number of features, e.g. the words in a sentence or document?
  • Continuous Bag of Words (CBOW): sum the embedding vectors of the corresponding features (see the sketch below)
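
A minimal numpy sketch of CBOW; the toy vocabulary and random embeddings are ours, purely for illustration:

```python
import numpy as np

# Toy vocabulary and random embeddings; all names here are illustrative
vocab = {"not": 0, "good": 1, "quite": 2, "bad": 3}
E = np.random.default_rng(0).normal(size=(len(vocab), 4))

def cbow(words):
    """Sum the embedding vectors of the words; word order is lost."""
    return sum(E[vocab[w]] for w in words)

# The two phrases from the previous slide get exactly the same vector
v1 = cbow("not good quite bad".split())
v2 = cbow("not bad quite good".split())
print(np.allclose(v1, v2))  # True
```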

SLIDE 11

Convolutional neural network

Convolutional layer for NLP

  • Goal: identify indicative local features (n-grams) in large

structure, combine them into fixed size vector

  • Convolution: apply filter to each window (linear transformation

+ non-linear activation)

  • Pooling: combine by taking maximum
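
A minimal numpy sketch of convolution plus max pooling over a sentence of word embeddings; the dimensions and random filters are toy values of ours:

```python
import numpy as np

rng = np.random.default_rng(0)
emb_dim, win, n_filters = 4, 3, 5
sentence = rng.normal(size=(10, emb_dim))          # 10 words, 4-d embeddings
F = rng.normal(size=(win * emb_dim, n_filters))    # one linear filter per feature map

# Convolution: the same filters slide over every window of `win` consecutive words
windows = np.stack([sentence[i:i + win].ravel()
                    for i in range(len(sentence) - win + 1)])
feature_maps = np.tanh(windows @ F)                # linear transformation + non-linearity

# Max pooling: keep the strongest response of each filter -> fixed-size vector
pooled = feature_maps.max(axis=0)
print(pooled.shape)                                # (5,), whatever the sentence length
```
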
SLIDE 12

Convolutional neural networks

Architecture for NLP

SLIDE 13

Neural Network Architectures

Recurrent (+ recursive) neural networks

  • Handle structured data of arbitrary sizes
  • Recurrent networks for sequences
  • Recursive networks for trees

SLIDE 14

Recurrent neural network

Introduction

  • CBOW: no ordering, no structure
  • CNN: an improvement, but captures mostly local patterns
  • RNN: represent arbitrarily sized structured input as fixed-size vectors, paying attention to its structural properties

SLIDE 15

Recurrent neural network

Model

  • x1: input layer (current word)
  • a1: hidden layer of the current timestep
  • a0: hidden layer of the previous timestep
  • U, W and V: weight matrices
  • f(·): element-wise activation function (sigmoid)
  • g(·): softmax function to ensure a probability distribution

a1 = f(U x1 + W a0)    (1)
y1 = g(V a1)           (2)
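
Equations (1) and (2) as a runnable numpy sketch; the dimensions and random weights are toy values of ours:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, hidden = 6, 8
U = rng.normal(size=(hidden, vocab_size))  # input -> hidden
W = rng.normal(size=(hidden, hidden))      # previous hidden -> hidden
V = rng.normal(size=(vocab_size, hidden))  # hidden -> output

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def rnn_step(x1, a0):
    """Equations (1) and (2): a1 = f(U x1 + W a0), y1 = g(V a1)."""
    a1 = sigmoid(U @ x1 + W @ a0)
    y1 = softmax(V @ a1)
    return a1, y1

x1 = np.eye(vocab_size)[2]   # one-hot vector for the current word
a0 = np.zeros(hidden)        # hidden state of the previous timestep
a1, y1 = rnn_step(x1, a0)
print(y1.sum())              # 1.0: a proper probability distribution
```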

SLIDE 16

Recurrent neural network

Graphical representation

SLIDE 17

Recurrent neural network

Training

  • Consider a recurrent neural network as a very deep neural network with shared parameters across the computation
  • Backpropagation through time
  • What kind of supervision?
  • Acceptor: a single loss based on the final state
  • Transducer: an output for each input (e.g. language modeling; see the sketch below)
  • Encoder-decoder: one RNN encodes the sequence into a vector representation, another RNN decodes it into a sequence (e.g. machine translation)
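
A sketch of transducer-style supervision, reusing rnn_step, hidden and vocab_size from the model sketch above (our toy setup, not the course's):

```python
# Transducer: one output (and one loss term) per input position,
# as in language modeling.
def transducer_loss(inputs, target_ids):
    a, loss = np.zeros(hidden), 0.0
    for x, t in zip(inputs, target_ids):
        a, y = rnn_step(x, a)     # the same U, W, V are shared at every timestep
        loss -= np.log(y[t])      # cross-entropy against the target word id
    return loss / len(inputs)

# An acceptor would instead compute a single loss from the final state a;
# an encoder-decoder would feed that final state into a second, decoding RNN.
```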

SLIDE 18

Recurrent neural network

Training: graphical representation

SLIDE 19

Recurrent neural network

Multi-layer RNN

  • Multiple layers of RNNs
  • The input of the next layer is the output of the RNN layer below it (see the sketch below)
  • Empirically shown to work better
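
A minimal sketch of two stacked RNN layers; the tanh activations and toy dimensions are our choices:

```python
import numpy as np

rng = np.random.default_rng(0)
emb, hidden = 4, 3
U1, W1 = rng.normal(size=(hidden, emb)),    rng.normal(size=(hidden, hidden))
U2, W2 = rng.normal(size=(hidden, hidden)), rng.normal(size=(hidden, hidden))

def deep_rnn(inputs):
    """Two stacked RNN layers: layer 2 reads the hidden states of layer 1."""
    a1 = a2 = np.zeros(hidden)
    for x in inputs:
        a1 = np.tanh(U1 @ x + W1 @ a1)    # layer 1 reads the word vectors
        a2 = np.tanh(U2 @ a1 + W2 @ a2)   # layer 2's input is layer 1's output
    return a2

print(deep_rnn([rng.normal(size=emb) for _ in range(5)]).shape)  # (3,)
```
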
SLIDE 20

Recurrent neural network

Bi-directional RNN

  • Feed the input sequence both forward and backward to two different RNNs
  • The representation is the concatenation of the forward and backward states (A & A')
  • Represents both history and future (see the sketch below)
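
A minimal numpy sketch of a bi-directional RNN; the simple tanh cells and toy dimensions are ours:

```python
import numpy as np

rng = np.random.default_rng(0)
emb, hidden = 4, 3
Uf, Wf = rng.normal(size=(hidden, emb)), rng.normal(size=(hidden, hidden))
Ub, Wb = rng.normal(size=(hidden, emb)), rng.normal(size=(hidden, hidden))

def birnn(inputs):
    """Per position, concatenate the forward state (history) and backward state (future)."""
    fwd, a = [], np.zeros(hidden)
    for x in inputs:               # left-to-right RNN (A)
        a = np.tanh(Uf @ x + Wf @ a)
        fwd.append(a)
    bwd, a = [], np.zeros(hidden)
    for x in reversed(inputs):     # right-to-left RNN (A')
        a = np.tanh(Ub @ x + Wb @ a)
        bwd.append(a)
    bwd.reverse()
    return [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]

states = birnn([rng.normal(size=emb) for _ in range(5)])
print(states[0].shape)             # (6,) = 3 forward + 3 backward
```
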
SLIDE 21

Concrete RNN architectures

Simple RNN

SLIDE 22

Concrete RNN architectures

LSTM

  • Long short-term memory networks
  • In practice, simple RNNs are only able to remember a narrow context (vanishing gradients)
  • LSTM: a more complex architecture able to capture long-term dependencies (see the sketch below)
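
The slides show the LSTM only graphically; below is a standard formulation as a numpy sketch (biases omitted, toy dimensions ours):

```python
import numpy as np

rng = np.random.default_rng(0)
emb, hidden = 4, 3
# One weight matrix per gate, acting on [previous hidden state; current input]
Wi, Wf, Wo, Wc = (rng.normal(size=(hidden, hidden + emb)) for _ in range(4))
sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))

def lstm_step(x, h_prev, c_prev):
    z = np.concatenate([h_prev, x])
    i = sigmoid(Wi @ z)                   # input gate: what to write to the cell
    f = sigmoid(Wf @ z)                   # forget gate: what to keep of the old cell
    o = sigmoid(Wo @ z)                   # output gate: what to expose as hidden state
    c = f * c_prev + i * np.tanh(Wc @ z)  # cell state: the long-term memory
    h = o * np.tanh(c)
    return h, c

h, c = np.zeros(hidden), np.zeros(hidden)
for x in (rng.normal(size=emb) for _ in range(5)):
    h, c = lstm_step(x, h, c)
```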

SLIDES 23–28

Concrete RNN architectures

LSTM

(figures only)

SLIDE 29

Concrete RNN architectures

GRU

  • LSTM: effective, but complex and computationally expensive
  • GRU: a cheaper alternative that works well in practice

SLIDE 30

Concrete RNN architectures

GRU

  • Reset gate (r): how much information from the previous hidden state is included (reset with the current information?)
  • Update gate (z): controls how much the hidden state is updated with the current information (see the sketch below)
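
A standard GRU cell as a numpy sketch (biases omitted; dimensions are toy values of ours):

```python
import numpy as np

rng = np.random.default_rng(0)
emb, hidden = 4, 3
Wr, Wz, Wh = (rng.normal(size=(hidden, hidden + emb)) for _ in range(3))
sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))

def gru_step(x, h_prev):
    z_in = np.concatenate([h_prev, x])
    r = sigmoid(Wr @ z_in)                                  # reset gate
    z = sigmoid(Wz @ z_in)                                  # update gate
    h_cand = np.tanh(Wh @ np.concatenate([r * h_prev, x]))  # candidate state
    return (1 - z) * h_prev + z * h_cand                    # interpolate old and new

h = np.zeros(hidden)
for x in (rng.normal(size=emb) for _ in range(5)):
    h = gru_step(x, h)
```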

SLIDE 31

Recursive neural networks

Introduction

  • Generalization of RNNs from sequences to (binary) trees
  • A linear transformation + non-linear activation function is applied recursively throughout a tree (see the sketch below)
  • Useful for parsing
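
A minimal sketch of recursive composition over a binary tree; the tree encoding and toy vectors are ours, purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 4
W = rng.normal(size=(dim, 2 * dim))   # combines two child vectors into one parent vector

def compose(node):
    """Recursively build a vector for every node of a binary tree.
    Leaves are word vectors; internal nodes are (left, right) pairs."""
    if isinstance(node, np.ndarray):
        return node
    left, right = node
    return np.tanh(W @ np.concatenate([compose(left), compose(right)]))

# e.g. the parse ((the, cat), sleeps), with random vectors standing in for embeddings
the, cat, sleeps = (rng.normal(size=dim) for _ in range(3))
root = compose(((the, cat), sleeps))
print(root.shape)                     # (4,): one fixed-size vector for the whole tree
```
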
SLIDES 32–33

Application

Image to caption generation

(figures only)

SLIDES 34–35

Application

Neural machine translation

(figures only)

SLIDE 36

Application

Neural dialogue generation (chatbot)

SLIDE 37

Software

  • Tensorflow
      • Python, C++
      • http://www.tensorflow.org
  • Theano
      • Python
      • http://deeplearning.net/software/theano/
  • Keras
      • Theano/Tensorflow-based modular deep learning library
  • Lasagne
      • Theano-based deep learning library