Basis of Neural Networks


SLIDE 1

School of Data Science, Fudan University

DATA130006 Text Management and Analysis

Basis of Neural Networks

Zhongyu Wei

  • Dec. 20th, 2017
SLIDE 2

General Neural Architectures for NLP

  • 1. Represent the words/features with dense vectors (embeddings) via a lookup table
  • 2. Concatenate the vectors
  • 3. Pass the result through multi-layer neural networks (see the sketch below), for:
    § Classification
    § Matching
    § Ranking

  • R. Collobert et al., “Natural Language Processing (Almost) from Scratch”
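The pipeline above is easy to make concrete. Below is a minimal NumPy sketch of the three steps; all sizes (vocab_size, emb_dim, window, hidden, n_classes) and the random parameters are illustrative assumptions, not values from the slides.

```python
# A minimal NumPy sketch of the lookup -> concatenate -> MLP pipeline.
import numpy as np

rng = np.random.default_rng(0)
vocab_size, emb_dim, window, hidden, n_classes = 10_000, 50, 5, 100, 3

E = rng.normal(size=(vocab_size, emb_dim))           # 1. embedding lookup table
W1 = 0.1 * rng.normal(size=(hidden, window * emb_dim))
b1 = np.zeros(hidden)
W2 = 0.1 * rng.normal(size=(n_classes, hidden))
b2 = np.zeros(n_classes)

word_ids = rng.integers(0, vocab_size, size=window)  # a 5-word input window
x = E[word_ids].reshape(-1)                          # 2. concatenate the vectors
h = np.tanh(W1 @ x + b1)                             # 3. multi-layer network
scores = W2 @ h + b2                                 # scores for classification/matching/ranking
print(scores)
```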
SLIDE 3

Machine Learning

§ Machine learning explores the study and construction of algorithms that can learn from and make predictions on data. (from Wikipedia)

SLIDE 4

Formal Specification of Machine Learning

§ Input Data: (x_j, y_j), 1 ≤ j ≤ n
§ Model:
  § Linear Model: y = f(x) = wᵀx + b
  § Generalized Linear Model: y = f(x) = wᵀφ(x) + b
  § Non-linear Model: Neural Network
§ Criterion:
  § Loss Function: L(y, f(x)) → Optimization
  § Empirical Risk: R(θ) = (1/n) Σ_{j=1}^{n} L(y_j, f(x_j; θ)) → Minimization
  § Regularization: ‖θ‖
§ Objective Function: R(θ) + λ‖θ‖²
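As a concrete instance of this specification, here is a small NumPy sketch of the objective R(θ) + λ‖θ‖² for a linear model with squared loss; the data, weights, and λ are made-up placeholders.

```python
# Regularized empirical risk for a linear model with squared loss.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))                             # inputs x_j
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + 0.1 * rng.normal(size=100)  # targets y_j

def objective(w, b, lam=0.1):
    pred = X @ w + b                       # linear model f(x) = w^T x + b
    risk = np.mean((y - pred) ** 2)        # empirical risk R(theta)
    return risk + lam * np.sum(w ** 2)     # plus L2 regularization

print(objective(np.zeros(4), 0.0))
```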

SLIDE 5

Linear Classifier

f(x; W) = Wx + b

SLIDE 6

Generalized Linear Classification

§ Hypothesis is a logistic function of a linear combination of inputs:

  z = wᵀx + b,  F(x) = 1 / (1 + exp(−z))

§ We can interpret F(x) as P(y = 1 | x)
§ Then the log-odds ratio, ln [ P(y = 1 | x) / P(y = 0 | x) ] = wᵀx + b, is linear in x
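A short NumPy sketch of the hypothesis above; w, b, and x are made-up example values, and the last line checks that the log-odds recovers the linear score z.

```python
# Logistic hypothesis: F(x) = 1 / (1 + exp(-(w^T x + b))).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w, b = np.array([2.0, -1.0]), 0.5
x = np.array([1.0, 3.0])

z = w @ x + b                   # linear combination of inputs
p = sigmoid(z)                  # interpreted as P(y=1 | x)
log_odds = np.log(p / (1 - p))  # recovers z: the log-odds is linear in x
print(p, log_odds, z)
```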

SLIDE 7

Softmax

§ Softmax regression is a generalization of logistic regression to multi-class classification problems
§ With softmax, the posterior probability of y = c is:

  P(y = c | x) = softmax(w_cᵀx) = exp(w_cᵀx) / Σ_{i=1}^{C} exp(w_iᵀx)

§ To represent class c by a one-hot vector:

  y = [I(1 = c), I(2 = c), …, I(C = c)]ᵀ

§ where I(·) is the indicator function
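A NumPy sketch of the softmax posterior above; W stacks one weight vector w_c per class, and all values are made-up examples. Subtracting the max score before exponentiating is a standard numerical-stability trick, not something the slide states.

```python
# Softmax posterior: P(y=c | x) proportional to exp(w_c^T x).
import numpy as np

def softmax(scores):
    scores = scores - scores.max()         # subtract max for numerical stability
    e = np.exp(scores)
    return e / e.sum()

W = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])    # 3 classes, 2 features
x = np.array([2.0, -1.0])

p = softmax(W @ x)                                    # posterior over the 3 classes
one_hot = (np.arange(3) == p.argmax()).astype(float)  # one-hot for the predicted class
print(p, p.sum(), one_hot)
```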

SLIDE 8

Examples of word classification

  • x: D × 1 input vector
  • W: K × D weight matrix
  • b: K × 1 bias vector
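With these shapes, the score computation is a single matrix-vector product. A minimal sketch, assuming example sizes D = 4 and K = 3:

```python
# scores = W x + b: x is D x 1, W is K x D, b is K x 1, so scores is K x 1
# (one score per class). D and K are example sizes, not the slide's values.
import numpy as np

D, K = 4, 3
rng = np.random.default_rng(0)
x = rng.normal(size=(D, 1))                # D x 1 input vector
W = rng.normal(size=(K, D))                # K x D weight matrix
b = np.zeros((K, 1))                       # K x 1 bias

scores = W @ x + b                         # K x 1 class scores
print(scores.shape)                        # (3, 1)
```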

SLIDE 9

How to learn W?

  R(θ) = (1/n) Σ_{j=1}^{n} L(y_j, f(x_j; θ))

§ Hinge Loss (SVM)
§ Softmax loss: cross-entropy loss
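A sketch of both losses for a single example with true class c; the scores are made-up. The multiclass hinge formulation with margin 1 is one common choice; the slide's exact variant is not recoverable from this extraction.

```python
# Hinge (SVM) loss vs. cross-entropy (softmax) loss for one example.
import numpy as np

scores = np.array([2.0, -1.0, 0.5])   # one score per class (made-up)
c = 0                                 # true class index

# Multiclass hinge loss: penalize classes that come within margin 1 of the true class.
margins = np.maximum(0, scores - scores[c] + 1.0)
margins[c] = 0.0                      # no penalty against the true class itself
hinge = margins.sum()

# Cross-entropy loss: negative log-probability of the true class under softmax.
probs = np.exp(scores - scores.max())
probs /= probs.sum()
cross_entropy = -np.log(probs[c])

print(hinge, cross_entropy)
```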

SLIDE 10

SVM vs Softmax (Quiz)

SLIDE 11

Parameter Learning

§ In ML, our objective is to learn the parameter θ that minimizes the loss function.
§ How to learn θ?

SLIDE 12

Gradient Descent

§ Gradient Descent: θ ← θ − α ∇θ R(θ)
§ α is also called the Learning Rate in ML.
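A minimal gradient-descent loop for a squared-loss linear model, applying the update θ ← θ − α ∇θ R(θ); the data, learning rate, and iteration count are assumptions for illustration.

```python
# Gradient descent on mean squared error for a linear model.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -1.0, 2.0])         # targets from a known linear rule

w, alpha = np.zeros(3), 0.1                # init parameters; alpha = learning rate
for _ in range(200):
    grad = 2 * X.T @ (X @ w - y) / len(y)  # gradient of the mean squared error
    w = w - alpha * grad                   # the gradient-descent update
print(w)                                   # approaches [1, -1, 2]
```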

SLIDE 13

Gradient Descent

SLIDE 14

Learning Rate

SLIDE 15

Gradient Descent

SLIDE 16

Stochastic Gradient Descent (SGD)
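The slide's figures are not recoverable here; as a sketch of the idea, the loop below replaces the full-batch gradient with the gradient on a single randomly sampled example (same model and data shape as the gradient-descent sketch above).

```python
# SGD: each update uses one randomly drawn example instead of the full dataset.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -1.0, 2.0])

w, alpha = np.zeros(3), 0.05
for _ in range(2000):
    j = rng.integers(len(y))               # sample a single training example
    grad = 2 * (X[j] @ w - y[j]) * X[j]    # stochastic gradient estimate
    w = w - alpha * grad
print(w)                                   # noisy, but close to [1, -1, 2]
```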

SLIDE 17

Computational graphs

SLIDE 18

Backpropagation: a simple example
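The slide's worked example is not recoverable from this extraction. As an illustration of the same idea, here is backpropagation on the simple computational graph f = (x + y) · z, with the chain rule applied node by node (a classic teaching example; the slide's example may differ).

```python
# Backpropagation by hand on the graph f = (x + y) * z.
x, y, z = -2.0, 5.0, -4.0

# Forward pass through the computational graph.
q = x + y            # intermediate node q = 3
f = q * z            # output node f = -12

# Backward pass: local gradients chained from the output back to the inputs.
df_dq = z            # d(q*z)/dq = z
df_dz = q            # d(q*z)/dz = q
df_dx = df_dq * 1.0  # dq/dx = 1, so chain rule gives df/dx = df/dq
df_dy = df_dq * 1.0  # dq/dy = 1
print(df_dx, df_dy, df_dz)   # -4.0 -4.0 3.0
```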

SLIDES 19-52 (figure-only slides; no text recovered)
SLIDE 53

Biological Neuron

SLIDE 54

Artificial Neuron

SLIDE 55

Activation Functions

SLIDE 56

SLIDE 57

Activation Functions
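A small NumPy sketch of common activation functions applied inside an artificial neuron a = f(wᵀx + b); which functions these slides actually plot is an assumption (sigmoid, tanh, and ReLU are the usual set), and all values are made up.

```python
# Common activations applied to a neuron's pre-activation z = w^T x + b.
import numpy as np

def sigmoid(z): return 1.0 / (1.0 + np.exp(-z))
def tanh(z):    return np.tanh(z)
def relu(z):    return np.maximum(0.0, z)

w, b = np.array([1.0, -2.0]), 0.1
x = np.array([0.5, 0.25])
z = w @ x + b                        # the neuron's pre-activation
for f in (sigmoid, tanh, relu):
    print(f.__name__, f(z))          # the neuron's output under each activation
```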

SLIDE 58

Feedforward Neural Network

SLIDE 59

Neural Network

SLIDE 60

Feedforward Computing
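A minimal sketch of feedforward computing, propagating a^(l) = f(W^(l) a^(l−1) + b^(l)) layer by layer; the layer sizes, tanh activation, and linear output layer are illustrative assumptions.

```python
# Layer-by-layer forward pass of a feedforward network.
import numpy as np

rng = np.random.default_rng(0)
sizes = [4, 8, 8, 3]                           # input, two hidden layers, output
weights = [rng.normal(size=(m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(m) for m in sizes[1:]]

def feedforward(x):
    a = x
    for l, (W, b) in enumerate(zip(weights, biases)):
        z = W @ a + b                          # affine transform of layer l
        a = np.tanh(z) if l < len(weights) - 1 else z  # activation; linear output layer
    return a                                   # final scores

print(feedforward(rng.normal(size=4)))
```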