SLIDE 1 School of Data Science, Fudan University
DATA130006 Text Management and Analysis
Basis of Neural Networks
Zhongyu Wei (魏忠钰)
SLIDE 2 General Neural Architectures for NLP
- 1. Represent words/features with dense vectors (embeddings) via a lookup table
- 2. Concatenate the vectors
- 3. Feed the result through multi-layer neural networks (sketched below)
§ Classification § Matching § Ranking
- R. Collobert et al. "Natural Language Processing (Almost) from Scratch"
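A minimal NumPy sketch of this three-step pipeline; the vocabulary size, embedding dimension, hidden size, and window of word ids are illustrative assumptions, not values from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# 1. Lookup table: each row is the dense vector (embedding) of one word.
vocab_size, emb_dim, hidden_dim, num_classes = 10_000, 50, 100, 5
embeddings = rng.normal(size=(vocab_size, emb_dim))

# A window of word ids (e.g. the context around a target word).
word_ids = np.array([12, 507, 3])

# 2. Concatenate the looked-up vectors into one input vector.
x = embeddings[word_ids].reshape(-1)              # shape: (3 * emb_dim,)

# 3. Multi-layer neural network on top (here: one hidden layer).
W1 = rng.normal(size=(hidden_dim, x.size)); b1 = np.zeros(hidden_dim)
W2 = rng.normal(size=(num_classes, hidden_dim)); b2 = np.zeros(num_classes)

h = np.tanh(W1 @ x + b1)                          # hidden representation
scores = W2 @ h + b2                              # one score per class
print(scores.shape)                               # (5,) -> used for classification/matching/ranking
```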
SLIDE 3 Machine Learning
§ Machine learning explores the study and construction of algorithms that can learn from and make predictions on data. (from Wikipedia)
SLIDE 4 Formal Specification of Machine Learning
§ Input Data: $(x_i, y_i)$, $1 \le i \le n$
§ Model
§ Linear Model: $y = f(x) = w^\top x + b$
§ Generalized Linear Model: $y = f(x) = w^\top \phi(x) + b$
§ Non-linear Model: Neural Network
§ Criterion:
§ Loss Function: $L(y, f(x))$ → Optimization
§ $R(\theta) = \frac{1}{n}\sum_{i=1}^{n} L(y_i, f(x_i; \theta))$ → Minimization
§ Regularization: $\|\theta\|^2$
§ Objective Function: $R(\theta) + \lambda \|\theta\|^2$
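As a sketch, the regularized objective can be written out directly for a linear model; the squared loss and the value of the coefficient lam are assumptions for illustration, since the slide leaves the loss unspecified here.

```python
import numpy as np

def objective(w, b, X, y, lam):
    """R(theta) + lam * ||theta||^2 for a linear model y_hat = X @ w + b
    with squared loss; lam is the regularization coefficient (assumed)."""
    y_hat = X @ w + b
    empirical_risk = np.mean((y - y_hat) ** 2)   # (1/n) sum_i L(y_i, f(x_i))
    regularizer = np.sum(w ** 2)                 # ||theta||^2 (bias often excluded)
    return empirical_risk + lam * regularizer

# Toy data: n = 4 examples, d = 2 features.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
y = np.array([1.0, 2.0, 3.0, 4.0])
print(objective(np.array([1.0, 2.0]), 0.0, X, y, lam=0.1))
```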
SLIDE 5
Linear Classifier
$f(x, W) = Wx + b$
SLIDE 6 Generalized Linear Classification
§ Hypothesis is a logistic function of a linear combination $z = f(x) = w^\top x + b$:
$F(x) = \frac{1}{1 + \exp(-z)}$
§ We can interpret $F(x)$ as $P(y = 1 \mid x)$
§ Then the log-odds ratio $\ln \frac{P(y = 1 \mid x)}{P(y = 0 \mid x)} = w^\top x$ is linear
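A small sketch checking the log-odds claim numerically; the weights, bias, and input are made-up values.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.array([0.5, -1.2])
b = 0.3
x = np.array([2.0, 1.0])

z = w @ x + b                # linear combination
p1 = sigmoid(z)              # F(x), interpreted as P(y=1|x)
p0 = 1.0 - p1                # P(y=0|x)

# The log-odds ratio recovers the linear score exactly.
print(np.log(p1 / p0), z)    # both equal 0.1
```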
SLIDE 7
Softmax
§ Softmax regression is a generalization of logistic regression to multi-class classification problems
§ With softmax, the posterior probability of $y = c$ is:
$P(y = c \mid x) = \mathrm{softmax}(w_c^\top x) = \frac{\exp(w_c^\top x)}{\sum_{k=1}^{C} \exp(w_k^\top x)}$
§ Class $c$ is represented by the one-hot vector
$y = [I(1 = c), I(2 = c), \ldots, I(C = c)]^\top$
§ where $I(\cdot)$ is the indicator function
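A sketch of the softmax posterior and the one-hot target; the class count C and the scores are illustrative, and subtracting the maximum is a standard numerical-stability trick not shown on the slide.

```python
import numpy as np

def softmax(scores):
    # Subtract the max for numerical stability; the probabilities are unchanged.
    z = scores - np.max(scores)
    e = np.exp(z)
    return e / e.sum()

C = 4                                        # number of classes (assumed)
scores = np.array([2.0, 1.0, 0.1, -1.0])     # w_k^T x for k = 1..C
probs = softmax(scores)                      # P(y = c | x) for each class c
print(probs, probs.sum())                    # probabilities sum to 1

# One-hot representation of class c: y = [I(1=c), ..., I(C=c)]
c = 2
y = (np.arange(1, C + 1) == c).astype(float)
print(y)                                     # [0. 1. 0. 0.]
```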
SLIDE 8 Examples of word classification
W: K × D, b: K × 1
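A minimal sketch of scoring one word with these shapes; reading K as the number of classes and D as the dimensionality of the word's feature vector is an assumption about the notation.

```python
import numpy as np

K, D = 3, 5                           # K classes, D-dimensional word features (assumed meaning)
rng = np.random.default_rng(1)
W = rng.normal(size=(K, D))           # W: K x D
b = np.zeros(K)                       # b: K x 1

x = rng.normal(size=D)                # feature vector of one word
scores = W @ x + b                    # one score per class
print(scores.shape, scores.argmax())  # (3,), index of the predicted class
```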
SLIDE 9
How to learn W?
$R(\theta) = \frac{1}{n}\sum_{i=1}^{n} L(y_i, f(x_i; \theta))$
§ Hinge Loss (SVM) § Softmax loss: cross-entropy loss
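A sketch of the two losses on a single example; the margin of 1 follows the usual multiclass SVM formulation, and the class scores are made up.

```python
import numpy as np

scores = np.array([3.2, 5.1, -1.7])    # class scores f(x_i; W) for one example
y = 0                                  # index of the correct class

# Multiclass hinge loss (SVM): sum over j != y of max(0, s_j - s_y + 1)
margins = np.maximum(0.0, scores - scores[y] + 1.0)
margins[y] = 0.0
hinge_loss = margins.sum()

# Cross-entropy (softmax) loss: -log P(y | x)
z = scores - scores.max()
log_probs = z - np.log(np.exp(z).sum())
cross_entropy_loss = -log_probs[y]

print(hinge_loss, cross_entropy_loss)
```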
SLIDE 10
SVM vs Softmax (Quiz)
SLIDE 11
Parameter Learning
§ In ML, our objective is to learn the parameter $\theta$ that minimizes the loss function. § How do we learn $\theta$?
SLIDE 12
Gradient Descent
§ Gradient Descent: update the parameters in the direction of the negative gradient, $\theta \leftarrow \theta - \alpha \nabla_\theta R(\theta)$ § The step size $\alpha$ is also called the learning rate in ML.
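A sketch of the update rule on a simple quadratic; the toy objective and the step size are illustrative assumptions.

```python
import numpy as np

def R(theta):
    return np.sum((theta - 3.0) ** 2)       # toy objective, minimized at theta = 3

def grad_R(theta):
    return 2.0 * (theta - 3.0)              # its exact gradient

theta = np.array([0.0, 10.0])
alpha = 0.1                                 # learning rate
for step in range(100):
    theta = theta - alpha * grad_R(theta)   # gradient descent update

print(theta, R(theta))                      # close to [3, 3], objective near 0
```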
SLIDE 13
Gradient Descent
SLIDE 14
Learning Rate
SLIDE 15
Gradient Descent
SLIDE 16
Stochastic Gradient Descent (SGD)
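A sketch of the stochastic variant: the gradient is estimated on a small random mini-batch instead of the full dataset. Linear regression with squared loss is an assumption made purely for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 3
X = rng.normal(size=(n, d))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w + 0.01 * rng.normal(size=n)

w = np.zeros(d)
alpha, batch_size = 0.1, 32
for step in range(500):
    idx = rng.integers(0, n, size=batch_size)       # sample a mini-batch
    Xb, yb = X[idx], y[idx]
    grad = 2.0 * Xb.T @ (Xb @ w - yb) / batch_size  # gradient of the batch loss
    w -= alpha * grad                               # SGD update

print(w)                                            # close to [1.0, -2.0, 0.5]
```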
SLIDE 17
Computational graphs
SLIDE 18
Backpropagation: a simple example
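The slides work the example through figures; as a sketch, assuming the common f(x, y, z) = (x + y) · z example, backpropagation just applies the chain rule backward through the computational graph.

```python
# Forward pass through the graph f(x, y, z) = (x + y) * z
x, y, z = -2.0, 5.0, -4.0
q = x + y                  # intermediate node
f = q * z                  # output node

# Backward pass: chain rule, from the output back to the inputs.
df_df = 1.0
df_dq = z * df_df          # d(q*z)/dq = z
df_dz = q * df_df          # d(q*z)/dz = q
df_dx = 1.0 * df_dq        # d(x+y)/dx = 1
df_dy = 1.0 * df_dq        # d(x+y)/dy = 1

print(f, df_dx, df_dy, df_dz)   # -12.0, -4.0, -4.0, 3.0
```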
SLIDES 19–52 (figures only; no extractable text)
SLIDE 53
Biological Neuron
SLIDE 54
Artificial Neuron
SLIDE 55
Activation Functions
SLIDE 56
SLIDE 57
Activation Functions
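A sketch of an artificial neuron a = g(wᵀx + b) with a few common activation functions; which particular functions the figures show is not stated in the text, so sigmoid, tanh, and ReLU are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    return np.tanh(z)

def relu(z):
    return np.maximum(0.0, z)

# One artificial neuron: weighted sum of inputs, plus bias, then a non-linearity g.
w = np.array([0.4, -0.6, 0.2])
b = 0.1
x = np.array([1.0, 2.0, -1.0])
z = w @ x + b                         # pre-activation
for g in (sigmoid, tanh, relu):
    print(g.__name__, g(z))           # activation under each choice of g
```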
SLIDE 58
Feedforward Neural Network
SLIDE 59
Neural Network
SLIDE 60
Feedforward Computing
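A sketch of feedforward computing, layer by layer: h(l) = g(W(l) h(l-1) + b(l)) with h(0) = x. The layer sizes and the tanh non-linearity are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
layer_sizes = [4, 8, 8, 3]            # input dim, two hidden layers, output dim (assumed)

# One weight matrix and bias vector per layer.
Ws = [rng.normal(size=(m, n)) for n, m in zip(layer_sizes[:-1], layer_sizes[1:])]
bs = [np.zeros(m) for m in layer_sizes[1:]]

def feedforward(x):
    h = x                                           # h^(0) = x
    for l, (W, b) in enumerate(zip(Ws, bs)):
        z = W @ h + b                               # affine transform of the previous layer
        h = np.tanh(z) if l < len(Ws) - 1 else z    # non-linearity on hidden layers only
    return h                                        # output scores

print(feedforward(rng.normal(size=4)))              # 3 output values
```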