IN5550: Neural Methods in Natural Language Processing Lecture 2 Supervised Machine Learning: from Linear Models to Neural Networks
Andrey Kutuzov, Vinit Ravishankar, Lilja Øvrelid, Stephan Oepen, & Erik Velldal
University of Oslo
24 January 2019
◮ make sure to update your UiO github profile with your photo, and star
◮ Linked from the course page, adapted for the notation of [Goldberg, 2017].
◮ for example, e-mail messages.
◮ for example, whether the message is spam (1) or not (0).
◮ feature vector x ∈ R^d_in;
◮ each training instance is represented with d_in features;
◮ for example, some properties of the documents.
◮ matrix W ∈ R^(d_in × d_out);
◮ d_out is the dimensionality of the desired prediction (number of classes);
◮ bias vector b ∈ R^d_out;
◮ bias ‘shifts’ the function output in some direction.
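The linear model described by these bullets can be sketched in NumPy as follows; the dimensions (d_in = 4 features, d_out = 3 classes) and the random initialization are purely illustrative:

```python
import numpy as np

d_in, d_out = 4, 3                  # illustrative dimensions

rng = np.random.default_rng(0)
x = rng.normal(size=d_in)           # feature vector x in R^d_in
W = rng.normal(size=(d_in, d_out))  # weight matrix W in R^(d_in x d_out)
b = rng.normal(size=d_out)          # bias vector b in R^d_out

y_hat = x @ W + b                   # one score per output class
print(y_hat.shape)                  # (3,)
```

The bias b shifts the scores independently of the input: with x = 0, the model still outputs b.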
◮ Thus, it contains data both about the instances and their features (more
◮ (although this is much harder to visualize).
◮ or a binary flag {1, 0} of whether a appeared in i at all or not.
◮ for example, if we have 1000 words in the vocabulary:
◮ i ∈ R^1000
◮ i = [20, 16, 0, 10, 0, . . . , 3]
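Building such a count vector for a document can be sketched as below; the 10-word toy vocabulary and the example document are illustrative stand-ins for the 1000-word vocabulary on the slide:

```python
import numpy as np

# Illustrative toy vocabulary (stand-in for a real 1000-word vocabulary)
vocab = ['the', 'road', 'not', 'taken', 'two', 'roads', 'diverged', 'in', 'a', 'wood']
word2idx = {w: i for i, w in enumerate(vocab)}

doc = 'the road in the wood'.split()

i_vec = np.zeros(len(vocab))
for w in doc:
    i_vec[word2idx[w]] += 1   # count occurrences of each vocabulary word

print(i_vec)                  # 'the' occurs twice, most words not at all
```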
◮ D extracted from the text above contains 10 words (lowercased): {‘-’,
◮ o_0 = [0, 0, 0, 0, 0, 0, 1, 0, 0, 0]
◮ o_1 = [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]
◮ etc.
◮ i = [1, 1, 1, 1, 1, 2, 2, 1, 1, 1] (‘the’ and ‘road’ mentioned 2 times)
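The point that summing the one-hot vectors of a document's tokens yields exactly its count vector can be sketched as follows; the token indices are chosen to reproduce the count vector on the slide and are otherwise illustrative:

```python
import numpy as np

VOCAB_SIZE = 10

def one_hot(idx, size=VOCAB_SIZE):
    """Return a one-hot vector with a 1 at position idx."""
    v = np.zeros(size)
    v[idx] = 1.0
    return v

# Illustrative token indices: entries 5 and 6 (e.g. 'the' and 'road') occur twice
token_ids = [0, 1, 2, 3, 4, 5, 5, 6, 6, 7, 8, 9]

# Summing one-hot vectors over the document gives the count vector i
i_vec = sum(one_hot(t) for t in token_ids)
print(i_vec)   # [1. 1. 1. 1. 1. 2. 2. 1. 1. 1.]
```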
◮ (all scores sum to 1)
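Turning raw scores into a distribution that sums to 1 is done with the softmax function; a minimal sketch (the example scores are illustrative):

```python
import numpy as np

def softmax(scores):
    # Subtract the max score for numerical stability before exponentiating
    exps = np.exp(scores - scores.max())
    return exps / exps.sum()

scores = np.array([2.0, 1.0, 0.1])   # illustrative class scores
probs = softmax(scores)
print(probs.sum())                   # 1.0 -- a valid probability distribution
```

Softmax preserves the ranking of the scores, so the highest-scoring class also gets the highest probability.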
◮ for example, L = (y − ŷ)²
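Assuming the loss here is the squared error L = (y − ŷ)², it can be sketched as:

```python
import numpy as np

def squared_error(y, y_hat):
    """Squared-error loss L(y, y_hat) = (y - y_hat)^2."""
    return (y - y_hat) ** 2

# Illustrative: gold label y = 1.0, model prediction y_hat = 0.8
loss = squared_error(1.0, 0.8)
print(loss)   # approximately 0.04
```

The loss is 0 only when the prediction matches the gold label exactly, and grows quadratically with the error.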