Deep Learning
Recurrent Networks Part 3

Story so far: iterated structures are good for analyzing time series data with short-time dependence on the past.
[Figure: inputs X(t) … X(t+7) feeding an iterated structure that produces output Y(t+6)]
– Iterated structures with a finite window of past inputs are “time delay” neural nets, AKA convnets.
– Structures that feed their hidden state back into themselves are recurrent neural networks.
[Figure: an RNN unrolled over time, with inputs X(t), outputs Y(t), and initial hidden state h(-1) at t=0]
Example: binary addition.
– An MLP that sees the whole problem at once has binary input and will require a large number of training instances; a network trained for N-bit numbers will not work for (N+1)-bit numbers.
– An RNN unit that adds one bit at a time, feeding the carry back as its state, generalizes to any length, and learns with very little training data!
[Figure: an MLP adding fixed-length bit strings vs. a single RNN adder unit with the “previous carry” fed back]
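The carry-based recurrence the RNN unit must learn can be sketched as a plain state machine. This is a hand-written stand-in for the learned unit, not the network itself; it shows why carrying one bit of state makes the adder length-independent.

```python
def serial_add(a_bits, b_bits):
    """Add two binary numbers bit by bit (LSB first), carrying state
    across steps exactly like the one-bit RNN adder unit."""
    carry = 0  # the recurrent "previous carry" state
    out = []
    for a, b in zip(a_bits, b_bits):
        s = a + b + carry
        out.append(s % 2)   # sum bit emitted at this step
        carry = s // 2      # state passed on to the next step
    out.append(carry)       # final carry-out
    return out

# 6 (LSB first: [0,1,1]) + 3 (LSB first: [1,1,0]) = 9 -> [1,0,0,1]
print(serial_add([0, 1, 1], [1, 1, 0]))  # [1, 0, 0, 1]
```

Because the same one-step rule is applied at every position, the same unit works for inputs of any length, which is exactly the property the MLP lacks.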
Training: compare the outputs Y(t) to the desired outputs Ydesired(t) through a DIVERGENCE, and backpropagate it through the unrolled network.
How to define this divergence is the primary topic for today.
[Figure: unrolled RNN from t=0, initial state h(-1), with a DIVERGENCE block between Y(t) and Ydesired(t)]
[Recap figures: common activations (sigmoid, tanh, ReLU); the generic architecture with input layer, recurrent hidden layers, and output layer]
[Figure: a bidirectional RNN: a forward net with initial state hf(-1) processes X(0) … X(T) left to right, a backward net with initial state hb(inf) processes it right to left, and together they produce Y(0) … Y(T)]
Images from Karpathy
RNNs support several input-output configurations:
– One output per input, e.g. phoneme recognition.
– One output for the whole input, where the exact location of the output is unknown a priori, e.g. speech recognition.
– An output sequence produced after the input sequence, e.g. language translation.
– An output sequence produced from a single input, e.g. captioning an image.
[Figure: the unrolled RNN from t=0, with a DIVERGENCE computed over the output sequence Y(t)]
The total divergence is a weighted sum of per-instant divergences:

Div(Ytarget(1…T), Y(1…T)) = Σ_t w_t Div(Ytarget(t), Y(t))

Its gradient with respect to the output at any single instant keeps only that instant's term:

∇_{Y(t)} Div(Ytarget(1…T), Y(1…T)) = w_t ∇_{Y(t)} Div(Ytarget(t), Y(t))

– w_t is typically set to 1.0.
– This gradient is further backpropagated into the network to update weights etc.
[Figure: the per-instant DIVERGENCE blocks over Y(t)]
The unweighted version used in practice:

Div(Ytarget(1…T), Y(1…T)) = Σ_t Div(Ytarget(t), Y(t))

∇_{Y(t)} Div(Ytarget(1…T), Y(1…T)) = ∇_{Y(t)} Div(Ytarget(t), Y(t))

– This is further backpropagated to update weights etc.
Typical divergence for classification: Div(Ytarget(t), Y(t)) = Xent(Ytarget(t), Y(t))
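A minimal numpy sketch of this decomposition. Names are illustrative; each row of `Y` is one per-step output distribution, and the gradient with respect to Y(t) depends only on step t's term:

```python
import numpy as np

def sequence_divergence(Y, targets, w=None):
    """Total divergence DIV = sum_t w_t * Xent(Ytarget(t), Y(t)),
    where Y[t] is the output distribution at step t."""
    T = Y.shape[0]
    w = np.ones(T) if w is None else w
    per_step = -np.log(Y[np.arange(T), targets])  # Xent at each instant
    return np.sum(w * per_step)

def divergence_gradient(Y, targets, w=None):
    """dDIV/dY(t): nonzero only in the target column of step t, because
    the total divergence decomposes over individual instants."""
    T = Y.shape[0]
    w = np.ones(T) if w is None else w
    grad = np.zeros_like(Y)
    grad[np.arange(T), targets] = -w / Y[np.arange(T), targets]
    return grad
```

This per-step gradient is what gets handed to backpropagation through time at each output node.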
For tasks such as tagging, the output at time t can depend on both past and future words in the sentence, which motivates bidirectional processing.
[Figure: part-of-speech tagging: “Two roads diverged in a yellow wood” tagged CD NNS VBD IN DT JJ NN by the recurrent net]
Bidirectional processing:
– Process the input left to right using a forward net.
– Process it right to left using a backward net.
– The combined outputs are used subsequently to produce one output per input symbol.
The following discussion uses unidirectional nets for simplicity, but the discussion generalizes to bidirectional ones.
[Figure: forward and backward unrolled nets over X(0) … X(T), jointly producing Y(0) … Y(T)]
The simplest case: input sequences and output sequences of equal length, with one-to-one correspondence.
– Training pairs (X_i, D_i), where
– X_i = X_{i,0}, …, X_{i,T}
– D_i = D_{i,0}, …, D_{i,T}
[Figure: unrolled net mapping X(0) … X(T) to Y(0) … Y(T), starting from h(-1)]
Training: pass the full input sequence through the unrolled network to generate outputs, compute the divergence against the target sequence, and backpropagate the gradients through the unrolled structure: Back Propagation Through Time (BPTT).
[Figure: unrolled net mapping X(0) … X(T) to Y(0) … Y(T)]
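BPTT can be sketched for a minimal tanh RNN with softmax outputs and summed per-step cross-entropy. All sizes and variable names here are illustrative, not from the slides; the point is the backward loop that passes `dh_next` from step t+1 back to step t:

```python
import numpy as np

rng = np.random.default_rng(0)
H, D, C, T = 4, 3, 2, 5                        # hidden, input, classes, time
W, U, V = (rng.standard_normal(s) * 0.1
           for s in [(H, H), (H, D), (C, H)])  # recurrent, input, output weights
X = rng.standard_normal((T, D))
d = rng.integers(0, C, size=T)                 # target class at each instant

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def forward(W, U, V):
    h = np.zeros(H)                            # h(-1): initial state
    hs, ys = [h], []
    for t in range(T):
        h = np.tanh(W @ h + U @ X[t])          # the recurrence
        hs.append(h)
        ys.append(softmax(V @ h))              # per-step output distribution
    loss = -sum(np.log(ys[t][d[t]]) for t in range(T))  # summed Xent
    return hs, ys, loss

def bptt(W, U, V):
    hs, ys, loss = forward(W, U, V)
    dW, dU, dV = np.zeros_like(W), np.zeros_like(U), np.zeros_like(V)
    dh_next = np.zeros(H)                      # gradient arriving from t+1
    for t in reversed(range(T)):
        dy = ys[t].copy(); dy[d[t]] -= 1       # softmax + Xent gradient
        dV += np.outer(dy, hs[t + 1])
        dh = V.T @ dy + dh_next                # local + future contributions
        dz = (1 - hs[t + 1] ** 2) * dh         # back through tanh
        dU += np.outer(dz, X[t])
        dW += np.outer(dz, hs[t])
        dh_next = W.T @ dz                     # pass back through time
    return loss, dW, dU, dV
```

The same weight matrices W, U, V are used at every time step, so their gradients accumulate over all steps of the backward pass.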
[Figure sequence (BPTT): the unrolled net, starting from h(-1), takes X(0) … X(T) and produces Y(0) … Y(T); these are compared against targets D(0) … D(T) to compute DIV, whose gradient flows backward through every time step]
In this case the divergence decomposes over the individual instants:

Div(Ytarget(1…T), Y(1…T)) = Σ_t Div(Ytarget(t), Y(t))

∇_{Y(t)} Div(Ytarget(1…T), Y(1…T)) = ∇_{Y(t)} Div(Ytarget(t), Y(t))

[Figure: unrolled RNN from t=0 with a per-instant DIVERGENCE between Y(t) and the target]
Typical divergence for classification: Div(Ytarget(t), Y(t)) = Xent(Ytarget(t), Y(t))

Example: next-character prediction. The network reads x0, x1, …, x6 and at each step predicts the next character x1, x2, …, x7.
– The inputs are actually “embeddings” of one-hot vectors.
– Each output distribution must ideally peak at the target character.
Figure from Andrej Karpathy. Input: sequence of characters (presented as one-hot vectors). Target output after observing “h e l l” is “o”.
The output at time t is a distribution over the vocabulary:

Y(t, i) = P(V_i | w0 … w_t)

where V_i is the i-th symbol in the vocabulary. The divergence is the summed cross-entropy:

Div(Ytarget(1…T), Y(1…T)) = Σ_t Xent(Ytarget(t), Y(t)) = -Σ_t log Y(t, w_{t+1})

i.e. the negative log of the probability assigned to the correct next word at each step.
[Figure: unrolled net reading x0 … x6 and predicting x1 … x7, with a DIVERGENCE over the outputs]
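A sketch of this divergence (names are hypothetical): `Y[t]` is the distribution the model predicts after reading `words[0..t]`, and the target for step t is the actual next word `words[t+1]`.

```python
import numpy as np

def lm_divergence(Y, words):
    """Language-model divergence -sum_t log Y(t, w_{t+1}): Y[t] is the
    predicted distribution over the vocabulary after reading words[0..t],
    and the target for step t is the actual next word words[t+1]."""
    T = len(words) - 1
    return -sum(np.log(Y[t, words[t + 1]]) for t in range(T))
```

The divergence is low exactly when the model concentrates probability on the correct next word at every step.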
Language modeling: predicting the next symbol.
– Word level: “Four score and seven years ???”
– Character level: “A B R A H A M L I N C O L ??”
Representing the symbols:
– Pre-specify a vocabulary of N words in fixed (e.g. lexical) order.
– Represent each word by an N-dimensional vector with N-1 zeros and a single 1 (in the position of the word in the ordered list of words).
– A character-level model of English will require about 100 characters, to include both cases, special characters such as commas, hyphens, apostrophes, etc., and the space character.
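A minimal sketch of this one-hot representation over a toy vocabulary (the vocabulary here is hypothetical, in fixed lexical order):

```python
import numpy as np

# Hypothetical toy vocabulary, sorted into fixed lexical order.
vocab = sorted(["and", "four", "score", "seven", "years"])
word_index = {w: i for i, w in enumerate(vocab)}

def one_hot(word):
    """N-dimensional vector: N-1 zeros and a single 1 at the word's
    position in the ordered vocabulary."""
    v = np.zeros(len(vocab))
    v[word_index[word]] = 1.0
    return v

print(one_hot("four"))   # 1.0 in position 1 ("four" is second in lexical order)
```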
The prediction problem: given X(1) … X(n-1), predict X(n).
– X(n) and X(1) … X(n-1) are all Nx1 one-hot vectors.
– The predictor is a function X(n) = g(X(1), …, X(n-1)).
[Figure: “Four score and seven years ???” as one-hot vectors feeding g() to predict the next word]
One-hot vectors are the corners of a unit cube in N-dimensional space.
– Actual volume of space used = 0.
– Density of points is vanishingly small: of the order of N r^{-N}.
– The representation says nothing about the relative importance or similarity of words: all word vectors are the same length, and the distance between every pair of words is the same.
[Figure: one-hot vectors (1,0,0), (0,1,0), (0,0,1) at the corners of a cube]
Fix: project the one-hot vectors down to a lower-dimensional space of M dimensions.
– The volume used is still 0, but the density can go up by many orders of magnitude, to the order of N r^{-M}.
– If properly learned, the distances between projected points will capture semantic relations between the words.
[Figure: the corners (1,0,0), (0,1,0), (0,0,1) projected onto a lower-dimensional plane]
Embeddings:
– Replace every one-hot vector X(i) by P X(i).
– P is an M x N matrix, so P X(i) is now an M-dimensional vector.
– Learn P using an appropriate objective.
– The predictor becomes X(n) = g(P X(1), P X(2), …, P X(n-1)).
[Figure: “Four score and seven years ???” with each one-hot input multiplied by the shared matrix P before entering g()]
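A quick check of the key computational fact (sizes here are illustrative): multiplying a one-hot vector by P simply selects a column of P, so the “projection” can be implemented as a cheap table lookup.

```python
import numpy as np

N, M = 5, 2                      # vocabulary size, embedding dimension
rng = np.random.default_rng(0)
P = rng.standard_normal((M, N))  # the M x N projection matrix to be learned

# One-hot vector for word i:
i = 3
X_i = np.zeros(N)
X_i[i] = 1.0

# P @ X_i just selects the i-th column of P: the word's embedding.
assert np.allclose(P @ X_i, P[:, i])
```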
The projection P is simply the first layer of the network, applied with tied weights to every input word: the same N-to-M linear map transforms each one-hot input before g().

X(n) = g(P X(1), P X(2), …, P X(n-1))

[Figure: the network with the shared N-to-M projection layer below g()]
– “A neural probabilistic language model”, Bengio et al. 2003.
– Hidden layer has tanh() activation, output is softmax.
– The first layer computes continuous-valued representations P X of the input words, with P shared across positions.
[Figure: the architecture predicting each word X(5) … X(10) from the projections P X of the preceding words]
A variant: predict a word from the mean-pooled projections P X of the words around it (a continuous-bag-of-words arrangement).
[Figure: projections of the context words mean-pooled to predict the skipped word X(4); color indicates shared parameters]
– “Distributed Representations of Words and Phrases and their Compositionality”, Mikolov et al. 2013.
[Figure: the skip-gram arrangement: the projection P X of each word is used to predict the words around it]
Generating language: seed the network with a few words.
– Inputs are one-hot vectors (through the projection P).
– At each step the network outputs an N-valued probability distribution over words, rather than a one-hot vector.
– Pick a word from this distribution, and set it as the next word in the series.
[Figure: X(1), X(2), X(3) fed in through P; the predicted word X(4) chosen from the output distribution]
– Feed the chosen word back in as the next input, and draw the next word from the output probability distribution.
– Repeat until a desired length is reached; in some cases, e.g. generating programs, there may be a natural termination.
[Figure: generation unrolled: each sampled word X(4), X(5), … re-enters the network through P]
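The generation loop above can be sketched as follows. `model` stands for any hypothetical callable mapping the word ids so far to an N-valued probability distribution; the toy stand-in below is deterministic only so that its behavior is checkable.

```python
import numpy as np

rng = np.random.default_rng(0)

def generate(model, seed, steps, vocab_size):
    """Autoregressive generation: feed the sequence so far into the model,
    get a distribution over the vocabulary, draw the next word from it,
    append it, and repeat."""
    words = list(seed)
    for _ in range(steps):
        p = model(words)                    # distribution over the next word
        nxt = rng.choice(vocab_size, p=p)   # draw, rather than argmax
        words.append(int(nxt))
    return words

# Toy stand-in "model": always predicts (last word + 1) mod N with prob 1.
def toy_model(words, N=4):
    p = np.zeros(N)
    p[(words[-1] + 1) % N] = 1.0
    return p

print(generate(toy_model, seed=[0], steps=5, vocab_size=4))  # [0, 1, 2, 3, 0, 1]
```

Drawing from the distribution (rather than always taking the argmax) keeps the generated text varied.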
Example: a model trained on the Linux source code. It actually uses a character-level model (it predicts character sequences).
Example: composing music with recurrent neural networks: http://www.hexahedria.com/2015/08/03/composing-music-with-recurrent-neural-networks/
Returning to the harder output configurations:
– E.g. phoneme recognition.
– E.g. speech recognition: the exact location of the output is unknown a priori.
The single-output case, e.g. question answering (input “Color of sky”, output “Blue”), or recognizing the phoneme in one segment of speech:
– The output is represented as an N-dimensional output probability vector, where N is the number of phonemes.
[Figure: input frames X0, X1, X2 processed by the recurrent net; target output /AH/]
Simplest solution: the output is generated when the last input vector is processed.
– We only read the output at the end of the sequence.
– Divergence: Div(Ytarget, Y) = Xent(Y(T), Phoneme)
[Figure: inputs X0, X1, X2; the output read only at the final step, target /AH/]
Shortcoming: this pretends there is no useful information in the intermediate outputs.
Fix: use the outputs at all time steps; these too must ideally point to the correct phoneme.

DIV(Ytarget, Y) = Σ_t w_t Xent(Y(t), Phoneme)

– Typically only w_T is high; the other weights are 0 or low.
[Figure: a Div block at every output, each compared against the target /AH/]
[Figure: the “Color of sky” → “Blue” example, with divergences at the read-out steps]
The next complication: a sequence of outputs whose positions are not given.
– E.g. phoneme recognition for continuous speech.
– E.g. speech recognition: the exact location of each output is unknown a priori.
If the segmentation of the input is known, this is just a simple concatenation of many copies of the “output at the end of the input sequence” model we just saw.
[Figure: input frames X0 … X9 for “BAT”, segmented into /B/, /AH/, /T/]
When the alignment is not known, consider the full table of outputs: at each time t = 0 … 8 the network emits a probability z_t^i for every symbol i in {/AH/, /B/, /D/, /EH/, /IY/, /F/, /G/}.
[Figure: a grid of output probabilities z_t^i, one column per input frame X0 … X8, one row per phoneme]
[Figure: the same grid of probabilities z_t^i, with the most probable symbol at each time highlighted]
A simple decode: pick the most probable symbol at each time, then merge adjacent repeated symbols, and place the actual emission somewhere within each merged run.
[Figure: the greedy per-frame picks collapse to the symbol sequence /G/ /F/ /IY/ /D/]
Problem: merging cannot distinguish between an extended symbol and genuine repetitions of the symbol (e.g. one long /F/ vs. /F/ /F/).
[Figure: the same decode, /G/ /F/ /IY/ /D/, with the ambiguous run of /F/ highlighted]
Worse, the resulting sequence may be meaningless (what word is “GFIYD”?).
[Figure: the same greedy decode producing /G/ /F/ /IY/ /D/]
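The greedy decode-and-merge can be sketched directly. The probability table below is made up purely to reproduce the slides' /G/ /F/ /IY/ /D/ outcome; note the merge step is exactly where an extended symbol becomes indistinguishable from repetitions.

```python
import numpy as np

def greedy_decode(Z, symbols):
    """Time-synchronous greedy decode: pick the most probable symbol at
    each time step, then merge adjacent repeats into a single emission."""
    picks = [symbols[i] for i in np.argmax(Z, axis=1)]
    merged = [picks[0]]
    for s in picks[1:]:
        if s != merged[-1]:          # collapse runs of the same symbol
            merged.append(s)
    return merged

symbols = ["/AH/", "/B/", "/D/", "/EH/", "/IY/", "/F/", "/G/"]
# Hypothetical 5-step output table z_t^i (rows: time, cols: symbols).
Z = np.array([[.1, .1, .1, .1, .1, .1, .4],
              [.1, .1, .1, .1, .1, .4, .1],
              [.1, .1, .1, .1, .1, .4, .1],
              [.1, .1, .1, .1, .4, .1, .1],
              [.1, .1, .4, .1, .1, .1, .1]])
print(greedy_decode(Z, symbols))   # ['/G/', '/F/', '/IY/', '/D/']
```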
Fix: constrain the decode to allowed sequences.
– E.g. only allow sequences corresponding to dictionary words.
– E.g. use sub-symbol units (like in HW1; what were they?).
[Figure: the grid of output probabilities z_t^i, with the search restricted to valid symbol sequences]
First consider training when the alignment is known: input frames X0 … X9 correspond to the target sequence /B/ /AH/ /T/, with known segment boundaries.
[Figure: frames X0 … X9 with segments for /B/, /AH/, /T/]
With the alignment known, one option is to compute the divergence only at the segment-final instants:

DIV = Xent(Y(2), B) + Xent(Y(6), AH) + Xent(Y(9), T)

[Figure: Div blocks at outputs Y(2), Y(6), Y(9), against targets /B/, /AH/, /T/]
Better: compute the divergence at every instant, against the symbol aligned to that frame:

DIV(Ytarget, Y) = Σ_t Xent(Y(t), symbol_t) = -Σ_t log Y(t, symbol_t)

[Figure: a Div block at every output Y(0) … Y(9)]
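A sketch of this alignment-based divergence, using a hypothetical 10-frame alignment for “BAT” that matches the figure (frames 0-2 aligned to /B/, 3-6 to /AH/, 7-9 to /T/):

```python
import numpy as np

def aligned_divergence(Y, alignment):
    """DIV = -sum_t log Y(t, symbol_t): cross-entropy at every instant
    against the phoneme aligned to that frame."""
    T = len(alignment)
    return -sum(np.log(Y[t, alignment[t]]) for t in range(T))

# Hypothetical alignment: /B/ = class 0, /AH/ = class 1, /T/ = class 2.
alignment = [0, 0, 0, 1, 1, 1, 1, 2, 2, 2]
Y = np.full((10, 3), 0.1)
Y[np.arange(10), alignment] = 0.8   # network mostly right at every frame
print(round(aligned_divergence(Y, alignment), 3))
```

Using every frame gives the network a training signal at each instant instead of only at segment boundaries.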
[Figure: the per-instant divergences summed over t = 0 … 9 to give the total sequence divergence]