The actual output of the network
[Figure: grid of output probabilities $y_t^{s}$ for the symbols /AH/, /B/, /D/, /EH/, /IY/, /F/, /G/ (rows) at inputs $X_0 \dots X_8$ (columns); the most probable symbol at each time is highlighted]
• Option 1: Simply select the most probable symbol at each time
  – Merge adjacent repeated symbols, and place the actual emission of the symbol in the final instant
• Cannot distinguish between an extended symbol and repetitions of the symbol
• The resulting sequence may be meaningless (what word is “GFIYD”?)
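Option 1 can be sketched in a few lines. This is a minimal illustration, not the lecture's implementation: the function name `greedy_decode`, the three-symbol inventory and the probability values are all made up for the example.

```python
def greedy_decode(probs, symbols):
    """probs[t][k] is the probability of symbols[k] at time t (hypothetical layout)."""
    # Pick the most probable symbol independently at each time step.
    per_frame = [symbols[max(range(len(row)), key=lambda k: row[k])] for row in probs]
    # Merge adjacent repeats; the emission is placed at the last frame of each run.
    emissions = []
    for t, sym in enumerate(per_frame):
        if emissions and emissions[-1][0] == sym:
            emissions[-1] = (sym, t)   # extend the run to the current frame
        else:
            emissions.append((sym, t))
    return per_frame, emissions


symbols = ["AH", "B", "IY"]            # illustrative symbol set
probs = [
    [0.1, 0.8, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.2, 0.7],
    [0.1, 0.1, 0.8],
]
per_frame, emissions = greedy_decode(probs, symbols)
```

Because runs are merged, a genuinely repeated symbol and a single long symbol produce the same output, which is the first limitation noted on the slide.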
The actual output of the network
[Figure: grid of output probabilities $y_t^{s}$ for the symbols /AH/, /B/, /D/, /EH/, /IY/, /F/, /G/ (rows) at inputs $X_0 \dots X_8$ (columns)]
• Option 2: Impose external constraints on what sequences are allowed
  – E.g. only allow sequences corresponding to dictionary words
  – E.g. sub-symbol units (like in HW1 – what were they?)
• We will refer to the process of obtaining an output from the network as decoding
The sequence-to-sequence problem
[Figure: recurrent network over inputs $X_0 \dots X_9$ producing the symbols /B/ /IY/ /F/ /IY/]
• How do we know when to output symbols?
  – In fact, the network produces outputs at every time
  – Which of these are the real outputs?
• How do we train these models?
Training
[Figure: recurrent network over inputs $X_0 \dots X_9$ with the targets /B/ /IY/ /F/ /IY/ attached to specific outputs]
• Given output symbols at the right locations
  – The phoneme /B/ ends at $X_2$, /IY/ at $X_4$, /F/ at $X_6$, /IY/ at $X_9$
[Figure: divergence (Div) computed either between the outputs $Y_2, Y_4, Y_6, Y_9$ and the targets /B/, /IY/, /F/, /IY/, or between every output $Y_0 \dots Y_9$ and the target symbol repeated over its duration]
• Either just define the divergence as:
$\mathrm{DIV} = \mathrm{Xent}(Y_2, B) + \mathrm{Xent}(Y_4, IY) + \mathrm{Xent}(Y_6, F) + \mathrm{Xent}(Y_9, IY)$
• Or repeat the symbols over their duration:
$\mathrm{DIV} = \sum_t \mathrm{Xent}(Y_t, \mathrm{symbol}_t) = -\sum_t \log Y(t, \mathrm{symbol}_t)$
• The gradient w.r.t. the $t$-th output vector $Y_t$:
$\nabla_{Y_t} \mathrm{DIV} = \left[\, 0 \;\; 0 \;\; \dots \;\; \frac{-1}{Y(t, \mathrm{symbol}_t)} \;\; \dots \;\; 0 \,\right]$
  – Zeros except at the component corresponding to the target
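The per-frame divergence and its gradient can be checked numerically with a small sketch. The function name `alignment_divergence`, the list-of-lists layout and the toy probabilities are assumptions made for this example only.

```python
import math


def alignment_divergence(Y, targets):
    """Y[t][k]: probability of symbol k at time t; targets[t]: index of the aligned symbol."""
    # DIV = -sum_t log Y(t, symbol_t)
    div = -sum(math.log(Y[t][k]) for t, k in enumerate(targets))
    # The gradient w.r.t. Y_t is zero except at the target component, where it is -1 / Y(t, symbol_t).
    grad = [[0.0] * len(row) for row in Y]
    for t, k in enumerate(targets):
        grad[t][k] = -1.0 / Y[t][k]
    return div, grad


Y = [[0.5, 0.5], [0.25, 0.75]]   # two frames, two symbols (illustrative values)
targets = [0, 1]                 # symbol 0 at t = 0, symbol 1 at t = 1
div, grad = alignment_divergence(Y, targets)
```

This is the gradient with respect to the output probabilities only; backpropagating it through the network is left to whatever training framework is in use.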
Problem: No timing information provided
[Figure: targets /B/ /IY/ /F/ /IY/ with unknown positions (“?”) at the outputs $Y_0 \dots Y_9$ over inputs $X_0 \dots X_9$]
• Only the sequence of output symbols is provided for the training data
  – But no indication of which one occurs where
• How do we compute the divergence?
  – And how do we compute its gradient w.r.t. $Y_t$?
Solution 1: Guess the alignment
[Figure: a guessed alignment assigning one of /B/, /IY/, /F/ to each output $Y_0 \dots Y_9$ over inputs $X_0 \dots X_9$]
• Initialize: Assign an initial alignment
  – Either randomly, based on some heuristic, or any other rationale
• Iterate:
  – Train the network using the current alignment
  – Reestimate the alignment for each training instance
    • Using the decoding methods already discussed
Estimating an alignment
• Given:
  – The unaligned $K$-length symbol sequence $\mathbf{S} = S_0 \dots S_{K-1}$ (e.g. /B/ /IY/ /F/ /IY/)
  – An $N$-length input ($N \ge K$)
  – And a (trained) recurrent network
• Find:
  – An $N$-length expansion $s_0 \dots s_{N-1}$ comprising the symbols in $\mathbf{S}$ in strict order
    • e.g. $S_0\, S_1\, S_1\, S_2\, S_3\, S_3 \dots S_{K-1}$
  – i.e. $s_0 = S_0$, $s_1 = S_1$, $s_2 = S_1$, $s_3 = S_2$, $s_4 = S_3$, $s_5 = S_3$, … $s_{N-1} = S_{K-1}$
    • E.g. /B/ /B/ /IY/ /IY/ /IY/ /F/ /F/ /F/ /F/ /IY/ ..
  – $s_i = S_k \Rightarrow i \ge k$
  – $s_i = S_k,\; s_j = S_l,\; i < j \Rightarrow k \le l$
• Outcome: an alignment of the target symbol sequence $S_0 \dots S_{K-1}$ to the input $X_0 \dots X_{N-1}$
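The constraints on a valid expansion are easiest to check when the alignment is stored as row indices into the target sequence rather than as symbols, since a repeated symbol such as /IY/ is then unambiguous. The sketch below assumes that representation; `is_valid_alignment` and the variable names are hypothetical.

```python
def is_valid_alignment(rows, K):
    """rows[t] is the index (0 .. K-1) of the target symbol aligned to input t."""
    # The alignment must start at the first symbol and end at the last one.
    if not rows or rows[0] != 0 or rows[-1] != K - 1:
        return False
    # Each step either repeats the current symbol or advances to the next one.
    return all(b - a in (0, 1) for a, b in zip(rows, rows[1:]))


target = ["B", "IY", "F", "IY"]
rows = [0, 0, 1, 1, 1, 2, 2, 2, 2, 3]        # one row index per input frame
expansion = [target[r] for r in rows]        # the symbol-level expansion
```

With these values the expansion reproduces the slide's example /B/ /B/ /IY/ /IY/ /IY/ /F/ /F/ /F/ /F/ /IY/.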
Recall: The actual output of the network
[Figure: grid of output probabilities $y_t^{s}$ for the symbols /AH/, /B/, /D/, /EH/, /IY/, /F/, /G/ (rows) at inputs $X_0 \dots X_8$ (columns)]
• At each time the network outputs a probability for each output symbol
Recall: unconstrained decoding
[Figure: grid of output probabilities $y_t^{s}$ for the symbols /AH/, /B/, /D/, /EH/, /IY/, /F/, /G/ (rows) at times $0 \dots 8$ (columns)]
• We find the most likely sequence of symbols
  – (Conditioned on input $X_0 \dots X_{N-1}$)
• This may not correspond to an expansion of the desired symbol sequence
  – E.g. the unconstrained decode may be /AH/ /AH/ /AH/ /D/ /D/ /AH/ /F/ /IY/ /IY/
    • Contracts to /AH/ /D/ /AH/ /F/ /IY/
  – Whereas we want an expansion of /B/ /IY/ /F/ /IY/
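The contraction in this example can be reproduced directly. This is a small sketch using the standard library's `itertools.groupby`; the helper name `contract` is an assumption.

```python
from itertools import groupby


def contract(frames):
    """Collapse each run of adjacent identical symbols into a single symbol."""
    return [sym for sym, _ in groupby(frames)]


decode = ["AH", "AH", "AH", "D", "D", "AH", "F", "IY", "IY"]
contracted = contract(decode)
target = ["B", "IY", "F", "IY"]
matches_target = contracted == target   # the unconstrained decode is not an expansion of the target
```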
Constraining the alignment: Try 1
[Figure: grid of output probabilities $y_t^{s}$ for the symbols /AH/, /B/, /D/, /EH/, /IY/, /F/, /G/ (rows) at times $0 \dots 8$ (columns), with the rows outside the target sequence blocked out]
• Block out all rows that do not include symbols from the target sequence
  – E.g. block out rows that are not /B/, /IY/ or /F/
Blocking out unnecessary outputs
[Figure: reduced grid containing only the rows $y_t^{B}$, $y_t^{IY}$ and $y_t^{F}$ at times $0 \dots 8$]
• Compute the entire output (for all symbols)
• Copy the output values for the target symbols into the secondary reduced structure
Constraining the alignment: Try 1
[Figure: reduced grid containing only the rows $y_t^{B}$, $y_t^{IY}$ and $y_t^{F}$ at times $0 \dots 8$, with a decode highlighted]
• Only decode on the reduced grid
  – We are now assured that only the appropriate symbols will be hypothesized
• Problem: This still doesn’t assure that the decoded sequence correctly expands the target symbol sequence
  – E.g. the above decode is not an expansion of /B/ /IY/ /F/ /IY/
• Still needs additional constraints
Try 2: Explicitly arrange the constructed table
[Figure: table with rows $y_t^{B}$, $y_t^{IY}$, $y_t^{F}$, $y_t^{IY}$ (top to bottom) at times $0 \dots 8$]
• Arrange the constructed table so that from top to bottom it has the exact sequence of symbols required
• Note: If a symbol occurs multiple times, we repeat the row in the appropriate location
  – E.g. the row for /IY/ occurs twice, in the 2nd and 4th positions
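Building the arranged table amounts to copying one row of network outputs per target symbol, in target order. The sketch below assumes the full output is stored as `y[t][k]` (time first); the function name, the four-symbol inventory and the values are illustrative.

```python
def build_target_table(y, symbols, target):
    """y[t][k]: probability of symbols[k] at time t. Returns table[r][t] = y_t^{S(r)}."""
    column = {sym: k for k, sym in enumerate(symbols)}
    # One row per target symbol, in target order; repeated symbols get repeated rows.
    return [[frame[column[sym]] for frame in y] for sym in target]


symbols = ["AH", "B", "F", "IY"]           # illustrative output symbol set
y = [
    [0.1, 0.7, 0.1, 0.1],
    [0.1, 0.2, 0.1, 0.6],
    [0.1, 0.1, 0.7, 0.1],
    [0.1, 0.1, 0.1, 0.7],
]
target = ["B", "IY", "F", "IY"]
table = build_target_table(y, symbols, target)
```

The rows for the two /IY/ symbols hold identical values, but they are distinct nodes in the alignment graph.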
Explicitly constrain alignment
[Figure: table with rows $y_t^{B}$, $y_t^{IY}$, $y_t^{F}$, $y_t^{IY}$ (top to bottom) at times $0 \dots 8$]
• Constrain that the first symbol in the decode must be the top left block
• The last symbol must be the bottom right
• The rest of the symbols must follow a sequence that monotonically travels down from top left to bottom right
  – I.e. never goes up
• This guarantees that the sequence is an expansion of the target sequence
  – /B/ /IY/ /F/ /IY/ in this case
Explicitly constrain alignment
[Figure: trellis of output probabilities $y_t^{S(r)}$ with rows /B/, /IY/, /F/, /IY/ and columns $t = 0 \dots 8$, drawn as a graph from source (top left) to sink (bottom right)]
• Compose a graph such that every path in the graph from source to sink represents a valid alignment
  – Which maps on to the target symbol sequence (/B/ /IY/ /F/ /IY/)
• Edge scores are 1
• Node scores are the probabilities assigned to the symbols by the neural network
• The “score” of a path is the product of the probabilities of all nodes along the path
• Find the most probable path from source to sink using any dynamic programming algorithm
  – E.g. the Viterbi algorithm
Viterbi algorithm
[Figure: trellis of output probabilities $y_t^{S(r)}$ with rows /B/, /IY/, /F/, /IY/ and columns $t = 0 \dots 8$]
• At each node, keep track of
  – The best incoming edge
  – The score of the best path from the source to the node
• Dynamically compute the best path from source to sink
Viterbi algorithm
[Figure: trellis of output probabilities $y_t^{S(r)}$ with rows /B/, /IY/, /F/, /IY/ and columns $t = 0 \dots 8$]
• First, some notation: $y_t^{S(r)}$ is the probability of the target symbol assigned to the $r$-th row at the $t$-th time (given inputs $X_0 \dots X_t$)
  – E.g. $S(0)$ = /B/
    • The scores in the 0th row have the form $y_t^{B}$
  – E.g. $S(1) = S(3)$ = /IY/
    • The scores in the 1st and 3rd rows have the form $y_t^{IY}$
  – E.g. $S(2)$ = /F/
    • The scores in the 2nd row have the form $y_t^{F}$
Viterbi algorithm
[Figure: trellis of output probabilities $y_t^{S(r)}$ with rows /B/, /IY/, /F/, /IY/ and columns $t = 0 \dots 8$, filled in column by column]
• Initialization:
  $BP(0, i) = \text{null},\; i = 0 \dots K-1$
  $Bscr(0, 0) = y_0^{S(0)}$, $\; Bscr(0, i) = -\infty,\; i = 1 \dots K-1$
• for $t = 1 \dots T-1$ ($T$ is the number of time steps, i.e. the input length $N$)
  $BP(t, 0) = 0$; $\; Bscr(t, 0) = Bscr(t-1, 0) \times y_t^{S(0)}$
  for $l = 1 \dots K-1$
    • $BP(t, l) = l-1$ if $Bscr(t-1, l-1) > Bscr(t-1, l)$; else $l$
    • $Bscr(t, l) = Bscr(t-1, BP(t, l)) \times y_t^{S(l)}$
Viterbi algorithm
[Figure: trellis of output probabilities $y_t^{S(r)}$ with rows /B/, /IY/, /F/, /IY/ and columns $t = 0 \dots 8$, with the best path traced back from the bottom-right node]
• $s(T-1) = S(K-1)$
• for $t = T-1$ downto $1$
  $s(t-1) = BP(s(t))$
  – Here $BP(s(t))$ is the backpointer stored at time $t$ for the row occupied by $s(t)$
• Resulting alignment: /B/ /B/ /IY/ /F/ /F/ /IY/ /IY/ /IY/ /IY/
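The forward pass and the backtrace above can be put together in a short sketch. It works on the arranged table `table[r][t]` = $y_t^{S(r)}$ and returns row indices rather than symbols. It assumes strictly positive probabilities (as a softmax produces) and at least as many frames as target symbols; the function name and the toy table are hypothetical.

```python
def viterbi_align(table):
    """table[r][t] = y_t^{S(r)}. Returns (row index per time step, score of the best path)."""
    K, T = len(table), len(table[0])
    if T < K:
        raise ValueError("need at least as many frames as target symbols")
    neg_inf = float("-inf")
    bscr = [[neg_inf] * K for _ in range(T)]   # best path score into each node
    bp = [[None] * K for _ in range(T)]        # best predecessor row for each node
    bscr[0][0] = table[0][0]                   # every path starts in the top-left node
    for t in range(1, T):
        bp[t][0] = 0
        bscr[t][0] = bscr[t - 1][0] * table[0][t]
        for l in range(1, K):
            # Come from the row above (advance) or from the same row (repeat).
            bp[t][l] = l - 1 if bscr[t - 1][l - 1] > bscr[t - 1][l] else l
            bscr[t][l] = bscr[t - 1][bp[t][l]] * table[l][t]
    # Backtrace from the bottom-right node.
    rows = [K - 1] * T
    for t in range(T - 1, 0, -1):
        rows[t - 1] = bp[t][rows[t]]
    return rows, bscr[T - 1][K - 1]


table = [
    [0.90, 0.80, 0.10, 0.10, 0.10],   # row 0, e.g. /B/
    [0.05, 0.10, 0.80, 0.70, 0.20],   # row 1, e.g. /IY/
    [0.05, 0.10, 0.10, 0.20, 0.70],   # row 2, e.g. /F/
]
rows, score = viterbi_align(table)
```

For long sequences the product of probabilities underflows; a practical version would usually add log probabilities instead, which leaves the chosen path unchanged.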
Gradients from the alignment
[Figure: trellis of output probabilities $y_t^{S(r)}$ with rows /B/, /IY/, /F/, /IY/ and columns $t = 0 \dots 8$; best path /B/ /B/ /IY/ /F/ /F/ /IY/ /IY/ /IY/ /IY/]
$\mathrm{DIV} = \sum_t \mathrm{Xent}(Y_t, \mathrm{symbol}_t^{bestpath}) = -\sum_t \log Y(t, \mathrm{symbol}_t^{bestpath})$
• The gradient w.r.t. the $t$-th output vector $Y_t$:
$\nabla_{Y_t} \mathrm{DIV} = \left[\, 0 \;\; 0 \;\; \dots \;\; \frac{-1}{Y(t, \mathrm{symbol}_t^{bestpath})} \;\; \dots \;\; 0 \,\right]$
  – Zeros except at the component corresponding to the target in the estimated alignment
Iterative Estimate and Training
[Figure: an alignment of /B/, /IY/, /F/ symbols to the outputs $Y_0 \dots Y_9$ over inputs $X_0 \dots X_9$; flow: Initialize alignments → Train model with given alignments → Decode to obtain alignments → repeat]
• The “decode” and “train” steps may be combined into a single “decode, find alignment, compute derivatives” step for SGD and mini-batch updates
Iterative update
• Option 1:
  – Determine alignments for every training instance
  – Train model (using SGD or your favorite approach) on the entire training set
  – Iterate
• Option 2:
  – During SGD, for each training instance, find the alignment during the forward pass
  – Use in backward pass
Iterative update: Problem
• Approach heavily dependent on initial alignment
• Prone to poor local optima
• Alternate solution: Do not commit to an alignment during any pass
The reason for suboptimality
[Figure: trellis of output probabilities $y_t^{S(r)}$ with rows /B/, /IY/, /F/, /IY/ and columns $t = 0 \dots 8$]
• We commit to the single “best” estimated alignment
  – The most likely alignment
$\mathrm{DIV} = -\sum_t \log Y(t, \mathrm{symbol}_t^{bestpath})$
  – This can be way off, particularly in early iterations, or if the model is poorly initialized
• Alternate view: there is a probability distribution over alignments of the target symbol sequence (to the input)
  – Selecting a single alignment is the same as drawing a single sample from it
  – Selecting the most likely alignment is the same as deterministically always drawing the most probable value from the distribution
Averaging over all alignments
[Figure: trellis of output probabilities $y_t^{S(r)}$ with rows /B/, /IY/, /F/, /IY/ and time axis $t = 0 \dots 8$]
• Instead of only selecting the most likely alignment, use the statistical expectation over all possible alignments
$\mathrm{DIV} = E\left[ -\sum_t \log Y(t, s_t) \right]$
  – Use the entire distribution of alignments
  – This will mitigate the issue of suboptimal selection of alignment
The expectation over all alignments
[Figure: trellis of output probabilities $y_t^{S(r)}$ with rows /B/, /IY/, /F/, /IY/ and time axis $t = 0 \dots 8$]
$\mathrm{DIV} = E\left[ -\sum_t \log Y(t, s_t) \right]$
• Using the linearity of expectation
$\mathrm{DIV} = -\sum_t E\left[ \log Y(t, s_t) \right]$
  – This reduces to finding the expected divergence at each input
$\mathrm{DIV} = -\sum_t \sum_{S \in S_1 \dots S_K} P(s_t = S \mid \mathbf{S}, \mathbf{X}) \log Y(t, s_t = S)$
• $P(s_t = S \mid \mathbf{S}, \mathbf{X})$ is the probability of seeing the specific symbol $S$ at time $t$, given that the symbol sequence is an expansion of $\mathbf{S} = S_1 \dots S_K$ and given the input sequence $\mathbf{X} = X_0 \dots X_{N-1}$
  – We need to be able to compute this
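Once the posteriors are available, the expected divergence is a weighted sum of log probabilities. The sketch below takes the posteriors as a given input `gamma[t][r]` over table rows (0-based), which is equivalent to summing over the symbols of the target sequence; the function name and all values are illustrative assumptions.

```python
import math


def expected_divergence(table, gamma):
    """table[r][t] = y_t^{S(r)}; gamma[t][r] = P(s_t = S_r | S, X), supplied by the caller."""
    return -sum(
        gamma[t][r] * math.log(table[r][t])
        for t in range(len(gamma))
        for r in range(len(table))
        if gamma[t][r] > 0.0          # rows with zero posterior contribute nothing
    )


table = [[0.5, 0.5, 0.5], [0.5, 0.5, 0.5]]        # two target symbols, three frames
gamma = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]      # illustrative posteriors, one row per frame
div = expected_divergence(table, gamma)
```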
A posteriori probabilities of symbols
[Figure: trellis of output probabilities $y_t^{S(r)}$ with rows /B/, /IY/, /F/, /IY/ and time axis $t = 0 \dots 8$]
$P(s_t = S_r \mid \mathbf{S}, \mathbf{X}) \propto P(s_t = S_r, \mathbf{S} \mid \mathbf{X})$
• $P(s_t = S_r, \mathbf{S} \mid \mathbf{X})$ is the total probability of all valid paths in the graph for target sequence $\mathbf{S}$ that go through the symbol $S_r$ (the $r$-th symbol in the sequence $S_1 \dots S_K$) at time $t$
• We will compute this using the “forward-backward” algorithm
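For a very small trellis the posterior can be computed by brute force, enumerating every valid path and accumulating its probability at each node it visits. This is only a sanity-check sketch (the number of paths grows combinatorially); it uses 0-based row indices, and the function names and toy values are assumptions.

```python
from itertools import combinations


def enumerate_alignments(K, T):
    """Yield every monotonic row sequence of length T from row 0 to row K-1."""
    # An alignment is determined by the K-1 time steps at which it advances a row.
    for advances in combinations(range(1, T), K - 1):
        advance_at = set(advances)
        rows, r = [], 0
        for t in range(T):
            if t in advance_at:
                r += 1
            rows.append(r)
        yield rows


def brute_force_posteriors(table):
    """table[r][t] = y_t^{S(r)}. Returns gamma[t][r] = P(s_t = S_r | S, X)."""
    K, T = len(table), len(table[0])
    if T < K:
        raise ValueError("need at least as many frames as target symbols")
    joint = [[0.0] * K for _ in range(T)]
    total = 0.0
    for rows in enumerate_alignments(K, T):
        p = 1.0
        for t, r in enumerate(rows):
            p *= table[r][t]                 # path score: product of node probabilities
        total += p
        for t, r in enumerate(rows):
            joint[t][r] += p                 # total probability of paths through (t, r)
    return [[v / total for v in frame] for frame in joint]


table = [[0.5, 0.5, 0.5], [0.5, 0.5, 0.5]]   # two target symbols, three frames
gamma = brute_force_posteriors(table)
```

The forward-backward algorithm computes the same quantities without enumerating paths.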
A posteriori probabilities of symbols
[Figure: trellis of output probabilities $y_t^{S(r)}$ with rows /B/, /IY/, /F/, /IY/ and time axis $t = 0 \dots 8$]
• Decompose $P(s_t = S_r, \mathbf{S} \mid \mathbf{X})$ as follows:
$P(s_t = S_r, \mathbf{S} \mid \mathbf{X}) = \sum_{s_0 \dots s_{t-1} \to S_1 \dots [S_{r-}]} \; \sum_{s_{t+1} \dots s_{N-1} \to [S_{r+}] \dots S_K} P(s_0 \dots s_{t-1}, s_t = S_r, s_{t+1} \dots s_{N-1}, \mathbf{S} \mid \mathbf{X})$
  – $[S_{r+}]$ indicates that $s_{t+1}$ might be either $S_r$ or $S_{r+1}$
  – $[S_{r-}]$ indicates that $s_{t-1}$ might be either $S_r$ or $S_{r-1}$
$= \sum_{s_0 \dots s_{t-1} \to S_1 \dots [S_{r-}]} \; \sum_{s_{t+1} \dots s_{N-1} \to [S_{r+}] \dots S_K} P(s_0 \dots s_{t-1}, s_t = S_r, s_{t+1} \dots s_{N-1} \mid \mathbf{X})$
  – Because the target symbol sequence $\mathbf{S}$ is implicit in the synchronized sequences $s_0 \dots s_{N-1}$, which are constrained to be expansions of $\mathbf{S}$
A posteriori probabilities of symbols
[Figure: trellis of output probabilities $y_t^{S(r)}$ with rows /B/, /IY/, /F/, /IY/ and time axis $t = 0 \dots 8$]
$P(s_t = S_r, \mathbf{S} \mid \mathbf{X}) = \sum_{s_0 \dots s_{t-1} \to S_1 \dots [S_{r-}]} \; \sum_{s_{t+1} \dots s_{N-1} \to [S_{r+}] \dots S_K} P(s_0 \dots s_{t-1}, s_t = S_r, s_{t+1} \dots s_{N-1} \mid \mathbf{X})$
$= \sum_{s_0 \dots s_{t-1} \to S_1 \dots [S_{r-}]} \; \sum_{s_{t+1} \dots s_{N-1} \to [S_{r+}] \dots S_K} P(s_0 \dots s_{t-1}, s_t = S_r \mid \mathbf{X}) \, P(s_{t+1} \dots s_{N-1} \mid s_0 \dots s_{t-1}, s_t = S_r, \mathbf{X})$
• For a recurrent network without feedback from the output we can make the conditional independence assumption:
$P(s_{t+1} \dots s_{N-1} \mid s_0 \dots s_t, \mathbf{X}) = P(s_{t+1} \dots s_{N-1} \mid \mathbf{X})$
$P(s_t = S_r, \mathbf{S} \mid \mathbf{X}) = \sum_{s_0 \dots s_{t-1} \to S_1 \dots [S_{r-}]} \; \sum_{s_{t+1} \dots s_{N-1} \to [S_{r+}] \dots S_K} P(s_0 \dots s_{t-1}, s_t = S_r \mid \mathbf{X}) \, P(s_{t+1} \dots s_{N-1} \mid s_t = S_r, \mathbf{X})$
• Note: in reality, this assumption is not valid if the hidden states are unknown, but we will make it anyway
Conditional independence
[Figure: dependency graph $\mathbf{X} = X_0 X_1 \dots X_{N-1}$ → $\mathbf{H} = H_0 H_1 \dots H_{N-1}$ → $y_0, y_1, \dots, y_{N-1}$]
• Dependency graph: Input sequence $\mathbf{X} = X_0 X_1 \dots X_{N-1}$ governs hidden variables $\mathbf{H} = H_0 H_1 \dots H_{N-1}$
• Hidden variables govern output predictions $y_0, y_1, \dots, y_{N-1}$ individually
• $y_0, y_1, \dots, y_{N-1}$ are conditionally independent given $\mathbf{H}$
• Since $\mathbf{H}$ is deterministically derived from $\mathbf{X}$, $y_0, y_1, \dots, y_{N-1}$ are also conditionally independent given $\mathbf{X}$
  – This wouldn’t be true if the relation between $\mathbf{X}$ and $\mathbf{H}$ were not deterministic or if $\mathbf{X}$ is unknown
A posteriori probabilities of symbols
[Figure: trellis of output probabilities $y_t^{S(r)}$ with rows /B/, /IY/, /F/, /IY/ and time axis $t = 0 \dots 8$]
$P(s_t = S_r, \mathbf{S} \mid \mathbf{X}) = \sum_{s_0 \dots s_{t-1} \to S_1 \dots [S_{r-}]} \; \sum_{s_{t+1} \dots s_{N-1} \to [S_{r+}] \dots S_K} P(s_0 \dots s_{t-1}, s_t = S_r \mid \mathbf{X}) \, P(s_{t+1} \dots s_{N-1} \mid \mathbf{X})$
$= \left( \sum_{s_0 \dots s_{t-1} \to S_1 \dots [S_{r-}]} P(s_0 \dots s_{t-1}, s_t = S_r \mid \mathbf{X}) \right) \left( \sum_{s_{t+1} \dots s_{N-1} \to [S_{r+}] \dots S_K} P(s_{t+1} \dots s_{N-1} \mid \mathbf{X}) \right)$
The expectation over all alignments
[Figure: trellis of output probabilities $y_t^{S(r)}$ with rows /B/, /IY/, /F/, /IY/ and time axis $t = 0 \dots 8$]
$P(s_t = S_r, \mathbf{S} \mid \mathbf{X}) = \left( \sum_{s_0 \dots s_{t-1} \to S_1 \dots [S_{r-}]} P(s_0 \dots s_{t-1}, s_t = S_r \mid \mathbf{X}) \right) \left( \sum_{s_{t+1} \dots s_{N-1} \to [S_{r+}] \dots S_K} P(s_{t+1} \dots s_{N-1} \mid \mathbf{X}) \right)$
• We will call the first term the forward probability $\alpha(t, r)$
• We will call the second term the backward probability $\beta(t, r)$
Forward algorithm
[Figure: trellis of network outputs $z_u$ for the symbol rows /B/, /IY/, /F/, /IY/ over time steps 0 to 8]
$\beta(u,s) = \sum_{t_0 \ldots t_{u-1} \to T_1 \ldots [T_{s}-]} Q(t_0 \ldots t_{u-1},\, t_u = T_s \mid \mathbf{Y})$
$= \sum_{t_0 \ldots t_{u-1} \to T_1 \ldots [T_{s}-]} Q(t_0 \ldots t_{u-1} \mid \mathbf{Y})\, Q(t_u = T_s \mid t_0 \ldots t_{u-1}, \mathbf{Y})$
$= \sum_{t_0 \ldots t_{u-1} \to T_1 \ldots [T_{s}-]} Q(t_0 \ldots t_{u-1} \mid \mathbf{Y})\, Q(t_u = T_s \mid \mathbf{Y})$
$= \Big( \sum_{t_0 \ldots t_{u-2} \to T_1 \ldots [T_{s}-]} Q(t_0 \ldots t_{u-2},\, t_{u-1} = T_s \mid \mathbf{Y}) + \sum_{t_0 \ldots t_{u-2} \to T_1 \ldots [T_{(s-1)}-]} Q(t_0 \ldots t_{u-2},\, t_{u-1} = T_{s-1} \mid \mathbf{Y}) \Big)\, Q(t_u = T_s \mid \mathbf{Y})$
• The two sums are $\beta(u-1,s)$ and $\beta(u-1,s-1)$, and $Q(t_u = T_s \mid \mathbf{Y}) = z_u^{T(s)}$, so
$\beta(u,s) = \big(\beta(u-1,s) + \beta(u-1,s-1)\big)\, z_u^{T(s)}$
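As a quick numeric check of the recursion, assume (purely for illustration; these values are not from the slides) that $\beta(u-1,s) = 0.2$, $\beta(u-1,s-1) = 0.1$ and $z_u^{T(s)} = 0.5$. One step of the recursion then gives:

```latex
\beta(u,s) = \big(\beta(u-1,s) + \beta(u-1,s-1)\big)\, z_u^{T(s)}
           = (0.2 + 0.1) \times 0.5
           = 0.15
```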
Forward algorithm
[Figure: the same output trellis]
• Initialization: $\beta(0,1) = z_0^{T(1)}$, $\beta(0,s) = 0$ for $s > 1$
• for $u = 1 \ldots U-1$
    $\beta(u,1) = \beta(u-1,1)\, z_u^{T(1)}$
    for $m = 2 \ldots L$
        • $\beta(u,m) = \big(\beta(u-1,m) + \beta(u-1,m-1)\big)\, z_u^{T(m)}$
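The forward recursion can be sketched in a few lines of plain Python. This is a minimal illustration, not the course's reference code: the names `forward`, `z` and `target` are assumptions, `z[u][k]` is taken to be the network's probability for symbol index `k` at time `u`, and `target[m]` is the symbol index of the (m+1)-th target symbol, so the code is 0-based where the slides are 1-based.

```python
def forward(z, target):
    """Forward probabilities beta[u][m] for one target sequence.

    z      : list of U rows; z[u][k] = probability of symbol k at time u
    target : list of L symbol indices (hypothetical encoding of T_1..T_L)
    """
    U, L = len(z), len(target)
    beta = [[0.0] * L for _ in range(U)]

    # Initialization: every alignment must start at the first target symbol.
    beta[0][0] = z[0][target[0]]

    for u in range(1, U):
        # The first symbol can only be reached by staying on it.
        beta[u][0] = beta[u - 1][0] * z[u][target[0]]
        for m in range(1, L):
            # Either stay on symbol m or advance from symbol m-1.
            beta[u][m] = (beta[u - 1][m] + beta[u - 1][m - 1]) * z[u][target[m]]
    return beta
```

With this layout, `beta[-1][-1]` corresponds to $\beta(U-1, L)$, the total probability of all alignments that end on the last target symbol.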
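In practice..
• The recursion $\beta(u,m) = \big(\beta(u-1,m) + \beta(u-1,m-1)\big)\, z_u^{T(m)}$ will generally underflow, since it multiplies in one more probability at every time step
• Instead we can do it in the log domain
$\log \beta(u,m) = \log\big(e^{\log \beta(u-1,m)} + e^{\log \beta(u-1,m-1)}\big) + \log z_u^{T(m)}$
– This can be computed entirely without underflow by factoring the larger exponent out of the sum before exponentiating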
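A hedged sketch of the log-domain recursion follows. The helper `log_add` and the function `log_forward` are hypothetical names introduced here; `log_z[u][k]` is assumed to hold the log of the network output for symbol index `k` at time `u`. The helper factors out the larger of the two log values, which is one common way to evaluate the log of a sum of exponentials safely.

```python
import math

NEG_INF = float("-inf")

def log_add(a, b):
    """Return log(exp(a) + exp(b)) without exponentiating large negatives."""
    if a == NEG_INF:
        return b
    if b == NEG_INF:
        return a
    hi, lo = max(a, b), min(a, b)
    # exp(lo - hi) is at most 1, so it cannot overflow; if it underflows
    # to 0 the result is simply hi, which is the correct limit.
    return hi + math.log1p(math.exp(lo - hi))

def log_forward(log_z, target):
    """Log-domain forward pass; returns log beta[u][m] (0-based indices)."""
    U, L = len(log_z), len(target)
    log_beta = [[NEG_INF] * L for _ in range(U)]
    log_beta[0][0] = log_z[0][target[0]]
    for u in range(1, U):
        log_beta[u][0] = log_beta[u - 1][0] + log_z[u][target[0]]
        for m in range(1, L):
            log_beta[u][m] = (log_add(log_beta[u - 1][m], log_beta[u - 1][m - 1])
                              + log_z[u][target[m]])
    return log_beta
```

For long inputs the linear-domain values fall below the smallest representable double, while the log values stay in a comfortable range of a few thousand in magnitude.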
Forward algorithm
[Figure: the same output trellis]
• Initialization: $\hat{\beta}(0,1) = 1$, $\hat{\beta}(0,s) = 0$ for $s > 1$
    $\beta(0,s) = \hat{\beta}(0,s)\, z_0^{T(s)}$, $1 \le s \le L$
• for $u = 1 \ldots U-1$
    $\hat{\beta}(u,1) = \beta(u-1,1)$
    for $m = 2 \ldots L$
        • $\hat{\beta}(u,m) = \beta(u-1,m) + \beta(u-1,m-1)$
    $\beta(u,s) = \hat{\beta}(u,s)\, z_u^{T(s)}$, $1 \le s \le L$
The forward probability
[Figure: the same output trellis]
$Q(t_u = T_s, \mathbf{T} \mid \mathbf{Y}) = \Big( \sum_{t_0 \ldots t_{u-1} \to T_1 \ldots [T_{s}-]} Q(t_0 \ldots t_{u-1},\, t_u = T_s \mid \mathbf{Y}) \Big) \Big( \sum_{t_{u+1} \ldots t_{O-1} \to [T_{s}+] \ldots T_L} Q(t_{u+1} \ldots t_{O-1} \mid \mathbf{Y}) \Big)$
$= \beta(u,s) \sum_{t_{u+1} \ldots t_{O-1} \to [T_{s}+] \ldots T_L} Q(t_{u+1} \ldots t_{O-1} \mid \mathbf{Y})$
(here $O-1$ denotes the last time step, written $U-1$ in the algorithm listings)
• We will call the first term the forward probability $\beta(u,s)$. We have seen how to compute this.
• We will call the second term the backward probability $\gamma(u,s)$. Let's look at this next.
Backward algorithm
[Figure: the same output trellis]
$\gamma(u,s) = \sum_{t_{u+1} \ldots t_{O-1} \to [T_{s}+] \ldots T_L} Q(t_{u+1} \ldots t_{O-1} \mid \mathbf{Y})$
$= \sum_{t_{u+2} \ldots t_{O-1} \to [T_{s}+] \ldots T_L} Q(t_{u+1} = T_s,\, t_{u+2} \ldots t_{O-1} \mid \mathbf{Y}) + \sum_{t_{u+2} \ldots t_{O-1} \to [T_{(s+1)}+] \ldots T_L} Q(t_{u+1} = T_{s+1},\, t_{u+2} \ldots t_{O-1} \mid \mathbf{Y})$
$= Q(t_{u+1} = T_s \mid \mathbf{Y}) \sum_{t_{u+2} \ldots t_{O-1} \to [T_{s}+] \ldots T_L} Q(t_{u+2} \ldots t_{O-1} \mid t_{u+1} = T_s, \mathbf{Y}) + Q(t_{u+1} = T_{s+1} \mid \mathbf{Y}) \sum_{t_{u+2} \ldots t_{O-1} \to [T_{(s+1)}+] \ldots T_L} Q(t_{u+2} \ldots t_{O-1} \mid t_{u+1} = T_{s+1}, \mathbf{Y})$
$= Q(t_{u+1} = T_s \mid \mathbf{Y}) \sum_{t_{u+2} \ldots t_{O-1} \to [T_{s}+] \ldots T_L} Q(t_{u+2} \ldots t_{O-1} \mid \mathbf{Y}) + Q(t_{u+1} = T_{s+1} \mid \mathbf{Y}) \sum_{t_{u+2} \ldots t_{O-1} \to [T_{(s+1)}+] \ldots T_L} Q(t_{u+2} \ldots t_{O-1} \mid \mathbf{Y})$
• The leading factors are $z_{u+1}^{T(s)}$ and $z_{u+1}^{T(s+1)}$, and the two sums are $\gamma(u+1,s)$ and $\gamma(u+1,s+1)$, so
$\gamma(u,s) = z_{u+1}^{T(s)}\, \gamma(u+1,s) + z_{u+1}^{T(s+1)}\, \gamma(u+1,s+1)$
Backward algorithm
[Figure: the same output trellis]
• Initialization: $\gamma(U-1,L) = 1$, $\gamma(U-1,s) = 0$ for $s < L$
• for $u = U-2$ down to $0$
    $\gamma(u,L) = z_{u+1}^{T(L)}\, \gamma(u+1,L)$
    for $m = L-1 \ldots 1$
        • $\gamma(u,m) = z_{u+1}^{T(m)}\, \gamma(u+1,m) + z_{u+1}^{T(m+1)}\, \gamma(u+1,m+1)$
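The backward pass mirrors the forward one, running from the last time step to the first. The sketch below is an illustration under the same assumptions as before: `backward`, `z` and `target` are hypothetical names, `z[u][k]` is the network's probability for symbol index `k` at time `u`, and indices are 0-based where the slides are 1-based.

```python
def backward(z, target):
    """Backward probabilities gamma[u][m] for one target sequence.

    gamma[u][m] excludes the output at time u itself; it accounts only
    for the frames u+1 .. U-1, as in the recursion on the slides.
    """
    U, L = len(z), len(target)
    gamma = [[0.0] * L for _ in range(U)]

    # Initialization: at the final frame we must already be on the last symbol.
    gamma[U - 1][L - 1] = 1.0

    for u in range(U - 2, -1, -1):
        # From the last symbol the only option is to stay on it.
        gamma[u][L - 1] = z[u + 1][target[L - 1]] * gamma[u + 1][L - 1]
        for m in range(L - 2, -1, -1):
            # Either stay on symbol m or advance to symbol m+1 at time u+1.
            gamma[u][m] = (z[u + 1][target[m]] * gamma[u + 1][m]
                           + z[u + 1][target[m + 1]] * gamma[u + 1][m + 1])
    return gamma
```

Multiplying `gamma[0][0]` by `z[0][target[0]]` gives the same total alignment probability that the forward pass produces at its final step.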
The joint probability
[Figure: the same output trellis]
$Q(t_u = T_s, \mathbf{T} \mid \mathbf{Y}) = \beta(u,s) \sum_{t_{u+1} \ldots t_{O-1} \to [T_{s}+] \ldots T_L} Q(t_{u+1} \ldots t_{O-1} \mid \mathbf{Y}) = \beta(u,s)\, \gamma(u,s)$
• We will call the first term the forward probability $\beta(u,s)$, computed by the forward algorithm
• We will call the second term the backward probability $\gamma(u,s)$, which we can now compute with the backward algorithm
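Putting the two passes together, the joint probability at each trellis node is the product of its forward and backward values. The following self-contained sketch repeats compact versions of both passes so it can run on its own; the function names and the toy `z` values in the usage note are assumptions for illustration, not data from the slides.

```python
def forward(z, target):
    """beta[u][m]: probability of reaching symbol m at time u (0-based)."""
    U, L = len(z), len(target)
    beta = [[0.0] * L for _ in range(U)]
    beta[0][0] = z[0][target[0]]
    for u in range(1, U):
        beta[u][0] = beta[u - 1][0] * z[u][target[0]]
        for m in range(1, L):
            beta[u][m] = (beta[u - 1][m] + beta[u - 1][m - 1]) * z[u][target[m]]
    return beta

def backward(z, target):
    """gamma[u][m]: probability of finishing the target from symbol m after time u."""
    U, L = len(z), len(target)
    gamma = [[0.0] * L for _ in range(U)]
    gamma[U - 1][L - 1] = 1.0
    for u in range(U - 2, -1, -1):
        gamma[u][L - 1] = z[u + 1][target[L - 1]] * gamma[u + 1][L - 1]
        for m in range(L - 2, -1, -1):
            gamma[u][m] = (z[u + 1][target[m]] * gamma[u + 1][m]
                           + z[u + 1][target[m + 1]] * gamma[u + 1][m + 1])
    return gamma

def joint(z, target):
    """joint[u][m] = beta[u][m] * gamma[u][m] for every trellis node."""
    beta, gamma = forward(z, target), backward(z, target)
    return [[b * g for b, g in zip(brow, grow)]
            for brow, grow in zip(beta, gamma)]
```

A useful sanity check is that the joint values in any one column of the trellis sum to the same number, the total probability of the target sequence. For example, with the made-up outputs `z = [[0.6, 0.4], [0.3, 0.7], [0.5, 0.5], [0.2, 0.8]]` and `target = [0, 1, 0]`, every row of `joint(z, target)` should sum to `forward(z, target)[-1][-1]`.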