Lecture 14: LPC speech synthesis and autocorrelation-based pitch tracking
ECE 417, Multimedia Signal Processing October 10, 2019
Outline
The LPC-10 speech synthesis model
Autocorrelation-based pitch tracking
Inter-frame
Voiced speech, pitch period $P$: $e[n] = \sum_{p} \delta[n - pP]$
Unvoiced speech: $e[n] \sim \mathcal{N}(0,1)$
Binary control switch: voiced ($P>0$) vs. unvoiced ($P=0$)
Gain: $G$
Vocal tract: modeled by an LPC synthesis filter.
$$R_{xx}[n] = \sum_{m=-\infty}^{\infty} x[m]\,x[m-n] = x[n] \ast x[-n] = \mathcal{F}^{-1}\left\{|X(\omega)|^2\right\} = \frac{1}{2\pi}\int_{-\pi}^{\pi} |X(\omega)|^2 e^{j\omega n}\,d\omega$$
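The two forms of the autocorrelation above (the time-domain sum and the inverse transform of $|X(\omega)|^2$) can be compared numerically. The following is a minimal sketch, assuming a finite-length real signal that is zero outside the stored samples; the function names `autocorr_direct` and `autocorr_fft` are illustrative and do not come from the lecture.

```python
import numpy as np

def autocorr_direct(x, n):
    """R_xx[n] = sum_m x[m] x[m-n], treating x as zero outside its samples."""
    x = np.asarray(x, dtype=float)
    n = abs(n)  # R_xx is even in n for a real signal
    if n >= len(x):
        return 0.0
    return float(np.dot(x[n:], x[:len(x) - n]))

def autocorr_fft(x):
    """Lags 0..len(x)-1 at once, as the inverse DFT of |X|^2."""
    x = np.asarray(x, dtype=float)
    nfft = 2 * len(x)  # zero-pad so circular correlation equals linear correlation
    X = np.fft.rfft(x, nfft)
    r = np.fft.irfft(np.abs(X) ** 2, nfft)
    return r[:len(x)]
```

The zero-padding to twice the signal length is a design choice of this sketch: without it, the inverse DFT would return a circular autocorrelation in which large lags wrap around.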
Notice that, for n=0, this becomes just Parseval's theorem:
$$R_{xx}[0] = \sum_{m=-\infty}^{\infty} x^2[m] = \frac{1}{2\pi}\int_{-\pi}^{\pi} |X(\omega)|^2\,d\omega$$
But since $|X(\omega)|^2$ is positive and real, any value of $e^{j\omega n}$ that is NOT positive and real will reduce the value of the integral!
$$R_{xx}[n] = \frac{1}{2\pi}\int_{-\pi}^{\pi} |X(\omega)|^2 e^{j\omega n}\,d\omega \le \frac{1}{2\pi}\int_{-\pi}^{\pi} |X(\omega)|^2\,d\omega = R_{xx}[0]$$
$$R_{xx}[n] = \sum_{m=-\infty}^{\infty} x[m]\,x[m-n] \le \sum_{m=-\infty}^{\infty} x^2[m] = R_{xx}[0]$$
[Figure: pitch period = 9 ms = 99 samples]
$$P = \operatorname*{argmax}_{P_{\min} \le m \le P_{\max}} R_{xx}[m]$$
$P_{\min}$ and $P_{\max}$ are important for good performance:
$P_{\min}$ corresponds to a high woman's pitch, about $F_s/P_{\min} \approx 250$ Hz
$P_{\max}$ corresponds to a low man's pitch, about $F_s/P_{\max} \approx 80$ Hz
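The argmax search over the lag range can be sketched as follows. This is a minimal illustration, not the course's reference code: the function name `estimate_pitch_period` and its parameters are hypothetical, the 80 Hz and 250 Hz defaults are taken from the figures quoted above, and the frame is assumed to be longer than $P_{\max}$ samples.

```python
import numpy as np

def estimate_pitch_period(frame, fs, f_low=80.0, f_high=250.0):
    """Return the lag in [P_min, P_max] that maximizes the autocorrelation."""
    frame = np.asarray(frame, dtype=float)
    p_min = int(round(fs / f_high))  # shortest period: highest pitch searched
    p_max = int(round(fs / f_low))   # longest period: lowest pitch searched
    # np.correlate in "full" mode returns lags -(N-1)..(N-1); keep lags 0..N-1.
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lags = np.arange(p_min, p_max + 1)
    return int(lags[np.argmax(r[p_min:p_max + 1])])
```

For example, at a sampling rate of 11025 Hz the search runs over lags 44 to 138, which contains the 99-sample period (9 ms) shown in the figure.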
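Voiced: $x[n+P] \approx x[n]$, so $R_{xx}[P] \approx R_{xx}[0]$
Unvoiced: $E\left[x[n]\,x[n-m]\right] \approx \delta[m]$, so $R_{xx}[n] \approx \delta[n]$ and $R_{xx}[P] \ll R_{xx}[0]$
If $R_{xx}[P]/R_{xx}[0] \ge$ threshold: say the frame is voiced.
If $R_{xx}[P]/R_{xx}[0] <$ threshold: say the frame is unvoiced.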
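The voiced/unvoiced decision compares the autocorrelation at the candidate pitch lag with the autocorrelation at lag zero. A minimal sketch follows; the function name `is_voiced` is hypothetical, and the default threshold of 0.3 is an arbitrary placeholder, since the lecture does not state a value.

```python
import numpy as np

def is_voiced(frame, pitch_period, threshold=0.3):
    """Voiced if R_xx[P] / R_xx[0] >= threshold (threshold value is an assumption)."""
    frame = np.asarray(frame, dtype=float)
    if not 0 < pitch_period < len(frame):
        return False
    r0 = float(np.dot(frame, frame))  # R_xx[0], the frame energy
    if r0 == 0.0:
        return False                  # silent frame: treat as unvoiced
    rp = float(np.dot(frame[pitch_period:], frame[:-pitch_period]))  # R_xx[P]
    return rp / r0 >= threshold
```

In practice the threshold would need tuning on real speech; a periodic frame gives a ratio close to 1, while white noise gives a ratio close to 0.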
[Figure: pitch period vs. sample number n, with frame boundaries marked]
Linear interpolation sounds much better: interpolate using a formula like
$$P[n] = (1-f)\,P_t + f\,P_{t+1}$$
Where
$P_t$ is the pitch period in frame t
$f = \frac{n - tS}{S}$ is how far sample n is from the beginning of frame t, as a fraction of the frame length $S$
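The interpolation formula can be applied to a whole sequence of frame-level pitch estimates at once. This is a minimal sketch, assuming every frame has the same length and every frame is voiced; the function name `interpolate_pitch` is hypothetical, and holding the last frame's value constant is a choice made here because no following frame exists to interpolate toward.

```python
import numpy as np

def interpolate_pitch(frame_pitch, frame_len):
    """Per-sample pitch P[n] = (1 - f) * P_t + f * P_{t+1}."""
    frame_pitch = np.asarray(frame_pitch, dtype=float)
    num_frames = len(frame_pitch)
    n = np.arange(num_frames * frame_len)
    t = n // frame_len                       # frame index of sample n
    f = (n - t * frame_len) / frame_len      # fractional position within frame t
    nxt = np.minimum(t + 1, num_frames - 1)  # last frame: hold its own value
    return (1 - f) * frame_pitch[t] + f * frame_pitch[nxt]
```

Interpolating between a voiced frame and an unvoiced frame (P = 0) would need separate handling, which this sketch does not attempt.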
[Figure: pitch period vs. sample number n, with frame boundaries marked]
By Morn - Own work, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=24084756
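$$e[n] = \sum_{p} \delta[n - pP]$$
An unscaled pulse train has one unit pulse every $P$ samples, so its mean square is $1/P$. For the RMS to equal 1.0, we need to scale each pulse by $\sqrt{P}$:
$$e[n] = \sqrt{P}\sum_{p} \delta[n - pP]$$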
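The effect of the scaling on the RMS can be checked directly. A minimal sketch, assuming a constant pitch period and a first pulse at sample 0; the function name `pulse_train` is illustrative.

```python
import numpy as np

def pulse_train(num_samples, period):
    """Pulse train with one pulse of height sqrt(period) every `period` samples."""
    e = np.zeros(num_samples)
    e[::period] = np.sqrt(period)
    return e

e = pulse_train(800, 80)
rms = np.sqrt(np.mean(e ** 2))  # 10 pulses of energy 80 over 800 samples
```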
Pitch period = 80 samples → first pulse in frame 31 can't occur until the 70th sample of the frame
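$$\phi[0] = 0$$
$$\phi[n] = \phi[n-1] + \frac{2\pi}{P[n]}$$
$$e[n] = \begin{cases} \sqrt{P[n]} & \left\lfloor \frac{\phi[n]}{2\pi} \right\rfloor - \left\lfloor \frac{\phi[n-1]}{2\pi} \right\rfloor > 0 \\ 0 & \text{else} \end{cases}$$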
[Figure: phase $\phi[n]$ and excitation $e[n]$ vs. sample number n; phase axis marked at $2\pi$, $4\pi$, $6\pi$, $8\pi$]
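The phase-accumulator idea can be sketched as a loop that emits a pulse each time the running phase passes a multiple of $2\pi$. This is a minimal sketch under two assumptions: the per-sample pitch `pitch[n]` is positive everywhere (voiced speech only), and the phase is stored in cycles, $\phi[n]/2\pi$, rather than radians so that the floor comparison is less sensitive to rounding. The function name `voiced_excitation` is hypothetical.

```python
import numpy as np

def voiced_excitation(pitch):
    """Pulse of height sqrt(P[n]) whenever phi[n] crosses a multiple of 2*pi."""
    pitch = np.asarray(pitch, dtype=float)
    e = np.zeros(len(pitch))
    cycles_prev = 0.0  # phi[0] / (2*pi) = 0
    for n in range(1, len(pitch)):
        # phi[n] = phi[n-1] + 2*pi / P[n], expressed in cycles
        cycles = cycles_prev + 1.0 / pitch[n]
        if np.floor(cycles) > np.floor(cycles_prev):
            e[n] = np.sqrt(pitch[n])
        cycles_prev = cycles
    return e
```

Because the phase is carried from sample to sample, pulse spacing stays consistent across frame boundaries, which addresses the frame 30 to frame 31 problem described above.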
pulse train. In fact, each sample is highly predictable from previous samples.
$$x[n] \approx \sum_{m=1}^{10} \alpha_m x[n-m]$$
predictability!
The LPC idea:
$$e[n] = x[n] - \sum_{m=1}^{10} \alpha_m x[n-m]$$
Choose the coefficients $\alpha_m$ so that the previous samples explain as much as they can, so that $e[n]$ is as close to zero as possible.
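Minimize the total squared error
$$\mathcal{E} = \sum_{n} e^2[n] = \sum_{n} \left( x[n] - \sum_{m=1}^{10} \alpha_m x[n-m] \right)^2$$
Setting $\frac{\partial \mathcal{E}}{\partial \alpha_k} = 0$ gives
$$R_{xx}[k] = \sum_{m=1}^{10} \alpha_m R_{xx}[|k-m|], \quad 1 \le k \le 10$$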
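The ten equations above form a linear system in the ten unknown coefficients, which can be solved directly. A minimal sketch, assuming a finite frame treated as zero outside its samples (the autocorrelation method); the function name `lpc_coefficients` is hypothetical, and a general-purpose solver is used here rather than a specialized recursion such as Levinson-Durbin.

```python
import numpy as np

def lpc_coefficients(x, order=10):
    """Solve R_xx[k] = sum_m alpha_m R_xx[|k-m|], k = 1..order, for alpha."""
    x = np.asarray(x, dtype=float)
    # Autocorrelation at lags 0..order
    r = np.array([np.dot(x[k:], x[:len(x) - k]) for k in range(order + 1)])
    # Matrix with entry R_xx[|k-m|] in row k, column m
    R = np.array([[r[abs(k - m)] for m in range(1, order + 1)]
                  for k in range(1, order + 1)])
    return np.linalg.solve(R, r[1:])
```

As a sanity check, feeding in the impulse response of a known all-pole filter should return that filter's own coefficients.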
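Since $e[n] = x[n] - \sum_{m=1}^{10} \alpha_m x[n-m]$ is a sum of 11 bounded terms, even in the worst possible case, $e[n] \le 11\,\alpha_{\max} x_{\max}$, where $x_{\max}$ is the largest sample magnitude and $\alpha_{\max}$ is the largest coefficient magnitude (counting the coefficient 1 on $x[n]$).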
Suppose $e[n]$ is just a delta function ($e[n] = \delta[n]$), and suppose all of the $\alpha_m = 0$ except the first one, $\alpha_1 = 1.1$. Then $s[n] = e[n] + 1.1\,s[n-1] = 1.1^n$, which overflows your 16-bit sample buffer after only 110 samples. Your
Fortunately, the LPC synthesis filter is causal, so it's easy to guarantee stability. We just need to make sure that all of the poles have magnitude less than 1: $|p_k| < 1$
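We find the poles like this:
$$H(z) = \frac{1}{A(z)} = \frac{1}{\sum_{m=0}^{10} a_m z^{-m}} = \frac{1}{\prod_{k=1}^{10} \left(1 - p_k z^{-1}\right)}$$
(with $a_0 = 1$ and $a_m = -\alpha_m$ for $m \ge 1$); in other words, $p_k = \text{roots}(A(z))$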
…which you can do using np.roots, if you define the polynomial correctly. Then you just truncate the magnitude,
$$p_k \leftarrow \min\left(|p_k|,\, 0.999\right) e^{j\angle p_k}$$
…and then use np.poly to convert back from roots to polynomial.
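The three steps (find the roots, truncate their magnitudes, rebuild the polynomial) can be sketched as below. This is a minimal sketch, assuming the coefficients are passed as `[a_0, a_1, ..., a_10]` for $A(z) = \sum_m a_m z^{-m}$, which is the descending-power order that np.roots expects when $A(z)$ is multiplied through by $z^{10}$. The function name `stabilize` and the `max_radius` parameter are hypothetical.

```python
import numpy as np

def stabilize(a, max_radius=0.999):
    """Pull any pole of 1/A(z) with magnitude above max_radius back inside it."""
    a = np.asarray(a, dtype=float)
    poles = np.roots(a)                             # roots of a_0 z^N + ... + a_N
    mag = np.minimum(np.abs(poles), max_radius)     # truncate the magnitude
    poles = mag * np.exp(1j * np.angle(poles))      # keep each pole's angle
    # np.poly returns a monic polynomial; restore the original leading coefficient.
    # The imaginary part is rounding error, since the poles come in conjugate pairs.
    return np.real(np.poly(poles)) * a[0]
```

For the unstable example above ($\alpha_1 = 1.1$, so $A(z) = 1 - 1.1z^{-1}$), `stabilize([1.0, -1.1])` moves the pole from 1.1 to 0.999, while a filter whose poles are already inside the limit is returned essentially unchanged.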