Speech Coding Workshop 2000 Jean-Marc Valin, Roch Lefebvre 1
Bandwidth Extension of Narrowband Speech for Low Bit-Rate Wideband Coding
Jean-Marc Valin, Roch Lefebvre
University of Sherbrooke
IEEE Speech Coding Workshop, Sept. 17-20, 2000
Lake Lawn Resort, Delavan, WI
Outline
- Problem statement
- Proposed solution
- System performance
- Discussion
Problem Statement
- Telephone Band: 300 - 3400 Hz
- AM Band: 50 - 7000 Hz
- How can telephone-band speech (e.g. G.729-coded) be made to sound like AM-band speech using only 500 bits/s of side information?
- We need to recover information in both the low- and high-frequency bands
Proposed Solution
[Figure: amplitude (dB) vs. frequency (Hz)]
- 1) Recover as much of the wideband information as possible from the narrowband speech
- 2) Code the information that cannot be recovered
– Recovered information:
- Low-frequency band
- High-frequency excitation
– Coded information:
- High-frequency spectral envelope
System Overview
- Inverse IRM filter is optional
– produces a flat response from 200-3500 Hz
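[Block diagram: the 8 kHz narrowband input (optionally passed through the inverse IRM filter) is upsampled by 2 to form the 300-3400 Hz band of the 16 kHz wideband output. Low-frequency regeneration adds the 50-300 Hz band, and high-frequency regeneration, guided by the side information, adds the 3400-8000 Hz band.]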
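A minimal sketch of how the three bands could be assembled at 16 kHz, assuming the narrowband signal is upsampled by zero insertion followed by a windowed-sinc anti-imaging filter, and that the regenerated low and high bands are already available as 16 kHz signals. The function names (`upsample2`, `assemble_wideband`) and the 63-tap Hamming-windowed filter are illustrative assumptions, not details from the slides; the optional inverse IRM filter is omitted.

```python
import numpy as np


def lowpass_fir(cutoff_hz, fs_hz, num_taps=63):
    """Hamming-windowed sinc low-pass FIR, normalized to unity gain at DC."""
    n = np.arange(num_taps) - (num_taps - 1) / 2
    h = 2 * cutoff_hz / fs_hz * np.sinc(2 * cutoff_hz / fs_hz * n)
    h *= np.hamming(num_taps)
    return h / h.sum()


def upsample2(x_nb):
    """8 kHz -> 16 kHz: insert zeros, then remove the spectral image above 4 kHz."""
    up = np.zeros(2 * len(x_nb))
    up[::2] = x_nb
    # The factor 2 restores the amplitude halved by zero insertion;
    # mode="same" with a symmetric odd-length filter introduces no delay.
    return 2 * np.convolve(up, lowpass_fir(4000.0, 16000.0), mode="same")


def assemble_wideband(x_nb, low_band, high_band):
    """Sum the regenerated 50-300 Hz band, the upsampled 300-3400 Hz band
    and the regenerated 3400-8000 Hz band (all sampled at 16 kHz)."""
    return low_band + upsample2(x_nb) + high_band


# Example: a 1 kHz tone sampled at 8 kHz becomes the same tone at 16 kHz.
x = np.sin(2 * np.pi * 1000 * np.arange(800) / 8000)
y = assemble_wideband(x, np.zeros(1600), np.zeros(1600))
```

Zero insertion keeps the original samples in place, so the narrowband content passes through unchanged while the regenerated bands simply add on top of it.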
Low-Frequency Regeneration (1/2)
- Assumptions:
– Only pitch harmonics need to be recovered
- In general, no more than two pitch harmonics lie below 200 Hz
– Absolute phase is not perceptually relevant
- Frequencies of the harmonics are determined from pitch analysis
- Amplitudes are determined by a feed-forward multi-layer perceptron (output in the log domain)
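Under these assumptions, the low band reduces to a sum of at most two cosines. The sketch below assumes the pitch delay is given in 8 kHz samples and that the log-domain amplitudes are in dB; the function name `synthesize_low_harmonics` is hypothetical, and phase continuity across frames is not handled.

```python
import numpy as np


def synthesize_low_harmonics(pitch_delay, log_amps_db, n_samples,
                             fs_nb=8000.0, fs_wb=16000.0,
                             max_freq_hz=200.0, phases=None):
    """Sum of the pitch harmonics lying below max_freq_hz, sampled at fs_wb.

    pitch_delay : pitch period in narrowband (8 kHz) samples (assumption)
    log_amps_db : harmonic amplitudes in dB, e.g. MLP outputs (assumption)
    phases      : optional start phases; zero by default, since absolute
                  phase is assumed not to be perceptually relevant
    """
    f0 = fs_nb / pitch_delay
    n = np.arange(n_samples)
    if phases is None:
        phases = np.zeros(len(log_amps_db))
    out = np.zeros(n_samples)
    for k, (amp_db, phi) in enumerate(zip(log_amps_db, phases), start=1):
        f_k = k * f0
        if f_k >= max_freq_hz:
            break  # only harmonics below the cutoff are regenerated
        out += 10.0 ** (amp_db / 20.0) * np.cos(2 * np.pi * f_k * n / fs_wb + phi)
    return out


# Pitch delay of 100 samples at 8 kHz -> f0 = 80 Hz, harmonics at 80 and 160 Hz.
low = synthesize_low_harmonics(100, [0.0, -6.0], 16000)
```

Because absolute phase is treated as irrelevant, the phases can be set freely; only the harmonic frequencies and amplitudes carry information.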
Low-Frequency Regeneration (2/2)
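[Block diagram: the narrowband speech feeds a pitch analysis (pitch delay, pitch gain) and an MFCC calculation (16 cepstral coefficients); these inputs (dimensions 1, 1 and 16) drive a multi-layer perceptron that outputs scale factors for the 1st and 2nd harmonics. The harmonics from the low-frequency harmonic synthesis are scaled and summed, then passed through an LP filter to give the low frequencies.]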
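A minimal forward pass for the multi-layer perceptron, assuming an 18-dimensional input (pitch delay, pitch gain and 16 MFCCs), a single tanh hidden layer of 16 units and two linear outputs read as log-domain harmonic amplitudes. The hidden-layer size, activation, class and helper names are all hypothetical, the weights are random placeholders rather than trained values, and input normalization is omitted.

```python
import numpy as np


class FeedForwardMLP:
    """One tanh hidden layer and a linear output layer.

    Outputs are interpreted as log-domain (dB) harmonic amplitudes.
    Weights are random placeholders; a real system would train them.
    """

    def __init__(self, n_in=18, n_hidden=16, n_out=2, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0.0, 1.0 / np.sqrt(n_in), (n_hidden, n_in))
        self.b1 = np.zeros(n_hidden)
        self.w2 = rng.normal(0.0, 1.0 / np.sqrt(n_hidden), (n_out, n_hidden))
        self.b2 = np.zeros(n_out)

    def __call__(self, features):
        hidden = np.tanh(self.w1 @ features + self.b1)
        return self.w2 @ hidden + self.b2


def mlp_features(pitch_delay, pitch_gain, mfcc):
    """Stack pitch delay (1), pitch gain (1) and 16 MFCCs into an 18-vector."""
    mfcc = np.asarray(mfcc, dtype=float)
    if mfcc.shape != (16,):
        raise ValueError("expected 16 cepstral coefficients")
    return np.concatenate(([float(pitch_delay), float(pitch_gain)], mfcc))


mlp = FeedForwardMLP()
# One log-domain scale factor per harmonic (1st and 2nd).
log_amps_db = mlp(mlp_features(100, 0.8, np.zeros(16)))
```

Predicting in the log domain keeps the regression target on a perceptually meaningful scale, which is also the scale of the dB error reported in the objective results.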
High-Frequency Extension
- Excitation-filter model (16 ms frames)
- The problem is separated into two parts:
– Excitation extension (no side information)
– Spectral envelope coding (side information)
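[Block diagram: the narrowband speech goes through LPC analysis and the inverse filter A(z) to give the narrowband excitation, which is extended and then shaped by the synthesis filter 1/B(z). B(z) comes from the spectral envelope extension, which uses the side information (high-frequency spectral envelope). A high-pass filter then yields the high-frequency band.]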
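The excitation-filter split can be sketched with standard LPC tools: autocorrelation-method analysis with the Levinson-Durbin recursion, the inverse filter A(z) to obtain the excitation, and an all-pole synthesis filter 1/B(z). Only the 16 ms frame length comes from the slides; the order 10, the Hamming window and the small noise floor are assumptions, and in the described system B(z) would come from the extended spectral envelope rather than being passed in directly.

```python
import numpy as np

FS_NB = 8000
FRAME_LEN = FS_NB * 16 // 1000  # 16 ms frame -> 128 samples at 8 kHz


def lpc_autocorr(frame, order=10):
    """Coefficients [1, a1, ..., ap] of A(z) via autocorrelation + Levinson-Durbin."""
    w = frame * np.hamming(len(frame))
    r = np.correlate(w, w, mode="full")[len(w) - 1:len(w) + order]
    r[0] = r[0] * 1.0001 + 1e-12  # slight noise floor keeps the recursion stable
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err  # reflection coefficient
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a


def inverse_filter(x, a):
    """Excitation e[n] = sum_k a_k x[n-k], i.e. x filtered by A(z), zero initial state."""
    return np.convolve(x, a)[:len(x)]


def synthesis_filter(e, b):
    """All-pole filter 1/B(z): y[n] = e[n] - sum_{k>=1} b_k y[n-k]."""
    y = np.zeros(len(e))
    for n in range(len(e)):
        past = y[max(0, n - len(b) + 1):n][::-1]  # y[n-1], y[n-2], ...
        y[n] = e[n] - np.dot(b[1:1 + len(past)], past)
    return y
```

Filtering a frame through A(z) and then through 1/A(z) with zero initial states returns the original frame; replacing 1/A(z) by 1/B(z) is what lets the extended excitation take on a new, wideband envelope.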
Excitation Extension
[Block diagram: the narrowband excitation is upsampled by 2 and processed by an absolute-value nonlinearity, a whitening filter and a high-pass filter to produce the wideband excitation; the accompanying signal plots are not recoverable.]
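A sketch of the nonlinear excitation extension, assuming the block order upsample, absolute value, whitening, high pass. Taking the absolute value of a periodic signal keeps its period but spreads harmonics across the whole 0-8 kHz band; an LPC inverse filter then flattens the result. The whitening order (8), the filter designs, the 3400 Hz cutoff and the function names are assumptions, and the slides do not say how the high-band component is scaled or combined with the narrowband excitation, so this returns only the high-band part.

```python
import numpy as np


def lowpass_fir(cutoff_hz, fs_hz, num_taps=63):
    """Hamming-windowed sinc low-pass FIR with unity DC gain."""
    n = np.arange(num_taps) - (num_taps - 1) / 2
    h = 2 * cutoff_hz / fs_hz * np.sinc(2 * cutoff_hz / fs_hz * n) * np.hamming(num_taps)
    return h / h.sum()


def highpass_fir(cutoff_hz, fs_hz, num_taps=63):
    """High-pass FIR obtained by spectral inversion of the low-pass design."""
    h = -lowpass_fir(cutoff_hz, fs_hz, num_taps)
    h[(num_taps - 1) // 2] += 1.0
    return h


def lpc_coeffs(x, order):
    """[1, a1, ..., ap] from the autocorrelation normal equations."""
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]
    r[0] = r[0] * 1.0001 + 1e-12
    idx = np.abs(np.subtract.outer(np.arange(order), np.arange(order)))
    a = np.linalg.solve(r[idx], -r[1:order + 1])
    return np.concatenate(([1.0], a))


def extend_excitation(exc_nb, order=8, fs_wb=16000.0, hp_cutoff_hz=3400.0):
    """High-band excitation derived from an 8 kHz narrowband excitation."""
    up = np.zeros(2 * len(exc_nb))
    up[::2] = exc_nb
    up = 2 * np.convolve(up, lowpass_fir(4000.0, fs_wb), mode="same")  # upsample by 2
    rect = np.abs(up)            # nonlinearity: creates harmonics above 4 kHz
    rect -= rect.mean()          # remove the DC offset introduced by rectification
    a = lpc_coeffs(rect, order)
    white = np.convolve(rect, a)[:len(rect)]  # LPC inverse filter flattens the spectrum
    return np.convolve(white, highpass_fir(hp_cutoff_hz, fs_wb), mode="same")


# Harmonics of 200 Hz between 400 and 3200 Hz, sampled at 8 kHz.
n = np.arange(4000)
exc = sum(np.cos(2 * np.pi * 200 * k * n / 8000) for k in range(2, 17))
high = extend_excitation(exc)
```

Because the rectified signal keeps the original pitch period, the new high-frequency components fall on the same harmonic grid as the narrowband excitation, which is why no side information is needed for this part.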
Spectral Envelope Coding
- Spectral envelope calculated from the wideband LPC coefficients
- Quantization of the 3000-8000 Hz range (40 points)
– Log domain
– 8-bit vector quantization: 8 bits per 16 ms frame = 500 bits/s of side information
- Concatenation with the envelope obtained from LPC analysis of the narrowband speech
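The envelope sampling and quantization can be sketched as follows: evaluate the log magnitude of 1/A(z) at 40 frequencies between 3000 and 8000 Hz and send the index of the nearest codeword of a 256-entry (8-bit) codebook. The uniform frequency spacing, the unit LPC gain, the squared-error distortion measure and the random codebook are assumptions; a real codebook would be trained, and the concatenation with the narrowband envelope is not shown.

```python
import numpy as np

FS_WB = 16000.0
N_POINTS = 40     # envelope samples in 3000-8000 Hz (from the slide)
VQ_BITS = 8       # one 8-bit index per frame (from the slide)
FRAME_MS = 16     # frame length in ms (from the slide)

SIDE_INFO_RATE = VQ_BITS * 1000 / FRAME_MS        # 8 bits / 16 ms = 500 bits/s
FREQS_HZ = np.linspace(3000.0, 8000.0, N_POINTS)  # uniform spacing is an assumption


def lpc_envelope_db(a, freqs_hz=FREQS_HZ, fs_hz=FS_WB, gain=1.0):
    """Log spectral envelope 20*log10(gain / |A(e^{jw})|) at the given frequencies."""
    w = 2 * np.pi * np.asarray(freqs_hz) / fs_hz
    a_resp = np.exp(-1j * np.outer(w, np.arange(len(a)))) @ np.asarray(a, dtype=float)
    return 20 * np.log10(gain / np.abs(a_resp))


def vq_encode(vec, codebook):
    """Index of the nearest codeword (squared Euclidean distance in the log domain)."""
    return int(np.argmin(np.sum((codebook - vec) ** 2, axis=1)))


# Placeholder codebook: 2**8 random 40-point vectors (a real one would be trained).
codebook = np.random.default_rng(0).normal(0.0, 10.0, (2 ** VQ_BITS, N_POINTS))
index = vq_encode(lpc_envelope_db([1.0, -0.9]), codebook)  # sent as side information
decoded_env_db = codebook[index]
```

Quantizing in the log domain makes the squared-error search equivalent to minimizing an RMS dB error, the same measure used for the objective results.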
Objective Results
- Low-frequency band
– 3 dB RMS error on harmonic amplitudes
- High-frequency band
– 3.6 dB RMS error on the spectral envelope
– No objective measure for the excitation extension (perceptually very close to the original)
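For reference, an RMS error in dB is the root of the mean squared difference between log-domain values. The slides do not say how values were pooled (per harmonic, per envelope point, per frame), so this sketch simply averages over everything passed in; the function names are hypothetical.

```python
import numpy as np


def rms_db_error(ref_db, est_db):
    """RMS difference between two sets of log-domain (dB) values."""
    diff = np.asarray(ref_db, dtype=float) - np.asarray(est_db, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


def amplitude_to_db(amp):
    """Convert linear amplitude to dB."""
    return 20.0 * np.log10(np.abs(amp))


# Two harmonics estimated 3 dB too high and 3 dB too low -> 3 dB RMS error.
err = rms_db_error([0.0, -6.0], [3.0, -9.0])
```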
Subjective Results
Audio samples (female and male speakers):
- Original wideband
- Recovered from original IRM-filtered speech
- Recovered from G.729-coded speech
Discussion
- Highlights
– Expands IRM-filtered telephone-band speech to the AM band
– Very low side-information rate (500 bits/s)
- Areas of improvement