Bandwidth Extension of Narrowband Speech for Low Bit-Rate Wideband Coding


IEEE Speech Coding Workshop, Sept 17–20, 2000, Lake Lawn Resort, Delavan, WI. Jean-Marc Valin, Roch Lefebvre, University of Sherbrooke. Bandwidth Extension of Narrowband Speech for Low Bit-Rate Wideband Coding.


SLIDE 1

Speech Coding Workshop 2000 Jean-Marc Valin, Roch Lefebvre 1

IEEE Speech Coding Workshop

Sept 17–20, 2000 Lake Lawn Resort Delavan, WI Jean-Marc Valin, Roch Lefebvre University of Sherbrooke

Bandwidth Extension of Narrowband Speech for Low Bit-Rate Wideband Coding

SLIDE 2

Outline

  • Problem statement
  • Proposed solution
  • System performance
  • Discussion
SLIDE 3

Problem Statement

  • Telephone Band: 300 - 3400 Hz
  • AM Band: 50 - 7000 Hz
  • How to make telephone-band speech (G.729) sound like AM-band speech with 500 bits/s?
  • We need to recover information from both the low- and high-frequency bands

SLIDE 4

Proposed Solution

[Figure: amplitude (dB) versus frequency (Hz)]

  • 1) Do our best to recover the wideband information from narrowband speech
  • 2) Use coding for the information that cannot be recovered

– Recovered information:

  • Low-frequency band
  • High-frequency excitation

– Coded information:

  • High-frequency spectral envelope

SLIDE 5

System Overview

  • Inverse IRM filter is optional

– produces a flat response from 200-3500 Hz

[Block diagram. Blocks: inverse IRM filter, upsampling by 2, low-frequency regeneration, high-frequency regeneration. Signals: 8 kHz narrowband input, side information, 16 kHz wideband output. Bands: 50-300 Hz, 300-3400 Hz, 3400-8000 Hz.]

SLIDE 6

Low-Frequency Regeneration (1/2)

  • Assumptions :

– Only pitch harmonics need to be recovered

  • In general, no more than two pitch harmonics below 200 Hz

– Absolute phase is not perceptually relevant

  • Frequency of harmonics determined from pitch analysis
  • Amplitudes determined from feed-forward multi-layer perceptron (output in log domain)
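
The assumptions above suggest a simple synthesis step, sketched here in Python. This is only an illustration, not the authors' implementation: the function name, the zero-phase cosines, the natural-log amplitude convention, the 300 Hz cutoff and the 16 kHz output rate are all assumptions made for the example, and the log amplitudes stand in for the perceptron output.

```python
import math

def synthesize_low_band(f0_hz, log_amps, n_samples, fs_hz=16000.0, cutoff_hz=300.0):
    """Sum the pitch harmonics k*f0 that fall below cutoff_hz.

    log_amps[k-1] is the natural-log amplitude of harmonic k (a hypothetical
    stand-in for the perceptron output). Phase is fixed at zero, following
    the assumption that absolute phase is not perceptually relevant.
    """
    out = [0.0] * n_samples
    for k, log_a in enumerate(log_amps, start=1):
        freq = k * f0_hz
        if freq >= cutoff_hz:
            continue  # harmonic lies outside the band being regenerated
        amp = math.exp(log_a)  # back from the log domain
        w = 2.0 * math.pi * freq / fs_hz
        for n in range(n_samples):
            out[n] += amp * math.cos(w * n)
    return out

# Example: 100 Hz pitch, two harmonics (100 Hz and 200 Hz)
frame = synthesize_low_band(100.0, [math.log(0.5), math.log(0.25)], 256)
```

With a 200 Hz pitch only the first harmonic would be kept, since the second falls at 400 Hz, above the assumed cutoff.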

SLIDE 7

Low-Frequency Regeneration (2/2)

[Block diagram. Blocks: pitch analysis, MFCC calculation, multi-layer perceptron, low-frequency harmonic synthesis (1st harmonic, 2nd harmonic), scale and sum, LP filter. Signals: narrowband speech (input), pitch delay, pitch gain, cepstral coefficients, scale factors, low frequencies (output). Dimensions marked on the diagram: (1), (1), (16).]

SLIDE 8

High-Frequency Extension

  • Excitation-filter model (16 ms frames)
  • Problem is separated into two parts

– Excitation extension (no side information)

– Spectral envelope coding (side information)

[Block diagram. Blocks: LPC analysis, A(z), excitation extension, spectral envelope extension, 1/B(z), high pass. Signals: narrowband speech (input), side information (high-frequency spectral envelope), high-frequency band (output).]
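
As a rough illustration of the excitation-filter model, the sketch below estimates A(z) with the autocorrelation method and Levinson-Durbin recursion, then applies A(z) as an inverse filter to obtain the excitation. The slides do not say which LPC method, order or windowing is used, so those choices, and all function names, are assumptions for this example.

```python
def autocorr(x, max_lag):
    """Autocorrelation r[0..max_lag] of one frame."""
    return [sum(x[n] * x[n - lag] for n in range(lag, len(x)))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Return A(z) coefficients [1, a1, ..., ap] from autocorrelation r."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k  # prediction error energy after order i
    return a

def residual(x, a):
    """Excitation e[n] = sum_j a[j] * x[n-j], i.e. x filtered by A(z)."""
    return [sum(a[j] * x[n - j] for j in range(len(a)) if n - j >= 0)
            for n in range(len(x))]

# One 16 ms frame at 8 kHz (128 samples) of a decaying first-order signal
x = [0.9 ** n for n in range(128)]
a = levinson_durbin(autocorr(x, 1), 1)
e = residual(x, a)
```

For this toy signal the first-order predictor recovers a coefficient close to -0.9, and the residual is close to a single impulse at n = 0.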

SLIDE 9

Excitation Extension

[Block diagram with plots. Blocks: upsampling by 2, absolute value, whitening filter, high pass. Signals: narrowband excitation (input), wideband excitation (output).]
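
One way to see why an absolute-value stage can help extend the excitation: rectifying a tone creates components at even multiples of its frequency, so energy appears above the original band. The sketch below only demonstrates that property on a single cosine; it is not the full chain from the slide, and the frame length, bin choice and helper name are assumptions for the example.

```python
import math

def dft_mag(x, k):
    """Magnitude of DFT bin k of sequence x, normalised by its length."""
    n_len = len(x)
    re = sum(x[n] * math.cos(2 * math.pi * k * n / n_len) for n in range(n_len))
    im = -sum(x[n] * math.sin(2 * math.pi * k * n / n_len) for n in range(n_len))
    return math.hypot(re, im) / n_len

N = 256   # frame length (assumed)
k0 = 40   # at a 16 kHz sampling rate, bin 40 of 256 is 2500 Hz
tone = [math.cos(2 * math.pi * k0 * n / N) for n in range(N)]
rectified = [abs(v) for v in tone]

before = dft_mag(tone, 2 * k0)       # energy at 5000 Hz before rectification
after = dft_mag(rectified, 2 * k0)   # energy at 5000 Hz after rectification
```

Here `before` is essentially zero while `after` is clearly non-zero, so the rectified signal now contains a 5000 Hz component. Rectification also adds a DC term and changes the spectral tilt, which is consistent with the whitening and high-pass stages listed on the slide.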

SLIDE 10

Spectral Envelope Coding

  • Spectral envelope calculated from the wideband LPC coefficients
  • Quantization of the 3000-8000 Hz range (40 points)

– Log domain

– 8-bit vector quantization (500 bits/s side information, using 16 ms frames)

  • Concatenation with envelope obtained from LPC analysis on narrowband speech
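
The side-information rate follows directly from the figures above: 8 bits every 16 ms is 500 bits/s. The sketch below works through that arithmetic and shows a plain nearest-neighbour search over a 256-entry codebook of 40-point log-domain envelopes. The codebook here is random and purely hypothetical (a real one would be trained), and the squared-error distance is an assumption, since the slides do not specify the distortion measure.

```python
import random

BITS_PER_FRAME = 8     # 8-bit VQ index
FRAME_MS = 16          # frame length from the slide
ENVELOPE_POINTS = 40   # points covering 3000-8000 Hz

def side_info_rate(bits_per_frame, frame_ms):
    """Side-information rate in bits per second."""
    return bits_per_frame * 1000 / frame_ms

def quantize(envelope_db, codebook):
    """Return the index of the codeword closest in squared error."""
    best_i, best_d = 0, float("inf")
    for i, codeword in enumerate(codebook):
        d = sum((e - c) ** 2 for e, c in zip(envelope_db, codeword))
        if d < best_d:
            best_i, best_d = i, d
    return best_i

# Hypothetical random codebook: 2**8 = 256 entries, values in dB
rng = random.Random(0)
codebook = [[rng.uniform(-40.0, 0.0) for _ in range(ENVELOPE_POINTS)]
            for _ in range(2 ** BITS_PER_FRAME)]

rate = side_info_rate(BITS_PER_FRAME, FRAME_MS)  # 500.0 bits/s
index = quantize(codebook[37], codebook)         # an exact codeword maps to itself
```

The decoder would look up `codebook[index]` and concatenate it with the narrowband envelope, as described in the last bullet.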

SLIDE 11

Objective results

  • Low-frequency band

– 3 dB RMS error on harmonic amplitude

  • High-frequency band

– 3.6 dB RMS error on envelope

– No objective measure for excitation extension (perceptually very close to original)
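
The slides do not state how the RMS errors were computed. One plausible reading, sketched below purely as an assumption, is the root-mean-square of the per-point difference between estimated and reference amplitudes, both expressed in dB.

```python
import math

def rms_db_error(ref_amp, est_amp):
    """RMS of the dB difference between estimated and reference amplitudes.

    Assumes linear, strictly positive amplitudes of equal length; this
    definition is an illustration, not taken from the slides.
    """
    diffs = [20.0 * math.log10(e / r) for r, e in zip(ref_amp, est_amp)]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))

# An estimate that is 3 dB too high on one point and 3 dB too low on another
gain = 10 ** (3.0 / 20.0)
err = rms_db_error([1.0, 1.0], [gain, 1.0 / gain])
```

In this example `err` is 3 dB, because both points are off by the same magnitude.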

SLIDE 12

Subjective Results

Conditions compared (female and male speakers):

  • Original wideband
  • Recovered from original IRM-filtered speech
  • Recovered from G.729 coded speech

SLIDE 13

Discussion

  • Highlights

– Expand IRM-filtered telephone-band speech to AM band

– Very low side information rate (500 bits/s)

  • Areas of improvement

– Use high-band spectral estimation before coding

– Use residual low-frequency information (below 300 Hz)

– Noise robustness

– Post-filtering