Effect of Telephone-Line Transmission and Digital Audio Format on Formant Tracking Measurements



SLIDE 1

Effect of Telephone-Line Transmission and Digital Audio Format on Formant Tracking Measurements

27.07.2011

Christoph Meinerz
Landeskriminalamt Brandenburg, Germany
christoph.meinerz@gmx.de

Herbert Masthoff
Department of Phonetics, University of Trier, Germany
masthoff@uni-trier.de

SLIDE 2

Introduction

  • Formants, Speaker ID and Audio Compression

Method

  • Experimental Setup, Hardware, Software

Results

  • Formant Shift

Conclusion

  • What to do


SLIDE 3

Introduction

  • (revival of) reports of formant measurements for speaker identification (e.g. Nolan/Grigoras, 2005; Becker et al., 2007; Jessen et al., 2010; Simpson/French, 2010)
  • reports of effects of telephone transmission and lossy compression on acoustic parameters (Künzel, 2001; Köster/Grasmück, 2004; Gonzalez et al., 2003)
  • the problem is real: telephone intercepts in low-bit-rate .mp3!

➡ results of a preliminary study: effects of telephone-line transmission and lossy low-bit-rate audio compression on LPC-based formant measurement, no intra-speaker variation
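For readers unfamiliar with LPC-based formant measurement, the following is a minimal sketch of the general idea, not the procedure used in this study (the reference list cites Praat, whose exact settings are not given on the slides). It fits an all-pole model with the autocorrelation method and Levinson-Durbin recursion, then reads resonance candidates off the peaks of the LPC envelope. The synthetic test signal, the resonance values (500 Hz and 1500 Hz), the bandwidths, the sampling rate and all function names are assumptions chosen for illustration.

```python
import cmath
import math


def all_pole_coeffs(resonances_hz, bandwidths_hz, fs):
    """Coefficients of A(z) for an all-pole filter 1/A(z) with the given resonances."""
    a = [1.0]
    for freq, bw in zip(resonances_hz, bandwidths_hz):
        radius = math.exp(-math.pi * bw / fs)
        theta = 2.0 * math.pi * freq / fs
        section = [1.0, -2.0 * radius * math.cos(theta), radius * radius]
        # Multiply the polynomial built so far by this second-order section.
        product = [0.0] * (len(a) + 2)
        for i, ai in enumerate(a):
            for j, sj in enumerate(section):
                product[i + j] += ai * sj
        a = product
    return a


def impulse_response(a, n):
    """First n samples of the impulse response of 1/A(z)."""
    y = []
    for t in range(n):
        acc = 1.0 if t == 0 else 0.0
        for k in range(1, len(a)):
            if t - k >= 0:
                acc -= a[k] * y[t - k]
        y.append(acc)
    return y


def lpc_autocorrelation(signal, order):
    """LPC coefficients [1, a1, ..., ap] via autocorrelation and Levinson-Durbin."""
    n = len(signal)
    r = [sum(signal[i] * signal[i + lag] for i in range(n - lag))
         for lag in range(order + 1)]
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a


def lpc_envelope_peaks(a, fs, step_hz=5.0):
    """Frequencies (Hz) of local maxima of the LPC envelope 1/|A(e^jw)|."""
    freqs = [i * step_hz for i in range(int(fs / 2 / step_hz) + 1)]
    mags = []
    for f in freqs:
        w = 2.0 * math.pi * f / fs
        denom = sum(ak * cmath.exp(-1j * w * k) for k, ak in enumerate(a))
        mags.append(1.0 / abs(denom))
    return [freqs[i] for i in range(1, len(freqs) - 1)
            if mags[i - 1] < mags[i] >= mags[i + 1]]


# Hypothetical two-resonance test signal (values are illustrative only).
fs = 8000
true_a = all_pole_coeffs([500.0, 1500.0], [80.0, 80.0], fs)
signal = impulse_response(true_a, 2000)
estimated_a = lpc_autocorrelation(signal, order=4)
peaks = lpc_envelope_peaks(estimated_a, fs)
print(peaks)
```

Because the estimate depends entirely on the spectral envelope of the input, anything that reshapes that envelope before analysis (band-limiting by the telephone line, artefacts of a lossy codec) can move the peaks the model finds.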

SLIDE 4

Method I

Experimental set-up: “The Plan”

[Figure: diagram of the experimental set-up, with labels 1 and 2]

SLIDE 5

Method II

Audio Formats and Hardware

Channel  Format  Encoding  Sampling rate  Bit rate
mike     .wav    PCM       44.1 kHz       705 kbps
mike     .wma    CBR       22 kHz         20 kbps
mike     .mp3    CBR       8 kHz          8 kbps
tel      .wav    PCM       44.1 kHz       705 kbps
tel      .wma    CBR       22 kHz         20 kbps
tel      .mp3    CBR       8 kHz          8 kbps

Tech-Specs: “Re-Tel” Tel. Rec. Adapter 157; Soundcard: MBox 2 Pro
Tech-Specs: Sound Studio UoT; Mike: Neumann M147 Tube; Soundcard: RME Hammerfall
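The 705 kbps figure given for the .wav recordings is consistent with 16-bit mono PCM at 44.1 kHz. Bit depth and channel count are not stated on the slide, so both are assumptions in the short sketch below, which also shows how strongly the two lossy formats reduce the data rate relative to that PCM rate. The function name is illustrative.

```python
def pcm_bitrate_kbps(sample_rate_hz, bit_depth, channels):
    """Uncompressed PCM bit rate in kbit/s."""
    return sample_rate_hz * bit_depth * channels / 1000.0


# Assumption: 16-bit mono (inferred from 705 kbps at 44.1 kHz, not stated on the slide).
wav_kbps = pcm_bitrate_kbps(44100, 16, 1)

# Bit rates of the lossy formats as listed on the slide.
wma_kbps = 20.0
mp3_kbps = 8.0

wma_ratio = wav_kbps / wma_kbps
mp3_ratio = wav_kbps / mp3_kbps

print(f"PCM: {wav_kbps:.1f} kbps")
print(f".wma keeps 1/{wma_ratio:.1f} of the PCM data rate")
print(f".mp3 keeps 1/{mp3_ratio:.1f} of the PCM data rate")
```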

SLIDE 6

Results I

Shift of average formant frequency according to format (males)

[Figure: F1, F2 and F3 for mike .wav, mike .wma, mike .mp3 and for tel .wav, tel .wma, tel .mp3; frequency axis with ticks at 600, 1,200, 1,800 and 2,400 Hz; labels 1 and 2]

SLIDE 7

Results II

Shift of average formant frequency according to format (females)

[Figure: F1, F2 and F3 for mike .wav, mike .wma, mike .mp3 and for tel .wav, tel .wma, tel .mp3; frequency axis with ticks at 600, 1,200, 1,800 and 2,400 Hz; labels 1 and 2]

SLIDE 8

Results III

Mean shift of average formant frequency according to format, in % (all)

[Figure: F1, F2 and F3 for mike .wav, mike .wma, mike .mp3 and for tel .wav, tel .wma, tel .mp3; frequency axis with ticks at 600, 1,200, 1,800 and 2,400 Hz. Percentage labels in extraction order, assignment to formants and formats not recoverable from the text: 100 %, 100 %, 98 %, 82 %, 83 %, 80 %, 77 %, 98 %, 83 %, 90 %, 87 %, 82 %, 100 %, 98 %, 83 %, 104 %, 102 %, 98 %]
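The percentages on this slide can be read as a formant frequency expressed relative to a reference measurement, so that 77 % corresponds to a downward shift of 23 % and 104 % to an upward shift of 4 %. The sketch below shows that arithmetic. It assumes the reference is the value measured in the uncompressed microphone recording, which the slide does not state explicitly; the frequencies used are hypothetical and are not data from the study.

```python
def relative_frequency_percent(measured_hz, reference_hz):
    """Measured formant frequency as a percentage of the reference value."""
    return 100.0 * measured_hz / reference_hz


def shift_percent(measured_hz, reference_hz):
    """Signed shift in percent: negative is downward, positive is upward."""
    return relative_frequency_percent(measured_hz, reference_hz) - 100.0


# Hypothetical values for illustration only (not measurements from the study).
reference_f3_hz = 2500.0
measured_f3_hz = 1925.0

relative = relative_frequency_percent(measured_f3_hz, reference_f3_hz)
shift = shift_percent(measured_f3_hz, reference_f3_hz)
print(f"relative frequency: {relative:.0f} %, shift: {shift:+.0f} %")
```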

SLIDE 9

Results IV

Sonagraphic symptoms (top: mike .wav, bottom: mike .mp3)

[Figure: two spectrograms; figure label 2]

SLIDE 10

Results V

Sonagraphic symptoms (top: tel .wav, bottom: tel .mp3)

[Figure: two spectrograms; figure label 2]

SLIDE 11

Results VI

Sonagraphic symptoms (top: mike .wav, bottom: mike .mp3)

[Figure: two spectrograms; figure label 1]

SLIDE 12

Results VII

Sonagraphic symptoms (top: tel .wav, bottom: tel .mp3)

[Figure: two spectrograms; figure label 1]

SLIDE 13

Summary

  • shift of formant frequencies (all)
      • F3: downward ≈ 2 - 23 %
      • F2: downward ≈ 1 - 17 %
      • F1: mike downward ≈ 1 - 16 %; tel upward ≈ 2 - 4 % (.wav and .wma); tel downward ≈ 1 % (.mp3)
  • highest amount of shift in tel .mp3, 8 kbps
  • telephone line alone produces a shift of F2 and F3 ≈ mike .mp3
  • sonagraphic and auditory symptoms
      • spectral cancellations: “the moth”
      • “musical noise” effect

SLIDE 14

Conclusion

  • results confirm those already reported (i.e. Becker et al., 2011!)
  • consider shifting effects when measuring formants and in formant-related ASR (LPC)
  • include a larger population for statistical significance
  • possibly detect a “critical” bit rate
  • possibly cross-check with FFT-based measurements
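As a rough illustration of what an FFT-based cross-check involves, the sketch below locates the strongest spectral peak of a signal directly from its discrete Fourier transform, without fitting an LPC model. It is only a sketch under stated assumptions: the damped sinusoid stands in for a single resonance, its frequency (700 Hz), bandwidth and sampling rate are invented for the example, and a plain DFT is used where an FFT would compute the same values faster. In voiced speech the raw spectrum is dominated by harmonics of the fundamental, so a practical FFT-based formant measurement would need additional smoothing that is not shown here.

```python
import cmath
import math


def damped_sine(freq_hz, bandwidth_hz, fs, n):
    """Exponentially damped sinusoid standing in for a single resonance."""
    return [math.exp(-math.pi * bandwidth_hz * t / fs)
            * math.sin(2.0 * math.pi * freq_hz * t / fs)
            for t in range(n)]


def dft_magnitudes(signal):
    """Magnitude spectrum for bins 0..n/2 (direct DFT; an FFT gives the same values)."""
    n = len(signal)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(signal)))
            for k in range(n // 2 + 1)]


def strongest_peak_hz(signal, fs):
    """Frequency of the largest-magnitude bin, excluding DC and Nyquist."""
    mags = dft_magnitudes(signal)
    k = max(range(1, len(mags) - 1), key=lambda i: mags[i])
    return k * fs / len(signal)


# Hypothetical test signal (values are illustrative only).
fs = 8000
test_signal = damped_sine(700.0, 80.0, fs, 400)
peak_hz = strongest_peak_hz(test_signal, fs)
print(peak_hz)
```

With 400 samples at 8 kHz the bin spacing is 20 Hz, which bounds the resolution of this estimate; comparing such a value with the LPC-derived one is the kind of cross-check the last bullet suggests.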

SLIDE 15

Thank you for your attention!

Moth-Zilla (Becker et al., Vienna 2011)

SLIDE 16

References

Becker, T. et al.: Forensic speaker verification using formant features and Gaussian Mixture Models. Interspeech 2008 Special Session: Forensic Speaker Recognition – Traditional and Automatic Approaches, Brisbane.

Boersma, P./D. Weenink: Praat: doing phonetics by computer [Computer program]. Version 5.2.17, retrieved 26 March 2011 from http://www.praat.org/

Gonzalez, J. et al.: Acoustic analysis of pathological voices compressed with MPEG System. Journal of Voice, 17, 2003, 126-139.

Grasmück, C./J.-P. Köster: Die Auswirkung von mp3 und ATRAC-Kompression auf sprechertypische Parameter des Sprachsignals. In: Nolte, B.: Proceedings „Schall und Schwingungen in sensibler Umgebung“, 2004, Bonn, 126-132.

Harrison, P.: Formant measurement errors for multiple synthetic speakers. IAFPA Annual Conference 2010, Trier.

Jessen, M. et al.: Correlation between long-term formant measurements and automatic speaker recognition in forensic case material. IAFPA Annual Conference 2010, Trier.

Künzel, H.J.: Beware of the telephone effect: the influence of telephone transmission on the measurement of formant frequencies. Forensic Linguistics, 8, 2001, 80-99.

Nolan, F./C. Grigoras: A case for formant analysis in forensic speaker identification. International Journal of Speech, Language and the Law, 12, 2005, 143-173.

Simpson, S./P. French: Testing the speaker discrimination ability of formant measurements. IAFPA Annual Conference 2010, Trier.