Speaker Change Detection using Siamese Networks



SLIDE 1

Speaker Change Detection using Siamese Networks

  • Siamese layers share their weights
  • Classifier is trained using binary cross-entropy
  • Input features are PLPs

[Figure: the acoustic data of the left segment and of the right segment each pass through a BLSTM (the Siamese layers), giving a left embedding and a right embedding; a classifier on the two embeddings outputs Same/Different.]
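The weight sharing and the binary cross-entropy objective on this slide can be illustrated with a minimal sketch. It is not the presenters' model: the BLSTM is replaced by a hypothetical stand-in encoder (mean pooling, a linear map, and tanh), the feature and embedding sizes are arbitrary, and the label convention (1 = same speaker, 0 = different) is an assumption.

```python
import math
import random

def encode(segment, weights):
    """Stand-in encoder (assumption): mean-pool the frames, then linear map + tanh.
    The slides use a BLSTM here; this placeholder only shows the weight sharing."""
    dim = len(segment[0])
    mean = [sum(frame[k] for frame in segment) / len(segment) for k in range(dim)]
    return [math.tanh(sum(w * x for w, x in zip(row, mean))) for row in weights]

def classify(left_emb, right_emb, clf_weights, clf_bias):
    """Logistic classifier on the concatenated embeddings; returns P(same speaker)."""
    features = left_emb + right_emb
    logit = sum(w * x for w, x in zip(clf_weights, features)) + clf_bias
    return 1.0 / (1.0 + math.exp(-logit))

def binary_cross_entropy(p, label, eps=1e-12):
    """Binary cross-entropy for one example; label is 1 (same) or 0 (different)."""
    p = min(max(p, eps), 1.0 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))

random.seed(0)
feat_dim, emb_dim, n_frames = 13, 4, 20  # hypothetical sizes

# One set of encoder weights, used by both branches.
shared_weights = [[random.uniform(-0.5, 0.5) for _ in range(feat_dim)]
                  for _ in range(emb_dim)]
clf_weights = [random.uniform(-0.5, 0.5) for _ in range(2 * emb_dim)]

# Random frames standing in for PLP features of the two segments.
left_segment = [[random.gauss(0, 1) for _ in range(feat_dim)] for _ in range(n_frames)]
right_segment = [[random.gauss(0, 1) for _ in range(feat_dim)] for _ in range(n_frames)]

left_emb = encode(left_segment, shared_weights)    # left branch
right_emb = encode(right_segment, shared_weights)  # right branch, same weights
p_same = classify(left_emb, right_emb, clf_weights, 0.0)
loss = binary_cross_entropy(p_same, label=0)       # treat the pair as "different"
print(f"P(same) = {p_same:.3f}, BCE loss = {loss:.3f}")
```

Because both branches call `encode` with the same `shared_weights`, any update to those weights affects the left and right embeddings identically, which is the property the first bullet describes.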

SLIDE 2

Pre-training of the Siamese Layers

  • Gender classification
  • Triplet Loss

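[Figure: pre-training setups. A single BLSTM predicts Male/Female; a left/right BLSTM pair takes inputs x_l and x_r; anchor, positive, and negative BLSTMs take inputs x_a, x_p, and x_n.]

$$\min \sum_{i=1}^{N} \max\left(0,\ \Delta + d\left(f_a^{(i)}, f_p^{(i)}\right) - d\left(f_a^{(i)}, f_n^{(i)}\right)\right)$$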

  • Contrastive Divergence

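$$\min \sum_{i=1}^{N} I(y = 1)\, d\left(f_l^{(i)}, f_r^{(i)}\right) + I(y \neq 1)\, \max\left(0,\ \Delta - d\left(f_l^{(i)}, f_r^{(i)}\right)\right)$$

Here d is the distance between embeddings, Δ is the margin, and the indicator I selects same-speaker pairs for the first term and different-speaker pairs for the second.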
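The two margin-based pre-training objectives on this slide can be sketched in a few lines of plain Python. This is an illustration under assumptions, not the presenters' code: the function names are hypothetical, the embeddings are toy 2-D vectors, and the `same` flag stands in for the pair label in the second objective.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cosine_distance(u, v):
    """1 - cosine similarity; assumes both vectors are non-zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm

def triplet_loss(triplets, margin, dist=euclidean):
    """Sum over (anchor, positive, negative) of max(0, margin + d(a, p) - d(a, n))."""
    return sum(max(0.0, margin + dist(a, p) - dist(a, n)) for a, p, n in triplets)

def contrastive_loss(pairs, margin, dist=euclidean):
    """Sum over (left, right, same): d for same-speaker pairs,
    max(0, margin - d) for different-speaker pairs."""
    total = 0.0
    for left, right, same in pairs:
        d = dist(left, right)
        total += d if same else max(0.0, margin - d)
    return total

# Toy embeddings: d(anchor, positive) = 5, d(anchor, negative) = 10.
anchor, positive, negative = [0.0, 0.0], [3.0, 4.0], [6.0, 8.0]

triplet_value = triplet_loss([(anchor, positive, negative)], margin=6.0)
contrastive_value = contrastive_loss(
    [(anchor, positive, True), (anchor, negative, False)], margin=12.0
)
print(triplet_value)      # max(0, 6 + 5 - 10) = 1.0
print(contrastive_value)  # 5 + max(0, 12 - 10) = 7.0
```

Passing `dist=cosine_distance` instead of the default swaps the distance without touching the loss definitions.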

SLIDE 3

Validation Data Classification Accuracy (%)

Pretraining              Distance    Freeze Siamese layers   Accuracy
Gender classification    -           Yes                     76.9
Gender classification    -           No                      78.1
Contrastive divergence   Cosine      Yes                     76.7
Contrastive divergence   Cosine      No                      87.3
Contrastive divergence   Euclidean   Yes                     77.4
Contrastive divergence   Euclidean   No                      87.5
Triplet loss             Cosine      Yes                     84.6
Triplet loss             Cosine      No                      87.9
Triplet loss             Euclidean   Yes                     82.7
Triplet loss             Euclidean   No                      89.0
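To make the comparison easier to read, the short sketch below computes, for each pre-training setup, the accuracy difference between fine-tuning the Siamese layers (Freeze = No) and keeping them frozen (Freeze = Yes). The numbers are copied from the slide; the variable names and the "gain" framing are assumptions for illustration.

```python
# (pretraining, distance, accuracy with frozen layers, accuracy with fine-tuned layers)
results = [
    ("Gender classification", "-", 76.9, 78.1),
    ("Contrastive divergence", "Cosine", 76.7, 87.3),
    ("Contrastive divergence", "Euclidean", 77.4, 87.5),
    ("Triplet loss", "Cosine", 84.6, 87.9),
    ("Triplet loss", "Euclidean", 82.7, 89.0),
]

# Gain in percentage points from not freezing the Siamese layers.
gains = {
    (name, dist): round(tuned - frozen, 1)
    for name, dist, frozen, tuned in results
}

best_tuned = max(results, key=lambda row: row[3])   # best when layers are fine-tuned
best_frozen = max(results, key=lambda row: row[2])  # best when layers are frozen

for (name, dist), gain in gains.items():
    print(f"{name:<24}{dist:<11}+{gain:.1f}")
print("Best fine-tuned:", best_tuned[0], best_tuned[1], best_tuned[3])
print("Best frozen:", best_frozen[0], best_frozen[1], best_frozen[2])
```

On these figures, fine-tuning helps in every row, and triplet-loss pre-training gives the highest accuracy in both the frozen and the fine-tuned setting.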