Summary of the 2015 NIST Language Recognition i-Vector Machine Learning Challenge - PowerPoint PPT Presentation



SLIDE 1

Summary of the 2015 NIST Language Recognition i-Vector Machine Learning Challenge

Craig Greenberg

Audrey Tong, Alvin Martin, George Doddington

National Institute of Standards and Technology

Douglas Reynolds, Elliot Singer

MIT Lincoln Laboratory

Desire Banse, John Howard, Hui Zhao

NIST contractors

Daniel Garcia-Romero, Alan McCree

HLT Center of Excellence, JHU

Jaime Hernandez-Cordero, Lisa Mason

U.S. Department of Defense

Odyssey 2016 Special Session June 24, 2016

SLIDE 2

Motivation

• Attract researchers outside of the speech processing community to work on language recognition
• Explore new ideas in machine learning for use in language recognition
• Improve the performance of language recognition technology

SLIDE 3

The Task: Open Set Language Identification

[Figure: given an audio segment, decide which of the target languages (Language 1 … Language N) it is, or whether it belongs to an unknown, out-of-set language.]
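Open-set identification means a test segment may belong to none of the N known languages. A minimal sketch of one common i-vector approach, cosine scoring against per-language mean i-vectors with a score threshold for the out-of-set decision (the function names and threshold value are illustrative, and this is not necessarily the challenge baseline):

```python
import numpy as np

def train_means(ivectors, labels):
    """Average the length-normalized i-vectors of each language."""
    means = {}
    labels = np.array(labels)
    for lang in set(labels):
        vecs = ivectors[labels == lang]
        vecs = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)
        means[lang] = vecs.mean(axis=0)
    return means

def identify(ivector, means, oos_threshold=0.2):
    """Return the best-scoring language, or 'out_of_set' if no
    cosine score clears the (hypothetical) threshold."""
    v = ivector / np.linalg.norm(ivector)
    scores = {lang: float(np.dot(v, m) / np.linalg.norm(m))
              for lang, m in means.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= oos_threshold else "out_of_set"
```

With two well-separated toy languages, a segment near a language mean is labeled with that language, while a segment far from all means falls below the threshold and is rejected as out-of-set.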

SLIDE 4

The Data & Language Selection

• Telephone conversations and narrowband broadcasts
  – Previous NIST LREs, 1996 – 2011
  – Selected data from the IARPA Babel Program
• Data-driven selection
  – Only languages with multiple sources were selected, to reduce source-to-language effects
  – Language pairs with higher confusability were preferred

SLIDE 5

Data Size

Set     Languages                          Segments per Language   Total Segments
Train   50                                 300                     15,000
Dev.    65 (50 target + 15 out-of-set)     ≈ 100                   6,431
Test    65 (50 target + 15 out-of-set)     100                     6,500

Test set breakdown:
• 5,000 target segments (1,500 progress / 3,500 evaluation)
• 1,500 out-of-set segments (450 progress / 1,050 evaluation)

Notes:
• The training set did not include out-of-set data
• The development set included unlabeled out-of-set data
• The test set was divided into progress and evaluation subsets
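As a quick arithmetic check of the splits (50 languages at 300 training segments each, and the progress/evaluation division of the test set):

```python
# Segment counts for the challenge data splits, checked by plain arithmetic.
train_total = 50 * 300                 # 300 segments for each of 50 languages
test_target = 1500 + 3500              # target: progress + evaluation subsets
test_oos = 450 + 1050                  # out-of-set: progress + evaluation subsets

assert train_total == 15000
assert test_target == 5000
assert test_oos == 1500
assert test_target + test_oos == 6500  # agrees with the test-set total
```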

SLIDE 6

Data Sources

The distribution of sources across languages was similar.

SLIDE 7

Segment Speech Durations

The distribution of segment speech durations was relatively similar.

SLIDE 8

Performance Metric

Cost = ((1 − P_oos) / n) · Σ_{k=1..n} P_error(k) + P_oos · P_error(oos)

where:
• k = target language
• oos = out-of-set language
• n = total number of target languages = 50
• P_oos = prior probability of an oos language = 0.23
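Spelled out as code, the cost is the average error rate over the target languages, weighted by (1 − P_oos), plus the prior-weighted out-of-set error rate. A minimal sketch (the function name and dict-based input format are illustrative):

```python
def challenge_cost(p_error, p_error_oos, p_oos=0.23):
    """Cost = ((1 - P_oos)/n) * sum_k P_error(k) + P_oos * P_error(oos).

    p_error maps each of the n target languages to its error rate;
    p_error_oos is the error rate on out-of-set segments.
    """
    n = len(p_error)
    target_term = (1.0 - p_oos) / n * sum(p_error.values())
    return target_term + p_oos * p_error_oos
```

A perfect system scores 0, and a system that errs on every segment scores 1, since the target and out-of-set weights sum to one.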
SLIDE 9

Participation

• Worldwide participation: 6 continents, 31 countries
• 78 participants downloaded data; 56 participants submitted results
  – 44 unique organizations
• 3,773 valid submissions during the challenge period (May 1 – September 1, 2015)
• 4,021 submissions as of January 2016

SLIDE 10

Contrast with Traditional LREs

                          i-Vector Challenge                              Traditional LREs
Input                     i-vector representations of audio segments      audio segments
Task                      Identification (n-class problem)                Detection (2-class problem)
Metric                    Cost based on error rates                       Cost based on miss and false alarm rates
Target Languages          50                                              10 – 25
Segment Speech Duration   Log-normal distribution with a mean of 35 secs  Uniform distribution of 3, 10, 30 secs
Challenge Duration        4 months                                        1 week
Scoring                   Results feedback on a portion of the test set   No results feedback
Evaluation Platform       Web-based                                       Manual
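The duration contrast can be made concrete with samplers for the two regimes. This sketch assumes a shape parameter sigma for the log-normal (the slides give only its mean, 35 seconds) and sets mu so the distribution's mean comes out to 35:

```python
import math
import random

def sample_ivector_duration(sigma=0.5, mean_secs=35.0, rng=random):
    """Draw a speech duration (seconds) from a log-normal whose mean is
    mean_secs; sigma is an assumed shape parameter, not given in the slides."""
    # For X ~ LogNormal(mu, sigma), E[X] = exp(mu + sigma^2 / 2).
    mu = math.log(mean_secs) - 0.5 * sigma ** 2
    return rng.lognormvariate(mu, sigma)

def sample_traditional_duration(rng=random):
    """Traditional LREs used nominal durations of 3, 10, or 30 seconds."""
    return rng.choice([3, 10, 30])
```

The mu correction matters: naively using mu = log(35) would give a mean of 35 · exp(sigma²/2), not 35.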

SLIDE 11

Web-based Evaluation Platform

• Goal was to facilitate the evaluation process with limited human involvement
• All evaluation activities were conducted via a web interface:
  – Download training and evaluation data
  – Upload submissions for validation and scoring
  – Track submission status
  – View results and site ranking

SLIDE 12

Daily Best Cost During the Challenge

SLIDE 13

Best Cost Per Participant at End of Challenge

SLIDE 14

Number of Submissions Per Participant

SLIDE 15

Results by Target Language

SLIDE 16

Results by Speech Duration

SLIDE 17

Lessons Learned

• Record participation, more than all previous LREs
  – 56 participants from 44 unique organizations in 31 countries on 6 continents
• 46 of 56 systems were better than the baseline system
• 6 were better than the oracle system
• Half of the improvement was made in the first few weeks of the four-month challenge
• Top systems did well on Burmese, less well on English and Hindi
• Performance on OOS languages was in the middle
• Few system descriptions were received, so it is not clear whether new methods were investigated
• The top system did develop a novel technique to improve out-of-set detection (talk upcoming)

SLIDE 18

Benchmark Your System

• NIST plans to keep the web-based evaluation platform up for the foreseeable future as a system development tool
• Visit http://ivectorchallenge.nist.gov
• Download the training and development data
• Test your system on the test data used in the Challenge

SLIDE 19

Upcoming Activities

• SRE16 & Workshop – 15th edition
  – Speaker detection of telephone speech recorded over a variety of handsets
  – Introduction of a fixed training condition
  – Test segments with more speech duration variability
  – Data collected outside North America
  – Inclusion of trials using same and different phone numbers
  – http://www.nist.gov/itl/iad/mig/sre16.cfm
• 2016 LRE Analysis Workshop
  – In-depth analysis of results from LRE15
  – Co-located with SLT16 in San Juan, Puerto Rico