European Conference on Computer Vision 2018. Towards a Fair Evaluation of Zero-Shot Action Recognition using External Data. Alina Roitberg, Manuel Martinez, Monica Haurilet, Rainer Stiefelhagen. Computer Vision for Human-Computer Interaction.


SLIDE 1

KIT – The Research University in the Helmholtz Association

Computer Vision for Human-Computer Interaction Karlsruhe Institute of Technology, Germany

www.kit.edu

Towards a Fair Evaluation of Zero-Shot Action Recognition using External Data

Alina Roitberg, Manuel Martinez, Monica Haurilet, Rainer Stiefelhagen

European Conference on Computer Vision 2018

Workshop on Shortcomings in Vision and Language, ECCV 2018

cvhci.anthropomatik.kit.edu

SLIDE 2

Workshop on Shortcomings in Vision and Language, European Conference on Computer Vision 2018

  • A. Roitberg, M. Martinez, M. Haurilet and R. Stiefelhagen

Towards a Fair Evaluation of Zero-Shot Action Recognition using External Data

Introduction: Zero-Shot Action Recognition

Task: classifying actions without any training data

How? Linking visual and semantic features

08.09.2018

Zero-Shot Learning premise: source and target classes are disjoint!

SLIDE 9

What is the origin of the source dataset?

Standard: intra-dataset, same origin of training and test data

Cross-dataset: utilize large-scale external data sources

Supervised Action Recognition (AR)

  • Classifying the already known actions
  • $T \subset S$ (T: target classes, S: source classes)

Zero-Shot AR | Intra-dataset

  • Source from the same domain: $S = S_{native}$
  • $T \cap S_{native} = \emptyset$ → ZSL premise satisfied

Zero-Shot AR | Cross-dataset (Zhu et al., 2018)

  • Source from a different domain: $S = S_{ext}$
  • Boost in accuracy
  • $T \cap S_{ext} \neq \emptyset$: ZSL premise not given

Zero-Shot AR | Hybrid (ours)

  • Source from native and external domains: $S = S_{ext} \cup S_{native}$
  • Accuracy, lower-bounded by the intra- and cross-dataset regimes

Our corrective protocol eliminates source-target synonyms → ZSL premise ✔
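The regimes above differ only in how the source label set is assembled and in whether it overlaps the target labels. A minimal sketch with Python sets, assuming hypothetical toy label sets (the class names and the helper `zsl_premise_holds` are illustrative and not taken from the actual dataset splits); it checks exact label matches only:

```python
def zsl_premise_holds(source_labels, target_labels):
    """Return True if no target label appears verbatim among the source labels."""
    return set(source_labels).isdisjoint(target_labels)

# Hypothetical toy label sets, for illustration only.
targets = {"brushing hair", "drinking"}
s_native = {"climbing", "running"}
s_ext = {"brushing hair", "drinking beer", "surfing"}

# Source set S under each zero-shot regime.
regimes = {
    "intra-dataset": s_native,       # S = S_native
    "cross-dataset": s_ext,          # S = S_ext
    "hybrid": s_native | s_ext,      # S = S_ext ∪ S_native
}

# Check T ∩ S = ∅ for every regime.
premise = {name: zsl_premise_holds(src, targets) for name, src in regimes.items()}
for name, ok in premise.items():
    print(f"{name}: ZSL premise {'holds' if ok else 'violated'}")
```

In this toy setup the exact-match check flags `brushing hair` in the external source, but it does not flag the specialization `drinking beer`, so a check on literal label overlap is only a first step.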

SLIDE 10

Exploring the semantic source-target similarity

  • A dataset does not contain the same action twice → no label synonyms
  • External datasets intersect with datasets for zero-shot AR!
  • Example: brushing hair in ActivityNet, Kinetics and HMDB-51
  • Specializations: drinking beer vs. drinking → Getting rid of the direct matches is not enough!

Semantic similarity between the source and target classes is much higher for the external datasets

SLIDE 11

Exploring the semantic source-target similarity

  • Semantic similarity between the source and target classes is much higher for the external datasets
  • The accuracy is greatly influenced by the presence of analogue classes → Need for a method to constrain the external datasets

SLIDE 12

Corrective protocol for fair cross-dataset transfer

1) Calculate the maximum intra-dataset similarity as our rejection threshold $s_{th}$:

$s_{th} = \max_{a_k \in S_{intra},\ t_m \in T} s(\omega(a_k), \omega(t_m))$

2) Filter out the source category if its label is too similar to a target label. A source label $a_k$ is kept only if:

$\forall t_m \in T:\ s(\omega(a_k), \omega(t_m)) \le s_{th}$

Figure: Proportion of allowed source labels for different maximum similarity thresholds

Notation

  • $S = \{a_k\}_{k=1}^{K}$: source action classes
  • $T = \{t_m\}_{m=1}^{M}$: target classes
  • $\omega(\cdot)$: label mapping to the semantic space, e.g. word2vec
  • $s(\cdot)$: similarity measure
  • $s_{th}$: threshold similarity
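The two protocol steps can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: cosine similarity stands in for the similarity measure s(·), the dictionary `embed` of made-up 2-D vectors stands in for the word2vec mapping ω(·), and the function names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity, used here as an assumed choice for s(.)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def rejection_threshold(intra_sources, targets, embed):
    """Step 1: s_th is the maximum similarity between any intra-dataset
    source label and any target label."""
    return max(cosine(embed[a], embed[t]) for a in intra_sources for t in targets)

def filter_sources(ext_sources, targets, embed, s_th):
    """Step 2: keep a_k only if its similarity to every target t_m is <= s_th."""
    return [a for a in ext_sources
            if all(cosine(embed[a], embed[t]) <= s_th for t in targets)]

# Made-up 2-D vectors standing in for word2vec label embeddings.
embed = {
    "drinking": (1.0, 0.0),
    "drinking beer": (0.9, 0.1),
    "eating": (0.6, 0.8),
    "surfing": (0.0, 1.0),
}
targets = ["drinking"]
s_th = rejection_threshold(["eating"], targets, embed)
allowed = filter_sources(["drinking beer", "surfing"], targets, embed, s_th)
print(round(s_th, 3), allowed)
```

With these toy vectors the external label `drinking beer` is closer to the target `drinking` than any intra-dataset source label is, so it is rejected, while `surfing` is kept.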

SLIDE 13

Summary

  • We empirically show that external sources tend to have actions excessively similar to the target classes, strongly influencing the performance and violating the ZSL premise
  • We propose an evaluation procedure that enables fair use of external data for zero-shot action recognition
  • We propose the hybrid ZSL regime, which uses the available training data of the source domain and the large-scale external datasets, improving ZSL performance

SLIDE 14

Come and see our poster!

Thank you for your attention!

"Towards a Fair Evaluation of Zero-Shot Action Recognition using External Data"

Alina Roitberg, Manuel Martinez, Monica Haurilet, Rainer Stiefelhagen
