SLIDE 1

Music Classification Using Constant-Q Based Features

a library for mobile devices

Lena Brüder, January 5, 2013

SLIDE 2

Outline

1 Introduction
2 Music Signal Processing
  The Constant Q transform
  Feature Extraction
  Gaussian Mixture Models
3 Classification
4 Results
  Demonstration
5 Appendix
  Dynamic range
  Tempo
  Timbre
  Key-invariant chroma
6 Bibliography


SLIDE 4

Objectives

Create a program that helps users explore music collections
Derive all classification features from the Constant Q transform
Design the program as a library that runs both on a PC and on embedded devices (→ BlackBerry PlayBook)


SLIDE 5

General approach to music classification

signal

MP3, WAV, FLAC, . . .

frequency domain

Constant Q transform

signal features

Length, dynamic range, tempo, timbre, chroma

classification

Gaussian model

result
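The pipeline above can be sketched as a chain of plug-in stages. Everything here (the function names, the plug-in design, the dummy stages) is illustrative only, not the library's actual API:

```python
import numpy as np

def classify(signal, fs, transform, extract_features, score):
    """Sketch of the slide's pipeline: signal -> frequency domain ->
    features -> classification result. Each stage is a plug-in."""
    spectrum = transform(signal, fs)        # e.g. a constant Q transform
    features = extract_features(spectrum)   # length, dynamics, timbre, ...
    return score(features)                  # e.g. a Gaussian model score

# Dummy stages to show the data flow (not the real feature set):
result = classify(
    np.sin(2 * np.pi * 440 * np.arange(22050) / 22050), 22050,
    transform=lambda x, fs: np.abs(np.fft.rfft(x)),
    extract_features=lambda s: np.array([s.mean(), s.max()]),
    score=lambda f: float(f[1] > f[0]),     # toy "classifier"
)
```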



SLIDE 11

Constant-Q-Transform: Intuition


SLIDE 12

Constant Q transform: Definition

Let x(n) be a discrete time-domain signal, f_k the center frequency of bin k, and B the number of frequency bins per octave. N_k is the window length of bin k and is inversely proportional to f_k; f_s is the sampling rate; and w : \mathbb{R} \to \mathbb{R}, t \mapsto w(t) is a continuous window function with w(t) = 0 for t \notin [0, 1]. Then

a_k(n) = \frac{1}{N_k} \, w\!\left(\frac{n}{N_k}\right) \exp\!\left(\frac{-j 2\pi n f_k}{f_s}\right) \quad (1)

are complex basis functions (time-frequency atoms, or temporal kernels), and

X^{CQ}(k, n) = \sum_{l = n - \lfloor N_k/2 \rfloor}^{n + \lfloor N_k/2 \rfloor} x(l) \, a_k^{*}\!\left(l - n + \frac{N_k}{2}\right) \quad (2)

is the constant Q transform of x(n). The center frequencies f_k are defined as

f_k = f_1 \cdot 2^{\frac{k-1}{B}}. \quad (3)
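A direct, unoptimized rendering of equations (1)-(3) for a single frame at the start of the signal. The Hann window and the zero-based bin index are assumptions (the slide's f_k is one-based); real implementations typically precompute sparse spectral kernels for speed:

```python
import numpy as np

def constant_q_transform(x, f_min, f_max, B, fs):
    """Naive constant-Q transform following Eqs. (1)-(3).

    One temporal kernel per bin; the window length N_k shrinks as f_k
    grows, so every bin keeps the same quality factor Q."""
    Q = 1.0 / (2 ** (1.0 / B) - 1)              # constant quality factor
    K = int(np.ceil(B * np.log2(f_max / f_min)))
    cq = np.zeros(K, dtype=complex)
    for k in range(K):
        f_k = f_min * 2 ** (k / B)              # Eq. (3), zero-based k
        N_k = int(round(Q * fs / f_k))          # inversely prop. to f_k
        n = np.arange(N_k)
        # Eq. (1): windowed complex exponential, normalized by N_k
        kernel = np.hanning(N_k) * np.exp(-2j * np.pi * n * f_k / fs) / N_k
        cq[k] = np.sum(x[:N_k] * np.conj(kernel))   # Eq. (2), first frame
    return cq
```

With B = 12 this yields one bin per semitone, which is what makes the transform convenient for chroma features later on.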


SLIDE 13

Feature Extraction

Different features are extracted:

Length of the piece
Dynamic range (how loud parts relate to quieter ones)
Tempo in BPM (not used for classification)
Timbre (via the Constant-Q cepstrum)
Key-invariant chroma (map all octaves onto one, remove the key)

Note:

Timbre and chroma are multi-dimensional features; the others are scalar values.
Timbre and chroma are calculated every 10–20 ms; the others are calculated once per recording.

But: classifiers expect features to be uniform, or at least comparable.
Solution: transform the many multi-dimensional feature vectors into one scalar value (→ dimensionality and data count reduction).
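As a sketch of the per-recording scalar features, length and a simple dynamic-range estimate can be computed like this; the 20 ms frame size and the percentile-based loud/quiet ratio are assumptions, not the thesis's exact definitions:

```python
import numpy as np

def scalar_features(x, fs):
    """Per-recording scalar features (illustrative).

    Length in seconds, plus a dynamic-range estimate that relates loud
    frames to quiet ones via RMS percentiles, in decibels."""
    length_s = len(x) / fs
    frame = fs // 50                            # ~20 ms frames
    n_frames = len(x) // frame
    frames = x[:n_frames * frame].reshape(n_frames, frame)
    rms = np.sqrt(np.mean(frames ** 2, axis=1) + 1e-12)
    loud, quiet = np.percentile(rms, [95, 5])   # robust extremes
    dynamic_range_db = 20 * np.log10(loud / quiet)
    return length_s, dynamic_range_db
```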


SLIDE 29

Gaussian Mixture Models

Data count reduction works as follows:

Take all feature vectors of one feature
Model their probability distribution
Forget about the original feature vectors

This step brings the data count reduction: one model (of fixed size) instead of arbitrarily many feature vectors.

Do this with the feature vectors of one recording: get a model for the recording.
Do this with the feature vectors of all recordings from a category: get a model for the category!
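The modeling step can be illustrated with plain EM for a one-dimensional, two-component Gaussian mixture; the actual library presumably fits multi-dimensional mixtures to the timbre and chroma vectors:

```python
import numpy as np

def fit_gmm_1d(x, n_components=2, n_iter=50):
    """Plain EM for a 1-D Gaussian mixture (illustrative sketch).

    Returns mixture weights, means and variances; initialization via
    percentiles keeps the sketch deterministic."""
    x = np.asarray(x, float)
    mu = np.percentile(x, np.linspace(25, 75, n_components))
    var = np.full(n_components, np.var(x))
    w = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample
        d2 = (x[:, None] - mu[None, :]) ** 2
        p = w * np.exp(-0.5 * d2 / var) / np.sqrt(2 * np.pi * var)
        r = p / p.sum(axis=1, keepdims=True)
        # M-step: re-estimate mixture parameters from responsibilities
        nk = r.sum(axis=0)
        w = nk / len(x)
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu[None, :]) ** 2).sum(axis=0) / nk + 1e-9
    return w, mu, var
```

Once the mixture parameters are stored, the original feature vectors can be discarded: that is exactly the data count reduction the slide describes.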


SLIDE 36

GMM: Dimensionality reduction

Dimensionality reduction works through comparison of the models: two models can be quite similar (d(a, b) ≈ 0.9) or not that similar (d(a, b) ≈ 30). For the comparison, the Kullback-Leibler divergence is used (computed via Monte Carlo integration!).
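A minimal Monte Carlo KL estimator of the kind the slide refers to; it only needs a sampler and two log-densities, which is exactly why it suits Gaussian mixtures (their KL divergence has no closed form):

```python
import numpy as np

def mc_kl(p_sample, p_logpdf, q_logpdf, n=100_000, seed=0):
    """Monte Carlo estimate of KL(p || q) = E_p[log p(X) - log q(X)].

    Draw samples from p and average the log-density ratio; the
    estimate converges at rate 1/sqrt(n)."""
    xs = p_sample(np.random.default_rng(seed), n)
    return float(np.mean(p_logpdf(xs) - q_logpdf(xs)))
```

For two unit-variance Gaussians N(0, 1) and N(1, 1) the closed form is 0.5, which gives a handy sanity check for the estimator.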


SLIDE 42

GMM: Applied to recordings and categories

How does it work?

Build a model for every recording
Build a model for every category
Compare each recording model to the category model
Combine the resulting scalar values for timbre and chroma with the other scalar values into a new "all-feature vector":

feature vector = (timbre similarity to category model,
                  chroma similarity to category model,
                  dynamic range,
                  length of the recording)

There is one such feature vector per recording.
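Assembling the all-feature vector is then mechanical; the exponential divergence-to-similarity mapping below is an assumption for illustration, not the thesis's rule:

```python
import numpy as np

def similarity(divergence):
    """Map a KL-style divergence (0 = identical models) to a bounded
    similarity in (0, 1]; the exponential mapping is an assumption."""
    return float(np.exp(-divergence))

def all_feature_vector(timbre_kl, chroma_kl, dynamic_range, length_s):
    """One 'all-feature vector' per recording, as on the slide: model
    similarities for timbre and chroma plus the scalar features."""
    return np.array([similarity(timbre_kl), similarity(chroma_kl),
                     dynamic_range, length_s])
```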


SLIDE 49

Classification: Classical approaches

Classical approaches use categories
Decision: does a recording belong to a category, or not?
The score is binary: e.g. −1 or 1
Positive and negative examples are needed for training (ideally many)
Approaches exist that need only positive examples
Examples of binary classifiers: LDA, SVM, (ANN)
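As a toy stand-in for the binary classifiers named above (not LDA or SVM themselves), a nearest-mean classifier already shows the ±1 decision shape:

```python
import numpy as np

class NearestMeanClassifier:
    """Minimal binary classifier: label a point +1 or -1 depending on
    which class mean it is closer to (illustrative sketch only)."""
    def fit(self, X, y):
        X, y = np.asarray(X, float), np.asarray(y)
        self.pos = X[y == 1].mean(axis=0)    # mean of positive examples
        self.neg = X[y == -1].mean(axis=0)   # mean of negative examples
        return self
    def predict(self, X):
        X = np.asarray(X, float)
        d_pos = np.linalg.norm(X - self.pos, axis=1)
        d_neg = np.linalg.norm(X - self.neg, axis=1)
        return np.where(d_pos < d_neg, 1, -1)
```

Note that, like LDA and SVM, this needs both positive and negative training examples, which is precisely the limitation the next slide addresses.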


SLIDE 55

Classification: Approach used

Different approach here: recordings get a score from [−1, 1] for a category.

Gives a ranking rather than a classification
Positive and negative examples can be used, but there is no need for both
Only a few examples are needed (works from a single feature vector; 5-10 are ideal)

+ Better matches are shown first
+ No need for both positive and negative examples
+ Flexible approach, fits the user's needs
− There is no decision about which recordings definitely do not match
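A hypothetical scoring rule with the properties listed above (bounded scores, a ranking, works from a few positive examples, negatives optional); the Gaussian-kernel similarity is purely illustrative and not the thesis's actual scoring function:

```python
import numpy as np

def category_scores(X, pos_idx, neg_idx=()):
    """Score every recording in [-1, 1] against a category defined by a
    handful of positive (and optionally negative) example indices."""
    X = np.asarray(X, float)
    # Bandwidth from the data spread, so the kernel scale is not hand-tuned
    bandwidth = 2.0 * X.shape[1] * (np.mean(np.var(X, axis=0)) + 1e-12)
    def kernel(center):
        return np.exp(-np.sum((X - center) ** 2, axis=1) / bandwidth)
    score = kernel(X[list(pos_idx)].mean(axis=0))
    if len(neg_idx):
        score = score - kernel(X[list(neg_idx)].mean(axis=0))
    return np.clip(score, -1.0, 1.0)
```

Sorting recordings by this score yields the ranking the slide describes: better matches first, with no hard cutoff deciding which recordings definitely do not match.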

Introduction Music Signal Processing Classification Results Appendix Bibliography

  • L. Brüder

14 / 22

slide-56
SLIDE 56

Classification: Approach used

Different approach here: Recordings get a score from [−1, 1] for a

category

Gives a ranking rather than a classification Positive and negative examples can be used, but there is no need for

both

Only a few examples are needed (works from a single feature vector,

5-10 is ideal) + Better matches are shown first + No need for both positive and negative examples + Flexible approach, fits to users needs − There is no decision which recordings definitely do not match

Introduction Music Signal Processing Classification Results Appendix Bibliography

  • L. Brüder

14 / 22


slide-63
SLIDE 63

Classification: How does it work? (1/2)

  • Four-dimensional recording feature vectors are used
  • Calculate the distribution of the vectors (→ covariance matrix)
  • Gaussian model (no mixture!) of the positive example feature vectors
  • Calculate the Mahalanobis distance of any other feature vector:

dΣ(x) = √( (x − µ)T Σ−1 (x − µ) )

[Figure: sectional drawings of the feature vectors over pairs of dimensions (timbre similarity, chroma similarity, dynamic range, length)]

This gives a distance value in [0, ∞[.

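The model fitting and scoring above can be sketched as follows. This is a minimal NumPy illustration, not the thesis library's actual code; names like `fit_gaussian` are made up for the example.

```python
import numpy as np

def fit_gaussian(X):
    """Fit a single Gaussian (no mixture): mean and covariance of the rows of X."""
    mu = X.mean(axis=0)
    sigma = np.cov(X, rowvar=False)
    return mu, sigma

def mahalanobis(x, mu, sigma):
    """d_Sigma(x) = sqrt((x - mu)^T Sigma^-1 (x - mu)), a value in [0, inf[."""
    d = x - mu
    return float(np.sqrt(d @ np.linalg.inv(sigma) @ d))

# Four-dimensional example vectors (timbre sim., chroma sim., dynamic range, length)
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))
mu, sigma = fit_gaussian(X)
d_center = mahalanobis(mu, mu, sigma)       # the mean itself has distance 0
d_other = mahalanobis(mu + 5.0, mu, sigma)  # far-away vectors get large distances
```

Vectors close to the positive examples get small distances, so the category ranking simply sorts recordings by this value.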

slide-64
SLIDE 64

GMM: Applied to recordings and categories

How does it work?

  • Build a model for every recording
  • Build a model for every category
  • Compare the recording model to the category model
  • Combine the resulting scalar values for timbre and chroma with the other scalar values into a new "all-feature vector":

feature vector = (timbre similarity to category model, chroma similarity to category model, dynamic range, length of the recording)T

There is one such feature vector per recording.



slide-71
SLIDE 71

Classification: How does it work? (2/2)

Transform from [0, ∞[ to [0, 1] through Tp(x) = 1/(1 + x).

[Plot: Tp(x) = 1/(1 + x) for x = 0 … 12]

  • Up to now: positive model
  • Negative model: a second model, mapped to [−1, 0] via Tn(x) = −1/(1 + x)
  • Sum both values: scores in [−1, 1]

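The distance-to-score mapping above can be sketched in a few lines. This is a minimal illustration of the transforms Tp and Tn, not the library's actual API:

```python
def t_pos(x):
    """Map a positive-model distance from [0, inf[ to (0, 1]."""
    return 1.0 / (1.0 + x)

def t_neg(x):
    """Map a negative-model distance from [0, inf[ to [-1, 0)."""
    return -1.0 / (1.0 + x)

def score(d_pos, d_neg):
    """Combined score in [-1, 1]: near +1 if close to the positive model
    and far from the negative model, near -1 the other way around."""
    return t_pos(d_pos) + t_neg(d_neg)

s_good = score(0.0, 100.0)   # close to the positive model only
s_bad = score(100.0, 0.0)    # close to the negative model only
```

A recording equally distant from both models scores 0, which is what makes the combined value usable as a ranking key.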


slide-76
SLIDE 76

Plan

1 Introduction
2 Music Signal Processing
   The Constant Q transform
   Feature Extraction
   Gaussian Mixture Models
3 Classification
4 Results
   Demonstration
5 Appendix
   Dynamic range
   Tempo
   Timbre
   Key-invariant chroma
6 Bibliography


slide-77
SLIDE 77

Results

Testing procedure: train the classifier with positive and negative examples, take the 100 best matches, and count same-category matches.

  • Classical: three positives, three negatives → 94% matches, first "false positive" at rank 57
  • Jazz/RnB: two positives, two negatives → 89% matches, first "false positive" at rank 41
  • Pop/Rock: two positives, one negative → 87% matches, first "false positive" at rank 13

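The counting step of this testing procedure can be sketched as a precision measure over the top-ranked matches. The data below is a toy ranking, not the actual test set:

```python
def top_match_rate(ranked_labels, category, k=100):
    """Fraction of the k best-ranked recordings that belong to the category,
    and the 1-based rank of the first out-of-category match (None if absent)."""
    top = ranked_labels[:k]
    rate = sum(1 for lab in top if lab == category) / len(top)
    first_fp = next((i + 1 for i, lab in enumerate(top) if lab != category), None)
    return rate, first_fp

# toy ranking: 3 classical, then one pop recording, then classical again
ranked = ["classical"] * 3 + ["pop"] + ["classical"] * 6
rate, first_fp = top_match_rate(ranked, "classical", k=10)
```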


slide-81
SLIDE 81

Demonstration


slide-82
SLIDE 82

Questions?

Any questions left?



slide-84
SLIDE 84

Dynamic range

Intuition: we want a measure of how the loud parts of a musical piece relate to the quieter ones.

The measure should be small if most of the signal stays at one volume, and it should increase with the amount of volume change during the recording.

Within the context of music comparison, we define the dynamic range of an audio signal as the root of the mean energy of the continuous input signal xc(t):

dyncRMS = √( (1/Tc) ∫₀^Tc xc²(t) dt )  (4)

with Tc being the last point in time of the signal.

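For a sampled signal, the integral in (4) becomes a mean over samples. A minimal sketch of that discrete RMS:

```python
import math

def rms(samples):
    """Discrete analogue of eq. (4): square root of the mean of the squared samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

flat = [0.5, -0.5, 0.5, -0.5]    # constant volume: RMS equals that volume
silent = [0.0, 0.0, 0.0, 0.0]    # silence: RMS is zero
```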


slide-87
SLIDE 87

Dynamic range

This definition is changed slightly for the implementation:

dyndRMS = 1 − √( (1/N) Σ_{n=0}^{N} nsumCQ²(XCQ, tn) )  (5)

with

nsumCQ(XCQ, tn) = (1/R) Σ_{b=0}^{B} |XCQ(b, tn)|  (6)

and

R = max_{tn} Σ_{b=0}^{B} |XCQ(b, tn)|.  (7)

Remark

Here we are talking about discrete points in time: every tn refers to the continuous time interval [tn, tn+1].

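Equations (5)-(7) can be sketched directly on a matrix of constant-Q magnitudes |XCQ(b, tn)| (rows = bins b, columns = frames tn). A minimal NumPy version, under the reconstruction of (5) given above:

```python
import numpy as np

def dyn_drms(cq_mag):
    """dyn_dRMS = 1 - sqrt(mean over frames of nsumCQ^2), eqs. (5)-(7).
    cq_mag: non-negative array of shape (bins, frames)."""
    frame_sums = cq_mag.sum(axis=0)           # sum over bins b for each t_n
    R = frame_sums.max()                      # eq. (7): the loudest frame
    nsum = frame_sums / R                     # eq. (6): normalized frame loudness
    return 1.0 - np.sqrt(np.mean(nsum ** 2))  # eq. (5)

constant = np.ones((12, 100))    # every frame equally loud: dyn close to 0
varied = np.ones((12, 100))
varied[:, 50:] = 0.1             # half the frames much quieter: dyn larger
```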


slide-90
SLIDE 90

Tempo in bpm (beats per minute)

  • Intuitive: the speed at which humans tap along when listening to a song
  • Problem: that speed is not well-defined. Some people tap at quarters, some at halves, . . .

  1. Take the sum of the constant-Q bins sumCQ(XCQ, tn)
  2. Calculate the difference vector dCQ(XCQ, tn)
  3. Calculate the autocorrelation of the difference vector
  4. Find recurring peaks in the autocorrelation function

sumCQ(XCQ, tn) = Σ_{b=0}^{B} |XCQ(b, tn)|  (8)

dCQ(XCQ, tn) = sumCQ(tn) − sumCQ(tn+1)  (9)

aCQ(τ) = Σ_{n} dCQ(tn) · dCQ(tn−τ), τ = 0, …, τmax  (10)

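Steps 1-4 can be sketched as follows, with a synthetic "metronome" as input. This is a simplified illustration: the peak picking just takes the strongest non-zero lag, and the sign convention of eq. (9) does not affect the autocorrelation, so `np.diff` is used directly.

```python
import numpy as np

def tempo_lag(cq_mag):
    """Estimate the beat period in frames from constant-Q magnitudes
    (bins x frames), via eqs. (8)-(10): frame sums, differences,
    autocorrelation, strongest peak at a non-zero lag."""
    sum_cq = cq_mag.sum(axis=0)                   # eq. (8)
    d_cq = np.diff(sum_cq)                        # eq. (9), opposite sign
    acf = np.correlate(d_cq, d_cq, mode="full")   # eq. (10)
    acf = acf[len(d_cq) - 1:]                     # keep lags 0, 1, 2, ...
    return int(np.argmax(acf[1:]) + 1)            # skip the trivial lag-0 peak

# synthetic "metronome": an energy burst every 8 frames
frames = np.zeros((1, 128))
frames[0, ::8] = 1.0
lag = tempo_lag(frames)
```

With the frame rate known, the lag converts to beats per minute as 60 / (lag · frame duration).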



slide-99
SLIDE 99

Tempo: Find recurring peaks

[Plots: autocorrelation functions of four test signals: Metronome at 80 bpm; Drums with hi-hat on 8ths at 80 bpm; Drums with hi-hat on 16ths at 80 bpm; test file "dead_rocks.mp3" at 103 bpm]

The unit of the abscissa is 10 µs; the ordinate has no unit.


slide-100
SLIDE 100

Timbre

  • The timbre of a signal is "the way it sounds"
  • It is a multi-dimensional feature
  • In many publications, the Mel Frequency Cepstrum (MFC) is used
  • The lower (e.g. 8-16) coefficients describe the timbre
  • Short-time feature: typically one vector every 10-50 ms
  • The MFC is not based on the Constant-Q transform, but similar features can be derived from the Constant-Q transform (see [11])



slide-107
SLIDE 107

Timbre: Calculation of Constant-Q Cepstrum

Pipeline per frame:

windowed input signal w(t) xC(t) → Constant-Q transform XCQ(k, tn) → logarithm of absolute values log(|·|) → discrete cosine transform Σ_{n=0}^{N−1} (·) cos( π/N (n + 1/2) k ) → Constant-Q cepstrum (take only the lower coefficients)

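The per-frame cepstrum computation can be sketched like this, using the DCT-II sum from the diagram. The number of kept coefficients (`n_keep`) and the small guard added before the logarithm are illustrative choices, not values from the thesis:

```python
import math

def cq_cepstrum(cq_frame, n_keep=13):
    """Constant-Q cepstrum of one frame: log of the magnitudes,
    then a DCT-II, keeping only the lower n_keep coefficients."""
    logmag = [math.log(abs(x) + 1e-12) for x in cq_frame]  # log(|.|), guarded
    N = len(logmag)
    cep = []
    for k in range(min(n_keep, N)):
        cep.append(sum(logmag[n] * math.cos(math.pi / N * (n + 0.5) * k)
                       for n in range(N)))
    return cep

frame = [2.0] * 48                 # a flat spectrum
cep = cq_cepstrum(frame, n_keep=4)
```

A flat spectrum puts all its energy into coefficient 0; the higher coefficients describe the spectral shape, which is why the lower ones capture timbre.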


slide-112
SLIDE 112

Key-invariant chroma: Intuition

[Plot: chromagram of a recording, notes vs. time in 50 ms frames]


slide-113
SLIDE 113

Key-invariant chroma: Definition of chroma

The chroma bin is defined as

c(b, t) = Σ_{p=0}^{P−1} |XCQ(b + 12p, t)|  (11)

where P is the number of octaves in the constant Q transform, b is one bin within an octave, and t is a point in time. The chroma vector is the vector of chroma bins

c(t) = (c(1, t), c(2, t), …, c(12, t))T.  (12)

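Equation (11) can be sketched as a fold of the constant-Q bins into 12 pitch classes, given a magnitude matrix with 12·P rows (P octaves) and one column per frame. A minimal NumPy version:

```python
import numpy as np

def chroma(cq_mag):
    """Chroma vectors per frame, eq. (11): sum |X_CQ(b + 12p, t)| over
    the octaves p. cq_mag has 12*P rows and one column per frame."""
    bins, frames = cq_mag.shape
    P = bins // 12
    # group rows into (octave, pitch class, frame) and sum over octaves
    return cq_mag[:12 * P].reshape(P, 12, frames).sum(axis=0)

# toy input: 4 octaves, one frame, energy only on pitch class 3 in every octave
cq = np.zeros((48, 1))
cq[3::12, 0] = 1.0
c = chroma(cq)
```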

slide-114
SLIDE 114

Key-invariant chroma: Key-invariance

Still missing.



slide-116
SLIDE 116

Bibliography I

Fabrizio Argenti, Paolo Nesi, and Gianni Pantaleo. Automatic Transcription of Polyphonic Music based on the Constant-Q Bispectral Analysis. IEEE Transactions on Audio, Speech and Language Processing, 19(6):1610–1630, August 2011.

American Standards Association. Acoustical Terminology. 1960.

J.J. Aucouturier and F. Pachet. Music similarity measures: What's the use? In Proceedings of the 3rd International Symposium on Music Information Retrieval, pages 157–163, 2002.


slide-117
SLIDE 117

Bibliography II

Ehrhard Behrends. Analysis, volume 2. Vieweg, April 2004.

Juan P. Bello and Jeremy Pickens. A Robust Mid-level Representation for Harmonic Content in Music Signals. In Proceedings of the 6th International Conference on Music Information Retrieval (ISMIR-05), pages 304–311, September 2005.

G. Beylkin, R. Coifman, and V. Rokhlin. Fast Wavelet Transforms and Numerical Algorithms I. Communications on Pure and Applied Mathematics, 44:141–183, 1991.

Christopher M. Bishop. Neural Networks for Pattern Recognition. Clarendon Press, Oxford, 1995.


slide-118
SLIDE 118

Bibliography III

Christopher M. Bishop. Pattern Recognition and Machine Learning. Springer, 2006.

Richard Brent and Paul Zimmermann. Modern Computer Arithmetic. Cambridge University Press, 2010.

Judith C. Brown. Calculation of a constant Q spectral transform. J. Acoust. Soc. Am., 89(1):425–434, January 1991.

Judith C. Brown. Computer identification of musical instruments using pattern recognition with cepstral coefficients as features. J. Acoust. Soc. Am., 105(3):1933–1941, March 1999.


slide-119
SLIDE 119

Bibliography IV

Judith C. Brown and Miller S. Puckette. An efficient algorithm for the calculation of a constant Q transform. J. Acoust. Soc. Am., 92(5):2698–2701, 1992.

Thomas H. Cormen, Charles E. Leiserson, Ronald Rivest, and Clifford Stein. Algorithmen – Eine Einführung. Oldenbourg Verlag, München, 3rd edition, 2010.

G. Cybenko. Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals, and Systems (MCSS), 2(4):303–314, 1989.

Zhouyu Fu, Kai Ming Ting, and Dengsheng Zhang. A Survey of Audio-Based Music Classification and Annotation. IEEE Transactions on Multimedia, 13(2):303–319, April 2011.


slide-120
SLIDE 120

Bibliography V

James E. Gentle. Random Number Generation and Monte Carlo Methods. Springer, 2. edition, 2003.

Masataka Goto. An Audio-based Real-time Beat Tracking System for Music With or Without Drum-sounds. Journal of New Music Research, 30(2):159–171, 2001.

Frederic J. Harris. On the Use of Windows for Harmonic Analysis with the Discrete Fourier Transform. Proceedings of the IEEE, 66:51–83, January 1978.


slide-121
SLIDE 121

Bibliography VI

Jesper H. Jensen, Daniel P.W. Ellis, Mads G. Christensen, and Søren H. Jensen. Evaluation of Distance Measures between Gaussian Mixture Models of MFCCs. In ISMIR 2007: Proceedings of the 8th International Conference on Music Information Retrieval, pages 107–108, Vienna, September 2007.

Kristoffer Jensen. Timbre Models of Musical Sounds. PhD thesis, University of Copenhagen, 1999.

M. Karjalainen, V. Välimäki, and Z. Jánosy. Towards high-quality sound synthesis of the guitar and string instruments. In Proc. ICMC, pages 56–63, 1993.


slide-122
SLIDE 122

Bibliography VII

T. Kinnunen, T. Kilpeläinen, and P. Fränti. Comparison of clustering algorithms in speaker identification. In Proceedings IASTED Int. Conf. Signal Processing and Communications, 1:222–227, 2000.

Ulrich Krengel. Einführung in die Wahrscheinlichkeitstheorie und Statistik. vieweg studium, Wiesbaden, 8. edition, 2005.

Edward A. Lee. The Problem with Threads. Technical Report UCB/EECS-2006-1, EECS Department, University of California, Berkeley, January 2006. The published version of this paper is in IEEE Computer, 39(5):33–42, May 2006.


slide-123
SLIDE 123

Bibliography VIII

N. Lesh and M. Mitzenmacher. BubbleSearch: A simple heuristic for improving priority-based greedy algorithms. Information Processing Letters, 97(4):161–169, 2006.

Beth Logan. Mel Frequency Cepstral Coefficients for Music Modeling. In International Symposium on Music Information Retrieval, volume 28, page 5, 2000.


slide-124
SLIDE 124

Bibliography IX

Klaus-Robert Müller, Sebastian Mika, Gunnar Rätsch, Koji Tsuda, and Bernhard Schölkopf. An Introduction to Kernel-Based Learning Algorithms. IEEE Transactions on Neural Networks, 12(2):181–201, March 2001.

K. Noland and M. Sandler. Key estimation using a hidden Markov model. In Proceedings of the 7th International Conference on Music Information Retrieval (ISMIR), pages 121–126, 2006.

Alan V. Oppenheim and Ronald W. Schafer. Zeitdiskrete Signalverarbeitung. R. Oldenbourg Verlag, München, 2. edition, 1995.


slide-125
SLIDE 125

Bibliography X

F. Pachet and J.J. Aucouturier. "The way it Sounds": timbre models for analysis and retrieval of music signals. IEEE Transactions on Multimedia, 7(6):1028–1035, 2005.

G. Peeters. Chroma-based estimation of musical key from audio-signal analysis. In Proc. of the 7th International Conference on Music Information Retrieval (ISMIR), pages 115–120, 2006.

William H. Press, Saul A. Teukolsky, William T. Vetterling, and Brian P. Flannery. Numerical Recipes: The Art of Scientific Computing. Cambridge University Press, 3. edition, 2007.


slide-126
SLIDE 126

Bibliography XI

Y. Rubner, C. Tomasi, and L.J. Guibas. A metric for distributions with applications to image databases. In Sixth International Conference on Computer Vision, pages 59–66. IEEE, 1998.

Ingo Schmitt. Ähnlichkeitssuche in Multimedia-Datenbanken: Retrieval, Suchalgorithmen und Anfragebehandlung. Oldenbourg Wissenschaftsverlag, 2005.

Dominik Schnitzer. Indexing Content-Based Music Similarity Models for Fast Retrieval in Massive Databases. PhD thesis, Johannes Kepler Universität Linz, October 2011.


slide-127
SLIDE 127

Bibliography XII

Christian Schörkhuber and Anssi Klapuri. Constant-Q transform toolbox for music processing. In 7th Sound and Music Computing Conference, Barcelona, Spain, 2010.

Josef Stoer. Numerische Mathematik 1. Springer, Berlin, Heidelberg, New York, 9. edition, 2005.

Josef Stoer and Roland Bulirsch. Numerische Mathematik 2. Springer, Berlin, Heidelberg, New York, 5. edition, 2005.

Wolfgang Theimer, Igor Vatolkin, and Antti Eronen. Definitions of Audio Features for Music Content Description. Technical report, February 2008.


slide-128
SLIDE 128

Bibliography XIII

Wolfgang Theimer, Igor Vatolkin, Rainer Martin, Christian Igel, Holger Blume, Bernd Bischl, Martin Botteck, Günther Roetter, Günther Rudolph, and Claus Weihs. Huge Music Archives on Mobile Devices. IEEE Signal Processing Magazine, pages 24–39, July 2011.

George Tzanetakis and Perry Cook. Musical Genre Classification of Audio Signals. IEEE Transactions on Speech and Audio Processing, 10(5):293–302, July 2002.

R. W. Young. Terminology for Logarithmic Frequency Units. Acoustical Society of America Journal, 11:134, 1939.


slide-129
SLIDE 129

Bibliography XIV

S. Ystad, P. Guillemain, and R. Kronland-Martinet. Estimation of parameters corresponding to a propagative synthesis model through the analysis of real sounds. In Proc. ICMC, 1996.

Wieland Ziegenrücker. ABC Musik. Breitkopf & Härtel, Wiesbaden, 3. edition, 2000.
