Tensor Canonical Correlation Analysis and Its Applications - PowerPoint PPT Presentation



SLIDE 1

Tensor Canonical Correlation Analysis and Its Applications

Presenter: Yong LUO. This work was done while Yong LUO was a Research Fellow at Nanyang Technological University, Singapore.

SLIDE 2

Outline

  • Y. Luo, D. C. Tao, R. Kotagiri, C. Xu, and Y. G. Wen, “Tensor Canonical Correlation Analysis for Multi-view Dimension Reduction,” IEEE Transactions on Knowledge and Data Engineering (T-KDE), vol. 27, no. 11, pp. 3111-3124, 2015.
  • Y. Luo, Y. G. Wen, and D. C. Tao, “On Combining Side Information and Unlabeled Data for Heterogeneous Multi-task Metric Learning,” International Joint Conference on Artificial Intelligence (IJCAI), pp. 1809-1815, 2016.

SLIDE 3

Multi-view dimension reduction (MVDR)

  • Dimension reduction (DR)
  • Find a low-dimensional representation for high-dimensional data
  • Benefits: reduce the chance of over-fitting, reduce computational cost, etc.
  • Approaches: feature selection (IG, MI, sparse learning, etc.), feature transformation (PCA, LDA, LE, etc.)

SLIDE 4

MVDR

  • Real-world objects usually contain information from multiple sources, and different kinds of features can be extracted from them
  • Traditional DR methods cannot effectively handle multiple types of features

[Figure: naive fusion by feature concatenation]

SLIDE 5

MVDR

  • Multi-view learning
  • Learn to fuse multiple distinct feature representations
  • Families: weighted view combination, multi-view dimension reduction, view agreement exploration
  • Multi-view dimension reduction
  • Multi-view feature selection
  • Multi-view subspace learning: seek a low-dimensional common subspace to compactly represent the heterogeneous data; one of the most representative models is CCA

SLIDE 6

Canonical correlation analysis (CCA)

  • Objective of CCA
  • Correlation maximization on the common subspace (a small NumPy sketch follows the references below):

    \arg\max_{h_1, h_2} \rho = \mathrm{corr}(z_1, z_2) = \frac{h_1^T C_{12} h_2}{\sqrt{(h_1^T C_{11} h_1)(h_2^T C_{22} h_2)}},

    where the canonical variables are z_{1o} = y_{1o}^T h_1 and z_{2o} = y_{2o}^T h_2, and C_{pq} denotes the (cross-)covariance matrix of views p and q

  • H. Hotelling, “Relations between two sets of variates,” Biometrika, 1936.
  • D. P. Foster, et al., “Multi-view dimensionality reduction via canonical correlation analysis,” Tech. Rep., 2008.
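A minimal NumPy sketch of classical two-view CCA (my own illustration, not the authors' code): whiten each view's covariance and take the SVD of the whitened cross-covariance. The function name and the regularization value are assumptions for illustration.

    import numpy as np

    def cca(Y1, Y2, dim=2, reg=1e-3):
        """Y1: (d1, O), Y2: (d2, O) column-sample matrices; returns canonical vectors."""
        Y1 = Y1 - Y1.mean(axis=1, keepdims=True)
        Y2 = Y2 - Y2.mean(axis=1, keepdims=True)
        O = Y1.shape[1]
        C11 = Y1 @ Y1.T / O + reg * np.eye(Y1.shape[0])   # regularized view-1 covariance
        C22 = Y2 @ Y2.T / O + reg * np.eye(Y2.shape[0])   # regularized view-2 covariance
        C12 = Y1 @ Y2.T / O                               # cross-covariance
        W1 = np.linalg.inv(np.linalg.cholesky(C11)).T     # whitening: W1^T C11 W1 = I
        W2 = np.linalg.inv(np.linalg.cholesky(C22)).T
        U, s, Vt = np.linalg.svd(W1.T @ C12 @ W2)         # correlations in the whitened space
        h1 = W1 @ U[:, :dim]                              # canonical vectors of view 1
        h2 = W2 @ Vt.T[:, :dim]                           # canonical vectors of view 2
        return h1, h2, s[:dim]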
SLIDE 7

Generalizations of CCA to several views

  • CCA-MAXVAR
  • Generalizes CCA to N ≥ 2 views
  • z_n = Y_n^T h_n is the vector of canonical variables for the n'th view, and g is a centroid representation
  • Solutions can be obtained using the SVD of the Y_n

    \arg\min_{g, \{\beta_n\}, \{h_n\}} \frac{1}{N} \sum_{n=1}^{N} \| g - \beta_n z_n \|_2^2,
    \quad \text{s.t. } \| z_n \|_2 = 1, \; n = 1, \cdots, N

  • J. R. Kettenring, “Canonical analysis of several sets of variables,” Biometrika, 1971.
SLIDE 8

Generalizations of CCA to several views

  • CCA-LS
  • Equivalent to CCA-MAXVAR, but can be solved efficiently and adaptively based on LS regression (an objective sketch follows the reference below):

    \arg\min_{\{h_n\}} \frac{1}{2N(N-1)} \sum_{q \ne r}^{N} \| Y_q^T h_q - Y_r^T h_r \|_2^2,
    \quad \text{s.t. } \frac{1}{N} \sum_{n=1}^{N} h_n^T C_{nn} h_n = 1

  • J. Via et al., “A learning algorithm for adaptive canonical correlation analysis of several data sets,” Neural Networks, 2007.
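A tiny sketch (illustrative, not the authors' code) of the CCA-LS objective above: the average pairwise squared distance between the canonical variables of all views.

    import numpy as np

    def cca_ls_objective(Y_list, h_list):
        """Y_list: (d_n, O) view matrices; h_list: canonical vectors h_n."""
        N = len(Y_list)
        z = [Y.T @ h for Y, h in zip(Y_list, h_list)]     # canonical variables z_n = Y_n^T h_n
        total = sum(np.sum((z[q] - z[r]) ** 2)
                    for q in range(N) for r in range(N) if q != r)
        return total / (2 * N * (N - 1))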

SLIDE 9

The proposed TCCA framework

  • Main drawback of CCA-MAXVAR and CCA-LS
  • Only the statistics (correlation information) between pairs of features are explored, while the high-order statistics are ignored
  • Tensor CCA
  • Directly maximizes the high-order correlation between all views

[Figure: high-order tensor correlation among all the views y_1, y_2, y_3 vs. pairwise correlations between view pairs]

SLIDE 10

The proposed TCCA framework for MVDR

[Figure: the TCCA pipeline for MVDR — three views (e.g., LAB, WT, and SIFT features) Y_1, Y_2, Y_3 form the covariance tensor \mathcal{C}_{123}, which is approximated by a sum of rank-1 terms \sum_l \mu^l v_1^l \circ v_2^l \circ v_3^l; the resulting factor matrices V_1, V_2, V_3 map each view to its projected representation]

SLIDE 11

Tensor basics

  • A tensor is a multi-dimensional array; an order-N tensor generalizes scalars, vectors, and matrices

Scalar: order-0 tensor; vector: order-1 tensor; matrix: order-2 tensor; cube of values: order-3 tensor

SLIDE 12

Tensor basics

  • Tensor-matrix multiplication (a NumPy sketch is given below)
  • The n-mode product of a J_1 × J_2 × ⋯ × J_N tensor \mathcal{A} and a K_n × J_n matrix V is the tensor \mathcal{B} = \mathcal{A} \times_n V of size J_1 × ⋯ × J_{n-1} × K_n × J_{n+1} × ⋯ × J_N with elements

    \mathcal{B}(j_1, \cdots, j_{n-1}, k_n, j_{n+1}, \cdots, j_N) = \sum_{j_n=1}^{J_n} \mathcal{A}(j_1, j_2, \cdots, j_N) \, V(k_n, j_n)

  • The product of \mathcal{A} and a sequence of matrices \{V_n \in \mathbb{R}^{K_n \times J_n}\} is

    \mathcal{B} = \mathcal{A} \times_1 V_1 \times_2 V_2 \cdots \times_N V_N
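A small NumPy sketch (illustration only) of the n-mode product via the mode-n unfolding: move mode n to the front, multiply by V, and fold back. Modes are 0-based here, unlike the 1-based convention on the slide.

    import numpy as np

    def mode_n_product(A, V, n):
        """A: order-N ndarray, V: (K_n, J_n) matrix; returns A x_n V (n is 0-based)."""
        A_moved = np.moveaxis(A, n, 0)                    # bring mode n to the front
        A_unf = A_moved.reshape(A_moved.shape[0], -1)     # mode-n unfolding
        B_unf = V @ A_unf                                 # the actual multiplication
        B = B_unf.reshape((V.shape[0],) + A_moved.shape[1:])
        return np.moveaxis(B, 0, n)                       # move the new mode back

    # B = A x_1 V1 x_2 V2 x_3 V3 as a chain of mode products
    A = np.random.randn(4, 5, 6)
    V1, V2, V3 = np.random.randn(2, 4), np.random.randn(3, 5), np.random.randn(2, 6)
    B = mode_n_product(mode_n_product(mode_n_product(A, V1, 0), V2, 1), V3, 2)
    print(B.shape)   # (2, 3, 2)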

SLIDE 13

Tensor basics

  • Tensor-vector multiplication (sketched below)
  • The contracted n-mode product of \mathcal{A} and a J_n-vector v is the J_1 × ⋯ × J_{n-1} × J_{n+1} × ⋯ × J_N tensor \mathcal{B} = \mathcal{A} \bar{\times}_n v of order N − 1 with entries

    \mathcal{B}(j_1, \cdots, j_{n-1}, j_{n+1}, \cdots, j_N) = \sum_{j_n=1}^{J_n} \mathcal{A}(j_1, j_2, \cdots, j_N) \, v(j_n)

  • Tensor-tensor multiplication
  • Outer product, contracted product, inner product
  • Frobenius norm of a tensor:

    \| \mathcal{A} \|_F^2 = \langle \mathcal{A}, \mathcal{A} \rangle = \sum_{j_1=1}^{J_1} \sum_{j_2=1}^{J_2} \cdots \sum_{j_N=1}^{J_N} \mathcal{A}(j_1, j_2, \cdots, j_N)^2
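Minimal NumPy illustrations (my own) of the two operations above: contracting a mode against a vector, and the tensor Frobenius norm.

    import numpy as np

    def contracted_mode_product(A, v, n):
        """A \bar{x}_n v: sum mode n (0-based) of A against the J_n-vector v."""
        return np.tensordot(A, v, axes=([n], [0]))        # order drops by one

    A = np.random.randn(4, 5, 6)
    v = np.random.randn(5)
    print(contracted_mode_product(A, v, 1).shape)         # (4, 6)
    print(np.sum(A ** 2), np.linalg.norm(A) ** 2)         # squared Frobenius norm, two ways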

SLIDE 14

Tensor basics

  • Matricization
  • The mode-n matricization of \mathcal{A} is the J_n × (J_1 ⋯ J_{n-1} J_{n+1} ⋯ J_N) matrix A_{(n)}

[Figure: an order-3 tensor \mathcal{A} matricized along each mode (frontal and horizontal matricizing, row- and column-wise vectorizing), giving A_{(1)}, A_{(2)}, A_{(3)}]

SLIDE 15

Tensor basics

  • Matricization property (checked numerically below)
  • The n-mode multiplication \mathcal{B} = \mathcal{A} \times_n V can be carried out as a matrix multiplication by storing the tensors in matricized form, i.e., B_{(n)} = V A_{(n)}
  • A series of n-mode products can be expressed through Kronecker products:

    \mathcal{B} = \mathcal{A} \times_1 V_1 \times_2 V_2 \cdots \times_N V_N
    \;\Longleftrightarrow\;
    B_{(n)} = V_n A_{(n)} (V_{d_1} \otimes V_{d_2} \otimes \cdots \otimes V_{d_{N-1}})^T,

    where (d_1, d_2, \cdots, d_{N-1}) = (n+1, n+2, \cdots, N, 1, 2, \cdots, n-1) is a forward cyclic ordering of the tensor modes
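A quick numerical check (my own illustration; any fixed column ordering of the unfolding works for this single-mode identity) that B_(n) = V A_(n) when B = A x_n V, here with n = 2 (NumPy axis 1).

    import numpy as np

    def unfold(T, axis):
        """Mode matricization with the chosen mode indexing the rows."""
        return np.moveaxis(T, axis, 0).reshape(T.shape[axis], -1)

    A = np.random.randn(4, 5, 6)
    V = np.random.randn(3, 5)
    B = np.einsum('kj,ijl->ikl', V, A)                    # B = A x_2 V
    print(np.allclose(unfold(B, 1), V @ unfold(A, 1)))    # True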

SLIDE 16

TCCA formulation

  • Optimization problem
  • Maximize the correlation between the canonical variables z_n = Y_n^T h_n, n = 1, ⋯, N:

    \arg\max_{\{h_n\}} \rho = \mathrm{corr}(z_1, z_2, \cdots, z_N) = (z_1 \odot z_2 \odot \cdots \odot z_N)^T e,
    \quad \text{s.t. } z_n^T z_n = 1, \; n = 1, \cdots, N,

    where \odot is the element-wise product and e is the all-ones vector
  • Equivalent formulation
  • Covariance tensor (a NumPy sketch is given below): \mathcal{C}_{12 \cdots N} = \frac{1}{O} \sum_{o=1}^{O} y_{1o} \circ y_{2o} \circ \cdots \circ y_{No}

    \arg\max_{\{h_n\}} \rho = \mathcal{C}_{12 \cdots N} \bar{\times}_1 h_1^T \bar{\times}_2 h_2^T \cdots \bar{\times}_N h_N^T,
    \quad \text{s.t. } h_n^T (C_{nn} + \epsilon I) h_n = 1, \; n = 1, \cdots, N
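A small NumPy sketch (illustrative names and sizes) of the covariance tensor above for N = 3 centered views with O samples each.

    import numpy as np

    O = 100
    Y1, Y2, Y3 = np.random.randn(10, O), np.random.randn(8, O), np.random.randn(6, O)
    Y1, Y2, Y3 = [Y - Y.mean(axis=1, keepdims=True) for Y in (Y1, Y2, Y3)]

    # C123[i, j, k] = (1/O) * sum_o y_{1o}[i] * y_{2o}[j] * y_{3o}[k]
    C123 = np.einsum('io,jo,ko->ijk', Y1, Y2, Y3) / O
    print(C123.shape)   # (10, 8, 6)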

SLIDE 17

TCCA formulation

  • Reformulation
  • Let \mathcal{M} = \mathcal{C}_{12 \cdots N} \times_1 \tilde{C}_{11}^{-1/2} \times_2 \tilde{C}_{22}^{-1/2} \cdots \times_N \tilde{C}_{NN}^{-1/2} and v_n = \tilde{C}_{nn}^{1/2} h_n, where \tilde{C}_{nn} = C_{nn} + \epsilon I:

    \arg\max_{\{v_n\}} \rho = \mathcal{M} \bar{\times}_1 v_1^T \bar{\times}_2 v_2^T \cdots \bar{\times}_N v_N^T,
    \quad \text{s.t. } v_n^T v_n = 1, \; n = 1, \cdots, N

  • Main solution
  • If we define \hat{\mathcal{M}} = \rho \, v_1 \circ v_2 \circ \cdots \circ v_N, the problem becomes

    \arg\min_{\{v_n\}} \| \mathcal{M} - \hat{\mathcal{M}} \|_F^2   [Lathauwer et al., 2000a]

  • Solved by alternating least squares (ALS), the higher-order power method (HOPM), etc. (a rank-1 HOPM sketch follows the reference below)

  • L. De Lathauwer et al., “On the best rank-1 and rank-(R1, R2, ..., RN) approximation of higher-order tensors,” SIAM J. Matrix Anal. Appl., 2000.
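A compact higher-order power method sketch (my own illustration of the rank-1 step above, not the authors' implementation) for an order-3 tensor M.

    import numpy as np

    def hopm_rank1(M, iters=100):
        """Best rank-1 approximation M ≈ rho * v1 ∘ v2 ∘ v3 with unit-norm v_n."""
        v1 = np.random.randn(M.shape[0]); v1 /= np.linalg.norm(v1)
        v2 = np.random.randn(M.shape[1]); v2 /= np.linalg.norm(v2)
        v3 = np.random.randn(M.shape[2]); v3 /= np.linalg.norm(v3)
        for _ in range(iters):
            v1 = np.einsum('ijk,j,k->i', M, v2, v3); v1 /= np.linalg.norm(v1)
            v2 = np.einsum('ijk,i,k->j', M, v1, v3); v2 /= np.linalg.norm(v2)
            v3 = np.einsum('ijk,i,j->k', M, v1, v2); v3 /= np.linalg.norm(v3)
        rho = np.einsum('ijk,i,j,k->', M, v1, v2, v3)     # the attained correlation
        return rho, v1, v2, v3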

SLIDE 18

TCCA solution

  • Solutions
  • Remaining solutions: recursively maximize the same correlation as in the main TCCA problem
  • All solutions: the best sum-of-rank-1 approximation, i.e., the rank-s CP decomposition of \mathcal{M}:

    \hat{\mathcal{M}} \approx \sum_{l=1}^{s} \rho_l \, v_1^l \circ v_2^l \circ \cdots \circ v_N^l

  • Projected data (a projection sketch is given below):

    Z_n = Y_n^T \tilde{C}_{nn}^{-1/2} V_n, \quad V_n = [v_n^1, \cdots, v_n^s]
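A sketch of the projection step above (illustrative, not the authors' code): map a view's data to its s-dimensional TCCA representation Z_n = Y_n^T * Ctilde_nn^{-1/2} * V_n.

    import numpy as np

    def inv_sqrt_psd(C):
        """C^{-1/2} for a symmetric positive-definite matrix via eigendecomposition."""
        w, Q = np.linalg.eigh(C)
        return Q @ np.diag(1.0 / np.sqrt(w)) @ Q.T

    def project_view(Yn, Vn, eps=1e-3):
        """Yn: (d_n, O) centered data of view n; Vn: (d_n, s) CP factors of that view."""
        O = Yn.shape[1]
        Cnn = Yn @ Yn.T / O + eps * np.eye(Yn.shape[0])   # regularized covariance Ctilde_nn
        return Yn.T @ inv_sqrt_psd(Cnn) @ Vn              # (O, s) projected representation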

SLIDE 19

KTCCA formulation

  • Non-linear extension
  • Non-linear feature mapping φ: φ(Y_n) = [φ(y_{n1}), φ(y_{n2}), ⋯, φ(y_{nO})]
  • Canonical variables: z_n = φ(Y_n)^T h_n
  • Representer theorem: h_n = φ(Y_n) α_n
  • Optimization problem (a kernel Gram-matrix sketch is given below):

    \arg\max_{\{\alpha_n\}} \rho = \mathcal{L}_{12 \cdots N} \bar{\times}_1 \alpha_1^T \bar{\times}_2 \alpha_2^T \cdots \bar{\times}_N \alpha_N^T,
    \quad \text{s.t. } \alpha_n^T (K_{nn}^2 + \epsilon K_{nn}) \alpha_n = 1, \; n = 1, \cdots, N,

    where K_{nn} = φ(Y_n)^T φ(Y_n) is the kernel Gram matrix of the n'th view, and the constraint matrix can be factorized as K_{nn}^2 + \epsilon K_{nn} = M_n^T M_n
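A small sketch (assuming a Gaussian/RBF kernel purely for illustration) of the Gram matrices K_nn = phi(Y_n)^T phi(Y_n) that the kernelized formulation operates on.

    import numpy as np

    def rbf_gram(Y, gamma=0.1):
        """Y: (d, O) column-sample matrix; returns the O x O kernel Gram matrix."""
        sq = np.sum(Y ** 2, axis=0)
        d2 = sq[:, None] + sq[None, :] - 2.0 * (Y.T @ Y)   # pairwise squared distances
        return np.exp(-gamma * d2)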

SLIDE 20

KTCCA solution

  • Reformulation
  • Let \mathcal{T} = \mathcal{L}_{12 \cdots N} \times_1 \tilde{K}_{11}^{-1/2} \times_2 \tilde{K}_{22}^{-1/2} \cdots \times_N \tilde{K}_{NN}^{-1/2} and c_n = \tilde{K}_{nn}^{1/2} \alpha_n, where \tilde{K}_{nn} = K_{nn}^2 + \epsilon K_{nn}:

    \arg\max_{\{c_n\}} \rho = \mathcal{T} \bar{\times}_1 c_1^T \bar{\times}_2 c_2^T \cdots \bar{\times}_N c_N^T,
    \quad \text{s.t. } c_n^T c_n = 1, \; n = 1, \cdots, N

  • Solved by ALS
  • Projected data:

    Z_n = K_{nn} M_n^{-1} C_n, \quad n = 1, \cdots, N, \quad C_n = [c_n^1, \cdots, c_n^s]

SLIDE 21

Experimental setup

  • Datasets
  • SecStr: protein secondary structure prediction
  • 84K instances, 100 used as labeled, plus an additional 1200K unlabeled
  • 3 views: attributes based on the left, middle, and right context generated from the sequence window of amino acids; each view is 105-D
  • Advertisement classification
  • 3279 instances, 100 used as labeled
  • 3 views: features based on the terms in the images (588-D), terms in the current URL (495-D), and terms in the anchor URL (472-D)
  • Web image annotation
  • 11189 images, {4, 6, 8} labeled instances for each of 10 concepts
  • 3 views: 500-D SIFT visual words, 144-D color, 128-D wavelet
  • Classifier: RLS and KNN
  • Evaluation criterion: prediction/classification/annotation accuracy

SLIDE 22

Experimental setup

  • Compared methods
  • BSF: best single-view feature
  • CAT: concatenation of the normalized features
  • FRAC: a recent multi-view feature selection algorithm
  • CCA: applied to each of the n(n−1)/2 subsets of two views
  • CCA (BST): the best subset
  • CCA (AVG): the average performance over all subsets
  • CCA-LS: the traditional generalization of CCA to several views
  • DSE: a popular unsupervised multi-view DR method
  • SSMVD: a recent unsupervised multi-view DR method
  • TCCA: the proposed method
SLIDE 23

Experimental results and analysis

  • Protein secondary structure prediction
  • Learning a common subspace > CAT > BSF
  • SSMVD and CCA-LS are comparable, as are DSE and CCA (BST)
  • TCCA is the best at most dimensionalities, and its performance does not decrease significantly when the dimensionality is high

[Figure: accuracy vs. subspace dimension with 84K and with 1.3M unlabeled samples]

SLIDE 24

Experimental results and analysis

  • Web image annotation
  • DSE is comparable to CCA (BST) and CCA (AVG)
  • TCCA > SSMVD, and is better than the other CCA-based methods
  • Non-linear > linear

[Figure: annotation accuracy of the linear and the non-linear (kernel) variants]

SLIDE 25

Conclusions and discussion

  • Conclusions
  • Finding a common subspace for all views using the CCA-based strategy is often better than simply concatenating all the features, especially when the feature dimension is high
  • Examining more statistics, which may require more unlabeled data to be utilized, often leads to better performance; by exploring the high-order statistics, the proposed TCCA outperforms the other methods
  • Discussion
  • Can the common subspace be used for knowledge transfer between different views?

SLIDE 26

Distance metric learning (DML)

  • Goal: learn an appropriate distance function over the input space that reflects the relationships between data points (a small sketch is given below)
  • Useful in many ML algorithms, e.g., clustering, classification, and information retrieval
  • Most common DML scheme: Mahalanobis metric learning, which amounts to learning a linear transformation
  • Non-linear and local DML are able to capture complex structure in the data

    d_B(y_j, y_k) = (y_j - y_k)^T B (y_j - y_k) = \| V^T y_j - V^T y_k \|_2^2, \quad B = V V^T
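A tiny sketch (illustration) of the Mahalanobis distance above with a low-rank metric B = V V^T, so that d_B(y_j, y_k) = ||V^T y_j - V^T y_k||_2^2.

    import numpy as np

    def mahalanobis_sq(yj, yk, V):
        """Squared Mahalanobis distance under B = V @ V.T, with V of size d x r."""
        diff = V.T @ (yj - yk)                 # transform the difference vector
        return float(diff @ diff)

    V = np.random.randn(5, 2)
    yj, yk = np.random.randn(5), np.random.randn(5)
    B = V @ V.T
    print(mahalanobis_sq(yj, yk, V), (yj - yk) @ B @ (yj - yk))   # the two forms agree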

SLIDE 27

Transfer DML (TDML)

  • Motivation
  • DML needs a large amount of side information to learn a robust distance metric
  • The training samples are insufficient in the task/domain of interest (target task/domain)
  • We have abundant labeled data in certain related, but different, tasks/domains (source tasks/domains)
  • Goal
  • Utilize the metrics obtained from the source tasks/domains to help metric learning in the target tasks/domains

SLIDE 28

Homogeneous TDML (HoTDML)

  • Data of the source domain and the target domain are drawn from different distributions (same feature space)
  • Examples [Pan and Yang, 2010]
  • Web document classification: university website -> new website
  • Indoor WiFi localization: WiFi signal-strength values change across time periods and devices
  • Sentiment classification: the distribution of reviews among different types of products can be very different
  • Challenge
  • How to utilize the source information appropriately given the different distributions, or how to find a subspace in which the distribution difference is reduced

  • S. J. Pan and Q. Yang, “A survey on transfer learning,” IEEE TKDE, 2010.
SLIDE 29

Heterogeneous TDML (HeTDML)

  • Data of the source domain and the target domain lie in different feature spaces, and may have different semantics
  • Examples
  • Multi-lingual document classification
  • Multi-view classification or retrieval, etc.
  • Challenge
  • How to find correspondences or common representations for the different domains

[Example: labeled reviews in English (abundant) and labeled reviews in Spanish (scarce); the task is to classify reviews in Spanish]

SLIDE 30

HeTDML existing solutions

  • Heterogeneous transfer learning (HTL) approaches usually transform the heterogeneous features into a common subspace, and the transformation can be used to derive a metric
  • Groups
  • Heterogeneous domain adaptation (HDA)
  • Improves the performance in the target domain
  • Most approaches only handle two domains
  • Heterogeneous multi-task learning (HMTL)
  • Improves the performance of all domains simultaneously
SLIDE 31

Heterogeneous multi-task metric learning (HMTML)

  • Limitations of existing HMTL approaches
  • Do not optimize w.r.t. the metric
  • Mainly focus on utilizing the side information
  • Can only explore the pairwise relationships between different domains; the high-order statistics that can only be obtained by simultaneously examining all domains are ignored
  • Our method
  • Handles an arbitrary number of domains, and directly optimizes w.r.t. the metrics
  • Makes use of large amounts of unlabeled data to build domain connections
  • Explores high-order statistics between all domains
SLIDE 32

HMTML framework

[Figure: the HMTML framework — unlabeled documents from each of the N domains (e.g., English and German) are mapped by the transformations V_1, ⋯, V_N into a common subspace, where a tensor-based correlation over the projected representations a_1, ⋯, a_N is maximized; the learned metrics are B_n = V_n V_n^T (and the optimal ones B_n^* = V_n^* V_n^{*T})]

SLIDE 33

HMTML formulation

  • Optimization problem
  • General formulation:

    \arg\min_{\{B_n\}} G(\{B_n\}) = \sum_{n=1}^{N} \Psi(B_n) + \gamma S(B_1, B_2, \cdots, B_N),
    \quad \text{s.t. } B_n \succeq 0, \; n = 1, 2, \cdots, N

  • \Psi(B_n) = \frac{2}{O_n (O_n - 1)} \sum_{j<k} \ell(B_n; y_{nj}, y_{nk}, z_{njk}) is the empirical loss w.r.t. B_n (an illustrative sketch follows)
  • S(B_1, B_2, \cdots, B_N) enforces information transfer across the different domains
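An illustrative sketch of the empirical loss term Psi(B_n), taking the per-pair loss to be a hinge loss on the Mahalanobis margin (an assumption; the paper's exact loss may differ). Here z_jk = +1 for similar pairs and -1 for dissimilar pairs.

    import numpy as np

    def empirical_loss(B, Y, pairs):
        """B: (d, d) metric; Y: (d, O) data; pairs: iterable of (j, k, z_jk) with j < k."""
        O = Y.shape[1]
        total = 0.0
        for j, k, z in pairs:
            diff = Y[:, j] - Y[:, k]
            d2 = diff @ B @ diff                       # squared Mahalanobis distance
            total += max(0.0, 1.0 - z * (1.0 - d2))    # hinge loss on the pair
        return 2.0 * total / (O * (O - 1))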

SLIDE 34

Knowledge transfer by high-order correlation maximization

  • Main idea
  • Decompose B_n as B_n = V_n V_n^T
  • Use V_n to project the unlabeled data points of the different domains into a common subspace, where the correlation of all domains is maximized
  • Formulation

    \arg\max_{\{V_n\}} \frac{1}{O^U} \sum_{o=1}^{O^U} \mathrm{corr}(z_{1o}^U, z_{2o}^U, \cdots, z_{No}^U),

    where \mathrm{corr}(z_{1o}^U, z_{2o}^U, \cdots, z_{No}^U) = (z_{1o}^U \odot z_{2o}^U \odot \cdots \odot z_{No}^U)^T e is the correlation of the projected representations z_{no}^U = V_n^T y_{no}^U

SLIDE 35

Knowledge transfer by high-order correlation maximization

  • Reformulation

    \arg\max_{\{V_n\}} \frac{1}{O^U} \sum_{o=1}^{O^U} \mathrm{corr}(z_{1o}^U, \cdots, z_{No}^U)
    = \arg\max_{\{V_n\}} \frac{1}{O^U} \sum_{o=1}^{O^U} \mathcal{G} \bar{\times}_1 (y_{1o}^U)^T \cdots \bar{\times}_N (y_{No}^U)^T
    = \arg\min_{\{V_n\}} \frac{1}{O^U} \sum_{o=1}^{O^U} \| \mathcal{D}_o^U - \mathcal{G} \|_F^2,   [Luo et al., 2015; Lathauwer et al., 2000b]

    where \mathcal{G} = \mathcal{E}_s \times_1 V_1 \times_2 V_2 \cdots \times_N V_N is the covariance tensor of the mappings, and \mathcal{D}_o^U is the covariance tensor of the representations of the o'th unlabeled sample (a sketch follows the references below)

  • Y. Luo et al., “Tensor Canonical Correlation Analysis for Multi-view Dimension Reduction,” IEEE TKDE, 2015.
  • L. De Lathauwer et al., “A multilinear singular value decomposition,” SIAM J. Matrix Anal. Appl., 2000.
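A sketch (my own, under the assumption that E_s is the s × s × s identity tensor) of the transfer term above for N = 3 domains: build G = E_s x_1 V1 x_2 V2 x_3 V3 = sum_l v_1^l ∘ v_2^l ∘ v_3^l and compare it with the rank-1 tensor D_o formed from one unlabeled sample.

    import numpy as np

    s = 4
    V1, V2, V3 = np.random.randn(10, s), np.random.randn(8, s), np.random.randn(6, s)
    G = np.einsum('il,jl,kl->ijk', V1, V2, V3)     # E_s x_1 V1 x_2 V2 x_3 V3

    y1, y2, y3 = np.random.randn(10), np.random.randn(8), np.random.randn(6)
    D_o = np.einsum('i,j,k->ijk', y1, y2, y3)      # outer product y1 ∘ y2 ∘ y3
    print(np.sum((D_o - G) ** 2))                  # || D_o - G ||_F^2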
SLIDE 36

HMTML formulation

  • Specific optimization problem
  • Corresponds to finding a subspace where the representations of all domains are close to each other
  • Knowledge is transferred in this subspace, so the different domains can help each other in learning the mapping V_n, or equivalently the metric B_n

    \arg\min_{\{V_n\}} G(\{V_n\}) = \sum_{n=1}^{N} \frac{1}{O_n'} \sum_{l=1}^{O_n'} h\!\left( z_{nl} \left( 1 - \boldsymbol{\epsilon}_{nl}^T V_n V_n^T \boldsymbol{\epsilon}_{nl} \right) \right)
    + \frac{\gamma}{O^U} \sum_{o=1}^{O^U} \| \mathcal{D}_o^U - \mathcal{G} \|_F^2
    + \sum_{n=1}^{N} \gamma_n \| V_n \|_1,

    where \boldsymbol{\epsilon}_{nl} is the feature-difference vector of the l'th labeled pair in domain n, z_{nl} its similar/dissimilar label, and h(\cdot) the pairwise loss

SLIDE 37

HMTML solution

  • Rewrite \| \mathcal{D}_o^U - \mathcal{G} \|_F^2 as an expression w.r.t. V_n
  • Alternate over the V_n and solve each subproblem w.r.t. V_n by projected gradient descent

    \mathcal{G} = \mathcal{E}_s \times_1 V_1 \times_2 V_2 \cdots \times_N V_N = \mathcal{B} \times_n V_n,
    \quad \mathcal{B} = \mathcal{E}_s \times_1 V_1 \cdots \times_{n-1} V_{n-1} \times_{n+1} V_{n+1} \cdots \times_N V_N

    By the matricization property [Lathauwer et al., 2000a], G_{(n)} = V_n B_{(n)}, so

    \| \mathcal{D}_o^U - \mathcal{G} \|_F^2 = \| (D_o^U)_{(n)} - G_{(n)} \|_F^2 = \| (D_o^U)_{(n)} - V_n B_{(n)} \|_F^2

  • L. De Lathauwer et al., “On the best rank-1 and rank-(R1, R2, ..., RN) approximation of higher-order tensors,” SIAM J. Matrix Anal. Appl., 2000.

SLIDE 38

Experiments

  • Datasets and features
  • Reuters multilingual collection (RMLC)
  • 6 categories, 3 domains: English (EN), Italian (IT), Spanish (SP)
  • Number of documents: EN = 18758, IT = 24039, SP = 12342
  • TF-IDF features, PCA preprocessing to find comparable and high-level patterns for transfer
  • NUS-WIDE
  • Subset of 12 animal concepts, 16519 images + tags
  • {SIFT, wavelet, tag} features + PCA preprocessing; each representation is a domain
  • Evaluation criteria
  • Accuracy, MacroF1
SLIDE 39

Experiments

  • Compared methods
  • EU: Euclidean distance between samples based on their original feature representations
  • RDML: an efficient and competitive DML algorithm; does not make use of any additional information from other domains
  • DAMA: constructs mappings V_n to link multiple heterogeneous domains using manifold alignment
  • MTDA: the multi-task extension of linear discriminant analysis
  • HMTML: the proposed method
SLIDE 40

Experiments

  • Average performance of all domains w.r.t. the number of common factors
  • Although the labeled samples in each domain are scarce, learning the distance metric separately using RDML can still improve the performance significantly

SLIDE 41

Experiments

  • Average performance of all domains w.r.t. the number of common factors
  • All three heterogeneous transfer learning approaches achieve much better performance than RDML; this indicates that it is useful to leverage information from other domains in DML

SLIDE 42

Experiments

  • Average performance of all domains w.r.t. the number of common factors
  • HMTML outperforms both DAMA and MTDA at most numbers of common factors; this indicates that the factors learned by our method are more expressive than those of the other approaches

SLIDE 43

Experiments

  • Performance for individual domains
  • RDML improves the performance in each domain, and the improvements are similar across domains, since there is no communication between them

SLIDE 44

Experiments

  • Performance for individual domains
  • The transfer learning methods achieve much larger improvements than RDML in the domains where the discriminative ability of the original representations is not very good; this demonstrates that knowledge is successfully transferred between the different domains

SLIDE 45

Experiments

  • Performance for individual domains
  • The discriminative domain obtains little benefit from the other, relatively non-discriminative domains in DAMA and MTDA, while the proposed HMTML still achieves significant improvements

SLIDE 46

Conclusions

  • The labeled-data deficiency problem can be alleviated by learning metrics for multiple heterogeneous domains simultaneously
  • The shared knowledge of different domains exploited by the transfer learning methods can benefit each domain if appropriate common factors are discovered, and the high-order statistics (correlation information) are critical in discovering such factors

SLIDE 47

Thank You! Q & A