Online Learning with Pairwise Loss Functions

SLIDE 1

Online Learning with Pairwise Loss Functions

Purushottam Kar
MLO Group, Microsoft Research India
Joint work with B. Sriperumbudur, P. Jain, H. Karnick

MLSIG Seminar Series, Dept. of CSA, IISc

SLIDE 2

Outline

A quick introduction to online learning
Examples of pairwise loss functions
An online learning model+algo for pairwise functions

SLIDE 3

Outline

A quick introduction to online learning
  Notion of regret
  Generalization error
Examples of pairwise loss functions
An online learning model+algo for pairwise functions

SLIDE 4

Credit Card Fraud Detection

Transaction 1: Guess [image], Truth [image], Loss 0
Transaction 2: Guess [image], Truth [image], Loss 1
Transaction 3: Guess [image], Truth [image], Loss 0
Transaction 4: Guess [image], Truth [image], Loss 0

SLIDE 5

The Online Learning Process

Initialize $h_0$
Receive instance $x_t$
Take action $h_{t-1}$
Truth revealed: $z_t = (x_t, y_t)$
Incur loss $\ell(h_{t-1}, z_t)$
Update $h_{t-1} \to h_t$

SLIDE 6

Benefits of Online Learning

Don’t have to wait for all data to arrive
  Streaming data, transactional data
Applications to large scale learning
  Data too large to fit in memory (or even disk)
  Solution: stream data into memory from disk or network
Fast learning
  Several online learning algorithms have cheap updates, e.g. online gradient descent, mirror descent

SLIDE 7

Example: Online Classification

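Instances are vector-label pairs $z_t = (x_t, y_t)$, with $x_t \in \mathbb{R}^d$, $y_t \in \{-1, +1\}$
Actions are classifiers, e.g. $h(x) = \langle w, x \rangle$, $w \in \mathbb{R}^d$
Loss is the hinge loss function

$$\ell(h, z) = [1 - y \cdot \langle w, x \rangle]_+$$

Total loss incurred by adaptive classification: $\sum_{t=1}^{n} \ell(h_{t-1}, z_t)$

Loss of single best classifier: $\min_{h \in \mathcal{H}} \sum_{t=1}^{n} \ell(h, z_t)$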

This is what a “batch” learning algorithm would have given
The online process suffers
Unable to see all data in one go
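
A minimal sketch of the online classification loop with the hinge loss, using online (sub)gradient descent as the update rule. The step size `eta`, the synthetic stream and the function names are assumptions made for illustration and do not come from the talk.

```python
import numpy as np

def hinge_loss(w, x, y):
    """Hinge loss [1 - y * <w, x>]_+ of the linear classifier w on (x, y)."""
    return max(0.0, 1.0 - y * float(np.dot(w, x)))

def online_gradient_descent(stream, dim, eta=0.1):
    """Run online (sub)gradient descent on the hinge loss over a stream.

    Returns the final weight vector and the total loss incurred online.
    """
    w = np.zeros(dim)
    total_loss = 0.0
    for x, y in stream:
        loss = hinge_loss(w, x, y)   # loss is charged before the update
        total_loss += loss
        if loss > 0.0:               # a subgradient is -y * x when the margin is violated
            w = w + eta * y * x
    return w, total_loss

# Synthetic, linearly separable stream (assumed data, for illustration only).
rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0])
xs = rng.normal(size=(200, 2))
stream = [(x, 1.0 if x @ w_true >= 0 else -1.0) for x in xs]
w_final, total = online_gradient_descent(stream, dim=2)
```

The learner is charged for each point using the classifier it held before seeing that point, which is what makes the total online loss differ from the loss of the single best classifier chosen in hindsight.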


SLIDE 8

Regret and Generalization

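Regret: how much the online process suffers

$$\mathfrak{R}_n = \sum_{t=1}^{n} \ell(h_{t-1}, z_t) - \min_{h \in \mathcal{H}} \sum_{t=1}^{n} \ell(h, z_t)$$

Online learning can compete with batch learning
Excess training error
$\frac{1}{n}\mathfrak{R}_n \downarrow 0$ if $\mathfrak{R}_n = o(n)$
Performance on unseen points: $\mathcal{L}(h) = \mathbb{E}_{z \sim \mathcal{D}}[\ell(h, z)]$

Online-to-batch conversion: For random $z_t$, convex $\ell$

$$\mathcal{L}(\bar{h}) \le \inf_{h \in \mathcal{H}} \mathcal{L}(h) + \frac{1}{n}\mathfrak{R}_n + O\left(\frac{1}{\sqrt{n}}\right)$$

where $\bar{h} = \frac{1}{n}\sum_{t=1}^{n} h_{t-1}$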
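
A small worked example of regret, under assumptions that are not from the talk: the loss is the squared loss on real numbers, the online learner predicts the running mean of the values seen so far (0 before any data), and the comparator class is the set of constant predictions, whose best member in hindsight is the overall mean.

```python
def regret_of_running_mean(ys):
    """Regret of the running-mean predictor against the best constant, squared loss."""
    total, running_sum = 0.0, 0.0
    for t, y in enumerate(ys):
        pred = running_sum / t if t > 0 else 0.0  # prediction made before seeing y
        total += (pred - y) ** 2
        running_sum += y
    best = running_sum / len(ys)                  # best fixed prediction in hindsight
    best_loss = sum((best - y) ** 2 for y in ys)
    return total - best_loss

regret = regret_of_running_mean([1.0, 1.0, 1.0, 1.0])  # 1.0, all paid at the first step
```

Here the learner pays only for its uninformed first prediction, so the regret stays at 1 however long the stream of ones continues, and the average regret per round shrinks as the stream grows.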


SLIDE 9

Outline

A quick introduction to online learning
  Notion of regret
  Generalization error
Examples of pairwise loss functions
  Algorithmic challenges
  Learning theoretic challenges
An online learning model+algo for pairwise functions

SLIDE 10

Pointwise Loss Functions

$\ell: \mathcal{H} \times \mathcal{Z} \to \mathbb{R}$

Loss functions for classification, regression …
… look at the performance of a function at one point

Examples

Hinge loss: $\ell(h, z) = [1 - y \cdot \langle w, x \rangle]_+$
Logistic loss: $\ell(h, z) = \ln(1 + \exp(-y \cdot \langle w, x \rangle))$
Squared loss: $\ell(h, z) = (y - \langle w, x \rangle)^2$
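
The three pointwise losses can be written as functions of a real-valued score and a label. This is a sketch; the function names and the convention that `score` stands for the classifier's output on the point are assumptions for illustration.

```python
import math

def hinge(score, y):
    """Hinge loss [1 - y * score]_+ for labels y in {-1, +1}."""
    return max(0.0, 1.0 - y * score)

def logistic(score, y):
    """Logistic loss ln(1 + exp(-y * score)) for labels y in {-1, +1}."""
    return math.log1p(math.exp(-y * score))

def squared(score, y):
    """Squared loss (y - score)^2 for real-valued targets."""
    return (y - score) ** 2
```

Each function needs only one labelled point at a time, which is what makes these losses pointwise.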

SLIDE 11

Metric Learning for Classification


Penalize metric for bringing blue and red points close
Loss function needs to consider two points at a time!
… in other words a pairwise loss function
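Example:

$$\ell(M, z_1, z_2) = \begin{cases} 1, & y_1 \ne y_2 \text{ and } M(x_1, x_2) < \gamma_1 \\ 1, & y_1 = y_2 \text{ and } M(x_1, x_2) > \gamma_2 \\ 0, & \text{otherwise} \end{cases}$$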
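
A sketch of such a pairwise metric loss, assuming a Mahalanobis distance given by a positive semidefinite matrix `M`. The threshold names `near` and `far` and their default values are hypothetical and chosen only for illustration.

```python
import numpy as np

def mahalanobis(M, x1, x2):
    """Distance sqrt((x1 - x2)^T M (x1 - x2)) for a positive semidefinite M."""
    d = x1 - x2
    return float(np.sqrt(d @ M @ d))

def metric_pair_loss(M, z1, z2, near=1.0, far=2.0):
    """0/1 penalty on a pair of labelled points under the metric M."""
    (x1, y1), (x2, y2) = z1, z2
    dist = mahalanobis(M, x1, x2)
    if y1 != y2 and dist < near:   # differently labelled points brought too close
        return 1.0
    if y1 == y2 and dist > far:    # similarly labelled points pushed too far apart
        return 1.0
    return 0.0
```

The loss cannot be evaluated on a single point: it needs both labels and the distance between the two inputs.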

SLIDE 12

Bipartite Ranking

Want relevant results to be ranked above others
Penalize scoring function $s: \mathcal{X} \to \mathbb{R}$ for each “switch”
$\ell(s, z_1, z_2) = 1$ iff $y_1 > y_2$ and $s(x_1) < s(x_2)$
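
A sketch of the switch loss and of the fraction of switched pairs on a toy dataset. The data, the labels in {0, 1} (1 for relevant) and the identity scoring function are assumptions for illustration.

```python
def switch_loss(score, z1, z2):
    """1 if the more relevant point z1 is scored below z2, else 0."""
    (x1, y1), (x2, y2) = z1, z2
    return 1.0 if (y1 > y2 and score(x1) < score(x2)) else 0.0

def fraction_of_switches(score, data):
    """Average switch loss over all (relevant, irrelevant) pairs."""
    relevant = [z for z in data if z[1] == 1]
    irrelevant = [z for z in data if z[1] == 0]
    pairs = [(zp, zn) for zp in relevant for zn in irrelevant]
    return sum(switch_loss(score, zp, zn) for zp, zn in pairs) / len(pairs)

data = [(3.0, 1), (1.0, 1), (2.0, 0), (0.0, 0)]
rate = fraction_of_switches(lambda x: x, data)  # one switched pair out of four: 0.25
```

Only the relevant point at 1.0 and the irrelevant point at 2.0 are ranked in the wrong order, so one of the four relevant/irrelevant pairs is a switch.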

Images taken from cinemahood.com, sify.com, santabanta.com and thehindu.com

[Figure: search results for the query “Chennai Express”]

SLIDE 13

Pairwise Loss Functions

$\ell: \mathcal{H} \times \mathcal{Z} \times \mathcal{Z} \to \mathbb{R}$

Examples:

Mahalanobis metric learning
Bipartite ranking
Preference learning
Two-stage multiple kernel learning
Indefinite kernel learning

SLIDE 14

Learning with Pairwise Loss Functions

Algorithmic challenges:

Training data available as a set $S = \{z_1, z_2, \ldots, z_n\}$
Question: how to create pairs?
Solution 1: $\min_{h \in \mathcal{H}} \binom{n}{2}^{-1} \sum_{i < j} \ell(h, z_i, z_j)$
Expensive for $n \gg 1$
Solution 2: Use online techniques for a batch solver
Challenge: Online creation of pairs from a data stream
Desirable: Memory efficiency
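
To see why the all-pairs objective is expensive, the sketch below evaluates it by brute force and counts the pairs involved. The normalisation by the number of pairs and the toy loss in the usage line are assumptions for illustration.

```python
from itertools import combinations
from math import comb

def all_pairs_objective(pair_loss, h, data):
    """Average of pair_loss(h, z_i, z_j) over all unordered pairs i < j."""
    n = len(data)
    total = sum(pair_loss(h, zi, zj) for zi, zj in combinations(data, 2))
    return total / comb(n, 2)

# Toy loss: scaled distance between two real numbers (illustrative only).
value = all_pairs_objective(lambda h, a, b: h * abs(a - b), 1.0, [0.0, 1.0, 3.0])

# A single evaluation of the objective touches every pair.
pairs_for_1000_points = comb(1000, 2)  # 499500
```

The number of pairs grows quadratically in the number of points, so even one pass over the objective becomes costly for large datasets, which motivates creating pairs online from a buffer.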


SLIDE 15

Learning with Pairwise Loss Functions

Learning theoretic challenges:

Batch learning methods: learn from pairs $(z_i, z_j)$
Intersection between pairs: training data not i.i.d.
Direct application of concentration inequalities not possible
Online learning methods: let $z_t$ arrive in a stream
Need an appropriate notion of regret
Classical online-to-batch (OTB) proofs rely crucially on i.i.d. data

This talk: mostly algorithmic solutions + hint of theory


SLIDE 16

Outline

A quick introduction to online learning
  Notion of regret
  Generalization error
Examples of pairwise loss functions
  Algorithmic challenges
  Learning theoretic challenges
An online learning model+algo for pairwise functions
  A memory efficient online learning algo
  Regret and generalization bounds

SLIDE 17

An Online Learning Model for Pairwise Losses

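At each time step $t$
We propose an action $h_{t-1}$ (e.g. a scoring function or a metric)
We receive a single point $z_t = (x_t, y_t)$
We incur loss $\ell_t^{\text{all}}$ on action $h_{t-1}$
Buffer $B = \{z_1, z_2, z_3, \ldots\}$
Pair up $z_t$ with points in buffer $z_1, z_2, \ldots, z_{t-1}$
Incur loss

$$\ell_t^{\text{all}}(h_{t-1}) = \frac{1}{t-1}\left[\ell(h_{t-1}, z_t, z_1) + \cdots + \ell(h_{t-1}, z_t, z_{t-1})\right]$$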

SLIDE 18

An Online Learning Model for Pairwise Losses

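At each time step $t$
We propose an action $h_{t-1}$ (e.g. a scoring function or a metric)
We receive a single point $z_t = (x_t, y_t)$
We incur loss $\ell_t^{\text{buf}}$ on action $h_{t-1}$
Finite buffer $B$ of size $s$
Pair up $z_t$ with points in buffer $z_{i_1}, z_{i_2}, \ldots, z_{i_s}$
Incur loss

$$\ell_t^{\text{buf}}(h_{t-1}) = \frac{1}{s}\left[\ell(h_{t-1}, z_t, z_{i_1}) + \cdots + \ell(h_{t-1}, z_t, z_{i_s})\right]$$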
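
A sketch of the per-step loss in this model: the new point is paired with every point currently in the buffer and the pairwise losses are averaged. The function name, the treatment of an empty buffer and the toy loss in the usage line are assumptions for illustration.

```python
def buffer_loss(pair_loss, h, z_t, buffer):
    """Average pairwise loss of action h on z_t paired with each buffered point."""
    if not buffer:                 # nothing to pair with yet (e.g. at the first step)
        return 0.0
    return sum(pair_loss(h, z_t, z) for z in buffer) / len(buffer)

# Toy pairwise loss on real numbers (illustrative only).
toy_loss = lambda h, z, z_prev: h * abs(z - z_prev)
step_loss = buffer_loss(toy_loss, 2.0, 5.0, [1.0, 3.0])  # (2*4 + 2*2) / 2 = 6.0
```

Passing all earlier points as `buffer` gives the unbounded-buffer loss, while passing a buffer of fixed size gives the finite-buffer loss.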


SLIDE 19

An Online Learning Model for Pairwise Losses

Notions of Regret in this Model

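How well are we able to do on pairs that we have seen
Finite buffer regret

$$\mathfrak{R}^{\text{buf}}_n = \sum_{t=2}^{n} \ell_t^{\text{buf}}(h_{t-1}) - \min_{h \in \mathcal{H}} \sum_{t=2}^{n} \ell_t^{\text{buf}}(h)$$

How well are we able to do on all possible pairs
All pairs regret

$$\mathfrak{R}^{\text{all}}_n = \sum_{t=2}^{n} \ell_t^{\text{all}}(h_{t-1}) - \min_{h \in \mathcal{H}} \sum_{t=2}^{n} \ell_t^{\text{all}}(h)$$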

SLIDE 20

An Online Learning Algorithm for Pairwise Losses

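OLP: Online learning with pairwise losses
Simple variant of Zinkevich’s GIGA

Start with $w_0 = 0$
At each $t = 1 \ldots n$
Receive a new point $z_t$
Construct appropriate loss function $\ell_t = \ell_t^{\text{buf}}$ (loss of $z_t$ paired with the buffer)
$w_t \leftarrow w_{t-1} - \eta_t \nabla_w \ell_t(w_{t-1})$
If required, update buffer with $z_t$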
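
The steps above can be sketched as a gradient-descent loop over a stream. This is an illustration under stated assumptions, not the talk's exact algorithm: the pairwise loss is a hinge loss for bipartite ranking with a linear scoring function, the buffer is FIFO, the step size is `eta / sqrt(t)`, and iterates are projected onto a ball of radius `radius`. All names and default values are hypothetical.

```python
from collections import deque
import numpy as np

def pair_hinge_grad(w, z, z_prime):
    """Pairwise hinge loss for ranking and one of its subgradients in w."""
    (x, y), (xp, yp) = z, z_prime
    g = (y - yp) / 2.0                    # +1 or -1 for opposite labels, 0 for equal labels
    if g == 0.0:
        return 0.0, np.zeros_like(w)
    margin = g * float(w @ (x - xp))
    if margin >= 1.0:
        return 0.0, np.zeros_like(w)
    return 1.0 - margin, -g * (x - xp)

def olp(stream, dim, buffer_size=10, eta=0.5, radius=10.0):
    """Gradient descent on buffer-averaged pairwise losses; returns averaged iterate."""
    w = np.zeros(dim)
    buffer = deque(maxlen=buffer_size)    # FIFO: the oldest point is dropped when full
    iterates, losses = [], []
    for t, z in enumerate(stream, start=1):
        if buffer:
            pairs = [pair_hinge_grad(w, z, zb) for zb in buffer]
            losses.append(sum(l for l, _ in pairs) / len(buffer))
            grad = sum(g for _, g in pairs) / len(buffer)
            w = w - (eta / np.sqrt(t)) * grad
            norm = np.linalg.norm(w)
            if norm > radius:             # project back onto the feasible ball
                w = w * (radius / norm)
        buffer.append(z)                  # buffer update happens after the loss is charged
        iterates.append(w.copy())
    return np.mean(iterates, axis=0), losses

# Synthetic stream with labels in {-1, +1} (assumed data, for illustration only).
rng = np.random.default_rng(0)
xs = rng.normal(size=(300, 2))
ys = np.where(xs[:, 0] > 0, 1.0, -1.0)
stream = list(zip(xs, ys))
w_bar, losses = olp(stream, dim=2)
```

Memory use is governed by `buffer_size` rather than by the length of the stream, and each update touches only the pairs formed with buffered points.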


SLIDE 21

An Online Learning Algorithm for Pairwise Losses

RS-x: Reservoir sampling with replacement
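
One way to realise reservoir sampling with replacement is to treat every buffer slot as its own size-one reservoir: at time `t`, each slot is independently overwritten by the new point with probability `1/t`. The sketch below follows that reading; it may differ from the talk's RS-x in how the buffer is filled during the first few steps, and the function and variable names are hypothetical.

```python
import random

def rs_x_update(buffer, z_t, t, rng):
    """Update the buffer in place after the t-th point z_t arrives (t starts at 1)."""
    for j in range(len(buffer)):
        # Each slot independently keeps a uniform sample of the stream seen so far.
        if rng.random() < 1.0 / t:
            buffer[j] = z_t

rng = random.Random(0)
buffer = [None] * 5                    # buffer of size s = 5
for t, z in enumerate(range(1, 101), start=1):
    rs_x_update(buffer, z, t, rng)
```

Because slots are updated independently and the decision depends only on the time index, the buffer contents are independent uniform draws (with replacement) from the points seen so far, and the update never looks at the data itself.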


SLIDE 22

An Online Learning Algorithm for Pairwise Losses

Guarantees for OLP and RS-x

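Sampling guarantee

At any time $t > s$, the contents of the buffer are $s$ i.i.d. samples from the set $\{z_1, z_2, \ldots, z_{t-1}\}$

Regret guarantee

OLP guarantees** a finite buffer regret $\mathfrak{R}^{\text{buf}}_n \le O(\sqrt{n})$

Finite-to-all-pairs regret conversion

$$\frac{1}{n}\mathfrak{R}^{\text{all}}_n \le \frac{1}{n}\mathfrak{R}^{\text{buf}}_n + O\left(\sqrt{\frac{\log n}{s}}\right)$$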

SLIDE 23

An Online Learning Algorithm for Pairwise Losses

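OTB Guarantees for Pairwise loss functions
Define $\mathcal{L}(h) := \mathbb{E}_{z, z' \sim \mathcal{D}}[\ell(h, z, z')]$
For random $z_t$, convex $\ell$ and unbounded buffer

$$\mathcal{L}(\bar{h}) \le \min_{h \in \mathcal{H}} \mathcal{L}(h) + \frac{1}{n}\mathfrak{R}^{\text{all}}_n + O\left(\sqrt{\frac{\log n}{n}}\right)$$

where $\bar{h} = \frac{1}{n}\sum_{t=1}^{n} h_{t-1}$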

SLIDE 24

An Online Learning Algorithm for Pairwise Losses

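OTB Guarantees for Pairwise loss functions
Define $\mathcal{L}(h) := \mathbb{E}_{z, z' \sim \mathcal{D}}[\ell(h, z, z')]$
For random $z_t$, convex $\ell$ and finite buffer of size $s$

$$\mathcal{L}(\bar{h}) \le \min_{h \in \mathcal{H}} \mathcal{L}(h) + \frac{1}{n}\mathfrak{R}^{\text{buf}}_n + O\left(\sqrt{\frac{\log n}{s}}\right)$$

where $\bar{h} = \frac{1}{n}\sum_{t=1}^{n} h_{t-1}$

Corollary: $\mathcal{L}(\bar{h}) \le \min_{h \in \mathcal{H}} \mathcal{L}(h) + O\left(\sqrt{\frac{\log n}{s}}\right)$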

SLIDE 25

An Online Learning Algorithm for Pairwise Losses

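OTB Guarantees for Pairwise loss functions
Define $\mathcal{L}(h) := \mathbb{E}_{z, z' \sim \mathcal{D}}[\ell(h, z, z')]$
For random $z_t$, strongly convex $\ell$ and unbounded buffer

$$\mathcal{L}(\bar{h}) \le \min_{h \in \mathcal{H}} \mathcal{L}(h) + \frac{1}{n}\mathfrak{R}^{\text{all}}_n + O\left(\frac{\log n}{n}\right)$$

where $\bar{h} = \frac{1}{n}\sum_{t=1}^{n} h_{t-1}$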

SLIDE 26

An Online Learning Algorithm for Pairwise Losses

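OTB Guarantees for Pairwise loss functions
Define $\mathcal{L}(h) := \mathbb{E}_{z, z' \sim \mathcal{D}}[\ell(h, z, z')]$
For random $z_t$, strongly convex $\ell$ and finite buffer

$$\mathcal{L}(\bar{h}) \le \min_{h \in \mathcal{H}} \mathcal{L}(h) + \frac{1}{n}\mathfrak{R}^{\text{buf}}_n + O\left(\frac{\log n}{s}\right)$$

where $\bar{h} = \frac{1}{n}\sum_{t=1}^{n} h_{t-1}$

Corollary: $\mathcal{L}(\bar{h}) \le \min_{h \in \mathcal{H}} \mathcal{L}(h) + O\left(\frac{\log n}{s}\right)$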

SLIDE 27

An Online Learning Algorithm for Pairwise Losses

Some other details

Our analysis gives dimension-independent bounds
  For Hilbertian norm regularizations: no dependence on $d$
  For sparsity inducing regularizations: $\log d$ dependence
  Previous work [Wang et al, COLT12]: linear dependence on $d$
Proofs use (modified notions of) Rademacher averages
  Trickier symmetrization step
  Previous work: covering number based analysis

SLIDE 28

Some Open Problems

Current all-pairs regret bound for finite buffers

$$\frac{1}{n}\mathfrak{R}^{\text{all}}_n \le O\left(\sqrt{\frac{\log n}{s}}\right)$$

Can we get bounds that scale as $1/\sqrt{sn}$?

Similar question for OTB conversion bounds
OTB bounds require stream-oblivious buffer updates
Update algorithm cannot look at $z_t$, just $t$
Examples: FIFO, RS, RS-x
Guarantees for (suitable) stream-aware policies?