Online Learning with Pairwise Loss Functions

Online Learning with Pairwise Loss Functions. MLSIG Seminar Series, Dept. of CSA, IISc. Purushottam Kar, MLO Group, Microsoft Research India. Joint work with B. Sriperumbudur, P. Jain, H. Karnick.


SLIDE 1

Online Learning with Pairwise Loss Functions

MLSIG Seminar Series, Dept. of CSA, IISc
Purushottam Kar, MLO Group, Microsoft Research India
Joint work with B. Sriperumbudur, P. Jain, H. Karnick

SLIDE 2

Outline

  • A quick introduction to online learning
  • Examples of pairwise loss functions
  • An online learning model and algorithm for pairwise loss functions

SLIDE 3

Outline

  • A quick introduction to online learning
      • Notion of regret
      • Generalization error
  • Examples of pairwise loss functions
  • An online learning model and algorithm for pairwise loss functions

SLIDE 4

Credit Card Fraud Detection

Transaction 1

  • Guess
  • Truth
  • Loss 0

Transaction 2

  • Guess
  • Truth
  • Loss 1

Transaction 3

  • Guess
  • Truth
  • Loss 0

Transaction 4

  • Guess
  • Truth
  • Loss 0

SLIDE 5

The Online Learning Process

  • Initialize $h_1$
  • At each round $t$:
      • Receive instance $x_t$
      • Take action $h_t$
      • Truth $y_t$ revealed
      • Incur loss $\ell(h_t, z_t)$
      • Update $h_t \to h_{t+1}$

SLIDE 6

Benefits of Online Learning

  • Don’t have to wait for all data to arrive
      • Streaming data, transactional data
  • Applications to large-scale learning
      • Data too large to fit in memory (or even on disk)
      • Solution: stream data into memory from disk or network
  • Fast learning
      • Several online learning algorithms have cheap updates
      • Online gradient descent, mirror descent

SLIDE 7

Example: Online Classification

  • Instances are vector-label pairs $z_t = (x_t, y_t)$
      • $x_t \in \mathbb{R}^d$, $y_t \in \{-1, +1\}$
  • Actions are classifiers, e.g. $h_t = \langle w_t, \cdot \rangle$, $w_t \in \mathcal{W}$
  • Loss is the hinge loss function
    $\ell(h_t, z_t) = [1 - y_t \cdot \langle w_t, x_t \rangle]_+$
  • Total loss incurred by adaptive classification: $\sum_{t=1}^{T} \ell(h_t, z_t)$
  • Loss of single best classifier: $\min_{h \in \mathcal{H}} \sum_{t=1}^{T} \ell(h, z_t)$
      • This is what a “batch” learning algorithm would have given
  • The online process suffers
      • Unable to see all data in one go
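The online classification loop on this slide can be sketched in a few lines. This is a minimal illustration, not the speaker's code: the update rule is plain online (sub)gradient descent on the hinge loss, and the step size `eta`, the toy `stream` and all function names are assumptions made for the example.

```python
def hinge_loss(w, x, y):
    """Hinge loss [1 - y * <w, x>]_+ of the linear classifier w on (x, y)."""
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    return max(0.0, 1.0 - margin)


def hinge_subgradient(w, x, y):
    """One subgradient of the hinge loss with respect to w."""
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    if margin < 1.0:
        return [-y * xi for xi in x]
    return [0.0] * len(w)


def online_classification(stream, dim, eta=0.1):
    """Play w, observe (x, y), pay the hinge loss, then update w."""
    w = [0.0] * dim
    total_loss = 0.0
    for x, y in stream:
        total_loss += hinge_loss(w, x, y)  # loss is charged before the update
        g = hinge_subgradient(w, x, y)
        w = [wi - eta * gi for wi, gi in zip(w, g)]
    return w, total_loss


# Hypothetical toy stream of (x, y) pairs with labels in {-1, +1}.
stream = [([1.0, 0.0], +1), ([0.0, 1.0], -1), ([1.0, 0.2], +1), ([0.1, 1.0], -1)]
w_final, cumulative = online_classification(stream, dim=2)
```

The returned `cumulative` value is the "total loss incurred by adaptive classification" from the slide; the loss of the single best classifier would be computed separately, with all of `stream` available at once.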

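SLIDE 8

Regret and Generalization

  • Regret: how much the online process suffers
    $\mathfrak{R}_T = \sum_{t=1}^{T} \ell(h_t, z_t) - \min_{h \in \mathcal{H}} \sum_{t=1}^{T} \ell(h, z_t)$
  • Online learning can compete with batch learning
      • Excess training error
      • $\frac{1}{T} \mathfrak{R}_T \downarrow 0$ if $\mathfrak{R}_T = o(T)$
  • Performance on unseen points: $\mathcal{L}(h) = \mathbb{E}_{z \sim \mathcal{D}}\, \ell(h, z)$
  • Online-to-batch conversion: for random $z_1, \ldots, z_T$ and convex $\ell$
    $\mathcal{L}(\bar{h}) \le \inf_{h \in \mathcal{H}} \mathcal{L}(h) + \frac{1}{T} \mathfrak{R}_T + \mathcal{O}\!\left(\frac{1}{\sqrt{T}}\right)$
      • where $\bar{h} = \frac{1}{T} \sum_{t=1}^{T} h_t$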
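To make the two quantities on this slide concrete, here is a small sketch that computes the regret of a played sequence against one fixed comparator, and the averaged action used in online-to-batch conversion. The one-dimensional squared loss, the data and the function names are assumptions chosen so that the best fixed action is known in closed form (it is the mean of the points).

```python
def squared_loss(w, z):
    """Toy pointwise loss: squared distance between action w and point z."""
    return (w - z) ** 2


def regret_against(loss, played, points, comparator):
    """Cumulative loss of the played actions minus that of one fixed action."""
    online = sum(loss(h, z) for h, z in zip(played, points))
    fixed = sum(loss(comparator, z) for z in points)
    return online - fixed


def average_action(played):
    """Online-to-batch output: the average of the actions played so far."""
    return sum(played) / len(played)


points = [1.0, 2.0, 3.0]
played = [0.0, 1.0, 1.5]                # each action is the mean of earlier points
best_fixed = sum(points) / len(points)  # the mean minimizes total squared loss
regret = regret_against(squared_loss, played, points, best_fixed)
h_bar = average_action(played)
```

Here the online actions pay 4.25 in total and the best fixed action pays 2, so the regret is 2.25.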

SLIDE 9

Outline

  • A quick introduction to online learning
      • Notion of regret
      • Generalization error
  • Examples of pairwise loss functions
      • Algorithmic challenges
      • Learning theoretic challenges
  • An online learning model and algorithm for pairwise loss functions

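SLIDE 10

Pointwise Loss Functions

$\ell: \mathcal{H} \times \mathcal{Z} \to \mathbb{R}$

  • Loss functions for classification, regression, …
  • … look at the performance of a function at one point

Examples (for a linear $h(x) = \langle w, x \rangle$)

  • Hinge loss: $\ell(h, z) = [1 - y \cdot \langle w, x \rangle]_+$
  • Logistic loss: $\ell(h, z) = \ln\left(1 + \exp(-y \cdot \langle w, x \rangle)\right)$
  • Squared loss: $\ell(h, z) = (y - \langle w, x \rangle)^2$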
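As a quick illustration of what "pointwise" means, the three example losses can be written as functions of a single labelled point. This is only a sketch for linear predictors; the function names are assumptions, and the logistic loss uses the standard form with a negative sign in the exponent.

```python
import math


def score(w, x):
    """Linear prediction <w, x>."""
    return sum(wi * xi for wi, xi in zip(w, x))


def hinge(w, x, y):
    return max(0.0, 1.0 - y * score(w, x))


def logistic(w, x, y):
    return math.log(1.0 + math.exp(-y * score(w, x)))


def squared(w, x, y):
    return (y - score(w, x)) ** 2


# Each loss takes one hypothesis and ONE point (x, y).
w, x = [1.0, -1.0], [2.0, 1.0]
losses_pos = (hinge(w, x, +1), logistic(w, x, +1), squared(w, x, +1))
losses_neg = (hinge(w, x, -1), logistic(w, x, -1), squared(w, x, -1))
```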

SLIDE 11

Metric Learning for Classification

  • Penalize metric for bringing blue and red points (points of different classes) close
  • Loss function needs to consider two points at a time!
  • … in other words, a pairwise loss function
  • Example:
    $\ell(M, z_1, z_2) = \begin{cases} 1, & y_1 \neq y_2 \text{ and } d_M(x_1, x_2) < \gamma_1 \\ 1, & y_1 = y_2 \text{ and } d_M(x_1, x_2) > \gamma_2 \\ 0, & \text{otherwise} \end{cases}$
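The example loss above can be sketched as follows, assuming the metric is a squared Mahalanobis distance $d_M(x_1, x_2) = (x_1 - x_2)^\top M (x_1 - x_2)$. The threshold names `gamma_diff` and `gamma_same` and their default values are hypothetical; the slide's own threshold symbols did not survive extraction.

```python
def mahalanobis_sq(M, x1, x2):
    """Squared Mahalanobis distance (x1 - x2)^T M (x1 - x2)."""
    d = [a - b for a, b in zip(x1, x2)]
    Md = [sum(M[i][j] * d[j] for j in range(len(d))) for i in range(len(d))]
    return sum(di * mi for di, mi in zip(d, Md))


def metric_pair_loss(M, z1, z2, gamma_diff=1.0, gamma_same=1.0):
    """1 if the metric puts differently labelled points too close,
    or identically labelled points too far apart; 0 otherwise."""
    (x1, y1), (x2, y2) = z1, z2
    dist = mahalanobis_sq(M, x1, x2)
    if y1 != y2 and dist < gamma_diff:
        return 1
    if y1 == y2 and dist > gamma_same:
        return 1
    return 0


identity = [[1.0, 0.0], [0.0, 1.0]]
anchor = ([0.0, 0.0], +1)
too_close = metric_pair_loss(identity, anchor, ([0.5, 0.0], -1))  # different labels, near
too_far = metric_pair_loss(identity, anchor, ([3.0, 0.0], +1))    # same label, far
```

Note that the loss takes two points at once, which is exactly what makes it pairwise.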

SLIDE 12

Bipartite Ranking

  • Want relevant results to be ranked above others
  • Penalize scoring function $s: \mathcal{X} \to \mathbb{R}$ for each “switch”
  • $\ell(s, z_1, z_2) = 1$ iff $y_1 > y_2$ and $s(x_1) < s(x_2)$

[Figure: search results for the query “Chennai Express”]

Images taken from cinemahood.com, sify.com, santabanta.com and thehindu.com
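A short sketch of the "switch" loss, together with the fraction of relevant/irrelevant pairs that a scoring function orders wrongly. The scalar features, the identity scoring function and the helper name `misranked_fraction` are assumptions made for illustration.

```python
def ranking_pair_loss(s, z1, z2):
    """1 iff z1 is more relevant than z2 but receives a lower score."""
    (x1, y1), (x2, y2) = z1, z2
    return 1 if (y1 > y2 and s(x1) < s(x2)) else 0


def misranked_fraction(s, data):
    """Fraction of (more relevant, less relevant) pairs that are switched."""
    pairs = [(a, b) for a in data for b in data if a[1] > b[1]]
    if not pairs:
        return 0.0
    return sum(ranking_pair_loss(s, a, b) for a, b in pairs) / len(pairs)


# Hypothetical data: (feature, relevance) with relevance in {0, 1}.
data = [(0.9, 1), (0.2, 1), (0.5, 0), (0.1, 0)]
fraction = misranked_fraction(lambda x: x, data)
```

With this data one of the four relevant/irrelevant pairs is switched (the relevant item scored 0.2 sits below the irrelevant item scored 0.5), so `fraction` is 0.25.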

SLIDE 13

Pairwise Loss Functions

$\ell: \mathcal{H} \times \mathcal{Z} \times \mathcal{Z} \to \mathbb{R}$

Examples:

  • Mahalanobis metric learning
  • Bipartite ranking
  • Preference learning
  • Two-stage multiple kernel learning
  • Indefinite kernel learning

SLIDE 14

Learning with Pairwise Loss Functions

Algorithmic challenges:

  • Training data available as a set $S = \{z_1, z_2, \ldots, z_n\}$
  • Question: how to create pairs?
  • Solution 1: use all pairs, $\min_{h \in \mathcal{H}} \sum_{i \neq j} \ell(h, z_i, z_j)$
      • Expensive for $n \gg 1$
  • Solution 2: use online techniques for a batch solver
      • Challenge: online creation of pairs from a data stream
      • Desirable: memory efficiency
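The cost of "Solution 1" is easy to see in code: a single evaluation of the all-pairs objective touches every ordered pair of distinct points, i.e. $n(n-1)$ loss evaluations. The sketch below assumes a hinge-style ranking surrogate as the pairwise loss; that choice, the averaging and the names are illustrative only.

```python
from itertools import permutations


def all_pairs_objective(h, data, pair_loss):
    """Average pair_loss over all ordered pairs of distinct points.

    Returns (objective value, number of pair evaluations)."""
    pairs = list(permutations(data, 2))  # n * (n - 1) ordered pairs
    total = sum(pair_loss(h, z1, z2) for z1, z2 in pairs)
    return total / len(pairs), len(pairs)


def rank_hinge(h, z1, z2):
    """Assumed surrogate: [1 - (y1 - y2) * (h(x1) - h(x2))]_+."""
    (x1, y1), (x2, y2) = z1, z2
    return max(0.0, 1.0 - (y1 - y2) * (h(x1) - h(x2)))


data = [(0.9, 1), (0.2, 1), (0.5, 0), (0.1, 0)]
value, n_evals = all_pairs_objective(lambda x: x, data, rank_hinge)

# The pair count grows quadratically: 100 points already give 9900 pairs.
_, n_evals_100 = all_pairs_objective(lambda x: x, [(i / 100, i % 2) for i in range(100)], rank_hinge)
```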

SLIDE 15

Learning with Pairwise Loss Functions

Learning theoretic challenges:

  • Batch learning methods: learn from pairs $(z_i, z_j)$
      • Intersection between pairs: training data not i.i.d.
      • Direct application of concentration inequalities not possible
  • Online learning methods: let $z_t$ arrive in a stream
      • Need an appropriate notion of regret
      • Classical online-to-batch (OTB) proofs crucially require i.i.d. data

This talk: mostly algorithmic solutions + hint of theory

SLIDE 16

Outline

  • A quick introduction to online learning
      • Notion of regret
      • Generalization error
  • Examples of pairwise loss functions
      • Algorithmic challenges
      • Learning theoretic challenges
  • An online learning model and algorithm for pairwise loss functions
      • A memory-efficient online learning algorithm
      • Regret and generalization bounds

SLIDE 17

An Online Learning Model for Pairwise Losses

  • At each time step $t$
      • We propose an action $h_{t-1}$ (e.g. a scoring function or a metric)
      • We receive a single point $z_t = (x_t, y_t)$
      • We incur loss $\hat{\ell}_t$ on action $h_{t-1}$
  • Buffer: $z_1, z_2, z_3, \ldots$
  • Pair up $z_t$ with points in buffer $z_1, z_2, \ldots, z_{t-1}$
  • Incur loss
    $\hat{\ell}_t(h_{t-1}) = \frac{1}{t-1} \left[ \ell(h_{t-1}, z_t, z_1) + \cdots + \ell(h_{t-1}, z_t, z_{t-1}) \right]$

SLIDE 18

An Online Learning Model for Pairwise Losses

  • At each time step $t$
      • We propose an action $h_{t-1}$ (e.g. a scoring function or a metric)
      • We receive a single point $z_t = (x_t, y_t)$
      • We incur loss $\hat{\ell}^{\text{buf}}_t$ on action $h_{t-1}$
  • Finite buffer with $s$ slots: □, □, … , □
  • Pair up $z_t$ with points in buffer $z_{i_1}, z_{i_2}, \ldots, z_{i_s}$
  • Incur loss
    $\hat{\ell}^{\text{buf}}_t(h_{t-1}) = \frac{1}{s} \left[ \ell(h_{t-1}, z_t, z_{i_1}) + \cdots + \ell(h_{t-1}, z_t, z_{i_s}) \right]$
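Both per-round losses in this model are averages of the pairwise loss between the new point and the buffered points. A minimal sketch, assuming the buffer is a plain list and using a toy pairwise loss (both are assumptions made for illustration):

```python
def buffer_loss(h, z_t, buffer, pair_loss):
    """Average pairwise loss of h between the new point z_t and the buffer.

    With buffer = all earlier points this is the all-pairs round loss
    (1 / (t - 1)) * sum; with a finite buffer it is the (1 / s) * sum variant."""
    if not buffer:
        return 0.0  # nothing to pair with at the first step
    return sum(pair_loss(h, z_t, z) for z in buffer) / len(buffer)


def toy_pair_loss(h, z1, z2):
    """Toy pairwise loss on scalars, used only to exercise buffer_loss."""
    return abs(h * (z1 - z2))


history = [1.0, 2.0, 5.0]  # every earlier point (unbounded buffer)
finite = history[-2:]      # hypothetical finite buffer holding s = 2 points
loss_all = buffer_loss(2.0, 3.0, history, toy_pair_loss)
loss_buf = buffer_loss(2.0, 3.0, finite, toy_pair_loss)
```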

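SLIDE 19

An Online Learning Model for Pairwise Losses

Notions of Regret in this Model

  • How well are we able to do on pairs that we have seen
      • Finite buffer regret
        $\mathfrak{R}^{\text{buf}}_T = \sum_{t=2}^{T} \hat{\ell}^{\text{buf}}_t(h_{t-1}) - \min_{h \in \mathcal{H}} \sum_{t=2}^{T} \hat{\ell}^{\text{buf}}_t(h)$
  • How well are we able to do on all possible pairs
      • All pairs regret
        $\mathfrak{R}_T = \sum_{t=2}^{T} \hat{\ell}_t(h_{t-1}) - \min_{h \in \mathcal{H}} \sum_{t=2}^{T} \hat{\ell}_t(h)$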

SLIDE 20

An Online Learning Algorithm for Pairwise Losses

OLP: Online learning with pairwise losses
Simple variant of Zinkevich’s GIGA

  • Start with $w_0 = 0$
  • At each $t = 1 \ldots T$
      • Receive a new point $z_t$
      • Construct appropriate loss function $\ell_t = \hat{\ell}_t$ (all pairs) or $\ell_t = \hat{\ell}^{\text{buf}}_t$ (finite buffer)
      • $w_t \leftarrow \Pi_{\mathcal{W}}\left( w_{t-1} - \eta_t \nabla \ell_t(w_{t-1}) \right)$
      • If required, update buffer with $z_t$
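The OLP steps can be sketched as below. This is an illustration under several assumptions, not the authors' implementation: the pairwise loss is a hinge-style ranking surrogate, the buffer is a FIFO queue, the step size is `eta / sqrt(t)`, and the projection step is omitted (the feasible set is taken to be all of $\mathbb{R}^d$).

```python
import math
from collections import deque


def pair_hinge_grad(w, z1, z2):
    """Subgradient in w of the assumed loss [1 - (y1 - y2) * <w, x1 - x2>]_+."""
    (x1, y1), (x2, y2) = z1, z2
    diff = [a - b for a, b in zip(x1, x2)]
    margin = (y1 - y2) * sum(wi * di for wi, di in zip(w, diff))
    if margin < 1.0:
        return [-(y1 - y2) * di for di in diff]
    return [0.0] * len(w)


def olp(stream, dim, buffer_size, eta=0.5):
    """Gradient step on the buffer loss, then a stream-oblivious buffer update."""
    w = [0.0] * dim
    buffer = deque(maxlen=buffer_size)  # FIFO: the oldest point is evicted
    for t, z_t in enumerate(stream, start=1):
        if buffer:
            # Gradient of the buffer loss = average of the pairwise subgradients.
            grads = [pair_hinge_grad(w, z_t, z) for z in buffer]
            g = [sum(col) / len(buffer) for col in zip(*grads)]
            step = eta / math.sqrt(t)
            w = [wi - step * gi for wi, gi in zip(w, g)]
        buffer.append(z_t)
    return w


# Hypothetical stream: relevant points (y = 1) have a large first coordinate.
stream = [([1.0, 0.0], 1), ([0.0, 1.0], 0), ([0.9, 0.1], 1), ([0.1, 0.8], 0)] * 5
w = olp(stream, dim=2, buffer_size=3)
```

On this toy stream the learned `w` puts positive weight on the first coordinate and negative weight on the second, so relevant points receive higher scores.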

SLIDE 21

An Online Learning Algorithm for Pairwise Losses

RS-x: Reservoir sampling with replacement
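The slide gives only the name of the RS-x procedure. One way to obtain a buffer of $s$ uniform samples drawn with replacement from a stream is sketched below; the fill phase, the one-time resampling at step $s + 1$ and the per-slot replacement with probability $1/t$ are assumptions of this sketch and may differ in detail from the procedure presented in the talk.

```python
import random


def rs_x_update(buffer, z_t, t, s, rng):
    """Update a size-s buffer in place with the t-th stream point (t starts at 1)."""
    if t <= s:
        buffer.append(z_t)  # fill phase: keep the first s points
        return buffer
    if t == s + 1:
        # One-time resampling so the slots become s draws WITH replacement.
        buffer[:] = [rng.choice(buffer) for _ in range(s)]
    for i in range(s):
        if rng.random() < 1.0 / t:  # each slot independently switches to z_t
            buffer[i] = z_t
    return buffer


rng = random.Random(0)
buffer = []
for t, z in enumerate(range(1, 101), start=1):  # hypothetical stream 1..100
    rs_x_update(buffer, z, t, 5, rng)
```

The update decides what to keep using only the time index `t`, never the value of `z_t`, so it is stream-oblivious in the sense used later in the talk.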

SLIDE 22

An Online Learning Algorithm for Pairwise Losses

Guarantees for OLP and RS-x

  • Sampling guarantee
    At any time $t > s$, the contents of the buffer are $s$ i.i.d. samples from the set $\{z_1, z_2, \ldots, z_{t-1}\}$
  • Regret guarantee
    OLP guarantees** an $\mathcal{O}(\sqrt{T})$ finite buffer regret
  • Finite-to-all-pairs regret conversion
    $\frac{1}{T} \mathfrak{R}_T \le \frac{1}{T} \mathfrak{R}^{\text{buf}}_T + \mathcal{O}\!\left(\sqrt{\frac{\log T}{s}}\right)$

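SLIDE 23

An Online Learning Algorithm for Pairwise Losses

OTB Guarantees for Pairwise Loss Functions

Define $\mathcal{L}(h) := \mathbb{E}_{z, z' \sim \mathcal{D}}\, \ell(h, z, z')$

  • For random $z_t$, convex $\ell$ and unbounded buffer (with probability at least $1 - \delta$)
    $\mathcal{L}(\bar{h}) \le \min_{h \in \mathcal{H}} \mathcal{L}(h) + \frac{1}{T} \mathfrak{R}_T + \mathcal{O}\!\left(\sqrt{\frac{\log(T/\delta)}{T}}\right)$, where $\bar{h} = \frac{1}{T} \sum_{t} h_t$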

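SLIDE 24

An Online Learning Algorithm for Pairwise Losses

OTB Guarantees for Pairwise Loss Functions

Define $\mathcal{L}(h) := \mathbb{E}_{z, z' \sim \mathcal{D}}\, \ell(h, z, z')$

  • For random $z_t$, convex $\ell$ and finite buffer of size $s$ (with probability at least $1 - \delta$)
    $\mathcal{L}(\bar{h}) \le \min_{h \in \mathcal{H}} \mathcal{L}(h) + \frac{1}{T} \mathfrak{R}^{\text{buf}}_T + \mathcal{O}\!\left(\sqrt{\frac{\log(T/\delta)}{s}}\right)$, where $\bar{h} = \frac{1}{T} \sum_{t} h_t$
  • Corollary: $\mathcal{L}(\bar{h}) \le \min_{h \in \mathcal{H}} \mathcal{L}(h) + \mathcal{O}\!\left(\sqrt{\frac{\log(T/\delta)}{s}}\right)$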

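SLIDE 25

An Online Learning Algorithm for Pairwise Losses

OTB Guarantees for Pairwise Loss Functions

Define $\mathcal{L}(h) := \mathbb{E}_{z, z' \sim \mathcal{D}}\, \ell(h, z, z')$

  • For random $z_t$, strongly convex $\ell$ and unbounded buffer (with probability at least $1 - \delta$)
    $\mathcal{L}(\bar{h}) \le \min_{h \in \mathcal{H}} \mathcal{L}(h) + \frac{1}{T} \mathfrak{R}_T + \mathcal{O}\!\left(\frac{\log(T/\delta)}{T}\right)$, where $\bar{h} = \frac{1}{T} \sum_{t} h_t$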

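SLIDE 26

An Online Learning Algorithm for Pairwise Losses

OTB Guarantees for Pairwise Loss Functions

Define $\mathcal{L}(h) := \mathbb{E}_{z, z' \sim \mathcal{D}}\, \ell(h, z, z')$

  • For random $z_t$, strongly convex $\ell$ and finite buffer of size $s$ (with probability at least $1 - \delta$)
    $\mathcal{L}(\bar{h}) \le \min_{h \in \mathcal{H}} \mathcal{L}(h) + \frac{1}{T} \mathfrak{R}^{\text{buf}}_T + \mathcal{O}\!\left(\frac{\log(T/\delta)}{s}\right)$, where $\bar{h} = \frac{1}{T} \sum_{t} h_t$
  • Corollary: $\mathcal{L}(\bar{h}) \le \min_{h \in \mathcal{H}} \mathcal{L}(h) + \mathcal{O}\!\left(\frac{\log(T/\delta)}{s}\right)$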

SLIDE 27

An Online Learning Algorithm for Pairwise Losses

Some other details

  • Our analysis gives dimension-independent bounds
      • For Hilbertian norm regularizations: no dependence on $d$
      • For sparsity-inducing regularizations: $\log d$ dependence
      • Previous work [Wang et al, COLT12]: linear dependence
  • Proofs use (modified notions of) Rademacher averages
      • Trickier symmetrization step
      • Previous work: covering number based analysis

SLIDE 28

Some Open Problems

  • Current all-pairs regret bound for finite buffers: $\mathcal{O}\!\left(\sqrt{\frac{\log T}{s}}\right)$

  • Can we get bounds that scale as 1 ()

⁄ ?

      • Similar question for OTB conversion bounds
  • OTB bounds require stream-oblivious buffer updates
      • Update algorithm cannot look at $z_t$, just $t$
      • Examples: FIFO, RS, RS-x
      • Guarantees for (suitable) stream-aware policies?