Online Learning with Pairwise Loss Functions
Purushottam Kar, MLO Group, Microsoft Research India
Joint work with B. Sriperumbudur, P. Jain, H. Karnick
MLSIG Seminar Series, Dept. of CSA, IISc
Outline
- A quick introduction to online learning
  - Notion of regret
  - Generalization error
- Examples of pairwise loss functions
- An online learning model + algorithm for pairwise functions
Credit Card Fraud Detection
Transaction 1
- Guess
- Truth
- Loss 0
Transaction 2
- Guess
- Truth
- Loss 1
Transaction 3
- Guess
- Truth
- Loss 0
Transaction 4
- Guess
- Truth
- Loss 0
The Online Learning Process
- Initialize $w_1$
- Receive instance $x_t$
- Take action: guess using $w_t$
- Truth revealed: $z_t = (x_t, y_t)$
- Incur loss $\ell(w_t, z_t)$
- Update $w_t \to w_{t+1}$
Benefits of Online Learning
- Don’t have to wait for all data to arrive
- Streaming data, Transactional data
- Applications to large scale learning
- Data too large to fit in memory (or even disk)
- Solution: stream data into memory from disk or network
- Fast learning
- Several online learning algorithms have cheap updates $w_t \to w_{t+1}$
- Online gradient descent, Mirror descent
Example: Online Classification
- Instances are vector-label pairs $z_t = (x_t, y_t)$
- $x_t \in \mathbb{R}^d$, $y_t \in \{-1, +1\}$
- Actions are classifiers, e.g. $x \mapsto \langle w_t, x \rangle$ with $w_t \in \mathcal{W}$
- Loss is the hinge loss function $\ell(w_t, z_t) = [1 - y_t \cdot \langle w_t, x_t \rangle]_+$
- Total loss incurred by adaptive classification: $\sum_{t=1}^{T} \ell(w_t, z_t)$
- Loss of the single best classifier: $\min_{w \in \mathcal{W}} \sum_{t=1}^{T} \ell(w, z_t)$
- This is what a “batch” learning algorithm would have given
- The online process suffers
- Unable to see all data in one go
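The online classification example can be sketched in a few lines of Python. This is only an illustrative sketch, not the speaker's code: the function names, the linear classifier, and the $\eta/\sqrt{t}$ step size are assumptions chosen for the example.

```python
import numpy as np

def hinge_loss(w, x, y):
    """Hinge loss [1 - y * <w, x>]_+ of the linear classifier w on (x, y)."""
    return max(0.0, 1.0 - y * float(np.dot(w, x)))

def hinge_subgradient(w, x, y):
    """A subgradient of the hinge loss with respect to w."""
    if 1.0 - y * float(np.dot(w, x)) > 0.0:
        return -y * x
    return np.zeros_like(w)

def online_hinge(stream, dim, eta=1.0):
    """Run the online protocol on (x, y) pairs; return iterates and losses."""
    w = np.zeros(dim)
    iterates, losses = [], []
    for t, (x, y) in enumerate(stream, start=1):
        iterates.append(w.copy())            # action w_t, fixed before y_t is used
        losses.append(hinge_loss(w, x, y))   # loss incurred on z_t = (x_t, y_t)
        # cheap update w_t -> w_{t+1} (step size eta / sqrt(t) is an assumption)
        w = w - (eta / np.sqrt(t)) * hinge_subgradient(w, x, y)
    return iterates, losses
```

Summing `losses` gives the total loss of the adaptive process, the first of the two quantities compared on this slide.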
Regret and Generalization
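- Regret: how much the online process suffers
  $\mathfrak{R}_T = \sum_{t=1}^{T} \ell(w_t, z_t) - \min_{w \in \mathcal{W}} \sum_{t=1}^{T} \ell(w, z_t)$
- Online learning can compete with batch learning
- Excess training error $\frac{1}{T}\mathfrak{R}_T \downarrow 0$ if $\mathfrak{R}_T = o(T)$
- Performance on unseen points: $\mathcal{L}(w) = \mathbb{E}_{z \sim \mathcal{D}}\,[\ell(w, z)]$
- Online-to-batch conversion: for random $z_1, \dots, z_T$, convex $\ell$
  $\mathcal{L}(\bar{w}) \le \inf_{w \in \mathcal{W}} \mathcal{L}(w) + \frac{1}{T}\mathfrak{R}_T + \mathcal{O}\left(\frac{1}{\sqrt{T}}\right)$
- where $\bar{w} = \frac{1}{T} \sum_{t=1}^{T} w_t$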
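As a rough illustration of the two quantities on this slide, the sketch below computes regret and the averaged iterate used in online-to-batch conversion. One simplifying assumption: the minimum is taken over a finite list of candidate classifiers rather than over the whole class, and the helper names are hypothetical.

```python
import numpy as np

def regret(online_losses, comparator_losses):
    """Cumulative online loss minus the cumulative loss of the best comparator.

    comparator_losses[k][t] is the loss of the k-th fixed candidate on round t
    (a finite candidate list is an assumption made for this sketch).
    """
    best_fixed = min(sum(row) for row in comparator_losses)
    return sum(online_losses) - best_fixed

def average_iterate(iterates):
    """Online-to-batch conversion: the average of the iterates w_1, ..., w_T."""
    return np.mean(np.stack(iterates), axis=0)
```

Dividing the regret by the number of rounds gives the excess training error referred to above.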
Outline
- A quick introduction to online learning
  - Notion of regret
  - Generalization error
- Examples of pairwise loss functions
  - Algorithmic challenges
  - Learning theoretic challenges
- An online learning model + algorithm for pairwise functions
Pointwise Loss Functions
- Loss functions for classification, regression …
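- … look at the performance of a function at one point
Examples
- Hinge loss: $\ell(w, z) = [1 - y \cdot \langle w, x \rangle]_+$
- Logistic loss: $\ell(w, z) = \ln(1 + \exp(-y \cdot \langle w, x \rangle))$
- Squared loss: $\ell(w, z) = (y - \langle w, x \rangle)^2$
$\ell: \mathcal{W} \times \mathcal{Z} \to \mathbb{R}$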
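A minimal Python sketch of the three pointwise losses, assuming a linear predictor $\langle w, x \rangle$ (the function names are illustrative). Each one takes a single labelled point, which is what makes it pointwise.

```python
import numpy as np

def hinge(w, x, y):
    """Hinge loss: [1 - y * <w, x>]_+."""
    return max(0.0, 1.0 - y * float(np.dot(w, x)))

def logistic(w, x, y):
    """Logistic loss: ln(1 + exp(-y * <w, x>))."""
    return float(np.log1p(np.exp(-y * np.dot(w, x))))

def squared(w, x, y):
    """Squared loss: (y - <w, x>)^2."""
    return (y - float(np.dot(w, x))) ** 2
```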
Metric Learning for Classification
- Penalize metric for bringing blue and red points close
- Loss function needs to consider two points at a time!
- … in other words a pairwise loss function
- Example: $\ell(w, z, z') = \begin{cases} 1, & y \ne y' \text{ and } d_w(x, x') < \gamma_1 \\ 1, & y = y' \text{ and } d_w(x, x') > \gamma_2 \\ 0, & \text{otherwise} \end{cases}$
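The example loss can be sketched as follows. Several choices here are assumptions, not taken from the slide: the metric is a squared Mahalanobis distance given by a matrix `M`, and the two thresholds `gamma_far` and `gamma_near` are hypothetical parameters that both default to 1.

```python
import numpy as np

def mahalanobis_sq(M, x, xp):
    """Squared Mahalanobis distance (x - x')^T M (x - x')."""
    diff = x - xp
    return float(diff @ M @ diff)

def metric_pair_loss(M, z, zp, gamma_far=1.0, gamma_near=1.0):
    """0/1 pairwise loss on two labelled points z = (x, y) and zp = (x', y')."""
    (x, y), (xp, yp) = z, zp
    d = mahalanobis_sq(M, x, xp)
    if y != yp and d < gamma_far:     # differently labelled points too close
        return 1.0
    if y == yp and d > gamma_near:    # similarly labelled points too far apart
        return 1.0
    return 0.0
```

Note that the loss cannot be evaluated on one point alone; it always needs a pair.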
Bipartite Ranking
- Want relevant results to be ranked above others
- Penalize scoring function $f_w: \mathcal{X} \to \mathbb{R}$ for each “switch”
- $\ell(w, z, z') = 1$ iff $y > y'$ and $f_w(x) < f_w(x')$
Images taken from cinemahood.com, sify.com, santabanta.com and thehindu.com
[Figure: search results for the query “Chennai Express”]
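A small sketch of the “switch” loss, assuming a linear scoring function $\langle w, x \rangle$ (the slide does not fix the form of the scorer; the function name is illustrative).

```python
import numpy as np

def ranking_pair_loss(w, z, zp):
    """Return 1.0 when the more relevant point z receives the lower score."""
    (x, y), (xp, yp) = z, zp
    score, score_p = float(np.dot(w, x)), float(np.dot(w, xp))
    if y > yp and score < score_p:
        return 1.0
    return 0.0
```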
Pairwise Loss Functions
Examples:
- Mahalanobis metric learning
- Bipartite ranking
- Preference learning
- Two-stage multiple kernel learning
- Indefinite kernel learning
$\ell: \mathcal{W} \times \mathcal{Z} \times \mathcal{Z} \to \mathbb{R}$
Learning with Pairwise Loss Functions
Algorithmic challenges:
- Training data available as a set $S = \{z_1, z_2, \dots, z_n\}$
- Question: how to create pairs?
- Solution 1: $\min_{w \in \mathcal{W}} \frac{1}{n(n-1)} \sum_{i \ne j} \ell(w, z_i, z_j)$
- Expensive for $n \gg 1$
- Solution 2: Use online techniques for a batch solver
- Challenge: Online creation of pairs from a data stream
- Desirable: Memory efficiency
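To make the cost of Solution 1 concrete, here is a sketch of the all-pairs objective for a fixed $w$. It assumes the objective averages over ordered pairs $i \ne j$; `pair_loss` is a placeholder for any pairwise loss and the function names are illustrative.

```python
from itertools import permutations

def all_pairs_loss(w, data, pair_loss):
    """Average pair_loss(w, z_i, z_j) over all ordered pairs with i != j."""
    pairs = list(permutations(data, 2))
    return sum(pair_loss(w, zi, zj) for zi, zj in pairs) / len(pairs)

def num_ordered_pairs(n):
    """Number of terms in the all-pairs sum for n training points."""
    return n * (n - 1)
```

A single evaluation already touches $n(n-1)$ pairs, so one pass over a thousand points means close to a million loss evaluations.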
Learning with Pairwise Loss Functions
Learning theoretic challenges:
- Batch learning methods: learn from pairs $(z_i, z_j)$
- Intersection between pairs: training data not i.i.d.
- Direct application of concentration inequalities not possible
- Online learning methods: let points $z_t$ arrive in a stream
- Need an appropriate notion of regret
- Classical online-to-batch (OTB) proofs crucially require i.i.d. data
This talk: mostly algorithmic solutions + hint of theory
Outline
- A quick introduction to online learning
  - Notion of regret
  - Generalization error
- Examples of pairwise loss functions
  - Algorithmic challenges
  - Learning theoretic challenges
- An online learning model + algorithm for pairwise functions
  - A memory efficient online learning algorithm
  - Regret and generalization bounds
An Online Learning Model for Pairwise Losses
- At each time step $t$
  - We propose an action $w_t$ (e.g. a scoring function or a metric)
  - We receive a single point $z_t = (x_t, y_t)$
  - We incur loss $\hat{\ell}_t$ on action $w_t$
- Buffer $B = \{z_1, z_2, z_3, \dots\}$
- Pair up $z_t$ with points in buffer $z_1, z_2, \dots, z_{t-1}$
- Incur loss $\hat{\ell}_t(w_t) = \frac{1}{t-1}\left(\ell(w_t, z_t, z_1) + \dots + \ell(w_t, z_t, z_{t-1})\right)$
- Finite buffer $B$ of size $s$
- Pair up $z_t$ with points in buffer $z_{i_1}, z_{i_2}, \dots, z_{i_s}$
- Incur loss $\hat{\ell}^{buf}_t(w_t) = \frac{1}{s}\left(\ell(w_t, z_t, z_{i_1}) + \dots + \ell(w_t, z_t, z_{i_s})\right)$
Notions of Regret in this Model
- How well are we able to do on pairs that we have seen
- Finite buffer regret
  $\mathfrak{R}^{buf}_T = \sum_{t=2}^{T} \hat{\ell}^{buf}_t(w_t) - \min_{w \in \mathcal{W}} \sum_{t=2}^{T} \hat{\ell}^{buf}_t(w)$
- How well are we able to do on all possible pairs
- All pairs regret
  $\mathfrak{R}^{all}_T = \sum_{t=2}^{T} \hat{\ell}_t(w_t) - \min_{w \in \mathcal{W}} \sum_{t=2}^{T} \hat{\ell}_t(w)$
An Online Learning Algorithm for Pairwise Losses
OLP: Online learning with pairwise losses
Simple variant of Zinkevich’s GIGA
- Start with $w_1 = 0$
- At each $t = 1 \dots T$
  - Receive a new point $z_t$
  - Construct appropriate loss function $\ell_t = \hat{\ell}_t$ or $\ell_t = \hat{\ell}^{buf}_t$
  - $w_{t+1} \leftarrow \Pi_{\mathcal{W}}\left(w_t - \frac{\eta}{\sqrt{t}} \nabla \ell_t(w_t)\right)$
  - If required, update buffer with $z_t$
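The steps of OLP can be sketched roughly as below. This is an illustration under stated assumptions rather than the authors' implementation: the pairwise loss is a hinge surrogate for ranking, the feasible set is an L2 ball of hypothetical radius `radius`, the step size is `eta / sqrt(t)`, and the buffer policy is passed in as a function (FIFO here).

```python
import numpy as np

def pair_hinge_grad(w, z, zp):
    """Subgradient of the pairwise hinge surrogate [1 - (y - y')/2 * <w, x - x'>]_+.

    Pairs with equal labels carry no ranking information and contribute zero.
    """
    (x, y), (xp, yp) = z, zp
    if y == yp:
        return np.zeros_like(w)
    sign = (y - yp) / 2.0
    diff = x - xp
    if 1.0 - sign * float(np.dot(w, diff)) > 0.0:
        return -sign * diff
    return np.zeros_like(w)

def fifo_update(buffer, z, t, size):
    """Stream-oblivious FIFO policy: keep the most recent `size` points."""
    buffer.append(z)
    if len(buffer) > size:
        buffer.pop(0)

def olp(stream, dim, buffer_size, update_buffer, eta=1.0, radius=10.0):
    """OLP-style loop: gradient step on the buffer-averaged pairwise loss."""
    w = np.zeros(dim)                          # start with w = 0
    buffer, iterates = [], []
    for t, z in enumerate(stream, start=1):
        iterates.append(w.copy())              # action proposed before seeing z_t
        if buffer:
            # gradient of the loss averaged over pairs (z_t, buffered point)
            grad = sum(pair_hinge_grad(w, z, zb) for zb in buffer) / len(buffer)
            w = w - (eta / np.sqrt(t)) * grad
            norm = np.linalg.norm(w)
            if norm > radius:                  # project back onto the L2 ball
                w = w * (radius / norm)
        update_buffer(buffer, z, t, buffer_size)
    return iterates, w
```

Passing a different `update_buffer` function swaps the buffer policy without touching the gradient step.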
RS-x: Reservoir sampling with replacement
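The slide gives only the name of RS-x, so the following is a sketch of one way to do reservoir sampling with replacement, not necessarily the exact procedure presented. The assumed rule: fill the buffer with the first `size` points, resample it with replacement when point `size + 1` arrives, and afterwards overwrite each slot independently with probability `1/t`.

```python
import random

def rs_x_update(buffer, z, t, size, rng=random):
    """Update `buffer` in place with the t-th stream point z (t counts from 1)."""
    if len(buffer) < size:
        buffer.append(z)                      # fill phase: keep everything
    elif t == size + 1:
        pool = buffer + [z]                   # the first size + 1 points
        buffer[:] = [rng.choice(pool) for _ in range(size)]
    else:
        for i in range(size):
            if rng.random() < 1.0 / t:        # each slot flips its own coin
                buffer[i] = z
```

Under this rule each slot holds a uniform draw from the points seen so far, independently of the other slots, and the decision to overwrite depends only on `t`, never on the content of `z`.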
Guarantees for OLP and RS-x
- Sampling guarantee
  At any time $t > s$, the contents of the buffer are $s$ i.i.d. samples from the set $\{z_1, z_2, \dots, z_{t-1}\}$
- Regret guarantee
  OLP guarantees** a finite buffer regret $\mathfrak{R}^{buf}_T \le \mathcal{O}(\sqrt{T})$
- Finite-to-all-pairs regret conversion (with probability at least $1 - \delta$)
  $\frac{1}{T}\mathfrak{R}^{all}_T \le \frac{1}{T}\mathfrak{R}^{buf}_T + \mathcal{O}\left(\sqrt{\frac{\log(T/\delta)}{s}}\right)$
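OTB Guarantees for Pairwise Loss Functions
Define $\mathcal{L}(w) := \mathbb{E}_{z, z' \sim \mathcal{D}}\,[\ell(w, z, z')]$
- For random $z_1, \dots, z_T$, convex $\ell$ and unbounded buffer (with probability at least $1 - \delta$)
  $\mathcal{L}(\bar{w}) \le \min_{w \in \mathcal{W}} \mathcal{L}(w) + \frac{1}{T}\mathfrak{R}^{all}_T + \mathcal{O}\left(\sqrt{\frac{\log(T/\delta)}{T}}\right)$, where $\bar{w} = \frac{1}{T} \sum_{t=1}^{T} w_t$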
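- For random $z_1, \dots, z_T$, convex $\ell$ and finite buffer of size $s$ (with probability at least $1 - \delta$)
  $\mathcal{L}(\bar{w}) \le \min_{w \in \mathcal{W}} \mathcal{L}(w) + \frac{1}{T}\mathfrak{R}^{buf}_T + \mathcal{O}\left(\sqrt{\frac{\log(T/\delta)}{s}}\right)$, where $\bar{w} = \frac{1}{T} \sum_{t=1}^{T} w_t$
- Corollary: $\mathcal{L}(\bar{w}) \le \min_{w \in \mathcal{W}} \mathcal{L}(w) + \mathcal{O}\left(\sqrt{\frac{\log(T/\delta)}{s}}\right)$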
- For random $z_1, \dots, z_T$, strongly convex $\ell$ and unbounded buffer (with probability at least $1 - \delta$)
  $\mathcal{L}(\bar{w}) \le \min_{w \in \mathcal{W}} \mathcal{L}(w) + \frac{1}{T}\mathfrak{R}^{all}_T + \mathcal{O}\left(\frac{\log(T/\delta)}{T}\right)$, where $\bar{w} = \frac{1}{T} \sum_{t=1}^{T} w_t$
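- For random $z_1, \dots, z_T$, strongly convex $\ell$ and finite buffer of size $s$ (with probability at least $1 - \delta$)
  $\mathcal{L}(\bar{w}) \le \min_{w \in \mathcal{W}} \mathcal{L}(w) + \frac{1}{T}\mathfrak{R}^{buf}_T + \mathcal{O}\left(\frac{\log(T/\delta)}{s}\right)$, where $\bar{w} = \frac{1}{T} \sum_{t=1}^{T} w_t$
- Corollary: $\mathcal{L}(\bar{w}) \le \min_{w \in \mathcal{W}} \mathcal{L}(w) + \mathcal{O}\left(\frac{\log(T/\delta)}{s}\right)$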
Some other details
- Our analysis gives dimension-independent bounds
  - For Hilbertian norm regularizations: no dependence on $d$
  - For sparsity inducing regularizations: $\log d$ dependence
  - Previous work [Wang et al, COLT12]: linear dependence
- Proofs use (modified notions of) Rademacher averages
  - Trickier symmetrization step
  - Previous work: covering number based analysis
Some Open Problems
- Current all-pairs regret bound for finite buffers
ℜ
≤
log
- Can we get bounds that scale as 1 ()
⁄ ?
- Similar question for OTB conversion bounds
- OTB bounds require stream-oblivious buffer updates
- Update algorithm cannot look at $z_t$, just $t$
- Examples: FIFO, RS, RS-x
- Guarantees for (suitable) stream-aware policies?