SLIDE 1

Controlling Fairness and Bias in Dynamic Learning-to-Rank

ACM SIGIR 2020 Marco Morik*†, Ashudeep Singh*‡, Jessica Hong‡, Thorsten Joachims‡

† TU Berlin, ‡ Cornell University

SLIDE 2

Dynamic Learning-to-Rank

(Diagram: users 1, 2, 3, … arrive sequentially with query x; the system presents a ranking over the candidate set at positions 1–6, collects clicks (+1), and updates the ranking after each user.)

SLIDE 3

Problem 1: Selection Bias due to position

  • The number of clicks is a biased estimator of relevance.
  • Lower positions get lower attention.
  • Less attention means fewer clicks.
  • Rich-get-richer dynamic: what starts at the bottom has little opportunity to rise in the ranking.

(Figure: position-bias curve.)
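To see the feedback loop concretely, here is a minimal simulation (illustrative, not from the deck): two equally relevant items are ranked by raw click counts, and a one-click head start is never corrected because the top position earns clicks faster.

    import math
    import random

    random.seed(0)
    relevance = [0.5, 0.5]          # two equally relevant items
    clicks = [1, 0]                 # item 0 gets a one-click head start
    for _ in range(10_000):
        order = sorted(range(2), key=lambda d: -clicks[d])  # rank by raw clicks
        for rank, d in enumerate(order, start=1):
            examine = 1.0 / math.log2(1 + rank)             # position bias at this rank
            if random.random() < examine * relevance[d]:
                clicks[d] += 1
    print(clicks)  # item 0 keeps (and extends) its lead: rich get richer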

SLIDE 4

Problem 2: Unfair Exposure

(Figure: user distribution: 51% of users are left-leaning and prefer Gleft news articles; 49% are right-leaning and prefer Gright news articles.)

Ranking by true average relevance leads to unfair rankings between Gleft and Gright.

Probability Ranking Principle [Robertson, 1977]: rank documents by their probability of relevance. This maximizes utility for virtually any measure U of ranking quality:

    σ* := argmax_σ U(σ | x)

SLIDE 5

Position-Based Exposure Model

Definition: Exposure e_k is the probability that a user observes the item at position k.

Exposure of a group: Exp(G | x) = Σ_{d ∈ G} e_{rank(d)}

How to estimate e_k?

  • Eye tracking [Joachims et al. 2007]
  • Intervention studies [Joachims et al. 2017]
  • Intervention harvesting [Agarwal et al. 2019, Fang et al. 2019]

Disparate exposure allocation: a small difference in average relevance leads to a large difference in average exposure! With Exposure(j) = 1/log2(1+j), Gleft (average relevance 51%) receives average exposure 0.71 while Gright (average relevance 49%) receives 0.39: a 0.02 difference in expected relevance becomes a 0.32 difference in expected exposure.
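As a quick check (a sketch; it assumes the three Gleft items occupy ranks 1–3 and the three Gright items ranks 4–6), the slide's numbers follow directly from the exposure model:

    import math

    def exposure(j):
        """Position bias at 1-indexed rank j, as in the deck."""
        return 1.0 / math.log2(1 + j)

    g_left = [exposure(j) for j in (1, 2, 3)]    # higher relevance, ranked on top
    g_right = [exposure(j) for j in (4, 5, 6)]   # slightly lower relevance, below

    avg_left, avg_right = sum(g_left) / 3, sum(g_right) / 3
    print(round(avg_left, 2), round(avg_right, 2))  # 0.71 0.39
    print(round(avg_left - avg_right, 2))           # 0.32 exposure gap for a 0.02 relevance gap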

SLIDE 6

Outline

  • Exposure Model
  • Fairness Notions
  • FairCo Algorithm
  • Unbiased Average Relevance estimation
  • Unbiased Relevance estimation for Personalization
SLIDE 7

[Singh & Joachims. Fairness of Exposure in Rankings. KDD 2018]

Exposure Fairness

  • Goal: exposure is allocated based on the relevance of the group: Exp(G | x) = g(Rel(G | x)).
  • Constraint: make exposure proportional to relevance (per group): Exp(G0 | x) / Exp(G1 | x) = Rel(G0 | x) / Rel(G1 | x).
  • Disparity measure: D_E(G0, G1) = Exp(G0 | x) / Rel(G0 | x) − Exp(G1 | x) / Rel(G1 | x).

Impact Fairness

  • Goal: the expected impact (e.g. clickthrough rate) is allocated based on merit: Imp(G | x) = g(Rel(G | x)). For the position-bias model, Imp(d | x) = Exp(d | x) · Rel(d | x).
  • Constraint: make the expected impact proportional to relevance (per group): Imp(G0 | x) / Imp(G1 | x) = Rel(G0 | x) / Rel(G1 | x).
  • Disparity measure: D_I(G0, G1) = Imp(G0 | x) / Rel(G0 | x) − Imp(G1 | x) / Rel(G1 | x).
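To read the two disparity measures concretely, here is a small sketch (function and variable names are mine) that evaluates D_E and D_I on the deck's two-group example:

    def disparity(value_g0, value_g1, rel_g0, rel_g1):
        """How far the value-to-relevance ratios of two groups diverge."""
        return value_g0 / rel_g0 - value_g1 / rel_g1

    # Numbers from the running example (exposure model 1/log2(1+j)):
    exp_left, exp_right = 0.71, 0.39
    rel_left, rel_right = 0.51, 0.49

    d_exposure = disparity(exp_left, exp_right, rel_left, rel_right)
    # Impact under the position-bias model: Imp = Exp * Rel.
    d_impact = disparity(exp_left * rel_left, exp_right * rel_right,
                         rel_left, rel_right)
    print(round(d_exposure, 2), round(d_impact, 2))  # 0.6 0.32; both > 0, so Gleft is over-served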

SLIDE 8

Ranking purely by true average relevance (Gleft: 51%, Gright: 49%) yields average exposures of 0.71 vs. 0.39 under Exposure(j) = 1/log2(1+j). This satisfies neither Fairness of Exposure nor Fairness of Impact.

SLIDE 9

Dynamic Learning-to-Rank

Sequentially present rankings σ_τ to users that:

  • maximize expected user utility 𝔼[U(σ | x)], and
  • ensure that unfairness D_τ goes to 0 as τ → ∞.

SLIDE 10

Fairness Controller (FairCo) LTR Algorithm

FairCo ranking at time τ:

    σ_τ = argsort_{d ∈ D} ( R̂(d | x) + λ · err_τ(d) ),  λ > 0

    err_τ(d) = (τ − 1) · max_{G_i} D̂_E(G_i, G(d))

where R̂(d | x) is the estimated conditional relevance. FairCo is a proportional controller: a linear feedback control system whose correction is proportional to the error.

  • Theorem: when the problem is well posed, FairCo ensures that D_τ → 0 as τ → ∞ at a rate of O(1/τ).
  • Requirements:
  • Estimating average relevances R̂(d).
  • Estimating unbiased conditional relevances R̂(d | x) for personalization.
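A minimal sketch of the FairCo ranking step (variable names are mine; it assumes the per-group disparity estimates D̂ are tracked elsewhere):

    import numpy as np

    def fairco_rank(rel_hat, group_of, d_hat, tau, lam=0.01):
        """One FairCo ranking step at time tau.

        rel_hat:  estimated relevances R̂(d|x) for the n candidate documents
        group_of: group index of each document
        d_hat:    d_hat[i, j] = current disparity estimate D̂_E(G_i, G_j)
        """
        n = len(rel_hat)
        # Error term: documents whose group is under-exposed get a positive boost.
        err = np.array([(tau - 1) * d_hat[:, group_of[d]].max() for d in range(n)])
        scores = np.asarray(rel_hat) + lam * err
        return np.argsort(-scores)  # best score first

    # Example: group 1 is currently under-exposed relative to group 0.
    rel_hat = [0.51, 0.50, 0.49, 0.48]
    group_of = [0, 0, 1, 1]
    d_hat = np.array([[0.0, 0.6],    # D̂(G0, G1) > 0: G0 over-exposed
                      [-0.6, 0.0]])
    print(fairco_rank(rel_hat, group_of, d_hat, tau=100))  # [2 3 0 1]: G1 moves up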

SLIDE 11

Estimating Average Relevances

  • The average number of clicks is not a consistent estimator of relevance.
  • IPS-weighted clicks:

        R̂^IPS(d) = (1/τ) Σ_{t=1..τ} c_t(d) / p_t(d)

    is an unbiased estimator of a document's relevance [Joachims et al., 2017].

c_t(d): click on d at time t. p_t(d): position bias at the position of d.
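As a sketch (names are mine), the estimator is a one-liner; a click earned at a poorly examined position is up-weighted by its inverse propensity:

    def ips_relevance(clicks, propensities):
        """Unbiased IPS estimate of a document's average relevance.

        clicks[t]:       1 if the document was clicked by user t, else 0
        propensities[t]: position bias p_t(d) at the rank where the
                         document was shown to user t (must be > 0)
        """
        tau = len(clicks)
        return sum(c / p for c, p in zip(clicks, propensities)) / tau

    # A click at an unfavorable position (p = 0.39) counts for more than
    # a click at the top (p = 1.0), compensating for the missing attention.
    print(ips_relevance([1, 0, 1], [1.0, 0.5, 0.39]))  # ≈ 1.19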

SLIDE 12

Experimental Evaluation

Simulation on the Ad Fontes Media Bias Dataset

  • Each news source in the dataset has an assigned polarity ρ_d ∈ [−1, 1]: it belongs to Gleft if ρ_d < 0 and to Gright if ρ_d ≥ 0.
  • Each sampled user u_t has a polarity ρ_{u_t} ∈ [−1, 1] (left-leaning users prefer Gleft articles, right-leaning users prefer Gright articles) and an openness parameter o_t ∈ (0.05, 0.55).
  • A user's relevance for a news article is a function of the user's polarity, the article's polarity, and the user's openness.

Goal: present rankings to a sequence of users that maximize their utility while providing the news articles fair exposure relative to their average relevance over the user population.
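A sketch of the user model above (the sampling ranges are from the slide; the concrete relevance function below is hypothetical, chosen only to illustrate how polarity distance and openness could interact; the exact function is specified in the paper):

    import numpy as np

    rng = np.random.default_rng(0)

    def sample_user():
        """Draw a user polarity and openness from the ranges on the slide."""
        polarity = rng.uniform(-1.0, 1.0)
        openness = rng.uniform(0.05, 0.55)
        return polarity, openness

    def relevance_prob(user_pol, openness, article_pol):
        """Hypothetical relevance model: interest decays with polarity
        distance, and it decays faster for less open users."""
        return float(np.exp(-((article_pol - user_pol) ** 2)
                            / (2.0 * openness ** 2)))

    user_pol, openness = sample_user()
    print(relevance_prob(user_pol, openness, article_pol=-0.8))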

SLIDE 13

Can FairCo break the Rich-get-richer dynamic?

Effect of the initial ranking after 3000 users.

FairCo keeps unfairness low for any amount of head start, while click-count-based ranking converges to unfair rankings because of the initial bias.

SLIDE 14

Can FairCo ensure fairness for Minority user groups?

FairCo converges to fair rankings for all user distributions; it trades off some utility for fairness when the user distribution is imbalanced.

SLIDE 15

Outline

  • Exposure Model
  • Fairness Notions
  • FairCo Algorithm
  • Unbiased Average Relevance estimation
  • Unbiased Relevance estimation for Personalization


SLIDE 16

D-ULTR: Relevance Estimation for Personalized Ranking

  • To estimate: R̂_w(d | x_t), the relevance of document d for query x_t.
  • R̂_w is the output of a neural network with weights w.
  • Train the network by minimizing ℒ_c(w).
  • ℒ_c(w) is unbiased, i.e. in expectation it equals the full-information squared loss (with no position bias).

c_t(d): click on d at time t. p_t(d): position bias at the position of d.
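The slide does not spell out ℒ_c(w). One standard construction with exactly this unbiasedness property (a sketch, assuming binary relevance r ∈ {0, 1}) uses the identity (r − R̂)² = r · (1 − 2R̂) + R̂² together with E[c_t(d) / p_t(d)] = r:

    import numpy as np

    def unbiased_squared_loss(r_hat, clicks, propensities):
        """IPS-debiased squared loss for binary relevance.

        Since E[click / propensity] = r and, for r in {0, 1},
        (r - r_hat)^2 = r * (1 - 2 * r_hat) + r_hat^2, substituting the
        IPS-weighted click for r gives an estimate whose expectation is
        the full-information squared loss.
        """
        ips = clicks / propensities
        return np.sum(ips * (1.0 - 2.0 * r_hat) + r_hat ** 2)

    # Sanity check: with propensity 1 and clicks equal to true relevance,
    # this is the ordinary squared loss: (1 - 0.9)^2 + (0 - 0.2)^2 = 0.05.
    r_hat = np.array([0.9, 0.2])
    print(unbiased_squared_loss(r_hat,
                                clicks=np.array([1.0, 0.0]),
                                propensities=np.array([1.0, 1.0])))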

SLIDE 17

Evaluation on the MovieLens Dataset

  • Completed a subset of the MovieLens dataset (a 10k × 100 ratings matrix) using matrix factorization.
  • Selected 100 movies from the top-5 production companies in the ML-20M dataset. Groups: MGM, Warner Bros, Paramount, 20th Century Fox, Columbia.
  • Selected the 10k most active users.
  • User features x_t come from this matrix factorization.

Goal: present a ranking to each user u_τ that maximizes NDCG while ensuring the production companies receive a fair share of exposure relative to the average relevance of their movies.

SLIDE 18

Does FairCo ensure fairness with effective personalization?

(Plots: exposure unfairness and impact unfairness over time.) Personalized rankings achieve high utility (NDCG) while reducing unfairness to 0 as τ grows.

SLIDE 19

Conclusions

  • Identified how biased feedback leads to unfairness and suboptimal rankings in dynamic LTR.
  • Proposed FairCo to adaptively enforce amortized fairness constraints while relevances are being learned.
  • FairCo is easy to implement and computationally efficient at serving time.
  • The algorithm breaks the rich-get-richer effect in dynamic LTR.
SLIDE 20

Thank you!

Controlling Fairness and Bias in Dynamic Learning-to-Rank

Marco Morik*†, Ashudeep Singh*‡, Jessica Hong‡, Thorsten Joachims‡

† TU Berlin, ‡ Cornell University

ACM SIGIR 2020