Promoting Fairness in Learned Models by Learning to Active Learn under Parity Constraints - PowerPoint PPT Presentation




SLIDE 1

Promoting Fairness in Learned Models by Learning to Active Learn under Parity Constraints

Amr Sharaf, University of Maryland, amr@cs.umd.edu
Hal Daumé III, University of Maryland & Microsoft Research, me@hal3.name

SLIDE 2

Can we learn to active learn under fairness parity constraints?

SLIDE 3

PANDA Test Time Behavior

Pre-existing data D = (U, ·) (unlabeled pool)

U

SLIDE 4

PANDA Test Time Behavior

Pre-existing data D = (U, ·) (unlabeled pool)

Transformer Selection Policy π

Feed Forward Decoder

U

SLIDE 5

PANDA Test Time Behavior

Pre-existing data D = (U, ·) (unlabeled pool)

Transformer Selection Policy π

Feed Forward Decoder

Distribution Q over U, Y

U

Q = π(h0, D); add Gumbel(0,1) noise → B sampled items
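The item-selection step is the Gumbel-top-B trick: perturb the log-probabilities of the policy distribution Q with Gumbel(0,1) noise and keep the B largest scores, which draws B distinct items from Q without replacement. A minimal sketch, where the function name and toy distribution are illustrative stand-ins, not from the slides:

```python
import numpy as np

def gumbel_top_b(q, b, rng):
    """Draw b distinct indices from the distribution q via the Gumbel-top-b trick."""
    noise = rng.gumbel(loc=0.0, scale=1.0, size=len(q))  # Gumbel(0, 1) noise
    scores = np.log(q) + noise                           # perturbed log-probabilities
    return np.argsort(scores)[-b:]                       # indices of the b largest scores

rng = np.random.default_rng(0)
q = np.array([0.1, 0.2, 0.3, 0.4])     # stand-in for Q = pi(h0, D) over a 4-item pool
batch = gumbel_top_b(q, b=2, rng=rng)  # two distinct pool indices
```

Because the noise is added independently per item, `argsort` never picks the same index twice, so the B selected items are always distinct.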

SLIDE 6

PANDA Test Time Behavior

Pre-existing data D = (U, ·) (unlabeled pool)

Transformer Selection Policy π

Feed Forward Decoder

Distribution Q over U, Y

U

Q = π(h0, D); add Gumbel(0,1) noise → B sampled items

Train Classifier hB = A(DB)
  • B samples

SLIDE 7

PANDA Test Time Behavior

Pre-existing data D = (U, ·) (unlabeled pool)

Transformer Selection Policy π

Feed Forward Decoder

Distribution Q over U, Y

U

Q = π(h0, D); add Gumbel(0,1) noise → B sampled items

Train Classifier hB = A(DB)
  • B samples

Evaluate Meta-Loss
  • held-out data V
  • accuracy / parity: 𝔼_V ℓ(hB) / Δ_V(hB)
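The parity half of the meta-loss, Δ_V(hB), can be instantiated as a demographic-parity gap on the held-out set V: the absolute difference in positive-prediction rates between two protected groups. A hedged sketch; the helper name is illustrative and the paper considers other parity notions as well:

```python
import numpy as np

def demographic_disparity(y_pred, group):
    """|P(h(x)=1 | group=0) - P(h(x)=1 | group=1)| over a held-out set."""
    y_pred, group = np.asarray(y_pred), np.asarray(group)
    rate0 = y_pred[group == 0].mean()  # positive-prediction rate, group 0
    rate1 = y_pred[group == 1].mean()  # positive-prediction rate, group 1
    return abs(rate0 - rate1)

preds  = [1, 0, 1, 1, 0, 0]   # classifier predictions on V
groups = [0, 0, 0, 1, 1, 1]   # protected-group membership
gap = demographic_disparity(preds, groups)  # |2/3 - 1/3| = 1/3
```

A gap of 0 means both groups receive positive predictions at the same rate; the meta-loss trades this term off against held-out accuracy.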

SLIDE 8

Goal: can we manage an efficacy vs annotation cost trade-off under a target parity constraint?

SLIDE 9

PANDA Train Time Behavior

Transformer Selection Policy π

Feed Forward Decoder

Distribution Q over U, Y

U

Q = π(h0, D); add Gumbel(0,1) noise → B sampled items

Train Classifier hB = A(DB)
  • B samples

Evaluate Meta-Loss
  • held-out data V
  • accuracy / parity: 𝔼_V ℓ(hB) / Δ_V(hB)

SLIDE 10

PANDA Train Time Behavior

Pre-existing data D = (U, Y)

Transformer Selection Policy π

Feed Forward Decoder

Distribution Q over U, Y

U Y

Q = π(h0, D); add Gumbel(0,1) noise → B sampled items

Train Classifier hB = A(DB)
  • B samples

Evaluate Meta-Loss
  • held-out data V
  • accuracy / parity: 𝔼_V ℓ(hB) / Δ_V(hB)

SLIDE 11

PANDA Train Time Behavior

Pre-existing data D = (U, Y)

Transformer Selection Policy π

Feed Forward Decoder

Distribution Q over U, Y

U Y

Q = π(h0, D); add Gumbel(0,1) noise → B sampled items

Train Classifier hB = A(DB)
  • B samples

Evaluate Meta-Loss
  • held-out data V
  • accuracy / parity: 𝔼_V ℓ(hB) / Δ_V(hB)

Compute gradients w.r.t. the parameters of π; update π to minimize the performance/parity loss

SLIDE 12

Experimental Results

Compared methods: Random Sampling, Fairlearn, PANDA, Fair Active Learning, Entropy Sampling, Group Aware Random Sampling

SLIDE 13

Experimental Results

[Figure: F-Score vs Budget (F-score 0.3–0.7) and Demographic Disparity vs Budget (disparity 0.035–0.14) for different active learning algorithms, budgets 100–400]

Compared methods: Random Sampling, Fairlearn, PANDA, Fair Active Learning, Entropy Sampling, Group Aware Random Sampling

SLIDE 14

Experimental Results

[Figure: Demographic Disparity vs F-Score (disparity 0.025–0.1) and Error Rate Balance vs F-Score (balance 0.025–0.4), F-Score 0.45–0.6]

Compared methods: Random Sampling, Fairlearn, PANDA, Fair Active Learning, Entropy Sampling, Group Aware

SLIDE 15

Conclusion

  • Q: Can we learn to active learn under fairness parity constraints?
  • A: Yes, using meta-learning + forward-backward splitting.
  • We compare against alternative active learning strategies.
  • PANDA outperforms the alternative strategies in most settings.
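The "meta-learning + forward-backward splitting" recipe alternates a gradient (forward) step on the smooth meta-loss with a proximal (backward) step that enforces the constraint. A toy one-dimensional illustration, where the objective and feasible interval are made up for the example and the prox reduces to a projection:

```python
import numpy as np

def fbs_step(theta, grad, prox, lr):
    """One forward-backward splitting step: gradient step, then proximal step."""
    return prox(theta - lr * grad(theta))

# Toy problem: minimize 0.5 * (theta - 3)^2 subject to theta in [-1, 1].
grad = lambda t: t - 3.0                       # gradient of the smooth objective
prox = lambda t: float(np.clip(t, -1.0, 1.0))  # prox of the constraint = projection

theta = 0.0
for _ in range(50):
    theta = fbs_step(theta, grad, prox, lr=0.5)
# theta converges to the constrained optimum, theta = 1.0
```

The unconstrained minimizer (theta = 3) is infeasible, so the iterates settle on the boundary of the feasible set; in PANDA's setting the same split separates the performance loss from the parity constraint on π.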

SLIDE 16

Questions? amr@cs.umd.edu
