
Modeling filter configuration

Tyler Moore

Computer Science & Engineering Department, SMU, Dallas, TX

CSE 5/7338 Lecture 9

Domain-specific models

Up to now we have modeled security investment at a very high level
  • Map costs to benefits, assume diminishing marginal returns to investment, etc.
  • Useful when justifying security budgets compared to non-security expenditures
  • Not useful for deciding how best to allocate a given security budget

Today, we discuss a model for a tactical security investment decision: configuring a filter to balance false positives and false negatives

Binary classification is a recurring problem in CS

Common task: distill many observations to a binary signal

  • S = {0, 1}: communications theory
  • S = {undervalued, overvalued}: stock trading
  • S = {reject, accept}: research hypothesis
  • S = {benign, malicious}: security filter

Such simplification inevitably leads to errors compared to reality (aka ground truth)

Filter defense mechanism

                  Reality
Signal       no attack    attack
benign       1 − α        β
malicious    α            1 − β

α: false positive rate, β: false negative rate
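The two error rates in the table can be estimated from labeled observations. The following is a minimal sketch, assuming ground truth and filter output are available as parallel lists of booleans; the function name `error_rates` and the sample data are illustrative and do not come from the lecture.

```python
def error_rates(reality, signal):
    """Estimate (alpha, beta) from parallel boolean sequences.

    reality[i] is True when observation i really is an attack.
    signal[i] is True when the filter flagged observation i as malicious.
    """
    pairs = list(zip(reality, signal))
    no_attack = sum(1 for r, _ in pairs if not r)
    attack = sum(1 for r, _ in pairs if r)
    if no_attack == 0 or attack == 0:
        raise ValueError("need at least one attack and one non-attack observation")

    false_positives = sum(1 for r, s in pairs if not r and s)
    false_negatives = sum(1 for r, s in pairs if r and not s)

    alpha = false_positives / no_attack   # P(signal = malicious | no attack)
    beta = false_negatives / attack       # P(signal = benign | attack)
    return alpha, beta


# Hypothetical sample: 4 benign items (1 wrongly flagged), 5 attacks (1 missed).
reality = [False, False, False, False, True, True, True, True, True]
signal = [False, True, False, False, True, True, True, True, False]
alpha, beta = error_rates(reality, signal)
print(alpha, beta)  # 0.25 0.2
```

Each rate is conditioned on reality, matching the columns of the table: each column sums to 1.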

Receiver operating characteristic

[Figure: ROC curves plotting detection rate 1 − β (vertical axis, 0 to 1) against false positive rate α (horizontal axis, 0 to 1), with the 45° diagonal for reference. The line α = β crosses the solid and dashed ROC curves at their equal error rates, EER_solid and EER_dashed.]

Model for optimal filter configuration

  • Binary classifiers are imperfect
  • Finding the optimal trade-off, say for an IDS or spam filter, is hard
  • Can be framed as an economic trade-off between the opportunity cost of false positives and the losses incurred by false negatives

Model for optimal filter configuration

We can see from ROCs that β can be expressed as a function of α.

  • β : [0, 1] → [0, 1] defines the false negative rate as a function of the false positive rate α
  • β(0) = 1, β(1) = 0
  • We assume β′(x) < 0 and β′′(x) ≥ 0

Model for optimal filter configuration

Suppose we rely on a filter to scan incoming email attachments for malware

  • a: cost of false positive (blocking a benign email)
  • b: cost of false negative (delivering malicious email)
  • p: probability of email containing malware

Cost C(α) = p · β(α) · b + (1 − p) · α · a

Suppose p = 0.1, a = $250, b = $500, α = 0.1, β = 0.2

C(α) = 0.1 · 0.2 · 500 + 0.9 · 0.1 · 250 = $32.50
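The cost formula translates directly into a few lines of code. This is a minimal sketch of the slide's expression; the function name `expected_cost` is an illustrative choice, and β is passed in as a number rather than as a function of α.

```python
def expected_cost(p, alpha, beta, a, b):
    """Expected cost per email: C = p * beta * b + (1 - p) * alpha * a.

    p     -- probability that an email contains malware
    alpha -- false positive rate, beta -- false negative rate
    a     -- cost of a false positive, b -- cost of a false negative
    """
    missed_malware = p * beta * b          # malicious email delivered
    blocked_benign = (1 - p) * alpha * a   # benign email blocked
    return missed_malware + blocked_benign


# The worked example from the slide.
cost = expected_cost(p=0.1, alpha=0.1, beta=0.2, a=250, b=500)
print(f"${cost:.2f}")  # $32.50
```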

Optimal filter configuration: exercise 1

Suppose we rely on a filter to scan incoming email attachments for malware. Suppose the cost of dealing with a false negative event is $400, and the cost of dealing with a false positive is $200. 20% of incoming email has malware. You can choose between two configurations

  • Config. A: 10% false positive rate and 30% false negative rate
  • Config. B: 25% false positive rate and 15% false negative rate

Your task: compute the expected costs for both configurations, and state which configuration you prefer.
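One way to check a hand calculation for this exercise is to evaluate the cost formula C = p · β · b + (1 − p) · α · a for each configuration and pick the cheaper one. The sketch below is only an illustration; the helper name `compare_configurations` and the dictionary layout are assumptions, not part of the lecture.

```python
def expected_cost(p, alpha, beta, a, b):
    """Expected cost per email: p * beta * b + (1 - p) * alpha * a."""
    return p * beta * b + (1 - p) * alpha * a


def compare_configurations(configs, p, a, b):
    """configs maps a name to (alpha, beta). Returns (costs, cheapest name)."""
    costs = {
        name: expected_cost(p, alpha, beta, a, b)
        for name, (alpha, beta) in configs.items()
    }
    best = min(costs, key=costs.get)
    return costs, best


# Exercise inputs: (false positive rate, false negative rate) per configuration.
configs = {"A": (0.10, 0.30), "B": (0.25, 0.15)}
costs, best = compare_configurations(configs, p=0.20, a=200, b=400)

for name, cost in costs.items():
    print(f"Config {name}: expected cost ${cost:.2f} per email")
print("Preferred:", best)
```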

Model for optimal filter configuration

α∗ = arg min_α [ p · β(α) · b + (1 − p) · α · a ]

which has first-order condition (FOC)

0 = ∂/∂α [ p · β(α) · b + (1 − p) · α · a ], evaluated at α = α∗

After rearranging, we obtain:

β′(α∗) = −((1 − p) / p) · (a / b)
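The first-order condition can be checked numerically. The sketch below assumes a hypothetical curve β(α) = (1 − α)², chosen only because it satisfies β(0) = 1, β(1) = 0, β′ < 0 for α < 1, and β′′ ≥ 0. The parameter values are also illustrative. It solves the FOC in closed form for this curve and compares the result with a brute-force grid search over the cost function.

```python
def beta(alpha):
    """Hypothetical false negative rate curve (an assumption, not from the lecture)."""
    return (1.0 - alpha) ** 2


def expected_cost(alpha, p, a, b):
    return p * beta(alpha) * b + (1.0 - p) * alpha * a


def optimal_alpha_foc(p, a, b):
    """Solve beta'(alpha*) = -(1 - p) * a / (p * b) for beta(alpha) = (1 - alpha)^2.

    Here beta'(alpha) = -2 * (1 - alpha), so alpha* = 1 - slope / 2.
    If that falls below 0, the cost is increasing on [0, 1] and the
    optimum is the corner alpha* = 0.
    """
    slope = (1.0 - p) * a / (p * b)
    return max(0.0, 1.0 - slope / 2.0)


def optimal_alpha_grid(p, a, b, steps=10_000):
    """Brute-force minimizer over an evenly spaced grid on [0, 1]."""
    grid = (i / steps for i in range(steps + 1))
    return min(grid, key=lambda alpha: expected_cost(alpha, p, a, b))


# Illustrative parameters: (1 - p) * a / (p * b) = 0.2.
p, a, b = 0.5, 100.0, 500.0
print(f"{optimal_alpha_foc(p, a, b):.3f}")   # 0.900
print(f"{optimal_alpha_grid(p, a, b):.3f}")  # 0.900
```

Because the ROC curve plots 1 − β against α, its slope is −β′(α), so the FOC says the ROC slope at the optimum equals (1 − p)a / (p · b).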

Optimal filter configuration (continuous ROC curves)

[Figure: two continuous ROC curves, A and B, plotting detection rate 1 − β (vertical axis, 0 to 1) against false positive rate α (horizontal axis, 0 to 1), with the 45° diagonal and the line α = β. The two curves have equal error rates and equal areas under the curve: EER_A = EER_B and AUC_A = AUC_B. Indifference curves with slope (1 − p)a / (p · b) touch the ROC curves at the optimal points α∗_A and α∗_B.]
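The continuous-ROC figure compares two curves with the same area under the curve (AUC). As a rough illustration of how AUC is computed, the sketch below applies the trapezoidal rule to piecewise-linear curves. The two curves are hypothetical, built as mirror images of each other about the line α = β so that their areas match; they are not the curves from the slide.

```python
def auc(roc_points):
    """Area under a piecewise-linear ROC curve via the trapezoidal rule.

    roc_points is a list of (alpha, detection_rate) pairs that includes
    the endpoints (0, 0) and (1, 1).
    """
    points = sorted(roc_points)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical curves: B is A reflected about the line alpha = beta,
# i.e. (alpha, d) -> (1 - d, 1 - alpha).
curve_a = [(0.0, 0.0), (0.2, 0.6), (1.0, 1.0)]
curve_b = [(0.0, 0.0), (0.4, 0.8), (1.0, 1.0)]

print(f"{auc(curve_a):.2f}")  # 0.70
print(f"{auc(curve_b):.2f}")  # 0.70
```

Two curves can share a summary statistic such as AUC and still differ in shape, which is why the optimal operating point depends on the indifference-curve slope and not on AUC alone.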

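Optimal filter configuration (discrete ROC curves)

[Figure: piecewise-linear (discrete) ROC curve through configurations labeled C, D, E, and F, plotting detection rate 1 − β (vertical axis, 0 to 1) against false positive rate α (horizontal axis, 0 to 1), with the 45° diagonal. An indifference curve with slope (1 − p)a / (p · b) touches the ROC curve at the optimal configuration, marked α∗.]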

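Optimal filter configuration example (discrete ROC curves)

[Figure: discrete ROC curve (detection rate 1 − β against false positive rate α) through the points (0, 0), (0.2, 0.4), (0.7, 0.9), and (1, 1), with configurations labeled C, D, E, and F and an indifference curve of slope (1 − p)a / (p · b).]

Segment slopes:

  • (0, 0) to (0.2, 0.4): run 0.2, rise 0.4, slope 2
  • (0.2, 0.4) to (0.7, 0.9): run 0.5, rise 0.5, slope 1
  • (0.7, 0.9) to (1, 1): run 0.3, rise 0.1, slope 1/3

α∗ = 0.2 if 1 ≤ (1 − p)a / (p · b) ≤ 2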
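For a discrete ROC curve the optimum sits at a vertex, so it can be found by evaluating the cost C = p · β · b + (1 − p) · α · a at each vertex. The sketch below uses the vertices read off the example figure, (0, 0), (0.2, 0.4), (0.7, 0.9), and (1, 1). The parameter values p, a, and b are assumptions chosen so that (1 − p)a / (p · b) = 1.5, which lies between the slopes 1 and 2; the function names are illustrative.

```python
def segment_slopes(roc_points):
    """Slopes of consecutive segments of a piecewise-linear ROC curve."""
    points = sorted(roc_points)
    return [(y1 - y0) / (x1 - x0) for (x0, y0), (x1, y1) in zip(points, points[1:])]


def optimal_vertex(roc_points, p, a, b):
    """Return the (alpha, detection_rate) vertex with the lowest expected cost."""
    def cost(point):
        alpha, detection_rate = point
        beta = 1.0 - detection_rate
        return p * beta * b + (1.0 - p) * alpha * a
    return min(roc_points, key=cost)


roc = [(0.0, 0.0), (0.2, 0.4), (0.7, 0.9), (1.0, 1.0)]

# Assumed parameters: indifference slope (1 - p) * a / (p * b) = 1.5.
p, a, b = 0.2, 150.0, 400.0

print([round(s, 3) for s in segment_slopes(roc)])  # [2.0, 1.0, 0.333]
print(optimal_vertex(roc, p, a, b))                # (0.2, 0.4)
```

The chosen vertex is the one whose adjacent segment slopes bracket the indifference slope, which agrees with the condition stated on the slide.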

Optimal filter configuration: exercise 2

Suppose we rely on a filter to scan incoming email attachments for malware. Suppose the cost of dealing with a false negative event is $400, and the cost of dealing with a false positive is $200. 20% of incoming email has malware. You can choose between two configurations

  • Config. A: 10% false positive rate and 30% false negative rate
  • Config. B: 25% false positive rate and 15% false negative rate

Your task:

1. Draw the ROC curve for configurations A and B (plus (0% FP, 100% FN) and (100% FP, 0% FN))
2. Calculate the slope of the indifference curve for the optimal configuration
3. Select the optimal point on the ROC curve
