Exploiting Transport-Level Characteristics of Spam

SLIDE 1

Exploiting Transport-Level Characteristics of Spam

Robert Beverly (1), Karen Sollins

MIT Computer Science and Artificial Intelligence Laboratory

(1) Now at BBN Technologies

{rbeverly,sollins}@csail.mit.edu
August 21, 2008
Conference on Email and Anti-Spam (CEAS) 2008

  • R. Beverly, K. Sollins (MIT)

Transport Character of Spam CEAS 2008 1 / 46

SLIDE 2

Background The Character of Spam

Outline

1. Background
2. Experimental Methodology
3. Learning and Prediction
4. Open Questions

SLIDE 3

Background The Character of Spam

The Spam Arms Race

Attackers, scammers and thieves quickly adapt to defenses. The most effective solutions exploit fundamental weaknesses of attackers.

Current Best Practices:

  • Content Filtering ... response: modify word tokens
  • Reputation Analysis ... response: dynamic, fresh addresses
  • Collaborative Filtering ... response: mail uniqueness

And the cycle continues: authentication schemes, computational puzzles, etc.

SLIDE 5

Background The Character of Spam

The Spam Arms Race

We propose a different approach:

  • No panacea; existing solutions all have weaknesses
  • Our solution, “SpamFlow,” is distinct from current practice

Question: Are traffic characteristics a fundamental weakness of spam?

SLIDE 6

Background The Character of Spam

Hypothetical Question

Specifically:

  • What is the transport (TCP/IP packet stream) character of spam?
  • Are there differences between spam and ham flows?
  • How can those differences be exploited in a way that spammers cannot easily evade?

Why ask this question?

SLIDE 8

Background The Character of Spam

Transport-Level Characteristics of Spam

Two Observations

1. Low Penetration: due to existing filters and user ambivalence → huge volumes of spam
2. Sending Methods: open mail relays, email trojans, botnets, dialup → low, asymmetric bandwidth, widely distributed

SLIDE 9

Background The Character of Spam

Transport-Level Characteristics of Spam

Combining Observations: Low Penetration + Sending Methods

Volume + Methods + Economics → link/host resource contention

[Figure: a BOT behind an aDSL link sends to many MX hosts; the path is labeled Congestion/Loss/Reordering]

Contention manifests as TCP/IP loss, retransmission, reordering, etc.

SLIDE 10

Background The Character of Spam

Understanding SpamFlow

[Figure: packet layers (IP, TCP, SMTP data) bracketed by the technique that inspects each: Reputation Analysis, SpamFlow, Content Filtering]

  • Not looking at the IP header
  • Not looking at the data
  • SpamFlow: TCP stream, including timing (combining methods is considered later)

SLIDE 12

Background TCP and SMTP Transport

A Brief Diversion on TCP/IP

Transmission Control Protocol (TCP):

  • Reliable, bi-directional, in-order byte transmission abstraction
      • Acknowledgments
      • State machine
  • Flow and congestion control
      • Reacts to loss, persistent congestion
  • Multi-flow fairness and efficient resource utilization (AIMD)
      • Round trip time (RTT) estimation
      • Bandwidth probing
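Since RTT becomes a central feature later in the talk, it may help to see how a TCP sender typically tracks it. The following is a minimal sketch of the smoothed-RTT update from RFC 6298, not code from SpamFlow. The class name is hypothetical, and the sketch leaves out the clock-granularity term and the one-second floor on the timeout.

```python
from dataclasses import dataclass
from typing import Optional

ALPHA = 1 / 8  # gain for the smoothed RTT (RFC 6298)
BETA = 1 / 4   # gain for the RTT variation (RFC 6298)


@dataclass
class RttEstimator:  # hypothetical name, for illustration only
    srtt: Optional[float] = None    # smoothed RTT in seconds
    rttvar: Optional[float] = None  # RTT variation in seconds

    def update(self, sample: float) -> float:
        """Fold in one RTT sample and return the retransmission timeout."""
        if self.srtt is None:
            # First measurement initializes both estimates.
            self.srtt = sample
            self.rttvar = sample / 2
        else:
            # RTTVAR is updated first, using the previous SRTT.
            self.rttvar = (1 - BETA) * self.rttvar + BETA * abs(self.srtt - sample)
            self.srtt = (1 - ALPHA) * self.srtt + ALPHA * sample
        return self.srtt + 4 * self.rttvar


est = RttEstimator()
first_rto = est.update(0.100)   # a 100 ms sample
second_rto = est.update(0.300)  # a later 300 ms sample inflates the estimate
```

A congested sender, such as the bot behind an aDSL link described earlier, would be expected to show large and variable samples, which is what drives both estimates upward.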

SLIDE 13

Background TCP and SMTP Transport

SMTP and TCP

[Figure: SMTP dialogue over a TCP connection between mx.alice.com and mx.bob.com. Messages shown: EHLO mx.alice.com; MAIL FROM: alice@alice.com; DATA:; replies "200 Hello Alice" and "200 OK"]

  • Simple Mail Transfer Protocol (SMTP) uses TCP for transport
  • Sequence of SMTP handshaking between Mail Transport Agents (MTAs)
  • Mail contents are packetized

How do Spam Connections Behave?

SLIDE 16

Background Building intuition

How do Spam Connections Behave?

...or, a quick look at netstat

RcvQ  SndQ  Local   Foreign Addr              State
            srv:25  92.47.129.89:49014        SYN_RECV
            srv:25  ppp83-237-106-114.:29081  SYN_RECV
            srv:25  88.200.227.123:25068      SYN_RECV
            srv:25  92.47.129.89:49014        SYN_RECV
            srv:25  ppp83-237-106-114.:29084  SYN_RECV
            srv:25  88.200.227.123:25068      SYN_RECV
            srv:25  88.200.227.123:25069      SYN_RECV
            srv:25  88.200.227.123:25070      SYN_RECV
            srv:25  88.200.227.123:25074      SYN_RECV
            srv:25  84.255.150.15:4232        SYN_RECV
      25    srv:25  222.123.147.41:50282      LAST_ACK
      28    srv:25  adsl-pool-222.123.:1720   LAST_ACK
      31    srv:25  222.123.147.41:50152      LAST_ACK
      15    srv:25  222.123.147.41:50889      LAST_ACK
      9     srv:25  88.245.3.19:venus         LAST_ACK
      25    srv:25  78.184.155.70:1854        FIN_WAIT1
      23    srv:25  190-48-30-225.spe:50920   FIN_WAIT1
      23    srv:25  dsl.dynamic812132:48154   FIN_WAIT1
      23    srv:25  ip-85-160-91-16.e:48093   FIN_WAIT1
      23    srv:25  88.234.141.158:48389      FIN_WAIT1
      23    srv:25  p5B0FBB5D.dip.t-d:11965   FIN_WAIT1
...

TCP stuck in states:

  • Stays in these states for minutes
  • Half-open connections
  • Remote MTAs that “disappear” mid-connection
  • Remote MTAs that send FIN and disappear
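The "quick look" above can be made slightly more systematic by tallying connections per TCP state. This is only an illustrative sketch: the function name is hypothetical, and it assumes the state is the last whitespace-separated column of each row and is written in upper case, as in the listing on this slide.

```python
from collections import Counter


def count_states(netstat_lines):
    """Count connections per TCP state; the state is taken from the last column."""
    states = Counter()
    for line in netstat_lines:
        fields = line.split()
        # Skip the header row and the trailing "..." marker.
        if len(fields) >= 3 and fields[-1].isupper():
            states[fields[-1]] += 1
    return states


sample = [
    "RcvQ SndQ Local Foreign Addr State",
    "srv:25 92.47.129.89:49014 SYN_RECV",
    "srv:25 88.200.227.123:25068 SYN_RECV",
    "25 srv:25 222.123.147.41:50282 LAST_ACK",
    "23 srv:25 88.234.141.158:48389 FIN_WAIT1",
    "...",
]
state_counts = count_states(sample)
```

A mail server whose port-25 connections pile up in SYN_RECV, LAST_ACK and FIN_WAIT1 for minutes is showing the symptom this slide points at.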

SLIDE 17

Background Building intuition

What about RTT?

...building more intuition

Ham:

    Received: from vms044pub.verizon.net
    From: "Dr. Beverly, MD" <b@ex.com>
    Subject: thoughts

    Dear Robert, I hope you have had a great week!

Spam:

    Received: from unknown (59.9.86.75)
    From: Erich Shoemaker <ried@ex.com>
    Subject: Repl1ca for you

    A T4g Heuer w4tch is a luxury statement on its own. In Prest1ge Repl1cas, any T4g Heuer...

[Figure: RTT samples (ms) over time for each flow. Ham flow: RTT axis spans 50 to 54 ms, over well under one second around 09:51:49. Spam flow: RTT axis spans 600 to 1400 ms, from 23:15:00 to 23:16:40.]

SLIDE 18

Experimental Methodology

Data Collection

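  • Instrument a Mail Transport Agent (MTA) server
  • Collect SMTP packet trace
  • Match labeled emails to packet flows

[Figure: data collection pipeline. The server's MTA receives mail over TCP/IP; an SMTP packet capture yields flows; mail is labeled spam/ham; labels are matched to flows to form the dataset (X, Y)]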
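The talk does not say how emails are matched to flows. One plausible approach, sketched here purely as an assumption, is to read the remote IP address and port that the receiving MTA records in the Received header (the format seen in the example email later in the deck) and look that endpoint up in a table of captured flows. All function and variable names are hypothetical, and a real matcher would also need timestamps, since ports are reused.

```python
import re

# Matches "(a.b.c.d:port)" as written by the receiving MTA in the example header.
REMOTE = re.compile(r"\((\d{1,3}(?:\.\d{1,3}){3}):(\d+)\)")


def remote_endpoint(received_header):
    """Return (ip, port) of the sending host, or None if absent."""
    match = REMOTE.search(received_header)
    return (match.group(1), int(match.group(2))) if match else None


def build_dataset(emails, flows):
    """emails: iterable of (received_header, label).
    flows: dict mapping (ip, port) to a list of flow features.
    Returns feature rows X and labels Y for the emails that matched a flow."""
    X, Y = [], []
    for header, label in emails:
        key = remote_endpoint(header)
        if key in flows:
            X.append(flows[key])
            Y.append(label)
    return X, Y


header = ("from 201-213-46-215.net.prima.net.ar (201.213.46.215:8963) "
          "by ralph.rbeverly.net with SMTP; 24 Jan 2008 05:14:58 -0000")
flows = {("201.213.46.215", 8963): [0.162413, 65280]}  # made-up feature row
X, Y = build_dataset([(header, "spam"), ("from localhost", "ham")], flows)
```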

SLIDE 20

Experimental Methodology Using a flow property

Round Trip Time

[Figure: cumulative probability of RTT (sec, log scale from 0.0001 to 10) for spam and ham flows]

P(rtt < 100 ms | ham) ≈ 1, while P(rtt < 100 ms | spam) ≈ 0.2!
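The two probabilities quoted on this slide are read off an empirical CDF. A minimal sketch of that computation follows; the function name is hypothetical and the RTT samples are made up for illustration, not taken from the talk's dataset.

```python
from bisect import bisect_left


def fraction_below(samples, x):
    """Empirical P(value < x): the share of samples strictly below x."""
    ordered = sorted(samples)
    return bisect_left(ordered, x) / len(ordered)


# Made-up per-flow RTTs in seconds.
ham_rtts = [0.010, 0.020, 0.050, 0.080]
spam_rtts = [0.050, 0.200, 0.400, 0.900, 1.200]

p_ham_fast = fraction_below(ham_rtts, 0.100)
p_spam_fast = fraction_below(spam_rtts, 0.100)
```

Evaluating `fraction_below` over a grid of thresholds gives the full curve for each class.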

SLIDE 21

Experimental Methodology Using a flow property

Round Trip Time

cont’d

Bayes’ Rule: use causal information to form a diagnosis

\[
P(\text{spam} \mid \text{rtt} > x) = \frac{P(\text{rtt} > x \mid \text{spam})\, P(\text{spam})}{P(\text{rtt} > x)} \tag{1}
\]
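As a worked instance of Bayes' rule, the sketch below expands the denominator over both classes. The inputs are assumptions chosen for illustration: P(rtt > 100 ms | spam) = 0.8 follows from the roughly 0.2 quoted on the RTT slide, while the ham likelihood of 0.05 and the 0.5 prior are hypothetical.

```python
def posterior_spam(p_evidence_given_spam, p_evidence_given_ham, p_spam):
    """P(spam | evidence) by Bayes' rule with a two-class denominator."""
    p_ham = 1.0 - p_spam
    p_evidence = p_evidence_given_spam * p_spam + p_evidence_given_ham * p_ham
    return p_evidence_given_spam * p_spam / p_evidence


# Hypothetical inputs: balanced prior, and 5% of ham flows above 100 ms.
p = posterior_spam(p_evidence_given_spam=0.8, p_evidence_given_ham=0.05, p_spam=0.5)
```

Under these assumed numbers the posterior is 0.4 / 0.425, about 0.94, so a slow flow is strong but not conclusive evidence of spam.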

SLIDE 22

Experimental Methodology Using a flow property

Round Trip Time

cont’d

[Figure: P(spam | rtt < x) and P(ham | rtt < x) versus RTT threshold x (log scale from 0.001 to 10)]

Neutral between 20 and 100 ms; highly biased otherwise
SLIDE 23

Experimental Methodology Using a flow property

Selecting Features

Wait! You’re disenfranchising distant servers!

  • Yes; may be a good thing
  • ≃ 5% > 1 s (RTT)
  • More importantly...

Other Transport “Features:”

  • Packets, Retransmits, OutOfOrder, RSTs, FINs
  • Zero Window, Minimum Cong. Window, Max Idle, Jitter, etc.
  • Adaptable per-user, per-network

Key Insight: Statistical flow properties can provide differentiation

SLIDE 26

Experimental Methodology Non-Features

Non-Features

  • Many intuitively “good” features turn out not to be discriminative
  • Strength of statistical approach

One Example in Detail:

  • RSTs as abortive close on socket
  • A good indication of misbehaving flows?

SLIDE 27

Experimental Methodology Non-Features

Non-Features

Example: Received RSTs

[Figure: cumulative probability of received RSTs per flow (packet count, ticks 1 to 3), spam vs. ham]

  • Only ∼ 50% of ham flows sent no RSTs!
  • ∼ 30% of ham flows sent two RSTs! (see tech report for why)

SLIDE 29

Experimental Methodology Feature Selection

Picking Features

So, which features provide discrimination?

  • Feature selection
  • Simple method is forward fitting
  • Greedily choose one available feature to minimize training error

[Figure: forward fitting. Feature selection on the training set (X, y) yields parameters θ, which are then used to predict the unknown labels of the test set]
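Forward fitting as described on this slide can be sketched in a few lines: at each step, add whichever unused feature gives the lowest training error. The talk trains SVMs; to keep the example self-contained, a nearest-centroid classifier is swapped in as a stand-in, and all names and data below are hypothetical.

```python
def centroid(rows, cols):
    """Mean of the selected columns over a list of rows."""
    return [sum(r[c] for r in rows) / len(rows) for c in cols]


def training_error(X, y, cols):
    """Training error of a nearest-centroid classifier restricted to cols."""
    pos = centroid([x for x, t in zip(X, y) if t == 1], cols)
    neg = centroid([x for x, t in zip(X, y) if t == -1], cols)
    errors = 0
    for x, t in zip(X, y):
        d_pos = sum((x[c] - p) ** 2 for c, p in zip(cols, pos))
        d_neg = sum((x[c] - n) ** 2 for c, n in zip(cols, neg))
        predicted = 1 if d_pos < d_neg else -1
        errors += predicted != t
    return errors / len(X)


def forward_select(X, y, k):
    """Greedily pick k feature indices, each minimizing training error."""
    chosen, remaining = [], list(range(len(X[0])))
    while remaining and len(chosen) < k:
        best = min(remaining, key=lambda c: training_error(X, y, chosen + [c]))
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Made-up rows: column 0 behaves like RTT (seconds), column 1 is noise.
X = [[0.9, 5], [1.1, 1], [0.05, 4], [0.03, 2]]
y = [1, 1, -1, -1]  # 1 = spam, -1 = ham
order = forward_select(X, y, 2)
```

Repeating the selection over many random training sets and recording the position at which each feature is chosen would give a selection-order distribution per feature.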

SLIDE 30

Experimental Methodology Feature Selection

Picking Features

cont’d

[Figure: PDF of selection order (1 to 13) per feature, showing CwndMin, RecvRxmit and RTT]

80% chance that RTT or CwndMin is best single feature

SLIDE 31

Experimental Methodology Feature Selection

Features

cont’d

[Figure: PDF of selection order (1 to 13) per feature, adding RecvFIN, SentFIN, SentRxmit, SentRST and Cwnd0 to CwndMin, RecvRxmit and RTT]

Less discriminatory secondary features

SLIDE 32

Experimental Methodology Feature Selection

Features

cont’d

[Figure: PDF of selection order (1 to 13) per feature, now also including RecvPkt, SentPkt, RecvRST and MaxIdle]

SLIDE 34

Learning and Prediction SpamFlow

SpamFlow

Based on observations, build a model:

  • Supervised learning, binary classification
  • E.g. Bayes Nets, Support Vector Machines, etc.

SpamFlow: a working implementation of the ideas using SVMs

Evaluation:

  • FP = ham marked as spam
  • FN = spam marked as ham

\[
\text{accuracy} = \frac{TP + TN}{P + N}, \qquad \text{precision} = \frac{TP}{TP + FP}
\]
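The evaluation metrics can be computed directly from true and predicted labels, treating spam as the positive class. This is a small sketch with a hypothetical function name and made-up labels; recall, which the results slides also report, is included using its standard definition TP / (TP + FN).

```python
def evaluate(y_true, y_pred, positive="spam"):
    """Return accuracy, precision and recall for a binary labeling."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)  # ham marked as spam
    fn = sum(t == positive and p != positive for t, p in pairs)  # spam marked as ham
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


truth = ["spam", "spam", "spam", "ham", "ham"]
guess = ["spam", "spam", "ham", "spam", "ham"]
scores = evaluate(truth, guess)
```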

SLIDE 35

Learning and Prediction SpamFlow

Prediction Performance

[Figure: test classification performance (accuracy, precision, recall; axis from 0.3 to 1) versus training size (10 to 70 emails)]

  • Multiple independent experiments
  • Over ∼ 90% accuracy, precision and recall
  • Tight bounds

SLIDE 36

Learning and Prediction SpamFlow

SpamAssassin False Negatives

False Negatives:

  • Against our data set, SpamAssassin gives 127 false negatives
  • SpamFlow detects 78% of those → useful to combine methods!

For example...

SLIDE 37

Learning and Prediction SpamFlow

SpamAssassin False Negatives

Received: (qmail 12851 invoked from network); 24 Jan 2008 05:14:58 -0000
Received: from 201-213-46-215.net.prima.net.ar (201.213.46.215:8963)
    by ralph.rbeverly.net with SMTP; 24 Jan 2008 05:14:58 -0000
Received: from unknown (HELO deviant) (192.168.0.5)
    by mail6.colossal.com with SMTP; Thu, 24 Jan 2008 00:14:58 -0500
Date: Thu, 24 Jan 2008 00:14:58 -0500
To: rbeverly@grdata.com, rcmsjm@grdata.com, reb3@grdata.com, roots.nojunk@grdata.com, russell_sh
From: "Jordan Abrams" <inclusionVito@familyhistree.com>
Subject: Canadian Pharmcy Online! - 70-80% OFF!
Content-Length: 76
Lines: 6

Re" Your Pharmacy order # 85493899
Pls Go ’ www.protectfair ’ dot com


SpamAssassin:

X-Spam-Status: No, score=3.5 required=5.0
    tests=BAYES_50, FS_OBFU_PRMCY, SORTED_RECIPS, UNPARSEABLE_RELAY
    autolearn=no version=3.2.3


SpamFlow:

SntPkt: 45          RcvPkt: 29
SntRxmit: 0         RcvRxmit: 1
SntRST: 0           RcvRST: 0
SntFIN: 1           RcvFIN: 1
Cwnd0: 0            MinCwnd: 65280
MaxIdle: 1.366636   RTT: 0.162413
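To feed a record like the one above to a classifier, the name/value pairs need to become a numeric vector. The sketch below parses the exact text shown on this slide. The function name and the choice of column order are assumptions, not something the talk specifies.

```python
import re


def parse_features(record):
    """Turn 'Name: value' pairs into a dict of floats."""
    return {name: float(value)
            for name, value in re.findall(r"(\w+):\s*([\d.]+)", record)}


record = ("SntPkt: 45 RcvPkt: 29 SntRxmit: 0 RcvRxmit: 1 SntRST: 0 RcvRST: 0 "
          "SntFIN: 1 RcvFIN: 1 Cwnd0: 0 MinCwnd: 65280 MaxIdle: 1.366636 RTT: 0.162413")
features = parse_features(record)

# A fixed column order (here: alphabetical, an arbitrary choice) gives one row of X.
row = [features[name] for name in sorted(features)]
```

For this flow the RTT is about 162 ms and one retransmission was received, which is the kind of evidence the transport-level classifier relies on where the content score falls short.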

SLIDE 40

Open Questions

Open Questions

Spam is an Arms Race:

  • How would spammers react?
  • Adapt by slowing down, sending less mail
  • Could spammers tweak TCP stacks and circumvent?

Future Work:

  • Gather additional data sets
  • Package, distribute
  • Explore method’s potential in other domains

SLIDE 41

Summary

Summary

  • Attacking spam at a different layer
  • Correct predictions with over 90% accuracy, precision and recall, without content or reputation analysis
  • SpamFlow finds 78% of SpamAssassin false negatives
  • No implementation hurdle; easily combined with existing techniques

Thanks! Questions?

SLIDE 43

Backup Slides

SpamFlow FAQ

1. Can SpamFlow be more conservative in using RTT? Yes, even a highly conservative filter can still leverage RTT to eliminate spam flows with extremely large RTTs.

2. Doesn’t SpamFlow privilege well-connected senders? Personal, home or small business servers do not have the same volume requirement as spammers and thus are unlikely to induce the same TCP congestion effects we observe. SpamFlow only discriminates against sources that are both poorly connected and injecting large volumes of mail.

3. What about email lists? In contrast to spam, which must be sent continually, email list traffic can be scheduled so that it does not cause local congestion.

SLIDE 44

Backup Slides

Support Vector Machines

Dual-Form, Constrained Optimization:

\[
\max_{\alpha}\; \sum_{t=1}^{n} \alpha_t - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j K(\phi(x_i), \phi(x_j))
\quad \text{s.t.} \quad C \ge \alpha_t \ge 0, \quad \sum_{t=1}^{n} \alpha_t y_t = 0 \tag{2}
\]

  • Separate training set into two classes in most general way
  • Main insight: find hyper-plane separator that maximizes the minimum margin between convex hulls of classes
  • Second insight: if data is not linearly separable, take to higher dimension
  • Result: generalizes well, fast, accommodates unknown data structure
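To make the dual problem concrete, the sketch below evaluates the standard soft-margin SVM dual objective (with the labels y_i y_j in the pairwise term) and checks the two constraints for a given vector α. It does not solve the optimization, which a real implementation would hand to an SVM library. The RBF kernel, the function names and the toy data are assumptions for illustration.

```python
import math


def rbf_kernel(a, b, gamma=1.0):
    """Gaussian kernel exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((p - q) ** 2 for p, q in zip(a, b)))


def dual_objective(alpha, X, y, kernel=rbf_kernel):
    """sum_t alpha_t - 1/2 sum_i sum_j alpha_i alpha_j y_i y_j K(x_i, x_j)."""
    n = len(alpha)
    pairwise = sum(alpha[i] * alpha[j] * y[i] * y[j] * kernel(X[i], X[j])
                   for i in range(n) for j in range(n))
    return sum(alpha) - 0.5 * pairwise


def feasible(alpha, y, C, tol=1e-9):
    """Check the box constraint 0 <= alpha_t <= C and sum_t alpha_t y_t = 0."""
    in_box = all(-tol <= a <= C + tol for a in alpha)
    balanced = abs(sum(a * t for a, t in zip(alpha, y))) <= tol
    return in_box and balanced


# Two toy points, one per class.
X = [(0.0,), (1.0,)]
y = [1, -1]
value = dual_objective([1.0, 1.0], X, y)
```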

SLIDE 45

Backup Slides

What’s going on here?

Example: Received RSTs

Google sends SMTP QUIT, then performs an active close, then answers the server's passive close with a RST:

    11:55:57.807504 googl > srv: P 187089:187095(6) ack 143 win 5720
    11:55:57.807510 googl > srv: F 187095:187095(0) ack 143 win 5720
    11:55:57.807628 srv > googl: . ack 187096 win 32614
    11:55:57.807863 srv > googl: P 143:167(24) ack 187096 win 32614
    11:55:57.808181 srv > googl: F 167:167(0) ack 187096 win 32614
    11:55:57.834759 googl > srv: R 46149836:46149836(0) win 0

Yahoo! sends SMTP QUIT, and srv performs the active close. Yahoo! then sends three RSTs when srv goes to TIME_WAIT:

    11:20:35.023406 srv > yahoo: P 113:137(24) ack 1426 win 32120
    11:20:35.023782 srv > yahoo: F 137:137(0) ack 1426 win 32120
    11:20:35.023983 yahoo > srv: F 1426:1426(0) ack 113 win 33304
    11:20:35.024073 srv > yahoo: . ack 1427 win 32120
    11:20:35.076591 yahoo > srv: R 776208340:776208340(0) win 0
    11:20:35.076969 yahoo > srv: R 776208340:776208340(0) win 0
    11:20:35.077381 yahoo > srv: R 776208341:776208341(0) win 0

Abortive close appears in the Postfix source; this is normal behavior.

SLIDE 46

Backup Slides

What’s going on here?

Example: Received RSTs

Is abortive close a common “normal” SMTP technique? Postfix Source

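    static void start_connect(SESSION *session)
    {
        int     fd;
        struct linger linger;

        /* l_onoff = 1 with l_linger = 0 requests an abortive close:
           close() then resets the connection instead of sending FIN. */
        linger.l_onoff = 1;
        linger.l_linger = 0;
        if (setsockopt(fd, SOL_SOCKET, SO_LINGER, (char *) &linger,
                       sizeof(linger)) < 0)
            ...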
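The same socket option can be set from Python, which may be handy for reproducing the RST behavior in a test harness. This is a sketch under the assumption of a POSIX system where `struct linger` is two C ints (Linux and the BSDs; Windows uses a different layout). The function names are hypothetical.

```python
import socket
import struct


def abortive_linger_value():
    """Pack struct linger with l_onoff = 1 and l_linger = 0."""
    return struct.pack("ii", 1, 0)


def enable_abortive_close(sock):
    """After this call, sock.close() resets the connection (RST) rather than
    performing the normal FIN handshake."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, abortive_linger_value())


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_abortive_close(s)
    s.close()
```

Because legitimate MTAs set this option too, a received RST by itself says little about whether a flow is spam.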

SLIDE 47

Backup Slides

ROC Curve

[Figure: ROC curves (true positive rate vs. false positive rate) for training sizes 5, 10, 20, 40 and 60]

  • Larger training sizes perform better
  • In practice: SpamFlow as a weighted voter
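One way to read "weighted voter" is that a positive SpamFlow decision adds a fixed number of points to an existing content-filter score before the usual threshold is applied. The sketch below illustrates that reading only. The function name and the weight of 2.0 are hypothetical, while the 5.0 threshold and the 3.5 score come from the SpamAssassin example earlier in the deck.

```python
def combined_verdict(content_score, flow_says_spam, flow_weight=2.0, threshold=5.0):
    """Return True (spam) if the content score plus the flow vote reaches the threshold."""
    total = content_score + (flow_weight if flow_says_spam else 0.0)
    return total >= threshold


# The missed message scored 3.5 against a required 5.0.
without_flow = combined_verdict(3.5, flow_says_spam=False)
with_flow = combined_verdict(3.5, flow_says_spam=True)
```

Sweeping `flow_weight` trades false positives against false negatives, which is the same trade-off the ROC curves describe.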

SLIDE 48

Backup Slides

Features

cont’d

[Figure: CDF of selection order (1 to 13) for each feature: RecvPkt, SentPkt, RecvRxmit, SentRxmit, RecvRST, SentRST, RecvFIN, SentFIN, Cwnd0, CwndMin, MaxIdle, RTT, JitterVar]

  • 80% chance that RTT or CwndMin is best first feature
  • Others (e.g. RecvRST) much less discriminatory

SLIDE 49

Backup Slides

Data Collection

Dataset:

  • One week, January 2008
  • ∼ 18k emails, only ∼ 200 legitimate ham
  • Normalize spam and ham count for each experiment, randomly select spams
  • Dataset is small; future work corrects this

This talk: method, intuition, validation
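The normalization step, drawing as many spam flows at random as there are ham flows, might look like the sketch below. The function name is hypothetical and the list sizes simply echo the approximate counts quoted on this slide.

```python
import random


def balanced_sample(spam, ham, seed=None):
    """Randomly pick equally many items from each class (the size of the smaller one)."""
    rng = random.Random(seed)
    n = min(len(spam), len(ham))
    return rng.sample(spam, n), rng.sample(ham, n)


# Stand-in identifiers for roughly 18k spam and 200 ham messages.
spam_ids = list(range(18000))
ham_ids = list(range(200))
spam_subset, ham_subset = balanced_sample(spam_ids, ham_ids, seed=1)
```

Drawing a fresh random spam subset for each experiment is what allows the results to be reported over multiple independent runs.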
