SLIDE 1

They did not know what hit them

Network Security Monitoring at Mozilla

SLIDE 2 Mozilla Confidential

“I bought you those servers to run NSM on them” <- Boss, 2012

New servers <- always cool

SLIDE 3

The big idea

NSM = Network Security Monitoring
Write arbitrary detection logic
Store metadata about connections

SLIDE 4

“You want to do IDS in 2012?”
“What is this bro/zeek that takes CPU from snort?”
Not at our scale

Cannot be done

Everything is encrypted

SLIDE 5

Great leaders inspire action <- Back of my laptop

SLIDE 6

Here is why we like it

Not a silver bullet

Logs - Record threat actor’s activity - DFIR - Past - Zeek
IOC - Match IOCs in a creative way - DFIR and detection - Past and present - Zeek
TTP - Do the TTP detection - Detection - Present - Zeek + Suricata

SLIDE 7

To answer

SLIDE 8

The most important question

SLIDE 9

Are we owned?

SLIDE 10

Mozilla’s Threat Management response to a new APT report

Zeek, Suricata, auditd, syslog, application

SLIDE 11

Learn how to build a nice Zeek sensor
Your monitoring is wrong ;)
Learn how to improve what you have

SLIDE 12

“...but you promised AF_Packet!!”

SLIDE 13

AF_Packet

SLIDE 14

Mozilla NSM architecture

10 000 events / second
syslog-ng -> MozDef
ClearLinux
3 datacenters, 9 offices
AWS, GCE (??)
Europe, North America, Asia

SLIDE 15

Mozilla NSM Sensor (Mark VI ;)

CPU - 2x Intel Xeon
2 x 6 x 16GB DIMM <- all memory channels populated, 1DPC
NUMA0 <- Intel X710-DA2 (i40e) / Mellanox ConnectX-4 Lx (mlx5)
NUMA1 <- Intel X710-DA2 (i40e) / Mellanox ConnectX-4 Lx (mlx5)

SLIDE 16

Hardware acceleration??

Maybe for bitcoin

Dual Xeons + Intel X710 + 128GB RAM
Suricata - 40Gbit/sec
No packet loss
40 000 rules inspecting Vlan2Vlan traffic
Linux + AF_Packet

https://github.com/pevma/SEPTun
https://github.com/pevma/SEPTun-Mark-II
Mozilla + Suricata developers research

SLIDE 17

Developer looking at production logs after a regression with downtime.
Oil on canvas, circa 1580
Overheard: looks like Michal

SLIDE 18

Modern OS - Linux 2.4+, Windows NT+, etc

SLIDE 19
SLIDE 20
SLIDE 21
SLIDE 22
SLIDE 23
SLIDE 24
SLIDE 25
SLIDE 26
SLIDE 27
SLIDE 28
SLIDE 29

Modern cards - datacenter in a box

X710 - integrated managed switch and 384 vNICs
And you can access all of this power :)

SLIDE 30

It is all about per-packet latency
It is NOT about zero copy!!

Netmap papers

Thanks Luigi Rizzo

SLIDE 31

What eats time per packet?

TLB thrashing
Cache thrashing
Userspace -> kernel transitions

67ns to process a packet
200 cycles
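A rough sketch of where numbers like these can come from. The slide does not state its inputs, so the 10 Gbit/s link, the minimum-size 64-byte frames and the 3.0 GHz clock below are assumptions chosen because they land close to the quoted 67ns and 200 cycles.

```python
# Assumptions (not from the slides): 10 Gbit/s link, 64-byte Ethernet frames,
# 3.0 GHz core clock.
WIRE_OVERHEAD_BYTES = 20  # preamble (7) + SFD (1) + inter-frame gap (12)


def packets_per_second(link_bps: float, frame_bytes: int) -> float:
    """Maximum frame rate on a link for a given frame size."""
    bits_on_wire = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return link_bps / bits_on_wire


def budget_ns(link_bps: float, frame_bytes: int) -> float:
    """Time available per packet when the link is saturated."""
    return 1e9 / packets_per_second(link_bps, frame_bytes)


def budget_cycles(link_bps: float, frame_bytes: int, cpu_hz: float) -> float:
    """The same budget expressed in CPU clock cycles."""
    return budget_ns(link_bps, frame_bytes) * cpu_hz / 1e9


if __name__ == "__main__":
    ns = budget_ns(10e9, 64)
    cycles = budget_cycles(10e9, 64, 3.0e9)
    print(f"{ns:.1f} ns per packet, about {cycles:.0f} cycles")
```

With larger average packets or several cores sharing the load the budget per core grows, which is why the later slides care about how flows are spread over queues.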

SLIDE 32

Findings

Cache access timings, approximate
Local L3 - 20ns
Local RAM - 100ns
Remote L3 - 80ns
Remote RAM - 140ns
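To put these latencies in perspective, the sketch below expresses each one as a fraction of the 67ns per-packet figure quoted on the earlier slide. The latencies are copied from the slide; treating a single access as a share of one packet's budget is a simplification for illustration.

```python
# Latencies as listed on the slide (approximate, in nanoseconds).
LATENCY_NS = {
    "local L3": 20,
    "remote L3": 80,
    "local RAM": 100,
    "remote RAM": 140,
}
BUDGET_NS = 67  # per-packet figure quoted earlier in the deck


def budget_fraction(latency_ns: float, budget_ns: float = BUDGET_NS) -> float:
    """How many per-packet budgets a single access of this latency costs."""
    return latency_ns / budget_ns


for name, ns in LATENCY_NS.items():
    print(f"{name:10s} {ns:4d} ns = {budget_fraction(ns):.2f} packet budgets")
```

A single access to remote RAM costs more than two packet budgets, while a local L3 hit costs well under half of one.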

SLIDE 33

Findings

IPC - instructions per clock cycle
Before tuning - 0.7
After tuning - 2.7
Theoretical limit - 4.0

SLIDE 34

Intel DDIO

Packet arrives at the card’s FIFO
Card sends packets to the cache <- pre-warms the CPU cache
Hang on to it!!

SLIDE 35

The Grand Plan - in English

Send all packets 10.1.2.3 <-> 8.8.8.8 to core 2
Zeek packets 10.1.2.3 <-> 8.8.8.8 on core 9
Dedicate cores for IRQ/SoftIRQ processing
Establish Zeek Worker cores
Achieve eternal happiness

SLIDE 36

The Grand Plan - in drawings (sorry ;)

SLIDE 37

Symmetric hashing

In software - AF_Packet - cluster_flow <- cannot configure
In software - AF_Packet - cluster_ebpf <- new hotness
In hardware - AF_Packet - cluster_qm
Software has fragmentation problems :(
Hardware is flexible :)

SLIDE 38

Who’s deciding? ATR? PF? RSS?

ATR - if enabled AND no Perfect Filters
Perfect Filters - if any
RSS - your fallback
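The precedence on this slide can be written down as a small decision function. This only mirrors the slide's three lines; the function and parameter names are made up for illustration and it is not a model of the NIC datasheet.

```python
def steering_mechanism(atr_enabled: bool, perfect_filter_count: int) -> str:
    """Return which mechanism picks the RX queue, following the slide's rules."""
    if perfect_filter_count > 0:
        # "Perfect Filters - if any"
        return "Perfect Filters"
    if atr_enabled:
        # "ATR - if enabled AND no Perfect Filters"
        return "ATR"
    # "RSS - your fallback"
    return "RSS"


if __name__ == "__main__":
    for atr in (True, False):
        for filters in (0, 2):
            print(atr, filters, "->", steering_mechanism(atr, filters))
```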

SLIDE 39

ATR. Disable. It’s out of order ;)

SLIDE 40

NTuple AKA Too Perfect Filters

SLIDE 41

RSS - what is hashed?

SLIDE 42

RSS - how is it hashed?

ethtool -K enp7s0f1 ntuple on
ethtool -K enp7s0f1 rxhash on
for i in tcp4 udp4 tcp6 udp6; do ethtool -U enp7s0f1 rx-flow-hash $i sd; done
ethtool -X enp7s0f1 hkey \
  6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A:6D:5A equal 4
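Why the repeating 6D:5A key matters can be shown with a small software model. The sketch below implements the Toeplitz hash that RSS uses and feeds it only the source and destination IPv4 addresses, matching the `sd` setting above. Because the key repeats every 16 bits, swapping source and destination gives the same hash, so both directions of a flow land on the same queue. The 128-entry indirection table is an assumption; real table sizes depend on the NIC and driver.

```python
import ipaddress

# The 40-byte key from the slide: the 16-bit pattern 0x6D5A repeated 20 times.
SYMMETRIC_KEY = bytes.fromhex("6d5a") * 20


def toeplitz_hash(key: bytes, data: bytes) -> int:
    """Toeplitz hash: XOR together the 32-bit key window at every set input bit."""
    key_int = int.from_bytes(key, "big")
    key_bits = len(key) * 8
    if len(data) * 8 + 31 > key_bits:
        raise ValueError("key too short for this input")
    result = 0
    for i in range(len(data) * 8):
        if data[i // 8] & (0x80 >> (i % 8)):
            # 32-bit window of the key starting at bit offset i
            result ^= (key_int >> (key_bits - 32 - i)) & 0xFFFFFFFF
    return result


def flow_hash(src: str, dst: str) -> int:
    """Hash source and destination IPv4 addresses only (like 'rx-flow-hash ... sd')."""
    data = ipaddress.IPv4Address(src).packed + ipaddress.IPv4Address(dst).packed
    return toeplitz_hash(SYMMETRIC_KEY, data)


def rx_queue(hash_value: int, queues: int = 4, table_size: int = 128) -> int:
    """Pick a queue via an indirection table filled round-robin (like 'equal 4').

    table_size is an assumption for illustration.
    """
    table = [i % queues for i in range(table_size)]
    return table[hash_value % table_size]


if __name__ == "__main__":
    forward = flow_hash("10.1.2.3", "8.8.8.8")
    reverse = flow_hash("8.8.8.8", "10.1.2.3")
    print(hex(forward), hex(reverse), rx_queue(forward), rx_queue(reverse))
```

The same check is worth running against any other key before trusting it for a Zeek or Suricata cluster: if forward and reverse hashes differ, the two halves of a connection can end up on different workers.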

SLIDE 43

RSS - how is the hash used?

SLIDE 44

Hashing consistency

cluster_flow may have problems with fragments
cluster_qm -> RSS
RSS cannot handle fragments (nothing can :)
Hash 3-tuple
Also true for your packet broker!!

SLIDE 45

Findings

Fewer, faster cores <- good
vs
High core count <- sometimes bad ;)

SLIDE 46
SLIDE 47
SLIDE 48

Findings

Cache coherence protocol
Use Early Snooping

SLIDE 49

Findings

Cluster On-Die / Sub NUMA Clustering
Disable

SLIDE 50

Findings

Limit C-states to C1
Leave C-states enabled for Turbo Boost
Disable P-states

SLIDE 51

Findings

Use HyperThreading for Zeek Workers logical cores

SLIDE 52

Findings

Use all memory channels. But there’s more.
2DPC (2Rx8) - 2x 8GB / channel (3DPC reduces frequency pre-Skylake)
Keep DIMMs at the same size
Use dual ranks (but don’t sweat it + watch for frequency)

SLIDE 53

Findings

Lower the number of buffers
ethtool -G <ethX> rx 512

SLIDE 54

Discover the architecture

find /sys/devices/system/cpu/cpu0/cpuidle -name latency -o -name name | xargs cat
numactl --hardware
lscpu
ls -ld /sys/devices/system/node/node*
cat /sys/devices/system/node/node0/cpulist
cat /sys/class/net/eth3/device/numa_node
egrep "CPU0|eth3" /proc/interrupts

SLIDE 55

lstopo --of svg -p --no-factorize > /tmp/o1.svg

SLIDE 56

Your checklist

ethtool -i <int> <- update firmware
mlxup for Mellanox
nvmupdate64e for Intel
Keep kernel updated
Use upstream driver. Forget sourceforge.

SLIDE 57

Configure the kernel

intel_iommu=off (or iommu=pt)
intel_idle.max_cstate=1 (or cpudmalatency.c)
pcie_aspm=off
isolcpus=4-21,32-48 <- reserve core 0-3 on each NUMA node
nohz_full=4-21,32-48 (<- does nothing for Zeek ;)
rcu_nocbs=4-21,32-48
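The CPU list syntax used by isolcpus, nohz_full and rcu_nocbs is the same one sysfs prints in files such as cpulist. A small helper, sketched here under the assumption that only plain ranges and single CPUs are used (no stride syntax), makes it easy to check which logical CPUs a boot parameter really covers. The function name is hypothetical.

```python
from typing import List


def parse_cpulist(spec: str) -> List[int]:
    """Expand a kernel CPU list such as '4-21,32-48' into sorted CPU numbers."""
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-", 1)
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


if __name__ == "__main__":
    isolated = parse_cpulist("4-21,32-48")  # value taken from the slide
    print(len(isolated), "isolated CPUs")
    print("CPU 0 isolated?", 0 in isolated)
```

Comparing the result with the per-node cpulist files shows whether the first few cores of each NUMA node really stay available for housekeeping and IRQ work.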

SLIDE 58

Set IRQ and SoftIRQ affinity

SLIDE 59

Configure Zeek

SLIDE 60

When 4 is the new 8 and 8 is the new 16

Is your PCIe v3.0 slot x8?
Some x8 slots are x4 electrically and x8 mechanically
Some x16 slots are x8 electrically and x16 mechanically
Is your PCIe slot v3.0?
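The reason slot width matters is simple arithmetic. The sketch below uses the published PCIe 3.0 signalling rate (8 GT/s per lane with 128b/130b encoding) to estimate raw bandwidth per direction. It ignores TLP headers and flow-control overhead, so usable throughput is lower than these figures.

```python
GT_PER_LANE = 8.0        # PCIe 3.0 signalling rate, GT/s per lane
ENCODING = 128 / 130     # 128b/130b line encoding efficiency


def pcie3_gbps(lanes: int) -> float:
    """Raw PCIe 3.0 bandwidth per direction in Gbit/s, before protocol overhead."""
    return GT_PER_LANE * ENCODING * lanes


if __name__ == "__main__":
    for lanes in (4, 8, 16):
        print(f"x{lanes}: {pcie3_gbps(lanes):.1f} Gbit/s")
```

An x8 card sitting in a slot that is only x4 electrically gets roughly 31.5 Gbit/s before overhead, which is not enough for a single 40Gbit port at line rate.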

SLIDE 61

Disable monkey data prefetchers

SLIDE 62

Interrupt moderation

ethtool -C ethX adaptive-rx off adaptive-tx off rx-usecs 84 tx-usecs 84

Start with 84us ~ 12 000 int/sec
If rx_dropped grows - CPU too slow, or not enough buffers (ethtool -G) to hold packets for 84us, or too low an interrupt rate
If CPU utilization is not maxed - try 62us to service buffers faster and use fewer descriptors (so less cache thrashing)
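The relationship between rx-usecs, interrupt rate and ring occupancy is plain arithmetic, sketched below. The 84us and 62us values come from the slide; the 1 Mpps per-queue packet rate is an assumed example, not a measurement.

```python
def interrupts_per_second(rx_usecs: float) -> float:
    """Upper bound on interrupts per second for a given moderation interval."""
    return 1_000_000 / rx_usecs


def packets_per_interrupt(pps: float, rx_usecs: float) -> float:
    """Packets that accumulate in the RX ring between two interrupts."""
    return pps * rx_usecs / 1_000_000


if __name__ == "__main__":
    assumed_pps = 1_000_000  # hypothetical per-queue packet rate
    for usecs in (84, 62):
        rate = interrupts_per_second(usecs)
        backlog = packets_per_interrupt(assumed_pps, usecs)
        print(f"{usecs}us: {rate:.0f} int/sec, {backlog:.0f} packets per interrupt")
```

If the number of packets per interrupt approaches the ring size set with ethtool -G, the ring overflows between interrupts, which matches the slide's advice to either add descriptors or shorten the interval.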

SLIDE 63

Are my sensors dropping packets?

“Something is dropping somewhere”

SLIDE 64

What is my packet drop rate?

SLIDE 65

What is my packet drop rate?

Pro-tip: ignore dropped, watch if squeezed is growing

SLIDE 66

Wait what?

softnet stats “dropped” -> out of per-CPU backlog
Ain’t no backlog without RPS
RPS?!?! Talk to me later ;)
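The counters discussed here live in /proc/net/softnet_stat, one row per CPU, with hexadecimal columns whose first three are packets processed, packets dropped and time_squeeze. A minimal parser is sketched below. The sample text is invented for illustration, and on a real sensor the input would be read from the proc file instead.

```python
from typing import Dict, List

# Made-up sample in the /proc/net/softnet_stat layout (hex columns).
SAMPLE = """\
0001a2b3 00000000 00000005 00000000 00000000
0000ffff 00000000 00000000 00000000 00000000
"""


def parse_softnet_stat(text: str) -> List[Dict[str, int]]:
    """Return processed / dropped / squeezed counters for each row."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or truncated lines
        processed, dropped, squeezed = (int(f, 16) for f in fields[:3])
        rows.append({
            "row": len(rows),  # row index; usually, but not always, the CPU id
            "processed": processed,
            "dropped": dropped,
            "squeezed": squeezed,
        })
    return rows


if __name__ == "__main__":
    for row in parse_softnet_stat(SAMPLE):
        print(row)
```

Sampling this twice and comparing the squeezed column is one way to watch whether it is growing, as the earlier pro-tip suggests.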

SLIDE 67

What is my packet drop rate?

@load misc/stats
stats.log <- only AF_Packet!!
pkts_proc
bytes_recv
pkts_dropped
pkts_link
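One way to turn these stats.log fields into a drop rate is sketched below. Using pkts_link as the denominator is an assumption about how the fields relate, and the numbers in the example are invented, so treat this as a starting point rather than the definitive formula.

```python
def drop_percent(pkts_dropped: int, pkts_link: int) -> float:
    """Dropped packets as a percentage of packets seen on the link.

    Assumes both counters cover the same stats interval.
    """
    if pkts_link == 0:
        return 0.0
    return 100.0 * pkts_dropped / pkts_link


if __name__ == "__main__":
    # Invented values for one interval of one worker.
    print(f"{drop_percent(pkts_dropped=5, pkts_link=1000):.2f}% dropped")
```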

SLIDE 68

When 2x 40 is 50

Your X710 / X722 - 2x 40Gbit = 1x 50Gbit
And X510 / 520 / 540 can do only 8M - 10M pps

SLIDE 69

Myths

Linux network stack is not zero copy and is slow
Need to bypass!!

Answer

Not true for many years

SLIDE 70

Myths

Linux network stack is not multithreaded everywhere (pf_ring)

Answer

Not true for many years

SLIDE 71

Myths

I need to process 40 / 100Gbit and 60M pps

Answer

40Gbit interfaces vs 40Gbit/sec of traffic
Not all traffic is equal <- drop early
Average packet size (IMIX) - >900 bytes -> much less PPS
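The gap between the feared 60M pps and what a sensor actually sees follows from frame size. The sketch below computes the packet rate of a fully loaded 40Gbit/sec link for minimum-size frames and for a 900-byte average. The 20 bytes of per-frame wire overhead (preamble, SFD, inter-frame gap) are standard Ethernet, and a fully saturated link is itself a worst-case assumption.

```python
WIRE_OVERHEAD_BYTES = 20  # preamble + SFD + inter-frame gap per Ethernet frame


def packets_per_second(link_bps: float, avg_frame_bytes: float) -> float:
    """Packet rate of a saturated link for a given average frame size."""
    return link_bps / ((avg_frame_bytes + WIRE_OVERHEAD_BYTES) * 8)


if __name__ == "__main__":
    for size in (64, 900):
        mpps = packets_per_second(40e9, size) / 1e6
        print(f"{size:4d}-byte frames at 40 Gbit/s: {mpps:.1f} Mpps")
```

At a 900-byte average the rate is roughly a tenth of the 64-byte worst case, before any traffic is dropped early.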

SLIDE 72

Myths

Cross-NUMA talk is bad because of bandwidth

Nope. Bandwidth is plenty (over 100 Gbit/sec). Latency kills ya.

Hmmm… guess how I know?!?!

SLIDE 73

Mistakes

“I will make every buffer BIG”
...and cause tons of cache misses

SLIDE 74

Mistakes

“So I have this 4 CPU 384GB RAM with 128 cores”
And a cache miss almost 100% of time

SLIDE 75

eBPF - AF_XDP
Netronome + XDP = hardware bypass

Fully programmable - L2-L7
40 / 100Gbit (from 500USD)
ARM11
48 flow processing cores
60 packet processing cores
480 threads
8GB DDR3 packet buffer

SLIDE 76

Fast. Reliable. Cheap

SLIDE 77

Fast. Reliable. Cheap

Choose two?

SLIDE 78

Have $$$?

Buy appliance

SLIDE 79

Have time? Need flexibility?

Build one

SLIDE 80

You can build a flexible & high-performance sensor

With commodity hardware

SLIDE 81

You can build a flexible & high-performance sensor

With commodity hardware
*with some learning

@michalpurzynski
https://github.com/mozilla/zept

SLIDE 82

Thank You