SLIDE 1

Hardware accelerating Linux network functions

Roopa Prabhu, Wilson Kok

Proceedings of netdev 0.1, Feb 14-17, 2015, Ottawa, ON, Canada

SLIDE 2

Agenda

  • Recap: offload models, offload drivers
  • Introduction to switch ASIC hardware
  • L2 offload to switch ASIC
      ○ MAC learning, ageing
      ○ STP handling
      ○ IGMP snooping
      ○ VXLAN
  • L3 offload to switch ASIC

SLIDE 3

Offload models ...

[Figure: two offload models. Left: a kernel bridge over ports on NIC1 and NIC2, each NIC with its own FDB. Right: a kernel bridge over port1 .. portn of a switch ASIC (CPU, MEM, FDB), with the kernel FDB in sync with hardware. In both, configuration enters along the rtnetlink API path (bridge vlan add, bridge fdb add) and reaches the hardware along the offload API path.]

  • Single, consistent netlink-based UAPI
  • Single kernel offload API to offload to a variety of hardware (NICs, switch ASICs, ...)

SLIDE 4

The bigger picture...

[Figure: user-space tools and daemons (iproute2, brctl, bridge, tc, nftables, quagga, bird, mstpd, OVSdb, snmpd, lldpd) manage kernel state (routing tables, ARP tables, bridge FDB/MDB, netfilter tables, bonds, bridges, VXLAN); a hw driver reflects it into the switch hardware (CPU, MEM; routing tables, ARP tables, bridge FDB/MDB, ACLs) behind front panel ports swp1 .. swpN.]

SLIDE 5

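HW offload driver (kernel)

[Figure: a routing daemon and mstp configure the kernel (bridge br0 with its FDB/MDB, and the FIB) through the rtnetlink API; the kernel passes that state through the switchdev offload API to an in-kernel hw driver, which programs the switch ASIC (CPU, ASIC, MEM; routing tables, ARP tables, bridge FDB/MDB, ACLs). Switch ports swp1, swp2 .. swpN are attached to br0.]

Driver hooks shown on the slide:

netdev_ops {
    .ndo_fdb_add/del
    .ndo_fib_add/del
}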

SLIDE 6

HW offload driver (user space)

[Figure: a routing daemon and mstp configure the kernel (bridge br0 with its FDB/MDB, and the FIB) through the rtnetlink API; a user-space hw driver acts as an rtnetlink listener, receives RtNetlink notifications for that state, and programs the switch ASIC (CPU, ASIC, MEM; routing tables, ARP tables, bridge FDB/MDB, ACLs). Switch ports swp1, swp2 .. swpN are attached to br0.]

SLIDE 7

Switch hardware

[Figure: switch hardware with front panel ports 1, 2, 3 .. n and a CPU port; in the kernel, the switch driver exposes one netdev per front panel port (swp1, swp2, swp3 .. swpn), n netdevs in total.]

switch driver:
  • Creates netdevs for front panel ports
  • Port netdevs only see traffic forwarded to the CPU port
  • Sets hardware offload flag NETIF_F_HW_SWITCH_OFFLOAD

SLIDE 8

ip link show: switch ports

# ip link show
1: lo: <LOOPBACK> mtu 16436 qdisc noqueue state DOWN mode DEFAULT
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT qlen 1000
    link/ether 00:e0:ec:27:4e:b6 brd ff:ff:ff:ff:ff:ff
3: swp1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT qlen 500
    link/ether 44:38:39:00:27:ac brd ff:ff:ff:ff:ff:ff
4: swp2: <BROADCAST,MULTICAST> mtu 9000 qdisc pfifo_fast state DOWN mode DEFAULT qlen 500
    link/ether 00:e0:ec:27:4e:b8 brd ff:ff:ff:ff:ff:ff
[snip]
55: swp53: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT qlen 500
    link/ether 00:e0:ec:27:4e:f7 brd ff:ff:ff:ff:ff:ff
56: swp54s0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT qlen 500
    link/ether 00:e0:ec:27:4e:fb brd ff:ff:ff:ff:ff:ff
57: swp54s1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT qlen 500
    link/ether 00:e0:ec:27:4e:fc brd ff:ff:ff:ff:ff:ff
58: swp54s2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT qlen 500
    link/ether 00:e0:ec:27:4e:fd brd ff:ff:ff:ff:ff:ff
59: swp54s3: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT qlen 500
    link/ether 00:e0:ec:27:4e:fe brd ff:ff:ff:ff:ff:ff

(eth0 is the management port; swp1 .. swp54s3 are the switch ports.)

SLIDE 9

ethtool on a switch port

$ ethtool swp1
Settings for swp1:
        Supported ports: [ FIBRE ]
        Supported link modes:   1000baseT/Full
                                10000baseT/Full
        Supported pause frame use: Symmetric Receive-only
        Supports auto-negotiation: Yes
        Advertised link modes:  1000baseT/Full
        Advertised pause frame use: No
        Advertised auto-negotiation: No
        Speed: 10000Mb/s
        Duplex: Full
        Port: FIBRE
        PHYAD: 0
        Transceiver: external
        Auto-negotiation: off
        Current message level: 0x00000000 (0)
        Link detected: yes

SLIDE 10

Creating a hardware accelerated Linux bridge device

# ip link add br0 type bridge
# ip link set dev swp1 master br0
# ip link set dev swp2 master br0
# bridge vlan add vid 10-20 dev swp1
# bridge vlan add vid 20-30 dev swp2
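With VLAN filtering turned on for br0 (for example `ip link set dev br0 type bridge vlan_filtering 1`, a step the commands above do not show), traffic is only switched between swp1 and swp2 on VLANs both ports belong to. A minimal Python sketch of that set arithmetic; `parse_vid_range` is a hypothetical helper, not part of iproute2:

```python
def parse_vid_range(spec):
    """Expand a 'bridge vlan add vid' argument ('10-20' or '10') into VLAN IDs."""
    if "-" in spec:
        start, end = (int(x) for x in spec.split("-", 1))
    else:
        start = end = int(spec)
    if not 1 <= start <= end <= 4094:
        raise ValueError(f"invalid VLAN range: {spec}")
    return set(range(start, end + 1))


# Port membership from the two 'bridge vlan add' commands above.
membership = {
    "swp1": parse_vid_range("10-20"),
    "swp2": parse_vid_range("20-30"),
}

# Bridged traffic between two ports is only possible on VLANs they share.
shared = membership["swp1"] & membership["swp2"]
print(sorted(shared))  # [20]
```

So with these ranges, VLAN 20 is the only VLAN on which swp1 and swp2 can exchange traffic.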

SLIDE 11

Bonds as bridge ports

[Figure: a kernel bridge whose members are port1, port2 .. and bond0, a LAG of portn-1 and portn, on a switch ASIC (CPU, MEM, FDB), with the kernel FDB in sync with hardware. rtnetlink API calls (bridge vlan add, bridge fdb add) on bond0 pass through the bonding driver and the switchdev offload API to the ASIC.]

  • Switch ASICs support link aggregation
  • Bonding driver LAG config is offloaded to the switch ASIC
  • FDB and VLAN offloads go through the bonding driver
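One way to picture FDB and VLAN offloads going through the bonding driver: a VLAN added on bond0 is pushed down to each member port, while an FDB entry on bond0 is programmed against the hardware LAG rather than a single member. A minimal sketch; `SwitchAsic`, `Bond` and the LAG id are hypothetical and are not kernel or vendor SDK APIs:

```python
class SwitchAsic:
    """Hypothetical ASIC state: per-port VLANs, LAG table, FDB."""

    def __init__(self):
        self.port_vlans = {}   # front panel port -> set of VLAN IDs
        self.lags = {}         # LAG id -> member ports
        self.fdb = {}          # (mac, vlan) -> "port:<name>" or "lag:<id>"


class Bond:
    """Hypothetical bonding driver that forwards offload requests."""

    def __init__(self, name, asic, lag_id, members):
        self.name, self.asic, self.lag_id = name, asic, lag_id
        self.members = list(members)
        # LAG configuration is offloaded once, when the bond is built.
        asic.lags[lag_id] = self.members

    def vlan_add(self, vid):
        # VLAN membership on the bond applies to every member port.
        for port in self.members:
            self.asic.port_vlans.setdefault(port, set()).add(vid)

    def fdb_add(self, mac, vid):
        # The entry points at the LAG, so the ASIC can spread flows
        # across portn-1 and portn.
        self.asic.fdb[(mac, vid)] = f"lag:{self.lag_id}"


asic = SwitchAsic()
bond0 = Bond("bond0", asic, lag_id=1, members=["portn-1", "portn"])
bond0.vlan_add(10)
bond0.fdb_add("00:11:22:33:44:55", 10)
print(asic.port_vlans)  # {'portn-1': {10}, 'portn': {10}}
print(asic.fdb)         # {('00:11:22:33:44:55', 10): 'lag:1'}
```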

SLIDE 12

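Bridging hardware offload: packet path

[Figure: packet paths between swp1 and swp2 of a bridge offloaded to the switch ASIC (VLAN). Known unicast (transit) traffic is switched inside the ASIC; BUM* traffic is switched in the ASIC and may also reach the kernel bridge; system-generated traffic and traffic destined to the system pass through the kernel.]

*BUM: broadcast, unknown unicast and multicast.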

SLIDE 13

Bridging hardware offload: packet path

  • Known unicast traffic not destined to the system is forwarded only in hardware
  • BUM traffic is forwarded in hardware, plus a copy MAY be sent to the kernel
  • BUM traffic in the kernel should not be forwarded again (that would create duplicate copies from hardware and software)
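The three rules above reduce to two decisions, one in the ASIC and one in the kernel bridge. A minimal sketch, assuming the switch driver marks every frame the ASIC has already flooded with a hypothetical `hw_forwarded` flag. The sketch always sends the BUM copy to the kernel, although the slide only says a copy MAY be sent:

```python
from dataclasses import dataclass


@dataclass
class Packet:
    dst_mac: str
    to_system: bool = False      # destined to the switch itself
    hw_forwarded: bool = False   # hypothetical mark: ASIC already flooded it


def is_bum(dst_mac: str, fdb: dict) -> bool:
    # The I/G bit of the first octet is set for broadcast and multicast.
    group = int(dst_mac.split(":")[0], 16) & 1
    return bool(group) or dst_mac not in fdb


def asic_ingress(pkt: Packet, fdb: dict) -> tuple:
    """Return (forwarded_in_hw, copy_to_kernel) for a frame entering the ASIC."""
    if pkt.to_system:
        return False, True          # punted to the CPU port only
    if not is_bum(pkt.dst_mac, fdb):
        return True, False          # known unicast transit: hardware only
    pkt.hw_forwarded = True         # BUM: flooded in hardware...
    return True, True               # ...and (here, always) copied to the kernel


def kernel_forwards(pkt: Packet) -> bool:
    """Should the software bridge forward a punted frame to other ports?"""
    # Frames for the system are consumed locally; BUM copies were already
    # flooded by the ASIC, so forwarding them again would duplicate them.
    return not (pkt.to_system or pkt.hw_forwarded)


fdb = {"00:11:22:33:44:55": "swp2"}
print(asic_ingress(Packet("00:11:22:33:44:55"), fdb))   # (True, False)
arp = Packet("ff:ff:ff:ff:ff:ff")
print(asic_ingress(arp, fdb), kernel_forwards(arp))     # (True, True) False
```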

SLIDE 14

Bridging hardware offload: fdb learn

[Figure: the switch ASIC raises a hw learn/move event for 00:11:22:33:44:55, vlan 10, intf_id 9876; the switch driver turns it into an fdb add/update that reaches the kernel bridge as an rtnetlink notification, producing the br0 FDB/MDB entry 00:11:22:33:44:55 on br0, port swp2. Ports: swp1, swp2 .. swpN.]

SLIDE 15

Bridging hardware offload: learning in HW

  • Turn off learning in the bridge driver
  • The switch driver listens to learn notifications from hardware
  • It converts the hardware interface id and vlan to the kernel ifindex of the bridge port (and vlan) and bridge
  • It sends a netlink fdb update to the kernel (userspace driver) or calls the bridge driver learn sync switchdev API (kernel driver)
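Bridge learning can be turned off per port, for example with `bridge link set dev swp2 learning off`. Using the values from the fdb learn slide (MAC 00:11:22:33:44:55, vlan 10, hardware intf_id 9876, which belongs to swp2 in br0), the translation step might look like the sketch below. The lookup tables and the `FdbUpdate` record are hypothetical. A userspace driver would send the result as a netlink RTM_NEWNEIGH message; a kernel driver would hand it to the bridge instead. For readability, the sketch uses port names where the slide calls for kernel ifindexes:

```python
from typing import NamedTuple, Optional


class FdbUpdate(NamedTuple):
    mac: str
    vlan: int
    port: str     # bridge port (a real driver would use its ifindex)
    bridge: str   # bridge the port belongs to


# Hypothetical switch-driver tables; 9874 is an invented id for swp1.
HW_INTF_TO_PORT = {9874: "swp1", 9876: "swp2"}
PORT_TO_BRIDGE = {"swp1": "br0", "swp2": "br0"}


def on_hw_learn(mac: str, vlan: int, intf_id: int) -> Optional[FdbUpdate]:
    """Translate a hardware learn/move event into a bridge FDB update."""
    port = HW_INTF_TO_PORT.get(intf_id)
    if port is None:
        return None   # not a front panel port the driver knows about
    bridge = PORT_TO_BRIDGE.get(port)
    if bridge is None:
        return None   # port is not in a bridge: nothing to sync
    return FdbUpdate(mac.lower(), vlan, port, bridge)


print(on_hw_learn("00:11:22:33:44:55", 10, 9876))
# FdbUpdate(mac='00:11:22:33:44:55', vlan=10, port='swp2', bridge='br0')
```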

SLIDE 16

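Bridging hardware offload: kernel ageing

[Figure: the kernel bridge br0 (FDB/MDB) does the ageing. Over rtnetlink, the switch driver provides fdb updates and the fdb hit status read from the ASIC; when the kernel ages an entry out, the fdb delete is passed down through the switch driver to the ASIC. Ports: swp1, swp2 .. swpN.]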

SLIDE 17

Bridging hardware offload: hardware ageing

[Figure: the switch ASIC ages out an entry, and the switch driver removes it from the kernel bridge br0 FDB/MDB with an rtnetlink fdb delete. Ports: swp1, swp2 .. swpN.]

SLIDE 18

Bridging hardware offload: ageing

With hardware offload, the bridge driver very seldom sees packets, so its FDB ages are not up to date.

Hardware ageing
  • The bridge driver should not do ageing if the hardware is doing it
  • fdb show will need to get the age from hardware during 'show', or will need periodic age updates from the switch driver

Kernel ageing
  • Periodic age updates from the switch driver are definitely needed
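For kernel ageing, the switch driver could provide the periodic age update by polling the hardware hit bit. Entries the ASIC has forwarded traffic for get a fresh timestamp. Entries idle for longer than the ageing time are deleted from both the kernel FDB and the ASIC. This sketch works under those assumptions: `read_and_clear_hit` and `hw_delete` are hypothetical hooks into the ASIC, and 300 seconds is the Linux bridge's default ageing time:

```python
AGEING_TIME = 300.0  # seconds; the Linux bridge's default ageing time


class KernelAgeing:
    """Kernel-side ageing kept current by polling hardware hit bits."""

    def __init__(self, read_and_clear_hit, hw_delete):
        self.last_used = {}                           # (mac, vlan) -> time
        self.read_and_clear_hit = read_and_clear_hit  # hypothetical ASIC hook
        self.hw_delete = hw_delete                    # hypothetical ASIC hook

    def learn(self, key, now):
        self.last_used[key] = now

    def poll(self, now):
        """One periodic age update; returns the entries aged out."""
        aged = []
        for key in list(self.last_used):
            if self.read_and_clear_hit(key):
                self.last_used[key] = now    # ASIC forwarded traffic for it
            elif now - self.last_used[key] > AGEING_TIME:
                del self.last_used[key]      # fdb delete in the kernel...
                self.hw_delete(key)          # ...and in the hardware
                aged.append(key)
        return aged


hits = {("00:11:22:33:44:55", 10)}   # simulated hit bits
removed = []


def read_and_clear_hit(key):
    hit = key in hits
    hits.discard(key)
    return hit


ageing = KernelAgeing(read_and_clear_hit, removed.append)
ageing.learn(("00:11:22:33:44:55", 10), now=0.0)
ageing.learn(("00:11:22:33:44:66", 10), now=0.0)
print(ageing.poll(now=200.0))  # []: first entry refreshed, second idle 200 s
print(ageing.poll(now=350.0))  # [('00:11:22:33:44:66', 10)]
```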

SLIDE 19

STP offload

  • The bridge driver maintains STP states (either kernel STP or userspace STP)
  • The bridge driver communicates STP states to the switch driver using the switchdev offload API
  • OR a switch driver in userspace can listen to STP state notifications to update the HW state
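The state can arrive through a switchdev call in the kernel or through an rtnetlink notification to a userspace driver. Either way, the switch driver ultimately maps each bridge port STP state to what the ASIC port may do. In this sketch, the numeric values follow the kernel's BR_STATE_* constants, while the (learn, forward) pair is a hypothetical stand-in for vendor-specific port settings:

```python
from enum import IntEnum


class BrState(IntEnum):
    """Bridge port STP states; values match BR_STATE_* in linux/if_bridge.h."""
    DISABLED = 0
    LISTENING = 1
    LEARNING = 2
    FORWARDING = 3
    BLOCKING = 4


def hw_port_config(state: int) -> tuple:
    """Map an STP state to hypothetical ASIC port settings (learn, forward)."""
    learn = state in (BrState.LEARNING, BrState.FORWARDING)
    forward = state == BrState.FORWARDING
    return learn, forward


for s in BrState:
    print(s.name, hw_port_config(s))
```

Only the forwarding state lets the port both learn and forward; the learning state populates the FDB without forwarding data frames.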

SLIDE 20

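IGMP snooping offload

[Figure: IGMP snooping with a bridge over swp1 and swp2 on the switch ASIC. A host on swp1 sends a join (report) for 224.1.2.3 and a query arrives on swp2; reports and queries go to the kernel bridge, which records the group (dev bridge port swp1 grp 224.1.2.3 temp) and the router ports on the bridge (swp2); data for 224.1.2.3 is forwarded by the ASIC.]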

SLIDE 21

IGMP snooping offload

  • The switch driver configures the hardware to send IGMP reports and queries to software
  • The bridge driver maintains IGMP group membership
  • In some cases the reports or queries need to be re-forwarded in the kernel
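This minimal sketch shows the state the bridge driver keeps, using the slide's example: a report for 224.1.2.3 on swp1 and a query on swp2. Reports are re-forwarded toward router ports, which is presumably one of the cases where the kernel must forward a control packet itself. The class is hypothetical, and swp3 in the tests is an invented third port. Unregistered groups are sent only to router ports here rather than flooded, although a real bridge may flood them depending on configuration:

```python
from collections import defaultdict


class SnoopingBridge:
    """Toy IGMP snooping state kept by the bridge driver."""

    def __init__(self):
        self.mdb = defaultdict(set)   # group -> member ports
        self.router_ports = set()     # ports where queries were seen

    def rx_report(self, port, group):
        # cf. "dev bridge port swp1 grp 224.1.2.3 temp"
        self.mdb[group].add(port)

    def rx_query(self, port):
        self.router_ports.add(port)

    def report_egress(self, ingress):
        """Reports are re-forwarded only toward router ports."""
        return self.router_ports - {ingress}

    def data_egress(self, ingress, group):
        """Ports that get multicast data for group (the state offloaded to HW)."""
        return (self.mdb.get(group, set()) | self.router_ports) - {ingress}


br = SnoopingBridge()
br.rx_report("swp1", "224.1.2.3")
br.rx_query("swp2")
print(br.data_egress("swp2", "224.1.2.3"))  # {'swp1'}
```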

SLIDE 22

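VXLAN offload - hardware vtep

[Figure: a hardware VTEP (lo: 172.16.20.103) bridging local ports and the VXLAN link vxlan100. macA is behind swp1, macB behind swp2, and macC behind a remote VTEP at 172.16.21.150. Also shown: port swp3 and addresses 20.0.0.2, 20.0.0.3, 20.0.0.5.]

Bridge FDB:
  MAC       Interface
  macA      swp1
  macB      swp2
  macC      vxlan100

vxlan100 FDB:
  MAC       Destination
  macC      172.16.21.150
  unknown   172.16.22.125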
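Forwarding on the hardware vtep is a two-stage lookup. The bridge FDB picks the egress port; when that port is vxlan100, the VXLAN link's own table picks the remote VTEP, falling back to the "unknown" destination. This sketch uses the slide's tables. Two things in it are assumptions: that swp1, swp2 and vxlan100 are the bridge's members, and that unknown MACs are flooded to every other port:

```python
# Tables from the hardware vtep slide (macA/macB/macC are the slide's names).
BRIDGE_PORTS = ["swp1", "swp2", "vxlan100"]   # assumed bridge membership
BRIDGE_FDB = {"macA": "swp1", "macB": "swp2", "macC": "vxlan100"}
VXLAN_LINK = "vxlan100"
VXLAN_FDB = {"macC": "172.16.21.150"}
VXLAN_DEFAULT_DEST = "172.16.22.125"          # the "unknown" entry


def forward(dst_mac, ingress):
    """Return (egress port, remote VTEP or None) pairs for a bridged frame."""
    port = BRIDGE_FDB.get(dst_mac)
    if port == ingress:
        return []                             # never send back out the ingress
    egress = [port] if port else [p for p in BRIDGE_PORTS if p != ingress]
    out = []
    for p in egress:
        if p == VXLAN_LINK:
            # Second lookup in the vxlan link's own table.
            out.append((p, VXLAN_FDB.get(dst_mac, VXLAN_DEFAULT_DEST)))
        else:
            out.append((p, None))
    return out


print(forward("macC", "swp1"))  # [('vxlan100', '172.16.21.150')]
```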

SLIDE 23

VXLAN offload - hardware vtep

Model

  • VXLAN link as a bridge port
      ○ bridging between local ports
      ○ VXLAN tunneling for remote MACs
  • BUM traffic handling
      ○ multicast
      ○ using an off-system replicator
          ■ there could be a list of redundant replicators; the VTEP needs to choose ONE out of the list of remote dests (per flow, per VNI, etc.)
      ○ self replication
          ■ the VTEP sends to a list of remote VTEPs; it needs to choose ALL of the list of remote dests
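The two unicast-based BUM modes differ only in how many entries of the remote destination list are used. This sketch shows that choice. It keys the replicator selection on a flow or VNI string and hashes it with CRC32; both are assumptions standing in for whatever the hardware does. CRC32 is used because Python's built-in str hash is randomized per process. The addresses are hypothetical, and the multicast mode, which sends to a group address instead, is not modeled:

```python
import zlib


def bum_destinations(mode, remote_dests, flow_key=""):
    """Remote VTEPs that receive a copy of one BUM frame."""
    if not remote_dests:
        return []
    if mode == "self":
        # Self replication: the VTEP sends a copy to ALL remote VTEPs.
        return list(remote_dests)
    if mode == "replicator":
        # Redundant replicators: choose ONE, stable for a given flow or VNI.
        index = zlib.crc32(flow_key.encode()) % len(remote_dests)
        return [remote_dests[index]]
    raise ValueError(f"unsupported mode: {mode!r}")


remote_vteps = ["192.0.2.1", "192.0.2.2", "192.0.2.3"]   # hypothetical addresses
print(bum_destinations("self", remote_vteps))
print(bum_destinations("replicator", remote_vteps, flow_key="vni100"))
```

Keying on the VNI keeps all BUM traffic of one segment on the same replicator, while keying on a flow tuple spreads load across the redundant replicators.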

SLIDE 24

VXLAN offload - ovsdb integration

Agent to translate ovsdb schema objects to kernel constructs.

OVSDB                                                 Linux kernel
logical switch                                        vxlan link + bridge
physical switch tunnel_ip                             vxlan link local ip
logical port binding                                  bridge member port, vlan
unicast remote mac + physical locator                 bridge fdb (mac, vlan, dst <remote ip>)
mcast remote mac "unknown" + physical locator list    vxlan link default dest
unicast local mac + physical locator                  bridge fdb (mac, vlan, local dev)
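For the two remote-MAC rows of the table, the agent's output can be shown as the equivalent iproute2 commands. This is a sketch with several assumptions. The input shapes (MAC and locator IP pairs, plus a list of locator IPs for "unknown") are hypothetical simplifications of the OVSDB rows. A real agent would send netlink messages rather than shell commands. The VLAN is left implicit by assuming one vxlan link per VNI:

```python
def remote_macs_to_fdb_cmds(vxlan_dev, ucast_remote, unknown_locators):
    """Render OVSDB remote-MAC state as iproute2 'bridge fdb' commands."""
    cmds = []
    for mac, remote_ip in ucast_remote:
        # unicast remote mac + physical locator -> fdb entry with a dst
        cmds.append(f"bridge fdb add {mac} dev {vxlan_dev} dst {remote_ip}")
    for remote_ip in unknown_locators:
        # "unknown" + physical locator list -> default destinations; the
        # all-zeros MAC with 'append' keeps one entry per remote VTEP.
        cmds.append(
            f"bridge fdb append 00:00:00:00:00:00 dev {vxlan_dev} dst {remote_ip}"
        )
    return cmds


for cmd in remote_macs_to_fdb_cmds(
    "vxlan100",
    ucast_remote=[("00:11:22:33:44:55", "172.16.21.150")],
    unknown_locators=["172.16.22.125"],
):
    print(cmd)
```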

SLIDE 25

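L3 offloads

[Figure: Quagga/Bird, iproute and Network manager program the kernel routing tables and neigh tables along the rtnetlink API path; the switch driver receives the routes along the offload API path and programs the switch ASIC (CPU, ASIC, MEM) behind swp1, swp2 .. swpN. For an unresolved nexthop, the neigh table is filled by arping.]

Example route with two next hops:

ip route add 1.1.1.1/32 nexthop via 192.168.200.3 nexthop via 192.168.200.4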

SLIDE 26

L3 hardware offload

  • Routes from routing daemons go to the kernel
  • Unresolved next hops point to the CPU in HW
  • The switch driver tries to resolve them with probes (arping)
  • Neigh entries are refreshed for packets routed through hardware (hit bit provided by the hardware)
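The route from the previous slide (1.1.1.1/32 via 192.168.200.3 and 192.168.200.4) can be walked through these steps in a toy model. While no next hop is resolved, the hardware entry points at the CPU, and each unresolved next hop is probed. Resolved neighbours are programmed as they arrive, and hardware hit bits refresh the neighbour entries. Everything here is hypothetical, including the `probe` callback and the neighbour MACs. Programming only the resolved subset of an ECMP group is one possible policy, not something the slides specify:

```python
CPU = "cpu"   # hardware nexthop meaning "trap to the CPU"


class L3Offload:
    """Toy switch-driver view of routes, neighbours and hardware nexthops."""

    def __init__(self, probe):
        self.routes = {}       # prefix -> nexthop IPs
        self.neigh = {}        # nexthop IP -> MAC
        self.last_used = {}    # nexthop IP -> time of last hardware hit
        self.probe = probe     # hypothetical: e.g. send an ARP request

    def hw_nexthops(self, prefix):
        """Resolved nexthop MACs for prefix, or the CPU if none is resolved."""
        macs = [self.neigh[nh] for nh in self.routes[prefix] if nh in self.neigh]
        return macs or [CPU]

    def route_add(self, prefix, nexthops):
        self.routes[prefix] = list(nexthops)
        for nh in nexthops:
            if nh not in self.neigh:
                self.probe(nh)          # unresolved: probe it (arping-style)
        return self.hw_nexthops(prefix)

    def neigh_resolved(self, ip, mac):
        self.neigh[ip] = mac
        # Reprogram every route that uses this nexthop.
        return {p: self.hw_nexthops(p) for p, nhs in self.routes.items() if ip in nhs}

    def hit_bit_refresh(self, hit_ips, now):
        """Keep neigh entries alive for traffic routed only in hardware."""
        for ip in hit_ips:
            if ip in self.neigh:
                self.last_used[ip] = now


probes = []
l3 = L3Offload(probes.append)
print(l3.route_add("1.1.1.1/32", ["192.168.200.3", "192.168.200.4"]))  # ['cpu']
print(l3.neigh_resolved("192.168.200.3", "44:38:39:00:00:03"))
```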
