


28 May, Inter Domain Routing Border Gateway Protocol 4

Presented by Tomas Winkler and Innocenty Sukhov

The Lecture Outline

  • Motivation
  • BGP Overview/Big Picture
  • The BGP4 Protocol
  • The BGP4 Operations
  • BGP4 Extensions
  • Experience with BGP4
  • Future Work

The Internet Scaling Problem

  • Exponential growth of the Internet
  • Exhaustion of the class B network address space
  • Growth of routing tables beyond the ability of current hardware/software to manage

  • Exhaustion of the 32-bit IP address space

Exhaustion of Class B Space

  • The "three bears and Goldilocks" problem

Most organizations are larger than a class C network (254 hosts) but significantly smaller than a class B network (64K hosts); typically they have about 1000 to 5000 hosts

A class B fits, but wastes about 60K IP addresses

It would be nice to give them a few consecutive class C networks

But this exacerbates the routing table explosion problem

[Figure: class B address format: leading bits 10, Network (14 bits), Host (16 bits)]

Routing Table Explosion

Although all the client networks share the common prefix 209.185, they are class C addresses, so the provider has to announce each network individually

[Figure: the ISP's client networks 209.185.8.0, 209.185.9.0, ..., 209.185.15.0, each announced separately to the global Internet mesh]

Classless InterDomain Routing

  • A tool to deal with exhaustion of the class B address space
  • Network prefix of arbitrary length
  • Provides a basis for slowing down the growth of routing tables

aggregation

  • Temporarily eases the problem of IP address space exhaustion while progress is made on a long-term solution

Aggregation with CIDR

[Figure: the ISP announces the single aggregate 209.185.8/21 to the global Internet mesh on behalf of client networks 209.185.8.0, 209.185.9.0, ..., 209.185.15.0]
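The aggregation in this example can be checked with a short sketch. It assumes that each client network is a /24 (a former class C network) and uses Python's standard `ipaddress` module; the variable names are illustrative only.

```python
import ipaddress

# The eight client networks from the slide: 209.185.8.0/24 through 209.185.15.0/24.
client_networks = [ipaddress.ip_network(f"209.185.{third}.0/24") for third in range(8, 16)]

# collapse_addresses merges adjacent networks into the fewest covering prefixes.
aggregates = list(ipaddress.collapse_addresses(client_networks))

for prefix in aggregates:
    print(prefix)  # the ISP needs to announce only these prefixes
```

Eight announcements collapse into one because 8 through 15 share the same upper five bits in the third octet, which gives a 21-bit common prefix.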

BGP4: CIDR Oriented EGP

  • CIDR is only a tool to enforce scalability
  • Global EGP must take advantage of CIDR to achieve scalability
  • BGP4 is an extension of BGP3 that provides support for routing info aggregation based on CIDR


The Internet as collection of AS’s

  • Autonomous System (AS)
  • The Internet as a Collection of AS’s

IGPs within AS

EGP between AS’s

  • Need to Deploy Policy Routing

money

security

laws

BGP4: The Big Picture

[Figure: AS 1 and AS 2, each running an IGP (OSPF, Area 0) internally, connected to each other by an EGP]

History of EGP Protocols

  • EGP2

unjustified topology restriction

  • loop-free tree

lack of a loop prevention mechanism

unreliable transport

periodic updates

  • high bandwidth / CPU requirements

classful routing

  • not scalable
History of EGP Protocols II

  • BGP1 - BGP2 - BGP3 - BGP4

reliable transport (TCP)

  • no periodical updates

arbitrary topologies

explicit loop detection mechanism

Most important BGP4 features:

  • CIDR support
  • built-in aggregation

Why a new Protocol ?

  • Policy making support
  • Special concern in scalability

BGP Key Features

  • Inter-AS routing protocol for IPv4 internetworks
  • No assumptions about underlying IGPs
  • No constraints on underlying topology
  • Info exchanged by BGP peers suffices

to detect routing loops

to enforce routing policy decisions

  • Key BGP4 Features:

path attributes

aggregation

Path Attributes

  • Provide flexibility and expandability
  • Well-known vs Optional
  • Path of ASs towards a destination

loop suppression

policy routing support

aggregation support

  • Next hops
  • Various metrics specifying degrees of preference for the route

Basic Routing Algorithms

  • Distance Vector (DV)

A router advertises whole routing table to its neighbors

Bad scaling (slow convergence)

Counting to infinity problem (loops)

Same metric has to be used in each router

how to apply a policy?

  • Link State (LS)

Link State Packets are flooded in the network

Packets are smaller than a routing table

Each router holds a map of entire network (LSDB)

No loops, but LSDB can be huge

BGP Algorithm

  • Path Vector Algorithm
  • Carrying a complete AS path makes it similar to LS
  • Exchanging only currently used routes between the peers makes it similar to DV

  • Initial exchange of complete RT
  • Incremental updates

Route announcements

Route withdrawals

  • Route aggregation

The BGP Protocol

  • Bringing up a BGP session
  • Message Types
  • Standard Path Attributes
  • Internal BGP vs. External BGP
  • Path Selection

BGP Session

  • BGP peers
  • Establishing a TCP connection
  • Bringing up a BGP session
  • Initial exchange of complete RT
  • Incremental updates
  • Keep alive
  • BGP session break down

BGP4 Message Types

  • Common Header
  • Message Types

OPEN

UPDATE

NOTIFICATION

KEEPALIVE

Common Header

  • Precedes every BGP message

Marker (16 octets)

synchronization and security

depends on the message being sent and the type of security used (if any)

may be predicted/verified by the other side

Length (2 octets): message length in bytes

Type (1 octet): message type

Additional octets following the common header are interpreted according to the Message Type field

[Figure: header layout with the fields Marker, Length, Type]
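The header layout can be sketched with Python's `struct` module. This is a minimal illustration, assuming the message type codes of the base BGP4 specification (1 to 4 in the order listed on the previous slide) and an all-ones Marker, which is the value used when no authentication is in effect; the helper names are hypothetical.

```python
import struct

HEADER_FORMAT = "!16sHB"  # Marker (16 octets), Length (2 octets), Type (1 octet)
HEADER_LENGTH = struct.calcsize(HEADER_FORMAT)  # 19 octets
MESSAGE_TYPES = {1: "OPEN", 2: "UPDATE", 3: "NOTIFICATION", 4: "KEEPALIVE"}
ALL_ONES_MARKER = b"\xff" * 16  # assumption: no authentication in use


def pack_message(msg_type, body=b""):
    """Prefix a message body with the common header."""
    length = HEADER_LENGTH + len(body)  # Length counts the header itself
    return struct.pack(HEADER_FORMAT, ALL_ONES_MARKER, length, msg_type) + body


def parse_header(data):
    """Return (marker, length, type name) for the message at the start of data."""
    marker, length, msg_type = struct.unpack(HEADER_FORMAT, data[:HEADER_LENGTH])
    return marker, length, MESSAGE_TYPES.get(msg_type, "UNKNOWN")


# A KEEPALIVE consists of the common header only.
keepalive = pack_message(4)
marker, length, name = parse_header(keepalive)
```

Because the Length field covers the whole message, a receiver can read 19 octets, learn the total length, and then read the remainder from the TCP stream.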

OPEN

  • My BGP version
  • My ASN: unique AS identifier
  • Hold Time: maximum length of time that one endpoint will wait to hear something from the other endpoint
  • BGP speaker Identifier (4 octets): unique router ID (usually the IP address of the router’s virtual interface)

  • Optional Parameters

Authentication mechanism

Multiprotocol Capabilities
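As a rough sketch, the fixed part of an OPEN body can be packed as shown below. The field widths (1-octet version, 2-octet ASN, 2-octet Hold Time, 4-octet identifier, 1-octet optional parameter length) follow the base BGP4 specification rather than the slide, and the ASN, hold time and router ID are made-up example values.

```python
import ipaddress
import struct

# Version, My ASN, Hold Time, BGP Identifier, Optional Parameters Length
OPEN_FORMAT = "!BHH4sB"


def pack_open(my_asn, hold_time, router_id, version=4):
    """Build an OPEN body that carries no optional parameters."""
    identifier = ipaddress.IPv4Address(router_id).packed
    return struct.pack(OPEN_FORMAT, version, my_asn, hold_time, identifier, 0)


def parse_open(body):
    """Decode the fixed part of an OPEN body into a dictionary."""
    size = struct.calcsize(OPEN_FORMAT)
    version, asn, hold_time, identifier, opt_len = struct.unpack(OPEN_FORMAT, body[:size])
    return {
        "version": version,
        "asn": asn,
        "hold_time": hold_time,
        "router_id": str(ipaddress.IPv4Address(identifier)),
        "optional_parameters_length": opt_len,
    }


# Hypothetical speaker: AS 65001, 180 s hold time, router ID 192.0.2.1
open_body = pack_open(65001, 180, "192.0.2.1")
decoded = parse_open(open_body)
```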

UPDATE

  • Primary message to exchange info between two BGP speakers

Prefix advertising and withdrawal

  • Withdrawn Routes: list of IP prefixes for which the sender no longer wishes to forward packets
  • Path Attributes: list of BGP attributes associated with the prefixes in the NLRI field
  • Network Layer Reachability Information (NLRI): list of prefixes for which the sender wishes to forward packets

Each attribute in the Path Attributes field applies to every prefix in the NLRI field.
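The prefixes in the Withdrawn Routes and NLRI fields are not spelled out on the slide. In the base BGP4 specification each prefix is carried as a one-octet length in bits followed by just enough octets to hold that many bits. The sketch below illustrates that encoding for IPv4; the function names are hypothetical.

```python
import ipaddress


def encode_prefix(prefix):
    """Encode an IPv4 prefix as <length in bits><minimal number of octets>."""
    network = ipaddress.ip_network(prefix)
    octets_needed = (network.prefixlen + 7) // 8
    return bytes([network.prefixlen]) + network.network_address.packed[:octets_needed]


def decode_prefixes(data):
    """Decode a concatenated list of encoded IPv4 prefixes."""
    prefixes = []
    offset = 0
    while offset < len(data):
        prefix_len = data[offset]
        octets_needed = (prefix_len + 7) // 8
        raw = data[offset + 1:offset + 1 + octets_needed]
        padded = raw + bytes(4 - octets_needed)  # pad back to a full 32-bit address
        address = int.from_bytes(padded, "big")
        prefixes.append(ipaddress.ip_network((address, prefix_len)))
        offset += 1 + octets_needed
    return prefixes


nlri = encode_prefix("209.185.8.0/21") + encode_prefix("138.39.0.0/16")
decoded_prefixes = decode_prefixes(nlri)
```

This variable-length form is what lets BGP4 carry prefixes of arbitrary length, which the classful BGP3 encoding could not express.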

Path Attributes

  • Most Important Feature

flexibility and expandability

  • Well-Known vs Optional

well-known attributes must be recognized by all BGP implementations

well-known: mandatory vs discretionary

  • Transitive: whether the attribute is passed to the other peers
  • Partial: whether all BGP speakers on the way understood an optional transitive attribute

  • Type + Length + Value (Types are assigned by IANA)
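A small sketch of how the flags and the Type + Length + Value layout might be decoded. The bit positions (Optional 0x80, Transitive 0x40, Partial 0x20) and the Extended Length bit (0x10, which selects a 2-octet length) are taken from the base BGP4 specification, not from the slide; the function name is hypothetical.

```python
FLAG_OPTIONAL = 0x80         # 0 means well-known
FLAG_TRANSITIVE = 0x40
FLAG_PARTIAL = 0x20
FLAG_EXTENDED_LENGTH = 0x10  # length field is 2 octets instead of 1


def parse_attribute(data):
    """Decode one path attribute from the start of data."""
    flags, type_code = data[0], data[1]
    if flags & FLAG_EXTENDED_LENGTH:
        length = int.from_bytes(data[2:4], "big")
        start = 4
    else:
        length = data[2]
        start = 3
    return {
        "optional": bool(flags & FLAG_OPTIONAL),
        "transitive": bool(flags & FLAG_TRANSITIVE),
        "partial": bool(flags & FLAG_PARTIAL),
        "type_code": type_code,
        "value": data[start:start + length],
    }


# ORIGIN (type code 1) is well-known and transitive; value 0 means IGP.
origin = parse_attribute(bytes([0x40, 1, 1, 0]))
```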

NOTIFICATION & KEEPALIVE

  • NOTIFICATION

A BGP speaker sends a NOTIFICATION message to signal an error before the TCP connection is closed

The Error Code field identifies the type of error

  • KEEPALIVE

BGP neighbors send KEEPALIVE messages to each other to confirm that the connection is still active. Some data (an UPDATE or KEEPALIVE message) has to be sent before the hold timer expires.

Conceptual Model of Operation

  • Adj-RIBs-In (one per peer)

Prefixes learned from a particular peer

  • Loc-RIB (one per system)

Prefixes selected for use (forwarding)

  • Adj-RIBs-Out (one per peer)

Prefixes advertised to a particular peer

Route Accepting/Advertising

[Figure: routes from peers A, B and C enter Adj-RIB-In A, B and C; route accepting, guided by the policy DB, feeds the decision process, which chooses the best routes into the Loc-RIB; route advertising, again guided by the policy DB, fills Adj-RIB-Out A, B and C]
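The conceptual model can be sketched as three small tables. This is only an illustration under simplifying assumptions: the decision process here prefers the shortest AS path and nothing else, the accept and export callbacks stand in for the policy DB, and all class and method names are hypothetical.

```python
class Speaker:
    """Toy model of Adj-RIBs-In, Loc-RIB and Adj-RIBs-Out."""

    def __init__(self):
        self.adj_rib_in = {}  # peer -> {prefix: as_path}
        self.loc_rib = {}     # prefix -> (peer, as_path)

    def receive(self, peer, prefix, as_path, accept=lambda prefix, as_path: True):
        """Route accepting: store the route only if import policy allows it."""
        if accept(prefix, as_path):
            self.adj_rib_in.setdefault(peer, {})[prefix] = as_path

    def decide(self):
        """Decision process: keep one best route per prefix (shortest AS path)."""
        self.loc_rib = {}
        for peer, routes in self.adj_rib_in.items():
            for prefix, as_path in routes.items():
                best = self.loc_rib.get(prefix)
                if best is None or len(as_path) < len(best[1]):
                    self.loc_rib[prefix] = (peer, as_path)

    def adj_rib_out(self, peer, export=lambda prefix, as_path: True):
        """Route advertising: best routes allowed by export policy, never echoed back."""
        return {
            prefix: as_path
            for prefix, (learned_from, as_path) in self.loc_rib.items()
            if learned_from != peer and export(prefix, as_path)
        }


speaker = Speaker()
speaker.receive("A", "138.39.0.0/16", (100, 200, 15))
speaker.receive("B", "138.39.0.0/16", (47, 15))
speaker.decide()
```

Only the route that wins the decision process reaches the Adj-RIBs-Out, which is why peers see currently used routes rather than everything the speaker has heard.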

Base Standard Path Attributes

  • ORIGIN
  • AS-PATH
  • NEXT-HOP
  • MULTI-EXIT-DISCRIMINATOR
  • LOCAL-PREF
  • ATOMIC-AGGREGATE
  • AGGREGATOR
The ORIGIN Attribute

  • How a prefix came to be routed by BGP at the origin AS

IGP: the prefix is interior to the originating AS.

EGP: the prefix is learned via the Exterior Gateway Protocol (EGP2).

INCOMPLETE: the prefix was learned in some other way (usually, through static configuration).

The AS-PATH Attribute

  • ASs through which the announcement for the prefix has passed
  • Each AS adds its ASN to the AS-PATH

Routing loop detection and prevention

Support for policy making

  • AS-PATH is a sequence of segments

AS-SEQUENCE: ordered list of ASNs

AS-SET: unordered set of ASNs

support for aggregation of prefixes with different AS-PATHs

  • 138.39.0/17 AS path = (100 200 15)
  • 138.39.128/17 AS path = (47 200 15)
  • Aggregate 138.39/16 AS path = {100,47} (200,15)
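The two uses of AS-PATH shown above can be sketched in a few lines. The aggregation helper simply reproduces the slide's example by keeping the common trailing ASNs as an AS-SEQUENCE and putting the rest into an AS-SET; real implementations follow more detailed rules, and the function names and the local ASN 300 are hypothetical.

```python
def has_loop(as_path, my_asn):
    """A route whose AS-PATH already contains our own ASN is a loop."""
    return my_asn in as_path


def prepend(as_path, my_asn):
    """Each AS adds its ASN in front before passing the route on."""
    return (my_asn,) + tuple(as_path)


def aggregate_as_paths(path_a, path_b):
    """Return (AS-SET, AS-SEQUENCE) for two paths, as in the slide's example."""
    common = []
    for asn_a, asn_b in zip(reversed(path_a), reversed(path_b)):
        if asn_a != asn_b:
            break
        common.append(asn_a)
    common.reverse()
    keep = len(common)
    as_set = set(path_a[:len(path_a) - keep]) | set(path_b[:len(path_b) - keep])
    return as_set, tuple(common)


as_set, as_sequence = aggregate_as_paths((100, 200, 15), (47, 200, 15))
looped = has_loop(prepend((100, 200, 15), 300), 200)
```

The AS-SET keeps every ASN the component routes traversed, so loop detection still works after aggregation even though the ordering is lost.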

The NEXT-HOP Attribute

  • NEXT-HOP: IP address of the node to send packets to in order to get the packets closer to the destination.
  • Usually it is the address of the BGP speaker that sends the UPDATE message
  • Third party next hop: points to another router

For example, if the router for the next hop does not support BGP

NEXT-HOP Example

[Figure: Router A (BGP speaker), Router B (BGP speaker) and Router C (not a BGP speaker, attached to 138.39.0.0/16) share a LAN (e.g. FDDI); arrows show the UPDATE message passing through the BGP session and the traffic to 138.39.0.0/16]

MULTI-EXIT-DISCRIMINATOR

  • If two ASs connect to each other in more than one place, it is useful to choose the optimal link to reach a particular prefix in or behind that AS.

[Figure: AS1, AS2, AS3 and AS4; routers R11, R12, R21 and R22; Link A and Link B between AS1 and AS2]

MED

  • AS2 tells AS1 how close the prefix is to the endpoint inside AS2.
  • AS1 may then decide which link to AS2 is optimal for which destinations
  • AS2 advertises a prefix over both BGP sessions

if one of the links goes down, AS1 still has full connectivity to all prefixes in and behind AS2

NEXT-HOP alone would not work

  • Used in Provider/Subscriber model

might cause ’discrimination’

MED Example

ISP1 ignores MED from ISP2

In NY, ISP1 advises ISP2 to use Link B for traffic to Tokyo

In Tokyo, ISP1 advises ISP2 to use Link A for traffic to NY

As a result, ISP2 unfairly has a larger burden for carrying traffic

[Figure: ISP1 and ISP2 interconnected by Link A and Link B, with sites in NY and Tokyo]

LOCAL-PREF

  • An AS may have many points from which packets can leave the AS
  • Thus, an AS may know about many paths for reaching the same destination
  • LOCAL-PREF: a metric used to select among multiple paths to the same prefix

The default value is not specified by the protocol, which causes interoperability problems between vendors.

  • MED wouldn’t work
  • LOCAL-PREF is a ‘stronger’ metric than MED

LOCAL-PREF Example

[Figure: AS1, AS2, AS3 and AS4, with prefix 138.39.0.0/16]

ATOMIC-AGGREGATE

  • Informs about decisions made with respect to overlapping routes
  • Some subset of the prefix being advertised is not accessible or is routed differently

Router A hears 138.39.0.0/16 and 138.39.12.0/24 from B, and the attributes are different

A decides to use 138.39.0.0/16; it has to attach ATOMIC-AGGREGATE

A router C that receives the update must not de-aggregate the prefix

The path to some subset of the prefix might traverse ASs not listed in the AS-PATH

AGGREGATOR

  • If a BGP speaker performs aggregation on some address space it hears from peers and then announces to others, it may attach an AGGREGATOR attribute to the aggregated prefix
  • AGGREGATOR: AS number (ASN) and IP address of the router that performed the aggregation

Internal vs External BGP

  • To this point BGP was presented only as an EGP

Routing protocol that runs between ASs (E-BGP)

  • How are prefixes learned by a single router in an AS via a BGP session distributed to the other routers in the AS?

[Figure: routers R1, R2, R3, R4 and R5, with a BGP session]

I-BGP vs E-BGP

  • One possible way: inject the routes into IGP

Works only for small networks that do not carry a full routing table (IGP does not scale well)

  • Preferred approach: use I-BGP
  • E-BGP and I-BGP are the same BGP4

the same message types

the same path attributes

the same state machine

  • I-BGP is used between two routers in the same AS for distributing routes learned through E-BGP to the rest of the routers in the AS

I-BGP vs E-BGP

  • Different prefix readvertising rules

a prefix learned from an I-BGP neighbour cannot be readvertised to another I-BGP neighbour

  • To prevent looping of routing info within an AS
  • The major result of this rule: need for a full mesh of I-BGP connections in the AS

causes scaling problems

I-BGP vs E-BGP

  • An I-BGP connection does not require a direct physical link

corresponds to a TCP connection

  • In contrast, an E-BGP session typically corresponds to a physical link
  • LOCAL-PREF: when a router injects a route for a prefix into the I-BGP mesh within an AS, it is that router’s responsibility to establish the degree of preference for the route

BGP Route Selection Process

  • Route with highest LOCAL-PREF (computed locally)

In practice: route with shortest AS-PATH

  • Route with lowest MULTI-EXIT-DISCRIMINATOR
  • Route with lowest IGP cost to the NEXT-HOP
  • Route learned from the E-BGP neighbour with the lowest BGP ID
  • Route learned from the I-BGP neighbour with the lowest BGP ID
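The selection steps can be expressed as a single sort key. This is a sketch of one reading of the list above, not a vendor's actual algorithm: it assumes that the last two steps mean "prefer E-BGP-learned routes over I-BGP-learned ones, then break ties on the lowest BGP ID", and the `Route` fields are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    local_pref: int
    as_path: tuple
    med: int
    igp_cost: int      # IGP cost to the NEXT-HOP
    from_ebgp: bool
    peer_bgp_id: int


def preference_key(route):
    """Smaller keys are better; fields are compared left to right."""
    return (
        -route.local_pref,               # highest LOCAL-PREF first
        len(route.as_path),              # then shortest AS-PATH
        route.med,                       # then lowest MED
        route.igp_cost,                  # then lowest IGP cost to the NEXT-HOP
        0 if route.from_ebgp else 1,     # then E-BGP before I-BGP (assumed reading)
        route.peer_bgp_id,               # finally lowest BGP ID
    )


def select_best(routes):
    return min(routes, key=preference_key)


short_path = Route(100, (200, 15), 0, 10, True, 1)
preferred = Route(200, (100, 47, 200, 15), 0, 10, False, 2)
best = select_best([short_path, preferred])
```

The example shows why LOCAL-PREF is the "stronger" metric: the route with the higher LOCAL-PREF wins even though its AS-PATH is twice as long.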


BGP Operations

  • BGP/IGP Interaction
  • Interaction with other EGP protocols
  • Routing Policy and Transit vs Nontransit
  • Subscriber’s use of BGP

Multihoming

  • Provider’s use of BGP
BGP Example

[Figure: routers R1 to R6 in AS1, AS2 and AS3, connected by I-BGP and E-BGP sessions]

IGP/BGP Interaction

  • BGP runs over a TCP connection
  • A TCP connection is identified by

{ (src IP, src port), (dest IP, dest port) }

  • A router has usually many interfaces (IPs)

Which one to use for an E-BGP/I-BGP session ?

E-BGP Addressing Approach

  • An E-BGP session runs over the specific physical link connecting two neighbouring ASs

If the link goes down, the E-BGP session terminates

The peering routers can reach neither each other nor the prefixes heard from each other

[Figure: R1 in AS1 and R2 in AS2 joined by an E-BGP session over a link addressed 138.39.1.1/30 and 138.39.1.2/30]

I-BGP Addressing Approach

  • If a specific link fails, another route must be found
  • Use a virtual interface, not a specific physical one

A link failure is not visible to the I-BGP speaker as long as there is connectivity

The IGP must route the corresponding virtual addresses

[Figure: routers R1, R2 and R3, with link addresses 138.39.1.1/30 and 138.39.1.2/30]

Virtual Interfaces

  • R1 and R2 must know how to reach each other
  • Since R1 is not directly connected to 138.39.128.5, routing is needed

it is here that the IGP comes into play

[Figure: R1 and R2 joined by a physical link (138.39.1.1/30, 138.39.1.2/30); the I-BGP session runs between the virtual interfaces 138.39.128.1/30 and 138.39.128.5/30]

BGP4: Intermediate Summary

  • Motivation

Policy making support

The Internet scalability problems

  • BGP4 Key Features

Inter-AS routing protocol for IPv4 internetworks

No assumptions about underlying IGPs

No constraints on underlying topology

Address space aggregation (CIDR)

Info exchanged by BGP peers suffices

to detect routing loops

to enforce routing policy decisions

BGP4: Intermediate Summary

  • BGP Protocol elements

Path Vector Algorithm

Initial exchange of complete RT

Reliable Transport: Incremental updates

Route announcements/withdrawals

BGP4: Intermediate Summary

  • Bringing up a BGP session
  • Message Types

OPEN / UPDATE / KEEPALIVE / NOTIFICATION

  • Standard Path Attributes

ORIGIN AS-PATH NEXT-HOP MED LOCAL-PREF ATOMIC-AGGREGATE AGGREGATOR

  • Internal BGP vs External BGP
  • Path Selection (local decision)

The Lecture Outline

  • Motivation
  • BGP Overview/Big Picture
  • The BGP4 Protocol
  • The BGP4 Operations
  • Policy making
  • Multihoming
  • BGP4 Extensions
  • Experience with BGP4
  • Future Work

Making Routing Policy

  • Decide locally which routes to accept from a neighbor
  • Decide locally about the preference the BGP speaker will give to different routes to the same prefix
  • Decide locally which set of routes should be advertised to each BGP neighbor
  • The decision about which routes to accept from and advertise to various BGP neighbors has a profound impact on what traffic crosses a network

Policy Example

  • Each customer pays its ISP for connectivity to the Internet
  • Each ISP must provide connectivity to its customers
  • Each ISP must prevent its resources from being used inappropriately

Customer1 reaching Customer3 by going through ISP2

[Figure: ISP1, ISP2 and ISP3 interconnected, with Customer1, Customer2 and Customer3 attached]

The Policy Implementation

  • ISP1 must make sure that other ISPs (and their customers) can reach Customer1

ISP1 announces routes for any prefixes used by Customer1 to both ISP2 and ISP3

  • ISP1 must make sure that Customer1 can reach any destination in the Internet

ISP1 announces all the routes it knows about (e.g. from ISP2, ISP3) to Customer1

  • ISP1 does NOT advertise routes heard from ISP2 to ISP3, and vice versa
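ISP1's three rules reduce to a simple export check. The sketch below is an illustration under the assumption that every neighbor is labeled either a customer or a peer ISP; the labels, the neighbor table and the function name are hypothetical.

```python
# ISP1's view of its neighbors (hypothetical labels).
NEIGHBOR_ROLE = {
    "Customer1": "customer",
    "ISP2": "peer",
    "ISP3": "peer",
}


def should_advertise(learned_from, advertise_to):
    """Export policy of ISP1 for a route learned from one neighbor."""
    if learned_from == advertise_to:
        return False  # never send a route back to where it came from
    if NEIGHBOR_ROLE[learned_from] == "customer":
        return True   # customer prefixes go to everyone
    # Routes heard from another ISP are passed on to customers only.
    return NEIGHBOR_ROLE[advertise_to] == "customer"


exports = {
    (src, dst): should_advertise(src, dst)
    for src in NEIGHBOR_ROLE
    for dst in NEIGHBOR_ROLE
    if src != dst
}
```

Because ISP3 never hears ISP2's routes from ISP1, it has no path through ISP1 toward ISP2, so ISP1's backbone cannot be used for transit between the two.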

Transit vs Nontransit Services

  • Transit Service

The kind of service that an ISP provides to its customer, because the ISP is willing to allow traffic to transit its backbone on the way to other parts of the Internet

  • Nontransit Service

The kind of service that ISP1 provides to ISP2 (who doesn’t pay ISP1), because ISP1 is willing to accept traffic from ISP2 only if the traffic is bound to one of ISP1’s customers

Provider - Subscriber

  • Most customers of an ISP do not use BGP

Static Routing

  • The ISP’s router to which the customer is connected is configured with the prefix used by that customer
  • The router is also configured to announce a route to that prefix into BGP

An ISP rarely includes its customers’ address space in its IGP

  • A single misbehaving router may cause havoc

A small IGP (RIP) may run on the point-to-point link between the ISP and its customer to allow the customer to dynamically advertise its prefixes to the ISP.

A Singly Homed Subscriber

[Figure: Subscriber router R1 in AS1 (138.39.2.0/23) connected by E-BGP to Provider router R2 in AS2]

A Singly Homed Subscriber

  • R1 is configured to establish a BGP session with R2 in a particular AS and at a particular IP address
  • Similarly for R2
  • R1 announces the 138.39.2.0/23 prefix over the BGP session to R2

R1 may need to aggregate a number of more-specifics within the prefix

  • R2 filters the routes it hears from R1 so that R2 accepts advertisements only for 138.39.2.0/23

  • Filtering routing info flowing in the opposite direction
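R2's inbound filter can be sketched as an exact-match check against the subscriber's prefix. The optional flag that also admits more-specifics is an assumption added for illustration, as is the function name; `subnet_of` requires Python 3.7 or later.

```python
import ipaddress

SUBSCRIBER_PREFIX = ipaddress.ip_network("138.39.2.0/23")


def accept_from_r1(prefix, allow_more_specifics=False):
    """Inbound filter on R2 for routes heard from R1."""
    network = ipaddress.ip_network(prefix)
    if network == SUBSCRIBER_PREFIX:
        return True
    # Optional relaxation (not on the slide): accept subnets of the prefix.
    return allow_more_specifics and network.subnet_of(SUBSCRIBER_PREFIX)


announcements = ["138.39.2.0/23", "138.39.3.0/24", "0.0.0.0/0", "204.70.0.0/16"]
accepted = [p for p in announcements if accept_from_r1(p)]
```

A filter like this protects the provider from a misconfigured subscriber that announces a default route or someone else's address space.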

A Multihomed (MH) Subscriber

  • A single network has more than one connection to the Internet

Motivation: performance and reliability

Load Sharing: parallel use of multiple links

Physical layer: IMUX

Data link layer: Multi-Link PPP

Network layer: BGP

Intrinsic complexity and problems

TCP flow packet reordering

  • performance degradation (fast retransmit)

Non-routing problems: addressing, DNS, aggregation

MH with Static Routing

[Figure: Customer attached to routers R2 and R3 of ISP1 (routers R1, R2, R3); ISP2 to ISP5; an interconnect point]

Customer is statically routed on both R2 and R3

Because R2 connects to the interconnect point, much more traffic might be received through the R2-Customer line

If ISP1 were very large compared with the other ISPs, this behaviour might not be a problem, or the reverse imbalance might even result

Not flexible

MH to A Single Provider I

  • Two parallel links between the same two routers

IMUX (physical layer)

takes one serial stream of bits from the router and divides the bits equally between multiple circuits (and vice versa)

Multi-Link PPP (data link layer)

allows multiple links to be bundled together so that the IP layer sees only one virtual link

[Figure: R1 (ISP) and R2 (Customer) joined by two parallel links]

MH to A Single Provider I, ISP->Customer

  • BGP: using virtual interfaces
  • R1, R2 must support load sharing in the face of equal cost routes
  • Static routing to virtual interfaces

R1 is configured with 2 static routes: the next hops for the virtual interface 192.1.128.5 are 192.1.1.1 and 192.1.1.6

R2 advertises routes with next hop 192.1.128.5

  • Recursive lookup

Probabilistic load balancing

Depends on the number of advertised prefixes and the traffic

[Figure: R1 (ISP) and R2 (Customer) joined by two links addressed 192.1.1.1/30, 192.1.1.2/30 and 192.1.1.5/30, 192.1.1.6/30; virtual interfaces 192.1.128.1/30 and 192.1.128.5/30]

MH to A Single Provider I, Customer->ISP

  • Before considering complex configurations, reflect on whether the complexity is worth it
  • IMUX/Multilink PPP
  • E-BGP virtual interface peering

load sharing from Customer to ISP depends on the number of routes that Customer learns from the ISP

MH to A Single Provider II

  • ISP -> Customer
  • If the amount of traffic going to these two networks is roughly equal

Customer uses MED

ISP uses LOCAL-PREF

  • Break a prefix into subprefixes
  • Customer -> ISP
  • If the two networks produce about the same amount of traffic

Closest exit policy works

  • Get a number of routes from the ISP

[Figure: routers R1, R2 and R3 between ISP and Customer; Customer networks 138.39/16 and 204.70/16]

MH to A Single Provider III

  • ISP -> Customer
  • If the amount of traffic going to these two networks is roughly equal

Customer uses MED

ISP uses LOCAL-PREF

  • Break a prefix into subprefixes
  • Customer -> ISP

Alternate the link when sending packets to ISP

packet reordering

Get a number of routes from the ISP

[Figure: routers R1, R2 and R3 between ISP and Customer; Customer networks 138.39/16 and 204.70/16]

MH to A Single Provider IV

ISP -> Customer

If the amount of traffic going to these two networks is roughly equal

Customer uses MED

ISP uses LOCAL-PREF

Break a prefix into subprefixes

Customer -> ISP

If the two networks produce about the same amount of traffic

Closest exit policy works

Get number of routes from ISP

[Figure: routers R1, R2, R3 and R4 between ISP and Customer; Customer networks 138.39/16 and 204.70/16]

Most reliable configuration: no equipment is shared between the two links

MH to More Than One Provider

  • The same issues arise that complicate MH to a single provider
  • MH to multiple providers may be known across the whole Internet
  • Major issues are addressing and aggregation

[Figure: Customer with prefix 138.39.1/24 connected to ISP1 and ISP2; ISP3 and the aggregate 138.39/16 are also shown]

MH to More Than One Provider

  • Address space used by Customer is critical for load sharing from ISPs to Customer

Delegated to it by ISP1

Delegated to it by ISP2

Delegated to it by both ISP1 and ISP2

Obtained independently from an address registry

Address Space from ISP1

  • ISP1’s aggregation is not broken

Customer uses a more-specific out of ISP1’s aggregate

  • ISP2 must announce Customer’s prefix explicitly

ISP2 advertises a longer prefix than ISP1 does: traffic magnet

ISP1 may need to advertise Customer’s more-specific

Most of the traffic to Customer passes over the ISP2-Customer link

If ISP2 has a longer prefix than ISP1’s I-BGP, then the ISP1-Customer link might not be used at all
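The "traffic magnet" effect follows directly from longest-prefix matching, which a few lines of Python can illustrate. The routing table below is a hypothetical view from a third party such as ISP3, using the prefixes from this example; the function name and the sample destination addresses are made up.

```python
import ipaddress


def longest_match(destination, table):
    """Return the (prefix, next hop) entry with the longest matching prefix."""
    address = ipaddress.ip_address(destination)
    candidates = [(net, hop) for net, hop in table.items() if address in net]
    return max(candidates, key=lambda entry: entry[0].prefixlen, default=None)


# Hypothetical view from a third party: ISP1 announces only its aggregate,
# ISP2 announces Customer's more-specific prefix.
table = {
    ipaddress.ip_network("138.39.0.0/16"): "via ISP1",
    ipaddress.ip_network("138.39.1.0/24"): "via ISP2",
}

to_customer = longest_match("138.39.1.7", table)
to_other_isp1_address = longest_match("138.39.200.1", table)
```

Every router that hears both announcements sends Customer's traffic toward ISP2, regardless of AS-PATH length or any other attribute, because the comparison between routes only happens among routes for the same prefix.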

Address Space from ISP2

  • The behaviour is analogous to the behaviour in the previous example
  • Now ISP1 will announce the more specific route and attract traffic to it
  • The approach can work well depending on the relative size of the providers involved
  • The load sharing may be good if ISP1’s address space is used, but not good if ISP2’s address space is used

A Side Note About The Practice

  • We have assumed that Customer’s announcement of 138.39.1/24 to ISP2 would be accepted

as would ISP2’s subsequent announcement of that prefix to ISP1 and ISP3

  • That is not necessarily true

Some providers implement a new strict routing policy that rejects routes for prefixes

with lengths longer than the classful route would have been

with lengths greater than 19 for newly assigned address space

To encourage better aggregation

Address Space from Both ISP1 and ISP2

  • Optimal option from the perspective of aggregation
  • Degree of load sharing to Customer depends on the amount of traffic destined for the two prefixes

Bad if the amount of traffic is not equal: ISP2 doesn't know prefixes from ISP1

  • Lack of reliability

Inject ISP1’s prefix into ISP2 and vice versa

Again traffic magnet problem

The strict routing policy barriers

Independent Address Space

  • No problem of different parts of the Internet having longer matches than others

the whole Internet sees the same prefix

  • The strict routing policies issue
  • Aggregation is sacrificed
  • Load sharing

which provider uses which path to reach Customer

  • AS-PATH manipulation: make it artificially longer when announcing to ISP1 to make ISP2 and ISP3 use the other link

MH: Customer-ISP Load Sharing

  • The issue is basically the same as for a subscriber multihomed to a single provider.

If the areas of topology closest to the points where each ISP connects produce about the same amount of traffic

  • Closest exit policy works

Get a number of routes from the ISPs, favour one link over the others for reaching various parts of the Internet

Providers’ use of BGP

  • Route aggregation is critical to the survivability of the Internet routing system

  • External aggregation: global scalability
  • Internal aggregation
  • Filtering Transit Customers
  • Don’t accept just anything from customers
  • Manual and automated filters
  • Public Interconnect Points (PIP)
  • Layer 2 infrastructure (Ethernet/FDDI)
  • Providers buy point-to-point circuit to PIP
  • Rate-controlling individual ISPs
  • Prevention of illegal use of provider’s network

Foreword

  • The Internet has established a pattern of requiring more and more from its protocols
  • New requirements can be met with:

changes to a protocol’s implementation (TCP congestion control algorithms)

modest changes to the protocol (TCP SACK option)

a new version of the protocol (BGP4 to support CIDR)

a whole new protocol (HELLO/GGP/EGP/BGP)

  • Design of a protocol should not be hard-wired to a set of demands at any particular point of time

Must provide for extensibility

BGP Extensions

  • IBGP Scaling Problem

Route Reflection

AS Confederations

  • Route Flap Dampening
  • BGP Communities (’Route Coloring’)

  • TCP MD5 Authentication
  • Multiprotocol Extensions
  • Capabilities Negotiation
IBGP Scaling Problem

  • BGP speaker R3 cannot advertise a route to an IBGP neighbor R4 if R3 heard the route from another IBGP speaker R2

prevention of routing info looping within an AS

  • Result: full mesh of IBGP sessions required
  • Does not scale:

CPU

memory

bandwidth

[Figure: IBGP routers R1, R2, R3 and R4]

Route Reflection

  • Add hierarchy to IBGP to avoid full mesh
  • All internal peers are subdivided into:

route reflectors (RR)

route reflector clients (RRC)

  • A RR is a router that MAY readvertise routes between IBGP neighbors
  • A RRC is a router that depends on RR

to readvertise its routes to the entire AS

to learn about routes from the rest of the AS

Route Reflection Example

[Figure: route reflectors RR1, RR2 and RR3 and clients RRC1 to RRC4 in AS1, connected by IBGP; an EBGP neighbor in AS2; prefixes 138.39.0.0/16 and 128.4.0.0/16]

Route Reflection contd

  • If a RR hears a route from one of its clients, it readvertises it to all its remaining clients and to all other RRs
  • If a RR hears a route from another RR, it readvertises the route to its clients only
  • NEXT-HOP and LOCAL-PREF attributes must be preserved by RRs throughout the IBGP mesh

  • Preventing Routing Info Looping
  • ORIGINATOR-ID
  • CLUSTER-LIST
  • cluster: RR and its clients
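The readvertisement rules and the CLUSTER-LIST check can be sketched as two small functions. This is a simplified illustration: clients and other reflectors are plain sets of names, the cluster identifiers are made-up values, and the function names are hypothetical.

```python
def reflect_targets(learned_from, clients, other_reflectors):
    """Return the IBGP neighbors a route reflector readvertises a route to."""
    if learned_from in clients:
        # From a client: to the remaining clients and to all other RRs.
        return (clients - {learned_from}) | other_reflectors
    if learned_from in other_reflectors:
        # From another RR: to our own clients only.
        return set(clients)
    raise ValueError(f"unknown IBGP neighbor: {learned_from}")


def accept_reflected(cluster_list, my_cluster_id):
    """Drop a route that has already passed through our own cluster."""
    return my_cluster_id not in cluster_list


clients = {"RRC1", "RRC2"}
other_reflectors = {"RR2", "RR3"}

from_client = reflect_targets("RRC1", clients, other_reflectors)
from_reflector = reflect_targets("RR2", clients, other_reflectors)
```

With these rules each client needs a session only to its reflector, so the number of IBGP sessions grows roughly linearly with the number of routers instead of quadratically.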

AS Confederations

  • Another approach to solving the full IBGP mesh problem
  • Divide and Conquer

split an AS into many sub-AS’s

full meshes are used in sub-AS’s only

  • Transparent from outside
  • A router within a confederation knows

ASN of the confederation (AS) itself

all the members (sub-AS’s) of the confederation

AS Confederation Example

[Figure: AS1 and AS2 with routers R1 and R2; an AS confederation made up of sub-AS’s 10, 11, 12, 13 and 14]

AS Confederations contd

  • EIBGP sessions

much like EGP sessions

LOCAL-PREF attribute may be readvertised

NEXT-HOP attribute must be readvertised

  • a single IGP is assumed to run across the entire confederation
  • New AS-PATH segment types to prevent looping of routing info

AS-CONFED-SEQUENCE

AS-CONFED-SET

Route Flap Dampening

  • Route Flapping Problem
  • Route Flap Dampening (RFD)

suppress advertisement of a flapping route until it becomes stable

  • Aggregation often helps to mask flapping
  • Aggregation does not obviate route flap dampening

using addresses that cannot be aggregated

even a highly aggregated route can flap

Dampening Example

[Figure: Customer, ISP1 and ISP2 connected in a chain by EBGP sessions; prefixes 138.39.1/24 and 138.39/16]

Route Flap Dampening contd

  • RFD is analogous to policy making
  • RFD takes into account the past stability of a route in deciding whether to use and readvertise it
  • Store a penalty value with each route

increased if the route flaps

decremented over time

  • BGP speaker does not "forget" about a withdrawn route
  • The extension is for EBGP only
  • Some ISPs dampen longer prefixes more aggressively than shorter prefixes
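The penalty mechanism can be sketched as follows. The slide says only that the penalty is increased on a flap and decremented over time; the exponential decay with a half-life, the suppress and reuse thresholds, and all numeric values below are assumptions chosen for illustration.

```python
class DampenedRoute:
    """Per-route flap penalty with exponential decay (illustrative parameters)."""

    def __init__(self, half_life=900.0, flap_penalty=1000.0,
                 suppress_limit=2000.0, reuse_limit=750.0):
        self.half_life = half_life
        self.flap_penalty = flap_penalty
        self.suppress_limit = suppress_limit
        self.reuse_limit = reuse_limit
        self.penalty = 0.0
        self.last_update = 0.0
        self.suppressed = False

    def _decay(self, now):
        # The penalty halves every half_life seconds.
        elapsed = now - self.last_update
        self.penalty *= 0.5 ** (elapsed / self.half_life)
        self.last_update = now

    def flap(self, now):
        """Record a flap (withdrawal or change) at time now, in seconds."""
        self._decay(now)
        self.penalty += self.flap_penalty
        if self.penalty >= self.suppress_limit:
            self.suppressed = True

    def usable(self, now):
        """True if the route may be used and readvertised at time now."""
        self._decay(now)
        if self.suppressed and self.penalty < self.reuse_limit:
            self.suppressed = False
        return not self.suppressed


route = DampenedRoute()
for t in (0.0, 1.0, 2.0):          # three flaps within two seconds
    route.flap(t)
suppressed_soon_after = not route.usable(3.0)
usable_much_later = route.usable(10000.0)
```

Keeping the penalty for withdrawn routes is what the slide means by the speaker not "forgetting" them: a route that reappears shortly after flapping is still suppressed.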

BGP Communities (’Route Coloring’)

  • Motivation: simplify configuration of complex routing policies
  • The attribute provides the ability to associate an identifier with a route
  • A set of routes that are supposed to be treated the same way with respect to policy can be assigned the same identifier

BGP Communities Example

[Figure: ISP1 (AS1) with router R1, ISP2 (AS2) and ISP3 (AS3); Customer1 (AS10), Customer2 (AS20) and Customer3 (AS30); prefixes 138.39/16, 204.70/16, 132.151/16, 206.1.5/24, 128.4/16, 18/8 and 192.1.10/24]

Communities Example contd

To enforce policy, R1 must be configured to advertise routes only for customers’ prefixes

  • Configure R1 with a list of all the prefixes used by ISP1's customers

does not scale well:

hundreds or thousands of prefixes

highly dynamic environment

  • Configure R1 with a list of all the AS's used by ISP1's customers

scales better, but is still static:

configuration must be replicated on each router speaking EBGP

Community Attribute

  • A list of individual community values (’colors’), 4 bytes each
  • A route may be associated with many BGP communities
  • To avoid conflict in community values, the high-order two octets contain an ASN
  • Consider the above example, now using the Community Attribute
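A community value can be built and taken apart with simple bit operations. In this sketch the well-known values for NO-EXPORT and NO-ADVERTISE are those of the BGP communities specification; the tag value 100 and the idea that ISP1 (AS1) uses it to mark customer routes are hypothetical.

```python
import struct

# Well-known community values from the BGP communities specification.
NO_EXPORT = 0xFFFFFF01
NO_ADVERTISE = 0xFFFFFF02


def make_community(asn, value):
    """High-order two octets: ASN; low-order two octets: locally chosen value."""
    if not (0 <= asn <= 0xFFFF and 0 <= value <= 0xFFFF):
        raise ValueError("ASN and value must each fit in two octets")
    return (asn << 16) | value


def split_community(community):
    return community >> 16, community & 0xFFFF


def encode_communities(communities):
    """Attribute value: a list of 4-byte community values."""
    return b"".join(struct.pack("!I", c) for c in communities)


# Hypothetical tag: ISP1 (AS1) marks its customers' routes with 1:100.
CUSTOMER_ROUTE = make_community(1, 100)


def advertise_to_other_isp(route_communities):
    """R1 advertises a route to other ISPs only if it carries the customer tag."""
    return CUSTOMER_ROUTE in route_communities and NO_EXPORT not in route_communities
```

With a tag like this, R1 needs one rule instead of a list of prefixes or ASNs, and a new customer route is handled correctly as soon as it is tagged where it enters the network.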

NO-EXPORT Community Value

[Figure: AS1, AS2 and AS3 with prefixes 138.39.0/17, 138.39.128/17 and the aggregate 138.39/16]

NO-ADVERTISE Community Value

[Figure: routers R1, R2 and R3 in AS1 and AS2, connected over FDDI with an EBGP session; prefixes 138.39.1/24 and 138.39/16]

Example: Automatic Backup Routes

[Figure: AS1, AS2 and customer AS3 connected by EBGP; prefixes 138.39/16 and 204.70/16 each appear as a primary and as a backup route]

TCP MD5 Authentication

  • OPEN message can contain an authentication info optional parameter

Indicates how to predict the value of the Marker field

  • Neither the base specification nor a later extension described its specific use
  • If the underlying TCP connection is not secure, there is no value in securing the application-layer BGP session

TCP RST messages may cause legitimate TCP endpoints to get out of synchronization, and the BGP session terminates
MD5 Authentication

  • TCP MD5 signature option (RFC 2385)

Connection-specific shared secret key

MD5 applies to:

  • TCP pseudo header

Src IP

Dst IP

Protocol number

TCP segment length

  • TCP header (excluding options, with the checksum assumed to be zero)
  • TCP segment data
  • The secret key
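The digest computation can be sketched with `hashlib`. This is an illustration only: it follows the ordering listed above, assumes (per RFC 2385) that the caller passes the fixed 20-byte TCP header with options excluded and the checksum field zeroed, and uses made-up addresses and key. It is not a substitute for the operating system's TCP MD5 support.

```python
import hashlib
import ipaddress
import struct

TCP_PROTOCOL_NUMBER = 6


def tcp_md5_digest(src_ip, dst_ip, base_header, segment_data, segment_length, key):
    """Compute the 16-byte MD5 signature for one TCP segment.

    base_header: the 20-byte TCP header without options, checksum zeroed
    segment_length: TCP segment length as used in the pseudo header
    """
    pseudo_header = (
        ipaddress.IPv4Address(src_ip).packed
        + ipaddress.IPv4Address(dst_ip).packed
        + struct.pack("!BBH", 0, TCP_PROTOCOL_NUMBER, segment_length)
    )
    digest = hashlib.md5()
    digest.update(pseudo_header)   # 1. TCP pseudo header
    digest.update(base_header)     # 2. TCP header
    digest.update(segment_data)    # 3. TCP segment data
    digest.update(key)             # 4. the shared secret key
    return digest.digest()


# Hypothetical segment between two BGP peers.
header = bytes(20)
payload = b"\xff" * 16 + b"\x00\x13\x04"   # a KEEPALIVE message
signature = tcp_md5_digest("192.0.2.1", "192.0.2.2", header, payload,
                           len(header) + len(payload), b"shared-secret")
```

A receiver recomputes the digest with its own copy of the key and silently drops the segment on a mismatch, so a forged RST from a party that does not know the key cannot tear down the session.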

Multiprotocol Extensions

  • There are parts of the Internet that can forward more than just IPv4

IPv6

IPX

  • There is an interest in using BGP to do routing for protocols other than IPv4

  • Two new attributes

MP-REACH-NLRI

MP-UNREACH-NLRI

Multiprotocol

  • The two attributes may be used only after the BGP session comes up

A BGP speaker using MP extensions must have at least one IPv4 address

  • MP-REACH-NLRI

Used to advertise MP routes

Address family id (as specified in RFC 1700)

Network address of next hop

List of layer 2 addresses of the next hop

useful for third-party next hop

NLRI
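The MP-REACH-NLRI fields listed above can be sketched as a byte layout. The sketch follows the field order of RFC 2858, which also carries a one-octet Subsequent Address Family Identifier that the slide does not list. The function names are hypothetical, the layer 2 (SNPA) list is left empty, and only the attribute value is built, without the path attribute header.

```python
import ipaddress
import struct

AFI_IPV6 = 2       # address family number for IPv6
SAFI_UNICAST = 1   # unicast forwarding


def encode_prefix(prefix: str) -> bytes:
    """Encode one NLRI entry: length in bits, then the minimum number of octets."""
    net = ipaddress.ip_network(prefix)
    octets = (net.prefixlen + 7) // 8
    return bytes([net.prefixlen]) + net.network_address.packed[:octets]


def mp_reach_nlri_value(next_hop: str, prefixes,
                        afi: int = AFI_IPV6, safi: int = SAFI_UNICAST) -> bytes:
    """Build the value of an MP-REACH-NLRI attribute with no SNPAs."""
    nh = ipaddress.ip_address(next_hop).packed
    value = struct.pack("!HBB", afi, safi, len(nh)) + nh
    value += b"\x00"  # number of SNPAs (layer 2 addresses of the next hop)
    value += b"".join(encode_prefix(p) for p in prefixes)
    return value
```

For example, `mp_reach_nlri_value("2001:db8::1", ["2001:db8::/32"])` yields a 26-byte value: 4 bytes of AFI, SAFI and next-hop length, 16 bytes of next hop, 1 byte for the SNPA count, and 5 bytes of NLRI.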

Capabilities Negotiation

  • Enables a BGP speaker to learn its neighbor’s capabilities with respect to protocol extensions

Multiprotocol attributes

  • New optional parameter of an OPEN message

Capabilities

  • Only one use of Capabilities: support of multiprotocol extensions

Address Family Identifier
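The Capabilities optional parameter and its multiprotocol entry can be sketched as follows. The type and code numbers (optional parameter type 2, capability code 1, a 4-octet value of AFI, reserved octet, SAFI) follow RFC 2842 and RFC 2858; the function names are hypothetical and the parser does no error recovery.

```python
import struct

OPT_PARAM_CAPABILITIES = 2   # OPEN optional parameter type
CAP_MULTIPROTOCOL = 1        # capability code for multiprotocol extensions


def mp_capability(afi: int, safi: int) -> bytes:
    """Encode one multiprotocol capability: code, length, AFI, reserved, SAFI."""
    value = struct.pack("!HBB", afi, 0, safi)
    return struct.pack("!BB", CAP_MULTIPROTOCOL, len(value)) + value


def capabilities_param(*capabilities: bytes) -> bytes:
    """Wrap encoded capabilities in a Capabilities optional parameter."""
    body = b"".join(capabilities)
    return struct.pack("!BB", OPT_PARAM_CAPABILITIES, len(body)) + body


def parse_capabilities(param: bytes):
    """Return a list of (capability code, value bytes) from one parameter."""
    ptype, plen = struct.unpack_from("!BB", param)
    if ptype != OPT_PARAM_CAPABILITIES:
        raise ValueError("not a Capabilities optional parameter")
    body, caps, i = param[2:2 + plen], [], 0
    while i + 2 <= len(body):
        code, clen = body[i], body[i + 1]
        caps.append((code, body[i + 2:i + 2 + clen]))
        i += 2 + clen
    return caps
```

A speaker that wants to exchange IPv6 unicast routes would send `capabilities_param(mp_capability(2, 1))` in its OPEN message and use the multiprotocol attributes only if the neighbor's OPEN lists the same capability.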

  • Motivation
  • BGP Overview/Big Picture
  • The BGP4 Protocol
  • The BGP4 Operations
  • BGP4 Extensions
  • Experience with BGP4
  • Future Work

BGP4 Implementations

  • Relatively simple to implement

About 2 man-months (the basic specification)

  • Multiple independent, interoperable implementations

Cisco

Gated

3Com

Bay Networks

Experience with BGP4

  • BGP4 has been in use since 1993
  • Heterogeneous inter-AS environment

Link bandwidth: 28 Kbits/sec ... 150 Mbits/sec

Routers: from a slow PC to a very high-performance RISC-based CPU

Special-purpose routers and general-purpose UNIX workstations

Topologies: from very sparse (ICM spanning tree) to quite dense (NSFNET backbone)

By 1995, BGP4 was used as the inter-AS routing protocol between all significant ASs

From 1 router to ~100 BGP speakers

Advantages of BGP4

  • Clear superiority of BGP4 compared to EGP2 and BGP3

Bandwidth consumption

CPU requirements

Memory requirements

Considerable slowdown of routing table growth

Extensibility

Very good in a stable environment

BGP4 Problems

  • Slow convergence after a fault

Average of 3 minutes (up to 15 minutes)

Packet loss grows by a factor of 30

Latency grows by a factor of 4

Loss of connectivity for ~30 seconds

  • Reasons

Software bugs (1995-1997)

Controversial policies

Ambiguity of BGP4 specifications

Artifacts of router vendor implementation decisions

Recommendations

  • AS-PATH loop detection at both the sender and the receiver
  • Implementation of MinRouteAdver per (destination prefix, peer) instead of on a per-peer basis
  • Path changes will still trigger temporary oscillations and require many seconds of restoration time
  • BGP convergence may be improved by

Synchronization

Diffusing updates

Additional state info

... at the expense of a more complex protocol and increased router overhead
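The first two recommendations above can be sketched together. This is a toy model under stated assumptions, not a router implementation: the class and method names are hypothetical, a peer is identified only by its ASN, the AS-PATH is a plain list of ASNs, and the 30-second default is the MinRouteAdvertisementInterval value suggested in RFC 1771.

```python
class Advertiser:
    """Loop detection on both sides plus a per-(prefix, peer) MinRouteAdver timer."""

    def __init__(self, local_asn: int, min_route_adver: float = 30.0):
        self.local_asn = local_asn
        self.min_route_adver = min_route_adver
        self._last_sent = {}  # (prefix, peer_asn) -> time of last advertisement

    def accept(self, as_path) -> bool:
        """Receiver-side check: reject a route whose AS-PATH contains our ASN."""
        return self.local_asn not in as_path

    def should_send(self, prefix: str, as_path, peer_asn: int, now: float) -> bool:
        """Sender-side check, then the rate limit for this (prefix, peer) pair."""
        if peer_asn in as_path:
            # The peer would discard the route as a loop, so do not send it.
            return False
        key = (prefix, peer_asn)
        last = self._last_sent.get(key)
        if last is not None and now - last < self.min_route_adver:
            return False  # still inside the interval for this prefix and peer
        self._last_sent[key] = now
        return True
```

Because the timer is keyed by (prefix, peer), an update for one prefix does not delay an unrelated prefix sent to the same peer, which is the point of moving away from a per-peer timer.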

Future Work

More AS-specific info may improve the route selection process

  • An optional vector corresponding to the AS-PATH, where each transit AS indicates its ’quality’

  • Diameter
  • Link speed
  • Capacity
  • Tendency to become congested
  • Stability

IDR: The Long Term Perspective

  • The IDR Working Group standardizes and promotes BGP4 and the ISO Inter-Domain Routing Protocol (IDRP) as scalable inter-AS routing protocols capable of supporting policy-based routing for TCP/IP internets.
  • The objective is to promote the use of BGP4 to support IPv4.
  • IDRP is seen as a protocol that will support IPv4 as well as the next generation of IP (IPv6).
  • The working group will plan a smooth transition between BGP4 and IDRP.

Relevant Documents and Literature

  • Internet draft ’A Border Gateway Protocol 4 (BGP4)’

  • RFC 1771 ’A Border Gateway Protocol 4 (BGP4)’
  • RFC 1772 ’Application of the BGP in the Internet’
  • RFC 1773 ’Experience with the BGP4 protocol’
  • RFC 1774 ’BGP4 Protocol analysis’
  • RFC 2439 ’BGP Route Flap Damping’
  • RFC 1519 ’Classless Inter-Domain Routing (CIDR)’
  • RFC 1518 ’An Architecture for IP Address Allocation with CIDR’
  • RFC 3065 ’Autonomous System Confederations for BGP’

Relevant Documents and Literature

  • RFC 2796 ’BGP Route Reflection: An alternative to full mesh IBGP’
  • RFC 1997 ’BGP Communities Attribute’
  • RFC 1998 ’An Application of the BGP Community Attribute in Multi-home Routing’

  • RFC 2385 ’Protection of BGP Sessions via the TCP MD5 Signature Option’
  • RFC 2858 ’Multiprotocol Extensions for BGP-4’
  • RFC 2842 ’Capabilities Advertisement with BGP-4’
  • RFCs 827/888/904 ’Exterior Gateway Protocol’

  • John W. Stewart III ’BGP4’
  • Labovitz, Ahuja, Bose ’Delayed Internet Routing Convergence’