Networking-Forums.com

Professional Discussions => Everything Else in the Data Center => Topic started by: NetworkGroover on July 01, 2015, 11:43:03 AM

Title: BGP in the DC - Draft white paper
Post by: NetworkGroover on July 01, 2015, 11:43:03 AM
Meh... looks like my work is going to be swallowed up into a larger design doc... so if you guys are interested, I uploaded my draft proposal for a practical BGP in the DC design doc to my LinkedIn profile under my current position until it gets published. Once published I'll broadcast the URL.  Let me know what you think (PM me if you don't know who I am).



Title: Re: BGP in the DC - Draft white paper
Post by: that1guy15 on July 01, 2015, 12:18:21 PM
Cool! I'll have a look when I get some time.

Thanks for sharing.
Title: Re: BGP in the DC - Draft white paper
Post by: Nerm on July 01, 2015, 01:44:27 PM
I would love to read it but I don't think I have anyone from here on LinkedIn (something I should probably change at some point).
Title: Re: BGP in the DC - Draft white paper
Post by: NetworkGroover on July 01, 2015, 02:00:57 PM
Yeah - you should rectify that.  It's a small world and you never know when someone from the forum shows up in your neck of the woods.

Just a little while ago... what.. four of us were able to get together and hang out?
Title: Re: BGP in the DC - Draft white paper
Post by: Reggle on July 01, 2015, 03:21:03 PM
Nice document. Only when you're talking about failover and reconvergence timers and BGP, I'd expect BFD to be mentioned somewhere. Did you forget it or is it not that interesting in this kind of design? If it's the latter, I'd like to know why. And I'm sure others do as well.
Title: Re: BGP in the DC - Draft white paper
Post by: NetworkGroover on July 01, 2015, 05:49:16 PM
Quote from: Reggle on July 01, 2015, 03:21:03 PM
Nice document. Only when you're talking about failover and reconvergence timers and BGP, I'd expect BFD to be mentioned somewhere. Did you forget it or is it not that interesting in this kind of design? If it's the latter, I'd like to know why. And I'm sure others do as well.

Thanks!  You probably missed it - it's in the section called "The Need for Fast Failure Detection" - last paragraph.  I just didn't write a huge section on it.

Title: Re: BGP in the DC - Draft white paper
Post by: wintermute000 on July 01, 2015, 09:54:22 PM
Quote from: Nerm on July 01, 2015, 01:44:27 PM
I would love to read it but I don't think I have anyone from here on LinkedIn (something I should probably change at some point).

Don't tell me you missed the linkedin-circle-jerk thread aka the belkin thread of the year
Title: Re: BGP in the DC - Draft white paper
Post by: NetworkGroover on July 01, 2015, 10:08:08 PM
Quote from: wintermute000 on July 01, 2015, 09:54:22 PM
Quote from: Nerm on July 01, 2015, 01:44:27 PM
I would love to read it but I don't think I have anyone from here on LinkedIn (something I should probably change at some point).

Don't tell me you missed the linkedin-circle-jerk thread aka the belkin thread of the year

lolwut
Title: Re: BGP in the DC - Draft white paper
Post by: icecream-guy on July 02, 2015, 06:42:06 AM
Quote from: AspiringNetworker on July 01, 2015, 10:08:08 PM
Quote from: wintermute000 on July 01, 2015, 09:54:22 PM
Quote from: Nerm on July 01, 2015, 01:44:27 PM
I would love to read it but I don't think I have anyone from here on LinkedIn (something I should probably change at some point).

Don't tell me you missed the linkedin-circle-jerk thread aka the belkin thread of the year

lolwut

another place, another time.
Title: Re: BGP in the DC - Draft white paper
Post by: deanwebb on July 02, 2015, 08:28:20 AM
OK, time for another LinkedIn roundabout...
Title: Re: BGP in the DC - Draft white paper
Post by: AnthonyC on July 04, 2015, 09:54:46 PM
Petr Lapukhov presented a BGP design for the data center at NANOG back in 2012, when he was with Microsoft.
Title: Re: BGP in the DC - Draft white paper
Post by: NetworkGroover on July 05, 2015, 11:58:10 AM
Quote from: AnthonyC on July 04, 2015, 09:54:46 PM
Petr Lapukhov presented a BGP design for the data center at NANOG back in 2012, when he was with Microsoft.

I love comments like these.  What was the point of that?

Yes, Petr Lapukhov did present a BGP Design for the DC, and you apparently missed the fact one of my listed sources was the IETF Draft he collaborated on - if you even looked at my paper at all.  Thanks though. 

The point of this wasn't to say I created the design.  It was to discuss the design in a practical manner, the whys, hows, etc.  - and to teach myself and to share a little knowledge with others so they didn't go through the same pains I did.
Title: Re: BGP in the DC - Draft white paper
Post by: AnthonyC on July 06, 2015, 11:32:06 AM
Quote from: AspiringNetworker on July 05, 2015, 11:58:10 AM
Quote from: AnthonyC on July 04, 2015, 09:54:46 PM
Petr Lapukhov presented a BGP design for the data center at NANOG back in 2012, when he was with Microsoft.

I love comments like these.  What was the point of that?

Yes, Petr Lapukhov did present a BGP Design for the DC, and you apparently missed the fact one of my listed sources was the IETF Draft he collaborated on - if you even looked at my paper at all.  Thanks though. 

The point of this wasn't to say I created the design.  It was to discuss the design in a practical manner, the whys, hows, etc.  - and to teach myself and to share a little knowledge with others so they didn't go through the same pains I did.

Why so defensive?  Others may find it interesting to read another paper on the topic.  Also, I don't see a link to the paper and I'm not sure what your LinkedIn profile is.
Title: Re: BGP in the DC - Draft white paper
Post by: NetworkGroover on July 06, 2015, 12:37:51 PM
Quote from: AnthonyC on July 06, 2015, 11:32:06 AM
Why so defensive?  Others may find it interesting to read another paper on the topic.  Also, I don't see a link to the paper and I'm not sure what your LinkedIn profile is.

Wow. My behavior was totally from being used to a certain type of behavior on other forums without even thinking about all of the factors involved.  My apologies - you're absolutely correct.  Heck, I just realized I have the pdf that you're talking about saved as part of my research. ;)

I'll PM you my LinkedIn profile if you're interested.  Sorry again.
Title: Re: BGP in the DC - Draft white paper
Post by: AnthonyC on July 06, 2015, 03:24:37 PM
No problem; it is the Internet and I have pretty thick skin. :)
Title: Re: BGP in the DC - Draft white paper
Post by: NetworkGroover on July 07, 2015, 02:08:40 PM
Heh - looks like the draft white paper is going to get a major overhaul.  Found out some pretty interesting things I'm in the middle of testing right now.  I was leaning toward eBGP between MLAG peers for a few convenience reasons, but now it looks like there's a large entity that uses the same AS at every single leaf.  That's huge because then we can use BGP dynamic neighbors at the spine and never need to add another line of BGP config for each leaf switch that gets added... sha-weet!
Title: Re: BGP in the DC - Draft white paper
Post by: that1guy15 on July 07, 2015, 02:17:24 PM
Then your spine switches run as route reflectors?

Dynamic neighbors is pretty cool and I can see how that would fit in well in this design. As long as everything is cookie-cutter then life is good.
Title: Re: BGP in the DC - Draft white paper
Post by: NetworkGroover on July 07, 2015, 02:55:38 PM
Quote from: that1guy15 on July 07, 2015, 02:17:24 PM
Then your spine switches run as route reflectors?

Dynamic neighbors is pretty cool and I can see how that would fit in well in this design. As long as everything is cookie-cutter then life is good.

No no - the spines run in their own separate AS (both spines share it) with no connection between them - that gives you easy loop prevention (think SPINE1 advertises to LEAF1, which in turn advertises to SPINE2 - SPINE2 sees its own AS in the path and doesn't add the route).  Just the leaf switches run with the same AS.  Sounds weird, but get the "iBGP mesh" thought out of your head - this is outside-the-box kind of stuff.  A new concept to me which I'm labbing up now to see how it works... but it's gotta work - the company that does this is huge.

EDIT - Re-reading this I see I did a crappy job of explaining.  So let me put it this way.

Say you have a two-tier spine/leaf DC design.  The spines both run AS 64600.  The way things are playing out, I'm seeing three options at the leaf:
1. Run a different AS at every leaf, such as 65000, 65001, 65002, etc.  This was my preferred way to do it, but that may be changing.
2. Run a different AS at each rack (one or two switches), such as 65000 in one rack, 65001 in another, etc.
3. Run the same AS at every leaf, such as 65000 - at every leaf.  If you're running two switches (MLAG,etc.) you iBGP peer them of course, and use allowas-in to accept routes from other leaf switches.  If this can be done, you can leverage dynamic neighbors at the spine and REALLY cut down on config.
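To make the loop-prevention and allowas-in reasoning above concrete, here is a minimal Python sketch of the AS_PATH check. It is only a toy model, not EOS behavior: the function names are made up for illustration, and treating allowas-in as "how many times my own AS may appear in the path" is an assumption about how the knob is typically defined.

```python
def accepts(local_as, as_path, allowas_in=0):
    """Return True if a router in local_as would accept a route with as_path.

    allowas_in is the number of times the router tolerates seeing its own AS
    in the path; 0 models the default eBGP loop check.
    """
    return as_path.count(local_as) <= allowas_in


def advertise(sender_as, as_path):
    """Model an eBGP advertisement: the sender puts its own AS in front."""
    return [sender_as] + list(as_path)


SPINE_AS = 64600  # both spines, as in the example above
LEAF_AS = 65000   # option 3: the same AS at every leaf

# LEAF1 originates a prefix and advertises it to SPINE1.
at_spine1 = advertise(LEAF_AS, [])
# SPINE1 advertises it down to another leaf in the same AS.
at_leaf2 = advertise(SPINE_AS, at_spine1)
# That leaf advertises it back up to SPINE2.
at_spine2 = advertise(LEAF_AS, at_leaf2)

print(accepts(SPINE_AS, at_spine1))              # spine takes the leaf route
print(accepts(LEAF_AS, at_leaf2))                # leaf rejects: own AS in path
print(accepts(LEAF_AS, at_leaf2, allowas_in=1))  # leaf accepts with allowas-in
print(accepts(SPINE_AS, at_spine2))              # SPINE2 rejects: own AS in path
```

The last line is the "easy loop prevention" case: the route that bounced through a leaf already carries 64600, so the second spine drops it without any filtering.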
Title: Re: BGP in the DC - Draft white paper
Post by: NetworkGroover on July 07, 2015, 03:32:08 PM
For example, at a spine with just 3 leaf switches, I went from this:

BGPDC-SPINE2(config)#sh run sec router bgp
router bgp 64600
   router-id 192.168.254.2
   bgp log-neighbor-changes
   distance bgp 20 200 200
   maximum-paths 32 ecmp 32
   neighbor eBGP_GROUP peer-group
   neighbor eBGP_GROUP fall-over bfd
   neighbor eBGP_GROUP password 7 gxo9zOfCTHZihMXNwE0BXQ==
   neighbor eBGP_GROUP maximum-routes 12000
   neighbor 192.168.255.17 peer-group eBGP_GROUP
   neighbor 192.168.255.17 remote-as 65000
   neighbor 192.168.255.19 peer-group eBGP_GROUP
   neighbor 192.168.255.19 remote-as 65001
   neighbor 192.168.255.21 peer-group eBGP_GROUP
   neighbor 192.168.255.21 remote-as 65002
   network 192.168.254.2/32
   aggregate-address 192.168.255.16/28 summary-only


To this:
BGPDC-SPINE2(config-router-bgp)#sh active
router bgp 64600
   router-id 192.168.254.2
   bgp log-neighbor-changes
   distance bgp 20 200 200
   maximum-paths 32 ecmp 32
   bgp listen range 192.168.255.0/24 peer-group ARISTA remote-as 65000
   neighbor ARISTA peer-group
   neighbor ARISTA fall-over bfd
   neighbor ARISTA password 7 6x5GIQqJNWigZDc2QCgeMg==
   neighbor ARISTA maximum-routes 12000
   network 192.168.254.2/32
   aggregate-address 192.168.255.16/28 summary-only


And it works:
BGPDC-SPINE1(config)#sh ip bgp summ
BGP summary information for VRF default
Router identifier 192.168.254.1, local AS number 64600
Neighbor         V  AS           MsgRcvd   MsgSent  InQ OutQ  Up/Down State  PfxRcd PfxAcc
192.168.255.1    4  65000             27        20    0    0 00:14:24 Estab  3      3
192.168.255.3    4  65000             33        21    0    0 00:14:22 Estab  3      3
192.168.255.5    4  65000              9         9    0    0 00:03:04 Estab  2      2

BGPDC-SPINE1(config)#sh ip route bgp

VRF name: default
Codes: C - connected, S - static, K - kernel,
       O - OSPF, IA - OSPF inter area, E1 - OSPF external type 1,
       E2 - OSPF external type 2, N1 - OSPF NSSA external type 1,
       N2 - OSPF NSSA external type2, B I - iBGP, B E - eBGP,
       R - RIP, I - ISIS, A B - BGP Aggregate, A O - OSPF Summary,
       NG - Nexthop Group Static Route

B E    192.168.10.0/24 [20/0] via 192.168.255.1, Ethernet1
                               via 192.168.255.3, Ethernet2
B E    192.168.20.0/24 [20/0] via 192.168.255.5, Ethernet3
B E    192.168.254.3/32 [20/0] via 192.168.255.1, Ethernet1
                                via 192.168.255.3, Ethernet2
B E    192.168.254.4/32 [20/0] via 192.168.255.1, Ethernet1
                                via 192.168.255.3, Ethernet2
B E    192.168.254.5/32 [20/0] via 192.168.255.5, Ethernet3
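For anyone who wants to see why the single listen range scales better, here is a rough Python sketch. It only models the idea (a peer is accepted if its source address is inside the range and it announces the one configured AS) and counts the per-leaf lines the static config needs. The helper names are hypothetical, and the matching logic is an assumption about how dynamic neighbors behave, not something taken from the EOS documentation.

```python
import ipaddress

# Values taken from the "bgp listen range" line in the spine config above.
LISTEN_RANGE = ipaddress.ip_network("192.168.255.0/24")
LISTEN_REMOTE_AS = 65000


def accepts_dynamic_peer(source_ip, peer_as):
    """Assumed rule: source must be in the range and use the configured AS."""
    in_range = ipaddress.ip_address(source_ip) in LISTEN_RANGE
    return in_range and peer_as == LISTEN_REMOTE_AS


def static_neighbor_lines(peers, peer_group="eBGP_GROUP"):
    """Build the two per-leaf lines used in the original static config."""
    lines = []
    for ip, asn in peers:
        lines.append(f"neighbor {ip} peer-group {peer_group}")
        lines.append(f"neighbor {ip} remote-as {asn}")
    return lines


# The three leaves from the first (static) config.
static_peers = [
    ("192.168.255.17", 65000),
    ("192.168.255.19", 65001),
    ("192.168.255.21", 65002),
]

print(len(static_neighbor_lines(static_peers)))       # per-leaf lines needed
print(accepts_dynamic_peer("192.168.255.1", 65000))   # same-AS leaf in range
print(accepts_dynamic_peer("192.168.255.19", 65001))  # different AS per leaf
print(accepts_dynamic_peer("10.0.0.1", 65000))        # outside the range
```

The static version grows by two lines per leaf, while the listen range stays at one line no matter how many same-AS leaves are added. The second-to-last call shows why the one-AS-per-leaf design doesn't fit a single listen range.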
Title: Re: BGP in the DC - Draft white paper
Post by: Otanx on July 07, 2015, 05:19:11 PM
If that works it is pretty cool. I read your paper, and was thinking that using the bgp listen range command on the spines would have been more helpful than on the leafs, but didn't see how it would work. Using the same AS on all the leafs would make adding leafs easy.

-Otanx
Title: Re: BGP in the DC - Draft white paper
Post by: that1guy15 on July 07, 2015, 06:15:03 PM
Interesting.

The single ASN for all leafs is a smart idea. I need to chew through all of this more!!
Title: Re: BGP in the DC - Draft white paper
Post by: NetworkGroover on July 08, 2015, 12:42:37 AM
Yeah I completely agree that using bgp listen at the spines is way better, but unfortunately the command in its current form requires an AS be specified, and you can't specify multiple. You could specify multiple peer groups but that's not really practical as you'd have a ton of repeated config (think bfd, authentication, etc. for each peer group). So having a different AS at each leaf obviously caused a problem there.  Using the same AS at every leaf makes it so that you can use bgp listen at the spine instead and as shown earlier drastically reduce config.

Some folks are making an argument though that they want to be able to trace routes back to a leaf using AS_PATH, which obviously won't be too easy if every leaf has the same AS (though I would think the server subnets below it would help with that, but whatever). So in that situation, as a happy medium, it looks like another option would be to use a different AS for each rack (single leaf or dual-leaf via MLAG).  That destroys bgp listen at the spine though, BUT there may be a fix coming down the road to allow the use of the command with multiple ASNs - which would help alleviate that.
Title: Re: BGP in the DC - Draft white paper
Post by: Otanx on July 08, 2015, 09:07:44 AM
Could you have each leaf prepend a different AS. So each leaf would be the same AS, but prepend a different AS? So something like

Leaf 1

router bgp 65001
neighbor 10.1.0.2 remote-as 65200
neighbor 10.1.0.2 route-map prepend out
!
route-map prepend permit 10
set as-path prepend 65042


Leaf 2

router bgp 65001
neighbor 10.1.0.2 remote-as 65200
neighbor 10.1.0.2 route-map prepend out
!
route-map prepend permit 10
set as-path prepend 65043


That is just one line of difference for each leaf which isn't bad.

-Otanx
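A quick Python sketch of what the prepend idea would look like from the receiving side. This is an illustration under assumptions, not tested behavior: it assumes the leaf applies the outbound route-map first and then puts its own AS in front (so the marker sits behind the shared leaf AS), and the marker-to-leaf table and function names are invented for the example.

```python
LEAF_AS = 65001    # shared by every leaf in the config above
SPINE_AS = 65200   # the remote-as used in the config above
MARKERS = {"leaf1": 65042, "leaf2": 65043}  # per-leaf prepend values


def advertise_with_prepend(local_as, as_path, prepend=()):
    """Assumed order: outbound prepend is applied, then local AS goes in front."""
    return [local_as] + list(prepend) + list(as_path)


def origin_leaf(as_path):
    """Trace a route back to a leaf by looking for its marker AS."""
    for name, marker in MARKERS.items():
        if marker in as_path:
            return name
    return None


# Leaf 1 originates a prefix and sends it to the spine with its marker.
at_spine = advertise_with_prepend(LEAF_AS, [], prepend=[MARKERS["leaf1"]])
# The spine passes it on to Leaf 2, adding its own AS in front.
at_leaf2 = [SPINE_AS] + at_spine

print(at_leaf2)                  # path as Leaf 2 would see it
print(origin_leaf(at_leaf2))     # which leaf the marker points to
print(at_leaf2.count(LEAF_AS))   # shared leaf AS still appears once
```

Since the shared leaf AS is still in the path, the other leaves would still need allowas-in. The marker only adds traceability.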
Title: Re: BGP in the DC - Draft white paper
Post by: NetworkGroover on July 08, 2015, 12:09:57 PM
Quote from: Otanx on July 08, 2015, 09:07:44 AM
Could you have each leaf prepend a different AS. So each leaf would be the same AS, but prepend a different AS? So something like

Leaf 1

router bgp 65001
neighbor 10.1.0.2 remote-as 65200
neighbor 10.1.0.2 route-map prepend out
!
route-map prepend permit 10
set as-path prepend 65042


Leaf 2

router bgp 65001
neighbor 10.1.0.2 remote-as 65200
neighbor 10.1.0.2 route-map prepend out
!
route-map prepend permit 10
set as-path prepend 65043


That is just one line of difference for each leaf which isn't bad.

-Otanx

Hmmm. Maybe? I don't see why not... allowas-in should still work I think. Eats up more ASNs which some people don't like and it's a little clunky but may get the job done.  I'll play with it.
Title: Re: BGP in the DC - Draft white paper
Post by: Otanx on July 08, 2015, 01:04:55 PM
If you want to trace to a leaf with AS_PATH then this gives you the simplified config, and only uses one more ASN than you would have anyway. If you don't care about tracing using AS_PATH then don't use the prepend.

-Otanx


Title: Re: BGP in the DC - Draft white paper
Post by: NetworkGroover on July 08, 2015, 04:26:22 PM
Quote from: Otanx on July 08, 2015, 01:04:55 PM
If you want to trace to a leaf with AS_PATH then this gives you the simplified config, and only uses one more ASN than you would have anyway. If you don't care about tracing using AS_PATH then don't use the prepend.

-Otanx

Oh, I was in no way knocking it.  Completely agree.
Title: Re: BGP in the DC - Draft white paper
Post by: Otanx on July 08, 2015, 06:02:48 PM
Right, I am just suggesting you can make it an optional part of the deployment. If you want tracing, add these two lines. If you don't, leave them out.

-Otanx
Title: Re: BGP in the DC - Draft white paper
Post by: NetworkGroover on July 09, 2015, 03:02:27 PM
Eeesh.  So my original paper is based off using a different AS at every leaf - even those that are MLAG'd together.

Just realized something... let's say I have a spine switch (SPINE1) connected to two leaf switches (LEAF1, LEAF2).  Both of the leaf switches are MLAG'd together and connected to the same server, advertising its subnet to the spine.

If the two leaf switches are in different ASs, does that create a routing loop?   I'm thinking LEAF1 advertises server subnet to SPINE1, and SPINE1 advertises it down to LEAF2 - will LEAF2 accept and add the route?  Need to test this... I feel like this is a simple networking 101 concept I'm forgetting about... but if it's true, I don't even see that as a viable design option - to address it would require filtering, and why do that when you can just put the two leaf switches in the same AS and address the problem.
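One way to reason about the question before labbing it is a small Python sketch of what LEAF2 would do with the path it hears from SPINE1. This is a thought experiment, not a lab result. It assumes LEAF2 also has the server subnet as a directly connected route (distance 0), that eBGP routes use distance 20 as in the "distance bgp 20 200 200" line from the spine config, and that no allowas-in is configured. The AS numbers and function names are for illustration only.

```python
CONNECTED_DISTANCE = 0  # assumed: server subnet is directly connected on LEAF2
EBGP_DISTANCE = 20      # first value of "distance bgp 20 200 200"


def passes_loop_check(local_as, as_path):
    """Default eBGP loop check: reject any path that contains our own AS."""
    return local_as not in as_path


def best_source(candidates):
    """Pick the route source with the lowest administrative distance."""
    return min(candidates, key=candidates.get)


def leaf2_view(leaf1_as, leaf2_as, spine_as=64600):
    """Return (BGP path accepted?, source that wins in the routing table)."""
    path_from_spine = [spine_as, leaf1_as]
    candidates = {"connected": CONNECTED_DISTANCE}
    accepted = passes_loop_check(leaf2_as, path_from_spine)
    if accepted:
        candidates["ebgp"] = EBGP_DISTANCE
    return accepted, best_source(candidates)


print(leaf2_view(65000, 65001))  # MLAG peers in different ASs
print(leaf2_view(65000, 65000))  # MLAG peers in the same AS
```

Under these assumptions, a different AS per MLAG peer means LEAF2 does accept the path via the spine, but the connected route still wins for as long as it is present. With the same AS the path is dropped by the loop check without any filtering. What happens when the connected route goes away is the part that still needs testing.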
Title: Re: BGP in the DC - Draft white paper
Post by: NetworkGroover on July 10, 2015, 07:48:26 PM
Yeah... paper's getting a major overhaul.  Will let you guys know once the updates are finished if you're interested.
Title: Re: BGP in the DC - Draft white paper
Post by: NetworkGroover on August 07, 2015, 06:41:33 PM
Well, I think I'm about done with the paper.  It's had a major overhaul and I've finally been talked out of running eBGP between MLAG'd switches - mostly for ease of automation.  You can find the updated paper on my LinkedIn profile - it will not be officially published as pieces of it will be used in a more holistic design doc to be released at a future date.
Title: Re: BGP in the DC - Draft white paper
Post by: burnyd on August 08, 2015, 10:32:22 AM
Gimme the link for eos central, I couldn't find it.
Title: Re: BGP in the DC - Draft white paper
Post by: routerdork on August 10, 2015, 09:30:46 AM
Go to arista.com and at the bottom left of the web page there is a link for Software Downloads. Click that and create a guest account (or link to your account if you have one) and then log in. Then you can download it. Once Software Downloads is selected after logging in, it's at the bottom under all the notification boxes; looks like there are ISOs and VMDKs. The 4.15.0F vmdk is about 450MB.
Title: Re: BGP in the DC - Draft white paper
Post by: routerdork on August 10, 2015, 09:32:40 AM
Quote from: burnyd on August 08, 2015, 10:32:22 AM
Gimme the link for eos central, I couldn't find it.
I'm assuming that's what you wanted by the way :) Either way it helped me.
Title: Re: BGP in the DC - Draft white paper
Post by: NetworkGroover on August 10, 2015, 10:51:43 AM
It's not going to be officially published.  It will be cannibalized into a larger design doc coming out at a future date being written by a separate team.  I'm sharing it with you fine folks since I love you so much. ;)  (And I trust your feedback)

It's on my LinkedIn profile.