BGP Community Sent and Received on both sides of an MPLS circuit

Started by Seittit, January 06, 2015, 02:35:52 PM

Seittit

Buenos Dias!

We are working on a solution that will allow us to transform an EIGRP tag value into a BGP community on a CE router and have it magically appear on the other CE router of the MPLS circuit. Our MPLS provider however uses communities we send as an indication of routing traffic within their network.

Is there anything we can slap onto a BGP route (taggish) and have it appear on both sides of an MPLS connection?

wintermute000

There's no reason why your provider can't pass communities on. You should talk to them to see if they can do that, as long as you don't reuse the community values they already use for steering.

Seittit

 :zomgwtfbbq:

maybe we got the wrong sales engineer on the line. any experience with Level3's MPLS circuit configuration?

*edit: i really wanted to use that gif*

wintermute000

If their bgp config within your vrf accepts communities IIRC it's transparent to the mpls. I'd wager it's something like they have a standard config somewhere to strip communities so it doesn't get mixed up with their internal usage or protocols

dely

In my experience with Verizon, L3 and CLink, none of them like to pass along communities.  They all strip them out in transit.  Thus far, we've not been successful at convincing any of them to cease so we are looking at alternatives. 

I'm more curious of what the end goal is, what are you trying to accomplish.  We were attempting basically the same thing when we discovered what the providers were doing...

wintermute000

Communities map to local preference / use for tag and filter is bgp design 101 :)


I must be lucky then as I haven't had many SPs take issue with sending communities. In fact I'm involved in a project in south america right now where across 2 different provider MPLS-VPNs we're merrily sending / receiving communities and we didn't even have to tell one of them, it just worked lol (though in fairness we are still waiting for the other to enable it)

Seittit

Quote from: dely on January 06, 2015, 07:11:44 PM
I'm more curious of what the end goal is, what are you trying to accomplish.

we have VZ currently providing MPLS circuits to branch sites, but we need provider redundancy in some field areas. No use in paying for two circuits while only using one, so we're going to split the network topology in half and forward one half's traffic out VZ and the other's out L3.

not too big of an issue, but we have many network appliances that NEED symmetrical routing (firewalls, wan optimizers, etc). so here's the big idea: on the network divide, tag value ABC on routes going towards VZ and tag value XYZ on routes going towards L3; tagging is done on the actual Layer3 interface. a route-map on the WAN routers will match that EIGRP tag and morph it into a BGP community; if L3 plays nicely we can have that community show up on our MPLS head end router, where it'll be redistributed into EIGRP with another route-map that morphs it back into a tag value. We can then have the freedom to work some routing juice and make everything hunky-dory.
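A rough way to picture that tag-to-community round trip, sketched in Python rather than router config. The tag numbers, the 65000:NNN community strings, and every function name here are made up for illustration; the only point is what happens to the tag when the provider does or does not pass communities.

```python
# Hypothetical values: tags 100/200 stand in for "ABC"/"XYZ",
# and the 65000:NNN communities are invented for this sketch.
TAG_TO_COMMUNITY = {
    100: "65000:100",  # routes that should ride VZ
    200: "65000:200",  # routes that should ride L3
}
COMMUNITY_TO_TAG = {comm: tag for tag, comm in TAG_TO_COMMUNITY.items()}


def eigrp_to_bgp(route):
    """Branch WAN router: EIGRP route goes into BGP, tag becomes a community."""
    community = TAG_TO_COMMUNITY.get(route["tag"])
    return {
        "prefix": route["prefix"],
        "communities": [community] if community else [],
    }


def provider_transit(bgp_route, strips_communities):
    """The MPLS provider either passes communities through or strips them."""
    if strips_communities:
        return {**bgp_route, "communities": []}
    return bgp_route


def bgp_to_eigrp(bgp_route):
    """Head-end router: BGP route goes into EIGRP, community becomes a tag."""
    tag = 0  # 0 means untagged in this sketch
    for community in bgp_route["communities"]:
        if community in COMMUNITY_TO_TAG:
            tag = COMMUNITY_TO_TAG[community]
            break
    return {"prefix": bgp_route["prefix"], "tag": tag}


branch_route = {"prefix": "10.1.1.0/24", "tag": 100}

passed = bgp_to_eigrp(provider_transit(eigrp_to_bgp(branch_route), False))
stripped = bgp_to_eigrp(provider_transit(eigrp_to_bgp(branch_route), True))

print(passed)    # the tag survives the round trip
print(stripped)  # the tag is lost, so the head end cannot tell the halves apart
```

If the provider strips communities, the head end sees every route untagged, which is why the whole plan hinges on the provider's behaviour.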

wintermute000

Basically you are the target customer for Cisco IWAN. I'd attach the PDF but unfortunately it was under a pre-sales NDA.

in a nutshell: DMVPN (even over private networks) + PfR (but rebranded lol) = IWAN, choose tunnel 1 or tunnel 2 dynamically. Add WaaS WAN acceleration (obviously). Look ma, akamai caching!

Pie in the sky aside, you could do it without community if you were willing to manually tag inbound on your data center. e.g. On ISP A aggregation, preference A (or whatever other BGP metric you choose) on prefix-list A, preference B on prefix-list B. Reverse for ISP B aggregation.
The site itself would basically operate as two logical sites with a back-door circuit (i.e. your switching backplane lol). But without tags you'd have to manually update your preferences.
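To make the manual version concrete, here is a small Python sketch of the decision the two aggregation routers would be making. The prefix-lists, local-preference values, and function names are all assumptions for illustration, and local preference is just one of the attributes that could be used.

```python
import ipaddress

# Hypothetical prefix-lists: which branch subnets belong to which half.
PREFIX_LIST_A = [ipaddress.ip_network("10.1.0.0/17")]
PREFIX_LIST_B = [ipaddress.ip_network("10.1.128.0/17")]

PREFERRED, BACKUP = 200, 100  # local-preference values, higher wins


def in_list(prefix, prefix_list):
    """True if the prefix falls inside any entry of the prefix-list."""
    net = ipaddress.ip_network(prefix)
    return any(net.subnet_of(entry) for entry in prefix_list)


def local_pref(prefix, learned_from):
    """Inbound policy on the aggregation router facing ISP 'A' or ISP 'B'."""
    if learned_from == "A":
        return PREFERRED if in_list(prefix, PREFIX_LIST_A) else BACKUP
    return PREFERRED if in_list(prefix, PREFIX_LIST_B) else BACKUP


def best_path(prefix):
    """Pick the ISP whose copy of the route has the higher local preference."""
    return max(("A", "B"), key=lambda isp: local_pref(prefix, isp))


print(best_path("10.1.10.0/24"))   # falls in list A
print(best_path("10.1.200.0/24"))  # falls in list B
```

The maintenance cost is visible in the two lists at the top: every new branch subnet has to be added by hand, on both sides.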

You have to be careful when mixing routes from 2 BGP domains and the possibility of advertising one into another (do you want to do that?). Lab it thoroughly, in particular, play around with different ASN lengths in the 'provider' part of your simulation, for example I've seen some interesting race conditions develop upon WAN flaps due to routes from one BGP cloud going into another. Normally you'd tag to AVOID this.

I'd go back to basics though and question your assertion of no use for paying for two circuits while only using one - so what happens in DR? If you are reliant on the combined bandwidth for normal operation, wouldn't having one circuit down mean you grind to a halt? And if you're not planning on maxing out your twin circuits and resigned to the fact that losing one means the site hobbles along, then why not just go one big one little (cheap) circuit?


You could also try a production PfR implementation  :partay:

Seittit

Quote from: wintermute000 on January 07, 2015, 01:05:42 AM
Basically you are the target customer for Cisco IWAN. I'd attach the PDF but unfortunately it was under a pre-sales NDA.

We've given Cisco six months and four technical sales meetings with iWAN's dev team to sell us on iWAN, but even they state our setup won't work for two major reasons:

1: we use multilayer switches as gateway routers for most subnets in large field sites and let the ISRs handle the WAN goodies. Since the initial route lookup takes place on the subnet's gateway, that's where you need PfR working. And since PfR is an ISR/ASR only feature, that hurts us.

2: we have two main hubs where all the traffic comes back, one in Texas and another in the US; we will also have a third opening in Mozambique. PfR v3 doesn't support more than one hub.

3: other issues include: new data license on all ISR routers, and the sheer fact that Cisco doesn't have a clear, precise vision for what iWAN is supposed to accomplish. Not a fan of the DMVPN packet overhead, MTU issues, waiting for WAAS 6.x to be released, Akamai license, etc etc etc. The 3945e with IWAN turned on is reduced to merely 40-45 Mbps throughput, which makes you evaluate ISR G3s, but they license throughput on those and there's honestly no sense in upgrading from G2 to G3 less than a year after moving from G1 to G2. We've been entertaining Ipanema's appliance, but we need a solution to take off before we can get live testing completed.

Quote from: wintermute000 on January 07, 2015, 01:05:42 AM
Pie in the sky aside, you could do it without community if you were willing to manually tag inbound on your data center. e.g.

this is a solution we initially discussed, but was dismissed as unsupportable. We have just six engineers on the team: only three of us could handle this task and all six of us are stretched incredibly thin. We need a solution that's as dynamic as we can get it, unfortunately.

Quote from: wintermute000 on January 07, 2015, 01:05:42 AM
I'd go back to basics though and question your assertion of no use for paying for two circuits while only using one - so what happens in DR? If you are reliant on the combined bandwidth for normal operation, wouldn't having one circuit down mean you grind to a halt? And if you're not planning on maxing out your twin circuits and resigned to the fact that losing one means the site hobbles along, then why not just go one big one little (cheap) circuit?

We have to sell the branch site managers on this, since they're the ones that foot the bill for two MPLS circuits. It's going to be a very difficult sell to them (and upper management) if they realize they're not getting very much bang for their buck. As for cheap circuits, it is a workable solution if there's a DSL local provider nearby. But many sites are unmanned and WAY THE HECK out there. The build cost for an MPLS circuit is comparable to the DSL (if not better) and that's what our business needs.

A tough situation all around, it's not going to be pretty

wintermute000

I deal with DMVPN regularly and yeah expect a random outage once a month. Things get even worse when you overlay NHRP issues with regular routing, IVRF vs FVRF, and how many people run the same hubs for DMVPN only sites vs sites where DMVPN is the backup. worse still they chuck DMVPN and WAN on the same routers, and then blindly follow the Cisco white paper so they have an EIGRP DMVPN vs a BGP WAN, then it's redistribution funtimes coz it's the same router... oh yeah VRFs as well... lol not a big fan of iWAN either for the reasons you've already gone into.

Cisco are going to lose big time with the ISR4k throughput licensing. my old company used to happily shove 2921s on 100M links and never had an issue as we knew what features we used and what we didn't. guess where they're going to look for their next WAN refresh....

Well if you want BGP and dynamic then you need community I'm afraid. Everything else is duct tape - GRE tunnels, NAT, route-maps, bgp export maps, combinations of all those fun and ugly tools

One remaining thing springs to mind - ASN. If you have a unique ASN per site, you could write regex to the same effect (i.e. anything from "^xyz_"). this means though you'd have to split all branches into 2 ASNs - the one you get from provider A and the one you get from provider B. What a kludge.
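For anyone who hasn't played with AS-path regex, a quick Python sketch of what a "^xyz_" style match does. The AS numbers 65001/65002 and the helper names are invented for this example; the "_" translation is an approximation of how Cisco treats it (a delimiter such as start, end, space, comma, brace, or parenthesis), not a full emulation of the IOS regex engine.

```python
import re


def cisco_to_python(pattern):
    """Approximate Cisco's "_" (any AS-path delimiter) with a Python group."""
    return pattern.replace("_", r"(?:^|$|[ ,{}()])")


def as_path_matches(pattern, as_path):
    """True if the AS path (space-separated string) matches the pattern."""
    return re.search(cisco_to_python(pattern), as_path) is not None


def classify(as_path):
    """Stand-in for a community: label the route by the first AS in the path."""
    if as_path_matches("^65001_", as_path):  # hypothetical "provider A" ASN
        return "A"
    if as_path_matches("^65002_", as_path):  # hypothetical "provider B" ASN
        return "B"
    return "unknown"


print(classify("65001 64512"))   # first AS is 65001
print(classify("65002 64512"))   # first AS is 65002
print(classify("650010 64512"))  # 650010 is not 65001, so no match
```

The last case is why the trailing "_" matters: without it, "^65001" would also match any longer ASN that merely starts with those digits.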

EDIT you could talk to WAN bonding vendors. I've deployed peplink balance appliances to great effect to aggregate 2x ADSL circuits for example. Though the problem here would be scale + cutover/integration - you basically need to shove the appliances in front of your routers - and you can't exactly do that on your prod hubs without affecting the prod network. We got away with it as we were a Layer 2 provider so we could just shave off VLANs and break out the bonding VLANs manually without touching other traffic. 

Seriously like I said the solution is to remember that it's usually not worth increasing complexity by 80% to get 20% more functionality, if you get my drift

Seittit

Quote from: wintermute000 on January 07, 2015, 05:44:25 AM
Cisco are going to lose big time with the ISR4k throughput licensing. my old company used to happily shove 2921s on 100M links and never had an issue as we knew what features we used and what we didn't.

Seriously like I said the solution is to remember that it's usually not worth increasing complexity by 80% to get 20% more functionality, if you get my drift

So much truth in those statements.

Quote from: wintermute000 on January 07, 2015, 05:44:25 AM
One remaining thing springs to mind - ASN. If you have a unique ASN per site, you could write regex to the same effect (i.e. anything from "^xyz_"). this means though you'd have to split all branches into 2 ASNs - the one you get from provider A and the one you get from provider B. What a kludge.

That's an outside the box approach, it may not work for the administrators but it should be considered. thanks for the input

wintermute000

On a final note, you could choose to tunnel everything with mGRE i.e. build a DMVPN without the annoying VPN bit. GRE tunnels are pretty reliable without the IPSEC on top, and you only lose 24 bytes of MTU (20 for the outer IP header plus 4 for GRE, a bit more if you set a tunnel key). Given that there are plenty of large DMVPN deployments happy with the Cisco (tm) recommended MTU of 1400....  Even if you don't make it multipoint you would still be able to run whatever you want over it... hey Cisco claims you can run MPLS over mGRE  :-\ 
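The MTU arithmetic for plain GRE over IPv4, as a small Python sketch. The function name is made up; the header sizes are the standard ones (20-byte outer IPv4 header without options, 4-byte base GRE header, plus 4 bytes when a tunnel key is configured), and a 1500-byte link MTU is assumed.

```python
OUTER_IPV4_HEADER = 20  # bytes, outer IPv4 header with no options
GRE_BASE_HEADER = 4     # bytes, GRE header with no optional fields
GRE_KEY_FIELD = 4       # bytes, added when a tunnel key is configured


def gre_payload_mtu(link_mtu, tunnel_key=False):
    """Largest inner packet that fits in one outer packet without fragmenting."""
    overhead = OUTER_IPV4_HEADER + GRE_BASE_HEADER
    if tunnel_key:
        overhead += GRE_KEY_FIELD
    return link_mtu - overhead


plain = gre_payload_mtu(1500)
keyed = gre_payload_mtu(1500, tunnel_key=True)

print(plain)         # room for the inner packet with plain GRE
print(keyed)         # room for the inner packet when a tunnel key is set
print(plain - 1400)  # slack left over if the tunnel MTU is set to 1400
```

So a tunnel MTU of 1400 leaves plenty of slack for unencrypted GRE; that figure is sized with IPsec overhead in mind.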

Though really I would be hesitant IRL, I find that DMVPN works great in pure hub and spoke, problem is that it always gets more complicated esp. if you start dealing with organically developing regional POPs and then the powers that be go 'oh we already have a VPN hub just use that'..... then you suddenly are dealing with multi-tier phase 3 designs that interlock back into your conventional routing at multiple points in the hierarchy....

BTW regex based filtering/route manipulation is a common CCIE lab scenario, and the good old ^$ trick is a fantastic one to have in your locker (for any stub sites esp. EIGRP stub, filter outbound BGP via ^$ i.e. only allow local originated routes, it's clean and no maintenance).
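The ^$ trick in miniature, as a Python sketch: an empty AS path means the route was originated locally, so matching "^$" outbound keeps a stub site from ever re-advertising routes it learned from someone else. The route dictionaries and AS numbers are invented for the example.

```python
import re

LOCAL_ONLY = re.compile(r"^$")  # matches only an empty AS path


def advertise_outbound(routes):
    """Keep only locally originated routes (those with an empty AS path)."""
    return [r for r in routes if LOCAL_ONLY.search(r["as_path"])]


bgp_table = [
    {"prefix": "10.1.1.0/24", "as_path": ""},               # originated here
    {"prefix": "10.2.0.0/16", "as_path": "65010 65020"},    # learned from a peer
    {"prefix": "10.3.0.0/16", "as_path": "65030"},          # learned from a peer
]

advertised = advertise_outbound(bgp_table)
print([r["prefix"] for r in advertised])  # only the locally originated prefix
```

Nothing in the filter names a prefix, which is the "no maintenance" part: new local subnets are advertised automatically and transit routes never are.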

It's fun tossing up these ideas if you don't have to worry about living with it LOL



Seittit

Quote from: wintermute000 on January 08, 2015, 02:40:31 AM
BTW regex based filtering/route manipulation is a common CCIE lab scenario, and the good old ^$ trick is a fantastic one to have in your locker (for any stub sites esp. EIGRP stub, filter outbound BGP via ^$ i.e. only allow local originated routes, it's clean and no maintenance).

for the sake of redundancy, we don't want to filter out BGP routes learned on our MPLS head end routers. We'd like to prefer one path over another, and in the event of a failure, switch over to the alternate path as quickly as possible.

BGP != quick

Here's a thought:

  • field site sends community ABC for subnets ABC to both MPLS providers
  • field site sends community XYZ for subnets XYZ to both MPLS providers
  • two MPLS routers back at datacenter, R1 peers with Provider 1 and R2 peers with Provider 2
  • router 1 redistributes all Provider 1 BGP routes into EIGRP, but applies a nasty EIGRP metric to anything with community XYZ
  • router 2 redistributes all Provider 2 BGP routes into EIGRP, but applies a nasty EIGRP metric to anything with community ABC

With some EIGRP manipulation at the site, you can ensure that subnets A, B, and C will traverse Provider 1 and all subnets X,Y, and Z will traverse Provider 2 when sending traffic to the datacenter.
Back in the datacenter, an aggregate router between routers 1 and 2 will make the final decision as to which provider to traverse when sending traffic to the site.
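A toy Python model of that plan, just to check the logic. The metric numbers, prefixes, and function names are made up, real EIGRP metrics are composite values rather than single integers, and the model assumes the communities actually arrive at R1 and R2.

```python
BASE_METRIC = 1000       # hypothetical metric for normally redistributed routes
PENALTY_METRIC = 100000  # the "nasty" metric

# Each head-end router penalizes the community that belongs on the other provider.
PENALIZED = {"R1": "XYZ", "R2": "ABC"}


def redistribute(router, bgp_routes):
    """Return {prefix: metric} as this router would inject into EIGRP."""
    eigrp = {}
    for route in bgp_routes:
        penalized = PENALIZED[router] in route["communities"]
        eigrp[route["prefix"]] = PENALTY_METRIC if penalized else BASE_METRIC
    return eigrp


def aggregate_choice(r1_routes, r2_routes):
    """Aggregate router: lowest metric wins, whatever is left wins on failure."""
    choice = {}
    for prefix in set(r1_routes) | set(r2_routes):
        candidates = []
        if prefix in r1_routes:
            candidates.append((r1_routes[prefix], "R1"))
        if prefix in r2_routes:
            candidates.append((r2_routes[prefix], "R2"))
        choice[prefix] = min(candidates)[1]
    return choice


# The field site sends both sets of subnets to both providers.
site_routes = [
    {"prefix": "10.1.1.0/24", "communities": ["ABC"]},
    {"prefix": "10.1.2.0/24", "communities": ["XYZ"]},
]

normal = aggregate_choice(redistribute("R1", site_routes),
                          redistribute("R2", site_routes))
provider1_down = aggregate_choice({}, redistribute("R2", site_routes))

print(normal)          # ABC subnet via R1, XYZ subnet via R2
print(provider1_down)  # everything via R2, penalized metric and all
```

Because the penalized routes stay in the table instead of being filtered, the failover case still has a path; how quickly it gets used depends on BGP and EIGRP convergence, which the model ignores.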

wintermute000

I'm not advocating you filter on ASN regex, I was referring to common practice (at least what I see around the traps!) for stub sites to use regex to filter outbound advertisement (to locally originated) in a manner that does not require manual updating of prefix lists.

What you're advocating would work as long as you remember to do the reverse (i.e. send communities from your DC routers and your branch routers act accordingly). But then again we're back @ square one i.e. needing communities to be passed transparently across both BGP WANs which was the whole question in the first place


Note you can use ASN regex to modify attributes and/or tag communities as well, it isn't just used for filtering, so again that's an option if communities are not possible

Otanx

Not sure how well it would work, but GRE tunnels, BGP peer over the GRE. Can now pass communities as the provider does not see them. However, you don't want the overhead so an inbound route-map that sets next-hop out to the same provider the GRE tunnel rides over. Doing the communities natively across the link would be better, but this may work as long as I don't have to support it.

-Otanx