Arista vEOS: Caveat

Started by wintermute000, May 12, 2016, 12:43:45 AM


wintermute000

Hi all


Just some knowledge I stumbled into, with a lot of help from my friendly neighbourhood SE :)


When using Arista vEOS (the virtual image, not real hardware) for labbing, the following features work fine on their own:
- MLAG
- VXLAN bridging, VXLAN routing and anycast GW

However, you cannot combine those features - MAC flapping galore. It's not stated in any public documentation, so if you're going mad trying to replicate designs in vEOS that perfectly align with the vendor guides, now you know why: if you want to lab VXLAN bridging/routing, don't lab MLAG as well, and vice versa.
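For context, the combination I mean is the usual "MLAG pair as one logical VTEP" setup from the vendor guides - roughly like this on each MLAG peer (interface names, IPs and VNIs below are placeholders, not my exact lab config):

! MLAG peering (mirrored on the other peer)
vlan 4094
   trunk group mlagpeer
interface Port-Channel10
   switchport mode trunk
   switchport trunk group mlagpeer
interface Vlan4094
   ip address 10.255.255.1/30
mlag configuration
   domain-id leafpair1
   local-interface Vlan4094
   peer-address 10.255.255.2
   peer-link Port-Channel10
!
! host-facing MLAG port-channel (the kind of port the MAC ends up flapping against)
interface Port-Channel7
   switchport access vlan 20
   mlag 7
!
! same VTEP address on both peers, so the pair looks like one logical VTEP
interface Loopback1
   ip address 1.1.1.1/32
interface Vxlan1
   vxlan source-interface Loopback1
   vxlan udp-port 4789
   vxlan vlan 10 vni 10010
   vxlan vlan 20 vni 10020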


Woohoo

NetworkGroover

Hmmmm.  I have MLAG + VXLAN bridging set up and working on my laptop, and I wasn't aware you could do VXLAN routing in vEOS-lab.  I'll have to double check that.

Can you elaborate on the MAC flapping issue?
Engineer by day, DJ by night, family first always

wintermute000

#2
1.) VXLAN bridging DID work together with MLAG, but it was wildly inconsistent. Some flows would work, some wouldn't, then mysteriously it would reverse. Bounce the MLAG, interfaces etc. and things would change around. Our local SE can corroborate - he was nice enough to jump into the CLI with me and confirmed there was nothing wrong with the syntax. The MAC flapping in particular was how he found it in the internal engineering DB - the switch was complaining that a host MAC was flapping between the MLAG port and the VXLAN tunnel, and obviously that MAC was not physically anywhere else in the overlay.


May 11 12:42:00 leaf1 PortSec: %ETH-4-HOST_FLAPPING: Host 00:50:56:02:9f:02 in VLAN 20 is flapping between interface Port-Channel7 and interface Vxlan1 (message repeated 1 times in 32.8424 secs)

YMMV - I actually built the lab twice, first in unetlab (QEMU) and then in ESXi, and I had DIFFERENT faults using the same configs (even the same 'patching', IP addresses etc.), i.e. different sets of hosts able or unable to ping. That was when I brought in the big guns to sanity check.

2.) VXLAN routing does work; in fact it kind of works with MLAG, but again it's wildly inconsistent. I just labbed up a standard underlay routing topology (i.e. south-north via a default route from the local VARP GW, north-south directly into the edge VTEP) and it seemed to work fine - though one of my end hosts refused to ping into the L3 domain (three others did, including one in the same VXLAN segment/VLAN). Again I put it down to virtual funny buggers.
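For reference, the anycast GW piece on the leafs was along these lines - the MAC and IPs below are placeholders rather than my actual lab values:

! shared virtual MAC for all VARP gateways
ip virtual-router mac-address 00:1c:73:00:00:01
!
! physical address is unique per leaf, virtual address is the shared anycast GW
interface Vlan10
   ip address 10.0.10.2/24
   ip virtual-router address 10.0.10.1
!
! south-north: hosts point at 10.0.10.1, the leaf then follows its default toward the edge
ip route 0.0.0.0/0 192.0.2.1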

NetworkGroover

#3
Quote from: wintermute000 on May 13, 2016, 01:05:38 AM
1.) VXLAN bridging DID work together with MLAG, but it was wildly inconsistent. Some flows would work, some wouldn't, then mysteriously it would reverse. Bounce the MLAG, interfaces etc. and things would change around. Our local SE can corroborate - he was nice enough to jump into the CLI with me and confirmed there was nothing wrong with the syntax. The MAC flapping in particular was how he found it in the internal engineering DB - the switch was complaining that a host MAC was flapping between the MLAG port and the VXLAN tunnel, and obviously that MAC was not physically anywhere else in the overlay.


May 11 12:42:00 leaf1 PortSec: %ETH-4-HOST_FLAPPING: Host 00:50:56:02:9f:02 in VLAN 20 is flapping between interface Port-Channel7 and interface Vxlan1 (message repeated 1 times in 32.8424 secs)

YMMV - I actually built the lab twice, first in unetlab (QEMU) and then in ESXi, and I had DIFFERENT faults using the same configs (even the same 'patching', IP addresses etc.), i.e. different sets of hosts able or unable to ping. That was when I brought in the big guns to sanity check.

2.) VXLAN routing does work; in fact it kind of works with MLAG, but again it's wildly inconsistent. I just labbed up a standard underlay routing topology (i.e. south-north via a default route from the local VARP GW, north-south directly into the edge VTEP) and it seemed to work fine - though one of my end hosts refused to ping into the L3 domain (three others did, including one in the same VXLAN segment/VLAN). Again I put it down to virtual funny buggers.

Mmmmm yeah - something's fishy here. I think you and I should get together at some point (though lately I have no time) on a GoToMeeting to take a look at it. I've seen the behavior you're mentioning - a host MAC moving between the MLAG and the VTI, or something similar - in my laptop environment, but 100% of the time a configuration change fixed it. I can't remember what it was, though, because I didn't write it down and it's only happened maybe twice the entire time I've been playing with it.

Can you diagram up or screenshot your environment so I can replicate it? If it's truly a caveat, I should see the same results.

Regarding the VXLAN routing... I'm not quite following your description, which is another reason I'd like to replicate your environment and see for myself.
Engineer by day, DJ by night, family first always

wintermute000

I typed a huge reply, but the stupid forum software rejected my Visio attachment and I lost everything.


Suffice to say: by VXLAN bridging, I mean I was trying to ping within the same VLAN, from host1 to host2, on VLAN/VRF10 or VLAN/VRF20.
By VXLAN routing, I mean I was trying to ping across VLANs/VRFs by configuring anycast GW and a logical VTEP with HER on the secondary loopbacks, e.g. host1 on VLAN/VRF10 to host2 on VLAN/VRF20.
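The logical VTEP / HER piece on each leaf pair looked roughly like this (again, addresses are placeholders, not my exact lab values):

! primary address is per-switch, secondary is shared across the MLAG pair (the logical VTEP)
interface Loopback1
   ip address 1.1.1.1/32
   ip address 10.100.100.1/32 secondary
!
! head-end replication: static flood list of the other logical VTEPs
interface Vxlan1
   vxlan source-interface Loopback1
   vxlan flood vtep 10.100.100.2 10.100.100.3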


Ignore the L3 core, I experienced all the errors just playing around in the overlay.


As soon as I removed the MLAGs (i.e. only kept leaf1, leaf3 and edge1), everything seemed fine. Before that, random stuff would not work, and it was inconsistent (i.e. bounce some stuff and things start pinging again, but maybe something else stops). Like I said, your counterpart in my neck of the woods spent an hour on the CLI with me going through all the configs and show commands, and he confirmed the lab was set up correctly from a configuration POV.

NetworkGroover

Where did you build this?  I don't think I have enough RAM (16G) to run this many instances on my laptop - it tends to lock up around 7 instances.

I'll try to rebuild this if I can find some time. A couple things:

- We support /31 routed links if you want to save IP space - particularly for your routed links in an ECMP environment (quick example after this list)
- What was the purpose of replicating the VARP config at the Edge?
- What are your Hosts?  How are you leveraging VRFs?
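For the /31s, it's just something like this on each end of a routed link (addresses are examples only):

! point-to-point routed link, /31 on each end
interface Ethernet1
   no switchport
   ip address 10.1.1.0/31
! the far end gets 10.1.1.1/31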

One thing you'll learn about me over time is that I trust no one - I don't believe it till I see it myself. That's not only because folks have been wrong in the past (including myself); right or wrong, I benefit from digging into it from a learning perspective.
Engineer by day, DJ by night, family first always

wintermute000

#6
1.) Thanks, noted.
2.) This is how you guys say to configure VXLAN routing in an underlay routing topology - the SE confirmed this is how it looks with that option. Assuming the edge leaf(s) talk L3 to the external router(s), north-south traffic enters the correct VXLAN segment/VLAN here. The SVIs are required so the edge GW knows which VLAN / VXLAN segment to route the north-south traffic into. Strictly speaking I don't think VARP is required, but for consistency's sake I just did it the same way as everything else. I also wanted to test VARP compatibility with traditional SVI routing.
If I were doing direct, indirect or central routing, then no, I would not configure VARP there.
3.) The hosts are Aristas. Each Arista has two VLANs and SVIs to replicate two hosts - VLAN10 and VLAN20. To separate them, they have to live in different VRFs, otherwise they would just ping each other directly on the same box!
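Roughly what one "host" box looks like - VRF names and IPs are placeholders, and the exact VRF syntax will depend on your EOS version:

! two VRFs so the two emulated hosts can't just route to each other locally
vrf definition HOST10
vrf definition HOST20
!
ip routing vrf HOST10
ip routing vrf HOST20
!
! one SVI per emulated host, each in its own VRF
interface Vlan10
   vrf forwarding HOST10
   ip address 10.0.10.11/24
interface Vlan20
   vrf forwarding HOST20
   ip address 10.0.20.11/24
!
! default routes point at the anycast GW addresses on the leafs
ip route vrf HOST10 0.0.0.0/0 10.0.10.1
ip route vrf HOST20 0.0.0.0/0 10.0.20.1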

wintermute000

Did you get anywhere validating this?