I don't work in a DC and my company uses a third party for ours, so I haven't really had the chance to keep up with current trends, technologies, etc. I've interviewed with a few places that have or plan to have DCs, so I've been reading up on anything I can find. I read an article by Juniper yesterday that covered using BGP with BFD pretty well.
One thing that got me thinking was their subnet choice. In their example they assumed each leaf had 48 ports for endpoints. Because of this they said a /26 should be assigned to each leaf so that every endpoint could have an address with a few left over (a /26 is 64 addresses, 62 usable, so 48 ports plus some spare), which they called the most efficient use of IP space. So my question is: was this article only talking about physical servers, and should I plan for a larger subnet if using VMs? Or is there something else going on with DC endpoints that I'm missing?
IMO unless you're tight on IP space, always leave yourself lots of room to grow, because nothing sucks more than re-doing your DC in a hurry, and as you said, eventually you're going to want to stick lots of VMs in there. Anyway, I would never set up a system of any kind with no room at all to grow.
I am just digging into this like you, but my understanding is that your VMs will be using VXLAN overlays, so the address space they are in is not part of the rack address space. I still don't like using /26s for a rack; as dlots says, you will end up with expansion problems the second you need a few VIPs for clusters or something. So unless you are running out of address space, I stick to /24s. It just makes it simple for the server guys who think /24s are a law or something and get confused when you start doing /25 or /26 masks. The less confusion the server people have, the less I have to prove it isn't the network.
-Otanx
If it helps at all, I wrote about this subject based on Petr Lapukhov's work:
http://aspiringnetworker.blogspot.com/2015/08/bgp-in-arista-data-center_90.html
Quote from: Otanx on October 08, 2015, 10:36:19 AM
I am just digging into this like you, but my understanding is that your VMs will be using VXLAN overlays, so the address space they are in is not part of the rack address space. I still don't like using /26s for a rack; as dlots says, you will end up with expansion problems the second you need a few VIPs for clusters or something. So unless you are running out of address space, I stick to /24s. It just makes it simple for the server guys who think /24s are a law or something and get confused when you start doing /25 or /26 masks. The less confusion the server people have, the less I have to prove it isn't the network.
-Otanx
Agree. Some very large data center customers who I can't name stick to /24s per rack.
get with the times /64's all around..... IPv6 :rock:
Quote from: AspiringNetworker on October 08, 2015, 11:10:32 AM
If it helps at all, I wrote about this subject based on Petr Lapukhov's work:
http://aspiringnetworker.blogspot.com/2015/08/bgp-in-arista-data-center_90.html
So, question on this. Very informative, by the way. Was using the same AS on each leaf just for the benefit of the configuration template? It doesn't seem that efficient to me to make it one thing and then prepend it with another; if you are prepending it, you are using it. Or maybe I missed something else in the concept?
Quote from: AspiringNetworker on October 08, 2015, 11:14:44 AM
Quote from: Otanx on October 08, 2015, 10:36:19 AM
I am just digging into this like you, but my understanding is that your VMs will be using VXLAN overlays, so the address space they are in is not part of the rack address space. I still don't like using /26s for a rack; as dlots says, you will end up with expansion problems the second you need a few VIPs for clusters or something. So unless you are running out of address space, I stick to /24s. It just makes it simple for the server guys who think /24s are a law or something and get confused when you start doing /25 or /26 masks. The less confusion the server people have, the less I have to prove it isn't the network.
-Otanx
Agree. Some very large data center customers who I can't name stick to /24s per rack.
I like the /24 idea so that I could coordinate the 2nd/3rd octets with the hostnames/AS's/racks/locations/etc. but I'm just anal like that.
Speaking of spine/leaf: can someone point out the reason why we don't have/want spine-to-spine connections (at least based on various designs that I have seen)? Is there some math reason based on Clos?
You don't need it. Traffic between two systems should traverse a leaf, a spine, and a leaf. Always the same fixed number of hops (layer 2 or layer 3), always the shortest path.
Exactly, and keeps latency/jitter always the same, enables ECMP, etc.
Yeah it does feel intuitively 'wrong' LOL to see core devices not connected to one another! Just got to train oneself out of that mindset when dealing with leaf/spine
I don't think the subnet size matters too much. As everyone has said, whether it's NSX or ACI or whatever, all the actual hosts are going to be on VXLAN segments tunnelled from your hosts and/or VTEPs, so you only need enough IPs for your hosts/VTEPs (and by host I mean in the VMware sense).
I think the greatest thing about the leaf/spine architecture is that it can scale well above five 9s. Depending on hardware, network architecture, and cash in pocket, adding more spines with servers multi-homed to different leaves provides enough redundancy that one can scale to 10 9s, 20 9s or more, to the point where more than half the network can be down without affecting services.
Quote from: ristau5741 on October 09, 2015, 07:52:12 AM
I think the greatest thing about the leaf/spine architecture is that it can scale well above five 9s. Depending on hardware, network architecture, and cash in pocket, adding more spines with servers multi-homed to different leaves provides enough redundancy that one can scale to 10 9s, 20 9s or more, to the point where more than half the network can be down without affecting services.
I'm not going to say it's unlimited scale, but it's close to it. After you fill up the ports on the spine you have to do a 3D model like Facebook did, but that's a ton of servers!
I posted this yesterday ironically.
https://danielhertzberg.wordpress.com/2015/10/11/333/
Quote from: burnyd on October 12, 2015, 11:38:15 AM
I posted this yesterday ironically.
https://danielhertzberg.wordpress.com/2015/10/11/333/
Great post dude!!
Quote from: ristau5741 on October 09, 2015, 07:52:12 AM
I think the greatest thing about the leaf/spine architecture is that it can scale well above five 9s. Depending on hardware, network architecture, and cash in pocket, adding more spines with servers multi-homed to different leaves provides enough redundancy that one can scale to 10 9s, 20 9s or more, to the point where more than half the network can be down without affecting services.
Yep... just imagine a 64-way ECMP design.
"One of our spine switches went down. Wanna go handle it?"
"Naahhh.. I'll deal with it next week. We only lost 1/64th of our bandwidth."
Quote from: burnyd on October 12, 2015, 11:38:15 AM
Quote from: ristau5741 on October 09, 2015, 07:52:12 AM
I think the greatest thing about the leaf/spine architecture is that it can scale well above five 9s. Depending on hardware, network architecture, and cash in pocket, adding more spines with servers multi-homed to different leaves provides enough redundancy that one can scale to 10 9s, 20 9s or more, to the point where more than half the network can be down without affecting services.
I'm not going to say it's unlimited scale, but it's close to it. After you fill up the ports on the spine you have to do a 3D model like Facebook did, but that's a ton of servers!
I posted this yesterday ironically.
https://danielhertzberg.wordpress.com/2015/10/11/333/
Cool! ;)
Quote from: AspiringNetworker on October 12, 2015, 02:09:02 PM
Quote from: ristau5741 on October 09, 2015, 07:52:12 AM
I think the greatest thing about the leaf/spine architecture is that it can scale well above five 9s. Depending on hardware, network architecture, and cash in pocket, adding more spines with servers multi-homed to different leaves provides enough redundancy that one can scale to 10 9s, 20 9s or more, to the point where more than half the network can be down without affecting services.
Yep... just imagine a 64-way ECMP design.
"One of our spine switches went down. Wanna go handle it?"
"Naahhh.. I'll deal with it next week. We only lost 1/64th of our bandwidth."
Haha, I couldn't even imagine the sheer number of leaf switches and the number of ports that would require. I read that in some of the 7500 documentation and it's like, holy crap, who in the world would need that?
Quote from: burnyd on October 12, 2015, 03:01:14 PM
Quote from: AspiringNetworker on October 12, 2015, 02:09:02 PM
Quote from: ristau5741 on October 09, 2015, 07:52:12 AM
I think the greatest thing about the leaf/spine architecture is that it can scale well above five 9s. Depending on hardware, network architecture, and cash in pocket, adding more spines with servers multi-homed to different leaves provides enough redundancy that one can scale to 10 9s, 20 9s or more, to the point where more than half the network can be down without affecting services.
Yep... just imagine a 64-way ECMP design.
"One of our spine switches went down. Wanna go handle it?"
"Naahhh.. I'll deal with it next week. We only lost 1/64th of our bandwidth."
Haha, I couldn't even imagine the sheer number of leaf switches and the number of ports that would require. I read that in some of the 7500 documentation and it's like, holy crap, who in the world would need that?
Right? Not any of the guys I've ever worked with.. I can assure you of that. The largest I've seen so far is only 4. I don't get to work with the big boys though.
Quote from: that1guy15 on October 12, 2015, 01:01:27 PM
Quote from: burnyd on October 12, 2015, 11:38:15 AM
I posted this yesterday ironically.
https://danielhertzberg.wordpress.com/2015/10/11/333/
Great post dude!!
Hey burnyd
Reading your post and this BGP bit
"The peerings in a BGP leaf spine architecture are rather easy. iBGP between each leaf switches and eBGP between each leaf to spine connection. ECMP is rather vital in this topology as BGP by default DOES NOT leverage multiple links in a ECMP fashion. So generally it has to be turned on."
Can you elaborate a little bit?
- Each spine has its own ASN (you wrote earlier). So you're not rolling with the IETF draft proposal idea where the spines share an ASN?
- But leaves do not connect to each other, nor do you have an underlying IGP to enable multi-hop iBGP - so what is each leaf switch iBGP peering to?
- Presumably the spines do not peer at all?
AspiringNetworker, in your article, re: the spines in each cluster/pod sharing the same ASN - they're not connected to each other since it's Clos, so you do not have any iBGP peering?
Quote from: routerdork on October 08, 2015, 02:16:07 PM
Quote from: AspiringNetworker on October 08, 2015, 11:10:32 AM
If it helps at all, I wrote about this subject based on Petr Lapukhov's work:
http://aspiringnetworker.blogspot.com/2015/08/bgp-in-arista-data-center_90.html
So, question on this. Very informative, by the way. Was using the same AS on each leaf just for the benefit of the configuration template? It doesn't seem that efficient to me to make it one thing and then prepend it with another; if you are prepending it, you are using it. Or maybe I missed something else in the concept?
The point of prepending was only to track routes via AS_PATH to a particular leaf switch if you desired to do so, since all leaf switches were in the same AS. Using the same AS obviously means you only need a single line in your configuration for that piece that you can apply to every leaf, and a single bgp listen command at every spine. The source route tracking isn't really required, but can be handy if you desire to leverage it.
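To make that concrete, here's a minimal sketch of what the per-leaf prepend could look like in EOS-style config (the route-map name, the 70001 "tag" ASN, and the neighbor addressing are illustrative, not from the article):
route-map TAG-LEAF1 permit 10
   set as-path prepend 70001
!
router bgp 65000
   ! apply the tag outbound toward the spine so the AS_PATH identifies this leaf
   neighbor 192.168.255.0 remote-as 64600
   neighbor 192.168.255.0 route-map TAG-LEAF1 out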
Quote from: wintermute000 on October 21, 2015, 12:23:55 AM
AspiringNetworker, in your article, re: the spines in each cluster/pod sharing the same ASN - they're not connected to each other since it's Clos, so you do not have any iBGP peering?
No, there is never any peering between spine switches in an L3 ECMP design. The reason the spine switches have the same ASN is BGP's built-in route loop prevention.
Imagine a route is advertised from "Spine1" down to "Leaf1". Leaf1 then in turn advertises it, right? So the route gets advertised to "Spine2". Spine2, since it's in the same ASN as Spine1, notices its own ASN in the AS_PATH of the route - so it doesn't accept the route. Built-in route loop prevention. (For example, a spine in AS 64600 would see its own 64600 in an AS_PATH like 65000 64600 and discard the route.)
Cheers
Which design have you seen more of in the wild? I.e., reuse ASNs or not?
I've been reading a lot of the Cisco Live presentations on VXLAN/MSDC/Nexus/DC Architecture/etc. Something I've been thinking about with all this new DC stuff...I have yet to see any article, presentation, etc. state anything about QoS. Maybe I haven't come across the right presentations yet but I would think this is still going to be needed?
Quote from: wintermute000 on October 21, 2015, 03:39:42 PM
Cheers
Which design have you seen more of in the wild? I.e., reuse ASNs or not?
Honestly I haven't dealt with any big guys yet personally, and those are usually the ones that deploy this. The ones I do know about from talking to other folks, though, use a per-rack AS and then re-use those ASNs at each "cluster" or "pod". With 4-byte ASNs you could skip the reuse and not worry about running out of ASNs even in a huge data center... but if you introduce another vendor that doesn't support them, you're SOL... which is why the re-use of ASNs at each pod of ToRs was chosen.
Quote from: routerdork on October 28, 2015, 09:36:43 AM
I've been reading a lot of the Cisco Live presentations on VXLAN/MSDC/Nexus/DC Architecture/etc. Something I've been thinking about with all this new DC stuff...I have yet to see any article, presentation, etc. state anything about QoS. Maybe I haven't come across the right presentations yet but I would think this is still going to be needed?
Yeahhh, that's a recurring theme I've noticed moving from working on Campus-type stuff to DC. I haven't once in my entire time dealing with folks in my current role configured QoS - not once. Well, that's a lie, I did for certification testing, but that certification testing was built around a Campus environment to support Voice. I dunno if it's just the huge pipes you see in the DC compared to the Campus or what, but honestly I'm thankful. I view QoS as a crutch and a PITA. I wouldn't want to introduce that into my DC, personally.
Just Googled and found this - apparently Ivan P. agrees, at least partially:
http://blog.ipspace.net/2015/02/do-we-need-qos-in-data-center.html
You need QoS when you run FCoE or iSCSI over the same converged LAN. That's a clear use case. But most people I've seen are still splitting out their storage networks, despite Cisco's massive push for CNAs and all the FC/FCoE stuff they shoehorned into the 7Ks. It doesn't help that most network jockeys like us don't know much about FC or FCoE - all I remember is the two hours of slides I got during my DCUFI training, LOL, and that's just a bit of a haze.
Quote from: wintermute000 on October 28, 2015, 04:49:22 PM
You need QoS when you run FCoE or iSCSI over the same converged LAN. That's a clear use case. But most people I've seen are still splitting out their storage networks, despite Cisco's massive push for CNAs and all the FC/FCoE stuff they shoehorned into the 7Ks. It doesn't help that most network jockeys like us don't know much about FC or FCoE - all I remember is the two hours of slides I got during my DCUFI training, LOL, and that's just a bit of a haze.
Yes, true. When you bring IP storage into the mix is when you start looking at things like DCBX, etc. I think while FC still has a presence, as speeds have increased to 10/40/100G, people are starting to see the value and cost savings of running IP-based storage rather than needing specialized adapters and other equipment, and people with that particular skill set.
Quote from: AspiringNetworker on October 28, 2015, 07:25:02 PM
I think while FC still has a presence, as speeds have increased to 10/40/100G, people are starting to see the value and cost savings of running IP-based storage rather than needing specialized adapters and other equipment, and people with that particular skill set.
Strictly speaking, DCBX etc. is for Ethernet-based storage, which is not only iSCSI but FCoE - and FCoE still requires FC knowledge and config, just like IP over Ethernet requires IP routing knowledge and config separate from Ethernet. IIRC all the FC stuff in the Nexus 7K, for example, is separate from the IP stuff and has to be configured and maintained separately - it just happens to run over layer 2 Ethernet. Anyhow, my point was that we IP guys are guilty of letting the issue lie: we're not FC-skilled, so we haven't exactly been pushing hard for convergence.
Good point re: increasing speeds of Ethernet. I don't know how FC has progressed, but as you say, especially in the 40 to 100G realm it makes sense to converge since you just have so much bandwidth. There's still the practical reality of the difference in priorities and mindsets between a traditional data network and a storage network.
Some interesting comments in this discussion I googled up allude to several limitations of FCoE that are hazily coming back to me now:
http://forums.theregister.co.uk/forum/1/2014/02/11/fcoe_faster_than_fibre_channel_who_knows/
FC needs to die. DCBX is being used in a lot of supercomputer/GPU setups from what I understand, due to the way pause frames work and how it's sort of lossless to a degree. But you really have to be pushing 25/40/100G links super hard to need anything like that.
Quote from: burnyd on October 29, 2015, 10:23:44 AM
FC needs to die. DCBX is being used in a lot of supercomputer/GPU setups from what I understand, due to the way pause frames work and how it's sort of lossless to a degree. But you really have to be pushing 25/40/100G links super hard to need anything like that.
+1 ... which of course doesn't make the FC guys happy since they have such a niche skill set but hey... everyone's got to adapt - not just the storage guys. Leave it to me to get so heavily involved in networking in a time where so much is changing so rapidly... ugh. At least in the DC space anyway... I guess I like making life hard on myself. ;P
Well, to be fair, I don't think people who work with big, booming, expensive SANs and arrays will be employed much longer unless they figure something out. That field is changing dramatically.
Not to mention that their secret sauce (FC) is basically a bunch of Ethernet-type concepts with different names to confuse the uninitiated.
But yeah, with VSAN, Nutanix, etc. there is definitely a massive shift away from the big, bad, break-the-bank SAN/shared-storage architecture that has dominated the last 20 years. I have a feeling Dell might have paid at the top of the market. At the other end of the scale - I was at a small customer the other day and he was showing me this crazy local-storage-like solution (looked like SAS cables) simultaneously cabled - and accessible concurrently - to all 3 of his hosts. He swore it was perfect and acted just like shared storage as far as he was concerned, but it cost the same as an external DAS, not a NAS or a SAN, and he didn't need to buy 10G switches/NIC cards/SFPs. The only thing 'wrong' with it was that he reckoned you have to manually remount it if a host reboots/fails, but in a small environment where you know that fact it's not such a big deal.
Quote from: routerdork on October 08, 2015, 02:16:07 PM
Quote from: AspiringNetworker on October 08, 2015, 11:10:32 AM
If it helps at all, I wrote about this subject based on Petr Lapukhov's work:
http://aspiringnetworker.blogspot.com/2015/08/bgp-in-arista-data-center_90.html
So, question on this. Very informative, by the way. Was using the same AS on each leaf just for the benefit of the configuration template? It doesn't seem that efficient to me to make it one thing and then prepend it with another; if you are prepending it, you are using it. Or maybe I missed something else in the concept?
So, necroing here, but I discovered something. I liked the idea of prepending for route source tracing, as was mentioned by another member, but as you mentioned - "If you are prepending it, you are using it" - and no, you didn't miss something - I did.
I was doing some testing with Ansible and got side-tracked with noticing this:
Here's the bgp table on one of my spines:
Every 2s: sh ip bgp Dec 19, 2015 07:13:47
BGP routing table information for VRF default
Router identifier 192.168.254.1, local AS number 64600
Route status codes: s - suppressed, * - valid, > - active, E - ECMP head, e - EC
S - Stale, c - Contributing to ECMP, b - backup
Origin codes: i - IGP, e - EGP, ? - incomplete
AS Path Attributes: Or-ID - Originator ID, C-LST - Cluster List, LL Nexthop - Li
Network Next Hop Metric LocPref Weight Path
* > 192.168.254.1/32 - 1 0 - i
* >Ec 192.168.254.3/32 192.168.255.1 0 100 0 65000 i
* ec 192.168.254.3/32 192.168.255.3 0 100 0 65000 i
* >Ec 192.168.254.4/32 192.168.255.3 0 100 0 65000 i
* ec 192.168.254.4/32 192.168.255.1 0 100 0 65000 i
* > 192.168.254.5/32 192.168.255.5 0 100 0 65001 i
This is a VXLAN environment where I'm not advertising the host subnets - only the loopbacks which are the VXLAN tunnel endpoint addresses. Now here's the table after I push route-map config via Ansible to prepend an ASN:
Every 2s: sh ip bgp Dec 19, 2015 07:17:21
BGP routing table information for VRF default
Router identifier 192.168.254.1, local AS number 64600
Route status codes: s - suppressed, * - valid, > - active, E - ECMP head, e - EC
S - Stale, c - Contributing to ECMP, b - backup
Origin codes: i - IGP, e - EGP, ? - incomplete
AS Path Attributes: Or-ID - Originator ID, C-LST - Cluster List, LL Nexthop - Li
Network Next Hop Metric LocPref Weight Path
* > 192.168.254.1/32 - 1 0 - i
* > 192.168.254.3/32 192.168.255.1 0 100 0 65000 70001
* 192.168.254.3/32 192.168.255.3 0 100 0 65000 70002
* > 192.168.254.4/32 192.168.255.3 0 100 0 65000 70002
* 192.168.254.4/32 192.168.255.1 0 100 0 65000 70001
* > 192.168.254.5/32 192.168.255.5 0 100 0 65001 70003
Now I see which ToRs are responsible for which routes (70001 is LEAF1, 70002 is LEAF2, etc.), but ECMP is gone. Now, THAT said, this kinda showed me another interesting tidbit... if you look at the old bgp table again:
Every 2s: sh ip bgp Dec 19, 2015 07:13:47
BGP routing table information for VRF default
Router identifier 192.168.254.1, local AS number 64600
Route status codes: s - suppressed, * - valid, > - active, E - ECMP head, e - EC
S - Stale, c - Contributing to ECMP, b - backup
Origin codes: i - IGP, e - EGP, ? - incomplete
AS Path Attributes: Or-ID - Originator ID, C-LST - Cluster List, LL Nexthop - Li
Network Next Hop Metric LocPref Weight Path
* > 192.168.254.1/32 - 1 0 - i
* >Ec 192.168.254.3/32 192.168.255.1 0 100 0 65000 i
* ec 192.168.254.3/32 192.168.255.3 0 100 0 65000 i
* >Ec 192.168.254.4/32 192.168.255.3 0 100 0 65000 i
* ec 192.168.254.4/32 192.168.255.1 0 100 0 65000 i
* > 192.168.254.5/32 192.168.255.5 0 100 0 65001 i
If we think about this in a VXLAN scenario... traffic hits a leaf switch, gets VXLAN-encapsulated, and is destined for 192.168.254.4, for example. There are two paths to that IP, because I have two leaf switches that are MLAG-peered in the same AS (iBGP peering). So, looking at the route on this spine for that IP:
BGPDC-SPINE1(config)#sh ip route 192.168.254.4
VRF name: default
Codes: C - connected, S - static, K - kernel,
O - OSPF, IA - OSPF inter area, E1 - OSPF external type 1,
E2 - OSPF external type 2, N1 - OSPF NSSA external type 1,
N2 - OSPF NSSA external type2, B I - iBGP, B E - eBGP,
R - RIP, I - ISIS, A B - BGP Aggregate, A O - OSPF Summary,
NG - Nexthop Group Static Route
B E 192.168.254.4/32 [200/0] via 192.168.255.1, Ethernet1
via 192.168.255.3, Ethernet2
So technically.. even though that IP is actually one hop away... because I have both leaf switches in the same AS, some traffic will be hashed to the other leaf switch and will have to cross the peer link in order to hit the VTEP IP... no big deal - iBGP will handle it.. but not the MOST optimal path to take. Prepending the AS for route source tracing actually also has a side effect of addressing this. Now my route to 192.168.254.4 points to the leaf switch that it resides on.
Then I wondered, what's the point? Aren't I effectively turning these leaves into discrete ASes by doing that? Well, sorta, but I do maintain the benefit of making the leaf switches an iBGP peering, which in turn enables me to leverage dynamic BGP peering at the spine without it getting too messy:
BGPDC-SPINE1(config)#sh run sec router bgp
router bgp 64600
router-id 192.168.254.1
maximum-paths 32 ecmp 32
**bgp listen range 192.168.255.0/30 peer-group ARISTA remote-as 65000
bgp listen range 192.168.255.4/31 peer-group ARISTA remote-as 65001**
neighbor ARISTA peer-group
neighbor ARISTA maximum-routes 12000
network 192.168.254.1/32
Using eBGP between peered leaf switches, this would be messy because I'd need a listen range statement for every single leaf switch - at least until they add the ability to specify an AS range in the command.
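And for completeness, a rough sketch of the corresponding leaf side (Leaf1 shown; the peer-link and spine-facing link addresses are assumed, everything else comes from the output above, and the same tag route-map is applied outbound on every session to match the AS_PATHs shown):
route-map TAG-LEAF1 permit 10
   set as-path prepend 70001
!
router bgp 65000
   router-id 192.168.254.3
   maximum-paths 32 ecmp 32
   ! iBGP to the MLAG peer (Leaf2) across the peer link - addressing assumed
   neighbor 10.255.255.2 remote-as 65000
   neighbor 10.255.255.2 route-map TAG-LEAF1 out
   ! eBGP up to each spine - link addressing assumed
   neighbor 192.168.255.0 remote-as 64600
   neighbor 192.168.255.0 route-map TAG-LEAF1 out
   neighbor 192.168.255.6 remote-as 64600
   neighbor 192.168.255.6 route-map TAG-LEAF1 out
   network 192.168.254.3/32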
"[size=0px] because I have both leaf switches in the same AS, some traffic will be hashed to the other leaf switch and will have to cross the peer link in order to hit the VTEP IP"[/size]
[/size]
[/size]Sorry confused. Do you mean the spine switch? i.e. ECMP so going via a different spine?
[/size]Not sure what you mean by hashed to the other leaf switch?
[/size]
[/size]Are the 192.168.1.3, 5 spines or leaves?
We're looking at traffic hitting a spine switch, then being ECMP-routed to two different leaf switches. Just pretend it's a two-tier leaf/spine with 2 spine switches and 3 leaf switches. Leaf1 and Leaf2 are MLAG and iBGP peers. Leaf3 is standalone.
.3, .4, .5 are leaf switch loopback0 - Leaf1, Leaf2, and Leaf3, respectively. These are used as the VTEP addresses. In my hypothetical scenario, pretend VXLAN traffic has left LEAF3 and is being sent to LEAF2 - it has to hit the Spine first and then the spine has to make a routing decision.
.1 is Spine1
Though I will say, this is probably where virtual VTEP comes in... will need to add that and look at it....
Yep... with virtual VTEP it looks better.
So instead of using Loopback0 on Leaf1 and 2 (within the same "rack"), I created Loopback1 on both leaves, gave them the same address (2.2.2.1), and advertised that instead. Now the spine looks cleaner:
BGPDC-SPINE1(config)#sh ip bgp
BGP routing table information for VRF default
Router identifier 192.168.254.1, local AS number 64600
Route status codes: s - suppressed, * - valid, > - active, E - ECMP head, e - ECMP
S - Stale, c - Contributing to ECMP, b - backup
Origin codes: i - IGP, e - EGP, ? - incomplete
AS Path Attributes: Or-ID - Originator ID, C-LST - Cluster List, LL Nexthop - Link Local Nexthop
Network Next Hop Metric LocPref Weight Path
* >Ec 2.2.2.1/32 192.168.255.1 0 100 0 65000 70001 i
* ec 2.2.2.1/32 192.168.255.3 0 100 0 65000 70002 i
* > 2.2.2.2/32 192.168.255.5 0 100 0 65001 70003 i
* > 192.168.254.1/32 - 1 0 - i
* > 192.168.254.3/32 192.168.255.1 0 100 0 65000 70001 i
* 192.168.254.3/32 192.168.255.3 0 100 0 65000 70002 70001 i
* > 192.168.254.4/32 192.168.255.3 0 100 0 65000 70002 i
* 192.168.254.4/32 192.168.255.1 0 100 0 65000 70001 70002 i
* > 192.168.254.5/32 192.168.255.5 0 100 0 65001 70003 i
Now you have ECMP, source route tracing, and the suboptimal path is eliminated. It won't matter which leaf the spine sends it to (which leaf of the pair, that is, of course).
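In case it helps anyone following along, a minimal sketch of that shared "virtual VTEP" loopback on the MLAG pair (the 2.2.2.1 address is from the output above; the VLAN/VNI mapping is made up):
! identical on both Leaf1 and Leaf2
interface Loopback1
   ip address 2.2.2.1/32
!
interface Vxlan1
   ! source the VTEP from the shared anycast loopback instead of Loopback0
   vxlan source-interface Loopback1
   vxlan vlan 10 vni 10010
!
router bgp 65000
   network 2.2.2.1/32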
I don't get it - why are you prepending in a leaf/spine network?
The idea was route source tracing just by looking at the AS_PATH you'll know what ToR it belongs to if you have pairs of them in the same AS.... or if you're using the same AS for all of them.
Quote from: AspiringNetworker on December 21, 2015, 05:53:20 PM
We're looking at traffic hitting a spine switch, then being ECMP-routed to two different leaf switches. Just pretend it's a two-tier leaf/spine with 2 spine switches and 3 leaf switches. Leaf1 and Leaf2 are MLAG and iBGP peers. Leaf3 is standalone.
.3, .4, .5 are leaf switch loopback0 - Leaf1, Leaf2, and Leaf3, respectively. These are used as the VTEP addresses. In my hypothetical scenario, pretend VXLAN traffic has left LEAF3 and is being sent to LEAF2 - it has to hit the Spine first and then the spine has to make a routing decision.
.1 is Spine1
Though I will say, this is probably where virtual VTEP comes in... will need to add that and look at it....
So first of all, I have never fooled around with a spine/leaf setup, so it's very possible I'm missing something obvious here.
With that said, I have a problem with your scenario.
Why would the IP of a VTEP exist in 2 different leaves?
Also, ECMP in a spine/leaf setup is, as far as I know, supposed to happen in the leaf.
I.e., LEAF3 has 2 paths to LEAF2, via either SPINE1 or SPINE2.
Quote from: matgar
Why would the IP of a VTEP exist in 2 different leaves?
Both leaves are effectively acting as a single logical unit - think of them as two ToRs in the same rack. To get to the same resources (servers, etc.), you could go to either leaf - why treat them as two discrete entities when they both do the same job?
Quote from: matgar
Also, ECMP in a spine/leaf setup is, as far as I know, supposed to happen in the leaf.
I.e., LEAF3 has 2 paths to LEAF2, via either SPINE1 or SPINE2.
Doesn't ECMP happen anywhere there is more than one route with equal costs to the same destination? So if the spine needed to reach a host, and there were two equal-cost paths to reach it via LEAF1 or LEAF2, is that not ECMP? I'm not being snarky here I'm seriously asking - I've been known for being dumb before and if I'm misunderstanding the definition of ECMP I'd like to alleviate that.
EDIT - Oh, and welcome to the forum.
Quote from: AspiringNetworker on December 22, 2015, 08:29:08 AM
The idea was route source tracing just by looking at the AS_PATH you'll know what ToR it belongs to if you have pairs of them in the same AS.... or if you're using the same AS for all of them.
I see. Generally you have enough bandwidth if traffic lands on ToR 1 for Tor 2 loopback / vtep that its not an issue. Haha if its an issue go ahead and add another spine switch. Also, go to your internal thing and look at the bgp unnumbered request I put in. The one that allows bgp over ipv6 link local ips and overlays ipv4. Right now today I want to use 169.254.x.x ips and keep reusing the ip's everywhere but its more of a political thing internally.
Quote from: burnyd on December 23, 2015, 10:05:13 AM
Quote from: AspiringNetworker on December 22, 2015, 08:29:08 AM
The idea was route source tracing just by looking at the AS_PATH you'll know what ToR it belongs to if you have pairs of them in the same AS.... or if you're using the same AS for all of them.
I see. Generally you have enough bandwidth if traffic lands on ToR 1 for Tor 2 loopback / vtep that its not an issue. Haha if its an issue go ahead and add another spine switch. Also, go to your internal thing and look at the bgp unnumbered request I put in. The one that allows bgp over ipv6 link local ips and overlays ipv4. Right now today I want to use 169.254.x.x ips and keep reusing the ip's everywhere but its more of a political thing internally.
It's not really to address any issue - just something you could do if you wanted to. I personally don't see a huge benefit, and in this case with VXLAN it's almost completely useless since the only thing I'm advertising is the loopback addresses which already identify each ToR. If it wasn't a 100% VXLAN environment and I was advertising host subnets I could somewhat see the benefit, but still it's optional.
Quote from: burnyd on December 23, 2015, 10:05:13 AM
Quote from: AspiringNetworker on December 22, 2015, 08:29:08 AM
The idea was route source tracing just by looking at the AS_PATH you'll know what ToR it belongs to if you have pairs of them in the same AS.... or if you're using the same AS for all of them.
I see. Generally you have enough bandwidth if traffic lands on ToR 1 for Tor 2 loopback / vtep that its not an issue. Haha if its an issue go ahead and add another spine switch. Also, go to your internal thing and look at the bgp unnumbered request I put in. The one that allows bgp over ipv6 link local ips and overlays ipv4. Right now today I want to use 169.254.x.x ips and keep reusing the ip's everywhere but its more of a political thing internally.
Cases like this are the reason IPv6 link-local is the way to go. Just need more adoption of IPv6... or more flexibility with MPBGP using IPv6 AFI to carry IPv4 prefixes.
Quote from: that1guy15 on December 23, 2015, 10:43:58 AM
Quote from: burnyd on December 23, 2015, 10:05:13 AM
Quote from: AspiringNetworker on December 22, 2015, 08:29:08 AM
The idea was route source tracing just by looking at the AS_PATH you'll know what ToR it belongs to if you have pairs of them in the same AS.... or if you're using the same AS for all of them.
I see. Generally you have enough bandwidth if traffic lands on ToR 1 for Tor 2 loopback / vtep that its not an issue. Haha if its an issue go ahead and add another spine switch. Also, go to your internal thing and look at the bgp unnumbered request I put in. The one that allows bgp over ipv6 link local ips and overlays ipv4. Right now today I want to use 169.254.x.x ips and keep reusing the ip's everywhere but its more of a political thing internally.
Cases like this are the reason IPv6 link-local is the way to go. Just need more adoption of IPv6... or more flexibility with MPBGP using IPv6 AFI to carry IPv4 prefixes.
Not even following what you guys are talking about.... lol. What's the goal? Why so complex? What's the driver to do whatever it is you're describing that current methods can't address? Just curious at this point.
Quote from: AspiringNetworker on December 23, 2015, 12:23:08 PM
Quote from: that1guy15 on December 23, 2015, 10:43:58 AM
Quote from: burnyd on December 23, 2015, 10:05:13 AM
Quote from: AspiringNetworker on December 22, 2015, 08:29:08 AM
The idea was route source tracing just by looking at the AS_PATH you'll know what ToR it belongs to if you have pairs of them in the same AS.... or if you're using the same AS for all of them.
I see. Generally you have enough bandwidth if traffic lands on ToR 1 for Tor 2 loopback / vtep that its not an issue. Haha if its an issue go ahead and add another spine switch. Also, go to your internal thing and look at the bgp unnumbered request I put in. The one that allows bgp over ipv6 link local ips and overlays ipv4. Right now today I want to use 169.254.x.x ips and keep reusing the ip's everywhere but its more of a political thing internally.
Cases like this are the reason IPv6 link-local is the way to go. Just need more adoption of IPv6... or more flexibility with MPBGP using IPv6 AFI to carry IPv4 prefixes.
Not even following what you guys are talking about.... lol. What's the goal? Why so complex? What's the driver to do whatever it is you're describing that current methods can't address? Just curious at this point.
qft!!!!!
Quote from: AspiringNetworker on December 23, 2015, 12:23:08 PM
Quote from: that1guy15 on December 23, 2015, 10:43:58 AM
Quote from: burnyd on December 23, 2015, 10:05:13 AM
Quote from: AspiringNetworker on December 22, 2015, 08:29:08 AM
The idea was route source tracing just by looking at the AS_PATH you'll know what ToR it belongs to if you have pairs of them in the same AS.... or if you're using the same AS for all of them.
I see. Generally you have enough bandwidth if traffic lands on ToR 1 for Tor 2 loopback / vtep that its not an issue. Haha if its an issue go ahead and add another spine switch. Also, go to your internal thing and look at the bgp unnumbered request I put in. The one that allows bgp over ipv6 link local ips and overlays ipv4. Right now today I want to use 169.254.x.x ips and keep reusing the ip's everywhere but its more of a political thing internally.
Cases like this are the reason IPv6 link-local is the way to go. Just need more adoption of IPv6... or more flexibility with MPBGP using IPv6 AFI to carry IPv4 prefixes.
Not even following what you guys are talking about.... lol. What's the goal? Why so complex? What's the driver to do whatever it is you're describing that current methods can't address? Just curious at this point.
https://docs.cumulusnetworks.com/display/DOCS/Configuring+Border+Gateway+Protocol+-+BGP
Check out the portion on BGP unnumbered.
My biggest driver is not needing to allocate and maintain address space for PTP links (/30s or /31s) when they are nothing but transit. With IPv6, the link-local peering establishes and you move on; no real need to provision anything else on the link.
Of course with this use-case and others there are factors you have to take into account. But from my standpoint it can simplify provisioning.
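For anyone who hasn't clicked through, the Cumulus-style config in that doc boils down to something like this (a sketch in FRR/Quagga syntax; the router-id and swp interface names are assumed) - no /30s or /31s to allocate, the session just forms over the IPv6 link-local addresses on each fabric port:
router bgp 65001
 bgp router-id 10.0.0.11
 ! "interface" peering = BGP unnumbered over the link-local address learned via RA
 neighbor swp51 interface remote-as external
 neighbor swp52 interface remote-as external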
Quote from: that1guy15 on December 24, 2015, 09:39:45 AM
My biggest driver is not needing to allocate and maintain address space for PTP links (/30s or /31s) when they are nothing but transit. With IPv6, the link-local peering establishes and you move on; no real need to provision anything else on the link.
Of course with this use-case and others there are factors you have to take into account. But from my standpoint it can simplify provisioning.
Elaborate? I'll have to try to read through this more thoroughly later, but I don't see how this really saves you any work. At first glance of the doc Dan linked it seems like it saves you some IPv4 space, but that's just because you're leveraging IPv6 underneath it? Now you have to configure dual stack? How does this work for vendor interop? How do you traceroute if you need to? How does this simplify automation/provisioning? This isn't me challenging you - this is me asking to be educated.
Well at first glance, what vendor interop? Unless you're mixing vendors in your leaf/spine fabric - but then again, ipv6 link local routing is standard for say OSPFv3.
What always blows my mind is ip unnumbered. Even if I know how to get it working, I just don't 'get' it, esp. when in context of routing protocols, like the cumulus proposal for ip unnumbered OSPF. WTF does the database look like? What are the transit link entries under the type 1? what about the type 2s?
Anyone know whether that works in Cisco land (IOU preferably lol), worth half an hour of my time to lab and see? (or excuse to finally lab up some cumulus? LOL I'll throw it on the ever growing pile containing vMX, vEOS, the new version 15 vSRX, yada yada)
Also re-reading some stuff above and aspiring networker, can you clarify - what peer link are you talking about in this quote?? I see your BGP / route table output but I am still confused at your setup. Do you have a diag? I mean, in a classic CLOS fabric, each spine is SUPPOSED to have only one link to each leaf, correct? The ECMP is from leaf to another leaf (i.e. via multiple spines).
"So technically.. even though that IP is actually one hop away... because I have both leaf switches in the same AS, some traffic will be hashed to the other leaf switch and will have to cross the peer link in order to hit the VTEP IP... no big deal - iBGP will handle it.. but not the MOST optimal path to take. Prepending the AS for route source tracing actually also has a side effect of addressing this. Now my route to 192.168.254.4 points to the leaf switch that it resides on."
As an aside, can you lab a VXLAN VTEP via vEOS? Multicast included I presume?
As for your MLAG / virtual VTEP... oh boy, more Ivan flashbacks
https://www.ipspace.net/Redundant_Server-to-Network_Connectivity
Just found out before we went on Xmas break that we lost that massive Arista opportunity, the customer went with Cisco. We're still not sure what went down, aside from the obvious pants dropping from the teal team. I'm curious to know whether there was any stick that accompanied the offer to lube up
re: That1guy, have you labbed this in any detail? the NH implications are interesting and painful.
http://www.noction.com/blog/ipv4_bgp_vs_ipv6_bgp
https://supportforums.cisco.com/document/84261/advertising-ipv6-prefixesroutes-over-ipv4-ebgp-peers
Quote from: AspiringNetworker on December 24, 2015, 12:40:12 PM
Quote from: that1guy15 on December 24, 2015, 09:39:45 AM
My biggest driver is not needing to allocate and maintain address space for PTP links (/30s or /31s) when they are nothing but transit. With IPv6, the link-local peering establishes and you move on; no real need to provision anything else on the link.
Of course with this use-case and others there are factors you have to take into account. But from my standpoint it can simplify provisioning.
Elaborate? I'll have to try to read through this more thoroughly later, but I don't see how this really saves you any work. At first glance of the doc Dan linked it seems like it saves you some IPv4 space, but that's just because you're leveraging IPv6 underneath it? Now you have to configure dual stack? How does this work for vendor interop? How do you traceroute if you need to? How does this simplify automation/provisioning? This isn't me challenging you - this is me asking to be educated.
Yes, combine that with IPv6 autoconfig and BGP dynamic listeners and you have no need to statically configure peers or ever worry about p2p links. You pretty much plug shit in wherever, with the same AS on all the leaf switches and the spines removing the private AS, then done... just sayin'.
Quote from: wintermute000 on December 25, 2015, 04:50:04 AM
Well at first glance, what vendor interop? Unless you're mixing vendors in your leaf/spine fabric - but then again, ipv6 link local routing is standard for say OSPFv3.
What always blows my mind is ip unnumbered. Even if I know how to get it working, I just don't 'get' it, esp. when in context of routing protocols, like the cumulus proposal for ip unnumbered OSPF. WTF does the database look like? What are the transit link entries under the type 1? what about the type 2s?
Anyone know whether that works in Cisco land (IOU preferably lol), worth half an hour of my time to lab and see? (or excuse to finally lab up some cumulus? LOL I'll throw it on the ever growing pile containing vMX, vEOS, the new version 15 vSRX, yada yada)
Also re-reading some stuff above and aspiring networker, can you clarify - what peer link are you talking about in this quote?? I see your BGP / route table output but I am still confused at your setup. Do you have a diag? I mean, in a classic CLOS fabric, each spine is SUPPOSED to have only one link to each leaf, correct? The ECMP is from leaf to another leaf (i.e. via multiple spines).
"So technically.. even though that IP is actually one hop away... because I have both leaf switches in the same AS, some traffic will be hashed to the other leaf switch and will have to cross the peer link in order to hit the VTEP IP... no big deal - iBGP will handle it.. but not the MOST optimal path to take. Prepending the AS for route source tracing actually also has a side effect of addressing this. Now my route to 192.168.254.4 points to the leaf switch that it resides on."
As an aside, can you lab a VXLAN VTEP via vEOS? Multicast included I presume?
As for your MLAG / virtual VTEP... oh boy, more Ivan flashbacks
https://www.ipspace.net/Redundant_Server-to-Network_Connectivity
Just found out before we went on Xmas break that we lost that massive Arista opportunity, the customer went with Cisco. We're still not sure what went down, aside from the obvious pants dropping from the teal team. I'm curious to know whether there was any stick that accompanied the offer to lube up
re: That1guy, have you labbed this in any detail? the NH implications are interesting and painful.
http://www.noction.com/blog/ipv4_bgp_vs_ipv6_bgp
https://supportforums.cisco.com/document/84261/advertising-ipv6-prefixesroutes-over-ipv4-ebgp-peers
That's the only real catch, traceroute. If you have an operations staff who are easily confused, they will be like "wtf?" But if someone wanted to, they could source the ICMP replies from the loopback, which could be routable; it's up to the organization/owner.
If you set this up to leverage IPv6 autoconfiguration (which is similar to DHCP) and the switches started talking to each other and dynamically picked up peers in the right AS through ZTP, that's plug-and-play BGP right there. The only thing you would even care to advertise, in my case or most cases, might be a few server subnets. In my case I would have just a few SVIs, only because I am using NSX.
Quote from: burnyd on December 25, 2015, 08:40:33 AM
Quote from: AspiringNetworker on December 24, 2015, 12:40:12 PM
Quote from: that1guy15 on December 24, 2015, 09:39:45 AM
My biggest driver is not needing to allocate and maintain address space for PTP links (/30s or /31s) when they are nothing but transit. With IPv6, the link-local peering establishes and you move on; no real need to provision anything else on the link.
Of course with this use-case and others there are factors you have to take into account. But from my standpoint it can simplify provisioning.
Elaborate? I'll have to try to read through this more thoroughly later, but I don't see how this really saves you any work. At first glance of the doc Dan linked it seems like it saves you some IPv4 space, but that's just because you're leveraging IPv6 underneath it? Now you have to configure dual stack? How does this work for vendor interop? How do you traceroute if you need to? How does this simplify automation/provisioning? This isn't me challenging you - this is me asking to be educated.
Yes, combine that with IPv6 autoconfig and BGP dynamic listeners and you have no need to statically configure peers or ever worry about p2p links. You pretty much plug shit in wherever, with the same AS on all the leaf switches and the spines removing the private AS, then done... just sayin'.
I'm not sure how the unnumbered OSPF stuff works, but the BGP stuff is really cool.
As far as how the MLAG VTEP stuff works, what Steve is trying to say is that you have one IP for the VTEPs and not two. You anycast the VTEP address, so you only need one VTEP for two top-of-rack switches. On the back end the two switches are actually syncing their ARP and MAC tables with each other within Linux over the peer link, so they stay in sync.
Quote from: wintermute000 on December 25, 2015, 04:50:04 AM
Well at first glance, what vendor interop? Unless you're mixing vendors in your leaf/spine fabric - but then again, ipv6 link local routing is standard for say OSPFv3.
What always blows my mind is ip unnumbered. Even if I know how to get it working, I just don't 'get' it, esp. when in context of routing protocols, like the cumulus proposal for ip unnumbered OSPF. WTF does the database look like? What are the transit link entries under the type 1? what about the type 2s?
Anyone know whether that works in Cisco land (IOU preferably lol), worth half an hour of my time to lab and see? (or excuse to finally lab up some cumulus? LOL I'll throw it on the ever growing pile containing vMX, vEOS, the new version 15 vSRX, yada yada)
Also re-reading some stuff above and aspiring networker, can you clarify - what peer link are you talking about in this quote?? I see your BGP / route table output but I am still confused at your setup. Do you have a diag? I mean, in a classic CLOS fabric, each spine is SUPPOSED to have only one link to each leaf, correct? The ECMP is from leaf to another leaf (i.e. via multiple spines).
"So technically.. even though that IP is actually one hop away... because I have both leaf switches in the same AS, some traffic will be hashed to the other leaf switch and will have to cross the peer link in order to hit the VTEP IP... no big deal - iBGP will handle it.. but not the MOST optimal path to take. Prepending the AS for route source tracing actually also has a side effect of addressing this. Now my route to 192.168.254.4 points to the leaf switch that it resides on."
As an aside, can you lab a VXLAN VTEP via vEOS? Multicast included I presume?
As for your MLAG / virtual VTEP... oh boy, more Ivan flashbacks
https://www.ipspace.net/Redundant_Server-to-Network_Connectivity
Just found out before we went on Xmas break that we lost that massive Arista opportunity, the customer went with Cisco. We're still not sure what went down, aside from the obvious pants dropping from the teal team. I'm curious to know whether there was any stick that accompanied the offer to lube up
re: That1guy, have you labbed this in any detail? the NH implications are interesting and painful.
http://www.noction.com/blog/ipv4_bgp_vs_ipv6_bgp
https://supportforums.cisco.com/document/84261/advertising-ipv6-prefixesroutes-over-ipv4-ebgp-peers
I can show you my GNS3 setup when I'm more motivated :P But it's really easy - two spines in AS 64600, and then three leaves split into two ASes. Leaf1 and 2 are physically connected to each other in AS 65000 in "rack1" and Leaf3 is standalone in AS 65001 in "rack2". There is one connection from each leaf to each spine.
Oh, and as far as vEOS VXLAN support goes - yes, and yes (I think). Whenever I configure VXLAN I use HER instead of multicast - and frankly I have little to no experience working with multicast. vEOS supports pretty much anything that isn't a hardware-specific feature.
And as far as "standard" protocols go... as anyone who's done a good amount of vendor interop testing knows, not everyone implements everything in every standard, for numerous reasons. Testing should always be done to confirm, and nothing should ever be assumed.
Quote from: wintermute000 on December 25, 2015, 04:50:04 AM
As for your MLAG / virtual VTEP... oh boy, more Ivan flashbacks
https://www.ipspace.net/Redundant_Server-to-Network_Connectivity
"The networking team has to build the network infrastructure before having all the relevant input data"
Pffft - when does THAT ever happen...
Sorry for the spam...
Winter - if you do find out if there was anything in particular that sealed the deal for Cisco, I'd love to know. I probably suspect it was the typical pants dropping on price though (I wonder how long they can keep doing that, or how many times they can pull it off before customers realize it's only a one-time price, and when they go to renew, prepare to get YOUR pants dropped).
I know it may happen once in a while due to some feature, but I don't remember ever losing for technical reasons.
Quote from: AspiringNetworker on December 25, 2015, 06:08:02 PM
Quote from: wintermute000 on December 25, 2015, 04:50:04 AM
As for your MLAG / virtual VTEP... oh boy, more Ivan flashbacks
https://www.ipspace.net/Redundant_Server-to-Network_Connectivity
"The networking team has to build the network infrastructure before having all the relevant input data"
Pffft - when does THAT ever happen...
HER is the way to go, unless you guys would just let everyone use CVX inside the switch as a KVM VM... I'm not bitter about that at all, lol. But HER just works, and it works well.
Quote from: burnyd on December 26, 2015, 03:47:07 PM
Quote from: AspiringNetworker on December 25, 2015, 06:08:02 PM
Quote from: wintermute000 on December 25, 2015, 04:50:04 AM
As for your MLAG / virtual VTEP... oh boy, more Ivan flashbacks
https://www.ipspace.net/Redundant_Server-to-Network_Connectivity
"The networking team has to build the network infrastructure before having all the relevant input data"
Pffft - when does THAT ever happen...
HER is the way to go, unless you guys would just let everyone use CVX inside the switch as a KVM VM... I'm not bitter about that at all, lol. But HER just works, and it works well.
Yeah, that stuff will get old real fast. It's not just Arista; it's other network vendors as well. As more people get involved and network switches converge into different products (i.e. EVO SDDC, hyperconverged, etc.) I don't see it playing out very well for Cisco. You will always have those guys who 100% rely on the vendor to do all the work, and they will stay with a company like Cisco, but I see that fading away.
Quote from: burnyd on December 26, 2015, 03:49:20 PM
Quote from: burnyd on December 26, 2015, 03:47:07 PM
Quote from: AspiringNetworker on December 25, 2015, 06:08:02 PM
Quote from: wintermute000 on December 25, 2015, 04:50:04 AM
As for your MLAG / virtual VTEP... oh boy, more Ivan flashbacks
https://www.ipspace.net/Redundant_Server-to-Network_Connectivity
"The networking team has to build the network infrastructure before having all the relevant input data"
Pffft - when does THAT ever happen...
HER is the way to go, unless you guys would just let everyone use CVX inside the switch as a KVM VM... I'm not bitter about that at all, lol. But HER just works, and it works well.
Yeah, that stuff will get old real fast. It's not just Arista; it's other network vendors as well. As more people get involved and network switches converge into different products (i.e. EVO SDDC, hyperconverged, etc.) I don't see it playing out very well for Cisco. You will always have those guys who 100% rely on the vendor to do all the work, and they will stay with a company like Cisco, but I see that fading away.
Yeah... unfortunately having "one throat to choke" comes with its own cons (and price tag).
Regarding CVX, I believe that takes a bit more resources than what you'll get out of a VM on a switch - at least at any level of real scale.
OK, OK, I get it. When you are talking MLAG, the Arista switches are NOT in a traditional stack; it's some kind of Nexus vPC-type feature, correct? (I.e. the switches are still separate entities with separate control planes but can present a shared EtherChannel to a downstream host.)
Also, what is HER?
Quote from: AspiringNetworker on December 23, 2015, 08:50:54 AM
Quote from: matgar
Why would the IP of a VTEP exist in 2 different leaves?
Both leaves are effectively acting as a single logical unit - think of them as two ToRs in the same rack. To get to the same resources (servers, etc.), you could go to either leaf - why treat them as two discrete entities when they both do the same job?
My understanding/way of thinking is that the VTEP is part of the "invisible" network infrastructure where simplicity is wanted, and that it's within the VXLANs that you might want to configure anycast or server load-balancing, etc.
Quote from: AspiringNetworker on December 23, 2015, 08:50:54 AM
Quote from: matgar
Also, ECMP in a spine/leaf setup is, as far as I know, supposed to happen in the leaf.
I.e., LEAF3 has 2 paths to LEAF2, via either SPINE1 or SPINE2.
Doesn't ECMP happen anywhere there is more than one route with equal costs to the same destination? So if the spine needed to reach a host, and there were two equal-cost paths to reach it via LEAF1 or LEAF2, is that not ECMP? I'm not being snarky here I'm seriously asking - I've been known for being dumb before and if I'm misunderstanding the definition of ECMP I'd like to alleviate that.
EDIT - Oh, and welcome to the forum.
True, ECMP happens wherever there is more than one equal-cost route. It was less a comment on what ECMP is and more on where it's expected to happen in your design.
My meaning was that for predictability (troubleshooting/understanding) purposes, if nothing else, it would be better if each spine only had one route to the destination.
All I've read of spine/leaf architecture has been with the design where the multiple routes are in the leaves and not in the spines (with the exception of your post, that is).
New ways to do things aren't necessarily bad, but in this case you seem to have ended up with unexpected/unwanted behavior.
Edit: And thanks for the welcome.
Quote from: AspiringNetworker on December 25, 2015, 06:14:34 PM
Sorry for the spam...
Winter - if you do find out if there was anything in particular that sealed the deal for Cisco, I'd love to know. I probably suspect it was the typical pants dropping on price though (I wonder how long they can keep doing that, or how many times they can pull it off before customers realize it's only a one-time price, and when they go to renew, prepare to get YOUR pants dropped).
I know it may happen once in a while due to some feature, but I don't remember ever losing for technical reasons.
Related: found this hilarious comments thread, some serious insider (allegedly) b1tching. If TL;DR, go near the bottom where they start talking about CAP - Cisco's Customer Assurance Program - and the alleged massive boondoggle of the Symantec ACI deployment, leading from/into general slagging off of the Insieme BU and the lack of market readiness of ACI.
http://www.bradreese.com/blog/10-4-2015.htm#COMMENT
Having seen two live ACI deployments (small ones by US standards, too) and having heard their horror stories, I'm not really that surprised, and I do lean towards the b1tching being on the 'truth' side of the equation.
Quote from: wintermute000 on December 27, 2015, 10:37:33 PM
OK, OK, I get it. When you are talking MLAG, the Arista switches are NOT in a traditional stack; it's some kind of Nexus vPC-type feature, correct? (I.e. the switches are still separate entities with separate control planes but can present a shared EtherChannel to a downstream host.)
Also, what is HER?
Head-end replication.
MLAG is like vPC, but it actually works.
Out of curiosity, what about vPC do you find not working? (Aside from bugs... QA is down the toilet these days at the big C... as well as the idiosyncrasies of routing over a vPC or not.)
I know Juniper has a similar feature as well; can't recall what it's called.
In my mind MLAG = multi-chassis EtherChannel = a stack or VSS = single control plane, but I guess it's just terminology.
Quote from: wintermute000 on December 28, 2015, 03:36:01 PM
Out of curiosity, what about vPC do you find not working? (Aside from bugs... QA is down the toilet these days at the big C... as well as the idiosyncrasies of routing over a vPC or not.)
How much time do you have haha.
Bugs are really where the problem lies.
I think simplicity is key there.. though it would take time to really dig into the details to adequately answer the question.
From a 10,000-foot view, the simple fact that you have to have a second dedicated peer link (is this still the case?) kinda highlights a complexity problem... and some of the other limitations that I have yet to run into with MLAG - it just works. I'm sure it sounds biased... but don't take it from me... ask other folks who've worked with it - I have yet to hear complaints.
Oh man did I miss out on this thread while away for Christmas!