Leaf & Spine Architectures

Started by routerdork, October 08, 2015, 09:01:13 AM


routerdork

I don't work in a DC and my company uses a third party for ours. So I haven't really had the chance to keep up with current trends, technologies, etc. I've interviewed with a few places that have or plan to have DCs, so I've been reading up on anything I can find. I read an article yesterday by Juniper that covered using BGP with BFD pretty well.

One thing that got me thinking was their subnet choice. In their example they assumed each leaf was 48 ports for endpoints. Because of this, they said a /26 subnet should be assigned to each leaf so that each endpoint could have an address; it leaves a few addresses left over but is the most efficient use of IP space. So my question is... was this article only talking about physical servers, and should I plan for a larger subnet if using VMs? Or is there something else going on with DC endpoints that I'm missing?
"The thing about quotes on the internet is that you cannot confirm their validity." -Abraham Lincoln

dlots

IMO, unless you're tight on IP space, always leave yourself lots of room to grow, because nothing sucks more than re-doing your DC in a hurry, and as you said, eventually you're going to want to stick lots of VMs in there. Anyway, I would never set up a system of any kind with no room at all to grow.

Otanx

I am just digging into this like you, but my understanding is that your VMs will be using VXLAN overlays, so the address space they are in is not part of the rack address space. I still don't like using /26s for a rack; as dlots says, you will end up with expansion problems the second you need a few VIPs for clusters or something. So unless you are running out of address space, I stick to /24s. It just makes it simple for the server guys who think /24s are a law or something and get confused when you start doing /25 or /26 masks. The less confusion the server people have, the less I have to prove it isn't the network.

-Otanx
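
A minimal sketch of the /24-per-rack carving Otanx describes, assuming a hypothetical 10.10.0.0/16 underlay block so the third octet doubles as the rack number:

    import ipaddress

    # Hypothetical underlay block for the pod; one /24 per rack.
    POD_BLOCK = ipaddress.ip_network("10.10.0.0/16")

    # Rack N gets 10.10.N.0/24, so the third octet is the rack number --
    # easy for the server teams to reason about.
    rack_subnets = list(POD_BLOCK.subnets(new_prefix=24))

    for rack_id in range(4):
        subnet = rack_subnets[rack_id]
        gateway = next(subnet.hosts())  # .1 as the leaf SVI/gateway
        print(f"rack{rack_id}: {subnet} gateway {gateway}")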

NetworkGroover

If it helps at all, I wrote about this subject based on Petr Lapukhov's work:

http://aspiringnetworker.blogspot.com/2015/08/bgp-in-arista-data-center_90.html
Engineer by day, DJ by night, family first always

NetworkGroover

Quote from: Otanx on October 08, 2015, 10:36:19 AM
I am just digging into this like you, but my understanding is that your VMs will be using VXLAN overlays, so the address space they are in is not part of the rack address space. I still don't like using /26s for a rack; as dlots says, you will end up with expansion problems the second you need a few VIPs for clusters or something. So unless you are running out of address space, I stick to /24s. It just makes it simple for the server guys who think /24s are a law or something and get confused when you start doing /25 or /26 masks. The less confusion the server people have, the less I have to prove it isn't the network.

-Otanx

Agree.  Some very large data center customers who I can't name stick to /24s per rack.
Engineer by day, DJ by night, family first always

icecream-guy

get with the times /64's all around.....  IPv6 :rock:
:professorcat:

My Moral Fibers have been cut.

routerdork

Quote from: AspiringNetworker on October 08, 2015, 11:10:32 AM
If it helps at all, I wrote about this subject based on Petr Lapukhov's work:

http://aspiringnetworker.blogspot.com/2015/08/bgp-in-arista-data-center_90.html
So, a question on this (very informative, by the way). Using the same AS on each leaf was for the benefit of the configuration template? That doesn't seem that efficient to me: make it one thing and then prepend it with another? If you are prepending it, you are using it. Or maybe I missed something else in the concept?
"The thing about quotes on the internet is that you cannot confirm their validity." -Abraham Lincoln

routerdork

Quote from: AspiringNetworker on October 08, 2015, 11:14:44 AM
Quote from: Otanx on October 08, 2015, 10:36:19 AM
I am just digging into this like you, but my understanding is that your VMs will be using VXLAN overlays, so the address space they are in is not part of the rack address space. I still don't like using /26s for a rack; as dlots says, you will end up with expansion problems the second you need a few VIPs for clusters or something. So unless you are running out of address space, I stick to /24s. It just makes it simple for the server guys who think /24s are a law or something and get confused when you start doing /25 or /26 masks. The less confusion the server people have, the less I have to prove it isn't the network.

-Otanx

Agree.  Some very large data center customers who I can't name stick to /24s per rack.
I like the /24 idea so that I could coordinate the 2nd/3rd octets with the hostnames/AS's/racks/locations/etc. but I'm just anal like that.
"The thing about quotes on the internet is that you cannot confirm their validity." -Abraham Lincoln

AnthonyC

Speaking of spine/leaf: can someone point out the reason why we don't have/want spine-to-spine connections (at least based on the various designs that I have seen)? Is there some math reason based on Clos?
"It can also be argued that DNA is nothing more than a program designed to preserve itself. Life has become more complex in the overwhelming sea of information. And life, when organized into species, relies upon genes to be its memory system."

Reggle

You don't need it. Traffic between two systems should traverse a leaf, a spine, and a leaf. Always the same fixed number of hops (layer 2 or layer 3), always the shortest path.

wintermute000

Exactly, and it keeps latency/jitter always the same, enables ECMP, etc.
Yeah, it does feel intuitively 'wrong' LOL to see core devices not connected to one another! You've just got to train yourself out of that mindset when dealing with leaf/spine.
I don't think the subnet size matters too much. As everyone has said, whether it's NSX or ACI or whatever, all the actual hosts are going to be on VXLAN segments tunnelled from your hosts and/or VTEPs, so you only need enough IPs for your hosts/VTEPs (and by host I mean in the VMware sense).
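
The fixed-hop/ECMP point is easy to make concrete: in a 3-stage Clos, every leaf-to-leaf path is leaf -> spine -> leaf, so the hop count never changes and the ECMP fan-out is simply the spine count. A tiny sketch with made-up topology sizes:

    # Every leaf-to-leaf path in a 3-stage Clos crosses exactly two links;
    # adding spines widens ECMP without changing the path length.
    def leaf_to_leaf_paths(num_spines: int) -> list[tuple[str, str, str]]:
        return [("leaf1", f"spine{s}", "leaf2") for s in range(1, num_spines + 1)]

    paths = leaf_to_leaf_paths(4)
    print(f"{len(paths)} equal-cost paths, each {len(paths[0]) - 1} links long")
    # -> 4 equal-cost paths, each 2 links long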

icecream-guy

I think the greatest thing about the leaf/spine architecture is that it can scale well above five 9's. Depending on hardware, network architecture, and cash in pocket, the more spines one has, with servers multi-homed to different leafs (leaves?), the more redundancy there is; one can scale to 10 9's, 20 9's or more, to the point where more than half the network can be down without affecting services.
:professorcat:

My Moral Fibers have been cut.
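
The 9's claim can be sanity-checked with a back-of-the-envelope formula: if each of n spines is independently up with availability a, the chance they are all down at once is (1 - a)^n. A rough sketch, using a deliberately pessimistic made-up 99% per-device figure and ignoring correlated failures:

    import math

    # "Number of 9's" for at least one of n independent spines being up.
    def nines(per_device_availability: float, num_spines: int) -> float:
        all_down = (1 - per_device_availability) ** num_spines
        return -math.log10(all_down)

    for n in (2, 4, 8):
        print(f"{n} spines @ 99% each -> about {nines(0.99, n):.0f} nines")
    # -> roughly 4, 8, and 16 nines as the spine count doubles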

burnyd

Quote from: ristau5741 on October 09, 2015, 07:52:12 AM
I think the greatest thing about the leaf/spine architecture is that it can scale well above five 9's. Depending on hardware, network architecture, and cash in pocket, the more spines one has, with servers multi-homed to different leafs (leaves?), the more redundancy there is; one can scale to 10 9's, 20 9's or more, to the point where more than half the network can be down without affecting services.

I'm not going to say it's unlimited scale, but it's close to it. After you fill up the ports on the spine you have to go to a multi-plane (3D) fabric like Facebook did, but that's a ton of servers!

Ironically, I posted this yesterday.

https://danielhertzberg.wordpress.com/2015/10/11/333/
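
On the "fill up the spine ports" point: in a plain 3-stage fabric, each leaf burns one port on every spine, so the spine port count caps the leaf count, and with it the server count. A rough sizing sketch with illustrative port counts:

    # Max servers in a 3-stage leaf/spine, ignoring breakouts and border ports.
    def max_servers(spine_ports: int, leaf_ports: int, num_spines: int) -> int:
        max_leaves = spine_ports                # each leaf uses one port per spine
        server_ports = leaf_ports - num_spines  # remaining leaf ports face servers
        return max_leaves * server_ports

    # e.g. 32-port spines, 52-port leaves (48 host ports + 4 uplinks), 4 spines:
    print(max_servers(spine_ports=32, leaf_ports=52, num_spines=4))  # -> 1536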

that1guy15

That1guy15
@that1guy_15
blog.movingonesandzeros.net

NetworkGroover

Quote from: ristau5741 on October 09, 2015, 07:52:12 AM
I think the greatest thing about the leaf/spine architecture is that it can scale well above five 9's. Depending on hardware, network architecture, and cash in pocket, the more spines one has, with servers multi-homed to different leafs (leaves?), the more redundancy there is; one can scale to 10 9's, 20 9's or more, to the point where more than half the network can be down without affecting services.

Yep... just imagine a 64-way ECMP design.

"One of our spine switches went down.  Wanna go handle it?"

"Naahhh.. I'll deal with it next week.  We only lost 1/64th of our bandwidth."
Engineer by day, DJ by night, family first always