Understanding ESXi host connections...

Started by NetworkGroover, November 12, 2015, 06:48:28 PM


NetworkGroover

Hey guys.  So my knowledge on the switch side is pretty good, but on the VMware side I'm definitely lacking.  I'm usually working with folks who are pretty knowledgeable on the VMware side, but this time I'm not, and they're seeing a push of ESXi from a kickstart VM to bare metal take 45 minutes; yet if they shut down one of the switches so only a single link is available, it takes about 3 minutes.  Doesn't make a whole lot of sense to me.  What compounds the problem is that I can't be onsite or see their screens, and the folks I'm working with seem to be lacking a bit themselves, as they said they "did test DVS and LACP (mode on)" - obviously that's a problem.

How do you generally configure your switches, and the VMware side of things, depending on whether you're using standard vSwitches or a DVS?
Engineer by day, DJ by night, family first always

wintermute000

#1
LACP all day every day (why waste NICs on standby or awkwardly load balance/pin when LACP auto-balances everything and is redundant too).
There's also the old chestnut where the default VMware load balancing policy is not the same as on a standard Cisco switch.
Make them go from "on" to LACP; this will rule out a lot of crap, because if anything doesn't match, the LACP bundle won't form, as it will fail the negotiation.
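To make the difference concrete, this is roughly what I mean on the switch side (Cisco IOS-style syntax; interface and channel-group numbers are just examples, not their actual config):

   ! "mode on" = static bundle, no negotiation, comes up even if the far end is misconfigured
   interface GigabitEthernet1/0/1
    channel-group 1 mode on

   ! LACP = the bundle only forms if both ends negotiate, so a mismatch shows up immediately
   interface GigabitEthernet1/0/1
    channel-group 1 mode active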

Start here. Get them to draw out for you their vNIC/port group to physical NIC mappings. Be aware that VMware, in its wisdom, calls physical NICs... vmnics. I'm not making it up.
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1004088
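If they can get you output from the ESXi shell, these stock esxcli commands will dump the vmnic and vSwitch/uplink mappings for you (this assumes standard vSwitches):

   # List the physical NICs (the "vmnics") with link state and speed
   esxcli network nic list
   # List each standard vSwitch with its uplinks and port groups
   esxcli network vswitch standard list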

Finally, an obscure one, but make sure that this hilarious semi-bug isn't the issue (you'll know it if Linux is OK but Windows is borked). http://www.cisco.com/c/en/us/support/docs/ip/address-resolution-protocol-arp/118630-technote-ipdt-00.html


As a networker, it's worth going through a VCP textbook and just reading the networking sections, just for situations like this. I've saved myself a lot of grief by being able to talk to VMware guys in their 'language' (as well as being able to quickly spot the incompetent ones).

burnyd

Meh, LACP to hosts is worthless if you are not running MLAG.  VMware does not allow active/active or active/passive between two different LAGs, which is really dumb.  I am not sure if that has changed in 6.0 or not.

You can also send different port groups to different links.  If you are running NSX you can pin different VTEP interfaces to different uplinks, so there is a lot you can do without running LACP.  I don't mind LACP from switch to switch, but switch to VMware has some wacky issues in my experience.
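For what it's worth, here's a rough sketch of that per-port-group pinning using esxcli on a plain standard vSwitch (the port group and vmnic names are made up for the example; a dvPortGroup does the same thing via its teaming policy in the web client):

   # Pin port group "PG-A" to vmnic0, with vmnic1 as standby
   esxcli network vswitch standard portgroup policy failover set --portgroup-name=PG-A --active-uplinks=vmnic0 --standby-uplinks=vmnic1
   # Pin "PG-B" the other way around
   esxcli network vswitch standard portgroup policy failover set --portgroup-name=PG-B --active-uplinks=vmnic1 --standby-uplinks=vmnic0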

Getting back to the point: I am not understanding what you mean by pushing from a kickstart VM to bare metal.  If you need VMware help just PM me, or, since you are like 3 hours behind, if you have my number just give me a ring.

wintermute000

#3
I like it because it rules out stupid issues like mispatching (the good old on-to-on port channel: someone mispatches one link, the port channel comes up, hello STP loop).


You're right on the no active/active except with MLAG issue, stupid me. I just haven't seen it much, as everything I've seen has drunk the Cisco Kool-Aid (vPC or 3750 stacks; in both cases EtherChannel is the way to go).
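For reference, the vPC flavour I'm used to looks something like this on each Nexus peer (NX-OS-style sketch; numbers are just examples, and the vPC domain/peer-link config is left out):

   interface port-channel10
     ! Ties this host-facing bundle into the vPC so both peers present one logical port-channel
     vpc 10
   interface Ethernet1/1
     channel-group 10 mode active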


EDIT: it appears that from 5.5 onwards it's all good to have multiple LACP LAGs, and you can pin or load balance between them just like individual NIC uplinks. I'll make a note to lab it out after my big exam on Monday... what am I doing posting here ROFL
http://cloudmaniac.net/vsphere-enhanced-lacp-support/
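If anyone wants to sanity-check this on a host, 5.5 onwards also has esxcli commands to show the LAG config on a dVS and whether LACP actually negotiated (quickest way I know of to catch a mismatch with the switch; treat it as a pointer, I haven't labbed the multi-LAG case yet):

   # Show the LAGs configured on the distributed switch
   esxcli network vswitch dvs vmware lacp config get
   # Show per-uplink LACP state (partner info, negotiated flags)
   esxcli network vswitch dvs vmware lacp status get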

burnyd

tl;dr'd the blog post, but looking at the picture at the bottom it looks like one big LAG.  One LAG is supported, but if you have, for example, on a given dvPortGroup...

lag1 - active
lag2 - standby

this will not work! 

wintermute000

I'll dig into it later, but noted, thanks!

NetworkGroover

Quote from: burnyd on November 12, 2015, 07:15:30 PM
Meh, LACP to hosts is worthless if you are not running MLAG.  VMware does not allow active/active or active/passive between two different LAGs, which is really dumb.  I am not sure if that has changed in 6.0 or not.

You can also send different port groups to different links.  If you are running NSX you can pin different VTEP interfaces to different uplinks, so there is a lot you can do without running LACP.  I don't mind LACP from switch to switch, but switch to VMware has some wacky issues in my experience.

Getting back to the point: I am not understanding what you mean by pushing from a kickstart VM to bare metal.  If you need VMware help just PM me, or, since you are like 3 hours behind, if you have my number just give me a ring.

The way they described it is they're using a "kickstart" VM residing on another host to push ESXi out to other bare-metal hosts. Make sense?
Engineer by day, DJ by night, family first always

burnyd

No idea what you are saying.  Get more information from them.

NetworkGroover

This seems to me, from what I've remotely seen and what they've described (I have zero hands-on access), to be a very simple deployment... several hosts directly connected to a pair of switches running MLAG.  They're running ESXi 6.0 Update 1.  I'm kind of hamstrung here, since due to their (very strict) policy there's very little info I can get directly from them.  It's a very simple config from the switch perspective... the big unknown is whether they're improperly configuring things on the VMware side.

burnyd I'd give you a call, but I don't want to waste your time since I'm not even 100% sure what questions to pose to you.  I think I need more info from them and to get a better understanding of how this works in general first.

Worst comes to worst, we'll have a guy going onsite to their facility next week, as I'm just covering for them while they're out.  I just wanted to learn more and maybe find some good best-practice docs or whatnot to pass along in the meantime.
Engineer by day, DJ by night, family first always

NetworkGroover

Quote from: wintermute000 on November 12, 2015, 07:09:36 PM
LACP all day every day (why waste NICs on standby or awkwardly load balance/pin when LACP auto-balances everything and is redundant too).
There's also the old chestnut where the default VMware load balancing policy is not the same as on a standard Cisco switch.
Make them go from "on" to LACP; this will rule out a lot of crap, because if anything doesn't match, the LACP bundle won't form, as it will fail the negotiation.

Start here. Get them to draw out for you their vNIC/port group to physical NIC mappings. Be aware that VMware, in its wisdom, calls physical NICs... vmnics. I'm not making it up.
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1004088

Finally, an obscure one, but make sure that this hilarious semi-bug isn't the issue (you'll know it if Linux is OK but Windows is borked). http://www.cisco.com/c/en/us/support/docs/ip/address-resolution-protocol-arp/118630-technote-ipdt-00.html


As a networker, it's worth going through a VCP textbook and just reading the networking sections, just for situations like this. I've saved myself a lot of grief by being able to talk to VMware guys in their 'language' (as well as being able to quickly spot the incompetent ones).

Yeah, I already educated them on the switch config side, in that "mode on" wasn't actually turning on LACP  :o   and what the result of that would be.  Thanks for the other tips - I'll see what I can dig up.

Engineer by day, DJ by night, family first always

NetworkGroover

#11
So I watched that video, Winter, and yes, that is odd, but I guess I get it - it's an extended NIC of the VM... so "vmnic" - or at least that's MY story and I'm sticking to it ;P

So from the looks of this... if I have two switches configured in MLAG (or vPC if they were Cisco), where does LACP come into play on the VMware side?  Also, what does a dVS have to do with it?  The way they were making it sound, it almost seemed like you couldn't have LACP without a dVS.  Watching that video, I didn't see any option to enable LACP, so do standard vSwitches not support it?

EDIT - The sad part of all this is that I took official VMware courses in college but had zero interest in it (I was a noob who kept saying "I don't want to touch systems - just routers and switches!"), so I did what I needed to do well in the course but pursued it no further.  SOME of this stuff rings a bell - but I've forgotten probably 90% or more of it.
Engineer by day, DJ by night, family first always

packetherder

I wouldn't worry too much about vSwitch configuration. It sounds like this is all taking place in a pre-boot environment that's as lean as possible, so no vSwitches or any of the other fanciness in ESXi proper. That also might be why it can't handle the MLAG. I'd google around for ways to PXE boot over an LACP bundle. An immediate search comes up with Arista-specific info from /r/networking about port-channel LACP fallback. Maybe that helps.

https://www.reddit.com/r/networking/comments/2kslop/how_to_handle_lacpbonding_with_systems_that_need/
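The Arista knob that thread is talking about is LACP fallback on the port-channel, which lets the bundle pass traffic on a member port before the OS is up and speaking LACP, so PXE/kickstart traffic doesn't get black-holed. Roughly, in EOS-style syntax (made-up numbers, untested against their setup):

   interface Port-Channel1
      ! Revert to a static bundle if no LACP partner is heard within the fallback timeout
      port-channel lacp fallback static
   interface Ethernet1
      channel-group 1 mode active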

NetworkGroover

Quote from: packetherder on November 13, 2015, 01:12:19 PM
I wouldn't worry too much about vSwitch configuration. It sounds like this is all taking place in a pre-boot environment that's as lean as possible, so no vSwitches or any of the other fanciness in ESXi proper. That also might be why it can't handle the MLAG. I'd google around for ways to PXE boot over an LACP bundle. An immediate search comes up with Arista-specific info from /r/networking about port-channel LACP fallback. Maybe that helps.

https://www.reddit.com/r/networking/comments/2kslop/how_to_handle_lacpbonding_with_systems_that_need/

This is interesting - thanks for the link!
Engineer by day, DJ by night, family first always