Hey guys. My knowledge on the switch side is pretty good, but on the VMware side I'm definitely lacking. I'm usually working with folks who are pretty knowledgeable on the VMware side, but this time I'm not. They're seeing a push of ESXi from a kickstart VM to bare metal take 45 minutes, but if they shut down one of the switches so only a single link is available, it takes about 3 minutes. Doesn't make a whole lot of sense to me. What compounds the problem is that I can't be onsite, can't see their screen, and the folks I'm working with seem to be lacking a bit themselves, since they said they "did test DVS and LACP (mode on)" - obviously that's a problem.
How do you generally configure your switches, and the VMware side of things, depending on if you're using standard vSwitches or DVS?
LACP all day, every day (why waste NICs on standby or awkwardly load-balance/pin when LACP auto-balances everything and is redundant too).
There's also the old chestnut where the default VMware load-balancing policy is not the same as the standard Cisco switch's.
Make them go from "on" to LACP; this will rule out a lot of crap, because if anything's not matching, the LACP bundle won't form as it will fail the negotiation.
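For the switch side the difference is a single keyword. A Cisco IOS sketch (interface and channel-group numbers are placeholders, adjust to their topology):

```
! placeholders: Po1 / Gi1/0/1-2
interface range GigabitEthernet1/0/1 - 2
 switchport mode trunk
 ! was: channel-group 1 mode on    (static bundle, no negotiation)
 channel-group 1 mode active       ! LACP: bundle only forms if both ends negotiate
!
interface Port-channel1
 switchport mode trunk
```

With "mode active", a mismatch on the VMware side shows up immediately as the port-channel staying down, instead of a half-working bundle.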
Start here. Get them to draw out for you their vNIC/port group to physical NIC mappings. Be aware that VMware, in its wisdom, calls physical NICs... vmnics. I'm not making it up.
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1004088
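If they can get on the host, the mapping is quick to dump from the CLI (vSwitch0 here is an assumption; substitute whatever their build actually creates):

```
# List physical NICs, which ESXi calls vmnics
esxcli network nic list

# Show the teaming/failover policy on a standard vSwitch (name is an assumption)
esxcli network vswitch standard policy failover get -v vSwitch0

# Dump the vSwitch-to-port-group-to-uplink layout
esxcli network vswitch standard list
```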
Finally, an obscure one, but make sure that this hilarious semi-bug isn't the issue (you'll know it if Linux is OK but Windows is borked): http://www.cisco.com/c/en/us/support/docs/ip/address-resolution-protocol-arp/118630-technote-ipdt-00.html
As a networker, it's worth going through a VCP textbook and just reading the networking sections, for situations exactly like this one. I've saved myself a lot of grief by being able to talk to VMware guys in their 'language' (as well as being able to quickly spot the incompetent ones).
Meh, LACP to hosts is worthless if you are not running MLAG. VMware does not allow active/active or active/passive between two different LAGs; it's really dumb. I am not sure if that has changed in 6.0 or not.
You can also send different port groups to different links. If you are running NSX you can pin different VTEP interfaces to different uplinks; there is a lot you can do without running LACP. I don't mind LACP from switch to switch, but switch to VMware has some wacky issues in my experience.
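As an example of that pinning, you can set explicit failover orders per port group on a standard vSwitch. A sketch with hypothetical port group and vmnic names:

```
# Pin "Mgmt" to vmnic0 (vmnic1 standby) and "vMotion" the other way around:
# explicit failover order, both links in use, no LAG required
esxcli network vswitch standard portgroup policy failover set -p Mgmt -a vmnic0 -s vmnic1
esxcli network vswitch standard portgroup policy failover set -p vMotion -a vmnic1 -s vmnic0
```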
Getting back to the point: I am not understanding what you mean by "kickstart VM to bare metal." If you need VMware help, just PM me. Or if you have my number, just give me a ring, since you are only like 3 hours behind.
I like it because it rules out stupid issues like mispatching (the good old on-to-on port channel: someone mispatches one link, the port channel comes up, and hello STP loop).
You're right on the no-active/active-except-for-MLAG issue; stupid me. I just haven't seen it much, as everywhere I've been has drunk the Cisco Kool-Aid (vPC or 3750 stacks, and in both cases EtherChannel is the way to go).
EDIT: It appears that from 5.5 onwards it's all good to have multiple LACP LAGs, and you can pin or load balance between them as with individual NIC uplinks. I'll make a note to lab it out after my big exam on Monday... what am I doing posting here, ROFL.
http://cloudmaniac.net/vsphere-enhanced-lacp-support/
I only skimmed the blog post, but looking at the picture at the bottom it looks like one big LAG. One LAG is supported, but if you have, for example, on a given dvPortGroup...
lag1 - active
lag2 - standby
this will not work!
I'll dig into it later, but noted. Thanks!
Quote from: burnyd on November 12, 2015, 07:15:30 PM
Meh, LACP to hosts is worthless if you are not running MLAG. VMware does not allow active/active or active/passive between two different LAGs; it's really dumb. I am not sure if that has changed in 6.0 or not.
You can also send different port groups to different links. If you are running NSX you can pin different VTEP interfaces to different uplinks; there is a lot you can do without running LACP. I don't mind LACP from switch to switch, but switch to VMware has some wacky issues in my experience.
Getting back to the point: I am not understanding what you mean by "kickstart VM to bare metal." If you need VMware help, just PM me. Or if you have my number, just give me a ring, since you are only like 3 hours behind.
The way they described it, they're using a "kickstart" VM residing on another host to push ESXi to other bare-metal hosts. Make sense?
No idea what you are saying. Get more information from them.
This seems to me, from what I've seen remotely and what they've described (I have zero hands-on access), like a very simple deployment: several hosts directly connected to a pair of switches running MLAG. They're running ESXi 6.0 Update 1. I'm kind of hamstrung here, since due to their (very strict) policy, there's very little info I can get directly from them. It's a very simple config from the switch perspective; the big unknown is whether they're improperly configuring things on the VMware side.
burnyd, I'd give you a call, but I don't want to waste your time since I'm not even 100% sure what questions to pose to you. I think I need more info from them, and a better understanding of how this works in general, first.
Worst comes to worst, we'll have a guy going onsite to their facility next week, as I'm just covering for them while they're out. I just wanted to learn more and maybe find some good best-practice docs or whatnot to pass along in the meantime.
Quote from: burnyd on November 13, 2015, 09:37:22 AM
No idea what you are saying. Get more information from them.
I think they are talking about this but on 6.0 update 1:
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2004582
Quote from: wintermute000 on November 12, 2015, 07:09:36 PM
LACP all day, every day (why waste NICs on standby or awkwardly load-balance/pin when LACP auto-balances everything and is redundant too).
There's also the old chestnut where the default VMware load-balancing policy is not the same as the standard Cisco switch's.
Make them go from "on" to LACP; this will rule out a lot of crap, because if anything's not matching, the LACP bundle won't form as it will fail the negotiation.
Start here. Get them to draw out for you their vNIC/port group to physical NIC mappings. Be aware that VMware, in its wisdom, calls physical NICs... vmnics. I'm not making it up.
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1004088
Finally, an obscure one, but make sure that this hilarious semi-bug isn't the issue (you'll know it if Linux is OK but Windows is borked): http://www.cisco.com/c/en/us/support/docs/ip/address-resolution-protocol-arp/118630-technote-ipdt-00.html
As a networker, it's worth going through a VCP textbook and just reading the networking sections, for situations exactly like this one. I've saved myself a lot of grief by being able to talk to VMware guys in their 'language' (as well as being able to quickly spot the incompetent ones).
Yeah, I already educated them on the switch config, in that "mode on" wasn't turning on LACP :o, and what the result would be. Thanks for the other tips - I'll see what I can dig up.
So I watched that video, Winter, and yes, that is odd, but I guess I get it - it's an extended NIC of the VM... so "vmnic" - or at least that's MY story and I'm sticking to it ;P
So from the looks of this... if I have two switches configured in MLAG (or vPC if they were Cisco), where does LACP come into play on the VMware side? Also, what does DVS have to do with it? The way they were making it sound, it almost seemed like you can't have LACP without a DVS. Watching that video, I didn't see any option to enable LACP, so do standard vSwitches not support it?
EDIT - The sad part of all this is that I took official VMware courses in college but had zero interest in the subject (I was a noob saying "I don't want to touch systems - just routers and switches!"), so I did what I needed to do well in the course but pursued it no further. SOME of this stuff rings a bell, but I've forgotten probably 90% or more of it.
I wouldn't worry too much about vSwitch configuration. It sounds like this is all taking place in a pre-boot environment that's as lean as possible, so no vSwitches or any of the other fanciness in ESXi proper. That also might be why it can't handle the MLAG. I'd google around for ways to PXE boot over an LACP bundle. An immediate search comes up with Arista-specific info from /r/networking about port-channel LACP fallback. Maybe that helps.
https://www.reddit.com/r/networking/comments/2kslop/how_to_handle_lacpbonding_with_systems_that_need/
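For reference, the Arista EOS fallback knob from that thread looks roughly like this (a sketch; the port-channel number and timeout are placeholders, and exact syntax varies by EOS version):

```
interface Port-Channel10
   switchport mode trunk
   ! If no LACP partner shows up within the timeout (e.g. the host is still
   ! in PXE/installer land), bring a member link up as a plain port anyway
   port-channel lacp fallback static
   port-channel lacp fallback timeout 30
```

The idea is the bundle stays a normal LACP port-channel once the OS is up and speaking LACP, but a freshly PXE-booting host that knows nothing about LAGs still gets link.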
Are they referring to autodeploy?
Quote from: packetherder on November 13, 2015, 01:12:19 PM
I wouldn't worry too much about vSwitch configuration. It sounds like this is all taking place in a pre-boot environment that's as lean as possible, so no vSwitches or any of the other fanciness in ESXi proper. That also might be why it can't handle the MLAG. I'd google around for ways to PXE boot over an LACP bundle. An immediate search comes up with Arista-specific info from /r/networking about port-channel LACP fallback. Maybe that helps.
https://www.reddit.com/r/networking/comments/2kslop/how_to_handle_lacpbonding_with_systems_that_need/
This is interesting - thanks for the link!
MLAG probably doesn't have much to do with it, since it appears as a normal port channel to the downstream host. However, a scripted install probably doesn't start with a dvSwitch (not sure, haven't looked into this particular aspect), so no LACP. You might be stuck with 'on'.
I'd start by getting their script and analysing what exactly they're setting up vSwitch-wise, then go from there. If they're matching a switch EtherChannel 'on' with a standard active/passive pairing of vmnics for management, then that's your problem right there.
https://pubs.vmware.com/vsphere-60/index.jsp?topic=%2Fcom.vmware.vsphere.networking.doc%2FGUID-D34B1ADD-B8A7-43CD-AA7E-2832A0F7EE76.html
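If they do hand over the script, the bit to look for is the teaming setup in the ks.cfg %firstboot section. A sketch of what an etherchannel-matching config might look like (vSwitch0, the vmnic names, and the iphash policy are all assumptions; iphash only makes sense if the switch side really is a static port channel):

```
%firstboot --interpreter=busybox
# Add the second uplink to the default vSwitch
esxcli network vswitch standard uplink add -v vSwitch0 -u vmnic1
# IP-hash load balancing to match a static etherchannel, both uplinks active
esxcli network vswitch standard policy failover set -v vSwitch0 -l iphash -a vmnic0,vmnic1
```

If instead the script leaves the default "route based on originating port ID" (or an active/standby pair) against a switch-side etherchannel 'on', you get exactly the kind of flaky, slow transfers described above.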
Incidentally, I'm not terribly sure what the benefits of a scripted install are over Auto Deploy, aside from not relying on the Auto Deploy TFTP infrastructure. But if you're deploying so many ESXi hosts so often that you need an automated solution, you'd think you'd want to get away from local installs entirely (also for ease of patching... reboot and boom, you boot the new image/patches).