Sticky nexus situation

Started by icecream-guy, August 03, 2016, 02:20:33 PM


icecream-guy

Two 5Ks, each with links to their own 10G FEXs.
ESX servers with 10G uplinks to each FEX (2 uplinks, one to each FEX).
Port-channel configured across the FEXs via vPC.
Port-channels, vPC, and interfaces all correctly configured.
When the port-channels are up and both links are in use
(server connected with one port to each FEX in the port-channel),
the server admin loses connectivity, sees high packet loss via ICMP,
and loses access to vCenter running on one of the ESX servers.
It's a layer 2 switch, and the test pings are within the same VLAN.
When a single interface of the port-channel is down, everything is fine.
We had two 10G uplinks from each 5K to its FEX.
A TAC troubleshooting session saw tail drops on the FEX uplinks,
so I added two more links between each 5K and FEX.
The server admin is still seeing loss of connectivity.
TAC troubleshooting session #2: not sure of the cause.
Any ideas? Ever seen anything like this?
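
For reference, the host-facing side of the new config looks roughly like this (shown for one 5K; the other 5K mirrors it on its own FEX, and the interface/port-channel/vPC numbers are just placeholders):

interface Ethernet101/1/1
  description ESX server uplink via FEX 101
  switchport mode trunk
  channel-group 10 mode active   ! or "mode on" if the ESX side is a static bundle
interface port-channel10
  switchport mode trunk
  vpc 10
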
:professorcat:

My Moral Fibers have been cut.

that1guy15

I'm not clear when this issue is seen. During a failure, or during normal operations of the Nexus?

Sounds like traffic is not being properly balanced between the two FEXs, or one side is black-holing the traffic.
That1guy15
@that1guy_15
blog.movingonesandzeros.net

wintermute000

ESXi uplink mode not correctly set, including the load-balancing algorithm?

Dieselboy

How long do you wait? Based on the OP, I think this is a new config you're rolling out, and you hit this issue in the process? I know with my Cisco ESX chassis there can be long outages from the vswitch on the ESX side.

On my ESX I have a vswitch, and on the chassis I have 2 links, one to each Nexus. There's no port-channel configured. If the active link goes down, there's a lengthy outage. It's on my low-priority list to look into further; I'll probably do that now, since I'm thinking about it. I don't recall seeing any LACP config in there.

How's the vswitch configured on ESX?

Do you have any device you can plug into the FEX, just a single device with an IP which you can ping? That would verify that traffic is getting through the FEX fine, and could narrow the issue down to the ESX end.

EDIT - LACP is available from ESXi 5.1. I'm running 5.0 on the UCS servers.

icecream-guy

Issues are seen when both interfaces (one on each switch) are up and in port-channel mode.

The ESX uplinks are correctly configured, as far as I know; I have no insight, as I have no access to the server side.
The server side load balances on source-dest-ip, with no support for MAC in the mix. The switch side is source-dest-ip + source-dest-mac, since it doesn't support source-dest-ip alone: setting source-dest-ip on the switch includes the MAC too, according to my research.
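
For reference, the switch-side hashing is a global setting along these lines (a sketch; the exact options vary by NX-OS version, and both vPC peers should match):

port-channel load-balance ethernet source-dest-ip
! then verify what's actually applied:
show port-channel load-balance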

The issue starts within a minute or so after bringing up the port-channels. We're trying to move the ESX server connections that were individual links into port-channels for better redundancy; the port-channel part is the new config.

I don't know how the vswitch is configured on the ESX side; not my domain. I can only trust the server admin that all is good.

Single-connected devices work fine, and even the ESX servers work fine with the 2 connections up as individual links. There are 3 other FEXs connected that show no issues.

So from TAC this morning: the engineering supervisor says we're hitting buffer exhaustion (the buffer limit on the 2232 is 1280 KB).
Due to that hardware limitation on the FEX, moving the links to the parent 5K is recommended.
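
If anyone wants to look for the same symptom, something like this on the 5K should show the tail/output drops (interface numbers are placeholders for the FEX fabric uplinks):

show queuing interface ethernet 1/17
show interface ethernet 1/17 | include discard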

:professorcat:

My Moral Fibers have been cut.

that1guy15

Huh...

How full is the FEX and what type of load is on that guy? How much load are you introducing with the new servers?
That1guy15
@that1guy_15
blog.movingonesandzeros.net

icecream-guy

Quote from: that1guy15 on August 04, 2016, 09:24:37 AM
Huh...

How full is the FEX and what type of load is on that guy? How much load are you introducing with the new servers?

Each FEX has 22 of 32 10G connections up/up. Load, I have no idea; how would I calculate that? txload and rxload on the physical interfaces are 1/255.
:professorcat:

My Moral Fibers have been cut.

that1guy15

I'd monitor the load on the fabric ports of the 5Ks. With that many ports populated, I wonder if you are hitting buffer/ASIC limits.

I have never dug into the 2K architecture and ASIC distribution, or even what commands there are to check; surely TAC can pull this from a show tech. The 2232 does have 4:1 oversubscription when fully populated. If you are hitting this, you might consider jumping models to 40G uplinks or, as you said, screwing the FEX and running straight to the 5Ks.

You might even do some temporary monitoring on your uplink interfaces to get an idea of the load, or check the interface monitoring for the server within your NMS.
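
Even something quick and dirty like this gives a rough view without a full NMS (interface numbers are placeholders; bear in mind the rates are averaged over the load interval, so bursts get smoothed out):

clear counters interface ethernet 1/17
show interface ethernet 1/17 | include rate
show interface ethernet 1/17 | include discard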

That1guy15
@that1guy_15
blog.movingonesandzeros.net

Dieselboy

Is this SAN traffic on the same links? I have had a buffer problem before where utilisation looked very low because it was an average; the problem happened during bursts. Over, say, 1 hour the dropped packet rate was less than 1% (it was more like 0.5% or lower!), but there was still an issue: if you ran show interface repeatedly and caught it, you would see over 1% drops during a burst.

Can you allocate more shared memory to the buffer pool, so that an individual link can grab a chunk of memory for a burst and release it back to the pool shortly after? It works as long as multiple links don't burst at the same time. This is what we did on the IOS switches where we discovered the problem; I don't know if it's possible on Nexus.
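
For what it's worth, on a Catalyst 3750-class IOS switch (an assumption; yours may differ) the tuning was along these lines, with illustrative values only:

! re-carve the per-port buffer split across the four egress queues
mls qos queue-set output 1 buffers 15 30 30 25
! let queue 2 borrow from the common pool during bursts (reserve 100%, cap at 400%)
mls qos queue-set output 1 threshold 2 400 400 100 400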

You can't do traceroute mac on Nexus either, but is there an equivalent with the IS-IS L2 routing? A bit irrelevant now since you know what the issue is, but one to keep for the future.

wintermute000

I'm still banking on port-channel config. ESXi is finicky when it comes to port-channels, especially LACP, especially pre-5.5, especially non-vDS. Read all the caveats here:

https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1004048


Dieselboy

Ohrly. Thanks for the link; I was going to hook one of my VMware boxes up via LACP, as it's on 5.5 (Riverbed). Would you steer away from it?

wintermute000

Well, the doco says "Supported Cisco configuration: EtherChannel Mode ON – ( Enable EtherChannel only)", and that was the old-school way, since before 5.1 you couldn't do LACP anyway.
Ditto with the example Cisco syntax (mode on).
But if it works for HP switches then there's no reason why not?

The usual gotcha is 'route based on IP hash' on the ESXi side.

Try it and see? Don't forget to check that the load-balance algorithm matches on both sides.

And oh, LACP is only supported on the dvSwitch, not the regular vSwitch.
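
So for a standard vSwitch the classic pairing is a static bundle on the switch plus IP hash on the ESXi side, i.e. something like this (placeholder numbers):

interface Ethernet101/1/1
  channel-group 10 mode on   ! static EtherChannel, no negotiation; vSwitch set to "Route based on IP hash"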


VMware jockeys HATE LACP for some reason (historical incompatibility?) and may continue to prefer LBT, which makes our lives easy, I suppose.
Remember that in a UCS environment it's hidden from them as well, since the LACP can be kept between the FI and the Nexus ToRs. The blades can use LBT to the FI, merrily unaware of the upstream LACP.

Dieselboy

Can you tell me what LBT stands for? I can only think of one meaning and it is definitely out of context here and not related.

IMO, from the network to the "other" (the other being the server), there should be a channel negotiation protocol, to avoid the situation where the network is sending traffic down a nailed-up channel while the server side has issues, and you therefore get loss of connectivity.

What doco are you checking? I've seen Cisco-written "support forum" type documentation which gives port-channel details for a WLC. That doco still says today that you MUST use LACP with "channel mode ON". But this is wrong: ON mode is nailed up, whereas LACP is negotiated, unless I've gone seriously wrong somewhere.
https://supportforums.cisco.com/document/81681/lag-link-aggregation#Controller_only_supports_the_on_mode_of_the_LAG

Quote
"Make sure the port-channel on the switch is configured for the IEEE standard Link Aggregation Control Protocol (LACP), not the Cisco proprietary Port Aggregation Protocol (PAgP)."
"Controller only supports the on mode of the LAG."

wintermute000

LBT = VMware Load Based Teaming

From a switch perspective you just run dumb trunks (or access ports, if the design is only 1 VLAN, I guess). The dvSwitch/vSwitch implements MAC split horizon, so no STP is necessary. The ESXi host decides which uplink to use; if a MAC moves, the switch just updates its CAM table as normal.
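
i.e. the switch side for an LBT design is nothing more than this (placeholder numbers/VLANs):

interface Ethernet101/1/3
  description ESXi vmnic uplink, no port-channel (LBT on the vSwitch side)
  switchport mode trunk
  switchport trunk allowed vlan 10-20
  spanning-tree port type edge trunk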

Also my mistake, UCS FI does not support LACP, another reason why vmware designs don't see LACP much.

TLDR just use mode on and it will work.

Dieselboy

Thanks! I didn't know that about the VMware side. Because of that, I made the links resilient but it's using STP.  :barf:

My FIs are doing LACP; they're 6248s:

interface port-channel2
  speed 10000
  description 20GB vPC Etherchannel trunk to CIN-6248-1-A
  switchport mode trunk
  switchport trunk allowed vlan 2-4,6-7,12,15-19
  vpc 2
interface Ethernet1/51
  description 10GB Etherchannel member trunk to CIN-6248-1-A port E1/32
  switchport mode trunk
  switchport trunk allowed vlan 2-4,6-7,12,15-19
  channel-group 2 mode active

sh port-chan sum
2     Po2(SU)     Eth      LACP      Eth1/51(P)


Is it the 2Ks that don't?