Switching gotchas

Started by deanwebb, February 16, 2015, 09:20:35 PM

deanwebb

When you're working on switching problems, what are some pitfalls that you have to keep an eye out for?
Take a baseball bat and trash all the routers, shout out "IT'S A NETWORK PROBLEM NOW, SUCKERS!" and then peel out of the parking lot in your Ferrari.
"The world could perish if people only worked on things that were easy to handle." -- Vladimir Savchenko
Any questions? No questions! | BCEB: Belkin Certified Expert Baffler | "Plan B is Plan A with an element of panic." -- John Clarke
Accounting is architecture, remember that!
Air gaps are high-latency Internet connections.

that1guy15

The most popular are:
-Switchport trunk allowed vlan X - forgetting the add/remove keyword, which replaces the whole allowed list with X instead of modifying it.
-Adding config to a Port-Channel member interface instead of the Port-Channel itself. And down goes the uplink!
-Not using VTP version 3 in production and blowing up the VLAN database by introducing a re-purposed switch.
-Switch Stacking - Please make sure the stack member numbers line up with the order you rack them in. Or renumber your switches after removing them from the stack.
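
To make the first gotcha concrete, here is a minimal Python sketch of the allowed-VLAN list as a set. It is only a model of the replace-versus-modify semantics, not anything that talks to a switch; the function name `apply_allowed_vlan` and the VLAN numbers are made up for illustration.

```python
def apply_allowed_vlan(current, vlans, keyword=None):
    """Return the allowed-VLAN set after one command (a model, not IOS itself).

    keyword=None      models 'switchport trunk allowed vlan X'         (replace)
    keyword='add'     models 'switchport trunk allowed vlan add X'     (union)
    keyword='remove'  models 'switchport trunk allowed vlan remove X'  (difference)
    """
    current, vlans = set(current), set(vlans)
    if keyword is None:
        return vlans              # the whole list is overwritten
    if keyword == "add":
        return current | vlans
    if keyword == "remove":
        return current - vlans
    raise ValueError(f"unknown keyword: {keyword!r}")


trunk = {10, 20, 30}                                    # hypothetical existing list
print(sorted(apply_allowed_vlan(trunk, {40}, "add")))   # [10, 20, 30, 40]
print(sorted(apply_allowed_vlan(trunk, {40})))          # [40]
```

The second call is the outage: VLANs 10, 20 and 30 silently drop off the trunk.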


That1guy15
@that1guy_15
blog.movingonesandzeros.net

deanwebb

Ah, yes, the repurposed switch! Expand on that gotcha, it's a good one.

Also, another gotcha is not labeling everything, including both ends of every cable.

Reggle

Check all individual interfaces before and after port-channel manipulations. Things I've seen...
- BPDU filter on one interface, adding/removing an interface, everything gets rehashed, and BPDUs are suddenly hashed to the filtered port.
- An IOS release on a 6500 that didn't apply the 'switchport trunk allowed vlan add' command to the individual member ports in the config.
- MST + High CPU + Adding/Removing VLAN on port-channel = hash mismatch and STP reconvergence because it couldn't figure out everything in time. Also the port-channel may briefly split in two, showing a port-channel 5A and 5B for example. Solve the high CPU first.
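
One way to do the before/after check is to diff the member interfaces against each other. The sketch below assumes you have already pulled each member's config lines into a dict by whatever means you like; the function name `member_config_drift` and the interface names are hypothetical.

```python
def member_config_drift(members):
    """Report config lines that are not shared by every port-channel member.

    members: dict of interface name -> iterable of config lines.
    Returns: dict of interface name -> sorted list of lines that at least
    one other member lacks. An empty dict means all members match.
    """
    normalized = {
        name: {line.strip() for line in lines if line.strip()}
        for name, lines in members.items()
    }
    if not normalized:
        return {}
    common = set.intersection(*normalized.values())
    return {
        name: sorted(lines - common)
        for name, lines in normalized.items()
        if lines - common
    }


members = {  # hypothetical members of port-channel 5
    "Te1/1": ["switchport mode trunk", "channel-group 5 mode active",
              "spanning-tree bpdufilter enable"],
    "Te1/2": ["switchport mode trunk", "channel-group 5 mode active"],
}
print(member_config_drift(members))
# {'Te1/1': ['spanning-tree bpdufilter enable']}
```

Run it before and after the change; anything it prints is a line worth a second look.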

Check that your VLAN exists everywhere it should. MST will not care if a switch in between is missing it and will blackhole traffic accordingly. So will vPC and FabricPath.
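
A rough sketch of that check, assuming you have collected each switch's VLAN database into a dict and know the L2 path the VLAN has to cross. The switch names and the function name `switches_missing_vlan` are hypothetical.

```python
def switches_missing_vlan(vlan, path, vlan_db):
    """Return the switches on the path that do not have the VLAN defined.

    path:    ordered list of switch names the VLAN must traverse.
    vlan_db: dict of switch name -> set of VLAN IDs defined on that switch.
    A switch absent from vlan_db is treated as having no VLANs at all.
    """
    return [sw for sw in path if vlan not in vlan_db.get(sw, set())]


vlan_db = {  # hypothetical inventory
    "access-1": {10, 20, 30},
    "dist-1": {10, 20},
    "core-1": {10, 20, 30},
}
print(switches_missing_vlan(30, ["access-1", "dist-1", "core-1"], vlan_db))
# ['dist-1']
```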

MST and two uplinks between switches, not a port-channel, with differing VLANs is asking for issues because MST can't figure out what to (un)block. Some third-party blade switches are notable offenders. Related to that: according to the 802.1D standard a device must either actively participate in spanning-tree or forward BPDUs transparently like any other frame, but don't assume this is always true. Plenty of devices (IP phones, load balancers, ...) will just plainly filter them out.

MST must be the spanning-tree root in mixed environments. It will not accept anything else and will shut down uplinks.
DTP + Different VTP domains = link shut.

Encapsulated protocols generally don't have PAK_PRIORITY. So while BFD and IS-IS will work fine natively, they may be dropped due to microbursts when carried over OTV, for example. Also, switching and WAN lines: test the provider MTU before deploying. What someone told you was CWDM may just be an L2VPN that will blackhole jumbo frames.
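
For the MTU test, a small Python sketch of the arithmetic and the search. It assumes IPv4 without options and a ping tool whose size argument is the ICMP payload (Linux `ping -s` with `-M do` to forbid fragmentation works that way; other platforms may count the size differently). The `probe` callable is a hypothetical stand-in for "send a don't-fragment ping of this size and see if it comes back".

```python
IPV4_HEADER = 20   # bytes, assuming no IP options
ICMP_HEADER = 8    # bytes, ICMP echo header


def icmp_payload_for_mtu(ip_mtu):
    """ICMP payload size that produces an IPv4 packet of exactly ip_mtu bytes."""
    payload = ip_mtu - IPV4_HEADER - ICMP_HEADER
    if payload < 0:
        raise ValueError("MTU smaller than the IPv4 + ICMP headers")
    return payload


def largest_passing_mtu(probe, low=576, high=9216):
    """Binary-search the largest MTU in [low, high] for which probe(mtu) is True.

    Assumes that if a size passes, every smaller size passes too.
    Returns None when even `low` fails.
    """
    if not probe(low):
        return None
    while low < high:
        mid = (low + high + 1) // 2
        if probe(mid):
            low = mid
        else:
            high = mid - 1
    return low


print(icmp_payload_for_mtu(9000))                  # 8972
print(largest_passing_mtu(lambda m: m <= 1500))    # 1500 (fake L2VPN that drops jumbos)
```

If the search comes back with 1500 on a link that was sold as jumbo-clean, that is the conversation to have with the provider before deploying.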


killabee

Verify connectivity audibly, visually, and programmatically. If someone says they plugged something in...
-Verify they hear a "tink" and they see a link light
-Verify the interface status goes up (and verify it goes down when they unplug it)
-Verify you see a MAC address
-Verify you see incrementing bits
-If it's IP enabled, verify you see an ARP entry on the gateway
-If it's a PoE device, also verify that power is delivered to the device
-If all else fails, default the interface, shut/no shut, or move to a different interface
-If it's a dual-sup chassis and weirdness with ARP/MAC learning is happening to everything, fail over the sup!
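
If you want to run that list the same way every time, here is a minimal sketch of it as a Python checklist. The observation keys and the function name `failed_checks` are made up for illustration; how you gather each observation (eyes, CLI, SNMP) is up to you.

```python
# Hypothetical observation keys; each one maps to a check from the list above.
CHECKS = [
    ("link_light", "link light seen"),
    ("interface_up", "interface status goes up (and down on unplug)"),
    ("mac_learned", "MAC address seen on the port"),
    ("counters_incrementing", "interface counters incrementing"),
    ("arp_on_gateway", "ARP entry on the gateway"),
    ("poe_delivered", "PoE power delivered"),
]


def failed_checks(observed, ip_enabled=True, poe=False):
    """Return descriptions of checks that were not confirmed, in list order.

    observed: dict of check key -> bool. A missing key counts as not verified.
    The ARP check is skipped for non-IP devices, the PoE check for non-PoE ones.
    """
    skip = set()
    if not ip_enabled:
        skip.add("arp_on_gateway")
    if not poe:
        skip.add("poe_delivered")
    return [
        desc for key, desc in CHECKS
        if key not in skip and not observed.get(key, False)
    ]


# A silent host: link is up, but nothing has been learned yet.
silent_host = {"link_light": True, "interface_up": True}
for item in failed_checks(silent_host):
    print("NOT VERIFIED:", item)
```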

I've encountered issues where PoE switches bring the link up but don't deliver power; silent hosts bring the link up but don't send any traffic (preventing MAC learning, and sometimes revealing that DHCP is not enabled on the device); a user says they're plugging into port X on stack 1, but you see it on port Y, stack 2; software bugs fail to place voice traffic on the port in the correct VLAN despite proper config; software bugs "remember" cold MACs/VLANs...

Don't assume...verify.