stp-2-dispute_detected - what fun!

Started by LynK, January 28, 2015, 11:46:05 AM


LynK

Had this bad boy this morning.... :angry: :angry: :angry:

On our new 2960-X stack we lost all connectivity due to an STP dispute. Cisco seems to think it was a bad cable, but I am not so sure that is the case. I'm going to run two new uplink cables and see if that resolves the issue.

Has anyone else experienced this, and what did you do to prevent it?

According to Cisco, this is the recommended action:
Quote
Recommended Action: Issue the show spanning-tree inconsistentports command to review the list of interfaces with Dispute. Dispute is caused if the peer is not receiving the Superior BPDUs sent by this interface. That is why the peer continues to send its own Inferior BPDUs. Determine why devices connected to the listed ports are not receiving BPDUs. One reason could be a failure in the cable: if the link has a failure that makes it unidirectional (you cannot transmit but you can receive), it should be replaced with a proper cable.
Sys Admin: "You have a stuck route"
            Me: "You have an incorrect Default Gateway"
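For reference, a minimal sketch of the checks Cisco's recommended action points to, run on the 2960-X. The interface name Port-channel24 is taken from the config LynK posts later in this thread; substitute your own uplink. The per-interface detail output should include sent/received BPDU counters for each VLAN, which is a quick way to see whether the peer's BPDUs are actually arriving.

```
! List every port STP currently considers inconsistent
! (per Cisco's note, this is where ports in Dispute show up)
show spanning-tree inconsistentports
!
! Per-VLAN port details for the uplink, including BPDU sent/received counters
show spanning-tree interface Port-channel24 detail
```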

that1guy15

If it's copper, very unlikely. If it's fiber, yeah, make sure light is making it both ways. Also, if the trunk in question is a port-channel connected to multiple members of the stack, you should check your stack for issues.

What about STP: are both sides configured in the same STP mode?

That1guy15
@that1guy_15
blog.movingonesandzeros.net
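A sketch of how both questions could be checked from the 2960-X side. The interface name comes from LynK's config further down; the transceiver command only reports light levels if the optic supports DOM (digital optical monitoring). The same STP-mode check can be run on the Nexus, since NX-OS also has show spanning-tree summary.

```
! First line of output should read "Switch is in rapid-pvst mode" (or pvst/mst)
show spanning-tree summary
!
! Tx/Rx optical power on one uplink optic (requires a DOM-capable SFP/SFP+)
show interfaces TenGigabitEthernet1/0/1 transceiver detail
```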

LynK

Quote from: that1guy15 on January 28, 2015, 03:02:40 PM
If it's copper, very unlikely. If it's fiber, yeah, make sure light is making it both ways. Also, if the trunk in question is a port-channel connected to multiple members of the stack, you should check your stack for issues.

What about STP: are both sides configured in the same STP mode?

These are two fiber uplinks on different members of the stack.

Here is the STP and port-channel config:


spanning-tree mode rapid-pvst
spanning-tree extend system-id
spanning-tree vlan 1-4094 priority 61440
!
interface Port-channel24
description ***UPLINK TO NEXUS***
switchport trunk native vlan 400
switchport trunk allowed vlan 102,108,160,200,300,400,777
switchport mode trunk
load-interval 30
end


Here are the individual TenGig interface configs:

interface TenGigabitEthernet1/0/1
switchport trunk native vlan 400
switchport trunk allowed vlan 102,108,160,200,300,400,777
switchport mode trunk
srr-queue bandwidth share 10 10 60 20
queue-set 2
priority-queue out
mls qos trust dscp
macro description EgressQoS
channel-group 24 mode active
!
interface TenGigabitEthernet3/0/1
switchport trunk native vlan 400
switchport trunk allowed vlan 102,108,160,200,300,400,777
switchport mode trunk
shutdown
srr-queue bandwidth share 10 10 60 20
queue-set 2
priority-queue out
mls qos trust dscp
macro description EgressQoS
channel-group 24 mode active
end

wintermute000

Turn on UDLD and loop guard on fibre links by default; works for me.
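A minimal sketch of that advice as global config on a Catalyst IOS switch such as the 2960-X. The global udld enable command turns on UDLD normal mode for fiber-optic ports only; udld aggressive is the global alternative. The difference matters for the next post: in aggressive mode a port is err-disabled when UDLD hellos from the neighbor stop arriving, while normal mode takes no action when the UDLD information simply times out.

```
configure terminal
! UDLD normal mode on all fiber-optic interfaces
! (use "udld aggressive" instead for aggressive mode)
udld enable
! Loop guard on all point-to-point links by default
spanning-tree loopguard default
end
```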

sgtcasey

Something to watch out for with UDLD: if anything at all stops the UDLD hello packets (or whatever they're called), your fiber links will drop. If you do not have errdisable recovery cause udld enabled along with errdisable recovery interval <seconds>, you'll need to go to the switch stack and console into it to bring the links back online.

I've always kept UDLD running on devices using fiber to prevent a bad fiber pair from throwing problems into the network. Drop the busted fiber, I get the alert, send a ticket to the cable folks to test/replace as needed, bring it back online once fixed. Easy.

Until one fine morning around 2:00am, my pager goes off: most of the access switches at a site (a 3-hour drive away) just dropped offline. I try to connect to the core 4507 to check it out, and nothing. An hour or so later (thankfully I was able to reach a local contact, whom I talked through consoling into things for me), everything was back up. The cause was a bug in our 4507 code where the supervisor would simply stop processing/forwarding traffic. It just stopped working, but from the outside it looked fine (all lights green and such).

When it stopped sending traffic, that included all the UDLD traffic. All the access switches, doing exactly what they were configured to do, err-disabled their fiber links due to UDLD, which required someone to physically go to every one of them to bring the links back online. We decided to add the errdisable recovery cause udld command so the ports would come back up automatically if this happens again. If we have a true UDLD issue, the port will bounce and alert the team, and someone can just go shut it manually.

So far so good.
Taking the sh out of IT since 2005!
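A sketch of the recovery settings described above, as Catalyst IOS global config. The 300-second interval is just an example value (it is also the IOS default); pick whatever suits your alerting.

```
configure terminal
! Automatically re-enable ports that UDLD put into err-disabled state
errdisable recovery cause udld
! Seconds to wait before attempting recovery (example value)
errdisable recovery interval 300
end
!
! Verify: udld should be listed as Enabled, along with the timer interval
show errdisable recovery
```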

LynK

@winter / casey

- Enabled UDLD and loop guard on both the core and access sides, and I also replaced both uplinks (for good measure). Rebooted the stack; it came up, and the links have been clean all night (knock on wood).
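A few commands that could confirm the new settings took effect on the access stack, assuming the two TenGig uplinks from the config posted earlier.

```
! UDLD mode and current link/neighbor state per uplink
show udld TenGigabitEthernet1/0/1
show udld TenGigabitEthernet3/0/1
! The summary should now show Loopguard Default as enabled
show spanning-tree summary
```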

Here is some of the log output, for reference:

Jan 28 21:22:42.080: %SPANTREE-2-RECV_PVID_ERR: Received BPDU with inconsistentpeer vlan id 102 on StackPort3 VLAN108. (ACCESS_STACK-3)
Jan 28 21:22:42.080: %SPANTREE-2-BLOCK_PVID_PEER: Blocking StackPort3 on VLAN0102. Inconsistent peer vlan. (ACCESS_STACK-3)
Jan 28 21:22:42.084: %SPANTREE-2-BLOCK_PVID_LOCAL: Blocking StackPort3 on VLAN0108. Inconsistent local vlan. (ACCESS_STACK-3)
Jan 28 21:22:42.101: %SPANTREE-2-BLOCK_PVID_PEER: Blocking StackPort3 on VLAN0777. Inconsistent peer vlan. (ACCESS_STACK-3)
Jan 28 21:22:42.622: %SPANTREE-2-BLOCK_PVID_PEER: Blocking StackPort3 on VLAN0160. Inconsistent peer vlan. (ACCESS_STACK-3)
Jan 28 21:22:42.622: %SPANT
Jan 28 21:22:43.073: %LINEPROTO-5-UPDOWN: Line protocol on Interface TenGigabitEthernet1/0/1, changed state to up
Jan 28 21:22:42.622: %SPANTREE-2-BLOCK_PVID_PEER: Blocking StackPort3 on VLAN0300. Inconsistent peer vlan. (ACCESS_STACK-3)
Jan 28 21:22:42.622: %SPANTREE-2-BLOCK_PVID_PEER: Blocking StackPort3 on VLAN0400. Inconsistent peer vlan. (ACCESS_STACK-3)
Jan 28 21:22:44.062: %LINK-3-UPDOWN: Interface Port-channel24, changed state to up
Jan 28 21:22:44.953: %LINEPROTO-5-UPDOWN: Line protocol on Interface TenGigabitEthernet3/0/1, changed state to up
Jan 28 21:22:45.072: %LINEPROTO-5-UPDOWN: Line protocol on Interface Port-channel24, changed state to up
Jan 28 21:22:47.529: %SPANTREE-2-RECV_PVID_ERR: Received BPDU with inconsistentpeer vlan id 777 on Port-channel24 VLAN102.
Jan 28 21:22:47.529: %SPANTREE-2-BLOCK_PVID_PEER: Blocking Port-channel24 on VLAN0777. Inconsistent peer vlan.
Jan 28 21:22:47.529: %SPANTREE-2-BLOCK_PVID_LOCAL: Blocking Port-channel24 on VLAN0102. Inconsistent local vlan.
Jan 28 21:22:46.579: %SPANTREE-2-UNBLOCK_CONSIST_PORT: Unblocking StackPort3 on VLAN0300. Port consistency restored. (ACCESS_STACK-3)
Jan 28 21:22:46.582: %SPANTREE-2-UNBLOCK_CONSIST_PORT: Unblocking StackPort3 on VLAN0400. Port consistency restored. (ACCESS_STACK-3)
Jan 28 21:22:46.596: %SPANTREE-2-BLOCK_PVID_PEER: Blocking StackPort3 on VLAN0400. Inconsistent peer vlan. (ACCESS_STACK-3)
Jan 28 21:22:46.596: %SPANTREE-2-RECV_PVID_ERR: Received BPDU with inconsistent peer vlan id 777 on StackPort3 VLAN300. (ACCESS_STACK-3)
Jan 28 21:22:46.596: %SPANTREE-2-BLOCK_PVID_LOCAL: Blocking StackPort3 on VLAN0300. Inconsistent local vlan. (ACCESS_STACK-3)
Jan 28 21:22:49.525: %SPANTREE-2-LOOPGUARD_BLOCK: Loop guard blocking port Port-channel24 on VLAN0108.

icecream-guy

Looks like the port-channel configurations are different on each side of the link.
:professorcat:

My Moral Fibers have been cut.
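One way to compare the two ends side by side: the first pair of commands runs on the 2960-X, the second pair on the Nexus. Port-channel24 and port-channel40 come from the configs in this thread. Native VLAN, allowed VLANs and STP guard settings should line up on both ends.

```
! 2960-X (IOS): native VLAN and allowed/active VLANs on each trunk
show interfaces trunk
! Both TenGig members should carry the (P) flag, i.e. bundled in Po24
show etherchannel 24 summary
!
! Nexus (NX-OS) equivalents for port-channel40
show interface trunk
show port-channel summary
```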

LynK

Quote from: ristau5741 on January 29, 2015, 11:08:05 AM
Looks like the port-channel configurations are different on each side of the link.

Here is the core port-channel config:

interface port-channel40
  description ***P-C TO NDC_ACCESS_STACK***
  switchport mode trunk
  switchport trunk native vlan 400
  switchport trunk allowed vlan 102,108,160,200,300,400,777
  spanning-tree guard loop

sgtcasey

The fact that you were seeing these errors on the stack ports makes me wonder whether there might be a busted stack cable or some sort of weird software bug. Are you doing any odd configuration with your stack ports? How are the stack ports connected up?

LynK

Quote from: sgtcasey on January 29, 2015, 05:01:52 PM
The fact that you were seeing these errors on the stack ports makes me wonder whether there might be a busted stack cable or some sort of weird software bug. Are you doing any odd configuration with your stack ports? How are the stack ports connected up?


  Switch #    Port 1       Port 2
  --------    ------       ------
      1         2             3
      2         1             3
      3         2             1

sgtcasey

I've never seen it done that way before. I always go 1 to 2, 1 to 2, 1 to 2 all the way down the stack, with the last switch in the stack going back to the top switch to complete the ring. I'm not saying the way you have it won't work perfectly fine. :)

Odd problem for sure.  It might be time for a TAC case or a good look through the Bug Search Tool.

icecream-guy

Stupid question (but then, there are none): are you sure the switches are numbered correctly? Sometimes it gets confusing.

sgtcasey, 9 stacks can't be wired that way, as the stack cable is too short; the extra-long stack cable would be needed.


Also check the stack-ring speed to see if there is an issue with the stack cabling:

show switch [ stack-member-number | detail | neighbors | stack-ports | stack-ring speed ]
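Concrete forms of that command for the checks being suggested. On a 2960-X FlexStack, the stack-ring output is where a degraded ring would show up, for example a ring configuration other than Full.

```
! Ring speed and whether the ring is complete ("Stack Ring Configuration: Full")
show switch stack-ring speed
! Status of stack port 1 and port 2 on each member
show switch stack-ports
! Which member is cabled to which stack port
show switch neighbors
```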

sgtcasey

Quote from: ristau5741 on January 30, 2015, 11:03:47 AM
sgtcasey, 9 stacks can't be wired that way, as the stack cable is too short; the extra-long stack cable would be needed.

Yep.  We use the longer stacking cable for all switch stacks larger than 3 and still go 1 to 2, etc.

LynK

@ristau,

I am 100% certain I have the switches numbered correctly:


Switch#  Role   Mac Address     Priority Version  State
----------------------------------------------------------
*1       Master e089.9d05.bb00     15     4       Ready
 2       Member 5cfc.6603.2d00     1      4       Ready
 3       Member e089.9d05.b180     14     4       Ready
!
!
Stack Ring Speed        : 10G
Stack Ring Configuration: Full
Stack Ring Protocol     : FlexStack
!
!
         Stack Port Status             Neighbors
Switch#  Port 1     Port 2           Port 1   Port 2
--------------------------------------------------------
  1        Ok         Ok                2        3
  2        Ok         Ok                1        3
  3        Ok         Ok                2        1


LynK

Our secondary link went down again. Luckily, this time UDLD/loop guard caught it. However, while running on the one remaining uplink, I see packet loss to both the access switch's SVI and the core switch. I had to perform an emergency reboot on the switches.

Looking further into our IOS release (EX5), there are known issues with SFPs. Cisco is having me upgrade to:

c2960x-universalk9-mz.152-3.E

Hopefully this fixes our issues.
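After the reload, a quick sanity check could look like the sketch below. On a stack, show version includes a table listing the model and software version of each member, so all three switches should report the new release.

```
! Each stack member's model and SW version appear in the table near the end
show version
! Installed SFP/SFP+ optics with PIDs and serial numbers, for cross-checking
! against the known-issue notes for the release
show inventory
```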