new circuit, same config, OSPF flapping

Started by netspork, March 06, 2018, 02:03:22 AM

netspork

Hi all, this should be very simple - we have an old 3550 at a site served by a metro ethernet provider and a very simple ospf config (see below - just redistributing connected/statics).  Never had any issues.  Today we flipped over to a new metro-e provider. Both providers terminate in the same core router and again, same config (literally copy/pasted from one subinterface to another).  OSPF came up, but now seems to die every minute:

Mar  6 02:28:18.325 EST: %OSPF-5-ADJCHG: Process 1, Nbr X.X.X.1 on Vlan49 from LOADING to FULL, Loading Done
Mar  6 02:29:15.525 EST: %OSPF-5-ADJCHG: Process 1, Nbr X.X.X.1 on Vlan49 from LOADING to FULL, Loading Done
Mar  6 02:30:03.379 EST: %OSPF-5-ADJCHG: Process 1, Nbr X.X.X.1 on Vlan49 from LOADING to FULL, Loading Done
Mar  6 02:30:59.863 EST: %OSPF-5-ADJCHG: Process 1, Nbr X.X.X.1 on Vlan49 from LOADING to FULL, Loading Done


Soooo... checked with the provider, they swear they aren't doing anything that would muck around with multicast, etc.

I'm running through the debug output a bit, but frankly I don't know what I'm looking for - there's no big ERROR! ERROR!, I see the options flags matching in send/recv...  This little error is not really digging stuff up for me on google that's useful, just other people sending debug info but no one commenting on this in particular:

Cannot see ourself in hello from X.X.X.1 on Vlan49, state INIT

Debug from edge below, core after that:


Mar  6 02:26:26.011 EST: OSPF: Synchronized with X.X.X.1 on Vlan49, state FULL
Mar  6 02:26:26.011 EST: %OSPF-5-ADJCHG: Process 1, Nbr X.X.X.1 on Vlan49 from LOADING to FULL, Loading Done
Mar  6 02:26:26.495 EST: OSPF: Rcv LS UPD from X.X.X.1 on Vlan49 length 196 LSA count 1
Mar  6 02:26:28.511 EST: OSPF: Send with youngest Key 1
Mar  6 02:26:31.471 EST: OSPF: Rcv LS UPD from X.X.X.1 on Vlan49 length 196 LSA count 1
Mar  6 02:26:31.475 EST: OSPF: Send with youngest Key 1
Mar  6 02:26:31.763 EST: OSPF: Send with youngest Key 1
Mar  6 02:26:41.764 EST: OSPF: Send with youngest Key 1
Mar  6 02:26:51.764 EST: OSPF: Send with youngest Key 1
Mar  6 02:27:01.765 EST: OSPF: Send with youngest Key 1
Mar  6 02:27:11.765 EST: OSPF: Send with youngest Key 1
Mar  6 02:27:12.465 EST: OSPF: Cannot see ourself in hello from X.X.X.1 on Vlan49, state INIT
Mar  6 02:27:12.465 EST: OSPF: Neighbor change Event on interface Vlan49
Mar  6 02:27:12.465 EST: OSPF: DR/BDR election on Vlan49
Mar  6 02:27:12.465 EST: OSPF: Elect BDR 0.0.0.0
Mar  6 02:27:12.465 EST: OSPF: Elect DR X.X.X.18
Mar  6 02:27:12.465 EST:        DR: X.X.X.18 (Id)   BDR: none
Mar  6 02:27:12.465 EST: OSPF: Send with youngest Key 1
Mar  6 02:27:12.469 EST: OSPF: 2 Way Communication to X.X.X.1 on Vlan49, state 2WAY
Mar  6 02:27:12.469 EST: OSPF: Neighbor change Event on interface Vlan49
Mar  6 02:27:12.469 EST: OSPF: DR/BDR election on Vlan49
Mar  6 02:27:12.469 EST: OSPF: Elect BDR X.X.X.1
Mar  6 02:27:12.469 EST: OSPF: Elect DR X.X.X.18
Mar  6 02:27:12.469 EST:        DR: X.X.X.18 (Id)   BDR: X.X.X.1 (Id)
Mar  6 02:27:12.469 EST: OSPF: Send DBD to X.X.X.1 on Vlan49 seq 0xF48 opt 0x52 flag 0x7 len 32
Mar  6 02:27:12.469 EST: OSPF: Send with youngest Key 1
Mar  6 02:27:12.469 EST: OSPF: Neighbor change Event on interface Vlan49
Mar  6 02:27:12.469 EST: OSPF: Elect BDR X.X.X.1
Mar  6 02:27:12.469 EST: OSPF: Elect DR X.X.X.18
Mar  6 02:27:12.469 EST:        DR: X.X.X.18 (Id)   BDR: X.X.X.1 (Id)
Mar  6 02:27:12.469 EST: OSPF: Neighbor change Event on interface Vlan49
Mar  6 02:27:12.469 EST: OSPF: DR/BDR election on Vlan49
Mar  6 02:27:12.469 EST: OSPF: Elect BDR X.X.X.1
Mar  6 02:27:12.469 EST: OSPF: Elect DR X.X.X.18
Mar  6 02:27:12.473 EST:        DR: X.X.X.18 (Id)   BDR: X.X.X.1 (Id)
Mar  6 02:27:12.473 EST: OSPF: Rcv DBD from X.X.X.1 on Vlan49 seq 0x18B6 opt 0x52 flag 0x7 len 32  mtu 1500 state EXSTART
Mar  6 02:27:12.473 EST: OSPF: First DBD and we are not SLAVE
Mar  6 02:27:12.473 EST: OSPF: Rcv DBD from X.X.X.1 on Vlan49 seq 0xF48 opt 0x52 flag 0x2 len 1112  mtu 1500 state EXSTART
Mar  6 02:27:12.473 EST: OSPF: NBR Negotiation Done. We are the MASTER
Mar  6 02:27:12.477 EST: OSPF: Send DBD to X.X.X.1 on Vlan49 seq 0xF49 opt 0x52 flag 0x3 len 1112
Mar  6 02:27:12.477 EST: OSPF: Send with youngest Key 1
Mar  6 02:27:12.481 EST: OSPF: Rcv DBD from X.X.X.1 on Vlan49 seq 0xF49 opt 0x52 flag 0x0 len 32  mtu 1500 state EXCHANGE
Mar  6 02:27:12.481 EST: OSPF: Send DBD to X.X.X.1 on Vlan49 seq 0xF4A opt 0x52 flag 0x1 len 32
Mar  6 02:27:12.481 EST: OSPF: Send with youngest Key 1
Mar  6 02:27:12.481 EST: OSPF: Send with youngest Key 1
Mar  6 02:27:12.481 EST: OSPF: Send LS REQ to X.X.X.1 length 12 LSA count 1
Mar  6 02:27:12.485 EST: OSPF: Rcv DBD from X.X.X.1 on Vlan49 seq 0xF4A opt 0x52 flag 0x0 len 32  mtu 1500 state EXCHANGE
Mar  6 02:27:12.485 EST: OSPF: Exchange Done with X.X.X.1 on Vlan49
Mar  6 02:27:12.485 EST: OSPF: Rcv LS UPD from X.X.X.1 on Vlan49 length 196 LSA count 1
Mar  6 02:27:12.485 EST: OSPF: Synchronized with X.X.X.1 on Vlan49, state FULL
Mar  6 02:27:12.485 EST: %OSPF-5-ADJCHG: Process 1, Nbr X.X.X.1 on Vlan49 from LOADING to FULL, Loading Done
Mar  6 02:27:14.986 EST: OSPF: Send with youngest Key 1
Mar  6 02:27:16.978 EST: OSPF: Rcv LS UPD from X.X.X.1 on Vlan49 length 196 LSA count 1
Mar  6 02:27:19.478 EST: OSPF: Send with youngest Key 1
Mar  6 02:27:21.766 EST: OSPF: Send with youngest Key 1
Mar  6 02:27:21.818 EST: OSPF: Rcv LS UPD from X.X.X.1 on Vlan49 length 196 LSA count 1
Mar  6 02:27:21.818 EST: OSPF: Send with youngest Key 1
Mar  6 02:27:31.766 EST: OSPF: Send with youngest Key 1


core:

Mar  6 02:25:38.271 EDT: OSPF-1 ADJ   Gi0/1/4.2391: Send with youngest Key 1
Mar  6 02:25:48.012 EDT: OSPF-1 ADJ   Gi0/1/4.2391: Send with youngest Key 1
Mar  6 02:25:57.473 EDT: OSPF-1 ADJ   Gi0/1/4.2391: Send with youngest Key 1
Mar  6 02:26:06.921 EDT: OSPF-1 ADJ   Gi0/1/4.2391: Send with youngest Key 1
Mar  6 02:26:16.798 EDT: OSPF-1 ADJ   Gi0/1/4.2391: Send with youngest Key 1
Mar  6 02:26:18.269 EDT: OSPF-1 ADJ   Gi0/1/4.2391: X.X.X.18 address X.X.X.18 is dead
Mar  6 02:26:18.269 EDT: OSPF-1 ADJ   Gi0/1/4.2391: X.X.X.18 address X.X.X.18 is dead, state DOWN
Mar  6 02:26:18.269 EDT: %OSPF-5-ADJCHG: Process 1, Nbr X.X.X.18 on GigabitEthernet0/1/4.2391 from FULL to DOWN, Neighbor Down: Dead timer expired
Mar  6 02:26:18.269 EDT: OSPF-1 ADJ   Gi0/1/4.2391: Nbr X.X.X.18: Clean-up dbase exchange
Mar  6 02:26:18.269 EDT: OSPF-1 ADJ   Gi0/1/4.2391: Neighbor change event
Mar  6 02:26:18.269 EDT: OSPF-1 ADJ   Gi0/1/4.2391: DR/BDR election
Mar  6 02:26:18.269 EDT: OSPF-1 ADJ   Gi0/1/4.2391: Elect BDR X.X.X.1
Mar  6 02:26:18.269 EDT: OSPF-1 ADJ   Gi0/1/4.2391: Elect DR X.X.X.1
Mar  6 02:26:18.269 EDT: OSPF-1 ADJ   Gi0/1/4.2391: Elect BDR 0.0.0.0
Mar  6 02:26:18.269 EDT: OSPF-1 ADJ   Gi0/1/4.2391: Elect DR X.X.X.1
Mar  6 02:26:18.269 EDT: OSPF-1 ADJ   Gi0/1/4.2391: DR: X.X.X.1 (Id)
Mar  6 02:26:18.269 EDT: OSPF-1 ADJ   Gi0/1/4.2391:    BDR: none
Mar  6 02:26:18.269 EDT: OSPF-1 ADJ   Gi0/1/4.2391: Reset flush timer
Mar  6 02:26:18.269 EDT: OSPF-1 ADJ   Gi0/1/4.2391: Remember old DR X.X.X.18 (id)
Mar  6 02:26:25.990 EDT: OSPF-1 ADJ   Gi0/1/4.2391: Send with youngest Key 1
Mar  6 02:26:25.994 EDT: OSPF-1 ADJ   Gi0/1/4.2391: 2 Way Communication to X.X.X.18, state 2WAY
Mar  6 02:26:25.994 EDT: OSPF-1 ADJ   Gi0/1/4.2391: Neighbor change event
Mar  6 02:26:25.994 EDT: OSPF-1 ADJ   Gi0/1/4.2391: DR/BDR election
Mar  6 02:26:25.994 EDT: OSPF-1 ADJ   Gi0/1/4.2391: Elect BDR 0.0.0.0
Mar  6 02:26:25.994 EDT: OSPF-1 ADJ   Gi0/1/4.2391: Elect DR X.X.X.18
Mar  6 02:26:25.994 EDT: OSPF-1 ADJ   Gi0/1/4.2391: Elect BDR X.X.X.1
Mar  6 02:26:25.994 EDT: OSPF-1 ADJ   Gi0/1/4.2391: Elect DR X.X.X.18
Mar  6 02:26:25.994 EDT: OSPF-1 ADJ   Gi0/1/4.2391: DR: X.X.X.18 (Id)
Mar  6 02:26:25.994 EDT: OSPF-1 ADJ   Gi0/1/4.2391:    BDR: X.X.X.1 (Id)
Mar  6 02:26:25.994 EDT: OSPF-1 ADJ   Gi0/1/4.2391: Nbr X.X.X.18: Prepare dbase exchange
Mar  6 02:26:25.994 EDT: OSPF-1 ADJ   Gi0/1/4.2391: Send DBD to X.X.X.18 seq 0x1689 opt 0x52 flag 0x7 len 32
Mar  6 02:26:25.994 EDT: OSPF-1 ADJ   Gi0/1/4.2391: Send with youngest Key 1
Mar  6 02:26:25.994 EDT: OSPF-1 ADJ   Gi0/1/4.2391: Set flush timer
Mar  6 02:26:25.994 EDT: OSPF-1 ADJ   Gi0/1/4.2391: Remember old DR X.X.X.1 (id)
Mar  6 02:26:25.994 EDT: OSPF-1 ADJ   Gi0/1/4.2391: Neighbor change event
Mar  6 02:26:25.994 EDT: OSPF-1 ADJ   Gi0/1/4.2391: DR/BDR election
Mar  6 02:26:25.994 EDT: OSPF-1 ADJ   Gi0/1/4.2391: Elect BDR X.X.X.1
Mar  6 02:26:25.994 EDT: OSPF-1 ADJ   Gi0/1/4.2391: Elect DR X.X.X.18
Mar  6 02:26:25.994 EDT: OSPF-1 ADJ   Gi0/1/4.2391: DR: X.X.X.18 (Id)
Mar  6 02:26:25.994 EDT: OSPF-1 ADJ   Gi0/1/4.2391:    BDR: X.X.X.1 (Id)
Mar  6 02:26:25.994 EDT: OSPF-1 ADJ   Gi0/1/4.2391: Send with youngest Key 1
Mar  6 02:26:26.000 EDT: OSPF-1 ADJ   Gi0/1/4.2391: Rcv DBD from X.X.X.18 seq 0x1C0A opt 0x52 flag 0x7 len 32  mtu 1500 state EXSTART
Mar  6 02:26:26.000 EDT: OSPF-1 ADJ   Gi0/1/4.2391: NBR Negotiation Done. We are the SLAVE
Mar  6 02:26:26.000 EDT: OSPF-1 ADJ   Gi0/1/4.2391: Nbr X.X.X.18: Summary list built, size 54
Mar  6 02:26:26.000 EDT: OSPF-1 ADJ   Gi0/1/4.2391: Send DBD to X.X.X.18 seq 0x1C0A opt 0x52 flag 0x2 len 1112
Mar  6 02:26:26.000 EDT: OSPF-1 ADJ   Gi0/1/4.2391: Send with youngest Key 1
Mar  6 02:26:26.005 EDT: OSPF-1 ADJ   Gi0/1/4.2391: Rcv DBD from X.X.X.18 seq 0x1C0B opt 0x52 flag 0x3 len 1112  mtu 1500 state EXCHANGE
Mar  6 02:26:26.005 EDT: OSPF-1 ADJ   Gi0/1/4.2391: Send DBD to X.X.X.18 seq 0x1C0B opt 0x52 flag 0x0 len 32
Mar  6 02:26:26.005 EDT: OSPF-1 ADJ   Gi0/1/4.2391: Send with youngest Key 1
Mar  6 02:26:26.009 EDT: OSPF-1 ADJ   Gi0/1/4.2391: Rcv DBD from X.X.X.18 seq 0x1C0C opt 0x52 flag 0x1 len 32  mtu 1500 state EXCHANGE
Mar  6 02:26:26.009 EDT: OSPF-1 ADJ   Gi0/1/4.2391: Exchange Done with X.X.X.18
Mar  6 02:26:26.009 EDT: OSPF-1 ADJ   Gi0/1/4.2391: Synchronized with X.X.X.18, state FULL
Mar  6 02:26:26.009 EDT: %OSPF-5-ADJCHG: Process 1, Nbr X.X.X.18 on GigabitEthernet0/1/4.2391 from LOADING to FULL, Loading Done
Mar  6 02:26:26.009 EDT: OSPF-1 ADJ   Gi0/1/4.2391: Send DBD to X.X.X.18 seq 0x1C0C opt 0x52 flag 0x0 len 32
Mar  6 02:26:26.009 EDT: OSPF-1 ADJ   Gi0/1/4.2391: Send with youngest Key 1
Mar  6 02:26:26.009 EDT: OSPF-1 ADJ   Gi0/1/4.2391: Rcv LS REQ from X.X.X.18 length 36 LSA count 1
Mar  6 02:26:26.009 EDT: OSPF-1 ADJ   Gi0/1/4.2391: Send with youngest Key 1
Mar  6 02:26:26.495 EDT: OSPF-1 ADJ   Gi0/1/4.2391: Send with youngest Key 1


And this is really all there is to the ospf config...

The only difference on the core side is that there are more "no passive-interface" statements and more "network" statements.
! edge side (3550)
!
router ospf 1
 router-id X.X.X.18
 log-adjacency-changes
 area 0 authentication message-digest
 redistribute static subnets
 passive-interface default
 no passive-interface Vlan49
 network X.X.X.0 0.0.0.255 area 0
 network X.X.X.Y 0.0.0.255 area 0
!
!
interface Vlan49
 ip address X.X.X.18 255.255.255.252
 ip ospf authentication message-digest
 ip ospf message-digest-key 1 md5 7 secretzzzz
!
! core side interface (ASR-1002-X)
!
interface GigabitEthernet0/1/4.2391
 encapsulation dot1Q 2391
 ip address X.X.X.17 255.255.255.252
 no ip redirects
 no ip unreachables
 no ip proxy-arp
 ip ospf authentication message-digest
 ip ospf message-digest-key 1 md5 7 secretzzzzzz


Stumped, any ideas?

icecream-guy

My wild a$$ guess is that it's your OSPF network type.
:professorcat:

My Moral Fibers have been cut.

netspork

I could really use a guide to all of them:

265-canal-switch(config-if)#ip ospf network ?
  broadcast            Specify OSPF broadcast multi-access network
  non-broadcast        Specify OSPF NBMA network
  point-to-multipoint  Specify OSPF point-to-multipoint network
  point-to-point       Specify OSPF point-to-point network


Not so much when it's a normal network, but when there's something in between that might alter or block traffic required for OSPF to work.

At this site, the underlying "metro-e" network is GPON-based and there's an ONT that's probably disposable/generic crap.  Who knows what it does when it sees multicast.  In other areas I have wireless bridges.  These too can do weird things to multicast because "transparent" seems to be a fluid definition.  I say this after adding an IgniteNet MetroLinq 60GHz bridge where I have a different problem and the switch is being helpful and telling me the dead timer is expiring.  Why?  Who knows.

Of all these ospf network types, what is each one useful for in some non-enterprise settings like this?  I mean, I have both problematic sites all wired up with statics, so I can test.  I guess I'll just cycle through all four and see if any stop the flapping... :)

netspork

Holy shit, two birds with one stone.

Resolved both with the "point-to-multipoint" type.

Anyone feel like commenting on why that's the fix?  Or what's different between Sidera (now "Crown Castle", which is an idiotic name) and Pilot's GPON service that would "break" OSPF when set to the default type (broadcast I guess)?

netspork

And... that was incorrect.  Just different default timers on these different network types, so it took longer to flap.
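
For anyone following along, the timer difference can be sketched like this. The defaults in the comments are the usual Cisco IOS values as I understand them, not something read off these particular boxes, so the show command is the real check. Vlan49 is the edge interface from the first post.

```
! Usual IOS defaults per network type (assumption: verify on your platform)
!   broadcast            hello 10  dead 40   DR/BDR election, multicast hellos
!   point-to-point       hello 10  dead 40   no DR/BDR, multicast hellos
!   non-broadcast        hello 30  dead 120  DR/BDR election, unicast hellos to configured neighbors
!   point-to-multipoint  hello 30  dead 120  no DR/BDR, multicast hellos
!
! Show the network type and the hello/dead timers actually in use
show ip ospf interface Vlan49
```

If hellos in one direction are still being lost, a 120 second dead timer only stretches the flap cycle out rather than stopping it, which is consistent with the flap simply taking longer to show up.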

SimonV

I would switch to point-to-point, which will disable DR/BDR elections, and define manual neighbour statements. This will switch from multicast to unicast.

netspork

Quote from: SimonV on March 07, 2018, 02:05:53 AM
I would switch to point-to-point, which will disable DR/BDR elections, and define manual neighbour statements. This will switch from multicast to unicast.

Awesome.  That kind of did it, but "point-to-point" did not allow static neighbors to be set.  For that I had to change to "non-broadcast".  I'm now pairing that with the manual/static neighbor declarations and that has fixed one site.  I'll be trying the same on a wonky wireless link that is supposed to be a transparent bridge, but likely isn't.
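
A minimal sketch of what that looks like, assuming the addressing from the configs earlier in the thread (edge Vlan49 is X.X.X.18, core subinterface is X.X.X.17). The exact lines are an illustration, not a paste from the running configs.

```
! edge side (3550)
interface Vlan49
 ip ospf network non-broadcast
!
router ospf 1
 neighbor X.X.X.17
!
! core side (ASR-1002-X)
interface GigabitEthernet0/1/4.2391
 ip ospf network non-broadcast
!
router ospf 1
 neighbor X.X.X.18
```

With non-broadcast, hellos go out as unicast to the configured neighbor address instead of to 224.0.0.5. The default timers also typically change to hello 30 / dead 120, so both ends should be changed together to keep the timers matching.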

This is a chart I found that was helpful:


wintermute000

Your issue sounds multicast related. Just go straight up NBMA and define neighbors; that way you keep your full mesh behaviour but just have to define the neighbors manually.

There is a SHEDLOAD of material on OSPF network types and specifically deployment in different underlying network topologies. I suggest you research a bit more before complaining that the ? output is insufficient because seriously there is a metric ton of material on this out there.


This bit doesn't make sense. If the far end were not sending the identities of both routers in the HELLO exchange, then the adjacency shouldn't even form in the first place, as this is how the protocol works (i.e. the far end replies with a HELLO containing both its local ID and the ones it has heard). But according to your log it's doing the exchange correctly and then shortly afterwards stops sending the correct info, which is why the adjacency is resetting, if I take the debug literally. In fact this is an extremely old-school OSPF attack vector (faking hellos with blank IDs to reset an adjacency). But again it suggests some kind of multicast issue, though I can't explain why the initial multicast hellos on 224.0.0.5 seem to be OK.


Mar  6 02:27:12.465 EST: OSPF: Cannot see ourself in hello from X.X.X.1 on Vlan49, state INIT
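
One way to check this, as a rough sketch: watch hellos on both routers at the same time and compare what each side sends with what the other side actually receives. These are standard IOS commands, but debug output can be heavy on a busy box, so treat it as something to run briefly.

```
! on both the edge and the core, from a vty session
terminal monitor
debug ip ospf hello
!
! ... watch for a few hello intervals, then stop
undebug all
!
! neighbor state and dead timer countdown
show ip ospf neighbor
```

If the edge shows hellos going out every 10 seconds but the core only occasionally logs one arriving, the core's dead timer will expire and its next hello will no longer list the edge, which is what the "Cannot see ourself" message is reporting.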




You may also have an MTU issue. That typically manifests as small HELLO packets getting through fine while the large packets needed for the database exchange fail (due to MTU), leaving the neighbors stuck in EXSTART/EXCHANGE. You're reaching FULL rather than getting stuck there (or are you?). However, it is suspicious that the DB exchange appears to be the last thing before the adjacency is torn down.
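
A quick way to rule MTU in or out, sketched with the addresses from the original post. The ping is run from the edge toward the core's X.X.X.17, and the size of 1500 is an assumption based on the "mtu 1500" shown in the DBD lines.

```
! from the edge: full-size packet with the DF bit set
ping X.X.X.17 size 1500 df-bit
!
! compare with a small packet
ping X.X.X.17 size 100
!
! check whether the neighbor ever sticks in EXSTART/EXCHANGE
show ip ospf neighbor
```

If the platform's ping does not accept these keywords on one line, the extended ping dialog offers the same options. Small pings succeeding while full-size ones fail would point at MTU; both succeeding would point back at the multicast hellos.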



netspork

I have another few links to play with, but honestly, I'm probably just going to go with static neighbor definitions going forward.

I mean, I pretty much have two types of links that I need to run OSPF over, and I don't really trust either:

- Some flavor of metro-ethernet delivered over PON, which means I'm on a network that's primarily for small/medium businesses and probably has only a handful of customers running any kind of non-DIA traffic.  And the last hop in that setup is some little GPON ONT that is basically disposable hardware.  I would not bet any sort of money on what the ONT or the OLT boxes do to multicast traffic.  I would not bet any sort of money that those managing that network care about this.

- Links over wireless gear built for the WISP industry.  Sure, it's supposed to be a transparent bridge, but again, do I trust Ubiquiti has multicast handled right in their very-modified fork of DD-WRT?  Do I dare inquire as to what the "multicast enhancement" checkbox on the "Network" tab of the config does?  On the IgniteNet MetroLinq 60GHz gear, do I trust that their fork of DD-WRT is doing sane things with multicast?  I mean, it's nice gear and where else can I get links between buildings at more than 1Gb/s for just over $1K for both ends?  But are they doing the "right thing" when they tie the wireless interface(s) (there's a 5GHz backup radio) and the ethernet interface into a bridge interface WRT multicast?

I'm just not sure I want to go down either rabbit hole.  Somewhere on my bookshelf I actually have some old Cisco Press book on OSPF, and I had a decent understanding of it, but that was when I was running it over frame relay and ATM links, working full-time with an ISP and spending at least 50% of my day on the network side of things.  This nonsense of troubleshooting OSPF over transports with unknown properties sounds like as much fun as chasing down modem drops in the dialup days.

icecream-guy

Quote from: netspork on March 16, 2018, 12:12:52 AM
I have another few links to play with, but honestly, I'm probably just going to go with static neighbor definitions going forward.

Shouldn't this be in the "current frustrations" thread?  LoL.  KISS: if there is no need for auto-detection of neighbors or redundancy, and they're peer-to-peer links where you control both sides, then by all means, basic configuration is all you need.