IOS bug of the day

Started by icecream-guy, February 17, 2016, 11:05:15 AM

SimonV

The frustrating part is that it's a stack. So I have to take out that one chassis and either fix it with the current release or up/downgrade the entire stack. And they asked me to do it *during business hours*  :partay:

deanwebb

Wow, that's really insane... a stack IOS downgrade during production...
Take a baseball bat and trash all the routers, shout out "IT'S A NETWORK PROBLEM NOW, SUCKERS!" and then peel out of the parking lot in your Ferrari.
"The world could perish if people only worked on things that were easy to handle." -- Vladimir Savchenko
Any questions? No questions! | BCEB: Belkin Certified Expert Baffler | "Plan B is Plan A with an element of panic." -- John Clarke
Accounting is architecture, remember that!
Air gaps are high-latency Internet connections.

SimonV

Well, all the virtual machines will be vMotioned to the other server room, but I do prefer doing those things when no one is around to complain.

icecream-guy

one can reload individual members of a stack one at a time.
:professorcat:

My Moral Fibers have been cut.

SimonV

Removed stackpower from that switch, powered it down and removed it from the stack. Powered it on again as standalone, it went through the complete boot process without issues. Powered it down again, added to stack/stackpower again and booted. Problem solved... 

:printer:

deanwebb

You did it like a boss... or a gangsta... or...

A GANGSTA BOSS!

:gangsta:

SimonV

Glad it was this easy to fix and that I didn't do a blind downgrade like Cisco recommended :) I fail to see the logic, though: from the switch's point of view, nothing has changed.

deanwebb

Honestly, finding stuff like that should count for something. Like a CCIE-Experience. No, you didn't sit for an exam or a practical, but DAMN, you figured out a really insane problem. Figure out 6 in a year like that, and you get the cert, something like that.

SimonV

CSCtk68692 - kron-initiated 'write mem' locks nvram indefinitely

After our outsourcing partner implemented an auto-save command  :mrgreen:

deanwebb

Quote from: SimonV on September 18, 2017, 12:59:45 PM
CSCtk68692 - kron-initiated 'write mem' locks nvram indefinitely

After our outsourcing partner implemented an auto-save command  :mrgreen:

:no: :frustration: :whatudo:

SimonV

CSCvd78303 - ARP functions fail after 213 days of uptime, drop with error 'punt-rate-limit-exceeded'

Symptom:
An ASA, after reaching an uptime of roughly 213 days will fail to process ARP packets leading to a condition where all traffic eventually stops passing through the affected device. Since not all existing ARP entries time out at the same time, not all connections may fail at the same time.

Additional symptoms include:
- ASA does not have ARP entries in its ARP table. show arp is empty
- The output of show asp drop and ASP drop captures indicate a rapidly increasing counter for punt-rate-limit exceeded and the dropped packets are predominantly ARP.

Conditions:
This is seen when the ASA's uptime reaches 213 days.

Workaround:
Perform a pre-planned reboot of the device before approaching the 213 days (5124 hours) of up time. After the reboot, it will give you another 213 days of up time.


How about that workaround 

:umad:
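For planning that reboot, here is a minimal Python sketch of the arithmetic. It only uses the 5124-hour figure quoted in the bug text (which works out to 213.5 days, slightly more than the "213 days" stated next to it). The safety margin and the function names are assumptions made up for illustration, not anything from Cisco.

```python
from datetime import datetime, timedelta

# Limit quoted in the bug text above.
LIMIT_HOURS = 5124

# Hypothetical safety margin: reload at least this long before the limit.
SAFETY_MARGIN = timedelta(days=7)


def limit_in_days(hours: int = LIMIT_HOURS) -> float:
    """Convert the quoted hour limit to days."""
    return hours / 24


def reboot_deadline(boot_time: datetime,
                    margin: timedelta = SAFETY_MARGIN) -> datetime:
    """Latest time a planned reload should happen for a given boot time."""
    return boot_time + timedelta(hours=LIMIT_HOURS) - margin


if __name__ == "__main__":
    print(limit_in_days())  # 213.5
    print(reboot_deadline(datetime(2017, 1, 1)))
```

With a seven-day margin, a device booted at midnight on January 1, 2017 would need its planned reload by midday on July 26, 2017.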

icecream-guy

Quote from: SimonV on October 31, 2017, 09:54:06 AM
CSCvd78303 - ARP functions fail after 213 days of uptime, drop with error 'punt-rate-limit-exceeded'

9.1.7.16 fixes the issue; we've been deploying that one for months as devices' uptime gets close to 213.5 days.

Otanx

I remember when that one came out. I could have sworn I had verified and we were not affected. Then one Friday I started seeing a couple remote ASAs go offline. I don't remember what tipped me to look at this, but I found it. I had RANCID grab the uptime of all our firewalls. Most of them were at 213 days, X hours. Called our helpdesk and told them there were going to be unscheduled rolling outages. Then I had RANCID push the reload command to all the firewalls. Called the couple sites that were already offline to have someone power cycle the gear. The next week we did a change ticket to move to a fixed release.
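The uptime check described above could look roughly like the Python sketch below. It is not what was actually run: it assumes the uptime lines have already been collected by whatever tool is in use, and that each one looks like "hostname up N days M hours". The hostnames, the two-week warning window and the function names are hypothetical.

```python
import re

LIMIT_HOURS = 5124      # limit quoted in the bug text (213 days 12 hours)
WARN_HOURS = 14 * 24    # hypothetical two-week warning window

# Assumed uptime line format, e.g. "fw-site1 up 213 days 4 hours".
UPTIME_RE = re.compile(r"\bup\s+(?:(\d+)\s+days?)?\s*(?:(\d+)\s+hours?)?")


def uptime_hours(line):
    """Return the uptime in hours, or None if the line does not parse."""
    match = UPTIME_RE.search(line)
    if not match or (match.group(1) is None and match.group(2) is None):
        return None
    days = int(match.group(1) or 0)
    hours = int(match.group(2) or 0)
    return days * 24 + hours


def near_limit(uptimes, warn_hours=WARN_HOURS):
    """List (host, hours_remaining) for devices close to the limit.

    `uptimes` maps a hostname to its collected uptime line.
    The result is sorted so the most urgent device comes first.
    """
    flagged = []
    for host, line in uptimes.items():
        hours = uptime_hours(line)
        if hours is None:
            continue  # unparsed line: skip rather than guess
        remaining = LIMIT_HOURS - hours
        if remaining <= warn_hours:
            flagged.append((host, remaining))
    return sorted(flagged, key=lambda item: item[1])


if __name__ == "__main__":
    sample = {
        "fw-a": "fw-a up 213 days 4 hours",
        "fw-b": "fw-b up 12 days 1 hour",
        "fw-c": "fw-c up 200 days 0 hours",
    }
    for host, remaining in near_limit(sample):
        print(f"{host}: {remaining} hours left")
```

Run on a schedule, something like this would flag the firewalls well before they reach the limit, instead of after the first sites drop offline.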

-Otanx

deanwebb

Gotta love operations, the way they don't update until there's an outage!

icecream-guy

Quote from: deanwebb on November 01, 2017, 07:37:25 AM
Gotta love operations, the way they don't update until there's an outage!

Well, sometimes it's hard to get maintenance windows, especially if it's not broke.

if it's not broke, there should be no need to fix it, right?

after an outage, OMG, it's all broke and we need to fix everything ASAP

us to management:  we been trying to get maintenance windows to fix this crap and we told you so, but Noooooo, you wouldn't give us no windows.......