Networking-Forums.com

Professional Discussions => Everything Else in the Data Center => Topic started by: deanwebb on August 25, 2015, 12:39:55 PM

Title: HA WTF?
Post by: deanwebb on August 25, 2015, 12:39:55 PM
Got a high-availability device cluster... the secondary has had a slightly flaky history, but it's always been, "wait a few minutes, and it'll come back," so nothing was ever done about it.

Friday, did an SP upgrade on it. Primary took the upgrade just fine, the secondary... well, it lost HA and reports as "disconnected". Can't SSH to it.

Today, vendor recommended rebooting it. Shouldn't affect the primary at all. That's what HA is for, right?

:challenge-accepted:

So I fire up the Raritan and cycle power on the secondary.

Both it and the primary go down and then come back up with a Linux OS "soft lockup" error.

:rage: :kramer: :frustration: :facepalm4:

Vendor says the error message indicates the possibility of bad hardware. They recommend rebooting it.

:facepalm2:

Of course, I'm in a very large company, so that's gonna take some time to accomplish.

Lesson learned: If part of an HA pair shows flaky behavior, don't "wait a few minutes." RMA the fartknocker ASAP.

:coolstory:
Title: Re: HA WTF?
Post by: Reggle on August 29, 2015, 02:14:06 AM
That's what HA is for: so you can replace the one with issues.
Title: Re: HA WTF?
Post by: deanwebb on August 29, 2015, 10:17:50 AM
Sure is... but, dang, we're a little gun-shy with this pair. If taking the HA unit out brings down the primary, that's not HA... that's HF... High Failure!
Title: Re: HA WTF?
Post by: mmcgurty on August 31, 2015, 08:01:29 AM
I have four Cisco WLCs, and the primary server just up and died on Friday afternoon at 3:05 PM EST. I was very surprised it actually failed over properly to the others and load-balanced the APs connected to them. A replacement is supposed to ship today.