Vendor HA Recommendations vs. Reality

Started by deanwebb, April 30, 2018, 09:31:17 AM

deanwebb

"We only support a direct HA connection. If you put a switch between the devices for the HA connection, we don't support that."

^ How many times have you heard that from a vendor? And how many vendors will support a single switch in the path, but only if the devices are still physically close together (i.e., no WAN or MAN link) for the HA connection to be considered supported?

I can understand not supporting it if there's another vendor's switch in the path, since that switch could have an issue that breaks HA without it being directly related to the vendor's gear.

But at the same time, I know of more than one organization that requires HA devices to be in separate buildings of the datacenter, just in case one gets taken out by a tornado or fire or flood. The other building, about a mile away, is still standing with all the HA stuff in it. These firms would rather not see both halves of an HA pair taken out in an outage and then have to use the DR unit in the datacenter on another continent.

Or is this simply a misunderstanding of what HA really is? HA means high availability: should the hardware fail, there's another unit right there to keep doing what the first one was doing. It's only meant to cover hardware failing due to a defect, not an external event of the kind that constitutes a disaster. Trying to turn HA into a localized DR step just before falling back to remote DR may not be the right way to go.

Or is it? If your DC buildings are all on 10Gb or faster links, why not stretch that HA? If the vendor hollers, take it under advisement and do it anyway. You get HA and localized DR, all for the same price, right? So long as the network is fast enough to handle replication, the latency added by a mile of wire isn't going to be all that noticeable, even to the systems doing the HA work. If they're built for a 100Mb direct-connect line and instead get a 10Gb line stretched over 2 km, they're well within speed parameters.
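To put rough numbers on the latency claim: light in single-mode fiber covers about 1 km every 5 µs. Here's a minimal sketch, assuming a fiber refractive index of 1.47 and a 1500-byte frame, and ignoring switch hops and queuing. The helper names are hypothetical, not from any vendor tool:

```python
# Rough per-frame latency: propagation delay plus serialization delay.
# Assumptions: refractive index 1.47, no switches, no queuing.
C_VACUUM_KM_PER_S = 299_792.458  # speed of light in vacuum, km/s
FIBER_INDEX = 1.47               # assumed typical single-mode fiber index


def one_way_delay_us(distance_km: float, index: float = FIBER_INDEX) -> float:
    """Propagation delay only, in microseconds, for a fiber run of distance_km."""
    speed_km_per_s = C_VACUUM_KM_PER_S / index
    return distance_km / speed_km_per_s * 1_000_000


def serialization_delay_us(frame_bytes: int, link_bps: float) -> float:
    """Time to clock one frame onto the wire, in microseconds."""
    return frame_bytes * 8 / link_bps * 1_000_000


if __name__ == "__main__":
    # 3 m patch cable at 100 Mb/s vs. 2 km of fiber at 10 Gb/s
    direct = one_way_delay_us(0.003) + serialization_delay_us(1500, 100e6)
    stretched = one_way_delay_us(2.0) + serialization_delay_us(1500, 10e9)
    print(f"100Mb direct, 3 m:    {direct:.1f} us")
    print(f"10Gb stretched, 2 km: {stretched:.1f} us")
```

Under those assumptions, a full-size frame on the 100Mb direct link spends about 120 µs just being clocked onto the wire. The same frame on the 2 km, 10Gb link arrives in roughly 11 µs, so the faster stretched link can come out ahead despite the extra distance. Any switch in the path adds its own forwarding delay on top.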

So... do you get vendors saying no HA over distances greater than from one rack to the next? And if you do get those messages, do you follow them or just go with the localized DR, as planned?
Take a baseball bat and trash all the routers, shout out "IT'S A NETWORK PROBLEM NOW, SUCKERS!" and then peel out of the parking lot in your Ferrari.
"The world could perish if people only worked on things that were easy to handle." -- Vladimir Savchenko
Any questions? No questions! | BCEB: Belkin Certified Expert Baffler | "Plan B is Plan A with an element of panic." -- John Clarke
Accounting is architecture, remember that!
Air gaps are high-latency Internet connections.

Dieselboy

It seems like a misunderstanding... This is where I start asking questions and annoying people (as a consequence of all the asking and probing). But how else can you say, "What do you mean you only support a direct HA connection? Can you tell me why?" One could take it as implying that they aren't using Ethernet to communicate between the two devices, i.e., a switch wouldn't work, or that the keepalive timeout value is so low that they can't tolerate any latency. Either way, the vendor should be able to give the parameters needed to build a design. One possible problem I can foresee: if there were an outage and it then came to light that the equipment was set up in an unsupported way, the vendor gets its 'get out clause'...
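The kind of parameters Dieselboy is asking for could be turned into a simple go/no-go check. This is a hypothetical model, not any vendor's published rule. It assumes the vendor states a heartbeat interval, a missed-heartbeat threshold, and a maximum one-way delay. It treats the worst-case gap between arriving heartbeats as the interval plus jitter, and applies an arbitrary safety factor of 2:

```python
from dataclasses import dataclass


@dataclass
class HeartbeatSpec:
    """Hypothetical vendor-published HA heartbeat parameters."""
    interval_ms: float      # how often each unit sends a heartbeat
    missed_threshold: int   # consecutive misses before the peer is declared dead
    max_one_way_ms: float   # hypothetical vendor ceiling on link delay

    @property
    def dead_interval_ms(self) -> float:
        # Time without a heartbeat before the peer is declared down
        return self.interval_ms * self.missed_threshold


def link_is_within_spec(spec: HeartbeatSpec, one_way_ms: float,
                        jitter_ms: float, safety_factor: float = 2.0) -> bool:
    """True if the measured link leaves headroom before a false failover.

    The worst-case gap between two heartbeats arriving at the peer is modeled
    as the send interval plus jitter. That gap, multiplied by the (arbitrary)
    safety factor, must fit inside the dead interval, and the one-way delay
    must stay under the vendor's stated ceiling.
    """
    worst_gap_ms = spec.interval_ms + jitter_ms
    return (one_way_ms <= spec.max_one_way_ms
            and worst_gap_ms * safety_factor <= spec.dead_interval_ms)


if __name__ == "__main__":
    spec = HeartbeatSpec(interval_ms=200, missed_threshold=3, max_one_way_ms=10)
    print(link_is_within_spec(spec, one_way_ms=0.01, jitter_ms=0.1))
```

With a 200 ms interval and a 3-miss threshold (a 600 ms dead interval), a 2 km link with 0.1 ms of jitter passes easily. A design that declares the peer dead after a single missed 100 ms heartbeat fails the check no matter how short the link is, because one late heartbeat would trigger a failover.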

There are a few setups at my place that are running fine even though the vendor said they were not supported or not possible, although none of them involve HA. In those cases I try to test them and see whether they perform as expected.

deanwebb

I think it's more of the "get out clause" than anything else.

"You got a switch between those devices and HA failed? Well, there's your problem, that's an unsupported config! Next!"

So then you recable things to get them to fail without a switch in between and then...

"Huh... still doesn't work? Well, which one of those boxes do you want to RMA?"

DesertFox

Hm, this is probably vendor dependent. For one of our customers we have two firewalls in an HA pair and two Cisco routers running HSRP (I think; I never checked the actual config), with L2 connectivity between them. They are located in two different DCs, 650 km apart as the crow flies, and run without issues (almost). I've never heard that this is an unsupported config.
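A 650 km setup working is less surprising once you compare propagation delay with first-hop redundancy timers. Cisco HSRP defaults to a 3 s hello and a 10 s hold time. Here's a minimal sketch, assuming a fiber refractive index of 1.47 and a hypothetical route factor of 1.5, since real fiber paths run longer than the aerial distance:

```python
# Compare the round-trip propagation delay of a long inter-DC fiber path
# with the default HSRP hold timer.
C_FIBER_KM_PER_MS = 299_792.458 / 1.47 / 1000  # assumed: ~204 km of fiber per ms

HSRP_DEFAULT_HOLD_S = 10.0  # Cisco HSRP default hold timer (hello defaults to 3 s)


def fiber_rtt_ms(aerial_km: float, route_factor: float = 1.5) -> float:
    """Round-trip propagation delay for a fiber path longer than the aerial distance.

    route_factor is a hypothetical ratio of real fiber route length to aerial distance.
    """
    route_km = aerial_km * route_factor
    return 2 * route_km / C_FIBER_KM_PER_MS


def hold_time_fraction(rtt_ms: float, hold_s: float = HSRP_DEFAULT_HOLD_S) -> float:
    """Fraction of the hold timer consumed by one round trip."""
    return rtt_ms / (hold_s * 1000)


if __name__ == "__main__":
    rtt = fiber_rtt_ms(650)
    print(f"RTT ~{rtt:.1f} ms, {hold_time_fraction(rtt):.2%} of the default HSRP hold time")
```

Under those assumptions, the round trip comes to roughly 9.6 ms, about 0.1% of the default hold time, so the timers themselves leave plenty of headroom. This only checks timer margin. It doesn't model what happens if the inter-DC link itself fails.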

icecream-guy

Quote from: deanwebb on May 25, 2018, 09:14:14 AM
I think it's more of the "get out clause" than anything else.

"You got a switch between those devices and HA failed? Well, there's your problem, that's an unsupported config! Next!"

So then you recable things to get them to fail without a switch in between and then...

<you will need to upgrade to the latest (bleeding edge) firmware and software>

"Huh... still doesn't work? Well, which one of those boxes do you want to RMA?"

There, fixed that for you.

My Moral Fibers have been cut.

deanwebb

Quote from: ristau5741 on May 27, 2018, 06:26:24 AM
Quote from: deanwebb on May 25, 2018, 09:14:14 AM
I think it's more of the "get out clause" than anything else.

"You got a switch between those devices and HA failed? Well, there's your problem, that's an unsupported config! Next!"

So then you recable things to get them to fail without a switch in between and then...

<you will need to upgrade to the latest (bleeding edge) firmware and software>

"Huh... still doesn't work? Well, which one of those boxes do you want to RMA?"

There, fixed that for you.

Good call there, I left out that step.