VSS blew up at my old job

Started by dlots, March 14, 2017, 08:24:52 AM


dlots

Before I left, I advised my old job (the one I left because they wouldn't listen to me) not to do VSS; they didn't need the extra-fast failover for anything in particular, or the extra bandwidth that comes with vPC versus spanning-tree. The senior engineers said I was dumb, that it was super awesome, and that I didn't know what I was talking about.

Well, they ran into a VSS bug the other week, lost both of their data centers for a few hours, and had to reboot everything.

Would it be wrong to call them and tell them "Told you so"?

LynK

but it is easy. we like easy.


everyone needs the extra fast failover!
Sys Admin: "You have a stuck route"
            Me: "You have an incorrect Default Gateway"

deanwebb

Take a baseball bat and trash all the routers, shout out "IT'S A NETWORK PROBLEM NOW, SUCKERS!" and then peel out of the parking lot in your Ferrari.
"The world could perish if people only worked on things that were easy to handle." -- Vladimir Savchenko
Вопросы есть? Вопросов нет! | BCEB: Belkin Certified Expert Baffler | "Plan B is Plan A with an element of panic." -- John Clarke
Accounting is architecture, remember that!
Air gaps are high-latency Internet connections.

Otanx

We did VSS in our first DC because "easy". We haven't hit any bugs, but patching VSS is more involved, and with only one control plane a single configuration error can kill everything. We're in the process of building two new DCs and have decided not to use VSS, and I have a side project to figure out how to break the VSS in DC1 and go back to independent cores. It shouldn't be too hard; we were smart enough to configure HSRP on the majority of the layer 3 interfaces.
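
For what it's worth, the HSRP on those interfaces is nothing fancy. A rough sketch of what each core carries (VLAN, addresses, and group numbers are placeholders, and the other core runs the mirror config with its own real IP and a lower priority):

! layer 3 SVI with an HSRP virtual gateway
interface Vlan10
 ip address 10.1.10.2 255.255.255.0
 standby 10 ip 10.1.10.1
 standby 10 priority 110
 standby 10 preempt

The point is that once we split back to standalone cores, the virtual gateway addresses don't have to change.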

-Otanx

mlan

@dlots - I have seen this once myself, but the team did not have any split-brain detection configured.  Any idea what type of dual-active detection they were running for this incident?  I have been suggesting "fast-hello" for the VSS deployments in my environment.

Also, do you happen to know the bug ID?
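
For reference, fast-hello only needs a dedicated point-to-point link between the two chassis plus one knob under the virtual switch domain. A rough sketch (domain ID and interface are placeholders):

switch virtual domain 100
 dual-active detection fast-hello
!
! dedicated link to the peer chassis; configuring fast-hello removes any other config from the port
interface GigabitEthernet1/5/48
 dual-active fast-hello
 no shutdown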

dlots

Sorry, no to both.
I wasn't very popular with the network team there and don't talk to them much at all.

wintermute000

Shared blast radius. I'm not a fan in a DC scenario. For campus, where you can usually bring it down on a Sunday, it's a different story.

icecream-guy

:professorcat:

My Moral Fibers have been cut.

SimonV

So what's the opinion on stacking vs standalone core switches then?

I'm not a DC guy, doing mostly office and warehouse networks and I almost always install two separate L3 switches with RSTP and fast HSRP timers. Two-link etherchannels to both cores and load-balancing the VLANs by modifying STP cost.

I don't mind stacking in the access layer but I always get the feeling that it's too much of a risk in the core.
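
For the curious, the tuning boils down to two things per VLAN. A rough sketch with made-up VLANs, channel numbers, and timers:

! access switch - prefer the uplink to core 1 for VLAN 10 and the uplink to core 2 for VLAN 20
interface Port-channel1
 spanning-tree vlan 20 cost 200
!
interface Port-channel2
 spanning-tree vlan 10 cost 200

! cores - sub-second HSRP timers on each SVI (hello 250 ms, hold 750 ms)
interface Vlan10
 standby 10 timers msec 250 msec 750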

icecream-guy

Quote from: SimonV on March 15, 2017, 09:29:39 AM
So what's the opinion on stacking vs standalone core switches then?

I'm not a DC guy, doing mostly office and warehouse networks and I almost always install two separate L3 switches with RSTP and fast HSRP timers. Two-link etherchannels to both cores and load-balancing the VLANs by modifying STP cost.

I don't mind stacking in the access layer but I always get the feeling that it's too much of a risk in the core.

So who are you asking? The network engineer who has to support it, or the Cisco engineer/sales guy?

From a network engineer's point of view it's a PITA to support: upgrades, failed modules, failover, and the chassis have to have identical configs, modules, and such.

If you are in a DC, use Nexus and build vPCs. In a campus environment, VSS makes a nice core with all your access switches dual-homed to it via MEC.
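
For anyone who hasn't built MEC: from the access switch it's just an ordinary LACP port-channel; on the VSS side the bundle simply has one member link on each chassis. A rough sketch (interface and channel numbers are placeholders):

! VSS core - one member link per chassis (the first digit is the chassis number)
interface range TenGigabitEthernet1/1/1 , TenGigabitEthernet2/1/1
 switchport
 switchport trunk encapsulation dot1q
 switchport mode trunk
 channel-group 10 mode active
!
interface Port-channel10
 description MEC down to an access switch

The access switch just sees a two-port EtherChannel, same as it would to a single chassis.
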
:professorcat:

My Moral Fibers have been cut.

mlan

Quote from: ristau5741 on March 15, 2017, 10:43:44 AM
In a campus environment, VSS makes a nice core with all your access switches dual-homed to it via MEC.

This is how I'm feeling about the topic.

Otanx

I am not a fan of stacking anywhere. I hit too many issues with stacking when it was new. I understand the idea of managing one config vs multiple, but once you automate all of that you don't care how many devices are being managed.

-Otanx

wintermute000

Maintenance. It's cheaper than separate devices.

mlan

Quote from: wintermute000 on March 15, 2017, 05:54:22 PM
Maintenance. Cheaper than separate devices

Also, stack modules/cables are sometimes less expensive than the separate optics and patch cables you would need to aggregate standalone access switches. That might not be true for every vendor, though. Just playing devil's advocate here; I enjoy these types of pros/cons discussions, and I usually learn something I didn't know or hadn't thought of. ;)

wintermute000

I trust stacking. I somewhat don't trust VSS.

Campus yes, distribution yes; DC or a massive/super-critical campus core, no.