The Problems of A Company of a Certain Size...

Started by deanwebb, February 27, 2015, 03:43:54 PM


wintermute000

Or, nobody has the guts to properly analyse and design for the access requirements and the incredibly complex existing environment, so the top-of-the-line solution goes in with more holes than Swiss cheese.

I've seen an ASA5585-X cluster (10G interfaces, the works) go in with permit any any for > 18 months.
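
For anyone who wants to check their own boxes for this, here is a minimal sketch that scans a saved config for wide-open rules. It assumes the config is plain text using ASA-style `access-list <name> [extended] permit ip any any` lines; the function name `find_permit_any_any` and the sample ACL name `OUTSIDE_IN` are made up for illustration, and variants such as `any4`/`any6` or object-group rules are deliberately not covered.

```python
import re

# Matches ASA-style lines such as:
#   access-list OUTSIDE_IN extended permit ip any any
# Assumption: one rule per line, keywords separated by whitespace.
PERMIT_ANY_ANY = re.compile(
    r"^access-list\s+\S+\s+(?:extended\s+)?permit\s+ip\s+any\s+any\b",
    re.IGNORECASE,
)


def find_permit_any_any(config_text):
    """Return (line_number, line) pairs for every wide-open permit rule."""
    hits = []
    for number, line in enumerate(config_text.splitlines(), start=1):
        stripped = line.strip()
        if PERMIT_ANY_ANY.match(stripped):
            hits.append((number, stripped))
    return hits


if __name__ == "__main__":
    # Hypothetical sample config, not taken from any real device.
    sample = """\
access-list OUTSIDE_IN extended permit tcp any host 10.0.0.5 eq 443
access-list OUTSIDE_IN extended permit ip any any
"""
    for number, line in find_permit_any_any(sample):
        print(f"line {number}: {line}")
```

A hit does not prove the rule is actually applied to an interface; it only tells you where to start asking questions.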

deanwebb

Quote from: wintermute000 on June 23, 2015, 07:27:35 PM
Or, nobody has the guts to properly analyse and design for the access requirements and the incredibly complex existing environment, so the top-of-the-line solution goes in with more holes than Swiss cheese.

I've seen an ASA5585-X cluster (10G interfaces, the works) go in with permit any any for > 18 months.

Or, worse, it's such a boondoggle to make any changes that they sit in the box in the storage room for a year... while the company pays maintenance and support on them.
Take a baseball bat and trash all the routers, shout out "IT'S A NETWORK PROBLEM NOW, SUCKERS!" and then peel out of the parking lot in your Ferrari.
"The world could perish if people only worked on things that were easy to handle." -- Vladimir Savchenko
Вопросы есть? Вопросов нет! | BCEB: Belkin Certified Expert Baffler | "Plan B is Plan A with an element of panic." -- John Clarke
Accounting is architecture, remember that!
Air gaps are high-latency Internet connections.

routerdork

Quote from: deanwebb on June 23, 2015, 07:43:12 PM
Or, worse, it's such a boondoggle to make any changes that they sit in the box in the storage room for a year... while the company pays maintenance and support on them.
Sounds awfully familiar  :whistle:
"The thing about quotes on the internet is that you cannot confirm their validity." -Abraham Lincoln

icecream-guy

Quote from: routerdork on June 24, 2015, 08:42:34 AM
Quote from: deanwebb on June 23, 2015, 07:43:12 PM
Or, worse, it's such a boondoggle to make any changes that they sit in the box in the storage room for a year... while the company pays maintenance and support on them.
Sounds awfully familiar  :whistle:

We've got a couple of CSS 11500s in the warehouse in unopened boxes. New old stock,
so we're cool when one of the production boxes dies.
:professorcat:

My Moral Fibers have been cut.

deanwebb

Oh noes! That upgrade did not go well! I need to do a power cycle on a device, and it's not responding to SSH or remote KVM tools!

Small company: Jog around the corner to the server closet, right by the bathroom and the kitchen, maneuver around or over the box fans providing circulation, get to the back of the racks, and flip the power yourself.

Medium company: Let the rest of your team know that you'll be in the data center, walk down the hall, let your manager know as you pass him by, swipe your cardkey to get DC access, let the backup operator know what's up, reboot the gear yourself.

Large company: Contact your manager and your manager's manager to inform them of the outage. Call the data center oncall phone to report the incident. Fill out a Remedy ticket to initiate the power cycling procedure. Call the data center oncall phone again and tell the guy on the other end what the incident number is. Open a TAC case with the vendor. Wait. Wait some more. Let your manager and your manager's manager know that you're waiting. Take a call from your vendor TAM and give him an update. Wait even more. Take a call from the data center guy: he can't find the box. Drive across town to the data center and have the DC guy meet you there, sign you in, take your picture, escort you to where the device is and let you do the power cycle.

SofaKing

Quote from: deanwebb on August 25, 2015, 12:31:14 PM


Large company: Contact your manager and your manager's manager to inform them of the outage. Call the data center oncall phone to report the incident. Fill out a Remedy ticket to initiate the power cycling procedure. Call the data center oncall phone again and tell the guy on the other end what the incident number is. Open a TAC case with the vendor. Wait. Wait some more. Let your manager and your manager's manager know that you're waiting. Take a call from your vendor TAM and give him an update. Wait even more. Take a call from the data center guy:

He has found the device and is ready to reboot.  You give him the go and he reboots the wrong device :developers:
Networking -  You can talk about us but you can't talk without us!

deanwebb

More large company fun: The call goes in at 11:55am.

No action by 1:15 pm. Call the DC guy... "We can get a guy there by 4pm..."

Explain what a "complete outage of service" means.

"OK, we can have someone there by 1:30. We have a big meeting that everyone's committed to, but I'll see who I can spare."

Send an email and CC my manager and my manager's manager...

Nerm

All that red tape to flip a f*cking power switch.
:haha1:

deanwebb

Quote from: Nerm on August 25, 2015, 01:51:24 PM
All that red tape to flip a f*cking power switch.
:haha1:
Once I escalated the issue, the switch got flipped. Boxes came back up all nice and happy. :awesome:

Otanx

Root cause analysis

Small company - Eh, it is working again.

Medium company - Google it, find a few posts where other people may have had the same problem, but there are enough differences in their reported configurations that you are not sure. Can't find anything else, so call that the reason. Let your boss know.

Large company - Get vendor involved. Lab up configuration, and try to duplicate issue. Read all bug reports you can find. Vendor can't duplicate problem, you can't duplicate problem. After two months root cause is declared unknown. Tickets are closed. A week later the problem shows up again.

Paranoid company of any size - We must have hackers. That is why it rebooted. Not just any hackers: we must be getting compromised by a nation state using APT. We need more security tools with malware sandboxing, and cloud-based analytics systems to detect and prevent this in the future. You sit in the corner and play cyber security buzzword bingo. They get $100K to buy a new tool, and you sneak a new switch onto the PR to replace the one that rebooted because its power supply is failing.

-Otanx

icecream-guy


EMPLOYMENT After screwups

Micro:  you fire yourself and your company goes out of business

Small:  The guy doing the "computers" gets fired, and someone else in the group "assumes the responsibility"

Medium: The network manager keeps his job, the grunt doing all the heavy lifting gets fired

Large: The entire team gets fired and is replaced with outsourced contractors   


LoL, I tried....

deanwebb

Quote from: ristau5741 on August 26, 2015, 11:32:50 AM

EMPLOYMENT After screwups

Micro:  you fire yourself and your company goes out of business

Small:  The guy doing the "computers" gets fired, and someone else in the group "assumes the responsibility"

Medium: The network manager keeps his job, the grunt doing all the heavy lifting gets fired

Large: The entire team gets fired and is replaced with outsourced contractors   


LoL, I tried....


Wait, what, are you on the market now?

LynK

Quote from: ristau5741 on June 24, 2015, 11:20:39 AM
Quote from: routerdork on June 24, 2015, 08:42:34 AM
Quote from: deanwebb on June 23, 2015, 07:43:12 PM
Or, worse, it's such a boondoggle to make any changes that they sit in the box in the storage room for a year... while the company pays maintenance and support on them.
Sounds awfully familiar  :whistle:

We've got a couple of CSS 11500s in the warehouse in unopened boxes. New old stock,
so we're cool when one of the production boxes dies.

ristau, quick FYI: there is an issue with the CSSs if they go past a certain number of days of uptime. The box just stops working and needs to be rebooted entirely. The magic number is 828 days.
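
If you want to plan a reboot before hitting that number, the arithmetic can be sketched as below. This takes the 828-day figure from this post at face value (it is not checked against any vendor notice), and the 30-day safety margin plus the function names `reboot_deadline` and `days_left` are assumptions made up for the example.

```python
from datetime import datetime, timedelta

UPTIME_LIMIT_DAYS = 828   # figure quoted in this thread, not verified against vendor docs
SAFETY_MARGIN_DAYS = 30   # hypothetical buffer so the reboot lands in a maintenance window


def reboot_deadline(booted_at, limit_days=UPTIME_LIMIT_DAYS, margin_days=SAFETY_MARGIN_DAYS):
    """Latest date to schedule a planned reboot: boot time + limit - margin."""
    return booted_at + timedelta(days=limit_days - margin_days)


def days_left(booted_at, now, limit_days=UPTIME_LIMIT_DAYS, margin_days=SAFETY_MARGIN_DAYS):
    """Whole days remaining until the planned-reboot deadline (negative if overdue)."""
    return (reboot_deadline(booted_at, limit_days, margin_days) - now).days


if __name__ == "__main__":
    # Hypothetical boot time; in practice derive it from the device's reported uptime.
    booted = datetime(2015, 1, 1)
    print("Reboot by:", reboot_deadline(booted).date())
    print("Days left:", days_left(booted, datetime.now()))
```

Feed it the boot time for each box and sort by `days_left` to see which one needs a window first.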
Sys Admin: "You have a stuck route"
            Me: "You have an incorrect Default Gateway"

icecream-guy

Quote from: deanwebb on August 26, 2015, 11:47:25 AM
Quote from: ristau5741 on August 26, 2015, 11:32:50 AM

EMPLOYMENT After screwups

Micro:  you fire yourself and your company goes out of business

Small:  The guy doing the "computers" gets fired, and someone else in the group "assumes the responsibility"

Medium: The network manager keeps his job, the grunt doing all the heavy lifting gets fired

Large: The entire team gets fired and is replaced with outsourced contractors   


LoL, I tried....


Wait, what, are you on the market now?


I'm fine, just LOL at my meager attempt to be on topic.

icecream-guy

Quote from: LynK on August 26, 2015, 02:05:26 PM
Quote from: ristau5741 on June 24, 2015, 11:20:39 AM
Quote from: routerdork on June 24, 2015, 08:42:34 AM
Quote from: deanwebb on June 23, 2015, 07:43:12 PM
Or, worse, it's such a boondoggle to make any changes that they sit in the box in the storage room for a year... while the company pays maintenance and support on them.
Sounds awfully familiar  :whistle:

We've got a couple of CSS 11500s in the warehouse in unopened boxes. New old stock,
so we're cool when one of the production boxes dies.

ristau, quick FYI: there is an issue with the CSSs if they go past a certain number of days of uptime. The box just stops working and needs to be rebooted entirely. The magic number is 828 days.

Thanks for the heads up, we got it off the network a few weeks ago.