Networking-Forums.com

Professional Discussions => Management Tools => Topic started by: icecream-guy on December 30, 2020, 04:45:48 PM

Title: Monitoring nightmare
Post by: icecream-guy on December 30, 2020, 04:45:48 PM
So, I got paged for a critical issue, widespread panic / major incident, because a single PSN node went down. It's not like we have 14 total PSN nodes across 2 disparate datacenters (we do). Boy, did I send some hate mail for that one. I could see it if all the nodes in one of the DCs went down, but an emergency call-out alert for a single node down in one DC is ridiculous. I think ITIL 4 says something about silos and collaboration/communication. I suggested a meeting to discuss it next week; we'll see if that ever happens.
Title: Re: Monitoring nightmare
Post by: deanwebb on December 31, 2020, 08:37:45 AM
Ugh. It shouldn't be SEV ONE if fault tolerance kicks in and we're all still running as per.
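For anyone who wants to bring something concrete to that meeting, here is a rough sketch of the rule described in this thread: a single node down is a warning at most, and only the loss of every node in a datacenter pages as critical. The function name, the severity labels, and the 7/7 split of the 14 nodes are assumptions for illustration, not how any particular monitoring tool does it.

```python
from collections import defaultdict


def classify_outage(nodes):
    """Return a severity label for a list of (datacenter, is_up) tuples.

    Hypothetical rule: "critical" only when every node in at least one
    datacenter is down, "warning" when some nodes are down but each
    datacenter still has a node up, "ok" otherwise.
    """
    total = defaultdict(int)
    down = defaultdict(int)
    for dc, is_up in nodes:
        total[dc] += 1
        if not is_up:
            down[dc] += 1

    if not any(down.values()):
        return "ok"
    # Page someone only when a whole datacenter has lost all of its nodes.
    if any(down[dc] == total[dc] for dc in total):
        return "critical"
    return "warning"


# Assumed layout: 14 nodes split evenly across two datacenters.
healthy = [("dc1", True)] * 7 + [("dc2", True)] * 7
one_down = [("dc1", False)] + [("dc1", True)] * 6 + [("dc2", True)] * 7
dc1_dark = [("dc1", False)] * 7 + [("dc2", True)] * 7

print(classify_outage(healthy))
print(classify_outage(one_down))
print(classify_outage(dc1_dark))
```

A real policy would probably also want a threshold in between, such as paging when a datacenter drops below some minimum number of healthy nodes, but that is a call for the teams involved.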