upstream monitor

Started by icecream-guy, July 13, 2018, 11:28:59 AM


icecream-guy

Our upstream provider was dropping a considerable number of packets. Little was seen by our monitoring systems, only a slight decrease in interface bandwidth, which was not caught in a timely fashion. People looking at smokeping would have helped, but I'm wondering if there are other monitoring tools for a situation like this.
:professorcat:

My Moral Fibers have been cut.

deanwebb

Hire a big guy named Tony to show up at the upstream provider and have him sit in the IT area.

Big Tony announces to the team: "If I get a call dat da upstream routah is droppin' packets, I'm bustin' all youse guys' heads. Kapeesh?"
Take a baseball bat and trash all the routers, shout out "IT'S A NETWORK PROBLEM NOW, SUCKERS!" and then peel out of the parking lot in your Ferrari.
"The world could perish if people only worked on things that were easy to handle." -- Vladimir Savchenko
Any questions? No questions! | BCEB: Belkin Certified Expert Baffler | "Plan B is Plan A with an element of panic." -- John Clarke
Accounting is architecture, remember that!
Air gaps are high-latency Internet connections.

Dieselboy

I ping and graph our website, Google's DNS and remote sites' IPs, so I get alerts when there's too much loss out to the internet, for one. If just one check is failing then I can assume it's just that one service with the issue, but if multiple are failing then it could be internet related. I do this really just to get some more insight into what's going on. Sometimes (actually, often) we're impacted in some way by a SEA-ME-WE cable cut.
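
To make the "one check failing vs. several" rule concrete, here is a minimal Python sketch of the decision. It is only an illustration: the function name, the 5% threshold and the sample numbers are all made up, and the loss percentages would come from whatever does the pinging (Cacti, smokeping, a script).

```python
def classify_loss(loss_by_target, threshold_pct=5.0):
    """Classify packet loss seen across several ping targets.

    loss_by_target maps a target name to its measured loss in percent.
    threshold_pct is an assumed alert threshold, not a recommended value.
    Returns (verdict, failing_targets).
    """
    failing = sorted(t for t, loss in loss_by_target.items() if loss > threshold_pct)
    if not failing:
        return "ok", failing
    if len(failing) == 1:
        # Only one check is bad: probably that one service or path.
        return "single-target", failing
    # Several unrelated targets are bad at once: suspect the internet link.
    return "likely-upstream", failing


# Hypothetical sample readings, for illustration only.
samples = {"our-website": 0.0, "8.8.8.8": 12.5, "remote-site-1": 20.0}
verdict, failing = classify_loss(samples)
print(verdict, failing)
```

With the sample readings above, two of the three targets exceed the threshold, so the verdict points at the upstream link rather than at a single service.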

If you have a VPN going over the same internet line then you could detect issues with ICMP checks that way also; it just depends how severe the loss is. Excuse me for being basic here. If the packet loss is tiny, then it might not be noticed though. You could use a check to access a web page and report back a status. For example, from one site, access a web page in the other site, such as a management interface of a web app you have. This would involve a few more packets, so the loss might be picked up more easily this way?
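
A rough sketch of that kind of web page check in Python, using only the standard library. The function name and the URL in the usage comment are placeholders, and treating any 2xx/3xx status as "ok" is my assumption; a real check would feed the result into whatever does your alerting.

```python
import http.client
import time
import urllib.error
import urllib.request


def check_page(url, timeout=5.0):
    """Fetch a page and report status and elapsed time.

    Returns a dict with keys: ok, status, seconds, error.
    """
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()  # pull the whole body so more packets cross the link
            status = resp.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (4xx/5xx).
        status = exc.code
    except (urllib.error.URLError, http.client.HTTPException, OSError) as exc:
        # No usable answer at all: DNS failure, refused connection, timeout...
        return {"ok": False, "status": None,
                "seconds": time.monotonic() - start, "error": str(exc)}
    return {"ok": 200 <= status < 400, "status": status,
            "seconds": time.monotonic() - start, "error": None}


# Usage (hypothetical URL for a management page at the other site):
# print(check_page("https://webapp.remote-site.example/login"))
```

Watching the `seconds` value over time is useful as well, since light packet loss tends to show up as slow page loads (retransmits) before it shows up as outright failures.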

Performance routing can be used but it's a licensed feature. It's a very nice feature though.

Also, IP SLAs on routers, with SNMP or syslog logging to alert when tracks go down. The device at the remote site can do some basic monitoring and reporting itself.

What sort of monitoring do you have at the moment that didn't detect this? In my experience, sometimes multiple perception points are very helpful.

If I think there's an issue, it's really manual tasks for me at that point. I use PingPlotter to a few different places and watch the results. I'll RDP to a remote machine and run the same there to get another perspective also.

On the other hand, it's difficult to detect problems outside of your own network. Once I have the PingPlotter results I fire them off to the ISP, tell them there's a problem and ask them to fix it.

I use Cacti to do the pings, the graphs and the alerts. It's easy to set up for that, for lack of a better tool.

icecream-guy

We are using smokeping, and that should have raised the flags, but apparently the NOC doesn't have access to the tool. We do ping some remote sites and have a lot of coverage for when interfaces or devices go down. Maybe IP SLA is what is needed; I see that it has loads of options.

Dieselboy

Quote from: ristau5741 on July 16, 2018, 06:40:08 AM
using smokeping, that should have raised the flags, but apparently NOC doesn't have access to the tool.

TBH I haven't used smokeping before, but it looks like it might be like PingPlotter (might even be a bit better). I think that the problem here is not the tool, per se; it's the experience or education to understand what needs to be done to investigate further. They could have gone old skool and used multiple cmd.exe windows to run simultaneous traceroutes and pings to various places to work out what was wrong.
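
The "several windows at once" approach can also be scripted. Below is a rough Python sketch that pings a list of hosts in parallel and pulls the loss figure out of each summary line. It assumes a Linux or macOS style `ping` (the `-c` count flag and a "% packet loss" summary); Windows `ping` uses `-n` and words its summary differently, so it would need adjusting there. The function names and the hosts in the usage comment are just examples.

```python
import re
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Matches the summary line of Linux/macOS ping, e.g. "25% packet loss".
LOSS_RE = re.compile(r"(\d+(?:\.\d+)?)% packet loss")


def parse_loss(ping_output):
    """Return the loss percentage from ping output, or None if not found."""
    match = LOSS_RE.search(ping_output)
    return float(match.group(1)) if match else None


def ping_once(host, count=10):
    """Run one ping command against host and return (host, loss_percent)."""
    proc = subprocess.run(["ping", "-c", str(count), host],
                          capture_output=True, text=True)
    return host, parse_loss(proc.stdout)


def ping_all(hosts, count=10):
    """Ping every host at the same time, one worker thread per host."""
    with ThreadPoolExecutor(max_workers=max(1, len(hosts))) as pool:
        return dict(pool.map(lambda h: ping_once(h, count), hosts))


# Usage (example targets; swap in your own gateway, ISP hop, remote site):
# print(ping_all(["8.8.8.8", "remote-site.example"]))
```

Running the hosts side by side over the same time window is the point: if loss shows up towards every target at once, the shared upstream path is the likely culprit.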

Since it's the NOC, I am assuming you have some first-line engineers that might be new to the IT world. Why not make a cheat sheet with some steps to follow, some basic explanations of what each step does, when to use it and what to look out for when they are performing it?

mmcgurty

I just deployed smokeping on a Raspberry Pi 3B+ in my home network to measure latency to a few different hosts on the Internet for my own knowledge.  I wasn't having any issues or anything but I was just curious what the results would look like and needed something for my growing Raspberry Pi collection to do.

icecream-guy

Quote from: mmcgurty on July 20, 2018, 08:09:08 AM
I just deployed smokeping on a Raspberry Pi 3B+ in my home network to measure latency to a few different hosts on the Internet for my own knowledge.  I wasn't having any issues or anything but I was just curious what the results would look like and needed something for my growing Raspberry Pi collection to do.

I've got plans for my Pi. Just need to figure out how to connect and read temperature probes... for starters.