IOS SLA monitoring

Started by Dieselboy, June 29, 2016, 11:25:59 PM


Dieselboy

At silly-o'clock in the morning I see that my remote device loses its OSPF neighbor relationships across the VPN tunnels. It looks like the internet may have gone down at the remote site: from the main site, the Nagios and Cacti monitoring shows the site as unreachable, and it also shows the upstream ISP router (our gateway) as unreachable. However, since there's no in-house monitoring at the remote site, all this tells me is that the remote site was inaccessible from the main site. There could be a range of reasons for the loss of connectivity; it may not be the internet at the remote site at all, but a transit issue.

I've been meaning to set up a basic IP SLA monitor from the IOS firewall at the remote site, so I can see when things go down and to give a bit more insight.

I found this Cisco doc and will give this a try and see how useful this is on the next occurrence.

https://supportforums.cisco.com/document/11935681/simple-eem-script-alert-ip-sla-failures

deanwebb

I've seen solutions that use the cellular network to send back info on the other side. Really sweet stuff, as they also allow for remote CLI access via mobile.
Take a baseball bat and trash all the routers, shout out "IT'S A NETWORK PROBLEM NOW, SUCKERS!" and then peel out of the parking lot in your Ferrari.
"The world could perish if people only worked on things that were easy to handle." -- Vladimir Savchenko
Any questions? No questions! | BCEB: Belkin Certified Expert Baffler | "Plan B is Plan A with an element of panic." -- John Clarke
Accounting is architecture, remember that!
Air gaps are high-latency Internet connections.

icecream-guy

Not sure if you intend to monitor the remote site from the local one, or to run the IP SLA at the remote site.
Just make sure the SLA router has access to a mail relay and a way to send the email alerts offsite. Not getting alerts, or only getting them after the site comes back online, is quite useless.
:professorcat:

My Moral Fibers have been cut.

Dieselboy

I tried to find an internet SMTP relay but gave up. I found a website that claimed to list all open internet relays, but of the handful I picked at random, I either couldn't connect on TCP/25 or, if I could connect, the relay refused to send email for me.

As the site only has the one internet link anyway, of course it cannot send an email if the internet is down. So I've also got a syslog message written to the local device.

Monitoring is via Cacti / Nagios at the main site.
IP SLA at the remote site will, at the very least, write a syslog message when the IP SLA goes down and again when it comes back. So my monitoring will pick up that the site has gone offline, and I'll just check through the syslogs to see what actually happened once the site returns.
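For the "check through the syslogs afterwards" step, here is a rough Python sketch that pairs each down message with the next up message and prints how long the outage lasted. The log format (an ISO timestamp followed by the message) and the `IPSLA-DOWN` / `IPSLA-UP` marker strings are assumptions made up for illustration, not real IOS output, so adjust them to whatever text the EEM applet actually logs. The sample lines are invented too.

```python
from datetime import datetime

# Assumed marker strings; change these to match the text your EEM
# applet writes to syslog. They are NOT real IOS message formats.
DOWN_MARK = "IPSLA-DOWN"
UP_MARK = "IPSLA-UP"


def outage_windows(lines):
    """Pair each DOWN line with the next UP line.

    Expects lines shaped like "<ISO timestamp> <message>" (hypothetical
    format). Returns a list of (start, end, duration_seconds) tuples.
    Repeated DOWN lines inside one outage are ignored.
    """
    windows = []
    down_at = None
    for line in lines:
        stamp, _, message = line.strip().partition(" ")
        if not message:
            continue  # blank line or no message part
        when = datetime.fromisoformat(stamp)
        if DOWN_MARK in message and down_at is None:
            down_at = when
        elif UP_MARK in message and down_at is not None:
            windows.append((down_at, when, (when - down_at).total_seconds()))
            down_at = None
    return windows


# Invented sample lines, for illustration only.
sample = [
    "2016-06-30T02:14:05 IPSLA-DOWN probe failed",
    "2016-06-30T02:14:35 IPSLA-DOWN probe failed",
    "2016-06-30T02:31:40 IPSLA-UP probe recovered",
]

for start, end, seconds in outage_windows(sample):
    print(f"down {start} -> up {end} ({seconds:.0f}s)")
```

The outage times from the remote router can then be lined up against the Nagios / Cacti alerts from the main site to see whether both ends saw the same window.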

Because there's no monitoring at the site itself, I couldn't determine whether the site was losing its internet connection from the ISP, whether there had been an ISP issue at the main office end (with regard to routing to the remote site), or whether there was an issue somewhere in the internet path. This was just to give some further insight to narrow the problem down. :)

wintermute000

You could just set up a $5-a-month VPS in DigitalOcean's Singapore region (for example) and run the internet monitoring from there to rule out Australia to Sri Lanka transit.

Pssst, if you do use DigitalOcean, pls PM me for a referral code!

Dieselboy

One of our AWS sites is located in Singapore. I had an issue earlier this week where the average response time from the Sri Lanka office to AWS in Singapore was 3000ms! I used Dean's tcpping.exe tool and found that the max latency to the HTTPS port was in fact 6500ms! I raised a case with the ISP and then routed traffic to AWS through our office in Australia across the VPN. Latency that way was just over 500ms, which meant it was ridiculously slow, but at least it was loading.
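For anyone without a TCP ping tool to hand, the same kind of measurement can be roughed out in Python by timing the TCP handshake to the port. This is only a sketch of the general idea and not how tcpping.exe itself works; the function names are my own, and the host in the usage comment is a placeholder.

```python
import socket
import time


def tcp_connect_ms(host, port, timeout=10.0):
    """Time one TCP connect to host:port.

    Returns the connect time in milliseconds, or None if the
    connection failed or timed out.
    """
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return (time.perf_counter() - start) * 1000.0
    except OSError:
        return None


def summarize(samples):
    """Return (min, avg, max, failures) for a list of samples in ms.

    None entries count as failures and are left out of the stats.
    """
    good = [s for s in samples if s is not None]
    failures = len(samples) - len(good)
    if not good:
        return None, None, None, failures
    return min(good), sum(good) / len(good), max(good), failures


# Usage (placeholder host; replace with the HTTPS endpoint you care about):
#   results = [tcp_connect_ms("example.com", 443) for _ in range(10)]
#   print(summarize(results))
```

Reporting the max alongside the average matters here: an average can hide the occasional multi-second connect that users actually notice.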

I'm still waiting for the ISP to let me know what the issue was and how it was resolved, but it's now back to between 60ms and 70ms. The traceroute showed that all of the latency was added between two hops at Telstra, e.g. at hop #7 latency was 100ms but at hop #8 it was over 3000ms.
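Spotting that jump by eye is easy on a short trace, but as a small sketch, this is how it could be picked out of a list of per-hop latencies in Python. The function name is made up, and the two values are the rounded figures from the traceroute above (hop #8 was "over 3000ms", so 3000 is a floor, not an exact reading).

```python
def biggest_jump(hop_rtts_ms):
    """Find the largest latency increase between consecutive hops.

    hop_rtts_ms is a list of (hop_number, rtt_ms) tuples in path order.
    Returns (previous_hop, hop, increase_ms), or None if there are
    fewer than two hops to compare.
    """
    best = None
    for (prev_hop, prev_rtt), (hop, rtt) in zip(hop_rtts_ms, hop_rtts_ms[1:]):
        increase = rtt - prev_rtt
        if best is None or increase > best[2]:
            best = (prev_hop, hop, increase)
    return best


# Rounded figures from the post; add the remaining hops from a real trace.
trace = [(7, 100.0), (8, 3000.0)]
print(biggest_jump(trace))
```

One caveat: a single slow hop can just be a router that is slow to answer traceroute probes, so the jump is only meaningful if the later hops and the destination stay slow as well, which was the case here.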

$5 is pretty cheap. I'm more than happy for you to pay that for me
:awesome:

;)


:zomgwtfbbq: