Better than nagios?

Dieselboy · July 21, 2016, 08:50:53 PM

I have a working nagios set up, just waiting for me to go ahead and set up granular SNMP monitoring of things like physical interfaces of the switches AND OSPF relationships and route entries.

I have done this before, but each time I literally had to:
- use my Windows computer to poll the device using snmpwalk
- find each interface description OID and the up/down OID
- program all of these individual SNMP GET requests into Nagios

Just the above will take days to do.
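
For anyone wanting to skip the manual OID hunting in the steps above, here is a minimal Python sketch of the matching part. It assumes the walk was saved in Net-SNMP's numeric output format (`snmpwalk -On`) and that "interface description" means the IF-MIB ifDescr column; the sample walk text is made up for illustration. The two column OIDs are the standard IF-MIB ones for ifDescr and ifOperStatus.

```python
import re

# Standard IF-MIB column OIDs.
IF_DESCR = "1.3.6.1.2.1.2.2.1.2"
IF_OPER_STATUS = "1.3.6.1.2.1.2.2.1.8"

# Matches one line of numeric snmpwalk output, for example:
# .1.3.6.1.2.1.2.2.1.2.3 = STRING: GigabitEthernet0/3
LINE_RE = re.compile(
    r"^\.?" + re.escape(IF_DESCR) + r"\.(\d+)\s*=\s*STRING:\s*(.*)$"
)


def parse_ifdescr_walk(text):
    """Return {ifIndex: (interface name, ifOperStatus OID)} from walk output."""
    interfaces = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            index = int(match.group(1))
            name = match.group(2).strip().strip('"')
            # The up/down OID shares the same ifIndex suffix as the name OID.
            interfaces[index] = (name, f"{IF_OPER_STATUS}.{index}")
    return interfaces


# Made-up sample of what an ifDescr walk might look like.
SAMPLE_WALK = """\
.1.3.6.1.2.1.2.2.1.2.1 = STRING: GigabitEthernet0/1
.1.3.6.1.2.1.2.2.1.2.2 = STRING: GigabitEthernet0/2
.1.3.6.1.2.1.2.2.1.2.10 = STRING: Vlan10
"""

if __name__ == "__main__":
    for index, (name, oid) in sorted(parse_ifdescr_walk(SAMPLE_WALK).items()):
        print(f"{index:>3}  {name:<24} {oid}")
```

This only pairs each interface name with its up/down OID; getting those into Nagios is still a separate step.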
It would be WAY quicker if I could do some interface monitoring like what Cacti does when you add a device:

- add the device
- refresh the interfaces (basically an SNMP walk)
- check the checkbox against which interfaces you want to monitor
- save

Are there any free tools that would allow you to configure monitoring in this way? Nagios is fine but the text file editing is very time consuming.

Dieselboy · July 24, 2016, 11:58:12 PM

I've done it

I managed to keep searching until something relevant came up.

I did start manually configuring this myself, using cacti. But it's been so long since I did any customisation in Cacti that I spent about half an hour or more just clicking around trying to find the relationships between things again. In the end I found a nice post: https://kb.groundworkopensource.com/display/SUPPORT/How+to+add+interface+operational+status+checks

It gives you an updated graph template so you don't need to do most of the manual work.

All I had to do was tweak the "title" of the thresholds / graphs that were created so that when the alert comes through it gives you: device name & interface name & configured interface description
So the alert will be somewhat self explanatory depending on if you're up to date with your interface descriptions.

Piece 'o' cake - saved me about 2 weeks work in a few hours

that1guy15 · July 26, 2016, 10:40:08 PM

hehehe...

Nagios is an amazing tool and does its job very well. BUT it's the most useless POS I have touched for most jobs. Nothing is automated and you have to do EVERYTHING manually. Sorry, there are a ton of better options out there if you want to simplify your NMS.

BUT with that said I understand sometimes we are forced into the NMS we use. There are a number of front ends to Nagios that simplify adding new devices which should get the job done for you.

Else build a bash/python script to do the job for you.
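
A rough Python sketch of what such a script could look like: it renders one Nagios service definition per interface instead of typing them by hand. It assumes your config already has a `generic-service` template and a `check_snmp` command that takes its options as `$ARG1$` (as in the sample Nagios Core configs); the host name, community string and interface list are hypothetical.

```python
IF_OPER_STATUS = "1.3.6.1.2.1.2.2.1.8"  # IF-MIB ifOperStatus column

# One service block per interface. "-r 1" asks check_snmp to match the
# returned value against "1", which is ifOperStatus "up".
SERVICE_TEMPLATE = """\
define service {{
    use                     generic-service
    host_name               {host}
    service_description     {name} oper status
    check_command           check_snmp!-C {community} -o {oid} -r 1
}}
"""


def render_services(host, community, interfaces):
    """Build one Nagios service definition per (ifIndex, name) pair."""
    blocks = []
    for index, name in interfaces:
        blocks.append(
            SERVICE_TEMPLATE.format(
                host=host,
                name=name,
                community=community,
                oid=f"{IF_OPER_STATUS}.{index}",
            )
        )
    return "\n".join(blocks)


if __name__ == "__main__":
    # Hypothetical host, community string and interfaces.
    print(
        render_services(
            "core-sw1",
            "public",
            [(1, "GigabitEthernet0/1"), (2, "GigabitEthernet0/2")],
        )
    )
```

Redirect the output into a .cfg file that Nagios already includes, then verify the config before reloading.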

Else, else sell your boss on any of the numerous NMS tools out now that do auto-discovery.

I feel your pain...

Dieselboy · July 27, 2016, 04:03:37 AM

It's hard to justify the huge expense that comes with a "decent" monitoring "solution" when the same can be done with freebies.

The OP for this (where I had to monitor my physical interfaces) was one of the reasons I'd not put the time into nagios to get this working. I actually bought the nagios image which included a gui configuration interface - which was built by nagios and is apparently part of nagios XI - it was better than nothing but it was still crap. Sometimes the config gui would get out of sync with the text files and you would have to delete all your work so you could enter it in again.

I was trying to find a decent gui config but gave up earlier today.

I don't disagree that nagios does its job well in terms of monitoring and alerting, but getting there is the difficult part. And if your environment(s) are dynamic then you're going to have to put the time in, over and over again, to keep it up to date. That's something which I just can't do. Example - those physical interface monitoring jobs I had to do. I'm currently at 91 individual interface snmp checks, for key links only, across 2 sites and about 10 devices. I have a small network that is growing, but that's still 91 individual checks. Think of the amount of time that would have taken me with nagios and stupid text files, compared to the click click with Cacti that I implemented the other day. And they really are stupid text files, where there's no intelligence to them at all in relation to the device you're applying the check item to.

I will admit though that individual up/down interface checks cannot benefit from configuring a single check, or series of checks, and then applying those to a group of devices. Some of my checks are virtual interfaces (VTI VPN Tunnels). Nagios is most powerful when you can configure 1 check and apply it to a group of similar devices, such as CPU load for example.

Can you recommend a nice gui config for nagios? I would like to install one / multiple and see how they go. My guys don't really update nagios, I would like to make it easier for them as well.

icecream-guy · July 27, 2016, 07:32:56 AM

If it don't work, dump it and get something else. For as much time as you say you are putting in, it may be a cost saving over the long term to buy a better tool rather than spending hours and hours of billable time maintaining the beast. Say you're putting in 10 hours a week maintaining this thing: at 50 bucks an hour for a billable rate, that's 500 bucks a week you could be saving the company, and you could put your time to more profitable endeavors. In 3 weeks you could probably pay for solarwinds.
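
The back-of-envelope math, as a small Python sketch. The 10 hours and 50 an hour are the figures from this post; the 1500 tool cost is only what "3 weeks" implies at that rate, not a real quote.

```python
import math


def weeks_to_break_even(tool_cost, hours_per_week, hourly_rate):
    """Whole weeks of saved admin time needed to cover a one-off tool cost."""
    weekly_saving = hours_per_week * hourly_rate
    return math.ceil(tool_cost / weekly_saving)


if __name__ == "__main__":
    # 10 h/week at 50/h is 500 a week; 1500 is a hypothetical licence price.
    print(weeks_to_break_even(1500, 10, 50))
```

Plug in an actual quote and your own hours to see how it works out for you.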

Think solarwinds or WhatsUp Gold, something like that.

mmcgurty · July 27, 2016, 07:46:30 AM

I thought that NAGIOS XI let you work without having to use the CLI/scripting. I don't think it is too much but to be honest we were all NAGIOS XI until the lone admin left for another job. We ended up dumping NAGIOS XI and moving to Solarwinds NPM (still in process) because no one wanted to take over NAGIOS XI.

that1guy15 · July 27, 2016, 08:18:53 AM

Quote from: Dieselboy on July 27, 2016, 04:03:37 AM
Can you recommend a nice gui config for nagios? I would like to install one / multiple and see how they go. My guys don't really update nagios, I would like to make it easier for them as well.

I honestly have not played with any of them but a little googling around will give you a number of options to test. Also there are multiple auto-discovery plugins for Nagios that might get you over the hump.

wintermute000 · July 27, 2016, 05:26:08 PM

WhatsUp Gold or Statseeker
Why not just cacti?

Dieselboy · July 29, 2016, 04:05:30 AM

I got a quote for nagios xi when I was in London and the price was a few thousand USD. The company bought the pre-built VMware image which included the gui config end but it wasn't all that great. Still it allowed us to configure nagios a lot quicker for the most part. Still had problems with the gui not working properly though.

I'm trying to find something that will be a bit quicker than text files so that there's no excuse for not keeping monitoring up to date.

I've also been testing another product which connects to the hypervisor itself and discovers all the VMs itself. You can then set up alerting and monitoring but it's not free. The open source version is free but it's presently broken so I'm not sure of the full scope yet.

The challenge is that I need to monitor:
- the network
- the servers across multiple platforms such as
-- vmware
-- kvm / RHEV
-- AWS
-- Openshift
-- I think that's it

With RHEV, there's 90 VMs there
On vmware, there's a good number

In terms of man hours - there's 3 of us, I'm the network person. The other two are flat out on dev-ops work. If I can make their life easier then all good