Overhaulin’ nagios

August 4, 2006

I spent most of yesterday playing with nagios 2.5. I’ve been using nagios from the very beginning… since it was called “netsaint”. I got very comfortable with the 1.2 version, which sat there for a long time. It has been monitoring my server farm for a few years now, and I didn’t have much to complain about, apart from the lack of management tools, which led to tedious cut & paste work across several conf files.

Well, it seems that at last things have changed… and changed a lot! By habit I keep a close eye on the software I use, mainly on freshmeat or sourceforge. I have to admit I forgot about nagios for a while… maybe because it’s been running flawlessly for so long. Anyway, rel 2.x is out and stable… It didn’t change a lot from the user/manager/UI perspective, but the engine, afair, has gone through a major refactoring.

The new one gives the sense of a clean, robust application, with tons of features just sitting there waiting to be used. Since the geek in me craves features every time… I dove into this pool 🙂

Installation was, as usual, pretty straightforward. The whole thing was up in a few minutes, and I could even migrate my existing conf easily (by hand), since the whole hosts/services/etc stuff stays in the same place… and the config data is pretty much compatible with the older version.

Et voilà… new nagios up & running in almost an hour, counting the download of the whole stuff, upgrading the nagios plugins to new versions, checking nrpe (the remote plugin executor) on all my linux boxes, etc.

Still… I missed a web management feature… I really hoped it would be included in 2.x… but it still isn’t there. Anyway, I decided to give nagiosexchange.org (the main repository for nagios plugins/extras/addons) a try… Well… it turned out that the number of plugins there is just huge… and last but not least I found the incredible “groundwork monarch”, which is a GREAT tool for web-managing the nagios conf and stuff. It’s perl based, ajax oriented, and uses mysql as its backend db. I got it installed in a few minutes… and once I started using it my jaw nearly dropped… clear and clean UI, huge amount of features, and after the first tests… really rock solid. It read and imported my static file-based configuration, and converted it into its relational schema on mysql. Then I started playing around… and wow…

That’s the way monitoring should be. Now that I’ve fallen in love with nagios again… I decided to strengthen my monitoring architecture: apart from the usual email/sms notifications, I’m planning to integrate notifications with otrs (an amazing trouble ticketing solution… open source of course!). Also, I’ve seen lots of plugins for hw/environment monitoring on dell hardware… All in all it’s about querying the hw using the Dell openmanage tools… and for my Dell-based server farm it’s a joy having all the stuff about temperatures, voltages, raid status, etc… all inside nagios.
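To give an idea of what such a hw check boils down to, here is a minimal sketch in python. It is not one of the plugins from nagiosexchange: the "name: value" input layout, the function names and the 60/75 thresholds are all assumptions made up for illustration, and real openmanage output would have to be adapted to it. The only part taken from nagios itself is the plugin convention: one line of status text (optionally followed by "|" and performance data) plus an exit code of 0, 1, 2 or 3 for OK, WARNING, CRITICAL or UNKNOWN.

```python
# Sketch of a Nagios-style temperature check.
# The input layout ("CPU1 Temp: 41.0") is hypothetical, not real tool output.

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def parse_readings(text):
    """Parse 'name: value' lines into a {name: float} dict."""
    readings = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if not sep:
            continue  # no 'name: value' pair on this line
        try:
            readings[name.strip()] = float(value.strip())
        except ValueError:
            continue  # value is not a number, ignore the line
    return readings


def check_temperatures(readings, warn, crit):
    """Return (exit_code, status_line) based on the hottest sensor."""
    if not readings:
        return UNKNOWN, "TEMP UNKNOWN - no sensor readings found"
    worst = max(readings.values())
    if worst >= crit:
        state = CRITICAL
    elif worst >= warn:
        state = WARNING
    else:
        state = OK
    # Performance data: 'label'=value;warn;crit for each sensor.
    perfdata = " ".join(
        "'%s'=%.1f;%.1f;%.1f" % (name, value, warn, crit)
        for name, value in sorted(readings.items())
    )
    message = "TEMP %s - hottest sensor at %.1f C | %s" % (
        LABELS[state], worst, perfdata)
    return state, message


# Made-up sample input, just to show the call sequence.
sample = "CPU1 Temp: 41.0\nAmbient Temp: 23.5\n"
state, message = check_temperatures(parse_readings(sample), warn=60.0, crit=75.0)
print(message)
```

In a real plugin the text would come from the vendor tool instead of the sample string, and the script would end with sys.exit(state) so that nagios picks up the result.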

Nonetheless… I used to run nagios on my production backup server. Well… it worked fine… but over the years I always kept one thing in mind: there’s only one thing worse than having a severe failure, and that is having a severe failure and not being aware of it. So I decided to overhaul the nagios architecture at my site… and put the whole thing on a pair of machines which run Veritas Cluster for filesharing purposes. So nagios is now a clustered application, which means much higher availability for the monitoring service. This makes me much more aware of hw/network failures… and imho it’s a must-have in a seriously monitored environment.

This could also be accomplished easily with open source tools/solutions, such as linux-ha.net.

Enough for today


2 Responses to “Overhaulin’ nagios”

  1. Vitor Says:

    Can you tell me the plugins that you use?

    Do you use MIBs?

    I need to monitor the disks and memory of two PowerEdge servers.

    Can I have some feedback from you?

    Tks anyway!

  2. drakpzone Says:

    no mibs for now… I actually wrote wrappers to the openipmi tools in order to get environmental/sensor info from dell servers.
