Nagios, or Perish

October 4, 2009

A huge overhaul of my monitoring architecture is in progress, and as it goes on I find myself more and more confident that going with Nagios is THE way to go. Period. Over time we grew a number of monitoring tools (FOSS or otherwise) that do what they're meant to do (when things go as expected)… but nothing more.

Here's what's cool about Nagios: it does what you expect it to do, and a whole lot more. It's no secret I've always been a fan of Ethan Galstad's baby, but over time I also found myself adopting something different for special purposes… and it didn't go as expected: highly specialized monitoring tools with zero flexibility and zero integration with the rest of the world. Moreover, those systems tend to turn you into a 'slave' of the system itself, not only when setting them up but also when running them… and that's simply unacceptable.

People often say that Nagios has a steep learning curve, but I've found that it simply requires careful planning (which isn't optional with other monitoring tools either!).

What’s wrong with other monitoring solutions? In my opinion, most of them fail due to the ‘everything in a box’ approach, while Nagios itself is extremely modular by its very nature.

Fact is, unless your architecture is pretty straightforward (read: dull), you'll end up needing something more than the out-of-the-box bells and whistles that many tools provide. My own architecture is anything but straightforward (or dull 😉): heaps of systems, tons of apps, complex networks, and so on. This is where Nagios fits, as it allows for an incremental approach: you start with the basics and add bells, whistles, and whatever else as you go.

The incremental approach is the key: you should really monitor only what's critical, what provides insight into your systems/apps, and what's valuable when you're dealing with outages and faults. This is why I don't like the 'agent does everything' approach: it gives you tons of data, which seems cool at first sight, but ends up being useless (or, even worse, confusing) in real-world scenarios.

Nagios is also often criticized for its lack of graphical configuration frontends. Actually, there are a few good frontends, but after a LONG evaluation I ended up choosing good old hand-written conf. Nagios' template- and inheritance-based configuration allows for some elegant configuration schemes, which (if carefully planned) result in a highly maintainable system.
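To make the template/inheritance idea a bit more concrete, here is a minimal Python sketch of how a host picks up directives from the template chain named in its `use` directive, with values closer to the host overriding inherited ones (which is also what keeps per-host exceptions manageable). It is a simplified model under stated assumptions, not Nagios code: it follows a single `use` chain only (Nagios also accepts comma-separated multiple inheritance), and the template contents, the `unix-admins` contact group, the `web01` host and the `resolve()` helper are all hypothetical.

```python
from typing import Dict, List

# Hypothetical templates modeled on typical Nagios object definitions.
# "name" identifies a template, "use" names its parent template, and
# "register": "0" marks it as a template rather than a real host.
DEFINITIONS: Dict[str, Dict[str, str]] = {
    "generic-host": {
        "name": "generic-host",
        "register": "0",
        "check_interval": "5",
        "max_check_attempts": "3",
        "notifications_enabled": "1",
    },
    "linux-server": {
        "name": "linux-server",
        "use": "generic-host",
        "register": "0",
        "check_interval": "2",
        "contact_groups": "unix-admins",
    },
}

# Directives that describe the template itself and are never inherited.
NON_INHERITED = {"name", "use", "register"}


def resolve(obj: Dict[str, str], templates: Dict[str, Dict[str, str]]) -> Dict[str, str]:
    """Return the effective directives of obj by walking its single `use` chain.

    Values closer to the object win, so a host can override a template
    directive (a per-host exception) without touching the template.
    """
    chain: List[Dict[str, str]] = []
    seen = set()
    current = obj
    while True:
        chain.append(current)
        parent = current.get("use")
        if parent is None:
            break
        if parent in seen:
            raise ValueError(f"circular template reference via {parent!r}")
        seen.add(parent)
        current = templates[parent]

    effective: Dict[str, str] = {}
    # Apply the most generic template first and the object itself last.
    for definition in reversed(chain):
        for key, value in definition.items():
            if key not in NON_INHERITED:
                effective[key] = value
    return effective


# A real host: inherits everything, overrides only check_interval.
web01 = {
    "use": "linux-server",
    "host_name": "web01",
    "address": "192.0.2.10",
    "check_interval": "1",
}
web01_effective = resolve(web01, DEFINITIONS)
print(web01_effective)
```

In this model, changing `contact_groups` once in `linux-server` would propagate to every host that uses it, while `web01` keeps its one-minute check interval as a local exception.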

The result? A clean conf structure (a structured directory tree for conf files), easily expandable conf items (templates, etc.), manageable exceptions (which is soooo fundamental), integration with other tools (read: trouble ticketing, etc.), network awareness (parent/child relationships), dependency awareness (when it's needed!), and bells/whistles (NagVis, PNP4Nagios, etc.).
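Network awareness through parent/child relationships boils down to one rule, described in the Nagios documentation on host reachability: a host that fails its check is reported DOWN if it has no parents or if at least one of its parents is UP, and UNREACHABLE if all of its parents are themselves down or unreachable. The Python sketch below illustrates only that classification rule; the `classify()` helper, the topology and the check results are hypothetical, and it assumes an acyclic parent map (Nagios' own configuration check rejects circular parent paths).

```python
from enum import Enum
from typing import Dict, List, Optional


class HostState(Enum):
    UP = "UP"
    DOWN = "DOWN"
    UNREACHABLE = "UNREACHABLE"


def classify(
    host: str,
    check_ok: Dict[str, bool],
    parents: Dict[str, List[str]],
    memo: Optional[Dict[str, HostState]] = None,
) -> HostState:
    """Classify host using the parent/child reachability rule.

    Assumes the parent map is acyclic, so no cycle detection is done here.
    """
    if memo is None:
        memo = {}
    if host in memo:
        return memo[host]

    if check_ok[host]:
        state = HostState.UP
    else:
        host_parents = parents.get(host, [])
        # No parents (same segment as the monitoring host) or at least one
        # parent UP: the failure is the host's own, so it is DOWN.
        if not host_parents or any(
            classify(p, check_ok, parents, memo) is HostState.UP
            for p in host_parents
        ):
            state = HostState.DOWN
        else:
            # Every parent is DOWN or UNREACHABLE: the host cannot be reached.
            state = HostState.UNREACHABLE
    memo[host] = state
    return state


# Hypothetical topology: monitoring host -> core-switch -> router -> web01/web02
parents = {
    "core-switch": [],
    "router": ["core-switch"],
    "web01": ["router"],
    "web02": ["router"],
}
# Hypothetical check results: the router has died, taking the web hosts with it.
check_ok = {"core-switch": True, "router": False, "web01": False, "web02": False}

states = {host: classify(host, check_ok, parents) for host in check_ok}
for host, state in states.items():
    print(f"{host}: {state.value}")
```

Here only `router` comes out DOWN, while `web01` and `web02` are UNREACHABLE, which is what lets you avoid treating everything behind a failed router as a separate outage.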

Nagios rocks!

11 Responses to “Nagios, or Perish”

  1. Joselu Says:

    It’s great all the things Nagios can do.

    But have you tested the “other” monitoring tools? Perhaps not all of them lack clean structures, open capabilities, customizable events and agents, etc.

    Take a look at Osmius : http://osmius.net

    • Matt Simmons Says:

      Hi, great entry!

      Like you, I too opted for configuring Nagios by hand. After looking at all of the GUI tools, I figured that it was much simpler and easier in the long run to build a sane hierarchy for the confs and go from there.

  2. marco Says:

    So you don’t like NagiosQL as a graphical configuration frontend? In my opinion it’s nothing amazing, but it’s useful at least: customers become more self-sufficient at adding/removing/managing Nagios entries.

    • marco Says:

      ...however, Nagios rocks! 🙂

      • drakpzone Says:

        The problem with GUI frontends for Nagios is that I have yet to find one that gives you the whole picture. They’re cool if you need basic config tools, but fall short as soon as you need complex setups.

        And yeah, Nagios rocks 🙂

  3. Raffaello Says:

    Well, I love Nagios. But I don’t dislike GUIs like NagiosQL. Maybe I misunderstood your words, but… what do you mean by “complex setups”? NQL can manage them as well, as long as you provide it with the needed checks. You can have different groups, different contacts, different parent/child relationships, different escalations… and, if you really need to touch the conf, you can do it by hand. The best feature provided by NQL, anyway, is the DB, I suppose.


  4. Hi,

    You should look at Shinken, a Nagios reimplementation in Python that (easily) manages distributed and high-availability environments. It handles the Nagios configuration and plugins, but is also multiplatform and multisite-compliant, and it’s even faster than the old Nagios 🙂

    It’s still young (version 0.1), but it’s quite powerful 🙂

    There’s also a demo virtual machine if you can take 5 minutes to test it (the new Ninja and Thruk web interfaces come with this VM).

    Of course, it’s open source 🙂 (AGPL licence).

    You can get it at http://www.shinken-monitoring.org

    I hope you will like it as much as you like the former Nagios 🙂

    Gabès Jean, former Nagios lover, now Shinken dev
