15 Feb 2008

How to monitor Bind with Munin

Unix sysadmin and never heard of Munin? Good news: you have a great tool waiting for you. Munin monitors your servers, stores the results and generates pretty graphs for you to interpret. Munin itself is written in Perl, but it uses plugins, written in the language of your choice, to fetch the relevant data. The default install comes with a number of plugins that work out of the box, most of them written in Perl or shell. But some plugins, or services, require manual intervention to work. Bind is such a service, so let's see how we can monitor Bind with Munin.

I install Munin everywhere I can. It's a really helpful tool. Since I started using Munin (and Nagios), I've been puzzled by how I ever managed without them. Munin gives you historical graphs and enables you to predict resource consumption trends: "Has memory usage increased during the last year? Is the amount of mail/spam increasing? What about CPU load? Network throughput?" etc.

Some time ago, I was at a customer's site and installed Munin on a bunch of servers. The next day, the sysadmin called and thanked me. He finally knew why he had to reboot two of his Oracle servers every week: some kind of memory leak was eating away all the memory until the server crashed. He contacted Oracle to come up with a fix.

Another example: You arrive at work, and a server has crashed/rebooted/panicked during the night. Now, why did it do that? If you know why, perhaps you can prevent it from happening again. Munin can be of great help here: Check the graphs right before the crash - seeing anything unusual? Increase in network traffic? What about CPU load? Memory? Number of processes? It can give you a really good indication of what went wrong.

Munin does have some limitations. It does not scale well (to hundreds of servers), and I find it particularly painful to create aggregated graphs (for example, an aggregated network graph of two or more hosts). But I know these issues are being worked on.

Okay, enough talk - let's monitor Bind:

First we need to enable logging. Create a log directory and add logging directives to the Bind configuration file (here on Debian):

  # mkdir /var/log/bind9
  # chown bind:bind /var/log/bind9
  # cat /etc/bind/named.conf.options
  ...
  logging {
        channel b_log {
                file "/var/log/bind9/bind.log" versions 30 size 1m;
                print-time yes;
                print-category yes;
                print-severity yes;
                severity info;
        };

        channel b_debug {
                file "/var/log/bind9/debug.log" versions 2 size 1m;
                print-time yes;
                print-category yes;
                print-severity yes;
                severity dynamic;
        };

        channel b_query {
                file "/var/log/bind9/query.log" versions 2 size 1m;
                print-time yes;
                severity info;
        };

        category default { b_log; b_debug; };
        category config { b_log; b_debug; };
        category queries { b_query; };
  };

Restart bind:

  # /etc/init.d/bind9 restart
  Stopping domain name service: named.
  Starting domain name service: named.

You should now see log files being populated under /var/log/bind9/.
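
A quick way to check that the query log contains something sensible is to count the queries per record type. Here is a minimal sketch. The function name count_query_types is my own, and it assumes that each logged query contains the token "query:" followed by name, class and type, which may differ between Bind versions:

```shell
# Count queries per record type in a Bind query log.
# Assumption: a query line looks roughly like
#   15-Feb-2008 10:00:00.000 client 192.0.2.1#1234: query: example.com IN A +
# i.e. the record type is the third field after the "query:" token.
count_query_types() {
    awk '
        {
            for (i = 1; i <= NF - 3; i++) {
                if ($i == "query:") {
                    count[$(i + 3)]++
                    break
                }
            }
        }
        END {
            for (t in count) print t, count[t]
        }
    ' "$1" | sort
}
```

Call it as "count_query_types /var/log/bind9/query.log". If it prints nothing, either no queries have been logged yet or the log format on your system differs from the one assumed here.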

Next, configure Munin:

Make sure the Munin user ("munin") can read your Bind log files.
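
One way to check this is a small helper that lists the files the calling user cannot read. This is only a sketch: the function name unreadable_files is hypothetical, and it has to be run as the "munin" user (for example via su or sudo) to tell you anything useful:

```shell
# Print every regular file in a directory that the calling user cannot read.
# Prints nothing when everything is readable; returns 2 if DIR is not a directory.
unreadable_files() {
    dir=$1
    [ -d "$dir" ] || { echo "not a directory: $dir" >&2; return 2; }
    for f in "$dir"/*; do
        [ -f "$f" ] || continue      # skip subdirectories and an unmatched glob
        [ -r "$f" ] || echo "$f"     # report files we are not allowed to read
    done
    return 0
}
```

Running "unreadable_files /var/log/bind9" as the "munin" user should print nothing. Any file name it does print needs its ownership or permissions adjusted.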

We need two additional plugins: "bind9" and "bind9_rndc". If you can't find them in your default install, head over here.

The "bind9" plugin should work right away. "bind9_rndc", however, needs to read the "rndc.key" file, which is only readable by the user "bind". You have two options: either run the plugin as root, or add the user "munin" to the group "bind" and allow the group "bind" to read the rndc.key file. For the sake of simplicity, I run the plugin as root here. So you need to add:

  # cat /etc/munin/plugin-conf.d/munin-node
  ...
  [bind9_rndc]
  user root
  env.querystats /var/log/bind9/named.stats
  ...
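
If you would rather not run the plugin as root, the second option could look roughly like the sketch below. The helper name grant_group_read is my own, and the key path /etc/bind/rndc.key and the "adduser munin bind" call are assumptions for a Debian system, so check them against your own setup first:

```shell
# Make a file readable by a group: owner read/write, group read, others nothing.
# Usage: grant_group_read FILE GROUP
grant_group_read() {
    file=$1
    group=$2
    chgrp "$group" "$file" && chmod 640 "$file"
}

# Intended use, as root (path and names are assumptions, not taken from above):
#   adduser munin bind
#   grant_group_read /etc/bind/rndc.key bind
```

With that in place, the "user root" line should no longer be needed. Depending on the Munin version, you may have to replace it with a "group bind" line so the plugin actually runs with that group.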

Next, restart Munin:

  # /etc/init.d/munin-node restart
  Stopping munin-node: done.
  Starting munin-node: done.

Munin runs every five minutes, so go grab a coffee and wait.

After a while, graphs arrive:

And the bind9_rndc plugin:

(Consult the "BIND 9 Administrator Reference Manual" if you have trouble interpreting the results.)

Nice, huh?
