Index
Items in red seem to be the biggest problems for people - read these first.
Problems compiling NetSaint
Problems compiling the statusmap CGI
"NetSaint process may not be running" warnings in the CGIs
Hosts are incorrectly listed as being DOWN and/or services have a status of "HOST DOWN"
When hosts go down, I get notification about services instead of hosts and the service notifications contain incorrect data
Debugging "unknown variable" errors during configuration verification or runtime
Running multiple instances of NetSaint on the same machine
Changing the contents of the default web page
Missing data in the CGIs or errors about improper authorization
Problems finding the traceroute CGI
Requiring users to authenticate before accessing web interface
Displaying pretty host icons
Errors committing commands via the command CGI
Monitoring virtual web servers that use host headers
Monitoring remote host information
Monitoring printers
Monitoring Windows NT servers
Sending SNMP traps to management hosts
Logging events to an external database
Troubleshooting problems with NetSaint
I'm having trouble compiling NetSaint - What can I do?
If you are running Linux, this is probably because you don't have the gcc compiler installed on your system. Either install the compiler yourself or ask your sysadmin to do it for you.
If you are running SunOS, IRIX, HP-UX, *BSD, etc., you may have to tweak the Makefile a bit. This may involve changing the compiler name, compiler options, and/or linker options.
If you're getting errors about the strncat(), strncpy(), or snprintf() functions, you probably don't have the glibc libraries installed on your system. This tends to happen most often on HP-UX and Solaris boxes. I've tried to prevent potential buffer overflows in NetSaint and the CGIs by using these functions, so they are all over the code. If you don't want to install the glibc libraries for some reason, you'll have to find some other way to get everything compiled.
If you have to make changes to the Makefile, configure script, or any code in order to compile NetSaint, let me know what OS you are running and what changes you had to make. I would like to include this information in future releases.
I can't find or am having trouble compiling the statusmap CGI...
If you compile all the CGIs but don't find the statusmap CGI, you probably don't have Thomas Boutell's gd library installed correctly on your system. The gd library (and thus the statusmap CGI) also requires the zlib and png libraries. Version 1.6.3 or higher of the gd library is required, as the CGI generates a PNG image of your network layout.
If you find that the statusmap CGI has not been compiled, make sure you have the gd library installed on your system and rerun the configure script with the following options:
    ./configure --with-gd-lib=LIBDIR --with-gd-inc=INCDIR
Replace LIBDIR with the directory in which the gd library is installed (usually /usr/lib or /usr/local/lib) and replace INCDIR with the directory in which the header files for the gd library are installed (usually /usr/include or /usr/local/include). After you rerun the configure script, make sure to recompile the CGIs and install them in their proper location.
"NetSaint process may not be running" warnings in the CGIs
If you are getting erroneous messages about the NetSaint process not running while viewing the CGIs, it's probably due to one of the following items:
The CGIs will not allow you to submit any commands while they think the NetSaint process is not running. This is done primarily to prevent people from accidentally submitting multiple shutdown/restart commands that don't get processed until NetSaint is started at some future time.
Hosts are incorrectly listed as being DOWN and/or services have a status of "HOST DOWN"
This seems to be one of the biggest issues for new users. 99.9% of the time this problem is due to an incorrect command definition for the host check command you specified in the host definition. A major cause of this problem was a syntax change to the command line arguments of the check_ping plugin. You need to make sure that the host check command is using the proper syntax for the version of the check_ping plugin that you have. You can check to see if the command works properly by executing it manually from the command line.
Recent versions of the check_ping plugin require that a -p flag be used to specify the number of packets to send. Previous versions of the plugin did not require this flag - that's where the problem lies. Check your host check command definition(s) to make sure they are using the proper syntax. Example:
    command[check-host-alive]=/usr/local/netsaint/libexec/check_ping $HOSTADDRESS$ 100 100 1000.0 1000.0 -p 1
Important! Just because you have a service that is monitoring ping statistics for a host does not mean that the actual host status is being checked. The status of a host is only checked when a service check results in a non-OK state or if the host was previously down and a service check results in an OK state.
Some symptoms of incorrect host check commands include:
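Returning to the advice above about running the host check command by hand, here is a minimal sketch of a wrapper that runs a command and reports how it exited. The function name try_check is made up for this example, and the mapping of exit codes (0 for OK, 1 for WARNING, 2 for CRITICAL, anything else treated as UNKNOWN) is an assumption about the usual plugin convention, so check it against the plugins you have installed.

```shell
#!/bin/sh
# try_check: run a check command exactly as given and report its exit status.
# Assumption: 0 = OK, 1 = WARNING, 2 = CRITICAL; anything else is shown as UNKNOWN.
try_check() {
    "$@"
    status=$?
    case "$status" in
        0) label="OK" ;;
        1) label="WARNING" ;;
        2) label="CRITICAL" ;;
        *) label="UNKNOWN" ;;
    esac
    echo "exit status $status ($label)"
    return "$status"
}

# Example (replace 192.168.0.1 with the address of the host you are testing):
# try_check /usr/local/netsaint/libexec/check_ping 192.168.0.1 100 100 1000.0 1000.0 -p 1
```

If the plugin prints a usage message or exits with a non-OK status for a host you know is up, the command definition is the first place to look.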
When hosts go down, I get notifications about services instead of hosts and the service notifications contain incorrect data
Several people have reported this problem, and I spent hours trying to find it until I realized it wasn't a bug in the code. If you get service notifications when you should be getting host notifications (and the service notifications you get seem to contain bogus data), check your contact definitions in the host config file. They are most likely incorrect. Make sure that you are not using the same notification command for both service and host notifications. Service and host notifications are very different and make use of macros which are not transferable between the two types. Look at the sample host config file provided with NetSaint to see what the contact definitions look like and how the service and host notification commands differ. If you're wondering what macros can be used in either type of notification, look at this table.
Debugging "unknown variable" errors during configuration file verification or runtime
When trying to run NetSaint or verify your configuration file data using the -v argument, NetSaint may print out a message like "Error in configuration file 'xxxxxxx.cfg' - Line 34 (Unknown variable)". A few simple checks will usually resolve this problem...
How do I run multiple instances of NetSaint on the same machine?
You can run multiple instances of NetSaint on the same machine, if you ensure that the following variables are unique for each instance of NetSaint...
If you are using the web interface, you will have to set up separate directories to hold the CGIs for each instance of NetSaint and create appropriate script aliases in your web server configuration file. This is necessary because the CGI configuration file must be unique for each set of CGIs, as it contains a reference to which main configuration file the CGIs should read.
Also, if you plan on running both copies of NetSaint in daemon mode, you'll need to change the #LOCK_FILE definition in the common/locations.h file before compiling the second copy. If you don't, both copies of NetSaint will try to use the same lock file. The second one that is started will complain and then exit. Version 0.0.6 will include the ability to specify the location of the lock file in the main configuration file.
One last thing you should check is your init script (if you're using one). The init script should start, stop, restart, and reload all copies of NetSaint (if that's what you want).
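With several instances on one machine, it can help to verify every main configuration file in one pass before starting anything. The following is a minimal sketch, not part of NetSaint itself: the NETSAINT_BIN variable, the default binary path, and the second instance's config path are assumptions made for illustration. It relies only on the -v option described elsewhere in this FAQ.

```shell
#!/bin/sh
# NETSAINT_BIN is a variable invented for this sketch; the default path is an assumption.
NETSAINT_BIN="${NETSAINT_BIN:-/usr/local/netsaint/bin/netsaint}"

# verify_instances: run "netsaint -v" on each main config file given as an argument.
# Returns non-zero if any instance fails verification.
verify_instances() {
    failed=0
    for cfg in "$@"; do
        echo "Verifying $cfg"
        if ! "$NETSAINT_BIN" -v "$cfg"; then
            echo "Verification failed for $cfg" >&2
            failed=1
        fi
    done
    return "$failed"
}

# Example with two hypothetical instances:
# verify_instances /usr/local/netsaint/etc/netsaint.cfg /usr/local/netsaint2/etc/netsaint.cfg
```

A loop of the same shape could be reused in an init script so that every instance is handled, rather than only the first one.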
How do I change the contents of the default web page?
Several people have asked how to modify the default web page so that service detail or service overview information is displayed in the right hand frame (instead of the intro page). You can do this rather easily by modifying the frameset information in the index.html page (located in the root web directory for NetSaint) as follows.
Default Frame Configuration
<FRAMESET BORDER="0" FRAMEBORDER="0" FRAMESPACING="0" COLS="180,*">
Modified Configuration
<FRAMESET BORDER="0" FRAMEBORDER="0" FRAMESPACING="0" COLS="180,*">
Replace xxxxx with one of the following values, or anything else you may want...
Read the documentation on the CGIs for more information on what options each supports.
When I access the CGIs I don't see everything I should or I get authorization errors...
If you believe you are unable to see all the information in the CGIs or if you are getting authorization errors, you probably haven't configured the web server to require authentication or haven't set up authorization correctly. See the documentation on authentication and authorization in the CGIs here.
Where can I find the traceroute CGI?
Newer versions of the check_ping plugin are capable of producing HTML that provides a link to a traceroute CGI written by Ian Cass. The traceroute CGI is not included in the core distribution of NetSaint. However, you can find it in the contrib area of the downloads section at http://www.netsaint.org/download/contrib.
How do I require users to authenticate before accessing the web interface?
See the documentation on authentication and authorization in the CGIs here.
How do I get those pretty host icons to display in my CGIs?
If you want to associate images with particular hosts for use in the status, status map, status world, and extended information CGIs, you must define extended host information entries in your CGI configuration file.
I'm getting errors when attempting to commit commands to NetSaint via the command CGI
If you are getting 'Could not open command file somefile for update' errors when attempting to commit commands to NetSaint via the command CGI, the most likely problem is with directory and/or file permissions. Here is what you can do to fix it. Note: You must be root in order to do some of these steps...
First, find the user that your web server process is running as. On many systems this is the user nobody, although it will vary depending on what OS/distribution you are running.
Next, create a new group that will be granted permissions to update the NetSaint command file. Let's say you want to call the group 'nscmd'. On RedHat Linux you can use the following command to add a new group (other systems may differ):
    /usr/sbin/groupadd nscmd
Next, add all users who should have access to the command file to the group you just created. In this example we'll just add the user nobody...
    /usr/sbin/usermod -G nscmd nobody
Next, create the directory where the command file should be stored. By default, this is /usr/local/netsaint/var/rw, although it can be changed by modifying the command_file variable.
    mkdir /usr/local/netsaint/var/rw
Next, change the group ownership of the directory used to hold the command file...
    chown -R .nscmd /usr/local/netsaint/var/rw
Also check the group permissions on the directory. The group you created needs to have write access there.
The last thing you'll have to do is restart your web server with a command similar to the following:
    /etc/rc.d/init.d/httpd restart
Apparently Apache needs to be restarted in order to inherit the new group permissions you assigned. That's it. You should be able to commit commands to NetSaint via the CGI now (assuming you have the proper authorization).
If you supplied the --with-command-grp=somegroup option when running the configure script, you can create the directory to hold the command file and set the proper permissions by running 'make install-commandmode'.
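To confirm the result of the steps above, a quick check of the directory's group and group write bit can save a round of guessing. This is a minimal sketch under stated assumptions: the function names group_can_write and dir_group are invented here, and the check reads the output of ls -ld rather than anything NetSaint provides.

```shell
#!/bin/sh
# group_can_write: succeed if the directory's mode grants write access to its group.
# The mode string from "ls -ld" looks like drwxrwxr-x; character 6 is the group write bit.
group_can_write() {
    mode=$(ls -ld "$1" | cut -c1-10)
    [ "$(printf '%s' "$mode" | cut -c6)" = "w" ]
}

# dir_group: print the group that owns the directory (fourth field of "ls -ld").
dir_group() {
    ls -ld "$1" | awk '{print $4}'
}

# Example, using the default command file directory and the nscmd group from the steps above:
# dir_group /usr/local/netsaint/var/rw        # should print nscmd
# group_can_write /usr/local/netsaint/var/rw || echo "group cannot write"
```

If dir_group prints the wrong group or group_can_write fails, revisit the chown and permission steps before restarting the web server.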
How do I monitor virtual web servers that use host headers?
If you are running a web server with multiple virtual servers and only one IP address, this applies to you. Let's say that your web server has an IP address of 192.168.0.1 and two virtual servers running on it - "www.myfirstdomain.com" and "www.myseconddomain.com". Both of these domain names resolve to the same IP address (192.168.0.1) during a DNS lookup. The check_http plugin can handle this type of situation without a problem. You will need to specify the virtual web site name as an additional command line argument to the plugin (using the -hn option). Example:
    command[check_http2]=/usr/local/netsaint/check_http $HOSTADDRESS$ -u / -p 80 -hn $ARG1$
The check_http2 command defined here will use the check_http plugin to open a connection to port 80 of the host at IP address 192.168.0.1. It will then send an HTTP/1.1 request for the root document, along with either a "Host: www.myfirstdomain.com" or "Host: www.myseconddomain.com" line in the request header.
How do I monitor remote host information?
Several people have asked how to use various plugins that check information on the local host to report information from remote hosts. Various methods for doing this are described below.
If you need to actually execute a plugin on a remote host and get the results back, you can use one of the following methods...
If all you need is to check disk space, etc. on a remote host, you can use one of the methods below...
How can I monitor NT servers?
The good news is that NT has a lot of performance data that you can monitor. The bad news is that it's difficult to do. Your best bet is probably going to be to install SNMP services on all your NT boxes. Ian Cass has written a FAQ on how to do this at http://elton.dev.knowledge.com/snmpfaq.html
In order to expose NT performance counters for monitoring, you'll have to run the SNMP service on all servers you want to monitor. You'll also have to install any necessary performance MIBs for the services you want to monitor. I believe these can be found in the NT Resource Kit or in various server admin packages. If you're feeling extra lucky you can try to search the Microsoft site for the terms SNMP and MIB and maybe you'll find something...
You can search the MRTG mailing list archives for more information on configuring NT servers to expose various performance counters via SNMP. I know this has been discussed in the past, as many people are graphing various NT performance statistics using MRTG. In fact, somebody from Microsoft is actually doing it - you can find their web page at http://snmpboy.rte.microsoft.com/.
Once you've actually got the SNMP stuff working, you can use the check_snmp plugin to query your NT servers and generate alarms.
A few people are looking into the possibilities of creating a service that runs under NT to facilitate easier remote monitoring. Once these efforts solidify, an announcement will be made on the NetSaint mailing lists.
How do I monitor printers?
Assuming you have HP printers with JetDirect© cards installed, you can use the HP printer plugin to monitor them. Before you begin monitoring printers you should carefully plan your configuration to match the level of monitoring and response time you need. You need to balance this against the annoyance of getting alerted every time someone takes the printer offline to manually feed a transparency, etc. A lot of admins probably don't care if the printer is jammed or is out of paper, but some tech support people in large corporations might find this to be a useful feature. Anyway, if you decide to do this you will need to do the following things:
Can NetSaint send SNMP traps to management hosts?
Yes, but not directly. NetSaint relies on plugins to handle the gathering of service and host information and on event handler scripts to handle events that occur with services and hosts. If you want to have NetSaint send an SNMP trap to a management host in the event that a particular service has a problem, you will have to write a service event handler script and add it to the event_handler option of the service definition. If you have the UCD-SNMP package installed on your host, you could have the script call the snmptrap command to actually send a trap message, depending on what type of service event occurred. Look at the example event handler script to get a better idea of how to write a script.
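As a rough illustration of such an event handler, the sketch below decides when a trap should be raised and leaves the actual trap call as a placeholder. Everything about its interface is an assumption: the argument order, the HARD/SOFT state type values, and the function names handle_event and send_trap are chosen for this example, and no snmptrap arguments are shown because they depend on your SNMP package and MIBs.

```shell
#!/bin/sh
# Sketch of a service event handler that decides when to raise a trap.
# Assumed arguments (set up in your own command definition):
#   $1 = service state (OK, WARNING, CRITICAL, UNKNOWN)
#   $2 = state type (SOFT or HARD)
#   $3 = host name
#   $4 = service description

# send_trap: placeholder. Replace the echo with a call to your snmptrap command;
# its arguments depend on your SNMP package and MIBs, so none are guessed here.
send_trap() {
    echo "TRAP host=$1 service=$2 state=$3"
}

# handle_event: send a trap only for HARD CRITICAL states and ignore everything else.
handle_event() {
    state=$1
    state_type=$2
    host=$3
    service=$4
    if [ "$state_type" = "HARD" ] && [ "$state" = "CRITICAL" ]; then
        send_trap "$host" "$service" "$state"
    fi
    return 0
}

# Run the handler only when the script is called with arguments.
if [ "$#" -ge 4 ]; then
    handle_event "$@"
fi
```

Keeping the decision logic separate from the trap call makes it easy to test the handler from the command line before wiring it into a service definition.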
Can NetSaint log host and service events to an external database?
Not directly, but this can be done fairly easily. You'll probably want to define global host and service event handlers to do this. The global event handlers could call a script which inserts the appropriate event information into a database of your choosing. This would allow you to run queries and generate more detailed reports than are available in the CGIs.
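One way such a script might look is sketched below: it only builds an SQL INSERT statement and prints it, so the output can be piped to whatever client your database provides. The table name netsaint_events, its columns, and the function names are hypothetical, and the quoting shown (doubling single quotes) is the standard SQL rule; some databases need additional escaping.

```shell
#!/bin/sh
# sql_quote: wrap a value in single quotes, doubling any embedded single quotes.
sql_quote() {
    printf "'%s'" "$(printf '%s' "$1" | sed "s/'/''/g")"
}

# event_insert: print an INSERT statement for one event.
# The table and column names are hypothetical; adapt them to your own schema.
# Arguments: host, service, state, timestamp.
event_insert() {
    printf 'INSERT INTO netsaint_events (host, service, state, logged_at) VALUES (%s, %s, %s, %s);\n' \
        "$(sql_quote "$1")" "$(sql_quote "$2")" "$(sql_quote "$3")" "$(sql_quote "$4")"
}

# Example: pipe the statement to your database client (your_db_client is a stand-in).
# event_insert web1 HTTP CRITICAL "$(date '+%Y-%m-%d %H:%M:%S')" | your_db_client
```

Printing the statement instead of executing it keeps the sketch independent of any particular database and lets you inspect what would be inserted.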
Something isn't working properly - How can I track down the problem?
I've worked in tech support for a few years and have spent my share of time on a helpdesk. Most people are vague when they report a problem and have no desire whatsoever to try and track down the problem - they just want you to fix it now. I hope you are not that type of person. NetSaint is relatively new and is probably chock full of bugs, so things will not always work properly. If you suspect that either the service check or notification routines are not working, here are a few things you can do to try and track down the problem...
The first thing you should do is verify your configuration data by running NetSaint with the -v option. Example:
    ./netsaint -v /usr/local/netsaint/etc/netsaint.cfg
If no errors are found, proceed to the next steps. If NetSaint reports some error, go back and fix your configuration files.
The next step will take more time, but will give you more information on what is going on inside of NetSaint. When I first developed NetSaint I added a lot of debugging code to help me track down problems. I still use that code when I add new features or track down bugs myself. Here is how to use the debugging code...
Reconfigure NetSaint and enable one or more debug options as follows, replacing "--enable-DEBUGx" with one or more of the values from the table below:
    ./configure --prefix=/your/netsaint/directory --enable-DEBUGx
Debugging Options
Recompile NetSaint.
Verify your configuration data again - you'll see a lot more information this time if you have enabled the DEBUG1 option. Try redirecting output to a file so that you can view or print it at a later time.
If you have defined either the DEBUG3 or DEBUG4 options, run NetSaint as a foreground process and start monitoring your services. Example:
    ./netsaint /usr/local/netsaint/etc/netsaint.cfg
Kill NetSaint at an appropriate point (i.e. after a service check fails) and look through the output. It should help you track down where the problem is occurring. Some code tweaking may be necessary on your part in order to fix things. Let me know if you have to make any such alterations so I can include the fix in future releases.
If you are unable to determine or fix the problem on your own, email me the following items:
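For the step above about redirecting output to a file, a small wrapper along these lines may be convenient. It is only a sketch: the NETSAINT_BIN variable, the capture_verify function name, and the log file path are invented for illustration, and the only NetSaint option used is the -v option shown earlier.

```shell
#!/bin/sh
# NETSAINT_BIN is a variable invented for this sketch; it defaults to ./netsaint.
NETSAINT_BIN="${NETSAINT_BIN:-./netsaint}"

# capture_verify: run "netsaint -v <config>" and save stdout and stderr to a log file.
# Arguments: main config file, log file.
capture_verify() {
    cfg=$1
    log=$2
    "$NETSAINT_BIN" -v "$cfg" > "$log" 2>&1
    status=$?
    echo "Output saved to $log (exit status $status)"
    return "$status"
}

# Example (the log path is hypothetical):
# capture_verify /usr/local/netsaint/etc/netsaint.cfg /tmp/netsaint-debug.log
```

Capturing both stdout and stderr matters here, since debugging output and error messages may not go to the same stream.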