Big Sister implements alarming in a server-based manner. The agent is responsible for determining whether a system or service is working correctly ("green"), is critical ("yellow") or has failed ("red"). Other statuses exist but are not relevant to alarming.
The alarming module of the server notices this status. Depending on the configuration file bb_event_generator.cfg, the server generates alarms on status changes.
The alarming configuration mainly consists of a set of rules. Each rule consists of a pattern matched against all status changes, a definition of dependencies and a description of the action to be taken when an alarm is raised. The first two elements describe under what circumstances an alarm is to be raised, while the last one describes how the alarm is actually raised.
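The three parts of a rule can be pictured with a small Python model. This is only a sketch under assumptions: the names StatusChange, Rule and evaluate, the "host.check" glob pattern and the idea that a dependency in status "red" suppresses the alarm are all hypothetical and are not taken from Big Sister itself.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Callable, Dict, List, Optional


@dataclass
class StatusChange:
    host: str
    check: str
    status: str  # "green", "yellow" or "red"


@dataclass
class Rule:
    pattern: str                            # hypothetical "host.check" glob
    depends_on: List[str]                   # hypothetical: items that must not be red
    action: Callable[[StatusChange], str]   # how the alarm is raised


def evaluate(rule: Rule, change: StatusChange,
             current_status: Dict[str, str]) -> Optional[str]:
    """Return the result of the action if the rule fires, otherwise None."""
    item = f"{change.host}.{change.check}"
    # Part 1: the pattern decides whether the rule applies at all.
    if not fnmatchcase(item, rule.pattern):
        return None
    # Part 2: dependencies may suppress the alarm (assumed semantics).
    if any(current_status.get(dep) == "red" for dep in rule.depends_on):
        return None
    # Part 3: the action describes how the alarm is raised.
    return rule.action(change)
```

In this model the pattern and the dependencies together answer "should an alarm be raised?", and the action answers "how?".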
Using this simple approach, a few things can easily be configured for individual checks, for individual hosts or for whole groups:
wait for a defined time period before reporting an alarm and do not report an alarm if the problem goes away within this period
regularly send reminders telling the administrator that a problem persists until the problem goes away
do not repeatedly send alarms for a problem that occurs multiple times
behave differently depending on time of day or day of week (e.g. postpone alarms raised during the night to the early morning)
suppress alarms depending on what status other systems/services are in (e.g. do not report that a system is unreachable when Big Sister already knows that the whole network the system is connected to is down)
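The first two behaviours in the list above (an initial waiting period and regular reminders) amount to simple arithmetic on timestamps. The following sketch illustrates that arithmetic only; the function notification_times and its parameters delay and repeat are hypothetical names, not Big Sister settings, and all times are plain numbers in minutes.

```python
from typing import List


def notification_times(problem_start: int, problem_end: int,
                       delay: int, repeat: int) -> List[int]:
    """Return the times at which an alarm or reminder would be sent.

    The first alarm is sent `delay` minutes after the problem started, and
    only if the problem still exists at that time. Reminders follow every
    `repeat` minutes until the problem ends. A `repeat` of 0 means that no
    reminders are sent.
    """
    times = []
    t = problem_start + delay
    while t < problem_end:
        times.append(t)
        if repeat <= 0:
            break
        t += repeat
    return times
```

With a delay of 5 and a repeat of 60, a problem lasting 3 minutes produces no alarm at all, while a problem lasting 130 minutes produces an alarm after 5 minutes and reminders after 65 and 125 minutes.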
The main disadvantage of the existing rule-based alarming configuration is that it is very hard to find a simple way to explain how it works. Unfortunately, you will just have to read the whole section and, hopefully, understand the configuration by the end.
An alarming rule in the bb_event_generator.cfg file always starts with a pattern, followed by a description of the actions to be taken if the pattern matches.
Whenever a status change is detected, the alarm generator goes through the config file and looks for matching patterns. Each variable associated with a matching pattern is then set as described. If multiple patterns match, the associated variables are set in order.
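Setting variables "in order" can be illustrated as follows. This is a sketch that assumes a later matching rule overrides a variable set by an earlier one; the function collect_variables, the glob patterns and the variable names delay and mail are placeholders chosen for the example and do not reproduce the real configuration syntax.

```python
from fnmatch import fnmatchcase
from typing import Dict, List, Tuple


def collect_variables(rules: List[Tuple[str, Dict[str, object]]],
                      item: str) -> Dict[str, object]:
    """Apply the assignments of every matching rule, in file order."""
    variables: Dict[str, object] = {}
    for pattern, assignments in rules:
        if fnmatchcase(item, pattern):
            # Assumed semantics: a later rule overwrites earlier values.
            variables.update(assignments)
    return variables


# Hypothetical rule set: a general rule followed by a more specific one.
example_rules = [
    ("*.*", {"delay": 5, "mail": "root"}),
    ("web*.http", {"delay": 0}),
]
```

Under this assumption a general rule placed first acts as a default, and more specific rules further down adjust individual variables.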
Every time a status change is noticed the alarm generator does two things:
go through the pending alarms and check if the status change has some effect on one of them
if the status change is not related to any of the pending alarms: go through the list of rules, select all the matching rules and raise an alarm depending on their descriptive part
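The two steps above can be sketched as a single function. What "has some effect" means in detail is not spelled out here, so the sketch assumes that a "green" status clears a pending alarm and that any other status just updates it. The function handle_status_change, the dictionary layout and the returned strings are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Dict, List


def handle_status_change(item: str, status: str,
                         pending: Dict[str, dict],
                         rules: List[dict]) -> str:
    """Process one status change and report what was done."""
    # Step 1: does the change affect an alarm that is already pending?
    if item in pending:
        if status == "green":
            del pending[item]          # assumed: recovery clears the alarm
            return "cleared"
        pending[item]["status"] = status
        return "updated"

    # Step 2: no pending alarm is affected, so consult the rules.
    if status == "green":
        return "ignored"
    matching = [r for r in rules if fnmatchcase(item, r["pattern"])]
    if not matching:
        return "ignored"
    pending[item] = {"status": status,
                     "actions": [r["action"] for r in matching]}
    return "raised"
```

Checking the pending alarms first is what keeps a persisting problem from raising a fresh alarm on every status report.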
Usually each line in the configuration file represents one rule. As in most Big Sister configuration files, empty lines and lines starting with a '#' character are treated as comments and are therefore simply ignored. A rule may span multiple lines: lines terminated with a '\' character are joined with the line that follows them.
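The line handling described above can be mimicked in a few lines of Python. This is a minimal sketch, not the parser Big Sister uses: the function read_rules is hypothetical, and it assumes that continuation lines are joined first and that comments and empty lines are dropped afterwards. The rule text itself is not interpreted.

```python
from typing import List


def read_rules(text: str) -> List[str]:
    """Split configuration text into logical rule lines."""
    logical_lines = []
    pending = ""
    for raw in text.splitlines():
        line = raw.rstrip()
        if line.endswith("\\"):
            # Continuation: drop the backslash and keep collecting.
            pending += line[:-1]
            continue
        logical_lines.append(pending + line)
        pending = ""
    if pending:
        # The text ended with a continuation line.
        logical_lines.append(pending)

    rules = []
    for line in logical_lines:
        stripped = line.strip()
        # Empty lines and '#' lines are ignored.
        if not stripped or stripped.startswith("#"):
            continue
        rules.append(stripped)
    return rules
```

Given a text with a comment line, an empty line, a rule split over two physical lines and a second rule, read_rules returns two logical lines, one per rule.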