Bacula 1.29 User's Guide Chapter 15

What To Do When Bacula Crashes (Kaboom)

If you are running on a Linux system, and you have a set of working configuration files, it is very unlikely that Bacula will crash. As with all software, however, it may crash someday, particularly if you are running on another operating system or using a new or unusual feature.

This chapter explains what you should do if one of the three Bacula daemons (Director, File, Storage) crashes.

Traceback

Each of the three Bacula daemons has a built-in exception handler which, in case of an error, will attempt to produce a traceback. If successful, the traceback will be emailed to you.

For this to work, you need to ensure that a few things are set up correctly on your system:

  1. You must have an installed copy of gdb (the GNU debugger), and it must be on Bacula's path.
  2. The btraceback script installed with Bacula must be in the same directory as the daemon that dies, and it must be marked as executable.
  3. The btraceback file must specify the correct path to the btraceback.gdb script.
  4. You must have a mail program which is on Bacula's path.
If all the above conditions are met, the daemon that crashes will produce a traceback report and email it to you. If they are not met, you may be able to correct the problem by editing the btraceback file. In doing so, you can add a correct path to the gdb program, correct the path to the btraceback.gdb file, change the mail program or its path, or change your email address. The key line in the btraceback file is:
gdb -quiet -batch -x /home/kern/bacula/bin/btraceback.gdb \
     $1 $2 2>&1 | mail -s "Bacula traceback" your-address@xxx.com
Since each daemon has the same traceback code, a single btraceback file is sufficient if you are running more than one daemon on a machine.
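
The four conditions above can be checked before a crash ever happens. The following is only a minimal sketch and is not part of Bacula: the function name check_traceback_prereqs is made up for illustration, and it assumes that the gdb script is the argument following -x in btraceback, as in the key line shown above. It tests the path of the shell that runs it, so run it as the same user and with the same environment as the daemon.

```shell
#!/bin/sh
# Hypothetical helper, not shipped with Bacula.
# $1 = directory containing the daemon and the btraceback script.
# Prints a line for each problem found; returns 0 only if none is found.
check_traceback_prereqs() {
    dir=$1
    rc=0

    # Conditions 1 and 4: gdb and a mail program must be on the path.
    for tool in gdb mail; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "not on PATH: $tool"
            rc=1
        fi
    done

    # Condition 2: btraceback must sit next to the daemon and be executable.
    # Without it the gdb script path cannot be read, so stop here.
    if [ ! -x "$dir/btraceback" ]; then
        echo "missing or not executable: $dir/btraceback"
        return 1
    fi

    # Condition 3: the file named after -x in btraceback must exist.
    script=$(sed -n 's/.*-x  *\([^ ]*\).*/\1/p' "$dir/btraceback" | head -n 1)
    if [ -z "$script" ] || [ ! -f "$script" ]; then
        echo "gdb script not found: ${script:-<no -x argument>}"
        rc=1
    fi

    return $rc
}

# Example use (not executed here):
#   check_traceback_prereqs /home/kern/bacula/bin
```

If the function prints nothing, the conditions it knows how to test are satisfied. It cannot tell whether mail is actually delivered, so the manual test described below is still worth doing.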

Testing The Traceback

To test the traceback feature manually, start Bacula and then obtain the PID of the main daemon thread (there are multiple threads), for example with ps. The output below had to be split to fit on this page:
[kern@rufus kern]$ ps fax --columns 132 | grep bacula-dir
 2103 ?        S      0:00 /home/kern/bacula/k/src/dird/bacula-dir -c
                                       /home/kern/bacula/k/src/dird/dird.conf
 2104 ?        S      0:00  \_ /home/kern/bacula/k/src/dird/bacula-dir -c
                                       /home/kern/bacula/k/src/dird/dird.conf
 2106 ?        S      0:00      \_ /home/kern/bacula/k/src/dird/bacula-dir -c
                                       /home/kern/bacula/k/src/dird/dird.conf
 2105 ?        S      0:00      \_ /home/kern/bacula/k/src/dird/bacula-dir -c
                                       /home/kern/bacula/k/src/dird/dird.conf
The PID of the main thread in this case is 2103. Then, while Bacula is running, call the btraceback program, giving it the path to the Bacula executable and the PID. In this case, it is:
./btraceback /home/kern/bacula/k/src/dird/bacula-dir 2103
It should produce an email showing you the current state of the daemon (in this case the Director), and then exit leaving Bacula running as if nothing happened. If this is not the case, you will need to correct the problem by modifying the btraceback script.
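
Picking the main thread out of the ps listing by eye is easy enough, but it can also be sketched in a few lines of shell. The helper below is hypothetical (the name main_daemon_pid is an assumption, not a Bacula program). It expects lines of the form "PID PPID COMMAND" and prints the PID whose parent is not itself one of the listed processes, which is the entry at the top of the tree drawn by ps fax.

```shell
#!/bin/sh
# Hypothetical helper: reads "PID PPID COMMAND" lines on standard input and
# prints each PID whose parent is not itself in the list, i.e. the top of
# the process tree.
main_daemon_pid() {
    awk '{ pids[NR] = $1; parent[$1] = $2; listed[$1] = 1 }
         END {
             for (i = 1; i <= NR; i++)
                 if (!(parent[pids[i]] in listed))
                     print pids[i]
         }'
}

# Example use (not executed here):
#   ps -eo pid,ppid,comm | grep bacula-dir | main_daemon_pid
```

On a system where the threads do not appear as separate processes, the listing contains a single line and that PID is printed unchanged.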

Typical problems might be that gdb is not on the default path. Fix this by specifying the full path to it in the btraceback file. Another common problem is that the mail program doesn't work or is not on the default path. On some systems, it is preferable to use Mail rather than mail.
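
To find the full paths to write into btraceback, something like the following may help. It is a sketch only: first_available is a made-up name, and it simply reports the first of the programs you name that exists on the current path.

```shell
#!/bin/sh
# Hypothetical helper: print the full path of the first program in the
# argument list that is found on the current PATH.
first_available() {
    for tool in "$@"; do
        if found=$(command -v "$tool" 2>/dev/null); then
            echo "$found"
            return 0
        fi
    done
    return 1
}

# Example use (not executed here):
#   first_available gdb          # full path to write in place of "gdb"
#   first_available mail Mail    # whichever mail program exists
```

The paths printed are valid for the shell in which you run the function. Because the daemon's path may be shorter, writing the full paths into btraceback removes the dependency on it.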

Getting A Traceback On Other Systems

It should be possible to produce a similar traceback on systems other than Linux, either using gdb or some other debugger. Solaris with gdb installed works fine. On other systems, you will need to modify the btraceback program to invoke the correct debugger, and possibly correct the btraceback.gdb script to have appropriate commands for your debugger. If anyone succeeds in making this work with another debugger, please send us a copy of what you modified.

Manually Running Bacula Under The Debugger

If for some reason you cannot get the automatic traceback, or if you want to interactively examine the variable contents after a crash, you can run Bacula under the debugger. Assuming you want to run the Storage daemon under the debugger, you would do the following:
  1. Start the Director and the File daemon. If the Storage daemon also starts, you will need to find its PID as shown above (ps fax | grep bacula-sd) and kill it with a command like the following:
          kill -15 PID
          
    where you replace PID by the actual value.
  2. At this point, the Director and the File daemon should be running but the Storage daemon should not.
  3. cd to the directory containing the Storage daemon.
  4. Start the Storage daemon under the debugger:
        gdb ./bacula-sd
        
  5. Run the Storage daemon:
         run -s -f -c ./bacula-sd.conf
         
    You may replace the ./bacula-sd.conf with the full path to the Storage daemon's configuration file.
  6. At this point, Bacula will be fully operational.
  7. In another shell command window, start the Console program and do what is necessary to cause Bacula to die.
  8. When Bacula crashes, the gdb shell window will become active and gdb will show you the error that occurred.
  9. To get a general traceback of all threads, issue the following command:
           thread apply all bt
           
    After that you can issue any debugging command.
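
If you only need the traceback and do not want to examine variables interactively, steps 4, 5 and 9 can be combined by giving gdb a command file, in the same way btraceback does with -batch and -x. The following is a sketch under that assumption; write_sd_gdb_commands and the file name sd.gdb are invented for the example.

```shell
#!/bin/sh
# Hypothetical helper: write a gdb command file that runs the Storage daemon
# with the options from step 5 and then asks for a traceback of all threads
# once the daemon stops.
# $1 = command file to create, $2 = Storage daemon configuration file.
write_sd_gdb_commands() {
    cat > "$1" <<EOF
run -s -f -c $2
thread apply all bt
EOF
}

# Example use (not executed here):
#   write_sd_gdb_commands sd.gdb ./bacula-sd.conf
#   gdb -batch -x sd.gdb ./bacula-sd
```

In batch mode gdb exits after the last command, so you lose the chance to issue further debugging commands. Use the interactive steps above when you need them.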

Rejected Volumes After a Crash

Bacula keeps the number of files on each Volume in its Catalog database so that before appending to a tape, it can verify that the number of files is correct, and thus prevent overwriting valid data. If the Director or the Storage daemon crashes before the job has completed, the tape will contain one more file than is noted in the Catalog, and the next time you attempt to use the same Volume, Bacula will reject it due to a mismatch between the physical tape and the catalog.

The easiest solution to this problem is to label a new tape and start fresh. If you wish to continue appending to the current tape, you can do so by using the update command in the console program to change the Volume Files entry in the catalog. A typical sequence of events would go like the following:

- Bacula crashes
- You restart Bacula
Bacula then prints:
17-Jan-2003 16:45 rufus-dir: Start Backup JobId 13, Job=kernsave.2003-01-17_16.45.46
17-Jan-2003 16:45 rufus-sd: Volume test01 previously written, moving to end of data.
17-Jan-2003 16:46 rufus-sd: kernsave.2003-01-17_16.45.46 Error: I canot write on this volume because:
The number of files mismatch! Volume=11 Catalog=10
17-Jan-2003 16:46 rufus-sd: Job kernsave.2003-01-17_16.45.46 waiting. Cannot find any appendable volumes.
Please use the "label"  command to create a new Volume for:
    Storage:      SDT-10000
    Media type:   DDS-4
    Pool:         Default
To get out of this situation and use the same tape, you enter the update command in the console program and answer its prompts as follows:
Update choice:
     1: Volume parameters
     2: Pool from resource
Choose catalog item to update (1-2): 1
Defined Pools:
     1: Default
     2: File
Select the Pool (1-2):
+---------+------------+-----------+-----------+-----------+---------------------+--------------+---------+------+
| MediaId | VolumeName | MediaType | VolStatus | VolBytes  | LastWritten         | VolRetention | Recycle | Slot |
+---------+------------+-----------+-----------+-----------+---------------------+--------------+---------+------+
| 1       | test01     | DDS-4     | Error     | 352427156 | 2003-01-17 16:46:19 | 31536000     | 1       | 0    |
+---------+------------+-----------+-----------+-----------+---------------------+--------------+---------+------+
Enter MediaId or Volume name: 1
First, you chose to update the Volume parameters by entering a 1. In the volume listing that follows, notice how the VolStatus is Error. We will correct that after changing the Volume Files. Continuing, you respond 1 to select the Volume by its MediaId:
Updating Volume "test01"
Parameters to modify:
     1: Volume Status
     2: Volume Retention Period
     3: Volume Use Duration
     4: Maximum Volume Jobs
     5: Maximum Volume Files
     6: Maximum Volume Bytes
     7: Recycle Flag
     8: Slot
     9: Volume Files
    10: Done
Select parameter to modify (1-10): 9
Warning changing Volume Files can result
in loss of data on your Volume

Current Volume Files is: 10
Enter new number of Files for Volume: 11
New Volume Files is: 11
Updating Volume "test01"
Parameters to modify:
     1: Volume Status
     2: Volume Retention Period
     3: Volume Use Duration
     4: Maximum Volume Jobs
     5: Maximum Volume Files
     6: Maximum Volume Bytes
     7: Recycle Flag
     8: Slot
     9: Volume Files
    10: Done
Select parameter to modify (1-10): 1
Here, you selected 9 to update the Volume Files and changed the value from 10 to 11. You now answer 1 to change the Volume Status.
Current Volume status is: Error
Possible Values are:
     1: Append
     2: Archive
     3: Disabled
     4: Full
     5: Used
     6: Read-Only
Choose new Volume Status (1-6): 1
New Volume status is: Append
Updating Volume "test01"
Parameters to modify:
     1: Volume Status
     2: Volume Retention Period
     3: Volume Use Duration
     4: Maximum Volume Jobs
     5: Maximum Volume Files
     6: Maximum Volume Bytes
     7: Recycle Flag
     8: Slot
     9: Volume Files
    10: Done
Select parameter to modify (1-10): 10
Selection done.
At this point, you have changed the Volume Files from 10 to 11 to account for the last file being written but not updated in the database, and you changed the Volume Status back to Append.

This was a lot of words to describe something quite simple.

The Volume Files option exists only in version 1.29 and later, and you should use it with care. Generally, if you set the value to the number of files that Bacula reported on the tape (Volume=11 in the message above), you will be OK, especially if that value is one more than what is in the catalog.
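
Before changing the catalog, it can be reassuring to confirm that the mismatch really is the off-by-one case described here. The sketch below assumes the message format shown in the job output above ("Volume=11 Catalog=10"); the function name volume_files_delta is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read Storage daemon messages on standard input and,
# for each line containing "Volume=N Catalog=M", print N minus M.
volume_files_delta() {
    sed -n 's/.*Volume=\([0-9][0-9]*\) Catalog=\([0-9][0-9]*\).*/\1 \2/p' |
    while read -r volume catalog; do
        echo $((volume - catalog))
    done
}

# Example use (not executed here):
#   echo 'The number of files mismatch! Volume=11 Catalog=10' | volume_files_delta
```

A result of 1 corresponds to the situation above, where the tape holds one more file than the catalog records. Any other value deserves a closer look before you update the Volume Files.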


Bacula 1.29 User's Guide
The Network Backup Solution
Copyright © 2000-2003
Kern Sibbald and John Walker