Disaster Recovery Using Bacula

General

When disaster strikes, you must have a plan, and you must have prepared in advance; otherwise, the work of recovering your system and your files will be considerably greater. For example, if you have not previously saved the partitioning information for your hard disk, how can you properly rebuild it if the disk must be replaced?

Unfortunately, many of the steps one must take before and immediately after a disaster are very operating system dependent. As a consequence, this chapter discusses in detail disaster recovery (also called Bare Metal Recovery) for Linux and Solaris. For Solaris, the procedures are still quite manual. For FreeBSD, the same procedures may be usable, but they have not yet been developed. For Win32, no luck: apparently there is no "emergency boot" disk that gives access to the full system API without interference.

Important Considerations

Here are a few important considerations concerning disaster recovery that you should take into account before a disaster strikes.
Bare Metal Recovery on Linux

The remainder of this section concerns recovering a Linux computer, and parts of it relate to the Red Hat version of Linux. The Solaris procedures can be found below.

A so-called "Bare Metal" recovery is one where you start with an empty hard disk and restore your machine. There are also cases where you may lose a file or a directory and want it restored; please see the previous chapter for details on those cases. Bare Metal Recovery assumes that you have the following four items for your system:
Restrictions

In addition to the above assumptions, the following conditions or restrictions apply:
Scripts

The scripts discussed below can be found in the rescue/linux subdirectory of the Bacula source code.

Preparation for a Bare Metal Recovery

There are two things you should do immediately on all (Linux) systems for which you wish to do a bare metal recovery:
Creating an Emergency Boot Disk
Here you have several choices:
This disk can then be booted, and you will be in an environment with a number of important tools available. Compared to tomsrtbt, this environment has some disadvantages:

- You must enter linux rescue at the boot prompt, or the boot will fail without a hard disk.
- It requires a disk boot image or a CDROM to be mounted. If the CDROM is released, you will lose a large number of the tools.

Red Hat Installation Disk

On Red Hat Linux specifically, you can create an Installation floppy, which can also be used as an emergency boot disk. The advantage of this method is that it works in conjunction with the installation CDROM. During the first part of restoring the system, you therefore have a much larger number of tools available on the CDROM. This can be extremely useful if you are not sure what really happened and you need to examine your system in detail.

To make a Red Hat Linux installation disk, mount the Installation CDROM (on /mnt/cdrom) and then enter:

    cd /mnt/cdrom/images
    dd if=boot.img of=/dev/fd0 bs=1440k

You now have either an emergency boot disk or an installation floppy. With either one, you can reboot your system in the absence of your hard disk or with a damaged hard disk. Compared to the tomsrtbt disk, this method has the same disadvantages as those mentioned above for the Emergency Boot Disk.

Creating a Bacula Rescue Disk

Simply having a boot disk is not sufficient to re-create things as they were. To solve this problem, we will create a Bacula Rescue disk. Everything that will be written to this disk will first be placed into the <bacula-src>/rescue/linux directory.

The first step is done while your system is up and running normally. Run a Bacula script called getdiskinfo to capture certain important information about your hard disk configuration (partitioning, formatting, mount points, ...).
getdiskinfo will also use the information it finds to create a number of scripts for use in an emergency. These scripts repartition your disks, reformat them, and restore a statically linked version of the Bacula File daemon. Your disk can then be restored from within a minimal boot environment. Run getdiskinfo as follows:

    su
    cd <bacula-src>/rescue/linux
    ./getdiskinfo

getdiskinfo works for either IDE or SCSI drives and recognizes both ext2 and ext3 file systems. If you wish to restore other file systems, you will need to modify the code. The script can be run multiple times. However, it only needs to be run once unless you change your hard disk configuration. Assuming you have a single hard disk on device /dev/hda, getdiskinfo will create the following files:
    df.bsi
    disks.bsi
    fstab.bsi
    ifconfig.bsi
    mount.bsi
    mount.ext2.bsi
    mount.ext3.bsi
    mtab.bsi
    route.bsi
    sfdisk.disks.bsi
    sfdisk.hda.bsi
    sfdisk.make.hda.bsi

Each of these files contains some important piece of information about your hard disk setup or your network; some of it is redundant. Normally you will not need this information, but it will be written to the Bacula Rescue disk just in case. Since it is normally not used, we leave it to you to examine those files at your leisure.
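Before building the floppy, you may want to confirm that getdiskinfo really produced every file listed above. The function below is a minimal sketch, not part of the Bacula scripts. check_diskinfo is a hypothetical name. The sketch assumes the files sit in a diskinfo subdirectory, as the make_rescue_disk description suggests. It also assumes the disk-specific names follow the hda/sda pattern shown above.

```shell
#!/bin/sh
# check_diskinfo DIR DISK   (hypothetical helper, not shipped with Bacula)
# Lists any of the getdiskinfo output files that are missing from DIR.
# DISK is the short device name used in the file names, e.g. hda or sda.
# Only existence is checked, not size, in case some listing is empty.
# Returns 0 if every file is present, 1 otherwise.
check_diskinfo() {
    dir=$1
    disk=$2
    missing=0
    for f in df.bsi disks.bsi fstab.bsi ifconfig.bsi mount.bsi \
             mount.ext2.bsi mount.ext3.bsi mtab.bsi route.bsi \
             sfdisk.disks.bsi "sfdisk.$disk.bsi" "sfdisk.make.$disk.bsi"; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $dir/$f"
            missing=1
        fi
    done
    return $missing
}
```

For example, after loading the function, run check_diskinfo diskinfo hda in <bacula-src>/rescue/linux. It should print nothing and return 0. For a SCSI disk, pass sda instead.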
Building a Static File Daemon

The second of the three steps in creating your Bacula Rescue disk is to build a static version of the File daemon. You can configure Bacula as follows, or let the make_rescue_disk script described below build it for you:

    cd <bacula-src>
    ./configure <normal-options> --enable-static-fd
    make
    cd src/filed
    strip bacula-fd
    cp bacula-fd ../../rescue/linux
    cp bacula-fd.conf ../../rescue/linux

Finally, open the copy of bacula-fd.conf in <bacula-src>/rescue/linux. Make sure that WorkingDirectory, PIDDirectory, and SubSysDirectory all point to reasonable locations on a stripped-down system. If you are using tomsrtbt, also replace machine names with IP addresses, since there is no resolver running. With the Linux Rescue disk, network address mapping seems to work. Remember that when this version of the Bacula File daemon runs, your file system will not yet have been restored. In my bacula-fd.conf, I use /var/working.

Writing the Bacula Rescue Floppy

When you have everything you need (output of getdiskinfo, Bacula File daemon, ...), put a blank floppy into your floppy disk drive and create your rescue floppy by entering:

    su
    ./make_rescue_disk

This script reformats the floppy. It then writes to it everything in the current directory and all files in the diskinfo directory.

The script's command line options add further steps:

- One option builds a static version of the Bacula File daemon and copies it, along with the configuration file, to the disk.
- Another option writes a compressed tar file to the floppy. The tar file contains all the files whose names are in backup.etc.list.

The list as provided names files in /etc that you might need in a disaster situation. It is not required, but you may find it useful in some cases, such as a complex network setup.
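The bacula-fd.conf edits described above can be scripted so they are not forgotten. The following is a sketch that rests on several assumptions:

- fix_fd_conf is a hypothetical helper, not part of Bacula.
- The directives are spelled as in a typical bacula-fd.conf: WorkingDirectory, Pid Directory, and SubSys Directory, with or without the inner space.
- A machine name, if present at all, is the whole value of a directive whose name ends in Address.

Check the result by hand, since your own file may differ.

```shell
#!/bin/sh
# fix_fd_conf CONF WORKDIR [HOST IP]   (hypothetical helper)
# Rewrites CONF in place, keeping the original as CONF.orig:
#  - WorkingDirectory, Pid Directory and SubSys Directory are set to WORKDIR;
#  - if HOST and IP are given, a line "...Address = HOST" gets IP instead.
# Other spellings of these directives are left untouched.
fix_fd_conf() {
    conf=$1 workdir=$2 host=${3:-} ip=${4:-}
    cp "$conf" "$conf.orig" || return 1
    # Escape dots so that a name such as dir.example.com is matched literally.
    host_re=$(printf '%s' "$host" | sed 's/\./\\./g')
    sed -e "s|^\([[:space:]]*[Ww]orking *[Dd]irectory[[:space:]]*=\).*|\1 $workdir|" \
        -e "s|^\([[:space:]]*[Pp][Ii][Dd] *[Dd]irectory[[:space:]]*=\).*|\1 $workdir|" \
        -e "s|^\([[:space:]]*[Ss]ub[Ss]ys *[Dd]irectory[[:space:]]*=\).*|\1 $workdir|" \
        "$conf.orig" > "$conf" || return 1
    if [ -n "$host" ] && [ -n "$ip" ]; then
        # Replace the whole value of any directive ending in "Address".
        sed "s|^\([[:space:]]*[A-Za-z ]*[Aa]ddress[[:space:]]*=[[:space:]]*\)$host_re[[:space:]]*\$|\1$ip|" \
            "$conf" > "$conf.tmp" && mv "$conf.tmp" "$conf"
    fi
}
```

For example, run fix_fd_conf bacula-fd.conf /working dir.example.com 192.0.2.10 in <bacula-src>/rescue/linux. The host name and address are placeholders, and the working directory is your choice. Keeping the .orig copy makes it easy to diff the change before writing the floppy.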
Options for make_rescue_disk

The following command line options are available for the make_rescue_disk script:

    Usage: make_rescue_disk
      -h, --help             print this message
      --make-static-bacula   make static File daemon and add to diskette
      --copy-static-bacula   copy static File daemon to diskette
      --copy-etc-files       copy files in etc list to diskette

Briefly the options are:
You now have both a system boot floppy and a Bacula Rescue floppy. Assuming you also have a full backup of your system made by Bacula, you are ready to handle nearly any kind of emergency restoration situation.

Restoring Your System

Now, let's assume that your hard disk has just died and that you have replaced it with a new identical drive. In addition, we assume that you have:
You will take the following steps to get your system back up and running:
Boot with your Emergency Floppy

First, boot with your emergency floppy. If you use the Installation floppy described above, you will get this boot prompt:

    boot:

At the prompt, enter linux rescue. If you are booting from tomsrtbt, simply enter the default responses. When your machine finishes booting, you should be at the command prompt. With the Linux emergency disk only, your hard disk may already be mounted on /mnt/sysimage. To see what is actually mounted, use:

    df

Mount your Bacula Rescue Floppy

Make sure that the mount point /mnt/floppy exists. If not, enter:

    mkdir -p /mnt/floppy

Then mount your Bacula Rescue disk and cd to it with:

    mount /dev/fd0 /mnt/floppy
    cd /mnt/floppy

To simplify running the scripts, put the current directory on your path by entering:

    PATH=$PATH:.

Start the Network

At this point, you should bring up your network. Normally this is quite simple and requires just a few commands. To simplify your task, we have created a script that should work in most cases. Run it by typing:

    ./start_network

You can test it by pinging another machine, or by pinging your broken machine from another machine. Do not proceed until your network is up.

Unmount Your Hard Disk (if mounted)

If your disk was damaged, or if you are using tomsrtbt, your hard disk will normally not be mounted. If it is mounted, you must unmount it before repartitioning so that it is not in use. Enter df to see what is mounted, then enter the correct commands to unmount the disks. For example:

    umount /mnt/sysimage/boot
    umount /mnt/sysimage/usr
    umount /mnt/sysimage/proc
    umount /mnt/sysimage

Unmount (umount) each sysimage partition explicitly, leaving the root for last. Then run df again to be sure you successfully unmounted all the sysimage partitions. This is necessary because sfdisk will refuse to partition a disk that is currently mounted. As mentioned, this should never be necessary with tomsrtbt.
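When several sysimage partitions are mounted, the order matters. Each partition must be unmounted before the one it is mounted on, with the root last. A minimal sketch of doing this automatically follows. umount_order and umount_all_under are hypothetical names, and the code assumes /proc/mounts is available in the rescue environment. Sorting the mount points in reverse byte order is enough, because a child such as /mnt/sysimage/boot always sorts after its parent /mnt/sysimage. The sketch uses only cut, grep and sort, all of which tomsrtbt includes.

```shell
#!/bin/sh
# umount_order PREFIX   (hypothetical helper)
# Reads mount points, one per line, on stdin and prints those equal to or
# below PREFIX, children before parents. PREFIX is used as a grep pattern,
# so it should not contain regular expression characters such as '.'.
umount_order() {
    prefix=$1
    grep -e "^$prefix\$" -e "^$prefix/" | LC_ALL=C sort -r
}

# umount_all_under PREFIX   (hypothetical helper)
# Unmounts everything mounted at or below PREFIX, as listed in the second
# field of /proc/mounts, and stops at the first failure.
umount_all_under() {
    for mp in $(cut -d' ' -f2 /proc/mounts | umount_order "$1"); do
        echo "umount $mp"
        umount "$mp" || return 1
    done
}
```

Running umount_all_under /mnt/sysimage followed by df should leave no sysimage entries. If an umount fails, the function stops so you can find out which process is still using that partition.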
Partition Your Hard Disk(s)

If you are using tomsrtbt, you will need to do the following steps to get the correct sfdisk:

    rm -f sfdisk
    bzip2 -d sfdisk.bz2

Do not do the above steps if you are using a standard Linux boot disk. Then partition your hard disk by entering:

    ./partition.hda

If you have multiple disks, do the same for each of them. For SCSI disks, the repartition script will be named partition.sda. If the script complains that the disk is in use, go back and redo the df and umount commands until your hard disk is no longer mounted.

In many cases, a seriously damaged or newly installed hard disk will not be mounted automatically. If it is mounted, the emergency kernel found one or more possibly valid partitions on it.

If for some reason this procedure does not work, you can use the information in partition.hda to re-partition your disks by hand using fdisk.

Format Your Hard Disk(s)

After partitioning your disk, you must format it appropriately. The formatting script will put back swap partitions, normal Unix partitions (ext2), and journaled partitions (ext3). Do so by entering, for each disk:

    ./format.hda

The format script will ask you whether you want a block check done. We recommend answering yes, but be aware that for very large disks this can take hours.

Mount the Newly Formatted Disks

Once the disks are partitioned and formatted, you can remount them with the mount_drives script. All your drives must be mounted for Bacula to be able to access them. Run the script as follows:

    ./mount_drives
    df

The df output will tell you whether the drives are mounted. If they are not, run the script again. Creating the mount points and doing the mounts in the proper order is not always easy to get right. Repeating ./mount_drives does no harm and will most likely work the second time. If it still fails, correct the mounts by hand before continuing.
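The advice to re-run ./mount_drives until df shows the drives can be automated with a small retry loop. This is a sketch, not part of the rescue scripts. missing_mounts and mount_with_retry are hypothetical names, and you must supply the list of expected mount points yourself, for example from fstab.bsi. The mount table is assumed to be /proc/mounts unless the MOUNTS variable names another file.

```shell
#!/bin/sh
# missing_mounts MOUNTPOINT...   (hypothetical helper)
# Prints each given mount point that does not appear as the second field of
# the mount table, and returns 1 if any are missing.
missing_mounts() {
    table=${MOUNTS:-/proc/mounts}
    rc=0
    for mp in "$@"; do
        if ! cut -d' ' -f2 "$table" | grep -qxF -- "$mp"; then
            echo "not mounted: $mp"
            rc=1
        fi
    done
    return $rc
}

# mount_with_retry MOUNTPOINT...   (hypothetical helper)
# Runs ./mount_drives from the rescue floppy up to three times, stopping as
# soon as every listed mount point is present.
mount_with_retry() {
    for attempt in 1 2 3; do
        ./mount_drives
        missing_mounts "$@" >/dev/null && return 0
        echo "attempt $attempt: some drives are still not mounted" >&2
    done
    missing_mounts "$@"
    return 1
}
```

For example, mount_with_retry /mnt/disk /mnt/disk/boot checks the root and /boot partitions; the list is only illustrative. Three attempts follows the observation above that a second run usually succeeds. If every attempt fails, the remaining gaps are printed so you can fix them by hand.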
Unmount the CDROM

Next, if you are using the Red Hat installation disk, unmount the CDROM drive by entering:

    umount /mnt/cdrom

This is not necessary if you are running tomsrtbt. When I do this, I find the drive is always busy, and I haven't figured out how to unmount it (Linux boot only).

Restore and Start the File Daemon

Now, change (cd) to a directory where you want to put the image of the Bacula File daemon. I use the root directory of my hard disk (mounted as /mnt/disk) because it is easy. Then install Bacula into the current directory by running the restore_bacula script from the floppy drive. For example:

    cd /mnt/disk
    mkdir -p /mnt/disk/working
    /mnt/floppy/restore_bacula
    ls -l

Make sure bacula-fd and bacula-fd.conf are both there. Edit the Bacula configuration file, and create the working/pid/subsys directory if you haven't already done so above. Then start Bacula by entering:

    chroot /mnt/disk /bacula-fd

The above command starts the Bacula File daemon with the proper root disk location (i.e. /mnt/disk). If Bacula does not start, correct the problem and start it again. You can check whether it is running by entering:

    ps fax

You can kill Bacula by entering:

    kill -TERM <pid>

where <pid> is the first number on the first line of the ps fax output that mentions bacula-fd. Now you should be able to check the status from another computer with Bacula installed. Enter the following into the Console program:

    status client=xxxx

where xxxx is the name of the client you are restoring.

One common problem is that your bacula-fd.conf may contain machine names that cannot be resolved on this stripped-down system, because it is not running DNS. In that case, be prepared to edit bacula-fd.conf and replace the name of the Director's machine with its IP address. Better yet, do this before building the Bacula rescue disk.
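The PID lookup described above can be written as a short filter over ps fax output. This is a hedged sketch: fd_pid and stop_fd are hypothetical names. It assumes the ps fax column layout with the PID first, as printed by the ps found on most Linux systems. Where pidof is available (tomsrtbt includes it), pidof bacula-fd is a simpler alternative.

```shell
#!/bin/sh
# fd_pid   (hypothetical helper)
# Reads "ps fax" output on stdin and prints the leading number (the PID) of
# the first line that mentions bacula-fd. Lines containing "grep" are skipped
# so the filter does not report its own grep when fed live ps output.
fd_pid() {
    grep 'bacula-fd' | grep -v 'grep' | head -n 1 |
        sed 's/^[[:space:]]*\([0-9][0-9]*\).*/\1/'
}

# stop_fd   (hypothetical helper)
# Sends SIGTERM to the running File daemon, like kill -TERM <pid>.
stop_fd() {
    pid=$(ps fax | fd_pid)
    if [ -z "$pid" ]; then
        echo "bacula-fd does not appear to be running" >&2
        return 1
    fi
    kill -TERM "$pid"
}
```

Keeping the parsing in fd_pid separate from the kill makes it easy to check what would be killed: ps fax | fd_pid just prints the PID.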
Restore Your Files

On the computer that is running the Director, run a restore command and select the files to be restored (normally everything). Before starting the restore, there is one final change you must make. Just before running the job, use the mod option, select Where, and set it to the root directory:

    /

Then run the restore.

You might be tempted to skip chroot, run Bacula directly, and use Where to specify a destination of /mnt/disk. This is possible. However, the current version of Bacula always restores files to the new location. Any soft links specified with absolute paths will therefore end up with /mnt/disk prefixed to them. In general this is not fatal to getting your system running, but you will have to fix these links if you do not use chroot.

Final Step

At this point, the restore should have finished with no errors, and all your files will be restored. One last task remains: writing a new boot sector so that your machine will boot. For lilo, enter the following command:

    run_lilo

If you are using grub instead of lilo, enter:

    run_grub

I've had quite a number of problems with grub, because it is rather complicated and not designed to install easily under a simplified system. If you get errors, or end up unexpectedly in a chroot shell, exit back to the normal shell. Then type in the appropriate commands from the run_grub script by hand until grub installs.

Reboot

Reboot your machine by entering exit until you get to the main prompt, then enter ctrl-d. If everything went well, you should now be back up and running. If not, re-insert the emergency boot floppy, boot, and figure out what is wrong. At this point, you will probably want to remove the temporary copy of Bacula that you installed.
Do so with:

    rm -f /bacula-fd /bacula-fd.conf
    rm -rf /working

Problems or Bugs

Every flavor and every release of Linux is different, so the scripts are likely to have some small difficulties. Be prepared to edit them in a minimal environment; a rudimentary knowledge of vi is very useful. Also, these scripts do not do everything. For example, you will need to reformat Windows partitions by hand.

Getting the boot loader back can be a problem if you are using grub because it is so complicated. If all else fails, reboot from your floppy but use the restored disk image as the root file system, then reinstall grub. Looking at the run_grub script can help. By contrast, lilo is a piece of cake.

Bugs

I performed the bare metal recovery using the Red Hat emergency boot disk (actually the installation boot disk). In each case, I was never able to release the CDROM, and when the system came up, /mnt/cdrom was soft linked to /mnt/disk/dev/hdd, which is not correct. I fixed this each time by deleting the link and remaking the directory with mkdir -p /mnt/cdrom.

tomsrtbt

This is a single floppy (1.722 Meg) that really has A LOT of software.
For example, by default (version 2.0.103) you get:

    AHA152X AHA1542 AIC7XXX BUSLOGIC DAC960 DEC_ELCP(TULIP) EATA EEXPRESS/PRO/PRO100 EL2 EL3 EXT2 EXT3 FAT FD IDE-CD/DISK/TAPE IMM INITRD ISO9660 JOLIET LOOP MATH_EMULATION MINIX MSDOS NCR53C8XX NE2000 NFS NTFS PARPORT PCINE2K PCNET32 PLIP PPA RTL8139 SD SERIAL/_CONSOLE SLIP SMC_ULTRA SR ST VFAT VID_SELECT VORTEX WD80x3 .exrc 3c589_cs agetty ash badblocks basename boot.b buildit.s busybox bz2bzImage bzip2 cardmgr cardmgr.pid cat chain.b chattr chgrp chmod chown chroot clear clone.s cmp common config cp cpio cs cut date dd dd-lfs debugfs ddate df dhcpcd-- dirname dmesg domainname ds du dumpe2fs e2fsck echo egrep elvis ex false fdflush fdformat fdisk filesize find findsuper fmt fstab grep group gunzip gzip halt head hexdump hexedit host.conf hostname hosts httpd i82365 ifconfig ile init inittab insmod install.s issue kernel key.lst kill killall killall5 ld ld-linux length less libc libcom_err libe2p libext2fs libtermcap libuuid lilo lilo.conf ln loadkmap login ls lsattr lsmod lua luasocket man map md5sum miterm mkdir mkdosfs mke2fs mkfifo mkfs.minix mknod mkswap more more.help mount mt mtab mv nc necho network networks nmclan_cs nslookup passwd pax pcmcia_core pcnet_cs pidof ping poweroff printf profile protocols ps pwd rc.0 rc.S rc.custom rc.custom.gz rc.pcmcia reboot rescuept reset resolv.conf rm rmdir rmmod route rsh rshd script sed serial serial_cs services setserial settings.s sh shared slattach sleep sln sort split stab strings swapoff swapon sync tail tar tcic tee telnet telnetd termcap test tomshexd tomsrtbt.FAQ touch traceroute true tune2fs umount undeb-- unpack.s unrpm-- update utmp vi vi.help view watch wc wget which xargs xirc2ps_cs yecho yes zcat

In addition, at Tom's Web Site, you can find a lot of additional kernel drivers and other software (such as sfdisk, which is used by Bacula). Building his floppy is a piece of cake.
Simply download his .tar.gz file, then:

    - detar the .tar.gz archive
    - become root
    - cd to the tomsrtbt-<version> directory
    - load a blank floppy with no bad sectors
    - ./install.s

Solaris Bare Metal Recovery

The same basic techniques as described above apply to Solaris:
Preparing Solaris Before a Disaster

As mentioned above, you should prepare the information needed for recovery before a disaster strikes. To do so, enter the following in the rescue/solaris subdirectory:

    su
    ./getdiskinfo
    ./make_rescue_disk

As in the Linux case described above, the getdiskinfo script creates a subdirectory diskinfo containing the output from several system utilities. It also stores the output of the SysAudit program described in Curtis Preston's book, in the file diskinfo/sysaudit.bsi. This file contains the disk partitioning information you need to repartition and format your hard disk by hand, following the procedures in the "Unix Backup & Recovery" book. The getdiskinfo script also creates a start_network script.

Once you have your disks repartitioned and formatted, do the following:
Recovering a Server

Above, we considered how to recover a client machine where a valid Bacula server was running on another machine. However, what happens if your server goes down and you no longer have a running Director, Catalog, or Storage daemon? There are several solutions:
The second suggestion is probably a much simpler solution, and one I have done myself. To do so, you might want to consider the following steps:
Bugs and Other Considerations

Directory Modification and Access Times are Modified

When Bacula restores a directory, it must first create the directory and then populate it with its files and subdirectories. Creating those files and subdirectories updates the modification and access times of the directory itself. As a consequence, the modification and access times of all directories will be set to the time of the restore.

This could be "corrected" by saving a list of all directories created during the restore. Once all files are restored, Bacula would visit each of those directories and reset its modification and access times. This could fail due to an out-of-memory condition; remember that during a bare metal recovery, there is generally no swap file active.

I'm not too worried about this, and will probably provide restoration of exact directory modification times in a future release. If anyone feels this is more important than I do, please let me know.

Strange Bootstrap Files

If you look closely at the bootstrap file that is produced and used for the restore (I sure do), you will probably notice that the FileIndex item does not include all the files saved to the tape. In some instances the backup contains duplicates, especially in the case of an Incremental save. In such circumstances, Bacula restores only the last of multiple copies of a file or directory.

Additional Resources

Many thanks to Charles Curley, who wrote the Linux Complete Backup and Recovery HOWTO for The Linux Documentation Project. This is an excellent document on how to do Bare Metal Recovery on Linux systems. It was this document that made me realize that Bacula could do the same thing.

You can find quite a few additional resources, both commercial and free, at Storage Mountain, formerly known as Backup Central. Finally, the O'Reilly book "Unix Backup & Recovery" by W. Curtis Preston covers virtually every backup and recovery topic, including bare metal recovery for a large range of Unix systems.