|Red Hat Enterprise Linux 4: Introduction to System Administration|
|Prev||Chapter 8. Planning for Disaster||Next|
Backups have two major purposes:
To permit restoration of individual files
To permit wholesale restoration of entire file systems
The first purpose is the basis for the typical file restoration request: a user accidentally deletes a file and asks that it be restored from the latest backup. The exact circumstances may vary somewhat, but this is the most common day-to-day use for backups.
The second situation is a system administrator's worst nightmare: for whatever reason, the system administrator is staring at hardware that used to be a productive part of the data center. Now, it is little more than a lifeless chunk of steel and silicon. The thing that is missing is all the software and data you and your users have assembled over the years. Supposedly everything has been backed up. The question is: has it?
And if it has, can you restore it?
Look at the kinds of data processed and stored by a typical computer system. Notice that some of the data hardly ever changes, and some of the data is constantly changing.
The pace at which data changes is crucial to the design of a backup procedure. There are two reasons for this:
A backup is nothing more than a snapshot of the data being backed up. It is a reflection of that data at a particular moment in time.
Data that changes infrequently can be backed up infrequently, while data that changes often must be backed up more frequently.
System administrators that have a good understanding of their systems, users, and applications should be able to quickly group the data on their systems into different categories. However, here are some examples to get you started:
This data normally only changes during upgrades, the installation of bug fixes, and any site-specific modifications.
Should you even bother with operating system backups? This is a question that many system administrators have pondered over the years. On the one hand, if the installation process is relatively easy, and if the application of bugfixes and customizations are well documented and easily reproducible, reinstalling the operating system may be a viable option.
On the other hand, if there is the least doubt that a fresh installation can completely recreate the original system environment, backing up the operating system is the best choice, even if the backups are performed much less frequently than the backups for production data. Occasional operating system backups also come in handy when only a few system files must be restored (for example, due to accidental file deletion).
This data changes whenever applications are installed, upgraded, or removed.
This data changes as frequently as the associated applications are run. Depending on the specific application and your organization, this could mean that changes take place second-by-second or once at the end of each fiscal year.
This data changes according to the usage patterns of your user community. In most organizations, this means that changes take place all the time.
Based on these categories (and any additional ones that are specific to your organization), you should have a pretty good idea concerning the nature of the backups that are needed to protect your data.
You should keep in mind that most backup software deals with data on a directory or file system level. In other words, your system's directory structure plays a part in how backups will be performed. This is another reason why it is always a good idea to carefully consider the best directory structure for a new system and group files and directories according to their anticipated usage.
In order to perform backups, it is first necessary to have the proper software. This software must not only be able to perform the basic task of making copies of bits onto backup media, it must also interface cleanly with your organization's personnel and business needs. Some of the features to consider when reviewing backup software include:
Schedules backups to run at the proper time
Manages the location, rotation, and usage of backup media
Works with operators (and/or robotic media changers) to ensure that the proper media is available
Assists operators in locating the media containing a specific backup of a given file
As you can see, a real-world backup solution entails much more than just scribbling bits onto your backup media.
Most system administrators at this point look at one of two solutions:
Purchase a commercially-developed solution
Create an in-house developed backup system from scratch (possibly integrating one or more open source technologies)
Each approach has its good and bad points. Given the complexity of the task, an in-house solution is not likely to handle some aspects (such as media management, or have comprehensive documentation and technical support) very well. However, for some organizations, this might not be a shortcoming.
A commercially-developed solution is more likely to be highly functional, but may also be overly-complex for the organization's present needs. That said, the complexity might make it possible to stick with one solution even as the organization grows.
As you can see, there is no clear-cut method for deciding on a backup system. The only guidance that can be offered is to ask you to consider these points:
Changing backup software is difficult; once implemented, you will be using the backup software for a long time. After all, you will have long-term archive backups that you must be able to read. Changing backup software means you must either keep the original software around (to access the archive backups), or you must convert your archive backups to be compatible with the new software.
Depending on the backup software, the effort involved in converting archive backups may be as straightforward (though time-consuming) as running the backups through an already-existing conversion program, or it may require reverse-engineering the backup format and writing custom software to perform the task.
The software must be 100% reliable — it must back up what it is supposed to, when it is supposed to.
When the time comes to restore any data — whether a single file or an entire file system — the backup software must be 100% reliable.
If you were to ask a person that was not familiar with computer backups, most would think that a backup was just an identical copy of all the data on the computer. In other words, if a backup was created Tuesday evening, and nothing changed on the computer all day Wednesday, the backup created Wednesday evening would be identical to the one created on Tuesday.
While it is possible to configure backups in this way, it is likely that you would not. To understand more about this, we must first understand the different types of backups that can be created. They are:
The type of backup that was discussed at the beginning of this section is known as a full backup. A full backup is a backup where every single file is written to the backup media. As noted above, if the data being backed up never changes, every full backup being created will be the same.
That similarity is due to the fact that a full backup does not check to see if a file has changed since the last backup; it blindly writes everything to the backup media whether it has been modified or not.
This is the reason why full backups are not done all the time — every file is written to the backup media. This means that a great deal of backup media is used even if nothing has changed. Backing up 100 gigabytes of data each night when maybe 10 megabytes worth of data has changed is not a sound approach; that is why incremental backups were created.
Unlike full backups, incremental backups first look to see whether a file's modification time is more recent than its last backup time. If it is not, the file has not been modified since the last backup and can be skipped this time. On the other hand, if the modification date is more recent than the last backup date, the file has been modified and should be backed up.
Incremental backups are used in conjunction with a regularly-occurring full backup (for example, a weekly full backup, with daily incrementals).
The primary advantage gained by using incremental backups is that the incremental backups run more quickly than full backups. The primary disadvantage to incremental backups is that restoring any given file may mean going through one or more incremental backups until the file is found. When restoring a complete file system, it is necessary to restore the last full backup and every subsequent incremental backup.
In an attempt to alleviate the need to go through every incremental backup, a slightly different approach was implemented. This is known as the differential backup.
Differential backups are similar to incremental backups in that both backup only modified files. However, differential backups are cumulative — in other words, with a differential backup, once a file has been modified it continues to be included in all subsequent differential backups (until the next, full backup, of course).
This means that each differential backup contains all the files modified since the last full backup, making it possible to perform a complete restoration with only the last full backup and the last differential backup.
Like the backup strategy used with incremental backups, differential backups normally follow the same approach: a single periodic full backup followed by more frequent differential backups.
The effect of using differential backups in this way is that the differential backups tend to grow a bit over time (assuming different files are modified over the time between full backups). This places differential backups somewhere between incremental backups and full backups in terms of backup media utilization and backup speed, while often providing faster single-file and complete restorations (due to fewer backups to search/restore).
Given these characteristics, differential backups are worth careful consideration.
We have been very careful to use the term "backup media" throughout the previous sections. There is a reason for that. Most experienced system administrators usually think about backups in terms of reading and writing tapes, but today there are other options.
At one time, tape devices were the only removable media devices that could reasonably be used for backup purposes. However, this has changed. In the following sections we look at the most popular backup media, and review their advantages as well as their disadvantages.
Tape was the first widely-used removable data storage medium. It has the benefits of low media cost and reasonably-good storage capacity. However, tape has some disadvantages — it is subject to wear, and data access on tape is sequential in nature.
These factors mean that it is necessary to keep track of tape usage (retiring tapes once they have reached the end of their useful life), and that searching for a specific file on tape can be a lengthy proposition.
On the other hand, tape is one of the most inexpensive mass storage media available, and it has a long history of reliability. This means that building a good-sized tape library need not consume a large part of your budget, and you can count on it being usable now and in the future.
In years past, disk drives would never have been used as a backup medium. However, storage prices have dropped to the point where, in some cases, using disk drives for backup storage does make sense.
The primary reason for using disk drives as a backup medium would be speed. There is no faster mass storage medium available. Speed can be a critical factor when your data center's backup window is short, and the amount of data to be backed up is large.
But disk storage is not the ideal backup medium, for a number of reasons:
Disk drives are not normally removable. One key factor to an effective backup strategy is to get the backups out of your data center and into off-site storage of some sort. A backup of your production database sitting on a disk drive two feet away from the database itself is not a backup; it is a copy. And copies are not very useful should the data center and its contents (including your copies) be damaged or destroyed by some unfortunate set of circumstances.
Disk drives are expensive (at least compared to other backup media). There may be situations where money truly is no object, but in all other circumstances, the expenses associated with using disk drives for backup mean that the number of backup copies must be kept low to keep the overall cost of backups low. Fewer backup copies mean less redundancy should a backup not be readable for some reason.
Disk drives are fragile. Even if you spend the extra money for removable disk drives, their fragility can be a problem. If you drop a disk drive, you have lost your backup. It is possible to purchase specialized cases that can reduce (but not entirely eliminate) this hazard, but that makes an already-expensive proposition even more so.
Disk drives are not archival media. Even assuming you are able to overcome all the other problems associated with performing backups onto disk drives, you should consider the following. Most organizations have various legal requirements for keeping records available for certain lengths of time. The chance of getting usable data from a 20-year-old tape is much greater than the chance of getting usable data from a 20-year-old disk drive. For instance, would you still have the hardware necessary to connect it to your system? Another thing to consider is that a disk drive is much more complex than a tape cartridge. When a 20-year-old motor spins a 20-year-old disk platter, causing 20-year-old read/write heads to fly over the platter surface, what are the chances that all these components will work flawlessly after sitting idle for 20 years?
Some data centers back up to disk drives and then, when the backups have been completed, the backups are written out to tape for archival purposes. This allows for the fastest possible backups during the backup window. Writing the backups to tape can then take place during the remainder of the business day; as long as the "taping" finishes before the next day's backups are done, time is not an issue.
All this said, there are still some instances where backing up to disk drives might make sense. In the next section we see how they can be combined with a network to form a viable (if expensive) backup solution.
By itself, a network cannot act as backup media. But combined with mass storage technologies, it can serve quite well. For instance, by combining a high-speed network link to a remote data center containing large amounts of disk storage, suddenly the disadvantages about backing up to disks mentioned earlier are no longer disadvantages.
By backing up over the network, the disk drives are already off-site, so there is no need for transporting fragile disk drives anywhere. With sufficient network bandwidth, the speed advantage you can get from backing up to disk drives is maintained.
However, this approach still does nothing to address the matter of archival storage (though the same "spin off to tape after the backup" approach mentioned earlier can be used). In addition, the costs of a remote data center with a high-speed link to the main data center make this solution extremely expensive. But for the types of organizations that need the kind of features this solution can provide, it is a cost they gladly pay.
Once the backups are complete, what happens then? The obvious answer is that the backups must be stored. However, what is not so obvious is exactly what should be stored — and where.
To answer these questions, we must first consider under what circumstances the backups are to be used. There are three main situations:
Small, ad-hoc restoration requests from users
Massive restorations to recover from a disaster
Archival storage unlikely to ever be used again
Unfortunately, there are irreconcilable differences between numbers 1 and 2. When a user accidentally deletes a file, they would like it back immediately. This implies that the backup media is no more than a few steps away from the system to which the data is to be restored.
In the case of a disaster that necessitates a complete restoration of one or more computers in your data center, if the disaster was physical in nature, whatever it was that destroyed your computers would also have destroyed the backups sitting a few steps away from the computers. This would be a very bad state of affairs.
Archival storage is less controversial; since the chances that it will ever be used for any purpose are rather low, if the backup media was located miles away from the data center there would be no real problem.
The approaches taken to resolve these differences vary according to the needs of the organization involved. One possible approach is to store several days worth of backups on-site; these backups are then taken to more secure off-site storage when newer daily backups are created.
Another approach would be to maintain two different pools of media:
A data center pool used strictly for ad-hoc restoration requests
An off-site pool used for off-site storage and disaster recovery
Of course, having two pools implies the need to run all backups twice or to make a copy of the backups. This can be done, but double backups can take too long, and copying requires multiple backup drives to process the copies (and probably a dedicated system to actually perform the copy).
The challenge for a system administrator is to strike a balance that adequately meets everyone's needs, while ensuring that the backups are available for the worst of situations.
While backups are a daily occurrence, restorations are normally a less frequent event. However, restorations are inevitable; they will be necessary, so it is best to be prepared.
The important thing to do is to look at the various restoration scenarios detailed throughout this section and determine ways to test your ability to actually carry them out. And keep in mind that the hardest one to test is also the most critical one.
The phrase "restoring from bare metal" is a system administrator's way of describing the process of restoring a complete system backup onto a computer with absolutely no data of any kind on it — no operating system, no applications, nothing.
Overall, there are two basic approaches to bare metal restorations:
Here the base operating system is installed just as if a brand-new computer were being initially set up. Once the operating system is in place and configured properly, the remaining disk drives can be partitioned and formatted, and all backups restored from backup media.
A system recovery disk is bootable media of some kind (often a CD-ROM) that contains a minimal system environment, able to perform most basic system administration tasks. The recovery environment contains the necessary utilities to partition and format disk drives, the device drivers necessary to access the backup device, and the software necessary to restore data from the backup media.
Some computers have the ability to create bootable backup tapes and to actually boot from them to start the restoration process. However, this capability is not available to all computers. Most notably, computers based on the PC architecture do not lend themselves to this approach.
Every type of backup should be tested on a periodic basis to make sure that data can be read from it. It is a fact that sometimes backups are performed that are, for one reason or another, unreadable. The unfortunate part in all this is that many times it is not realized until data has been lost and must be restored from backup.
The reasons for this can range from changes in tape drive head alignment, misconfigured backup software, and operator error. No matter what the cause, without periodic testing you cannot be sure that you are actually generating backups from which data can be restored at some later time.
We are using the term data in this section to describe anything that is processed via backup software. This includes operating system software, application software, as well as actual data. No matter what it is, as far as backup software is concerned, it is all data.