Rsync-Backup
Rsync-Backup is a simple utility for backing up network-connected hosts to hard
disk(s). Rsync-Backup was inspired by Rsnapshot, but shares no code with it.
The basic design is to mirror all the data on each host to the backup server
each day, using Rsync as the transport utility. Unchanged files are
hard-linked to the previous backup. (Rsync's --link-dest option creates the
hard links.) Rsync only sends files that have changed. Periodically,
all the files are verified to ensure consistency (using Rsync's --checksum
option). Rsync-Backup's primary purpose is to schedule backups by executing
Rsync, delete old data, periodically verify existing data, generate reports,
and provide a convenient configuration interface.
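As a rough illustration of the design described above, the following Python
sketch assembles an Rsync command line for one backup run. The function name,
its parameters, the use of -a, and the root@host:/ source are assumptions made
for illustration and are not taken from Rsync-Backup's code; --link-dest and
--checksum are the Rsync options mentioned above.

```python
from typing import List, Optional


def build_rsync_command(
    host: str,
    dest_dir: str,
    previous_dir: Optional[str] = None,
    verify: bool = False,
) -> List[str]:
    """Assemble an rsync argument list for mirroring one host.

    Hypothetical helper: this is not Rsync-Backup's actual invocation.
    """
    cmd = ["rsync", "-a"]  # -a (archive mode) is an assumed choice
    if previous_dir is not None:
        # Unchanged files become hard links into the previous backup.
        cmd.append("--link-dest=" + previous_dir)
    if verify:
        # Compare file contents by checksum instead of size and mtime.
        cmd.append("--checksum")
    # Pull the host's root filesystem over SSH into the new backup directory.
    cmd += ["-e", "ssh", "root@" + host + ":/", dest_dir]
    return cmd
```

For example, build_rsync_command("web1", "/backup/web1/2024-01-02/",
"/backup/web1/2024-01-01/") yields a daily run that links unchanged files to
the previous day's directory, and passing verify=True adds --checksum for a
periodic verification run. The host name and paths here are made up.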
Primary features:
- No special client software besides Rsync and SSH.
- Works on Windows through Cygwin's SSH server. (experimental)
- Backups are periodically verified by checksumming the files on the client and server.
Advantages over Rsnapshot:
- Per-host configuration. The config file specifies defaults that can be
  overridden for individual hosts.
- Nice daily reports of success/failure, space usage, largest files, etc.
- Support for multiple backups in a single day. (Actually, this is not quite
  ready.)
- The backup directories inside the backup tree's root are named $host/$time
  rather than daily.0/$host. This way, all the hard-links to a particular file
  are stored within a smaller subdirectory. Backups for each host are
  independent of the other hosts. This has several advantages:
  - Each host can have different rotation parameters. For example, one host
    can keep daily backups for a week and another for two weeks. (However,
    this is not quite finished.)
  - If a host is off-line for a long time, the latest backup is never deleted.
  - The name of a file never changes once it's stored until it's deleted,
    which makes it much easier to copy to off-site backup media. For example,
    you can use Rsync to copy the whole tree to a removable disk without
    retransferring unchanged data.
  - Backups for a particular host can be cleanly moved to another partition or
    server without breaking any hard-links. The tree can be easily expanded
    onto more disks by placing mountpoints or symlinks to other mountpoints
    inside its tree.
  - Moving backups between disks is also easier. With Rsnapshot's layout, even
    an off-line disk replacement becomes nearly insurmountable on large
    systems: tools such as tar, dump/restore, and Rsync run out of memory
    (including swap space) when copying the whole tree, because the memory
    required scales with the total number of filenames on the backup server.
    With this layout, you can move one host at a time, so memory usage scales
    with the largest host instead.
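The per-host rotation described above, including the rule that a host's latest
backup is never deleted, could be sketched as follows. This is a minimal
illustration under stated assumptions: the function name, the keep_days
parameter, and representing each $host/$time directory by a datetime are all
hypothetical and not taken from Rsync-Backup's code.

```python
from datetime import datetime, timedelta
from typing import List


def expired_backups(
    times: List[datetime], keep_days: int, now: datetime
) -> List[datetime]:
    """Return the backup times of ONE host that are old enough to delete.

    The newest backup is always kept, even when it is older than the cutoff,
    so a host that has been off-line for a long time keeps its last backup.
    """
    if not times:
        return []
    ordered = sorted(times)
    latest = ordered[-1]
    cutoff = now - timedelta(days=keep_days)
    return [t for t in ordered if t < cutoff and t != latest]
```

Because each host has its own list of backup times, a caller can pass a
different keep_days for each host, such as 7 for one host and 14 for another.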
Limitations: Rsync-Backup inherits some limitations from Rsync.
- It's unlikely to ever support storing backups in a compressed format. Using
  a compressed filesystem on the server can save space, but ideally the clients
  should compress the data before sending it, since that allows better
  performance by spreading the load across more computers. Rsync supports
  compressing the data in transit, but the data cannot remain compressed when
  it's written to disk.
- Encrypting data such that the backup server need not be trusted isn't
  possible. This functionality could be added, but requires significant work
  on Rsync. One possibility is to make Rsync encrypt each block of each file
  before sending it. This requires a particular bit of plain-text to yield the
  same cipher-text each time it's encrypted so that Rsync can still checksum
  the encrypted blocks on both sides. Each file's full pathname also should be
  encrypted, flattening the destination into a single (admittedly huge)
  directory. Obviously, the key must be stored only on the client. Solving
  this also requires solving the next problem below.
- Each client needs to grant the backup server's SSH identity key read access
  to all files. Usually this means allowing root to connect via SSH with that
  key. For most networks, this creates a single point of failure for security.
  If the backup server is compromised, any machine it backs up is also
  compromised. There may be a way to mitigate this by restricting that SSH key
  to executing a wrapper for Rsync that checks the arguments to be sure the
  remote side is only reading files. If encrypted backups are implemented,
  such a wrapper must also ensure that only encrypted data is sent.
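The wrapper idea in the last limitation could look roughly like the Python
sketch below. It assumes the wrapper is installed as a forced command
(command="..." in the client's authorized_keys), in which case sshd places the
requested command in the SSH_ORIGINAL_COMMAND environment variable, and that a
pull from the backup server arrives as "rsync --server --sender ...". The
function names and the short deny list are illustrative assumptions; this is
not a complete or audited check and is not part of Rsync-Backup.

```python
import os
import shlex
import sys

# Sender-side options that modify the client (non-exhaustive, assumed list).
DENIED_OPTIONS = {"--remove-source-files", "--remove-sent-files"}


def is_read_only_rsync(original_command: str) -> bool:
    """Return True if the command looks like an rsync server acting as sender."""
    try:
        argv = shlex.split(original_command)
    except ValueError:
        return False  # e.g. unbalanced quotes
    # With --sender, the client side only reads files and sends them.
    if argv[:3] != ["rsync", "--server", "--sender"]:
        return False
    return not any(arg in DENIED_OPTIONS for arg in argv)


def main() -> int:
    """Entry point for the forced command (hypothetical wrapper script)."""
    command = os.environ.get("SSH_ORIGINAL_COMMAND", "")
    if not is_read_only_rsync(command):
        print("rejected: not a read-only rsync request", file=sys.stderr)
        return 1
    # Execute rsync directly, without a shell, so metacharacters stay inert.
    os.execvp("rsync", shlex.split(command))
    return 0  # not reached; execvp replaces the process on success
```

A wrapper script built on this sketch would end by calling sys.exit(main()).
A stricter version would likely also validate the short-option string and the
requested paths.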
Project status: Distribution is currently via Git, and the best version is
the latest. The code isn't complex or old enough to justify separate
development and stable releases yet. It is running in several production
environments, but as with any other young code, be prepared for some hiccups.
Documentation is currently available by running the commands with no
parameters.
- To get the latest version: git clone git://git.devpit.org/rsync-backup/
If you're using Rsync-Backup, please let me
know. If enough people pester me, I'll set up a mailing list.