Syncers
Currently supported syncers:
- RSync
- Amazon Simple Storage Service (S3)
Storages are part of the main Backup procedure. The main backup procedure is the one where the following actions take place:
- Copying files/dumping databases to the ~/Backup/.tmp directory
- Packaging (tar'ing) the copied/organized files
- Compressing the packaged file (optional)
- Encrypting the packaged file (optional)
- Storing the packaged file
The last step is what storages do: they store the final backup file at the specified destination.
Syncers bypass this whole procedure completely. They are meant to transfer directories of data directly from the production server to the backup server. This is extremely useful if you have many gigabytes of data to transfer, for example "user-uploaded-content" that has built up over time to 50GB worth of images, music, videos or other heavy file formats. With a Syncer you basically just say: "Keep a mirror of this directory (/var/apps/my_app/public/music/) on my backup server in (/var/backups/my_app/music)". Every time you run Backup with this Syncer, it won't copy 50GB of data, tar it and then transfer it; instead, it transfers the actual user@production:/var/apps/my_app/public/music/ directory to user@backup:/var/backups/my_app/music.
This way no additional disk space on your production box is used to store temporary files (copied, tar'ed, compressed, encrypted) before the transfer. Creating those temporary files can be very CPU-intensive, slow and expensive, and can also make your application(s) slow while it happens.
Below are examples you can copy and paste into your "Backup" configuration file.
These blocks should be placed between the Backup::Model.new(:my_backup, 'My Backup') do line and its matching end.
RSync
```ruby
sync_with RSync do |rsync|
  rsync.ip                 = "123.45.67.89"
  rsync.port               = 22
  rsync.username           = "my_username"
  rsync.password           = "my_password"
  rsync.path               = "~/backups/"
  rsync.mirror             = true   # delete remote files that were removed locally
  rsync.compress           = true   # compress data while it is in transit
  rsync.additional_options = ['--some-option']

  # Directories on the production server to sync to the backup server
  rsync.directories do |directory|
    directory.add "/var/apps/my_app/public/uploads"
    directory.add "/var/apps/my_app/logs"
  end
end
```
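For readers who know rsync, the following hypothetical sketch shows how the settings above could translate into a plain rsync-over-SSH command. The helper name build_rsync_command is made up, and the flag mapping is an assumption based on standard rsync options (mirror as --delete, compress as --compress, port passed via ssh -p); the command Backup actually runs may differ, and password handling is left out entirely.

```ruby
require 'shellwords'

# Hypothetical helper: turns RSync syncer-style settings into an rsync command.
# The flag mapping is an assumption based on standard rsync options, not Backup's code.
def build_rsync_command(ip:, port:, username:, path:, directories:,
                        mirror: false, compress: false, additional_options: [])
  options = ["--archive"]                 # recurse and preserve permissions, times, symlinks
  options << "--delete" if mirror         # remove destination files that no longer exist locally
  options << "--compress" if compress     # compress data only while it is in transit
  options << "-e" << "ssh -p #{port}"     # assumption: transfer over SSH on the given port
  options.concat(additional_options)

  destination = "#{username}@#{ip}:#{path}"
  # shelljoin escapes each argument so the command is safe to pass to a shell
  (["rsync"] + options + directories + [destination]).shelljoin
end

puts build_rsync_command(
  ip: "123.45.67.89", port: 22, username: "my_username", path: "~/backups/",
  directories: ["/var/apps/my_app/public/uploads", "/var/apps/my_app/logs"],
  mirror: true, compress: true, additional_options: ["--some-option"]
)
```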
Additional Notes
RSync has the ability to transfer only the changed parts of a file, rather than the full file, when that file is updated. For example, say you have a text file that is 100KB in size. You then add another 50 lines of text, increasing its size by 5KB (so the total is now 105KB). The next time Backup is invoked, RSync will see that the file changed and will transfer only the roughly 5KB that was added (plus a small amount of checksum data), rather than transferring the whole 105KB again.
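The idea behind this partial transfer can be sketched in a few lines of Ruby. This is a simplified illustration, not rsync's implementation: real rsync uses a cheap rolling checksum to find candidate matches at every byte offset and confirms them with a stronger hash, while this sketch simply hashes every window with MD5. The method names and the tiny BLOCK_SIZE are hypothetical choices for readability.

```ruby
require 'digest'

BLOCK_SIZE = 4 # tiny block size so the example is easy to follow

# Receiver side: digest of each fixed-size block of the old file => block index.
def block_signatures(old_data, block_size = BLOCK_SIZE)
  old_data = old_data.b
  signatures = {}
  (0...old_data.bytesize).step(block_size) do |offset|
    block = old_data.byteslice(offset, block_size)
    signatures[Digest::MD5.hexdigest(block)] ||= offset / block_size
  end
  signatures
end

# Sender side: walk the new file, reuse blocks the receiver already has,
# and send every other byte as literal data.
def compute_delta(signatures, new_data, block_size = BLOCK_SIZE)
  new_data = new_data.b
  delta = []
  literal = "".b
  pos = 0
  while pos < new_data.bytesize
    window = new_data.byteslice(pos, block_size)
    index = signatures[Digest::MD5.hexdigest(window)]
    if index
      delta << [:literal, literal] unless literal.empty?
      literal = "".b
      delta << [:block, index]
      pos += window.bytesize
    else
      literal << new_data.byteslice(pos, 1)
      pos += 1
    end
  end
  delta << [:literal, literal] unless literal.empty?
  delta
end

# Receiver side: rebuild the new file from old blocks plus the literal bytes.
def apply_delta(old_data, delta, block_size = BLOCK_SIZE)
  old_data = old_data.b
  delta.map { |type, value|
    type == :block ? old_data.byteslice(value * block_size, block_size) : value
  }.join
end

old_file = "The quick brown fox jumps over the lazy dog."
new_file = old_file + " And more."

delta = compute_delta(block_signatures(old_file), new_file)
sent  = delta.select { |type, _| type == :literal }.sum { |_, bytes| bytes.bytesize }
puts "literal bytes sent: #{sent} of #{new_file.bytesize}" # prints "literal bytes sent: 10 of 54"
```

Only the 10 appended bytes travel as literal data; the first 44 bytes are described by references to blocks the receiver already has.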
The rsync.mirror option, when set to true, tells RSync to keep an exact mirror of your production box's files on the backup server. This means that when files are removed from the /var/apps/my_app/public/uploads directory, they are also removed from the backup server during the next sync. When set to false, removed files are ignored and kept on the backup server.
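The effect of mirror can be modelled as a set difference between the file lists on both sides. This is a hypothetical illustration (plan_sync is not part of Backup or rsync), and it ignores modified files, which rsync would also re-send.

```ruby
require 'set'

# Hypothetical illustration of mirror semantics: given relative file paths on
# both sides, work out which files a sync would copy and which it would delete.
def plan_sync(source_files, backup_files, mirror:)
  source = source_files.to_set
  backup = backup_files.to_set
  {
    copy:   (source - backup).sort,                   # files missing on the backup side
    delete: mirror ? (backup - source).sort : []      # only removed when mirroring
  }
end

source = ["a.jpg", "b.jpg"]
backup = ["a.jpg", "old.jpg"]
p plan_sync(source, backup, mirror: true)  # old.jpg is scheduled for deletion
p plan_sync(source, backup, mirror: false) # old.jpg is kept
```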
The rsync.compress option, when set to true, tells RSync to compress the data while it is being transferred. The compression only applies to the transfer itself: it can improve transfer speed and lowers bandwidth usage, at the cost of extra CPU usage for compressing. Once the changes have been transferred, the data is automatically decompressed back to its original state.
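The compress-for-transit idea can be illustrated with Ruby's standard Zlib library. rsync's --compress has traditionally used zlib-style compression as well, but this snippet only demonstrates the principle, not rsync's wire protocol.

```ruby
require 'zlib'

# Illustration only: compress a payload "for the wire", then restore it on arrival.
payload = "user-uploaded log line\n" * 1_000

on_the_wire = Zlib::Deflate.deflate(payload)     # what would travel over the network
restored    = Zlib::Inflate.inflate(on_the_wire) # what ends up on the backup server

puts "original: #{payload.bytesize} bytes, transferred: #{on_the_wire.bytesize} bytes"
puts "identical after decompression: #{restored == payload}"
```

Keep in mind that already-compressed formats such as images, music and video barely shrink further, so for that kind of data compression mostly costs CPU time.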
The directory.add method lets you add the directories you want to sync from the production server to your backup server. When a path ends with a '/' (forward slash), only the contents (and sub-directories) of that directory are synced. If the path does not end with a '/', the directory itself is created on the backup server (thus syncing the whole directory, including its contents and all sub-directories).
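The trailing-slash rule can be expressed as a small hypothetical helper (destination_for is illustrative, not a Backup method) that computes where a source directory's files end up under the destination path.

```ruby
# Hypothetical helper illustrating the trailing-slash rule: where do the
# files from a source directory end up under the destination path?
def destination_for(source, destination_root)
  if source.end_with?("/")
    destination_root                                   # contents go straight into the root
  else
    File.join(destination_root, File.basename(source)) # the directory itself is recreated
  end
end

destination_for("/var/apps/my_app/public/uploads/", "~/backups") # => "~/backups"
destination_for("/var/apps/my_app/public/uploads", "~/backups")  # => "~/backups/uploads"
```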
Amazon Simple Storage Service (S3)
```ruby
sync_with S3 do |s3|
  s3.access_key_id      = "my_access_key_id"
  s3.secret_access_key  = "my_secret_access_key"
  s3.bucket             = "my-bucket"
  s3.path               = "/backups"
  s3.mirror             = true   # delete S3 files that were removed locally
  s3.additional_options = ['--some-option']

  # Directories on the production server to sync to the S3 bucket
  s3.directories do |directory|
    directory.add "/var/apps/my_app/public/uploads"
    directory.add "/var/apps/my_app/logs"
  end
end
```
Additional Notes
WHICH GEM TO USE Backup uses the S3Sync library to sync files and directories to Amazon S3. There are in fact two versions of this library: one that supports Ruby 1.8.x and one that supports Ruby 1.9.x. The original library is quite old (from around 2007); someone thankfully forked it and made it compatible with Ruby 1.9.x, but the fork is no longer compatible with Ruby 1.8.x. In short, you have to install the correct gem for the Ruby version you are going to use.
If you are using Ruby 1.9.x, install and use this S3Sync gem:
gem install aproxacs-s3sync
If you are using Ruby 1.8.x, install and use this S3Sync gem:
gem install s3sync
WARNING If you have both of the above installed, Backup will use s3sync, not aproxacs-s3sync. If you have both installed and you are using Ruby 1.9.x, make sure to run gem uninstall s3sync to remove the gem that is only compatible with Ruby 1.8.x.
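If you provision servers with scripts, the rule above can be captured in a small hypothetical helper (s3sync_gem_for is illustrative, not part of Backup). The text above only covers Ruby 1.8.x and 1.9.x; treating every version at or above 1.9 like 1.9 is an assumption of this sketch.

```ruby
require 'rubygems'

# Hypothetical helper: pick the S3Sync gem that matches a given Ruby version.
# 1.8.x => s3sync, 1.9.x => aproxacs-s3sync.
# Assumption: versions newer than 1.9.x are treated like 1.9.x.
def s3sync_gem_for(ruby_version = RUBY_VERSION)
  if Gem::Version.new(ruby_version) >= Gem::Version.new("1.9")
    "aproxacs-s3sync"
  else
    "s3sync"
  end
end

puts "gem install #{s3sync_gem_for}"
```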
NOTE Unlike RSync, S3Sync cannot send only the changed "parts" of a file. When a file changes locally, it always uploads the whole file again, overwriting the copy on S3. So if a file was 50MB when it was synced to S3 and has since grown to 60MB, the next sync transfers the whole 60MB again, whereas RSync would transfer only the added 10MB. This is due to how Amazon S3 stores data: objects are generally written as a whole rather than partially updated, so (to my understanding) there is no way around this.
The s3.mirror option, when set to true, tells the S3 syncer to keep an exact mirror of your production box's files on S3. This means that when files are removed from the /var/apps/my_app/public/uploads directory, they are also removed from the S3 bucket during the next sync. When set to false, removed files are ignored and kept in your S3 bucket.
The directory.add method lets you add the directories you want to sync from the production server to your Amazon S3 bucket. When a path ends with a '/' (forward slash), only the contents (and sub-directories) of that directory are synced. If the path does not end with a '/', the directory itself is created in the S3 bucket (thus syncing the whole directory, including its contents and all sub-directories).
If you want to create multiple RSync syncer objects, you might want to configure some default settings to reduce redundancy. For example, if you always want to enable mirroring and compression for RSync syncers, and you always use the same server IP address, port, username and password, you can do the following:
```ruby
Backup::Configuration::Syncer::RSync.defaults do |rsync|
  rsync.ip       = "123.45.67.89"
  rsync.port     = 22
  rsync.username = "my_username"
  rsync.password = "my_password"
  rsync.mirror   = true
  rsync.compress = true
end
```
With this in place, you can simply omit those settings from your backup models whenever you want to use the default configuration, like so:
```ruby
sync_with RSync do |rsync|
  rsync.path = "~/backups/for_my_other_app"

  rsync.directories do |directory|
    directory.add "/var/apps/my_other_app/public/uploads"
    directory.add "/var/apps/my_other_app/logs"
  end
end
```
Since we didn't specify the ip, port, username, password, mirror and compress options, they default to the values we specified in the configuration block.
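The defaults behaviour can be pictured as two configuration blocks applied in order: first the defaults, then the model's own block, so anything the model sets explicitly wins. The RSyncSettings class below is a hypothetical, simplified model of that behaviour, not Backup's Backup::Configuration::Syncer::RSync implementation.

```ruby
# Hypothetical sketch of a "defaults" mechanism: class-level defaults are applied
# first, then the per-model block overrides whatever it sets explicitly.
class RSyncSettings
  ATTRIBUTES = [:ip, :port, :username, :password, :path, :mirror, :compress]
  attr_accessor(*ATTRIBUTES)

  # Store the defaults block so every new instance can replay it.
  def self.defaults(&block)
    @defaults_block = block
  end

  def self.defaults_block
    @defaults_block
  end

  def initialize(&block)
    self.class.defaults_block&.call(self) # 1. apply configured defaults
    block&.call(self)                     # 2. apply model-specific settings
  end
end

RSyncSettings.defaults do |rsync|
  rsync.ip       = "123.45.67.89"
  rsync.port     = 22
  rsync.mirror   = true
  rsync.compress = true
end

settings = RSyncSettings.new do |rsync|
  rsync.path = "~/backups/for_my_other_app"
end

settings.ip   # => "123.45.67.89" (from the defaults)
settings.path # => "~/backups/for_my_other_app" (from the model)
```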
To set default configuration for S3, use Backup::Configuration::Syncer::S3.defaults.