I run several websites. About a year ago I set up a backup scheme to make sure my websites were backed up to my local computer once a day. This backup includes the databases and files on the hosting provider’s server. I use rsync over SSH to reduce the amount of data I have to copy, and I keep two copies of this data on my local system. I felt pretty good about this scheme and have proven that my backups contain enough data to recover from hosting crashes, moves to new hosting providers, and so on. What I didn’t anticipate was what I would do if a hacker defaced my site and I didn’t catch it quickly enough: the modified files would be sent to my local backup, overwriting the good data. I wasn’t keeping any sort of history – what I stored locally was always the latest copy. I know this isn’t a terribly good practice, but I have yet to find a backup system that does incremental backups the way I would like: keeping history without consuming a ton of space while still being “quick” to run.
Recently, a friend’s website got defaced. Fortunately, they had “enough” data backed up and the defacement was simple enough that recovery was not a major issue, but it easily could have been if the hacker had been more malicious. This made me start to really think about what I need to be doing for website backups.
For version control at work, we use Subversion – which is really nice. I decided to implement something like it at home for my website files and database backups. I had started to set up Subversion when I discovered Git. In its simplest form, you can create a Git repository in a single directory, and it very easily and quickly handles version control (on a level comparable to Subversion) for the files and directories within it. For my situation, I believe Git is the best choice. With just a few commands I created the repository and added the existing files to it, and by adding two commands to my rsync backup script, all new files are now automatically added to the repository (and deleted files are removed from the repository – but they can be recovered if need be). I plan to set up daily “tags” for the data so that, if the worst happened, I could obtain the site as it was on any particular day with very little effort.