You should have a strong backup strategy in 2025
Backups are still necessary
Your cloud provider may offer a backup service, or multi-availability-zone (AZ) storage. Yet does that guarantee your data is safe? Check the SLA: the provider may not have committed to preventing data loss. And even if you can sue them for penalties, once your data is lost, it is lost.
Besides, your provider can also suffer a global outage, or go bankrupt. That's why you need to back up your data to ensure your business's survival.
It's an insurance policy
It's really rare that you truly need to restore a backup. Data is stored on distributed storage solutions. Servers have RAID to ensure redundancy at the disk level. Your provider has very few recorded outages. In this context, backing up every day seems like a waste of time and money.
Backups are much like insurance. You are not likely to have a car accident. There's no reason for your house to burst into flames. Guess what: your provider is not immune either. When disaster strikes, you're glad to be covered. When things go bad for your IT, it will be a relief to have a backup, so you can redeploy your apps elsewhere to recover and limit downtime and data loss.
It won't improve the SLA
Backups are the last resort when everything goes wrong. They won't improve your service level agreement (SLA), if you have one, but they do ensure that your data and your customers' data are safe.
If a disaster occurs, backups only enable you to redeploy the app elsewhere with minimal data loss.
Backups are complementary to other approaches that improve the SLA, such as database and storage replication within your cloud provider.
How often should I back up?
Backup frequency is a business issue. How important is your customers' data? Can their businesses survive a 24-hour data loss? A week of data loss? Do you have a contractual commitment on this?
Technically, you have to make a trade-off. Backup duration increases as your data grows. In many cases you have to interrupt a service during the backup to ensure data integrity, or at least switch to read-only access when that's feasible.
You can choose storage systems that are easier to back up than a file system on disk.
S3-like object storage can be versioned, asynchronously replicated within a provider, and synced to another site with tools like Rclone or Veeam without downtime. PostgreSQL dumps operate on a consistent snapshot of the data.
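As a rough illustration of both ideas, here is a minimal Python sketch that enables versioning on an S3 bucket with boto3 and takes a consistent PostgreSQL dump while the service stays up. The bucket name, database name, and file path are placeholders, and credentials are assumed to come from the environment.

```python
import subprocess

import boto3

# Enable versioning on an existing bucket (bucket name is hypothetical,
# credentials are picked up from the environment).
s3 = boto3.client("s3")
s3.put_bucket_versioning(
    Bucket="my-app-backups",
    VersioningConfiguration={"Status": "Enabled"},
)

# pg_dump reads from a consistent snapshot of the database, so the
# application can keep writing while the dump runs.
subprocess.run(
    ["pg_dump", "--format=custom", "--file=/var/backups/app.dump", "appdb"],
    check=True,
)
```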
You can also follow more exotic strategies to shorten the process, such as GitLab's nice fix to long backups.
The old-world 3-2-1 rule
3-2-1 is a traditional backup strategy rule. It stands for:
- 3 copies of your data (production plus 2 backups)
- 2 different storage media
- 1 copy off site
It sounds like an old on-premises operator motto, though. What can we do with our cloud providers?
What am I currently doing?
In my current assignment, I operate VMs with PostgreSQL databases. I improved the backup strategy like this (a sketch of the flow follows the list):
- Back up every day on the VM, with 3 days of retention. This allows fast recovery.
- Backups are transferred to S3 buckets replicated to another AZ.
- Backups are also transferred to an S3 bucket run by another provider, in case of a global provider outage.
- S3 buckets used by applications are replicated to another AZ, and every day to another provider.
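To make the flow concrete, here is a minimal Python sketch of the daily job, assuming boto3 and pg_dump are available on the VM. Bucket names, the second provider's endpoint, and paths are placeholders; cross-AZ replication of the buckets themselves is configured on the provider side, not in this script.

```python
"""Rough sketch of the daily backup flow described above; every name,
endpoint, and path here is illustrative, not the real configuration."""
import datetime
import pathlib
import subprocess
import time

import boto3

BACKUP_DIR = pathlib.Path("/var/backups/postgres")
RETENTION_DAYS = 3  # local copies kept on the VM for fast recovery


def dump_database(db_name: str) -> pathlib.Path:
    """Create a timestamped pg_dump archive on the local disk."""
    stamp = datetime.datetime.now(datetime.timezone.utc).strftime("%Y%m%dT%H%M%S")
    target = BACKUP_DIR / f"{db_name}-{stamp}.dump"
    subprocess.run(
        ["pg_dump", "--format=custom", f"--file={target}", db_name],
        check=True,
    )
    return target


def upload(dump_path: pathlib.Path) -> None:
    """Push the dump to the primary provider and to a second provider."""
    primary = boto3.client("s3")  # credentials from the environment
    primary.upload_file(str(dump_path), "backups-primary", dump_path.name)

    # Second provider: same S3 API, different endpoint (hypothetical URL).
    secondary = boto3.client("s3", endpoint_url="https://s3.other-provider.example")
    secondary.upload_file(str(dump_path), "backups-offsite", dump_path.name)


def prune_local() -> None:
    """Delete local dumps older than the retention window."""
    cutoff = time.time() - RETENTION_DAYS * 86400
    for dump in BACKUP_DIR.glob("*.dump"):
        if dump.stat().st_mtime < cutoff:
            dump.unlink()


if __name__ == "__main__":
    path = dump_database("appdb")
    upload(path)
    prune_local()
```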
Even more boring but necessary: testing the backup
OK, you're backing up all your data following this strategy. Don't wait for a disaster to test the backups. Are you sure that all your data has been backed up off site? Do you verify data integrity during transfers?
Also automate backup restoration, so you don't have to skim pages of documentation while under pressure. And of course, test the process often.
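One way to automate such a drill is a small script that pulls the latest off-site dump, restores it into a scratch database, and runs a sanity query. The sketch below assumes boto3 plus the PostgreSQL client tools; the bucket, database, and table names are made up.

```python
"""Illustrative restore drill: bucket, database, and table names are
placeholders to adapt to your own setup."""
import subprocess

import boto3


def latest_backup_key(bucket: str) -> str:
    """Find the most recent dump in the off-site bucket.

    Sketch only: assumes the bucket holds fewer than 1000 objects,
    the limit of a single list_objects_v2 page.
    """
    s3 = boto3.client("s3")
    objects = s3.list_objects_v2(Bucket=bucket)["Contents"]
    return max(objects, key=lambda obj: obj["LastModified"])["Key"]


def restore_drill(bucket: str = "backups-offsite") -> None:
    key = latest_backup_key(bucket)
    boto3.client("s3").download_file(bucket, key, "/tmp/restore-test.dump")

    # Restore into a throwaway database, never into production.
    subprocess.run(["createdb", "restore_test"], check=True)
    subprocess.run(
        ["pg_restore", "--dbname=restore_test", "/tmp/restore-test.dump"],
        check=True,
    )

    # Minimal sanity check: the most important table is not empty.
    subprocess.run(
        ["psql", "-d", "restore_test", "-c", "SELECT count(*) FROM customers;"],
        check=True,
    )
    subprocess.run(["dropdb", "restore_test"], check=True)


if __name__ == "__main__":
    restore_drill()
```

Run it on a schedule (cron or your CI), and alert on any non-zero exit: a failed drill is as important a signal as a failed backup.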
What about security?
Imagine that you have configured your backups successfully. Everything is synced off site, BUT the destination storage is insecure. It becomes an easy way for an attacker to steal your data.
Ensure your backup storage has very limited access, all the more so if you store all your tenants' backups in the same place.
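As a starting point, here is a hedged sketch of hardening a backup bucket with boto3: block all public access and deny any request that is not made over TLS. The bucket name is a placeholder, the policy syntax is AWS-flavoured, and other S3-compatible providers expose different IAM models, so adapt it accordingly.

```python
"""Illustrative hardening of a backup bucket; the bucket name is a
placeholder and the policy follows AWS conventions."""
import json

import boto3

s3 = boto3.client("s3")
BUCKET = "backups-offsite"

# Block every form of public access on the backup bucket.
s3.put_public_access_block(
    Bucket=BUCKET,
    PublicAccessBlockConfiguration={
        "BlockPublicAcls": True,
        "IgnorePublicAcls": True,
        "BlockPublicPolicy": True,
        "RestrictPublicBuckets": True,
    },
)

# Deny any request that is not made over TLS.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyInsecureTransport",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [f"arn:aws:s3:::{BUCKET}", f"arn:aws:s3:::{BUCKET}/*"],
            "Condition": {"Bool": {"aws:SecureTransport": "false"}},
        }
    ],
}
s3.put_bucket_policy(Bucket=BUCKET, Policy=json.dumps(policy))
```

On top of that, give write access only to the backup job's credentials, read access only to the people who actually run restores, and keep per-tenant backups in separate buckets or prefixes with separate policies.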