Recently I came across a question on Stack Overflow asking how to back up Azure Blob storage. The asker finished with “I can’t be the only one who needs to do this”. This struck a chord with me, as I remember feeling exactly the same when I needed to do this. It feels like something so obvious that it should be built into the platform. Yet it wasn’t, and there didn’t seem to be much out there on how others were doing it. This was very frustrating! Things have changed a bit since I looked at this a year or two ago, but it is still a frustrating experience, so I thought it worth documenting the options.
Why back up Azure Storage?
The question that always comes up when discussing this is “why bother”? Blob storage is replicated three times locally. If you’re using geo-redundant storage, it is replicated another three times in a secondary region, so why would you need to back it up? The purpose of these backups is not to handle hardware failure; Microsoft has that pretty well covered with these replicas. What we are protecting against is users accidentally deleting or overwriting files, or data becoming corrupted. If any of these happen, the change is immediately replicated to every copy, and it is nearly impossible to recover from. This is why we may want to back up this data.
As mentioned, for some reason there is no backup functionality built into Blob storage. We need to put in place an alternative solution. Below are some ways of achieving this.
Migrate to Azure Files
This might seem like a strange suggestion, but Azure Files (the SMB layer on top of Blob storage) does have backup capabilities built in. You can take snapshots of a whole file share and then store these in an Azure Backup vault. Whilst Azure Files is different to Blob storage and has its own limitations, it operates in a similar manner and has a similar API. If you’re able to switch over to Azure Files, it is worth considering, as you would then have backup out of the box.
Blob Snapshots
Blob storage does have the ability to create snapshots. Unlike the Azure Files version, these are snapshots of individual blobs, not of a whole container or account. They exist only in the storage account and you cannot store them in a vault. Blob snapshots also have a few other limitations:
- A snapshot is tied to its base blob and lives with it. Its URI is the base blob’s URI with the snapshot timestamp appended as a query parameter (?snapshot=<timestamp>).
- Snapshots are read-only. You can restore one by copying it back over the base blob, or copy it out to a new blob, but you cannot modify a snapshot in place.
If you have a relatively small amount of data, or your application is able to handle the creation and management of snapshots programmatically, then this could work as a backup strategy. But having to maintain individual snapshots for each blob is painful, as is the restore process.
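To give a feel for the per-blob bookkeeping involved, here is a minimal sketch in plain Python. It uses no Azure SDK and calls nothing against a real account. It builds a snapshot’s URI from the base blob URI and picks the newest snapshot taken before a given moment, for example when corruption was noticed. The function names are hypothetical, and the sketch assumes snapshot timestamps are UTC strings of the form 2018-03-01T10:00:00.1234567Z.

```python
from datetime import datetime, timezone
from typing import Optional, Sequence


def snapshot_uri(blob_uri: str, snapshot: str) -> str:
    """Address a snapshot: the base blob URI plus a 'snapshot' query parameter."""
    return f"{blob_uri}?snapshot={snapshot}"


def parse_snapshot_time(snapshot: str) -> datetime:
    """Parse a UTC timestamp such as '2018-03-01T10:00:00.1234567Z'.

    strptime's %f accepts at most 6 fractional digits, so the fraction is
    padded or truncated to microseconds by hand.
    """
    main, _, frac = snapshot.rstrip("Z").partition(".")
    parsed = datetime.strptime(main, "%Y-%m-%dT%H:%M:%S")
    micros = int((frac + "000000")[:6])
    return parsed.replace(microsecond=micros, tzinfo=timezone.utc)


def latest_snapshot_before(snapshots: Sequence[str], cutoff: datetime) -> Optional[str]:
    """Return the newest snapshot taken strictly before cutoff, or None if there is none."""
    earlier = [s for s in snapshots if parse_snapshot_time(s) < cutoff]
    return max(earlier, key=parse_snapshot_time, default=None)
```

Restoring would then mean copying from that snapshot URI back over the base blob. The Python azure-storage-blob package exposes this as BlobClient.start_copy_from_url. You would repeat this for every affected blob, which is where the pain comes in.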
Soft Delete
Soft delete is a new option for Blob storage accounts. When it is enabled, a deleted blob is not immediately purged. Instead, it is marked as deleted and hidden from the user’s view for a configurable retention period. During this soft delete window it is possible to restore the deleted blob, either through the portal or using the API. Once the window ends, the blob is permanently deleted.
If your main concern is accidental deletion of blobs, this can be a good solution, provided you spot the problem and restore the blob within the soft delete window. It is much less help with accidental overwriting or data corruption. An overwrite keeps the previous contents only as a soft-deleted snapshot for the same retention period, so anything not noticed in time is lost.
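The soft delete rule above boils down to simple date arithmetic. This sketch works out when a soft-deleted blob stops being recoverable. The helper names are hypothetical and nothing here queries Azure. It assumes the retention period is set in whole days, within the 1 to 365 day range the service allows, and it only models that rule rather than the exact moment the service purges data.

```python
from datetime import datetime, timedelta, timezone

# Blob soft delete retention can be configured from 1 to 365 days.
MIN_RETENTION_DAYS = 1
MAX_RETENTION_DAYS = 365


def purge_time(deleted_at: datetime, retention_days: int) -> datetime:
    """Moment at which a soft-deleted blob stops being recoverable."""
    if not MIN_RETENTION_DAYS <= retention_days <= MAX_RETENTION_DAYS:
        raise ValueError(
            f"retention must be {MIN_RETENTION_DAYS}-{MAX_RETENTION_DAYS} days, got {retention_days}"
        )
    return deleted_at + timedelta(days=retention_days)


def still_recoverable(deleted_at: datetime, retention_days: int, now: datetime) -> bool:
    """True while the blob is inside its soft delete window and can still be undeleted."""
    return now < purge_time(deleted_at, retention_days)
```

Inside that window, the actual restore is an Undelete Blob operation (BlobClient.undelete_blob in the Python azure-storage-blob package) or a click in the portal. A short retention period leaves very little time to notice a problem.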
AzCopy
One of the most commonly used approaches for backing up blobs is to copy them to a second storage account using the AzCopy command-line tool. AzCopy can copy directly from one storage account to another, without the data going via your PC. Using AzCopy you can create a backup of your data on a regular basis. So that a later overwrite or corruption does not replace your good copy, you need to create each backup in a new container named with a date/time stamp or similar. Using this you will get a backup of your data, but there are some downsides:
- To ensure you are not replicating issues into your backup you will need to take full, separate copies of the data for each day/week, or replicate changes in a way that does not overwrite the original. If you don’t do this you are just setting up another replication.
- AzCopy does not work well with blobs that have snapshots. If you copy a blob that has 10 snapshots, each snapshot ends up as a separate blob at the other end, and there is no easy way to reconstitute these back into a single blob with its snapshots.
- You need to be careful with how many files you copy concurrently. AzCopy has an option to set the number of concurrent copy operations. If you are copying large blobs, setting this too high can sometimes cause the copy to fail when it hits the storage account’s performance limits.
- AzCopy can be pretty slow if you have large volumes of data or very large files, particularly if you are doing other things with the storage accounts at the same time.
Running a copy with AzCopy is a single command that specifies the source and destination storage account details:
AzCopy /Source:https://sourceaccount.blob.core.windows.net/mycontainer1 /Dest:https://destaccount.blob.core.windows.net/mycontainer2 /SourceKey:key1 /DestKey:key2 /S
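Because each backup needs to land in its own date/time-stamped container, the command above is usually wrapped in a small scheduled script. The sketch below shows one way to do that in Python. It assumes the legacy Windows AzCopy (v7/v8) executable is on the PATH and accepts the same switches as the command above. It also assumes /NC is the switch for the number of concurrent operations in that version, so check AzCopy /? for yours. The helper names are hypothetical. Generated names are checked against Azure’s container naming rules: 3 to 63 characters, lowercase letters, digits and single hyphens, starting and ending with a letter or digit.

```python
import re
import subprocess
from datetime import datetime, timezone
from typing import List, Optional

# Azure container naming rules: 3-63 characters; lowercase letters, digits and
# hyphens; must start and end with a letter or digit; no consecutive hyphens.
CONTAINER_NAME_RE = re.compile(r"^[a-z0-9](?!.*--)[a-z0-9-]{1,61}[a-z0-9]$")


def backup_container_name(prefix: str, when: datetime) -> str:
    """Hypothetical helper: a date/time-stamped name, e.g. 'backup-20180301-023000'."""
    name = f"{prefix.lower()}-{when:%Y%m%d-%H%M%S}"
    if not CONTAINER_NAME_RE.match(name):
        raise ValueError(f"not a valid container name: {name!r}")
    return name


def azcopy_args(source_uri: str, dest_uri: str, source_key: str, dest_key: str,
                concurrency: Optional[int] = None) -> List[str]:
    """Argument list for the legacy Windows AzCopy (v7/v8) syntax shown in the text."""
    args = [
        "AzCopy",
        f"/Source:{source_uri}",
        f"/Dest:{dest_uri}",
        f"/SourceKey:{source_key}",
        f"/DestKey:{dest_key}",
        "/S",  # recursive mode: copy all blobs under the source
    ]
    if concurrency is not None:
        # Assumed v7/v8 switch for concurrent operations; keep it modest
        # for large blobs to avoid hitting storage account limits.
        args.append(f"/NC:{concurrency}")
    return args


def run_backup(source_uri: str, dest_account_uri: str, source_key: str,
               dest_key: str, concurrency: Optional[int] = None) -> str:
    """Copy a source container into a new time-stamped container and return its URI.

    Requires AzCopy on the PATH; raises CalledProcessError if AzCopy fails.
    """
    container = backup_container_name("backup", datetime.now(timezone.utc))
    dest_uri = f"{dest_account_uri.rstrip('/')}/{container}"
    subprocess.run(azcopy_args(source_uri, dest_uri, source_key, dest_key, concurrency),
                   check=True)
    return dest_uri
```

A nightly task could, for example, call run_backup with the source container URI and the backup account’s base URI, leaving one container per night in the backup account. AzCopy v10 uses a different syntax (azcopy copy <source> <destination> --recursive), so the argument list would need rewriting for it.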
Third Party Tools
There are many third-party tools that offer the ability to back up Blob storage. Many backup vendors (Commvault, Symantec etc.) offer an option for Blob storage backup as part of their backup suites, and many also offer the option to back up to Blob storage. If you are already using these tools for other backups, it can be easy to add blob backup. If you don’t already use one of these solutions, though, it can be quite expensive to get started.
There used to be some standalone tools that focussed on Blob backup. At the time of writing, they all seem to have disappeared or are in the process of closing down.
Conclusion
This is a frustrating issue. There are very valid reasons why you may want to back up Blob storage, and yet the options for doing so are fairly limited, and have been for some time. Hopefully Microsoft will add some backup facilities in the future, like those they have added to Azure Files, that will resolve this problem. Until then we are stuck using tools like AzCopy or paying out for third-party tools.