In an ideal world, all our cloud applications would be designed from the ground up for the cloud: built around cloud principles, making use of PaaS services and providing high availability. Unfortunately, this is often not the case. We are regularly tasked with moving existing on-premises applications into the cloud as a “lift and shift” operation, until they can be redesigned to be cloud native. In these sorts of applications, it is not uncommon to need a shared file store. This can be a challenge in a cloud environment and was a topic I recently discussed at the Azure Saturday event in Munich (slides here).
Providing a shared file store in Azure has some inherent issues, particularly when looking to provide a fault-tolerant solution that meets SLA requirements. The most significant issue comes down to the design of the Azure platform: the lack of shared storage. In an on-premises world using SAN storage, we can easily present the same storage to multiple machines and use tools like failover clustering to provide a highly available file share. In Azure, we cannot allocate storage to more than one machine at a time, which means we need to look at other avenues for our shared files.
The challenge: providing a reliable, available file share without the use of shared storage. There are a few different ways this can be achieved, and unfortunately none of these solutions is perfect; each has drawbacks. The best solution will be the one that fits your needs with the fewest downsides. In this article, we will consider the following:
- Azure Files
- Single File Server
- Storage Spaces Direct
- Storage Replica
- DFS Replication
- Third Party Solutions
We will mainly be looking at the benefits and issues of each of these solutions, rather than deep diving into the technologies and how to set them up. I do plan on producing some deep-dive articles on some of these in the future.
These six options are ones I have picked out as being the most useful and compatible with services run in Azure; I am sure they are not the only options. I have also focused on Windows-based services that run entirely in Azure. There are other services on the Linux platform, and hybrid solutions like StorSimple, that we are not considering here.
Azure Files
Azure Files is Microsoft’s platform as a service (PaaS) offering for providing SMB shares. Built on top of Azure Storage, shares can be defined and accessed over a UNC path or using the REST API. At first glance, Azure Files seems like the simplest and quickest solution to the file share problem, as it provides:
- High Availability
- Redundancy and Replication (including geographically)
- SMB 2.1 and 3
- Low cost
These features are enabled out of the box simply by creating a file share. However, there are some significant issues with the platform that for some will make this a no-go:
- Security – for many this is going to be the biggest issue. Access to the file share uses the storage account key, the top-level key for the whole storage account. It is not possible to use NTFS permissions, or even a SAS key. That means if you need users to directly access the share or map drives, you are going to be giving them the storage key in plain text. This can cause all sorts of issues if people inadvertently share the key, or leave and take it with them.
- Access – Azure file shares can be accessed externally to Azure, again using the UNC path and storage key. For some this will be a benefit; for others it is not something that would be allowed, and there is no way to turn it off.
- Performance – throughput on Azure Files is limited to 60 MB/s (per the documentation; I have achieved slightly higher)
- Size – Azure file shares are limited to 5 TB in total and 1 TB for a single file
- Backup – the Azure platform doesn’t provide an easy way to back up Azure Storage, and this is true for file shares. Data is replicated to deal with hardware failure, but dealing with accidental changes or deletions isn’t something the platform provides. There are third-party tools that can assist with this, but with extra cost and complexity
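As a quick sanity check before choosing Azure Files, the size and throughput limits above can be encoded in a small helper. This is an illustrative sketch only; the function and constant names are my own, and the 5 TB / 1 TB / 60 MB/s figures are the documented limits at the time of writing.

```python
# Illustrative sketch: check a workload against the documented
# Azure Files limits (5 TB per share, 1 TB per file, ~60 MB/s).
# Names and structure are illustrative, not an official API.

TB = 1024 ** 4
GB = 1024 ** 3
MAX_SHARE_BYTES = 5 * TB
MAX_FILE_BYTES = 1 * TB
MAX_THROUGHPUT_MB_S = 60

def fits_azure_files(total_bytes, largest_file_bytes, needed_mb_s):
    """Return a list of limit violations (an empty list means it fits)."""
    problems = []
    if total_bytes > MAX_SHARE_BYTES:
        problems.append("share exceeds 5 TB limit")
    if largest_file_bytes > MAX_FILE_BYTES:
        problems.append("largest file exceeds 1 TB limit")
    if needed_mb_s > MAX_THROUGHPUT_MB_S:
        problems.append("throughput above ~60 MB/s ceiling")
    return problems

# Example: 8 TB of data needing 100 MB/s fails on two counts
print(fits_azure_files(8 * TB, 200 * GB, 100))
```

Nothing here is Azure-specific code; it is just a way of making the go/no-go check explicit before you commit to the platform.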
Further Reading: Azure Files
Single File Server
Sometimes, the simplest approach can be the best. For certain workloads, just running a single IaaS file server may suffice:
- SLA – assuming you’re using Premium Storage, single-instance VMs now get a 99.9% SLA. Not quite as good as the 99.95% with two VMs, but for some applications this may be enough
- Performance – Having a single server (of the right size and storage type) can provide very good performance, especially as there is no overhead for replication or synchronisation
- Backup – tools like Azure Backup and Site Recovery work well with this simple setup and provide easy ways to backup data or replicate to another region
- Cost – a single file server has a very low cost, with no additional storage costs for data replication
This solution won’t work for everyone. The biggest issue is the single point of failure in the VM, even with the 99.9% SLA. There will still be a need to undertake maintenance on the VM, and when this occurs it will result in downtime. You’re also very reliant on backup as your solution for dealing with any sort of DR event.
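To put the SLA difference in context, it is worth translating the percentages into permitted downtime. This is back-of-the-envelope arithmetic only (it assumes a simple 30-day month; the real SLA accounting follows Microsoft’s terms):

```python
# Rough sketch: translate an SLA percentage into maximum allowed
# downtime per 30-day month. Illustrative arithmetic only.

MINUTES_PER_MONTH = 30 * 24 * 60  # 43,200 minutes in a 30-day month

def allowed_downtime_minutes(sla_percent):
    return MINUTES_PER_MONTH * (1 - sla_percent / 100)

print(round(allowed_downtime_minutes(99.9), 1))   # single VM on premium storage: ~43.2 min
print(round(allowed_downtime_minutes(99.95), 1))  # two VMs in an availability set: ~21.6 min
```

So the single-server option roughly doubles the downtime the SLA permits each month; whether that matters depends entirely on the workload.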
Further Reading : Single Instance SLA
Storage Spaces Direct
Storage Spaces Direct (S2D) is a new offering in Windows Server 2016 which allows you to build a file server cluster without any shared storage. Data is replicated across servers using a process similar to RAID, with mirror or parity options available. We’ll look at S2D in more detail in a future post, but if you want to read more about it now you can look here.
At first glance, S2D seems to provide a nearly perfect solution to storing files in Azure:
- It provides a true, Active/Active cluster with instant failover in the event of a server going down
- It is highly available and, using Azure availability sets, can survive planned maintenance events with zero downtime.
- It meets the Azure two-machine SLA of 99.95% availability.
- It can use Azure Storage as a witness, rather than requiring additional VMs for this role
However, nothing is perfect and this holds true here:
- S2D has been designed for, and tested against, larger files held open for long periods: things like SQL databases and VHD files. It has not been tested for use as a regular file server holding lots of smaller files that are opened and closed regularly.
- From personal experience, whilst most files function fine, I have seen some performance degradation with certain file types, such as MS Access
- Because we are replicating data across multiple servers, the amount of storage provisioned will be double or triple the usable capacity, which increases costs
- There are limitations on backup, because S2D uses ReFS as the file system; Azure Backup and ASR do not currently support it
- Obviously, this requires server 2016, which may be an issue for some
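The storage overhead point is worth quantifying when sizing a deployment. A rough sketch, assuming the common mirror resiliency options (two-way mirror keeps two copies of the data, three-way mirror keeps three) and ignoring reserve and metadata overhead; the helper name is my own:

```python
# Rough sketch: raw disk capacity to provision for a given amount
# of usable S2D capacity, ignoring reserve/metadata overhead.

COPIES = {
    "two-way-mirror": 2,    # survives one failure, ~50% efficiency
    "three-way-mirror": 3,  # survives two failures, ~33% efficiency
}

def raw_capacity_tb(usable_tb, resiliency):
    return usable_tb * COPIES[resiliency]

print(raw_capacity_tb(4, "two-way-mirror"))    # 4 TB usable needs 8 TB raw
print(raw_capacity_tb(4, "three-way-mirror"))  # 4 TB usable needs 12 TB raw
```

Since Azure managed disks are billed on provisioned size, this multiplier flows straight through to your monthly bill.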
S2D is a good way to get a true cluster up and running in Azure, if you are either using the types of files it is intended for, or willing to use it with untested file types and test the performance yourself.
Storage Replica
Storage Replica is another new offering in Server 2016. This is not a true cluster like S2D, but instead a way to replicate data between nodes, primarily for DR purposes.
Storage replica is similar to DFSR but has a few key advantages:
- Replication is done at the block level, unlike DFSR. This means it can deal with replicating open files, which has always been a problem with DFSR. It also means it is only replicating that which has changed
- It supports both synchronous and asynchronous replication
- There is no special file system requirement as with S2D, so it can be backed up easily using Azure Backup or similar
- It can even be combined with S2D to replicate between clusters
As mentioned, storage replica is mainly intended as a DR solution, so has a couple of issues when it comes to looking at this as a solution for sharing files in Azure:
- Failover is not automatic; if you wish to switch over to the second node, it requires running a PowerShell command
- Replication is one-to-one, so there are only ever two nodes in a partnership
- There is no access to data on the secondary node when the primary is active, so no option for a read only replica or similar
- Again, it requires Server 2016
Storage Replica could be a great solution for handling planned maintenance, as it’s easy to manually switch over to the secondary and back again as required. However, if you’re looking for a solution that can automatically handle unplanned issues, then this is not going to cut it. Note that server-to-server and cluster-to-cluster replication is supported in Azure; stretch clustering is not.
DFS Replication
DFS Replication (DFSR) is a time-tested solution for replicating data between multiple nodes. It’s been around since Server 2003 R2 and is supported in Azure. DFSR has several known issues and idiosyncrasies, but if none of the other options listed here work, it may be an appealing solution given that:
- It supports replication of data to multiple nodes
- It can be run in an active/active state so that if a node fails the others pick up the load automatically
- It meets the Azure requirements for 99.95% SLA
- Backup is again simple, due to the use of the standard Windows File System
As mentioned, however, there are a few well-known limitations and issues with DFSR:
- Failover, whilst automatic, is not instant. There can be a delay of up to 90 seconds when switching nodes
- DFSR cannot replicate open files. If files are held open for long periods, there will be no replication to other nodes, which can leave you open to data loss
- Performance can be an issue with DFSR, especially when transferring lots of large files, causing the replication queue to grow
- Replication is not synchronous; it is handled in the background after the file is written and can be delayed if there is a large replication queue
DFSR, in my view, is an option to fall back on when none of the other options here will work. It will potentially do the job, but it has some big limitations and adds significantly to management complexity.
Further Reading: DFSR Overview
Third Party Solutions
To round off this list, I wanted to mention that there are several third-party products that can provide SMB services in Azure. I don’t have direct experience with these, but they are worth mentioning. Some examples include SIOS DataKeeper and SoftNAS. These solutions mostly work by providing an additional layer of abstraction on top of Azure Storage and VMs to present what appears to be shared storage to your file servers. This means you can use traditional failover clustering to create your file server cluster.
This obviously has benefits in terms of being able to use a solution you are already familiar with and trained to use. That said, as you might expect, these third-party solutions come at a fairly significant cost. They also add a good amount of complexity to your environment. The solutions will often need to run on their own VMs, and will require multiple VMs to provide redundancy, which adds to the cost and management overhead. And because they are only mimicking shared storage, they still need to handle the replication of data between the nodes using their own replication methods.
As we have discussed, none of these solutions is, unfortunately, perfect. They all have downsides, and for most people the decision on which one to use will come down to which has the fewest downsides for your project. Hopefully for some of you the issues mentioned here won’t be a factor and your choice is easy; others will need to decide which compromises to make. To help with this decision, I have produced a flow chart that works through what we discussed here at a high level. Obviously, the hierarchy here is based on my views on the severity of these downsides, and your view may differ, but hopefully it might help start the conversation. You can also download a copy here.
If you need to host SMB services in Azure, I hope the whistle-stop tour above has given you some ideas of where to look to get started and what to research. A few final suggestions on this topic:
- Future updates may change things. New features are coming to Azure all the time that may remove some of the issues we have discussed, so keep your eyes open.
- Appropriateness will vary by project. Don’t try to pick a solution that works for all your projects; pick the appropriate one for each workload.
- Test! Make sure that the performance and reliability you need is present, especially if you are using a solution in an untested or unique way
- Ensure you understand the limitations and build them into your operations plans