Building an Infrastructure Pipeline Part 1: Version Control
Defining Infrastructure as Code is becoming prevalent in all areas of IT, but nowhere more so than in the cloud, be it through Azure Resource Manager templates, AWS CloudFormation or third-party tools like Terraform. Once you’ve gone down this route, you can treat your infrastructure code like any other code, and because it’s just code you can now leverage some of the tools developers have been using for years in their software lifecycle. In this series of posts we are going to take a look at how these developer tools can be used to support infrastructure in this new code-based world. We’ll be covering:
- Source Control
- Testing
- Build and Package Management
- Release Management
Using these developer tools we can move from just writing some templates to deploy our VMs to a fully managed pipeline that supports a robust approach to delivering production infrastructure.
In these articles we’re going to focus on Azure and utilise Visual Studio Team Services as our development platform, but there is no reason why these methods can’t be applied to other Cloud providers and other Continuous Integration platforms.
So, let’s get started and take a look at Source Control for Infrastructure!
Source Control for Infrastructure
Source control has been around for decades. Ever since more than one developer needed to work on a codebase, there has been a need to manage a central code repository and co-ordinate changes between the team. There are many different solutions for version control, and any of these can be used for storing your infrastructure code, but we are going to focus on using Git, as it’s the most popular choice today.
Before we look at how to implement source control, we need to look at why. What benefits do we actually get from using version control, and why wouldn’t we just stick all of our code in a file share and be done with it?
- Central repository – The first thing that version control gives you is a central repository for your code. This makes it easy for all your team to access and for things like backups to work with. It also makes it much easier for a new team member to get all they need to get started from one location. But as we said, we can achieve this with a file share.
- Concurrent access – The first big win with version control is the ability for each team member to work on the files without conflicting with other team members. There’s no need to lock a file or take local copies manually. Version control will let your team work concurrently on the same files and handle merging these changes together when you finish.
- Versioning – Each change made to your infrastructure code will be versioned, so if you need to you can see who made a change (or who broke things!) and even roll back a change if required.
- Audit – All changes are logged and recorded so you can see what was changed, who did it, and when.
- Tagging – You may need not only to see versions, but also to locate the specific version of your code that was used for a specific environment. Tagging allows you to mark a specific version so you can easily come back and use it later.
- Branching – One step on from versioning, you can specifically retain multiple different active versions of your infrastructure code. This can be really useful for applying different versions of infrastructure code to different environments and so on. We’ll discuss this in more detail later in this article.
Finally, having a version control repository is really the bedrock that all of the next parts in the pipeline are going to be based on and where they will pull their data from. So even if none of the above appeals to you, if you’re interested in the next stages you will want to get yourself a version control repository.
Implementing Version Control
We’re clear on the benefits, so how do we implement it? Firstly, I’m not going to be able to give you a lesson on how to use Git or any other version control tool in this article, so if you want to get familiar with Git I would take a look at this great beginner’s guide.
We need to determine where you are going to put your source control repository. As I mentioned, in this example we’re going to use VSTS as our provider for all of our CI services, so we will make use of the Git provider in VSTS; however, the process should be similar for any Git provider. We will also be replicating our setup in GitHub so you can look at a live repository. GitHub is a great solution for open source or publicly accessible projects, but if you’re going to be storing confidential or sensitive information then you’d want to use either a private GitHub repo or another provider.
Once you’ve created your repository you want to clone it to your local machine, which gives you a copy of the repository to work on. This can be done using your client of choice. I tend to use SourceTree as I like the visual representation of the Git tree, but you can use any Git client you like. Often your repository provider will populate the repo with a default readme file, so when you check out the repository for the first time you may see this in the root of your folder.
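If you would rather script the clone than use a graphical client, a minimal sketch in Python might look like the following. The URL and folder name are hypothetical placeholders, not values from this series; your Git provider will show the real clone URL for your repository.

```python
import subprocess
from pathlib import Path


def clone_command(remote_url: str, destination: Path) -> list[str]:
    """Build the argument list for cloning a remote repository locally."""
    return ["git", "clone", remote_url, str(destination)]


def clone(remote_url: str, destination: Path) -> None:
    """Run the clone; raises CalledProcessError if git reports a failure."""
    subprocess.run(clone_command(remote_url, destination), check=True)


if __name__ == "__main__":
    # Hypothetical URL and folder, printed as a dry run only. Replace both
    # with your own values before calling clone().
    url = "https://example.com/your-team/infrastructure.git"
    print(" ".join(clone_command(url, Path("infrastructure"))))
```

Keeping the command builder separate from the function that runs it makes it easy to check what will be executed before anything touches your machine.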
Folder Structure
Now you have a blank repository you’re ready to start setting up your content. If you’re using a configuration management solution then you may have a fixed layout for your data, but if not then you can organise this any way you like. I tend to follow this layout when I create an infrastructure repository:
- Templates – This contains my ARM templates
- Parameters – Parameter files for use with ARM templates
- DSC – PowerShell Desired State Configuration Files
- Scripts – Any ancillary scripts I might use as part of my deployment process
- Readme.md – Generally I will use the readme to document the purpose of the repository and what the included files are for
You can see an example of this repository over on GitHub.
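The layout above could be scaffolded with a short script. This is only a sketch: the `.gitkeep` placeholder files are a common convention rather than a Git feature (Git does not track empty folders), and the readme text is an assumption you would replace with your own.

```python
import tempfile
from pathlib import Path

# Folder names taken from the layout described above.
FOLDERS = ["Templates", "Parameters", "DSC", "Scripts"]


def scaffold(repo_root: Path) -> list[Path]:
    """Create the layout under repo_root and return the paths it manages."""
    paths = []
    for name in FOLDERS:
        folder = repo_root / name
        folder.mkdir(parents=True, exist_ok=True)
        # Placeholder file so the otherwise empty folder can be committed.
        (folder / ".gitkeep").touch()
        paths.append(folder)
    readme = repo_root / "Readme.md"
    if not readme.exists():
        # Keep any readme the repository provider already generated.
        readme.write_text("# Infrastructure\n\nPurpose of this repository and its files.\n")
    paths.append(readme)
    return paths


if __name__ == "__main__":
    # Demonstrate against a throwaway folder rather than a real repository.
    with tempfile.TemporaryDirectory() as tmp:
        for path in scaffold(Path(tmp)):
            print(path.name)
```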
Editing Files
The next part is easy: you just need to work on your infrastructure as you would normally. Edit existing files, add new ones, delete old ones; Git will be keeping track of all the work you are doing. Once you reach a point where you are ready to commit some work to the repository, you can go ahead and use your Git client of choice to add and commit the data (again, see the Git beginner’s guide if this does not make sense). Once you do that, your data is committed to the local copy of the repository on your machine, so you can see your version history, roll back and so on. However, it hasn’t reached the central server (or remote) yet, so your team can’t see it.
The final step then is to push your changes (again using your Git client) to your central server, at which point they become available to everyone. If someone else has pushed changes in the meantime, Git will reject your push until you have pulled their changes and merged them with yours, and it is at this point that you will need to resolve any conflicts.
Once you have done all this you are generally going to want to pull from the remote so that you have all the latest changes from others as well (or you can do push and pull in one go with the sync command offered by some Git clients), and then you’re ready to go again working on your local copy of the repository.
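For readers who prefer the command line to a graphical client, the add, commit, pull and push cycle can be sketched as below. The remote name `origin` and branch name `master` are assumed defaults, and the commit message is illustrative. The sketch pulls before it pushes, since Git rejects a push when the remote holds commits you do not yet have.

```python
import subprocess
from pathlib import Path


def commit_cycle(message: str, remote: str = "origin", branch: str = "master") -> list[list[str]]:
    """Return the git commands for one stage, commit, pull and push round trip."""
    return [
        ["git", "add", "--all"],           # stage new, changed and deleted files
        ["git", "commit", "-m", message],  # record them in the local repository
        ["git", "pull", remote, branch],   # fetch and merge the team's changes
        ["git", "push", remote, branch],   # publish the result to the remote
    ]


def run_all(commands: list[list[str]], repo: Path) -> None:
    """Run each command inside repo, stopping at the first failure.

    A merge conflict during the pull, or a commit with nothing staged,
    makes git exit with an error and raises CalledProcessError here.
    """
    for argv in commands:
        subprocess.run(argv, cwd=repo, check=True)


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for argv in commit_cycle("Add storage account template"):
        print(" ".join(argv))
```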
Repositories
One thing to remember, particularly with Git, is that repositories are cheap. Generally I will focus a repository around a particular project or set of infrastructure and create a new one when I move on to the next thing, rather than trying to keep all of my configurations in a single repository. This seems to make management of resources much easier.
One thing I do have is a “resources” repository where I store generic or reusable components that might be usable in multiple sets of infrastructure. I then include this as a Git submodule in any repository where I want to use it. That way I can re-use code and keep these common files in a single location, which makes changing them much easier.
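As a rough sketch, wiring a shared repository in as a submodule comes down to the commands below. The URL is a hypothetical placeholder and the `resources` folder name is simply borrowed from the description above.

```python
import subprocess
from pathlib import Path


def add_submodule_commands(resources_url: str, path: str = "resources") -> list[list[str]]:
    """Commands to attach a shared repository as a submodule and record it."""
    return [
        # Clones the shared repository into `path` and stages .gitmodules.
        ["git", "submodule", "add", resources_url, path],
        ["git", "commit", "-m", f"Add {path} submodule"],
    ]


def init_submodules_command() -> list[str]:
    """Command a team member runs after cloning to fetch submodule content."""
    return ["git", "submodule", "update", "--init", "--recursive"]


def run_all(commands: list[list[str]], repo: Path) -> None:
    """Run each command inside repo, stopping at the first failure."""
    for argv in commands:
        subprocess.run(argv, cwd=repo, check=True)


if __name__ == "__main__":
    # Dry run with a hypothetical URL: print rather than execute.
    url = "https://example.com/your-team/resources.git"
    for argv in add_submodule_commands(url) + [init_submodules_command()]:
        print(" ".join(argv))
```

Note that a submodule pins a specific commit of the shared repository, so picking up later changes to the common files is a deliberate update in each repository that uses them.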
Branching
Git source control has the concept of branching, and many other solutions have a similar concept. This is where you can explicitly branch off from the main trunk of your code to create two parallel versions of your code. This is especially useful where you are working on a new version of your infrastructure, but you need to always ensure you have a version of code that is working, tested and can always be deployed. By default a new Git repository starts with a “master” branch (some providers now name it “main”). This is the main branch you get out of the box, and if you’re not going to use branching it will be the one you always work in. In our scenario we will keep the master branch as our clean, production-ready branch and we will create a second branch to work on. I tend to follow GitFlow for how I arrange my branches, so following that principle we will create a branch called “develop”, which will be where we work on new code. If you’re interested in this approach you can read up on GitFlow and start using feature and release branches as well.
Once we’ve done all the work we need to do in our develop branch we might deem it now ready to become part of our production workflow, so we want to get it into our master branch. To do this we can merge the code from develop into master. This brings the master branch in line with develop, so that at the point of the merge the master branch contains all the changes we made in develop.
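Creating the develop branch and later merging it back could be sketched as follows. The `origin` remote name is an assumed default, and the `--no-ff` flag (which always records a merge commit, so the merge stays visible in the history) is a choice commonly paired with GitFlow rather than something Git requires.

```python
import subprocess
from pathlib import Path


def start_develop_commands(remote: str = "origin") -> list[list[str]]:
    """Commands to create the develop branch and publish it to the remote."""
    return [
        ["git", "checkout", "-b", "develop"],      # create and switch to develop
        ["git", "push", "-u", remote, "develop"],  # publish it and track the remote branch
    ]


def merge_commands(source: str = "develop", target: str = "master",
                   remote: str = "origin") -> list[list[str]]:
    """Commands to merge source into target and publish the result."""
    return [
        ["git", "checkout", target],
        ["git", "pull", remote, target],      # make sure target is up to date first
        ["git", "merge", "--no-ff", source],  # always record a merge commit
        ["git", "push", remote, target],
    ]


def run_all(commands: list[list[str]], repo: Path) -> None:
    """Run each command inside repo, stopping at the first failure."""
    for argv in commands:
        subprocess.run(argv, cwd=repo, check=True)


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for argv in start_develop_commands() + merge_commands():
        print(" ".join(argv))
```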
In the example timeline below you can see the master branch in blue and the divergence and merging of the develop branch in red and green.
Another great use of branching is environments. Let’s say you have development, test and production environments. You will generally want to work on your code changes first in development then, when you’re happy, move them to test and then production. If we make our master branch represent production, then we can create dev and test branches and move code between them, and so between environments, using merging. Using this method you can easily progress changes through environments.
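One way to picture this promotion path is as an ordered list of branches, where promoting a change means merging a branch into the next one along. The sketch below assumes the branch names `dev`, `test` and `master` from the paragraph above and an `origin` remote. It only builds the commands, so you can review them before running anything.

```python
# Branches in promotion order; master represents production.
ENVIRONMENT_BRANCHES = ("dev", "test", "master")


def next_environment(source: str, branches: tuple[str, ...] = ENVIRONMENT_BRANCHES) -> str:
    """Return the branch that changes on `source` are promoted into."""
    index = branches.index(source)  # raises ValueError for an unknown branch
    if index == len(branches) - 1:
        raise ValueError(f"{source} is the last environment; nothing to promote to")
    return branches[index + 1]


def promotion_commands(source: str, remote: str = "origin") -> list[list[str]]:
    """Commands to merge `source` into the next environment branch."""
    target = next_environment(source)
    return [
        ["git", "checkout", target],
        ["git", "merge", "--no-ff", source],  # keep a visible record of each promotion
        ["git", "push", remote, target],
    ]


if __name__ == "__main__":
    # Dry run: show what promoting dev to test, then test to master, involves.
    for branch in ("dev", "test"):
        for argv in promotion_commands(branch):
            print(" ".join(argv))
```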
Summary
Hopefully this has given you a good summary of why version control is an important feature when working with infrastructure as code, and over the next few articles you will see even more why it is the bedrock of our pipeline. There are lots more advanced things you can do with version control systems that could make your process even more slick. If these interest you I would very much recommend looking at some more advanced tutorials or courses on sites like Pluralsight.
In the next article we will be taking our version controlled code and looking at how we can test this.