In part one of this miniseries, I introduced you to the concept of Infrastructure as Code (IaC) and explained some of its benefits.
In summary, IaC brings many of the most useful software development work practices to the task of systems administration: knowledge sharing, peer review, and version control, to name a few. It also unlocks a world of possibilities for your Continuous Integration (CI) and Continuous Delivery (CD) pipeline.
As I explained in part one, there are two kinds of IaC tools: orchestration tools and configuration management tools.
In this post, let's focus on HashiCorp's Terraform, a fantastic and versatile orchestration tool written in Go, which we use at Crate.io to manage our infrastructure.
Using Terraform, infrastructure is defined using templates written in a Domain Specific Language (DSL). You can use a friendly command-line tool to interact with Terraform and to test or apply these templates. Infrastructure is then created and managed using the provider (or providers) of your choice.
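A typical session with that command-line tool looks something like this (a minimal sketch; templates, plans, and providers are all covered in more detail below):
$ terraform init    # download the provider plugins your templates need
$ terraform plan    # preview the changes Terraform would make
$ terraform apply   # make those changes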
Terraform treats infrastructure as a modular collection of immutable components.
When "applying" a new configuration, Terraform will survey the running components and roll back any manual changes that may have been made, bringing your system to a known state before applying the changes you want to make.
This disincentivizes manual changes that can lead to configuration drift, making the overall system more predictable, maintainable, and easier to reproduce.
We love using Terraform to define our infrastructure collaboratively using GitHub and then being able to "apply" it to any environment confident in the knowledge that we are moving from one known state to another.
In this post, I will tell you more about Terraform and provide examples from our own setup at Crate.io.
Terraform allows you to deploy connected infrastructure components across a wide range of different providers.
A provider can be anything in the realm of Infrastructure as a Service (IaaS), Platform as a Service (PaaS), or Software as a Service (SaaS). This includes Microsoft Azure, Google Cloud, Amazon Web Services, GitHub, and much, much more.
Using configuration files called templates, you can write code which leverages the tools and resources that are specific to a particular provider.
Terraform is able to do this using provider plugins that are written specifically for each provider’s Application Programming Interface (API).
Plugins are responsible for taking Terraform actions and translating these into API calls that create, read, update, or delete the provider resources (aka CRUD).
Here's an example template file:
provider "aws" {
region = "us-east-1"
version = "~> 1.2"
}
resource "aws_s3_bucket" "my_bucket" {
bucket = "my-bucket-us-east-1"
region = "us-east-1"
acl = "private"
}
Here, we identify and configure AWS as a provider, and then specify an AWS S3 bucket resource. (A bucket is an AWS-specific container for data storage.)
Terraform resources (in this instance, aws_s3_bucket) are the components of your infrastructure, and they always belong to a specific provider.
With a file like this, you can ask Terraform to perform a dry run of the actions it would take to create the resource.
For example, if you ran this command:
$ terraform plan
Terraform might then tell you something like this:
Terraform will perform the following actions:

  + aws_s3_bucket.my_bucket
      id:                          <computed>
      acceleration_status:         <computed>
      acl:                         "private"
      arn:                         <computed>
      bucket:                      "my-bucket-us-east-1"
      bucket_domain_name:          <computed>
      bucket_regional_domain_name: <computed>
      force_destroy:               "false"
      hosted_zone_id:              <computed>
      region:                      "us-east-1"
      request_payer:               <computed>
      versioning.#:                <computed>
      website_domain:              <computed>
      website_endpoint:            <computed>
Phew. That's a lot.
But what's going on here?
Terraform is essentially presenting the difference between two states: the system state you have said you want and the actual state of the running system.
This state difference is presented as the list of changes needed to move from one state to the other.
Because Terraform treats your infrastructure as a collection of immutable components, changes to a resource may result in it being deleted and recreated, although some attributes can be updated in place, depending on the resource. If Terraform is going to recreate a resource, its entry in the plan will be prefixed with -/+ to indicate that the resource will be destroyed and recreated.
Applying these changes is as simple as running this command:
$ terraform apply
Alternatively, you can output the plan to a file. Plan files can then be saved, versioned, and used as input for execution at any time.
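For example, a minimal sketch of that workflow (the plan file name is just an illustration):
$ terraform plan -out=my-plan.tfplan    # write the plan to a file instead of just displaying it
$ terraform apply my-plan.tfplan        # execute exactly the actions recorded in that plan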
Terraform keeps track of resource states using a state file.
A state file is in JSON format and is automatically created the first time you run the apply command.
Every time you issue an apply command, Terraform refreshes the state file, plans what actions are needed based on the refreshed state file, and then performs those actions. For this reason, special care needs to be taken so that no changes are made to the running system in the short window between refreshing the state and performing a change.
If you're working in a team, you could keep the state file in a shared repository and take care to coordinate rollouts so that nobody is making concurrent changes. But that is a potentially risky business, and fortunately there is a better alternative.
Terraform supports remote state backends that provide locking, such as Azure Storage or AWS S3 (paired with a DynamoDB table).
Using one of these remote backends, the state of a running system can be changed atomically. That is, when you run the apply command, either you secure a lock on the state file (ensuring that nobody else can make changes for the duration of the command) or you don't, and the command fails (because someone else is making changes).
Here's an example configuration that uses AWS S3 as a remote, lockable backend:
terraform {
  backend "s3" {
    bucket         = "my-terraform-state-bucket"
    key            = "terraform.tfstate"
    encrypt        = true
    region         = "eu-west-1"
    dynamodb_table = "terraform-state-lock"
  }
}
Under the hood, the S3 backend uses a DynamoDB table for locking.
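If you manage the lock table with Terraform as well, its definition might look something like this. This is a minimal sketch: the capacity values are arbitrary, but the table name matches the dynamodb_table setting above, and the S3 backend expects a string partition key named LockID.
resource "aws_dynamodb_table" "terraform_state_lock" {
  name           = "terraform-state-lock"  # must match dynamodb_table in the backend block
  hash_key       = "LockID"                # the key name the S3 backend looks for
  read_capacity  = 1
  write_capacity = 1

  attribute {
    name = "LockID"
    type = "S"
  }
}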
Terraform makes use of graph theory.
Don't worry. You don't need to know anything about graph theory to use Terraform. But it is a useful way to think about what Terraform is doing.
Graph theory is used by Terraform when creating a new plan and refreshing state. Essentially, state changes are modeled as a dependency graph, and a plan is a way of traversing the graph so that every node is visited.
Terraform graphs are composed of three different types of node:
A resource node represents a single resource, e.g., an Azure Virtual Machine or an AWS S3 bucket.
A provider configuration node represents a single configured resource provider, e.g., an Azure account or an AWS account. All resource nodes belong to a provider configuration node.
Resource meta-nodes group together multiple resource nodes of the same type. This is done because it makes the resulting graphs more understandable for human beings but is not strictly necessary to solve the graph.
Terraform has even made it possible to visualize the dependency graph for your infrastructure, like so:
$ terraform graph
This command outputs the dependency graph in DOT format, which, rendered as an image, looks something like this:
You can then use the dependency graph to understand your existing infrastructure or the changes Terraform is planning to make.
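To turn the DOT output into an image, you can pipe it through Graphviz (assuming you have the dot tool installed):
$ terraform graph | dot -Tpng > graph.png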
If you want to know more about this, HashiCorp's Paul Hinze presented a great talk about how Terraform uses graph theory.
Terraform supports modular configuration files. This reduces the amount of duplicate code and enables you to keep your infrastructure DRY (Don't Repeat Yourself).
A module is just a collection of resources. Any directory containing Terraform configuration files can be considered a module.
It's a good idea to make use of modules if you have similar setups being used in different environments. It does require a little more up-front planning, but it massively helps the maintainability of your system as things expand and your team grows.
One example where modules are useful is if you're building a deployment pipeline, where the code is promoted from one environment to another.
Another example is if you're building a high availability system that duplicates the same configuration in multiple locations, for instance with Azure regions.
Terraform specifies a standard structure for modules, and you should follow this structure if you want to get the most benefit out of them.
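In practice, that structure is a directory per module containing a handful of conventionally named files, something like this (the module name is just an example):
modules/storage/
├── main.tf       # the resources the module manages
├── variables.tf  # the module's input variables
└── outputs.tf    # values the module exposes to its caller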
Let's take a look at an example.
Suppose we wanted to configure some Azure infrastructure.
You might configure your storage with a file named modules/storage/main.tf that contains the following:
resource "azurerm_managed_disk" "my-storage" {
name = "my-storage disk"
location = "${var.azure_region_name}"
resource_group_name = "${var.rg_name}"
storage_account_type = "Premium_LRS"
create_option = "Empty"
disk_size_gb = 32
count = "1"
}
In a separate file named modules/storage/variables.tf, you might define azure_region_name and rg_name as variables like so:
variable "azure_region_name" {
description = "Azure region in which to create the resource"
}
variable "rg_name" {
description = "Azure Resource Group to create resources"
}
Note here that by defining them as variables, we're able to document their use.
This module, comprising two files, can then be invoked from a file in a separate directory. Let's call it dev/main.tf. In this file, you provide the module source directory and specify values for the required variables. For example:
module "dev" {
source = "../modules/storage"
azure_region_name = "east-us-2"
rg_name = "storage-dev"
}
You might then create a staging/main.tf file and a production/main.tf file, for staging and production respectively.
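For example, a staging/main.tf might reuse the same module with its own values (the region and resource group name here are purely illustrative):
module "staging" {
  source            = "../modules/storage"
  azure_region_name = "east-us"
  rg_name           = "storage-staging"
}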
I mentioned that Terraform can be used to enhance your deployment pipelines, but what does this actually involve?
Well, let's imagine we want a deployment pipeline that starts with continuous integration tests, run inside a test environment that is created from scratch every time and destroyed afterward. If the tests pass, the code can be deployed to the testing environment.
The code can then be promoted through to the staging environment, and later on to production, based on whatever triggers you define.
Such a process might look like this:
Let's consider another example.
Say you've made some changes to your application on a separate branch. You want to be able to run the deployment pipeline to build the application as defined on your branch, thereby triggering a deployment process to a development environment.
You could use Packer to build an Amazon Machine Image (AMI) of your application using the code on your branch. When this is complete, Terraform updates your development environment with the latest version of the infrastructure necessary for this application. Finally, the AMI is deployed to the updated development environment.
That might look like this:
Using this approach, you can ensure that your development environment is always running the latest version of your infrastructure configuration.
Similarly, if each of your deployment pipelines involves a step to update your infrastructure, updates can automatically be incorporated.
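As a sketch of what such a step might look like, the pipeline could pass the AMI that Packer just built into Terraform as a variable. The variable name ami_id and the aws_instance resource here are assumptions for illustration, not a description of our actual setup:
variable "ami_id" {
  description = "AMI built by Packer for the branch being deployed"
}

resource "aws_instance" "app" {
  ami           = "${var.ami_id}"  # use whichever image the pipeline passes in
  instance_type = "t2.micro"
}
The pipeline step would then run something like this to roll the new image out:
$ terraform apply -var "ami_id=ami-0abc1234"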
Furthermore, your infrastructure itself can go through a promotion process, similar to your application.
If changes are applied to your dev infrastructure successfully, they can then be successively applied to your subsequent environments, reducing the chances of a bad change propagating to multiple environments.
Terraform is an excellent tool for managing your infrastructure as code. It treats your infrastructure as a collection of immutable components that can be controlled in a way that is automated, predictable, and repeatable.
Using Terraform, infrastructure configuration can be modularized and versioned, bringing many of the best things about modern software development to the domain of systems administration.
These benefits have been tremendously useful for us at Crate.io and have allowed us to collaborate effectively as a distributed, multidisciplinary team. And this has been especially important for our work on our hosted CrateDB offering.
In part three of this miniseries, I will take a look at SaltStack, which is a configuration management tool we use alongside Terraform.