Ever feel like you’re constantly swimming against the tide just trying to keep up-to-date with your tools? Trust me, I’ve been there more times than I care to admit. There’s nothing like discovering a nifty new feature, only to realise it’s been out for years. Or when you casually mention something to a colleague, assuming it’s common knowledge, only to be met with a blank expression.
In this article, we’re diving into a handful of those types of features.
Cross-Object Variable Validation
Variable validation was originally introduced in Terraform 0.13, and is a crucial feature for ensuring that only expected values are provided by users. For instance, if your organisation restricts resource deployments to UK South or UK West, you could implement a simple validation rule in your modules as an initial guardrail:
variable "location" {
type = string
validation {
condition = contains(["uksouth", "ukwest"], var.location)
error_message = "The value of the variable 'location' must be either 'uksouth' or 'ukwest'."
}
}
While Terraform’s built-in functions cover most validation needs, there are times when it feels like you’re trying to fit a square peg into a round hole. Take, for instance, the use case of limiting the number of instances in a Virtual Machine Scale Set based on the environment. Before Terraform 1.9, a validation block could only refer to the variable it belonged to, so you’d end up crafting some convoluted logic in your main configuration.
# variables.tf
variable "environment" {
type = string
validation {
condition = contains(["dev", "prd"], var.environment)
error_message = "The value of the variable 'environment' must be either 'dev' or 'prd'."
}
}
variable "vmss_instance_count" {
type = number
default = 1
}
# main.tf
locals {
vmss_max_by_env = {
dev = 2
prd = 5
}
}
resource "azurerm_linux_virtual_machine_scale_set" "example" {
name = "example-vmss"
instances = (
var.vmss_instance_count <= lookup(local.vmss_max_by_env, var.environment) ?
var.vmss_instance_count :
lookup(local.vmss_max_by_env, var.environment)
)
...
}
In the code above we are looking up the environment in a local map, and then determining whether the maximum is exceeded. If the value is larger than the max, we are then overriding it. A bit clunky, but it works.
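As a hedged aside, the same clamping behaviour can be sketched more compactly with Terraform’s built-in min() function. The local name vmss_instances and the output are hypothetical and not part of the original configuration; the variables and the vmss_max_by_env map are the ones shown above, repeated so the snippet stands alone.

```hcl
variable "environment" {
  type = string
}

variable "vmss_instance_count" {
  type    = number
  default = 1
}

locals {
  vmss_max_by_env = {
    dev = 2
    prd = 5
  }

  # Hypothetical local: the smaller of the requested count and the
  # per-environment maximum, so an oversized request is silently capped.
  vmss_instances = min(var.vmss_instance_count, local.vmss_max_by_env[var.environment])
}

output "vmss_instances" {
  value = local.vmss_instances
}
```

With environment set to "dev" and vmss_instance_count set to 4, this evaluates to 2. It is tidier, but the user’s input is still overridden without any warning.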
Things improved when lifecycle preconditions came along. With preconditions, we could also provide the user with a meaningful error, rather than just automatically overriding their input and confusing them.
resource "azurerm_linux_virtual_machine_scale_set" "example" {
name = "example-vmss"
instances = var.vmss_instance_count
lifecycle {
precondition {
condition = var.vmss_instance_count <= lookup(local.vmss_max_by_env, var.environment)
error_message = "The variable vmss_instance_count must be less than or equal to ${lookup(local.vmss_max_by_env, var.environment)} for ${var.environment}."
}
}
}
Referencing Other Variables
A bit better. But wouldn’t it be nice if we could just validate directly in the variable? Well, as of Terraform version 1.9 we can!
variable "environment" {
type = string
validation {
condition = contains(["dev", "prd"], var.environment)
error_message = "The value of the variable 'environment' must be either 'dev' or 'prd'."
}
}
variable "vmss_instance_count" {
type = number
default = 1
validation {
condition = var.environment == "dev" ? var.vmss_instance_count <= 2 : var.vmss_instance_count <= 5
error_message = "The variable 'vmss_instance_count' must be less than or equal to 2 for 'dev' environments or less than or equal to 5 for 'prd' environments."
}
}
With the introduction of cross-object variable validation, we can now reference other variable values. This not only streamlines our code but also allows us to catch errors earlier in the Terraform workflow.
Referencing Other Object Types
“But wait! It gets better…”
We are not just limited to validating other variable inputs. Validation blocks can now integrate into the Terraform dependency graph, enabling us to reference other types of objects, such as data sources.
A practical application of this is checking that users have provided a valid input for attributes that aren’t validated within the provider. Providers often won’t hardcode these checks for things that change frequently, such as VM sizes or instance types. Let’s look at an example to illustrate this.
provider "aws" {}
data "aws_ec2_instance_types" "current" {
filter {
name = "current-generation"
values = ["true"]
}
}
variable "instance_type" {
description = "EC2 instance type to use (must be current)."
type = string
validation {
condition = contains(data.aws_ec2_instance_types.current.instance_types, var.instance_type)
error_message = "Invalid or non-current instance type provided."
}
}
Let’s make sure the plan goes through when we provide a valid, current instance type.
Great. And when we use an invalid value…
A powerful addition to the Terraform arsenal, I’m sure you’ll agree.
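The same mechanism should also work with locals, which offers one possible way to avoid hardcoding the instance limits in the earlier vmss_instance_count condition. The following is a sketch rather than a drop-in replacement: it assumes Terraform 1.9 or later and reuses the vmss_max_by_env map from earlier in the article. If var.environment is not a key in the map, the index expression is likely to fail with its own error, so the validation on environment still matters.

```hcl
locals {
  vmss_max_by_env = {
    dev = 2
    prd = 5
  }
}

variable "environment" {
  type = string

  validation {
    condition     = contains(["dev", "prd"], var.environment)
    error_message = "The value of the variable 'environment' must be either 'dev' or 'prd'."
  }
}

variable "vmss_instance_count" {
  type    = number
  default = 1

  validation {
    # Terraform 1.9+: the condition refers to a local and to another variable.
    condition     = var.vmss_instance_count <= local.vmss_max_by_env[var.environment]
    error_message = "The variable 'vmss_instance_count' must be less than or equal to ${local.vmss_max_by_env[var.environment]} for '${var.environment}' environments."
  }
}
```

Adding a new environment then only means adding a key to the map (and to the allowed list), rather than growing a chain of conditionals.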
Provider-Defined Functions
A common question that I’m asked by students when delivering Terraform training is
“Can we create our own functions?”
Historically the answer has been “no” - you were only able to utilise the built-in Terraform functions. As of Terraform 1.8, however, HashiCorp has introduced further extensibility with provider-defined functions.
Now before you get too excited, you’re not going to be defining these functions in HCL. You’ll need to go deeper and be able to write a provider if you want to create your own, but already a number of the larger providers (such as Azure and AWS) have started adding functions.
Let’s have a look at the two Azure functions available at the time of writing.
normalise_resource_id
The normalise_resource_id function allows you to take an Azure resource ID and normalise the case-sensitive portions so they meet the requirements of the provider APIs.
locals {
normalised = provider::azurerm::normalise_resource_id("/SUBSCRIPTIONS/00000000-0000-0000-0000-000000000000/resourcegroups/cloud-shell-storage-westeurope/providers/Microsoft.Storage/storageAccounts/dummyaccountname")
}
Ok, it works. I can’t see a huge value with this one, but it proves the concept 😃.
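For anyone wanting to try this themselves, a minimal sketch of the surrounding configuration might look like the following. It assumes Terraform 1.8 or later and the azurerm provider version used elsewhere in this article; the output name normalised_id is hypothetical. Provider-defined functions are called as provider::<provider local name>::<function name>, so the provider should be declared in required_providers.

```hcl
terraform {
  # Provider-defined functions need Terraform 1.8 or later.
  required_version = ">= 1.8.0"

  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "4.3.0"
    }
  }
}

# Hypothetical output so the normalised ID is visible after a plan or apply.
output "normalised_id" {
  value = provider::azurerm::normalise_resource_id("/SUBSCRIPTIONS/00000000-0000-0000-0000-000000000000/resourcegroups/cloud-shell-storage-westeurope/providers/Microsoft.Storage/storageAccounts/dummyaccountname")
}
```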
parse_resource_id
The second function (parse_resource_id) is a little more useful than the first. Imagine you’ve got a particular resource ID hard-coded into your code for some reason. You may need a particular attribute of the ID for a resource. Rather than having to use the split() function to split the string and get to the sections you want as list indexes, you can now parse the values more easily.
locals {
hardcoded = "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/cloud-shell-storage-westeurope/providers/Microsoft.Storage/storageAccounts/dummyaccountname"
parsed = provider::azurerm::parse_resource_id(local.hardcoded)
resource_type = local.parsed.resource_type
resource_provider = local.parsed.resource_provider
}
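For comparison, here is a rough sketch of the split() approach that parse_resource_id saves you from. The local names are hypothetical; the point is that every value has to be picked out by position, which is easy to get wrong and hard to read.

```hcl
locals {
  resource_id = "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/cloud-shell-storage-westeurope/providers/Microsoft.Storage/storageAccounts/dummyaccountname"

  # The ID starts with "/", so split() yields a leading empty string:
  # [0] ""                 [1] "subscriptions"    [2] subscription ID
  # [3] "resourceGroups"   [4] resource group     [5] "providers"
  # [6] resource provider  [7] resource type      [8] resource name
  segments = split("/", local.resource_id)

  split_resource_group    = local.segments[4] # "cloud-shell-storage-westeurope"
  split_resource_provider = local.segments[6] # "Microsoft.Storage"
  split_resource_name     = local.segments[8] # "dummyaccountname"
}
```

This works for a simple ID like this one, but the indexes shift for nested resources, which is where a purpose-built parser starts to earn its keep.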
I imagine as time goes on, we will start to see some more useful functions appearing, and probably some providers that are purely dedicated to functions (if they don’t already exist - I’ve not dug about).
Testing Framework
This subject is an entire article in its own right (and may well be soon!), but let’s touch on it briefly. For the same reasons you’d use automated testing in your software development processes, automated testing of your Terraform code (e.g. modules) can help to ensure your code is working and secure, particularly if it’s integrated into an automated CI/CD pipeline.
Previously HashiCorp had an experimental test command, but it wasn’t great. I looked at it briefly, but stuck with Terratest - a testing framework written in Go. This was fine if you knew a bit of Go, but it wasn’t for everyone.
Version 1.6 of Terraform saw the release of the revamped testing framework, which is much nicer to work with. It consists of a number of .tftest.hcl (or .tftest.json) test files, each containing a few blocks:
- variables {} - Any global variables you want to apply across all run blocks (unless overridden).
- provider {} - Any global provider configuration you want to apply across all run blocks (unless overridden).
- run {} - Specifies a number of things, such as:
  - The type of Terraform command you’re going to run.
  - Any overrides for provider/variable configuration.
  - Any assert blocks you may wish to implement (more on that shortly).
Note: unlike blocks in other Terraform configuration files, run blocks are executed in the order they appear in the test file.
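To make that structure concrete, here is a rough skeleton of a test file. It is only a sketch: the file name, run block names and assertions are hypothetical, and it assumes a module under test that declares a location variable and an azurerm_resource_group named this.

```hcl
# example.tftest.hcl (hypothetical file name)

# Global variables, applied to every run block unless overridden.
variables {
  location = "uksouth"
}

# Global provider configuration, applied to every run block.
provider "azurerm" {
  subscription_id = "00000000-0000-0000-0000-000000000000"
  features {}
}

# Runs first: uses the global variables as-is.
run "default_location" {
  command = plan

  assert {
    condition     = azurerm_resource_group.this.location == "uksouth"
    error_message = "Expected the resource group to be planned in uksouth."
  }
}

# Runs second: overrides the global variable for this run only.
run "override_location" {
  command = plan

  variables {
    location = "ukwest"
  }

  assert {
    condition     = azurerm_resource_group.this.location == "ukwest"
    error_message = "Expected the resource group to be planned in ukwest."
  }
}
```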
Simple Variable Tests
Let’s start with a simple example. Say you have some variable validation to ensure that a location variable is either ‘uksouth’ or ‘ukwest’. We can write a test to ensure this validation is working. That way if anyone changes this in the future, our tests will catch it.
# main.tf
terraform {
required_providers {
azurerm = {
source = "hashicorp/azurerm"
version = "4.3.0"
}
}
}
provider "azurerm" {
features {}
subscription_id = "00000000-0000-0000-0000-000000000000"
}
variable "location" {
type = string
validation {
condition = contains(["uksouth", "ukwest"], var.location)
error_message = "The value of the variable 'location' must be either 'uksouth' or 'ukwest'."
}
}
resource "azurerm_resource_group" "this" {
name = "rg-test-demo"
location = var.location
}
resource "azurerm_storage_account" "this" {
name = "stgaccttestdemo"
resource_group_name = azurerm_resource_group.this.name
location = azurerm_resource_group.this.location
account_tier = "Standard"
account_replication_type = "LRS"
}
# location.tftest.hcl
provider "azurerm" {
subscription_id = "00000000-0000-0000-0000-000000000000"
features {}
}
run "uksouth_location_allowed" {
command = plan
variables {
location = "uksouth"
}
}
run "ukwest_location_allowed" {
command = plan
variables {
location = "ukwest"
}
}
run "westeurope_location_denied" {
command = plan
variables {
location = "westeurope"
}
expect_failures = [var.location]
}
Notice here we have an expect_failures argument. We are intentionally passing in a value that should fail, as it’s important to check that things that shouldn’t work, don’t work. Running terraform test provides us the following output:
Everything passed as expected. What if we “accidentally” introduce a typo into our uksouth validation condition (we shall change it to ‘uksout’)…
variable "location" {
type = string
validation {
condition = contains(["uksout", "ukwest"], var.location)
error_message = "The value of the variable 'location' must be either 'uksouth' or 'ukwest'."
}
}
When we run terraform test again, our tests fail because the value ‘uksouth’ is not permitted due to our typo of ‘uksout’ in our validation.
What if we remove the validation block entirely?
Even though we removed the variable validation block, our tests still fail. This is because we’ve told Terraform that providing ‘westeurope’ SHOULD fail. When it doesn’t fail (due to the lack of validation), our tests do not pass.
Testing With Real Resources
So far none of our tests actually built any real resources, so let’s change that. Let’s build a quick and dirty Linux web server with a hello world webpage configured. Once built, let’s test that it is reachable. First, the VM (etc.) configuration (yes I know… I hardcoded a crap password 😉):
# main.tf
terraform {
required_providers {
azurerm = {
source = "hashicorp/azurerm"
version = "4.3.0"
}
}
}
provider "azurerm" {
subscription_id = "00000000-0000-0000-0000-000000000000"
features {}
}
provider "http" {}
resource "azurerm_resource_group" "this" {
name = "rg-test-demo"
location = "uksouth"
}
resource "azurerm_virtual_network" "this" {
name = "vnet-demo"
address_space = ["10.0.0.0/16"]
location = azurerm_resource_group.this.location
resource_group_name = azurerm_resource_group.this.name
}
resource "azurerm_subnet" "this" {
name = "subnet-demo"
resource_group_name = azurerm_resource_group.this.name
virtual_network_name = azurerm_virtual_network.this.name
address_prefixes = ["10.0.1.0/24"]
}
resource "azurerm_public_ip" "this" {
name = "public-ip-demo"
location = azurerm_resource_group.this.location
resource_group_name = azurerm_resource_group.this.name
allocation_method = "Static"
}
resource "azurerm_network_interface" "this" {
name = "nic-demo"
location = azurerm_resource_group.this.location
resource_group_name = azurerm_resource_group.this.name
ip_configuration {
name = "ipconfig-demo"
subnet_id = azurerm_subnet.this.id
private_ip_address_allocation = "Dynamic"
public_ip_address_id = azurerm_public_ip.this.id
}
}
resource "azurerm_network_security_group" "this" {
name = "nsg-demo"
location = azurerm_resource_group.this.location
resource_group_name = azurerm_resource_group.this.name
security_rule {
name = "allow_ssh"
priority = 1000
direction = "Inbound"
access = "Allow"
protocol = "Tcp"
source_port_range = "*"
destination_port_range = 22
source_address_prefix = "*"
destination_address_prefix = "*"
}
security_rule {
name = "allow_web"
priority = 1100
direction = "Inbound"
access = "Allow"
protocol = "Tcp"
source_port_range = "*"
destination_port_range = 80
source_address_prefix = "*"
destination_address_prefix = "*"
}
}
resource "azurerm_network_interface_security_group_association" "this" {
network_interface_id = azurerm_network_interface.this.id
network_security_group_id = azurerm_network_security_group.this.id
}
resource "azurerm_linux_virtual_machine" "this" {
name = "linux-vm-demo"
resource_group_name = azurerm_resource_group.this.name
location = azurerm_resource_group.this.location
size = "Standard_B1s"
admin_username = "adminuser"
admin_password = "P@ssw0rd1234!"
disable_password_authentication = false
custom_data = base64encode(<<-EOF
#!/bin/bash
sudo apt update
sudo apt install -y apache2
echo '<h1>Welcome to the web server!</h1>' | sudo tee /var/www/html/index.html
sudo systemctl restart apache2
sudo systemctl enable apache2
EOF
)
network_interface_ids = [
azurerm_network_interface.this.id,
]
os_disk {
storage_account_type = "Standard_LRS"
caching = "ReadWrite"
}
source_image_reference {
publisher = "Canonical"
offer = "0001-com-ubuntu-server-jammy"
sku = "22_04-lts-gen2"
version = "latest"
}
connection {
type = "ssh"
user = self.admin_username
password = self.admin_password
host = self.public_ip_address
}
provisioner "remote-exec" {
inline = [
"while [ ! -f /var/lib/cloud/instance/boot-finished ]; do sleep 5; done",
"echo 'Cloud init process has finished.'"
]
}
}
data "http" "website" {
url = "http://${azurerm_linux_virtual_machine.this.public_ip_address}"
}
And our test file…
provider "azurerm" {
subscription_id = "00000000-0000-0000-0000-000000000000"
features {}
}
run "website_reachable" {
command = apply
assert {
condition = data.http.website.status_code == 200
error_message = "The website did not return a 200 OK."
}
}
This time when we run our test, it takes a lot longer. This is because Terraform is actually running an apply operation in the background. After a few minutes, we can see our test was successful.
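A 200 status code only tells us that something answered. As a possible extension, the run block could also check the page content. This sketch assumes the data "http" "website" block from main.tf above, the response_body attribute exposed by the hashicorp/http data source, and the built-in strcontains() function (Terraform 1.5 or later); the run block name is hypothetical.

```hcl
run "website_serves_welcome_page" {
  command = apply

  assert {
    condition     = data.http.website.status_code == 200
    error_message = "The website did not return a 200 OK."
  }

  assert {
    # Checks for the heading written by the custom_data script in main.tf.
    condition     = strcontains(data.http.website.response_body, "Welcome to the web server!")
    error_message = "The website responded, but the welcome page content was not found."
  }
}
```

A run block can contain several assert blocks, so related checks against the same apply can live together rather than triggering a separate build each.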
Let’s remove the port 80 rule from the Network Security Group so that the website is unreachable and then retest…
resource "azurerm_network_security_group" "this" {
name = "nsg-demo"
location = azurerm_resource_group.this.location
resource_group_name = azurerm_resource_group.this.name
security_rule {
name = "allow_ssh"
priority = 1000
direction = "Inbound"
access = "Allow"
protocol = "Tcp"
source_port_range = "*"
destination_port_range = 22
source_address_prefix = "*"
destination_address_prefix = "*"
}
}
Conclusion
These are just a handful of the features available, yet they’re seldom used in the environments I encounter. Why not give them a try and level up your IaC?
Hopefully you found this useful. If there are any subjects you’d love to see an article on, drop me a note at email@mikeguy.co.uk.