Ever feel like you’re constantly swimming against the tide just trying to keep up-to-date with your tools? Trust me, I’ve been there more times than I care to admit. There’s nothing like discovering a nifty new feature, only to realise it’s been out for years. Or when you casually mention something to a colleague, assuming it’s common knowledge, only to be met with a blank expression.

In this article, we’re diving into a handful of those types of features.

Cross-Object Variable Validation

Variable validation was originally introduced in Terraform 0.13, and is a crucial feature for ensuring that only expected values are provided by users. For instance, if your organisation restricts resource deployments to UK South or UK West, you could implement a simple validation rule in your modules as an initial guardrail:

variable "location" {
  type = string
  validation {
    condition     = contains(["uksouth", "ukwest"], var.location)
    error_message = "The value of the variable 'location' must be either 'uksouth' or 'ukwest'."
  }
}

While Terraform’s built-in functions cover most validation needs, there are times when it feels like you’re trying to fit a square peg into a round hole. Take, for instance, the use-case of limiting the number of instances in a Virtual Machine Scale Set based on the environment. Since variables can only validate their own values, you’ll end up crafting some convoluted logic in your main configuration.

# variables.tf

variable "environment" {
  type = string 
  validation {
    condition     = contains(["dev", "prd"], var.environment)
    error_message = "The value of the variable 'environment' must be either 'dev' or 'prd'."
  }
}

variable "vmss_instance_count" {
  type    = number
  default = 1
}

# main.tf

locals {
  vmss_max_by_env = {
    dev = 2
    prd = 5
  }
}

resource "azurerm_linux_virtual_machine_scale_set" "example" {
  name      = "example-vmss"
  instances = (
    var.vmss_instance_count <= lookup(local.vmss_max_by_env, var.environment) ?
    var.vmss_instance_count :
    lookup(local.vmss_max_by_env, var.environment)
  )
  ...
}

In the code above we are looking up the environment in a local map, and then determining whether the maximum is exceeded. If the value is larger than the max, we are then overriding it. A bit clunky, but it works.
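
If silently capping the value really is what you want, the same clamp can be written more compactly. Here is a minimal sketch, assuming the same variables and local map as above, that uses Terraform's built-in min() function. The local name vmss_instances is my own invention for illustration.

```hcl
locals {
  vmss_max_by_env = {
    dev = 2
    prd = 5
  }

  # min() returns the smaller of the two numbers, so the requested count
  # can never exceed the maximum for the chosen environment.
  # vmss_instances is a hypothetical name, not part of the original code.
  vmss_instances = min(var.vmss_instance_count, local.vmss_max_by_env[var.environment])
}
```

You would then set instances = local.vmss_instances on the scale set. It is tidier, but it still overrides the user's input without telling them.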

Things improved when lifecycle preconditions came along. With preconditions, we could also provide the user with a meaningful error, rather than just automatically overriding their input and confusing them.

resource "azurerm_linux_virtual_machine_scale_set" "example" {
  name      = "example-vmss"
  instances = var.vmss_instance_count
  lifecycle {
    precondition {
      condition     = var.vmss_instance_count <= lookup(local.vmss_max_by_env, var.environment)
      error_message = "The variable vmss_instance_count must be less than or equal to ${lookup(local.vmss_max_by_env, var.environment)} for ${var.environment}"
    }
  }
}

Referencing Other Variables

A bit better. But wouldn’t it be nice if we could just validate directly in the variable? Well, as of Terraform version 1.9 we can!

variable "environment" {
  type = string 
  validation {
    condition     = contains(["dev", "prd"], var.environment)
    error_message = "The value of the variable 'environment' must be either 'dev' or 'prd'."
  }
}

variable "vmss_instance_count" {
  type    = number
  default = 1
  validation {
    condition     = var.environment == "dev" ? var.vmss_instance_count <= 2 : var.vmss_instance_count <= 5
    error_message = "The variable 'vmss_instance_count' must be less than or equal to 2 for 'dev' environments or less than or equal to 5 for 'prd' environments."
  }
}

With the introduction of cross-object variable validation, we can now reference other variable values. This not only streamlines our code but also allows us to catch errors earlier in the Terraform workflow.

Referencing Other Object Types

“But wait! It gets better…”

We are not just limited to validating other variable inputs. Validation blocks can now integrate into the Terraform dependency graph, enabling us to reference other types of objects, such as data sources.

A practical application of this is checking that users have provided a valid input for attributes that aren’t validated by the provider. Providers often won’t hardcode checks for things that change frequently, such as VM sizes or instance types. Let’s look at an example to illustrate this.

provider "aws" {}

data "aws_ec2_instance_types" "current" {
  filter {
    name   = "current-generation"
    values = ["true"]
  }
}

variable "instance_type" {
  description = "EC2 instance type to use (must be current generation)."
  type        = string
  validation {
    condition     = contains(data.aws_ec2_instance_types.current.instance_types, var.instance_type)
    error_message = "Invalid or non-current instance type provided."
  }
}

Let’s make sure the plan goes through when we provide a valid, current instance type.

Successful Plan

Great. And when we use an invalid value…

Failed Plan

A powerful addition to the Terraform arsenal, I’m sure you’ll agree.
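
Data sources aren’t the only other objects a validation block can reference. As I understand the 1.9 release notes, local values are fair game too, so the per-environment limits from earlier could live in a map rather than a nested conditional. Treat this as a sketch under that assumption, reusing the environment variable and vmss_max_by_env map from the earlier examples.

```hcl
locals {
  vmss_max_by_env = {
    dev = 2
    prd = 5
  }
}

variable "vmss_instance_count" {
  type    = number
  default = 1
  validation {
    # lookup() falls back to 0 for an unknown environment, so the rule
    # fails cleanly instead of raising an invalid index error.
    condition     = var.vmss_instance_count <= lookup(local.vmss_max_by_env, var.environment, 0)
    error_message = "The variable 'vmss_instance_count' exceeds the maximum allowed for the selected environment."
  }
}
```

Adding a new environment then only means adding a key to the map, rather than extending the conditional.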


Provider-Defined Functions

A common question that I’m asked by students when delivering Terraform training is:

“Can we create our own functions?”

Historically the answer has been “no” - you were only able to utilise the built-in Terraform functions. As of Terraform 1.8, however, HashiCorp has introduced further extensibility with provider-defined functions.

Now before you get too excited, you’re not going to be defining these functions in HCL. You’ll need to go deeper and be able to write a provider if you want to create your own, but already a number of the larger providers (such as Azure and AWS) have started adding functions.

Let’s have a look at the two Azure functions available at the time of writing.

normalise_resource_id

The normalise_resource_id function allows you to take an Azure resource ID and normalise the case-sensitive portions so they meet the requirements of the provider APIs.

locals {
  normalised = provider::azurerm::normalise_resource_id("/SUBSCRIPTIONS/00000000-0000-0000-0000-000000000000/resourcegroups/cloud-shell-storage-westeurope/providers/Microsoft.Storage/storageAccounts/dummyaccountname")
}

Ok, it works. I can’t see a huge value with this one, but it proves the concept 😃.
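
One thing worth knowing is that the azurerm part of provider::azurerm:: is the provider’s local name, so the provider has to be declared in required_providers before its functions can be called. Below is a minimal sketch of a complete configuration. The pinned provider version simply mirrors the one used later in this article and is an assumption on my part, as is the output name.

```hcl
terraform {
  # Provider-defined functions need Terraform 1.8 or later.
  required_version = ">= 1.8.0"

  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "4.3.0" # assumption: same version as the later examples
    }
  }
}

# Hypothetical output name, used only to surface the function's result.
output "normalised_id" {
  value = provider::azurerm::normalise_resource_id("/SUBSCRIPTIONS/00000000-0000-0000-0000-000000000000/resourcegroups/cloud-shell-storage-westeurope/providers/Microsoft.Storage/storageAccounts/dummyaccountname")
}
```

After a terraform init, running terraform plan should show the normalised ID as a planned output value.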

parse_resource_id

The second function (parse_resource_id) is a little more useful than the first. Imagine you’ve got a particular resource ID hard-coded into your code for some reason. You may need a particular attribute of the ID for a resource. Rather than having to use the split() function to split the string and get to the sections you want as list indexes, you can now parse the values more easily.

locals {
  hardcoded         = "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/cloud-shell-storage-westeurope/providers/Microsoft.Storage/storageAccounts/dummyaccountname"
  parsed            = provider::azurerm::parse_resource_id(local.hardcoded)
  resource_type     = local.parsed.resource_type
  resource_provider = local.parsed.resource_provider
}
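
To make the comparison with split() concrete, here is a rough side-by-side sketch. The list indexes follow from the shape of this particular ID (the leading slash produces an empty first element). The attribute names resource_group_name and resource_name are what I understand the provider documentation to list, so treat them as assumptions and check them against your provider version.

```hcl
locals {
  resource_id = "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/cloud-shell-storage-westeurope/providers/Microsoft.Storage/storageAccounts/dummyaccountname"

  # The old way: split on "/" and count positions by hand.
  id_parts           = split("/", local.resource_id)
  rg_from_split      = local.id_parts[4] # resource group segment
  account_from_split = local.id_parts[8] # storage account segment

  # The new way: named attributes instead of magic indexes.
  # Attribute names below are assumed from the provider docs.
  parsed_id          = provider::azurerm::parse_resource_id(local.resource_id)
  rg_from_parse      = local.parsed_id.resource_group_name
  account_from_parse = local.parsed_id.resource_name
}
```

The indexes in the split() version break as soon as the ID has a different shape (a nested resource, for example), which is where the parsed version earns its keep.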

I imagine as time goes on, we will start to see some more useful functions appearing, and probably some providers that are purely dedicated to functions (if they don’t already exist - I’ve not dug about).

Testing Framework

This subject is an entire article in its own right (and may well be soon!), but let’s touch on it briefly. For the same reasons you’d use automated testing in your software development processes, automated testing of your Terraform code (e.g. modules) can help to ensure your code is working and secure, particularly if it’s integrated into an automated CI/CD pipeline.

Previously HashiCorp had an experimental test command, but it wasn’t great. I looked at it briefly, but stuck with Terratest - a testing framework written in Go. This was fine if you knew a bit of Go, but it wasn’t for everyone.

Version 1.6 of Terraform saw the release of the revamped testing framework, which is much nicer to work with. It consists of a number of .tftest.hcl (or .tftest.json) test files, each containing a few blocks:

  • variables {} - Any global variables you want to apply across all run blocks (unless overridden).
  • provider {} - Any global provider configuration you want to apply across all run blocks (unless overridden).
  • run {} - Specifies a number of things, such as:
    • The type of Terraform command you’re going to run.
    • Any overrides for provider/variable configuration.
    • Any assert blocks you may wish to implement (more on that shortly).

Note: unlike blocks in other Terraform configuration files, run blocks are executed in the order they appear within the test file.
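
Put together, a test file might look something like the skeleton below. It is only a sketch: the file name and run block names are made up, and it assumes the module under test declares a location variable and uses the azurerm provider.

```hcl
# example.tftest.hcl (hypothetical file name)

# Applies to every run block unless that block overrides it.
variables {
  location = "uksouth"
}

# Provider configuration shared by every run block.
provider "azurerm" {
  features {}
}

# Runs first: a plan using the file-level variables.
run "plan_with_file_level_variables" {
  command = plan
}

# Runs second: overrides the variable and checks the override took effect.
run "plan_with_overridden_location" {
  command = plan

  variables {
    location = "ukwest"
  }

  assert {
    condition     = var.location == "ukwest"
    error_message = "The run-level variable override was not applied."
  }
}
```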

Simple Variable Tests

Let’s start with a simple example. Say you have some variable validation to ensure that a location variable is either ‘uksouth’ or ‘ukwest’. We can write a test to ensure this validation is working. That way if anyone changes this in the future, our tests will catch it.

# main.tf

terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "4.3.0"
    }
  }
}

provider "azurerm" {
  features {}
  subscription_id = "00000000-0000-0000-0000-000000000000"
}

variable "location" {
  type = string
  validation {
    condition     = contains(["uksouth", "ukwest"], var.location)
    error_message = "The value of the variable 'location' must be either 'uksouth' or 'ukwest'."
  }
}

resource "azurerm_resource_group" "this" {
  name     = "rg-test-demo"
  location = var.location
}

resource "azurerm_storage_account" "this" {
  name                     = "stgaccttestdemo"
  resource_group_name      = azurerm_resource_group.this.name
  location                 = azurerm_resource_group.this.location
  account_tier             = "Standard"
  account_replication_type = "LRS"
}

# location.tftest.hcl

provider "azurerm" {
  subscription_id = "00000000-0000-0000-0000-000000000000"
  features {}
}

run "uksouth_location_allowed" {
  command = plan
  variables {
    location = "uksouth"
  }
}

run "ukwest_location_allowed" {
  command = plan
  variables {
    location = "ukwest"
  }
}

run "westeurope_location_denied" {
  command = plan
  variables {
    location = "westeurope"
  }
  expect_failures = [var.location]
}

Notice here we have an “expect_failures” argument. We are intentionally passing in a value that should fail, as it’s important to check that things that shouldn’t work really don’t work. Running terraform test provides us with the following output:

Successful Test

Everything passed as expected. What if we “accidentally” introduce a typo into our uksouth validation condition (we shall change it to ‘uksout’)…

variable "location" {
  type = string
  validation {
    condition     = contains(["uksout", "ukwest"], var.location)
    error_message = "The value of the variable 'location' must be either 'uksouth' or 'ukwest'."
  }
}

When we run terraform test again, our tests fail because the value ‘uksouth’ is not permitted due to our typo of ‘uksout’ in our validation.

Failed Test

What if we remove the validation block entirely?

Failed Test

Even though we removed the variable validation block, our tests still fail. This is because we’ve told Terraform that providing ‘westeurope’ SHOULD fail. When it doesn’t fail (due to the lack of validation), our tests do not pass.
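
Validation isn’t the only thing a plan-only test can check. As a sketch, assuming the same main.tf as above, a run block can also assert on planned resource attributes, for instance that the resource group really ends up in the location we passed in. The run block name is my own.

```hcl
run "resource_group_uses_supplied_location" {
  command = plan

  variables {
    location = "ukwest"
  }

  assert {
    # The location is known at plan time, so no resources need to be built.
    condition     = azurerm_resource_group.this.location == "ukwest"
    error_message = "The resource group was not planned in the supplied location."
  }
}
```

This would catch someone hardcoding the location on the resource group and quietly ignoring the variable.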

Testing With Real Resources

So far none of our tests have actually built any real resources, so let’s change that. Let’s build a quick and dirty Linux web server with a hello world webpage configured. Once built, let’s test that it is reachable. First, the VM (etc.) configuration (yes I know… I hardcoded a crap password 😉):

# main.tf

terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "4.3.0"
    }
  }
}

provider "azurerm" {
  subscription_id = "00000000-0000-0000-0000-000000000000"
  features {}
}

provider "http" {}

resource "azurerm_resource_group" "this" {
  name     = "rg-test-demo"
  location = "uksouth"
}

resource "azurerm_virtual_network" "this" {
  name                = "vnet-demo"
  address_space       = ["10.0.0.0/16"]
  location            = azurerm_resource_group.this.location
  resource_group_name = azurerm_resource_group.this.name
}

resource "azurerm_subnet" "this" {
  name                 = "subnet-demo"
  resource_group_name  = azurerm_resource_group.this.name
  virtual_network_name = azurerm_virtual_network.this.name
  address_prefixes     = ["10.0.1.0/24"]
}

resource "azurerm_public_ip" "this" {
  name                = "public-ip-demo"
  location            = azurerm_resource_group.this.location
  resource_group_name = azurerm_resource_group.this.name
  allocation_method   = "Static"
}

resource "azurerm_network_interface" "this" {
  name                = "nic-demo"
  location            = azurerm_resource_group.this.location
  resource_group_name = azurerm_resource_group.this.name
  ip_configuration {
    name                          = "ipconfig-demo"
    subnet_id                     = azurerm_subnet.this.id
    private_ip_address_allocation = "Dynamic"
    public_ip_address_id          = azurerm_public_ip.this.id
  }
}

resource "azurerm_network_security_group" "this" {
  name                = "nsg-demo"
  location            = azurerm_resource_group.this.location
  resource_group_name = azurerm_resource_group.this.name

  security_rule {
    name                       = "allow_ssh"
    priority                   = 1000
    direction                  = "Inbound"
    access                     = "Allow"
    protocol                   = "Tcp"
    source_port_range          = "*"
    destination_port_range     = 22
    source_address_prefix      = "*"
    destination_address_prefix = "*"
  }

  security_rule {
    name                       = "allow_web"
    priority                   = 1100
    direction                  = "Inbound"
    access                     = "Allow"
    protocol                   = "Tcp"
    source_port_range          = "*"
    destination_port_range     = 80
    source_address_prefix      = "*"
    destination_address_prefix = "*"
  }
}

resource "azurerm_network_interface_security_group_association" "this" {
  network_interface_id      = azurerm_network_interface.this.id
  network_security_group_id = azurerm_network_security_group.this.id
}

resource "azurerm_linux_virtual_machine" "this" {
  name                            = "linux-vm-demo"
  resource_group_name             = azurerm_resource_group.this.name
  location                        = azurerm_resource_group.this.location
  size                            = "Standard_B1s"
  admin_username                  = "adminuser"
  admin_password                  = "P@ssw0rd1234!"
  disable_password_authentication = false
  custom_data = base64encode(<<-EOF
    #!/bin/bash
    sudo apt update
    sudo apt install -y apache2
    echo '<h1>Welcome to the web server!</h1>' | sudo tee /var/www/html/index.html
    sudo systemctl restart apache2
    sudo systemctl enable apache2
  EOF
  )
  network_interface_ids = [
    azurerm_network_interface.this.id,
  ]
  os_disk {
    storage_account_type = "Standard_LRS"
    caching              = "ReadWrite"
  }
  source_image_reference {
    publisher = "Canonical"
    offer     = "0001-com-ubuntu-server-jammy"
    sku       = "22_04-lts-gen2"
    version   = "latest"
  }
  connection {
    type     = "ssh"
    user     = self.admin_username
    password = self.admin_password
    host     = self.public_ip_address
  }
  provisioner "remote-exec" {
    inline = [
      "while [ ! -f /var/lib/cloud/instance/boot-finished ]; do sleep 5; done",
      "echo 'Cloud init process has finished.'"
    ]
  }
}

data "http" "website" {
  url = "http://${azurerm_linux_virtual_machine.this.public_ip_address}"
}

And our test file…

provider "azurerm" {
  subscription_id = "00000000-0000-0000-0000-000000000000"
  features {}
}

run "website_reachable" {
  command = apply
  assert {
    condition     = data.http.website.status_code == 200
    error_message = "The website did not return a 200 OK."
  }
}

This time when we run our test, it takes a lot longer. This is because Terraform is actually running an apply operation in the background. After a few minutes, we can see our test was successful.

Successful Test
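
A 200 only tells us that something answered. If you want to be sure it is our page, a run block can carry more than one assert. The sketch below assumes the http data source exposes a response_body attribute and that you are on a Terraform version with the strcontains() function (1.5 or later, as far as I’m aware). The run block name is hypothetical.

```hcl
run "website_serves_expected_content" {
  command = apply

  assert {
    condition     = data.http.website.status_code == 200
    error_message = "The website did not return a 200 OK."
  }

  assert {
    # Matches the heading written by the custom_data script in main.tf.
    condition     = strcontains(data.http.website.response_body, "Welcome to the web server!")
    error_message = "The website responded, but not with the expected page."
  }
}
```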

Let’s remove the port 80 rule from the Network Security Group so that the website is unreachable and then retest…

resource "azurerm_network_security_group" "this" {
  name                = "nsg-demo"
  location            = azurerm_resource_group.this.location
  resource_group_name = azurerm_resource_group.this.name

  security_rule {
    name                       = "allow_ssh"
    priority                   = 1000
    direction                  = "Inbound"
    access                     = "Allow"
    protocol                   = "Tcp"
    source_port_range          = "*"
    destination_port_range     = 22
    source_address_prefix      = "*"
    destination_address_prefix = "*"
  }
}

Failed Test


Conclusion

These are just a handful of the features available, yet they’re seldom used in the environments I encounter. Why not give them a try and level up your IaC?

Hopefully you found this useful. If there are any subjects you’d love to see an article on, drop me a note at email@mikeguy.co.uk.