Another long break from blogging whilst life got in the way! Nothing bad, just a busy time with work, an energetic (nearly) two-year-old and a house move. I previously posted that I had moved my blog to AWS and that I’d follow up with some more information in a separate post - well, here it is!

Static HTML

As I mentioned in an older post (“Moving My Blog To AWS”), I wanted to move to a static HTML site - primarily for the security benefits and to avoid having to worry about WordPress updates. For creating the underlying HTML, I am using a tool called Hugo - a static site generator written in Go.

I won’t be dwelling on Hugo in this post, but in short it allows you to easily create a website using a number of freely available templates (themes), and then write your posts and pages using simple markdown syntax. Once you’ve tweaked things to your liking, created your content and uploaded any images, you can generate an entire static website with a single command. All the relevant HTML is then exported into a folder for you to upload to the platform of your choice.
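To give a feel for the workflow, a typical Hugo session looks something like this (the theme URL and post name below are just placeholders, not my actual setup):

hugo new site myblog                 # scaffold a new site
cd myblog
git init
git submodule add https://github.com/example/some-theme themes/some-theme   # pull in a theme (placeholder URL)
echo 'theme = "some-theme"' >> config.toml
hugo new posts/my-first-post.md      # create a new markdown post under content/posts/
hugo                                 # build the entire static site into ./public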

My Not-so-DevOps-y Approach

To start with I was generating these files locally and copying them manually to my AWS S3 bucket (more on that in a bit). It worked of course, but it was manual, slow and not something you should really be doing in 2019! Also, if anything happened to my local machine, I would have lost all the customisations and tweaks I had worked hard to muddle my way through.
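In practice that loop was roughly the following two commands (or their console equivalent), run by hand every single time:

hugo -d public_html -b https://mikeguy.co.uk          # build the site locally
aws s3 sync public_html s3://mikeguy.co.uk --delete   # copy the output up to the S3 bucket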

An Improvement

Storing all my files locally wasn’t a great idea in case of laptop issues, so my first move was to start storing everything in GitLab. I would push any local changes from my machine to the project repository where they would be stored in the cloud. This gave me some redundancy and protection in case my laptop died, but still wasn’t very automated - I was still copying files manually into AWS.

I then started looking into integrating GitLab and AWS and got to the point that once I had built the HTML locally, I would push it to GitLab and a pipeline job would automatically sync this to S3 using appropriately restricted credentials.

Removing Local Dependency

The above was way better and much quicker, but I still needed a local copy of Hugo for building the HTML before pushing the public files to GitLab. Wouldn’t it be better if that wasn’t the case and I could create a post using just simple markdown as long as I had access to GitLab? Yes - and that’s exactly what I did.

Now once I commit and push a new post to GitLab, the “runner” (we will come to that) will run a Hugo docker image with a specific command, build the static HTML, output the public HTML as “artifacts” and then sync these to S3 - creating any new files necessary and deleting any I may have chosen to remove.

The Nitty Gritty

Now we’ve covered the high-level approach, let’s have a little look under the hood to see how I am achieving all this wizardry! We will work back from the final piece - the website on AWS - and go through each of the main steps and considerations.

The AWS Website

The actual website files are hosted out of an AWS S3 bucket that is publicly visible. If you’ve got a simple static HTML website, then S3 is a great option for hosting it with high levels of scalability (not that I really need that yet) and data durability.

Static S3 Website

If I wasn’t fussed about using HTTPS and my own TLS certificate, then I could have just switched on “static website hosting” and stopped there (after adding a DNS CNAME record, anyway).

Website Hosting on S3
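For reference, the same setting can also be applied from the CLI with something like this (the error document name is just an example):

aws s3 website s3://mikeguy.co.uk --index-document index.html --error-document 404.html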

Of course I did want to use HTTPS (and with a certificate issued under my own domain name), so I chose to use AWS’ CloudFront CDN to front everything. You may also notice in the screenshot above some “Redirection rules” - these take some of the old URLs (from my old WordPress hosting) and rewrite them to the new URLs, just to cover any cached content on Google for an interim period.
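These rules live inside the bucket’s website configuration. A rough sketch of applying them via the CLI is below - the old and new paths are made-up examples rather than my actual rules, and note that put-bucket-website replaces the whole website configuration, so the index document has to be included too:

cat > website.json <<'EOF'
{
    "IndexDocument": { "Suffix": "index.html" },
    "RoutingRules": [
        {
            "Condition": { "KeyPrefixEquals": "2018/01/old-post-slug/" },
            "Redirect": { "ReplaceKeyPrefixWith": "posts/old-post-slug/" }
        }
    ]
}
EOF
aws s3api put-bucket-website --bucket mikeguy.co.uk --website-configuration file://website.json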

CloudFront CDN

CloudFront is AWS’ global CDN that allows you to get your content closer to your users. Whilst I’d love to be concerned about performance for my readers located on far-reaching continents of the planet, this is unfortunately probably not much of an issue for my little old site! However, CloudFront also allows me to front the website with my own HTTPS certificate and then forward requests on to the backend S3 bucket.

A CloudFront deployment is known as a “distribution” and there are a myriad of settings. My setup is pretty straightforward and I won’t get into the details too much. The basic idea is you provide it with your “origin” (where the actual site data is coming from) and a bunch of other specific settings (such as caching options, certificate details, TLS settings) and CloudFront will “do its thing” to cache your content and serve it speedily to end users at an edge location closest to them.

A point to note on the origin setting - this could be an AWS resource from a dropdown list (such as an existing S3 bucket or an elastic load balancer entry), or it could be any other public FQDN that is directly accessible over the Internet. When I originally set this up, I tried using the S3 bucket option from the dropdown, however I had problems related to the index page that was served up. Under the S3 website settings earlier, you may have spotted an “index document” setting. This setting means that if I request /posts/thispostname/ it would automatically serve up /posts/thispostname/index.html instead.

This setting worked fine when accessing the S3 bucket directly, however when CloudFront made the request on my behalf it didn’t seem to implement this behaviour. S3 was looking for an object with that exact key (i.e. without the /index.html appended) and failing. I played around with the “Default Root Object” on the CloudFront settings, but this didn’t seem to fix it. In the end I didn’t select the S3 bucket integration as the origin - instead I manually typed in the S3 website endpoint FQDN (mikeguy.co.uk.s3-website.eu-west-2.amazonaws.com) and this did the trick (though it introduced a couple of limitations/restrictions).
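The difference is easy to demonstrate from a terminal - the website endpoint resolves a “directory” request to its index document, whereas the plain REST endpoint looks for an object with that exact key and returns an error (the post path here is illustrative):

curl -I http://mikeguy.co.uk.s3-website.eu-west-2.amazonaws.com/posts/thispostname/    # website endpoint: serves index.html
curl -I https://mikeguy.co.uk.s3.eu-west-2.amazonaws.com/posts/thispostname/           # REST endpoint: no such object, returns an error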

I didn’t really set up any specific caching settings, but I did tweak some HTTP/HTTPS settings and selected a custom certificate - pulling this in from AWS Certificate Manager instead of my previously used LetsEncrypt integration. Obviously it’s important that my backend HTML files reference “https://mikeguy.co.uk” so that the browser doesn’t try to connect to a bunch of S3 URLs - this is achieved easily when building the site in Hugo.
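One quirk worth noting: for CloudFront to use an ACM certificate, the certificate has to be issued in the us-east-1 region, regardless of where the bucket lives. Requesting one looks roughly like this:

aws acm request-certificate --domain-name mikeguy.co.uk --validation-method DNS --region us-east-1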

AWS IAM Permissions

One of the most important things you can do with any public cloud solution is to get permissions correct and ensure you are providing access in line with a “least privilege” principle. Whilst this is only my personal site, I still followed this approach as I was going to have an external service (GitLab) connecting into my AWS environment.

I set up the following components:

  • An IAM Policy - This defines what a user/group can do within the AWS environment. I need to allow GitLab to run my CI/CD pipeline jobs, so I gave the policy the bare minimum permissions required to do this. This policy is attached to…
  • An IAM Group - There was only ever going to be one user here, but I may as well set it up to be scalable! This group had…
  • An IAM User - The user and the associated credentials that GitLab will ultimately use to run pipeline jobs and make changes to the AWS environment. This was set up with programmatic access only (a rough CLI sketch of all three components follows below).
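For anyone who prefers the command line, creating those three components might look something like this - the names are purely illustrative, and the policy document is the JSON shown further down:

aws iam create-policy --policy-name gitlab-pipeline-deploy --policy-document file://gitlab-policy.json
aws iam create-group --group-name gitlab-deployers
aws iam attach-group-policy --group-name gitlab-deployers --policy-arn arn:aws:iam::<ACCOUNT_NUMBER>:policy/gitlab-pipeline-deploy
aws iam create-user --user-name gitlab-ci
aws iam add-user-to-group --group-name gitlab-deployers --user-name gitlab-ci
aws iam create-access-key --user-name gitlab-ci    # the access key ID and secret that GitLab will use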

The IAM policy syntax is easy to get wrong until you are used to it. There are visual editors you can use to make things easier, but just make sure you do your homework and know what you are allowing access to before applying any policies. My policy (with a few values redacted) is shown below:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "1",
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
                "s3:DeleteObject"
            ],
            "Resource": [
                "arn:aws:s3:::mikeguy.co.uk/*"
            ]
        },
        {
            "Sid": "2",
            "Effect": "Allow",
            "Action": [
                "cloudfront:CreateInvalidation"
            ],
            "Resource": [
                "arn:aws:cloudfront::<ACCOUNT_NUMBER>:distribution/<DISTRIBUTION_ID>"
            ]
        },
        {
            "Sid": "3",
            "Effect": "Allow",
            "Action": "s3:ListBucket",
            "Resource": "arn:aws:s3:::mikeguy.co.uk"
        }
    ]
}

This is broken down into the following three main permissions:

  1. Allow PUT, GET and DELETE actions on the public S3 bucket listed as a resource.
  2. Allow a CloudFront “invalidation” - this basically flushes the cache when content is added or deleted. There are smarter ways of doing this, but it is a quick workaround.
  3. Allow listing of the S3 bucket in question (required for aws s3 sync).

AWS S3 Bucket Policy

AWS has done a lot over the years to lock down public S3 buckets, to the point that you have to intentionally make a bucket public by tweaking the “Block public access” settings and creating a bucket policy. This is great for improving overall S3 bucket security (leaky buckets have been far too common unfortunately) and just means we have one more step to carry out. We need to add a bucket policy that allows both GitLab and CloudFront the relevant access they need to S3.

Originally, GETs in my policy were restricted to the specific CloudFront ARN, as I was using the S3 origin integration. However, once I switched to the “standard website” endpoint (because of the index.html issue), I had to treat CloudFront as a standard web user and open up public read access - for my use case this was not a problem. My bucket policy is shown below - the syntax is virtually identical to the IAM policy, so you should be able to work out what it is doing.

{
    "Version": "2008-10-17",
    "Id": "PolicyForMikeguyWebsite",
    "Statement": [
        {
            "Sid": "1",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::mikeguy.co.uk/*"
        },
        {
            "Sid": "2",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::<ACCOUNT_NUMBER>:user/<IAM_USER>"
            },
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
                "s3:DeleteObject"
            ],
            "Resource": "arn:aws:s3:::mikeguy.co.uk/*"
        },
        {
            "Sid": "3",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::<ACCOUNT_NUMBER>:user/<IAM_USER>"
            },
            "Action": "s3:ListBucket",
            "Resource": "arn:aws:s3:::mikeguy.co.uk"
        }
    ]
}
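Applying the policy (and loosening the “Block public access” settings just enough for a public bucket policy to be allowed) can be done in the console, or roughly like this from the CLI:

aws s3api put-public-access-block --bucket mikeguy.co.uk \
    --public-access-block-configuration BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=false,RestrictPublicBuckets=false
aws s3api put-bucket-policy --bucket mikeguy.co.uk --policy file://bucket-policy.json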

GitLab Repo And Pipeline

Now we move to the GitLab side of things. Within GitLab I have a project set up for my website. This contains all the relevant Hugo files I need to be able to generate my static site - such as settings, HTML templates, images, markdown files etc.

GitLab Project

In addition to this I have a special file - .gitlab-ci.yml - which tells the GitLab CI/CD pipeline what to do when I commit and push changes to the repository. Inside this file we have the following code:

stages:
  - build
  - deploy

build:
  stage: build
  image: registry.gitlab.com/pages/hugo:0.56.3
  script:
  - hugo -d public_html -b https://mikeguy.co.uk
  artifacts:
    paths:
    - public_html
  only:
  - master
  
deploy:
  stage: deploy
  image: python:latest
  script:
  - pip install awscli
  - aws s3 sync public_html s3://mikeguy.co.uk --delete
  - aws cloudfront create-invalidation --distribution-id <DISTRIBUTION_ID> --paths /posts/ /posts/*
  only:
  - master

This is the “DevOps Magic” that makes solutions like GitLab so powerful. The code above is telling GitLab to spin up a shared “runner” (think of this as a cloud VM that has access to your project repository) which then runs a bunch of tasks in a particular order. Typically, you would have more than just “build and deploy” stages in a production environment but let’s look through my simple use-case.

Build

  • image: - Within the build phase we are telling GitLab to spin up a Hugo Docker image (registry.gitlab.com/pages/hugo). We won’t get into Docker in any detail, but just think of it as an image containing all the code Hugo needs to run.
  • script: - What to run inside that Docker image. These are just standard Hugo commands: in this instance I’m telling Hugo to build the HTML from the files in the repository using my base URL (-b https://mikeguy.co.uk) and to write the output to “public_html” (-d public_html). This destination is also declared below as an artifact…
  • artifacts: - Artifacts are basically output from a job that can be passed to other parts of the pipeline. Think of “public_html” as a special folder that Hugo outputs all the static HTML to. We then pass this artifact over to the deploy job where it is used accordingly.
  • only: - We are telling GitLab to only run this job for the master branch.

Deploy

  • image: - Same concept as above, but we are using a different image. This time it is a standard Python image (python:latest).
  • script: - Once the image is running we install “awscli” (AWS’ command-line interface, distributed as a Python package), then use it to sync the “public_html” artifact from the build stage to the S3 bucket (deleting any files that have been removed - not the default behaviour). Finally, we create a CloudFront invalidation to refresh the cached content.

Once a commit is made, GitLab runs through these steps (build then deploy) and makes the relevant API calls to AWS so it can push the data where it needs to be. How does it have credentials to do this? The AWS access key and secret key are passed to the runner as environment variables, courtesy of the project’s CI/CD settings…

GitLab CI/CD Settings

This removes the need for any manual intervention, without having to hardcode any secrets into scripts.
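Under the hood these are just the standard environment variables the AWS CLI looks for, so inside the job it behaves as if something like the following had been run (the region is an assumption based on the bucket endpoint earlier):

export AWS_ACCESS_KEY_ID=<ACCESS_KEY_ID>
export AWS_SECRET_ACCESS_KEY=<SECRET_ACCESS_KEY>
export AWS_DEFAULT_REGION=eu-west-2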

Integrating With My Laptop

Finally, I have the ability to author files on my laptop and push them to GitLab. I prefer to do this using the git CLI and the relevant commands, but there are GUI options available. Without turning this into a git tutorial, the typical commands I use for updating my website are:

  • git pull - Pull the up-to-date repo and any changes from GitLab.
  • git add - Add any changes I’ve made locally to a staging list ready to be committed. This allows me to work on multiple things locally, but only push the ones I want to commit.
  • git commit -m "New blog post!" - Commit the changes locally with a message.
  • git push - Finally, push the committed changes in my local repo to the GitLab repo. Once this is done the CI/CD pipeline kicks off and the magic above happens.
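Put together, publishing a new post from the laptop boils down to something like this (the file path is illustrative):

git pull
git add content/posts/new-post.md
git commit -m "New blog post!"
git push    # the CI/CD pipeline takes over from here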

Alternatively I could integrate this with an IDE such as PyCharm and use some quick keyboard shortcuts to achieve the same thing. And of course, one of the benefits of GitLab being cloud-based is that, if I wanted to, I could write a post in plain markdown using a text editor, log in to the GitLab website from anywhere and publish a blog post without any special software - I just need my authentication credentials.

Summary

What I am doing is pretty basic in the grand scheme of things, but it is a good way to start playing around with these things and getting used to them. I’m sure there are better ways of doing some of what I’ve outlined above (I know there are for certain things) - but it is a start!

If you’re not already playing around with AWS and tools such as GitLab then I’d strongly recommend you do. With generous free-tier offerings this is easy to do at virtually no cost - just your time.

If you want to know more on the approach I use, need any advice or want to let me know of better ways to do certain aspects of this, then please drop me a message via LinkedIn, email or Twitter - I’d love to hear from you.