GitOps Superpowers

Code changes that show up on your cloud bill!

May 03, 2021

I've been using GitOps for the past few years and I wanted to reflect briefly on what I've learned, how it helps, and how it breaks down.

The term GitOps was coined by Weaveworks in 2017. It's a technique where you put the desired state of your system in a version-controlled repository. When changes are made in the code, those changes get applied to your system automatically (they are "deployed"). There are lots of tools that implement this, including Weaveworks' own FluxCD tool, Terraform Cloud, and Vercel.
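To make the "desired state" idea concrete, here is a minimal sketch of the comparison a GitOps tool performs in some form. It is not how FluxCD or Terraform Cloud actually work internally; the `plan` function and the resource names are made up for illustration, and both states are assumed to be plain dictionaries keyed by resource name.

```python
from typing import Any


def plan(desired: dict[str, Any], actual: dict[str, Any]) -> list[tuple[str, str]]:
    """Compare the state declared in the repo with the live state.

    Returns a list of (action, resource_name) pairs. Hypothetical helper,
    not part of any real tool.
    """
    actions = []
    for name in sorted(desired.keys() | actual.keys()):
        if name not in actual:
            actions.append(("create", name))   # declared but not running
        elif name not in desired:
            actions.append(("delete", name))   # running but no longer declared
        elif desired[name] != actual[name]:
            actions.append(("update", name))   # declared and running, but different
    return actions


# Illustrative data only.
desired = {"bucket-logs": {"versioning": True}, "queue-jobs": {"fifo": False}}
actual = {"bucket-logs": {"versioning": False}, "vm-legacy": {"size": "small"}}

print(plan(desired, actual))
# [('update', 'bucket-logs'), ('create', 'queue-jobs'), ('delete', 'vm-legacy')]
```

A real tool then applies those actions for you, which is the "deployed automatically" part.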

For a deeper dive on GitOps, gitops.tech is a great place to start.

Since I started using it, GitOps has made it considerably easier for me to build. Considerations like governance, change tracking, and testing are much less daunting. Generally, what I like to do now is require every change to get a peer review before it is accepted. For more sensitive environments, some providers let you assign ownership of specific parts of the system, so that those owners must be the ones to approve changes. This is great for enforcing the "four eyes principle": no change can be made by a single person. Sometimes it can be good to define the desired infrastructure alongside the software, as with Vercel or many projects that use Helm. In other cases it's much better to keep them separate, such as when you're using GitOps to manage a cloud account.
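The review rule described above (a peer approval for everything, plus an owner's approval for sensitive paths) can be sketched roughly like this. Your git provider enforces this for you in practice; the `OWNERS` table, the paths, and the names below are all hypothetical, and the matching is deliberately simplistic.

```python
from fnmatch import fnmatch

# Hypothetical ownership table: path pattern -> people allowed to approve.
OWNERS = {
    "accounts/prod/*": {"alice", "bob"},
    "accounts/dev/*": {"carol"},
}


def required_owners(path, owners=OWNERS):
    """Collect everyone who owns a pattern matching this path."""
    result = set()
    for pattern, people in owners.items():
        if fnmatch(path, pattern):
            result |= people
    return result


def can_merge(author, changed_paths, approvals, owners=OWNERS):
    """Four eyes: at least one approval from someone other than the author,
    and every owned path approved by one of its owners (again, not the author)."""
    reviewers = set(approvals) - {author}
    if not reviewers:
        return False
    for path in changed_paths:
        needed = required_owners(path, owners)
        if needed and not (needed & reviewers):
            return False
    return True


print(can_merge("alice", ["accounts/prod/vpc.tf"], ["alice"]))  # False: self-approval
print(can_merge("alice", ["accounts/prod/vpc.tf"], ["carol"]))  # False: carol doesn't own prod
print(can_merge("alice", ["accounts/prod/vpc.tf"], ["bob"]))    # True
```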

All the benefits of defining systems with code roll up to the GitOps approach. Changes in the repository can be tested automatically, you can write reusable libraries, and you can try your changes in test/QA environments before accepting them. The nice thing here is that you can accomplish this using git branches! Your CI tool probably has prebuilt jobs that can cover many of these use cases.
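As an example of "tested automatically", here is a sketch of the kind of check a CI job could run against the desired state on every branch. The bucket rules and the `check_buckets` helper are assumptions made up for this example, not any particular tool's policy format.

```python
def check_buckets(desired):
    """Return a list of human-readable problems found in the declared buckets.

    Assumes each bucket is a dict with optional "public" and "tags" keys.
    """
    problems = []
    for name, spec in sorted(desired.items()):
        if spec.get("public", False):
            problems.append(f"{name}: must not be public")
        if "owner" not in spec.get("tags", {}):
            problems.append(f"{name}: missing 'owner' tag")
    return problems


# Illustrative desired state, as it might look after parsing files in the repo.
desired = {
    "reports": {"public": False, "tags": {"owner": "data-team"}},
    "scratch": {"public": True, "tags": {}},
}

for problem in check_buckets(desired):
    print(problem)
# scratch: must not be public
# scratch: missing 'owner' tag
```

If the list is non-empty, the CI job fails and the change never reaches the main branch.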

Unfortunately, not everything can be defined declaratively and deployed automatically and on time. There are a lot of factors that could drive you back to "ClickOps". The most obvious one is when the code doesn't support a resource you need (e.g. there is no Terraform implementation for what you're doing). Two different problems arise when this happens. The first is that the steps you took manually add up to a debt: they have to be repeated if you ever need to redeploy, so they need to be documented and they need to be automated. Failing to document them is very bad, and the task of automating them has a way of languishing in the backlog. The other problem is that you have chipped away at one of the benefits of GitOps. We want the team to trust that the source repository describes the desired state accurately. If enough manual workarounds are tacked on to that desired state, it becomes much harder to look at the source repository and understand what's going on.

In addition to the problem of manual workarounds, there can be some other challenges with GitOps having to do with bootstrapping and kickstarting.

Bootstrapping a GitOps-managed system can be the cause of a lot of navel-gazing and Rube Goldberginess. There is usually no practical reason to automate every aspect of the bootstrapping, and it's turtles all the way down so why bother? (I have a strong urge to automate all the things, so this is mostly advice to myself.) An illustrative example: You may want to make an AWS Account Factory, but you wouldn’t write a puppet to automate the root account provisioning.

It can sometimes be necessary to "kickstart" resources as well, wherein we delete something that's in the desired state definition, like a container or storage volume, so that the automation pipeline can detect the missing resource and recreate it. There are a lot of different reasons this can end up being needed, often tied to implementation details that are at a lower level of abstraction than you're dealing with. (Immutable infrastructure methodologies seem to have a bunch of related problems here.)
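A toy model of why kickstarting works: assume a pipeline that only checks whether a declared resource exists, not whether it is healthy. That assumption, the `reconcile` function, and the `healthy` flag are all invented for this sketch; real tools vary in what they can detect.

```python
def reconcile(desired, actual):
    """Create anything that is declared but missing. Existing resources are
    left alone, even broken ones (the simplifying assumption of this sketch)."""
    created = []
    for name, spec in desired.items():
        if name not in actual:
            actual[name] = {"spec": spec, "healthy": True}
            created.append(name)
    return created


desired = {"data-volume": {"size_gb": 100}}
# The volume exists but is wedged in some way the pipeline can't see.
actual = {"data-volume": {"spec": {"size_gb": 100}, "healthy": False}}

first = reconcile(desired, actual)    # resource exists, so nothing happens
del actual["data-volume"]             # the manual "kickstart"
second = reconcile(desired, actual)   # now the gap is detected and filled

print(first, second)
# [] ['data-volume']
```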

Despite the challenges, I don't think I'll be transitioning out of GitOps any time soon, and in fact I want to manage more things with GitOps. It feels faster to click through the AWS dashboard and make an S3 bucket, but it's fast in the same way writing code without tests feels like saving time. I think there's more opportunity for data and analytics resources to be managed by GitOps, and treating configuration like data opens up all kinds of new possibilities.

Cover Photo by Zach Reiner on Unsplash
