ECS Cluster Environment

Dependencies

State file bucket/config

You must have some way of storing terraform state files. We use and recommend the s3 backend, but you can change that configuration. See ./environments for examples of configuring backend.

We store our terraform state in an S3 bucket created by the ./environments/global_aws directory, which has an interactive setup, see that readme for more info.

Change management and workflows

This code is here to make infrastructure declarative rather than imperative. It secondarily includes modularization to make it hard for configuration to drift between preproduction / production or open source deployments. These are two separate concerns.

Declarative code changes are still managed imperatively with terraform apply, which can be made partially or fully automatic.

In general, production changes are applied manually after we are satisfied with preproduction, which may or may not be automatic. Developers should expect a flow like:

make a change to a shared module code
make matching change to configuration in ALL environment directories, so they can be reviewed together
apply this new SHA to staging and do validation as desired
apply this same SHA to production.

Now there is no drift between code, staging, and production - we are converged.

Rollbacks

Generally, rollbacks are done in emergencies and are done first in prod. (If done first in staging, this is really no different a process than a roll-forward). Rollbacks are the only situation in which we should expect to deploy production from an off-main branch. Changes may be infrastructure or code. In the infrastructure workflow:

make changes to terraform code that seems to fix the issue and apply it to production
if it resolves the issue, figure out how it needs to be applied to pre-prod for consistency and open a PR
when this PR is merged, it deploys to pre-prod and we are converged.

In general, code rollbacks can be done without a re-build, by deploying an old SHA, but it is preferable if there is time, to do a revert & roll-forward flow, because some operations (primarily database migrations) operate on assumptions of monotonic time. Additionally this flow makes it easier for rollbacks to include reverts of specific changes in the middle of the commit history without reverting everything more recent.

Adding/updating variables and configuration

Variables are the things that distinguish one environment from another. These include container variables and certain extra values such as infrastructure scaling / footprint parameters. There is a tradeoff between ease of configuration change and strength of guarantees given by similarity between staging and prod. First decide if your change should be applied identically to each environment, or warrants an increase in drift.

To add a variable, modify some terraform resource that depends on it and then thread your way back up. The most common case will be to add an environment variable to a container so will use that as example here:

modify modules/deployment/main.tf to add a variable to the appropriate invocation of container-generic.
modify modules/deployment/variables.tf to add the variable declaration. (This step is not needed if your new env var can be computed based on changes to the upstream infrastructure, such as a database URL.)
modify each invocation in environments/*/main.tf to add this new variable.

Proceed as above. Note that changes to task definitions (which include container configs) are not actually applied until you then trigger a new deploy using act/mask or the Github console.

Adding secrets

Secrets are a special variety of environment variable, whose process is just like the above but with an extra step after terraform apply and before mask ecs deploy:

To provide secrets to ECS containers, you should put them in AWS Secrets Manager. To do this, replicate the setup in modules/core-services/main.tf: create a resource that declares the existence of the secret. Since the purpose of this model is to avoid having copies of the secrets exist anywhere persistently except the single locked-down place, naturally the secret value itself (the “version”) can’t be passed through terraform.

So you must one-time only, or when changing the secret,

go to the AWS Secrets Manager console
and choose your new secret
select “Retrieve secret value” (unintuitive, because there is no value yet)
Console says “Value does not yet exist” and button you just clicked becomes “Set secret value”.
Probably, paste your secret in the “plain text” box (you can also do key-value pairs, but then must use the key in the address when retrieving.)