Note: Clone the project repo to follow along...
So far in this series we've learned the fundamentals of Amazon's Elastic Container Service, containerized a simple Node.js application, and deployed it to the cloud. In the final article of this series, we'll eliminate the toil of building and maintaining ECS infrastructure by automating everything we've learned using Terraform.
Before diving into Terraform, the first thing we'll need is a "container definition" to feed the
aws_ecs_task_definition resource. The good news is we've already built this while working through the manual configuration of our task.
By simply trimming off everything in the task definition except the
containerDefinitions list, you'll have all you need. The initial slog of figuring out the task definition JSON is paying dividends! This is just a portion of an existing file, and the point of this article is the actual Terraform, so I'll simply link to the example in the project repo. Since the first article, I've done a bit of cleanup – removing unnecessary values (nulls, empty lists) and templating a few more things to make reuse easier.
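To show how the trimmed container definition plugs in, here is a sketch of the task definition resource using templatefile. The template path, variable names, and role reference are assumptions based on this article's setup, not necessarily the repo's exact contents.

```hcl
# Render the trimmed containerDefinitions template and feed it to ECS.
resource "aws_ecs_task_definition" "app" {
  family                   = var.service_name
  requires_compatibilities = ["FARGATE"]
  network_mode             = "awsvpc"
  cpu                      = var.cpu_units
  memory                   = var.memory
  execution_role_arn       = var.task_execution_role_arn # assumed input

  container_definitions = templatefile("${path.module}/container-definitions.json.tpl", {
    service_name = var.service_name
    region       = var.region
    port         = var.port
  })
}
```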
NOTE: To work through the entire demo deployment, you will need to modify one line in the container definition – the path to the Parameter Store secret. You also need to add the secret to test retrieval. In your AWS account, go to Services > Systems Manager > Parameter Store > Create parameter. For anything sensitive, always use
SecureString. If you use the same path and name, you will only need to insert your AWS Account ID in the container definition. Otherwise, adjust as needed.
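If you would rather have Terraform manage the parameter too, a minimal sketch might look like the following. The parameter name is an assumption – match whatever path your container definition references.

```hcl
# Hypothetical SecureString parameter for the demo secret.
resource "aws_ssm_parameter" "demo_secret" {
  name  = "/ecs-example/demo/secret" # assumed path – adjust to match your container definition
  type  = "SecureString"
  value = var.demo_secret_value # supply via tfvars or environment; never hard-code secrets
}
```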
To make this simpler for anyone to test drive, we'll use the default VPC and subnets that come with new AWS accounts. If you've deleted those, you can use Terraform's AWS VPC and subnet resources (or a module of your choice) to create space for our example.
A past series went into a lot of detail on creating network resources from scratch. Rather than rehash that, I want to cover another common scenario – using data sources to discover existing network infrastructure. Beyond the account defaults used here, you will often have shared VPCs, subnets, NAT gateways, etc. that you can consume rather than re-creating for each service.
The aws_subnet_ids data source gives us a list of subnets matching specified criteria that we can use elsewhere in our configuration. We'll use these subnets to house our ECS tasks. Here, simply using the
vpc_id gets the job done, but a common practice is using tags to make selecting the appropriate resources intuitive.
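As a sketch, discovering the default VPC and its subnets looks like this (the data source labels are illustrative):

```hcl
# Look up the account's default VPC instead of creating one.
data "aws_vpc" "default" {
  default = true
}

# Collect all subnet IDs in that VPC for use by the ECS service.
data "aws_subnet_ids" "default" {
  vpc_id = data.aws_vpc.default.id

  # In a shared environment you might select by tag instead, e.g.:
  # tags = { Tier = "private" }
}
```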
Before we tackle ECS itself, we need to address IAM. When deploying manually, we leveraged the default
ecsTaskExecutionRole and fixed it up to allow access to Parameter Store and Secrets Manager. At the time it was easy to (ab)use, but we called out the best practice of using service-specific roles. As part of our automation, let's have Terraform manage any roles and policies for us:
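A sketch of a service-specific execution role with Parameter Store access follows; resource names are illustrative, and the managed policy ARN is AWS's standard ECS task execution policy.

```hcl
# Trust policy allowing ECS tasks to assume the role.
data "aws_iam_policy_document" "ecs_tasks_assume" {
  statement {
    actions = ["sts:AssumeRole"]
    principals {
      type        = "Service"
      identifiers = ["ecs-tasks.amazonaws.com"]
    }
  }
}

resource "aws_iam_role" "task_execution" {
  name               = "${var.service_name}-task-execution"
  assume_role_policy = data.aws_iam_policy_document.ecs_tasks_assume.json
}

# Standard execution permissions (ECR pull, CloudWatch Logs).
resource "aws_iam_role_policy_attachment" "task_execution" {
  role       = aws_iam_role.task_execution.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy"
}

# Parameter Store read access – wide open here; scope the Resource
# to specific parameter paths in production.
resource "aws_iam_role_policy" "ssm_read" {
  name = "${var.service_name}-ssm-read"
  role = aws_iam_role.task_execution.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = ["ssm:GetParameters"]
      Resource = "*"
    }]
  })
}
```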
While this is still wide open (we could further limit Parameter Store access to specific paths), it gives you a starter recipe for automating a fully functional service. Refer to the policy examples we ran through previously if you need to grant Secrets Manager access instead.
With network details gathered and IAM squared away, we're ready to take care of ECS. As you'll recall from previous articles, we need to create an ECR repository our ECS tasks can access. We'll also attach a lifecycle policy to our repository to avoid old images building up and wasting space.
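A sketch of the repository and its lifecycle policy might look like this; the 14-day expiry window is an arbitrary example value.

```hcl
resource "aws_ecr_repository" "app" {
  name = var.service_name
}

# Expire untagged images so old layers don't accumulate and waste space.
resource "aws_ecr_lifecycle_policy" "app" {
  repository = aws_ecr_repository.app.name

  policy = jsonencode({
    rules = [{
      rulePriority = 1
      description  = "Expire untagged images after 14 days"
      selection = {
        tagStatus   = "untagged"
        countType   = "sinceImagePushed"
        countUnit   = "days"
        countNumber = 14
      }
      action = { type = "expire" }
    }]
  })
}
```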
Since we are using Container Insights and the
awslogs driver, when we created the ECS service manually we had to create the CloudWatch Log Group first or our service wouldn't start. Now we can let Terraform manage that for us.
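That's a one-resource fix; the log group name format and retention period below are assumptions, so match them to whatever your container definition's awslogs configuration expects.

```hcl
# Pre-create the log group the awslogs driver writes to,
# so the first task launch doesn't fail.
resource "aws_cloudwatch_log_group" "app" {
  name              = "/ecs/${var.service_name}" # assumed naming convention
  retention_in_days = 30
}
```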
To make the ECS-specific bits more modular, a number of variables are used. These are referenced directly by our Terraform resources and exposed within the container definition via
templatefile. Aside from the service name, region and port, the ECS CPU units, memory reserve and hard limit are configurable. This is tunable enough for most services without overwhelming the operator with excess detail. Finding the right balance reduces cognitive load for others using your automation.
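An illustrative variable set in that spirit – enough knobs to tune a service without overwhelming the operator (defaults are example Fargate-compatible values, not the repo's exact choices):

```hcl
variable "service_name" { type = string }
variable "region"       { type = string }
variable "port"         { type = number }

variable "cpu_units" {
  type    = number
  default = 256 # ECS CPU units
}

variable "memory_reserve" {
  type    = number
  default = 256 # soft limit (MiB)
}

variable "memory" {
  type    = number
  default = 512 # hard limit (MiB)
}
```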
We leverage a lot of defaults in the service configuration, but do pull in the subnets discovered above and expose instance details. For our simple case we'll run a single task instance, so use
instance_percent_min = 0 and
instance_percent_max = 100. In the real world we could increase
instance_count and adjust the percentages as needed so we can use rolling updates to avoid downtime.
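Putting those pieces together, a minimal sketch of the cluster and service follows. The instance_* variable names mirror the article's configuration and map onto the service's deployment percentages; the cluster name and references to the other resources are assumptions.

```hcl
resource "aws_ecs_cluster" "app" {
  name = var.service_name

  # Enable Container Insights metrics for the cluster.
  setting {
    name  = "containerInsights"
    value = "enabled"
  }
}

resource "aws_ecs_service" "app" {
  name            = var.service_name
  cluster         = aws_ecs_cluster.app.id
  task_definition = aws_ecs_task_definition.app.arn # assumed resource name
  launch_type     = "FARGATE"

  desired_count                      = var.instance_count        # 1 for our simple case
  deployment_minimum_healthy_percent = var.instance_percent_min  # 0 here
  deployment_maximum_percent         = var.instance_percent_max  # 100 here

  network_configuration {
    subnets          = data.aws_subnet_ids.default.ids # discovered above
    assign_public_ip = true # needed to reach the task directly in this demo
  }
}
```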
Refer to the project repo for the fully functional Terraform... Just adjust tfvars as needed, then you can run some tests in your account (refer to the README for specifics on configuring Terraform for use with AWS)!
If we curl the public IP of the deployed task on our container port, we can see the secret retrieval working (via our IAM role) as it exposes a value from Parameter Store – this is obviously only for the sake of example. Never expose secrets, including in logs!
Counting variable definitions and outputs, we've managed to automate away the toil of manually managing ECS-based services in less than a few hundred lines. Rather than simply recreating the exact service we deployed by clicking through the AWS console, we iterated and improved security through a service-specific IAM role and used lifecycle management to further reduce toil. Beyond the initial build, this gives us a framework we can use to continue extending our service, ensures consistency as we go, enables reuse when building similar services, and acts as documentation for ourselves or future team members – all the advantages of Infrastructure as Code.
That continues in the spirit of our minimally viable example service... In the real world you would likely have additional network configuration (perhaps an ALB fronting several tasks), more containers to manage (additional services, sidecars for monitoring or security), backing services to prepare, etc. You can keep adding these things yourself, but as the complexity grows you'll want to consider modules. Whether you consume modules from the Terraform Registry, GitHub authors you trust or create your own... they'll let you avoid copying and pasting code, further ensure consistency, make reuse even easier, and allow you to build increased confidence in shared components.
Hopefully this is enough to get you started toward the nirvana of running containerized services on AWS ECS. Terraform makes the initial infrastructure build and maintenance a breeze. Once your MVP is live, you can continue shipping updates with just a few commands... It's just a matter of building a new image with your code, pushing to ECR, and updating the ECS service to pull in the latest change. That's too much to cram in here, but for an example of how it could work refer to the do script.
See you next time!
- Managing Secrets for ECS Applications
- Modifying Account Settings (if you get tagging-related errors about ARN format)
- IAM policies for ECS Secrets Access
This is the final part of a multi-part series; jump to part one: