AWS EC2 Container Service — an overview

Amazon EC2 Container Service (ECS) is a container management service that makes it easier to run and operate Docker containers on AWS. It is designed to be highly scalable and to perform well. Like other managed services from Amazon, such as RDS, ECS handles fault tolerance itself, which means we don’t have to worry about that when designing our infrastructure with ECS.

Clusters

A cluster is a group of container instances. ECS creates a default cluster the first time we use it, so that we have an environment to get up and running quickly. A cluster can contain container instances of different instance types, which means that we can run different kinds of tasks inside the same cluster. An account can have many clusters at the same time, but a container instance can only be part of one cluster at a time.

As with other AWS resources, a cluster belongs to a specific region. When creating a new cluster we must configure the following properties:

EC2 instance type: which instance type should be used inside the cluster. This affects how many tasks we can run inside the cluster.
Number of instances: how many instances should be launched. Each instance is launched using the ECS-optimized AMI.
EBS storage: the size, in GiB, of the storage attached to each instance.
Key pair: the key pair used to connect to the instances via SSH. It can be left empty, but this will make the instances inaccessible over SSH in the future.
Networking: the VPC in which we want to launch our cluster. By default, a VPC with two subnets in different AZs is created.
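
To make the list above concrete, here is a minimal sketch that groups those properties into a single object. The ClusterSettings class, its field names and the sample values are all hypothetical and exist only for illustration; they are not part of any AWS SDK.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClusterSettings:
    """Hypothetical holder for the cluster properties (not an AWS API)."""
    region: str
    instance_type: str
    instance_count: int
    ebs_storage_gib: int
    vpc_id: Optional[str] = None    # None: rely on the default VPC
    key_pair: Optional[str] = None  # None: no SSH access later on

    def validate(self) -> None:
        # Basic sanity checks on the numeric properties.
        if self.instance_count < 0:
            raise ValueError("instance count cannot be negative")
        if self.ebs_storage_gib < 1:
            raise ValueError("EBS storage must be at least 1 GiB")

    @property
    def ssh_accessible(self) -> bool:
        # Without a key pair the instances cannot be reached via SSH.
        return self.key_pair is not None


# Sample values are assumptions chosen for the example.
settings = ClusterSettings(
    region="us-east-1",
    instance_type="t2.micro",
    instance_count=2,
    ebs_storage_gib=22,
)
settings.validate()
print(settings.ssh_accessible)
```

Leaving key_pair empty is valid, but the ssh_accessible property makes the consequence explicit before anything is launched.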

Container Instances

Container instances are EC2 instances that run the ECS agent and are registered into a cluster. They need an IAM role with the required permissions, since they make calls to the ECS service.

Instances should not be “transferred” between clusters: if we want to register an instance into another cluster, it is recommended that we terminate the current instance and launch a new one. This is important because a container instance stores state that is unique to its registration with ECS. The same applies when we want to change the instance type of a container instance: the right way to do it is also to terminate the current instance and launch a new one.

When ECS registers an instance into a cluster, it sets the instance status to ACTIVE and the agent connection status to TRUE. This makes the container instance available to run tasks sent by ECS. Stopping a container instance sets the agent connection status to FALSE, which means the instance stops running tasks. Deregistering a container instance changes its status to INACTIVE, and the instance is no longer reported when listing container instances.
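
The status transitions described above can be modelled with a small sketch. This is a local simulation only: the ContainerInstance class and the list_container_instances helper are hypothetical names, not ECS API calls, and they encode nothing beyond the behavior stated in the previous paragraph.

```python
from dataclasses import dataclass


@dataclass
class ContainerInstance:
    """Hypothetical local model of a registered container instance."""
    instance_id: str
    status: str = "ACTIVE"        # set when the instance is registered
    agent_connected: bool = True  # agent connection status

    def stop(self) -> None:
        # Stopping the instance disconnects the agent; status is unchanged.
        self.agent_connected = False

    def deregister(self) -> None:
        # Deregistering marks the instance as INACTIVE.
        self.status = "INACTIVE"

    def can_run_tasks(self) -> bool:
        return self.status == "ACTIVE" and self.agent_connected


def list_container_instances(instances):
    # INACTIVE instances are no longer reported when listing.
    return [i.instance_id for i in instances if i.status == "ACTIVE"]


first = ContainerInstance("i-aaa")
second = ContainerInstance("i-bbb")
first.stop()
second.deregister()
print(first.can_run_tasks(), list_container_instances([first, second]))
```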

Task Definitions

A task definition, as the name implies, defines the tasks that run on our container instances: which containers make up a task and how each one is configured. Task definitions are required to run Docker containers in ECS. The following parameters are the most common when configuring a task definition:

The Docker image to be used when running the container.
The CPU and memory usage limits.
The ports mapped from the container to the instance.
The command the container should run when started.
The environment variables to pass to the container.
The volumes that should be mounted into the container.

The parameters above represent what we usually pass to Docker when running a new container. One of the main goals of a task definition is to avoid repeating ourselves when launching new containers. Instead of having to maintain a script with the desired parameters, or to copy and paste it in the console to launch another container, we have a task definition that knows how to launch new containers with the desired configuration.
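
As an illustration of the parameters listed above, the sketch below builds a task definition as a Python dictionary and serializes it to JSON. The field names follow the ECS task definition JSON format as far as I know it; the family name, image, ports, environment variable and paths are made-up values for the example.

```python
import json

# Hypothetical task definition; every value here is an example.
task_definition = {
    "family": "web",
    "containerDefinitions": [
        {
            "name": "web",
            # Docker image used to run the container
            "image": "nginx:latest",
            # CPU units and memory limit (MiB)
            "cpu": 256,
            "memory": 512,
            # Ports mapped from the container to the instance
            "portMappings": [{"containerPort": 80, "hostPort": 8080}],
            # Command the container runs when started
            "command": ["nginx", "-g", "daemon off;"],
            # Environment variables passed to the container
            "environment": [{"name": "APP_ENV", "value": "production"}],
            # Where the volume declared below is mounted
            "mountPoints": [
                {
                    "sourceVolume": "site-data",
                    "containerPath": "/usr/share/nginx/html",
                }
            ],
        }
    ],
    # Volumes made available to the containers of this task
    "volumes": [{"name": "site-data", "host": {"sourcePath": "/ecs/site-data"}}],
}

task_definition_json = json.dumps(task_definition, indent=2)
print(task_definition_json)
```

A JSON document like this one is what we would typically hand to ECS when registering the task definition, for example through the console or the AWS CLI.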

These are not the only configurable parameters of a task definition. We can also configure AWS-specific features, like IAM roles, and so on. A list of all parameters can be found in the ECS task definition documentation.

Services

An ECS service allows us to run and maintain a specified number of instances of a task definition inside a cluster. If a task fails or stops for any reason, the service scheduler launches another instance as a replacement, maintaining the desired number of running instances.
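
The replacement behavior described above is essentially a reconciliation loop. The sketch below is a simplified local simulation of that idea, not the actual scheduler: the reconcile function, the task dictionaries and the launch_task callback are hypothetical.

```python
import itertools


def reconcile(tasks, desired_count, launch_task):
    """Return the running tasks, topped up to desired_count.

    Tasks that are not RUNNING are dropped and replaced by calling
    launch_task(), a hypothetical callback that starts a new task.
    """
    running = [t for t in tasks if t["status"] == "RUNNING"]
    missing = desired_count - len(running)
    for _ in range(max(missing, 0)):
        running.append(launch_task())
    return running


# Simple id generator standing in for real task launches.
_ids = itertools.count(1)


def launch_task():
    return {"id": f"replacement-{next(_ids)}", "status": "RUNNING"}


tasks = [
    {"id": "task-a", "status": "RUNNING"},
    {"id": "task-b", "status": "STOPPED"},  # failed or stopped task
    {"id": "task-c", "status": "RUNNING"},
]
tasks = reconcile(tasks, desired_count=3, launch_task=launch_task)
print([t["id"] for t in tasks])
```

Running the loop again with nothing stopped launches no further tasks, which is the property that keeps the task count steady.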

When launching new tasks, a service will try to balance them across Availability Zones. The balancing strategy follows these steps:

Determine which of the container instances can support the task definition. The service checks whether each container instance has enough CPU, memory, available ports and so on.
With the container instances filtered by the previous step, the ECS service sorts them from the least used AZ to the most used one. The AZ with the fewest containers has priority over the others.
After figuring out the best AZ, the ECS service places the task on the best container instance within it, preferring the instance with the fewest running tasks.
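
The three steps above can be sketched as a small function. This is a simplified simulation under stated assumptions, not the real scheduler: the place_task function and the dictionary keys are hypothetical, AZ usage is measured by counting running tasks on the candidate instances, and ties are broken alphabetically to keep the result deterministic.

```python
from collections import Counter


def place_task(instances, task):
    """Pick a container instance for the task, or None if none fits."""
    # Step 1: keep only instances with enough CPU, memory and free ports.
    candidates = [
        i for i in instances
        if i["cpu_free"] >= task["cpu"]
        and i["memory_free"] >= task["memory"]
        and not set(task["ports"]) & set(i["ports_in_use"])
    ]
    if not candidates:
        return None

    # Step 2: find the least used AZ among the candidates.
    tasks_per_az = Counter()
    for i in candidates:
        tasks_per_az[i["az"]] += i["running_tasks"]
    best_az = min(tasks_per_az, key=lambda az: (tasks_per_az[az], az))

    # Step 3: inside that AZ, prefer the instance with the fewest tasks.
    in_best_az = [i for i in candidates if i["az"] == best_az]
    return min(in_best_az, key=lambda i: (i["running_tasks"], i["id"]))


# Example fleet; all values are made up for the illustration.
instances = [
    {"id": "a", "az": "us-east-1a", "cpu_free": 512, "memory_free": 1024,
     "ports_in_use": [], "running_tasks": 4},
    {"id": "b", "az": "us-east-1b", "cpu_free": 512, "memory_free": 1024,
     "ports_in_use": [], "running_tasks": 1},
    {"id": "c", "az": "us-east-1b", "cpu_free": 512, "memory_free": 128,
     "ports_in_use": [], "running_tasks": 0},   # not enough memory
    {"id": "d", "az": "us-east-1b", "cpu_free": 512, "memory_free": 1024,
     "ports_in_use": [], "running_tasks": 2},
]
task = {"cpu": 256, "memory": 512, "ports": [8080]}

chosen = place_task(instances, task)
print(chosen["id"])
```

In this example instance c is filtered out in the first step even though it is idle, because it lacks the memory the task needs.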