Amazon Virtual Private Cloud (VPC) allows us to define our virtual network inside the cloud.
To launch EC2 instances we must have a VPC configured. AWS already provides a default one, allowing users to launch an EC2 instance without having to configure anything else. The default VPC comes with internet access enabled, which makes it possible to update and install applications on the instance. Although the default VPC comes in handy and works just fine, it is highly recommended to build our own from scratch for a production environment. The following sections detail the components of a VPC.
Subnets and route tables
A subnet is a range of IPs inside the VPC, configured using an IPv4 CIDR block. Each subnet must be associated with a route table. A subnet can have only one route table, but we can associate the same route table with as many subnets as we want. Every route table has a local route, which cannot be deleted. The local route is used for communication within the VPC.
A subnet must belong to a specific Availability Zone (AZ). Each subnet lives in exactly one AZ, and it is recommended to create subnets in more than one AZ to avoid a single point of failure in a production environment.
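As a rough illustration of carving a VPC range into subnets spread over several AZs, the sketch below uses Python's standard ipaddress module. The function name plan_subnets, the 10.0.0.0/16 range, and the zone names are example values chosen for this sketch, not something AWS prescribes.

```python
import ipaddress
from itertools import cycle, islice


def plan_subnets(vpc_cidr, new_prefix, zones, count):
    """Split a VPC CIDR block into `count` subnets and spread them over AZs.

    Hypothetical helper: it only does the address arithmetic and
    creates nothing in AWS.
    """
    vpc = ipaddress.ip_network(vpc_cidr)
    # subnets() yields consecutive, non-overlapping blocks of the given size.
    blocks = islice(vpc.subnets(new_prefix=new_prefix), count)
    # Assign zones round-robin so consecutive subnets land in different AZs.
    return [(str(block), zone) for block, zone in zip(blocks, cycle(zones))]


plan = plan_subnets("10.0.0.0/16", 24, ["us-east-1a", "us-east-1b"], 4)
for cidr, zone in plan:
    print(cidr, zone)
```

Each subnet gets exactly one AZ, and alternating the zones means that losing a single AZ still leaves half of the subnets available.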
A subnet can also be considered private or public. What determines its “visibility” is whether its route table has a route to an Internet Gateway: if it does, the subnet is considered public; otherwise, it is private. The Internet Gateway itself is attached to the VPC rather than to a subnet, and each VPC can have only one Internet Gateway attached to it. We should use private subnets to host instances that do not need to be reachable from the internet, improving the security of our environment.
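One way to picture the public/private distinction is to look at the routes a subnet uses. The sketch below models a route table as a plain list of dictionaries and checks for an Internet Gateway target. The data layout and the function name is_public are assumptions made for this example; only the "igw-" and "nat-" ID prefixes and the "local" target mirror what AWS shows.

```python
def is_public(route_table):
    """Return True if any route in the table targets an Internet Gateway.

    `route_table` is a list of {"destination": cidr, "target": id} dicts,
    a simplified stand-in for a real AWS route table.
    """
    return any(route["target"].startswith("igw-") for route in route_table)


# Example tables with made-up IDs.
public_routes = [
    {"destination": "10.0.0.0/16", "target": "local"},
    {"destination": "0.0.0.0/0", "target": "igw-0example"},
]
private_routes = [
    {"destination": "10.0.0.0/16", "target": "local"},
]

print(is_public(public_routes))   # True
print(is_public(private_routes))  # False
```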
NAT Instances and NAT Gateway
EC2 instances running inside private subnets might have to access the internet in some specific cases. Imagine that we have to update the software inside our instance: although it shouldn’t be accessible from the internet (receive inbound traffic), we still want to allow the instance to get the latest versions available on the web (send outbound traffic). The problem is that the subnet's route table has no route to an Internet Gateway, which means that traffic addressed to an IP outside our VPC has nowhere to go. The solution to this problem is to use a NAT device.
A NAT device forwards the traffic coming from the private subnet to the internet and sends the response back. Pointing the route table at a NAT device instead of an Internet Gateway allows the subnet to remain private, which means that it still won’t be accessible from the internet. There are two types of NAT device that can be used on AWS right now: NAT instances and NAT gateways.
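To make the routing behaviour concrete, here is a small sketch of how a route table could pick a target for an outgoing packet. It assumes the most specific matching route wins (longest prefix match) and reuses a simplified list-of-dicts route table. The function name resolve_target and the IDs are hypothetical.

```python
import ipaddress


def resolve_target(route_table, destination_ip):
    """Pick the target of the most specific route that matches the IP.

    Returns None when no route matches, meaning the packet has nowhere to go.
    """
    ip = ipaddress.ip_address(destination_ip)
    matches = []
    for route in route_table:
        network = ipaddress.ip_network(route["destination"])
        if ip in network:
            matches.append((network.prefixlen, route["target"]))
    if not matches:
        return None
    # The longest prefix is the most specific route.
    return max(matches)[1]


# Private subnet without any NAT: only the local route exists.
isolated = [{"destination": "10.0.0.0/16", "target": "local"}]

# Private subnet whose default route points at a NAT device (made-up ID).
with_nat = isolated + [{"destination": "0.0.0.0/0", "target": "nat-0example"}]

print(resolve_target(isolated, "93.184.216.34"))  # None
print(resolve_target(with_nat, "93.184.216.34"))  # nat-0example
print(resolve_target(with_nat, "10.0.1.5"))       # local
```

Traffic inside the VPC keeps using the local route, while everything else is handed to the NAT device, which is what lets the subnet stay private.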
A NAT instance is an EC2 instance with a specific AWS AMI running on it. We can choose any instance type and size, but they will affect the availability and bandwidth: we can use a t2.nano to serve as our NAT and it will work, but it will result in a very slow connection with the internet, especially if we have many instances making requests at the same time.
Running a database on an EC2 instance was very common, which led AWS to launch RDS. The same thing happened with NAT instances: since many people used them, AWS launched a dedicated service for the job, called NAT Gateway. A NAT Gateway, just like RDS, takes responsibilities away from us: Amazon manages its availability for us. A NAT Gateway also scales its bandwidth automatically (AWS documented bursts of up to 10 Gbps when this was written, and the limit has since been raised), which is usually enough to keep a subnet working properly. AWS recommends the use of NAT Gateways over NAT instances, and its documentation offers a comparison between them if you want further details.
Security Groups
Security Groups are used as a firewall in front of an EC2 instance. They define what kind of traffic is allowed to reach our instance as well as the source of that traffic. An instance can have multiple security groups, by default up to 5 per network interface. Security Groups only support allow rules, which means that anything that is not explicitly allowed is denied. Security Groups are also stateful: if a request is allowed to come in, its response is allowed to go out, and vice versa.
When we create a new Security Group, all outbound traffic is allowed by default, but we can change it to be more restrictive. We can choose which ports or ranges of ports will be allowed, as well as the source of the traffic. The source of the traffic is an IP range (a CIDR block) or another Security Group. If the source is another Security Group, any component that has the given Security Group attached can send requests on the ports that were specified.
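The allow-only behaviour described above can be sketched as a small model. This is not the AWS API: the IngressRule class, the is_inbound_allowed function, and the group IDs are made up for the example, and the model ignores protocols and outbound rules.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class IngressRule:
    from_port: int
    to_port: int
    source: str  # either a CIDR block or a security group ID ("sg-...")


def is_inbound_allowed(rules, port, source_ip, source_groups=()):
    """Return True if at least one allow rule matches; otherwise deny.

    `source_groups` lists the security groups attached to the sender.
    """
    for rule in rules:
        if not rule.from_port <= port <= rule.to_port:
            continue
        if rule.source.startswith("sg-"):
            # Group-based rule: the sender must carry that security group.
            if rule.source in source_groups:
                return True
        elif ipaddress.ip_address(source_ip) in ipaddress.ip_network(rule.source):
            return True
    # No deny rules exist: anything not explicitly allowed is rejected.
    return False


rules = [
    IngressRule(443, 443, "0.0.0.0/0"),  # HTTPS from anywhere
    IngressRule(5432, 5432, "sg-app"),   # database port only from the app tier
]

print(is_inbound_allowed(rules, 443, "203.0.113.9"))             # True
print(is_inbound_allowed(rules, 5432, "10.0.1.7", ("sg-app",)))  # True
print(is_inbound_allowed(rules, 5432, "10.0.1.7"))               # False
```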
Network ACLs
A Network Access Control List (ACL) is an optional layer of security that operates at the subnet level, which means that traffic blocked by an ACL never reaches the instances. Unlike Security Groups, which only support allow rules, ACLs support both deny and allow rules. Another difference from Security Groups is that ACLs are stateless: if we have a rule allowing inbound traffic, we must also have a rule allowing the matching outbound traffic. If we don’t have both rules, the response cannot leave our subnet.
ACL rules are processed in ascending order: each rule has a unique number, and the lowest-numbered rule that matches the traffic is the one applied. This allows us to define a rule that behaves like an exception when specific cases apply.
A common use of this feature is to block access from an IP that is making malicious requests to the subnet. In this scenario, we would define an allow rule that is valid for everyone (in CIDR notation this is expressed as 0.0.0.0/0) and give it a relatively high rule number, such as 100. Every ACL also ends with a rule whose number is shown as an asterisk: it is the catch-all that is applied when no numbered rule matches, and it denies the traffic.
After that, we add a deny rule for the IP that is causing us trouble and give it a lower number than the allow rule, so that it takes priority. From now on, requests made from that IP will match the deny rule first, which will block them from entering the subnet. All other IPs fall through to the allow rule and are let into the subnet.
A good practice when numbering rules is to leave a gap between them. This makes the infrastructure easier to maintain when we have to add a rule that must execute between two rules that already exist. In a scenario where rules A and B are defined as 1 and 2, we cannot add rule C between them; we would need to renumber them to make it work properly. If rule A was defined as priority 10 and rule B as priority 20, we could add rule C with any number between 11 and 19 without changing anything we have done previously.
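The ordered evaluation and the numbering gaps can be sketched in a few lines. The model below is an assumption-laden simplification: rules are (number, source CIDR, action) tuples, ports and direction are ignored, and traffic that matches no numbered rule is assumed to be denied. The function name evaluate_acl and the addresses are example values.

```python
import ipaddress


def evaluate_acl(rules, source_ip):
    """Apply the lowest-numbered rule that matches the source IP.

    Returns (rule_number, action). When nothing matches, this sketch
    assumes the traffic is denied and returns (None, "deny").
    """
    ip = ipaddress.ip_address(source_ip)
    # Sorting the tuples orders them by rule number, ascending.
    for number, cidr, action in sorted(rules):
        if ip in ipaddress.ip_network(cidr):
            return number, action
    return None, "deny"


rules = [
    (20, "0.0.0.0/0", "allow"),       # rule B: everyone may enter
    (10, "198.51.100.7/32", "deny"),  # rule A: block one troublesome IP
]

print(evaluate_acl(rules, "198.51.100.7"))  # (10, 'deny')
print(evaluate_acl(rules, "203.0.113.9"))   # (20, 'allow')

# Thanks to the gap, rule C fits between A and B without renumbering.
rules.append((15, "198.51.100.0/24", "deny"))
print(evaluate_acl(rules, "198.51.100.8"))  # (15, 'deny')
```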
VPC Peering
When designing a production environment we often have to communicate between different VPCs. A company that has a different AWS account for each team is a classic example: one of the teams needs to access an EC2 instance that belongs to another team, but the machine is located in a private subnet and can only be reached from inside its VPC. To make this communication happen we need to set up VPC Peering.
A VPC peering creates a direct connection between two VPCs. The VPCs don’t need to be in the same account, but their CIDR blocks cannot overlap, since it would not be possible to tell which VPC a request to an IP in the overlapping range should be sent to. Peering originally required both VPCs to be in the same region; AWS has since added inter-region peering, so the VPCs can also live in different regions.
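Checking two CIDR blocks for a conflict before requesting a peering is simple address arithmetic. A minimal sketch with Python's standard ipaddress module follows; the function name can_peer and the ranges are illustrative, and the check covers only the CIDR condition, not accounts or regions.

```python
import ipaddress


def can_peer(cidr_a, cidr_b):
    """Return True when the two VPC CIDR blocks share no addresses."""
    network_a = ipaddress.ip_network(cidr_a)
    network_b = ipaddress.ip_network(cidr_b)
    return not network_a.overlaps(network_b)


print(can_peer("10.0.0.0/16", "10.1.0.0/16"))    # True: disjoint ranges
print(can_peer("10.0.0.0/16", "10.0.128.0/17"))  # False: second is inside first
```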
VPC Peering is not transitive. Imagine we have three VPCs: A, B, and C. If we peer A with B and B with C, can we access EC2 instances in VPC C from VPC A or vice versa? The answer is no: two VPCs can communicate only if they are explicitly peered with each other.
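The A, B, C example can be expressed as a tiny model in which a peering is an unordered pair of VPC names. The function name can_communicate and the data structure are assumptions made for this sketch.

```python
def can_communicate(peerings, vpc_a, vpc_b):
    """Two VPCs can talk only if a direct peering between them exists.

    No path search is done on purpose: peering is not transitive.
    """
    return frozenset((vpc_a, vpc_b)) in peerings


# A is peered with B, and B is peered with C.
peerings = {frozenset(("A", "B")), frozenset(("B", "C"))}

print(can_communicate(peerings, "A", "B"))  # True
print(can_communicate(peerings, "A", "C"))  # False: no direct peering

# To connect A and C, an explicit peering between them has to be added.
peerings.add(frozenset(("A", "C")))
print(can_communicate(peerings, "A", "C"))  # True
```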