Some companies might need to transfer data from their data centers to AWS, and using a regular internet connection raises some problems:
- Inconsistent routing: there is no guarantee that the route data takes from point A to point B will stay the same over time. Real-time applications might not behave as expected in such an environment.
- High latency: sending data over the internet can lead to higher latency, depending on how many hops the data has to traverse to reach its final destination.
- Shared bandwidth: when large data sets need to be transferred to the cloud, the transfer competes for the bandwidth the rest of the company uses for its other tasks. Beyond compromising the bandwidth of other areas of the company, sometimes the bandwidth of the current internet contract is simply not enough and a new contract has to be negotiated. These contracts usually have a minimum commitment and will be more expensive than what the company is already paying.
To solve these problems, Amazon provides Direct Connect, a service that lets us establish a dedicated network connection from our premises to AWS.
When using Direct Connect we pay on demand for data transferred over the dedicated connection. This data is charged at the Direct Connect transfer rate, which is cheaper than the internet data transfer rates. Another advantage of Direct Connect is that the routing of the data will not change, ensuring a more stable connection and consistently low latency over time.
To use Direct Connect we must create a virtual interface. There are two types of virtual interfaces:
- Private Virtual Interfaces: used to connect to private IP addresses inside AWS, such as EC2 instances in a VPC. A private virtual interface is a dedicated private connection that works like a VPN.
- Public Virtual Interfaces: used to connect to AWS public endpoints, such as S3. It requires a public CIDR block range.
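As a rough sketch, the request to create a private virtual interface could look like the following. Every identifier here (connection ID, VLAN, ASN, virtual gateway ID) is a placeholder assumption, not a real value:

```python
# Sketch of the payload for creating a private virtual interface.
# All identifiers below are illustrative placeholders.
new_private_vif = {
    "virtualInterfaceName": "my-private-vif",
    "vlan": 101,                          # 802.1Q VLAN tag agreed with AWS
    "asn": 65000,                         # our side's BGP ASN (private range)
    "virtualGatewayId": "vgw-example456", # gateway attached to our VPC
}

# With boto3 (assuming credentials are configured) this would be sent as:
#   boto3.client("directconnect").create_private_virtual_interface(
#       connectionId="dxcon-example123",
#       newPrivateVirtualInterface=new_private_vif)
```

A public virtual interface would instead carry the public CIDR block we own, so that AWS can route our announced prefixes to the connection.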
CloudFront is a global CDN service that allows us to accelerate the delivery of web assets. This is possible because CloudFront sends the content from the origin to the user’s nearest edge location, reducing the latency of the request.
An origin can be an S3 bucket or an Elastic Load Balancer (ELB). Since the ELB DNS name is not easy to remember, we can integrate CloudFront with Route 53, allowing the company to deliver content from a URL such as http://cdn.mycompany.com.
CloudFront is designed for caching. When a user requests data and there is no cached copy at the nearest edge location, CloudFront goes to the origin, retrieves the data and stores it in the cache. The data stays available until the cache expires, which means that, until then, new requests from users served by that edge location will return the cached file.
Caching objects for longer periods of time increases performance, since the origin is freed to perform other tasks. We must be careful, though: publishing a new version of a cached object requires us to invalidate the current cache. Invalidations are charged by Amazon and can be quite expensive if we need to invalidate too many objects. In such situations, it might be better to create a new distribution and point the DNS on Route 53 to the new location.
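The caching flow above can be sketched as a tiny TTL cache. Class and parameter names here are illustrative; this only models the hit/miss/expiry behaviour, not CloudFront itself:

```python
import time

class EdgeCache:
    """Minimal sketch of an edge location's TTL cache (illustrative only)."""

    def __init__(self, ttl_seconds, fetch_from_origin):
        self.ttl = ttl_seconds
        self.fetch = fetch_from_origin  # called only on a cache miss
        self.store = {}                 # path -> (expires_at, body)
        self.origin_hits = 0

    def get(self, path, now=None):
        now = time.time() if now is None else now
        entry = self.store.get(path)
        if entry and now < entry[0]:    # fresh copy: served from the edge
            return entry[1]
        self.origin_hits += 1           # miss or expired: go to the origin
        body = self.fetch(path)
        self.store[path] = (now + self.ttl, body)
        return body

cache = EdgeCache(ttl_seconds=60, fetch_from_origin=lambda p: f"body of {p}")
cache.get("/logo.png", now=0)    # miss: fetched from the origin
cache.get("/logo.png", now=30)   # hit: served from the edge cache
cache.get("/logo.png", now=90)   # TTL expired: fetched from the origin again
```

After these three requests the origin has been contacted only twice, which is why a longer TTL reduces origin load, and why a changed object keeps being served stale until the cache expires or is invalidated.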
Route 53 is a DNS service provided by AWS, designed to be highly available and scalable. It allows developers to:
- Register domain names: lets us register a name for a website, such as mycompany.com. The supported top-level domains are listed in the AWS documentation.
- Route internet traffic to the resources for your domain: lets users reach our content when they open the registered domain on the internet.
- Check the health of our resources: sends automated requests to verify that a resource is still up and available.
After we have a domain registered, we create a hosted zone. A hosted zone is a container that holds information about how to route traffic for the domain and its subdomains. Inside the hosted zone we create record sets, which have the following properties:
- Name: the subdomain we want to configure. If we leave it blank it configures the domain itself. (e.g. blank, which means mycompany.com)
- Type: the type of the record set. The full list can be found in the AWS documentation. (e.g. A, an IPv4 address)
- TTL: how long the record set is cached by DNS resolvers. (e.g. 1 minute)
- Value: the value the name translates to; its format depends on the selected type. (e.g. 192.0.2.235)
- Routing Policy: which routing policy should be used for this record set. (e.g. Simple)
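Putting those properties together, the example record set could be expressed as the kind of change-batch payload Route 53's API accepts. The hosted zone ID is a placeholder assumption:

```python
# Sketch of a Route 53 change batch creating/updating an A record.
change_batch = {
    "Changes": [{
        "Action": "UPSERT",             # create the record, or update it
        "ResourceRecordSet": {
            "Name": "cdn.mycompany.com.",
            "Type": "A",                # IPv4 address record
            "TTL": 60,                  # 1 minute
            "ResourceRecords": [{"Value": "192.0.2.235"}],
        },
    }]
}

# With boto3 (assuming credentials and a real hosted zone) this would be:
#   boto3.client("route53").change_resource_record_sets(
#       HostedZoneId="Z-EXAMPLE-PLACEHOLDER", ChangeBatch=change_batch)
```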
When choosing the TTL value it is important to note that higher values make future changes take longer to propagate. Choosing the right routing policy is important as well. The following routing policies are available:
- Simple: as the name suggests, it simply routes to the value configured on the record set.
- Weighted: sends requests to record sets in proportion to the weights they are given. A weight can be any value between 0 and 255; the higher the weight, the higher the probability of being routed to that record set. Two record sets with the same name and weights of 70 and 30 will receive 70% and 30% of the requests, respectively. Weighted routing can be used as a failover solution as well: we can set the weights to 100 and 0, and if Route 53 cannot deem the record set with weight 100 healthy, it automatically sends requests to the one with weight 0.
- Latency: routes to the region with the lowest latency for the user. It can also behave as a failover solution when the other record sets aren’t healthy.
- Failover: used purely for failover. We define a primary and a secondary record set; if the primary stops responding, traffic is routed to the secondary.
- Geolocation: ensures requests that come from a specific location are routed to the right place. This makes it possible to comply with laws requiring that content not leave a specific region, the European Union being an example.
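To make the weighted and failover behaviour concrete, here is a minimal sketch of weight-proportional selection combined with health checks. Function and record names are illustrative, and this is not Route 53's actual algorithm:

```python
import random

def pick_weighted(records, healthy, rng=random):
    """Pick a record set in proportion to its weight, skipping unhealthy ones.

    records: list of (name, weight) pairs; healthy: set of healthy names.
    Illustrative sketch only.
    """
    candidates = [(n, w) for n, w in records if n in healthy]
    if not candidates:
        candidates = records            # nothing healthy: fall back to all
    total = sum(w for _, w in candidates)
    if total == 0:                      # all weights are 0: pick uniformly
        return rng.choice(candidates)[0]
    point = rng.uniform(0, total)       # land somewhere on the weight line
    for name, weight in candidates:
        point -= weight
        if point <= 0:
            return name
    return candidates[-1][0]

records = [("primary", 100), ("standby", 0)]
# While the primary is healthy, a weight of 0 keeps the standby idle:
pick_weighted(records, healthy={"primary", "standby"})   # always "primary"
# If the primary fails its health check, traffic moves to the standby:
pick_weighted(records, healthy={"standby"})              # always "standby"
```

With weights like 70/30 the same function splits traffic probabilistically, which is the weighted behaviour described above; the 100/0 configuration is the failover trick.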