IP Adress Failover for highly available AWS Services

Abstract

This documents describes the implementation of high available failover services for applications, which rely on IP address based communication only.
It shows how to configure IP addresses in a Virtual Private Network (VPC), which will route network traffic to a node A or an alternate node B as needed. The document describes how to change the setup of a VPC in case of a failure of node A. The document shows step by step how to automatically assign a service IP address to a standby node B when needed.
The document outlines two different technologies to achieve the same purpose. This allows the implementer to pick the technology that is most suitable for a given infrastructure and the switch over requirements.

Introduction

High available failover architectures are based on a concept where consumers reach a service provider A through a network connection. The core idea is to reroute the consumer traffic to a standby service B when the initial service A fails to provide a given service.
Amazon Web Services provides many building blocks to achieve the purpose to failover network consumers to a new network service. The following solutions are commonly used, they are however not subject of this document:

  • AWS Elastic Load Balancers (ELB) in conjunction with AWS Autoscaling allows rerouting traffic to other service providers when needed. AWS Elastic Load Balancers support many other protocols beyond http and https. They may however not work with some proprietary and legacy protocols.
  • Domain Name Service (DNS) failover with AWS Route 53 : This approach allows redirecting network consumers which lookup services by name to get redirected to a different network service provider. This concept works well with software solutions, which use name resolution to reach their service provider. Some legacy applications rely however on a communication through IP addresses only.

This paper focuses on network consumers which need to reach a service through a given, fixed IP address.

The two solutions work for any protocol. Both solutions require that a given network consumer can reconnect to the same IP address if the original service hangs and times out.

Amazon Web Services (AWS) offers two solutions to failover IP addresses, which should be chosen, based on the network and high availability requirements.

The first solution is based on the fact that AWS manages IP addresses as separate build blocks with the name “Elastic Network Interface” (ENI). Such an ENI hosts an IP address and it can be attached and detached on the fly from an EC2 instance. This allows redirecting the traffic to such an IP address by detaching and re-attaching ENIs to EC2 instances. The limitation of this concept is that it is limited to a single availability zone (AZ).

An IP address has to be part of a subnet. And a subnet has to be assigned to a given AZ. Attaching the same ENI to instances in two different AZs isn’t possible since a given IP address can belong to one subnet only.

This limitation may not be important for some high availability solutions. There may be however the need to leverage the key features of availability zones by running failover instances in two different availability zones.

The second solution is overcoming this limitation with Overlay IP addresses. AWS allows creating routing tables in a VPC which route any traffic for an IP address to an instance no matter where it is in the VPC. These IP addresses are called Overlay IP addresses.

The Overlay IP address can route traffic to instances in different availability zones. This comes with the challenge that the Overlay IP address has to be an IP address that isn’t part of the VPC. The general routing rules wouldn’t work otherwise.

On premises network consumers like desktops who try to access such an IP address have to be routed to the AWS VPC knowing that the Overlay IP address is not part of the regular subnet of the VPC. This leads to the extra effort to have to route on premises consumers to the AWS VPC with an additional subnet which isn’t part of the VPC itself.