pabis.eu

NAT Instance on AWS from scratch

11 June 2023

A common security best practice is to deploy some of your instances in a private network, where the outside world can't access them directly. They should be hidden behind a load balancer or some other instance. However, that also prevents the instances from accessing the internet.

In AWS this can be solved in multiple ways. First of all, you can use a NAT Gateway. It's a managed service that hides instances with private IPs behind a common gateway with a public IP. It's a great, highly available solution, but it's also quite expensive.

NAT Gateway pricing vs cheapest EC2

Another possibility is to enable IPv6 in our VPC, assign IPv6 addresses to our instances and route their traffic through an Egress-only Internet Gateway. This solution is even better: we don't pay for the NAT Gateway and it's simpler, but it works only on the IPv6 stack.

But what if we need to access an IPv4-only resource over the internet and we don't care about high availability - for example a private RPM package registry? To keep things working in IPv4 and cheap at the same time, we can use a NAT instance. This is just an EC2 instance in a public subnet that forwards traffic between the public internet and the private subnet. It works the same as a NAT Gateway, but it is our responsibility to keep it running and secure, and to make it robust and reliable.

Today we will implement such an instance and configure our example VPC to use it. The architecture is as follows: one (NAT) instance in the public subnet, with a public IP assigned, will follow the route table and send its default traffic to the Internet Gateway through its primary network interface. Next we will attach another network interface to this instance, associated with the private subnet. This gives our NAT instance two private IPs: one in the public subnet on network card #0 and another in the private subnet on network card #1. In the private subnet we will disable public IP association and create a custom route table that routes the default traffic to the second network interface of the NAT instance. All the instances in the private subnet will then follow these routing rules and reach the internet through the NAT instance. The diagram of this description looks like this:

NAT instance architecture

Repository for this post is available here: https://github.com/ppabis/nat-instance

Building the draft VPC

Let's start from the top of the diagram: the VPC and the Internet Gateway. In Terraform it's a matter of a few lines of code. We will use the IP range 10.8.0.0 - 10.8.255.255.

# vpc.tf
resource "aws_vpc" "nat-test" {
  cidr_block = "10.8.0.0/16"
  enable_dns_hostnames = true
  enable_dns_support = true
  tags = { Name = "nat-test" }
}

resource "aws_internet_gateway" "igw" {
  vpc_id = aws_vpc.nat-test.id
  tags = { Name = "igw" }
}

Next up let us create the two subnets, green and blue in the diagram - public and private. For the public one we can also already create a route table and add a default route to the IGW. For the public subnet I chose the 10.8.1.0 - 10.8.1.255 range.

# public-subnet.tf
resource "aws_subnet" "public-subnet" {
  availability_zone = "eu-central-1b"
  cidr_block = "10.8.1.0/24"
  vpc_id = aws_vpc.nat-test.id
  map_public_ip_on_launch = true
  tags = { Name = "public-subnet" }
}

# Create and associate new routing table
resource "aws_route_table" "public-rtb" {
  vpc_id = aws_vpc.nat-test.id
}

resource "aws_route_table_association" "public-rtb" {
  subnet_id = aws_subnet.public-subnet.id
  route_table_id = aws_route_table.public-rtb.id
}

# Route default traffic to internet gateway
resource "aws_route" "default" {
  route_table_id = aws_route_table.public-rtb.id
  destination_cidr_block = "0.0.0.0/0"
  gateway_id = aws_internet_gateway.igw.id
}

For the private subnet we will use IPs 10.8.2.0 - 10.8.2.255. Let's also add an empty route table to it.

# private-subnet.tf
resource "aws_subnet" "private-subnet" {
  availability_zone = "eu-central-1b"
  cidr_block = "10.8.2.0/24"
  vpc_id = aws_vpc.nat-test.id
  tags = { Name = "private-subnet" }
}

resource "aws_route_table" "private-rtb" {
  vpc_id = aws_vpc.nat-test.id
}

resource "aws_route_table_association" "private-rtb" {
  route_table_id = aws_route_table.private-rtb.id
  subnet_id = aws_subnet.private-subnet.id
}
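The two subnet ranges above are written out by hand. As a minimal sketch, they could instead be derived from the VPC range with Terraform's built-in cidrsubnet function. The local names below (vpc_cidr, public_cidr, private_cidr) are my own and do not appear in the repository.

```hcl
# cidr-sketch.tf (hypothetical file, for illustration only)
locals {
  vpc_cidr = "10.8.0.0/16"

  # /16 plus 8 new bits gives a /24; the last argument is the subnet number
  public_cidr  = cidrsubnet(local.vpc_cidr, 8, 1) # "10.8.1.0/24"
  private_cidr = cidrsubnet(local.vpc_cidr, 8, 2) # "10.8.2.0/24"
}
```

With this in place the subnets could use cidr_block = local.public_cidr and cidr_block = local.private_cidr, so changing the VPC range only needs one edit.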

Public subnet

So far we have the basic draft of our virtual datacenter. Let's create a test instance that we will use to test connectivity to the internet. We will put it into the public subnet. This will be our future NAT instance. I will open SSH to it but you can configure it to work with Systems Manager (we will do this for private instances later on). I will use an Amazon Linux 2 AMI from 2023: the filter below matches amzn2 images, even though the data source is named amazon-linux-2023. The security group will allow SSH access from my IP range, and I will also import my public key to AWS.

# public-instance.tf
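data "aws_ami" "amazon-linux-2023" {
  # Only accept images published by Amazon
  owners = ["amazon"]
  filter {
    name   = "name"
    # Matches Amazon Linux 2 (amzn2) images released in 2023
    values = ["amzn2-ami-hvm-2.0.2023*-arm64-gp2"]
  }
  most_recent = true
}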

resource "aws_instance" "nat-instance" {
  associate_public_ip_address = true
  subnet_id = aws_subnet.public-subnet.id
  vpc_security_group_ids = [aws_security_group.nat-sg.id]
  key_name = aws_key_pair.nat-kp.key_name
  ami = data.aws_ami.amazon-linux-2023.id
  instance_type = "t4g.nano"
  availability_zone = "eu-central-1b"
  tags = { Name = "NAT-Instance" }
}

# Import public key
resource "aws_key_pair" "nat-kp" {
  public_key = file("~/.ssh/id_ed25519.pub")
}

# Allow SSH from my IP range
resource "aws_security_group" "nat-sg" {
  vpc_id = aws_vpc.nat-test.id
  name = "NAT"

  egress {
    from_port = 0
    to_port = 0
    protocol = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  ingress {
    from_port = 22
    to_port = 22
    protocol = "tcp"
    cidr_blocks = ["158.0.0.0/8"] # Change this to some other IP range
  }
}

# This is the IP to connect to via SSH
output "ip" {
  value = aws_instance.nat-instance.public_ip
}
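The 158.0.0.0/8 range in the SSH rule is very wide and has to be edited by hand. One possible way to make it explicit is an input variable with a basic validity check. This is only a sketch: the variable name ssh_cidr is my own assumption and is not part of the repository.

```hcl
# variables-sketch.tf (hypothetical file, for illustration only)
variable "ssh_cidr" {
  type        = string
  description = "CIDR block allowed to SSH into the NAT instance, e.g. 203.0.113.7/32"

  validation {
    # cidrhost fails on malformed input, so can() turns that into false
    condition     = can(cidrhost(var.ssh_cidr, 0))
    error_message = "ssh_cidr must be a valid CIDR block."
  }
}
```

The ingress rule of the NAT security group would then use cidr_blocks = [var.ssh_cidr], and the value can be passed with -var or a tfvars file.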

By opening an SSH session to the new instance, pinging google.com and running updates, we can verify that our instance can indeed access the internet.

Private subnet

Let's start by creating two private instances. By default I won't configure SSH for them. You can do it and use the public instance as a bastion host but in this example I prefer to use Systems Manager. The full configuration required for this to work is in the ssm.tf file in the repository. It sets up all the necessary components for Systems Manager to work - IAM role, policy and endpoints in the VPC.
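For orientation, here is a rough sketch of what such a configuration could look like. It is not the author's ssm.tf. Only the instance profile name ssm-profile is taken from this post; every other resource name is my own assumption. The region in the endpoint service names is inferred from the eu-central-1b availability zone used above.

```hcl
# ssm-sketch.tf (hypothetical; see the repository for the real ssm.tf)

# Role that EC2 instances are allowed to assume
resource "aws_iam_role" "ssm-role" {
  name = "ssm-role"
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Action    = "sts:AssumeRole"
      Principal = { Service = "ec2.amazonaws.com" }
    }]
  })
}

# AWS managed policy with the permissions the SSM agent needs
resource "aws_iam_role_policy_attachment" "ssm-core" {
  role       = aws_iam_role.ssm-role.name
  policy_arn = "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"
}

resource "aws_iam_instance_profile" "ssm-profile" {
  name = "ssm-profile"
  role = aws_iam_role.ssm-role.name
}

# Endpoints accept HTTPS from the private subnet only
resource "aws_security_group" "ssm-endpoints" {
  vpc_id = aws_vpc.nat-test.id
  name   = "SSM-Endpoints"
  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = [aws_subnet.private-subnet.cidr_block]
  }
}

# Interface endpoints so the agent can reach SSM without internet access
resource "aws_vpc_endpoint" "ssm" {
  for_each            = toset(["ssm", "ssmmessages", "ec2messages"])
  vpc_id              = aws_vpc.nat-test.id
  service_name        = "com.amazonaws.eu-central-1.${each.key}"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = [aws_subnet.private-subnet.id]
  security_group_ids  = [aws_security_group.ssm-endpoints.id]
  private_dns_enabled = true
}
```

The endpoints are what let us open a session to the private instances before any NAT is configured. Private DNS for the endpoints relies on the enable_dns_support and enable_dns_hostnames flags that we set on the VPC.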

What we will do now is start two example instances in the private subnet. Their security groups will allow only egress traffic.

resource "aws_instance" "private-instance" {
  count = 2
  ami = data.aws_ami.amazon-linux-2023.id
  subnet_id = aws_subnet.private-subnet.id
  vpc_security_group_ids = [aws_security_group.private.id]
  tags = { Name = "Private-Instance-${count.index}" }
  # Allow connecting from Systems Manager
  iam_instance_profile = aws_iam_instance_profile.ssm-profile.name
  instance_type = "t4g.nano"
  availability_zone = "eu-central-1b"
}

resource "aws_security_group" "private" {
  vpc_id = aws_vpc.nat-test.id
  name = "Private"
  egress {
    from_port = 0
    to_port = 0
    protocol = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

So now we have private instances that can't access the internet. Let's verify this by connecting to them via Systems Manager and pinging google.com or running sudo yum update.

As expected, we can neither update nor ping anything outside (although we have DNS at least!). Our current setup looks like this diagram.

Current setup

Configuring NAT

In order to match the diagram at the beginning of this post we need to add a new network interface and configure the route table of the private subnet to use it. To our public-instance.tf file we will add a network interface resource that will be attached to the instance, and extend the security group to allow all ingress traffic from the private instances' security group. We also attach the same security group to this network interface. This new "network card" will have its "cable connected" to the private subnet.

# public-instance.tf
...

resource "aws_security_group" "nat-sg" {
  vpc_id = aws_vpc.nat-test.id
  name = "NAT"
  ...

  ingress {
    from_port = 0
    to_port = 0
    protocol = "-1"
    security_groups = [ aws_security_group.private.id  ]
  }
}

resource "aws_network_interface" "private-sub-ni" {
  subnet_id = aws_subnet.private-subnet.id
  attachment {
    device_index = 1
    instance = aws_instance.nat-instance.id
  }
  security_groups = [aws_security_group.nat-sg.id]
  source_dest_check = false # Important flag
}

Most importantly, we disable the source/destination check on this network interface. With the check enabled, the interface drops any packet it receives that is not addressed to it, and any packet it sends whose source address is not its own. NAT cannot work like that: imagine that we want to reach Google's DNS server from our private instance. The packet sent to the new interface will be addressed to 8.8.8.8, and the interface would drop it because it is not addressed to the interface itself.

Next we need to add a default route to the private route table that points to this new network interface. In private-subnet.tf we need to add the following configuration.

# private-subnet.tf
...

resource "aws_route" "private-public" {
  route_table_id = aws_route_table.private-rtb.id
  destination_cidr_block = "0.0.0.0/0"
  network_interface_id = aws_network_interface.private-sub-ni.id  
}

Let's SSH to our instance and see if it was assigned the new network interface and a private IP address from the private subnet. If not, we can try rebooting the instance.

[ec2-user@ip-10-8-1-227 ~]$ ifconfig
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9001
        inet 10.8.1.227  netmask 255.255.255.0  broadcast 10.8.1.255
...
eth1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9001
        inet 10.8.2.148  netmask 255.255.255.0  broadcast 10.8.2.255
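To know which address and MAC to expect on eth1 before logging in, the values could also be exposed as a Terraform output. This is a small sketch; the output name nat_private_interface is my own assumption.

```hcl
# public-instance.tf (hypothetical addition)
output "nat_private_interface" {
  value = {
    # Private IPs assigned to the second interface (private subnet side)
    ips = aws_network_interface.private-sub-ni.private_ips
    # MAC address, to match against the interface listing on the instance
    mac = aws_network_interface.private-sub-ni.mac_address
  }
}
```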

Our interfaces are named eth0 and eth1. The last part that we need is to configure the NAT instance to forward traffic. First we enable IP forwarding in the kernel. Then we use iptables to add a MASQUERADE rule to the nat table, so that packets leaving through eth0 get the source address of that interface. Next we configure the FORWARD chain to accept packets from eth0 to eth1 that belong to established or related connections (responses from the internet) and to accept any connection from eth1 to eth0. In the SSH session, run the following commands:

$ sudo sysctl -w net.ipv4.ip_forward=1
$ sudo iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
$ sudo iptables -A FORWARD -i eth0 -o eth1 -m state --state RELATED,ESTABLISHED -j ACCEPT
$ sudo iptables -A FORWARD -i eth1 -o eth0 -j ACCEPT

These commands can also be added to user_data in the instance's Terraform resource, so that they run when the instance first boots.

resource "aws_instance" "nat-instance" {
  associate_public_ip_address = true
  subnet_id = aws_subnet.public-subnet.id
...
  user_data = <<-EOF
  #!/bin/bash
  sysctl -w net.ipv4.ip_forward=1
  iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
  iptables -A FORWARD -i eth0 -o eth1 -m state --state RELATED,ESTABLISHED -j ACCEPT
  iptables -A FORWARD -i eth1 -o eth0 -j ACCEPT
  EOF
}
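Note that sysctl -w and plain iptables commands only change the running system, and user_data normally runs only on first boot, so the settings would be lost after a reboot. Below is a sketch of a variant that persists them. It assumes Amazon Linux 2 with the iptables-services package available from yum. The local name nat_user_data and the file name 90-nat.conf are my own choices.

```hcl
# public-instance.tf (hypothetical addition)
locals {
  nat_user_data = <<-EOF
    #!/bin/bash
    # Persist IP forwarding across reboots and apply it now
    echo "net.ipv4.ip_forward = 1" > /etc/sysctl.d/90-nat.conf
    sysctl --system
    # Same NAT and forwarding rules as in the commands above
    iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
    iptables -A FORWARD -i eth0 -o eth1 -m state --state RELATED,ESTABLISHED -j ACCEPT
    iptables -A FORWARD -i eth1 -o eth0 -j ACCEPT
    # Save the running rules and have the iptables service restore them at boot
    yum install -y iptables-services
    iptables-save > /etc/sysconfig/iptables
    systemctl enable iptables
    EOF
}
```

The instance resource would then use user_data = local.nat_user_data instead of the inline script. I have not tested this variant against the repository, so treat it as a starting point.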

Now we can finally test the internet access on the private instances. To be fully convinced that our instances are using the NAT instance, we can curl https://api.ipify.org, which responds with the public IP address the request came from. We can then compare it to the public IP address that AWS assigned to our NAT instance.

[ec2-user@ip-10-8-1-45 ~]$ curl https://api.ipify.org && echo
3.64.126.205

sh-4.2$ bash
[ssm-user@ip-10-8-2-120 bin]$ curl https://api.ipify.org && echo
3.64.126.205

sh-4.2$ bash
[ssm-user@ip-10-8-2-86 ~]$ curl https://api.ipify.org && echo
3.64.126.205