
Slim app for getting SSE-C encrypted files from S3 in Go

10 October 2023

In one of the previous posts, we implemented a small application in Go that downloads configuration files from SSM Parameter Store. However, SSM parameters are limited to 4 KB, or 8 KB with the Advanced tier. To get around this limit, today we will implement a similar application that downloads the configuration files from S3.

GitHub repo

Encryption options

SSM offers the SecureString type that is encrypted with KMS. In S3 we have some more options. SSE-S3, where S3 manages the keys itself, only ensures that the files on disk are encrypted; this mode is transparent to end users. SSE-KMS is the same option as in the SSM case - we can use the AWS managed key or a custom one. Each use of the key is logged in CloudTrail, but as long as the user has permissions to use it, this mode is also transparent.

Another interesting feature is SSE-C. This encryption mode requires the user to pass the encryption key with each request. It works only over HTTPS, but that shouldn't be a concern in 2023.

Creating the project

Let's create a new Git repo and Go module with dependencies.

$ mkdir s3-ssec-get && cd s3-ssec-get
$ git init
$ go mod init github.com/ppabis/s3-ssec-get
$ go get github.com/aws/aws-sdk-go-v2/service/s3
$ go get github.com/aws/aws-sdk-go-v2/config

The application will recursively download files from an S3 prefix. We will pass four arguments to the application: the bucket name, the prefix in the bucket, the SSE-C key and the target output directory. In our main file, we will define the following code:

import (
    "fmt"
    "os"
)

func main() {
    if len(os.Args) < 5 {
        fmt.Println("Usage: s3-ssec-get <bucketName> <prefix> <key> <outputDir>")
        os.Exit(1)
    }

    bucketName := os.Args[1]
    prefix := os.Args[2]
    key := os.Args[3]
    outputDir := os.Args[4]
}

Next up we will initialize the AWS SDK so that it automatically finds the most adequate credentials, whether we run on EC2, on ECS or with a local access key. We will also specify the region. For simplicity I will hardcode it to eu-central-1, but it can easily be loaded from an environment variable. Note that Go refuses to compile the snippet above until the four variables are actually used, which will happen in the next steps.

import (
    ...
    "context"
    "github.com/aws/aws-sdk-go-v2/config"
)

func main() {
    ...
    cfg, err := config.LoadDefaultConfig(context.TODO(), config.WithRegion("eu-central-1"))
    if err != nil {
        fmt.Printf("[ERROR] unable to load SDK config, %v\n", err)
        os.Exit(2)
    }
}

Recursively downloading objects in S3

In the new file, create a new function that will take all the arguments and config as parameters. This function will list all objects with the given prefix from the bucket. This version is simplified as ListObjectsV2 can return up to 1000 objects. If you have more, you need to use pagination.

import (
    "context"
    "log"
    "os"

    "github.com/aws/aws-sdk-go-v2/aws"
    "github.com/aws/aws-sdk-go-v2/service/s3"
)

func RecursiveGetObject(cfg aws.Config, bucketName string, prefix string, ssecKey string, output string) {
    client := s3.NewFromConfig(cfg)

    // Get all the objects in the bucket
    objects, err := client.ListObjectsV2(context.TODO(), &s3.ListObjectsV2Input{
        Bucket: &bucketName,
        Prefix: &prefix,
    })

    if err != nil {
        log.Fatalf("[ERROR] listing objects in bucket %s: %s\n", bucketName, err)
    }

    // Iterate over all the objects
    for _, object := range objects.Contents {
        log.Default().Printf("getting object %s", *object.Key)
    }
}
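
If you have more than 1000 objects under the prefix, the SDK provides a paginator that handles the continuation tokens for you. A sketch of the listing with pagination, which can replace the single ListObjectsV2 call above:

paginator := s3.NewListObjectsV2Paginator(client, &s3.ListObjectsV2Input{
    Bucket: &bucketName,
    Prefix: &prefix,
})

for paginator.HasMorePages() {
    // Each page contains up to 1000 objects
    page, err := paginator.NextPage(context.TODO())
    if err != nil {
        log.Fatalf("[ERROR] listing objects in bucket %s: %s\n", bucketName, err)
    }
    for _, object := range page.Contents {
        log.Default().Printf("getting object %s", *object.Key)
    }
}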

Next we will determine the actual output path for the object and create all necessary directories if they don't exist. For this, define a helper function. For example, if our prefix is config/ and we have nested structure like: config/common.json, config/production/database.json, with output directory set to /etc/app, we want to create /etc/app/common.json and /etc/app/production/database.json.

import (
    ...
    "path/filepath"
)

func ensurePath(key string, prefix string, output string) string {
    // Remove the prefix from the key
    newKey := key[len(prefix):]
    // Append the new key to the output directory
    output = filepath.Join(output, newKey)
    // Create all the parent directories if they don't exist;
    // MkdirAll is a no-op for directories that already exist
    dir := filepath.Dir(output)
    if err := os.MkdirAll(dir, 0755); err != nil {
        log.Fatalf("[ERROR] creating directory %s: %s\n", dir, err)
    }

    return output
}
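
With the example from before, the helper maps the keys as follows:

ensurePath("config/common.json", "config/", "/etc/app")              // /etc/app/common.json
ensurePath("config/production/database.json", "config/", "/etc/app") // /etc/app/production/database.json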

Now is the time for the actual downloading. But for the SSE-C configuration we first need to pass the key and its MD5 hash. This is the somewhat tricky part: the key itself is passed base64-encoded, while the MD5 hash has to be computed over the raw, decoded key bytes and then base64-encoded as well. Let's write a function for that.

import (
    ...
    "crypto/md5"
    "encoding/base64"
)

func keyMd5(ssecKey string) string {
    // The MD5 is computed over the raw key bytes, then base64-encoded
    rawKey, err := base64.StdEncoding.DecodeString(ssecKey)
    if err != nil {
        log.Fatalf("[ERROR] decoding ssecKey: %s\n", err)
    }
    hash := md5.Sum(rawKey)
    return base64.StdEncoding.EncodeToString(hash[:])
}

The final downloader function can look like this: we get the object from S3, open the output file, then read the object in buffered chunks and write them to the file, handling errors along the way.

import (
    ...
    "fmt"
    "io"
)

func transferObject(client *s3.Client, bucketName string, key string, ssecKey string, output string) error {
    keyHash := keyMd5(ssecKey)

    object, err := client.GetObject(context.TODO(), &s3.GetObjectInput{
        Bucket:               &bucketName,
        Key:                  &key,
        SSECustomerAlgorithm: aws.String("AES256"),
        SSECustomerKey:       &ssecKey,
        SSECustomerKeyMD5:    &keyHash,
    })

    if err != nil {
        return fmt.Errorf("[ERROR] getting object %s: %s", key, err)
    }
    // Close the response body once the function returns
    defer object.Body.Close()

    of, err := os.OpenFile(output, os.O_CREATE|os.O_WRONLY, 0644)
    if err != nil {
        return fmt.Errorf("[ERROR] opening file %s: %s", output, err)
    }
    defer of.Close()

    buf := make([]byte, 1024)
    for {
        n, err := object.Body.Read(buf)
        if err != nil && err != io.EOF {
            return fmt.Errorf("[ERROR] reading object %s: %s", key, err)
        }

        _, werr := of.Write(buf[:n])
        if werr != nil {
            return fmt.Errorf("[ERROR] writing object %s: %s", key, werr)
        }

        if err == io.EOF {
            break
        }
    }

    return nil
}
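
The manual loop gives us control over the buffer, but if that is not needed, io.Copy performs the same buffered transfer in a single call. The whole loop could be replaced with:

if _, err := io.Copy(of, object.Body); err != nil {
    return fmt.Errorf("[ERROR] writing object %s: %s", key, err)
}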

After the log call in RecursiveGetObject, add the calls to the new functions.

...
    // Iterate over all the objects
    for _, object := range objects.Contents {
        log.Default().Printf("getting object %s", *object.Key)
        outputPath := ensurePath(*object.Key, prefix, output)
        err := transferObject(client, bucketName, *object.Key, ssecKey, outputPath)
        if err != nil {
            log.Fatalf("[ERROR] getting object %s: %s\n", *object.Key, err)
        }
    }
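
Finally, wire the function into main after loading the SDK config:

func main() {
    ...
    RecursiveGetObject(cfg, bucketName, prefix, key, outputDir)
}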

Testing the application

To test the application, we will upload some files to S3 with SSE-C encryption. But where do we get the key from? One option is to use OpenSSL or any other tool to create 32 random bytes, but I will go full AWS and get it from KMS, which has exactly this functionality. Do not lose the key, as we will need it later.

$ export MY_KEY=$(aws kms generate-random --region eu-central-1 --number-of-bytes 32 --query Plaintext --output text)
$ echo $MY_KEY
lrl4BGpHGzDRrA/q1HrA5jRhcfF+Ldj7/yMOyvylYhs=

To upload with the AWS CLI, we also need the MD5 of this key, and it's just as tricky as in the Go case: the hash is computed over the raw key bytes and then base64-encoded. This version should work on both macOS and Linux.

$ export MY_KEY_MD5=$(echo -n $MY_KEY | base64 -d | openssl md5 -binary | base64)

Now we can upload the files. I created a bucket with a totally random hex name for testing. Let's put some data into the files so that we can verify they are downloaded correctly later on.

$ aws s3 mb --region eu-central-1 s3://$(openssl rand -hex 16)
make_bucket: 50d0ed31162e47ba2d2fee532c22504f
$ echo "common configuration" > common.json
$ echo "database production config" > database.json
$ aws s3api put-object --bucket 50d0ed31162e47ba2d2fee532c22504f\
 --key config/common.json\
 --sse-customer-algorithm AES256\
 --sse-customer-key $MY_KEY\
 --sse-customer-key-md5 $MY_KEY_MD5\
 --body common.json
$ aws s3api put-object --bucket 50d0ed31162e47ba2d2fee532c22504f\
 --key config/production/database.json\
 --sse-customer-algorithm AES256\
 --sse-customer-key $MY_KEY\
 --sse-customer-key-md5 $MY_KEY_MD5\
 --body database.json

For both uploads we should get a response containing an "ETag" field. This means that the MD5 of the key was correct.

Now I will start a new EC2 instance with Amazon Linux 2023. I will attach an IAM role to it with permissions to read from this S3 bucket (see the policy sketch below). I will also install the Go toolchain, copy over all the sources and build the application.
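
For reference, a minimal sketch of the role's policy - it only needs to list the bucket and read the objects (the bucket name is the generated one from above):

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "s3:ListBucket",
            "Resource": "arn:aws:s3:::50d0ed31162e47ba2d2fee532c22504f"
        },
        {
            "Effect": "Allow",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::50d0ed31162e47ba2d2fee532c22504f/*"
        }
    ]
}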

$ # Copy the sources
$ ssh ec2-user@$(terraform output --raw public-ip) mkdir app
$ scp -r . ec2-user@$(terraform output --raw public-ip):~/app/
$ ssh ec2-user@$(terraform output --raw public-ip)
$ # On the machine
$ sudo yum install -y golang
$ cd app && go build -o s3-ssec-get
$ sudo cp s3-ssec-get /usr/local/bin/

And test downloading the files from S3.

$ # Verify that we can reach that bucket and objects with IAM role
$ aws s3 ls s3://50d0ed31162e47ba2d2fee532c22504f/config
                           PRE config/
$ # The same key as generated previously
$ export MY_KEY=lrl4BGpHGzDRrA/q1HrA5jRhcfF+Ldj7/yMOyvylYhs=
$ s3-ssec-get 50d0ed31162e47ba2d2fee532c22504f config/ $MY_KEY /tmp
$ cat /tmp/common.json 
common configuration
$ cat /tmp/production/database.json 
database production config