pabis.eu

AWS S3 Logging Options - CloudTrail and Server Access Logs

21 October 2023

While preparing for the SCS-C02 exam, I ran into some questions about S3 access logging. Are anonymous requests logged? Are website requests logged? Are CloudFront requests logged? I wanted to test whether a CloudTrail data event trail and server access logging each capture all of these events, or whether the two need to be combined to cover every case.

Create a test bucket

First I created three buckets to test the logging options: pabiseu-test-logging, pabiseu-test-logging-trail and pabiseu-test-logging-serverlogs. All settings in those buckets are left at their defaults.

Test buckets for logging

Setting up CloudTrail

Next we will create a CloudTrail trail for data events. It will deliver events into the pabiseu-test-logging-trail bucket. We will customize the trail settings so that it records only events directly related to the pabiseu-test-logging bucket. Go to CloudTrail and select Trails in the left panel (the panel may be collapsed).

Trails

In there you can create individual custom trails. Click Create trail and select Data events. Make the trail store its logs in a separate bucket, in my case pabiseu-test-logging-trail.

Data events

Instead of logging all events, we will create a custom selector. Add a condition on the resources.ARN field so that it starts with the ARN of the S3 bucket. For convenience, use the Browse button. I selected arn:aws:s3:::pabiseu-test-logging/.

Custom selector
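The custom selector above can also be written down as JSON. The sketch below builds a structure in the shape of CloudTrail advanced event selectors, which is roughly what the console form produces. The "Name" label is made up for illustration, and I am assuming the selector is limited to S3 object data events; treat it as an approximation rather than an export of the actual trail.

```python
import json

# Bucket ARN prefix from the article. The trailing slash limits the match
# to objects inside this one bucket.
BUCKET_ARN_PREFIX = "arn:aws:s3:::pabiseu-test-logging/"


def build_selectors(arn_prefix: str) -> list:
    """Return advanced event selectors for S3 object data events under arn_prefix."""
    return [
        {
            "Name": "S3 data events for the test bucket",  # hypothetical label
            "FieldSelectors": [
                {"Field": "eventCategory", "Equals": ["Data"]},
                {"Field": "resources.type", "Equals": ["AWS::S3::Object"]},
                {"Field": "resources.ARN", "StartsWith": [arn_prefix]},
            ],
        }
    ]


selectors = build_selectors(BUCKET_ARN_PREFIX)
print(json.dumps(selectors, indent=2))
```

The printed JSON should be usable with the AWS CLI (aws cloudtrail put-event-selectors with the --advanced-event-selectors option) if you prefer to configure the trail without the console.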

Next, perform some actions in the test S3 bucket. Try uploading and deleting files, opening them using Open in S3, and pasting the raw object URL into the browser to get Access Denied.

Querying the logs with Athena

Go to CloudTrail Event History and click Create Athena Table. Select the trail bucket we set up earlier.

Create Athena Table

Next in Athena go to Settings, Edit and set query results location. I selected the same bucket as the trail but with a new prefix: s3://pabiseu-test-logging-trail/athena

Athena settings

Athena query location

Now we can run a test query inside Athena. In the query editor run the following to scan through the logs.

SELECT * FROM "default"."cloudtrail_logs_pabiseu_test_logging_trail" WHERE errorcode = 'AccessDenied' LIMIT 10;

If you tried to access any files without permissions (for example using the public HTTP link), it should be logged here. (In the example below there is one more line related to the trail bucket. CloudTrail will log its own events as well if you choose to log all events.)

Athena query

Enabling server access logs

Go to S3, open your test bucket and find Server access logging under the Properties tab. Select another bucket that will be dedicated to these logs. In my case it's pabiseu-test-logging-serverlogs. Again, try to access some files with authentication, via a presigned URL and without access, then upload, delete, and so on. Unlike the CloudTrail events, these logs are not delivered almost immediately, so give it some time.

We will create another Athena table to query these logs. Following the official AWS guide, the script for creating this table looks as follows. Adapt the first and last lines to your liking and to your bucket name.

CREATE EXTERNAL TABLE `default.server_access_logs`(
  `bucketowner` STRING, 
  `bucket_name` STRING, 
  `requestdatetime` STRING, 
  `remoteip` STRING, 
  `requester` STRING, 
  `requestid` STRING, 
  `operation` STRING, 
  `key` STRING, 
  `request_uri` STRING, 
  `httpstatus` STRING, 
  `errorcode` STRING, 
  `bytessent` BIGINT, 
  `objectsize` BIGINT, 
  `totaltime` STRING, 
  `turnaroundtime` STRING, 
  `referrer` STRING, 
  `useragent` STRING, 
  `versionid` STRING, 
  `hostid` STRING, 
  `sigv` STRING, 
  `ciphersuite` STRING, 
  `authtype` STRING, 
  `endpoint` STRING, 
  `tlsversion` STRING,
  `accesspointarn` STRING,
  `aclrequired` STRING)
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.serde2.RegexSerDe' 
WITH SERDEPROPERTIES ( 
  'input.regex'='([^ ]*) ([^ ]*) \\[(.*?)\\] ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) (\"[^\"]*\"|-) (-|[0-9]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) (\"[^\"]*\"|-) ([^ ]*)(?: ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*))?.*$') 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.mapred.TextInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  's3://pabiseu-test-logging-serverlogs/'
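
To see how the regular expression in the table definition maps a raw log line onto the columns, here is a minimal Python sketch that applies the same pattern locally. The sample line is invented for illustration (owner ID, request ID, host ID, IP address and region are placeholders, not taken from real logs), and Python's re module stands in for the Hive RegexSerDe, so treat it as an approximation.

```python
import re

# Column names in the same order as in the CREATE EXTERNAL TABLE statement.
COLUMNS = [
    "bucketowner", "bucket_name", "requestdatetime", "remoteip", "requester",
    "requestid", "operation", "key", "request_uri", "httpstatus", "errorcode",
    "bytessent", "objectsize", "totaltime", "turnaroundtime", "referrer",
    "useragent", "versionid", "hostid", "sigv", "ciphersuite", "authtype",
    "endpoint", "tlsversion", "accesspointarn", "aclrequired",
]

# Same pattern as 'input.regex', without the extra escaping the SQL string needs.
LOG_PATTERN = re.compile(
    r'([^ ]*) ([^ ]*) \[(.*?)\] ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) '
    r'("[^"]*"|-) (-|[0-9]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) '
    r'("[^"]*"|-) ([^ ]*)'
    r'(?: ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*))?.*$'
)

# Invented example of an anonymous GET; every identifier here is a placeholder.
SAMPLE_LINE = (
    'exampleownerid pabiseu-test-logging [21/Oct/2023:10:15:00 +0000] '
    '192.0.2.3 - EXAMPLEREQUESTID REST.GET.OBJECT public/index.html '
    '"GET /public/index.html HTTP/1.1" 200 - 113 113 7 6 "-" "curl/8.0.1" - '
    'examplehostid= - TLS_AES_128_GCM_SHA256 - '
    'pabiseu-test-logging.s3.eu-central-1.amazonaws.com TLSv1.3 - -'
)


def parse_line(line: str) -> dict:
    """Split one server access log line into a column-name -> value dict."""
    match = LOG_PATTERN.match(line)
    if match is None:
        raise ValueError("line does not look like a server access log entry")
    return dict(zip(COLUMNS, match.groups()))


record = parse_line(SAMPLE_LINE)
for name in ("operation", "key", "httpstatus", "requester"):
    print(name, "=", record[name])
```

In the raw line an anonymous requester appears as a single dash, which is the value to look for when filtering unauthenticated requests.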

After you get some logs, try querying them with Athena:

SELECT "httpstatus", "errorcode", "key", "bucket_name", "operation", "requester", "useragent"
FROM "default"."server_access_logs"
WHERE operation LIKE 'REST.%.OBJECT'
limit 20;

The results should look similar to this. You can see that some of the requests are authenticated (the requester is known) and others are anonymous.

Athena query on server logs

Testing with S3 static website

Let's enable public access on the bucket and add a policy that allows getting objects from the public/ prefix.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowPublicRead",
            "Effect": "Allow",
            "Principal": "*",
            "Action": [
                "s3:GetObject"
            ],
            "Resource": [
                "arn:aws:s3:::pabiseu-test-logging/public/*"
            ]
        }
    ]
}

Go to the bucket properties and enable static website hosting. Try to access files from the root and from the public/ prefix. Also try accessing nonexistent files in both public/ and the root of the bucket. Take note of the time you made these requests, as it might be useful for filtering the logs.

Next, go to Athena and query both the CloudTrail table and the server logs table in two separate tabs.

SELECT "requestdatetime", "httpstatus", "operation", "requester", "key", "request_uri" FROM "default"."server_access_logs" ORDER BY parse_datetime("requestdatetime", 'dd/MMM/yyyy:HH:mm:ss Z') DESC LIMIT 10;
SELECT "eventtime", "requestparameters", "useridentity" FROM "default"."cloudtrail_logs_pabiseu_test_logging_trail" order by eventtime DESC LIMIT 10;

CloudTrail logs requests made through the website endpoint; the endpoint shows up under the Host key of the JSON in requestparameters. The user identity in this case is anonymous and the ARN is null. In the server access logs the requester is empty.

CloudTrail logs

Server Access logs
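
Since requestparameters comes back from Athena as a JSON string, picking out website requests takes a little parsing. The sketch below shows one way to do it in Python. The sample value is hypothetical: the Host key follows what is described above, while the other keys, the hostname and the region are my assumptions about a typical record.

```python
import json

# Hypothetical requestparameters value for a GET through the website endpoint.
SAMPLE_PARAMS = json.dumps({
    "bucketName": "pabiseu-test-logging",
    "Host": "pabiseu-test-logging.s3-website.eu-central-1.amazonaws.com",
    "key": "public/index.html",
})


def via_website_endpoint(requestparameters: str) -> bool:
    """Return True if the Host value looks like an S3 static website endpoint."""
    params = json.loads(requestparameters)
    host = params.get("Host", "")
    # Website endpoints contain "s3-website" right after the bucket name.
    return ".s3-website" in host


print(via_website_endpoint(SAMPLE_PARAMS))
```

A check like this makes it easier to separate website traffic from requests sent to the regular REST endpoint when scanning a larger result set.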

Testing with CloudFront

Let's create a CloudFront distribution. It will be public at first; we will work on authorization in the next stage.

Select your bucket as the origin, select Public access, and don't enable Origin Shield. Allow the GET, HEAD and OPTIONS methods. Do not enable WAF. Select the price class that covers only North America and Europe. Wait for the distribution to be created.

CloudFront distribution

Copy the domain name and paste it into your browser. Try accessing a file in the root and files under the public/ prefix (the URL will look like http://abcdef123.cloudfront.net/public/<file>). Try accessing nonexistent files as well.

This is how the requests look in CloudTrail and server access logs for CloudFront with Public access. The user agent reported in CloudTrail is [Amazon CloudFront]. There is no trace of any authentication and the principal is null.

CloudTrail

Next we will change the settings a bit and enable Origin Access Control. Go to your CloudFront distribution, edit the origin and change the access setting to Origin Access Control. Add a new control setting with the default values. Below it you should see a blue box with information about the bucket policy. Copy the policy and use it to replace the existing policy on the bucket; the link under the box takes you there. Just change the Resource value to include the /public/* prefix, so that we can still get some 403s.
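
For reference, the resulting bucket policy has roughly the shape generated below. This is a sketch under assumptions: the account ID and distribution ID are placeholders, the Sid is made up, and the exact policy that CloudFront suggests may differ in details such as the Version value, so copy the real one from the console.

```python
import json

BUCKET = "pabiseu-test-logging"
ACCOUNT_ID = "111122223333"          # placeholder account ID
DISTRIBUTION_ID = "EDFDVBD6EXAMPLE"  # placeholder distribution ID


def oac_bucket_policy(bucket: str, account_id: str, distribution_id: str,
                      prefix: str = "public/") -> dict:
    """Build a policy letting one CloudFront distribution read objects under prefix."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowCloudFrontServicePrincipal",  # hypothetical Sid
                "Effect": "Allow",
                "Principal": {"Service": "cloudfront.amazonaws.com"},
                "Action": "s3:GetObject",
                # Restricting to the prefix keeps the bucket root returning 403.
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
                "Condition": {
                    "StringEquals": {
                        "AWS:SourceArn": (
                            f"arn:aws:cloudfront::{account_id}"
                            f":distribution/{distribution_id}"
                        )
                    }
                },
            }
        ],
    }


policy = oac_bucket_policy(BUCKET, ACCOUNT_ID, DISTRIBUTION_ID)
print(json.dumps(policy, indent=4))
```

The condition on the distribution ARN is what ties the permission to a single distribution rather than to CloudFront as a whole.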

Test the same paths as before on CloudFront. Once the policy is replaced, we can no longer access the S3 website endpoint, because access is now restricted to CloudFront, which sends signed requests to S3.

Now the requests should look different. For example, in the CloudTrail logs the additionaleventdata column clearly shows that the requests are signed, even though the principal is still null. Also, the user agent is now the canonical name of the service.

CloudTrail logs

In the server access logs, the signature is also visible, in the sigv column. For OAC requests the user agent is not passed; instead, the requester is svc:cloudfront.amazonaws.com. With Public access, CloudFront acts just like a browser.

Server logs
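
The observations above can be turned into a small helper for labelling server access log rows. This is only a sketch based on what showed up in my logs: the requester value for OAC comes from the results above, the dash for anonymous requests is how an empty field appears in the raw log, and the category names are my own.

```python
from typing import Optional

OAC_REQUESTER = "svc:cloudfront.amazonaws.com"


def classify_request(requester: Optional[str], sigv: Optional[str]) -> str:
    """Label a server access log row by who appears to have made the request."""
    # A missing requester shows up as "-" in the raw log and may be NULL in Athena.
    if requester in (None, "", "-"):
        return "anonymous"
    if requester == OAC_REQUESTER:
        # OAC requests are signed, so sigv is expected to carry a value.
        return "cloudfront-oac" if sigv not in (None, "", "-") else "cloudfront-unsigned"
    return "authenticated"


rows = [
    ("-", "-"),                        # browser or CloudFront with Public access
    (OAC_REQUESTER, "SigV4"),          # CloudFront with Origin Access Control
    ("arn:aws:iam::111122223333:user/example", "SigV4"),  # placeholder IAM user
]
labels = [classify_request(requester, sigv) for requester, sigv in rows]
print(labels)
```

Note that a public CloudFront distribution and a plain browser both end up in the anonymous category, which matches the point that CloudFront with Public access acts just like a browser.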

Conclusion

It seems that, whichever logging method we use, all the requests are logged. Server access logs are more convenient for viewing actions that happened in the bucket. CloudTrail, on the other hand, logs everything and requires more filtering and more extraction from the JSON and JSON-like structures it reports, but it provides much more granular context on each event.