AWS S3 Logging Options - CloudTrail and Server Access Logs
21 October 2023
While preparing for the SCS-C02 exam, I had some questions about S3 access logging. Are anonymous requests logged? Are website requests logged? Are CloudFront requests logged? I wanted to test whether a CloudTrail data events trail and server access logging each capture all of those events, or whether both are needed to cover every case.
Create a test bucket
First I created several buckets to test the logging options: pabiseu-test-logging, pabiseu-test-logging-trail and pabiseu-test-logging-serverlogs. All the settings in those buckets are default.
Setting up CloudTrail
Next we will create a CloudTrail trail for data events. It will log events into the pabiseu-test-logging-trail bucket. We will customize the trail settings so that it only records events directly related to the pabiseu-test-logging bucket.
Go to CloudTrail and select Trails in the left panel (the panel may be hidden). There you can create individual custom trails. Click Create trail and select Data events. Make the trail store its logs in a separate bucket, in my case pabiseu-test-logging-trail.
Instead of logging all events, we will create a custom selector. Add a resources.ARN field with a condition that it starts with the S3 bucket ARN. For convenience, use the Browse button. I selected arn:aws:s3:::pabiseu-test-logging/.
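The custom selector above can also be written down as data. The following is a minimal sketch, assuming the console's selector corresponds to CloudTrail's advanced event selector structure; the selector name is a made-up label, and the `eventCategory` and `resources.type` fields are my assumption about what the console fills in for S3 data events.

```python
import json

# Bucket ARN prefix taken from the article. The trailing slash limits the
# match to objects inside this one bucket.
BUCKET_ARN_PREFIX = "arn:aws:s3:::pabiseu-test-logging/"


def build_s3_data_event_selector(arn_prefix):
    """Return advanced event selectors for S3 object-level data events."""
    return [
        {
            "Name": "S3 data events for the test bucket",  # hypothetical label
            "FieldSelectors": [
                # Assumed companions of the resources.ARN condition.
                {"Field": "eventCategory", "Equals": ["Data"]},
                {"Field": "resources.type", "Equals": ["AWS::S3::Object"]},
                # The condition described in the text above.
                {"Field": "resources.ARN", "StartsWith": [arn_prefix]},
            ],
        }
    ]


selectors = build_s3_data_event_selector(BUCKET_ARN_PREFIX)
print(json.dumps(selectors, indent=2))
```

If you script the setup instead of using the console, a structure like this is what you would pass as `AdvancedEventSelectors` to boto3's `put_event_selectors` call; the sketch itself makes no AWS calls.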
Next, perform some actions in the test S3 bucket. Try uploading and deleting some files, opening them using Open in S3, and copying the raw object URL, which should return Access Denied.
Querying the logs with Athena
Go to CloudTrail Event History and click Create Athena Table. Select the trail bucket we set up earlier.
Next, in Athena go to Settings, click Edit and set the query results location. I selected the same bucket as the trail but with a new prefix:
s3://pabiseu-test-logging-trail/athena
Now we can run a test query inside Athena. In the query editor run the following to scan through the logs.
SELECT * FROM "default"."cloudtrail_logs_pabiseu_test_logging_trail" WHERE errorcode = 'AccessDenied' LIMIT 10;
If you tried to access any files without permissions (for example using a public HTTP link), it should be logged here. (In the example below there's one more line related to the trail bucket. CloudTrail will log its own events as well if you choose to log all events.)
Enabling server access logs
Go to S3, open your test bucket and find Server access logging under the Properties tab. Select another bucket that will be dedicated to these logs. In my case it's pabiseu-test-logging-serverlogs. Again, try to access some files with authentication, via a presigned URL and without access, then upload, delete, etc. This time the logs will not appear almost immediately, so give it some time.
We will create another Athena table to query these logs. Following the official AWS guide, the script for creating this table looks like the following. Adapt the first and last lines to use your own table name and bucket name.
CREATE EXTERNAL TABLE `default.server_access_logs`(
`bucketowner` STRING,
`bucket_name` STRING,
`requestdatetime` STRING,
`remoteip` STRING,
`requester` STRING,
`requestid` STRING,
`operation` STRING,
`key` STRING,
`request_uri` STRING,
`httpstatus` STRING,
`errorcode` STRING,
`bytessent` BIGINT,
`objectsize` BIGINT,
`totaltime` STRING,
`turnaroundtime` STRING,
`referrer` STRING,
`useragent` STRING,
`versionid` STRING,
`hostid` STRING,
`sigv` STRING,
`ciphersuite` STRING,
`authtype` STRING,
`endpoint` STRING,
`tlsversion` STRING,
`accesspointarn` STRING,
`aclrequired` STRING)
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
'input.regex'='([^ ]*) ([^ ]*) \\[(.*?)\\] ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) (\"[^\"]*\"|-) (-|[0-9]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) (\"[^\"]*\"|-) ([^ ]*)(?: ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*))?.*$')
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
's3://pabiseu-test-logging-serverlogs/'
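To see what the `input.regex` above actually splits out, here is a minimal Python sketch that applies the same pattern (with the Java string escaping removed) to a single log line. The sample line is made up for illustration and only mimics the documented field order; the owner, request ID and host ID values are placeholders, not real log data.

```python
import re

# Column names in the same order as the CREATE EXTERNAL TABLE statement.
COLUMNS = [
    "bucketowner", "bucket_name", "requestdatetime", "remoteip", "requester",
    "requestid", "operation", "key", "request_uri", "httpstatus", "errorcode",
    "bytessent", "objectsize", "totaltime", "turnaroundtime", "referrer",
    "useragent", "versionid", "hostid", "sigv", "ciphersuite", "authtype",
    "endpoint", "tlsversion", "accesspointarn", "aclrequired",
]

# Same pattern as input.regex, written as Python raw strings.
LOG_LINE_RE = re.compile(
    r'([^ ]*) ([^ ]*) \[(.*?)\] ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) '
    r'("[^"]*"|-) (-|[0-9]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) '
    r'("[^"]*"|-) ([^ ]*)'
    r'(?: ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*) ([^ ]*))?.*$'
)


def parse_log_line(line):
    """Split one server access log line into a dict keyed by column name."""
    match = LOG_LINE_RE.match(line)
    if match is None:
        raise ValueError("line does not look like a server access log record")
    return dict(zip(COLUMNS, match.groups()))


# Hypothetical anonymous GET request; every value here is a placeholder.
SAMPLE_LINE = (
    'ownerid pabiseu-test-logging [21/Oct/2023:10:00:00 +0000] 192.0.2.10 - '
    'REQID123 REST.GET.OBJECT public/index.html '
    '"GET /public/index.html HTTP/1.1" 200 - 512 512 20 19 "-" "curl/8.0" - '
    'HOSTID456 - TLS_AES_128_GCM_SHA256 - '
    'pabiseu-test-logging.s3.amazonaws.com TLSv1.3 - -'
)

record = parse_log_line(SAMPLE_LINE)
print(record["operation"], record["key"], record["httpstatus"], record["requester"])
```

A requester of `-` is how an unauthenticated request shows up in the raw log line, which is useful to know before filtering on that column in Athena.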
After you get some logs, try querying them with Athena:
SELECT "httpstatus", "errorcode", "key", "bucket_name", "operation", "requester", "useragent"
FROM "default"."server_access_logs"
WHERE operation LIKE 'REST.%.OBJECT'
limit 20;
The results should look similar to this. You can see that some of the requests are authenticated (the requester is known) while others are anonymous.
Testing with S3 static website
Let's enable public access on the bucket and add a policy that allows getting objects from the public/ prefix.
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "AllowPublicRead",
"Effect": "Allow",
"Principal": "*",
"Action": [
"s3:GetObject"
],
"Resource": [
"arn:aws:s3:::pabiseu-test-logging/public/*"
]
}
]
}
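To predict which test requests this policy should allow, the Resource pattern can be checked against a few object keys. This is a rough sketch only: it uses Python's `fnmatchcase` as a stand-in for IAM wildcard matching, which is an approximation (for instance, `fnmatch` gives `[` a special meaning that IAM does not), and the function name is my own.

```python
from fnmatch import fnmatchcase

# Resource pattern copied from the bucket policy above.
PUBLIC_RESOURCE = "arn:aws:s3:::pabiseu-test-logging/public/*"


def is_covered_by_policy(bucket, key, resource_pattern=PUBLIC_RESOURCE):
    """Return True if the object's ARN matches the policy Resource pattern."""
    object_arn = f"arn:aws:s3:::{bucket}/{key}"
    # fnmatchcase is case-sensitive and lets '*' span '/' characters.
    return fnmatchcase(object_arn, resource_pattern)


for key in ("public/index.html", "index.html", "private/public/file.txt"):
    print(key, is_covered_by_policy("pabiseu-test-logging", key))
```

Only keys that start with public/ match, so anonymous requests for anything else in the bucket should be denied.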
Go to the bucket properties and enable static website hosting. Try to access files from the root and from the public/ prefix. Also try accessing nonexistent files in both public/ and the root of the bucket. Take note of the time you made these requests, as it might be useful for filtering the logs.
Next, go to Athena and query both the CloudTrail table and the server logs table in two separate tabs.
SELECT "requestdatetime", "httpstatus", "operation", "requester", "key", "request_uri" FROM "default"."server_access_logs" order by "requestdatetime" DESC LIMIT 10;
SELECT "eventtime", "requestparameters", "useridentity" FROM "default"."cloudtrail_logs_pabiseu_test_logging_trail" order by eventtime DESC LIMIT 10;
For requests made through the website endpoint, CloudTrail records the endpoint in requestparameters, under the Host key of the JSON. The user identity in this case is anonymous and the ARN is null. In server access logs the requester is empty.
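When post-processing exported query results, the Host value can be pulled out of the requestparameters string with a few lines of Python. This is a sketch under assumptions: the sample JSON below is invented (including the `<region>` placeholder and the other keys), and the check relies on website endpoint hostnames containing `s3-website`.

```python
import json


def request_host(requestparameters):
    """Return the Host value from a CloudTrail requestparameters JSON string."""
    if not requestparameters:
        return None
    return json.loads(requestparameters).get("Host")


def came_via_website_endpoint(requestparameters):
    """Guess whether the request went through an S3 static website endpoint."""
    host = request_host(requestparameters)
    return host is not None and "s3-website" in host


# Hypothetical values, shaped like the column described above.
WEBSITE_SAMPLE = json.dumps({
    "bucketName": "pabiseu-test-logging",
    "Host": "pabiseu-test-logging.s3-website.<region>.amazonaws.com",
    "key": "public/index.html",
})
REST_SAMPLE = json.dumps({
    "bucketName": "pabiseu-test-logging",
    "Host": "pabiseu-test-logging.s3.amazonaws.com",
    "key": "public/index.html",
})

print(came_via_website_endpoint(WEBSITE_SAMPLE))
print(came_via_website_endpoint(REST_SAMPLE))
```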
Testing with CloudFront
Let's create a CloudFront distribution. It will be public at first; we will work on authorization in the next stage.
Select your bucket as the origin, select Public access and don't enable Origin Shield. Allow the GET, HEAD and OPTIONS methods. Do not enable WAF. Select the price class that covers only North America and Europe. Wait for the distribution to be created. Copy the domain name and paste it into your browser. Try accessing a file in the root and files under the public/ prefix (the URL will look like http://abcdef123.cloudfront.net/public/<file>). Try accessing nonexistent files as well.
This is what the requests look like in CloudTrail and server access logs for CloudFront with Public access. The user agent reported in CloudTrail is [Amazon CloudFront]. There is no trace of any authentication and the principal is null.
Next we will change the settings a bit and enable Origin Access Control. Go to your CloudFront distribution, edit the origin and change the setting to Origin Access Control. Add a new control setting with the default settings. Below you should see a blue box with information about the bucket policy. Copy the policy and use it to replace the existing policy on the bucket; the link under the box is there for convenience. Just change the Resource value to include the /public/* prefix, so that we can still get some 403s.
Test the same paths as before on CloudFront. Now that the policy has been replaced, we can no longer access the S3 website endpoint, because access is restricted to CloudFront, which sends signed requests to S3.
Now the requests should look different. For example, in the CloudTrail logs the additionaleventdata column clearly shows that the requests are signed, even though the principal is still null. The user agent is now the canonical name of the service.
In the server access logs, the signature is also visible, in the sigv column. For OAC requests the user agent is not passed; instead, the requester is svc:cloudfront.amazonaws.com. With public access, CloudFront acts just like a browser.
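The differences described above are enough to sort server access log rows into rough groups. Below is a small sketch, assuming the requester and sigv values are read as strings from the Athena table and that an empty field shows up as `-` or an empty string; the group labels and the `SigV4` and ARN sample values are my own illustrations.

```python
# Requester value observed for CloudFront with Origin Access Control.
CLOUDFRONT_OAC_REQUESTER = "svc:cloudfront.amazonaws.com"

# Ways an empty log field may appear after parsing.
EMPTY_VALUES = (None, "", "-")


def classify_requester(requester):
    """Label a server access log row by who made the request."""
    if requester == CLOUDFRONT_OAC_REQUESTER:
        return "cloudfront-oac"
    if requester in EMPTY_VALUES:
        # Browsers, the website endpoint and public CloudFront all land here.
        return "anonymous"
    return "authenticated"


def is_signed(sigv):
    """Return True when the sigv column carries a signature version."""
    return sigv not in EMPTY_VALUES


print(classify_requester("svc:cloudfront.amazonaws.com"), is_signed("SigV4"))
print(classify_requester("-"), is_signed("-"))
```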
Conclusion
It seems that both logging methods record all of the requests tested here. Server access logs are more convenient for viewing actions that happened in the bucket. CloudTrail, on the other hand, logs everything and requires more filtering and more extraction from the reported JSON and JSON-like structures, but it provides much more granular context on each event.