Monitor recency of your systems with CloudWatch
13 March 2023
In the previous post we created a Datadog check and monitor to keep track of how many packages can be upgraded on our systems, when each system was last rebooted, and which major release of the operating system it runs. Today, we will replicate the same metrics using the AWS CloudWatch service. Let's start by creating test instances.
Repository for this post is available here: CloudWatch-System-State.
Test instances
To create the test instances we will use Terraform. You can also create the instances manually in the Console - just spin up one Ubuntu and one RedHat-based instance with a public IP and default settings. The code in this post is not complete, as including everything would take up too much space. You can find the complete code for the individual components here:
Copy the above files and adapt them for yourself: set which public key to upload to
AWS in keys.tf, and which IPs to allow SSH access from in a newly created
terraform.tfvars, e.g. ssh-subnet = "99.88.22.0/24". If you are creating the
instances manually, keep the "Allow SSH" checkbox checked when launching the new
instance - apart from a slightly larger disk and the OS choice, everything should
be left at defaults.
The Terraform boilerplate code should look like this:
$ ls -l
-rw-r--r-- 1 ppabis staff 389 Mar 4 18:02 amis.tf
-rw-r--r-- 1 ppabis staff 106 Mar 4 18:02 keys.tf
-rw-r--r-- 1 ppabis staff 136 Mar 4 18:02 main.tf
-rw-r--r-- 1 ppabis staff 283 Mar 4 18:02 sg.tf
-rw-r--r-- 1 ppabis staff 29 Mar 4 17:58 terraform.tfvars
-rw-r--r-- 1 ppabis staff 125 Mar 4 18:02 variables.tf
Let's create a new file instances.tf
which will define two instances, one
Ubuntu and one AlmaLinux. We will also output their public IPs.
resource "aws_instance" "ubuntu" {
ami = data.aws_ami.ubuntu-arm.id
instance_type = "t4g.micro"
key_name = aws_key_pair.ssh.key_name
tags = { Name = "Ubuntu" }
vpc_security_group_ids = [ aws_security_group.ssh.id ]
}
resource "aws_instance" "alma" {
ami = data.aws_ami.alma-arm.id
instance_type = "t4g.micro"
key_name = aws_key_pair.ssh.key_name
tags = { Name = "Alma" }
vpc_security_group_ids = [ aws_security_group.ssh.id ]
}
output "ubuntu" { value = aws_instance.ubuntu.public_ip }
output "alma" { value = aws_instance.alma.public_ip }
Set up your AWS credentials, then run terraform init and terraform apply.
Prepare for CloudWatch
Let's SSH into one of the instances and install the AWS CLI following the official
guide (use awscli-exe-linux-x86_64.zip if you have a t2, t3 or t3a instance).
$ ssh ubuntu@3.4.5.6 # replace with your instance's IP and -i to select private key
ubuntu@ip-10-0-0-1:~$ curl "https://awscli.amazonaws.com/awscli-exe-linux-aarch64.zip" -o "awscliv2.zip"
# the above link depends on your architecture, it can be also awscli-exe-linux-x86_64.zip
ubuntu@ip-10-0-0-1:~$ unzip awscliv2.zip
ubuntu@ip-10-0-0-1:~$ sudo ./aws/install
Try running the aws command to check that the help message appears.
ubuntu@ip-10-0-0-1:~$ aws
usage: aws [options] <command> <subcommand> [<subcommand> ...] [parameters]
To see help text, you can run:
aws help
aws <command> help
aws <command> <subcommand> help
Great, it seems that our CLI tools are working. Let's try issuing a metric to CloudWatch.
ubuntu@ip-10-0-0-1:~$ aws cloudwatch put-metric-data --metric-name "Test" --namespace "Test" --value 1
Unable to locate credentials. You can configure credentials by running "aws configure".
It seems that we have no credentials associated with our instance. Normally we
use access keys to authenticate with AWS. However, for EC2 instances it's better
to use roles instead - they don't require any credentials to be stored on the
instance. Let's create a role for our instance in a new file role.tf.
resource "aws_iam_role" "ec2-cloudwatch" {
name = "ec2-cloudwatch"
assume_role_policy = <<EOF
{
"Version": "2012-10-17",
"Statement": [ {
"Action": "sts:AssumeRole",
"Principal": { "Service": "ec2.amazonaws.com" },
"Effect": "Allow",
"Sid": ""
} ]
}
EOF
}
resource "aws_iam_instance_profile" "ec2-cloudwatch" {
name = "ec2-cloudwatch"
role = aws_iam_role.ec2-cloudwatch.name
}
If you are using the AWS console, go to IAM, create a new role, then create a new policy and finally attach the policy to the role.
Here we create a role and an instance profile, which lets us attach the role to
an instance (roles also have uses beyond EC2 instances). Let's attach our new
role to the two instances. We can specify it directly in the instance's resource
block as iam_instance_profile, but then disassociation is not possible as of
Terraform 1.3.9 and AWS provider 4.57.0 - you can only change it to another
instance profile. Let's modify our instances and add the profile.
...
key_name = aws_key_pair.ssh.key_name
iam_instance_profile = aws_iam_instance_profile.ec2-cloudwatch.name
tags = { Name = "Ubuntu" }
...
key_name = aws_key_pair.ssh.key_name
iam_instance_profile = aws_iam_instance_profile.ec2-cloudwatch.name
tags = { Name = "Alma" }
If you are using the AWS console, go to EC2, right-click on your instance, open "Security" and choose "Modify IAM role".
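Once the profile is attached, an optional sanity check is to ask STS which identity the instance's credentials resolve to. This is a minimal sketch to run on the instance itself; the region is an assumption matching the rest of this post:

```python
import boto3

# On an instance with the profile attached, the returned ARN should contain
# "assumed-role/ec2-cloudwatch/..." instead of an IAM user ARN.
sts = boto3.client("sts", region_name="eu-central-1")
identity = sts.get_caller_identity()
print(identity["Arn"])
```

If this still reports no credentials, give the instance metadata service a minute to pick up the newly attached profile.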
After applying, SSH to one of the instances and try issuing a metric again.
ubuntu@ip-10-0-0-1:~$ aws cloudwatch put-metric-data --metric-name "Test" --namespace "Test" --value 1
An error occurred (AccessDenied) when calling the PutMetricData operation: User: arn:aws:sts::0123456789:assumed-role/ec2-cloudwatch/i-0abcde123456 is not authorized to perform: cloudwatch:PutMetricData because no identity-based policy allows the cloudwatch:PutMetricData action
That result looks more promising. We will create a new policy and attach it to
the role. Add these two blocks in role.tf
and apply.
resource "aws_iam_policy" "cloudwatch" {
name = "CloudWatch-PutMetric"
policy = <<EOF
{
"Version": "2012-10-17",
"Statement": [ {
"Sid": "Stmt1",
"Effect": "Allow",
"Action": [ "cloudwatch:PutMetricData" ],
"Resource": "*"
} ]
}
EOF
}
resource "aws_iam_role_policy_attachment" "cloudwatch" {
role = aws_iam_role.ec2-cloudwatch.name
policy_arn = aws_iam_policy.cloudwatch.arn
}
After 5 to 10 seconds the changes should propagate. Now when you SSH into one of the instances and issue a value for the metric, there should be no error. Let's look at the metrics in the console to ensure they are there.
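If you prefer checking from code rather than clicking through the console, a small boto3 sketch can list what landed in the Test namespace (region assumed to be eu-central-1; freshly pushed metrics can take a couple of minutes to appear):

```python
import boto3

# List metric names and dimensions that CloudWatch has recorded
# in the "Test" namespace we pushed to earlier.
cw = boto3.client("cloudwatch", region_name="eu-central-1")
for metric in cw.list_metrics(Namespace="Test")["Metrics"]:
    print(metric["MetricName"], metric["Dimensions"])
```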
Reporting metrics from the instance
Now it's time to create a script that will periodically report metrics to
CloudWatch with the values we want. Just as in the Datadog case, we will use
Python. Create a new file metrics.py in some common place such as /opt. Copy
the functions from the previous script, available here.
We will need the get_upgradable_packages, get_days_since_last_reboot and
get_os_major_version functions. First, remove the indentation - these functions
will be global, outside of any class. Remove self from each of the functions.
Replace each get_subprocess_output call with subprocess.Popen:
# Old method: out, _, _ = get_subprocess_output( [ "lsb_release", "-rs" ], self.log )
proc = subprocess.Popen( [ "lsb_release", "-rs" ], stdout=subprocess.PIPE, stderr=subprocess.PIPE ) # New method
out, _ = proc.communicate() # New method
out = out.decode( "utf-8" ) # New method
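The exact bodies of these functions live in the previous post's script. As a standalone sketch of the transformed pattern, get_days_since_last_reboot could look like this; reading /proc/uptime is an assumption for illustration, and your copied function may obtain the boot time differently:

```python
import subprocess

def run_command(args):
    # The Popen pattern described above: capture stdout, decode to text.
    proc = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, _ = proc.communicate()
    return out.decode("utf-8")

def get_days_since_last_reboot():
    # /proc/uptime holds the seconds since boot as its first field.
    uptime_seconds = float(run_command(["cat", "/proc/uptime"]).split()[0])
    return int(uptime_seconds // 86400)
```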
Add the following imports at the top of the file and after the transformations, the file should look something like this:
import os, shutil, socket, subprocess, boto3
from datetime import datetime
def get_upgradable_packages():
...
def get_days_since_last_reboot():
...
def get_os_major_version():
...
Let's test what the file produces. At the bottom, add this test code that prints the value of every metric we need:
if __name__ == "__main__":
print( f"Upgradable packages: { get_upgradable_packages() }" )
print( f"Days since last reboot: { get_days_since_last_reboot() }" )
print( f"OS major version: { get_os_major_version() }" )
Now run the script on the remote machine.
ubuntu@ip-10-0-0-1:~$ sudo apt update
ubuntu@ip-10-0-0-1:~$ python3 metrics.py
Upgradable packages: 2
Days since last reboot: 0
OS major version: 20
Connecting the script to CloudWatch and scheduling it
So the next step is to install boto3
package which is the Python equivalent of
AWS CLI/API (why "boto" see here. Type
sudo pip3 install boto3
into terminal. You might need to install pip using
apt install python3-pip
or dnf install python38-pip
. Then add it to the
imports in the script. Now we need to create a function that will send data to
CloudWatch.
# Helper method to quickly make a MetricData object
def create_metric(name, value, unit, dimensions):
return {
"MetricName": name,
"Dimensions": dimensions,
"Unit": unit,
"Value": value
}
if __name__ == "__main__":
dimensions = [
{ "Name": "Hostname", "Value": socket.gethostname() },
{ "Name": "Distribution", "Value": os.environ["OS_DISTRIBUTION"] if "OS_DISTRIBUTION" in os.environ else "Unknown" }
]
packages = create_metric( "UpgradablePackages", get_upgradable_packages(), "Count", dimensions)
days_since_last_reboot = create_metric( "DaysSinceLastReboot", get_days_since_last_reboot(), "Count", dimensions)
os_major_version = create_metric( "OSMajorVersion", get_os_major_version(), "None", dimensions)
cw = boto3.client( "cloudwatch", region_name="eu-central-1" )
cw.put_metric_data(
Namespace = "SystemState",
MetricData = [
packages,
days_since_last_reboot,
os_major_version
]
)
Now we need to schedule the script to run periodically. We will use cron for
that. Type sudo crontab -e and add the following line to report the metrics
every hour. For RedHat-based systems such as RHEL 8, install the python38
packages and use /usr/bin/python3.8 instead.
0 * * * * OS_DISTRIBUTION=Ubuntu /usr/bin/python3 /opt/metrics.py
If you see the GNU nano or pico editor, save and exit with Ctrl+X. If vi is the
default editor, press i to insert the text, then press ESC, type :wq and hit
Return to save and exit.
After an hour or two we should see the metrics in CloudWatch. We can also change
the cron expression to */10 * * * * to report every 10 minutes and see the
metrics sooner.
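To read the values back without opening the console, a boto3 sketch like the one below can fetch recent datapoints. Note that CloudWatch identifies a metric by its full dimension set, so the query must include both Hostname and Distribution exactly as the script reports them; the hostname value here is a placeholder:

```python
import boto3
from datetime import datetime, timedelta

# Read back hourly averages of UpgradablePackages for the last 6 hours.
cw = boto3.client("cloudwatch", region_name="eu-central-1")
stats = cw.get_metric_statistics(
    Namespace="SystemState",
    MetricName="UpgradablePackages",
    Dimensions=[
        {"Name": "Hostname", "Value": "ip-10-0-0-1"},  # replace with your instance's hostname
        {"Name": "Distribution", "Value": "Ubuntu"},
    ],
    StartTime=datetime.utcnow() - timedelta(hours=6),
    EndTime=datetime.utcnow(),
    Period=3600,
    Statistics=["Average"],
)
for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Average"])
```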
As you can see, reporting metrics to CloudWatch is not much more complicated than doing it with Datadog.
"In the ideal, it should be as easy as writing one line of code to create a new metric." - The DevOps Handbook, Kim, Humble, et al.