Module 9 of 13 · AWS Fundamentals · Beginner

CloudWatch & Monitoring

Duration: 50 min

CloudWatch is AWS's monitoring and observability service. It collects metrics from AWS resources, stores logs, and triggers alarms. This module covers metrics, alarms, logs, dashboards, and SNS notifications.

CloudWatch Metrics

Metrics are data points about your resources. AWS services automatically publish metrics (CPU usage, network traffic, request count, etc.). You can also publish custom metrics.

Metrics are organized by namespace (e.g., AWS/EC2, AWS/RDS). Each metric has dimensions (e.g., InstanceId, DBInstanceIdentifier).
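To make the namespace and dimension idea concrete, here is a minimal sketch that builds one entry for the MetricData list accepted by put_metric_data. The helper name build_metric_datum and the dimension names Service and Environment are hypothetical, chosen only for illustration. Each unique combination of dimension values is stored as a separate metric.

```python
from datetime import datetime, timezone


def build_metric_datum(name, value, unit="Count", dimensions=None):
    """Build one entry for the MetricData list of put_metric_data."""
    datum = {
        "MetricName": name,
        "Value": value,
        "Unit": unit,
        "Timestamp": datetime.now(timezone.utc),
    }
    if dimensions:
        # Dimensions are sent as a list of {"Name": ..., "Value": ...} pairs.
        datum["Dimensions"] = [
            {"Name": key, "Value": val} for key, val in sorted(dimensions.items())
        ]
    return datum


# Hypothetical dimensions: one metric per (Service, Environment) combination.
datum = build_metric_datum(
    "RequestCount", 100, dimensions={"Service": "checkout", "Environment": "prod"}
)
print(datum["Dimensions"])
```

With a boto3 client, the result could be published with cloudwatch.put_metric_data(Namespace='MyApp', MetricData=[datum]).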

CloudWatch Alarms

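An alarm watches a single metric (or a metric math expression) over a set number of evaluation periods and triggers actions when the value crosses a threshold. Actions include publishing to an SNS topic, invoking a Lambda function, running an EC2 action (stop, terminate, reboot, or recover), and triggering an Auto Scaling policy.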

Alarms have three states: OK (the metric is within the threshold), ALARM (the threshold has been breached), and INSUFFICIENT_DATA (the alarm has just started, or there is not enough data to determine the state).
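The three states can be illustrated with a simplified model of a GreaterThanThreshold alarm. This is a sketch for intuition only, not CloudWatch's actual evaluation logic: it assumes every one of the most recent evaluation periods must breach the threshold, and it ignores real features such as "datapoints to alarm" and missing-data treatment. The function name evaluate_alarm is hypothetical.

```python
def evaluate_alarm(datapoints, threshold, evaluation_periods=1):
    """Simplified model of a GreaterThanThreshold alarm.

    datapoints: one aggregated value per period, oldest first.
    evaluation_periods: how many of the most recent periods to check (>= 1).
    """
    recent = datapoints[-evaluation_periods:]
    if len(recent) < evaluation_periods:
        # Not enough periods have reported data yet.
        return "INSUFFICIENT_DATA"
    if all(value > threshold for value in recent):
        return "ALARM"
    return "OK"


print(evaluate_alarm([], 80))                                 # INSUFFICIENT_DATA
print(evaluate_alarm([70, 85], 80))                           # ALARM
print(evaluate_alarm([85, 70], 80, evaluation_periods=2))     # OK
```

Note that a value exactly equal to the threshold does not breach a GreaterThanThreshold comparison.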

CloudWatch Logs

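Logs capture application and system output. A log group collects the logs of one application or service and holds settings such as retention. A log stream is a sequence of log events from the same source, such as a single instance or a single Lambda execution environment.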

You can filter logs, create metric filters to extract metrics from logs, and set retention policies.
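As a sketch of the last two features, the function below sets a retention policy and creates a metric filter that counts log lines containing ERROR. It uses the boto3 logs client calls put_retention_policy and put_metric_filter. The function name configure_log_group, the filter name error-count, and the metric name ErrorCount are assumptions made for this example.

```python
def configure_log_group(logs_client, log_group, retention_days=30):
    """Set retention and count ERROR lines as a custom metric."""
    # retentionInDays must be one of the values CloudWatch Logs accepts
    # (for example 1, 7, 14, 30, 90).
    logs_client.put_retention_policy(
        logGroupName=log_group,
        retentionInDays=retention_days,
    )
    # Each log event matching the pattern adds 1 to MyApp/ErrorCount.
    logs_client.put_metric_filter(
        logGroupName=log_group,
        filterName="error-count",          # hypothetical filter name
        filterPattern="ERROR",
        metricTransformations=[
            {
                "metricName": "ErrorCount",  # hypothetical metric name
                "metricNamespace": "MyApp",
                "metricValue": "1",
            }
        ],
    )
```

It could be called as configure_log_group(boto3.client('logs'), '/aws/lambda/my-function'). The resulting ErrorCount metric can then be used in an alarm like any other metric.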

CloudWatch Dashboards

Dashboards visualize metrics in near real time. A dashboard is a grid of widgets, and each widget can chart one or more metrics, so a single dashboard can combine metrics from several services.
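A dashboard is defined by a JSON body that lists its widgets. The sketch below builds such a body for two side-by-side metric widgets on the 24-column dashboard grid. The helper names metric_widget and build_dashboard_body are hypothetical, and the MyApp/RequestCount metric is assumed to be a custom metric you publish yourself.

```python
import json


def metric_widget(title, metrics, x, y, region="us-east-1", width=12, height=6):
    """Return one metric widget positioned on the dashboard grid."""
    return {
        "type": "metric",
        "x": x,
        "y": y,
        "width": width,
        "height": height,
        "properties": {
            "metrics": metrics,
            "period": 300,
            "stat": "Average",
            "region": region,
            "title": title,
        },
    }


def build_dashboard_body(widgets):
    """Serialize the widget list into the JSON string the API expects."""
    return json.dumps({"widgets": widgets})


body = build_dashboard_body([
    metric_widget("EC2 CPU", [["AWS/EC2", "CPUUtilization"]], x=0, y=0),
    metric_widget("App requests", [["MyApp", "RequestCount"]], x=12, y=0),
])
print(body)
```

The string can be passed to cloudwatch.put_dashboard(DashboardName='my-dashboard', DashboardBody=body).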

Hands-On: Create Alarm and Dashboard

Create an SNS topic for notifications:

aws sns create-topic --name my-alerts
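
A topic with no subscriptions delivers nothing, so an alarm that publishes to it would fire silently. The sketch below subscribes an email address using the boto3 SNS client's subscribe call. The function name subscribe_email is hypothetical, and the recipient must confirm the subscription from the email SNS sends before any notifications arrive.

```python
def subscribe_email(sns_client, topic_arn, address):
    """Subscribe an email address to an SNS topic and return the subscription ARN."""
    response = sns_client.subscribe(
        TopicArn=topic_arn,
        Protocol="email",
        Endpoint=address,
        # Return the ARN even while the subscription is pending confirmation.
        ReturnSubscriptionArn=True,
    )
    return response["SubscriptionArn"]
```

It could be called as subscribe_email(boto3.client('sns'), 'arn:aws:sns:us-east-1:ACCOUNT:my-alerts', 'you@example.com'), where the address is a placeholder.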

Create an alarm for EC2 CPU:

# --evaluation-periods is required: the number of consecutive periods to evaluate
aws cloudwatch put-metric-alarm --alarm-name high-cpu \
  --alarm-description "Alert when CPU is high" \
  --metric-name CPUUtilization --namespace AWS/EC2 \
  --statistic Average --period 300 --evaluation-periods 1 \
  --threshold 80 \
  --comparison-operator GreaterThanThreshold \
  --alarm-actions arn:aws:sns:us-east-1:ACCOUNT:my-alerts

Check the alarm's state and configuration:

aws cloudwatch describe-alarms --alarm-names high-cpu

Create a custom metric:

aws cloudwatch put-metric-data --namespace MyApp \
  --metric-name RequestCount --value 100

Create a dashboard:

aws cloudwatch put-dashboard --dashboard-name my-dashboard \
  --dashboard-body '{
    "widgets": [
      {
        "type": "metric",
        "properties": {
          "metrics": [["AWS/EC2", "CPUUtilization"]],
          "period": 300,
          "stat": "Average",
          "region": "us-east-1",
          "title": "EC2 CPU"
        }
      }
    ]
  }'

Python Boto3 Example

import boto3
from datetime import datetime, timedelta, timezone

cloudwatch = boto3.client('cloudwatch')

# Publish one data point for a custom metric
cloudwatch.put_metric_data(
    Namespace='MyApp',
    MetricData=[
        {
            'MetricName': 'RequestCount',
            'Value': 100,
            'Unit': 'Count',
            # datetime.utcnow() is deprecated since Python 3.12; use an aware UTC time
            'Timestamp': datetime.now(timezone.utc)
        }
    ]
)

# Create alarm (EvaluationPeriods is a required parameter)
cloudwatch.put_metric_alarm(
    AlarmName='high-cpu',
    MetricName='CPUUtilization',
    Namespace='AWS/EC2',
    Statistic='Average',
    Period=300,
    EvaluationPeriods=1,
    Threshold=80,
    ComparisonOperator='GreaterThanThreshold',
    AlarmActions=['arn:aws:sns:us-east-1:ACCOUNT:my-alerts']
)

# Get metric statistics for the last hour
now = datetime.now(timezone.utc)
response = cloudwatch.get_metric_statistics(
    Namespace='AWS/EC2',
    MetricName='CPUUtilization',
    StartTime=now - timedelta(hours=1),
    EndTime=now,
    Period=300,
    Statistics=['Average']
)

# Datapoints are not returned in chronological order, so sort them first
for datapoint in sorted(response['Datapoints'], key=lambda d: d['Timestamp']):
    print(f"Time: {datapoint['Timestamp']}, CPU: {datapoint['Average']}%")

CloudWatch Logs Example

import time

import boto3

logs = boto3.client('logs')

# Create log group
logs.create_log_group(logGroupName='/aws/lambda/my-function')

# Create log stream
logs.create_log_stream(
    logGroupName='/aws/lambda/my-function',
    logStreamName='2024-01-01'
)

# Put log events
logs.put_log_events(
    logGroupName='/aws/lambda/my-function',
    logStreamName='2024-01-01',
    logEvents=[
        {
            'message': 'Function started',
            # Milliseconds since the Unix epoch; time.time() does not depend on the local timezone
            'timestamp': int(time.time() * 1000)
        }
    ]
)

# Query logs
response = logs.filter_log_events(
    logGroupName='/aws/lambda/my-function',
    filterPattern='ERROR'
)

for event in response['events']:
    print(event['message'])

Terraform Example

resource "aws_cloudwatch_metric_alarm" "high_cpu" {
  alarm_name          = "high-cpu"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 1
  metric_name         = "CPUUtilization"
  namespace           = "AWS/EC2"
  period              = 300
  statistic           = "Average"
  threshold           = 80
  alarm_description   = "Alert when CPU is high"
  alarm_actions       = [aws_sns_topic.alerts.arn]
}

resource "aws_sns_topic" "alerts" {
  name = "my-alerts"
}

resource "aws_cloudwatch_dashboard" "main" {
  dashboard_name = "my-dashboard"

  dashboard_body = jsonencode({
    widgets = [
      {
        type = "metric"
        properties = {
          metrics = [["AWS/EC2", "CPUUtilization"]]
          period  = 300
          stat    = "Average"
          region  = "us-east-1"
          title   = "EC2 CPU"
        }
      }
    ]
  })
}

Quiz 1

❓ What is CloudWatch?

Quiz 2

❓ What are CloudWatch metrics?

Quiz 3

❓ What is a CloudWatch alarm?

Quiz 4

❓ What are the three states of a CloudWatch alarm?

Quiz 5

❓ What is a CloudWatch dashboard?
