CloudWatch & Monitoring
Duration: 50 min
CloudWatch is AWS's monitoring and observability service. It collects metrics from AWS resources, stores logs, and triggers alarms. This module covers metrics, alarms, logs, dashboards, and SNS notifications.
CloudWatch Metrics
Metrics are data points about your resources. AWS services automatically publish metrics (CPU usage, network traffic, request count, etc.). You can also publish custom metrics.
Metrics are organized by namespace (e.g., AWS/EC2, AWS/RDS). Each metric has dimensions (e.g., InstanceId, DBInstanceIdentifier).
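To make the namespace and dimension structure concrete, the sketch below builds a single metric datum in the shape that boto3's `put_metric_data` accepts. The helper name `build_metric_datum`, the `MyApp` namespace, and the instance ID are illustrative assumptions, not anything CloudWatch prescribes.

```python
from datetime import datetime, timezone


def build_metric_datum(name, value, unit="Count", dimensions=None):
    """Return one entry for the MetricData list of put_metric_data.

    `dimensions` is a plain dict such as {"InstanceId": "i-0abc..."};
    it is converted to the Name/Value pair list the API expects.
    """
    return {
        "MetricName": name,
        "Value": value,
        "Unit": unit,
        "Timestamp": datetime.now(timezone.utc),
        "Dimensions": [
            {"Name": key, "Value": val}
            for key, val in sorted((dimensions or {}).items())
        ],
    }


# Hypothetical example values; replace them with your own identifiers.
datum = build_metric_datum(
    "RequestCount", 100, dimensions={"InstanceId": "i-0123456789abcdef0"}
)
print(datum["Dimensions"])
```

The result could then be sent with `cloudwatch.put_metric_data(Namespace='MyApp', MetricData=[datum])`. Each distinct combination of dimension values is stored as a separate metric, so it helps to keep dimension names consistent.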
CloudWatch Alarms
Alarms monitor metrics and trigger actions when thresholds are breached. An alarm can send SNS notifications, trigger Lambda functions, or auto-scale resources.
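Alarms have three states: OK (the metric is within the threshold), ALARM (the threshold has been breached), and INSUFFICIENT_DATA (the alarm has just started, the metric is unavailable, or there is not enough data to determine the state).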
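The three states can be illustrated with a deliberately simplified model. The function below is a sketch, not CloudWatch's actual evaluation logic: it assumes a `GreaterThanThreshold` comparison and ignores features such as "M out of N" datapoints and configurable missing-data treatment. The name `evaluate_alarm` is hypothetical.

```python
def evaluate_alarm(datapoints, threshold, evaluation_periods=1):
    """Simplified model of a GreaterThanThreshold alarm.

    `datapoints` holds one aggregated value per period, oldest first.
    """
    recent = datapoints[-evaluation_periods:]
    if len(recent) < evaluation_periods:
        # Not enough periods have been observed yet.
        return "INSUFFICIENT_DATA"
    if all(value > threshold for value in recent):
        # Every evaluated period breached the threshold.
        return "ALARM"
    return "OK"


print(evaluate_alarm([], threshold=80))                                  # INSUFFICIENT_DATA
print(evaluate_alarm([45.0, 62.5], threshold=80, evaluation_periods=2))  # OK
print(evaluate_alarm([85.0, 91.2], threshold=80, evaluation_periods=2))  # ALARM
```

Requiring several consecutive breaching periods, as in the last call, is one way to keep a short spike from triggering a notification.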
CloudWatch Logs
Logs capture application and system output. Log groups organize logs by application or service. Log streams are sequences of log events.
You can filter logs, create metric filters to extract metrics from logs, and set retention policies.
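Metric filters and retention policies can be set through the CloudWatch Logs API. The sketch below wraps boto3's `put_metric_filter` and `put_retention_policy` calls in a helper; the function name, the filter name `error-count`, the metric name `ErrorCount`, and the 30-day retention are assumptions chosen for illustration. A small recording stub stands in for the real client so the example can be tried without AWS credentials.

```python
def configure_error_metric(logs_client, log_group, retention_days=30):
    """Count ERROR lines as a metric and cap how long events are kept."""
    logs_client.put_metric_filter(
        logGroupName=log_group,
        filterName="error-count",            # hypothetical filter name
        filterPattern="ERROR",
        metricTransformations=[
            {
                "metricName": "ErrorCount",  # hypothetical metric name
                "metricNamespace": "MyApp",
                "metricValue": "1",          # add 1 per matching log event
            }
        ],
    )
    logs_client.put_retention_policy(
        logGroupName=log_group, retentionInDays=retention_days
    )


class RecordingClient:
    """Stand-in for boto3.client('logs') that records calls for a dry run."""

    def __init__(self):
        self.calls = []

    def __getattr__(self, name):
        def record(**kwargs):
            self.calls.append((name, kwargs))
        return record


dry_run = RecordingClient()
configure_error_metric(dry_run, "/aws/lambda/my-function")
print([name for name, _ in dry_run.calls])
```

To apply the configuration for real, pass `boto3.client('logs')` instead of the stub. The resulting `ErrorCount` metric can then be used in alarms like any other metric.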
CloudWatch Dashboards
Dashboards visualize metrics in near real time. You can create custom dashboards with multiple widgets, each showing different metrics.
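A dashboard is defined by a JSON body that lists its widgets. As a rough sketch, the helper below assembles a two-widget body in Python; the function name `metric_widget`, the widget sizes, and the `MyApp`/`RequestCount` metric are illustrative assumptions.

```python
import json


def metric_widget(title, namespace, metric, x, y, region="us-east-1"):
    """Return one metric widget placed at grid position (x, y)."""
    return {
        "type": "metric",
        "x": x,
        "y": y,
        "width": 12,
        "height": 6,
        "properties": {
            "metrics": [[namespace, metric]],
            "period": 300,
            "stat": "Average",
            "region": region,
            "title": title,
        },
    }


# Two widgets side by side on the dashboard grid.
dashboard_body = json.dumps(
    {
        "widgets": [
            metric_widget("EC2 CPU", "AWS/EC2", "CPUUtilization", x=0, y=0),
            metric_widget("Requests", "MyApp", "RequestCount", x=12, y=0),
        ]
    }
)
print(dashboard_body)
```

The resulting string could be passed as `DashboardBody` to boto3's `put_dashboard`, or to the `--dashboard-body` option of the AWS CLI.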
Hands-On: Create Alarm and Dashboard
Create an SNS topic for notifications:
aws sns create-topic --name my-alerts
Create an alarm for EC2 CPU:
aws cloudwatch put-metric-alarm --alarm-name high-cpu \
  --alarm-description "Alert when CPU is high" \
  --metric-name CPUUtilization --namespace AWS/EC2 \
  --statistic Average --period 300 --threshold 80 \
  --comparison-operator GreaterThanThreshold \
  --evaluation-periods 1 \
  --alarm-actions arn:aws:sns:us-east-1:ACCOUNT:my-alerts
List alarms:
aws cloudwatch describe-alarms --alarm-names high-cpu
Create a custom metric:
aws cloudwatch put-metric-data --namespace MyApp \
  --metric-name RequestCount --value 100
Create a dashboard:
aws cloudwatch put-dashboard --dashboard-name my-dashboard \
  --dashboard-body '{
    "widgets": [
      {
        "type": "metric",
        "properties": {
          "metrics": [["AWS/EC2", "CPUUtilization"]],
          "period": 300,
          "stat": "Average",
          "region": "us-east-1",
          "title": "EC2 CPU"
        }
      }
    ]
  }'
Python Boto3 Example
import boto3
from datetime import datetime, timedelta, timezone

cloudwatch = boto3.client('cloudwatch')

# Put custom metric
cloudwatch.put_metric_data(
    Namespace='MyApp',
    MetricData=[
        {
            'MetricName': 'RequestCount',
            'Value': 100,
            'Unit': 'Count',
            'Timestamp': datetime.now(timezone.utc)
        }
    ]
)

# Create alarm (EvaluationPeriods is a required parameter)
cloudwatch.put_metric_alarm(
    AlarmName='high-cpu',
    MetricName='CPUUtilization',
    Namespace='AWS/EC2',
    Statistic='Average',
    Period=300,
    EvaluationPeriods=1,
    Threshold=80,
    ComparisonOperator='GreaterThanThreshold',
    AlarmActions=['arn:aws:sns:us-east-1:ACCOUNT:my-alerts']
)

# Get metric statistics for the last hour
now = datetime.now(timezone.utc)
response = cloudwatch.get_metric_statistics(
    Namespace='AWS/EC2',
    MetricName='CPUUtilization',
    StartTime=now - timedelta(hours=1),
    EndTime=now,
    Period=300,
    Statistics=['Average']
)

# Datapoints are not returned in chronological order, so sort them first
for datapoint in sorted(response['Datapoints'], key=lambda d: d['Timestamp']):
    print(f"Time: {datapoint['Timestamp']}, CPU: {datapoint['Average']}%")
CloudWatch Logs Example
import time

import boto3

logs = boto3.client('logs')

# Create log group (raises ResourceAlreadyExistsException if it already exists)
logs.create_log_group(logGroupName='/aws/lambda/my-function')

# Create log stream
logs.create_log_stream(
    logGroupName='/aws/lambda/my-function',
    logStreamName='2024-01-01'
)

# Put log events; timestamps are milliseconds since the Unix epoch
logs.put_log_events(
    logGroupName='/aws/lambda/my-function',
    logStreamName='2024-01-01',
    logEvents=[
        {
            'message': 'Function started',
            'timestamp': int(time.time() * 1000)
        }
    ]
)

# Query logs (only the first page of matching events is read here)
response = logs.filter_log_events(
    logGroupName='/aws/lambda/my-function',
    filterPattern='ERROR'
)
for event in response['events']:
    print(event['message'])
Terraform Example
resource "aws_cloudwatch_metric_alarm" "high_cpu" {
  alarm_name          = "high-cpu"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 1
  metric_name         = "CPUUtilization"
  namespace           = "AWS/EC2"
  period              = 300
  statistic           = "Average"
  threshold           = 80
  alarm_description   = "Alert when CPU is high"
  alarm_actions       = [aws_sns_topic.alerts.arn]
}

resource "aws_sns_topic" "alerts" {
  name = "my-alerts"
}

resource "aws_cloudwatch_dashboard" "main" {
  dashboard_name = "my-dashboard"
  dashboard_body = jsonencode({
    widgets = [
      {
        type = "metric"
        properties = {
          metrics = [["AWS/EC2", "CPUUtilization"]]
          period  = 300
          stat    = "Average"
          region  = "us-east-1"
          title   = "EC2 CPU"
        }
      }
    ]
  })
}
Quiz 1
❓ What is CloudWatch?
Quiz 2
❓ What are CloudWatch metrics?
Quiz 3
❓ What is a CloudWatch alarm?
Quiz 4
❓ What are the three states of a CloudWatch alarm?
Quiz 5
❓ What is a CloudWatch dashboard?