CloudWatch Logs captures output from your ECS containers.
View Logs
Tail logs in real time:
aws logs tail /ecs/{infra_name}-prd --follow
Search recent logs:
aws logs filter-log-events \
--log-group-name /ecs/{infra_name}-prd \
--filter-pattern "ERROR" \
--start-time $(date -d '1 hour ago' +%s)000
Replace {infra_name} with your infra_name from settings.py (e.g., agentos-aws-template).
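The search command above relies on `date -d '1 hour ago'`, which is a GNU extension and is not accepted by the BSD `date` shipped with macOS. A minimal sketch of a portable alternative follows; the function names `ms_ago` and `search_errors` are hypothetical helpers, not part of the template, and the time window is computed with plain shell arithmetic instead.

```shell
# ms_ago MINUTES -> epoch milliseconds for MINUTES ago.
# Uses only `date +%s`, which both GNU and BSD date support.
ms_ago() {
  echo $(( ( $(date +%s) - $1 * 60 ) * 1000 ))
}

# search_errors LOG_GROUP [MINUTES]
# Prints messages matching "ERROR" from the last MINUTES (default 60).
search_errors() {
  aws logs filter-log-events \
    --log-group-name "$1" \
    --filter-pattern "ERROR" \
    --start-time "$(ms_ago "${2:-60}")" \
    --query 'events[].message' \
    --output text
}

# Usage (the log group name is a placeholder):
#   search_errors /ecs/{infra_name}-prd 30
```

`--start-time` expects milliseconds since the epoch, which is why the result is multiplied by 1000 rather than having `000` appended as a string.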
ECS Service Status
View service status and recent events:
aws ecs describe-services \
--cluster {infra_name}-prd \
--services {infra_name}-prd-service \
--query 'services[0].{status:status,running:runningCount,desired:desiredCount,events:events[:5]}'
List running tasks:
aws ecs list-tasks --cluster {infra_name}-prd
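To turn the status query into a pass/fail check, for example in a deploy script, the running and desired counts can be compared directly. This is a sketch under assumptions: the helper names are hypothetical, and treating a desired count of 0 as "not ready" is a choice made here, not something the template prescribes.

```shell
# service_counts CLUSTER SERVICE -> prints "running<TAB>desired" on one line.
service_counts() {
  aws ecs describe-services \
    --cluster "$1" \
    --services "$2" \
    --query 'services[0].[runningCount,desiredCount]' \
    --output text
}

# is_converged RUNNING DESIRED -> succeeds when at least one task is desired
# and every desired task is running. Non-numeric input counts as failure.
is_converged() {
  [ "$2" -gt 0 ] 2>/dev/null && [ "$1" -eq "$2" ] 2>/dev/null
}

# check_service CLUSTER SERVICE -> prints a verdict; non-zero exit if not converged.
check_service() {
  # Re-use the positional parameters for the two counts returned by the CLI.
  set -- $(service_counts "$1" "$2")
  if is_converged "$1" "$2"; then
    echo "OK: $1/$2 tasks running"
  else
    echo "NOT READY: $1/$2 tasks running"
    return 1
  fi
}

# Usage (names are placeholders):
#   check_service {infra_name}-prd {infra_name}-prd-service
```

If you only need to block until the deployment settles, `aws ecs wait services-stable --cluster ... --services ...` polls for you and may be simpler.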
What Success Looks Like
After a successful deployment, logs show:
INFO: Started server process [1]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8000
Health check passing:
INFO: 192.168.x.x - "GET /health HTTP/1.1" 200 OK
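The two success signals above can be checked mechanically. The sketch below assumes the log lines look exactly like the samples shown here; `startup_ok` is a hypothetical helper that reads log lines on standard input.

```shell
# startup_ok: succeeds only if the input contains both the
# "Application startup complete" line and a 200 response for GET /health.
startup_ok() {
  awk '
    /Application startup complete/   { started = 1 }
    /"GET \/health HTTP\/1\.1" 200/  { healthy = 1 }
    END { exit !(started && healthy) }
  '
}

# Usage (the log group name is a placeholder):
#   aws logs tail /ecs/{infra_name}-prd --since 10m | startup_ok \
#     && echo "deployment looks healthy"
```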
Warning Signs
| Log Pattern | Meaning | Action |
|---|---|---|
| database is locked | DuckDB concurrency issue | Reduce workers to 1 |
| connection refused | Can’t reach RDS | Check security group |
| OOMKilled | Out of memory | Increase task memory |
| CannotPullContainerError | ECR auth expired | Re-run auth_ecr.sh |
| SIGTERM then restart loop | Health check failing | Check app logs for errors |
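The table above can be applied to a log stream with a small pattern matcher. This is an illustrative sketch: `suggest_action` and `triage` are hypothetical names, matching is case-insensitive by assumption, and any line containing `SIGTERM` is flagged without detecting the restart loop itself.

```shell
# suggest_action LINE -> prints the matching action from the table, or nothing.
suggest_action() {
  # Lowercase the line so "Connection refused" and "connection refused" both match.
  lower=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$lower" in
    *"database is locked"*)      echo "Reduce workers to 1" ;;
    *"connection refused"*)      echo "Check security group" ;;
    *oomkilled*)                 echo "Increase task memory" ;;
    *cannotpullcontainererror*)  echo "Re-run auth_ecr.sh" ;;
    *sigterm*)                   echo "Check app logs for errors" ;;
  esac
}

# triage: reads log lines on stdin, prints "action <- line" for each match.
triage() {
  while IFS= read -r line; do
    action=$(suggest_action "$line")
    [ -n "$action" ] && printf '%s <- %s\n' "$action" "$line"
  done
  return 0
}

# Usage (the log group name is a placeholder):
#   aws logs tail /ecs/{infra_name}-prd --since 1h | triage
```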
Health Checks
The load balancer checks /health every 30 seconds.
| Target Status | Meaning |
|---|---|
| healthy | Task passing health checks |
| unhealthy | Health check failing |
| draining | Task being replaced |
If a target is unhealthy, check:
- Container logs for startup errors
- Security group allows port 8000 from the ALB
- Database connectivity (DB_HOST, DB_PASS)
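The target statuses in the table can be read from the command line with `aws elbv2 describe-target-health`. The sketch below assumes you already know the ARN of the target group attached to the service; the document does not name it, so it appears only as a placeholder, and both function names are hypothetical.

```shell
# target_states TARGET_GROUP_ARN -> one "target-id state reason" line per target.
# The reason column is "None" for healthy targets.
target_states() {
  aws elbv2 describe-target-health \
    --target-group-arn "$1" \
    --query 'TargetHealthDescriptions[].[Target.Id,TargetHealth.State,TargetHealth.Reason]' \
    --output text
}

# count_not_healthy: reads target_states output on stdin and prints how many
# targets are in any state other than "healthy" (unhealthy, draining, ...).
count_not_healthy() {
  awk '$2 != "healthy" { n++ } END { print n + 0 }'
}

# Usage (replace the placeholder with your target group ARN):
#   target_states "<your-target-group-arn>" | count_not_healthy
```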
Log Retention
CloudWatch retains logs indefinitely by default. Set a retention policy to control costs:
aws logs put-retention-policy \
--log-group-name /ecs/{infra_name}-prd \
--retention-in-days 30
| Retention | Monthly Cost (10 GB/day) |
|---|---|
| 7 days | ~$3 |
| 30 days | ~$15 |
| 90 days | ~$45 |
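After setting a policy, it can help to confirm which log groups still have no expiry. A minimal sketch, assuming your log groups share the `/ecs/` prefix used in this guide; `show_retention` and `never_expiring` are hypothetical helper names.

```shell
# show_retention PREFIX -> "log-group-name retention-days" per log group.
# The second column is "None" when no retention policy is set.
show_retention() {
  aws logs describe-log-groups \
    --log-group-name-prefix "$1" \
    --query 'logGroups[].[logGroupName,retentionInDays]' \
    --output text
}

# never_expiring: reads show_retention output on stdin and prints the names
# of log groups that keep logs indefinitely.
never_expiring() {
  awk '$2 == "None" { print $1 }'
}

# Usage (the prefix is a placeholder):
#   show_retention /ecs/{infra_name} | never_expiring
```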
Alerts (Optional)
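The command below creates a CloudWatch alarm for task failures. Confirm that the FailedTasks metric exists in your account before relying on it; by default ECS publishes utilization and reservation metrics such as CPUUtilization and MemoryUtilization to AWS/ECS, and an alarm on a metric that receives no data stays in INSUFFICIENT_DATA: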
aws cloudwatch put-metric-alarm \
--alarm-name "{infra_name}-task-failures" \
--metric-name "FailedTasks" \
--namespace "AWS/ECS" \
--statistic Sum \
--period 300 \
--threshold 1 \
--comparison-operator GreaterThanOrEqualToThreshold \
--dimensions Name=ClusterName,Value={infra_name}-prd \
--evaluation-periods 1 \
--alarm-actions [YOUR_SNS_TOPIC_ARN]
See the Amazon SNS documentation to create a notification topic, then replace [YOUR_SNS_TOPIC_ARN] with the topic's ARN.
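An alarm is only useful if its metric actually receives data, so it may be worth checking both the metric and the alarm state after creating it. The sketch below uses `aws cloudwatch list-metrics` and `describe-alarms`; the function names are hypothetical. Note that `list-metrics` only reports metrics that have received data recently, so a count of 0 can also mean "no data yet".

```shell
# metric_count NAMESPACE METRIC_NAME -> number of matching metrics CloudWatch reports.
metric_count() {
  aws cloudwatch list-metrics \
    --namespace "$1" \
    --metric-name "$2" \
    --query 'length(Metrics)' \
    --output text
}

# alarm_state ALARM_NAME -> OK, ALARM, or INSUFFICIENT_DATA.
alarm_state() {
  aws cloudwatch describe-alarms \
    --alarm-names "$1" \
    --query 'MetricAlarms[0].StateValue' \
    --output text
}

# Usage (the alarm name is a placeholder):
#   metric_count AWS/ECS FailedTasks
#   alarm_state "{infra_name}-task-failures"
```

If the count stays at 0 and the alarm stays in INSUFFICIENT_DATA, the alarm is watching a metric that is not being published and will not fire.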