ECS Task Issues
Load balancer shows unhealthy targets
Cause: The container is not responding to health checks.
Verify that the /health endpoint works. It should return:
{"status": "ok", "instantiated_at": "..."}
If this fails, check the CloudWatch logs for startup errors.
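The same check can be scripted, for example as a smoke test after a deploy. This is a minimal Python sketch, not part of the deployment tooling. It assumes the app serves GET /health under a base URL you supply. It treats the target as healthy only when the response is HTTP 200 with "status": "ok". The helper names are hypothetical.

```python
import json
import urllib.request


def is_healthy_payload(body: str) -> bool:
    """Return True if a /health body looks like {"status": "ok", ...}."""
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and data.get("status") == "ok"


def check_health(base_url: str, timeout: float = 5.0) -> bool:
    """GET <base_url>/health and validate the body (hypothetical helper)."""
    url = base_url.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            if resp.status != 200:
                return False
            return is_healthy_payload(resp.read().decode("utf-8"))
    except OSError:
        # URLError, HTTPError (non-2xx) and socket timeouts all subclass OSError.
        return False
```

For example, you could call check_health("http://localhost:8000") from inside the task. The port 8000 there is an assumption; use whatever port your container exposes.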
Task keeps restarting (health check flapping)
Cause: The container starts but then fails its health checks.
Check the logs for the startup sequence. Look for:
- Application startup complete: the container started
- SIGTERM: a health check failed and the container is being killed
Common causes:
- Database connection failing (check DB_HOST and DB_PASS)
- Missing environment variables
- App crashes after startup
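Part of this triage can be automated. This Python sketch does two things:
- It scans exported log lines (for example, lines saved from CloudWatch) for the two markers listed.
- It reports which required environment variables are unset or empty.

Treating DB_HOST and DB_PASS as the required set is an assumption drawn from this guide. The helper names are hypothetical.

```python
import os
from typing import Iterable, List, Mapping, Optional, Tuple

# Log markers from this guide and what each one means.
MARKERS = {
    "Application startup complete": "Container started",
    "SIGTERM": "Health check failed, container being killed",
}


def scan_startup_logs(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (marker, meaning) pairs in the order they appear in the logs."""
    findings = []
    for line in lines:
        for marker, meaning in MARKERS.items():
            if marker in line:
                findings.append((marker, meaning))
    return findings


def missing_env_vars(
    required: Iterable[str], env: Optional[Mapping[str, str]] = None
) -> List[str]:
    """Return required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]
```

If you see the startup marker followed by SIGTERM, the app came up and was then killed, which points at the causes listed above.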
'database is locked' errors
Cause: Multiple uvicorn workers are using DuckDB.
DuckDB requires single-writer access, so make sure your container command runs a single uvicorn worker. Do NOT increase --workers if you are using the Pal agent.
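You can catch this before deploying with a guard that inspects the container command. This Python sketch assumes the command is a uvicorn invocation given as an argument list. The module path app.main:app in the test is hypothetical. If --workers is absent, uvicorn starts the number of workers set by the WEB_CONCURRENCY environment variable, or 1, and the sketch mirrors that default.

```python
import os
from typing import Mapping, Optional, Sequence


def uvicorn_worker_count(
    command: Sequence[str], env: Optional[Mapping[str, str]] = None
) -> int:
    """Return how many worker processes a uvicorn command would start.

    Handles "--workers N" and "--workers=N"; otherwise falls back to
    uvicorn's default: $WEB_CONCURRENCY if set, else 1.
    """
    env = os.environ if env is None else env
    for i, arg in enumerate(command):
        if arg == "--workers" and i + 1 < len(command):
            return int(command[i + 1])
        if arg.startswith("--workers="):
            return int(arg.split("=", 1)[1])
    return int(env.get("WEB_CONCURRENCY", "1"))


def assert_single_writer(
    command: Sequence[str], env: Optional[Mapping[str, str]] = None
) -> None:
    """Raise ValueError if the command would run more than one DuckDB writer."""
    workers = uvicorn_worker_count(command, env)
    if workers != 1:
        raise ValueError(f"DuckDB needs a single writer; command starts {workers} workers")
```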
Pal loses data after restart
Cause: No EFS volume is configured.
Pal stores its data in DuckDB at /data/pal.db. Without EFS, this file is lost whenever the container restarts.
See: EFS Setup Guide
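To catch a missing EFS volume early, the app could check at startup whether /data is a network mount. This Python sketch reads /proc/mounts, so it works on Linux only. It assumes EFS appears there as an nfs4 (or nfs) mount, which is how EFS is mounted over NFS. A plain Docker volume at /data would not pass, matching this guide's point that data outside EFS does not survive a restart. The helper names are hypothetical.

```python
from typing import Optional

DATA_DIR = "/data"  # Pal's DuckDB file lives at /data/pal.db


def fstype_for(path: str, mounts_text: str) -> Optional[str]:
    """Return the filesystem type mounted exactly at `path`, per /proc/mounts text."""
    fstype = None
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fstype, options, dump, pass
        if len(fields) >= 3 and fields[1] == path:
            fstype = fields[2]  # later entries shadow earlier mounts at the same path
    return fstype


def data_dir_on_efs(path: str = DATA_DIR, mounts_file: str = "/proc/mounts") -> bool:
    """Best-effort check that `path` is an NFS mount such as EFS."""
    try:
        with open(mounts_file) as f:
            text = f.read()
    except OSError:
        return False
    return fstype_for(path, text) in ("nfs4", "nfs")
```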
Secrets not available in task
Cause: Missing IAM permissions, or the secret doesn’t exist.
Verify that the secrets exist. If any are missing, redeploy with ag infra up prd:aws to create them from your YAML files.
Docker & ECR Issues
'no basic auth credentials' on image push
Cause: Docker is not authenticated to ECR.
Run the authentication script, or authenticate manually.
ECR tokens expire after 12 hours. Re-run the authentication if you get this error after a break.
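If you need to drive the login from Python, this sketch shows the pieces. It is not the authentication script mentioned here. It assumes boto3 is installed and AWS credentials are configured. It relies on two fields of the ECR GetAuthorizationToken response:
- authorizationToken, which is base64 of "AWS:<password>"
- expiresAt, which marks the token's expiry

The helper names are hypothetical.

```python
import base64
from datetime import datetime, timezone
from typing import List, Optional, Tuple


def decode_ecr_token(authorization_token: str) -> Tuple[str, str]:
    """Split a base64 ECR token ("AWS:<password>") into (username, password)."""
    decoded = base64.b64decode(authorization_token).decode("utf-8")
    username, password = decoded.split(":", 1)
    return username, password


def docker_login_argv(username: str, proxy_endpoint: str) -> List[str]:
    """Build a `docker login` command; the password goes on stdin, not argv."""
    prefix = "https://"
    registry = proxy_endpoint[len(prefix):] if proxy_endpoint.startswith(prefix) else proxy_endpoint
    return ["docker", "login", "--username", username, "--password-stdin", registry]


def token_expired(expires_at: datetime, now: Optional[datetime] = None) -> bool:
    """Return True once the token's expiresAt time has passed."""
    now = now or datetime.now(timezone.utc)
    return now >= expires_at


# Usage with boto3 (not run here):
# import boto3, subprocess
# data = boto3.client("ecr").get_authorization_token()["authorizationData"][0]
# user, password = decode_ecr_token(data["authorizationToken"])
# subprocess.run(docker_login_argv(user, data["proxyEndpoint"]),
#                input=password, text=True, check=True)
```

Passing the password via --password-stdin keeps it out of the process list and shell history.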
Image push times out
Large images can time out on slow connections. Try:
- Build with the -f flag to ensure fresh layers
- Check your network connection
- Consider using GitHub Actions for CI/CD builds
Database Issues
Database connection fails silently
Cause: Special characters in the password.
Avoid @, #, %, and & in DB_PASS. These characters require URL encoding and cause silent connection failures.
Safe characters: alphanumeric, !, -, _
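The failure is easy to see in a connection URL. An unencoded @ splits the credentials from the host at the wrong place, and # starts a URL fragment. This Python sketch does two things:
- It checks DB_PASS against the safe set above before deploying.
- It shows, for comparison, how urllib.parse.quote_plus would percent-encode the risky characters.

It assumes the app builds a standard postgresql:// URL from DB_PASS, which this guide does not confirm. The helper names are hypothetical. Sticking to the safe characters remains the recommended fix.

```python
import re
from urllib.parse import quote_plus

# Safe set from this guide: letters, digits, "!", "-", "_".
SAFE_DB_PASS = re.compile(r"[A-Za-z0-9!_-]+")


def db_pass_is_safe(password: str) -> bool:
    """Return True if every character of DB_PASS is in the guide's safe set."""
    return SAFE_DB_PASS.fullmatch(password) is not None


def build_postgres_url(user: str, password: str, host: str, port: int, dbname: str) -> str:
    """Build a postgresql:// URL with percent-encoded credentials (hypothetical helper)."""
    return f"postgresql://{quote_plus(user)}:{quote_plus(password)}@{host}:{port}/{dbname}"
```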
Cannot connect to RDS from ECS
Check that the security groups allow ECS to reach RDS. The database security group must allow inbound traffic on port 5432 from the ECS security group.
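This rule can be checked programmatically. The Python sketch inspects one security group as returned by boto3's ec2.describe_security_groups(). It looks for an inbound rule that covers TCP 5432 and names the ECS security group as its source. The group IDs in the usage comment are placeholders, and the helper name is hypothetical.

```python
from typing import Any, Dict


def allows_inbound_from(db_sg: Dict[str, Any], source_sg_id: str, port: int = 5432) -> bool:
    """Return True if db_sg allows inbound TCP `port` from source_sg_id.

    db_sg is one item of ec2.describe_security_groups()["SecurityGroups"].
    """
    for perm in db_sg.get("IpPermissions", []):
        protocol = perm.get("IpProtocol")
        if protocol == "-1":
            covers_port = True  # "-1" means all protocols and all ports
        elif protocol in ("tcp", "6"):
            covers_port = perm.get("FromPort", 0) <= port <= perm.get("ToPort", -1)
        else:
            continue
        sources = {pair.get("GroupId") for pair in perm.get("UserIdGroupPairs", [])}
        if covers_port and source_sg_id in sources:
            return True
    return False


# Usage with boto3 (not run here; both group IDs are placeholders):
# import boto3
# ec2 = boto3.client("ec2")
# db_sg = ec2.describe_security_groups(GroupIds=["sg-DATABASE"])["SecurityGroups"][0]
# print(allows_inbound_from(db_sg, "sg-ECS"))
```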
Cannot connect to RDS from local machine
RDS must be in a public subnet with publicly_accessible=True (the default).
Add your IP address to the security group, or use a bastion host.
EFS Issues
Mount target not found
Ensure that mount targets exist in the same subnets as your ECS tasks. Each subnet in aws_subnet_ids needs its own mount target.
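A quick way to find the gap is to compare aws_subnet_ids with the SubnetId of each mount target. In boto3, efs.describe_mount_targets() returns these under "MountTargets". This Python sketch does the comparison. The file system ID and subnet IDs in the usage comment are placeholders, and the helper name is hypothetical.

```python
from typing import Any, Dict, Iterable, List


def subnets_missing_mount_targets(
    subnet_ids: Iterable[str], mount_targets: Iterable[Dict[str, Any]]
) -> List[str]:
    """Return the subnets from aws_subnet_ids that have no EFS mount target."""
    covered = {mt["SubnetId"] for mt in mount_targets}
    return sorted(set(subnet_ids) - covered)


# Usage with boto3 (not run here; IDs are placeholders):
# import boto3
# efs = boto3.client("efs")
# targets = efs.describe_mount_targets(FileSystemId="fs-PLACEHOLDER")["MountTargets"]
# print(subnets_missing_mount_targets(["subnet-a", "subnet-b"], targets))
```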
Permission denied on EFS
Check that your access point uses UID/GID 61000 to match the container user. The POSIX user should be Uid: 61000, Gid: 61000.
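Each access point returned by boto3's efs.describe_access_points() carries a PosixUser with Uid and Gid. This Python sketch flags access points that do not match the container user described here. The file system ID in the usage comment is a placeholder, and the helper name is hypothetical.

```python
from typing import Any, Dict

# Container user from this guide.
CONTAINER_UID = 61000
CONTAINER_GID = 61000


def access_point_matches_container(access_point: Dict[str, Any]) -> bool:
    """Return True if the access point's POSIX user is Uid 61000 / Gid 61000."""
    posix = access_point.get("PosixUser") or {}
    return posix.get("Uid") == CONTAINER_UID and posix.get("Gid") == CONTAINER_GID


# Usage with boto3 (not run here; the file system ID is a placeholder):
# import boto3
# efs = boto3.client("efs")
# for ap in efs.describe_access_points(FileSystemId="fs-PLACEHOLDER")["AccessPoints"]:
#     print(ap["AccessPointId"], access_point_matches_container(ap))
```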